EEAT OpenClaw Skill Audit

> AI Agent Skill Quality Assurance — This skill adapts the CORE-EEAT framework to evaluate OpenClaw Skills, ensuring they deliver meaningful utility while maintaining security and reliability.

Skill Overview

OpenClaw Skills are modular capability extensions for AI agents, defined by SKILL.md files with YAML frontmatter and prompt instructions. This skill evaluates skill quality through 80 standardized criteria across 8 core dimensions, generating comprehensive audit reports including utility scores, security assessments, and actionable improvement recommendations.

Core Transformation:

From: Install Skills blindly → Hope they work
To: Systematic vetting → Data-driven skill selection

OpenClaw Skill Structure

Every OpenClaw Skill consists of:

my-skill/
├── SKILL.md          # Core definition (YAML + Markdown instructions)
├── scripts/          # Optional executable scripts
│   └── main.py
└── references/       # Optional configuration and resources
    └── config.json

Key Components:

YAML Frontmatter: Skill metadata (name, description, version, dependencies, gates)
Prompt Instructions: How the AI should use this skill
Scripts: Optional executable code for complex operations
Gating Mechanism: Conditional activation (bins, env, os checks)

Applicable Scenarios

Use this skill when users request:

Skill Selection

"Evaluate this skill before installing"
"Compare two skills for the same task"
"Which skill should I use for X?"

Security Vetting

"Is this skill safe to run?"
"Scan for security vulnerabilities"
"Check permission boundaries"

Skill Development

"Audit my skill for quality issues"
"How to improve my skill's documentation?"
"What security best practices am I missing?"

Skill Maintenance

"Review installed skills for quality"
"Identify deprecated or risky skills"
"Prioritize skill updates"

Core Capabilities

This skill can:

Complete 80-Item Audit: Score each CORE-EEAT item adapted for OpenClaw Skills

Utility Scoring: Evaluate task completion efficiency and comparative value

Security Assessment: Three-level security evaluation (Pass/Caution/Risk)

Gating Validation: Check conditional activation requirements (bins, env, os)

Veto Item Detection: Flag critical security violations (command injection, data leakage)

Priority Ranking: Identify top 5 improvements by impact

Comparative Analysis: Compare skills for same use case

Skill Categories

This skill supports 6 OpenClaw Skill types, each with different evaluation priorities:

Productivity Skills

Definition: Gmail, Calendar, Google Drive, Microsoft Office integration
Focus: Task completion accuracy, API reliability, error handling
Weights: C: 30% | R: 25% | Exp: 20% | Ept: 15% | O: 5% | E: 0% | A: 5% | T: 0%

Development Skills

Definition: Code generation, debugging, GitHub automation, CI/CD
Focus: Code quality, correctness, testing, security best practices
Weights: C: 25% | O: 20% | R: 20% | Ept: 20% | E: 5% | Exp: 5% | A: 5% | T: 0%

Research Skills

Definition: Web search, web fetch, document summarization, data analysis
Focus: Information accuracy, source credibility, citation quality
Weights: C: 25% | R: 25% | A: 20% | E: 15% | O: 10% | Exp: 0% | Ept: 0% | T: 5%

Automation Skills

Definition: Browser automation, file operations, shell commands, task scheduling
Focus: Security, error handling, robustness, permissions
Weights: T: 30% | C: 25% | R: 20% | O: 15% | Exp: 5% | Ept: 5% | E: 0% | A: 0%

Content Skills

Definition: Text generation, translation, image generation, audio processing
Focus: Output quality, style consistency, creative value
Weights: C: 30% | E: 25% | Exp: 20% | O: 15% | Ept: 5% | R: 5% | A: 0% | T: 0%

System Skills

Definition: System monitoring, resource management, network tools, debugging
Focus: Performance, reliability, security, compatibility
Weights: T: 25% | C: 20% | R: 20% | E: 15% | O: 10% | Ept: 5% | Exp: 5% | A: 0%

8 Progressive Quality Gates

Gate 1: Metadata Validation (Pre-Installation)

When: Before installing any skill

Duration: 2-5 minutes

Items:

C01: YAML frontmatter present and valid
C02: Skill name and description clear
O01: Skill structure follows OpenClaw convention
T04: No suspicious dependencies or permissions

Deliverable: Metadata Validation Report

Failure: Do not install. Contact skill author or fix manually.

Gate 2: Gating Mechanism Check

When: After metadata validation, before activation

Duration: 1-2 minutes

Items:

O02: Required tools exist (bins check)
O03: Required environment variables set (env check)
O04: OS compatibility verified (os check)
T07: No conflicting permissions

Deliverable: Gating Compatibility Report

Failure: Skill will not activate. Fix environment or choose alternative.

Gate 3: Security Pre-Check

When: Before first execution

Duration: 3-5 minutes

Items:

T01: No command injection vulnerabilities
T02: No data leakage risks
T03: Input validation present
T04: Permissions are minimal principle

Deliverable: Security Pre-Check Report

Failure: Do not execute. Review code or choose alternative.

Gate 4: Prompt Quality Review

When: During skill development or installation

Duration: 5-10 minutes

Items:

C03: Instructions are clear and actionable
C04: Tool usage patterns are explicit
Ept03: Professional terminology used
Ept05: Follows prompt engineering best practices

Deliverable: Prompt Quality Report

Failure: Skill may misbehave. Refine SKILL.md instructions.

Gate 5: Script Security Audit

When: For skills with executable scripts

Duration: 10-20 minutes

Items:

T05: No hardcoded secrets or API keys
T06: No eval/exec of user input
T07: Proper error handling and logging
T08: Resource cleanup implemented

Deliverable: Script Security Audit

Failure: Security risk. Audit scripts or avoid skill.

Gate 6: Utility Testing

When: During skill evaluation

Duration: 15-30 minutes

Items:

C05: Completes intended tasks successfully
C06: Output quality meets expectations
Exp01: Usage examples provided and working
Exp02: Performance characteristics documented

Deliverable: Utility Test Report

Failure: Skill doesn't deliver value. Consider alternatives.

Gate 7: Comparative Analysis

When: Selecting between multiple skills for same task

Duration: 10-15 minutes

Items:

E01: Offers unique value vs alternatives
E02: Better performance or efficiency
A01: Active maintenance and updates
A02: Community adoption and reviews

Deliverable: Comparative Analysis Report

Failure: Alternative skill may be better. Choose based on scores.

Gate 8: Full 80-Item Audit

When: Critical skills, major updates, security reviews

Duration: 30-60 minutes

Items: All 80 items across all 8 dimensions

Deliverable: Comprehensive EEAT Audit Report

Failure: Address critical issues before production use.

Usage Examples

Example 1: Pre-Installation Security Check

User Request:

Check if this skill is safe to install:
clawhub install agent-browser

Response:

## EEAT Skill Security Audit: agent-browser

### Gate 1: Metadata Validation ✅
- [x] YAML frontmatter valid
- [x] Description clear: "Browser automation for web scraping"
- [x] Version: 1.2.3
- [x] Dependencies: playwright (verified safe)

### Gate 2: Gating Check ⚠️
- [x] Bins: playwright installed
- [x] Env: PLAYWRIGHT_BROWSERS_PATH set
- [ ] OS: macOS/Windows only (Linux limited support)

### Gate 3: Security Pre-Check ✅
- [x] T01: No command injection detected
- [x] T02: No data leakage risks
- [x] T03: Input validation on URLs
- [x] T04: Minimal permissions (browser control only)

### Security Score: 92/100
**Security Status**: CAUTION ⚠️
- Linux support limited
- Requires browser installation (system-level change)

### Recommendation
✅ **Install with Caution**
- Safe for macOS/Windows users
- Linux users should check compatibility
- Review permissions before first use

Example 2: Compare Two Skills

User Request:

Compare tavily-search vs web-search for web search

Response:

## Skill Comparative Analysis: Web Search

### Skill A: tavily-search

**Utility Score**: 85/100
- C05: High accuracy search results ✅
- C06: Clean, structured output ✅
- Exp01: Excellent usage examples ✅
- E01: AI-optimized for agents 🎯

**Security Score**: 95/100
- T01-T04: All checks pass ✅
- T07: No external dependencies ✅

**Overall**: 88/100 (Very Good)

### Skill B: web-search (bundled)

**Utility Score**: 70/100
- C05: Basic search functionality ✅
- C06: Output quality varies ⚠️
- Exp01: Limited examples ⚠️
- E01: Generic implementation ⚠️

**Security Score**: 90/100
- T01-T04: Safe but basic ✅
- T07: Uses external APIs ⚠️

**Overall**: 75/100 (Good)

### Recommendation
✅ **Choose tavily-search**
- +18 points higher utility
- AI-optimized for agent workflows
- Better structured output for downstream processing
- Higher security score

**Trade-off**: Requires API key setup vs. no setup for web-search

Example 3: Full Skill Audit

User Request:

Audit my custom skill: email-automation
[Paste SKILL.md content]

Response:

## EEAT Skill Audit: email-automation

### Overview
- **Skill Type**: Productivity
- **Total Score**: 62/100 (Fair)
- **Utility Score**: 75/100 (Good)
- **Security Score**: 45/100 (CAUTION ⚠️)
- **Veto Item**: ⚠️ T02 Triggered (Data Leakage Risk)

### Dimension Scores

| Dimension | Score | Rating | Weight | Weighted Score |
|-----------|-------|--------|--------|----------------|
| C — Correctness | 70/100 | Good | 30% | 21 |
| O — Organization | 65/100 | Fair | 5% | 3.25 |
| R — Referenceability | 50/100 | Fair | 25% | 12.5 |
| Exp — Experience | 80/100 | Good | 20% | 16 |
| Ept — Expertise | 55/100 | Fair | 15% | 8.25 |
| E — Exclusivity | 40/100 | Poor | 0% | 0 |
| A — Authority | 60/100 | Fair | 5% | 3 |
| T — Trust | 45/100 | Poor | 0% | 0 |
| **Weighted Total** | | | | **64** |

### Critical Issues (Veto Items)

⚠️ **T02: Data Leakage Risk**
**Issue**: Skill stores API credentials in plain text in SKILL.md

SKILL.md

credentials:

smtp_password: "mypassword123" # ⚠️ SECURITY RISK

**Action**: Move credentials to environment variables

credentials:

smtp_password: "${SMTP_PASSWORD}" # ✅ SECURE


### Top 5 Priority Improvements

1. **T02 Data Leakage** — Remove hardcoded credentials
   - Current: Fail | Potential Gain: 8 weighted points
   - Action: Use environment variables for all secrets

2. **R02 Coverage** — Add error handling examples
   - Current: Fail | Potential Gain: 6.25 weighted points
   - Action: Document error scenarios and recovery

3. **Ept01 Documentation** — Improve prompt instructions
   - Current: Partial | Potential Gain: 4.5 weighted points
   - Action: Add step-by-step usage examples

4. **R03 Source Authority** — Verify email library security
   - Current: Partial | Potential Gain: 3.75 weighted points
   - Action: Audit nodemailer dependency for vulnerabilities

5. **O01 Structure** — Add scripts/ directory for complex logic
   - Current: Partial | Potential Gain: 2.5 weighted points
   - Action: Move complex operations to Python scripts

### Action Plan

#### Quick Wins (Fix immediately)
- [ ] Move all credentials to environment variables
- [ ] Add error handling documentation

#### Medium Investment (This week)
- [ ] Add comprehensive usage examples
- [ ] Implement proper logging in scripts

#### Strategic (Next sprint)
- [ ] Add test suite with edge cases
- [ ] Implement retry logic for failed sends
- [ ] Add HTML email support

### Recommendation
⚠️ **Do Not Install Until Fixed**
- Security risk (T02 veto) must be addressed
- After fixes, expected score: 78/100 (Good)

Reference Documents

references/openclaw-skill-benchmark.md — Complete 80-item benchmark adapted for OpenClaw Skills
references/skill-security-checklist.md — Security-specific evaluation criteria
references/utility-testing-guide.md — How to test skill utility and comparative value
workflow-optimization-analysis.md — Adaptation strategy from code to skills

Key Differences: Code vs. Skill Audit

Aspect	Code Audit	Skill Audit
--------	-----------	-------------
Primary Focus	Code correctness, maintainability	Utility, security, reliability
Security Emphasis	SQL injection, XSS	Command injection, data leakage, permissions
Evaluation Method	Static analysis + testing	Comparative utility + security probes
Output Format	Code quality report	Utility score + security status label
Key Metrics	Test coverage, complexity	Task completion, risk level
Veto Items	Security bugs, logic errors	Security vulnerabilities, data risks
Automation Level	High (linters, type checkers)	Medium (requires manual security review)
Comparative Analysis	Code vs. requirements	Skill vs. baseline/skills

Success Points

Security-First Approach — OpenClaw Skills have system-level access; security is non-negotiable

Comparative Utility — Evaluate skills relative to baseline, not in isolation

Gating Validation — Ensure skills only activate when dependencies are met

Prompt Quality — SKILL.md instructions determine skill behavior; quality matters

Minimal Permissions — Skills should only request necessary access

Active Maintenance — Prioritize skills with recent updates and community support

Real-World Testing — Test with actual use cases, not synthetic scenarios

Optimization Recommendations

Based on OpenClaw's architecture and community best practices:

1. Add Skill Registry Integration

Integrate with ClawHub API for real-time skill metadata
Auto-fetch version history, update frequency, download counts
Community signals: stars, issues, last commit date

2. Implement Automated Security Scanning

Integrate with clawsec (ClawHub security scanner)
Auto-scan scripts/ directory for vulnerabilities
Check for hardcoded secrets, eval/exec patterns

3. Add Utility Benchmarking

Run comparative tests: baseline vs. with-skill
Measure task completion time, token efficiency, success rate
Generate utility scores similar to SkillTester framework

4. Create Skill Dependency Graph

Map skill dependencies (some skills require others)
Detect circular dependencies
Recommend optimal skill combinations

5. Implement Skill Conflict Detection

Detect skills with conflicting tool usage
Identify resource contention (browser, file locks)
Suggest skill compatibility matrix

6. Add Performance Profiling

Track skill execution time over sessions
Monitor API usage and costs
Identify bottlenecks in skill chains

7. Create Skill Reputation System

Track skill reliability across users
Aggregate success/failure rates
Community-rated skill quality scores

8. Implement A/B Testing Framework

Compare two skills for same task
Measure which completes faster/better
Data-driven skill selection

9. Add Skill Update Notifications

Monitor skill updates in ClawHub
Alert on breaking changes
Suggest upgrade timing

10. Create Skill Usage Analytics

Track which skills are used most frequently
Identify skill chains and workflows
Optimize skill loading order

Notes

This skill adapts EEAT from content/code evaluation to AI agent skill vetting
Security is elevated to critical importance due to system-level access
Utility is evaluated comparatively (skill vs. baseline), not absolutely
Gating mechanisms ensure skills only activate when dependencies are met
Community signals (downloads, stars, issues) inform Authority dimension
Prompt quality in SKILL.md directly impacts skill behavior
Skills with executable scripts require deeper security audit
This framework is designed for OpenClaw's modular skill architecture

openclaw-skill-eeta-audit

概述

EEAT OpenClaw Skill Audit

Skill Overview

OpenClaw Skill Structure

Applicable Scenarios

Skill Selection

Security Vetting

Skill Development

Skill Maintenance

Core Capabilities

Skill Categories

Productivity Skills

Development Skills

Research Skills

Automation Skills

Content Skills

System Skills

8 Progressive Quality Gates

Gate 1: Metadata Validation (Pre-Installation)

Gate 2: Gating Mechanism Check

Gate 3: Security Pre-Check

Gate 4: Prompt Quality Review

Gate 5: Script Security Audit

Gate 6: Utility Testing

Gate 7: Comparative Analysis

Gate 8: Full 80-Item Audit

Usage Examples

Example 1: Pre-Installation Security Check

Example 2: Compare Two Skills

Example 3: Full Skill Audit

SKILL.md

Reference Documents

Key Differences: Code vs. Skill Audit

Success Points

Optimization Recommendations

1. Add Skill Registry Integration

2. Implement Automated Security Scanning

3. Add Utility Benchmarking

4. Create Skill Dependency Graph

5. Implement Skill Conflict Detection

6. Add Performance Profiling

7. Create Skill Reputation System

8. Implement A/B Testing Framework

9. Add Skill Update Notifications

10. Create Skill Usage Analytics

Notes

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

eeat-content-quality-audit

valuemining-lengthybooks

herclaw-agentsystem