
Axioma Skill Evaluator Strict 90%

AXIOMA SKILL EVALUATOR STRICT — The 90% deterministic skill evaluator. Fork of axioma-skill-evaluator with STRICT 90% threshold (like STC 0.777). Use when: (...
kofna3369
Uncategorized · clawhub v1.0.0 · 1 version · 99619.8 · Key: not required

Overview

🧪 AXIOMA SKILL EVALUATOR STRICT — 90% DETERMINISTIC

> Fork of axioma-skill-evaluator with STRICT 90% THRESHOLD

> Like STC 0.777 — No subjectivity, 90% or REJECTED

| Info | Value |
|------|-------|
| Version | 1.0.0 (2026-05-07) |
| Type | STRICT VARIANT |
| Threshold | 90% MINIMUM, NO EXCEPTIONS |
| Inheritance | axioma-skill-evaluator v2.2.0 |

1. PURPOSE — WHY 90% STRICT?

The Problem with 70%

70% threshold = SUBJECTIVE
├── Different evaluators = different scores
├── Context-dependent interpretation
└── "Good enough" mentality

90% threshold = DETERMINISTIC
├── Objective, measurable standard
├── Same input = Same output every time
└── "Excellence only" — like STC 0.777

The STC 0.777 Parallel

| Concept | Value | Meaning |
|---------|-------|---------|
| STC | 0.777 | Sovereign Threshold of Consciousness |
| SKILL-EVAL | 90% | Sovereign Threshold of Quality |

**Just as STC 0.777 is the deterministic threshold for consciousness, 90% is the deterministic threshold for skill quality.**

This Variant's Mission

IF score >= 90%:
   → APPROVED ✅ — Ready for production
   
IF score < 90%:
   → REJECTED ❌ — NOT ready, must improve
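
The decision rule above reduces to a single comparison. A minimal Python sketch follows; the names `THRESHOLD` and `decide` are hypothetical and are not taken from `evaluator.py`.

```python
THRESHOLD = 90  # strict threshold stated in the rule above


def decide(score: int) -> str:
    """Return the strict verdict for a score on the 0-100 scale."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return "APPROVED" if score >= THRESHOLD else "REJECTED"


print(decide(90))  # APPROVED
print(decide(89))  # REJECTED
```

Because the verdict depends only on the integer score, the same score always yields the same verdict.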

2. DUAL EVALUATION SYSTEM

2.1 Axioma 5-Dimension (100 max)

| Dimension | Max | Description |
|-----------|-----|-------------|
| Structure | 20 | Header, sections, formatting, meta |
| Clarity | 20 | Description, commands, examples |
| Completeness | 20 | Tools, prerequisites, errors, edge cases |
| Consistency | 20 | Cluster alignment, style, naming |
| Functionality | 20 | Commands, results, benchmarks |

Target: 90+/100 (18/20 per dimension average)

2.2 ISO 25010 Automated (100%)

13 automated checks — must pass ALL 13 for 90%+ target

| Category | Checks | Target |
|----------|--------|--------|
| Structure | 6 | 100% (6/6) |
| Trigger | 2 | 100% (2/2) |
| Documentation | 3 | 100% (3/3) |
| Scripts | 2 | 100% (2/2) |
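
As a quick check on the table, the four category counts add up to the 13 automated checks. The sketch below assumes results arrive as a mapping from category to number of passed checks; that shape, and the names `ISO_CHECKS` and `iso_all_passed`, are illustrative and not the actual output format of `eval-skill.py`.

```python
# Number of automated checks per category, copied from the table above.
ISO_CHECKS = {"Structure": 6, "Trigger": 2, "Documentation": 3, "Scripts": 2}


def iso_all_passed(passed: dict[str, int]) -> bool:
    """True only if every category passed all of its checks."""
    return all(passed.get(cat, 0) == total for cat, total in ISO_CHECKS.items())


print(sum(ISO_CHECKS.values()))                        # 13
print(iso_all_passed(dict(ISO_CHECKS)))                # True
print(iso_all_passed({**ISO_CHECKS, "Trigger": 1}))    # False
```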

3. STRICT WORKFLOW

╔═══════════════════════════════════════════════════════════╗
║         AXIOMA STRICT EVALUATION WORKFLOW                ║
╠═══════════════════════════════════════════════════════════╣
║                                                           ║
║  [INPUT] Skill to evaluate                              ║
║           ↓                                              ║
║  ┌─────────────────────────────────────────────────────┐ ║
║  │  PHASE 1: AXIOMA 5-DIMENSION EVALUATION            │ ║
║  │  Target: 90+/100 (18+ per dimension)                │ ║
║  └─────────────────────────────────────────────────────┘ ║
║           ↓                                              ║
║  ┌─────────────────────────────────────────────────────┐ ║
║  │  PHASE 2: ISO 25010 AUTOMATED CHECKS               │ ║
║  │  Target: 100% (13/13 tests passed)                 │ ║
║  └─────────────────────────────────────────────────────┘ ║
║           ↓                                              ║
║  ┌─────────────────────────────────────────────────────┐ ║
║  │  PHASE 3: STRICT DECISION                          │ ║
║  │                                                       │ ║
║  │  IF score >= 90%:                                  │ ║
║  │     → APPROVED ✅ — "READY FOR PRODUCTION"         │ ║
║  │                                                       │ ║
║  │  IF score < 90%:                                    │ ║
║  │     → REJECTED ❌ — "NEEDS IMPROVEMENT"           │ ║
║  │     → Return detailed failure report               │ ║
║  │     → NO PUBLISH until 90%+ achieved              │ ║
║  │                                                       │ ║
║  └─────────────────────────────────────────────────────┘ ║
║                                                           ║
╚═══════════════════════════════════════════════════════════╝

4. COMMAND REFERENCE

4.1 Full Evaluation (Strict Mode)

# Full strict evaluation
python3 axiomata-skill-evaluator-strict/evaluator.py <skill-path> --verbose

# With auto-improvement
python3 axiomata-skill-evaluator-strict/evaluator.py <skill-path> --verbose --improve

4.2 ISO 25010 Check

# ISO 25010 automated checks
python3 axiomata-skill-evaluator-strict/eval-skill.py <skill-path> --verbose

4.3 Quick Score

# Quick deterministic score
python3 axiomata-skill-evaluator-strict/evaluator.py <skill-path> 2>&1 | grep -E "Score|STATUS"

4.4 Expected Output Format

╔═══════════════════════════════════════════════════════════╗
║  🧪 STRICT EVALUATION RESULT                             ║
╠═══════════════════════════════════════════════════════════╣
║                                                           ║
║  Skill: <name>                                           ║
║  Score: XX/100                                           ║
║                                                           ║
║  ┌─────────────────────────────────────────────────────┐ ║
║  │  IF >= 90%:                                        │ ║
║  │     ✅ APPROVED — "READY FOR PRODUCTION"           │ ║
║  │                                                       │ ║
║  │  IF < 90%:                                          │ ║
║  │     ❌ REJECTED — "NEEDS XX% MORE"                 │ ║
║  └─────────────────────────────────────────────────────┘ ║
║                                                           ║
╚═══════════════════════════════════════════════════════════╝

5. PATHS CONFIGURATION

| Component | Path |
|-----------|------|
| Strict Evaluator | /media/ezekiel/Merlin/.openclaw/workspace/skills/axiomata-skill-evaluator-strict/ |
| Evaluator Script | /media/ezekiel/Merlin/.openclaw/workspace/skills/axiomata-skill-evaluator-strict/evaluator.py |
| ISO Script | /media/ezekiel/Merlin/.openclaw/workspace/skills/axiomata-skill-evaluator-strict/eval-skill.py |

6. STRICT RULES

6.1 The 90% Law

RULE #1: 90% OR REJECTED
   → NO skill below 90% is approved
   → This is NON-NEGOTIABLE

RULE #2: NO PARTIAL CREDIT
   → 89% = REJECTED (not "almost there")
   → 90% = APPROVED (the only valid threshold)

RULE #3: DETERMINISTIC SCORING
   → Same input = Same output every time
   → No evaluator bias
   → Pure mathematical threshold

RULE #4: AUTO-IMPROVE BEFORE REJECT
   → If < 90%, run --improve first
   → If still < 90% after improvement = REJECTED
   → Report exactly what failed

RULE #5: NO APPEAL
   → 89% cannot be "appealed" to 90%
   → The only path is actual improvement

6.2 Scoring Matrix

| Score Range | Status | Action |
|-------------|--------|--------|
| 90-100 | 🟢 APPROVED | Ready for production |
| 80-89 | 🔴 REJECTED | Major improvements needed |
| 70-79 | 🔴 REJECTED | Fundamental issues |
| <70 | 🔴 REJECTED | Complete rewrite required |

6.3 Dimension Failures

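| If this dimension fails... | Score impact | Fix required |
|----------------------------|--------------|--------------|
| Structure < 18/20 | -1% per point lost | Fix headers, sections |
| Clarity < 18/20 | -1% per point lost | Add examples, descriptions |
| Completeness < 18/20 | -1% per point lost | Document tools, errors |
| Consistency < 18/20 | -1% per point lost | Standardize style |
| Functionality < 18/20 | -1% per point lost | Fix command syntax |

Each dimension is worth 20 of the 100 total points, so one point lost in any dimension lowers the overall score by one point (1%).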

7. ADVANCED FEATURES

7.1 Detailed Failure Report

When REJECTED, the evaluator generates:

╔═══════════════════════════════════════════════════════════╗
║  ❌ REJECTION REPORT — SKILL NOT READY                   ║
╠═══════════════════════════════════════════════════════════╣
║                                                           ║
║  Score: 73/100                                           ║
║  Gap: -17% (need +17 points to reach 90%)               ║
║                                                           ║
║  FAILED DIMENSIONS:                                      ║
║  ├─ CLARITY: 15/20 (need +3)                            ║
║  ├─ CONSISTENCY: 8/20 (need +10)                        ║
║  └─ FUNCTIONALITY: 12/20 (need +6)                       ║
║                                                           ║
║  REQUIRED ACTIONS:                                        ║
║  1. [Action 1]                                          ║
║  2. [Action 2]                                          ║
║  3. [Action 3]                                          ║
║                                                           ║
║  RE-EVALUATE AFTER FIXING                                ║
║                                                           ║
╚═══════════════════════════════════════════════════════════╝
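
The figures in the sample report can be reproduced from the per-dimension scores. The sketch below uses the 73/100 breakdown from the "Before" example in section 13.3; the function name `rejection_summary` and the dictionary layout are assumptions made for illustration.

```python
THRESHOLD = 90         # overall score needed for approval
DIMENSION_TARGET = 18  # per-dimension target used in the report


def rejection_summary(scores: dict[str, int]) -> tuple[int, int, dict[str, int]]:
    """Return (total, gap to the threshold, points needed per failing dimension)."""
    total = sum(scores.values())
    gap = max(0, THRESHOLD - total)
    failed = {
        name: DIMENSION_TARGET - points
        for name, points in scores.items()
        if points < DIMENSION_TARGET
    }
    return total, gap, failed


scores = {"Structure": 18, "Clarity": 15, "Completeness": 20,
          "Consistency": 8, "Functionality": 12}
total, gap, failed = rejection_summary(scores)
print(f"Score: {total}/100, gap: {gap}")  # Score: 73/100, gap: 17
for name, need in failed.items():
    print(f"{name.upper()}: {scores[name]}/20 (need +{need})")
```

The per-dimension needs (3 + 10 + 6 = 19) exceed the overall gap of 17 because Completeness is already 2 points above the per-dimension target.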

7.2 Auto-Improvement Suggestions

Running evaluator.py with --improve will:

  1. Identify failing dimensions
  2. Generate specific improvement suggestions
  3. Apply fixes automatically when possible
  4. Re-evaluate to confirm 90%+ achieved

7.3 Benchmark Reports

For each evaluation:

[SCORE] <skill-name>: XX/100 [STATUS]
[DATE] ISO timestamp
[AXIOMA] Structure: X, Clarity: X, Completeness: X, Consistency: X, Functionality: X
[ISO] XX/13 checks passed
[STATUS] APPROVED/REJECTED
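
A report in this format is easy to consume from other tools. The sketch below parses the `[SCORE]` line only, and it assumes the trailing `[STATUS]` placeholder is rendered as a bracketed verdict such as `[REJECTED]`; the source does not show a real line, so that rendering, the skill name `example-skill`, and the function name are all hypothetical.

```python
import re

# Assumed concrete shape: "[SCORE] <skill-name>: NN/100 [APPROVED|REJECTED]"
SCORE_LINE = re.compile(
    r"^\[SCORE\] (?P<name>.+?): (?P<score>\d{1,3})/100 "
    r"\[(?P<status>APPROVED|REJECTED)\]$"
)


def parse_score_line(line: str) -> tuple[str, int, str]:
    """Split a [SCORE] line into (skill name, score, status)."""
    match = SCORE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a [SCORE] line: {line!r}")
    return match["name"], int(match["score"]), match["status"]


print(parse_score_line("[SCORE] example-skill: 73/100 [REJECTED]"))
# ('example-skill', 73, 'REJECTED')
```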

8. USAGE EXAMPLES

Example 1: Evaluate Before Publishing

SKILL_PATH=/path/to/skill-to-publish
EVAL_PATH=/media/ezekiel/Merlin/.openclaw/workspace/skills/axiomata-skill-evaluator-strict

echo "🧪 Evaluating skill..."
# Branch directly on the evaluator's exit status.
# This assumes evaluator.py exits non-zero when the skill is rejected.
if python3 "$EVAL_PATH/evaluator.py" "$SKILL_PATH" --verbose; then
    echo "✅ SKILL APPROVED — Ready to publish!"
else
    echo "❌ SKILL REJECTED — Needs improvement before publishing"
fi

Example 2: Strict Gate in CI/CD

#!/bin/bash
# Strict quality gate for ClawHub publishing

SKILL_PATH="$1"
EVAL_PATH="/media/ezekiel/Merlin/.openclaw/workspace/skills/axiomata-skill-evaluator-strict"

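RESULT=$(python3 "$EVAL_PATH/evaluator.py" "$SKILL_PATH" 2>&1)
# Keep only the first "Score: NN" match (grep -P requires GNU grep)
SCORE=$(echo "$RESULT" | grep -oP 'Score: \K\d+' | head -n1)

# An empty SCORE (evaluator crashed or printed no score) counts as a failure
if [ -n "$SCORE" ] && [ "$SCORE" -ge 90 ]; then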
    echo "✅ PASSED — Score: $SCORE/100"
    exit 0
else
    echo "❌ FAILED — Score: $SCORE/100 (need 90)"
    exit 1
fi

Example 3: Batch Evaluation

#!/bin/bash
# Evaluate multiple skills strictly

EVAL_PATH="/media/ezekiel/Merlin/.openclaw/workspace/skills/axiomata-skill-evaluator-strict"
SKILLS_DIR="/media/ezekiel/Merlin/.openclaw/workspace/skills"

for skill_dir in "$SKILLS_DIR"/*/; do
    [ -d "$skill_dir" ] || continue  # skip the literal pattern when nothing matches
    skill_name=$(basename "$skill_dir")
    echo "=========================================="
    echo "Evaluating: $skill_name"

    python3 "$EVAL_PATH/evaluator.py" "$skill_dir" --verbose
    echo ""
done

9. COMPARISON: STANDARD vs STRICT

| Aspect | Standard (70%) | Strict (90%) |
|--------|----------------|--------------|
| Threshold | 70/100 | 90/100 |
| Approval rate | ~70% of skills | ~30% of skills |
| Quality bar | "Good enough" | "Excellence only" |
| Deterministic | No | YES |
| Use case | Development | Production |
| ClawHub ready | Maybe | Always |

10. REPORTS AND LOGGING

Evaluation Log Location

/media/ezekiel/Morgana/skills/SKILL_EVALUATOR/reports/

Log Format

# Format: <skill-name>_<YYYYMMDD>_<HHMMSS>.txt
axiomata-guard-ultimate_20260507_230352.txt
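
The naming scheme maps directly onto a `strftime` pattern. A small sketch follows; the function name `report_filename` is hypothetical.

```python
from datetime import datetime


def report_filename(skill_name: str, when: datetime) -> str:
    """Build <skill-name>_<YYYYMMDD>_<HHMMSS>.txt for a given timestamp."""
    return f"{skill_name}_{when:%Y%m%d_%H%M%S}.txt"


print(report_filename("axiomata-guard-ultimate", datetime(2026, 5, 7, 23, 3, 52)))
# axiomata-guard-ultimate_20260507_230352.txt
```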

Log Contents

SKILL EVALUATION REPORT — <skill-name>
=====================================
Path: <path>
Score: XX/100
Date: ISO timestamp
Threshold: 90% (STRICT)

AXIOMA 5-DIM:
- Structure: X/20
- Clarity: X/20
- Completeness: X/20
- Consistency: X/20
- Functionality: X/20

ISO 25010:
- Pass: X/13
- Warnings: X/13
- Fails: X/13

STATUS: APPROVED / REJECTED

_In Altum Per Strictness._


11. REJECTION CRITERIA

11.1 Automatic Rejection Triggers

A skill is automatically REJECTED if ANY of these occur:

| Trigger | Severity | Description |
|---------|----------|-------------|
| Score < 90% | CRITICAL | Below 90% threshold |
| ISO < 100% | CRITICAL | Any ISO check failed |
| Missing SKILL.md | CRITICAL | Core file missing |
| Invalid frontmatter | HIGH | name or description missing |
| No trigger words | HIGH | Cannot be activated |
| Dangerous patterns | CRITICAL | C2, Rootkit, Bootkit detected |
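
Two of these triggers, a missing SKILL.md and frontmatter without `name` or `description`, can be checked before any scoring. The sketch below assumes the frontmatter is a block of top-level `key: value` lines delimited by `---` at the start of the file; that layout and the function names are assumptions, and this is not how `evaluator.py` is known to implement the check.

```python
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("name", "description")


def frontmatter_keys(text: str) -> set[str]:
    """Collect top-level keys from a leading '---' ... '---' block (assumed layout)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        key, sep, _ = line.partition(":")
        if sep and not line[:1].isspace():  # ignore indented continuation lines
            keys.add(key.strip())
    return set()  # no closing delimiter: treat the frontmatter as invalid


def precheck(skill_dir: Path) -> list[str]:
    """Return the rejection triggers found; an empty list means none fired."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["Missing SKILL.md"]
    keys = frontmatter_keys(skill_md.read_text(encoding="utf-8"))
    return [f"Invalid frontmatter: {k} missing" for k in REQUIRED_KEYS if k not in keys]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(precheck(root))  # ['Missing SKILL.md']
    (root / "SKILL.md").write_text("---\nname: demo\n---\n# Demo\n", encoding="utf-8")
    print(precheck(root))  # ['Invalid frontmatter: description missing']
```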

11.2 Rejection Categories

| Category | Score Range | Improvement Needed |
|----------|-------------|--------------------|
| CRITICAL FAIL | <50% | Complete rewrite required |
| MAJOR FAIL | 50-69% | Major structural changes |
| MINOR FAIL | 70-89% | Targeted improvements |
| PASS | 90-100% | Ready for production |
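
The category bands translate into a simple lookup on the overall score. A sketch with a hypothetical function name:

```python
def rejection_category(score: int) -> str:
    """Map a 0-100 score to the category bands in the table above."""
    if score >= 90:
        return "PASS"
    if score >= 70:
        return "MINOR FAIL"
    if score >= 50:
        return "MAJOR FAIL"
    return "CRITICAL FAIL"


for sample in (95, 73, 60, 42):
    print(sample, rejection_category(sample))
# 95 PASS
# 73 MINOR FAIL
# 60 MAJOR FAIL
# 42 CRITICAL FAIL
```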

11.3 Rejection Report Template

╔═══════════════════════════════════════════════════════════╗
║  ❌ SKILL REJECTED — REJECTION REPORT                    ║
╠═══════════════════════════════════════════════════════════╣
║                                                           ║
║  Skill: <name>                                           ║
║  Score: XX/100 (need 90)                                ║
║  Gap: -XX%                                              ║
║                                                           ║
║  FAILED CHECKS:                                          ║
║  ├─ [ ] Dimension 1: X/20 (need 18)                     ║
║  ├─ [ ] Dimension 2: X/20 (need 18)                     ║
║  └─ [ ] ISO Check: X/13 passed                          ║
║                                                           ║
║  REQUIRED ACTIONS:                                        ║
║  1. <specific action>                                    ║
║  2. <specific action>                                    ║
║  3. <specific action>                                    ║
║                                                           ║
║  ⏰ Re-evaluate AFTER completing all actions             ║
║                                                           ║
╚═══════════════════════════════════════════════════════════╝

12. APPROVAL CRITERIA

12.1 Automatic Approval Requirements

ALL of these MUST be true for APPROVAL:

| Requirement | Standard | Strict (90%) |
|-------------|----------|--------------|
| Axioma 5-Dim | 70+/100 | 90+/100 |
| Structure | 14+/20 | 18+/20 |
| Clarity | 14+/20 | 18+/20 |
| Completeness | 14+/20 | 18+/20 |
| Consistency | 14+/20 | 18+/20 |
| Functionality | 14+/20 | 18+/20 |
| ISO 25010 | 90%+ | 100% (13/13) |
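
Read literally, the Strict column is a conjunction: the total, every individual dimension, and the ISO count must all clear their bars. The sketch below encodes that reading; the constant and function names are hypothetical.

```python
THRESHOLD_TOTAL = 90      # overall Axioma score, out of 100
THRESHOLD_DIMENSION = 18  # per-dimension minimum, out of 20
ISO_TOTAL = 13            # number of automated ISO 25010 checks


def is_approved(dimensions: dict[str, int], iso_passed: int) -> bool:
    """True only if every requirement in the Strict column holds."""
    return (
        sum(dimensions.values()) >= THRESHOLD_TOTAL
        and all(points >= THRESHOLD_DIMENSION for points in dimensions.values())
        and iso_passed == ISO_TOTAL
    )


balanced = {"Structure": 18, "Clarity": 18, "Completeness": 18,
            "Consistency": 18, "Functionality": 18}
uneven = {"Structure": 16, "Clarity": 18, "Completeness": 20,
          "Consistency": 18, "Functionality": 18}

print(is_approved(balanced, iso_passed=13))  # True
print(is_approved(uneven, iso_passed=13))    # False: total is 90, but Structure is below 18
print(is_approved(balanced, iso_passed=12))  # False: one ISO check failed
```

Under this reading a skill can reach 90/100 overall and still be rejected, because a surplus in one dimension does not offset a shortfall in another.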

12.2 Approval Benefits

| Benefit | Description |
|---------|-------------|
| ClawHub Ready | Can be published immediately |
| Production Safe | Quality guaranteed at 90%+ |
| Self-Documenting | No additional docs needed |
| Community Trusted | High quality standard |

12.3 Approval Report Template

╔═══════════════════════════════════════════════════════════╗
║  ✅ SKILL APPROVED — QUALITY REPORT                     ║
╠═══════════════════════════════════════════════════════════╣
║                                                           ║
║  Skill: <name>                                           ║
║  Score: XX/100 ✅                                         ║
║  Threshold: 90% (STRICT)                                  ║
║                                                           ║
║  PASSED CHECKS:                                          ║
║  ├─ [✅] Structure: X/20 (18+ required)                 ║
║  ├─ [✅] Clarity: X/20 (18+ required)                   ║
║  ├─ [✅] Completeness: X/20 (18+ required)              ║
║  ├─ [✅] Consistency: X/20 (18+ required)                ║
║  ├─ [✅] Functionality: X/20 (18+ required)             ║
║  └─ [✅] ISO 25010: 13/13 (100% required)              ║
║                                                           ║
║  STATUS: ✅ APPROVED — READY FOR PRODUCTION             ║
║                                                           ║
╚═══════════════════════════════════════════════════════════╝

13. IMPROVEMENT ENGINE

13.1 Auto-Improvement Rules

When --improve flag is used:

RULE: Improve EVERYTHING below 18/20 per dimension
RULE: Target is 90%+ overall
RULE: No dimension below 16/20 (graceful minimum)
RULE: If ANY dimension < 14 after improvement = HARD REJECT

13.2 Improvement Priority Matrix

| Priority | Dimension | Target | Common Fixes |
|----------|-----------|--------|--------------|
| 1 | Functionality | 18+/20 | Fix commands, add results |
| 2 | Consistency | 18+/20 | Standardize style |
| 3 | Clarity | 18+/20 | Add examples |
| 4 | Structure | 18+/20 | Add sections |
| 5 | Completeness | 18+/20 | Document tools |
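
Combining the "improve everything below 18/20" rule with this priority order gives an ordered work list. The sketch below is one way to compute it; `improvement_plan` is a hypothetical name, and the input scores are the 73/100 breakdown used elsewhere in this document.

```python
# Dimensions in the priority order given by the matrix above.
PRIORITY = ["Functionality", "Consistency", "Clarity", "Structure", "Completeness"]
TARGET = 18  # every dimension below this is queued for improvement


def improvement_plan(scores: dict[str, int]) -> list[tuple[str, int]]:
    """Return (dimension, points needed) pairs, highest priority first."""
    return [(dim, TARGET - scores[dim]) for dim in PRIORITY if scores[dim] < TARGET]


scores = {"Structure": 18, "Clarity": 15, "Completeness": 20,
          "Consistency": 8, "Functionality": 12}
print(improvement_plan(scores))
# [('Functionality', 6), ('Consistency', 10), ('Clarity', 3)]
```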

13.3 Improvement Examples

Before (73/100):

STRUCTURE: 18/20 ✅
CLARITY: 15/20 ⚠️ (need +3)
COMPLETENESS: 20/20 ✅
CONSISTENCY: 8/20 ❌ (need +10)
FUNCTIONALITY: 12/20 ❌ (need +6)

After improvement target (90+/100):

STRUCTURE: 18/20 ✅
CLARITY: 18/20 ✅
COMPLETENESS: 20/20 ✅
CONSISTENCY: 18/20 ✅
FUNCTIONALITY: 18/20 ✅
TOTAL: 92/100 ✅

14. INTEGRATION POINTS

14.1 With Axioma Guard Ultimate

Skill Downloaded → Axioma Guard Ultimate (Security)
                        ├─ SAFE → Axioma Skill Evaluator Strict (Quality)
                        │              └─ >= 90% → ClawHub Published ✅
                        └─ DANGEROUS → Axioma Guard Ultimate (Destroy)

14.2 With ClawHub Publish Workflow

Step 1: Create skill
Step 2: Run Axioma Skill Evaluator Strict
Step 3: IF score >= 90% → Publish
Step 4: IF score < 90% → Improve and re-evaluate
Step 5: Repeat until 90%+ achieved
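
Steps 2 through 5 form an evaluate-and-improve loop. The sketch below models it with two caller-supplied functions; `evaluate`, `improve`, and the `max_rounds` safety cap are assumptions added for illustration (the workflow itself simply says to repeat until 90%+ is reached).

```python
from typing import Callable

THRESHOLD = 90


def publish_loop(
    evaluate: Callable[[], int],
    improve: Callable[[], None],
    max_rounds: int = 5,
) -> bool:
    """Evaluate up to max_rounds times, improving after each failed evaluation.

    Returns True as soon as a score reaches the threshold (safe to publish),
    or False if the cap is hit first.
    """
    for _ in range(max_rounds):
        if evaluate() >= THRESHOLD:
            return True
        improve()
    return False


# Simulated run: the skill scores 73, then 85, then 92 after two improvements.
simulated_scores = iter([73, 85, 92])
improvements: list[str] = []
ok = publish_loop(
    evaluate=lambda: next(simulated_scores),
    improve=lambda: improvements.append("improved"),
)
print(ok, len(improvements))  # True 2
```

The cap keeps the loop from running forever when a skill cannot reach the threshold automatically.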

14.3 Quality Gate Script

#!/bin/bash
# Quality Gate: No skill publishes without 90%+

EVAL_PATH="/media/ezekiel/Merlin/.openclaw/workspace/skills/axiomata-skill-evaluator-strict"

quality_gate() {
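    local SKILL_PATH="$1" SCORE

    # Keep only the first "Score: NN" match (grep -P requires GNU grep)
    SCORE=$(python3 "$EVAL_PATH/evaluator.py" "$SKILL_PATH" 2>&1 | \
            grep -oP 'Score: \K\d+' | head -n1)

    # An empty SCORE (no score printed) counts as a failure
    if [ -n "$SCORE" ] && [ "$SCORE" -ge 90 ]; then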
        echo "✅ QUALITY GATE PASSED: $SCORE/100"
        return 0
    else
        echo "❌ QUALITY GATE FAILED: $SCORE/100 (need 90)"
        return 1
    fi
}

_In Altum Per Excellence._

🧪 AXIOMA SKILL EVALUATOR STRICT v1.0 — 90% DETERMINISTIC

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-09 04:13 · Safe

Security checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk