
Nm Leyline Evaluation Framework

Patterns for building evaluation and scoring systems, quality gates, rubrics, and decision frameworks. Use for any scored assessment.
Provides patterns for weighted scoring, rubrics, and decision thresholds.
athola · Source
Uncategorized · clawhub · v1.9.13 · 4 versions · 100000 · Key: not required
★ 0 Stars · 📥 429 Downloads · 💾 1 Install · 4 Versions · #latest

Overview

> Night Market Skill — ported from claude-night-market/leyline. For the full experience with agents, hooks, and commands, install the Claude Code plugin.


Evaluation Framework

Overview

A generic framework for weighted scoring and threshold-based decision making. Provides reusable patterns for evaluating any artifact against configurable criteria with consistent scoring methodology.

The framework abstracts a common pattern: define criteria → assign weights → score against criteria → apply thresholds → make decisions.

When To Use

  • Implementing quality gates or evaluation rubrics
  • Building scoring systems for artifacts, proposals, or submissions
  • Applying a consistent evaluation methodology across different domains
  • Automating decisions with score thresholds
  • Creating assessment tools with weighted criteria

When NOT To Use

  • Simple pass/fail checks that need no scoring

Core Pattern

1. Define Criteria

criteria:
  - name: criterion_name
    weight: 0.30          # 30% of total score
    description: What this measures
    scoring_guide:
      90-100: Exceptional
      70-89: Strong
      50-69: Acceptable
      30-49: Weak
      0-29: Poor

Verification: Run the command with --help flag to verify availability.
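A scoring guide like the one above can be applied mechanically once a criterion has been scored. The sketch below is a minimal Python version, assuming the band boundaries from the YAML example; `SCORING_GUIDE` and `band_for` are hypothetical names, not part of the leyline skill.

```python
# Hypothetical representation of the scoring_guide bands above,
# stored as (lowest score in band, label), highest band first.
SCORING_GUIDE = [
    (90, "Exceptional"),
    (70, "Strong"),
    (50, "Acceptable"),
    (30, "Weak"),
    (0, "Poor"),
]


def band_for(score: float) -> str:
    """Return the scoring-guide label for a 0-100 criterion score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for lower_bound, label in SCORING_GUIDE:
        if score >= lower_bound:
            return label
    # Unreachable: the last band starts at 0 and negative scores are rejected above.
    raise AssertionError("scoring guide does not cover 0")


print(band_for(85))  # Strong
print(band_for(29))  # Poor
```

Storing only each band's lower bound means a fractional score such as 89.5 still lands in a band ("Strong") instead of falling into the gap between 89 and 90.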

2. Score Each Criterion

scores = {
    "criterion_1": 85,  # Out of 100
    "criterion_2": 92,
    "criterion_3": 78,
}
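Before weighting, it can help to confirm that every criterion was scored exactly once and that each score sits on the 0-100 scale. A minimal sketch, assuming criteria are identified by the same keys used in `scores`; `validate_scores` is a hypothetical helper.

```python
def validate_scores(scores: dict[str, float], criteria: set[str]) -> None:
    """Raise ValueError if scores are missing, unexpected, or outside 0-100."""
    scored = set(scores)
    missing = criteria - scored
    unexpected = scored - criteria
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    if unexpected:
        raise ValueError(f"scores for unknown criteria: {sorted(unexpected)}")
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score {score} is outside 0-100")


scores = {"criterion_1": 85, "criterion_2": 92, "criterion_3": 78}
validate_scores(scores, {"criterion_1", "criterion_2", "criterion_3"})  # passes silently
```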

3. Calculate Weighted Total

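weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}  # Must sum to 1.0
total = sum(score * weights[criterion] for criterion, score in scores.items())
# Example: (85 × 0.30) + (92 × 0.40) + (78 × 0.30) = 25.5 + 36.8 + 23.4 = 85.7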
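The weighted total only stays on the 0-100 scale when the weights sum to 1.0, so a version that checks this first may be useful. This is a sketch assuming the weights from the worked example; `weighted_total` is a hypothetical name, and `math.isclose` is used because floating-point sums of decimal weights are not guaranteed to equal 1.0 exactly.

```python
import math


def weighted_total(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return sum(score * weight), requiring weights that sum to 1.0."""
    weight_sum = sum(weights.values())
    if not math.isclose(weight_sum, 1.0):
        raise ValueError(f"weights sum to {weight_sum}, expected 1.0")
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    return sum(score * weights[name] for name, score in scores.items())


scores = {"criterion_1": 85, "criterion_2": 92, "criterion_3": 78}
weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}
print(round(weighted_total(scores, weights), 2))  # 85.7
```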

4. Apply Decision Thresholds

thresholds:
  80-100: Accept with priority
  60-79: Accept with conditions
  40-59: Review required
  20-39: Reject with feedback
  0-19: Reject
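One way to apply these thresholds is a binary search over the lower bound of each range, using the standard-library `bisect` module. This is a sketch under the assumption that totals are on a 0-100 scale; `LOWER_BOUNDS`, `ACTIONS`, and `decide` are hypothetical names.

```python
import bisect

# Hypothetical encoding of the thresholds above: the lowest total of each
# range in ascending order, with that range's action at the same index.
LOWER_BOUNDS = [0, 20, 40, 60, 80]
ACTIONS = [
    "Reject",
    "Reject with feedback",
    "Review required",
    "Accept with conditions",
    "Accept with priority",
]


def decide(total: float) -> str:
    """Return the action whose threshold range contains a 0-100 total."""
    if not 0 <= total <= 100:
        raise ValueError(f"total must be between 0 and 100, got {total}")
    # bisect_right counts the lower bounds that are <= total; the last of
    # those bounds starts the matching range.
    return ACTIONS[bisect.bisect_right(LOWER_BOUNDS, total) - 1]


print(decide(85.7))  # Accept with priority
print(decide(79.5))  # Accept with conditions
```

Keying on lower bounds also resolves fractional totals such as 79.5, which the integer ranges "60-79" and "80-100" do not cover on their own.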

Quick Start

Define Your Evaluation

  1. Identify criteria: What aspects matter for your domain?
  2. Assign weights: Which criteria matter most? (Weights must sum to 1.0.)
  3. Create scoring guides: What does each score range mean?
  4. Set thresholds: What total scores trigger which decisions?

Example: Code Review Evaluation

criteria:
  correctness: {weight: 0.40, description: "Does the code work as intended?"}
  maintainability: {weight: 0.25, description: "Is the code readable?"}
  performance: {weight: 0.20, description: "Does it meet performance needs?"}
  testing: {weight: 0.15, description: "Are the tests thorough?"}

thresholds:
  85-100: Approve immediately
  70-84: Approve with minor feedback
  50-69: Request changes
  0-49: Reject, major issues
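The code review example can be run end to end in a few lines. This sketch copies the weights and thresholds above; `review` is a hypothetical helper and the reviewer scores at the bottom are made up for illustration.

```python
# Weights and thresholds copied from the code review example above.
WEIGHTS = {
    "correctness": 0.40,
    "maintainability": 0.25,
    "performance": 0.20,
    "testing": 0.15,
}
# (lowest total, action), checked from the highest range down.
THRESHOLDS = [
    (85, "Approve immediately"),
    (70, "Approve with minor feedback"),
    (50, "Request changes"),
    (0, "Reject, major issues"),
]


def review(scores: dict[str, float]) -> tuple[float, str]:
    """Return (weighted total, action) for one pull request's criterion scores."""
    total = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    for lower_bound, action in THRESHOLDS:
        if total >= lower_bound:
            return round(total, 2), action
    raise ValueError(f"total {total} is below every threshold")


# Hypothetical reviewer scores for a single pull request.
total, action = review(
    {"correctness": 90, "maintainability": 70, "performance": 80, "testing": 60}
)
print(total, action)  # 78.5 Approve with minor feedback
```

Here 90 × 0.40 + 70 × 0.25 + 80 × 0.20 + 60 × 0.15 = 36 + 17.5 + 16 + 9 = 78.5, which falls in the 70-84 range.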

Evaluation Workflow

1. Review artifact against each criterion
2. Assign 0-100 score for each criterion
3. Calculate: total = Σ(score × weight)
4. Compare total to thresholds
5. Take action based on threshold range
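To keep an evaluation reproducible, it can help to record each criterion's contribution alongside the total and the decision. The sketch below assumes steps 1-2 (reviewing and scoring) have already produced a `scores` dict; `Evaluation` and `evaluate` are hypothetical names, and the sample inputs reuse the core-pattern example.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    """Hypothetical record of one evaluation, kept so the result can be audited."""

    scores: dict[str, float]
    contributions: dict[str, float] = field(default_factory=dict)
    total: float = 0.0
    decision: str = ""


def evaluate(
    scores: dict[str, float],
    weights: dict[str, float],
    thresholds: list[tuple[float, str]],
) -> Evaluation:
    """Run steps 3-5 of the workflow on scores produced in steps 1-2.

    `thresholds` holds (lowest total, action) pairs, highest range first.
    """
    record = Evaluation(scores=dict(scores))
    # Step 3: each criterion's share of the total is score × weight.
    record.contributions = {name: scores[name] * w for name, w in weights.items()}
    record.total = sum(record.contributions.values())
    # Steps 4-5: the first range whose lower bound the total reaches decides.
    record.decision = next(
        (action for lower, action in thresholds if record.total >= lower),
        "No matching threshold",
    )
    return record


run = evaluate(
    scores={"criterion_1": 85, "criterion_2": 92, "criterion_3": 78},
    weights={"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30},
    thresholds=[(80, "Accept with priority"), (60, "Accept with conditions"),
                (40, "Review required"), (20, "Reject with feedback"), (0, "Reject")],
)
print({name: round(c, 2) for name, c in run.contributions.items()})
# {'criterion_1': 25.5, 'criterion_2': 36.8, 'criterion_3': 23.4}
print(round(run.total, 2), run.decision)  # 85.7 Accept with priority
```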

Common Use Cases

Quality Gates: Code review, PR approval, release readiness

Content Evaluation: Document quality, knowledge intake, skill assessment

Resource Allocation: Backlog prioritization, investment decisions, triage
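For resource allocation, the same weighted total can rank items instead of gating a single one. This is a minimal sketch: the criteria (`impact`, `urgency`, and `ease`, where a higher `ease` means less effort), their weights, and the backlog items are all hypothetical.

```python
# Hypothetical criteria and weights for ranking backlog items; they are
# illustrative only and not part of the framework.
WEIGHTS = {"impact": 0.5, "urgency": 0.3, "ease": 0.2}

# Hypothetical 0-100 scores for three backlog items.
backlog = {
    "fix-login-bug": {"impact": 90, "urgency": 95, "ease": 70},
    "dark-mode": {"impact": 60, "urgency": 30, "ease": 50},
    "search-index": {"impact": 80, "urgency": 60, "ease": 40},
}


def priority(scores: dict[str, float]) -> float:
    """Weighted total of one item's criterion scores."""
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


# Highest weighted total first; ties keep their original order (sorted is stable).
ranked = sorted(backlog, key=lambda item: priority(backlog[item]), reverse=True)
for item in ranked:
    print(f"{item}: {priority(backlog[item]):.1f}")
# fix-login-bug: 87.5
# search-index: 66.0
# dark-mode: 49.0
```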

Integration Pattern

# In your skill's frontmatter
dependencies: [leyline:evaluation-framework]

Then customize the framework for your domain:

  • Define domain-specific criteria
  • Set appropriate weights for your context
  • Establish meaningful thresholds
  • Document what each score range means

Detailed Resources

  • Scoring Patterns: See modules/scoring-patterns.md for detailed methodology
  • Decision Thresholds: See modules/decision-thresholds.md for threshold design

Exit Criteria

  • [ ] Criteria defined with clear descriptions
  • [ ] Weights assigned and sum to 1.0
  • [ ] Scoring guides documented for each criterion
  • [ ] Thresholds mapped to specific actions
  • [ ] Evaluation process documented and reproducible
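Most of these exit criteria can be checked mechanically before an evaluation goes into use; the last one (a documented, reproducible process) still needs a human. A minimal sketch, assuming criteria and thresholds have already been loaded into plain Python dicts (for example from the YAML shown earlier); `check_config` and the dict layout are hypothetical.

```python
import math


def check_config(criteria: dict[str, dict], thresholds: dict[float, str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed.

    criteria: name -> {"weight": float, "description": str, "scoring_guide": dict}
    thresholds: lowest total of each range -> action
    """
    problems = []
    total_weight = sum(c.get("weight", 0) for c in criteria.values())
    if not math.isclose(total_weight, 1.0):
        problems.append(f"weights sum to {total_weight:.3f}, not 1.0")
    for name, c in criteria.items():
        if not c.get("description"):
            problems.append(f"{name}: missing description")
        if not c.get("scoring_guide"):
            problems.append(f"{name}: missing scoring guide")
    if 0 not in thresholds:
        problems.append("no threshold range starts at 0")
    for lower, action in thresholds.items():
        if not action:
            problems.append(f"threshold {lower}: no action mapped")
    return problems


criteria = {
    "correctness": {"weight": 0.40, "description": "Does the code work as intended?",
                    "scoring_guide": {90: "Exceptional", 0: "Poor"}},
    "testing": {"weight": 0.15, "description": "Are the tests thorough?"},
}
print(check_config(criteria, {85: "Approve immediately", 0: "Reject, major issues"}))
# ['weights sum to 0.550, not 1.0', 'testing: missing scoring guide']
```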

Version History

4 versions in total

  • v1.9.13 (current)
    2026-06-30 16:55 · Safe
  • v1.9.12
    2026-06-19 19:58 · Safe
  • v1.0.2
    2026-05-09 16:43 · Safe
  • v1.0.1
    2026-05-07 16:05 · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

data-analysis

Tavily Search

jacky1n7
Web search via the Tavily API (an alternative to Brave). Use when the user asks to search the web or find sources or links and Brave web search is unavailable.
★ 276 📥 101,304
data-analysis

Data Analysis

ivangdavila
Data analysis and visualization. Query databases, generate reports, automate spreadsheets, and turn raw data into clear, actionable insights. Use for: (1) you……
★ 214 📥 71,081
data-analysis

AdMapix

fly0pants
AdMapix raw data layer, providing ad creatives, apps, rankings, downloads/revenue, and market metadata. Returns structured JSON from the AdMapix API; callers...
★ 297 📥 142,590