Pa Eval

Evaluate PA performance through structured scoring, owner feedback analysis, and behavioral benchmarking. Use when: conducting a weekly/monthly PA performanc...
Author: netanel-abergel · Category: uncategorized · Source: clawhub · Version: v1.0.1 · Key: not required

Overview

PA Eval Skill

Minimum Model

Any model for filling in templates. Use a medium model for trend analysis and recommendations.


When to Run

  • Weekly self-eval: Every 7 days. Run automatically.
  • On owner correction: Log the correction immediately, then re-score the affected dimension.
  • Monthly report: At the end of each month, aggregate all weekly evals.
  • On demand: If owner asks "how am I doing?" → generate current eval on the spot.
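The weekly trigger can be sketched as a simple elapsed-time check. This is a minimal illustration, not part of the skill itself: the `eval_due` helper and the idea of storing the last eval time as Unix epoch seconds are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: decide whether the weekly self-eval is due.
# Both arguments are Unix epoch seconds, so no date parsing is needed.
WEEK_SECONDS=$((7 * 24 * 60 * 60))

eval_due() {
  last_eval=$1   # epoch seconds of the most recent eval
  now=$2         # epoch seconds for "now"
  [ $((now - last_eval)) -ge "$WEEK_SECONDS" ]
}

# Example: pretend the last eval ran 8 days ago.
last=$(( $(date +%s) - 8 * 24 * 60 * 60 ))
if eval_due "$last" "$(date +%s)"; then
  echo "weekly eval due"
else
  echo "not due yet"
fi
```

Passing timestamps in as arguments keeps the check independent of how the scheduler (cron, a heartbeat, or the agent loop) obtains them.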

Scoring Dimensions

Score each 1–5:

| Dimension | What to Measure |
|---|---|
| Execution | Tasks completed without reminders |
| Accuracy | Results are correct and complete |
| Speed | Response time is fast |
| Proactivity | Acts without being asked |
| Communication | Concise and context-appropriate |
| Memory | Remembers context across sessions |
| Tool Use | Tools used correctly and efficiently |
| Judgment | Knows when to act vs. when to ask |

Score meanings:

  • 5 = Consistently exceeds expectations
  • 4 = Meets expectations with minor gaps
  • 3 = Acceptable but basic
  • 2 = Frequent gaps or errors
  • 1 = Fails basic expectations

Total: Max 40 points.

Grade: A (36–40), B (28–35), C (20–27), D (<20)
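The totals and grade bands above can be computed mechanically. The following is a small sketch; the function names `total_score` and `grade` are hypothetical, and the sample scores are invented for illustration.

```shell
#!/bin/sh
# Sum the eight dimension scores (each 1-5) and map the total to a grade.

total_score() {
  if [ "$#" -ne 8 ]; then
    echo "expected 8 scores, got $#" >&2
    return 1
  fi
  sum=0
  for s in "$@"; do
    case $s in
      [1-5]) sum=$((sum + s)) ;;
      *) echo "score out of range: $s" >&2; return 1 ;;
    esac
  done
  echo "$sum"
}

grade() {
  t=$1
  if   [ "$t" -ge 36 ]; then echo A
  elif [ "$t" -ge 28 ]; then echo B
  elif [ "$t" -ge 20 ]; then echo C
  else echo D
  fi
}

# Order: Execution Accuracy Speed Proactivity Communication Memory Tool-Use Judgment
total=$(total_score 4 5 3 4 4 3 5 4)
echo "Total: $total/40, Grade: $(grade "$total")"
```

With these sample scores the script prints `Total: 32/40, Grade: B`. Since every dimension scores at least 1, the lowest possible total is 8.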


Weekly Self-Evaluation

Save to .learnings/eval/YYYY-MM-DD.md.

# PA Weekly Eval — YYYY-MM-DD

## Scores

| Dimension | Score | Notes |
|---|---|---|
| Execution | /5 | |
| Accuracy | /5 | |
| Speed | /5 | |
| Proactivity | /5 | |
| Communication | /5 | |
| Memory | /5 | |
| Tool Use | /5 | |
| Judgment | /5 | |
| **TOTAL** | /40 | |

## Owner Feedback This Week

- Positive:
- Corrections:
- Complaints:

## Tasks Completed

-

## Tasks Failed or Incomplete

-

## What Went Well

-

## What to Improve

-

## Actions for Next Week

- [ ]

Create the File

#!/bin/bash
set -e

# Set the output directory
EVAL_DIR="$HOME/.openclaw/workspace/.learnings/eval"
mkdir -p "$EVAL_DIR"

DATE=$(date +%Y-%m-%d)
EVAL_FILE="$EVAL_DIR/$DATE.md"

# Write the template header with today's date.
# The heredoc delimiter is unquoted so that $DATE expands; no sed pass is needed.
cat > "$EVAL_FILE" << EOF
# PA Weekly Eval — $DATE
[Fill in the template above]
EOF

echo "Created eval file: $EVAL_FILE"

Owner Feedback Signals

Log these automatically when detected:

| Signal | Action |
|---|---|
| 👍 reaction | Log +1 positive |
| 👎 reaction | Log -1 negative, record the correction |
| "תודה" (Hebrew: "thanks") / "great" / "perfect" | Log +1 positive |
| "wrong" / "fix this" / "לא טוב" (Hebrew: "not good") | Log -1, record the correction |
| Owner re-asks the same question | Log -1 memory gap |
| Owner does the task themselves | Log -1 initiative gap |
| Owner surprised by proactive action | Log +2 proactivity |

Rule: If a signal appears → log it immediately. Don't batch feedback signals.
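One way to follow the "log immediately" rule is an append-only log with one line per signal. This is a sketch under assumptions: the `signals.tsv` file name, its tab-separated layout, and the `log_signal` / `signal_net` helpers are all hypothetical and not defined by the skill.

```shell
#!/bin/sh
# Hypothetical signal log: one tab-separated line per signal, appended immediately.
# SIGNAL_LOG is an assumed location, not a path defined by the skill.
SIGNAL_LOG="${SIGNAL_LOG:-$HOME/.openclaw/workspace/.learnings/eval/signals.tsv}"

log_signal() {
  delta=$1      # e.g. +1, -1, +2
  kind=$2       # e.g. positive, correction, memory-gap
  note=${3:-}   # optional free-text detail
  mkdir -p "$(dirname "$SIGNAL_LOG")"
  printf '%s\t%s\t%s\t%s\n' \
    "$(date +%Y-%m-%dT%H:%M:%S)" "$delta" "$kind" "$note" >> "$SIGNAL_LOG"
}

# Net score across all logged signals (second column summed).
signal_net() {
  awk -F '\t' '{ net += $2 } END { print net + 0 }' "$SIGNAL_LOG"
}
```

For example, `log_signal -1 memory-gap "owner re-asked for the flight time"` records a memory gap at the moment it is detected, and `signal_net` gives a running balance to consult when re-scoring a dimension.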


Monthly Report Format

# PA Performance Report — [Month Year]

**PA Name:** [Name]
**Owner:** [Owner Name]
**Period:** [Start] – [End]

## Summary Score: X/40 ([Grade A/B/C/D])

## Dimension Breakdown
[Copy scores table here]

## Key Wins
-

## Key Issues
-

## Trend vs Last Period
- Score change: +X / -X points
- Best improvement: [dimension]
- Biggest regression: [dimension]

## Recommended Actions
1.
2.
3.
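The summary score for the report can be derived from the weekly files. The sketch below assumes each weekly eval keeps the `| **TOTAL** | N/40 | |` row from the template and that files are named `YYYY-MM-DD.md`; the `monthly_average` helper is hypothetical.

```shell
#!/bin/sh
# Average the TOTAL row across all weekly evals for one month.
monthly_average() {
  dir=$1     # directory holding the weekly YYYY-MM-DD.md files
  month=$2   # month prefix, e.g. 2026-04
  cat "$dir"/"$month"-*.md 2>/dev/null |
    LC_ALL=C awk -F '|' '
      /\*\*TOTAL\*\*/ {
        split($3, parts, "/")          # third field looks like " 32/40 "
        if (parts[1] + 0 > 0) { sum += parts[1]; n++ }   # skip unfilled rows
      }
      END {
        if (n == 0) { print "no scored evals"; exit 1 }
        printf "%.1f\n", sum / n
      }'
}
```

Calling `monthly_average "$HOME/.openclaw/workspace/.learnings/eval" 2026-04` prints the mean weekly total for that month, which can then be compared against the previous month for the trend section. Rows left as `/40` are ignored rather than counted as zero.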

Benchmark Tests (Run Monthly)

Task Completion Rate

  • Count tasks assigned in last 30 days.
  • Count completed without follow-up.
  • Formula: completed / assigned × 100%
  • Target: >90%

Accuracy Rate

  • Count tasks that required correction.
  • Formula: (tasks - corrections) / tasks × 100%
  • Target: >95%

Memory Retention

  • Ask about something discussed 7+ days ago.
  • Pass if recalled correctly, Fail if missed.
  • Target: >80% of recall checks passed.
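A percentage target only makes sense across several recall checks, so this sketch assumes more than one check is run per month and tallies the results. The `memory_pass_rate` helper and the sample results are hypothetical.

```shell
#!/bin/sh
# Tally PASS/FAIL results of recall checks, one result per argument.
memory_pass_rate() {
  pass=0
  count=0
  for r in "$@"; do
    case $r in
      PASS) pass=$((pass + 1)) ;;
      FAIL) ;;
      *) echo "unknown result: $r" >&2; return 1 ;;
    esac
    count=$((count + 1))
  done
  [ "$count" -gt 0 ] || { echo "no results" >&2; return 1; }
  echo $(( pass * 100 / count ))   # integer percentage, rounded down
}

echo "Memory retention: $(memory_pass_rate PASS PASS FAIL PASS PASS)% (target >80%)"
```

Four passes out of five gives exactly 80%, which does not clear a strict ">80%" target; one more passing check would.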

Cost Tips

  • Cheap: Filling in the weekly template — any small model works.
  • Expensive: Trend analysis and pattern detection across multiple evals — use a medium model.
  • Batch: Review all weekly evals at once during the monthly report, not one by one.
  • Avoid: Don't re-score historical weeks — score in real time and save to file.

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-03 10:05 · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
