← 返回
数据分析

prompt-engineer-toolkit

Analyzes and rewrites prompts for better AI output, creates reusable prompt templates for marketing use cases (ad copy, email campaigns, social media), and s...
分析并重写提示以提升AI输出,为营销用例(广告文案、邮件营销、社交媒体)创建可复用提示模板,并实现其他功能。
alirezarezvani
数据分析 clawhub v1.0.0 2 版本 100000 Key: 无需
★ 0
Stars
📥 712
下载
💾 12
安装
2
版本
#latest

概述

Prompt Engineer Toolkit

Overview

Use this skill to move prompts from ad-hoc drafts to production assets with repeatable testing, versioning, and regression safety. It emphasizes measurable quality over intuition. Apply it when launching a new LLM feature that needs reliable outputs, when prompt quality degrades after model or instruction changes, when multiple team members edit prompts and need history/diffs, when you need evidence-based prompt choice for production rollout, or when you want consistent prompt governance across environments.

Core Capabilities

  • A/B prompt evaluation against structured test cases
  • Quantitative scoring for adherence, relevance, and safety checks
  • Prompt version tracking with immutable history and changelog
  • Prompt diffs to review behavior-impacting edits
  • Reusable prompt templates and selection guidance
  • Regression-friendly workflows for model/prompt updates

Key Workflows

1. Run Prompt A/B Test

Prepare JSON test cases and run:

python3 scripts/prompt_tester.py \
  --prompt-a-file prompts/a.txt \
  --prompt-b-file prompts/b.txt \
  --cases-file testcases.json \
  --runner-cmd 'my-llm-cli --prompt {prompt} --input {input}' \
  --format text

Input can also come from stdin/--input JSON payload.

2. Choose Winner With Evidence

The tester scores outputs per case and aggregates:

  • expected content coverage
  • forbidden content violations
  • regex/format compliance
  • output length sanity

Use the higher-scoring prompt as candidate baseline, then run regression suite.

3. Version Prompts

# Add version
python3 scripts/prompt_versioner.py add \
  --name support_classifier \
  --prompt-file prompts/support_v3.txt \
  --author alice

# Diff versions
python3 scripts/prompt_versioner.py diff --name support_classifier --from-version 2 --to-version 3

# Changelog
python3 scripts/prompt_versioner.py changelog --name support_classifier

4. Regression Loop

  1. Store baseline version.
  2. Propose prompt edits.
  3. Re-run A/B test.
  4. Promote only if score and safety constraints improve.

Script Interfaces

  • python3 scripts/prompt_tester.py --help
  • Reads prompts/cases from stdin or --input
  • Optional external runner command
  • Emits text or JSON metrics
  • python3 scripts/prompt_versioner.py --help
  • Manages prompt history (add, list, diff, changelog)
  • Stores metadata and content snapshots locally

Pitfalls, Best Practices & Review Checklist

Avoid these mistakes:

  1. Picking prompts from single-case outputs — use a realistic, edge-case-rich test suite.
  2. Changing prompt and model simultaneously — always isolate variables.
  3. Missing must_not_contain (forbidden-content) checks in evaluation criteria.
  4. Editing prompts without version metadata, author, or change rationale.
  5. Skipping semantic diffs before deploying a new prompt version.
  6. Optimizing one benchmark while harming edge cases — track the full suite.
  7. Model swap without rerunning the baseline A/B suite.

Before promoting any prompt, confirm:

  • [ ] Task intent is explicit and unambiguous.
  • [ ] Output schema/format is explicit.
  • [ ] Safety and exclusion constraints are explicit.
  • [ ] No contradictory instructions.
  • [ ] No unnecessary verbosity tokens.
  • [ ] A/B score improves and violation count stays at zero.

References

Evaluation Design

Each test case should define:

  • input: realistic production-like input
  • expected_contains: required markers/content
  • forbidden_contains: disallowed phrases or unsafe content
  • expected_regex: required structural patterns

This enables deterministic grading across prompt variants.

Versioning Policy

  • Use semantic prompt identifiers per feature (support_classifier, ad_copy_shortform).
  • Record author + change note for every revision.
  • Never overwrite historical versions.
  • Diff before promoting a new prompt to production.

Rollout Strategy

  1. Create baseline prompt version.
  2. Propose candidate prompt.
  3. Run A/B suite against same cases.
  4. Promote only if winner improves average and keeps violation count at zero.
  5. Track post-release feedback and feed new failure cases back into test suite.

版本历史

共 2 个版本

  • v2.1.1
    2026-03-30 20:16 安全 安全
  • v1.0.0 当前
    2026-03-19 01:29 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

Data Analysis

ivangdavila
{"answer":"数据分析与可视化。查询数据库、生成报告、自动化电子表格,将原始数据转化为清晰可行的见解。适用于:(1) 您……"}
★ 198 📥 64,987
content-creation

Marketing Strategy Pmm

alirezarezvani
负责定位、GTM策略、竞品分析及产品发布的产品营销技能。当用户询问产品定位、市场进入策略等话题时使用。
★ 37 📥 11,845
data-analysis

A股量化 AkShare

mbpz
A股量化数据分析工具,基于AkShare库获取A股行情、财务数据、板块信息等。用于回答关于A股股票查询、行情数据、财务分析、选股等问题。
★ 164 📥 59,823