
PBE Extractor

Extract invariant principles from any text — find the ideas that survive rephrasing.
leegitw
Data analysis · clawhub v1.0.3 · 1 version · 99769.9 · Key: not required
★ 6 stars · 📥 2,048 downloads · 💾 55 installs
#analysis #best-practices #documentation #extraction #knowledge-management #latest #methodology #openclaw #principles #summarization

Overview


Agent Identity

Role: Help users extract invariant principles from content

Understands: Users need structured, repeatable methodology they can verify

Approach: Apply Bootstrap → Learn → Enforce with explicit confidence levels

Boundaries: Identify patterns, never determine absolute truth

Tone: Precise, methodical, honest about uncertainty

Opening Pattern: "You have content that might be more than it appears — let's find the principles that would survive any rephrasing."

Data handling: This skill operates within your agent's trust boundary. All content analysis uses your agent's configured model; no external APIs or third-party services are called. If your agent uses a cloud-hosted LLM (Claude, GPT, etc.), data is processed by that service as part of normal agent operation. This skill does not write files to disk.

When to Use

Activate this skill when the user asks to:

  • "Extract the principles from this"
  • "What are the core ideas here?"
  • "Compress this while keeping the meaning"
  • "Find the patterns in this content"
  • "Distill this document"

Important Limitations

  • Extracts PATTERNS, not truth — principles need validation (N≥2)
  • Cannot verify extracted principles are correct
  • High compression may lose nuance — always review
  • Works best with 200+ words of content
  • Principles start at N=1 (single source); validate them with a comparison skill such as principle-comparator

Input Requirements

User provides:

  • Text content (documentation, methodology, philosophy, code comments)
  • (Optional) Domain context for better semantic markers
  • (Optional) Target compression level

Minimum: 50 words

Recommended: 200-3000 words

Maximum: Context window limits apply


Methodology

This skill uses Principle-Based Distillation (PBD) to extract invariant principles from content.

Core Insight: Compression is comprehension. The ability to compress without loss demonstrates true understanding.

What is an Invariant Principle?

A principle is invariant when it:

  1. Survives rephrasing (same idea, different words)
  2. Can regenerate the original meaning
  3. Separates essential from accidental complexity

The Extraction Process

Bootstrap: Read source material without judgment

Learn: Identify patterns, test for invariance

Enforce: Validate through rephrasing test

The Rephrasing Test

A principle passes when:

  • It can be expressed with completely different words
  • The meaning remains identical
  • No information is lost

Pass: "Small files reduce cognitive load" ≈ "Shorter code is easier to understand"

Fail: "Small files" ≈ "Fast files" (keyword overlap, different meaning)
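The rephrasing test is a judgment about meaning, which the skill leaves to the agent's model. The sketch below only illustrates why the Fail case fails: a naive word-overlap score (Jaccard similarity, swapped in here purely for illustration; it is not part of the skill) rates the Fail pair as related and the Pass pair as unrelated, the opposite of the intended verdict. Keyword matching therefore cannot stand in for the test.

```python
def word_set(text: str) -> set[str]:
    """Lowercase, replace punctuation with spaces, and split into a set of words."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return set(cleaned.split())


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


# Pass pair: same meaning, no shared words.
pass_score = jaccard("Small files reduce cognitive load",
                     "Shorter code is easier to understand")
# Fail pair: shared keyword, different meaning.
fail_score = jaccard("Small files", "Fast files")

print(f"pass pair overlap: {pass_score:.2f}")  # 0.00
print(f"fail pair overlap: {fail_score:.2f}")  # 0.33
```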


Extraction Framework

Step 1: Content Analysis

Read the source and identify:

  • Domain/subject matter
  • Structure (lists, prose, code)
  • Density of ideas
  • Potential principle clusters
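Part of the Step 1 structure check can be approximated mechanically before the model reads for meaning. A rough sketch, assuming plain-text input where list items start with `-`, `*`, `•`, or a number followed by a period, and code sits between triple-backtick fences; `profile_structure` is a hypothetical helper, and domain, density, and clusters remain judgment calls it does not attempt.

```python
import re

LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+\.)\s+")  # "- x", "* x", "• x", "1. x"
FENCE = "`" * 3                                     # triple-backtick code fence


def profile_structure(text: str) -> dict:
    """Count tokens and classify non-blank lines as list, code, or prose (heuristic)."""
    counts = {"list": 0, "code": 0, "prose": 0}
    in_code = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_code = not in_code  # opening or closing fence
            continue
        if not stripped:
            continue
        if in_code:
            counts["code"] += 1
        elif LIST_ITEM.match(line):
            counts["list"] += 1
        else:
            counts["prose"] += 1
    # Whitespace-separated tokens; bullet markers are counted too.
    return {"word_count": len(text.split()), "line_types": counts}


sample = "Principles:\n- Keep files small\n- Ask when unsure\n\nPlain prose line here."
print(profile_structure(sample))
```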

Step 2: Candidate Identification

For each potential principle:

  • Extract the core statement
  • Test against rephrasing criteria
  • Assign confidence level
  • Note source evidence
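Each candidate carries the items listed above: a core statement, a rephrasing-test outcome, a confidence level, and source evidence. A minimal sketch of one way to hold them, assuming Python dataclasses; the `Candidate` class and `keep_candidates` function are hypothetical, with field names chosen to mirror this skill's output schema.

```python
from dataclasses import dataclass, field

CONFIDENCE_LEVELS = ("high", "medium", "low")


@dataclass
class Candidate:
    """One candidate principle identified in Step 2 (hypothetical structure)."""
    statement: str                # core statement, in the source's words
    passes_rephrasing: bool       # outcome of the rephrasing test
    confidence: str               # "high", "medium", or "low"
    source_evidence: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence level: {self.confidence!r}")
        if not self.source_evidence:
            raise ValueError("a candidate needs at least one piece of source evidence")


def keep_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Drop candidates that failed the rephrasing test."""
    return [c for c in candidates if c.passes_rephrasing]
```

Validating in `__post_init__` makes a candidate without evidence or with an unknown confidence level fail at creation time rather than surfacing later in the output.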

Step 2.5: Normalize Candidates

For each candidate principle, create a normalized form for semantic matching:

Normalization Rules:

  1. Actor-agnostic: Remove pronouns (I, we, you, my, our, your)
  2. Imperative structure: Begin with a verb, as in "Values X", "Prioritizes Y", "Avoids Z", or "Maintains W"
  3. Abstract over specific: Generalize domain terms, preserve magnitude in parentheses
  4. Preserve conditionals: Keep "when X, then Y" structure if present
  5. Single sentence: One principle = one normalized statement (under 100 characters)
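Producing the normalized form is a language task for the model, but rules 1, 2, and 5 can be checked mechanically afterwards. A rough checker, assuming English text; rules 3 and 4 need semantic judgment and are not covered, and `check_normalized` is a hypothetical name. The sentence check is a heuristic and will misfire on abbreviations such as "e.g.".

```python
import re

PRONOUNS = {"i", "we", "you", "my", "our", "your"}                     # rule 1
ALLOWED_OPENERS = ("Values ", "Prioritizes ", "Avoids ", "Maintains ")  # rule 2
MAX_CHARS = 100                                                         # rule 5


def check_normalized(form: str) -> list[str]:
    """Return rule violations for a normalized statement (empty list means it passes)."""
    problems = []
    words = re.findall(r"[A-Za-z']+", form.lower())
    found = sorted(PRONOUNS.intersection(words))
    if found:
        problems.append(f"rule 1: contains pronouns {found}")
    if not form.startswith(ALLOWED_OPENERS):
        problems.append("rule 2: does not start with Values/Prioritizes/Avoids/Maintains")
    if len(form) >= MAX_CHARS:
        problems.append(f"rule 5: {len(form)} characters (limit is under {MAX_CHARS})")
    # Sentence-ending punctuation followed by more text suggests a second sentence.
    if re.search(r"[.!?]\s+\S", form):
        problems.append("rule 5: more than one sentence")
    return problems


print(check_normalized("Values truthfulness in communication"))  # []
print(check_normalized("I always tell the truth"))
```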

Example:

Original | Normalized
---------|-----------
"I always tell the truth" | "Values truthfulness in communication"
"Keep Go functions under 50 lines" | "Values concise units of work (~50 lines)"
"When unsure, ask" | "Values clarification when uncertain"

When NOT to Normalize:

  • Context-bound principles (e.g., "Never ship on Fridays")
  • Numerical thresholds integral to meaning
  • Process-specific step sequences

For these, set normalization_status: "skipped" and use original text.

Voice Preservation: Display the user's original words in output; use normalized form only for matching.
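Together, the skip rule and voice preservation mean every principle keeps the user's wording for display, and only the matching key varies. A minimal sketch, assuming the model has already produced a normalized form or decided to skip; `build_principle` is a hypothetical helper, and the field names and status values follow the output schema.

```python
from typing import Optional


def build_principle(pid: str, original: str,
                    normalized: Optional[str] = None,
                    skipped: bool = False) -> dict:
    """Return a principle record that keeps the user's words for display.

    skipped=True: context-bound, numerical, or process-specific principle;
    the original text doubles as the matching key.
    normalized=None without skipping: normalization failed; fall back to original.
    """
    if skipped:
        status, key = "skipped", original
    elif normalized is None:
        status, key = "failed", original
    else:
        status, key = "success", normalized
    return {
        "id": pid,
        "statement": original,       # shown to the user (voice preservation)
        "normalized_form": key,      # used only for matching
        "normalization_status": status,
    }


print(build_principle("P1", "Never ship on Fridays", skipped=True))
print(build_principle("P2", "When unsure, ask", "Values clarification when uncertain"))
```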

Step 3: Compression Validation

Verify extraction quality:

  • Calculate compression ratio
  • Check principle coverage
  • Identify any lost information
  • Adjust confidence if needed
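The source does not define the compression ratio. The schema example (1500 words in, 320 words out, reported as "79%") matches the share of words removed, 1 − 320/1500 ≈ 0.787, so this sketch assumes that definition and rounds to a whole percent. `compression_ratio` is a hypothetical name.

```python
def compression_ratio(words_original: int, words_compressed: int) -> str:
    """Share of words removed by extraction, as a whole-number percentage string.

    Assumed definition: 1 - compressed / original (consistent with the schema example).
    """
    if words_original <= 0:
        raise ValueError("original word count must be positive")
    if not 0 <= words_compressed <= words_original:
        raise ValueError("compressed count must be between 0 and the original count")
    removed = 1 - words_compressed / words_original
    return f"{round(removed * 100)}%"


print(compression_ratio(1500, 320))  # 79%
```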

Confidence Levels

Level | Criteria | Language
------|----------|---------
high | Explicitly stated, unambiguous | "This principle states..."
medium | Implied, minor inference needed | "This appears to suggest..."
low | Inferred from patterns | "This may imply..."
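The Language column works as a template: each confidence level sets how hedged the wording is. A small sketch that applies those lead phrases; the `HEDGES` mapping and `phrase_principle` function are illustrative, and the phrases are taken from the table.

```python
HEDGES = {
    "high": "This principle states",
    "medium": "This appears to suggest",
    "low": "This may imply",
}


def phrase_principle(confidence: str, claim: str) -> str:
    """Prefix a claim with the wording that matches its confidence level."""
    try:
        lead = HEDGES[confidence]
    except KeyError:
        raise ValueError(f"unknown confidence level: {confidence!r}") from None
    return f"{lead} that {claim}."


print(phrase_principle("medium", "small files reduce cognitive load"))
# This appears to suggest that small files reduce cognitive load.
```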

Output Schema

{
  "operation": "extract",
  "metadata": {
    "source_hash": "a1b2c3d4",
    "timestamp": "2026-02-04T12:00:00Z",
    "source_type": "documentation",
    "word_count_original": 1500,
    "word_count_compressed": 320,
    "compression_ratio": "79%",
    "normalization_version": "v1.0.0"
  },
  "result": {
    "principles": [
      {
        "id": "P1",
        "statement": "I always tell the truth, even when it's uncomfortable",
        "normalized_form": "Values truthfulness over comfort",
        "normalization_status": "success",
        "confidence": "high",
        "n_count": 1,
        "source_evidence": ["Direct quote from source"],
        "semantic_marker": "compression-comprehension"
      }
    ],
    "summary": {
      "total_principles": 5,
      "high_confidence": 3,
      "medium_confidence": 2,
      "low_confidence": 0
    }
  },
  "next_steps": [
    "Compare with another source using principle-comparator to validate patterns (N=1 → N=2)",
    "Document source_hash for future reference: a1b2c3d4"
  ]
}
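Several schema fields can be derived rather than judged. A sketch that fills them in, with assumptions to flag: the source does not say how `source_hash` is computed, so this uses the first 8 hex characters of a SHA-256 digest (matching only the length of the example value `a1b2c3d4`); `normalization_version` is copied from the example; and the summary counts are recomputed from the `principles` list so they cannot drift out of sync with it. Both function names are hypothetical.

```python
import hashlib
from collections import Counter
from datetime import datetime, timezone


def build_metadata(source_text: str, compressed_text: str, source_type: str) -> dict:
    """Fill the derivable metadata fields (the hash scheme is an assumption)."""
    n_orig = len(source_text.split())
    n_comp = len(compressed_text.split())
    ratio = round((1 - n_comp / n_orig) * 100) if n_orig else 0
    return {
        "source_hash": hashlib.sha256(source_text.encode("utf-8")).hexdigest()[:8],
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source_type": source_type,
        "word_count_original": n_orig,
        "word_count_compressed": n_comp,
        "compression_ratio": f"{ratio}%",
        "normalization_version": "v1.0.0",  # copied from the example schema
    }


def build_summary(principles: list[dict]) -> dict:
    """Count principles per confidence level; missing levels count as 0."""
    counts = Counter(p["confidence"] for p in principles)
    return {
        "total_principles": len(principles),
        "high_confidence": counts["high"],
        "medium_confidence": counts["medium"],
        "low_confidence": counts["low"],
    }
```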

normalization_status values:

  • "success": Normalized without issues
  • "failed": Could not normalize, using original
  • "drift": Meaning may have changed, added to requires_review.md
  • "skipped": Intentionally not normalized (context-bound, numerical, process-specific)
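Because the four status values form a closed set, a consumer of the output can reject anything else. A sketch using Python's `enum` module; it also checks that a `failed` or `skipped` principle reuses its original text as `normalized_form`, as the descriptions above state, and reports `drift` as needing review. The `needs_review` helper is hypothetical.

```python
from enum import Enum


class NormalizationStatus(Enum):
    SUCCESS = "success"   # normalized without issues
    FAILED = "failed"     # could not normalize; original text used
    DRIFT = "drift"       # meaning may have changed; needs review
    SKIPPED = "skipped"   # intentionally not normalized


def needs_review(principle: dict) -> bool:
    """Validate the status field and report whether a human should check the principle."""
    # Raises ValueError for any value outside the four allowed statuses.
    status = NormalizationStatus(principle["normalization_status"])
    if status in (NormalizationStatus.FAILED, NormalizationStatus.SKIPPED):
        # Both statuses fall back to the original wording.
        if principle["normalized_form"] != principle["statement"]:
            raise ValueError(f"{status.value} principle must reuse its original text")
    return status is NormalizationStatus.DRIFT
```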

Terminology Rules

Term | Use For | Never Use For
-----|---------|--------------
Principle | Invariant statement that survives rephrasing | Opinions, preferences
Pattern | Recurring structure across instances | One-time observations
Observation | Single-source finding (N=1) | Validated principles
Confidence | Evidence clarity | Certainty of truth

Error Handling

Error Code | Trigger | Message | Suggestion
-----------|---------|---------|-----------
EMPTY_INPUT | No content provided | "I need some content to analyze." | "Paste or reference the text you want me to extract principles from."
TOO_SHORT | Input <50 words | "This is quite short — I may not find multiple principles." | "For best results, provide at least 200 words of content."
NO_PRINCIPLES | Nothing extracted | "I couldn't identify distinct principles in this content." | "Try content with clearer structure or more conceptual density."
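The first two error codes can be decided before extraction runs, and the third only after it. A sketch that maps inputs to those codes; it assumes whitespace-separated word counting (a simplification that undercounts languages written without spaces), and `check_input` and `check_result` are hypothetical names.

```python
from typing import Optional

MIN_WORDS = 50  # below this, TOO_SHORT applies


def check_input(text: str) -> Optional[str]:
    """Return an error code for problematic input, or None if it meets the minimum."""
    words = text.split()
    if not words:
        return "EMPTY_INPUT"
    if len(words) < MIN_WORDS:
        return "TOO_SHORT"  # caller decides whether to warn or stop
    return None


def check_result(principles: list) -> Optional[str]:
    """Return NO_PRINCIPLES when extraction produced nothing."""
    return None if principles else "NO_PRINCIPLES"
```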

Quality Metrics

Compression Ratio Targets

Ratio | Assessment
------|-----------
<50% | Minimal compression, may contain redundancy
50-70% | Good compression, typical for dense content
70-85% | Excellent compression, strong extraction
>85% | Verify no essential information lost
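A lookup for the compression targets. The table leaves the shared 70% endpoint ambiguous; this sketch assumes it belongs to the higher band, while 50% falls in "50-70%" (it is not below 50) and 85% in "70-85%" (it is not above 85). `assess_ratio` is a hypothetical name.

```python
def assess_ratio(percent: float) -> str:
    """Map a compression ratio (0-100) to its assessment in the targets table.

    Assumption: the shared 70% endpoint belongs to the higher band.
    """
    if not 0 <= percent <= 100:
        raise ValueError("ratio must be between 0 and 100")
    if percent < 50:
        return "Minimal compression, may contain redundancy"
    if percent < 70:
        return "Good compression, typical for dense content"
    if percent <= 85:
        return "Excellent compression, strong extraction"
    return "Verify no essential information lost"


print(assess_ratio(79))  # Excellent compression, strong extraction
```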

Principle Quality Indicators

  • Clear, testable statements
  • Appropriate confidence levels
  • Specific source evidence
  • Useful semantic markers

Related Skills

  • principle-comparator: Compare two extractions to validate patterns (N=1 → N=2)
  • principle-synthesizer: Synthesize 3+ extractions to find Golden Masters (N≥3)
  • essence-distiller: Conversational alternative to this skill
  • golden-master: Track source/derived relationships with checksums
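The related skills line up with a principle's `n_count`. A sketch of that routing: the skill names come from the list, while the function and its thresholds (N=1 goes to the comparator, N=2 to the synthesizer once a third extraction exists, N≥3 to golden-master tracking) are an interpretation, not documented behavior.

```python
def suggest_next_skill(n_count: int) -> str:
    """Suggest a follow-up skill from the number of independent sources (interpretation)."""
    if n_count < 1:
        raise ValueError("n_count starts at 1 (single source)")
    if n_count == 1:
        return "principle-comparator"   # compare with a second source: N=1 -> N=2
    if n_count == 2:
        return "principle-synthesizer"  # needs 3+ extractions to reach N>=3
    return "golden-master"              # N>=3: track source/derived relationships


print(suggest_next_skill(1))  # principle-comparator
```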

Required Disclaimer

This skill extracts PATTERNS from content, not verified truth. All extracted principles:

  • Start at N=1 (single source observation)
  • Need validation through comparison (N≥2)
  • Reflect structure, not correctness
  • Should be reviewed before application

Built by Obviously Not — Tools for thought, not conclusions.

Version History

1 version in total

  • v1.0.3 (current)
    2026-03-28 16:03 · Safe

Security Scans

Tencent Cloud Security (Keen): Safe, no risk

Tencent Cloud Security (Sanbu): Safe, no risk
