
Compress

Compress text semantically with iterative validation, anchor checksums, and verified information preservation.
ivangdavila

Overview

⚠️ Important Limitations

This is SEMANTIC compression, not bit-perfect lossless.

  • L1-L2: Verified reconstruction, production-ready
  • L3-L4: Experimental, may lose subtle information
  • Never use for: Medical dosages, legal text, financial figures, safety-critical data

The Validation Loop

1. Compress original O → compressed C
2. Extract anchors from O (entities, numbers, dates)
3. Reconstruct C → R (without seeing O)
4. Verify: anchors match + semantic diff
5. If mismatch → refine C with missing info
6. Repeat until validated (max 3 iterations)

If the loop converges, the compression counts as verified. If it has not converged after 3 iterations, the chosen level is too aggressive for this content.
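The loop above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `validate_compression`, its callable parameters, and the `toy_*` stand-ins are hypothetical names, and the semantic-diff part of step 4 is left out so that only anchor matching is checked. In practice `compress` and `reconstruct` would be LLM calls.

```python
import re
from typing import Callable, Set, Tuple


def validate_compression(
    original: str,
    compress: Callable[[str, Set[str]], str],
    reconstruct: Callable[[str], str],
    extract_anchors: Callable[[str], Set[str]],
    max_iterations: int = 3,
) -> Tuple[str, int, bool]:
    """Return (compressed text, iterations used, validated flag)."""
    anchors = extract_anchors(original)
    must_keep: Set[str] = set()  # anchors that earlier rounds lost
    compressed = ""
    for iteration in range(1, max_iterations + 1):
        compressed = compress(original, must_keep)
        # The reconstruction step only ever sees the compressed text.
        reconstructed = reconstruct(compressed)
        missing = anchors - extract_anchors(reconstructed)
        if not missing:
            return compressed, iteration, True
        must_keep |= missing  # refine the next round with the missing info
    return compressed, max_iterations, False


# Toy stand-ins for the LLM calls, only to make the sketch runnable.
def toy_extract(text: str) -> Set[str]:
    return set(re.findall(r"\d+", text))


def toy_compress(text: str, must_keep: Set[str]) -> str:
    return " ".join([text.split()[0], *sorted(must_keep)])


def toy_reconstruct(compressed: str) -> str:
    return compressed


compressed, iterations, validated = validate_compression(
    "Paid 42000 on day 15", toy_compress, toy_reconstruct, toy_extract
)
print(compressed, iterations, validated)
```

With these toy functions the first round drops both numbers, the second round restores them, and the loop reports validation on iteration 2. Returning the iteration count alongside the flag makes it easy to report confidence later.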


Quick Reference

Task                          Load
----------------------------  -------------
Compression levels (L1-L4)    levels.md
Validation algorithm details  validation.md
Format-specific strategies    formats.md
Token budgeting and metrics   metrics.md

Compression Levels

Level  Ratio   Reliability  Use Case
-----  ------  -----------  ----------------------------
L1     ~0.8x   ✅ High      Production, human-readable
L2     ~0.5x   ✅ Good      System prompts, repeated use
L3     ~0.3x   ⚠️ Moderate  Experimental, review output
L4     ~0.15x  ⚠️ Low       Research only, expect losses
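As a rough illustration of how the ratios in the table might drive level selection, the sketch below picks the least aggressive level whose approximate output fits a token budget. `LEVEL_RATIOS` copies the approximate ratios from the table; `pick_level` and its parameters are hypothetical names, and real ratios will vary with the content.

```python
from typing import Optional

# Approximate compressed/original size ratios, taken from the table above.
LEVEL_RATIOS = {"L1": 0.8, "L2": 0.5, "L3": 0.3, "L4": 0.15}


def pick_level(
    original_tokens: int, token_budget: int, production: bool = True
) -> Optional[str]:
    """Return the least aggressive level expected to fit the budget, or None."""
    # Production use is capped at L2; L3 and L4 are experimental.
    allowed = ["L1", "L2"] if production else ["L1", "L2", "L3", "L4"]
    for level in allowed:
        if original_tokens * LEVEL_RATIOS[level] <= token_budget:
            return level
    return None


print(pick_level(1000, 600))                     # fits at L2
print(pick_level(1000, 200))                     # no production-safe level fits
print(pick_level(1000, 200, production=False))   # only L4 fits
```

A `None` result in production mode suggests that the budget cannot be met without moving into the experimental levels.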

Anchor Checksum System

Before compression, extract critical facts:

[ANCHORS: 3 people, $42,000, 2024-03-15, "Project Alpha"]

Reconstruction MUST reproduce these exactly. If anchors mismatch → compression failed.
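A crude version of anchor extraction and comparison can be sketched with regular expressions. This is an assumption-laden illustration: the pattern only covers ISO dates, dollar amounts, double-quoted names, and bare numbers, and the names `ANCHOR_RE`, `extract_anchors`, and `anchor_match` are hypothetical. Note that it captures "3" rather than "3 people"; a real extractor would likely use an LLM or an entity recognizer.

```python
import re
from typing import List, Tuple

# Ordered alternatives: dates first so their digits are not split into numbers.
ANCHOR_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}"                 # ISO dates, e.g. 2024-03-15
    r"|\$\d+(?:,\d{3})*(?:\.\d+)?"       # dollar amounts, e.g. $42,000
    r'|"[^"]+"'                          # double-quoted names
    r"|\d+(?:\.\d+)?"                    # bare numbers
)


def extract_anchors(text: str) -> List[str]:
    """Return unique anchors in order of first appearance."""
    return list(dict.fromkeys(ANCHOR_RE.findall(text)))


def anchor_match(original: str, reconstructed: str) -> Tuple[float, List[str]]:
    """Return (match rate, anchors missing from the reconstruction)."""
    expected = extract_anchors(original)
    found = set(extract_anchors(reconstructed))
    missing = [a for a in expected if a not in found]
    rate = 1.0 if not expected else 1 - len(missing) / len(expected)
    return rate, missing


original = 'Budget of $42,000 approved on 2024-03-15 for "Project Alpha" with 3 people.'
reconstructed = 'Three people got $42,000 on 2024-03-15 for "Project Alpha".'
print(extract_anchors(original))
print(anchor_match(original, reconstructed))
```

Here the reconstruction spells out "Three", so the anchor "3" is reported missing and the match rate is 0.75. Under the exact-match rule above, that counts as a failed compression.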


Core Rules

  1. Always validate: never trust a compression without a reconstruction test
  2. Use anchors: extract numbers, names, and dates before compressing
  3. Cap at L2 for production: L3-L4 are experimental
  4. Report confidence: include the iteration count and the anchor match rate
  5. Verify independently: consider using a different model for reconstruction

Cost-Benefit Reality

Each compression costs 3-4 LLM calls. Break-even calculation:

break_even_retrievals = compression_tokens / saved_tokens_per_use

Compression is only cost-effective if you will retrieve the compressed content at least 6-8 times.

For one-time use → just use the original text.
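A worked example of the break-even formula, using made-up numbers: assume a 2,000-token original compressed at L2 (about 0.5x), so each retrieval saves roughly 1,000 tokens, and assume the validation calls cost about 7,000 tokens in total. Both figures are hypothetical; the function simply applies the formula above and rounds up to whole retrievals.

```python
import math


def break_even_retrievals(compression_tokens: int, saved_tokens_per_use: int) -> int:
    """Number of retrievals needed before compression pays for itself."""
    if saved_tokens_per_use <= 0:
        raise ValueError("compression must save tokens on each use")
    return math.ceil(compression_tokens / saved_tokens_per_use)


# Hypothetical figures: 7,000 tokens spent compressing, 1,000 saved per retrieval.
print(break_even_retrievals(7000, 1000))
```

With these assumed figures the break-even point is 7 retrievals, which falls inside the 6-8 range quoted above.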


Before Compressing

  • [ ] Content type is NOT safety-critical
  • [ ] Target level chosen (L1-L2 recommended)
  • [ ] Anchors identified (numbers, names, dates)
  • [ ] ROI makes sense (multiple retrievals expected)

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 03:43, security status: safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
