
Citation Diversifier

Raise citation diversity/density (NO NEW FACTS): generate an in-scope “citation budget” plan per H3 so drafts stop failing the global unique-citation gate an...
Source: willoscar
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · Key: not required
Overview

Citation Diversifier (budget-as-constraints) [NO NEW FACTS]

Purpose: fix a common survey failure mode:

  • the draft reads under-cited (or reuses the same few citations everywhere)
  • the pipeline fails the global unique-citation gate

This skill does not change prose by itself.

It produces a constraint sheet: output/CITATION_BUDGET_REPORT.md.

Inputs

  • output/DRAFT.md
  • outline/outline.yml (H3 ids/titles; used to allocate budgets per subsection)
  • outline/writer_context_packs.jsonl (source of allowed_bibkeys_{selected,mapped,chapter,global} per H3)
  • citations/ref.bib

Output

  • output/CITATION_BUDGET_REPORT.md

Non-negotiables (NO NEW FACTS)

  • Only propose citation keys that exist in citations/ref.bib.
  • Only propose keys that are in-scope for the target H3 (prefer subsection-first scope; use chapter/global only when truly cross-cutting).
  • Do not propose “padding citations” that would require adding new claims or new numbers.
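
The first two rules above can be checked mechanically. The following is a minimal sketch, not the skill's own implementation: the function names `bib_keys` and `scope_safe` are hypothetical, and the regex is a rough BibTeX header matcher that assumes one `@type{key,` header per entry.

```python
import re

# Rough matcher for BibTeX entry headers such as "@article{Smith2020,".
# Assumption: keys contain no commas or whitespace.
BIB_ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bib_keys(bib_text: str) -> set[str]:
    """Return every entry key found in a BibTeX source string."""
    return set(BIB_ENTRY_RE.findall(bib_text))


def scope_safe(proposed: list[str], known: set[str], in_scope: set[str]) -> list[str]:
    """Keep only keys that exist in ref.bib AND are in scope for the H3.

    Input order is preserved so the caller's priority ordering survives.
    """
    return [key for key in proposed if key in known and key in in_scope]
```

The third rule (no padding citations that need new claims) is a judgment about the prose and is not something a filter like this can decide.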

What a good budget report looks like (contract)

The report should feel like a constraint sheet, not a random list:

  • It states the blocking policy target and the gap-to-target (how many unique keys are missing; policy default is recommended).
  • For each H3, it proposes a scope-safe budget sized to actually close the gap:
      • small gaps: 3-6 keys / H3 is often enough
      • A150++ gaps: plan for ~6-12 keys / H3 (and avoid duplicates across H3 budgets)
  • It gives placement guidance (where in the subsection those keys can be embedded without adding new facts).

Canonical (parseable) lines required (downstream validators depend on these):

  • The target is derived from queries.md:citation_target (recommended by default for A150++).
  • - Global target (policy; blocking): >= ...
  • - Gap: (gap-to-target; if 0, injection can be a no-op PASS)
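
Because downstream validators depend on these lines, it can help to see how one might read them back. This is a sketch under stated assumptions: the real validator is not shown in this document, `parse_canonical` is a hypothetical name, and the values after `>=` and `Gap:` are assumed to be plain integers.

```python
import re

# Assumption: each canonical line starts at the beginning of a line and the
# value is a bare integer directly after the fixed prefix.
TARGET_RE = re.compile(r"^- Global target \(policy; blocking\): >= (\d+)", re.MULTILINE)
GAP_RE = re.compile(r"^- Gap: (\d+)", re.MULTILINE)


def parse_canonical(report: str) -> tuple[int, int]:
    """Return (blocking_target, gap) or raise if either canonical line is missing."""
    target = TARGET_RE.search(report)
    gap = GAP_RE.search(report)
    if target is None or gap is None:
        raise ValueError("missing canonical target/gap line")
    return int(target.group(1)), int(gap.group(1))
```

Note that `- Gap to recommended:` does not match the `- Gap:` pattern, so the optional lines cannot be mistaken for the blocking gap.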

Optional (always reported; may be blocking depending on citation_target):

  • - Global recommended target: >= ...
  • - Gap to recommended:

Recommended prioritization (scope-safe):

  • allowed_bibkeys_selected → allowed_bibkeys_mapped → allowed_bibkeys_chapter
  • Use allowed_bibkeys_global only for:
      • benchmarks/protocol papers
      • widely-used datasets/suites
      • cross-cutting surveys/method papers referenced across chapters
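
The tier ordering can be sketched as a simple priority walk. This is an illustration only: `pick_keys` is a hypothetical helper, and the pack is assumed to be a dict holding one list of keys per `allowed_bibkeys_*` field. Global keys are left out on purpose, since deciding whether a paper is truly cross-cutting needs a human or LLM judgment.

```python
# Priority order taken from the list above; allowed_bibkeys_global is
# deliberately not included.
TIERS = (
    "allowed_bibkeys_selected",
    "allowed_bibkeys_mapped",
    "allowed_bibkeys_chapter",
)


def pick_keys(pack: dict, used: set[str], budget: int) -> list[str]:
    """Pick up to `budget` keys not in `used`, walking the tiers in priority order."""
    picked: list[str] = []
    for tier in TIERS:
        for key in pack.get(tier, []):
            if len(picked) >= budget:
                return picked
            if key not in used and key not in picked:
                picked.append(key)
    return picked
```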

How this connects to writing (LLM-first)

After you generate the budget report:

  • Apply it using citation-injector (LLM edits to output/DRAFT.md, NO NEW FACTS).
  • Then run draft-polisher to remove any “budget dump voice” while keeping citation keys unchanged.

Important: citation-injector is LLM-first. Its script is validation-only.

Workflow

1) Diagnose the global situation

  • Read output/DRAFT.md and estimate the “unique-key gap” (or use pipeline-auditor’s FAIL reason).
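
A rough way to estimate the gap is to count distinct keys in the draft. The sketch below assumes Pandoc-style bracketed citations such as `[@key1; @key2]`; the document does not state which citation syntax output/DRAFT.md uses, so adjust the patterns if yours differs. The function names are hypothetical.

```python
import re

# Assumption: citations appear inside square brackets and keys are made of
# letters, digits, "_", ":" and "-" (keys containing dots would be truncated).
CITE_GROUP_RE = re.compile(r"\[([^\[\]]*@[^\[\]]*)\]")
KEY_RE = re.compile(r"@([A-Za-z0-9_:-]+)")


def unique_cited_keys(draft: str) -> set[str]:
    """Collect the distinct citation keys used anywhere in the draft."""
    keys: set[str] = set()
    for group in CITE_GROUP_RE.findall(draft):
        keys.update(KEY_RE.findall(group))
    return keys


def unique_key_gap(draft: str, target: int) -> int:
    """How many more unique keys are needed to reach `target` (never negative)."""
    return max(0, target - len(unique_cited_keys(draft)))
```

If pipeline-auditor already reports a FAIL reason with a number, prefer that figure over this estimate.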

2) Allocate budgets per H3 (scope-first)

  • Use outline/outline.yml to enumerate H3s in paper order.
  • For each H3, read its allowed key sets from outline/writer_context_packs.jsonl.
  • Pick a small set of unused keys that strengthen positioning without requiring new claims.
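
One possible shape for this allocation step is sketched below. It is not the helper script's logic. It assumes each JSONL record carries a `sub_id` field (a hypothetical name; the real schema is not shown here) next to the `allowed_bibkeys_*` lists, splits the gap evenly across H3s, and never assigns the same key twice.

```python
import json
import math


def allocate(packs_jsonl: str, already_cited: set[str], gap: int) -> dict[str, list[str]]:
    """Assign unused, subsection-scoped keys to each H3 without duplicates."""
    packs = [json.loads(line) for line in packs_jsonl.splitlines() if line.strip()]
    if gap <= 0 or not packs:
        return {}  # nothing to close: the budget is a no-op
    per_h3 = math.ceil(gap / len(packs))  # even split; a real plan may weight H3s
    taken = set(already_cited)
    budgets: dict[str, list[str]] = {}
    for pack in packs:  # records are assumed to be in paper order
        candidates = pack.get("allowed_bibkeys_selected", []) + pack.get("allowed_bibkeys_mapped", [])
        fresh: list[str] = []
        for key in candidates:
            if key not in taken and len(fresh) < per_h3:
                fresh.append(key)
                taken.add(key)  # prevents reuse in later H3 budgets
        budgets[pack["sub_id"]] = fresh  # "sub_id" is a hypothetical field name
    return budgets
```

The result still needs a human or LLM pass to confirm that each key can be embedded without a new claim.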

3) Write output/CITATION_BUDGET_REPORT.md

Required structure:

  • - Status: PASS|FAIL
  • - Global target (policy; blocking): >= ...
  • - Gap:
  • ## Summary (gap + strategy)
  • ## Per-subsection budgets (H3 id/title → suggested keys → placement hint)
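
The required structure can be emitted with a small renderer. This is a sketch with assumptions: `render_report` is a hypothetical name, the parenthetical parts of the two headings are treated as descriptions and left out, and the per-row layout (`id/title → keys → hint`) is one guess at how a budget line might look.

```python
def render_report(
    status: str,
    target: int,
    gap: int,
    strategy: str,
    budgets: list[tuple[str, list[str], str]],
) -> str:
    """Render the report text; `budgets` holds (h3 label, keys, placement hint)."""
    lines = [
        f"- Status: {status}",
        f"- Global target (policy; blocking): >= {target}",
        f"- Gap: {gap}",
        "",
        "## Summary",
        "",
        strategy,
        "",
        "## Per-subsection budgets",
        "",
    ]
    for h3, keys, hint in budgets:
        # Row layout is an assumption, mirroring the arrows in the structure above.
        lines.append(f"- {h3} → {', '.join(keys)} → {hint}")
    return "\n".join(lines) + "\n"
```

The status is passed in rather than computed, because the rule that decides PASS or FAIL belongs to the pipeline policy and is not spelled out here.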

Script (optional; deterministic report generator)

If you want a deterministic first-pass budget report, run the helper script. Treat it as a baseline and refine the plan as needed.

Quick Start

  • python scripts/run.py --help
  • python scripts/run.py --workspace workspaces/

All Options

  • --workspace
  • --unit-id (optional)
  • --inputs (rare override; prefer defaults)
  • --outputs (rare override; default writes output/CITATION_BUDGET_REPORT.md)
  • --checkpoint (optional)

Examples

  • Default IO:
  • python scripts/run.py --workspace workspaces/

Done criteria

  • output/CITATION_BUDGET_REPORT.md exists and has actionable, in-scope budgets.
  • After applying the plan via citation-injector, pipeline-auditor no longer FAILs on global unique citations.

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-30 23:00, security scans: safe, safe

