Overview

Citation Diversifier (budget-as-constraints) [NO NEW FACTS]

Purpose: fix a common survey failure mode, which shows up in two ways:

the draft reads as under-cited (or reuses the same few citations everywhere)
the pipeline fails the global unique-citation gate

This skill does not change prose by itself.

It produces a constraint sheet: output/CITATION_BUDGET_REPORT.md.

Inputs

output/DRAFT.md
outline/outline.yml (H3 ids/titles; used to allocate budgets per subsection)
outline/writer_context_packs.jsonl (source of allowed_bibkeys_{selected,mapped,chapter,global} per H3)
citations/ref.bib

Output

output/CITATION_BUDGET_REPORT.md

Non-negotiables (NO NEW FACTS)

Only propose citation keys that exist in citations/ref.bib.
Only propose keys that are in-scope for the target H3 (prefer subsection-first scope; use chapter/global only when truly cross-cutting).
Do not propose “padding citations” that would require adding new claims or new numbers.
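
The first two rules above can be checked mechanically. The following is a minimal sketch, not the skill's actual implementation: the function names are hypothetical, and the regex assumes conventional BibTeX entries of the form @type{key, (an @comment{a, b} block would be misread as an entry). The third rule, no padding citations, still needs human or LLM judgment.

```python
import re

# Assumption: each BibTeX entry opens with "@type{key,".
BIB_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bib_keys(bib_text):
    """Return the set of entry keys found in a BibTeX string."""
    return set(BIB_KEY_RE.findall(bib_text))


def filter_proposals(proposed, bib_text, in_scope):
    """Keep proposed keys that exist in the bibliography AND are in scope.

    Order is preserved and repeated keys are dropped.
    """
    known = bib_keys(bib_text)
    allowed = set(in_scope)
    kept, seen = [], set()
    for key in proposed:
        if key in known and key in allowed and key not in seen:
            kept.append(key)
            seen.add(key)
    return kept
```

In practice bib_text would be the contents of citations/ref.bib and in_scope the allowed key set for the target H3.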

What a good budget report looks like (contract)

The report should feel like a constraint sheet, not a random list:

It states the blocking policy target and the gap to that target (how many unique keys are still missing; by default the policy target is the recommended one).
For each H3, it proposes a scope-safe budget sized to actually close the gap:
  small gaps: 3-6 keys / H3 is often enough
  A150++ gaps: plan for ~6-12 keys / H3 (and avoid duplicates across H3 budgets)
It gives placement guidance (where in the subsection those keys can be embedded without adding new facts).

Canonical (parseable) lines required (downstream validators depend on these):

The target is derived from queries.md:citation_target (recommended by default for A150++).
- Global target (policy; blocking): >= ...
- Gap: (gap-to-target; if 0, injection can be a no-op PASS)

Recommended-target lines (always reported; whether they block depends on citation_target):

- Global recommended target: >= ...
- Gap to recommended:
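
To illustrate why the exact wording of these lines matters, here is a hedged sketch of how a downstream validator might read the two blocking lines. The source elides the values with "...", so the assumption that a plain integer follows ">=" and "Gap:" is mine, as is the function name; the real validators may parse differently.

```python
import re

# Assumption: an integer follows ">=" and "Gap:" in the canonical lines.
TARGET_RE = re.compile(r"^- Global target \(policy; blocking\): >= (\d+)", re.M)
GAP_RE = re.compile(r"^- Gap: (\d+)", re.M)  # does not match "- Gap to recommended:"


def parse_budget_header(report_text):
    """Extract the blocking target and gap from a budget report."""
    target = TARGET_RE.search(report_text)
    gap = GAP_RE.search(report_text)
    if target is None or gap is None:
        raise ValueError("canonical target/gap lines not found")
    gap_value = int(gap.group(1))
    return {
        "target": int(target.group(1)),
        "gap": gap_value,
        "noop": gap_value == 0,  # gap of 0 means injection can be a no-op PASS
    }
```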

Recommended prioritization (scope-safe):

allowed_bibkeys_selected → allowed_bibkeys_mapped → allowed_bibkeys_chapter
Use allowed_bibkeys_global only for:
  benchmarks/protocol papers
  widely-used datasets/suites
  cross-cutting surveys/method papers referenced across chapters
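
The ordering above can be sketched as a small ranking function. This is an illustration under assumptions: it takes one H3's context pack as a dict whose allowed_bibkeys_* fields are lists of keys, and the function name is hypothetical. Keys from allowed_bibkeys_global are deliberately left out, since choosing them depends on the judgment calls listed above.

```python
# Priority order taken from the prioritization line; global is excluded on purpose.
TIERS = (
    "allowed_bibkeys_selected",
    "allowed_bibkeys_mapped",
    "allowed_bibkeys_chapter",
)


def ranked_candidates(pack):
    """Return (key, tier) pairs in priority order.

    Each key appears once, at the highest-priority tier that lists it.
    """
    ranked, seen = [], set()
    for tier in TIERS:
        for key in pack.get(tier, []):
            if key not in seen:
                ranked.append((key, tier))
                seen.add(key)
    return ranked
```

Keeping the tier next to each key makes it easy to note in the report how far outside the subsection scope a suggestion had to reach.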

How this connects to writing (LLM-first)

After you generate the budget report:

Apply it using citation-injector (LLM edits to output/DRAFT.md, NO NEW FACTS).
Then run draft-polisher to remove any “budget dump voice” while keeping citation keys unchanged.

Important: citation-injector is LLM-first. Its script is validation-only.

Workflow

1) Diagnose the global situation

Read output/DRAFT.md and estimate the “unique-key gap” (or use pipeline-auditor’s FAIL reason).
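
A rough way to estimate the gap is to count distinct cited keys in the draft and subtract from the target. The sketch below assumes Pandoc-style citations such as [@smith2020; @lee2021]; the source does not state which citation syntax output/DRAFT.md uses, so adjust the pattern if the draft differs. The function names are hypothetical.

```python
import re

# Assumption: Pandoc-style citations, e.g. [@smith2020; @lee2021].
# The lookbehind skips "@" inside e-mail addresses.
CITE_RE = re.compile(r"(?<![\w.])@([A-Za-z0-9][\w:-]*)")


def unique_cited_keys(draft_text):
    """Return the set of distinct citation keys used in the draft."""
    return set(CITE_RE.findall(draft_text))


def unique_key_gap(draft_text, target):
    """Return how many more unique keys are needed to reach the target."""
    return max(0, target - len(unique_cited_keys(draft_text)))
```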

2) Allocate budgets per H3 (scope-first)

Use outline/outline.yml to enumerate H3s in paper order.
For each H3, read its allowed key sets from outline/writer_context_packs.jsonl.
Pick a small set of unused keys that strengthen positioning without requiring new claims.
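
One possible shape for the allocation step is sketched below. It assumes the caller has already built an ordered candidate list per H3 (in paper order) and knows which keys the draft already cites; the function and parameter names are hypothetical. Stopping once the gap is covered is a design choice of this sketch, and a real plan may deliberately overshoot. Whether a key "strengthens positioning without requiring new claims" is not something this code can decide.

```python
def allocate_budgets(candidates_by_h3, used_keys, per_h3, gap):
    """Assign unused keys to H3s in order, never repeating a key across H3s.

    candidates_by_h3: dict mapping H3 id -> ordered list of candidate keys
    used_keys: keys already cited in the draft
    per_h3: maximum number of keys to suggest for one H3
    gap: number of new unique keys still needed overall
    """
    taken = set(used_keys)
    remaining = gap
    budgets = {}
    for h3_id, candidates in candidates_by_h3.items():
        picks = []
        for key in candidates:
            if remaining <= 0 or len(picks) >= per_h3:
                break
            if key not in taken:
                picks.append(key)
                taken.add(key)
                remaining -= 1
        budgets[h3_id] = picks
    return budgets
```

Because each accepted key is added to the taken set immediately, a key suggested for an earlier H3 cannot reappear in a later one, which matches the advice to avoid duplicates across H3 budgets.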

3) Write output/CITATION_BUDGET_REPORT.md

Required structure:

- Status: PASS|FAIL
- Global target (policy; blocking): >= ...
- Gap:
## Summary (gap + strategy)
## Per-subsection budgets (H3 id/title → suggested keys → placement hint)
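
For illustration, a report with this structure could be rendered as follows. This is a sketch under assumptions, not the helper script's behavior: Status is set to PASS only when the gap is 0, the two headings are copied verbatim from the structure above, and the per-subsection row format and function name are hypothetical.

```python
def render_report(target, gap, summary, budgets):
    """Render a budget report as text.

    budgets: list of (h3_id, title, keys, placement_hint) tuples.
    """
    status = "PASS" if gap == 0 else "FAIL"  # assumption: PASS means no gap left
    lines = [
        f"- Status: {status}",
        f"- Global target (policy; blocking): >= {target}",
        f"- Gap: {gap}",
        "## Summary (gap + strategy)",
        summary,
        "## Per-subsection budgets (H3 id/title → suggested keys → placement hint)",
    ]
    for h3_id, title, keys, hint in budgets:
        # Hypothetical row layout: id and title, then keys, then the hint.
        lines.append(f"- {h3_id} {title} → {', '.join(keys)} → {hint}")
    return "\n".join(lines) + "\n"
```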

Script (optional; deterministic report generator)

If you want a deterministic first-pass budget report, run the helper script. Treat it as a baseline and refine the plan as needed.

Quick Start

python scripts/run.py --help
python scripts/run.py --workspace workspaces/

All Options

--workspace
--unit-id (optional)
--inputs (rare override; prefer defaults)
--outputs (rare override; default writes output/CITATION_BUDGET_REPORT.md)
--checkpoint (optional)

Examples

Default IO:
python scripts/run.py --workspace workspaces/

Done criteria

output/CITATION_BUDGET_REPORT.md exists and has actionable, in-scope budgets.
After applying the plan via citation-injector, pipeline-auditor no longer FAILs on global unique citations.

Version history

1 version in total

v1.0.0 (current)

2026-03-30 23:00
