docx-md

Low-level docx format tool for AI document review. Three operations: (1) read docx → output compact Markdown or JSON; (2) apply edits JSON back to docx (trac...
yanweiliang323868-del

Word DOCX (OOXML) – docx-md

Overview

Three entry points: Read – output compact Markdown (default, token-efficient) or full JSON; Modify – apply AI-returned edits to the docx; Finalize – accept all revisions and remove all comments. Implemented via OOXML (ZIP + XML). No commercial Word libraries required.

Workflow

| Goal | Action |
|------|--------|
| Get document for AI | Read: run read script → Markdown (default) or JSON. Markdown includes blockIndex markers for edit targeting. |
| Apply AI edits to docx | Modify: run apply script with docx + edits JSON → new docx with track changes and comments. |
| Deliver final version | Finalize: run finalize script → new docx with no revisions/comments. |

LLM-oriented pipeline

  1. Read – Parse the docx; output Markdown (default) or JSON. In Markdown, each block carries a prefix (the blockIndex marker used for edit targeting); revisions appear as {+inserted+} {-deleted-}; comments appear as [comment: text].
  2. Send the output + task prompt to the model; require the model to output only the edit JSON, with the fields blockIndex, originalContent, content, basis.
  3. Modify – The script infers the operation from blockIndex, originalContent, content, basis; converts it to OOXML (w:ins / w:del / comment anchors), then writes the result back to Word.
  4. Finalize – When the user confirms, run finalize to accept all revisions and remove all comments.

See references/llm-pipeline.md for the Markdown format, JSON schema, and edit format.
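As an illustration of steps 2 and 3, the sketch below builds an edit JSON with the four documented fields and guesses an operation for each entry. The inference rules, the integer blockIndex values, and the `infer_op` helper are assumptions made for this example; the actual rules live in the apply script and references/llm-pipeline.md.

```python
import json

REQUIRED_KEYS = ("blockIndex", "originalContent", "content", "basis")


def infer_op(mod: dict) -> str:
    """Guess the operation for one modification (hypothetical rules)."""
    missing = [key for key in REQUIRED_KEYS if key not in mod]
    if missing:
        raise ValueError(f"modification is missing keys: {missing}")
    original, new = mod["originalContent"], mod["content"]
    if original == new:
        return "comment"  # text unchanged: only the basis is attached
    if not original:
        return "insert"   # nothing to replace: new text is added
    if not new:
        return "delete"   # replacement is empty: old text is removed
    return "replace"


# Example payload in the documented shape; the values are made up.
edits = {
    "modifications": [
        {
            "blockIndex": 3,
            "originalContent": "within 30 days",
            "content": "within 15 days",
            "basis": "Align with clause 2.1.",
        },
        {
            "blockIndex": 7,
            "originalContent": "The parties agree.",
            "content": "The parties agree.",
            "basis": "Ambiguous: which parties?",
        },
    ]
}

for mod in edits["modifications"]:
    print(mod["blockIndex"], infer_op(mod))

# Serialized form that could be saved as edits.json or piped to stdin.
payload = json.dumps(edits, ensure_ascii=False, indent=2)
```

Checking the required keys before applying edits is a cheap way to reject model output that drifted from the requested shape.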

1. Read

  • Parse word/document.xml (w:body only) and word/comments.xml.
  • Output Markdown (default) or JSON. Markdown is compact and token-efficient.
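
To make the parsing step concrete, here is a minimal sketch of reading paragraph text from word/document.xml with the standard library. It is not the code of read_docx.py: the `read_blocks` and `make_demo_docx` helpers and the printed `[index]` prefix are hypothetical, and the demo archive contains only the one part this parser needs, so Word itself would not open it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def read_blocks(docx_bytes: bytes) -> list[str]:
    """Return the text of each top-level paragraph in w:body."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find("w:body", NS)
    blocks = []
    for para in body.findall("w:p", NS):
        # A paragraph's visible text is spread across its w:t elements.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        blocks.append(text)
    return blocks


def make_demo_docx(paragraphs: list[str]) -> bytes:
    """Build an in-memory ZIP holding only word/document.xml (demo only)."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        + "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
        + "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


demo = make_demo_docx(["Title", "First paragraph."])
for index, text in enumerate(read_blocks(demo)):
    print(f"[{index}] {text}")
```

This sketch ignores tables, revisions, and word/comments.xml, all of which the real read step covers.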

Script: scripts/read_docx.py

# Default: Markdown output (token-efficient)
python3 skills/docx-md/scripts/read_docx.py document.docx
python3 skills/docx-md/scripts/read_docx.py document.docx -o result.md

# JSON output (full structure)
python3 skills/docx-md/scripts/read_docx.py document.docx -f json -o result.json

Options:

  • -o, --output – Output path (default: stdout)
  • -f, --format – md (default) or json

2. Modify

  • Input: docx path + edit JSON { modifications: [{ blockIndex, originalContent, content, basis }] } (same blockIndex as read output).
  • Flow: Convert JSON to OOXML (w:ins / w:del / comments), then write back to Word.
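
For readers unfamiliar with tracked changes in OOXML, the sketch below shows what a replacement could look like as w:del / w:ins markup. The `tracked_replace` helper and its revision-id scheme are assumptions for illustration, not the behavior of apply_edits_docx.py; the element and attribute names (w:del, w:ins, w:delText, w:id, w:author, w:date) are standard WordprocessingML.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # serialize with the usual "w:" prefix


def w(tag: str) -> str:
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def tracked_replace(old_text: str, new_text: str,
                    author: str = "Review", rev_id: int = 1) -> ET.Element:
    """Build a paragraph whose old text is deleted and new text is inserted."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    para = ET.Element(w("p"))

    # Deleted text lives in w:delText inside a run wrapped by w:del.
    deleted = ET.SubElement(para, w("del"), {
        w("id"): str(rev_id), w("author"): author, w("date"): stamp})
    ET.SubElement(ET.SubElement(deleted, w("r")), w("delText")).text = old_text

    # Inserted text is an ordinary w:t run wrapped by w:ins.
    inserted = ET.SubElement(para, w("ins"), {
        w("id"): str(rev_id + 1), w("author"): author, w("date"): stamp})
    ET.SubElement(ET.SubElement(inserted, w("r")), w("t")).text = new_text
    return para


para = tracked_replace("within 30 days", "within 15 days")
print(ET.tostring(para, encoding="unicode"))
```

A production version would also need to preserve run formatting and set xml:space="preserve" on text with leading or trailing spaces.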

Script: scripts/apply_edits_docx.py. Use - as edits file to read JSON from stdin.

python3 skills/docx-md/scripts/apply_edits_docx.py document.docx edits.json -o output.docx
python3 skills/docx-md/scripts/apply_edits_docx.py document.docx - -o output.docx  # stdin

Options: --author (default: "Review")

3. Finalize

  • Accept all revisions (flatten to final text), remove all comments. Save as new docx.
  • Uses docx-revisions to accept revisions (preserves encoding), then removes comment markup via regex on raw bytes.
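
The comment-removal half of this step can be sketched as a byte-level regex over the document XML. This is an assumption about the general approach, not the pattern used in finalize_docx.py: the `strip_comment_markup` helper is hypothetical, it handles only the self-closing form of the three comment anchor elements, and it leaves the now-empty run in place.

```python
import re

# Matches self-closing comment anchors: w:commentRangeStart,
# w:commentRangeEnd, and w:commentReference.
COMMENT_MARKUP = re.compile(
    rb"<w:comment(?:RangeStart|RangeEnd|Reference)\b[^>]*/>"
)


def strip_comment_markup(document_xml: bytes) -> bytes:
    """Remove comment anchors from raw document.xml bytes."""
    return COMMENT_MARKUP.sub(b"", document_xml)


sample = (
    b'<w:p><w:commentRangeStart w:id="0"/>'
    b"<w:r><w:t>Payment is due.</w:t></w:r>"
    b'<w:commentRangeEnd w:id="0"/>'
    b'<w:r><w:commentReference w:id="0"/></w:r></w:p>'
)
print(strip_comment_markup(sample).decode("utf-8"))
```

Working on raw bytes avoids re-serializing the XML, which is presumably why the original text notes that encoding is preserved. A full cleanup would also have to deal with word/comments.xml and its relationship entries.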

Script: scripts/finalize_docx.py

Requires: pip install docx-revisions (see requirements.txt)

python3 skills/docx-md/scripts/finalize_docx.py input.docx -o output.docx

Resources

scripts/

  • read_docx.py – Read: python3 scripts/read_docx.py document.docx [-o out.md] [-f md|json]
  • apply_edits_docx.py – Modify: python3 scripts/apply_edits_docx.py document.docx edits.json -o output.docx
  • finalize_docx.py – Finalize: python3 scripts/finalize_docx.py input.docx -o output.docx

references/

  • ooxml.md – OOXML layout (document.xml, comments.xml, revisions, comments)
  • llm-pipeline.md – Pipeline: read → Markdown/JSON → model edits → modify; defines Markdown format, JSON shape (blockIndex, originalContent, content, basis)

Version history

1 version in total

  • v1.0.1 (current)
    2026-03-29 09:17, security status: safe

Security checks

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
