Convert Markdown files to Word (.docx) documents with preserved formatting. Supports single-file conversion and multi-file merge into one document.
User requests md → docx conversion
  │
  └─ python scripts/convert.py input.md -o output.docx [--toc]
       │
       ├─ pypandoc available? → pandoc engine (full quality)
       └─ pypandoc not found? → markdown-it-py + python-docx fallback
The script auto-selects the best available engine. No manual pandoc check needed.
The bundled scripts/convert.py handles everything:
python scripts/convert.py input.md -o output.docx
It automatically uses pypandoc (bundled pandoc) for best quality, or falls back to markdown-it-py + python-docx if pypandoc is unavailable.
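The engine selection described above can be sketched roughly as follows. This is an illustration and not the actual contents of scripts/convert.py: the function name `select_engine` and its return values are hypothetical. It also assumes that an importable pypandoc implies a usable pandoc binary, which holds for the bundled distribution but not necessarily for every install.

```python
import importlib.util


def select_engine(find_spec=importlib.util.find_spec):
    """Return "pandoc" when pypandoc is importable, otherwise "fallback".

    find_spec is injectable so the decision can be exercised without
    installing or uninstalling packages.
    """
    if find_spec("pypandoc") is not None:
        return "pandoc"
    # The fallback needs markdown-it-py (import name: markdown_it)
    # and python-docx (import name: docx).
    missing = [m for m in ("markdown_it", "docx") if find_spec(m) is None]
    if missing:
        raise RuntimeError(f"no conversion engine available; missing: {missing}")
    return "fallback"
```

Calling `select_engine()` with no arguments inspects the current environment.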
Before converting, confirm with the user (or infer from context):
| Parameter | Options | Default |
|---|---|---|
| Page size | A4 / US Letter | A4 |
| Font | Any system font | 等线 (DengXian) / Arial (11pt) |
| Heading font | Any system font | 黑体 (SimHei) / Arial Bold |
| Code highlighting | pandoc highlight style (e.g. tango, pygments, kate) | tango (pandoc) / monospace grey bg (fallback) |
| Image handling | Embed / Link | Embed |
| Table of Contents | Yes / No | Yes (for documents with 3+ headings) |
| Line spacing | 1.0 / 1.15 / 1.5 / 2.0 | 1.15 |
| Language metadata | en / zh-CN / etc. | zh-CN (for Chinese documents) |
For Chinese documents, the skill automatically applies Chinese typography defaults such as the 等线 body font, 黑体 headings, and zh-CN language metadata.
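One way to infer "Chinese document" from content is to measure the share of CJK ideographs among the letters in the text. The sketch below is an assumption about how such detection could work, not the skill's documented behavior. The name `guess_lang` and the 10% threshold are hypothetical, and the basic CJK block also matches Japanese kanji.

```python
import re

# CJK Unified Ideographs, basic block only (an assumption about what
# counts as "Chinese"; Japanese kanji fall in the same range).
_CJK = re.compile(r"[\u4e00-\u9fff]")


def guess_lang(text, threshold=0.1):
    """Return "zh-CN" if at least `threshold` of the letters are CJK, else "en"."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"
    cjk = sum(1 for ch in letters if _CJK.match(ch))
    return "zh-CN" if cjk / len(letters) >= threshold else "en"
```

The result could then feed the Language metadata parameter from the table.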
pandoc input.md -o output.docx \
--from markdown+autolink_bare_uris+task_lists \
--metadata title="Document Title" \
--toc --toc-depth=3 \
--syntax-highlighting=tango
Key pandoc flags explained:
- --from markdown+autolink_bare_uris+task_lists: enables GitHub-Flavored Markdown extensions
- --toc --toc-depth=3: generates a table of contents for headings level 1–3
- --syntax-highlighting=tango: syntax highlighting theme for code blocks
- --metadata lang="zh-CN": add for Chinese documents (spellcheck/hyphenation)

For simple documents (fewer than 3 headings), omit --toc to avoid a nearly-empty TOC.
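The flag choices above, including the "fewer than 3 headings" rule, could be assembled programmatically along these lines. This is a minimal sketch: `count_headings` and `build_pandoc_args` are hypothetical helper names, only ATX-style (`#`) headings are counted, and the flags are the ones already shown in this section.

```python
import re


def count_headings(markdown_text):
    """Count ATX headings (# to ######), skipping fenced code blocks."""
    count, in_fence = 0, False
    for line in markdown_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(("```", "~~~")):
            in_fence = not in_fence
        elif not in_fence and re.match(r"#{1,6}\s+\S", stripped):
            count += 1
    return count


def build_pandoc_args(src, dst, markdown_text, lang=None):
    """Build a pandoc argument list; add --toc only for 3+ headings."""
    args = [
        "pandoc", src, "-o", dst,
        "--from", "markdown+autolink_bare_uris+task_lists",
        "--syntax-highlighting=tango",
    ]
    if count_headings(markdown_text) >= 3:
        args += ["--toc", "--toc-depth=3"]
    if lang:
        args += ["--metadata", f"lang={lang}"]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)`; building a list rather than a shell string avoids quoting problems with file names that contain spaces.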
python scripts/convert.py input.md -o output.docx
With explicit options:
python scripts/convert.py input.md -o output.docx \
--page-size A4 \
--font "Arial" \
--font-size 11 \
--toc
When merging, each input file becomes a chapter/section. Pandoc concatenates the input files in the order given, with a blank line between them, and parses the result as a single document:
pandoc file1.md file2.md file3.md -o merged.docx \
--from markdown+autolink_bare_uris+task_lists \
--metadata title="Merged Document" \
--toc --toc-depth=3 \
--syntax-highlighting=tango
Important for merge: If each file has its own # Title (H1), the resulting document will have multiple H1 headings, which creates a natural chapter structure. The TOC will reflect this.
For better chapter separation, insert page breaks between files:
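# Add a page break after each file. Pandoc drops raw LaTeX such as \newpage
# when writing docx, so emit a raw OpenXML page break instead.
for f in file1.md file2.md file3.md; do
  cat "$f"
  printf '\n\n```{=openxml}\n<w:p><w:r><w:br w:type="page"/></w:r></w:p>\n```\n\n'
done | pandoc -o merged.docx --toc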
python scripts/convert.py file1.md file2.md file3.md -o merged.docx --toc
The Python script automatically inserts page breaks between merged files.
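How the script inserts those breaks is not shown here. One plausible approach for the pandoc engine is to join the Markdown sources with a raw OpenXML page-break block, since pandoc does not carry raw LaTeX page breaks into docx output. The names `PAGE_BREAK` and `merge_markdown` below are hypothetical.

```python
from pathlib import Path

FENCE = "`" * 3  # built programmatically to keep this example easy to embed

# Raw OpenXML block that pandoc passes through to the docx writer.
PAGE_BREAK = (
    f"{FENCE}{{=openxml}}\n"
    '<w:p><w:r><w:br w:type="page"/></w:r></w:p>\n'
    f"{FENCE}"
)


def merge_markdown(paths):
    """Join Markdown files in order, with a page break between (not after) them."""
    parts = [Path(p).read_text(encoding="utf-8").strip() for p in paths]
    return f"\n\n{PAGE_BREAK}\n\n".join(parts) + "\n"
```

The merged text can be written to a temporary file or piped to pandoc on stdin. Placing breaks only between files avoids a trailing blank page.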
After conversion, verify the output:
# Check file size and basic structure
python scripts/office/unpack.py output.docx /tmp/docx_check/ 2>/dev/null && \
echo "Valid DOCX structure" || echo "May need inspection"
If the validation script is unavailable, this is a basic sanity check:
python -c "from docx import Document; doc = Document('output.docx'); print(f'Paragraphs: {len(doc.paragraphs)}, Sections: {len(doc.sections)}')"
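If neither the unpack script nor python-docx is available, a standard-library check is still possible, because a .docx file is a ZIP archive containing `[Content_Types].xml` and `word/document.xml`. The helper name `looks_like_docx` is hypothetical, and passing this check does not guarantee that Word will open the file.

```python
import zipfile

# Parts that every WordprocessingML package is expected to contain.
REQUIRED = ("[Content_Types].xml", "word/document.xml")


def looks_like_docx(path):
    """Return True if path is an intact ZIP archive with the core docx parts."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        # testzip() returns None when every member passes its CRC check.
        return all(name in names for name in REQUIRED) and zf.testzip() is None
```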
pandoc README.md -o README.docx --from markdown+autolink_bare_uris+task_lists
No TOC needed for a single-page README.
pandoc report.md -o report.docx \
--from markdown+autolink_bare_uris+task_lists \
--toc --toc-depth=3 \
--syntax-highlighting=pygments \
--metadata title="Technical Report"
# Files: chapter-01.md, chapter-02.md, chapter-03.md
pandoc chapter-*.md -o book.docx \
--from markdown+autolink_bare_uris+task_lists \
--toc --toc-depth=2 \
--metadata title="Complete Guide" \
--metadata author="Author Name"
pandoc document.md -o document.docx \
--from markdown+autolink_bare_uris+task_lists \
--metadata lang="zh-CN" \
--toc --toc-depth=3
After generating the .docx file, if the user wants additional formatting (headers/footers, page numbers, custom styles), use the docx skill for post-processing.
| Problem | Solution |
|---|---|
| Chinese characters render as squares | Install Chinese fonts on the system, or use the Python fallback which handles font fallback |
| Images not showing | Ensure image paths are correct and accessible; use absolute paths in the markdown |
| Code blocks lose formatting | Verify --syntax-highlighting flag is set in pandoc; the fallback uses monospace font with grey background |
| Table borders missing | Pandoc sometimes omits table borders; use the docx skill to add them after conversion |
| Math/formula rendering issues | Enable the extension with --from markdown+tex_math_dollars; pandoc then converts LaTeX math into native Word equations |
| Very large files (100+ pages) | Split into chapters, convert individually, then use the docx skill to merge |
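For the "Images not showing" row, broken paths can be caught before conversion with a small pre-flight check. The sketch below assumes inline `![alt](path)` image syntax only (reference-style images and HTML `<img>` tags are not handled). The name `missing_images` is hypothetical.

```python
import re
from pathlib import Path

# Inline images: ![alt](target) or ![alt](target "title")
IMAGE_RE = re.compile(r'!\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')


def missing_images(markdown_text, base_dir="."):
    """Return local image targets that do not exist relative to base_dir."""
    missing = []
    for target in IMAGE_RE.findall(markdown_text):
        if target.startswith(("http://", "https://", "data:")):
            continue  # remote or embedded images are not checked here
        if not (Path(base_dir) / target).exists():
            missing.append(target)
    return missing
```

Passing the Markdown file's own directory as `base_dir` mirrors how relative image paths are usually written.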
Useful extensions to enable via --from markdown+EXTENSION:
| Extension | Effect |
|---|---|
| autolink_bare_uris | Auto-link URLs |
| task_lists | GitHub-style task lists |
| tex_math_dollars | LaTeX math between $...$ |
| footnotes | Footnote support |
| pipe_tables | Pipe-style tables |
| grid_tables | Grid-style tables |
| strikeout | ~~strikethrough~~ text |
| definition_lists | Definition lists |
| fenced_code_attributes | Code block language attributes |
| header_attributes | Header IDs and classes |
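A --from value is just the base format joined to the extension names with `+`. A small helper can build it and catch typos in extension names. In this sketch the helper name `pandoc_from` is hypothetical, and the allowed set is deliberately limited to the table above, although pandoc supports more extensions.

```python
# Extensions listed in the table above; pandoc supports more than these.
TABLE_EXTENSIONS = {
    "autolink_bare_uris", "task_lists", "tex_math_dollars", "footnotes",
    "pipe_tables", "grid_tables", "strikeout", "definition_lists",
    "fenced_code_attributes", "header_attributes",
}


def pandoc_from(extensions, base="markdown"):
    """Build a --from value such as "markdown+footnotes+strikeout"."""
    unknown = sorted(set(extensions) - TABLE_EXTENSIONS)
    if unknown:
        raise ValueError(f"not in the extension table: {unknown}")
    unique = dict.fromkeys(extensions)  # drop duplicates, keep order
    return "+".join([base, *unique])
```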
Boo哥AI智写 · Contact (QQ Mail): 409966830@qq.com · Intelligent writing for everything, charting the future