
Office → Markdown Skill

Converts office automation documents — PDF, PPTX, DOCX, XLSX, CSV — into clean, readable Markdown. Use this skill when a user explicitly asks to convert, ext...
Converts office automation documents (PDF, PPTX, DOCX, XLSX, CSV, etc.) into clean, readable Markdown. Use this skill when the user explicitly asks for a conversion.
Author: naimalarain13
Uncategorized | clawhub | v1.0.1 | 1 version | 100000 | Key: not required

Overview


Convert any uploaded office document to clean Markdown.

All conversion logic lives in scripts/ — load only the script you need.

> Security notes
>
> - Dependencies are installed into an isolated temp directory (/tmp/office_md_deps/) and pinned to reviewed versions. The system Python environment is not modified.
> - For scanned or image-only content, pages are sent to Anthropic's vision API. Always ask the user for confirmation before enabling vision (see Workflow step 3).


Script Reference

| Format | Extensions | Script |
|---|---|---|
| PDF (text + scanned/image) | .pdf | scripts/pdf-to-md.py |
| PowerPoint | .pptx, .ppt | scripts/pptx-to-md.py |
| Word | .docx, .doc | scripts/docx-to-md.py |
| Excel | .xlsx, .xls | scripts/xlsx-to-md.py |
| CSV | .csv | scripts/csv-to-md.py |
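
The table above can be read as a lookup from file extension to script. A minimal sketch follows, assuming the extension alone decides which script runs; the names `SCRIPT_FOR_EXTENSION` and `pick_script` are hypothetical and not part of the skill.

```python
from pathlib import Path

# Extension-to-script mapping, copied from the Script Reference table.
SCRIPT_FOR_EXTENSION = {
    ".pdf": "scripts/pdf-to-md.py",
    ".pptx": "scripts/pptx-to-md.py",
    ".ppt": "scripts/pptx-to-md.py",
    ".docx": "scripts/docx-to-md.py",
    ".doc": "scripts/docx-to-md.py",
    ".xlsx": "scripts/xlsx-to-md.py",
    ".xls": "scripts/xlsx-to-md.py",
    ".csv": "scripts/csv-to-md.py",
}


def pick_script(input_path: str) -> str:
    """Return the script for a file, matching its extension case-insensitively."""
    ext = Path(input_path).suffix.lower()
    try:
        return SCRIPT_FOR_EXTENSION[ext]
    except KeyError:
        # Hypothetical behaviour: reject anything the table does not list.
        raise ValueError(f"Unsupported file type: {ext or '(no extension)'}") from None
```

Failing loudly on an unlisted extension keeps an unsupported upload from being passed to the wrong script.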

Workflow

1. Confirm conversion intent

Only proceed if the user has explicitly asked to convert, extract, or export the document to Markdown. A bare file upload without a conversion request is not sufficient to trigger this skill.

2. Run the matching script (text-only pass first)

python scripts/<script-name>.py \
  /mnt/user-data/uploads/<input-file> \
  /mnt/user-data/outputs/<stem>.md

Each script installs its own pinned dependencies into /tmp/office_md_deps/ on first run (isolated from the system Python environment).
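
One way such an isolated install could work is pip's `--target` option combined with a `sys.path` entry. The sketch below is an assumption about the mechanism, not the scripts' actual code; `pip_install_command` and `enable_deps` are hypothetical helpers, and the package name in the usage note is a placeholder.

```python
import sys

DEPS_DIR = "/tmp/office_md_deps"  # directory named in this skill


def pip_install_command(requirements, target=DEPS_DIR):
    """Build a pip command that installs pinned packages into `target` only."""
    # Refuse anything that is not pinned with an exact version.
    unpinned = [r for r in requirements if "==" not in r]
    if unpinned:
        raise ValueError(f"Unpinned requirements: {unpinned}")
    return [sys.executable, "-m", "pip", "install", "--quiet",
            "--target", target, *requirements]


def enable_deps(target=DEPS_DIR):
    """Make packages in `target` importable for this process only."""
    if target not in sys.path:
        sys.path.insert(0, target)
```

A caller would run `subprocess.run(pip_install_command(["example-package==1.2.3"]), check=True)` once, then call `enable_deps()` before importing. Because `--target` writes only into the given directory, the system site-packages stay untouched.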

3. Vision consent — REQUIRED before image extraction

If the script output indicates image-only pages were detected (or the document is known to be scanned), stop and ask the user:

> "This document has N image-only page(s) that cannot be extracted without sending them to Anthropic's vision API. Page images will be transmitted externally for OCR. Would you like to proceed with vision extraction?"

Only if the user confirms, re-run with the --allow-vision flag:

python scripts/<script-name>.py \
  /mnt/user-data/uploads/<input-file> \
  /mnt/user-data/outputs/<stem>.md \
  --allow-vision

If the user declines, save the text-only result and note which pages were skipped.
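
The consent gate in steps 2 and 3 can be sketched as two small helpers. This is an illustration only: `consent_prompt` and `build_command` are hypothetical names, and the image-only page count is assumed to have been read from the script output by the caller, since the output format is not documented here.

```python
def consent_prompt(image_page_count: int) -> str:
    """Fill the page count into the confirmation message from step 3."""
    return (
        f"This document has {image_page_count} image-only page(s) that cannot "
        "be extracted without sending them to Anthropic's vision API. "
        "Page images will be transmitted externally for OCR. "
        "Would you like to proceed with vision extraction?"
    )


def build_command(script, input_path, output_path, allow_vision=False):
    """Build the argument list for a conversion run.

    --allow-vision is appended only when the user has explicitly confirmed.
    """
    cmd = ["python", script, input_path, output_path]
    if allow_vision:
        cmd.append("--allow-vision")
    return cmd
```

Keeping `allow_vision` as an explicit argument that defaults to `False` means the text-only pass is what runs unless the caller passes the user's confirmation through.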

4. Present the file

Use present_files with the output .md path, then give a brief summary:

  • File type and page/slide/sheet count
  • Whether vision was used and for how many pages (or that it was skipped)
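
The summary could be assembled along these lines. This is a sketch with a hypothetical `conversion_summary` helper; the wording of the output is illustrative and not prescribed by the skill.

```python
def conversion_summary(file_type: str, unit: str, count: int,
                       image_pages: int, vision_used: bool) -> str:
    """One-line summary: file type, page/slide/sheet count, and vision status."""
    plural = "" if count == 1 else "s"
    first = f"{file_type}: {count} {unit}{plural}"
    if vision_used:
        second = f"vision used for {image_pages} page(s)"
    elif image_pages:
        second = f"vision skipped; {image_pages} image-only page(s) not extracted"
    else:
        second = "vision not needed"
    return f"{first}; {second}"
```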

How vision works (PDF / PPTX / DOCX)

Each script uses a two-pass strategy:

  1. Text pass: extract text normally (fast, no API call, always runs).
  2. Vision pass: only runs when --allow-vision is passed AND pages had no extractable text; those pages are rendered and sent to the Claude vision API.
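
The two-pass strategy can be sketched independently of any document library. In this illustration `extract_text` and `ocr_page` are hypothetical callables supplied by the caller (a real script would back them with a PDF/PPTX/DOCX reader and a vision API call respectively); nothing here is taken from the actual scripts.

```python
from typing import Callable, List, Optional, Tuple


def two_pass_extract(
    page_count: int,
    extract_text: Callable[[int], str],
    ocr_page: Optional[Callable[[int], str]] = None,
    allow_vision: bool = False,
) -> Tuple[List[str], List[int]]:
    """Return (per-page text, indices of pages left without text)."""
    texts: List[str] = []
    image_only: List[int] = []

    # Pass 1: plain text extraction, always runs, no API call.
    for i in range(page_count):
        text = extract_text(i).strip()
        texts.append(text)
        if not text:
            image_only.append(i)

    # Pass 2: only for pages with no text, and only with explicit permission.
    skipped: List[int] = []
    if allow_vision and ocr_page is not None:
        for i in image_only:
            texts[i] = ocr_page(i)
    else:
        skipped = image_only
    return texts, skipped
```

Returning the skipped indices lets the caller note inline which pages were left out when the user declines vision.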


Edge Cases

| Situation | Behaviour |
|---|---|
| Fully scanned PDF | All pages flagged for vision; user confirmation required |
| Mixed PDF (some text, some images) | Only image pages flagged; user confirmation required |
| User declines vision | Text-only .md is saved; skipped pages are noted inline |
| Password-protected file | Script exits with a clear error message |
| Very large PDF (50+ image pages) | Script adds 0.3s sleep between vision calls |
| Image too large (>4MB base64) | Reduce DPI: edit dpi=150 → dpi=100 in pdf-to-md.py |
| Encoding errors in CSV | Script auto-retries with latin-1 |
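
Two of the edge cases above lend themselves to a short sketch: the latin-1 retry for CSV files and the base64 size check for page images. The code below assumes UTF-8 is the first encoding tried and reads "4MB" as 4 MiB; both are assumptions, and `read_text_with_fallback` and `fits_vision_limit` are hypothetical helper names.

```python
import base64

MAX_BASE64_BYTES = 4 * 1024 * 1024  # assumption: "4MB" interpreted as 4 MiB


def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Decode a file, trying each encoding in turn.

    Returns (text, encoding_used). latin-1 accepts any byte sequence, so with
    the default tuple the second attempt cannot raise a decode error.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {path} with any of {encodings}")


def fits_vision_limit(image_bytes: bytes, limit: int = MAX_BASE64_BYTES) -> bool:
    """True if the base64-encoded image stays within the size limit."""
    return len(base64.b64encode(image_bytes)) <= limit
```

Base64 inflates data by roughly a third, so the check is made on the encoded length rather than the raw image size. When it fails, rendering the page at a lower DPI, as the table suggests, shrinks the image.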

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-08 13:22 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk.

Tencent Cloud Security (Sanbu): safe, no risk.
