
Office Document Extractor

Convert Microsoft Office documents (DOCX, XLSX, PPTX) to Markdown without any external dependencies. Use when the user needs to extract text from Word docume...
Author: michealxie001
Uncategorized · clawhub · v1.0.1 · 1 version · 100000 · Key: not required
Stars: 0 · Downloads: 400 · Installs: 0
Tags: converter, docx, latest, markdown, office, pptx, python, xlsx

Overview

Office Document Extractor

Zero-dependency converter for Microsoft Office documents. Extracts text and structure from DOCX, XLSX, and PPTX files into clean Markdown.

Quick Start

# Single file
python3 scripts/main.py report.docx -o report.md

# Batch convert a directory
python3 scripts/main.py ./documents --batch -o ./markdown

Supported Formats

| Format | Extension | Output |
| --- | --- | --- |
| Word | .docx | Headings, paragraphs |
| Excel | .xlsx | Tables (one per sheet) |
| PowerPoint | .pptx | Slides as sections |

How It Works

  • DOCX: Parses the ZIP archive's XML directly using Python's zipfile and xml.etree
  • XLSX: Uses bundled openpyxl (pure Python, no C extensions)
  • PPTX: Parses the ZIP archive's slide XML directly

No external commands, no network calls, no pip install required.
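
As an illustration of the DOCX approach described above, here is a minimal sketch using only zipfile and xml.etree. It is not the skill's docx_extractor.py. The function name docx_to_markdown is hypothetical, and the sketch assumes headings carry the built-in style ids Heading1 through Heading6; tables, lists, and custom styles are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_markdown(source):
    """Return Markdown for the paragraphs of a .docx (path or file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for para in root.iter(W + "p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W + "t")).strip()
        if not text:
            continue
        # Assumption: headings use the built-in style ids "Heading1".."Heading6".
        style = para.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val", "") if style is not None else ""
        if style_id.startswith("Heading") and style_id[7:].isdigit():
            level = max(1, min(int(style_id[7:]), 6))
            lines.append("#" * level + " " + text)
        else:
            lines.append(text)
    return "\n\n".join(lines)
```

Because zipfile.ZipFile accepts either a path or a file-like object, the same function can be exercised against an in-memory archive.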

Usage

Single File

python3 scripts/main.py <input_file> [-o <output.md>]

The format is auto-detected from the file extension. If -o is omitted, the output is written to a .md file.
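
Extension-based detection of this kind can be sketched as follows. The FORMATS table and the detect_format name are hypothetical and only illustrate the idea; main.py may organise its dispatch differently.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a format label.
FORMATS = {".docx": "docx", ".xlsx": "xlsx", ".pptx": "pptx"}


def detect_format(path):
    """Return the format label for a path, based only on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        # Fail loudly on anything that is not a supported Office format.
        raise ValueError(f"unsupported file type: {suffix or '(none)'}") from None
```

Lower-casing the suffix means REPORT.DOCX and report.docx are treated the same way. The file contents are never inspected, so a misnamed file would be routed to the wrong extractor.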

Batch Conversion

python3 scripts/main.py <input_directory> --batch [-o <output_directory>]

Converts all .docx, .xlsx, and .pptx files in the directory. Results are saved to markdown_output/ by default.
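
A batch run like this amounts to pairing each supported input file with an output path. The sketch below shows one way to build that plan. plan_batch is a hypothetical helper, and both the non-recursive scan and the "same stem plus .md" naming are assumptions rather than documented behaviour.

```python
from pathlib import Path

SUPPORTED = {".docx", ".xlsx", ".pptx"}


def plan_batch(input_dir, output_dir="markdown_output"):
    """Pair each supported file in input_dir with a .md path in output_dir."""
    out = Path(output_dir)
    pairs = []
    # Assumption: only the top level of the directory is scanned.
    for src in sorted(Path(input_dir).iterdir()):
        if src.is_file() and src.suffix.lower() in SUPPORTED:
            pairs.append((src, out / (src.stem + ".md")))
    return pairs
```

Under this naming scheme report.docx and report.xlsx would both map to report.md, so a real implementation needs some rule for such collisions.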

Resources

scripts/

  • main.py — Unified CLI for single-file and batch conversion
  • docx_extractor.py — DOCX → Markdown (standard library only)
  • xlsx_extractor.py — XLSX → Markdown tables (bundled openpyxl)
  • pptx_extractor.py — PPTX → Markdown (standard library only)
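
To make the "XLSX → Markdown tables" step concrete, here is a small sketch that renders rows of cell values as a Markdown table. rows_to_markdown is a hypothetical helper, not a function from xlsx_extractor.py, and treating the first row as the header is an assumption. With openpyxl, such rows could come from a worksheet's iter_rows(values_only=True).

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row = header) as a Markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def cell(value):
        # Empty cells become blank; pipes and newlines would break the table.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    def line(row):
        # Pad short rows so every line has the same number of columns.
        padded = list(row) + [None] * (width - len(row))
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    header, *body = rows
    out = [line(header), "| " + " | ".join(["---"] * width) + " |"]
    out.extend(line(row) for row in body)
    return "\n".join(out)
```

Escaping the pipe character and flattening newlines keeps a single awkward cell from corrupting the rest of the table.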

Bundled Dependencies

  • openpyxl/ — Pure Python Excel library (v3.1.5)
  • et_xmlfile/ — openpyxl dependency (pure Python)

Limitations

  • Does not extract images or embedded objects (text only)
  • Does not preserve complex formatting (colors, fonts, layouts)
  • Does not handle encrypted/password-protected files
  • No OCR for scanned documents (use OpenClaw's native pdf tool for that)
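
The encrypted-file limitation can be detected before any parsing is attempted. Password-protected Office files are stored in an OLE2 compound-file container rather than a plain ZIP archive, so a signature check tells them apart. The sketch below is an illustration only. classify_container is a hypothetical name, and the same signature also matches legacy .doc/.xls/.ppt files, so "ole2" means "not a plain OOXML ZIP" rather than "definitely encrypted".

```python
import zipfile

# Signature of the OLE2 compound-file container.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_container(path):
    """Return 'zip', 'ole2' or 'unknown' for the file at path."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head == OLE2_MAGIC:
        return "ole2"
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

A converter could use a check like this to report a clear "encrypted or legacy file" message instead of failing inside the ZIP reader.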

Why This Skill?

Existing markitdown-based skills require pip install or external CLI tools, which triggers ClawHub security warnings. This skill is 100% self-contained — install it and use it immediately, even offline.

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-07 21:23, marked safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
