
MarkItDown

MarkItDown is a Python utility from Microsoft that converts various files (PDF, Word, Excel, PPTX, images, audio) to Markdown, making it easier to extract structured content.
damirikys
Content creation · clawhub · v1.0.4 · 1 version · 99727.9 · Key: not required

Overview

MarkItDown Skill

Description

MarkItDown is a Python utility developed by Microsoft (source: https://github.com/microsoft/markitdown) for converting various files and office documents to Markdown. It allows me to easily extract structured text (including tables, headers, and lists) from complex formats to better understand their content. The conversion happens locally using installed Python libraries.

Safety Note: The installation process downloads the markitdown package and its dependencies from the official Python Package Index (PyPI). Processing certain formats (like YouTube URLs) requires external network access to fetch the content. Processing local files requires access to the directory where the target files are located.
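As a rough illustration of the distinction drawn in the safety note, the sketch below classifies an input as a remote URL (network access needed) or a local path (directory access needed). The function name `needs_network` is hypothetical and not part of MarkItDown; it only inspects the URL scheme.

```python
from urllib.parse import urlparse


def needs_network(source: str) -> bool:
    """Return True when `source` looks like an http(s) URL rather than a local path.

    Hypothetical helper: it only checks the scheme, so it says nothing about
    whether the URL is reachable or supported by the converter.
    """
    scheme = urlparse(source).scheme.lower()
    return scheme in ("http", "https")
```

A caller could use this to decide whether to ask for network permission before running a conversion.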

Supported Formats

  • Office Documents: PowerPoint (PPTX), Word (DOCX), Excel (XLSX, XLS).
  • PDF
  • Images: Text extraction (OCR) and metadata (EXIF).
  • Audio/Video: Speech transcription (wav, mp3, YouTube URLs) and EXIF metadata.
  • Web and Text: HTML, CSV, JSON, XML.
  • Archives and Books: ZIP archives, EPub.
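The list above can be sketched as a simple extension lookup for deciding whether a file is worth passing to the converter. This is an illustrative pre-check only, not how MarkItDown detects formats internally. The name `format_group` and the group labels are hypothetical, and the image extensions are assumed because the list does not enumerate them.

```python
from pathlib import Path
from typing import Optional

# Extension groups transcribed from the supported-formats list.
# The image extensions are an assumption; the list names no specific ones.
FORMAT_GROUPS = {
    "office": {".pptx", ".docx", ".xlsx", ".xls"},
    "pdf": {".pdf"},
    "image": {".jpg", ".jpeg", ".png"},  # assumed
    "audio": {".wav", ".mp3"},
    "web_text": {".html", ".csv", ".json", ".xml"},
    "archive_book": {".zip", ".epub"},
}


def format_group(path: str) -> Optional[str]:
    """Return the group label for a file's extension, or None if it is not listed."""
    suffix = Path(path).suffix.lower()  # compare case-insensitively
    for group, extensions in FORMAT_GROUPS.items():
        if suffix in extensions:
            return group
    return None
```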

Dependencies

The skill installs the utility in a local virtual environment. Most features work out-of-the-box thanks to the markitdown[all] dependencies installed via PyPI. For specific formats (audio/video), system libraries (e.g., ffmpeg) may be required and must be installed on the host.
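One way to check for host-level tools before attempting an audio or video conversion is to look them up on PATH. The sketch below uses the standard library's `shutil.which`. The helper name `missing_tools` is hypothetical, and `ffmpeg` is used only because it is the example given above.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("ffmpeg",)) -> List[str]:
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

An empty result means every listed tool was found; otherwise the returned names are the ones to install on the host.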

When to Use

  • When you need to read, analyze, or extract information from PDF, Word, Excel, or PowerPoint files.
  • When document structure is important for the response (e.g., tables or formatted lists).
  • If you need to extract text from audio or video files, or "read" an image.

How to Use

The virtual environment is automatically set up when the skill is installed. You must run the utility from within the skill's folder.

Conversion with Console Output (STDOUT)

Useful for small files to see the result immediately.

./.venv/bin/markitdown /path/to/file.pdf

Conversion with File Output

The best option for large documents. Save the result to a .md file and read it using the read tool.

./.venv/bin/markitdown /path/to/file.pdf -o /path/to/result.md
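When driving the CLI from a script, the two invocation styles above differ only in the optional `-o` argument. A minimal sketch of building the argument list follows. The function name `build_command` and the `skill_dir` parameter are hypothetical; the executable path and the `-o` flag are taken from the commands shown above.

```python
from pathlib import Path
from typing import List, Optional


def build_command(source: str, output: Optional[str] = None,
                  skill_dir: str = ".") -> List[str]:
    """Build the markitdown argument list for STDOUT or file output.

    `skill_dir` is the folder containing the skill's .venv (assumed layout,
    matching the ./.venv/bin/markitdown path used in the commands above).
    """
    executable = Path(skill_dir) / ".venv" / "bin" / "markitdown"
    command = [str(executable), source]
    if output is not None:
        # File output: the result is written to `output` instead of STDOUT.
        command += ["-o", output]
    return command
```

The resulting list can be passed to `subprocess.run` without a shell, which avoids quoting problems with paths that contain spaces.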

Example: Excel Conversion

Navigate to the skill folder (e.g., cd ~/skills/markitdown) and execute:

./.venv/bin/markitdown ~/downloads/report.xlsx -o ~/downloads/report.md

After that, you can read the resulting report.md file.
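The same Excel workflow can be sketched in Python: run the CLI from the skill folder, then read the resulting Markdown. The function name `convert_and_read` and the injectable `runner` parameter are hypothetical and exist only to make the sketch easy to try without the tool installed. Note that `~` is expanded explicitly, because no shell is involved to do it.

```python
import subprocess
from pathlib import Path


def convert_and_read(source, output, skill_dir, runner=subprocess.run):
    """Convert `source` to Markdown at `output` and return the Markdown text.

    Runs the skill's markitdown executable with the skill folder as the
    working directory. `runner` defaults to subprocess.run and can be
    swapped for a stand-in when experimenting.
    """
    skill = Path(skill_dir).expanduser()
    executable = skill / ".venv" / "bin" / "markitdown"
    src = Path(source).expanduser()   # no shell here, so expand "~" ourselves
    out = Path(output).expanduser()
    # check=True raises CalledProcessError if the conversion fails.
    runner([str(executable), str(src), "-o", str(out)], cwd=str(skill), check=True)
    return out.read_text(encoding="utf-8")
```

For the example above, a call might look like `convert_and_read("~/downloads/report.xlsx", "~/downloads/report.md", "~/skills/markitdown")`.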

Version History

1 version in total

  • v1.0.4 (current)
    2026-03-29 12:39 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
