DOCX TO HTML CONVERTER

Use this skill whenever the user has a DOCX file (.docx) and wants to convert, read, view, extract content from, or process it in any way, including summarization.
bibekyess
Content Creation · clawhub · v1.0.0 · 1 version · 100000 · Key: not required
Overview

This skill provides a straightforward method to convert Microsoft Word (.docx) documents into clean, semantic HTML, making them suitable for various web-based and AI-driven applications.

Compatibility

  • Python 3 (for the conversion wrapper)
  • Node.js with mammoth installed (core conversion engine)

To install Node.js dependencies, run once from the scripts/ directory:

npm install
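
Before the first conversion it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of the skill itself: `check_prerequisites` is a hypothetical helper, and it assumes that `npm install` places mammoth under `scripts/node_modules/mammoth`, which is npm's default layout.

```python
import shutil
from pathlib import Path


def check_prerequisites(scripts_dir: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Node.js must be discoverable on PATH for the converter to run.
    if shutil.which("node") is None:
        problems.append("node not found on PATH: install Node.js")
    # ASSUMPTION: dependencies are installed locally under node_modules/.
    if not (Path(scripts_dir) / "node_modules" / "mammoth").is_dir():
        problems.append("mammoth not installed: run `npm install` in scripts/")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites("scripts"):
        print(problem)
```

An empty result only shows that the files are present; it does not prove that the installed versions are compatible.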

Use Cases

  • Browser-Based Viewing: Convert DOCX documents for display in web browsers without requiring Microsoft Word.
  • AI-Ready Content: Prepare DOCX content for LLMs for tasks like summarization, Q&A, and semantic search.
  • Web Integration: Integrate Word document content into web applications, CMS, or online editors.
  • Data Extraction: Extract structured data (tables, lists, headings) from DOCX files for automated reporting and analysis.
  • Search and Indexing: Enable full-text and vector search by converting DOCX content into easily indexable HTML.
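
As an illustration of the AI-ready and indexing use cases, converted HTML can be split into heading-keyed sections before it is summarized or embedded. This is a rough sketch using only the Python standard library; `SectionSplitter` and `split_sections` are hypothetical names, not part of the skill, and the approach assumes the headings in the HTML reflect the document's logical sections.

```python
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Group visible text under the most recent heading (h1 to h6)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        # The first section collects any text that precedes the first heading.
        self.sections = [{"heading": [], "text": []}]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.sections.append({"heading": [], "text": []})

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        key = "heading" if self._in_heading else "text"
        self.sections[-1][key].append(text)


def split_sections(html: str) -> list[dict]:
    """Return [{'heading': str, 'text': str}, ...] in document order."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [
        {"heading": " ".join(s["heading"]), "text": " ".join(s["text"])}
        for s in parser.sections
        if s["heading"] or s["text"]
    ]
```

Each resulting section can then be passed to a summarizer or indexed as a separate chunk. Table cells are flattened into plain text here, so tabular data needs its own handling.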

Workflow

  1. Locate DOCX File: Identify the path to the .docx file to convert.
  2. Run Conversion Script: Execute the Python wrapper from the skill's scripts/ directory:

```bash
python3 <skill-path>/scripts/convert.py
```

Replace <skill-path> with the actual path where this skill is installed.

  3. Verify Output: Open the generated .html file in a browser and check:
    • Headings (<h1>, <h2>, etc.) appear at the correct hierarchy levels
    • Tables render with the expected rows and columns
    • Lists appear as bullet or numbered items (not plain text)
    • Bold, italic, and inline formatting are preserved
    • Images are visible (embedded as base64 by default)
  4. Process HTML: Use the resulting HTML for further tasks like summarization, indexing, or display.
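
The verification checklist can be partly automated by counting the relevant elements in the generated HTML. The sketch below is an illustration under assumptions, not part of the skill: `summarize_structure` is a hypothetical helper, and a non-zero count only shows that an element exists, not that it renders correctly, so a visual check in a browser is still worthwhile.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags and note which images use data: URIs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.embedded_images = 0

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1
        if tag == "img":
            src = dict(attrs).get("src") or ""
            if src.startswith("data:"):
                self.embedded_images += 1


def summarize_structure(html: str) -> dict:
    """Return element counts that mirror the manual checklist."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    c = parser.counts
    return {
        "headings": sum(c[f"h{i}"] for i in range(1, 7)),
        "tables": c["table"],
        "lists": c["ul"] + c["ol"],
        "bold_or_italic": c["strong"] + c["em"] + c["b"] + c["i"],
        "images": c["img"],
        "embedded_images": parser.embedded_images,
    }
```

Comparing these counts against what is known about the source document (for example, that it contains three tables) gives a quick signal that something was dropped.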

Bundled Resources

  • scripts/docx-converter.js: Core Node.js conversion logic using mammoth.js.
  • scripts/convert.py: Python wrapper for invoking the Node.js converter.
  • scripts/package.json: Node.js dependency manifest (includes mammoth).
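
To make the division of labor concrete, here is a minimal sketch of how a Python wrapper can hand work to a Node.js script. It is not the bundled convert.py. It assumes, purely for illustration, that docx-converter.js takes the input path as its only argument and writes HTML to standard output; the real script's interface may differ.

```python
import subprocess
from pathlib import Path


def build_command(docx_path: str, scripts_dir: str = "scripts", node: str = "node") -> list[str]:
    """Build the argv list for the Node.js converter."""
    # ASSUMPTION: the converter accepts the .docx path as a single positional argument.
    script = Path(scripts_dir) / "docx-converter.js"
    return [node, str(script), str(docx_path)]


def convert(docx_path: str, scripts_dir: str = "scripts") -> str:
    """Run the converter and return whatever it prints to stdout."""
    result = subprocess.run(
        build_command(docx_path, scripts_dir),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the Node.js error message instead of failing silently.
        raise RuntimeError(result.stderr.strip() or "conversion failed")
    # ASSUMPTION: the HTML is written to stdout rather than to a file.
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.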

Technical Details

The conversion leverages mammoth.js, which prioritizes semantic meaning over visual replication:

  • Semantic Conversion: Document structure maps to proper HTML: headings become <h1>/<h2>, lists become <ul>/<ol>, etc.
  • Basic Styling: Bold, italics, and common paragraph styles are preserved.
  • Image Embedding: Images are extracted and embedded as base64 data URIs in the HTML output.
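
For readers unfamiliar with data URIs, the following sketch shows what embedding an image as base64 amounts to and how to recover the bytes later. The helper names are hypothetical; the URI layout (`data:<type>;base64,<payload>`) is the standard form.

```python
import base64


def to_data_uri(data: bytes, content_type: str) -> str:
    """Encode raw bytes as a base64 data URI usable in an <img src>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


def from_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI back into (content_type, raw bytes)."""
    header, _, payload = uri.partition(",")
    content_type = header[len("data:"):].split(";")[0]
    return content_type, base64.b64decode(payload)
```

Base64 encodes every 3 bytes as 4 characters, so embedded images make the HTML roughly a third larger than the original image data. For image-heavy documents it may be worth extracting the images before sending the HTML to an LLM.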

Troubleshooting

| Problem | Likely Cause | Fix |
| --- | --- | --- |
| node: command not found | Node.js not installed | Install Node.js (v16+) |
| Cannot find module 'mammoth' | npm deps missing | Run npm install in scripts/ |
| Empty or garbled output | Corrupted or password-protected DOCX | Try re-saving the file from Microsoft Word |
| Missing images | Large embedded images | Check mammoth.js image size limits in docx-converter.js |
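
When the output is empty or garbled, a quick look at the file's container can narrow down the cause. A normal .docx is a ZIP archive, while password-protected Office files and legacy .doc files are stored in an OLE compound container with a distinctive 8-byte signature. The sketch below relies on those two facts; `diagnose_docx` is a hypothetical helper, and it assumes the main document part lives at the conventional path word/document.xml.

```python
import zipfile

# Signature of an OLE compound file (legacy .doc or encrypted OOXML).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def diagnose_docx(path: str) -> str:
    """Return a short label describing the file's container state."""
    with open(path, "rb") as fh:
        magic = fh.read(8)
    if magic == OLE_MAGIC:
        return "ole-container"  # likely password-protected or a legacy .doc
    if not zipfile.is_zipfile(path):
        return "not-a-zip"
    with zipfile.ZipFile(path) as zf:
        # ASSUMPTION: the main part uses the conventional name.
        if "word/document.xml" not in zf.namelist():
            return "missing-document-part"
        bad_member = zf.testzip()  # returns the first corrupt member, or None
        if bad_member is not None:
            return f"corrupt-member:{bad_member}"
    return "ok"
```

A result of "ole-container" suggests removing the password in Word and saving again, whereas "not-a-zip" or a corrupt member points to a damaged file.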

Limitations

  • Advanced or highly specific styling from the original DOCX may not be perfectly replicated in the HTML output.
  • Features like tracked changes, comments, or complex layout elements may be simplified or omitted.
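
Because comments and tracked changes may be lost, it can be useful to detect them before converting so the user can be warned. This is a heuristic sketch, not part of the skill: it assumes the conventional part names word/document.xml and word/comments.xml and the usual `w:` namespace prefix, which Word uses by default but which the file format does not strictly require.

```python
import re
import zipfile


def find_lossy_features(docx_path) -> dict:
    """Report whether a DOCX appears to contain comments or tracked changes."""
    with zipfile.ZipFile(docx_path) as zf:
        names = set(zf.namelist())
        body = zf.read("word/document.xml").decode("utf-8", errors="replace")
    return {
        # Comments are normally stored in their own part.
        "comments": "word/comments.xml" in names,
        # Tracked insertions and deletions appear as w:ins / w:del elements.
        # The trailing [\s>] avoids matching w:insideH or w:delText.
        "tracked_changes": bool(re.search(r"<w:(ins|del)[\s>]", body)),
    }
```

If either flag is set, accepting or rejecting the changes in Word before conversion gives more predictable HTML.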

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-30 12:03 · Safe · Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
