formatferry-markdown

Local-first document-to-Markdown converter supporting 8 file types (HTML, DOCX, PDF, XLSX, CSV, JSON, XML, PPTX) and 8 output flavours (GitHub, CommonMark, S...
britrik
Uncategorized clawhub v1.1.5 2 versions 100000 Key: not required
★ 0 stars
📥 508 downloads
💾 0 installs
#batch #cleaning #cli #conversion #csv #docx #files #formatferry #html #latest #markdown #offline #pdf #productivity #text-processing #xlsx

Overview

FormatFerry Markdown Converter

Local-first document-to-Markdown converter. File content is processed entirely in-process — nothing leaves your machine. Supports 8 input formats and 8 output flavours, with optional URL extraction and batch mode for premium users.

Key differentiator vs alternatives: Output flavours tailor Markdown to specific platforms (Slack bold vs GitHub bold, Confluence wiki markup, R Markdown, etc.). No other converter offers this.

Prerequisites

  • Node.js 18+ and npm must be installed
  • Install the CLI globally:
npm install -g formatferry
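
Before installing, it can help to confirm that the prerequisites are actually present. The following is a minimal sketch, not part of FormatFerry itself: `node_major_ok` and `check_prereqs` are hypothetical helper names, and the only external commands assumed are `node --version` and `command -v`.

```bash
# Succeed when a Node.js version string (e.g. "v18.17.0") is major version 18 or newer.
node_major_ok() {
  local version major
  version="${1#v}"          # strip the leading "v"
  major="${version%%.*}"    # keep the text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # empty or not a number
  esac
  [ "$major" -ge 18 ]
}

# Print one line per missing prerequisite; print nothing when everything is present.
check_prereqs() {
  if ! command -v node >/dev/null 2>&1; then
    echo "node: not found (install Node.js 18+)"
  elif ! node_major_ok "$(node --version)"; then
    echo "node: $(node --version) is older than 18"
  fi
  command -v npm >/dev/null 2>&1 || echo "npm: not found"
  command -v formatferry >/dev/null 2>&1 || echo "formatferry: not found (npm install -g formatferry)"
  return 0
}
```

Running `check_prereqs` with no arguments lists whatever is still missing on the current machine.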

Supported File Types

| Format | Extension | Notes |
|---|---|---|
| HTML | .html, .htm | Web pages, snippets |
| Word | .docx | Microsoft Word documents |
| PDF | .pdf | Including OCR for scanned documents |
| Excel | .xlsx | Spreadsheets with tables |
| CSV | .csv | Comma-separated data |
| JSON | .json | Structured data |
| XML | .xml | Markup and data feeds |
| PowerPoint | .pptx | Slide content |
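
A caller can reject unsupported files before invoking the CLI by checking the extension against the table above. This is a sketch under assumptions: `is_supported_input` is a hypothetical helper, and matching extensions case-insensitively is this sketch's choice, not documented CLI behaviour.

```bash
# Succeed when the file name ends in one of the extensions listed in the table.
is_supported_input() {
  local ext
  case "$1" in
    *.*) ;;               # the name has an extension, keep going
    *) return 1 ;;        # no dot at all, so no extension to check
  esac
  # Lowercase the extension so REPORT.PDF is treated like report.pdf (assumption).
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    html|htm|docx|pdf|xlsx|csv|json|xml|pptx) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_input "$FILE_PATH" || echo "unsupported input type" >&2` prints a warning for a `.txt` file.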

Supported Markdown Flavours

Use the -f / --flavour flag to select output format:

  • github (default) — GitHub Flavored Markdown
  • commonmark — Standard CommonMark
  • slack — Slack-compatible markdown
  • discord — Discord-compatible markdown
  • reddit — Reddit-compatible markdown
  • confluence — Confluence wiki markup
  • rmarkdown — R Markdown
  • custom — Custom format
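
When the flavour comes from user input, it may be worth validating it against the list above before passing it to -f. A minimal sketch, with `is_valid_flavour` and `resolve_flavour` as hypothetical helper names; falling back to github mirrors the documented default, but how the CLI itself reacts to an unknown flavour is not stated here.

```bash
# Succeed when the argument is one of the eight documented flavour names.
is_valid_flavour() {
  case "$1" in
    github|commonmark|slack|discord|reddit|confluence|rmarkdown|custom) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the flavour to use: the requested one if valid, otherwise the default "github".
resolve_flavour() {
  if is_valid_flavour "$1"; then
    printf '%s\n' "$1"
  else
    printf 'github\n'
  fi
}
```

A call such as `formatferry -i notes.html -f "$(resolve_flavour "$REQUESTED")" -o notes.md` then always receives a documented flavour name (`REQUESTED` is a placeholder variable).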

Usage Examples

# Convert a file
formatferry -i document.docx -o output.md

# Pipe HTML from stdin
echo '<h1>Hello</h1>' | formatferry

# Choose a flavour
formatferry -i notes.html -f slack -o notes.md

# Convert a PDF (includes OCR for scanned documents)
formatferry -i paper.pdf -o paper.md

# URL extraction (requires FORMATFERRY_API_KEY)
formatferry --url https://example.com/article -o article.md

# Batch convert (requires FORMATFERRY_LICENSE_KEY)
formatferry --batch "docs/**/*.docx" --output-dir ./markdown/

Environment Variables

Both environment variables are optional. The CLI works for local file conversion with zero credentials.

| Variable | Required | Purpose |
|---|---|---|
| FORMATFERRY_API_KEY | No | Needed only for --url flag (URL extraction). Not needed for local file conversion. |
| FORMATFERRY_LICENSE_KEY | No | Needed only for --batch mode (premium feature). |

Set them via your shell profile or pass inline:

FORMATFERRY_API_KEY=ff_xxxxx formatferry --url https://example.com/article
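
Because each key is tied to exactly one mode, a wrapper script can check for the right variable before calling the CLI. The sketch below assumes nothing beyond the table above; `required_key_for` and `has_required_key` are hypothetical helpers, and no key format is validated.

```bash
# Print the environment variable a mode needs, or nothing for local conversion.
required_key_for() {
  case "$1" in
    --url)   echo "FORMATFERRY_API_KEY" ;;
    --batch) echo "FORMATFERRY_LICENSE_KEY" ;;
  esac
}

# Succeed when the mode needs no key, or when the key it needs is set and non-empty.
has_required_key() {
  local name
  name=$(required_key_for "$1")
  [ -z "$name" ] && return 0
  # Indirect lookup of the variable whose name is stored in $name.
  eval "[ -n \"\${$name:-}\" ]"
}
```

For instance, `has_required_key --url || echo "set FORMATFERRY_API_KEY first" >&2` warns before a URL extraction is attempted without credentials.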

Privacy

  • Local file conversion is fully in-process — file content is never uploaded or sent to any server
  • Optional license validation ping — if a license key is stored, the CLI may ping formatferry.vibingfun.com to check entitlement (cached for 24h, skippable with --offline)
  • URL extraction (--url) is the only feature that sends content to a server — it fetches and processes the URL server-side
  • --offline flag disables all network calls, falling back to cached or free-tier entitlements

Procedure

  1. Determine input type:
    • Text/pasted content → pipe to stdin or save to temp file
    • File path → use -i
    • URL → use --url (requires FORMATFERRY_API_KEY)
    • Multiple files → use --batch (requires FORMATFERRY_LICENSE_KEY)
  2. Execute conversion:

```bash
# Stdin (most common for agent use)
echo "$INPUT" | formatferry

# File
formatferry -i "$FILE_PATH" -o "$OUTPUT_PATH"

# URL
formatferry --url "$URL" -o output.md
```

  3. Capture output:
    • stdout is Markdown by default
    • Use -o to write directly to file
  4. Return clean Markdown to user
  5. Clean up temp files if created
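
The procedure above can be sketched as a small dispatcher. This is an illustration rather than an official wrapper: `classify_input` and `convert_to_markdown` are hypothetical names, batch mode is left out, and treating any existing path as a file and everything else as pasted text is a simplifying assumption.

```bash
# Classify one input the way step 1 does: url, file, or text.
classify_input() {
  case "$1" in
    http://*|https://*) echo "url" ;;
    *)
      if [ -f "$1" ]; then echo "file"; else echo "text"; fi
      ;;
  esac
}

# Run the matching conversion and leave the Markdown on stdout.
convert_to_markdown() {
  case "$(classify_input "$1")" in
    url)  formatferry --url "$1" ;;            # needs FORMATFERRY_API_KEY
    file) formatferry -i "$1" ;;
    text) printf '%s' "$1" | formatferry ;;    # pasted content goes through stdin
  esac
}
```

Since pasted text is piped straight to stdin, this path creates no temp file and needs no cleanup step.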

Pitfalls & Recovery

| Issue | Solution |
|---|---|
| formatferry: command not found | Install via npm install -g formatferry |
| node: command not found | Install Node.js 18+ first |
| API rate limit hit | Wait 60s or use local file input instead of URL |
| Large file (>20MB PDF) | Consider splitting before conversion |
| Invalid URL | Verify URL starts with http:// or https:// |
| Empty output | Verify input has content; check for HTML entity encoding issues |
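
Two of these pitfalls can be caught before the CLI runs. The following is a minimal sketch with hypothetical helpers `valid_url` and `is_large_file`; it assumes "20MB" means 20 * 1024 * 1024 bytes, which the table does not specify.

```bash
# Succeed when the URL starts with http:// or https:// and has something after the scheme.
valid_url() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed when the file is larger than 20 MB (assumed here to be 20 * 1024 * 1024 bytes).
is_large_file() {
  local bytes
  bytes=$(wc -c < "$1" | tr -d ' ')   # tr removes the padding some wc versions add
  [ "$bytes" -gt $((20 * 1024 * 1024)) ]
}
```

A guard such as `is_large_file paper.pdf && echo "consider splitting first" >&2` surfaces the size warning before a long conversion starts.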

Verification

# Test basic conversion
echo '<h1>Test</h1><p>Content</p>' | formatferry

# Verify no HTML tags remain
echo '<div>test</div>' | formatferry | grep -c '<.*>' || echo "Clean: 0 HTML tags"

# Test file conversion
echo '<p>File test</p>' > /tmp/test.html
formatferry -i /tmp/test.html

Version History

2 versions in total

  • v1.1.5 (current)
    2026-06-03 13:07
  • v1.1.3
    2026-05-03 10:40 Safe Safe

Security Checks

Tencent Cloud Security (Keen)

Queued

Tencent Cloud Security (Sanbu)

Queued
