← 返回
未分类 中文

CamScanner-Pdf2Markdown

Use CamScanner to convert PDF documents to Markdown format, powered by a high-precision document parsing engine that intelligently decomposes paragraphs, pre...
使用 CamScanner 将 PDF 文档转换为 Markdown 格式,依托高精度文档解析引擎,智能拆解段落,...
camscanner-ai camscanner-ai 来源
未分类 clawhub v1.0.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 354
下载
💾 0
安装
1
版本
#latest

概述

CamScanner PDF to Markdown

Overview

CamScanner provides a high-precision document parsing engine that converts PDF documents to Markdown format. It intelligently decomposes document paragraphs, precisely recognizes tables and multiple element types, and outputs structured results in reading order — empowering large language models to accurately understand document content. The workflow is a 3-step pipeline: upload the PDF, convert it, then download the result.

When to Use

  • User wants to convert a PDF to Markdown
  • User wants to extract text/content from a PDF as Markdown
  • User has a PDF and needs it as Markdown for further editing or processing

Privacy & Data

> Important: Privacy & Data Flow Notice

>

> - Third-party service: This skill sends your files to CamScanner's official servers (ai-tools.camscanner.com) for processing.

> - Data retention: CamScanner servers process your files in real-time. Files are not permanently stored on the server.

> - Local files: Output files are saved to your local filesystem at the path you specify.

API Reference

Base URL: https://ai-tools.camscanner.com

Supported Conversions

source_typetarget_typeOutput
----------------------------
pdfmd.md

Step 1: Upload PDF

BASE="https://ai-tools.camscanner.com"

IN_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/upload_file/execute" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@/path/to/document.pdf" | jq -r '.tool_result.data.file_id')

Response:

{
  "code": 200,
  "tool": "upload_file",
  "tool_result": {
    "success": true,
    "data": {
      "file_id": "file_1741857600_ab12cd34ef56",
      "size": 24576
    }
  }
}

Step 2: Convert PDF to Markdown

OUT_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/convert_pdf/execute" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$IN_FILE_ID\",\"source_type\":\"pdf\",\"target_type\":\"md\",\"output_mode\":\"file_id\"}" \
  | jq -r '.tool_result.data.file_id')

Response:

{
  "code": 200,
  "tool": "convert_pdf",
  "tool_result": {
    "success": true,
    "data": {
      "file_id": "file_1741857701_9988aabbccdd",
      "target_type": "md"
    }
  }
}

Step 3: Download Result

curl -sS -X POST "$BASE/v1/tools/download_file/execute?response_mode=raw" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$OUT_FILE_ID\"}" \
  -o /path/to/output.md

Critical: The response_mode=raw query parameter is required to get the binary file. Without it, the response is JSON.

Quick Reference: Complete Pipeline

BASE="https://ai-tools.camscanner.com"
INPUT_PDF="/path/to/document.pdf"
OUTPUT_FILE="/path/to/output.md"

# Upload
IN_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/upload_file/execute" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@$INPUT_PDF" | jq -r '.tool_result.data.file_id')

# Convert
OUT_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/convert_pdf/execute" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$IN_FILE_ID\",\"source_type\":\"pdf\",\"target_type\":\"md\",\"output_mode\":\"file_id\"}" \
  | jq -r '.tool_result.data.file_id')

# Download
curl -sS -X POST "$BASE/v1/tools/download_file/execute?response_mode=raw" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$OUT_FILE_ID\"}" \
  -o "$OUTPUT_FILE"

Common Mistakes

MistakeFix
-----------------------------------------------------------------------------------------------------------------
Forgetting response_mode=raw on downloadAlways append ?response_mode=raw to the download URL
Wrong Content-Type on uploadUpload uses application/octet-stream, not multipart/form-data
Using GET instead of POSTAll three endpoints use POST
Missing source_type in convert requestAlways include "source_type": "pdf"
Missing output_mode in convert requestAlways include "output_mode": "file_id" to get a downloadable file_id

Error Handling

Check each step before proceeding:

# After upload
if [ -z "$IN_FILE_ID" ] || [ "$IN_FILE_ID" = "null" ]; then
  echo "Upload failed"; exit 1
fi

# After convert
if [ -z "$OUT_FILE_ID" ] || [ "$OUT_FILE_ID" = "null" ]; then
  echo "Conversion failed"; exit 1
fi

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-05-07 11:28 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

CamScanner-Image2Markdown

camscanner-ai
使用 CamScanner 将图片转换为 Markdown 格式,依托高精度文档解析引擎智能分解段落,精准...
★ 0 📥 378

CamScanner-Any2Markdown

camscanner-ai
使用 CamScanner 将图片或 PDF 文档转换为 Markdown 格式。由高精度文档解析引擎提供支持,智能分解段落...
★ 0 📥 363

CamScanner-Pdf2Office

camscanner-ai
使用 CamScanner 将 PDF 文档转换为可编辑的 Word(.docx)或 Excel(.xlsx)格式,具备智能内容识别和精准格式保留
★ 0 📥 433