
CamScanner-Any2Markdown

Use CamScanner to convert images or PDF documents to Markdown format. Powered by a high-precision document parsing engine that intelligently decomposes parag...
camscanner-ai
Uncategorized clawhub v1.0.1 1 version 100000 Key: not required


CamScanner Any to Markdown

Overview

CamScanner provides a high-precision document parsing engine that converts images and PDF documents to Markdown. It splits documents into paragraphs, recognizes tables and other element types, handles complex image scenarios, and outputs structured results in reading order, so that large language models can work from an accurate representation of the document content. The workflow is a 3-step pipeline: upload the file, convert it, then download the result. The skill detects whether the input is a PDF or an image and uses the matching conversion endpoint.

When to Use

  • User wants to convert a document file to Markdown
  • User has PDF or image files and needs them as Markdown
  • User wants to extract content from documents for further processing
  • Prefer this skill when the input format is mixed or unspecified

Privacy & Data

> Important: Privacy & Data Flow Notice
>
> - Third-party service: This skill sends your files to CamScanner's official servers (ai-tools.camscanner.com) for processing.
> - Data retention: CamScanner servers process your files in real time. Files are not permanently stored on the server.
> - Local files: Output files are saved to your local filesystem at the path you specify.

API Reference

Base URL: https://ai-tools.camscanner.com

Supported Conversions

| source_type | target_type | Output | Endpoint      |
|-------------|-------------|--------|---------------|
| pdf         | md          | .md    | convert_pdf   |
| image       | md          | .md    | convert_image |

Format Detection

Determine the conversion endpoint based on file extension:

  • PDF files (.pdf): Use convert_pdf with "source_type": "pdf"
  • Image files (.png, .jpg, .jpeg, .bmp, .tiff, .webp): Use convert_image with "source_type": "image"
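The extension rules above can be sketched as a small shell helper. This is only an illustration: `detect_source_type` and `endpoint_for` are hypothetical names, not part of the skill, and matching extensions case-insensitively is an assumption (the list above only shows lowercase extensions).

```shell
# Hypothetical helper (not part of the skill): map a file name to the
# source_type value listed above, based only on its extension.
detect_source_type() {
  local name ext
  name=${1##*/}                       # drop any leading directories
  case "$name" in
    *.*) ext=${name##*.} ;;           # text after the last dot
    *)   ext="" ;;                    # no extension at all
  esac
  # Assumption: extensions are compared case-insensitively.
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf) echo "pdf" ;;
    png|jpg|jpeg|bmp|tiff|webp) echo "image" ;;
    *) echo "unsupported file extension: $1" >&2; return 1 ;;
  esac
}

# The endpoint name follows from the source type:
# "pdf" -> convert_pdf, "image" -> convert_image.
endpoint_for() {
  echo "convert_$1"
}
```

With these in place, something like `SOURCE_TYPE=$(detect_source_type "$INPUT_FILE") || exit 1` followed by `CONVERT_ENDPOINT=$(endpoint_for "$SOURCE_TYPE")` would select the values used in the convert request.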

Step 1: Upload File

BASE="https://ai-tools.camscanner.com"

IN_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/upload_file/execute" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@/path/to/document" | jq -r '.tool_result.data.file_id')

Response:

{
  "code": 200,
  "tool": "upload_file",
  "tool_result": {
    "success": true,
    "data": {
      "file_id": "file_1741857600_ab12cd34ef56",
      "size": 24576
    }
  }
}

Step 2: Convert to Markdown

For PDF files:

OUT_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/convert_pdf/execute" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$IN_FILE_ID\",\"source_type\":\"pdf\",\"target_type\":\"md\",\"output_mode\":\"file_id\"}" \
  | jq -r '.tool_result.data.file_id')

For image files:

OUT_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/convert_image/execute" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$IN_FILE_ID\",\"source_type\":\"image\",\"target_type\":\"md\",\"output_mode\":\"file_id\"}" \
  | jq -r '.tool_result.data.file_id')

Response (shown for convert_pdf):

{
  "code": 200,
  "tool": "convert_pdf",
  "tool_result": {
    "success": true,
    "data": {
      "file_id": "file_1741857701_9988aabbccdd",
      "target_type": "md"
    }
  }
}
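The hand-escaped JSON body in the convert requests is easy to get wrong. A possible alternative is to build it with `printf`, as sketched here. `build_convert_payload` is a hypothetical helper name, and the sketch assumes the `file_id` contains no characters that need JSON escaping, which holds for the sample IDs shown in this document but is not guaranteed by it.

```shell
# Hypothetical helper: build the convert request body without hand-escaped quotes.
# Assumption: file_id needs no JSON escaping (letters, digits, underscores only).
build_convert_payload() {
  # $1 = file_id returned by the upload step
  # $2 = source_type, either "pdf" or "image"
  printf '{"file_id":"%s","source_type":"%s","target_type":"md","output_mode":"file_id"}' "$1" "$2"
}
```

The result can be passed to curl as `-d "$(build_convert_payload "$IN_FILE_ID" pdf)"` in place of the escaped string.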

Step 3: Download Result

curl -sS -X POST "$BASE/v1/tools/download_file/execute?response_mode=raw" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$OUT_FILE_ID\"}" \
  -o /path/to/output.md

Critical: The response_mode=raw query parameter is required to receive the raw file content. Without it, the response is JSON rather than the Markdown file.
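One way to catch a missing `response_mode=raw` after the fact is to inspect the saved file. The sketch below is a heuristic only: `looks_like_json` is a hypothetical helper, and it assumes the JSON response is an object starting with `{`. A Markdown document that legitimately begins with `{` would be flagged incorrectly.

```shell
# Hypothetical check: succeeds (exit status 0) when the first non-blank
# character of the file is "{", which suggests that a JSON response was
# saved instead of raw Markdown.
looks_like_json() {
  local first
  # Read only the start of the file, drop whitespace, keep the first character.
  first=$(head -c 512 "$1" | tr -d ' \t\r\n' | cut -c 1)
  [ "$first" = "{" ]
}
```

After the download, a guard such as `if looks_like_json /path/to/output.md; then echo "Got JSON, check response_mode=raw" >&2; fi` would surface the problem early.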

Quick Reference: Complete Pipeline

Convert a PDF to Markdown:

BASE="https://ai-tools.camscanner.com"
INPUT_FILE="/path/to/document.pdf"
OUTPUT_FILE="/path/to/output.md"

# Upload
IN_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/upload_file/execute" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@$INPUT_FILE" | jq -r '.tool_result.data.file_id')

# Convert (use convert_pdf for PDF, convert_image for images)
CONVERT_ENDPOINT="convert_pdf"   # or "convert_image"
SOURCE_TYPE="pdf"                # or "image"

OUT_FILE_ID=$(curl -sS -X POST "$BASE/v1/tools/${CONVERT_ENDPOINT}/execute" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$IN_FILE_ID\",\"source_type\":\"$SOURCE_TYPE\",\"target_type\":\"md\",\"output_mode\":\"file_id\"}" \
  | jq -r '.tool_result.data.file_id')

# Download
curl -sS -X POST "$BASE/v1/tools/download_file/execute?response_mode=raw" \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$OUT_FILE_ID\"}" \
  -o "$OUTPUT_FILE"

Common Mistakes

| Mistake | Fix |
|---------|-----|
| Forgetting response_mode=raw on download | Always append ?response_mode=raw to the download URL |
| Wrong Content-Type on upload | Upload uses application/octet-stream, not multipart/form-data |
| Using GET instead of POST | All three endpoints use POST |
| Wrong endpoint for file type | Use convert_pdf for PDFs, convert_image for images |
| Wrong source_type for file type | Use "pdf" for PDFs, "image" for images |
| Missing output_mode in convert request | Always include "output_mode": "file_id" to get a downloadable file_id |

Error Handling

Check each step before proceeding:

# After upload
if [ -z "$IN_FILE_ID" ] || [ "$IN_FILE_ID" = "null" ]; then
  echo "Upload failed"; exit 1
fi

# After convert
if [ -z "$OUT_FILE_ID" ] || [ "$OUT_FILE_ID" = "null" ]; then
  echo "Conversion failed"; exit 1
fi
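Since both checks above test the same two conditions, they could be folded into one function. This is a sketch, with `require_file_id` as a hypothetical name. The comparison with the string "null" is there because `jq -r` prints `null` when the requested field is missing.

```shell
# Hypothetical helper: fail when a pipeline step produced no usable file_id.
# jq -r prints the literal string "null" when the field is absent.
require_file_id() {
  # $1 = step name used in the error message
  # $2 = file_id value to validate
  if [ -z "$2" ] || [ "$2" = "null" ]; then
    echo "$1 failed: no file_id returned" >&2
    return 1
  fi
}
```

Usage would look like `require_file_id "Upload" "$IN_FILE_ID" || exit 1` after the upload and `require_file_id "Conversion" "$OUT_FILE_ID" || exit 1` after the convert step.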

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-07 14:07 Safe

Security Check

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
