
PDF Parse

Parse a PDF into structured JSON: text, layout-aware blocks with bounding boxes, tables, and image metadata.
Author: rishabhdugar
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · API key: required
Stars: 0 · Downloads: 438 · Installs: 2 · Versions: 1 · Tag: #latest

Overview

PDF Parse

What It Does

Parses a PDF into structured JSON with text content, layout-aware blocks (with normalized bounding boxes), tables, and image metadata.
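
Because block coordinates are described as normalized, a consumer usually has to scale them back to page units before drawing or cropping. The sketch below shows one way to do that. It assumes a box is given as `[x0, y0, x1, y1]` with each value a fraction of the page width or height; that layout is an assumption for illustration and is not confirmed by this listing, so check the actual response against the API documentation.

```python
from typing import Sequence, Tuple


def denormalize_bbox(
    bbox: Sequence[float], page_width: float, page_height: float
) -> Tuple[float, float, float, float]:
    """Scale a normalized box to page units.

    ASSUMPTION: bbox is [x0, y0, x1, y1] with values in the range 0..1,
    where x is a fraction of page width and y a fraction of page height.
    """
    x0, y0, x1, y1 = bbox
    for value in (x0, y0, x1, y1):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"expected a normalized coordinate, got {value}")
    return (x0 * page_width, y0 * page_height, x1 * page_width, y1 * page_height)
```

For example, `denormalize_bbox([0.25, 0.125, 0.75, 0.5], 600, 800)` scales the box to a 600 by 800 page.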

When to Use

  • Extract structured data from PDFs (text, tables, images)
  • Get layout-aware content with bounding box coordinates
  • Parse invoices, forms, or reports into machine-readable format

Parsing Modes

| Mode | Description |
|------|-------------|
| text | Text only |
| layout | Text + text blocks with bounding boxes |
| tables | Text + table blocks |
| full | Text + blocks + tables + images (default) |
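
The four modes can be checked on the client before a request is sent. This is a minimal sketch: `build_parse_payload` is a hypothetical helper name, and the field names `url`, `mode`, and `pages` are taken from the curl example in this listing rather than from a full schema.

```python
from typing import Dict, Optional

# Mode names as listed in the Parsing Modes table.
VALID_MODES = ("text", "layout", "tables", "full")


def build_parse_payload(
    url: str, mode: str = "full", pages: Optional[str] = None
) -> Dict[str, str]:
    """Build the JSON body for a parse request, rejecting unknown modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    payload = {"url": url, "mode": mode}
    if pages is not None:
        # Page range string, e.g. "1-3" as in the listing's curl example.
        payload["pages"] = pages
    return payload
```

Validating locally avoids spending a request on a typo such as `table` instead of `tables`.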

Required Inputs

Provide one of:

  • url — public URL to a PDF
  • Multipart upload with file field
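
For the upload route, the request body has to be encoded as `multipart/form-data` with the PDF in a field named `file`. The sketch below builds such a body with the standard library only. `build_multipart` is a hypothetical helper, and passing options such as `mode` as extra form fields is an assumption; the listing only confirms the `file` field.

```python
import uuid
from typing import Dict, Tuple


def build_multipart(
    file_bytes: bytes, filename: str, fields: Dict[str, str]
) -> Tuple[bytes, str]:
    """Encode a PDF and extra form fields as multipart/form-data.

    Returns (body, value for the Content-Type header).
    ASSUMPTION: extra options (e.g. mode) are accepted as plain form fields.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The PDF itself goes in the "file" field named by the listing.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type must be sent as the `Content-Type` header so the server can find the boundary.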

Authentication

Send your API key in the CLIENT-API-KEY header.

Get your free API key at https://pdfapihub.com. Full API documentation is available at https://pdfapihub.com/docs.

Use Cases

  • Invoice Parsing — Extract line items, totals, and vendor info from PDF invoices
  • Resume Parsing — Extract structured data (name, experience, skills) from PDF resumes
  • Contract Analysis — Extract clauses, dates, and parties from legal PDF contracts
  • Form Data Extraction — Pull filled form fields and values from PDF forms
  • Research Paper Analysis — Extract text, tables, and figures from academic PDFs
  • Document Indexing — Parse PDFs into structured JSON for search engine indexing

Example Usage

curl -X POST https://pdfapihub.com/api/v1/pdf/parse \
  -H "CLIENT-API-KEY: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://pdfapihub.com/sample-pdfinvoice-with-image.pdf", "mode": "full", "pages": "1-3" }'
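
The same call can be made from Python with the standard library. This is a sketch that mirrors the curl command above: the endpoint, header name, and body fields come from that command, while `make_parse_request` and `parse_pdf` are hypothetical helper names. The shape of the returned JSON is not specified in this listing, so the result is handed back unmodified.

```python
import json
import urllib.request
from typing import Any, Optional

API_URL = "https://pdfapihub.com/api/v1/pdf/parse"


def make_parse_request(
    api_key: str, pdf_url: str, mode: str = "full", pages: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"url": pdf_url, "mode": mode}
    if pages is not None:
        payload["pages"] = pages
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"CLIENT-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_pdf(api_key: str, pdf_url: str, **options: Any) -> Any:
    """Send the request and return the decoded JSON response as-is."""
    request = make_parse_request(api_key, pdf_url, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without network access.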

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-07 07:59 · Security scans: safe, safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
