Document extraction — convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring, powered by SkillBoss API Hub.
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"document_url": "https://example.com/document.pdf"}}'
Response:
{
"data": {
"result": {
"record_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"markdown": {
"content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
}
}
}
}
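The curl call above can also be expressed in Python. The following is a minimal sketch using only the standard library; the function name `build_parse_request` is hypothetical, and the request is built but not sent, so nothing here assumes more about the service than the curl example already shows.

```python
import json
import urllib.request

# Endpoint and model name are copied from the curl example above.
API_URL = "https://api.heybossai.com/v1/run"


def build_parse_request(api_key: str, document_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"model": "reducto/parse", "inputs": {"document_url": document_url}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_parse_request(key, url)) as resp:
#       body = json.load(resp)
```

Keeping construction separate from sending makes the payload easy to inspect or log before any document leaves the machine.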
Visit the SkillBoss dashboard to obtain your API key.
Save your API key:
export SKILLBOSS_API_KEY="your_api_key_here"
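Once the variable is exported, client code can read it at call time instead of hard-coding the key. A minimal sketch, assuming the same two headers as the curl examples; `auth_headers` is a hypothetical helper name.

```python
import os


def auth_headers(env=os.environ) -> dict:
    """Return request headers, failing early if the API key is missing."""
    key = env.get("SKILLBOSS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SKILLBOSS_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Accepting the environment as a parameter keeps the helper testable without touching the real process environment.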
Recommended: Use environment variables (most secure):
{
skills: {
entries: {
"docstrange": {
enabled: true,
// API key loaded from environment variable SKILLBOSS_API_KEY
},
},
},
}
Alternative: Store in config file (use with caution):
{
skills: {
entries: {
"docstrange": {
enabled: true,
env: {
SKILLBOSS_API_KEY: "your_api_key_here",
},
},
},
},
}
Security Note: If storing API keys in ~/.openclaw/openclaw.json:
chmod 600 ~/.openclaw/openclaw.json
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"document_url": "https://example.com/document.pdf"}}'
Access content: response["data"]["result"]["markdown"]["content"]
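The access path above can be wrapped in a small helper that fails with a clear message when the document is not ready. This is a sketch only: `markdown_content` is a hypothetical name, and treating any status other than "completed" as an error is an assumption based on the sample responses in this document.

```python
def markdown_content(response: dict) -> str:
    """Return the extracted markdown from a parsed JSON response."""
    # Tolerate responses with or without the outer "data" wrapper.
    result = response.get("data", response).get("result", {})
    status = result.get("status")
    if status != "completed":
        raise ValueError(f"document not ready, status={status!r}")
    return result["markdown"]["content"]
```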
Simple field list:
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "reducto/parse",
"inputs": {
"file_base64": "<base64-encoded-file>",
"filename": "invoice.pdf",
"output_format": "json",
"json_options": ["invoice_number", "date", "total_amount", "vendor"],
"include_metadata": "confidence_score"
}
}'
With JSON schema:
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "reducto/parse",
"inputs": {
"file_base64": "<base64-encoded-file>",
"filename": "invoice.pdf",
"output_format": "json",
"json_options": {"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}
}
}'
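Both request variants above differ only in what is passed as `json_options`, so one helper can build either body from a local file's bytes. A minimal sketch: `build_extract_payload` is a hypothetical name, and it assumes `file_base64` expects standard (not URL-safe) base64 with no data-URI prefix, which the examples do not state explicitly.

```python
import base64


def build_extract_payload(file_bytes: bytes, filename: str, json_options,
                          with_confidence: bool = False) -> dict:
    """Build the JSON body for a structured-extraction request.

    json_options may be a list of field names or a JSON schema dict,
    matching the two curl examples above.
    """
    inputs = {
        # Assumption: plain standard base64 of the raw file bytes.
        "file_base64": base64.b64encode(file_bytes).decode("ascii"),
        "filename": filename,
        "output_format": "json",
        "json_options": json_options,
    }
    if with_confidence:
        inputs["include_metadata"] = "confidence_score"
    return {"model": "reducto/parse", "inputs": inputs}
```

A typical call would read the file with `open("invoice.pdf", "rb").read()` and pass the result as `file_bytes`.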
Response with confidence scores:
{
"data": {
"result": {
"json": {
"content": {
"invoice_number": "INV-2024-001",
"total_amount": 500.00
},
"metadata": {
"confidence_score": {
"invoice_number": 98,
"total_amount": 99
}
}
}
}
}
}
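Per-field scores are most useful for routing uncertain values to manual review. The sketch below assumes the scores are on a 0 to 100 scale, as the sample values 98 and 99 suggest; the default threshold of 90 and the name `low_confidence_fields` are illustrative choices, not documented values.

```python
def low_confidence_fields(response: dict, threshold: int = 90) -> dict:
    """Return {field: extracted_value} for fields scoring below threshold."""
    # Tolerate responses with or without the outer "data" wrapper.
    json_part = response.get("data", response)["result"]["json"]
    content = json_part["content"]
    scores = json_part.get("metadata", {}).get("confidence_score", {})
    return {
        name: content.get(name)
        for name, score in scores.items()
        if score < threshold
    }
```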
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"document_url": "https://example.com/table.pdf"}}'
For documents >5 pages, use async and poll:
Queue the document:
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"file_base64": "<base64-encoded-file>", "filename": "large-document.pdf", "output_format": "markdown", "async": true}}'
# Returns: {"data": {"result": {"record_id": "12345", "status": "processing"}}}
Poll for results with the returned record_id (assuming "action" and "record_id" are passed in "inputs" like the other options):
curl -X POST "https://api.heybossai.com/v1/run" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "reducto/parse", "inputs": {"action": "get_result", "record_id": "12345"}}'
# Returns: {"data": {"result": {"status": "completed", ...}}}
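The queue-then-poll flow can be sketched as a loop that is independent of how the status request is sent. Here `fetch_result` is a caller-supplied function returning the parsed JSON response; the interval, attempt limit, and the decision to treat any status other than "processing" or "completed" as an error are assumptions, since only those two statuses appear in this document.

```python
import time
from typing import Callable


def wait_for_result(fetch_result: Callable[[], dict],
                    interval_s: float = 2.0,
                    max_attempts: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_result until the status is 'completed' and return the result."""
    for attempt in range(max_attempts):
        result = fetch_result().get("data", {}).get("result", {})
        status = result.get("status")
        if status == "completed":
            return result
        if status != "processing":
            # Assumption: anything else signals a failure.
            raise RuntimeError(f"unexpected status: {status!r}")
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

Injecting `sleep` keeps the loop testable and lets callers swap in backoff without changing the polling logic.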
Get element coordinates for layout analysis:
"include_metadata": "bounding_boxes"
Extract document structure (sections, tables, key-value pairs):
"json_options": "hierarchy_output"
Enhanced table and number formatting:
"markdown_options": "financial-docs"
Guide extraction with prompts:
"custom_instructions": "Focus on financial data. Ignore headers.",
"prompt_mode": "append"
Request multiple formats in one call:
"output_format": "markdown,json"
| Document Size | Mode | Notes |
|---|---|---|
| <=5 pages | sync (default) | Immediate response |
| >5 pages | "async": true | Poll for results |
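The table above reduces to a single rule on page count. A minimal sketch, assuming the caller already knows the page count (the document does not describe how to obtain it); `with_processing_mode` and `SYNC_PAGE_LIMIT` are hypothetical names.

```python
SYNC_PAGE_LIMIT = 5  # documents above this size should be queued, per the table


def with_processing_mode(inputs: dict, page_count: int) -> dict:
    """Return a copy of inputs with "async": true added for large documents."""
    updated = dict(inputs)  # leave the caller's dict untouched
    if page_count > SYNC_PAGE_LIMIT:
        updated["async"] = True
    return updated
```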
JSON Extraction:
- ["field1", "field2"]: quick extractions
- {"type": "object", ...}: strict typing, nested data
Confidence Scores:
- "include_metadata": "confidence_score"
Invoice schema:
{
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"date": {"type": "string"},
"vendor": {"type": "string"},
"total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}
Receipt schema:
{
"type": "object",
"properties": {
"merchant": {"type": "string"},
"date": {"type": "string"},
"total": {"type": "number"},
"items": {
"type": "array",
"items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
}
}
}
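Schemas like the two above can also be used on the client side to sanity-check what comes back. The sketch below is not a full JSON Schema validator: it checks only the `type`, `properties`, and `items` keywords that these templates use, and `type_errors` is a hypothetical helper name.

```python
def type_errors(value, schema: dict, path: str = "$") -> list:
    """Return a list of type mismatches between value and a simple schema."""
    expected = schema.get("type")
    errors = []
    if expected == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:  # absent fields are not treated as errors here
                errors += type_errors(value[name], sub_schema, f"{path}.{name}")
    elif expected == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        for index, item in enumerate(value):
            errors += type_errors(item, schema.get("items", {}), f"{path}[{index}]")
    elif expected == "string":
        if not isinstance(value, str):
            errors.append(f"{path}: expected string")
    elif expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{path}: expected number")
    return errors
```

For production use, a complete validator library would be a safer choice than this hand-rolled check.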
Important: Documents uploaded via SkillBoss API Hub are transmitted to https://api.heybossai.com and processed through the SkillBoss infrastructure.
Before uploading sensitive documents:
Best practices:
- Use "async": true for documents >5 pages to avoid timeouts
- Use file_url with publicly accessible URLs instead of uploading large files directly
- Use the placeholder "your_api_key_here" in examples
400 Bad Request:
- Provide file_base64 or file_url
- Verify that SKILLBOSS_API_KEY is valid
Sync Timeout:
- Use "async": true for documents >5 pages
- Poll with "action": "get_result" and "record_id"
Missing Confidence Scores:
- Requires json_options (field list or schema)
- Add "include_metadata": "confidence_score"
Authentication Errors:
- Check that the SKILLBOSS_API_KEY environment variable is set
Before publishing or updating this skill, verify:
- package.json declares requiredEnv and primaryEnv for SKILLBOSS_API_KEY
- package.json lists API endpoints in endpoints array
- Examples use placeholders ("your_api_key_here"), not real keys
- … SKILL.md or package.json
- … {} body for Guide mode)