Azure Content Understanding — Layout Analyzer

Extract structured content from documents using Azure's prebuilt-layout analyzer. Outputs Markdown and structured JSON with text, tables, figures, and document hierarchy.

Setup

Set environment variables:

export AZURE_CU_ENDPOINT="https://YOUR_RESOURCE.services.ai.azure.com/"
export AZURE_CU_API_KEY="YOUR_KEY_HERE"

Optional: set API version (defaults to 2025-05-01-preview):

export AZURE_CU_API_VERSION="2025-11-01"
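
A minimal sketch of how a client might read these settings, assuming the variable names and the default version stated above. The `load_config` helper is hypothetical and is not part of scripts/analyze.mjs. Trimming the trailing slash is an assumption, made so that API paths can be appended without producing a double slash.

```python
import os

DEFAULT_API_VERSION = "2025-05-01-preview"  # default stated in this document


def load_config(env=None):
    """Read Content Understanding settings from an environment mapping.

    Hypothetical helper for illustration; not part of scripts/analyze.mjs.
    """
    env = os.environ if env is None else env
    required = ("AZURE_CU_ENDPOINT", "AZURE_CU_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        # Assumption: trim the trailing slash so paths join with a single "/".
        "endpoint": env["AZURE_CU_ENDPOINT"].rstrip("/"),
        "api_key": env["AZURE_CU_API_KEY"],
        # Fall back to the documented default when the variable is unset or empty.
        "api_version": env.get("AZURE_CU_API_VERSION") or DEFAULT_API_VERSION,
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.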

Quick Usage

Analyze a URL and print Markdown

node scripts/analyze.mjs --url "https://example.com/document.pdf"

Analyze a local file (pipe via stdin)

cat invoice.pdf | node scripts/analyze.mjs --stdin --markdown output.md --output result.json

Save both Markdown and full JSON

node scripts/analyze.mjs --url "https://example.com/report.pdf" \
  --markdown report.md \
  --output report.json

Direct API Call

When the script isn't available, use curl:

# Submit analysis (preview API); -i prints the response headers so Operation-Location is visible
# ${AZURE_CU_ENDPOINT%/} drops a trailing slash to avoid "//" in the request path
curl -s -i -X POST "${AZURE_CU_ENDPOINT%/}/contentunderstanding/analyzers/prebuilt-layout:analyze?api-version=2025-05-01-preview" \
  -H "Ocp-Apim-Subscription-Key: $AZURE_CU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/doc.pdf"}'

# The response includes an Operation-Location header; poll that URL for results

For the GA API (2025-11-01), the request body format changes:

{"inputs": [{"url": "https://example.com/doc.pdf"}]}
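
The two body shapes above can be chosen from the API version string. The sketch below is one way to do that, not the logic used by scripts/analyze.mjs. `build_analyze_request` is a hypothetical name, and treating every version without a `-preview` suffix as GA-shaped is an assumption that goes beyond the two versions named in this document.

```python
import json


def build_analyze_request(endpoint, api_version, document_url):
    """Return (request_url, json_body) for a prebuilt-layout analyze call.

    Hypothetical helper; it only builds the request and sends nothing.
    """
    request_url = (
        endpoint.rstrip("/")
        + "/contentunderstanding/analyzers/prebuilt-layout:analyze"
        + "?api-version=" + api_version
    )
    if api_version.endswith("-preview"):
        # Preview shape shown above: a single top-level "url" field.
        body = {"url": document_url}
    else:
        # GA shape shown above: an "inputs" array.
        # Assumption: every non-preview version uses this shape.
        body = {"inputs": [{"url": document_url}]}
    return request_url, json.dumps(body)


url, body = build_analyze_request(
    "https://YOUR_RESOURCE.services.ai.azure.com/",
    "2025-11-01",
    "https://example.com/doc.pdf",
)
print(url)
print(body)
```

Keeping request construction separate from the HTTP call makes the version switch easy to check before any network traffic is involved.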

Output

Markdown

The analyzer produces GitHub Flavored Markdown preserving:

Headings (h1–h6)
Tables (as HTML table blocks)
Selection marks (☒ checked, ☐ unchecked)
Figures (with references)
Paragraphs with reading order

Structured JSON

The full result includes detailed per-element data:

pages — dimensions, word/line counts per page
paragraphs — text blocks with bounding regions and semantic roles
tables — cells with row/column spans
figures — detected images/charts with bounding regions
sections — hierarchical document structure
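
As an illustration of working with the table data, the sketch below expands a table's cells into a plain grid, repeating merged cells across the positions they span. The field names (`rowCount`, `columnCount`, `cells`, `rowIndex`, `columnIndex`, `rowSpan`, `columnSpan`, `content`) are assumptions modeled on Azure's layout output and should be checked against references/api.md. Where the `tables` array sits inside the full result is not assumed here.

```python
def table_to_grid(table):
    """Expand a table's cells into a rowCount x columnCount grid of strings.

    Hypothetical helper; all field names are assumptions, see the note above.
    """
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        row, col = cell["rowIndex"], cell["columnIndex"]
        # A merged cell's content is copied into every position it covers.
        for r in range(row, row + cell.get("rowSpan", 1)):
            for c in range(col, col + cell.get("columnSpan", 1)):
                grid[r][c] = cell.get("content", "")
    return grid


# Hand-written sample, not real analyzer output.
sample = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "columnSpan": 2, "content": "Totals"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Net"},
        {"rowIndex": 1, "columnIndex": 1, "content": "42"},
    ],
}
print(table_to_grid(sample))  # [['Totals', 'Totals'], ['Net', '42']]
```

Repeating merged content is a design choice that keeps every row the same length; callers that need the true spans should read `rowSpan` and `columnSpan` directly.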

Supported Formats

PDF, JPEG, PNG, BMP, TIFF, HEIF, DOCX, XLSX, PPTX, HTML

Best Practices

Async operation — the API returns 202; poll Operation-Location for results
Poll interval — 3 seconds is reasonable; results typically arrive in 5–60 seconds
Large documents — up to 2,000 pages supported; processing time scales linearly
File upload — use Content-Type: application/octet-stream with binary body
Tables — rendered as HTML in markdown for complex layouts (merged cells, etc.)
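
The polling advice above can be sketched as a small loop. `poll_until_done` is a hypothetical helper, not part of scripts/analyze.mjs. The caller supplies `fetch_status`, a function that GETs the Operation-Location URL and returns the parsed JSON as a dict. The status strings (`Succeeded`, `Failed`, `Canceled`, with anything else treated as still in progress) and the 300 second default timeout are assumptions; only the 3 second interval comes from this document.

```python
import time


def poll_until_done(fetch_status, interval_s=3.0, timeout_s=300.0, sleep=time.sleep):
    """Call fetch_status() until the operation finishes or the timeout elapses.

    Hypothetical helper. fetch_status must return the operation JSON as a dict.
    """
    waited = 0.0
    while True:
        result = fetch_status()
        # Assumption: the body carries a "status" string; compare case-insensitively.
        status = str(result.get("status", "")).lower()
        if status == "succeeded":
            return result
        if status in ("failed", "canceled"):
            raise RuntimeError("analysis ended with status " + status)
        if waited >= timeout_s:
            raise TimeoutError("no result after %.0f seconds" % waited)
        sleep(interval_s)
        waited += interval_s
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and lets it be exercised without waiting in real time.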

API Reference

See references/api.md for full request/response details.

            
Version History

v1.3.0 (current, the only version listed), 2026-03-30 15:45
                    
                                    
                            
        

            