CLI tool for parsing documents and web pages into clean, structured text. Uses GPU acceleration for OCR and ML models.
The docling CLI must be installed (e.g., via pipx install docling).
docling "<URL>" --from html --to md
Output: creates a .md file in the current directory (or use --output to choose another directory)
docling "<URL>" --from html --to text --output /tmp/docling_out
docling "/path/to/file.pdf" --ocr --device cuda --output /tmp/docling_out
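
The commands above write into a fixed path such as /tmp/docling_out. As a minimal sketch of the "controlled temp dir" idea, the helper below creates a fresh directory per conversion and passes it to --output. The name convert_to_temp_dir and its runner parameter are hypothetical and exist only for this illustration; only flags already shown in this document are used.

```python
import subprocess
import tempfile
from pathlib import Path


def convert_to_temp_dir(source, from_format, to_format, runner=subprocess.run):
    """Run docling into a fresh temporary directory and list what it wrote.

    Hypothetical helper: `runner` defaults to subprocess.run and can be
    replaced in tests so that docling itself is not required.
    """
    # mkdtemp creates a new, uniquely named directory readable only by this user.
    out_dir = Path(tempfile.mkdtemp(prefix="docling_out_"))
    cmd = [
        "docling", source,
        "--from", from_format,
        "--to", to_format,
        "--output", str(out_dir),
    ]
    # check=True raises CalledProcessError if docling exits with an error.
    runner(cmd, check=True)
    return sorted(out_dir.iterdir())
```

Calling convert_to_temp_dir("/path/to/file.pdf", "pdf", "md") would return the paths of whatever files docling produced; the caller is responsible for deleting the directory afterwards.
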
| Option | Values | Description |
|---|---|---|
| --from | html, pdf, docx, pptx, image, md, csv, xlsx | Input format |
| --to | md, text, json, yaml, html | Output format |
| --device | auto, cuda, cpu | Accelerator (default: auto) |
| --output | path | Output directory (recommended: use controlled temp dir) |
| --ocr | flag | Enable OCR for images/scanned PDFs |
| --tables | flag | Extract tables (default: on) |
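
The options in the table can also be assembled from a script. The sketch below is a hypothetical helper (build_docling_command is not part of docling) that checks values against the table as listed here and returns an argument list suitable for subprocess.run. The allowed values are copied from this table only; the installed CLI may accept others.

```python
# Allowed values copied from the options table; assumption: the real CLI may accept more.
INPUT_FORMATS = {"html", "pdf", "docx", "pptx", "image", "md", "csv", "xlsx"}
OUTPUT_FORMATS = {"md", "text", "json", "yaml", "html"}
DEVICES = {"auto", "cuda", "cpu"}


def build_docling_command(source, from_format, to_format, output_dir,
                          device="auto", ocr=False):
    """Return the docling argument list for the given options (hypothetical helper)."""
    if from_format not in INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {from_format!r}")
    if to_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {to_format!r}")
    if device not in DEVICES:
        raise ValueError(f"unsupported device: {device!r}")

    cmd = [
        "docling", source,
        "--from", from_format,
        "--to", to_format,
        "--device", device,
        "--output", output_dir,
    ]
    if ocr:
        cmd.append("--ocr")  # only added on request, e.g. for scanned PDFs
    return cmd
```

Passing the list to subprocess.run(cmd, check=True) rather than a shell string avoids quoting problems when the source is a URL containing characters such as & or ?.
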
⚠️ Avoid these flags unless you trust the source:
- --enable-remote-services: can send data to remote endpoints
- --allow-external-plugins: loads third-party code
- --headers with untrusted values: can redirect requests

docling "<URL>" --from html --to text --output /tmp/docling_out

Docling supports GPU acceleration via CUDA (NVIDIA). Verify CUDA is available:
python -c "import torch; print(torch.cuda.is_available())"
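
The same check can drive the value passed to --device. The sketch below is a hypothetical helper (pick_device is not part of docling) that falls back to cpu when PyTorch is missing or reports no CUDA device. It assumes the interpreter running it is the same environment docling uses; with a pipx install, docling lives in its own isolated environment, so a system-wide python may report something different.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable CUDA device, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter, so CUDA cannot be confirmed.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # Prints a value that can be passed to docling's --device option.
    print(pick_device())
```
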
See references/cli-reference.md for complete option list.