Use this skill when a user wants a supported local file converted into Markdown for later processing.
Supported input formats: .pdf, .docx, .pptx, .xlsx, .jpg, .jpeg, .png, .gif, .bmp, .txt, .json, .xml, .md
Image-processing modes (--image-process-model): ocr / vl / none for .docx, .pptx, .xlsx, and image files;
ocr / vl / vl-page / none for .pdf
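The format and mode rules above can be checked before a conversion is attempted. The following is a minimal sketch, not part of the skill itself: the function names `file_ext`, `is_supported_input`, and `mode_is_valid` are hypothetical, and the only mode restriction it enforces is the one stated in this document (vl-page is PDF only).

```shell
#!/bin/sh
# Hypothetical pre-flight checks; the names below are illustrative only.

# Print the lowercased extension of a path, or nothing if it has none.
file_ext() {
    base=${1##*/}
    case "$base" in
        *.*) printf '%s' "${base##*.}" | tr '[:upper:]' '[:lower:]' ;;
        *) : ;;
    esac
}

# Succeed only for extensions listed as supported inputs.
is_supported_input() {
    case "$(file_ext "$1")" in
        pdf|docx|pptx|xlsx|jpg|jpeg|png|gif|bmp|txt|json|xml|md) return 0 ;;
        *) return 1 ;;
    esac
}

# $1 = file path, $2 = requested mode. vl-page is accepted for .pdf only.
mode_is_valid() {
    case "$2" in
        ocr|vl|none) return 0 ;;
        vl-page) [ "$(file_ext "$1")" = "pdf" ] ;;
        *) return 1 ;;
    esac
}
```

For example, `mode_is_valid deck.pptx vl-page` fails, while `mode_is_valid notes.pdf vl-page` succeeds.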
Fixed image version 0.0.1: convert-document-to-markdown-arm64:0.0.1 on ARM64 hosts,
convert-document-to-markdown-x64:0.0.1 on x64 hosts
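The architecture-to-image mapping above could be expressed as a small lookup. This is a sketch under assumptions: the function name `image_name_for_arch` is hypothetical, and the list of strings that `uname -m` may report (arm64, aarch64, x86_64, amd64) is an assumption, not something taken from the wrapper script.

```shell
#!/bin/sh
# Map a machine architecture string (as printed by `uname -m`) to the
# fixed-version image name. The accepted architecture strings are assumed.
image_name_for_arch() {
    case "$1" in
        arm64|aarch64) echo "convert-document-to-markdown-arm64:0.0.1" ;;
        x86_64|amd64)  echo "convert-document-to-markdown-x64:0.0.1" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the image name for the current host.
image_name_for_arch "$(uname -m)"
```

Failing loudly on an unknown architecture, rather than guessing, matches the rule that no fallback image is built at runtime.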
The JSON result contains markdown, logs, and meta.
The wrapper script loads the .env file, then forwards it into the container automatically.
Only the file command is supported. URL, health, and version commands are intentionally removed to keep startup lean.
Do not use latest, do not build a fallback image at runtime, and do not treat .doc, .ppt, .xls, audio files, or unlisted image formats as supported inputs.
Default image registry: crpi-4auaoyyj6r36p6lb.cn-hangzhou.personal.cr.aliyuncs.com/huozige_lab.
Image: convert-document-to-markdown-arm64:0.0.1 or convert-document-to-markdown-x64:0.0.1.
Override with IMAGE_REGISTRY or IMAGE_NAME.
Entry point: scripts/run_docker_cli.sh file
If success is false, surface error.message and relevant logs.
If success is true, use markdown as the canonical output for downstream work.
This skill is designed so the user does not need to re-enter Vision API settings on each run.
Preferred OpenClaw configuration in ~/.openclaw/openclaw.json:
{
  "skills": {
    "entries": {
      "convert_document_to_markdown": {
        "enabled": true,
        "apiKey": "sk-xxx",
        "env": {
          "VL_BASE_URL": "https://api.openai.com/v1",
          "VL_MODEL": "gpt-4.1-mini"
        }
      }
    }
  }
}
This works because:
skillKey is convert_document_to_markdown
primaryEnv is VL_API_KEY, so apiKey maps to VL_API_KEY
env can hold VL_BASE_URL and VL_MODEL
Repository-local runtime configuration:
Copy .env.example to .env
Set VL_BASE_URL, VL_API_KEY, and VL_MODEL
Default image registry: crpi-4auaoyyj6r36p6lb.cn-hangzhou.personal.cr.aliyuncs.com/huozige_lab
Override the image with IMAGE_REGISTRY or IMAGE_NAME
Run scripts/run_docker_cli.sh, which loads .env, forwards any host VL_* variables into docker run, and pulls the correct fixed-version image if missing
Local file:
scripts/run_docker_cli.sh file ./notes.pdf --image-process-model ocr --format json
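The wrapper is described as loading .env, forwarding host VL_* variables into docker run, and pulling the image when it is missing. The sketch below shows one way those three steps could look. It is not the contents of scripts/run_docker_cli.sh: the function names are hypothetical, and it assumes .env holds plain shell-compatible KEY=VALUE lines.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the three wrapper steps.

# Load KEY=VALUE pairs from an env file into the environment, if it exists.
load_env_file() {
    [ -f "$1" ] || return 0
    set -a                      # export every variable assigned while sourcing
    case "$1" in
        */*) . "$1" ;;
        *)   . "./$1" ;;        # avoid a PATH lookup for bare file names
    esac
    set +a
}

# Print one "-e NAME" argument per VL_* variable currently in the environment.
# `docker run -e NAME` (no value) passes the host's value through.
vl_env_args() {
    env | sed -n 's/^\(VL_[A-Za-z0-9_]*\)=.*/-e \1/p' | tr '\n' ' '
}

# Pull the image only when it is not already present locally.
ensure_image() {
    docker image inspect "$1" >/dev/null 2>&1 || docker pull "$1"
}
```

Passing only variable names with `-e` keeps the API key out of the process argument list.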
--image-process-model ocr Default mode. Use Tesseract OCR for images.
--image-process-model vl Use a Vision API. Only choose this when the environment provides VL_API_KEY and related variables.
--image-process-model none Skip image recognition for speed.
--image-process-model vl-page PDF only. Do not use this mode for Office documents or image files.
--format json|markdown Use json unless the user explicitly wants raw Markdown on stdout.
--output Save the Markdown to a file. Prefer this only when you invoke docker run directly with a writable host mount.
--log-file Save detailed logs to a file. Prefer this only when you invoke docker run directly with a writable host mount.
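Since --output and --log-file are only useful with a writable host mount, the sketch below prints (without executing) what a direct docker run invocation might look like. Several parts are assumptions: the /work mount point, the idea that the container accepts the same file command and flags as the wrapper, and the hypothetical function name `print_direct_run`. IMAGE_NAME is expected to hold the full image reference.

```shell
#!/bin/sh
# Print, but do not run, a docker command that mounts the current directory
# so that --output and --log-file are written to the host.
# Assumptions: /work as the container mount point; the container entry point
# accepts "file <path>" plus the flags listed above.
print_direct_run() {
    input=$1
    image=${IMAGE_NAME:?set IMAGE_NAME to the full image reference}
    stem=${input%.*}
    printf 'docker run --rm -v "%s:/work" %s file "/work/%s" --output "/work/%s.md" --log-file "/work/%s.log"\n' \
        "$PWD" "$image" "$input" "$stem" "$stem"
}
```

Printing the command first makes it easy to review the mount and output paths before running anything.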
Do not use uv, python, or any other local runtime path for production use.
Set IMAGE_ARCH only when you have a concrete reason.
Prefer IMAGE_REGISTRY plus the fixed version 0.0.1; only use IMAGE_NAME when you need to pass the full image reference explicitly.
Use the vl modes only when VL_BASE_URL, VL_API_KEY, and VL_MODEL are already configured via OpenClaw skill config or .env.
Read the converted text from the markdown field.
If the input is .doc, .ppt, .xls, .wav, .mp3, .m4a, or .mp4, say the current skill does not reliably support it.
Report a successful conversion only when the result has success: true.
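The result-handling rules (use markdown when success is true, surface error.message and logs otherwise) can be sketched as a small filter over the JSON output. This assumes jq is installed, that the JSON arrives on stdin, and that logs is either a string or an array. The function name `handle_result` is hypothetical.

```shell
#!/bin/sh
# Read a JSON result on stdin. On success, print the Markdown to stdout.
# On failure, print error.message and any logs to stderr and return 1.
# Assumes jq is available and logs is a string or an array.
handle_result() {
    result=$(cat)
    if printf '%s' "$result" | jq -e '.success == true' >/dev/null 2>&1; then
        printf '%s' "$result" | jq -r '.markdown'
    else
        printf '%s' "$result" | jq -r '.error.message // "unknown error"' >&2
        printf '%s' "$result" \
            | jq -r '.logs // empty | if type == "array" then .[] else . end' >&2
        return 1
    fi
}
```

A possible use, assuming the wrapper writes the JSON to stdout: `scripts/run_docker_cli.sh file ./notes.pdf --image-process-model ocr --format json | handle_result > notes.md`.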