Build a bounded local-first image-to-text bridge. This skill standardizes how screenshots, charts, and document images are converted into text-model-safe Markdown for downstream text-only models.
Outputs:

- vision-context.md
- text-model-input.md (with --compose: an integrated document that can be fed directly to the text model)
- manifest.json

Use this skill when the task is to:
If the user only wants image generation or style rendering, use codex-image-bridge instead.
Do not use this skill for:
- Image generation or style rendering (use codex-image-bridge)

Options:

- --check-env: reports key status fields such as ollama_reachable and codex_exists
- --image, --markdown, --compose

Output files are written under:

- vision-context.md
- text-model-input.md (with --compose)
- manifest.json

Script path:

/Users/Admin/.agents/skills/codex-image-bridge-local/scripts/local_image_describe.py

Run:
local_image_describe.py --check-env
If ollama_reachable is false, skip local image recognition and jump to Step 3.
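The gate described here can be sketched in Python. This is only an illustration: it assumes the --check-env result has already been parsed into a dict with a boolean ollama_reachable key, which the text does not confirm, and `choose_route` is a hypothetical helper rather than part of the script.

```python
def choose_route(env: dict) -> str:
    """Decide whether to attempt local recognition.

    Assumption: `env` is the --check-env result parsed into a dict with a
    boolean `ollama_reachable` key. The script's real output format may differ.
    """
    if env.get("ollama_reachable", False):
        return "local"      # run the default ollama route
    return "fallback"       # skip local recognition and go to the fallback step
```

A missing key is treated the same as false, so an incomplete environment report never triggers a local run by accident.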
Default route:
local_image_describe.py --provider ollama --image "/path/to/screenshot.png"
Fallback sequence when local fails:
1. Switch ollama to --ollama-model minicpm-v:8b (VRAM-friendly)
2. Switch ollama to --ollama-model llama3.2-vision:11b (prioritizes semantic density)
3. Switch to --provider codex (cloud fallback)

For single-image usage:
local_image_describe.py --provider ollama --image "$HOME/Desktop/架构图.png"
For article usage:
local_image_describe.py --provider ollama --markdown "/path/to/article.md" --compose
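The single-image and article invocations above could also be assembled programmatically. The sketch below uses only the flags that appear in this document. `build_command` is a hypothetical helper, and the rule that exactly one of --image or --markdown is passed is an assumption, not documented behavior.

```python
from typing import List, Optional

SCRIPT = "local_image_describe.py"

def build_command(
    provider: str = "ollama",
    image: Optional[str] = None,
    markdown: Optional[str] = None,
    compose: bool = False,
    ollama_model: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the script.

    Assumption: exactly one of `image` or `markdown` is supplied per run.
    """
    if (image is None) == (markdown is None):
        raise ValueError("pass exactly one of image or markdown")
    args = [SCRIPT, "--provider", provider]
    if ollama_model is not None:
        args += ["--ollama-model", ollama_model]
    if image is not None:
        args += ["--image", image]
    else:
        args += ["--markdown", markdown]
    if compose:
        args.append("--compose")
    return args
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting issues, which matters for paths that contain spaces or non-ASCII characters.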
Copy the contents of vision-context.md or text-model-input.md to the text model so it can continue reasoning.
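- ollama (default): local-first, emphasizing privacy and predictable cost
- codex (fallback): use when the local path has failed 3 consecutive attempts
- --ollama-model: adjust only for VRAM or stability problems; there is no need to change it on every run

| Symptom | Action |
|---|---|
| ollama_connection_failed | Run ollama serve first, then retry |
| model_not_found | Run ollama pull gemma4:12b, then try minicpm-v:8b |
| ollama_empty_response | Switch to minicpm-v:8b; fall back to codex if necessary |
| codex_timeout | Retry with a longer timeout (for example --timeout-seconds 360) |
| Image missing | Fix the path first, then rerun |
| Vague output | Fall back to a higher-precision model, or force a rerun with codex |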
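The fallback policy can be read as three local attempts followed by the cloud provider. A minimal sketch of that loop follows, under stated assumptions: `run` is a hypothetical callable that invokes the script and returns its text, failures surface as `RuntimeError`, and an empty response counts as a failure. None of this is confirmed by the script itself.

```python
from typing import Callable, Optional, Tuple

# Ordered attempts mirroring the fallback sequence:
# three local tries, then the cloud provider.
ATTEMPTS = [
    ("ollama", None),                   # default local model
    ("ollama", "minicpm-v:8b"),         # VRAM-friendly
    ("ollama", "llama3.2-vision:11b"),  # semantic density first
    ("codex", None),                    # cloud fallback
]

def describe_with_fallback(
    run: Callable[[str, Optional[str]], str],
) -> Tuple[str, Optional[str], str]:
    """Return (provider, model, text) for the first attempt that succeeds.

    Assumption: `run(provider, model)` raises RuntimeError on failure.
    """
    errors = []
    for provider, model in ATTEMPTS:
        label = f"{provider}/{model or 'default'}"
        try:
            text = run(provider, model)
        except RuntimeError as exc:
            errors.append(f"{label}: {exc}")
            continue
        if text.strip():
            return provider, model, text
        errors.append(f"{label}: empty response")
    raise RuntimeError("all attempts failed: " + "; ".join(errors))
```

Returning the provider and model alongside the text keeps the information needed to report which route was actually used.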
When returning user-facing results, include exactly:
- The provider and model actually used
- Key abnormal items reported by --check-env