
VLM Image Helper

Visual inspection helper for VLM and OCR workflows. Use it when an agent needs to help a vision model see an image more clearly before re-analysis: rotate skewed images, adjust contrast and brightness, crop key regions, and so on.
testlbin
Uncategorized · clawhub · v0.1.0 · Key: not required

Overview

Treat this skill as a visual aid for the model, not as a general image editor.

Use scripts/image_helper.py to create a clearer intermediate image, then re-run analysis on that result.

Core Workflow

  1. Start from the original image path, a raw base64 string, or a data URI.
  2. Apply the smallest transformation that is likely to remove the ambiguity.
  3. Prefer semantic crop presets over manual coordinates unless the exact box is already known.
  4. Return the processed image as a file or base64, then re-read that result.
  5. If the image is still unclear, iterate once with a tighter crop or stronger zoom instead of stacking many edits at once.
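
The iterate-once rule in step 5 can be sketched as a small planner. This is only an illustration, not part of the skill: `plan_passes` is a hypothetical helper, and it uses only the `--crop-preset` and `--scale-preset` flags and the `bottom-right`, `x2`, and `x3` values that appear on this page.

```python
from typing import List


def plan_passes(crop_preset: str, still_unclear: bool) -> List[List[str]]:
    """Return the flag lists for at most two passes over the same image.

    Hypothetical helper: the first pass applies a small edit (crop plus x2
    zoom). A second pass is added only when the result is still unclear, and
    it strengthens the zoom instead of stacking further edits.
    """
    first = ["--crop-preset", crop_preset, "--scale-preset", "x2"]
    if not still_unclear:
        return [first]
    second = ["--crop-preset", crop_preset, "--scale-preset", "x3"]
    return [first, second]


if __name__ == "__main__":
    for flags in plan_passes("bottom-right", still_unclear=True):
        print(" ".join(flags))
```

Capping the plan at two passes mirrors the advice to iterate once rather than pile up transformations.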

Quick Commands

# Rotate sideways text
python scripts/image_helper.py image.png --rotate 90 -o rotated.png

# Crop a likely area and zoom it
python scripts/image_helper.py image.png --crop-preset bottom-right --scale-preset x3 -o detail.png

# Improve low-contrast text
python scripts/image_helper.py image.png --auto-enhance -o enhanced.png

# Convert an existing file path directly to base64
python scripts/image_helper.py image.png --base64
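
When an agent calls the helper from Python rather than from a shell, the commands above can be assembled as argument lists. The following is a minimal sketch: `build_command` is a hypothetical wrapper, and it emits only the flags shown in this section.

```python
from typing import List, Optional

HELPER = "scripts/image_helper.py"  # script path as given in this document


def build_command(
    image: str,
    *,
    rotate: Optional[int] = None,
    crop_preset: Optional[str] = None,
    scale_preset: Optional[str] = None,
    auto_enhance: bool = False,
    output: Optional[str] = None,
    as_base64: bool = False,
) -> List[str]:
    """Assemble an argument list using only flags shown in the commands above."""
    cmd = ["python", HELPER, image]
    if rotate is not None:
        cmd += ["--rotate", str(rotate)]
    if crop_preset is not None:
        cmd += ["--crop-preset", crop_preset]
    if scale_preset is not None:
        cmd += ["--scale-preset", scale_preset]
    if auto_enhance:
        cmd.append("--auto-enhance")
    if output is not None:
        cmd += ["-o", output]
    if as_base64:
        cmd.append("--base64")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to execute it for real.
    print(" ".join(build_command("image.png", crop_preset="bottom-right",
                                 scale_preset="x3", output="detail.png")))
```

Passing a list instead of a shell string avoids quoting problems when image paths contain spaces.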

Choose the Next Action

  • Text is sideways or upside down: use --rotate.
  • Only one region matters: use --crop-preset first, then add --scale-preset.
  • Small text or icons are hard to read: use --scale-preset x2 or x3.
  • Contrast is weak or edges are fuzzy: use --auto-enhance, or manually tune --contrast and --sharpness.
  • Another tool needs inline image data instead of a file path: add --base64.
  • The source image arrives as raw base64 or a data URI: use --input-mode auto or force --input-mode base64 / data-uri.
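
The decision list above can be expressed as a lookup table. This is a sketch under stated assumptions: the symptom labels are hypothetical names invented for the example, `--rotate 180` for upside-down text and `x2` as the default zoom are assumptions, and the rotation direction of `--rotate 90` is not specified here, so check references/cli-reference.md before relying on it.

```python
from typing import Dict, List

# Symptom labels are hypothetical names for this sketch; the flags come from
# the list above. The 180 value and the x2 default are assumptions.
NEXT_ACTION: Dict[str, List[str]] = {
    "sideways": ["--rotate", "90"],
    "upside-down": ["--rotate", "180"],
    "small-text": ["--scale-preset", "x2"],
    "weak-contrast": ["--auto-enhance"],
    "needs-inline-data": ["--base64"],
}


def suggest_flags(symptom: str) -> List[str]:
    """Return the flags for one symptom, or an empty list if it is unknown."""
    return list(NEXT_ACTION.get(symptom, []))


if __name__ == "__main__":
    print(suggest_flags("weak-contrast"))
```

Returning flags for a single symptom at a time keeps each pass to the smallest transformation, in line with the core workflow.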

Input and Output Rules

  • Accept a file path, raw base64 string, or data URI as input.
  • Return a file with -o or return inline base64 with --base64.
  • Allow passthrough output with no edits when the only goal is format conversion or path-to-base64 conversion.
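
To make the three input forms concrete, here is one way automatic detection and passthrough conversion could work using only the Python standard library. It is a heuristic sketch, not the logic of `scripts/image_helper.py`: the function names and the `"path"` label are assumptions, and a real implementation may order the checks differently.

```python
import base64
import binascii
import os


def detect_input_mode(value: str) -> str:
    """Classify input as 'data-uri', 'path', or 'base64' (heuristic sketch)."""
    if value.startswith("data:"):
        return "data-uri"
    if os.path.exists(value):
        return "path"
    try:
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        # Not valid base64, so treat it as a (possibly missing) file path.
        return "path"
    return "base64" if decoded else "path"


def load_bytes(value: str) -> bytes:
    """Return the raw image bytes for any of the three input forms."""
    mode = detect_input_mode(value)
    if mode == "data-uri":
        header, _, payload = value.partition(",")
        if ";base64" not in header:
            raise ValueError("only base64-encoded data URIs are handled here")
        return base64.b64decode(payload)
    if mode == "base64":
        return base64.b64decode(value)
    with open(value, "rb") as fh:
        return fh.read()


def to_base64(value: str) -> str:
    """Passthrough conversion: no edits, just re-encode the image bytes."""
    return base64.b64encode(load_bytes(value)).decode("ascii")
```

Checking for an existing file before attempting a base64 decode avoids misreading short file names that happen to be valid base64.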

References

  • Full CLI reference: references/cli-reference.md
  • Crop and scale preset table: references/presets.md

Prerequisite

Install Pillow if it is missing:

pip install Pillow
# or
uv pip install Pillow
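
An agent can check whether Pillow is already present before deciding to install it. A minimal sketch, assuming the check runs in the same Python environment as the helper script; `pillow_available` is a hypothetical name.

```python
import importlib.util


def pillow_available() -> bool:
    """Return True if the PIL package (installed by Pillow) can be found."""
    return importlib.util.find_spec("PIL") is not None


if __name__ == "__main__":
    if pillow_available():
        print("Pillow is installed")
    else:
        print("Pillow is missing; run: pip install Pillow")
```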

Version History

1 version in total

  • v0.1.0 (current)
    2026-03-30 17:36

Security Checks

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk