← 返回
未分类 中文

Vision Tool

Image recognition using Ollama + qwen3.5:4b with think=False for reliable content extraction.
使用 Ollama + qwen3.5:4b 进行图像识别,think=False 确保可靠的内容提取。
huruilizhen
未分类 clawhub v1.1.3 1 版本 100000 Key: 无需
★ 0
Stars
📥 421
下载
💾 0
安装
1
版本
#image-recognition#latest#ollama#vision

概述

Vision Tool 👁️

Image recognition using Ollama + qwen3.5:4b. Uses /api/chat endpoint for direct content extraction.

Features

Direct content extraction - Uses /api/chat endpoint for clean output

Simplified architecture - No complex thinking field processing needed

English prompts - Optimized for English language analysis

Multi-channel support - Works in WeChat, Telegram, Discord, etc.

Error handling - Full error recovery and reporting

Installation

Prerequisites

  1. Ollama service: ollama serve (must be running)
  2. qwen3.5:4b model: ollama pull qwen3.5:4b
  3. Python 3.8+: Required for running the skill

Install the skill

clawhub install vision-tool

Development Setup (For Contributors)

If you want to contribute or modify the skill, see CONTRIBUTING.md for detailed development instructions.

Basic setup:

# Clone the repository
git clone https://github.com/HuRuilizhen/vision-tool
cd vision-tool

# Set up development environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

# Run tests
python3 -m pytest tests/

Usage

Basic usage

# From any OpenClaw channel
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg

# With custom prompt
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg --prompt "Describe this image"

# Debug output
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg --debug

Channel-specific examples

WeChat Channel:

# When receiving an image
exec: python3 /path/to/vision-tool/main.py "$IMAGE_PATH"

Telegram Channel:

# Reply to photo messages
exec: python3 /path/to/vision-tool/main.py "/path/to/telegram_photo.jpg"

Discord Channel:

# Process attachments
exec: python3 /path/to/vision-tool/main.py "./discord_attachment.jpg"

Example Output

Analysis (30.7s):
------------------------------------------------------------
The user wants a description of the image provided.
**1. Overall Composition:**
- It's a top-down view of a meal served on a white tray.
- There are six distinct dishes/bowls arranged...
**2. Detailed Breakdown of Dishes:**
- **Top Left:** A small white rectangular dish...
- **Top Middle:** A small white rectangular dish...
------------------------------------------------------------

How It Works

  1. Image reading: Reads and Base64 encodes the image
  2. API call: Calls Ollama /api/chat endpoint with qwen3.5:4b
  3. Direct extraction: Gets analysis directly from content field
  4. Fallback handling: Simple cleanup if thinking field is used
  5. Output formatting: Generates clean analysis results

Performance

  • Average processing time: 25-35 seconds per image (hardware dependent)
  • Image size support: 100KB-500KB recommended
  • Token consumption: ~2000 tokens per image
  • API endpoint: Uses /api/chat for direct content access

Troubleshooting

Common Issues

  1. Ollama not running: Run ollama serve first
  2. Model not installed: Run ollama pull qwen3.5:4b
  3. Image path incorrect: Use absolute paths or correct relative paths
  4. Timeout: Model may take 30+ seconds for complex images

Performance Tips

  • Compress images to under 300KB for faster processing
  • Use clear, concise prompts
  • Ensure Ollama has sufficient system resources

API Reference

Python API

from vision_core import VisionAnalyzer

analyzer = VisionAnalyzer()
result = analyzer.analyze_image("image.jpg", "Describe this image")
print(result["analysis"])

Command Line

# Basic analysis
python3 main.py image.jpg

# Custom prompt
python3 main.py image.jpg --prompt "What objects are in this image?"

# Debug mode
python3 main.py image.jpg --debug

Development

File Structure

vision-tool/
├── SKILL.md          # This documentation
├── main.py           # Main skill script
├── scripts/
│   └── vision_core.py  # Core analysis engine
└── tests/
    └── test_basic.py   # Basic tests

Testing

# Test with example image
python3 main.py /path/to/test.jpg --prompt "Test analysis"

# Run unit tests
python3 -m pytest tests/

Changelog

v1.1.0 (2026-04-13)

  • Uses /api/chat endpoint for direct content extraction
  • Simplified architecture without complex thinking field processing
  • Default English prompt "Describe this image"
  • Removed regex dependencies for cleaner code

v1.0.0 (2026-04-12)

  • Initial release

Contributing

Issues and pull requests are welcome. Please ensure tests pass before submitting.

License

This skill is part of the OpenClaw ecosystem.


Ready to use in all OpenClaw channels! 🚀

版本历史

共 1 个版本

  • v1.1.3 当前
    2026-05-07 04:34 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

developer-tools

Github

steipete
使用 `gh` CLI 与 GitHub 交互,通过 `gh issue`、`gh pr`、`gh run` 和 `gh api` 管理议题、PR、CI 运行及高级查询。
★ 668 📥 323,950
ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
自我反思+自我批评+自我学习+自组织记忆。智能体评估自身工作、发现错误并持续改进。
★ 1,353 📥 317,915
ai-intelligence

self-improving agent

pskoett
捕获经验教训、错误和纠正,以实现持续改进。使用时机:(1)命令或操作意外失败;(2)用户纠正……
★ 4,058 📥 797,564