概述

Vision Tool 👁️

Image recognition using Ollama + qwen3.5:4b. Uses /api/chat endpoint for direct content extraction.

Features

✅ Direct content extraction - Uses /api/chat endpoint for clean output

✅ Simplified architecture - No complex thinking field processing needed

✅ English prompts - Optimized for English language analysis

✅ Multi-channel support - Works in WeChat, Telegram, Discord, etc.

✅ Error handling - Full error recovery and reporting

Installation

Prerequisites

Ollama service: ollama serve (must be running)
qwen3.5:4b model: ollama pull qwen3.5:4b
Python 3.8+: Required for running the skill

Install the skill

clawhub install vision-tool

Development Setup (For Contributors)

If you want to contribute or modify the skill, see CONTRIBUTING.md for detailed development instructions.

Basic setup:

# Clone the repository
git clone https://github.com/HuRuilizhen/vision-tool
cd vision-tool

# Set up development environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

# Run tests
python3 -m pytest tests/

Usage

Basic usage

# From any OpenClaw channel
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg

# With custom prompt
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg --prompt "Describe this image"

# Debug output
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg --debug

Channel-specific examples

WeChat Channel:

# When receiving an image
exec: python3 /path/to/vision-tool/main.py "$IMAGE_PATH"

Telegram Channel:

# Reply to photo messages
exec: python3 /path/to/vision-tool/main.py "/path/to/telegram_photo.jpg"

Discord Channel:

# Process attachments
exec: python3 /path/to/vision-tool/main.py "./discord_attachment.jpg"

Example Output

Analysis (30.7s):
------------------------------------------------------------
The user wants a description of the image provided.
**1. Overall Composition:**
- It's a top-down view of a meal served on a white tray.
- There are six distinct dishes/bowls arranged...
**2. Detailed Breakdown of Dishes:**
- **Top Left:** A small white rectangular dish...
- **Top Middle:** A small white rectangular dish...
------------------------------------------------------------

How It Works

Image reading: Reads and Base64 encodes the image
API call: Calls Ollama /api/chat endpoint with qwen3.5:4b
Direct extraction: Gets analysis directly from content field
Fallback handling: Simple cleanup if thinking field is used
Output formatting: Generates clean analysis results

Performance

Average processing time: 25-35 seconds per image (hardware dependent)
Image size support: 100KB-500KB recommended
Token consumption: ~2000 tokens per image
API endpoint: Uses /api/chat for direct content access

Troubleshooting

Common Issues

Ollama not running: Run ollama serve first
Model not installed: Run ollama pull qwen3.5:4b
Image path incorrect: Use absolute paths or correct relative paths
Timeout: Model may take 30+ seconds for complex images

Performance Tips

Compress images to under 300KB for faster processing
Use clear, concise prompts
Ensure Ollama has sufficient system resources

API Reference

Python API

from vision_core import VisionAnalyzer

analyzer = VisionAnalyzer()
result = analyzer.analyze_image("image.jpg", "Describe this image")
print(result["analysis"])

Command Line

# Basic analysis
python3 main.py image.jpg

# Custom prompt
python3 main.py image.jpg --prompt "What objects are in this image?"

# Debug mode
python3 main.py image.jpg --debug

Development

File Structure

vision-tool/
├── SKILL.md          # This documentation
├── main.py           # Main skill script
├── scripts/
│   └── vision_core.py  # Core analysis engine
└── tests/
    └── test_basic.py   # Basic tests

Testing

# Test with example image
python3 main.py /path/to/test.jpg --prompt "Test analysis"

# Run unit tests
python3 -m pytest tests/

Changelog

v1.1.0 (2026-04-13)

Uses /api/chat endpoint for direct content extraction
Simplified architecture without complex thinking field processing
Default English prompt "Describe this image"
Removed regex dependencies for cleaner code

v1.0.0 (2026-04-12)

Initial release

Contributing

Issues and pull requests are welcome. Please ensure tests pass before submitting.

License

This skill is part of the OpenClaw ecosystem.

Ready to use in all OpenClaw channels! 🚀

版本历史

共 1 个版本

v1.1.3 当前

2026-05-07 04:34 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)