Image recognition using Ollama + qwen3.5:4b. Uses /api/chat endpoint for direct content extraction.
✅ Direct content extraction - Uses /api/chat endpoint for clean output
✅ Simplified architecture - No complex thinking field processing needed
✅ English prompts - Optimized for English language analysis
✅ Multi-channel support - Works in WeChat, Telegram, Discord, etc.
✅ Error handling - Full error recovery and reporting
ollama serve (must be running)ollama pull qwen3.5:4bclawhub install vision-tool
If you want to contribute or modify the skill, see CONTRIBUTING.md for detailed development instructions.
Basic setup:
# Clone the repository
git clone https://github.com/HuRuilizhen/vision-tool
cd vision-tool
# Set up development environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
# Run tests
python3 -m pytest tests/
# From any OpenClaw channel
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg
# With custom prompt
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg --prompt "Describe this image"
# Debug output
exec: python3 /path/to/vision-tool/main.py /path/to/image.jpg --debug
WeChat Channel:
# When receiving an image
exec: python3 /path/to/vision-tool/main.py "$IMAGE_PATH"
Telegram Channel:
# Reply to photo messages
exec: python3 /path/to/vision-tool/main.py "/path/to/telegram_photo.jpg"
Discord Channel:
# Process attachments
exec: python3 /path/to/vision-tool/main.py "./discord_attachment.jpg"
Analysis (30.7s):
------------------------------------------------------------
The user wants a description of the image provided.
**1. Overall Composition:**
- It's a top-down view of a meal served on a white tray.
- There are six distinct dishes/bowls arranged...
**2. Detailed Breakdown of Dishes:**
- **Top Left:** A small white rectangular dish...
- **Top Middle:** A small white rectangular dish...
------------------------------------------------------------
ollama serve firstollama pull qwen3.5:4bfrom vision_core import VisionAnalyzer
analyzer = VisionAnalyzer()
result = analyzer.analyze_image("image.jpg", "Describe this image")
print(result["analysis"])
# Basic analysis
python3 main.py image.jpg
# Custom prompt
python3 main.py image.jpg --prompt "What objects are in this image?"
# Debug mode
python3 main.py image.jpg --debug
vision-tool/
├── SKILL.md # This documentation
├── main.py # Main skill script
├── scripts/
│ └── vision_core.py # Core analysis engine
└── tests/
└── test_basic.py # Basic tests
# Test with example image
python3 main.py /path/to/test.jpg --prompt "Test analysis"
# Run unit tests
python3 -m pytest tests/
Issues and pull requests are welcome. Please ensure tests pass before submitting.
This skill is part of the OpenClaw ecosystem.
Ready to use in all OpenClaw channels! 🚀
共 1 个版本