Zai Vision

Z.AI Vision analysis using the GLM-4.6V model for image and video understanding. Use when Claude needs to analyze images (screenshots, UI designs, photos, diagrams, etc.).
Author: twolfe1991-cloud
Uncategorized · clawhub · v1.0.0 · 1 version · API key: required

Z.AI Vision

Overview

This skill provides Z.AI's GLM-4.6V vision model capabilities for analyzing images and videos through Python scripts. Use it for OCR, UI design analysis, technical diagrams, error screenshots, data visualizations, and video scene understanding.

Quick Start

Prerequisites

  1. Install the Z.AI SDK:
     pip install zai-sdk

  2. Set your API key:
     export ZAI_API_KEY='your-api-key'

The API key is required for all vision operations.
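
Both prerequisites can be checked up front before invoking either script. The following is a minimal sketch, not part of the skill itself: the helper name `preflight` is hypothetical, and it assumes the zai-sdk package is importable under the module name `zai`.

```python
import importlib.util
import os


def preflight(env=None, module="zai"):
    """Return a list of problems; an empty list means both prerequisites are met.

    Assumption: zai-sdk installs an importable module named `zai`.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("ZAI_API_KEY"):
        problems.append("ZAI_API_KEY environment variable not set")
    if importlib.util.find_spec(module) is None:
        problems.append("zai-sdk not installed (pip install zai-sdk)")
    return problems
```

Calling `preflight()` with no arguments inspects the real environment; passing a dictionary makes the check easy to exercise in isolation.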

Basic Image Analysis

python3 /root/clawd/zai-vision/scripts/vision_analyze.py <image_path> "<prompt>"

Example:

python3 /root/clawd/zai-vision/scripts/vision_analyze.py screenshot.png "Describe this UI"

Basic Video Analysis

python3 /root/clawd/zai-vision/scripts/video_analyze.py <video_path> "<prompt>"

Example:

python3 /root/clawd/zai-vision/scripts/video_analyze.py clip.mp4 "What's happening?"

Capabilities

Image Analysis

OCR / Text Extraction

python3 /root/clawd/zai-vision/scripts/vision_analyze.py doc-scan.jpg "Extract all text"

UI Design Analysis

python3 /root/clawd/zai-vision/scripts/vision_analyze.py ui-mockup.png "Analyze this UI design and list all components"

Error Diagnosis

python3 /root/clawd/zai-vision/scripts/vision_analyze.py error.png "What error is shown and how do I fix it?"

Technical Diagrams

python3 /root/clawd/zai-vision/scripts/vision_analyze.py architecture.png "Explain this architecture diagram"

Data Visualization

python3 /root/clawd/zai-vision/scripts/vision_analyze.py chart.png "What insights does this chart show?"

Video Analysis

Scene Description

python3 /root/clawd/zai-vision/scripts/video_analyze.py demo.mp4 "Describe what's happening"

Note: Video analysis works best with short clips (≤8MB). Videos are processed frame-by-frame.
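
The size guideline in the note can be enforced before a clip is submitted. This is a small sketch under stated assumptions: the helper `video_within_limit` is hypothetical, and "8MB" is read here as 8 MiB (8 * 1024 * 1024 bytes), which the source does not specify.

```python
from pathlib import Path

# Assumption: the "8MB" guideline is interpreted as 8 MiB.
MAX_VIDEO_BYTES = 8 * 1024 * 1024


def video_within_limit(path, limit=MAX_VIDEO_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return Path(path).stat().st_size <= limit
```

A caller might skip or trim any clip for which this returns False rather than sending it to video_analyze.py.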

Parameters

Parameter       Default     Purpose
--------------  ----------  ------------------------------------
--model         glm-4.6v    Vision model to use
--max-tokens    2000        Max response tokens
--temperature   0.5         0-2, lower=factual, higher=creative
--json          false       Output structured JSON

Example with parameters:

python3 /root/clawd/zai-vision/scripts/vision_analyze.py image.jpg "Describe this" \
  --temperature 0.3 \
  --max-tokens 500 \
  --json
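
When the script is driven from Python rather than a shell, the same flags can be assembled into an argument list. The sketch below only builds the command; `build_command` is a hypothetical helper, and the range check simply mirrors the 0-2 range listed in the table above.

```python
SCRIPT = "/root/clawd/zai-vision/scripts/vision_analyze.py"


def build_command(image, prompt, model=None, max_tokens=None,
                  temperature=None, as_json=False):
    """Build the argument list for vision_analyze.py from optional parameters."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    cmd = ["python3", SCRIPT, str(image), prompt]
    if model is not None:
        cmd += ["--model", model]
    if max_tokens is not None:
        cmd += ["--max-tokens", str(max_tokens)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if as_json:
        cmd.append("--json")
    return cmd
```

Options left as None are omitted, so the script's own defaults apply. The resulting list can be passed to `subprocess.run` without shell quoting.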

Integration with Safe Scripts

When running in the /root/clawd workspace, use clawd-run for safety:

clawd-run /root/clawd/zai-vision/scripts/vision_analyze.py image.png "Analyze"

This provides automatic backups, validation, and timeout protection.

Error Handling

Missing API key:

❌ ZAI_API_KEY environment variable not set

Set it: export ZAI_API_KEY='your-key'

Image not found:

❌ Image file not found: /path/to/image.jpg

Verify the file path.

SDK not installed:

❌ zai-sdk not installed

Install with: pip install zai-sdk
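
A calling program can surface these messages by checking the exit status of the script. This is a sketch only: `run_checked` is a hypothetical wrapper, and it assumes the scripts exit with a non-zero code when they print one of the errors above, which the source does not state.

```python
import subprocess


def run_checked(cmd, timeout=120):
    """Run a command; return its stdout, or raise RuntimeError on failure.

    Assumption: the vision scripts signal errors with a non-zero exit code.
    """
    done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if done.returncode != 0:
        # Prefer stderr, but fall back to stdout in case errors are printed there.
        detail = (done.stderr or done.stdout).strip()
        raise RuntimeError(f"exit code {done.returncode}: {detail}")
    return done.stdout
```

For example, `run_checked(["python3", "/root/clawd/zai-vision/scripts/vision_analyze.py", "image.jpg", "Describe this"])` would either return the analysis text or raise with the error message attached.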

Common Patterns

Pattern 1: Batch Process Multiple Images

for img in /path/to/images/*.png; do
  python3 /root/clawd/zai-vision/scripts/vision_analyze.py "$img" "Describe this image"
done
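
The same loop can be written in Python when per-image results need to be collected. This is a rough equivalent under assumptions: `batch_commands` and `run_batch` are hypothetical names, and files are processed in sorted order for reproducibility.

```python
import subprocess
from pathlib import Path

SCRIPT = "/root/clawd/zai-vision/scripts/vision_analyze.py"


def batch_commands(directory, prompt, pattern="*.png"):
    """Yield one argument list per matching image, in sorted order."""
    for image in sorted(Path(directory).glob(pattern)):
        yield ["python3", SCRIPT, str(image), prompt]


def run_batch(directory, prompt, pattern="*.png"):
    """Run every command; map image path to (return code, stdout)."""
    results = {}
    for cmd in batch_commands(directory, prompt, pattern):
        done = subprocess.run(cmd, capture_output=True, text=True)
        results[cmd[2]] = (done.returncode, done.stdout)
    return results
```

Unlike the shell loop, `run_batch` keeps going after a failed image and records the return code, so failures can be reviewed once the batch finishes.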

Pattern 2: Extract and Save JSON

python3 /root/clawd/zai-vision/scripts/vision_analyze.py image.jpg "Analyze" --json > output.json
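
Before relying on the saved file, it may be worth confirming that the output really is JSON, since an error message would be redirected into output.json as well if the script prints it to standard output. The sketch below makes no assumption about the JSON structure, which is not documented here; `parse_json_output` is a hypothetical helper.

```python
import json


def parse_json_output(text):
    """Return the parsed JSON value, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

A typical use would be reading output.json, passing its contents to `parse_json_output`, and treating a None result as a failed run.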

Pattern 3: Specific Analysis Type

Code from screenshot:

python3 /root/clawd/zai-vision/scripts/vision_analyze.py code.png "Extract the code and explain what it does"

Form field extraction:

python3 /root/clawd/zai-vision/scripts/vision_analyze.py form.jpg "List all form fields and their types"

Brand guidelines check:

python3 /root/clawd/zai-vision/scripts/vision_analyze.py design.png "Check if this follows brand guidelines"

Tips for Best Results

  1. Specific prompts: "List all UI components" > "What's this?"
  2. High-quality images: Better resolution = better understanding
  3. Temperature: 0.2-0.5 for factual, 0.7-1.0 for creative
  4. Video limits: Keep videos ≤8MB for best performance
  5. Handle errors: Always check return codes and error messages

Resources

Scripts

  • scripts/vision_analyze.py - Image analysis with GLM-4.6V
  • scripts/video_analyze.py - Video analysis (frame-by-frame)

References

  • references/API.md - Complete API documentation and examples

When to Use This Skill

Use this skill when you need to:

  • Analyze screenshots, photos, or images
  • Extract text from images (OCR)
  • Understand technical diagrams or charts
  • Diagnose errors from screenshots
  • Analyze UI designs or mockups
  • Describe video scenes
  • Process visual content programmatically

For more detailed API information, see references/API.md.

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-08 03:31, security status: safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
