GLM-V-Caption

Generate captions (descriptions) for images, videos, and documents using ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to descr...

Author: jaredforreal · Source: clawhub · Version: v1.0.3 (2 versions) · API key: required

Overview

GLM-V Caption Skill

Generate captions for images, videos, and documents using the ZhiPu GLM-V multimodal model.

When to Use

  • Describe, caption, summarize, or interpret image/video/document content
  • User mentions "describe this image", "caption", "summarize this video", "图片描述", "视频摘要", "文档解读", "看图说话"
  • Extract visual or textual information from media files
  • Compare multiple images
  • User provides an image/video/file and asks what's in it

Supported Input Types

| Type | Formats | Max Size | Max Count | Base64 |
|------|---------|----------|-----------|--------|
| Image | jpg, png, jpeg | 5MB / 6000×6000px | 50 | Yes |
| Video | mp4, mkv, mov | 200MB | | No |
| File | pdf, docx, txt, xlsx, pptx, jsonl | | 50 | No |

⚠️ file_url cannot mix with image_url or video_url in the same request.

⚠️ Videos and files only support URLs. Local paths and base64 input are supported for images only.
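
The two restrictions above can be checked before any request is sent. What follows is a minimal sketch, not the skill's actual code: the helper names `is_url` and `build_content` are hypothetical, and the content-part layout is assumed from the field names `image_url`, `video_url`, and `file_url` mentioned above. Sending a local image as a plain base64 string is likewise an assumption.

```python
import base64
from pathlib import Path


def is_url(source):
    """Treat anything starting with http:// or https:// as a remote URL."""
    return source.startswith(("http://", "https://"))


def build_content(prompt, images=(), videos=(), files=()):
    """Build a list of content parts while enforcing the input rules.

    Hypothetical helper: the part layout below is an assumption.
    """
    if files and (images or videos):
        raise ValueError("file_url cannot be mixed with image_url or video_url")

    parts = []
    for src in images:
        if is_url(src):
            url = src
        else:
            # Local images are allowed; encode them as base64 (assumed format).
            url = base64.b64encode(Path(src).read_bytes()).decode("ascii")
        parts.append({"type": "image_url", "image_url": {"url": url}})

    # Videos and files must be URLs: reject local paths early.
    for kind, sources in (("video_url", videos), ("file_url", files)):
        for src in sources:
            if not is_url(src):
                raise ValueError(f"{kind} only accepts URLs, got: {src}")
            parts.append({"type": kind, kind: {"url": src}})

    parts.append({"type": "text", "text": prompt})
    return parts
```

Failing early with a `ValueError` keeps an invalid combination from ever reaching the API, so the user sees a clear local message instead of a remote error.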

Resource Links

| Resource | Link |
|----------|------|
| Get API Key | https://bigmodel.cn/usercenter/proj-mgmt/apikeys |
| API Docs | Chat Completions |

Prerequisites

API Key Setup (Required)

This script reads the key from the ZHIPU_API_KEY environment variable and shares it with other Zhipu skills.

Get Key: Visit the Zhipu Open Platform API Keys page (https://bigmodel.cn/usercenter/proj-mgmt/apikeys) to create or copy your key.

Setup options (choose one):

  1. OpenClaw config (recommended): Set in openclaw.json under skills.entries.glmv-caption.env:

```json
"glmv-caption": { "enabled": true, "env": { "ZHIPU_API_KEY": "your-api-key" } }
```

  2. Shell environment variable: Add to ~/.zshrc:

```bash
export ZHIPU_API_KEY="your-api-key"
```

  3. .env file: Create .env in this skill directory:

```
ZHIPU_API_KEY=your-api-key
```
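
To illustrate how a script might resolve the key from these sources, here is a minimal sketch. It is not the skill's actual loader: the function name `load_api_key` is hypothetical, and the precedence (environment variable first, then the .env file) is an assumption. The error text mirrors the message shown under Error Handling.

```python
import os
from pathlib import Path


def load_api_key(env_file=".env"):
    """Return ZHIPU_API_KEY from the environment, else from a .env file.

    Hypothetical helper; the lookup order is an assumption.
    """
    key = os.environ.get("ZHIPU_API_KEY", "").strip()
    if key:
        return key

    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip comments and lines that are not KEY=VALUE pairs.
            if line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "ZHIPU_API_KEY":
                # Tolerate optional surrounding quotes.
                return value.strip().strip('"').strip("'")

    raise RuntimeError(
        "ZHIPU_API_KEY not configured. Get your API key at: "
        "https://bigmodel.cn/usercenter/proj-mgmt/apikeys"
    )
```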

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

  1. ONLY use GLM-V API — Execute the script python scripts/glmv_caption.py
  2. NEVER caption media yourself — Do NOT try to describe content using built-in vision or any other method
  3. NEVER offer alternatives — Do NOT suggest "I can try to describe it" or similar
  4. IF API fails — Display the error message and STOP immediately
  5. NO fallback methods — Do NOT attempt captioning any other way

📋 Output Display Rules (MANDATORY)

After running the script, you must show the full raw output to the user exactly as returned. Do not summarize, truncate, or only say "generated". Users need the original model output to evaluate quality.

  • Image captioning: show the full caption text
  • Multiple images: show each image result
  • Video/files: show the full understanding result
  • If token usage is included, you may optionally display it
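
The restrictions and display rules above amount to: run the script, show its output untouched, and stop on failure. A minimal sketch of that flow follows. The wrapper name `run_and_show` is hypothetical, and it assumes the script signals failure through a non-zero exit code, which the text above does not confirm.

```python
import subprocess
import sys


def run_and_show(command):
    """Run a command and print its output unmodified.

    Returns (exit_code, shown_text). On failure the error text is shown
    as-is and no fallback captioning is attempted.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        shown = result.stdout
    else:
        # Prefer stderr, but fall back to stdout if the error was printed there.
        shown = result.stderr or result.stdout
    print(shown, end="")
    return result.returncode, shown
```

A call would look like `run_and_show([sys.executable, "scripts/glmv_caption.py", "--images", "photo.jpg"])`. The caller should stop as soon as the returned exit code is non-zero.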

How to Use

Caption an Image

python scripts/glmv_caption.py --images "https://example.com/photo.jpg"
python scripts/glmv_caption.py --images /path/to/photo.png

Caption Multiple Images

python scripts/glmv_caption.py --images img1.jpg img2.png "https://example.com/img3.jpg"

Caption a Video

python scripts/glmv_caption.py --videos "https://example.com/clip.mp4"

Caption a Document

python scripts/glmv_caption.py --files "https://example.com/report.pdf"
python scripts/glmv_caption.py --files "https://example.com/doc1.docx" "https://example.com/doc2.txt"

Custom Prompt

python scripts/glmv_caption.py --images photo.jpg --prompt "Describe the architecture style in detail"

Save Result

python scripts/glmv_caption.py --images photo.jpg --output result.json

Thinking Mode

python scripts/glmv_caption.py --images photo.jpg --thinking

CLI Reference

python {baseDir}/scripts/glmv_caption.py (--images IMG [IMG...] | --videos VID [VID...] | --files FILE [FILE...]) [OPTIONS]

| Parameter | Required | Description |
|-----------|----------|-------------|
| --images, -i | One of | Image paths or URLs (supports multiple, base64 OK) |
| --videos, -v | One of | Video paths or URLs (supports multiple, mp4/mkv/mov) |
| --files, -f | One of | Document paths or URLs (supports multiple, pdf/docx/txt/xlsx/pptx/jsonl) |
| --prompt, -p | No | Custom prompt (default: "请详细描述这张图片的内容" / "Please describe this image in detail") |
| --model, -m | No | Model name (default: glm-4.6v) |
| --temperature, -t | No | Sampling temperature 0-1 (default: 0.8) |
| --top-p | No | Nucleus sampling 0.01-1.0 (default: 0.6) |
| --max-tokens | No | Max output tokens (default: 1024, max 32768) |
| --thinking | No | Enable thinking/reasoning mode |
| --output, -o | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
| --stream | No | Enable streaming output |

Note: --images, --videos, and --files are mutually exclusive per API limits.
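
For readers who want to see how such an interface could be declared, the sketch below mirrors the options and defaults listed above using `argparse`. It is an illustration only, not the parser in scripts/glmv_caption.py: the function name `build_parser` is hypothetical, and range checks such as the 32768 token ceiling are left out.

```python
import argparse


def build_parser():
    """Declare a CLI that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="glmv_caption.py")

    # Exactly one input type must be given; each accepts one or more values.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--images", "-i", nargs="+", metavar="IMG")
    source.add_argument("--videos", "-v", nargs="+", metavar="VID")
    source.add_argument("--files", "-f", nargs="+", metavar="FILE")

    parser.add_argument("--prompt", "-p", default="Please describe this image in detail")
    parser.add_argument("--model", "-m", default="glm-4.6v")
    parser.add_argument("--temperature", "-t", type=float, default=0.8)
    parser.add_argument("--top-p", type=float, default=0.6)
    parser.add_argument("--max-tokens", type=int, default=1024)
    parser.add_argument("--thinking", action="store_true")
    parser.add_argument("--output", "-o")
    parser.add_argument("--pretty", action="store_true")
    parser.add_argument("--stream", action="store_true")
    return parser
```

With a mutually exclusive group, combining `--images` with `--files` is rejected at parse time, before any network call is made.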

Response Format

{
  "success": true,
  "caption": "A landscape photo showing a mountain range at sunset...",
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 256,
    "total_tokens": 384
  }
}

Key fields:

  • success — whether the request succeeded
  • caption — the generated caption text
  • usage — token usage statistics
  • warning — present when content was blocked by safety review
  • error — error details on failure
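
As a rough illustration of how a caller might branch on these fields, consider the sketch below. The function name `classify_result` is hypothetical, and checking `warning` before `success` is an assumption, since the text does not say whether a filtered response reports success. The raw output should still be shown to the user in full. This only decides what to do next.

```python
import json


def classify_result(raw):
    """Return "filtered", "failed", or "ok" for the script's JSON output."""
    result = json.loads(raw)
    if "warning" in result:
        # Content was blocked by safety review.
        return "filtered"
    if not result.get("success"):
        return "failed"
    return "ok"
```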

Error Handling

API key not configured:

ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path

Content filtered: warning field present → content blocked by safety review
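
The HTTP status cases above can be summarized as a small lookup. This is a sketch with a hypothetical function name, `guidance_for_status`. How the script actually surfaces status codes is not documented here.

```python
def guidance_for_status(status_code):
    """Map an HTTP status code to the user guidance described above."""
    if status_code in (401, 403):
        return "API key invalid or expired: reconfigure ZHIPU_API_KEY"
    if status_code == 429:
        return "Quota exhausted: wait before retrying"
    # Anything else: show the error as-is and stop, with no fallback.
    return f"Unexpected HTTP status {status_code}: show the error and stop"
```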

Version History

2 versions in total

  • v1.0.3 (current)
    2026-05-03 03:28
  • v1.0.1
    2026-03-30 14:45

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
