Minimax Image Understanding

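Uses a multimodal large model to understand image content and generate a description of its business meaning. Supported models: (1) MiniMax VLM, (2) OpenAI GPT-4V, (3) Claude Vision. Intended for screenshots, charts, photographed documents, and similar images, producing an accurate text description.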

aidescend

Content creation | clawhub | v1.0.0 | 1 version | 100000 | Key: required


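Overview

Image understanding

Calls a multimodal large model to understand an image and generate an accurate, business-oriented description.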

Supported models

Model	Environment variables	Notes
------	----------	------
MiniMax VLM	`MINIMAX_API_KEY`, `MINIMAX_API_HOST`	Default; recommended for Chinese-language content
OpenAI	`OPENAI_API_KEY`	GPT-4V
Anthropic	`ANTHROPIC_API_KEY`	Claude Vision

Usage

Prerequisites

Set the environment variables for the model you want to use (at least one model must be configured):

# MiniMax (default)
export MINIMAX_API_KEY="your-minimax-key"
export MINIMAX_API_HOST="https://api.minimaxi.com"

# or OpenAI
export OPENAI_API_KEY="your-openai-key"

# or Anthropic
export ANTHROPIC_API_KEY="your-anthropic-key"
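
Before calling the script, it can help to check which models are actually configured. The following is a minimal sketch and not part of the skill: `REQUIRED_ENV` and `configured_models` are hypothetical names, and it assumes that every variable listed for a model in the table above must be non-empty (whether `MINIMAX_API_HOST` is strictly required is an assumption).

```python
import os

# Environment variables per model, as listed in the table above.
# Assumption: all listed variables must be set for the model to be usable.
REQUIRED_ENV = {
    "minimax": ("MINIMAX_API_KEY", "MINIMAX_API_HOST"),
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
}


def configured_models(env=None):
    """Return the model names whose variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        model
        for model, names in REQUIRED_ENV.items()
        if all(env.get(name) for name in names)
    ]


if __name__ == "__main__":
    models = configured_models()
    if models:
        print("Configured models:", ", ".join(models))
    else:
        print("No model is configured; set at least one API key.")
```

Passing a plain dictionary instead of reading `os.environ` makes the check easy to try without touching the real environment.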

Running the script

python3 <skill>/scripts/understand_image.py <image_path> [model] [prompt]

Parameters:

image_path: a local image file (PNG, JPG, JPEG, GIF, or WebP)
model (optional): minimax (default), openai, or anthropic
prompt (optional): a custom prompt
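
The positional parameters above could be handled roughly as follows. This is an illustrative sketch only, not the actual contents of `understand_image.py`: the `parse_args` function, the constants, and the error messages are hypothetical, and validating the file extension up front is an assumption about how the script behaves.

```python
from pathlib import Path

# Values taken from the parameter list above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
SUPPORTED_MODELS = ("minimax", "openai", "anthropic")
DEFAULT_MODEL = "minimax"


def parse_args(argv):
    """Parse `<image_path> [model] [prompt]` into (path, model, prompt).

    `prompt` is None when it is omitted.
    """
    if not 1 <= len(argv) <= 3:
        raise ValueError("usage: understand_image.py <image_path> [model] [prompt]")
    image_path = Path(argv[0])
    if image_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {image_path.suffix or '(none)'}")
    model = argv[1] if len(argv) > 1 else DEFAULT_MODEL
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model: {model}")
    prompt = argv[2] if len(argv) > 2 else None
    return image_path, model, prompt


# Only the image path is given, so the model falls back to the default.
print(parse_args(["/path/to/image.png"]))
```

Because the arguments are positional, a custom prompt can only be passed when the model is also given explicitly.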

Examples

# Use the default model (MiniMax)
python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png

# Specify a model
python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png openai

# Use a custom prompt
python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png minimax "Describe the data trends in the chart"
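
The same invocations can be driven from Python. The sketch below is a hypothetical wrapper, not something shipped with the skill: `build_command` and `understand_image` are illustrative names, the script path is copied from the examples above, and it assumes the script writes its description to standard output and exits with a non-zero status on failure.

```python
import subprocess
import sys
from pathlib import Path

# Script location taken from the examples above; adjust it to your installation.
SCRIPT = Path(
    "~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py"
).expanduser()


def build_command(image_path, model=None, prompt=None):
    """Build the argv list for `understand_image.py <image_path> [model] [prompt]`."""
    if prompt is not None and model is None:
        # Arguments are positional, so a prompt needs an explicit model.
        model = "minimax"
    # sys.executable stands in for `python3` (the current interpreter).
    cmd = [sys.executable, str(SCRIPT), str(image_path)]
    if model is not None:
        cmd.append(model)
    if prompt is not None:
        cmd.append(prompt)
    return cmd


def understand_image(image_path, model=None, prompt=None):
    """Run the script and return its standard output as text (assumed behavior)."""
    result = subprocess.run(
        build_command(image_path, model, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```

With the environment variables set, a call such as `understand_image("/path/to/image.png", "openai")` would then return the generated description as a string.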

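Output

The script prints a description of what the image means in business terms. It no longer enumerates elements and their positions; instead it focuses on the data content and the business logic.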

Version history

1 version in total

v1.0.0 (current)

2026-03-30 09:23 Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
