
see-video

Use when the user sends a video file or asks about video content. Extracts frames and injects them as an image grid directly into the LLM context, with no proxy needed.
Author: john-ver (source)
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · Key: not required
Stars: 0 · Downloads: 419 · Installs: 0 · Versions: 1 · Tag: #latest

Overview

see-video

Extract frames from a video and inject them as a grid image + XML timestamps into LLM context.

Setup (first time only)

cd <skill directory>
npm install

Usage

node {baseDir}/scripts/inject.mjs <video_path> [--mode uniform|highlight] [--start N] [--end N]

On success, outputs JSON to stdout:

{
  "gridPath": "/tmp/video_llm-frames.jpg",
  "description": "<video_frames>...</video_frames>",
  "duration": 1326,
  "frameCount": 28,
  "layout": { "cols": 4, "rows": 7, "cellW": 384, "cellH": 216 },
  "videoWidth": 854,
  "videoHeight": 480,
  "inputSizeMb": 42.3
}
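
A minimal sketch of consuming this output from JavaScript, assuming the script prints exactly one JSON object to stdout. The helper name parseInjectOutput is hypothetical and not part of the skill; only the field names and sample values come from the output shown above.

```javascript
// Hypothetical helper: parse the JSON that inject.mjs prints on success
// and check that the two fields needed for injection are present.
function parseInjectOutput(stdout) {
  const result = JSON.parse(stdout);
  for (const key of ["gridPath", "description"]) {
    if (typeof result[key] !== "string" || result[key] === "") {
      throw new Error(`inject.mjs output is missing "${key}"`);
    }
  }
  return result;
}

// Sample values copied from the documented output.
const sample = JSON.stringify({
  gridPath: "/tmp/video_llm-frames.jpg",
  description: "<video_frames>...</video_frames>",
  duration: 1326,
  frameCount: 28,
  layout: { cols: 4, rows: 7, cellW: 384, cellH: 216 },
});

const { gridPath, frameCount, layout } = parseInjectOutput(sample);
// In the sample, cols * rows equals frameCount (4 * 7 = 28).
console.log(gridPath, frameCount, layout.cols * layout.rows);
```

Failing early on a missing gridPath or description keeps a malformed run from reaching the image injection step.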

If the video exceeds 10 minutes and uniform mode was used without --start/--end, a hint field is included:

{
  "hint": "Video is 30 minutes long. This is a uniform overview. For better scene coverage re-run with --mode highlight, or use --start/--end to zoom into a specific section."
}

Recommended workflow for long videos:

  1. First run with --mode highlight — shows key scene changes across the whole video
  2. If the user wants detail on a specific section, re-run with --start N --end N
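
One way to turn the hint and the workflow above into a decision is sketched below. The function followUpFlags and its arguments are hypothetical; it assumes result is the parsed JSON from a previous run and section is an optional range in seconds that the user asked about.

```javascript
// Hypothetical helper: choose extra CLI flags for a follow-up run.
// `result`  - parsed JSON from a previous run of inject.mjs
// `section` - optional { start, end } in seconds requested by the user
function followUpFlags(result, section) {
  if (section) {
    // Zoom into the requested part of the video.
    return ["--start", String(section.start), "--end", String(section.end)];
  }
  // `hint` is only present for long videos that were sampled uniformly.
  if (typeof result.hint === "string") {
    return ["--mode", "highlight"];
  }
  return []; // nothing to refine
}

console.log(followUpFlags({ hint: "Video is 30 minutes long." }));
console.log(followUpFlags({}, { start: 600, end: 720 }));
```

The returned flags are meant to be appended after the video path in the usage line shown earlier.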

On error, the script writes an ERROR: line and a Hint: line to stderr and exits with status 1.
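
A small sketch of splitting that stderr text into its two parts. It assumes each message sits on its own line and starts with the literal prefix ERROR: or Hint:. The exact layout is not specified here, and both the helper name parseScriptError and the sample text are made up for illustration.

```javascript
// Hypothetical helper: pull the ERROR: and Hint: messages out of stderr.
// Assumes one message per line, each starting with its prefix.
function parseScriptError(stderr) {
  const out = { error: null, hint: null };
  for (const line of stderr.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed.startsWith("ERROR:")) {
      out.error = trimmed.slice("ERROR:".length).trim();
    } else if (trimmed.startsWith("Hint:")) {
      out.hint = trimmed.slice("Hint:".length).trim();
    }
  }
  return out;
}

// Made-up stderr text, used only to exercise the parser.
const parsedError = parseScriptError(
  "ERROR: Input file not found\nHint: Share the file path as text\n"
);
console.log(parsedError.error);
console.log(parsedError.hint);
```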

Injection procedure

Step 1 — Run the script (bash tool):

node {baseDir}/scripts/inject.mjs "/path/to/video.mp4"

Step 2 — Parse JSON:

Extract gridPath and description.

Step 3 — Inject image (read tool):

read <gridPath>

The read tool injects the jpg as a native multimodal image block into context.

After viewing the grid, use the description XML timestamps to reference frames:

> "Look at the grid image above. Use the timestamps in the description XML to analyze the video. The number in the top-left of each cell is the frame index."

On error:

  • Translate the Hint: message into natural language for the user. Do not paste raw error output.
  • If read fails — /tmp/ files are ephemeral. Re-run the script and read immediately.
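
Steps 1 and 2 can also be driven from Node.js rather than a bash tool. The following is a sketch under stated assumptions, not the skill's own code: buildCommand and runInject are hypothetical names, and baseDir stands for the skill directory.

```javascript
// Build the command line for Step 1. `baseDir` is the skill directory.
function buildCommand(baseDir, videoPath, extraFlags = []) {
  return ["node", [`${baseDir}/scripts/inject.mjs`, videoPath, ...extraFlags]];
}

// Steps 1 and 2: run the script and return the parsed JSON.
async function runInject(baseDir, videoPath, extraFlags = []) {
  // Dynamic import keeps this sketch usable from ES modules and CommonJS.
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const [cmd, args] = buildCommand(baseDir, videoPath, extraFlags);
  try {
    const { stdout } = await promisify(execFile)(cmd, args);
    return JSON.parse(stdout);
  } catch (err) {
    // execFile rejects on a non-zero exit; stderr carries ERROR: and Hint:.
    throw new Error(`inject.mjs failed: ${err.stderr || err.message}`);
  }
}

// Example call (not executed here because it needs a real video file):
// runInject("/path/to/skill", "/path/to/video.mp4")
//   .then((result) => console.log(result.gridPath));
```

Passing the arguments as an array to execFile avoids shell quoting problems with paths that contain spaces. Because /tmp/ files are ephemeral, the returned gridPath should be read right after the call resolves.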

Options

| Option | Default | Description |
|---|---|---|
| --mode uniform | (default mode) | Evenly spaced frames |
| --mode highlight | | Scene-change biased sampling |
| --start N | 0 | Segment start (seconds) |
| --end N | end of video | Segment end (seconds) |
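
The options above can be assembled into an argument list with a little validation. This is a sketch: buildInjectArgs is a hypothetical helper, and the checks (a non-negative start, an end greater than the start) are assumptions about sensible input rather than rules documented by the script.

```javascript
// Hypothetical helper: turn an options object into CLI arguments.
// The validation rules are assumptions, not documented script behavior.
function buildInjectArgs(videoPath, { mode, start, end } = {}) {
  const args = [videoPath];
  if (mode !== undefined) {
    if (mode !== "uniform" && mode !== "highlight") {
      throw new RangeError(`unknown mode: ${mode}`);
    }
    args.push("--mode", mode);
  }
  if (start !== undefined) {
    if (!(start >= 0)) {
      throw new RangeError("--start must be a non-negative number of seconds");
    }
    args.push("--start", String(start));
  }
  if (end !== undefined) {
    if (!(end > (start ?? 0))) {
      throw new RangeError("--end must be greater than --start");
    }
    args.push("--end", String(end));
  }
  return args;
}

const injectArgs = buildInjectArgs("/path/to/video.mp4", {
  mode: "highlight",
  start: 600,
  end: 720,
});
console.log(injectArgs.join(" "));
// prints "/path/to/video.mp4 --mode highlight --start 600 --end 720"
```

Omitted options are left out of the argument list, so the script's own defaults apply.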

Diagnostics

| Error | Cause | Action |
|---|---|---|
| Input file not found | File missing or dropped by channel media size limit | Ask the user to share the file path directly as text |
| corrupt, incomplete, or unsupported format | Damaged file, interrupted transfer, or unsupported codec | Try a different file, or use --start/--end to skip problematic sections |
| moov atom not found | Incomplete mp4 (streaming not finished) | Retry with a complete file |
| ffmpeg not found | ffmpeg not installed | Check ffmpeg installation |
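
The diagnostics table can be encoded as a lookup. The sketch below assumes the error phrases appear as substrings of the stderr text, which is not guaranteed; DIAGNOSTICS and suggestAction are hypothetical names.

```javascript
// Error phrases and actions copied from the diagnostics table.
const DIAGNOSTICS = [
  { match: "Input file not found",
    action: "Ask the user to share the file path directly as text" },
  { match: "corrupt, incomplete, or unsupported format",
    action: "Try a different file, or use --start/--end to skip problematic sections" },
  { match: "moov atom not found",
    action: "Retry with a complete file" },
  { match: "ffmpeg not found",
    action: "Check ffmpeg installation" },
];

// Return the suggested action for the first matching phrase, or null.
// Matching is case-insensitive substring search (an assumption).
function suggestAction(stderr) {
  const text = stderr.toLowerCase();
  const hit = DIAGNOSTICS.find((d) => text.includes(d.match.toLowerCase()));
  return hit ? hit.action : null;
}

console.log(suggestAction("ERROR: moov atom not found"));
// prints "Retry with a complete file"
```

A null result means the message is not covered by the table, in which case the Hint: text is the best remaining guide.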

Notes

  • Frame count and cell size are determined automatically from video duration and aspect ratio
  • Grid is ~1500×1500px, cell long side 384–512px
  • Timestamps are in the description XML only, not overlaid on the image
  • Portrait and landscape videos both supported
  • Telegram users: if a video file is not attached to the message, check channels.telegram.mediaMaxMb in the OpenClaw config — the file may have been dropped at the channel level before reaching the agent
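
The grid geometry follows from the layout object in the JSON output. The sketch below assumes cells are packed with no padding, in row-major order, with a zero-based frame index. None of these are confirmed here, and the index drawn in each cell may start at a different number. gridSize and cellRect are hypothetical helpers.

```javascript
// Overall grid size in pixels, assuming cells are packed without padding.
function gridSize(layout) {
  return {
    width: layout.cols * layout.cellW,
    height: layout.rows * layout.cellH,
  };
}

// Pixel rectangle of one cell.
// Assumes row-major order and a zero-based frame index.
function cellRect(layout, index) {
  const count = layout.cols * layout.rows;
  if (!Number.isInteger(index) || index < 0 || index >= count) {
    throw new RangeError(`frame index out of range: ${index}`);
  }
  const col = index % layout.cols;
  const row = Math.floor(index / layout.cols);
  return {
    x: col * layout.cellW,
    y: row * layout.cellH,
    width: layout.cellW,
    height: layout.cellH,
  };
}

// Layout values taken from the sample JSON output.
const sampleLayout = { cols: 4, rows: 7, cellW: 384, cellH: 216 };
console.log(gridSize(sampleLayout)); // { width: 1536, height: 1512 }
console.log(cellRect(sampleLayout, 5)); // { x: 384, y: 216, width: 384, height: 216 }
```

With the sample layout, the result is 1536×1512 px, which matches the roughly 1500×1500 px grid described in the notes.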

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-07 04:49 · Safe · Safe

Security checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
