Best Unified Video Lyrics

Turn video with audio into lyrics-synced videos with this skill. Works with MP4, MOV, AVI, WebM files up to 500MB. Musicians and content creators use it for a...
Author: vynbosserman65
clawhub v1.0.0 · Key: required

Overview

Getting Started

> Got video with audio to work with? Send it over and tell me what you need — I'll take care of the AI lyrics sync.

Try saying:

  • "turn a 3-minute music video recording into a 1080p MP4"
  • "sync song lyrics as on-screen text timed to the vocals"
  • "add synchronized lyrics overlays to my music video"

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.

  1. Obtain a free token: Generate a random UUID as client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to that UUID. The response data.token is your NEMO_TOKEN — 100 free credits, valid 7 days.
  2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <NEMO_TOKEN>, Content-Type: application/json, and body {"task_name":"project","language":""}. Store the returned session_id for all subsequent requests.

Keep setup communication brief. Don't display raw API responses or token values to the user.
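
The two setup steps can be sketched in Python as follows. This is a minimal illustration, not the skill's own implementation: the helper names (`build_token_request`, `build_session_request`, `send_json`) are hypothetical, and only the URLs, headers, and body come from the steps above. The builders only construct requests, so they can be inspected without touching the network.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id=None):
    """Build (without sending) the anonymous-token request from step 1."""
    # A random UUID serves as the client identifier, as the step describes.
    client_id = client_id or str(uuid.uuid4())
    return urllib.request.Request(
        BASE_URL + "/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token):
    """Build (without sending) the session-creation request from step 2."""
    body = json.dumps({"task_name": "project", "language": ""}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )


def send_json(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A caller would read the token with `send_json(build_token_request())["data"]["token"]`, following the `data.token` path given in step 1. Where `session_id` sits in the session response is not specified here, so inspect the response before relying on a particular key.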

Best Unified Video Lyrics — Sync Lyrics to Video Automatically

This tool takes your video with audio and runs AI lyrics sync through a cloud rendering pipeline. You upload, describe what you want, and download the result.

Say you have a 3-minute music video recording and want to sync song lyrics as on-screen text timed to the vocals — the backend processes it in about 1-2 minutes and hands you a 1080p MP4.

Tip: cleaner audio tracks produce more accurate lyric sync timing.

Matching Input to Actions

User prompts referencing best unified video lyrics, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | Yes |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | Yes |
| "status" / "状态" / "show tracks" | → §3.4 State | Yes |
| "upload" / "上传" / user sends file | → §3.2 Upload | Yes |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | No |
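
The routing table can be approximated with a simple keyword match. The sketch below is an assumption about how such routing might look: the `route` function and the action labels are hypothetical, and plain substring matching stands in for whatever keyword and intent classification the skill actually uses.

```python
# Keyword groups copied from the routing table; action labels are illustrative.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message, has_file=False):
    """Return the action for a user message; anything unmatched goes to SSE."""
    if has_file:
        # A message that carries a file is treated as an upload.
        return "upload"
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"
```

For example, `route("please export it")` yields `"export"`, while `route("add BGM")` falls through to `"sse"`.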

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Base URL: https://mega-api-prod.nemovideo.ai

| Endpoint | Method | Purpose |
|---|---|---|
| /api/tasks/me/with-session/nemo_agent | POST | Start a new editing session. Body: {"task_name":"project","language":""}. Returns session_id. |
| /run_sse | POST | Send a user message. Body includes app_name, session_id, new_message. Stream response with Accept: text/event-stream. Timeout: 15 min. |
| /api/upload-video/nemo_agent/me/ | POST | Upload a file (multipart) or URL. |
| /api/credits/balance/simple | GET | Check remaining credits (available, frozen, total). |
| /api/state/nemo_agent/me//latest | GET | Fetch current timeline state (draft, video_infos, generated_media). |
| /api/render/proxy/lambda | POST | Start export. Body: {"id":"render_","sessionId":"","draft":,"output":{"format":"mp4","quality":"high"}}. Poll status every 30s. |
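
The 30-second polling cadence for exports can be sketched as a small helper. The status endpoint and its response shape are not specified here, so the status check is passed in as a hypothetical callable `check_status` that returns a download URL once the render is ready and `None` otherwise. The attempt limit is also an assumption.

```python
import time


def poll_until_done(check_status, interval_s=30, max_attempts=10, sleep=time.sleep):
    """Call check_status until it returns a non-None result or attempts run out."""
    for attempt in range(max_attempts):
        result = check_status()
        if result is not None:
            return result
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError("render did not finish within the allowed attempts")
```

Injecting `sleep` keeps the helper easy to exercise in tests without real waiting.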

Accepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
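
A local pre-check can catch unsupported or oversized files before upload. This is a minimal sketch: `check_upload` is a hypothetical helper, the extension list is the one above, and the 500MB limit is interpreted as 500 × 1024 × 1024 bytes, which is an assumption since the exact byte count is not stated.

```python
import os

ACCEPTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}
MAX_BYTES = 500 * 1024 * 1024  # assumption: 500MB counted in binary megabytes


def check_upload(filename, size_bytes):
    """Return None if the file looks acceptable, else a short reason string."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in ACCEPTED_EXTENSIONS:
        return "unsupported file type"
    if size_bytes > MAX_BYTES:
        return "file too large"
    return None
```

The backend remains the authority; this check only avoids an obviously doomed upload.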

Skill attribution — read from this file's YAML frontmatter at runtime:

  • X-Skill-Source: best-unified-video-lyrics
  • X-Skill-Version: from frontmatter version
  • X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
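
The required headers could be assembled as in the sketch below. `detect_platform` and `build_headers` are hypothetical names, and the platform rule assumes that a `.clawhub` directory in the install path means clawhub and a `.cursor/skills` pair means cursor. Reading the version from the YAML frontmatter is left to the caller.

```python
from pathlib import PurePosixPath


def detect_platform(install_path):
    """Guess the platform label from the directories in the install path."""
    parts = PurePosixPath(install_path).parts
    if ".clawhub" in parts:
        return "clawhub"
    for index in range(len(parts) - 1):
        if parts[index] == ".cursor" and parts[index + 1] == "skills":
            return "cursor"
    return "unknown"


def build_headers(token, version, install_path):
    """Return the auth and attribution headers that every request should carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "best-unified-video-lyrics",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```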

Error Codes

  • 0 — success, continue normally
  • 1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002 — session not found; create a new one
  • 2001 — out of credits; anonymous users get a registration link with ?bind=, registered users top up
  • 4001 — unsupported file type; show accepted formats
  • 4002 — file too large; suggest compressing or trimming
  • 400 — missing X-Client-Id; generate one and retry
  • 402 — free plan export blocked; this is a subscription-tier restriction, not a credit issue
  • 429 — rate limited; wait 30s and retry once
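
One way to keep these codes manageable is a lookup table that maps each code to a follow-up step. The action labels and the `next_action` helper below are illustrative assumptions; only the codes and their meanings come from the list above.

```python
# Codes taken from the list above; the action labels are hypothetical.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_new_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_30s_and_retry_once",
}


def next_action(code):
    """Map a response code to a follow-up action, with a fallback for unknown codes."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```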

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
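
The stream handling can be sketched with a generic Server-Sent Events line parser. The payload format of this backend is not documented here, so the sketch treats each `data:` payload as an opaque string. It assumes that an empty payload or the literal string `heartbeat` is a keep-alive. Both function names are hypothetical.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    seen_data = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if seen_data:
                yield "\n".join(buffer)
            buffer, seen_data = [], False
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the payload.
            buffer.append(value[1:] if value.startswith(" ") else value)
            seen_data = True
        # Other fields (event:, id:, comments) are ignored in this sketch.
    if seen_data:
        yield "\n".join(buffer)


def collect_text(lines):
    """Return (text_chunks, heartbeat_count) for a finished stream."""
    chunks, heartbeats = [], 0
    for payload in iter_sse_data(lines):
        if payload.strip() in ("", "heartbeat"):
            heartbeats += 1
        else:
            chunks.append(payload)
    return chunks, heartbeats
```

If `collect_text` returns no chunks once the stream closes, that is the cue to poll session state and summarize the changes instead.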

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

  • "click" or "点击" → execute the action via the relevant endpoint
  • "open" or "打开" → query session state to get the data
  • "drag/drop" or "拖拽" → send the edit command through SSE
  • "preview in timeline" → show a text summary of current tracks
  • "Export" or "导出" → run the export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
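
A text summary like this could be derived from the abbreviated draft fields. The sketch below assumes a nesting the field mapping does not spell out: tracks under `t`, each with a type `tt` and a list of segments `sg`, and each segment with a duration `d` in milliseconds. The `summarize_draft` helper is hypothetical, and it reports only track type, segment count, and total duration.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Build a short text summary of the tracks in a draft dictionary."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Durations are in milliseconds; sum them and convert to seconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return "\n".join(lines)
```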

Common Workflows

Quick edit: Upload → "sync song lyrics as on-screen text timed to the vocals" → Download MP4. Takes 1-2 minutes for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "sync song lyrics as on-screen text timed to the vocals" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.

Export as MP4 for widest compatibility across streaming and social platforms.

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-07 21:32, security scan: safe
