Lipsyncvideo Ai

Match audio tracks to lip movements in your videos. lipsyncvideo-ai uploads your clip to a cloud GPU, syncs the audio you provide to the speaker's mouth, and...
mory128
Uncategorized clawhub v1.0.1 100000 Key: required
★ 0 Stars
📥 320 Downloads
💾 0 Installs

Overview

Getting Started

> LipSync Video AI is ready. Upload your video and audio, or describe what you need synced.

Try saying:

  • "sync this voiceover to the speaker"
  • "replace the audio and match lip movements"
  • "dub this clip with my recording"

Initial Setup

On first run, the skill connects to the processing backend and shows a brief "Getting ready..." message.

Token: Check for NEMO_TOKEN in the environment. If it is present, go straight to session setup (step 2); otherwise start at step 1.

  1. Grab a free token: Generate a UUID to use as the client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header set to that UUID. The response's data.token is your auth token (100 credits, good for 7 days).
  2. Start a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and the body {"task_name":"project","language":""}. Save the returned session_id for later calls.

Raw JSON and tokens stay hidden from the user.
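
The setup steps above can be sketched in Python as follows. This is a minimal sketch, not the skill's actual implementation: the helper names (build_token_request, build_session_request, send, get_token) are hypothetical, and the Content-Type header is an assumption that the text does not state. Requests are described as plain dicts so they can be inspected before anything is sent.

```python
import json
import os
import uuid
import urllib.request

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id=None):
    """Describe the anonymous-token call; a fresh UUID is the client id."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id or str(uuid.uuid4())},
        "body": None,
    }


def build_session_request(token):
    """Describe the session-creation call for an existing token."""
    payload = {"task_name": "project", "language": ""}
    return {
        "method": "POST",
        "url": f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not stated in the text
        },
        "body": json.dumps(payload).encode("utf-8"),
    }


def send(request):
    """Send a request description with urllib and decode the JSON reply."""
    http_request = urllib.request.Request(
        request["url"],
        data=request["body"],
        headers=request["headers"],
        method=request["method"],
    )
    with urllib.request.urlopen(http_request) as response:
        return json.load(response)


def get_token(send_fn=send):
    """Reuse NEMO_TOKEN when set, otherwise fetch an anonymous token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    return send_fn(build_token_request())["data"]["token"]
```

Passing send_fn makes the token logic testable without network access. Where session_id sits in the session response is not specified here, so the sketch leaves that extraction to the caller.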

Sync Audio to Lip Movements in Your Clips

Upload your video with the audio you want synced. Cloud GPUs do the heavy lifting — no local processing.

Here is how it works in practice: I had a training video where the speaker's mic died halfway through. I recorded a clean voiceover separately, uploaded both files, typed "sync the new audio to match the speaker's mouth movements", and got a clean result in about 75 seconds. The output is a 1080p MP4.

Pro tip: shorter clips give tighter sync. If you have a long video, consider breaking it into segments first.

Request Categories

Your input gets matched to the right processing path automatically.

| You type... | Goes to... | Uses SSE? |
|---|---|---|
| "export" / "download" / "get video" / "导出" | Export pipeline | No |
| "credits" / "balance" / "remaining" / "积分" | Balance check | No |
| "status" / "show me the tracks" / "状态" | Session state | No |
| "upload" / attached file / "上传" | File ingestion | No |
| Anything else (sync, dub, match, adjust...) | SSE processing | Yes |
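
The routing table above could be approximated like this. The matching rule itself is not documented, so case-insensitive substring matching in table order is an assumption, and route_request and the route labels are hypothetical names.

```python
# Keyword rows in the same order as the table; first match wins.
ROUTES = [
    ("export", ("export", "download", "get video", "导出")),
    ("balance", ("credits", "balance", "remaining", "积分")),
    ("state", ("status", "show me the tracks", "状态")),
    ("upload", ("upload", "上传")),
]


def route_request(text, has_attachment=False):
    """Return the processing path for a user message."""
    lowered = text.lower()
    for route, keywords in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return route
    if has_attachment:
        # An attached file with no other keyword counts as file ingestion.
        return "upload"
    return "sse"  # anything else (sync, dub, match, adjust...)


def uses_sse(route):
    """Only the catch-all processing path streams over SSE."""
    return route == "sse"
```

For example, route_request("sync this voiceover to the speaker") falls through every keyword row and lands on the SSE path.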

Backend Architecture

Files go to a GPU farm for processing. Output is encoded at 8Mbps for 1080p. Lip sync boundaries are frame-level accurate.

Required on every request: an Authorization: Bearer header carrying your token, plus the attribution headers X-Skill-Source, X-Skill-Version, and X-Skill-Platform. If attribution is missing, export fails with 402.

Attribution comes from this file's YAML frontmatter: X-Skill-Source is lipsyncvideo-ai, X-Skill-Version is the version given in the frontmatter, and X-Skill-Platform depends on the install location (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, otherwise unknown).
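
A sketch of how the attribution headers might be assembled. It assumes the install-location rule means "anything under ~/.clawhub/ is clawhub, anything under ~/.cursor/skills/ is cursor"; detect_platform and build_headers are hypothetical helper names, and the version string is passed in rather than parsed from the frontmatter.

```python
from pathlib import Path

SKILL_SOURCE = "lipsyncvideo-ai"


def _is_under(path, root):
    """True when path is root itself or lies somewhere below it."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def detect_platform(install_path, home=None):
    """Map the skill's install location to an X-Skill-Platform value."""
    home = Path(home) if home is not None else Path.home()
    path = Path(install_path).expanduser()
    if _is_under(path, home / ".clawhub"):
        return "clawhub"
    if _is_under(path, home / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def build_headers(token, version, install_path, home=None):
    """Headers the text says are required on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path, home),
    }
```

The home parameter exists only so the mapping can be checked against fixed paths; in normal use it defaults to the current user's home directory.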

Root URL: https://mega-api-prod.nemovideo.ai

New session: POST /api/tasks/me/with-session/nemo_agent with {"task_name":"project","language":""}. Returns task_id, session_id.

SSE message: POST /run_sse with {"app_name":"nemo_agent","user_id":"me","session_id":"","new_message":{"parts":[{"text":""}]}} and Accept: text/event-stream. Cap: 15 min.
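
As an illustration, the SSE message request could be built like this. The payload shape and the 15-minute cap come from the line above; build_sse_request is a hypothetical name, the Content-Type header is an assumption, and the attribution headers still need to be merged in by the caller.

```python
import json

BASE_URL = "https://mega-api-prod.nemovideo.ai"
SSE_CAP_SECONDS = 15 * 60  # "Cap: 15 min"


def build_sse_request(token, session_id, text):
    """Describe a /run_sse call carrying one user message."""
    payload = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    return {
        "method": "POST",
        "url": f"{BASE_URL}/run_sse",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Accept": "text/event-stream",
            "Content-Type": "application/json",  # assumed, not stated in the text
        },
        "body": json.dumps(payload).encode("utf-8"),
        "timeout": SSE_CAP_SECONDS,
    }
```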

File upload: POST /api/upload-video/nemo_agent/me/ — multipart (-F "files=@/path") or URL mode ({"urls":[""],"source_type":"url"}).

Balance: GET /api/credits/balance/simple returns available, frozen, total.

State: GET /api/state/nemo_agent/me//latest — check data.state.draft, data.state.video_infos, data.state.generated_media.

Export (free): POST /api/render/proxy/lambda with {"id":"render_","sessionId":"","draft":,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/ every 30s. Done when status = completed. File at output.url.
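
The export polling loop might look like the sketch below. The 30-second interval, the completed status, and output.url come from the line above; wait_for_export, the fetch_status callable, and the max_polls limit are hypothetical, and the sketch assumes fetch_status returns the job object directly. No failure status is documented, so none is handled.

```python
import time


def wait_for_export(fetch_status, interval_s=30, max_polls=40, sleep=time.sleep):
    """Poll a render job until it completes and return the file URL.

    fetch_status is any zero-argument callable that performs the GET
    and returns the decoded job object as a dict.
    """
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError("export did not complete within the polling budget")
```

Injecting fetch_status and sleep keeps the loop independent of any particular HTTP client and lets it be exercised without real waiting.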

Handles: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
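
A quick pre-upload check against the format list above could look like this. It only inspects the file extension, which is a simplification; is_supported is a hypothetical helper name.

```python
import os

# Extensions copied from the "Handles" list.
SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",   # video
    "jpg", "png", "gif", "webp",          # images
    "mp3", "wav", "m4a", "aac",           # audio
}


def is_supported(filename):
    """Return True when the file extension is on the supported list."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```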

Errors

| Code | Means | Fix |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad token | Re-authenticate via anonymous-token endpoint |
| 1002 | No session | Make a new one |
| 2001 | No credits left | Anonymous: share registration link with ?bind=. Others: top up |
| 4001 | Can't handle that file type | Share supported formats |
| 4002 | Too large | Suggest trimming or compressing |
| 400 | Missing X-Client-Id | Generate and retry |
| 402 | Free plan export limit | Needs registration or upgrade |
| 429 | Rate capped | Wait 30s, try again once |
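
One way to act on the error table is a lookup plus a single retry for 429. This is a sketch under assumptions: the action labels and function names are hypothetical, and it treats the application codes and the HTTP-style codes (400, 402, 429) as one namespace, which the real API may not do.

```python
import time

# Code -> short action label, following the table row by row.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "new_session",
    2001: "out_of_credits",
    4001: "unsupported_file",
    4002: "file_too_large",
    400: "missing_client_id",
    402: "export_limit",
    429: "retry_once_after_30s",
}


def next_action(code):
    """Look up what to do for a response code."""
    return ERROR_ACTIONS.get(code, "unknown")


def call_with_rate_limit_retry(call, sleep=time.sleep):
    """Run call(); on 429 wait 30 seconds and try exactly once more.

    call is a zero-argument callable that returns the response code.
    """
    code = call()
    if code == 429:
        sleep(30)
        code = call()
    return code
```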

Converting GUI Instructions

Backend outputs reference a visual interface. Convert them:

| Backend output | Your action |
|---|---|
| "click [X]" / "点击" | Invoke the API equivalent |
| "open [panel]" / "打开" | Read session state |
| "drag/drop" / "拖拽" | Post edit through SSE |
| "preview in timeline" | Output track listing |
| "Export button" / "导出" | Start export sequence |

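The GUI conversion table above can be sketched as a keyword lookup. How the skill actually detects these phrases is not documented; substring matching, the action labels, and translate_gui are assumptions. The two specific phrases are checked before the generic "click" and "open" so that "click the Export button" starts an export.

```python
# (keywords, action) pairs; more specific phrases come first.
GUI_RULES = [
    (("export button", "导出"), "start_export"),
    (("preview in timeline",), "list_tracks"),
    (("click", "点击"), "invoke_api"),
    (("open", "打开"), "read_state"),
    (("drag", "drop", "拖拽"), "post_edit_via_sse"),
]


def translate_gui(backend_text):
    """Return the action for a GUI-style instruction, or None if none applies."""
    lowered = backend_text.lower()
    for keywords, action in GUI_RULES:
        if any(keyword in lowered for keyword in keywords):
            return action
    return None
```

Text that returns None contains no GUI reference and can be forwarded to the user unchanged.
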
How SSE Works

Forward text events to user (after GUI translation). Absorb tool calls. Heartbeat and empty data lines = still processing. Every 2 minutes of quiet, say "Hang on, still processing..."

About 30% of edit ops return no text. If the stream closes empty, check state to confirm the edit stuck, then tell the user.
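
The stream handling described above might be sketched as follows. The event payload schema is not given here, so extract_text is a hypothetical caller-supplied function that returns user-facing text, or None for tool calls; the "data:" line prefix is the standard SSE framing and is assumed to apply. Events are passed in as (timestamp, line) pairs so the 2-minute quiet rule can be checked without real waiting.

```python
QUIET_NOTICE = "Hang on, still processing..."


def relay_stream(events, extract_text, quiet_after_s=120):
    """Turn raw SSE lines into user-facing messages.

    events: iterable of (timestamp_seconds, raw_line) pairs.
    Returns (messages, got_text). When got_text is False the stream
    closed empty and the caller should check session state.
    """
    messages = []
    got_text = False
    last_output = None
    for timestamp, line in events:
        if last_output is None:
            last_output = timestamp
        payload = line[5:].strip() if line.startswith("data:") else ""
        text = extract_text(payload) if payload else None
        if text:
            messages.append(text)
            got_text = True
            last_output = timestamp
        elif timestamp - last_output >= quiet_after_s:
            # Heartbeats and empty data lines mean "still processing".
            messages.append(QUIET_NOTICE)
            last_output = timestamp
    return messages, got_text
```

One limitation of this sketch: the quiet notice is only emitted when a line arrives, so it relies on the backend sending heartbeats during long operations.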

Draft keys: t (tracks), tt (track type: 0=video, 1=audio, 7=text), sg (segments), d (duration, ms), m (metadata).

Timeline (2 tracks):

  1. Video: interview clip (0-45s)
  2. Audio: dubbed voiceover (0-45s)
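
To illustrate the draft keys, here is a sketch that turns a draft into a track listing. Only the key names and track type codes come from the text; the nesting (a list of tracks under t, each with tt and a list of segments under sg, each segment with d in milliseconds) is an assumption, as is the describe_timeline name. It reports total duration per track rather than clip names, since the layout of m is not described.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def describe_timeline(draft):
    """Render a plain-text track listing from a draft dict."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        # Sum segment durations (ms) to get the track length.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"  {index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```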

Common Workflows

Basic lip sync: Upload video + audio, ask for sync. Done.

Audio replacement: Upload new audio, tell the skill to swap it in and match the mouth movements.

Multi-speaker: Works best when speakers take turns. For overlapping speech, split into separate segments first.

FAQ

How accurate is the sync? Frame-level for clear speech. Mumbling or fast-talking may be slightly off.

What audio formats? MP3, WAV, M4A, AAC all work.

File size limit? 500MB. Compress if you're over.

Cost? First 100 operations free. No signup required.

Version History

1 version total

  • v1.0.1 (current)
    2026-05-07 13:51 Security: Safe

Security Scan

No security scan report available yet.