
Alibabacloud Video Translation

Video translation Skill based on Alibaba Cloud IMS (Intelligent Media Services). Supports subtitle extraction (ASR/OCR), translation, and speech synthesis.
Author: sdk-team | Source: clawhub | Version: v0.0.1 (latest) | Key: required

Overview

Video Translation Skill

One-click video translation powered by Alibaba Cloud IMS, supporting subtitle-level and speech-level translation.


Input Format Requirements

> IMPORTANT: Different APIs use different address formats!

API Address Format Reference

| API | Address Format | Example |
|---|---|---|
| SubmitIProductionJob (subtitle extraction) | oss:// format | oss://my-bucket/videos/test.mp4 |
| SubmitVideoTranslationJob (video translation) | HTTP URL format | https://my-bucket.oss-cn-shanghai.aliyuncs.com/videos/test.mp4 |

> Key: Subtitle extraction uses oss://, video translation uses HTTP URL!

User Input Handling

| User Input Type | Processing Method |
|---|---|
| HTTP URL format | Use directly for video translation; convert to oss:// if subtitle extraction is needed |
| oss:// format | Use directly for subtitle extraction; convert to HTTP URL for video translation |
| Local video | MUST ask for OSS upload path; save both formats after upload |
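
The table above amounts to a three-way classification of whatever the user typed. A minimal sketch of that classification follows; the function name `classify_input` and the returned labels are illustrative and not part of the Skill.

```python
def classify_input(value: str) -> str:
    """Classify a user-supplied video location.

    Returns "oss" for oss:// URIs, "http" for HTTP(S) URLs,
    and "local" for anything else (treated as a local file path).
    """
    lowered = value.strip().lower()
    if lowered.startswith("oss://"):
        return "oss"
    if lowered.startswith(("http://", "https://")):
        return "http"
    return "local"


# A "local" result means the agent must ask for an OSS upload path first.
print(classify_input("/Users/demo/videos/test.mp4"))
```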

Format Conversion Rules

oss:// format ⇄ HTTP URL format

oss://my-bucket/videos/test.mp4
    ⇄
https://my-bucket.oss-cn-shanghai.aliyuncs.com/videos/test.mp4

Conversion Formula:

  • oss://<bucket>/<object> ⇄ https://<bucket>.oss-<region>.aliyuncs.com/<object>
  • HTTP URL does not require signing, use Bucket domain format directly
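
The conversion rule can be sketched in Python as below. The helper names `oss_to_http` and `http_to_oss` are illustrative. The sketch assumes the caller already knows the bucket's region (an oss:// URI does not carry it) and that object keys need no URL encoding.

```python
from urllib.parse import urlparse


def oss_to_http(oss_uri: str, region: str) -> str:
    """oss://<bucket>/<object> -> https://<bucket>.oss-<region>.aliyuncs.com/<object>"""
    parsed = urlparse(oss_uri)
    if parsed.scheme != "oss" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an oss://<bucket>/<object> URI: {oss_uri!r}")
    return f"https://{parsed.netloc}.oss-{region}.aliyuncs.com{parsed.path}"


def http_to_oss(url: str) -> str:
    """https://<bucket>.oss-<region>.aliyuncs.com/<object> -> oss://<bucket>/<object>"""
    parsed = urlparse(url)
    bucket, sep, rest = parsed.netloc.partition(".oss-")
    if parsed.scheme not in ("http", "https") or not sep or not rest.endswith(".aliyuncs.com"):
        raise ValueError(f"not a Bucket-domain URL: {url!r}")
    return f"oss://{bucket}{parsed.path}"


print(oss_to_http("oss://my-bucket/videos/test.mp4", "cn-shanghai"))
# https://my-bucket.oss-cn-shanghai.aliyuncs.com/videos/test.mp4
```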

Local Video Processing Flow

User provides local video path
    │
    ├─ AskUserQuestion: "Please provide OSS upload path (format: oss://<bucket>/<path>/<filename>.mp4)"
    │
    ├─ User specifies upload path
    │   ├─ Check if Bucket exists
    │   ├─ Upload file: aliyun oss cp <local_path> <oss_path>
    │   ├─ Save oss:// format → for subtitle extraction
    │   └─ Save HTTP URL format → for video translation
    │
    └─ User does not specify path → STOP, user MUST provide upload path

Upload Command:

aliyun oss cp <local_path> oss://<bucket>/<path>/<filename>.mp4

Save both formats after upload:

Local: /Users/demo/videos/test.mp4
Uploaded to: oss://my-bucket/videos/test.mp4
    ├─ oss:// format: oss://my-bucket/videos/test.mp4 (for subtitle extraction)
    └─ HTTP URL: https://my-bucket.oss-cn-shanghai.aliyuncs.com/videos/test.mp4 (for video translation)

Execution Gate Checklist

> Strict Requirement: Agent MUST execute in phase order, cannot proceed without passing current phase!

Phase 0: Environment and Credential Check (HARD-GATE)

| Check Item | Command | Pass Condition | Failure Handling |
|---|---|---|---|
| CLI version | aliyun version | >= 3.3.1 | STOP, see cli-installation-guide.md |
| Credential status | aliyun configure list | Valid status | STOP, guide configuration |
| Plugin installation | aliyun configure set --auto-plugin-install true | Set | Auto-set |

> HARD-GATE: Cannot proceed with any subsequent operations without passing!
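
The `>= 3.3.1` check is easy to get wrong with string comparison ("3.10.0" sorts before "3.3.1" as text). A small sketch of a numeric comparison follows. It assumes the output of `aliyun version` contains a dotted `major.minor.patch` number; the function names are illustrative.

```python
import re

MIN_CLI_VERSION = (3, 3, 1)  # minimum version from the Phase 0 table


def parse_version(text: str) -> tuple:
    """Extract the first major.minor.patch triple from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def cli_version_ok(version_output: str) -> bool:
    """True when the reported CLI version meets the Phase 0 minimum."""
    return parse_version(version_output) >= MIN_CLI_VERSION
```

If the check returns False, the gate says to stop and point the user to cli-installation-guide.md rather than continue.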


Phase 1: Translation Mode Confirmation (BLOCKING)

AskUserQuestion: "Do you need subtitle translation (translate subtitles only) or speech translation (translate subtitles + replace voiceover)?"

┌─ Subtitle translation → NeedSpeechTranslate: false
└─ Speech translation → NeedSpeechTranslate: true

⚠️ No reply received → STOP, cannot proceed!

> DO NOT infer translation mode from input type!


Phase 2: Subtitle Processing Confirmation (BLOCKING)

AskUserQuestion: "Do you need to erase original subtitles from the video? Do you need to burn-in translated subtitles?"

⚠️ No reply received → STOP, cannot proceed!

Parameter Mapping:

| Feature | Parameter | Value |
|---|---|---|
| Erase original subtitles | DetextArea | "Auto" / coordinates / not set (no erasure) |
| Burn-in new subtitles | SubtitleConfig | config object / not set (no burn-in) |

Phase 3: Output Path Confirmation (Non-blocking)

| Condition | Processing Method |
|---|---|
| User explicitly specifies | Use user's path |
| User does not specify | Use default path and inform user |

Default Output Rules:

  • Bucket: Same bucket as input video
  • Directory: Same directory as input video
  • Filename: {source}_translated_{random8}.mp4
  • Example: oss://bucket/videos/demo.mp4 → oss://bucket/videos/demo_translated_a1b2c3d4.mp4

> DO NOT use shell variables, use Python: python3 -c "import random; print(''.join(random.choices('abcdefghijkmnpqrstuvwxyz23456789', k=8)))"
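
The default output rules can be combined into one helper, sketched below. It reuses the alphabet from the one-liner above. The function name `default_output_path` is illustrative, and the sketch assumes the input is given in oss:// format.

```python
import posixpath
import random

# Same alphabet as the one-liner above (it omits l, o, 0, and 1).
ALPHABET = "abcdefghijkmnpqrstuvwxyz23456789"


def default_output_path(input_oss_uri: str) -> str:
    """Same bucket and directory as the input; name is {source}_translated_{random8}.mp4."""
    directory, filename = posixpath.split(input_oss_uri)
    source, _extension = posixpath.splitext(filename)
    random8 = "".join(random.choices(ALPHABET, k=8))
    return f"{directory}/{source}_translated_{random8}.mp4"


# Prints something shaped like oss://bucket/videos/demo_translated_<random8>.mp4
print(default_output_path("oss://bucket/videos/demo.mp4"))
```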


Phase 4: Subtitle Review Confirmation (Conditional Blocking)

| Trigger Condition | Processing Method |
|---|---|
| User chooses to review subtitles | BLOCKING, MUST wait for user confirmation of review result |
| User does not need review | Non-blocking, proceed |

> CRITICAL: After subtitle extraction, MUST output content as-is for user review, DO NOT change format!


Scenario Entry Selector

> Key Points:

> 1. When user inputs local video, MUST first upload to OSS and get HTTP URL

> 2. When user does not provide subtitle, MUST ask if subtitle extraction and review is needed

User inputs video
    │
    ├─ Local video?
    │   └─ Yes → AskUserQuestion: "Please provide OSS upload path"
    │       ├─ User provides path → Upload to OSS → Convert to HTTP URL → Continue
    │       └─ User does not provide → STOP
    │
    ├─ oss:// format?
    │   └─ Yes → Inform user to convert to HTTP URL format
    │
    └─ HTTP URL format? → Continue
        │
        ├─ User provides SRT file?
        │   ├─ Yes → Input type = with_subtitle
        │   │   ├─ Translation mode = speech → 【Scenario 4】 ⚠️ MUST ask CustomSrtType
        │   │   └─ Translation mode = subtitle → 【Scenario 3】
        │   │
        │   └─ No → Input type = only_video ⚠️ MUST ask if review needed
        │       │
        │       ├─ AskUserQuestion: "Do you need to extract subtitles for review first, or translate directly?"
        │       │
        │       ├─ Need review → 【Scenario 2】 ⚠️ Phase 4 blocking
        │       │
        │       └─ Direct translation → 【Scenario 1】 (TextSource=OCR_ASR)

| Scenario | Name | Blocking Point | TextSource | Flow |
|---|---|---|---|---|
| 0 | Local video upload | OSS upload path inquiry | - | Upload → HTTP URL → Subsequent scenario |
| 1 | Direct translation | Phase 1, 2 | OCR_ASR | Submit translation directly |
| 2 | Subtitle review | Phase 1, 2, Subtitle review inquiry, Phase 4 | SubtitleFile | Extract subtitle → Review → Translate |
| 3 | Subtitle translation + user subtitle | Phase 1, 2 | SubtitleFile | Use user SRT to translate directly |
| 4 | Speech translation + user subtitle | Phase 1, 2 + CustomSrtType confirmation | SubtitleFile | Confirm subtitle language then translate |
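
Once the video is reachable by HTTP URL (Scenario 0 is done), the selector reduces to three answers that must all come from the user. A sketch of that decision follows; the function and argument names are illustrative, and none of the values should be inferred by the agent.

```python
# TextSource value for each scenario, as listed in the scenario table.
TEXT_SOURCE = {1: "OCR_ASR", 2: "SubtitleFile", 3: "SubtitleFile", 4: "SubtitleFile"}


def select_scenario(has_srt: bool, speech_mode: bool, wants_review: bool = False) -> int:
    """Map the user's answers to scenario 1-4.

    has_srt      - the user supplied an SRT file
    speech_mode  - the user chose speech translation in Phase 1
    wants_review - the user asked to review extracted subtitles (only_video input)
    """
    if has_srt:
        return 4 if speech_mode else 3
    return 2 if wants_review else 1


scenario = select_scenario(has_srt=False, speech_mode=True, wants_review=True)
print(scenario, TEXT_SOURCE[scenario])  # 2 SubtitleFile
```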

> Scenario 0 (Local video) detailed flow:

> 1. AskUserQuestion: "Please provide OSS upload path (format: oss://<bucket>/<path>/<filename>.mp4)"

> 2. After user specifies path, execute aliyun oss cp

> 3. Convert to HTTP URL: https://<bucket>.oss-<region>.aliyuncs.com/<path>/<filename>.mp4

> 4. Continue with subsequent scenario flow

> Scenario 2 detailed flow:

> 1. Ask for subtitle detection region (roi parameter)

> 2. Call CaptionExtraction to extract subtitles, input and output use oss:// format

> 3. Output subtitle content as-is for user review

> 4. After user confirmation, use reviewed SRT to submit translation


Parameter Decision Table

> Decision Rules: Clearly define handling for each parameter, DO NOT assume arbitrarily!

| Parameter | Trigger Condition | Handling Method | Default Value | Prohibited Behavior |
|---|---|---|---|---|
| NeedSpeechTranslate | Always | MUST ask | None | DO NOT infer from input |
| NeedFaceTranslate | Always | Fixed value | false | DO NOT set to true |
| DetextArea | User chooses erasure | MUST ask | None | DO NOT set to Auto arbitrarily |
| SubtitleConfig | User chooses burn-in | Can use default | Standard style | DO NOT skip confirmation |
| TextSource | Scenario decides | Scenario rules | See scenario mapping | DO NOT choose arbitrarily |
| CustomSrtType | Scenario 4 | MUST ask | None | DO NOT infer arbitrarily |
| OutputConfig.MediaURL | Output path | Can use default | Default rules | DO NOT use shell variables |
| JobParams.roi | Subtitle extraction | MUST ask | [[0.5,1],[0,1]] | DO NOT set default arbitrarily |
| SourceLanguage | User specifies or inferable | Can use default | Auto detect | Use zh for Chinese only |
| TargetLanguage | User specifies | Can use default | en | Ask for other languages |

TextSource Scenario Mapping:

| Scenario | Value | Description |
|---|---|---|
| 1 | OCR_ASR | Auto-detect subtitles |
| 2 | SubtitleFile | Reviewed SRT |
| 3, 4 | SubtitleFile | User-provided SRT |

CustomSrtType Trigger Rules:

| Condition | Value |
|---|---|
| CaptionExtraction extracted | SourceSrt |
| User provides subtitle (Scenario 4) | MUST ask: SourceSrt / TargetSrt |

Failure Protection Mechanism

> HARD-GATE: After speech translation fails, DO NOT auto-switch to subtitle translation!

API Error Handling

| ErrorCode | Handling Action |
|---|---|
| Forbidden.SubscriptionRequired | See ram-policies.md |
| InvalidParameter | See api-parameters.md |
| InputConfig.Subtitle is invalid | See troubleshooting.md |
| JobFailed | Record JobId, ask user if retry needed |
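
The table can be applied as a plain lookup over the error text returned by the CLI, as in the sketch below. The substring matching and the fallback message are assumptions, not documented behavior. Note that no entry switches translation mode, in line with the HARD-GATE above.

```python
# Markers and actions copied from the API Error Handling table.
ERROR_ACTIONS = {
    "Forbidden.SubscriptionRequired": "See ram-policies.md",
    "InvalidParameter": "See api-parameters.md",
    "InputConfig.Subtitle is invalid": "See troubleshooting.md",
    "JobFailed": "Record JobId, ask user if retry needed",
}


def action_for(error_text: str) -> str:
    """Return the handling action for the first known marker found in error_text."""
    for marker, action in ERROR_ACTIONS.items():
        if marker in error_text:
            return action
    # Assumed fallback: the table does not cover other errors.
    return "Unrecognized error: report it to the user unchanged"
```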

SRT Format Repair Flow

Detect empty subtitle entries → Delete empty entries → Renumber → Upload repaired file → Inform user

See troubleshooting.md for details.
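
The first three steps of the repair flow (detect, delete, renumber) can be sketched as below; upload and user notification are left out. This assumes a conventional SRT layout (index line, timing line containing `-->`, then text lines, with cues separated by blank lines). troubleshooting.md may prescribe a different procedure.

```python
import re


def repair_srt(srt_text: str) -> str:
    """Drop cues that have no text and renumber the remaining cues from 1."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    kept = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line for line in block.split("\n") if line.strip()]
        timing_at = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_at is None:
            continue  # not a cue at all
        text_lines = lines[timing_at + 1:]
        if not text_lines:
            continue  # empty subtitle entry: delete it
        kept.append((lines[timing_at], text_lines))
    cues = [
        "\n".join([str(number), timing, *text_lines])
        for number, (timing, text_lines) in enumerate(kept, start=1)
    ]
    return "\n\n".join(cues) + "\n"
```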


CLI Command Templates

> IMPORTANT: Before submitting API, MUST reference api-parameters.md to confirm parameter format!

See cli-commands.md for details.

Core Commands:

# Register media asset
aliyun ice register-media-info --input-url "oss://<bucket>/<object>" --media-type video --user-agent AlibabaCloud-Agent-Skills

# Submit subtitle extraction (use OSS path)
aliyun ice submit-iproduction-job \
  --function-name CaptionExtraction \
  --input "Media=oss://<bucket>/<object> Type=OSS" \
  --biz-output "Media=oss://<bucket>/<output>.srt Type=OSS" \
  --job-params '{"lang":"ch","roi":[[0.5,1],[0,1]]}' \
  --force \
  --user-agent AlibabaCloud-Agent-Skills

# Submit video translation
aliyun ice submit-video-translation-job \
  --user-agent AlibabaCloud-Agent-Skills

> CLI Format Key Points:

> - Subtitle extraction uses command name submit-iproduction-job (lowercase, - separator)

> - --input and --biz-output format: space-separated string "Media=... Type=OSS", NOT JSON

> - --job-params format: JSON string

> - MUST add --force to skip plugin parameter validation

> - All ICE commands MUST add --user-agent AlibabaCloud-Agent-Skills
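
To keep the quoting rules straight, the subtitle extraction command can be assembled as an argument list instead of a shell string. The sketch below uses only the flags shown in the template above. The function name is illustrative, and the `lang` and `roi` values must still come from the user.

```python
import json
import shlex


def caption_extraction_argv(input_oss: str, output_oss: str, lang: str, roi: list) -> list:
    """Build the submit-iproduction-job argument list for CaptionExtraction."""
    for uri in (input_oss, output_oss):
        if not uri.startswith("oss://"):
            raise ValueError(f"subtitle extraction requires oss:// format, got {uri!r}")
    return [
        "aliyun", "ice", "submit-iproduction-job",
        "--function-name", "CaptionExtraction",
        "--input", f"Media={input_oss} Type=OSS",        # space-separated string, not JSON
        "--biz-output", f"Media={output_oss} Type=OSS",  # space-separated string, not JSON
        "--job-params", json.dumps({"lang": lang, "roi": roi}, separators=(",", ":")),
        "--force",
        "--user-agent", "AlibabaCloud-Agent-Skills",
    ]


argv = caption_extraction_argv(
    "oss://my-bucket/videos/test.mp4", "oss://my-bucket/videos/test.srt",
    "ch", [[0.5, 1], [0, 1]],
)
print(shlex.join(argv))  # shell-quoted form, suitable for logging
```

Passing the list to `subprocess.run(argv)` avoids a second layer of shell quoting around the JSON value.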


Documentation Reference

| Document | Content |
|---|---|
| workflow-details.md | Detailed execution flow for 4 scenarios |
| cli-commands.md | CLI command template library |
| troubleshooting.md | Error handling details |
| api-parameters.md | Complete API parameter documentation |
| ram-policies.md | RAM permission requirements |
| cli-installation-guide.md | CLI installation guide |

Key Constraints

  • Before submitting API, MUST reference api-parameters.md to confirm parameter format
  • All ICE CLI commands MUST add --user-agent AlibabaCloud-Agent-Skills
  • Subtitle extraction (SubmitIProductionJob) uses oss:// format
  • Video translation (SubmitVideoTranslationJob) uses HTTP URL format, no signing needed
  • Local videos MUST first be uploaded to OSS, user MUST provide upload path
  • NeedFaceTranslate MUST be false
  • SpeechTranslate and SubtitleTranslate are mutually exclusive
  • InputConfig.Subtitle MUST use HTTPS format, DO NOT use oss://
  • Speech translation + SRT input requires SpeechTranslate.CustomSrtType
  • DO NOT infer translation mode from input type
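
Several of these constraints can be checked mechanically before submission. The sketch below does so over a flat dictionary keyed by the dotted parameter names used in this document. That flat layout is purely hypothetical: the real request structure is defined in api-parameters.md.

```python
def check_constraints(params: dict) -> list:
    """Return a list of violated constraints (empty when none are found).

    params is a hypothetical flat dict, e.g. {"NeedFaceTranslate": False, ...}.
    """
    problems = []
    if params.get("NeedFaceTranslate", False) is not False:
        problems.append("NeedFaceTranslate MUST be false")
    if "SpeechTranslate" in params and "SubtitleTranslate" in params:
        problems.append("SpeechTranslate and SubtitleTranslate are mutually exclusive")
    subtitle = params.get("InputConfig.Subtitle")
    if subtitle is not None and not subtitle.startswith("https://"):
        problems.append("InputConfig.Subtitle MUST use HTTPS format")
    speech = params.get("NeedSpeechTranslate") is True
    if speech and subtitle is not None:
        if params.get("SpeechTranslate.CustomSrtType") not in ("SourceSrt", "TargetSrt"):
            problems.append("Speech translation + SRT input requires SpeechTranslate.CustomSrtType")
    return problems
```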

Task Polling

> Mandatory: MUST continuously poll task status until completion (State=Finished) or failure (State=Failed), DO NOT exit early!

| Task Type | Query Command | Interval | Timeout |
|---|---|---|---|
| Subtitle extraction | QueryIProductionJob | 30 seconds | 5 minutes |
| Video translation | get-smart-handle-job | 30 seconds | 30 minutes |

Polling Logic:

Loop polling until:
  - State == "Finished" → Return result
  - State == "Failed" → Report error
  - Exceeds the task's timeout (5 minutes for subtitle extraction, 30 minutes for video translation) → Report TimeoutError

Prohibited: Return after single query / Skip polling and return JobId directly
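
A minimal sketch of this loop follows. The `query_state` callable is a stand-in for whatever runs the query command and extracts the State field; it is hypothetical here, as are the `sleep` and `clock` parameters, which exist only so the loop can be exercised without waiting.

```python
import time


def poll_job(query_state, interval_s=30.0, timeout_s=1800.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll until the job finishes, fails, or the timeout elapses.

    query_state - zero-argument callable returning the current State string
    """
    deadline = clock() + timeout_s
    while True:
        state = query_state()
        if state == "Finished":
            return state
        if state == "Failed":
            raise RuntimeError("job reported State=Failed")
        if clock() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_s} seconds")
        sleep(interval_s)
```

For subtitle extraction, `timeout_s=300` matches the 5-minute limit in the table; the default of 1800 seconds matches video translation.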

Time Reference (3-minute video):

  • Subtitle-level translation: 3-5 minutes
  • Speech-level translation: 10-20 minutes

Result Retrieval

# Get media asset info
aliyun ice get-media-info --media-id "<MediaId>"

# Generate signed URL (for private Bucket)
aliyun oss sign oss://<bucket>/<object> --timeout 3600

End of Document

Version History

1 version in total

  • v0.0.1 (current)
    2026-05-07 19:23 Safe
