Minimax-Multimodal-Toolkit

Use mmx to generate text, images, video, speech, and music via the MiniMax AI platform. Use when the user wants to create media content, chat with MiniMax mo...
minimax-ai-dev

Overview

MiniMax CLI — Agent Skill Guide

Use mmx to generate text, images, video, speech, music, and perform web search via the MiniMax AI platform.

Prerequisites

# Install
npm install -g mmx-cli

# Auth (persisted to ~/.mmx/credentials.json)
mmx auth login --api-key sk-xxxxx

# Or pass per-call
mmx text chat --api-key sk-xxxxx --message "Hello"

Region is auto-detected. Override with --region global or --region cn.


Agent Flags

Always use these flags in non-interactive (agent/CI) contexts:

| Flag | Purpose |
| --- | --- |
| --non-interactive | Fail fast on missing args instead of prompting |
| --quiet | Suppress spinners/progress; stdout is pure data |
| --output json | Machine-readable JSON output |
| --async | Return task ID immediately (video generation) |
| --dry-run | Preview the API request without executing |
| --yes | Skip confirmation prompts |
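
As a sketch of how these flags combine, a small wrapper can append them to every call so individual commands stay short. The name `mmx_agent` is hypothetical (it is not part of mmx), and the sketch assumes the flags are accepted after the subcommand, as in the examples throughout this guide.

```shell
# Hypothetical wrapper, not part of mmx: appends the agent-safe flags
# from the table above to whatever command it is given.
mmx_agent() {
  mmx "$@" --non-interactive --quiet --output json
}

# Usage (commented out so that defining the helper has no side effects):
# mmx_agent text chat --message "user:What is MiniMax?"
```

Flags such as --async or --yes are left out of the wrapper because they only apply to some commands; pass them per call.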

Commands

text chat

Chat completion. Default model: MiniMax-M2.7.

mmx text chat --message <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --message | string, required, repeatable | Message text. Prefix with role: to set role (e.g. "system:You are helpful", "user:Hello") |
| --messages-file | string | JSON file with messages array. Use - for stdin |
| --system | string | System prompt |
| --model | string | Model ID (default: MiniMax-M2.7) |
| --max-tokens | number | Max tokens (default: 4096) |
| --temperature | number | Sampling temperature (0.0, 1.0] |
| --top-p | number | Nucleus sampling threshold |
| --stream | boolean | Stream tokens (default: on in TTY) |
| --tool | string, repeatable | Tool definition JSON or file path |

# Single message
mmx text chat --message "user:What is MiniMax?" --output json --quiet

# With a system prompt
mmx text chat \
  --system "You are a coding assistant." \
  --message "user:Write fizzbuzz in Python" \
  --output json

# From file
cat conversation.json | mmx text chat --messages-file - --output json

stdout: response text (text mode) or full response object (json mode).
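
Because --message is repeatable and takes a role: prefix, an earlier conversation can be replayed by passing one --message per turn. The following is a minimal sketch: `chat_turns` is a hypothetical helper name, and it assumes an `assistant:` prefix is accepted the same way as the documented `system:` and `user:` prefixes and that messages are sent in the order given.

```shell
# Hypothetical helper: each argument is one "role:text" turn, in order.
chat_turns() {
  # Rewrite the positional parameters as:
  #   --message turn1 --message turn2 ...
  n=$#
  while [ "$n" -gt 0 ]; do
    turn=$1
    shift
    set -- "$@" --message "$turn"
    n=$((n - 1))
  done
  mmx text chat "$@" --output json --quiet
}

# Usage (commented out):
# chat_turns "user:Name a prime." "assistant:7" "user:Name a larger one."
```

For long histories, --messages-file is likely the better fit, since it avoids shell quoting of every turn.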


image generate

Generate images. Model: image-01.

mmx image generate --prompt <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --prompt | string, required | Image description |
| --aspect-ratio | string | e.g. 16:9, 1:1 |
| --n | number | Number of images (default: 1) |
| --subject-ref | string | Subject reference: type=character,image=path-or-url |
| --out-dir | string | Download images to directory |
| --out-prefix | string | Filename prefix (default: image) |

mmx image generate --prompt "A cat in a spacesuit" --output json --quiet
# stdout: image URLs (one per line in quiet mode)

mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet
# stdout: saved file paths (one per line)
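
Since quiet mode with --out-dir prints one saved path per line, the output can be consumed with a plain read loop. This is a sketch under that assumption; `generate_and_list` is a hypothetical helper and the `saved:` label is purely illustrative.

```shell
# Hypothetical helper: generate COUNT images for a prompt into a directory
# and report each printed path that exists as a regular file.
generate_and_list() {
  prompt=$1
  count=$2
  dir=$3
  mmx image generate --prompt "$prompt" --n "$count" --out-dir "$dir" --quiet |
    while IFS= read -r path; do
      # Ignore blank lines and anything that is not a file on disk.
      if [ -f "$path" ]; then
        printf 'saved: %s\n' "$path"
      fi
    done
}

# Usage (commented out):
# generate_and_list "Logo" 3 ./gen/
```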

video generate

Generate video. Default model: MiniMax-Hailuo-2.3. This is an async task — by default it polls until completion.

mmx video generate --prompt <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --prompt | string, required | Video description |
| --model | string | MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast |
| --first-frame | string | First frame image |
| --callback-url | string | Webhook URL for completion |
| --download | string | Save video to specific file |
| --async | boolean | Return task ID immediately |
| --no-wait | boolean | Same as --async |
| --poll-interval | number | Polling interval (default: 5) |

# Non-blocking: get task ID
mmx video generate --prompt "A robot." --async --quiet
# stdout: {"taskId":"..."}

# Blocking: wait and get file path
mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet
# stdout: ocean.mp4

video task get

Query status of a video generation task.

mmx video task get --task-id <id> [--output json]

video download

Download a completed video by file ID.

mmx video download --file-id <id> [--out <path>]

speech synthesize

Text-to-speech. Default model: speech-2.8-hd. Maximum input length: 10,000 characters.

mmx speech synthesize --text <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --text | string | Text to synthesize |
| --text-file | string | Read text from file. Use - for stdin |
| --model | string | speech-2.8-hd (default), speech-2.6, speech-02 |
| --voice | string | Voice ID (default: English_expressive_narrator) |
| --speed | number | Speed multiplier |
| --volume | number | Volume level |
| --pitch | number | Pitch adjustment |
| --format | string | Audio format (default: mp3) |
| --sample-rate | number | Sample rate (default: 32000) |
| --bitrate | number | Bitrate (default: 128000) |
| --channels | number | Audio channels (default: 1) |
| --language | string | Language boost |
| --subtitles | boolean | Include subtitle timing data |
| --pronunciation | string, repeatable | Custom pronunciation |
| --sound-effect | string | Add sound effect |
| --out | string | Save audio to file |
| --stream | boolean | Stream raw audio to stdout |

mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet
# stdout: hello.mp3

echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3
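
The 10k-character limit can be checked before the call so an agent fails early instead of spending a request. A minimal sketch: `tts_checked` and `TTS_MAX_CHARS` are hypothetical names, and it assumes the limit counts characters the way the shell's `${#text}` does under the current locale, which may differ from how the service counts. Returning 2 simply mirrors the usage-error code from the Exit Codes table.

```shell
# Hypothetical guard around speech synthesize.
TTS_MAX_CHARS=10000

tts_checked() {
  text=$1
  dest=$2
  if [ "${#text}" -gt "$TTS_MAX_CHARS" ]; then
    printf 'text is %d chars; limit is %d\n' "${#text}" "$TTS_MAX_CHARS" >&2
    return 2
  fi
  mmx speech synthesize --text "$text" --out "$dest" --quiet
}

# Usage (commented out):
# tts_checked "Hello world" hello.mp3
```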

music generate

Generate music. Model: music-2.5. Responds well to rich, structured descriptions.

mmx music generate --prompt <text> [--lyrics <text>] [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --prompt | string | Music style description (can be detailed) |
| --lyrics | string | Song lyrics with structure tags. Use "无歌词" (literally "no lyrics") for instrumental. Cannot be used with --instrumental |
| --lyrics-file | string | Read lyrics from file. Use - for stdin |
| --vocals | string | Vocal style, e.g. "warm male baritone", "bright female soprano", "duet with harmonies" |
| --genre | string | Music genre, e.g. folk, pop, jazz |
| --mood | string | Mood or emotion, e.g. warm, melancholic, uplifting |
| --instruments | string | Instruments to feature, e.g. "acoustic guitar, piano" |
| --tempo | string | Tempo description, e.g. fast, slow, moderate |
| --bpm | number | Exact tempo in beats per minute |
| --key | string | Musical key, e.g. C major, A minor, G sharp |
| --avoid | string | Elements to avoid in the generated music |
| --use-case | string | Use case context, e.g. "background music for video", "theme song" |
| --structure | string | Song structure, e.g. "verse-chorus-verse-bridge-chorus" |
| --references | string | Reference tracks or artists, e.g. "similar to Ed Sheeran" |
| --extra | string | Additional fine-grained requirements |
| --instrumental | boolean | Generate instrumental music (no vocals). Cannot be used with --lyrics or --lyrics-file |
| --aigc-watermark | boolean | Embed AI-generated content watermark |
| --format | string | Audio format (default: mp3) |
| --sample-rate | number | Sample rate (default: 44100) |
| --bitrate | number | Bitrate (default: 256000) |
| --out | string | Save audio to file |
| --stream | boolean | Stream raw audio to stdout |

At least one of --prompt or --lyrics is required.

# Simple usage
mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet

# Detailed prompt with vocal characteristics
mmx music generate --prompt "Warm morning folk" \
  --vocals "male and female duet, harmonies in chorus" \
  --instruments "acoustic guitar, piano" \
  --bpm 95 \
  --lyrics-file song.txt \
  --out duet.mp3

# Instrumental (use --instrumental flag)
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3
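
The argument rules for music generate (at least one of --prompt or --lyrics; --instrumental never combined with lyrics) can be checked before the call. A minimal sketch with a hypothetical helper name, `music_args_ok`; it encodes only the rules stated in this section and treats lyrics read from a file the same as inline lyrics.

```shell
# Hypothetical pre-flight check mirroring the documented rules.
# Arguments: prompt, lyrics, instrumental ("yes" or "no"); pass "" for unset.
music_args_ok() {
  prompt=$1
  lyrics=$2
  instrumental=$3
  if [ -z "$prompt" ] && [ -z "$lyrics" ]; then
    echo "need --prompt or --lyrics" >&2
    return 2
  fi
  if [ "$instrumental" = "yes" ] && [ -n "$lyrics" ]; then
    echo "--instrumental cannot be combined with lyrics" >&2
    return 2
  fi
  return 0
}

# Usage (commented out):
# music_args_ok "Upbeat pop" "La la la..." no && \
#   mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet
```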

vision describe

Image understanding via VLM. Provide either --image or --file-id, not both.

mmx vision describe (--image <path-or-url> | --file-id <id>) [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --image | string | Local path or URL (auto base64-encoded) |
| --file-id | string | Pre-uploaded file ID (skips base64) |
| --prompt | string | Question about the image (default: "Describe the image.") |

mmx vision describe --image photo.jpg --prompt "What breed?" --output json

stdout: description text (text mode) or full response (json mode).


search query

Web search via MiniMax.

mmx search query --q <query>

| Flag | Type | Description |
| --- | --- | --- |
| --q | string, required | Search query |

mmx search query --q "MiniMax AI" --output json --quiet

quota show

Display Token Plan usage and remaining quotas.

mmx quota show [--output json]

Tool Schema Export

Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:

# All tool-worthy commands (excludes auth/config/update)
mmx config export-schema

# Single command
mmx config export-schema --command "video generate"

Use this to dynamically register mmx commands as tools in your agent framework.


Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | General error |
| 2 | Usage error (bad flags, missing args) |
| 3 | Authentication error |
| 4 | Quota exceeded |
| 5 | Timeout |
| 10 | Content filter triggered |
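
In an agent, these codes are easiest to act on once mapped to a message or a decision. A sketch with two hypothetical helpers: `mmx_exit_meaning` restates the table, and `mmx_should_retry` encodes an illustrative policy (retry only on timeout) that the CLI itself does not prescribe.

```shell
# Hypothetical helper: translate an mmx exit code into the meaning
# listed in the table above.
mmx_exit_meaning() {
  case "$1" in
    0)  echo "Success" ;;
    1)  echo "General error" ;;
    2)  echo "Usage error" ;;
    3)  echo "Authentication error" ;;
    4)  echo "Quota exceeded" ;;
    5)  echo "Timeout" ;;
    10) echo "Content filter triggered" ;;
    *)  echo "Unknown exit code $1" ;;
  esac
}

# Illustrative policy, an assumption of this sketch: only timeouts (5)
# are treated as worth retrying.
mmx_should_retry() {
  [ "$1" -eq 5 ]
}

# Usage (commented out):
# mmx text chat --message "Hi" --quiet; code=$?
# [ "$code" -eq 0 ] || echo "mmx failed: $(mmx_exit_meaning "$code")" >&2
```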

Piping Patterns

# stdout is always clean data — safe to pipe
mmx text chat --message "Hi" --output json | jq '.content'

# stderr has progress/spinners — discard if needed
mmx video generate --prompt "Waves" 2>/dev/null

# Chain: generate image → describe it
URL=$(mmx image generate --prompt "A sunset" --quiet)
mmx vision describe --image "$URL" --quiet

# Async video workflow
TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
mmx video task get --task-id "$TASK" --output json
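# video download takes --file-id (see "video download"); set FILE_ID from the task output once the task has completed
mmx video download --file-id "$FILE_ID" --out robot.mp4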

Configuration Precedence

CLI flags → environment variables → ~/.mmx/config.json → defaults.

# Persistent config
mmx config set --key region --value cn
mmx config show

# Environment
export MINIMAX_API_KEY=sk-xxxxx
export MINIMAX_REGION=cn
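
The precedence order can be pictured as a first-non-empty lookup. The function below is an illustration only: it mimics the documented order for the region setting and is not how mmx is implemented. `resolve_region` is a hypothetical name, and the caller supplies the config-file value and the default, since this guide does not specify them.

```shell
# Illustration of the lookup order: flag, then environment, then config
# file, then default. Arguments: flag value, config-file value, default.
# The environment value is read from MINIMAX_REGION.
resolve_region() {
  flag=$1
  file=$2
  default=$3
  if [ -n "$flag" ]; then
    echo "$flag"
  elif [ -n "${MINIMAX_REGION:-}" ]; then
    echo "$MINIMAX_REGION"
  elif [ -n "$file" ]; then
    echo "$file"
  else
    echo "$default"
  fi
}

# Usage (commented out): with MINIMAX_REGION=cn exported and no flag,
# the environment value wins over the config-file value.
# resolve_region "" global global
```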

Version History

1 version in total

  • v1.0.2 (current)
    2026-04-30 07:13, marked safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
