Create professional audio with AI — voiceovers, music, sound effects, and personalized avatar voices.
CellCog provides three voice providers, each with different strengths. Choose based on your needs:
| Scenario | Provider | Why |
|---|---|---|
| Standard narration/voiceover | OpenAI | Best voice style control, consistent quality |
| Emotional/dramatic delivery | ElevenLabs | Richest emotional range, supports emotion tags |
| Cloned voice (avatar) | MiniMax | Only provider with voice cloning support |
| Character voice with specific accent | ElevenLabs | 100+ diverse pre-made voices |
| Fine pitch/speed/volume control | MiniMax | Granular voice settings |
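
The table above can also be read as a simple lookup. The following is a minimal sketch only: the requirement keys (`narration`, `emotional`, and so on) and the `choose_provider` helper are hypothetical names made up for illustration, not part of the CellCog SDK.

```python
# Hypothetical lookup mirroring the provider table; the keys are
# illustrative labels, not CellCog identifiers.
PROVIDER_BY_NEED = {
    "narration": "OpenAI",             # standard narration/voiceover
    "emotional": "ElevenLabs",         # emotional/dramatic delivery
    "cloned_voice": "MiniMax",         # avatar with a cloned voice
    "accent_character": "ElevenLabs",  # character voice with a specific accent
    "fine_control": "MiniMax",         # fine pitch/speed/volume control
}


def choose_provider(need: str) -> str:
    """Return the provider the table recommends for a given need."""
    try:
        return PROVIDER_BY_NEED[need]
    except KeyError:
        options = ", ".join(sorted(PROVIDER_BY_NEED))
        raise ValueError(f"unknown need {need!r}; expected one of: {options}") from None


print(choose_provider("cloned_voice"))
```

The chosen provider name can then be mentioned in the task prompt, as the example prompts in this document do.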
For your first CellCog task in a session, read the cellcog skill for the full SDK reference — file handling, chat modes, timeouts, and more.
OpenClaw (fire-and-forget):
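from cellcog import CellCogClient
client = CellCogClient(agent_provider="openclaw")
# Fire-and-forget: this call does not wait for the task to finish.
result = client.create_chat(
    prompt="[your task prompt]",
    notify_session_key="agent:main:main",
    task_label="my-task",
    chat_mode="agent",
)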
All agents except OpenClaw (blocks until done):
from cellcog import CellCogClient
# Replace the placeholder with the single provider name that matches your tool.
client = CellCogClient(agent_provider="openclaw|cursor|claude-code|codex|...")
# Blocks until the task is done.
result = client.create_chat(
    prompt="[your task prompt]",
    task_label="my-task",
    chat_mode="agent",
)
print(result["message"])
Best for standard narration, voiceovers, and single-speaker content with precise delivery control.
Key strength: Natural-language style instructions — describe the accent, tone, pacing, and emotion you want.
8 built-in voices:
| Voice | Gender | Characteristics |
|---|---|---|
| cedar | Male | Warm, resonant, authoritative, trustworthy |
| marin | Female | Bright, articulate, emotionally agile, professional |
| ballad | Male | Smooth, melodic, musical quality |
| coral | Female | Vibrant, lively, dynamic, spirited |
| echo | Male | Calm, measured, thoughtful, deliberate |
| sage | Female | Wise, contemplative, reflective |
| shimmer | Female | Soft, gentle, soothing, approachable |
| verse | Male | Poetic, rhythmic, artistic, expressive |
Best quality: cedar (male), marin (female).
Style customization examples:
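
One way to combine a built-in voice with a natural-language style instruction is to assemble the prompt as plain text. This is a rough sketch under assumptions: the `build_openai_prompt` helper and the exact prompt wording are hypothetical, loosely modeled on the example prompts elsewhere in this document, and CellCog does not require this phrasing.

```python
# The eight built-in voice names come from the table above.
OPENAI_VOICES = {
    "cedar", "marin", "ballad", "coral",
    "echo", "sage", "shimmer", "verse",
}


def build_openai_prompt(text: str, voice: str = "cedar", style: str = "") -> str:
    """Build a task prompt (hypothetical wording) for OpenAI narration."""
    if voice not in OPENAI_VOICES:
        options = ", ".join(sorted(OPENAI_VOICES))
        raise ValueError(f"unknown voice {voice!r}; expected one of: {options}")
    header = f"Generate speech using OpenAI with the {voice} voice"
    if style:
        # Free-form description of accent, tone, pacing, and emotion.
        header += f", in this style: {style}"
    return f"{header}:\n'{text}'"


prompt = build_openai_prompt(
    "Welcome back to the show.",
    voice="marin",
    style="calm, slow pacing, slight smile in the voice",
)
print(prompt)
```

The resulting string would be passed as the `prompt` argument of `client.create_chat(...)`.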
Best for emotional delivery, dramatic content, character voices, and audiobook narration.
Key strength: Emotion tags embedded directly in text — [laughs], [sighs], [whispers], [excited], [sarcastic]. Plus 100+ diverse pre-made voices.
Emotion tags (use sparingly — 1-2 per paragraph):
| Tag | Effect |
|---|---|
| [laughs] | Natural laughter |
| [chuckles] | Soft/brief laughter |
| [sighs] | Sighing |
| [gasps] | Surprise/shock |
| [whispers] | Whispering delivery |
| [pause] | Natural pause/beat |
| [sad], [happy], [excited], [angry], [sarcastic] | Emotional delivery |
Example prompt:
> "Generate speech using ElevenLabs with a warm British male voice:
> 'And then, just when everyone thought it was over... [pause] [whispers] it wasn't.'"
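
Because the guidance is to use emotion tags sparingly (1-2 per paragraph), a quick check before submitting a script can help. The sketch below is illustrative only: `overtagged_paragraphs` is a hypothetical helper, it assumes paragraphs are separated by blank lines, and it counts only the tags listed in the table above.

```python
import re

# Tags listed in the emotion tag table.
KNOWN_TAGS = {
    "laughs", "chuckles", "sighs", "gasps", "whispers", "pause",
    "sad", "happy", "excited", "angry", "sarcastic",
}
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def overtagged_paragraphs(text: str, max_tags: int = 2) -> list[tuple[int, int]]:
    """Return (paragraph_index, tag_count) for paragraphs with too many tags."""
    flagged = []
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        tags = [t for t in TAG_PATTERN.findall(paragraph) if t in KNOWN_TAGS]
        if len(tags) > max_tags:
            flagged.append((index, len(tags)))
    return flagged


script = (
    "And then, just when everyone thought it was over... "
    "[pause] [whispers] it wasn't."
)
print(overtagged_paragraphs(script))
```

An empty list means every paragraph stays within the suggested limit.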
Best for cloned voices (avatars) and fine-grained voice control.
Key strength: MiniMax Speech 2.8 HD — studio-grade audio quality. Supports avatar cloned voice IDs for personalized content, plus 17+ standard pre-made voices with granular speed, pitch, and volume control.
Standard voices include: Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Wise_Woman, Friendly_Person, Young_Knight, Elegant_Man, and more.
Voice settings: emotion (happy/sad/angry/neutral/etc.), speed (0.5–2.0), volume (0–10), pitch (-12 to 12).
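
The ranges above can be checked locally before they are written into a prompt. The following is a minimal sketch, not CellCog's API: the `MiniMaxVoiceSettings` class and its default values are assumptions chosen for illustration, and only the numeric ranges come from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniMaxVoiceSettings:
    """Hypothetical container for MiniMax voice settings (defaults are assumed)."""

    speed: float = 1.0       # documented range: 0.5 to 2.0
    volume: float = 1.0      # documented range: 0 to 10
    pitch: int = 0           # documented range: -12 to 12
    emotion: str = "neutral" # open-ended list, so it is not validated here

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed {self.speed} is outside 0.5 to 2.0")
        if not 0 <= self.volume <= 10:
            raise ValueError(f"volume {self.volume} is outside 0 to 10")
        if not -12 <= self.pitch <= 12:
            raise ValueError(f"pitch {self.pitch} is outside -12 to 12")


settings = MiniMaxVoiceSettings(speed=1.2, pitch=-3, emotion="happy")
print(settings)
```

Out-of-range values raise `ValueError` at construction time, so mistakes surface before a task is submitted.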
Users can create avatars on CellCog with their own cloned voice. When an avatar has a cloned voice, CellCog uses the MiniMax provider to generate speech that sounds like that person.
How it works:
Example prompt:
> "Generate a voiceover using my avatar Luna's voice: 'Welcome to our quarterly update. I'm excited to share some incredible results with you today.'"
This is powerful for creating consistent, personalized content — marketing videos, podcast intros, course narration — all in the user's own voice.
CellCog generates standalone sound effects from text descriptions. Effects are royalty-free and can run from 0.1 to 30 seconds.
Example prompts:
Tips for better SFX:
Create original music from text descriptions. Tracks can run from 3 seconds to 10 minutes and are royalty-free.
Capabilities:
Example prompts:
For precise section-by-section control (exact timing per section), describe your composition plan in detail — CellCog handles the structure.
All generated music is royalty-free — use commercially without attribution or licensing fees.
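
The duration limits stated in this document (0.1 to 30 seconds for sound effects, 3 seconds to 10 minutes for music) are easy to check before writing a prompt. This is a small sketch under assumptions: the `check_duration` helper and the `"sfx"`/`"music"` keys are hypothetical names, not CellCog parameters.

```python
# Limits in seconds, taken from the ranges stated in this document.
DURATION_LIMITS_SECONDS = {
    "sfx": (0.1, 30.0),
    "music": (3.0, 600.0),  # 10 minutes
}


def check_duration(kind: str, seconds: float) -> float:
    """Return seconds unchanged if it fits the documented range for kind."""
    if kind not in DURATION_LIMITS_SECONDS:
        raise ValueError(f"unknown kind {kind!r}; expected 'sfx' or 'music'")
    low, high = DURATION_LIMITS_SECONDS[kind]
    if not low <= seconds <= high:
        raise ValueError(
            f"{kind} duration {seconds}s is outside {low}s to {high}s"
        )
    return seconds


length = check_duration("music", 90)
print(f"Create a {length}-second upbeat acoustic track.")
```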
All three voice providers support 40+ languages. Provide speech text in the target language:
English, Spanish, French, German, Italian, Portuguese, Chinese (Mandarin/Cantonese), Japanese, Korean, Hindi, Arabic, Russian, Polish, Dutch, Turkish, and many more.
Use chat_mode="agent" for all audio tasks. Audio generation runs efficiently in agent mode, so there is no need for agent team mode.
Run /cellcog-setup (or /cellcog:cellcog-setup depending on your tool) to install and authenticate.
OpenClaw users: Run clawhub install cellcog instead.
Manual setup: pip install -U cellcog and set CELLCOG_API_KEY. See the cellcog skill for SDK reference.
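
For manual setup, a script can fail early when the key is missing. The sketch below assumes `CELLCOG_API_KEY` is read from the process environment, which this document does not state explicitly; `require_api_key` is a hypothetical helper, not part of the cellcog package.

```python
import os


def require_api_key(env=None) -> str:
    """Return CELLCOG_API_KEY or raise a clear error if it is unset."""
    # Assumption: the key is supplied as an environment variable.
    if env is None:
        env = os.environ
    key = env.get("CELLCOG_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CELLCOG_API_KEY is not set; see the cellcog skill for setup details"
        )
    return key


# Usage with an explicit mapping (handy for testing without touching os.environ).
print(len(require_api_key({"CELLCOG_API_KEY": "example-key"})))
```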