Overview

AudioClaw Voice AB Lab

What this skill is for

This skill is for commercial teams who need to test which spoken script performs best, while keeping the same voice across all variants.

That matters because otherwise too many variables change at once:

copy
tone
rhythm
voice persona

This skill keeps the voice fixed and lets you vary:

ad tone
hook style
urgency level
trust level
conversational warmth
regional wording style

Best business scenarios

1. Short-video ad hooks

Generate 4 to 8 spoken openers for the same product:

trust-first
benefit-first
urgency-first
concise-direct

Then synthesize all of them with the same voice for fast creative screening.
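The idea of "one voice, several hooks" can be sketched as a small manifest builder. This is an illustrative sketch only, not the logic of scripts/build_voice_ab_variants.py: the template wording, the Variant class, the variant_id format, and the example voice id are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hook styles named in this section. The template wording is hypothetical.
HOOK_TEMPLATES = {
    "trust-first": "Here is why people trust the {product}.",
    "benefit-first": "The {product}: {key_message}.",
    "urgency-first": "Today only: {offer} on the {product}.",
    "concise-direct": "{product}. {key_message}. {cta}.",
}


@dataclass
class Variant:
    variant_id: str   # hypothetical id format, e.g. "v01"
    hook_style: str
    voice_id: str     # identical for every variant in one test
    text: str


def build_variants(brief: dict, voice_id: str) -> list:
    """Return one variant per hook style, all bound to the same voice_id."""
    variants = []
    for index, (style, template) in enumerate(HOOK_TEMPLATES.items(), start=1):
        text = template.format(**brief)
        variants.append(Variant(f"v{index:02d}", style, voice_id, text))
    return variants


brief = {
    "product": "lightweight thermos",
    "key_message": "light, keeps drinks warm, never leaks",
    "offer": "second item half price",
    "cta": "Tap to order now",
}
variants = build_variants(brief, voice_id="vc-example")  # placeholder voice id
```

Because only the text and hook_style differ between entries, any difference in listener response can be attributed to the script rather than to the voice.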

2. Livestream and promo voiceovers

Use the same host-like voice to test:

stronger urgency
softer recommendation
more premium wording
more sales-driven wording

3. Sales or private-domain follow-up

Generate multiple voice-note versions for:

reopening a lead
reminding a customer
sending a soft CTA
reducing pushiness while keeping conversion intent

4. Regional wording experiments

This skill can generate regional phrasing styles for comparison, while keeping the same voice.

Important:

this is wording-level regional style, not guaranteed full dialect TTS
it is useful for testing “which phrasing feels closer to the target audience”

Workflow

Start from either:

a typed campaign brief
or a spoken voice memo that follows labeled fields such as 产品 (product) / 人群 (audience) / 卖点 (selling points) / 优惠 (offer) / 行动 (call to action)

If the input is audio, run scripts/senseaudio_asr.py, then scripts/extract_spoken_brief.py.
If the input is already typed and structured enough, run scripts/run_typed_brief_pipeline.py directly, or call scripts/build_voice_ab_variants.py yourself.
Run scripts/build_voice_ab_variants.py to generate variants.
Pick one fixed voice_id.

If you have already created a cloned voice on the AudioClaw platform, use that cloned voice_id.
A prepared cloned voice id commonly looks like vc-..., and can be passed directly with --clone-voice-id.
If not, use one validated system voice.

If you want faster perceived processing for spoken briefs, enable stream ASR in scripts/senseaudio_asr.py or scripts/run_spoken_brief_pipeline.py.
Run scripts/batch_tts_variants.py to synthesize every variant with the same voice. This step uses AudioClaw streaming TTS under the hood and records stream chunk metadata.

If the chosen voice is a clone id like vc-..., the batch TTS step now auto-routes to SenseAudio-TTS-1.5.
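The routing rule above can be pictured as a prefix check. This is a minimal sketch, not the code in scripts/batch_tts_variants.py: the function name pick_tts_model and the default_model parameter are hypothetical, and the model used for system voices is not stated in this document, so it is left for the caller to supply.

```python
from typing import Optional

CLONE_PREFIX = "vc-"                # clone ids commonly look like vc-...
CLONE_MODEL = "SenseAudio-TTS-1.5"  # model named in this section for clone voices


def pick_tts_model(voice_id: str, default_model: Optional[str] = None) -> Optional[str]:
    """Route clone voices to the clone-capable model, otherwise keep the default."""
    if voice_id.startswith(CLONE_PREFIX):
        return CLONE_MODEL
    # The default model for system voices is not documented here.
    return default_model
```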

If the user wants to hear the results directly in Feishu or AudioClaw, run scripts/send_ab_variants_to_feishu.py after synthesis, or use scripts/run_spoken_brief_pipeline.py --send-feishu-audio / scripts/run_typed_brief_pipeline.py --send-feishu-audio.

This step reuses the previously built Feishu voice-reply path instead of sending plain files.
It transcodes the generated .mp3 variants into .ogg/.opus and sends them one by one as real audio messages.
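For orientation, an .mp3 to Opus transcode is commonly done with ffmpeg. The sketch below only builds the command line and does not run it. Whether scripts/send_ab_variants_to_feishu.py actually uses ffmpeg, and with which bitrate, is not stated in this document, so both are assumptions here, as is the function name.

```python
from pathlib import Path


def opus_transcode_command(mp3_path: str, bitrate: str = "32k") -> list:
    """Build an ffmpeg command that re-encodes an .mp3 file as Opus audio."""
    source = Path(mp3_path)
    target = source.with_suffix(".opus")  # same name, new extension
    return [
        "ffmpeg",
        "-y",               # overwrite the target if it already exists
        "-i", str(source),  # input file
        "-c:a", "libopus",  # encode the audio stream with the Opus codec
        "-b:a", bitrate,    # assumed bitrate, chosen for voice content
        str(target),
    ]


command = opus_transcode_command("v01.mp3")
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where ffmpeg with libopus support is installed.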

Review:

generated copy
estimated points
output audio files
variant metadata for A/B tracking
optional Feishu send results

AudioClaw Trigger Pattern

Use this skill as an explicit task mode, not as a hidden background guess.

Recommended user trigger:

用 $senseaudio-voice-ab-lab 处理我刚发的语音。
产品：轻量保温杯
人群：通勤上班族
卖点：轻便保温不漏水
优惠：第二件半价
行动：现在点击下单
clone voice_id：your_clone_voice_id
生成 4 条口播，输出到 /tmp/voice_ab_run
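A labeled brief like the one above lends itself to deterministic line-by-line parsing. The following is a sketch under stated assumptions, not the logic of scripts/extract_spoken_brief.py: the mapping from the Chinese labels to the field names product, audience, key_message, offer, and cta is assumed for illustration, and the function name is hypothetical.

```python
import re

# Assumed mapping from spoken labels to brief field names.
FIELD_MAP = {
    "产品": "product",      # product
    "人群": "audience",     # audience
    "卖点": "key_message",  # selling points
    "优惠": "offer",        # offer
    "行动": "cta",          # call to action
}

# Accept both the full-width colon and the ASCII colon after a label.
LINE_PATTERN = re.compile(
    r"^\s*(" + "|".join(map(re.escape, FIELD_MAP)) + r")\s*[:：]\s*(.+?)\s*$"
)


def parse_labeled_brief(text: str) -> dict:
    """Collect every 'label: value' line whose label is a known field."""
    brief = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            brief[FIELD_MAP[match.group(1)]] = match.group(2)
    return brief


sample = """产品：轻量保温杯
人群：通勤上班族
卖点：轻便保温不漏水
优惠：第二件半价
行动：现在点击下单
clone voice_id：your_clone_voice_id"""
brief = parse_labeled_brief(sample)
```

Lines that do not start with a known label, such as the clone voice_id line, are ignored by this sketch and would need separate handling.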

If the user already sent a voice memo, the agent should:

Save the audio locally.
Run scripts/run_spoken_brief_pipeline.py.
Return:

a short summary of the extracted brief
the output directory
the best 2 to 4 audio variants for review

If the user says "一条一条发语音给我听" ("send me the voice messages one by one") or "直接发到飞书里试听" ("send them straight to Feishu so I can listen"), the agent should:

Run the normal A/B pipeline first.
Then run scripts/send_ab_variants_to_feishu.py, or add --send-feishu-audio to scripts/run_spoken_brief_pipeline.py.
Prefer sending the variants one by one as Feishu audio messages instead of replying with local paths.
If the user only wants part of the set, use --limit or --variant-ids.
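The effect of sending only part of the set can be sketched as a filter over the variant list. The exact semantics of --limit and --variant-ids are not spelled out in this document. The sketch assumes that --variant-ids names the variants to keep and that --limit keeps the first N of what remains. The function name and the variant_id key are hypothetical.

```python
from typing import Iterable, Optional


def select_variants(variants: list,
                    limit: Optional[int] = None,
                    variant_ids: Optional[Iterable] = None) -> list:
    """Filter by id first, then truncate to the first `limit` entries."""
    selected = list(variants)
    if variant_ids:
        wanted = set(variant_ids)
        selected = [v for v in selected if v["variant_id"] in wanted]
    if limit is not None:
        selected = selected[:limit]
    return selected


all_variants = [{"variant_id": f"v{n:02d}"} for n in range(1, 5)]
first_two = select_variants(all_variants, limit=2)
picked = select_variants(all_variants, variant_ids=["v02", "v04"])
```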

If the user gave a typed brief and also says "直接一条一条发语音给我听" ("just send me the voice messages one by one"), the agent should:

Extract or confirm these fields:

campaign_name
product
audience
key_message
cta
optional offer
optional proof

Run scripts/run_typed_brief_pipeline.py.
Add --send-feishu-audio.
Do not stop at returning local audio paths unless the user explicitly asked for files only.

If the user does not provide a cloned voice, ask for either:

a prepared clone voice_id
or permission to fall back to a validated system voice_id

Design rules

Keep each script short enough to test quickly.
Change one creative dimension at a time if possible.
For spoken briefs, keep the input structured enough for deterministic extraction.
For real A/B testing, keep:
the same voice
the same audio format
the same sample rate
similar script length
Treat regional_style as a wording choice, not an official dialect model.
Official clone support is a two-step chain:
create the clone on the AudioClaw platform first
then pass the prepared clone voice_id into this skill for generation
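The "keep everything else equal" rules above can be checked mechanically before a test goes out. This is a hedged sketch: the manifest keys voice_id, audio_format, sample_rate, and text, the 1.3 length ratio, and the sample values are assumptions for illustration and may not match the metadata the scripts actually write.

```python
def check_ab_consistency(variants: list, max_length_ratio: float = 1.3) -> list:
    """Return human-readable problems; an empty list means the set is comparable."""
    problems = []
    # Voice, format, and sample rate must be identical across the whole set.
    for key in ("voice_id", "audio_format", "sample_rate"):
        values = {variant[key] for variant in variants}
        if len(values) > 1:
            problems.append(f"{key} differs across variants: {sorted(map(str, values))}")
    # Script length should be similar; compare the longest to the shortest text.
    lengths = [len(variant["text"]) for variant in variants]
    if lengths and min(lengths) > 0 and max(lengths) / min(lengths) > max_length_ratio:
        problems.append(f"script lengths vary too much: {min(lengths)} to {max(lengths)} characters")
    return problems


base = {"voice_id": "vc-example", "audio_format": "mp3", "sample_rate": 32000}
good_set = [dict(base, text="x" * 40), dict(base, text="x" * 45)]
bad_set = [dict(base, text="x" * 40), dict(base, voice_id="other", text="x" * 90)]
```

Here check_ab_consistency(good_set) returns an empty list, while bad_set is flagged for both the voice mismatch and the length gap.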

API key lookup

For the generation side of this skill:

TTS-oriented scripts now default to SENSEAUDIO_API_KEY

Practical rule:

scripts/run_spoken_brief_pipeline.py, scripts/run_typed_brief_pipeline.py, and scripts/batch_tts_variants.py now default to SENSEAUDIO_API_KEY
If the host app injects SENSEAUDIO_API_KEY as a login token such as v2.public..., the shared bootstrap replaces it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before the synthesis step starts
The ASR scripts keep their own existing defaults and are intentionally not changed here
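The token swap described above can be sketched as follows. This is not the shared bootstrap itself. The JSON field name api_key is a hypothetical placeholder, since the layout of senseaudio_credentials.json is not documented here, and treating every non-empty value that lacks the sk- prefix as a login token is an assumption.

```python
import json
from pathlib import Path

CREDENTIALS_PATH = (
    Path.home() / ".audioclaw" / "workspace" / "state" / "senseaudio_credentials.json"
)


def load_credentials(path=CREDENTIALS_PATH) -> dict:
    """Read the credentials file, or return an empty dict if it is missing."""
    path = Path(path)
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def resolve_api_key(env_value: str, credentials: dict, field: str = "api_key"):
    """Return a usable sk-... key, or None if none can be determined."""
    if not env_value:
        return None
    if env_value.startswith("sk-"):
        return env_value  # already a real key, use it as-is
    # Otherwise assume a login token (for example v2.public...) and swap it.
    stored = credentials.get(field, "")  # hypothetical field name
    if isinstance(stored, str) and stored.startswith("sk-"):
        return stored
    return None
```

A caller would typically combine the two as resolve_api_key(os.environ.get("SENSEAUDIO_API_KEY", ""), load_credentials()), after importing os.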

Resources

scripts/build_voice_ab_variants.py
Builds an A/B manifest from one campaign brief
scripts/senseaudio_asr.py
Calls AudioClaw ASR using either the official open API host or the official platform endpoint
Defaults to the official sense-asr-deepthink model for spoken briefs
scripts/extract_spoken_brief.py
Extracts a structured campaign brief from an ASR transcript
scripts/run_spoken_brief_pipeline.py
Runs the full spoken-brief pipeline end to end
Supports --stream-asr, --clone-voice-id, and --send-feishu-audio
scripts/run_typed_brief_pipeline.py
Runs the full typed-brief pipeline end to end
Supports --clone-voice-id and --send-feishu-audio
scripts/batch_tts_variants.py
Generates all audio variants with the same voice_id
scripts/send_ab_variants_to_feishu.py
Reuses the Feishu voice-reply delivery path to transcode and send the generated variants one by one as audio messages
scripts/export_ab_review_csv.py
Produces a review sheet for creative, growth, or Feishu-based internal scoring
references/commercial_ab_patterns.md
High-value use cases, testing advice, and regional-style notes
references/asr_brief_pipeline.md
Official ASR findings, constraints, and the recommended spoken brief format

Version history

1 version in total

v1.0.1 (current)

2026-05-01 23:15
