🆓 A free, fully local text-to-speech (TTS) and voice cloning skill. No API key is required, no internet connection is needed once the model has been downloaded (about 3GB on first run), and there are no usage limits. It supports voice cloning from as little as 10 seconds of reference audio, text-to-speech with 12+ built-in voices, and translation dubbing (multilingual dubbing when paired with a translation tool).
| Item | Requirement |
|---|---|
| Chip | Apple Silicon (M1/M2/M3/M4) |
| OS | macOS 12.0+ |
| Python | 3.10+ |
| RAM | 8GB+ recommended |
| Disk Space | ~3GB for model files |
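
The requirements in the table above can be checked from Python before installing anything. The following is a minimal sketch using only the standard library; `meets_requirements` is a hypothetical helper (not part of this skill), and it does not check RAM or free disk space.

```python
import platform
import sys


def meets_requirements(machine, system, macos_version, python_version):
    """Return a dict of requirement name -> bool for the table above."""
    # macos_version is a string such as "14.5"; an empty string means "not macOS".
    major = int(macos_version.split(".")[0]) if macos_version else 0
    return {
        "apple_silicon": system == "Darwin" and machine == "arm64",
        "macos_12_plus": system == "Darwin" and major >= 12,
        "python_3_10_plus": tuple(python_version[:2]) >= (3, 10),
    }


if __name__ == "__main__":
    report = meets_requirements(
        platform.machine(),
        platform.system(),
        platform.mac_ver()[0],
        sys.version_info,
    )
    for name, ok in report.items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Passing the platform values in as arguments keeps the check easy to test on any machine.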
chmod +x install_dependencies.sh && ./install_dependencies.sh
# 1. Install system dependencies
brew install python@3.10 ffmpeg
# 2. Install Python packages
python3.10 -m pip install mlx-audio
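
After installation, it may be worth confirming that both dependencies can be found. This is a small sketch, assuming the package is importable as `mlx_audio` (as in the examples in this document) and that `ffmpeg` should be on `PATH`; `check_install` is a hypothetical helper name.

```python
import importlib.util
import shutil


def check_install():
    """Report whether the two installed dependencies can be found."""
    return {
        # find_spec returns None when the package is not importable.
        "mlx_audio": importlib.util.find_spec("mlx_audio") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'found' if found else 'not found'}")
```

Run it with the same interpreter used for installation (for example `python3.10`), since packages are installed per interpreter.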
python3.10 voice_cloning_demo.py
Clone any voice from a reference audio file:
from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio
# Load model (auto-downloads on first run, ~3GB)
model = load_model('mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit')
# Generate cloned voice
generate_audio(
    model=model,
    text="Your text content here, supports long-form generation",
    ref_audio="path/to/reference_audio.wav",  # 10-30 second voice sample
    lang_code="zh",                           # zh, en, ja, ko, etc.
    file_prefix="output_filename",            # Output: output_filename_000.wav
    max_tokens=3000,                          # Prevent audio truncation
)
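
Since the reference clip is expected to be 10-30 seconds long, it can help to check its duration before cloning. The following is a minimal sketch using only the standard-library `wave` module. The helper names `wav_duration_seconds` and `is_good_reference_length` are hypothetical and not part of mlx-audio, and the check only works for uncompressed PCM WAV files.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(path, min_s=10.0, max_s=30.0):
    """True when the clip falls inside the 10-30 second range noted above."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

A call such as `is_good_reference_length("path/to/reference_audio.wav")` could be placed before `generate_audio` to reject clips that are too short or too long.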
Generate speech without reference audio using pre-built voices:
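import numpy as np
import soundfile as sf  # Assumption: the soundfile package is installed
from mlx_audio.tts.utils import load_model
model = load_model('mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit')
# Generate speech with built-in voice template
results = list(model.generate(
    text='Hello world, this is a text-to-speech test',
    voice='af_heart',     # Voice template code
    language='English'    # Language of the text
))
# Save audio file. result.audio is a raw waveform array, not WAV-encoded bytes,
# so it is written through an audio library rather than a plain binary file.
audio = np.concatenate([np.array(result.audio) for result in results])
sf.write('output.wav', audio, results[0].sample_rate)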
| Code | Style | Gender |
|---|---|---|
| af_heart | Warm & Friendly | Female |
| af_chat | Conversational | Female |
| af_narration | Storytelling | Female |
| af_emo | Expressive | Female |
| am_adventure | Adventurous | Male |
| am_broadcast | Professional | Male |
| am_chat | Conversational | Male |
| am_narration | Storytelling | Male |
| am_emo | Expressive | Male |
| us_af | American English | Female |
| us_am | American English | Male |
| cn_am | Chinese Mandarin | Male |
| jp_af | Japanese | Female |
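
To catch typos in a voice code before starting a generation run, the table above can be mirrored in a small lookup. This is only a sketch: the dictionary is copied from the table, `voices_by_gender` and `require_voice` are hypothetical helpers, and nothing here verifies that the loaded model actually accepts a given code.

```python
# Voice codes copied from the table above: code -> (style, gender).
VOICES = {
    "af_heart": ("Warm & Friendly", "Female"),
    "af_chat": ("Conversational", "Female"),
    "af_narration": ("Storytelling", "Female"),
    "af_emo": ("Expressive", "Female"),
    "am_adventure": ("Adventurous", "Male"),
    "am_broadcast": ("Professional", "Male"),
    "am_chat": ("Conversational", "Male"),
    "am_narration": ("Storytelling", "Male"),
    "am_emo": ("Expressive", "Male"),
    "us_af": ("American English", "Female"),
    "us_am": ("American English", "Male"),
    "cn_am": ("Chinese Mandarin", "Male"),
    "jp_af": ("Japanese", "Female"),
}


def voices_by_gender(gender):
    """Return the voice codes whose Gender column matches, in table order."""
    return [code for code, (_, g) in VOICES.items() if g.lower() == gender.lower()]


def require_voice(code):
    """Return the code unchanged, or raise if it is not in the table."""
    if code not in VOICES:
        raise ValueError(f"Unknown voice code: {code!r}")
    return code
```

For example, `require_voice("af_heart")` returns the code unchanged, while a misspelled code raises `ValueError` before any model work begins.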
# Recommended parameters for best quality
generate_audio(
    model=model,
    text="Your text here",
    ref_audio="reference.wav",
    lang_code="zh",
    file_prefix="output",
    max_tokens=3000,         # Increase for longer text
    temperature=0.8,         # Diversity control (0.5-1.0)
    repetition_penalty=1.1   # Reduce repetition
)
# Batch processing example
texts = [
    "First paragraph text",
    "Second paragraph text",
    "Third paragraph text",
]
for i, text in enumerate(texts):
    generate_audio(
        model=model,
        text=text,
        ref_audio="reference_audio.wav",
        lang_code="zh",
        file_prefix=f"batch_{i:03d}",
        max_tokens=3000
    )
    print(f"Generated: batch_{i:03d}_000.wav")
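
The `texts` list in the batch example is written by hand. For a longer document, one option is to split it on blank lines first. The sketch below is an illustration only: `split_paragraphs` and `batch_plan` are hypothetical helpers, and the file names simply follow the `batch_{i:03d}_000.wav` pattern printed by the loop above.

```python
def split_paragraphs(text):
    """Split text on blank lines and drop empty paragraphs."""
    paragraphs = []
    for block in text.split("\n\n"):
        cleaned = " ".join(block.split())  # collapse internal whitespace
        if cleaned:
            paragraphs.append(cleaned)
    return paragraphs


def batch_plan(text, prefix="batch"):
    """Pair each paragraph with the output file name used by the batch loop."""
    return [
        (f"{prefix}_{i:03d}_000.wav", paragraph)
        for i, paragraph in enumerate(split_paragraphs(text))
    ]
```

The result of `split_paragraphs(document)` can be used directly as the `texts` list in the batch loop.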
Q: Audio gets truncated?
# Increase max_tokens parameter
generate_audio(..., max_tokens=5000)
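
Rather than guessing a value, a rough budget can be estimated from the text length. The sketch below rests on assumptions that are not confirmed by this document: that one generated token corresponds to one audio frame at about 12 frames per second (suggested only by the "12Hz" in the model name), and that speech runs at roughly 4 characters per second, which is a placeholder to tune for your language and voice. `estimate_max_tokens` is a hypothetical helper.

```python
import math


def estimate_max_tokens(text, tokens_per_second=12.0, chars_per_second=4.0, margin=1.5):
    """Rough max_tokens budget for a text, with a safety margin.

    All three rates are assumptions to adjust, not values documented
    by the model or by mlx-audio.
    """
    expected_seconds = len(text) / chars_per_second
    return math.ceil(expected_seconds * tokens_per_second * margin)
```

Under these assumptions, a 400-character text gives a budget of 1800 tokens. If audio is still truncated, raise `margin` or pass a larger `max_tokens` directly.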
Q: Slow model download?
# Use Hugging Face mirror
export HF_ENDPOINT=https://hf-mirror.com
python3.10 your_script.py
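
The same mirror can also be selected from inside a script instead of the shell. This is a minimal sketch; it relies on the variable being set before `huggingface_hub` (or anything that imports it, such as `mlx_audio`) is imported, because the endpoint is read at import time.

```python
import os

# Set the mirror only if the user has not already chosen an endpoint.
# This must run before importing mlx_audio or huggingface_hub.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

print("Downloading from:", os.environ["HF_ENDPOINT"])
```

Using `setdefault` keeps any endpoint already exported in the shell, so the two approaches do not conflict.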
Q: Poor cloning quality?
Q: Python version issues?
# Verify Python 3.10 path
which python3.10
# Expected output (Homebrew on Apple Silicon): /opt/homebrew/bin/python3.10
# Use full path to call
/opt/homebrew/bin/python3.10 your_script.py
This skill is intended for legal and ethical use only. By using this skill, you agree to the following terms:
THIS SKILL IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
✅ Permitted Uses (Proper Authorization Required Where Noted):
❌ Prohibited Uses:
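Legal notice: This skill is for lawful and compliant use only. By using it, you agree to clone only voices that you own or for which you have obtained explicit written authorization; to comply with all applicable copyright and privacy laws and regulations; and to accept full responsibility for any consequences arising from your use of this skill. Using this skill for fraud, impersonation, or any other illegal purpose is prohibited.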
MIT License
Issues and Pull Requests are welcome! Feel free to contribute to this skill.
Keywords: TTS, Text-to-Speech, Voice Cloning, Qwen3, Qwen3-TTS, mlx-audio, Apple Silicon, Local Deployment, Audiobooks, Dubbing, Generative AI