← 返回
未分类 Key 中文

openai-tts

OpenAI Text-to-Speech API for high-quality speech synthesis. Use for generating natural-sounding audio from text with customizable voices and tones.
OpenAI 文字转语音 API,用于高质量语音合成。可将文本生成自然流畅的音频,支持自定义音色和语调。
lnj22 lnj22 来源
未分类 clawhub v0.1.0 1 版本 100000 Key: 需要
★ 0
Stars
📥 321
下载
💾 0
安装
1
版本
#latest

概述

OpenAI Text-to-Speech

Generate high-quality spoken audio from text using OpenAI's TTS API.

Authentication

The API key is available as environment variable:

OPENAI_API_KEY

Models

  • gpt-4o-mini-tts - Newest, most reliable. Supports tone/style instructions.
  • tts-1 - Lower latency, lower quality
  • tts-1-hd - Higher quality, higher latency

Voice Options

Built-in voices (English optimized):

  • alloy, ash, ballad, coral, echo, fable
  • nova, onyx, sage, shimmer, verse
  • marin, cedar - Recommended for best quality

Note: tts-1 and tts-1-hd only support: alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer.

Python Example

from pathlib import Path
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

# Basic usage
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Hello, world!",
) as response:
    response.stream_to_file("output.mp3")

# With tone instructions (gpt-4o-mini-tts only)
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Today is a wonderful day!",
    instructions="Speak in a cheerful and positive tone.",
) as response:
    response.stream_to_file("output.mp3")

Handling Long Text

For long documents, split into chunks and concatenate:

from openai import OpenAI
from pydub import AudioSegment
import tempfile
import re
import os

client = OpenAI()

def chunk_text(text, max_chars=4000):
    """Split text into chunks at sentence boundaries."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    current_chunk = ""

    for sentence in sentences:
        if len(current_chunk) + len(sentence) < max_chars:
            current_chunk += sentence + " "
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + " "

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

def text_to_audiobook(text, output_path):
    """Convert long text to audio file."""
    chunks = chunk_text(text)
    audio_segments = []

    for chunk in chunks:
        with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as tmp:
            tmp_path = tmp.name

        with client.audio.speech.with_streaming_response.create(
            model="gpt-4o-mini-tts",
            voice="coral",
            input=chunk,
        ) as response:
            response.stream_to_file(tmp_path)

        segment = AudioSegment.from_mp3(tmp_path)
        audio_segments.append(segment)
        os.unlink(tmp_path)

    # Concatenate all segments
    combined = audio_segments[0]
    for segment in audio_segments[1:]:
        combined += segment

    combined.export(output_path, format="mp3")

Output Formats

  • mp3 - Default, general use
  • opus - Low latency streaming
  • aac - Digital compression (YouTube, iOS)
  • flac - Lossless compression
  • wav - Uncompressed, low latency
  • pcm - Raw samples (24kHz, 16-bit)
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Hello!",
    response_format="wav",  # Specify format
) as response:
    response.stream_to_file("output.wav")

Best Practices

  • Use marin or cedar voices for best quality
  • Split text at sentence boundaries for long content
  • Use wav or pcm for lowest latency
  • Add instructions parameter to control tone/style (gpt-4o-mini-tts only)

版本历史

共 1 个版本

  • v0.1.0 当前
    2026-05-07 22:24 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

design-media

UI/UX Pro Max

xobi667
提供 UI/UX 设计智能与实现指导,帮助打造精美界面。适用于 UI 设计、UX 流程、信息架构、视觉风格、设计系统/标记、组件规格、文案/微文案、无障碍及前端 UI(HTML/CSS/JS、React、Next.js、Vue、Svelte
★ 218 📥 47,814
design-media

Nano Banana Pro

steipete
使用 Nano Banana Pro (Gemini 3 Pro Image) 生成或编辑图像。支持文生图、图生图及 1K/2K/4K 分辨率,适用于图像创建、修改及编辑请求,使用 --input-image 指定输入图像。
★ 430 📥 117,068
office-efficiency

pdf

lnj22
全面PDF工具,支持文本/表格提取、新PDF创建、合并/拆分文档、表单处理。当Claude需要...
★ 0 📥 542