← 返回
内容创作 Key 中文

Podcast Generation with Microsoft Foundry

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
利用 Azure OpenAI GPT Realtime Mini 模型,通过 WebSocket 生成 AI 驱动的播客风格音频叙事。适用于构建文本转语音、音频叙事生成、内容转播客功能,或集成 Azure OpenAI Realtime API 以输出真实音频。涵盖从 React 前端到 Python FastAPI 后端 WebSocket 流式传输的全栈实现。
thegovind
内容创作 clawhub v0.1.0 1 版本 99577.6 Key: 需要
★ 3
Stars
📥 3,005
下载
💾 368
安装
1
版本
#latest

概述

Podcast Generation with GPT Realtime Mini

Generate real audio narratives from text content using Azure OpenAI's Realtime API.

Quick Start

  1. Configure environment variables for Realtime API
  2. Connect via WebSocket to Azure OpenAI Realtime endpoint
  3. Send text prompt, collect PCM audio chunks + transcript
  4. Convert PCM to WAV format
  5. Return base64-encoded audio to frontend for playback

Environment Configuration

AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini

Note: Endpoint should NOT include /openai/v1/ - just the base URL.

Core Workflow

Backend Audio Generation

from openai import AsyncOpenAI
import base64

# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"

client = AsyncOpenAI(
    websocket_base_url=ws_url,
    api_key=api_key
)

audio_chunks = []
transcript_parts = []

async with client.realtime.connect(model="gpt-realtime-mini") as conn:
    # Configure for audio-only output
    await conn.session.update(session={
        "output_modalities": ["audio"],
        "instructions": "You are a narrator. Speak naturally."
    })
    
    # Send text to narrate
    await conn.conversation.item.create(item={
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": prompt}]
    })
    
    await conn.response.create()
    
    # Collect streaming events
    async for event in conn:
        if event.type == "response.output_audio.delta":
            audio_chunks.append(base64.b64decode(event.delta))
        elif event.type == "response.output_audio_transcript.delta":
            transcript_parts.append(event.delta)
        elif event.type == "response.done":
            break

# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)

Frontend Audio Playback

// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();

Voice Options

VoiceCharacter
------------------
alloyNeutral
echoWarm
fableExpressive
onyxDeep
novaFriendly
shimmerClear

Realtime API Events

  • response.output_audio.delta - Base64 audio chunk
  • response.output_audio_transcript.delta - Transcript text
  • response.done - Generation complete
  • error - Handle with event.error.message

Audio Format

  • Input: Text prompt
  • Output: PCM audio (24kHz, 16-bit, mono)
  • Storage: Base64-encoded WAV

References

版本历史

共 1 个版本

  • v0.1.0 当前
    2026-03-28 14:08 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

content-creation

AdMapix

fly0pants
广告情报与应用数据分析助手,支持搜索广告素材、分析应用排名、下载量、收入及市场洞察,用于广告素材和竞品分析。
★ 294 📥 136,401
data-analysis

Azure Ai Agents Py - Microsoft Foundry

thegovind
使用 Azure AI Agents Python SDK (azure-ai-agents) 构建 AI 代理。适用于在 Azure AI Foundry 上创建代理、使用各种工具(文件搜索、代码解释器、Bing 接地、Azure AI
★ 0 📥 2,826
content-creation

Humanizer

biostartechnology
消除AI写作痕迹,使文本更自然真实。基于维基百科"AI写作特征"指南,识别并修正夸张象征、宣传用语、肤浅-ing分析、模糊归因、破折号滥用、三项排比、AI词汇、负面平行结构及冗长连接词等模式。
★ 857 📥 199,253