
audio-intelligence

Configure and use Gladia audio intelligence features: speaker diarization, translation, sentiment analysis, named entity recognition (NER), PII redaction, subtitles (SRT/VTT), summarization, chapterization, custom vocabulary, and audio-to-LLM. Use when the user asks about any audio intelligence feature, enabling features on pre-recorded or live transcription, understanding which features are available in each mode, or combining multiple features. Always prefer the official SDK; fall back to raw REST only when SDK cannot satisfy the requirement.
yjkj999999
Uncategorized community v1.0.0 1 version 100000 Key: not required

Overview

Audio Intelligence

Gladia's audio intelligence features extract structured data and insights from transcripts. They work on top of the base transcription — most are enabled by adding options to the transcribe() call (pre-recorded) or the startSession() config (live).

> SDK-first: always use the official SDK — see sdk-integration for policy, setup, and fallback criteria.

When to Use

  • User asks about a specific feature: diarization, translation, PII redaction, sentiment, NER, subtitles, summarization, etc.
  • Enabling or configuring one or more audio intelligence features on pre-recorded or live transcription
  • Understanding which features are available in live vs pre-recorded mode
  • Combining multiple features in a single transcription job

When NOT to use: For basic transcription without audio intelligence features, go directly to pre-recorded-transcription or live-transcription. For gotchas and errors related to specific features, see troubleshooting.

References

Consult these resources as needed:

  • ./references/live-audio-intelligence.md -- Detailed config and WebSocket responses for all live-mode features
  • ./references/pre-recorded-audio-intelligence.md -- Detailed config and response structures for all pre-recorded audio intelligence features
  • ../pre-recorded-transcription/SKILL.md -- Pre-recorded transcription workflow and options
  • ../live-transcription/SKILL.md -- Live transcription session config and event handling
  • ../sdk-integration/SKILL.md -- SDK setup, client initialization, and SDK vs raw API decision guide
  • ../troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist

Feature Availability

| Feature | Pre-recorded | Live | Config key |
|:-------------------------|:------------:|:----:|:------------------------------|
| Speaker diarization | Yes | No | diarization |
| Translation | Yes | Yes | translation |
| Sentiment analysis | Yes | Yes | sentiment_analysis |
| Named entity recognition | Yes | Yes | named_entity_recognition |
| Subtitles (SRT/VTT) | Yes | No | subtitles |
| Custom vocabulary | Yes | Yes | custom_vocabulary |
| PII redaction | Yes | No | pii_redaction |
| Chapterization | Yes | Yes | chapterization (post-process) |
| Summarization | Yes | Yes | summarization (post-process) |
| Audio-to-LLM | Yes | No | audio_to_llm |
| Custom spelling | Yes | Yes | custom_spelling |
| Custom metadata | Yes | Yes | custom_metadata |
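
As a quick guard before building a config, the availability table can be encoded as a lookup. This is a minimal sketch, not part of the Gladia SDK: `FEATURE_AVAILABILITY` and `unsupported_features` are hypothetical names, and the values are transcribed from the table above.

```python
# Availability matrix transcribed from the table above:
# config key -> (available in pre-recorded, available in live).
FEATURE_AVAILABILITY = {
    "diarization": (True, False),
    "translation": (True, True),
    "sentiment_analysis": (True, True),
    "named_entity_recognition": (True, True),
    "subtitles": (True, False),
    "custom_vocabulary": (True, True),
    "pii_redaction": (True, False),
    "chapterization": (True, True),
    "summarization": (True, True),
    "audio_to_llm": (True, False),
    "custom_spelling": (True, True),
    "custom_metadata": (True, True),
}


def unsupported_features(mode, requested):
    """Return the requested config keys that are not available in `mode`.

    Hypothetical helper: unknown keys are reported as unsupported too.
    """
    if mode not in ("pre-recorded", "live"):
        raise ValueError(f"unknown mode: {mode!r}")
    column = 0 if mode == "pre-recorded" else 1
    return [
        key
        for key in requested
        if not FEATURE_AVAILABILITY.get(key, (False, False))[column]
    ]


print(unsupported_features("live", ["diarization", "translation", "subtitles"]))
```

Running the check before opening a live session surfaces pre-recorded-only features early instead of at request time.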

Live features split into two groups: real-time (results stream during the session) and post-processing (results arrive after stopRecording()). See ./references/live-audio-intelligence.md for details.
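
The two groups can be sketched as a helper that sorts requested features into the `realtime_processing` and `post_processing` blocks used in the live examples. This is an illustration under assumptions: the grouping treats the two features marked "(post-process)" in the table as post-processing and the other listed keys as real-time, and `build_live_processing` is a hypothetical name, not an SDK function. Confirm the grouping against ./references/live-audio-intelligence.md.

```python
# Assumption: the features marked "(post-process)" in the availability table
# go under post_processing; the remaining keys below go under realtime_processing.
POST_PROCESSING_KEYS = {"chapterization", "summarization"}
REALTIME_KEYS = {
    "translation",
    "sentiment_analysis",
    "named_entity_recognition",
    "custom_vocabulary",
    "custom_spelling",
}


def build_live_processing(features):
    """Split feature keys into the two live config blocks (hypothetical helper)."""
    config = {"realtime_processing": {}, "post_processing": {}}
    for key in features:
        if key in REALTIME_KEYS:
            config["realtime_processing"][key] = True
        elif key in POST_PROCESSING_KEYS:
            config["post_processing"][key] = True
        else:
            # Pre-recorded-only features and keys this sketch does not cover.
            raise ValueError(f"{key!r} is not handled by this sketch")
    return config


print(build_live_processing(["translation", "summarization"]))
```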

Quick Config Examples

Code examples assume GladiaClient is already initialized — see sdk-integration for setup.

Speaker Diarization (pre-recorded only)

JavaScript:

const result = await client.preRecorded().transcribe("audio.mp3", {
  diarization: true,
  diarization_config: { number_of_speakers: 2 },
});
// Each utterance includes a `speaker` field (0-indexed integer)

Python:

result = client.prerecorded().transcribe("audio.mp3", {
    "diarization": True,
    "diarization_config": {"number_of_speakers": 2},
})
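
Once diarization is enabled, the per-utterance `speaker` index can be used to group text by speaker. The sketch below runs on hand-written sample data: it assumes each utterance is a dict with `speaker` and `text` keys, where `text` is an assumed field name (only `speaker` is stated above), and `text_by_speaker` is a hypothetical helper.

```python
from collections import defaultdict


def text_by_speaker(utterances):
    """Join utterance text per speaker index, preserving utterance order."""
    grouped = defaultdict(list)
    for utterance in utterances:
        grouped[utterance["speaker"]].append(utterance["text"])
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}


# Hypothetical sample data, not a real API response.
sample_utterances = [
    {"speaker": 0, "text": "Hello."},
    {"speaker": 1, "text": "Hi there."},
    {"speaker": 0, "text": "How are you?"},
]

print(text_by_speaker(sample_utterances))
# -> {0: 'Hello. How are you?', 1: 'Hi there.'}
```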

Translation (pre-recorded and live)

Pre-recorded:

JavaScript:

const result = await client.preRecorded().transcribe("audio.mp3", {
  translation: true,
  translation_config: { target_languages: ["fr", "es"] },
});

Python:

result = client.prerecorded().transcribe("audio.mp3", {
    "translation": True,
    "translation_config": {"target_languages": ["fr", "es"]},
})

Live (result streams as translation WebSocket events — see live-audio-intelligence.md):

JavaScript:

const session = client.liveV2().startSession({
  // ... audio format options ...
  realtime_processing: {
    translation: true,
    translation_config: { target_languages: ["fr"] },
  },
});

Python:

from gladiaio_sdk import LiveV2InitRequest, LiveV2RealtimeProcessing

session = client.live().start_session(
    LiveV2InitRequest(
        # ... audio format options ...
        realtime_processing=LiveV2RealtimeProcessing(
            translation=True,
            translation_config={"target_languages": ["fr"]},
        ),
    )
)

Summarization (pre-recorded and live)

Pre-recorded:

const result = await client.preRecorded().transcribe("audio.mp3", {
  summarization: true,
  summarization_config: { type: "bullet_points" },
});

Live (the summary arrives after stopRecording() as a post_summarization event):

const session = client.liveV2().startSession({
  // ... audio format options ...
  post_processing: {
    summarization: true,
    summarization_config: { type: "bullet_points" },
  },
});
session.on("message", (msg) => {
  if (msg.type === "post_summarization") console.log(msg.data.results);
});
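
Combining several features in one pre-recorded job amounts to merging the per-feature options shown above into a single options object. A minimal sketch, assuming the option keys combine without conflict; `combined_options` is a hypothetical helper and only builds the dict, it does not call the API.

```python
def combined_options(target_languages, number_of_speakers=None,
                     summary_type="bullet_points"):
    """Build one options dict enabling diarization, translation and summarization."""
    options = {
        "diarization": True,
        "translation": True,
        "translation_config": {"target_languages": list(target_languages)},
        "summarization": True,
        "summarization_config": {"type": summary_type},
    }
    # Only pin the speaker count when the caller actually knows it.
    if number_of_speakers is not None:
        options["diarization_config"] = {"number_of_speakers": number_of_speakers}
    return options


options = combined_options(["fr", "es"], number_of_speakers=2)
print(sorted(options))
```

The resulting dict has the same shape as the second argument passed to transcribe() in the Python examples above.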

For full per-feature config options and response structures, see ./references/pre-recorded-audio-intelligence.md and ./references/live-audio-intelligence.md.

Common Mistakes

  • code_switching: true with empty languages: triggers evaluation across 100+ languages and causes frequent misdetections. Always provide 3-5 expected languages.
  • Custom vocabulary intensity above 0.6: values over 0.6 cause false positives where unrelated words get replaced. Keep at 0.4-0.6 and use pronunciations for better results.
  • Expecting diarization, PII redaction, subtitles, or audio-to-LLM in live mode: these four features are pre-recorded only.
  • Enabling many features simultaneously without considering cost/latency: each enabled feature adds processing time. Enable only what you need; combine diarization + summarization + translation only when all are required.
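
The first three mistakes above can be caught with a small pre-flight check. This is a sketch under assumptions, not an official validator: it assumes `code_switching` and `languages` live under a `language_config` key and that vocabulary entries sit under `custom_vocabulary_config.vocabulary` as dicts with `value` and `intensity`. Verify those shapes against the reference files before relying on it. `lint_config` is a hypothetical name.

```python
# Features the list above names as pre-recorded only.
PRE_RECORDED_ONLY = ("diarization", "pii_redaction", "subtitles", "audio_to_llm")


def lint_config(config, mode="pre-recorded"):
    """Return a list of warnings for the common mistakes listed above."""
    warnings = []

    # Assumed nesting: language_config.{code_switching, languages}
    language = config.get("language_config", {})
    if language.get("code_switching") and not language.get("languages"):
        warnings.append(
            "code_switching is enabled with no languages; list the 3-5 expected ones"
        )

    # Assumed nesting: custom_vocabulary_config.vocabulary[].{value, intensity}
    vocabulary = config.get("custom_vocabulary_config", {}).get("vocabulary", [])
    for entry in vocabulary:
        if isinstance(entry, dict) and entry.get("intensity", 0) > 0.6:
            warnings.append(
                f"intensity {entry['intensity']} for {entry.get('value')!r} is above 0.6"
            )

    if mode == "live":
        for key in PRE_RECORDED_ONLY:
            if config.get(key):
                warnings.append(f"{key} is pre-recorded only")

    return warnings


risky = {
    "language_config": {"code_switching": True, "languages": []},
    "custom_vocabulary_config": {"vocabulary": [{"value": "Gladia", "intensity": 0.8}]},
    "diarization": True,
}
for warning in lint_config(risky, mode="live"):
    print(warning)
```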

For the full gotcha list, see troubleshooting.

Version History

1 version in total

  • v1.0.0 Migrated from ClawHub (current)
    2026-06-07 12:23
