Transform meeting recordings into structured transcripts with speaker identification, timestamps, and local meeting summaries.
Requirements:
- SENSEAUDIO_API_KEY set in the environment.
- Every request authenticated with an Authorization: Bearer header carrying that key.
- python3, requests, and websockets.

Use the official SenseAudio ASR rules summarized below:
- HTTP endpoint: POST https://api.senseaudio.cn/v1/audio/transcriptions
- WebSocket endpoint: wss://api.senseaudio.cn/ws/v1/audio/transcriptions
- Upload size: <=10MB per request
- Models used here: sense-asr-pro (HTTP) and sense-asr-deepthink (WebSocket)
- enable_speaker_diarization is supported only on sense-asr / sense-asr-pro
- max_speakers is documented only for sense-asr-pro
- enable_sentiment and timestamp_granularities[] are supported only on sense-asr / sense-asr-pro
- WebSocket audio format: pcm, 16000Hz, mono

When calling the API:
- Confirm each file is within 10MB before upload.
- Use sense-asr-pro for recorded meetings needing diarization and timestamps.
- Use sense-asr-deepthink only for live streaming scenarios.
- Request response_format=verbose_json.
- Send max_speakers only when known and using sense-asr-pro.
- Treat session_id, trace_id, and transcript contents as potentially sensitive.

import os
import requests

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/audio/transcriptions"


def transcribe_meeting(audio_file, max_speakers=None, language=None, target_language=None):
    # Upload the recording as multipart form data and return the parsed JSON response.
    with open(audio_file, "rb") as handle:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": handle},
            data={
                "model": "sense-asr-pro",
                "response_format": "verbose_json",
                "enable_speaker_diarization": "true",
                "enable_sentiment": "true",
                "enable_punctuation": "true",
                # A list value is sent as one repeated form field per item.
                "timestamp_granularities[]": ["word", "segment"],
                # Optional fields are included only when the caller supplies them.
                **({"max_speakers": max_speakers} if max_speakers else {}),
                **({"language": language} if language else {}),
                **({"target_language": target_language} if target_language else {}),
            },
            timeout=300,
        )
    # Raise on HTTP 4xx/5xx before trying to parse the body.
    response.raise_for_status()
    return response.json()
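
The 10MB rule above can be enforced locally before any request is sent. The following is a minimal sketch, not part of the SenseAudio API: `MAX_UPLOAD_BYTES` and `check_upload_size` are hypothetical names, and the limit is assumed to mean 10 * 1024 * 1024 bytes.

```python
import os

# Assumption: "10MB" means 10 * 1024 * 1024 bytes; adjust if the service counts differently.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes, above the {limit}-byte limit; "
            "split or re-encode it before upload"
        )
    return size
```

Calling `check_upload_size(audio_file)` at the top of `transcribe_meeting` would fail fast on oversized recordings instead of waiting for the server to reject them.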

Fields to read from the response:
- text for the full transcript
- segments for speaker-separated timeline entries
- speaker, start, end, text, and optional sentiment fields when present
- words only when word timestamps were requested

Generate notes from transcript structure without external services:
- summary: 3-6 bullets capturing the meeting arc
- decisions: statements containing agreements or final choices
- action_items: statements with owners, deadlines, or explicit follow-ups
- participants: derived from speaker labels
- timeline: ordered segments with timestamps

Heuristics that work without an LLM:
- Action-item cues: will, need to, follow up, by Friday
- Decision cues: decided, agreed, we will, final choice
- Talk time: end - start of each segment

For live meetings, use WebSocket only when streaming audio is actually available.

import asyncio
import json
import os

import websockets

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
WS_URL = "wss://api.senseaudio.cn/ws/v1/audio/transcriptions"


async def transcribe_live_meeting(audio_stream):
    # `additional_headers` is the keyword in recent websockets releases;
    # older releases used `extra_headers` instead.
    async with websockets.connect(
        WS_URL,
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        # Wait for the server's first message before starting the task.
        await ws.recv()
        await ws.send(json.dumps({
            "event": "task_start",
            "model": "sense-asr-deepthink",
            "audio_setting": {
                "sample_rate": 16000,
                "format": "pcm",
                "channel": 1,
            },
        }))
        # Stream raw PCM chunks as binary frames.
        async for audio_chunk in audio_stream:
            await ws.send(audio_chunk)
        # Signal that no more audio will be sent.
        await ws.send(json.dumps({"event": "task_finish"}))
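
`transcribe_live_meeting` expects `audio_stream` to be an async iterator of PCM chunks. For testing without a microphone, one way to produce such a stream is to read a WAV file in fixed-size slices. This is a sketch under stated assumptions: `pcm_chunks_from_wav` and its parameters are hypothetical, and 16-bit samples are assumed because the rules above specify only pcm, 16000Hz, mono.

```python
import asyncio
import wave


async def pcm_chunks_from_wav(path, chunk_ms=100, realtime=True):
    """Yield raw PCM chunks from a 16000 Hz, mono WAV file.

    Assumption: 16-bit samples (the bit depth is not stated in the rules above).
    """
    with wave.open(path, "rb") as wav:
        params = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if params != (16000, 1, 2):
            raise ValueError("expected 16000 Hz, mono, 16-bit PCM audio")
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
            if realtime:
                # Pace the stream roughly like a live source.
                await asyncio.sleep(chunk_ms / 1000)
```

With a suitable file on disk, `asyncio.run(transcribe_live_meeting(pcm_chunks_from_wav("meeting.wav")))` would feed it to the streaming function; the file name is a placeholder.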
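
Output formats:
- txt or json
- md
- json or csv
- json

Troubleshooting:
- Keep each request within 10MB.
- Set language explicitly on HTTP transcription.
- Send max_speakers only with sense-asr-pro.
- On WebSocket errors, inspect task_failed and base_resp.status_msg.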
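
The local note-generation heuristics described earlier (cue phrases for action items and decisions, talk time from end - start) can be sketched in a few lines. This is an illustration, not a prescribed implementation: `build_notes` and its output keys are hypothetical, the cue lists are copied from the heuristics in this document, and each segment is assumed to be a dict with speaker, start, end, and text keys.

```python
import re
from collections import defaultdict

# Cue phrases taken from the heuristics listed in this document.
ACTION_CUES = ("will", "need to", "follow up", "by friday")
DECISION_CUES = ("decided", "agreed", "we will", "final choice")


def _has_cue(text, cues):
    """True if any cue appears in text as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(cue) + r"\b", lowered) for cue in cues)


def build_notes(segments):
    """Derive participants, decisions, action items, and talk time from segments."""
    talk_time = defaultdict(float)
    decisions, action_items = [], []
    for seg in sorted(segments, key=lambda s: s["start"]):
        speaker = str(seg.get("speaker", "unknown"))
        talk_time[speaker] += seg["end"] - seg["start"]
        entry = {"speaker": speaker, "start": seg["start"], "text": seg["text"]}
        # A segment may count as both a decision and an action item.
        if _has_cue(seg["text"], DECISION_CUES):
            decisions.append(entry)
        if _has_cue(seg["text"], ACTION_CUES):
            action_items.append(entry)
    return {
        "participants": sorted(talk_time),
        "decisions": decisions,
        "action_items": action_items,
        "talk_time": dict(talk_time),
    }
```

Keyword matching is deliberately crude: "we will" triggers both lists, so results are best treated as candidates for a human to review.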