
Gipformer ASR

Vietnamese speech-to-text using Gipformer ASR (65M parameters, Zipformer-RNNT). Accepts audio of any length; the server handles VAD chunking and batching and returns the transcript.
Author: ai-ggroup · Registry: clawhub · Version: v1.0.0 · API key: not required

Overview


Vietnamese speech recognition: send audio of any length and get back a transcript.

Hugging Face model: g-group-ai-lab/gipformer-65M-rnnt (65M parameters, int8/fp32 ONNX)

Architecture

flowchart TD
    A[Audio file] -->|base64 encode| B[POST /transcribe]
    B --> C[Decode & resample to 16kHz]
    C --> D[VAD chunking ≤ 20s]
    D --> E[Batch inference — sherpa-onnx]
    E --> F[Merge chunk texts]
    F --> G["{ transcript, chunks }"]

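The client sends base64-encoded audio of any length in any supported format. The server decodes it, resamples it to 16 kHz, splits it into VAD-based chunks of at most 20 s, runs batched inference with sherpa-onnx, and returns the full transcript together with the per-chunk text.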
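The "VAD chunking ≤ 20s" step can be pictured as packing the speech regions a VAD reports into chunks no longer than 20 s. The sketch below is a hypothetical greedy packer written for illustration only: the server's actual chunking code is not described here, and the function name `pack_segments` and the `(start_s, end_s)` tuple format are assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical format: one VAD speech region as (start_s, end_s) in seconds.
Segment = Tuple[float, float]


def pack_segments(segments: List[Segment], max_chunk_s: float = 20.0) -> List[Segment]:
    """Greedily merge consecutive VAD segments into chunks of at most max_chunk_s.

    A chunk spans from its first segment's start to its last segment's end, so
    short silences between merged segments count toward the limit. A single
    segment longer than max_chunk_s is first cut into fixed-size pieces.
    """
    chunks: List[Segment] = []
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None

    for start, end in segments:
        # Cut an over-long segment into pieces no longer than max_chunk_s.
        pieces: List[Segment] = []
        s = start
        while end - s > max_chunk_s:
            pieces.append((s, s + max_chunk_s))
            s += max_chunk_s
        pieces.append((s, end))

        for p_start, p_end in pieces:
            if cur_start is None:
                cur_start, cur_end = p_start, p_end      # open the first chunk
            elif p_end - cur_start <= max_chunk_s:
                cur_end = p_end                          # still fits: extend the chunk
            else:
                chunks.append((cur_start, cur_end))      # full: close it, open a new one
                cur_start, cur_end = p_start, p_end

    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks


print(pack_segments([(0.0, 5.0), (6.0, 12.0), (13.0, 19.5), (20.0, 30.0)]))
# → [(0.0, 19.5), (20.0, 30.0)]
```

Packing whole VAD segments keeps chunk boundaries in silences where possible, so words are rarely cut in half; only a single uninterrupted stretch of speech longer than the limit gets a hard split.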

Quick Start

1. Install dependencies

pip install -r {baseDir}/requirements.txt

System dependency: ffmpeg (required for M4A support).

2. Start the server

python {baseDir}/scripts/serve.py
# or with options:
python {baseDir}/scripts/serve.py --port 8910 --quantize int8 --max-batch-size 32

On first run, the server downloads the ASR and VAD models, then listens on http://127.0.0.1:8910.

3. Transcribe audio

# Single file (any format)
python {baseDir}/scripts/transcribe.py audio.wav
python {baseDir}/scripts/transcribe.py recording.mp3

# Multiple files
python {baseDir}/scripts/transcribe.py *.wav

# JSON output with chunk details
python {baseDir}/scripts/transcribe.py audio.wav --json

# Save results
python {baseDir}/scripts/transcribe.py audio.wav -o results.json

4. Direct API call (curl)

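# Transcribe (any length, any format)
# The JSON body is streamed on stdin so long recordings don't exceed the
# command-line length limit; tr strips the line breaks GNU base64 inserts.
{ printf '{"audio_b64": "'; base64 < audio.wav | tr -d '\n'; printf '"}'; } | \
  curl -X POST http://127.0.0.1:8910/transcribe \
    -H "Content-Type: application/json" \
    --data-binary @-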

# Response:
# { "transcript": "full text...", "duration_s": 120.5, "process_time_s": 5.2,
#   "chunks": [{"text": "...", "start_s": 0.0, "end_s": 8.7}, ...] }

# Health check
curl http://127.0.0.1:8910/health
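The same two endpoints can be called from Python with only the standard library. This is an illustrative sketch, not one of the skill's scripts: `wait_until_healthy`, `build_request`, and `transcribe` are hypothetical helper names, it assumes `/health` answers HTTP 200 once the server is ready, and it relies only on the request field (`audio_b64`) and response fields (`transcript`, `chunks`) shown above.

```python
import base64
import json
import socket
import time
import urllib.error
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:8910"  # default address from the Quick Start


def wait_until_healthy(base_url: str = BASE_URL, timeout_s: float = 300.0,
                       interval_s: float = 1.0) -> bool:
    """Poll GET /health until it returns HTTP 200 or timeout_s elapses.

    Only the status code is checked; nothing is assumed about the body.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, socket.timeout):
            pass  # not up yet (e.g. still downloading models on first run)
        time.sleep(interval_s)
    return False


def build_request(audio: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /transcribe with the JSON body {"audio_b64": <base64 of the raw file>}."""
    body = json.dumps({"audio_b64": base64.b64encode(audio).decode("ascii")}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/transcribe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(audio_path: str, base_url: str = BASE_URL,
               timeout_s: float = 600.0) -> Dict[str, Any]:
    """Send one audio file and return the parsed JSON response."""
    with open(audio_path, "rb") as f:
        req = build_request(f.read(), base_url)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

For example, call `wait_until_healthy()` once after starting `serve.py` (the first start can take a while because the models are downloaded), then read `transcribe("audio.wav")["transcript"]`, or iterate over the `chunks` list for per-segment `text`, `start_s`, and `end_s`. Building the body in Python also sidesteps the shell argument-length limit that affects long recordings.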

Audio Format

| Format | Extension | Support |
|---|---|---|
| WAV | .wav | Native (soundfile) |
| FLAC | .flac | Native (soundfile) |
| OGG | .ogg | Native (soundfile) |
| MP3 | .mp3 | Native (soundfile) |
| M4A/AAC | .m4a | Via ffmpeg |

All formats are converted to WAV 16-bit PCM mono 16kHz internally.
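For reference, the target format above can be produced with the ffmpeg CLI roughly as follows. This is a sketch, not the server's code: per the table, the server decodes most formats with soundfile and uses ffmpeg only for M4A, and the helper names here are hypothetical. The ffmpeg options used (`-ac 1`, `-ar 16000`, `-c:a pcm_s16le`) are standard.

```python
import shutil
import subprocess
from typing import List


def ffmpeg_to_16k_mono_cmd(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that writes 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # any input ffmpeg can decode (e.g. .m4a)
        "-ac", "1",           # downmix to a single channel
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM samples
        dst,
    ]


def convert_to_16k_mono_wav(src: str, dst: str) -> None:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH (required for M4A input)")
    subprocess.run(ffmpeg_to_16k_mono_cmd(src, dst), check=True, capture_output=True)
```

Checking `shutil.which("ffmpeg")` up front turns a missing system dependency into a clear error instead of a failed subprocess call.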

Server Tuning

| Flag | Default | Effect |
|---|---|---|
| --quantize | int8 | fp32 for accuracy, int8 for speed and smaller size |
| --max-batch-size | 16 | Higher values give more throughput but more latency |
| --max-wait-ms | 100 | How long to wait for more requests before flushing a partial batch |
| --num-threads | 4 | Number of ONNX Runtime threads |
| --decoding-method | modified_beam_search | Set to greedy_search for faster decoding |
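`--max-batch-size` and `--max-wait-ms` describe a common dynamic-batching pattern: a batch opens when the first request arrives and is flushed either when it is full or when the wait budget runs out. The sketch below illustrates that pattern with a standard `queue.Queue`; it is a hypothetical illustration of the trade-off, and `collect_batch` is not the server's actual scheduler.

```python
import queue
import time
from typing import Any, List


def collect_batch(requests: "queue.Queue[Any]", max_batch_size: int = 16,
                  max_wait_ms: int = 100) -> List[Any]:
    """Return the next batch: block for one item, then keep adding items until the
    batch is full or max_wait_ms has passed since the first item was taken."""
    batch = [requests.get()]  # block until at least one request is available
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait budget spent: flush a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived in time
    return batch
```

Under load, batches fill up immediately and `max_wait_ms` never comes into play; when traffic is light, a lone request waits at most `max_wait_ms` before it is processed on its own.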

API Reference

See references/api.md for full endpoint documentation.

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-31 06:28

Security scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk

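🔗 Related skills

  • UI/UX Pro Max (xobi667, design-media): Provides UI/UX design intelligence and implementation guidance for building polished interfaces. Suited to UI design, UX flows, information architecture, visual style, design systems/tokens, component specs, copy/microcopy, accessibility, and front-end UI (HTML/CSS/JS, React, Next.js, Vue, Svelte…
  • Video Frames (steipete, design-media): Extracts frames or short clips from videos using ffmpeg.
  • Openai Whisper (steipete, design-media): Local speech-to-text with the Whisper CLI (no API key required).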