← 返回
未分类 已验 中文

Flowyaipc Herdsman Skill En

Integration package for the Herdsman model engine. Used by other agent platforms to call scripts in this directory and protocol specifications when connectin...
Herdsman模型引擎的集成包,供其他代理平台调用本目录脚本和协议规范,以实现连接。
JasonKo jiejingke 来源
未分类 clawhub v1.0.1 2 版本 100000 Key: 无需
★ 0
Stars
📥 94
下载
💾 0
安装
2
版本
#latest

概述

Herdsman Skill

This directory is not a single script but an integration package for reuse by other agent platforms, enabling external agents to reliably access the Herdsman local model engine.

Use Cases

  • Other agent platforms need to use Herdsman as an OpenAI-compatible backend
  • Platforms want to access Herdsman via the Anthropic Messages-compatible interface
  • Platforms that support the AG-UI protocol need to connect to /agui
  • Need to reliably call text, image, OCR, embedding, and speech capabilities without writing long JSON in shell

Default Connection

  • Service address: http://127.0.0.1:8080
  • OpenAI root path: http://127.0.0.1:8080/v1
  • Anthropic endpoint: http://127.0.0.1:8080/v1/anthropic/messages
  • AGUI endpoint: http://127.0.0.1:8080/agui
  • API Key: Empty by default; if configured, use Authorization: Bearer

Mandatory Rules

1. Do not construct complex curl commands directly

Do not construct complex prompts, tools, base64 images, or long-timeout tasks directly in the shell. Prefer using Python scripts under scripts/, or generate temporary Python files following the same pattern.

2. Run model discovery first

Before calling any model, always run:

python headsman-skill/scripts/check_model.py

If you know the model name, you can also:

python headsman-skill/scripts/check_model.py "<model_id>"

3. Long tasks must explicitly set longer timeouts

  • Image generation, editing, img2img: recommended timeout >= 120
  • OCR: recommended timeout >= 120
  • Document parsing: recommended timeout >= 300
  • Speech synthesis, recognition, streaming: recommended timeout >= 120
  • Text chat: recommended timeout >= 60

4. Save image results to disk

If results will be reused in subsequent conversations, save them to outputs/ and return the absolute path or cache URL to the user.

Protocol Priority

OpenAI Compatible

Preferred for:

  • Chat completions
  • Tool calls
  • Embeddings
  • Rerank
  • Image generation / editing / img2img
  • OCR text recognition
  • Document parsing (PDF, DOCX, XLSX, PPTX)
  • Speech recognition / synthesis / streaming

Core endpoints:

  • GET /v1/models
  • POST /v1/chat/completions
  • POST /v1/embeddings
  • POST /v1/rerank
  • POST /v1/images/generations
  • POST /v1/images/edits
  • POST /v1/images/img2img
  • GET /v1/images/cache/:filename
  • POST /v1/ocr
  • POST /v1/documents/parse
  • POST /v1/audio/transcriptions
  • GET /v1/audio/transcriptions/stream?model= (WebSocket)
  • POST /v1/audio/speech
  • GET /v1/audio/speech/stream/:token
  • GET /v1/audio/info?model=

Additional parameters for chat completions (OpenAI Chat Completions compatible extensions):

ParameterTypeDescription
------------------------------
reasoning_effortstringReasoning level: low / medium / high; local llama.cpp maps to template parameters
thinking_enabledbooleanEnable or disable thinking mode for supported models; local llama.cpp maps to enable_thinking
thinking_tokensnumberThinking token budget; local llama.cpp maps to reasoning_budget

Anthropic Compatible

For platforms that only support the Anthropic Messages style, the endpoint is:

  • POST /v1/anthropic/messages

Therefore:

  • If the platform supports custom full endpoints, it can connect directly
  • If the SDK hardcodes /v1/messages, add a lightweight proxy on the platform side or use raw HTTP requests

AGUI

For platforms supporting the AG-UI protocol event stream:

  • POST /agui

AGUI is more suitable for protocol clients or SDKs; raw HTTP is not recommended. In the current state, state should at least provide model, and may optionally include webSearch, tools, task_type, pass_through.

Recommended Scripts

  • scripts/herdsman_client.py: General HTTP client wrapper
  • scripts/check_model.py: Model discovery and filtering
  • scripts/chat_completion.py: OpenAI chat completion (supports reasoning_effort / thinking)
  • scripts/generate_image.py: Text-to-image generation with auto-download
  • scripts/edit_image.py: Image editing with support for local files, URLs, masks, and additional reference images
  • scripts/img2img.py: Image-to-image (style transfer, inpainting)
  • scripts/ocr.py: OCR text recognition, supports direct local image recognition
  • scripts/parse_document.py: Document parsing, supports PDF, DOCX, XLSX, PPTX to text and structured data
  • scripts/transcribe_audio.py: Speech transcription, supports local files, URLs, and data URLs
  • scripts/audio_speech.py: Text-to-speech (TTS), supports VoiceDesign, VoiceClone, and streaming
  • scripts/anthropic_messages.py: Anthropic Messages compatible invocation

Directory Structure

  • references/api-examples.md: Capability-based call examples
  • references/platform-integration.md: OpenAI / Anthropic / AGUI integration guide
  • references/error-codes.md: Common errors and agent-side handling strategies
  • references/model-capabilities.md: Model capabilities and endpoint mapping
  • outputs/: Recommended directory for saving generated images

Best Practices

  1. Use check_model.py first to get installed models
  2. Choose OpenAI, Anthropic, or AGUI based on the platform protocol
  3. Use Python scripts instead of shell concatenation for long tasks
  4. Save image results as files or cache URLs, avoiding large base64 payloads
  5. When encountering model_not_found, model_not_installed, invalid_model_capability, re-run model discovery
  6. Speech transcription supports both JSON body (audio field) and multipart/form-data (file field)
  7. Before OCR, use check_model.py to confirm paddleocr-ppocrv5-server or another OCR model is installed
  8. Document parsing supports PDF, DOCX, XLSX, PPTX formats; files can be passed via path (local path), file_base64 (base64 data), or file (multipart)
  9. Enable --ocr-enabled for document parsing to OCR scanned PDF pages and embedded images in Office documents

Speech Extension: TTS Voice Clone + ASR Standalone Transcription

The following three scripts are advanced speech tools integrated with Herdsman, supporting a full workflow from audio conversion to ASR transcription to voice cloning.

Script Overview

ScriptFunctionExternal Dependency
-------------------------------------
scripts/convert_audio.pyAudio format conversion (any format to 16kHz WAV)ffmpeg
scripts/transcribe_standalone.pyASR speech transcription (pure urllib, no herdsman_client dependency)Herdsman ASR model
scripts/tts_voice_clone.pyVoice cloning TTS synthesisHerdsman qwen3-tts-voiceclone

convert_audio.py

Convert audio in any format (MP3/M4A/OGG, etc.) to 16kHz mono WAV. No Herdsman dependency.

uv run python scripts/convert_audio.py <input_path> [output_path]

Parameters:

  • input_path — Path to the reference audio file
  • output_path — Optional, defaults to same directory as input with .wav extension

Examples:

uv run python scripts/convert_audio.py ref.mp3
uv run python scripts/convert_audio.py ref.mp3 ref.wav

transcribe_standalone.py

Standalone ASR transcription script (pure urllib, no dependency on herdsman_client.py). Dynamic model selection, supports absolute output paths.

uv run python scripts/transcribe_standalone.py <audio_path> --model <model_id> [--language <language>] [--output <absolute_path>]

Parameters:

  • audio_path — Input audio file path (.wav/.mp3/.m4a, etc.)
  • --model — ASR model ID (required, dynamic selection)
  • --language — Language code (optional, auto-detect by default)
  • --output / -o — Output file absolute path, writes both .txt + .json (optional, prints only if not specified)
  • --timeout — Timeout in seconds (default 300)

Tested model recommendations:

ModelRecommendationNotes
-----------------------------
sherpa-onnx-paraformer-zh-small⭐ PreferredSimplified Chinese, preserves filler words, ~5s fastest
whisper-baseAlternativeGeneral high accuracy, Traditional Chinese output
funasr⚠️WebSocket streaming only, HTTP not supported
sherpa-onnx-streaming-zipformer-zh-14m⚠️Streaming only, HTTP does not support full transcription

Examples:

# Recommended
uv run python scripts/transcribe_standalone.py audio.wav --model sherpa-onnx-paraformer-zh-small --output "D:/result.txt"
# Print only
uv run python scripts/transcribe_standalone.py audio.wav --model whisper-base

tts_voice_clone.py

Voice cloning TTS synthesis using qwen3-tts-voiceclone. Three dynamic parameters: reference audio WAV, original text, target script.

uv run python scripts/tts_voice_clone.py <ref_audio_wav> <ref_text> <target_text> [--output <path>]

Parameters:

  • ref_audio_wav — 16kHz mono WAV path
  • ref_text — Original text corresponding to the reference audio
  • target_text — Target text to be synthesized with cloned voice
  • --output / -o — Output audio path (default ripple_tts_cloned.wav)
  • --timeout — Timeout in seconds (default 180)

Examples:

uv run python scripts/tts_voice_clone.py ref.wav "original text" "target synthesis text" -o output.wav

Full Workflow

# 1. Convert to WAV
uv run python scripts/convert_audio.py source.mp3 ref.wav

# 2. ASR transcription (extract audio text for comparison)
uv run python scripts/transcribe_standalone.py ref.wav --model sherpa-onnx-paraformer-zh-small --output "D:/transcribed.txt"

# 3. Voice clone synthesis
uv run python scripts/tts_voice_clone.py ref.wav "original text" "target synthesis text" -o final.wav

Notes

  • Reference audio recommended 10-60 seconds, low background noise, natural speech rate
  • The original text must exactly match the audio content, otherwise cloning quality is affected
  • ASR transcription supports absolute paths via --output for cross-directory use
  • Error messages output to stderr, normal results output to stdout

版本历史

共 2 个版本

  • v1.0.1 ## Herdsman Skill v1.1.0 This update is based on the 2026-06-25 API specification, adding **Document Parsing** capabilities and extending TTS parameter support. ### New: Document Parsing Parse **PDF, DOCX, XLSX, PPTX** files into text and structured data via `POST /v1/documents/parse` — no LibreOffice or ImageMagick required. ``` python scripts/parse_document.py ./report.pdf --model paddleocr-ppocrv5-server ``` - PDF uses LiteParse page parsing, returns `text_items` with coordinates - Office documents use Rust native parsing, return structured `blocks` (heading, table, etc.) - Scanned PDFs and embedded images in DOCX/PPTX support OCR fallback (`--ocr-enabled`) - Three input methods: local path, base64 data, multipart upload - Fine-grained control via `target_pages`, `max_pages`, `dpi`, OCR image limits ### Improvement: Speech Synthesis `POST /v1/audio/speech` now supports the `frames` parameter (Qwen-TTS optional maximum audio frames), available via the `--frames` CLI option. ### Changes - `scripts/herdsman_client.py` — New `document_parse()` / `document_parse_file()` methods - `scripts/parse_document.py` — New standalone document parsing script - `SKILL.md` — Core endpoints, recommended scripts, timeout guidance, best practices updated - `references/` — Capability mapping, call examples, integration guide, error codes all synchronized 当前
    2026-06-25 16:40 安全 安全
  • v1.0.0
    2026-06-06 07:16

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-agent

Find Skills

guipi888
场景驱动+关键词双模式技能发现工具。当用户用自然语言描述场景/需求(如"我想做一个海报""帮我分析股票"),或明确说"安装技能/find skills/找个skill"时,自动从官方内置、本地已安装、SkillHub、虾评、GitHub、C
★ 1,491 📥 556,733
office-efficiency

AutoDimension Report Skill En

jiejingke
处理供应链文档包中的 PDF、DOCX、XLSX——格式转换、图像提取、OCR、尺寸核对与审查报告生成。调用...
★ 0 📥 183
ai-agent

Agent Browser

rez0
用于 AI 代理的浏览器自动化 CLI。当用户需要与网站交互(包括浏览页面、填写表单、点击按钮、截图等)时使用。
★ 844 📥 324,386