
hugging-face-api

Search and discover Hugging Face open-source models and datasets, then run OpenAI-compatible chat or embedding inference securely with cost control.
Source: simonpierreboucher02
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · Key: required

Overview

Hugging Face Agent Skill

A playbook for agents that use the Hugging Face MCP server. Follow these steps in order. Discover for free first; run billed inference only against confirmed-supported models.


1. Name

Hugging Face — open-source model and dataset discovery plus OpenAI-compatible inference (chat and embeddings) across inference providers, via 7 MCP tools.

2. Purpose

Use this skill to find open-source models and datasets on the Hugging Face Hub, confirm which models are runnable through the Inference router, and run chat completions and embeddings — while controlling cost, respecting licenses, and keeping the access token secret.

3. When to use Hugging Face

Use it when the task involves:

  • Open-source models (Llama, Qwen, Mistral, BGE, sentence-transformers, etc.).
  • Model or dataset discovery — search/inspect the Hub catalog.
  • OpenAI-compatible inference across providers — one interface, many providers.
  • Embeddings — vectors for semantic search, RAG, clustering.

4. When NOT to use it

  • If you need a specific closed/proprietary model (e.g. a vendor's flagship), call that vendor's provider directly.
  • If the task needs no model at all (pure local computation), skip inference.
  • If a cheaper or already-integrated tool solves the task, use it.

5. Environment

Set one secret:

Variable | Required | Notes
---------|----------|------
HF_TOKEN | Yes      | hf_.... Get it at https://huggingface.co/settings/tokens. Never expose it.

Optional: HF_HUB_BASE_URL, HF_ROUTER_BASE_URL, HF_TIMEOUT_MS, HF_MAX_RETRIES, LOG_LEVEL.
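
A minimal sketch of how a wrapper script might read these variables, assuming nothing beyond the names listed above. The `Settings` type, the `load_settings` helper, and every default value are hypothetical; the skill does not document its defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    token: str
    hub_base_url: Optional[str]
    router_base_url: Optional[str]
    timeout_ms: int
    max_retries: int
    log_level: str

    def __repr__(self) -> str:
        # Keep the token out of any printed or logged representation.
        return (
            f"Settings(token='<redacted>', timeout_ms={self.timeout_ms}, "
            f"max_retries={self.max_retries}, log_level={self.log_level!r})"
        )


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the one required secret and the optional knobs; fail fast if the secret is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return Settings(
        token=token,
        hub_base_url=env.get("HF_HUB_BASE_URL"),        # None means "use the server's default"
        router_base_url=env.get("HF_ROUTER_BASE_URL"),  # None means "use the server's default"
        timeout_ms=int(env.get("HF_TIMEOUT_MS", "30000")),  # assumed default
        max_retries=int(env.get("HF_MAX_RETRIES", "2")),    # assumed default
        log_level=env.get("LOG_LEVEL", "info"),             # assumed default
    )
```

Passing the environment as a mapping keeps the helper easy to exercise without touching the real process environment.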

6. Operations (the 7 tools)

Tool                     | Use it to                           | Cost
-------------------------|-------------------------------------|--------
hf_search_models         | Search Hub models                   | Free
hf_model_info            | Inspect one model (license, task)   | Free
hf_search_datasets       | Search Hub datasets                 | Free
hf_list_inference_models | List models runnable via router     | Free
hf_chat                  | OpenAI-style chat completion        | Billed
hf_embeddings            | Embedding vectors                   | Billed
hf_request               | Reach any other Hub/router endpoint | Depends

7. Discovery workflow (FREE)

Do this first; it costs nothing.

  1. hf_search_models — find candidates by task/author/popularity.
  2. hf_model_info — check pipeline_tag and cardData.license.
  3. hf_search_datasets — find data if needed.
  4. hf_list_inference_models — confirm the chosen model is actually runnable.
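
The selection logic behind steps 2 and 4 can be sketched as a pure function. This assumes each candidate is a dict shaped like an hf_model_info result with an `id` key next to `pipeline_tag`, and that hf_list_inference_models yields plain model ids; both shapes are assumptions, and `pick_runnable` is a hypothetical helper, not part of the skill.

```python
from typing import Iterable, Optional


def pick_runnable(
    candidates: Iterable[dict],
    runnable_ids: Iterable[str],
    wanted_task: str,
) -> Optional[dict]:
    """Return the first candidate that matches the task and is listed as runnable."""
    runnable = set(runnable_ids)
    for info in candidates:
        if info.get("pipeline_tag") != wanted_task:
            continue  # wrong task for this job
        if info.get("id") not in runnable:
            continue  # on the Hub, but not runnable through the router
        return info
    return None  # nothing qualifies: widen the search instead of calling hf_chat
```

Candidates are checked in the order given, so sorting the search results by popularity beforehand makes the most popular qualifying model win.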

8. Inference workflow (BILLED)

  1. Choose a model that appears in hf_list_inference_models.
  2. For chat: call hf_chat with OpenAI-style messages and a bounded max_tokens.
  3. For vectors: call hf_embeddings with a batch of inputs (default model sentence-transformers/all-MiniLM-L6-v2).
  4. Report the model id and the returned usage.
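
A sketch of steps 2 and 4, assuming hf_chat accepts the usual OpenAI chat fields (`model`, `messages`, `max_tokens`) and returns a response containing a `usage` object. The helper names and the 1024-token ceiling are hypothetical.

```python
from typing import Any, Dict, List

MAX_TOKENS_CEILING = 1024  # hypothetical budget, not a documented limit
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_args(
    model: str,
    messages: List[Dict[str, str]],
    max_tokens: int = 256,
) -> Dict[str, Any]:
    """Build OpenAI-style chat arguments with max_tokens always set and bounded."""
    if not messages:
        raise ValueError("messages must not be empty")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {message.get('role')!r}")
    bounded = max(1, min(int(max_tokens), MAX_TOKENS_CEILING))
    return {"model": model, "messages": messages, "max_tokens": bounded}


def report(model: str, response: Dict[str, Any]) -> Dict[str, Any]:
    """Step 4: surface the exact model id and whatever usage the call returned."""
    return {"model": model, "usage": response.get("usage")}
```

Clamping rather than rejecting an oversized `max_tokens` is a design choice; raising an error instead would be equally reasonable if silent truncation is a concern.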

9. Cost control

  • Hub discovery is free — use it liberally.
  • Inference is billed per provider, so always:
      • Set max_tokens on hf_chat.
      • Prefer smaller models when quality allows.
      • Batch embeddings (array inputs) instead of per-item calls.
      • Cache embeddings and deterministic completions.
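
Batching and caching can be combined in one small wrapper. The sketch below assumes some callable that embeds a list of texts in a single call (for example, a thin wrapper around hf_embeddings); `EmbeddingCache` and its in-memory dict are hypothetical, and a real deployment would likely persist the cache.

```python
from typing import Callable, Dict, List, Sequence, Tuple

EmbedFn = Callable[[List[str]], List[List[float]]]


class EmbeddingCache:
    """Send only uncached texts, in one batch, and remember the results."""

    def __init__(self, embed_batch: EmbedFn, model_id: str) -> None:
        self._embed_batch = embed_batch
        self._model_id = model_id
        self._store: Dict[Tuple[str, str], List[float]] = {}
        self.billed_calls = 0  # number of times the billed function was invoked

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        # Deduplicate the uncached texts while preserving their order.
        missing = list(
            dict.fromkeys(t for t in texts if (self._model_id, t) not in self._store)
        )
        if missing:
            self.billed_calls += 1
            vectors = self._embed_batch(missing)
            if len(vectors) != len(missing):
                raise ValueError("embedding count does not match input count")
            for text, vector in zip(missing, vectors):
                self._store[(self._model_id, text)] = vector
        return [self._store[(self._model_id, t)] for t in texts]
```

Keying the cache on the model id as well as the text prevents vectors from two different embedding models from being mixed.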

10. Error handling

Error                         | Reaction
------------------------------|---------
model_not_supported (402/403) | Call hf_list_inference_models, pick a listed model, retry.
401 invalid token             | Stop. Fix HF_TOKEN. Do not retry blindly.
402 credits                   | Stop. Add credits or use a cheaper/free model.
429 rate limit                | Back off (server retries); slow down, batch, cache.
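
The reactions above can be sketched as a small decision function. How the tools actually surface errors is not specified here, so the `status` and `error_code` arguments, the `Action` enum, and the backoff schedule are all assumptions made for illustration.

```python
from enum import Enum
from typing import List


class Action(Enum):
    PICK_LISTED_MODEL = "call hf_list_inference_models, pick a listed model, retry"
    STOP_FIX_TOKEN = "stop and fix HF_TOKEN"
    STOP_ADD_CREDITS = "stop; add credits or use a cheaper/free model"
    BACK_OFF = "slow down, batch, cache"
    RAISE = "unhandled; surface the error"


def react(status: int, error_code: str = "") -> Action:
    """Map an error to the reaction in the table above."""
    # Check the named error first: it can arrive with a 402 or a 403.
    if error_code == "model_not_supported":
        return Action.PICK_LISTED_MODEL
    if status == 401:
        return Action.STOP_FIX_TOKEN
    if status == 402:
        return Action.STOP_ADD_CREDITS
    if status == 429:
        return Action.BACK_OFF
    return Action.RAISE


def backoff_delays(
    max_retries: int, base_seconds: float = 1.0, cap_seconds: float = 30.0
) -> List[float]:
    """Capped exponential delays (in seconds) to wait between later calls after a 429."""
    return [min(cap_seconds, base_seconds * 2 ** attempt) for attempt in range(max_retries)]
```

Only the model_not_supported and 429 cases lead to another attempt; the 401 and 402 cases stop, since retrying cannot fix a bad token or an empty balance.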

11. Security

  • Never print, log, or echo the hf_ token. The server redacts it; do not undo that.
  • Use a least-privilege token (read for discovery; inference only where needed).
  • Use placeholders (your_hf_token) in any shared config.
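
As a second line of defence in an agent's own logging, a redaction filter might look like the sketch below. The token pattern (`hf_` followed by at least 20 alphanumeric characters) is an assumption about the token format, chosen so that tool names such as hf_embeddings are left alone; `redact` and `RedactTokenFilter` are hypothetical helpers.

```python
import logging
import re

# Assumed token shape: "hf_" plus a long alphanumeric run. Adjust if tokens differ.
_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{20,}")


def redact(text: str) -> str:
    """Replace anything that looks like a Hugging Face token with a placeholder."""
    return _TOKEN_RE.sub("hf_***", text)


class RedactTokenFilter(logging.Filter):
    """Logging filter that scrubs token-like strings from every record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so tokens passed as arguments are caught too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now scrubbed
```

Attaching the filter to a handler with `handler.addFilter(RedactTokenFilter())` applies it to every record that handler emits.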

12. Reproducibility / model pinning

  • Use exact model ids (and a revision/commit if available) so runs are repeatable.
  • Use the same embedding model for indexing and querying in RAG.
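
One way to enforce both points is to store the pin alongside the index and compare it at query time. This is a sketch only: `ModelPin` and `assert_same_embedding_model` are hypothetical names, and whether a given tool accepts a revision is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelPin:
    model_id: str                   # exact Hub id, e.g. "org/name"
    revision: Optional[str] = None  # commit or tag, if one is available


def assert_same_embedding_model(index_pin: ModelPin, query_pin: ModelPin) -> None:
    """Refuse to query an index with a different embedding model or revision."""
    if index_pin != query_pin:
        raise ValueError(
            f"index was built with {index_pin}, but the query uses {query_pin}"
        )
```

Vectors from different models live in different spaces, so a mismatch tends to produce plausible-looking but meaningless similarity scores rather than an error; failing loudly is the safer default.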

13. Licensing

  • Before downstream use, check the model card's license (hf_model_info → cardData.license).
  • Respect usage restrictions (commercial use, redistribution, gated access).
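
A license gate can be sketched as follows, assuming the hf_model_info result is a dict with a nested `cardData.license` string. The allowlist is a hypothetical project policy, not a recommendation, and a passing check does not replace reading the license terms.

```python
from typing import FrozenSet, Optional

# Hypothetical policy: the licenses this project has already reviewed.
ALLOWED_LICENSES: FrozenSet[str] = frozenset({"apache-2.0", "mit"})


def license_of(model_info: dict) -> Optional[str]:
    """Return the declared license string, or None if the card does not state one."""
    card = model_info.get("cardData") or {}
    value = card.get("license")
    return value.lower() if isinstance(value, str) else None


def license_allowed(
    model_info: dict, allowed: FrozenSet[str] = ALLOWED_LICENSES
) -> bool:
    """Treat a missing or unrecognized license as not allowed."""
    declared = license_of(model_info)
    return declared is not None and declared in allowed
```

Defaulting to "not allowed" when the license is missing forces a manual look at the model card instead of an accidental approval.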

14. Agent checklist

  • [ ] Confirmed Hugging Face is the right tool (open-source / discovery / embeddings).
  • [ ] Discovered model via hf_search_models / hf_model_info (free).
  • [ ] Confirmed it is runnable via hf_list_inference_models.
  • [ ] Checked the license.
  • [ ] Set max_tokens (chat) / batched inputs (embeddings).
  • [ ] Did not expose the token.
  • [ ] Cited the exact model id and reported usage.

15. Example workflows

  • Find a model → run chat: hf_search_models → hf_model_info → hf_list_inference_models → hf_chat. See recipes/find-and-run-model.md.
  • Build embeddings for RAG: hf_embeddings (batch) → store → query. See recipes/build-embeddings.md.
  • Dataset lookup: hf_search_datasets → hf_request for details. See recipes/dataset-discovery.md.

16. Common mistakes

  • Calling hf_chat before confirming the model is supported (causes model_not_supported).
  • One embedding call per item instead of a batch (slow and costly).
  • Skipping the license check.
  • Exposing the token in logs or output.
  • Omitting max_tokens, leading to runaway generation cost.

17. Maintenance

  • The runnable model list changes — re-run hf_list_inference_models rather than hardcoding ids.
  • Re-check licenses when adopting a new model.
  • Rotate HF_TOKEN periodically.
  • Confirm endpoint/provider details against https://huggingface.co/docs when behavior changes.

Version history

1 version in total

  • v1.0.0 (current)
    2026-06-03 13:34
