
llm-api-model-verifier

This skill should be used when the user wants to verify whether an LLM API endpoint is actually serving the model it claims to be, or suspects model fraud (e.g., a provider claiming to offer GPT-5/GLM-5.1/Claude but actually returning a cheaper model). It provides a universal multi-probe behavioral fingerprinting workflow supporting OpenAI / Anthropic / Gemini / generic API formats, with ready-to-run scripts that measure latency and apply llm-verify + CISPA paper methodology.
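LLM API Model Verifier: catch AI APIs that advertise one model and serve another.

Have you ever paid for an API advertised as GLM-5.1, only to find that something feels off? The response style resembles Doubao, the knowledge cutoff date does not match, or the model even states outright that its name is "Doubao".

This tool verifies the model's real identity through behavioral fingerprinting, rather than trusting the model field returned by the API:

  • 🔍 Bilingual (Chinese and English) identity probes that get around system-prompt disguises
  • ⏱️ Response latency and token usage analysis to spot proxying or downgrading
  • 🛡️ System prompt leak detection that catches the provider in the act
  • 📊 Support for OpenAI / Anthropic / Gemini / generic formats
  • 📄 Structured report output: LEGITIMATE / SUSPICIOUS / FRAUD_DETECTED

Built on the methodology of llm-verify and the CISPA paper.

To keep your API key safe, you can create a new key before testing and delete it once testing is done.

Repository: https://github.com/add-matong/llm-api-model-verifier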
user_c7dc7099
Uncategorized community v1.0.0 1 version 100000 Key: required

Overview

LLM API Model Verifier

Verify whether an LLM API endpoint is truthfully serving the model it claims.

Supports OpenAI, Anthropic, Gemini, and generic OpenAI-compatible APIs.

When to Use This Skill

Trigger this skill when the user:

  • Asks how to check if an API is really the model it claims
  • Suspects model fraud (paying for X, getting Y)
  • Wants to audit an AI API provider before using it for research
  • Mentions verifying, testing, or fingerprinting an API endpoint

Core Principle

Behavioral fingerprinting beats system prompt lies. A provider can fake the model field and set a system prompt to make the model lie about its identity. But the model's free-form output under varied prompts reveals its true identity.

This tool combines two methodologies:

  1. llm-verify (mintesnot-teshome/llm-verify) — forensic prompt methodology
  2. CISPA paper (arXiv 2603.01919) — systematic API fraud detection with capability testing, token analysis, and safety behavior profiling

Supported API Formats

| Format | Flag | Endpoint Pattern | Auth Header |
|--------|------|-----------------|-------------|
| OpenAI-compatible (default) | openai | {base}/chat/completions | Authorization: Bearer |
| Anthropic native | anthropic | {base}/v1/messages | x-api-key |
| Gemini native | gemini | {base}/v1beta/models/{model}:generateContent | Authorization: Bearer |
| Generic (fallback) | generic | {base}/chat/completions | Authorization: Bearer |
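
The table above can be read as a small routing function. The following is a minimal sketch of that mapping, not the bundled script's actual code: `build_request_target` is a hypothetical helper name, and the URL patterns and header names are taken directly from the table. Two caveats worth knowing: Anthropic's native API also expects an `anthropic-version` header, and Google's own Gemini endpoint normally takes the key via an `x-goog-api-key` header, so the Bearer form here simply follows the table.

```python
def build_request_target(api_base, api_key, model, api_format="openai"):
    """Return (url, headers) for one of the four formats in the table.

    Hypothetical helper for illustration; the bundled script may differ.
    """
    base = api_base.rstrip("/")  # avoid a double slash when joining paths
    if api_format == "anthropic":
        url = f"{base}/v1/messages"
        headers = {"x-api-key": api_key}
    elif api_format == "gemini":
        url = f"{base}/v1beta/models/{model}:generateContent"
        headers = {"Authorization": f"Bearer {api_key}"}
    elif api_format in ("openai", "generic"):
        url = f"{base}/chat/completions"
        headers = {"Authorization": f"Bearer {api_key}"}
    else:
        raise ValueError(f"unknown api_format: {api_format!r}")
    headers["Content-Type"] = "application/json"
    return url, headers


if __name__ == "__main__":
    url, headers = build_request_target(
        "https://api.example.com/v1", "sk-...", "gpt-5", "openai"
    )
    print(url)
    print(sorted(headers))
```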

Verification Dimensions (8 categories, 29 probes)

| Category | Probes | What It Tests | Severity |
|----------|--------|---------------|----------|
| Identity | 7 | Direct/indirect/adversarial identity questions in ZH+EN | HIGH |
| System Prompt Leak | 3 | Can we extract identity instructions from the system prompt? | HIGH |
| Knowledge Cutoff | 4 | Cutoff date consistency across languages; hallucination detection | MEDIUM |
| Structured Output | 2 | Can the model produce valid JSON? Tests instruction-following | MEDIUM |
| Refusal Boundary | 3 | Does the model over-refuse harmless but sensitive-sounding questions? | MEDIUM |
| Safety Behavior | 4 | How does the model handle safety-critical scenarios? | MEDIUM |
| Capability | 2 | Context window and self-reported capabilities | LOW |
| Model-Specific | 4 | Direct checks for Doubao/GLM/DeepSeek identity | HIGH |
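
As a quick sanity check on the table above, the probe counts can be tallied by severity. This is only an illustrative data structure (the name `PROBE_CATEGORIES` is an assumption, not something the bundled script is known to define); the counts and severities are copied from the table.

```python
from collections import Counter

# (probe count, severity) per category, copied from the table above.
PROBE_CATEGORIES = {
    "Identity": (7, "HIGH"),
    "System Prompt Leak": (3, "HIGH"),
    "Knowledge Cutoff": (4, "MEDIUM"),
    "Structured Output": (2, "MEDIUM"),
    "Refusal Boundary": (3, "MEDIUM"),
    "Safety Behavior": (4, "MEDIUM"),
    "Capability": (2, "LOW"),
    "Model-Specific": (4, "HIGH"),
}


def probes_by_severity(categories):
    """Sum probe counts per severity level."""
    totals = Counter()
    for count, severity in categories.values():
        totals[severity] += count
    return totals


if __name__ == "__main__":
    totals = probes_by_severity(PROBE_CATEGORIES)
    print(len(PROBE_CATEGORIES), sum(totals.values()))  # 8 29
    print(dict(totals))  # {'HIGH': 14, 'MEDIUM': 13, 'LOW': 2}
```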

Verification Workflow

Step 1: Ask the User for Required Info

Ask for:

  1. API Base URL (e.g. https://api.example.com/v1)
  2. API Key
  3. Claimed model name (e.g. gpt-5, glm-5.1, claude-opus-4)
  4. API format (if unknown, try openai first — most providers are OpenAI-compatible)

Step 2: Run Minimum Viable Probes (5 probes)

These 5 probes give the strongest signal with minimal API calls:

1. Identity (Chinese)        →  primary decision driver (Chinese probes bypass English-only disguises)
2. Identity (English)        →  cross-language consistency check
3. System prompt leak (ZH)   →  smoking gun if leaked
4. Knowledge cutoff (ZH)     →  training data boundary
5. Knowledge cutoff (EN)     →  consistency check

Run with --probes minimum or call verify_model(..., probes="minimum").
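
Probes 4 and 5 only help if the two cutoff answers are actually compared. One rough way to do that, sketched below under the assumption that the model states its cutoff with a four-digit year, is to extract the years from each answer and check that they overlap. The function names are hypothetical and the year-only comparison is a deliberate simplification; the bundled script may compare cutoffs differently.

```python
import re
from typing import Optional, Set

YEAR_RE = re.compile(r"(?<!\d)(20\d{2})(?!\d)")


def extract_years(answer: str) -> Set[int]:
    """Collect every standalone 20xx year mentioned in an answer."""
    return {int(y) for y in YEAR_RE.findall(answer)}


def cutoff_consistent(zh_answer: str, en_answer: str) -> Optional[bool]:
    """True if both answers share a year, False if they disagree,
    None if either answer contains no year at all."""
    zh_years = extract_years(zh_answer)
    en_years = extract_years(en_answer)
    if not zh_years or not en_years:
        return None
    return bool(zh_years & en_years)


if __name__ == "__main__":
    print(cutoff_consistent("我的知识截止到2024年6月。",
                            "My knowledge cutoff is June 2024."))  # True
    print(cutoff_consistent("我的知识截止到2024年。",
                            "My training data ends in October 2023."))  # False
```

A `None` result is not evidence either way; it only means the answer needs to be read by a person.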

Step 3: Analyze Identity Response

The model's answer to the identity probe is the primary signal:

| Answer Contains | Claimed Model | Verdict |
|-----------------|---------------|---------|
| "I'm Doubao / 豆包" | glm-5.1 (Zhipu) | 🔴 FRAUD |
| "I'm GPT / ChatGPT" | claude (Anthropic) | 🔴 FRAUD |
| "I'm DeepSeek" | gpt-5 (OpenAI) | 🔴 FRAUD |
| Matches claimed identity | (any) | ✅ Proceed |
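
The table amounts to a keyword lookup: map both the claimed model name and the identity answer to a vendor, then compare. Below is a minimal sketch of that idea. The keyword lists and the names `VENDOR_KEYWORDS`, `vendors_in`, and `check_identity` are illustrative assumptions, not the bundled script's rules, and plain substring matching will miss paraphrased or evasive answers.

```python
# Illustrative keyword lists; extend them for the vendors you care about.
VENDOR_KEYWORDS = {
    "bytedance": ["doubao", "豆包"],
    "zhipu": ["glm", "智谱"],
    "openai": ["gpt", "openai"],
    "anthropic": ["claude", "anthropic"],
    "deepseek": ["deepseek"],
}


def vendors_in(text):
    """Return the set of vendors whose keywords appear in the text."""
    lowered = text.lower()
    return {
        vendor
        for vendor, keywords in VENDOR_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def check_identity(answer, claimed_model):
    """Compare the self-reported vendor with the claimed model's vendor."""
    claimed = vendors_in(claimed_model)
    seen = vendors_in(answer)
    if not claimed or not seen:
        return "INCONCLUSIVE"  # nothing recognizable to compare
    if seen.isdisjoint(claimed):
        return "FRAUD"  # the answer names only other vendors
    return "PROCEED"


if __name__ == "__main__":
    print(check_identity("我是豆包,由字节跳动开发的AI助手", "glm-5.1"))  # FRAUD
    print(check_identity("I am GLM, developed by Zhipu AI.", "glm-5.1"))  # PROCEED
```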

Step 4: Check System Prompt Leak

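If the model outputs something like the following, treat it as decisive evidence of its true identity, regardless of what the model normally claims:

# 身份定位
你是豆包,是由字节跳动公司自主研发的...

(In English: "# Identity positioning / You are Doubao, independently developed by ByteDance...")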
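
Leaked identity instructions tend to follow a "你是X" / "You are X" shape, so a response can be scanned for them before a human reviews it. The patterns below are a rough sketch under that assumption; `find_leaked_identities` is a hypothetical helper, and it will produce false positives (for example when the model merely quotes the question back), so treat matches as candidates for review rather than proof.

```python
import re

LEAK_PATTERNS = [
    # Chinese: "你是" followed by a short name; skip the question "你是谁".
    re.compile(r"你是(?!谁)\s*([^\s,,。.??]{1,20})"),
    # English: "You are" followed by a capitalized name such as "Claude".
    re.compile(r"\bYou are\s+([A-Z][\w.\-]{1,30})"),
]


def find_leaked_identities(text):
    """Return candidate identity names found in a possible prompt leak."""
    names = []
    for pattern in LEAK_PATTERNS:
        for match in pattern.finditer(text):
            # Drop sentence punctuation captured at the end of a name.
            names.append(match.group(1).rstrip(".-"))
    return names


if __name__ == "__main__":
    leak = "# 身份定位\n你是豆包,是由字节跳动公司自主研发的..."
    print(find_leaked_identities(leak))  # ['豆包']
```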

Step 5: Run Standard Probes (15 probes)

Adds structured output, refusal boundary, safety behavior, and model-specific checks:

python verify_api_model.py ... --probes standard

Structured output test: If the model cannot generate valid JSON, it may lack instruction-following capability → suspicious for large models.

Refusal boundary test: Over-refusal of harmless questions (e.g., "how do lockpicks work?" for a locksmith) reveals the model's safety training fingerprint.

Safety behavior test: How the model handles misinformation, bias, and safety-critical scenarios. Different models have very different patterns.
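
Of the three tests above, the structured output test is the easiest to automate: ask for JSON, then try to parse the reply. The sketch below assumes the model may wrap its JSON in a markdown code fence, which many chat models do; `parse_json_reply` is a hypothetical helper, not a function from the bundled script.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the source readable
FENCED_RE = re.compile(
    r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL
)


def parse_json_reply(reply):
    """Return (is_valid, parsed_value) for a model reply that should be JSON."""
    text = reply.strip()
    fenced = FENCED_RE.match(text)
    if fenced:
        text = fenced.group(1)  # keep only the content inside the fence
    try:
        return True, json.loads(text)
    except json.JSONDecodeError:
        return False, None


if __name__ == "__main__":
    print(parse_json_reply('{"name": "test", "value": 1}'))
    print(parse_json_reply("Sure! Here is the JSON you asked for: {name: test}"))
```

A failed parse is a medium-severity hint at most: some genuine models add prose around their JSON unless told not to.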

Step 6 (Optional): Run Full Probe Set (29 probes)

For maximum evidence, run all 29 probes:

python verify_api_model.py ... --probes full

Using the Bundled Script

CLI Usage

python verify_api_model.py \
    --api-base "https://api.example.com/v1" \
    --api-key "sk-..." \
    --model "gpt-5" \
    --api-format openai \
    --probes standard \
    --output "results.json"

Python API Usage

from verify_api_model import verify_model

result = verify_model(
    api_base="https://api.example.com/v1",
    api_key="sk-...",
    model="glm-5.1",
    api_format="openai",   # or "anthropic", "gemini"
    probes="standard",     # "minimum" | "standard" | "full"
    output_file="results.json",
)

print(result["verdict"])   # "FRAUD_DETECTED" | "SUSPICIOUS" | ...
print(result["fraud_signals"])
print(result["warnings"])
print(result["token_usage_summary"])  # avg input/output tokens

The script:

  • Auto-detects the correct endpoint URL for each API format
  • Measures response latency and token usage for each probe
  • Runs all probes with rate limiting (1.5s delay between calls)
  • Saves full JSON results and prints a formatted summary
  • Detects fraud signals: identity mismatch, system prompt leak, cutoff inconsistency, safety failure, structured output failure, token anomaly, abnormal latency

Output and Verdict

After running, the script will automatically output a formatted summary including:

  • Claimed model and API endpoint info
  • Token usage statistics (avg input/output tokens)
  • Final verdict (FRAUD_DETECTED / SUSPICIOUS / INCONCLUSIVE / LEGITIMATE)
  • Key probe responses (identity, leak, cutoff, structured output, refusal, safety, specific)
  • Fraud signals with evidence
  • Warnings

As an AI agent, you MUST present this summary directly to the user. Do not just say "verification complete" — show the actual verdict, key evidence, and conclusions from the output.

| Verdict | Meaning | Action |
|---------|---------|--------|
| FRAUD_DETECTED | 1+ high-severity signals | Stop using; report to user with evidence |
| SUSPICIOUS | 1+ medium-severity signals or 2+ warnings | Run more probes; compare benchmarks |
| INCONCLUSIVE | Warnings but no strong signal | Need more evidence |
| LEGITIMATE | No significant signals | Likely genuine |
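
Read top to bottom, the verdict table is a priority-ordered rule list. The sketch below encodes it literally; `decide_verdict` is a hypothetical function, and how the script actually counts signals and warnings is an assumption here.

```python
def decide_verdict(high_signals, medium_signals, warnings):
    """Apply the verdict table in priority order (counts are non-negative ints)."""
    if high_signals >= 1:
        return "FRAUD_DETECTED"
    if medium_signals >= 1 or warnings >= 2:
        return "SUSPICIOUS"
    if warnings >= 1:
        return "INCONCLUSIVE"
    return "LEGITIMATE"


if __name__ == "__main__":
    print(decide_verdict(high_signals=1, medium_signals=0, warnings=0))  # FRAUD_DETECTED
    print(decide_verdict(high_signals=0, medium_signals=0, warnings=1))  # INCONCLUSIVE
```

Checking the high-severity rule first matters: a single identity mismatch should not be diluted by an otherwise clean run.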

Important Notes

  • The model field in the API response is NOT trustworthy; a gateway can trivially fake it
  • Response latency is a useful secondary signal:
      • > 15s average → possible intermediate proxy (suspicious)
      • < 0.5s for a chat completion → suspiciously fast, possibly a small/fake model
  • Always run identity probes in both Chinese and English — fraud systems often only override one language
  • The reasoning_content field is not a reliable identifier — many models (DeepSeek-R1, GLM-5, Doubao) have it
  • Token usage analysis: abnormally low output tokens → possibly a small/quantized model; wildly inconsistent token counts → unstable API proxy
  • Refusal boundary: if the model over-refuses harmless questions, it reveals its safety training fingerprint — different models have very different patterns
  • Safety behavior: models that confirm misinformation or stereotypes have broken safety alignment — a strong fraud signal
  • For OpenAI-compatible APIs (most third-party providers), use --api-format openai
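
The two latency thresholds in the notes above translate directly into a small check over the measured latencies. This is a sketch under the stated thresholds only; `latency_flags` and the flag names are hypothetical, and the notes do not give numeric thresholds for token anomalies, so none are invented here.

```python
from statistics import mean


def latency_flags(latencies_s):
    """Flag latency patterns from the notes: slow average or a very fast call."""
    flags = []
    if not latencies_s:
        return flags
    if mean(latencies_s) > 15:
        flags.append("slow_average")  # possible intermediate proxy
    if any(t < 0.5 for t in latencies_s):
        flags.append("suspiciously_fast")  # possibly a small or fake model
    return flags


if __name__ == "__main__":
    print(latency_flags([16.0, 20.0, 14.0]))  # ['slow_average']
    print(latency_flags([0.3, 1.0]))          # ['suspiciously_fast']
    print(latency_flags([1.0, 2.0]))          # []
```

Latency depends heavily on network path and load, so these flags are best treated as warnings that support other evidence, in line with the "secondary signal" framing above.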

References

  • Original llm-verify methodology: https://github.com/mintesnot-teshome/llm-verify
  • CISPA paper (arXiv 2603.01919): systematic study of LLM API fraud
  • references/identity_probes.md — 20+ probe templates by category
  • references/known_fake_patterns.md — documented fraud cases and red flags
  • references/report_template.md — markdown template for documentation

Version history

1 version in total

  • v1.0.0 Initial release (current)
    2026-05-03 14:06 Safe Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
