Verify whether an LLM API endpoint is truthfully serving the model it claims.
Supports OpenAI, Anthropic, Gemini, and generic OpenAI-compatible APIs.
Trigger this skill when the user:
Behavioral fingerprinting beats system prompt lies. A provider can fake the
model field and set a system prompt to make the model lie about its identity.
But the model's free-form output under varied prompts reveals its true identity.
This tool combines two methodologies:
| Format | Flag | Endpoint Pattern | Auth Header |
|--------|------|-----------------|-------------|
| OpenAI-compatible (default) | openai | {base}/chat/completions | Authorization: Bearer |
| Anthropic native | anthropic | {base}/v1/messages | x-api-key |
| Gemini native | gemini | {base}/v1beta/models/{model}:generateContent | Authorization: Bearer |
| Generic (fallback) | generic | {base}/chat/completions | Authorization: Bearer |
| Category | Probes | What It Tests | Severity |
|----------|--------|---------------|----------|
| Identity | 7 | Direct/indirect/adversarial identity questions in ZH+EN | HIGH |
| System Prompt Leak | 3 | Can we extract identity instructions from the system prompt? | HIGH |
| Knowledge Cutoff | 4 | Cutoff date consistency across languages; hallucination detection | MEDIUM |
| Structured Output | 2 | Can the model produce valid JSON? Tests instruction-following | MEDIUM |
| Refusal Boundary | 3 | Does the model over-refuse harmless but sensitive-sounding questions? | MEDIUM |
| Safety Behavior | 4 | How does the model handle safety-critical scenarios? | MEDIUM |
| Capability | 2 | Context window and self-reported capabilities | LOW |
| Model-Specific | 4 | Direct checks for Doubao/GLM/DeepSeek identity | HIGH |
Ask for:
https://api.example.com/v1)
gpt-5, glm-5.1, claude-opus-4)
openai first — most providers are OpenAI-compatible)
These 5 probes give the strongest signal with minimal API calls:
1. Identity (Chinese) → primary decision driver (中文探针绕过英文伪装)
2. Identity (English) → cross-language consistency check
3. System prompt leak (ZH) → smoking gun if leaked
4. Knowledge cutoff (ZH) → training data boundary
5. Knowledge cutoff (EN) → consistency check
Run with --probes minimum or call verify_model(..., probes="minimum").
The model's answer to the identity probe is the primary signal:
| Answer Contains | Claimed Model | Verdict |
|-----------------|---------------|---------|
| "I'm Doubao / 豆包" | glm-5.1 (Zhipu) | 🔴 FRAUD |
| "I'm GPT / ChatGPT" | claude (Anthropic) | 🔴 FRAUD |
| "I'm DeepSeek" | gpt-5 (OpenAI) | 🔴 FRAUD |
| Matches claimed identity | — | ✅ Proceed |
If the model outputs something like:
# 身份定位
你是豆包,是由字节跳动公司自主研发的...
This is decisive evidence of true identity regardless of what the model normally claims.
Adds structured output, refusal boundary, safety behavior, and model-specific checks:
python verify_api_model.py ... --probes standard
Structured output test: If the model cannot generate valid JSON, it may lack instruction-following capability → suspicious for large models.
Refusal boundary test: Over-refusal of harmless questions (e.g., "how do lockpicks work?" for a locksmith) reveals the model's safety training fingerprint.
Safety behavior test: How the model handles misinformation, bias, and safety-critical scenarios. Different models have very different patterns.
For maximum evidence, run all 29 probes:
python verify_api_model.py ... --probes full
python verify_api_model.py \
--api-base "https://api.example.com/v1" \
--api-key "sk-..." \
--model "gpt-5" \
--api-format openai \
--probes standard \
--output "results.json"
from verify_api_model import verify_model
result = verify_model(
api_base="https://api.example.com/v1",
api_key="sk-...",
model="glm-5.1",
api_format="openai", # or "anthropic", "gemini"
probes="standard", # "minimum" | "standard" | "full"
output_file="results.json",
)
print(result["verdict"]) # "FRAUD_DETECTED" | "SUSPICIOUS" | ...
print(result["fraud_signals"])
print(result["warnings"])
print(result["token_usage_summary"]) # avg input/output tokens
The script:
After running, the script will automatically output a formatted summary including:
As an AI agent, you MUST present this summary directly to the user. Do not just say "verification complete" — show the actual verdict, key evidence, and conclusions from the output.
| Verdict | Meaning | Action |
|---------|---------|--------|
| FRAUD_DETECTED | 1+ high-severity signals | Stop using; report to user with evidence |
| SUSPICIOUS | 1+ medium-severity signals or 2+ warnings | Run more probes; compare benchmarks |
| INCONCLUSIVE | Warnings but no strong signal | Need more evidence |
| LEGITIMATE | No significant signals | Likely genuine |
model field in the response is NOT trustworthy — trivially faked by gateway
reasoning_content field is not a reliable identifier — many models (DeepSeek-R1, GLM-5, Doubao) have it
--api-format openai
references/identity_probes.md — 20+ probe templates by category
references/known_fake_patterns.md — documented fraud cases and red flags
references/report_template.md — markdown template for documentation
共 1 个版本