Overview

LLM API Model Verifier

Verify whether an LLM API endpoint is truthfully serving the model it claims.

Supports OpenAI, Anthropic, Gemini, and generic OpenAI-compatible APIs.

When to Use This Skill

Trigger this skill when the user:

Asks how to check if an API is really the model it claims
Suspects model fraud (paying for X, getting Y)
Wants to audit an AI API provider before using it for research
Mentions verifying, testing, or fingerprinting an API endpoint

Core Principle

Behavioral fingerprinting beats system prompt lies. A provider can fake the model field and set a system prompt to make the model lie about its identity. But the model's free-form output under varied prompts reveals its true identity.

This tool combines two methodologies:

llm-verify (mintesnot-teshome/llm-verify) — forensic prompt methodology
CISPA paper (arXiv 2603.01919) — systematic API fraud detection with capability testing, token analysis, and safety behavior profiling

Supported API Formats

| Format | --api-format value | Endpoint | Auth header |
|--------|------|-----------------|-------------|
| OpenAI-compatible (default) | openai | {base}/chat/completions | Authorization: Bearer |
| Anthropic native | anthropic | {base}/v1/messages | x-api-key |
| Gemini native | gemini | {base}/v1beta/models/{model}:generateContent | Authorization: Bearer |
| Generic (fallback) | generic | {base}/chat/completions | Authorization: Bearer |
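
As a rough illustration of the table above, the mapping from format to endpoint and auth header might look like the sketch below. The function name `build_request` and the `Content-Type` header are assumptions made for this example, not part of the bundled script.

```python
from typing import Dict, Tuple


def build_request(api_format: str, base: str, model: str, api_key: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for one API format, following the table above.

    Hypothetical helper: the real script may build requests differently.
    """
    base = base.rstrip("/")  # tolerate a trailing slash in the base URL
    headers = {"Content-Type": "application/json"}

    if api_format == "anthropic":
        headers["x-api-key"] = api_key
        return f"{base}/v1/messages", headers

    if api_format == "gemini":
        headers["Authorization"] = f"Bearer {api_key}"
        return f"{base}/v1beta/models/{model}:generateContent", headers

    if api_format in ("openai", "generic"):
        headers["Authorization"] = f"Bearer {api_key}"
        return f"{base}/chat/completions", headers

    raise ValueError(f"unknown api_format: {api_format!r}")
```

Note that the OpenAI-compatible base URL usually already ends in `/v1`, while the Anthropic and Gemini paths in the table add their own version segment.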

Verification Dimensions (8 categories, 29 probes)

| Category | Probes | What it checks | Severity |
|----------|--------|---------------|----------|
| Capability | 2 | Context window and self-reported capabilities | LOW |

Verification Workflow

Step 1: Ask the User for Required Info

Ask for:

API Base URL (e.g. https://api.example.com/v1)
API Key
Claimed model name (e.g. gpt-5, glm-5.1, claude-opus-4)
API format (if unknown, try openai first — most providers are OpenAI-compatible)

Step 2: Run Minimum Viable Probes (5 probes)

These 5 probes give the strongest signal with minimal API calls:

1. Identity (Chinese)        →  primary decision driver (Chinese probes bypass English-only disguises)
2. Identity (English)        →  cross-language consistency check
3. System prompt leak (ZH)   →  smoking gun if leaked
4. Knowledge cutoff (ZH)     →  training data boundary
5. Knowledge cutoff (EN)     →  consistency check

Run with --probes minimum or call verify_model(..., probes="minimum").
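
Probes 4 and 5 ask for the knowledge cutoff in two languages so the answers can be compared. A minimal sketch of that comparison follows; the helper names, the date patterns, and the three-month tolerance are all assumptions chosen for illustration, and real replies will need more robust parsing.

```python
import re
from typing import Optional, Tuple

_MONTHS = ["january", "february", "march", "april", "may", "june",
           "july", "august", "september", "october", "november", "december"]
# English form, e.g. "June 2024"
_EN_DATE = re.compile(r"\b(" + "|".join(_MONTHS) + r")\s+(20\d{2})\b", re.IGNORECASE)
# Chinese form, e.g. "2024年6月"
_ZH_DATE = re.compile(r"(20\d{2})\s*年\s*(\d{1,2})\s*月")


def extract_cutoff(text: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) for the first date-like mention, or None."""
    m = _ZH_DATE.search(text)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = _EN_DATE.search(text)
    if m:
        return int(m.group(2)), _MONTHS.index(m.group(1).lower()) + 1
    return None


def cutoffs_consistent(zh_reply: str, en_reply: str, tolerance_months: int = 3) -> Optional[bool]:
    """True/False if both cutoffs parse; None when either reply has no date."""
    zh, en = extract_cutoff(zh_reply), extract_cutoff(en_reply)
    if zh is None or en is None:
        return None
    gap = abs((zh[0] * 12 + zh[1]) - (en[0] * 12 + en[1]))
    return gap <= tolerance_months
```

A `None` result means the comparison could not be made, which is different from a mismatch and should not be counted as a fraud signal on its own.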

Step 3: Analyze Identity Response

The model's answer to the identity probe is the primary signal:

| Answer Contains | Claimed Model | Verdict |
|-----------------|---------------|---------|
| "I'm Doubao / 豆包" | glm-5.1 (Zhipu) | 🔴 FRAUD |
| "I'm GPT / ChatGPT" | claude (Anthropic) | 🔴 FRAUD |
| "I'm DeepSeek" | gpt-5 (OpenAI) | 🔴 FRAUD |
| Matches claimed identity | any | ✅ Proceed |
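
The table above amounts to comparing the vendor family named in the answer with the family implied by the claimed model name. Here is a naive keyword-based sketch of that idea; the keyword lists and function names are illustrative assumptions, not the script's actual rules.

```python
from typing import Set

# Hypothetical keyword lists; extend them for the vendors you care about.
FAMILY_KEYWORDS = {
    "openai": ["gpt", "chatgpt", "openai"],
    "anthropic": ["claude", "anthropic"],
    "zhipu": ["glm", "zhipu", "智谱"],
    "bytedance": ["doubao", "豆包", "bytedance", "字节跳动"],
    "deepseek": ["deepseek"],
    "google": ["gemini"],
}


def families_in(text: str) -> Set[str]:
    """Return every vendor family whose keywords appear in the text."""
    lowered = text.lower()
    return {
        family
        for family, keywords in FAMILY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def identity_verdict(answer: str, claimed_model: str) -> str:
    """Return "MATCH", "FRAUD", or "UNKNOWN" for one identity probe reply."""
    claimed = families_in(claimed_model)
    stated = families_in(answer)
    if not claimed or not stated:
        return "UNKNOWN"  # nothing recognizable to compare
    if claimed & stated:
        return "MATCH"
    return "FRAUD"
```

Substring matching is deliberately crude: a reply such as "I am not ChatGPT" would still count as mentioning OpenAI, so treat the result as a prompt for manual review rather than a final answer.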

Step 4: Check System Prompt Leak

If the model outputs something like:

# 身份定位
你是豆包，是由字节跳动公司自主研发的...

(In English: "# Identity positioning / You are Doubao, independently developed by ByteDance...")

This is decisive evidence of true identity regardless of what the model normally claims.
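
One way to flag such a leak automatically is to look for second-person identity statements ("你是…" or "You are …"), which normally appear in system prompts rather than in a model's own answers. The patterns and the function name below are assumptions for illustration only.

```python
import re
from typing import Optional

# Heuristic patterns for a leaked system prompt; both are assumptions.
_LEAK_PATTERNS = [
    re.compile(r"你是\s*([^\s，,。.]{1,20})"),          # Chinese "You are X"
    re.compile(r"\bYou are\s+([A-Z][\w.-]{1,30})"),     # English "You are X" with a capitalized name
]


def find_leaked_identity(response: str) -> Optional[str]:
    """Return the name that follows a second-person identity statement, if any."""
    for pattern in _LEAK_PATTERNS:
        match = pattern.search(response)
        if match:
            return match.group(1)
    return None


leaked = find_leaked_identity("# 身份定位\n你是豆包，是由字节跳动公司自主研发的...")
print(leaked)
```

A hit should still be read by a person, since a model can also quote "You are ..." text for innocent reasons.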

Step 5: Run Standard Probes (15 probes)

Adds structured output, refusal boundary, safety behavior, and model-specific checks:

python verify_api_model.py ... --probes standard

Structured output test: a model that cannot produce valid JSON on request likely lacks the instruction-following ability expected of a large model, which is suspicious when a large model is claimed.

Refusal boundary test: Over-refusal of harmless questions (e.g., "how do lockpicks work?" for a locksmith) reveals the model's safety training fingerprint.

Safety behavior test: How the model handles misinformation, bias, and safety-critical scenarios. Different models have very different patterns.
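
For the structured output test, the pass/fail check can be as simple as trying to parse the reply. The sketch below assumes the reply may be wrapped in a markdown code fence, which many chat models add; the function name is hypothetical.

```python
import json

FENCE = "`" * 3  # a markdown code fence marker, built this way to keep the example readable


def is_valid_json(reply: str) -> bool:
    """Return True if the reply parses as JSON, ignoring one surrounding code fence."""
    text = reply.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        # Drop the opening fence line (with its optional language tag) and the closing fence.
        text = "\n".join(text.splitlines()[1:-1])
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

This only checks syntax. Whether the JSON has the fields the probe asked for would need a separate check.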

Step 6 (Optional): Run Full Probe Set (29 probes)

For maximum evidence, run all 29 probes:

python verify_api_model.py ... --probes full

Using the Bundled Script

CLI Usage

python verify_api_model.py \
    --api-base "https://api.example.com/v1" \
    --api-key "sk-..." \
    --model "gpt-5" \
    --api-format openai \
    --probes standard \
    --output "results.json"

Python API Usage

from verify_api_model import verify_model

result = verify_model(
    api_base="https://api.example.com/v1",
    api_key="sk-...",
    model="glm-5.1",
    api_format="openai",   # or "anthropic", "gemini"
    probes="standard",     # "minimum" | "standard" | "full"
    output_file="results.json",
)

print(result["verdict"])   # "FRAUD_DETECTED" | "SUSPICIOUS" | ...
print(result["fraud_signals"])
print(result["warnings"])
print(result["token_usage_summary"])  # avg input/output tokens

The script:

Auto-detects the correct endpoint URL for each API format
Measures response latency and token usage for each probe
Runs all probes with rate limiting (1.5s delay between calls)
Saves full JSON results and prints a formatted summary
Detects fraud signals: identity mismatch, system prompt leak, cutoff inconsistency, safety failure, structured output failure, token anomaly, abnormal latency
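
The rate limiting and latency measurement described above can be pictured roughly as follows. This is a sketch under stated assumptions: `run_probes`, the `send` callable, and the result keys are hypothetical names, and only the 1.5 s delay comes from the description.

```python
import time
from typing import Callable, Dict, List


def run_probes(probes: Dict[str, str], send: Callable[[str], str], delay_s: float = 1.5) -> List[dict]:
    """Send each probe prompt in order, timing each call and pausing between calls.

    `send` is any function that takes a prompt and returns the reply text,
    for example a thin wrapper around an HTTP client.
    """
    results = []
    for index, (name, prompt) in enumerate(probes.items()):
        if index > 0:
            time.sleep(delay_s)  # rate limit: wait between consecutive calls
        start = time.perf_counter()
        reply = send(prompt)
        latency_s = time.perf_counter() - start
        results.append({"probe": name, "response": reply, "latency_s": latency_s})
    return results


# Usage with a stand-in for the real API call:
demo = run_probes({"identity_en": "Who are you?"}, send=lambda p: "echo: " + p, delay_s=0)
print(demo[0]["probe"], demo[0]["response"])
```

Passing the transport in as a function keeps the loop easy to test without network access.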

Output and Verdict

After a run, the script prints a formatted summary that includes:

Claimed model and API endpoint info
Token usage statistics (avg input/output tokens)
Final verdict (FRAUD_DETECTED / SUSPICIOUS / INCONCLUSIVE / LEGITIMATE)
Key probe responses (identity, leak, cutoff, structured output, refusal, safety, specific)
Fraud signals with evidence
Warnings

As an AI agent, you MUST present this summary directly to the user. Do not just say "verification complete" — show the actual verdict, key evidence, and conclusions from the output.

| Verdict | Meaning | Action |
|---------|---------|--------|
| FRAUD_DETECTED | 1+ high-severity signals | Stop using; report to user with evidence |
| SUSPICIOUS | 1+ medium-severity signals or 2+ warnings | Run more probes; compare benchmarks |
| INCONCLUSIVE | Warnings but no strong signal | Need more evidence |
| LEGITIMATE | No significant signals | Likely genuine |
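
Read top to bottom, the verdict table behaves like a priority rule. The sketch below encodes one plausible reading of it; the shape of a signal (a dict with a `severity` key) is an assumption, and the table does not say how low-severity signals alone are handled, so treating them as INCONCLUSIVE here is a guess.

```python
from typing import List


def decide_verdict(signals: List[dict], warnings: List[str]) -> str:
    """Map fraud signals and warnings to a verdict, checking the strongest rule first."""
    severities = {signal["severity"] for signal in signals}
    if "high" in severities:
        return "FRAUD_DETECTED"
    if "medium" in severities or len(warnings) >= 2:
        return "SUSPICIOUS"
    if warnings or severities:
        # Assumption: a lone warning or only low-severity signals is not enough to decide.
        return "INCONCLUSIVE"
    return "LEGITIMATE"


print(decide_verdict([{"type": "identity_mismatch", "severity": "high"}], []))
```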

Important Notes

The model field in the API response is NOT trustworthy: a gateway can trivially fake it
Response latency is a useful secondary signal: an average above 15 s suggests an intermediate proxy (suspicious), while a chat completion that returns in under 0.5 s is suspiciously fast and may come from a small or fake model
Always run identity probes in both Chinese and English — fraud systems often only override one language
The reasoning_content field is not a reliable identifier — many models (DeepSeek-R1, GLM-5, Doubao) have it
Token usage analysis: abnormally low output tokens → possibly a small/quantized model; wildly inconsistent token counts → unstable API proxy
Refusal boundary: if the model over-refuses harmless questions, it reveals its safety training fingerprint — different models have very different patterns
Safety behavior: models that confirm misinformation or stereotypes have broken safety alignment — a strong fraud signal
For OpenAI-compatible APIs (most third-party providers), use --api-format openai
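
The latency thresholds in the notes above (15 s average, 0.5 s per completion) can be turned into a small check like the one below. The function name and flag strings are hypothetical, and the thresholds are the only values taken from the notes.

```python
from statistics import mean
from typing import List

SLOW_AVERAGE_S = 15.0  # average above this suggests an intermediate proxy
TOO_FAST_S = 0.5       # any completion below this is suspiciously fast


def latency_flags(latencies_s: List[float]) -> List[str]:
    """Return secondary-signal flags for a list of per-probe latencies in seconds."""
    flags: List[str] = []
    if not latencies_s:
        return flags
    if mean(latencies_s) > SLOW_AVERAGE_S:
        flags.append("slow_average")
    if min(latencies_s) < TOO_FAST_S:
        flags.append("suspiciously_fast")
    return flags
```

Latency depends heavily on network distance and load, so these flags are best used to support other evidence rather than as a verdict by themselves.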

References

Original llm-verify methodology: https://github.com/mintesnot-teshome/llm-verify
CISPA paper (arXiv 2603.01919): systematic study of LLM API fraud
references/identity_probes.md — 20+ probe templates by category
references/known_fake_patterns.md — documented fraud cases and red flags
references/report_template.md — markdown template for documentation

Version History

1 version in total

v1.0.0 Initial release (current)

2026-05-03 14:06
