Hallucination Detector

Author: charlie-morrison
Source: clawhub, v1.0.0 (1 version); key: not required

Overview

Detect and flag hallucinations in LLM-generated content by cross-referencing claims against source documents, codebase facts, API documentation, and verifiable data. Use when reviewing AI-generated documentation, code comments, summaries, or any content that needs factual accuracy.

Usage

"Check this AI-generated summary for hallucinations against the source docs"
"Verify the code documentation matches the actual implementation"
"Detect hallucinations in this RAG pipeline output"
"Fact-check this AI-generated API reference"
"Review these generated release notes for accuracy"

How It Works

1. Claim Extraction

Parse the LLM output and identify verifiable claims:

  • Factual assertions: "The function accepts 3 parameters", "The API returns JSON"
  • Numerical claims: "Performance improved by 40%", "Supports up to 10,000 connections"
  • Reference claims: "As documented in RFC 7231", "According to the README"
  • Code claims: "The processOrder() function handles refunds", "Uses AES-256 encryption"
  • Temporal claims: "Added in version 2.3", "Deprecated since 2024"
  • Relational claims: "Module A depends on Module B", "Function X calls Function Y"
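
As a rough illustration of this step, the sketch below splits text into sentences and tags each one with a claim type from the list above. The keyword patterns, the function names, and the "first match wins" ordering are all assumptions made for this example; a real extractor would more likely use an LLM or a parser than regular expressions.

```python
import re

# Hypothetical keyword patterns, one per claim type. Order matters:
# the first pattern that matches decides the label.
CLAIM_PATTERNS = [
    ("temporal", re.compile(r"\b(added in|deprecated since|since version|as of)\b", re.I)),
    ("reference", re.compile(r"\b(as documented in|according to|RFC \d+)\b", re.I)),
    ("relational", re.compile(r"\b(depends on|calls|imports|extends)\b", re.I)),
    ("numerical", re.compile(r"\d[\d,]*(\.\d+)?\s*%|\bup to \d", re.I)),
    ("code", re.compile(r"\b\w+\(\)|\b[A-Z]{2,}-\d+\b")),
]


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (so '2.3' stays intact)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_claim(sentence):
    """Return the first matching claim type, or 'factual' as the fallback."""
    for label, pattern in CLAIM_PATTERNS:
        if pattern.search(sentence):
            return label
    return "factual"


def extract_claims(text):
    """Return (claim_type, sentence) pairs for every sentence in the text."""
    return [(classify_claim(s), s) for s in split_sentences(text)]
```

For example, `extract_claims("Added in version 2.3. Module A depends on Module B.")` tags the first sentence as temporal and the second as relational.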

2. Source Grounding

For each claim, identify and query the ground truth source:

Code verification:

# Verify function exists and has claimed signature
grep -rn "function processOrder\|def processOrder\|processOrder(" src/
# Verify parameter count (ast-grep patterns must parse as valid code, so include a body)
ast-grep --pattern 'function processOrder($$$ARGS) { $$$BODY }' src/
# Verify dependency relationship
grep -rn "import.*from.*moduleB\|require.*moduleB" src/moduleA/

Documentation verification:

# Check if referenced section exists
grep -rn "## Configuration" docs/
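# Verify version claims: list related commits, then find the earliest tag
# that contains the commit which first added the file
git log --oneline --all --grep="processOrder" | head -5
git tag --sort=version:refname --contains "$(git log --all --diff-filter=A --format=%H -- src/processOrder.ts | tail -1)" | head -1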

API verification:

# Verify endpoint exists and returns claimed schema
grep -rn "router\.\(get\|post\|put\|delete\)" src/routes/ | grep "orders"
# Check response schema
grep -A 20 "res\.json\|res\.send" src/routes/orders.ts
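
The same kind of check can be scripted. The sketch below is one possible way to ground a parameter-count claim against source text that has already been read into a string. The function names and the returned labels are hypothetical, and the regular expression only handles simple signatures (no nested parentheses or default values containing commas).

```python
import re


def find_signature(source, name):
    """Return the parameter names of the first `function name(...)` or
    `def name(...)` in the source text, or None if no definition is found."""
    pattern = re.compile(r"(?:function|def)\s+" + re.escape(name) + r"\s*\(([^)]*)\)")
    match = pattern.search(source)
    if match is None:
        return None
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


def check_param_claim(source, name, claimed_count):
    """Compare a claimed parameter count with what the source actually defines."""
    params = find_signature(source, name)
    if params is None:
        return "fabrication"      # the function does not exist in the source
    if len(params) != claimed_count:
        return "contradiction"    # it exists, but the count is wrong
    return "verified"
```

Given a source string containing `function processOrder(order, user, options) { ... }`, a claim of three parameters comes back as verified and a claim of two as a contradiction.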

3. Hallucination Classification

Each detected issue is classified:

  • 🔴 Fabrication: Claim has no basis in source material (made up entirely)
      • Example: "The calculateTax() function" when no such function exists
  • 🟠 Contradiction: Claim directly contradicts source material
      • Example: "Returns a string" when function actually returns a number
  • 🟡 Exaggeration: Claim overstates or embellishes source facts
      • Example: "Handles millions of requests" when docs say "tested to 10K"
  • 🔵 Outdated: Claim was true but is no longer accurate
      • Example: "Uses Express.js" when codebase migrated to Fastify
  • Unverifiable: Claim cannot be checked against available sources
      • Example: "Industry best practice" with no citation
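
One way to make these categories mutually exclusive is to decide them in a fixed order. The sketch below assumes a hypothetical `Evidence` record filled in by the grounding step; the field names and the precedence (unverifiable, then fabrication, then outdated, exaggeration, contradiction) are illustrative choices, not something the skill prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IssueType(Enum):
    FABRICATION = "fabrication"
    CONTRADICTION = "contradiction"
    EXAGGERATION = "exaggeration"
    OUTDATED = "outdated"
    UNVERIFIABLE = "unverifiable"


@dataclass
class Evidence:
    """Hypothetical summary of what the grounding step found for one claim."""
    source_available: bool            # was there any source to check against?
    subject_found: bool               # does the thing the claim describes exist?
    matches_current: bool             # does the claim agree with the current source?
    matched_previously: bool = False  # did it agree with an older revision?
    overstated: bool = False          # same fact as the source, but inflated


def classify(evidence: Evidence) -> Optional[IssueType]:
    """Return the issue type for a claim, or None when the claim checks out."""
    if not evidence.source_available:
        return IssueType.UNVERIFIABLE
    if not evidence.subject_found:
        return IssueType.FABRICATION
    if evidence.matches_current:
        return None
    if evidence.matched_previously:
        return IssueType.OUTDATED
    if evidence.overstated:
        return IssueType.EXAGGERATION
    return IssueType.CONTRADICTION
```

Under these assumptions, the "Uses Express.js" example would arrive with `matches_current=False` and `matched_previously=True`, and so be reported as outdated rather than as a contradiction.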

4. Confidence Scoring

Each claim gets a confidence score:

  • Verified (✅ 90-100%): Claim matches source material exactly
  • Likely correct (🟢 70-89%): Claim is consistent but not directly verifiable
  • Uncertain (🟡 40-69%): Partial match or ambiguous source
  • Likely hallucinated (🟠 10-39%): Contradicts or is unsupported by sources
  • Confirmed hallucination (🔴 0-9%): Demonstrably false
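
The bands above translate directly into a lookup. This is a minimal sketch with a hypothetical function name; it assumes scores are percentages from 0 to 100 and, since the bands are stated in whole numbers, it places a fractional score such as 89.5 in the lower band.

```python
# Lower bound of each band, highest first, taken from the list above.
BANDS = [
    (90, "Verified"),
    (70, "Likely correct"),
    (40, "Uncertain"),
    (10, "Likely hallucinated"),
    (0, "Confirmed hallucination"),
]


def confidence_label(score):
    """Map a 0-100 confidence score to its band label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
```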

5. RAG Pipeline Analysis

When used in a RAG context, also check:

  • Retrieval accuracy: Were the right chunks retrieved for the query?
  • Attribution fidelity: Does the output actually reflect the retrieved chunks?
  • Chunk boundary issues: Was a claim split across chunks, losing context?
  • Source mixing: Were facts from different sources incorrectly combined?
  • Inference vs citation: Is the model inferring beyond what sources state?
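
The attribution fidelity check can be approximated very crudely with word overlap: a sentence whose words barely appear in any retrieved chunk is a candidate for review. The sketch below assumes this lexical proxy, hypothetical function names, and an arbitrary 0.6 threshold; it cannot detect paraphrase or negation, so an entailment model would be a stronger choice in practice.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(sentence, chunks):
    """Fraction of the sentence's tokens found in the best-matching chunk."""
    sentence_tokens = tokens(sentence)
    if not sentence_tokens:
        return 0.0
    return max(
        (len(sentence_tokens & tokens(chunk)) / len(sentence_tokens) for chunk in chunks),
        default=0.0,
    )


def unsupported_sentences(sentences, chunks, threshold=0.6):
    """Return the output sentences whose best support score is below the threshold."""
    return [s for s in sentences if support_score(s, chunks) < threshold]
```

Scoring against the single best chunk, rather than the union of all chunks, is deliberate: a sentence that only reaches a high score by borrowing words from several chunks may be a case of source mixing.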

6. Pattern Detection

Identify common hallucination patterns:

  • Confident fabrication: High-confidence claims about nonexistent features
  • Plausible details: Made-up but realistic-sounding specifics
  • Version confusion: Mixing features from different software versions
  • Name substitution: Using similar but incorrect function/class names
  • Count inflation: Overstating numbers, capabilities, or performance
  • False relationships: Inventing connections between unrelated components
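
Name substitution in particular lends itself to a fuzzy lookup. The sketch below uses `difflib.get_close_matches` from the Python standard library to compare a claimed identifier against the identifiers that actually exist; the function name, the result labels, and the 0.8 cutoff are assumptions for illustration.

```python
import difflib


def check_identifier(claimed, known, cutoff=0.8):
    """Classify a claimed identifier against the list of real identifiers.

    Returns ("exact", name), ("possible_substitution", closest_name),
    or ("not_found", None).
    """
    if claimed in known:
        return ("exact", claimed)
    close = difflib.get_close_matches(claimed, known, n=1, cutoff=cutoff)
    if close:
        return ("possible_substitution", close[0])
    return ("not_found", None)
```

With known identifiers `["processOrder", "cancelOrder", "calculateTotal"]`, the claimed name `processOrders` is flagged as a possible substitution for `processOrder`, while an unrelated name such as `sendEmail` is reported as not found.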

Output

## Hallucination Analysis Report

**Content reviewed:** AI-generated API documentation (2,400 words)
**Claims extracted:** 47
**Verification results:**

| Status | Count | % |
|--------|-------|---|
| ✅ Verified | 31 | 66% |
| 🟢 Likely correct | 6 | 13% |
| 🟡 Uncertain | 3 | 6% |
| 🔴 Hallucinated | 7 | 15% |

### 🔴 Confirmed Hallucinations

1. **"The /api/users endpoint supports bulk operations via POST /api/users/bulk"**
   - Reality: No bulk endpoint exists in routes/users.ts
   - Type: Fabrication
   - Fix: Remove bulk operations section entirely

2. **"Authentication uses OAuth 2.0 with PKCE flow"**
   - Reality: Uses session-based auth (express-session + passport-local)
   - Type: Contradiction
   - Fix: Replace with actual auth mechanism description

3. **"Rate limited to 1000 requests per minute per API key"**
   - Reality: No rate limiting configured (no rate-limit middleware found)
   - Type: Fabrication
   - Fix: Remove rate limiting claims or implement the feature

4. **"Response includes pagination metadata with total_count, page, and per_page"**
   - Reality: API returns raw arrays without pagination
   - Type: Fabrication
   - Fix: Document actual response format

[3 more...]

### 🟡 Uncertain Claims (need human review)

8. **"Handles up to 5,000 concurrent connections"**
   - No load testing data found; may be true but unverified

### 📊 Hallucination Hotspots
- Performance claims: 3/4 hallucinated (75%)
- Feature descriptions: 2/18 hallucinated (11%)
- Security section: 2/8 hallucinated (25%)

### 💡 Recommendations
- Performance section needs complete rewrite with actual benchmarks
- Security section should be reviewed by the auth team
- Consider adding integration tests that validate API documentation accuracy
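
The counts and percentages in the sample report above follow from simple tallies. The sketch below reproduces them under the assumption that each claim carries one status string; the function names and status strings are hypothetical. Note that rounded percentages will not always sum to exactly 100.

```python
from collections import Counter


def status_table(statuses):
    """Return (status, count, rounded percent) rows in first-seen order."""
    counts = Counter(statuses)
    total = len(statuses)
    return [(status, n, round(100 * n / total)) for status, n in counts.items()]


def hotspot_rate(hallucinated, total):
    """Percentage of claims in one category that were hallucinated."""
    return round(100 * hallucinated / total)
```

Feeding in 31 verified, 6 likely correct, 3 uncertain, and 7 hallucinated claims gives 66%, 13%, 6%, and 15%, matching the table, and `hotspot_rate(3, 4)` gives the 75% shown for performance claims.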

Integration

  • CI pipeline: Run on generated docs before merge
  • RAG applications: Post-processing step to filter hallucinated content
  • Content review: Batch-check AI-generated articles or documentation
  • Code review: Verify AI-generated code comments match implementation
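
For the CI pipeline case, a gate could look something like the sketch below. It assumes the analysis has been saved as JSON with a `claims` list whose entries have `text` and `status` fields; that schema, the `"hallucinated"` status value, and the function name are all hypothetical, since the skill does not define a machine-readable output format.

```python
import json


def ci_exit_code(report_json, max_hallucinated=0):
    """Return 0 if the report is within the allowed hallucination budget, else 1.

    Prints each hallucinated claim so it shows up in the CI log.
    """
    report = json.loads(report_json)
    hallucinated = [c for c in report["claims"] if c["status"] == "hallucinated"]
    for claim in hallucinated:
        print(f"HALLUCINATION: {claim['text']}")
    return 0 if len(hallucinated) <= max_hallucinated else 1
```

A CI step would read the report file, pass its contents to `ci_exit_code`, and hand the result to `sys.exit` so that the merge is blocked when the budget is exceeded.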

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-08 00:54
