claude-intel-monitor

Detect intelligence degradation in Claude, GPT, and DeepSeek using 30 standardized Chinese benchmark questions across Math, Reasoning, and Code. Born from re...
minirr890112-byte (source)
Uncategorized · clawhub · v1.1.0 · 2 versions · 100000 · Key: not required
★ 0 stars · 📥 329 downloads · 💾 0 installs
#ai #benchmark #claude #deepseek #gpt #latest #monitor #quality

Overview

claude-intel-monitor: AI intelligence-degradation detection tool

Detect intelligence degradation in AI models with a standardized benchmark: 30 curated Chinese-language questions across Math, Reasoning, and Code, designed around degradation patterns reported in the Chinese developer community.

Pain Signal Origins

"Claude/GPT 降智" (roughly, "Claude/GPT getting dumber") was a top-3 topic in scans of Chinese developer communities during April-May 2026:

  • CSDN: Multiple quantified analyses reporting Claude Opus 4.6 reasoning degradation (-67% depth, +98% hallucination)
  • V2EX claudecode node: 12-reply hot thread on Claude Code behavior changes
  • V2EX deepseek node: 4 posts on frequent service disruptions

Quick Start

pip install claude-intel-monitor

# Test a model
claude-intel-monitor test --model claude-sonnet-4 --provider anthropic

# Set baseline for degradation detection
claude-intel-monitor baseline --model claude-sonnet-4

# View history
claude-intel-monitor history

# Continuous watch mode
claude-intel-monitor watch --model claude-sonnet-4 --provider anthropic --interval 6h
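
The baseline step implies that later runs are compared against a stored reference score. A minimal sketch of such a comparison follows. It is not the tool's actual implementation: the names `detect_degradation` and `DegradationReport`, the `threshold_points` parameter, and its 5-point default are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DegradationReport:
    baseline: float   # baseline score, in percent
    current: float    # latest score, in percent
    drop: float       # baseline minus current, in percentage points
    degraded: bool    # True when the drop meets or exceeds the threshold


def detect_degradation(baseline: float, current: float,
                       threshold_points: float = 5.0) -> DegradationReport:
    """Flag a run as degraded when it falls at least `threshold_points` below baseline.

    Hypothetical helper: the threshold and its default are assumptions,
    not values documented by claude-intel-monitor.
    """
    drop = round(baseline - current, 2)
    return DegradationReport(baseline, current, drop, drop >= threshold_points)


# Example: compare a later run against the 91.1% baseline shown on this page.
report = detect_degradation(baseline=91.1, current=84.0)
print(report.degraded, report.drop)
```

A fixed point threshold is the simplest choice. A real monitor would probably also want to account for run-to-run noise before raising an alert.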

Benchmark Structure

30 questions, 3 dimensions:

Dimension  Count  Weight  Detection Target
----------------------------------------------------------------------
Math       10     1.0x    Mathematical reasoning, hallucination tendency
Reasoning  10     1.2x    Logical reasoning, reduced safety awareness
Code       10     1.3x    Code quality, architectural degradation
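
One way the per-dimension weights could be combined into a single percentage is a weighted fraction of correct answers. The sketch below assumes that formula. The source does not state how the tool aggregates scores, and `weighted_score` and the sample counts are hypothetical.

```python
# Counts and weights come from the benchmark table; the aggregation
# formula itself is an assumption, not the tool's documented method.
DIMENSIONS = {
    "Math": {"count": 10, "weight": 1.0},
    "Reasoning": {"count": 10, "weight": 1.2},
    "Code": {"count": 10, "weight": 1.3},
}


def weighted_score(correct: dict[str, int]) -> float:
    """Return the weighted percentage of correct answers across dimensions."""
    earned = sum(DIMENSIONS[d]["weight"] * correct[d] for d in DIMENSIONS)
    possible = sum(v["weight"] * v["count"] for v in DIMENSIONS.values())
    return round(100 * earned / possible, 1)


# Hypothetical run: one miss in each dimension (27/30 raw).
sample = {"Math": 9, "Reasoning": 9, "Code": 9}
print(weighted_score(sample))
```

Under this assumed formula a missed Code question costs more than a missed Math question, so a weighted percentage need not equal the raw correct/total ratio.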

All questions are in Chinese. Each answer is validated by a deterministic check function, which avoids the grading bias of an AI judge.
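
A deterministic check function can be as simple as extracting the final answer from the model's reply and comparing it to a fixed expected value. The sketch below illustrates the idea for a numeric math answer. `check_numeric_answer` and the sample replies are hypothetical and are not taken from the tool's question set.

```python
import re


def check_numeric_answer(reply: str, expected: float, tol: float = 1e-6) -> bool:
    """Pass if the last number in the reply matches the expected value.

    Hypothetical checker: treating the last number as the final answer
    is an assumption made for illustration.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reply)
    if not numbers:
        return False
    return abs(float(numbers[-1]) - expected) <= tol


# The same reply always yields the same verdict; no model is involved in grading.
print(check_numeric_answer("先算 12 × 12,所以答案是 144。", 144))
print(check_numeric_answer("答案大约是 140。", 144))
```

Because the verdict depends only on string matching and arithmetic, repeated runs over the same reply cannot disagree, which is the property that makes score changes attributable to the model rather than to the grader.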

Featured Baseline: DeepSeek 91.1%

🧠 Testing deepseek-chat via deepseek — 30 questions

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      91.1%  ██████████████████░░  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

📊 DeepSeek first live baseline: 27/30 (91.1%)

Version History

2 versions in total

  • v1.1.0 (current)
    2026-05-12 05:16 · Safe · Safe
  • v1.0.0
    2026-05-08 01:55 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk found

Tencent Cloud Security (Sanbu)

Safe, no risk found
