Agent Debugger

Debug AI agent issues systematically. Covers tool failures, infinite loops, context overflow, rate limits, and performance bottlenecks. Use when agents misbehave.
Author: engsathiago
Source: clawhub, v1.0.0 (1 version), API key: not required
Tags: #debug #error-handling #latest #loops #rate-limit #troubleshooting

Overview

Systematic debugging for AI agent issues. When your agent misbehaves, this skill helps identify and fix the problem.

Common Agent Problems

1. Infinite Loops

Symptoms:

  • Agent repeats same action
  • Gets stuck in a pattern
  • Never completes task

Diagnosis:

Agent log shows:
- Same tool called 10+ times
- Same output format repeated
- No progress between iterations
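
The log pattern above can also be checked mechanically. The following is a minimal sketch, assuming the agent log is available as a list of (tool name, arguments) records. Both that record shape and the helper name `find_repeated_call` are hypothetical and not part of any agent framework.

```python
import json
from collections import Counter


def find_repeated_call(calls, threshold=10):
    """Return (tool, count) for the most repeated identical call, or None.

    `calls` is a list of (tool_name, args_dict) tuples; this log shape is an
    assumption made for illustration.
    """
    # Serialize args with sorted keys so equal dicts produce equal keys.
    counts = Counter(
        (name, json.dumps(args, sort_keys=True)) for name, args in calls
    )
    if not counts:
        return None
    (name, _), count = counts.most_common(1)[0]
    return (name, count) if count >= threshold else None
```

A non-None result suggests the agent is repeating itself; the default threshold of 10 mirrors the "10+ times" symptom and can be lowered for short tasks.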

Fixes:

Add iteration limit:

{
  "maxIterations": 5,
  "onLimit": "ask_user"
}
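
How a runtime honors `maxIterations` and `onLimit` depends on the agent framework. As a rough illustration only, a limit like the one above could be enforced in a driver loop along these lines; `run_with_limit`, `step`, and `on_limit` are hypothetical names.

```python
def run_with_limit(step, max_iterations=5, on_limit=None):
    """Call `step()` until it returns a non-None result or the limit is hit.

    `step` and `on_limit` are hypothetical callables supplied by the caller.
    """
    for _ in range(max_iterations):
        result = step()
        if result is not None:
            return result
    # Limit reached: hand control back instead of looping forever.
    if on_limit is not None:
        return on_limit(max_iterations)
    raise RuntimeError(f"No result after {max_iterations} iterations")
```

Passing an `on_limit` callback that prompts the user corresponds to the "ask_user" behavior in the config.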

Add explicit stop condition:

In your instructions, add:
"If you've tried the same approach 3 times without success, stop and ask the user for guidance."

2. Tool Failures

Symptoms:

  • Tool returns error
  • Tool times out
  • Tool not found

Diagnosis:

Check:
- Tool exists in available_tools
- Parameters match tool schema
- Tool has required permissions
- Rate limits not exceeded

Fixes:

Validate parameters first:

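def validate_params(tool, args):
    # Before calling a tool, check that every required parameter is present.
    # `tool` is the tool's schema dict; `args` is the dict of call arguments.
    for param in tool.get("required", []):
        if param not in args:
            raise ValueError(f"Missing required parameter: {param}")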

Add retry logic:

{
  "retries": 3,
  "retryDelay": 1000,
  "retryOn": ["rate_limit", "timeout", "5xx"]
}
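
If the runtime does not support a retry config like this, similar behavior can be approximated in code. The sketch below assumes errors carry a category string; the `ToolError` class and `call_with_retries` helper are hypothetical. It also assumes that "retries": 3 means three retries after the first attempt.

```python
import time


class ToolError(Exception):
    """Hypothetical error type carrying a category such as "timeout"."""

    def __init__(self, kind, message=""):
        super().__init__(message or kind)
        self.kind = kind


def call_with_retries(func, retries=3, retry_delay_ms=1000,
                      retry_on=("rate_limit", "timeout", "5xx"),
                      sleep=time.sleep):
    """Call `func`, retrying up to `retries` times on retryable ToolErrors."""
    for attempt in range(retries + 1):
        try:
            return func()
        except ToolError as err:
            # Give up on non-retryable errors or when retries are used up.
            if err.kind not in retry_on or attempt == retries:
                raise
            sleep(retry_delay_ms / 1000)
```

The `sleep` parameter exists only so the delay can be replaced in tests; errors outside `retry_on`, such as permission failures, are raised immediately.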

3. Context Overflow

Symptoms:

  • "Context length exceeded" error
  • Agent forgets earlier conversation
  • Truncated outputs

Diagnosis:

Check context window:
- Current tokens vs max tokens
- Number of messages in history
- Size of file contents loaded

Fixes:

Use memory efficiently:

- Load only relevant files
- Use offset/limit for large files
- Summarize long conversations
- Clear old context periodically
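
One way to "clear old context" is to keep only the most recent messages that fit a token budget. The sketch below is illustrative: it assumes messages are dicts with a "content" string and uses a crude 4-characters-per-token estimate, which real tokenizers will not match exactly.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens):
    """Keep the most recent messages whose estimated total fits the budget.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    """
    kept, used = [], 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropped messages are lost, so summarizing them before trimming is usually safer for long conversations.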

Compress context:

# Instead of full file
content = read("file.txt", offset=1, limit=100)

# Use memory_search for specific info
results = memory_search("important decision")

4. Rate Limiting

Symptoms:

  • "Rate limit exceeded" error
  • Requests blocked
  • 429 status codes

Diagnosis:

Check:
- API rate limits (requests per minute/hour)
- Token limits (tokens per minute)
- Concurrent request limits
- Time until reset
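
Many HTTP APIs report the time until reset in a `Retry-After` response header, either as a number of seconds or as an HTTP date. Whether a given API sends this header is an assumption to verify. The helper below is a sketch that expects a plain dict keyed exactly by "Retry-After".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_until_retry(headers, now=None):
    """Return seconds to wait based on a Retry-After header, or None."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    # Otherwise treat the value as an HTTP-date.
    try:
        reset_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if reset_at.tzinfo is None:
        reset_at = reset_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())
```

When the function returns None, fall back to a fixed or exponential delay.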

Fixes:

Add backoff:

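import time
import random

class RateLimitError(Exception):
    """Placeholder; use the rate-limit exception your API client raises."""

def call_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                break  # no point sleeping after the final attempt
            # Exponential backoff with jitter: about 1s, 2s, 4s, ...
            wait = (2 ** attempt) + random.random()
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")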

Queue requests:

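import time
from queue import Queue
from threading import Thread

request_queue = Queue()

def process_queue():
    while True:
        task = request_queue.get()
        try:
            execute(task)  # execute() stands for your own request handler
        finally:
            request_queue.task_done()
        time.sleep(0.1)  # Rate limit: at most 10 req/s

# Run the worker in the background
Thread(target=process_queue, daemon=True).start()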

5. Memory Issues

Symptoms:

  • Agent doesn't remember previous context
  • MEMORY.md not loaded
  • Memory files not found

Diagnosis:

Check:
- MEMORY.md exists
- memory/ directory exists
- Files have correct permissions
- Memory loaded at startup
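
The file checks in this list can be scripted. The sketch below covers existence and readability only; whether memory is loaded at startup depends on the agent runtime and is not checked. `check_memory_setup` is a hypothetical helper, and the workspace path (for example `~/.openclaw/workspace`) is passed in by the caller.

```python
import os
from pathlib import Path


def check_memory_setup(workspace):
    """Return a list of problems found in the memory setup (empty if none)."""
    workspace = Path(workspace).expanduser()
    problems = []
    memory_file = workspace / "MEMORY.md"
    memory_dir = workspace / "memory"
    if not memory_file.is_file():
        problems.append("MEMORY.md is missing")
    elif not os.access(memory_file, os.R_OK):
        problems.append("MEMORY.md is not readable")
    if not memory_dir.is_dir():
        problems.append("memory/ directory is missing")
    return problems
```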

Fixes:

Verify memory setup:

ls -la ~/.openclaw/workspace/
# Should show:
# MEMORY.md
# memory/

Add memory to instructions:

Before answering anything about prior work, decisions, dates, people, or todos: 
run memory_search on MEMORY.md + memory/*.md

6. Permission Errors

Symptoms:

  • "Permission denied"
  • "Access denied"
  • Tools not working

Diagnosis:

Check:
- User permissions
- File permissions
- Tool policies
- Sandbox restrictions

Fixes:

Check file permissions:

ls -la /path/to/file
chmod 600 ~/.openclaw/workspace/sensitive.json
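
The same check can be made from Python on POSIX systems. This is a small sketch using only the standard library; `is_owner_only` is a hypothetical helper name, and the result is not meaningful on Windows, where permission bits work differently.

```python
import os
import stat


def is_owner_only(path):
    """True if group and others have no permissions on `path` (e.g. 0o600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A False result for a sensitive file means its mode is looser than 600 and should be tightened with `chmod`.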

Review tool policies:

{
  "tools": {
    "exec": {
      "security": "ask",  // or "allowlist" or "full"
      "ask": "on-miss"    // or "always" or "off"
    }
  }
}

7. Performance Issues

Symptoms:

  • Slow responses
  • Timeouts
  • High resource usage

Diagnosis:

Profile the agent:
- Time each tool call
- Count tokens used
- Measure context growth
- Identify bottlenecks
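
Timing each tool call can be done with a small decorator. This is a sketch that assumes tools are plain Python callables; the `timed` decorator and the `tool_timings` registry are illustrative names, not part of any agent framework.

```python
import time
from collections import defaultdict
from functools import wraps

tool_timings = defaultdict(list)  # tool name -> list of durations in seconds


def timed(tool):
    """Wrap a tool function so every call's duration is recorded."""
    @wraps(tool)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return tool(*args, **kwargs)
        finally:
            # Record the duration even if the tool raised.
            tool_timings[tool.__name__].append(time.perf_counter() - start)
    return wrapper


def slowest_tools(limit=3):
    """Return (name, total_seconds) pairs, slowest first."""
    totals = {name: sum(durations) for name, durations in tool_timings.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]
```

Decorate each tool with `@timed`, run the task once, then call `slowest_tools()` to see where the time goes.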

Fixes:

Optimize context:

# Instead of loading entire file
content = read("large_file.txt", limit=50)

# Use targeted search
results = memory_search("specific topic")

Reduce tool calls:

# Bad: Multiple calls
file1 = read("file1.txt")
file2 = read("file2.txt")
file3 = read("file3.txt")

# Good: Parallel or combined
files = read(["file1.txt", "file2.txt", "file3.txt"])
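
Whether `read` accepts a list depends on the tooling available. Where files are read directly from Python instead, the standard library's thread pool gives a comparable effect; `read_all` below is a hypothetical helper and assumes UTF-8 text files.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_all(paths, max_workers=4):
    """Read several text files concurrently; returns {path: content}."""
    def read_one(path):
        return Path(path).read_text(encoding="utf-8")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `paths`.
        contents = list(pool.map(read_one, paths))
    return dict(zip(paths, contents))
```

Threads help most when reads are I/O bound, such as network mounts or many small files.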

Debugging Workflow

Step 1: Reproduce

1. Document exact steps to trigger issue
2. Note expected vs actual behavior
3. Check if issue is consistent or intermittent
4. Try with minimal example

Step 2: Isolate

1. Disable other skills
2. Reduce context to minimum
3. Simplify task
4. Test each component separately

Step 3: Diagnose

1. Check logs (if available)
2. Review tool outputs
3. Examine context window
4. Verify configuration

Step 4: Fix

1. Apply fix
2. Test fix
3. Document fix
4. Update instructions if needed

Step 5: Prevent

1. Add guardrails
2. Update error handling
3. Add logging
4. Document in memory

Debugging Tools

Check Agent Status

# If you have access to session tools
status = session_status()
print(f"Model: {status['model']}")
print(f"Tokens used: {status['usage']['total_tokens']}")
print(f"Reasoning: {status['reasoning']}")

Clear Context

If agent is stuck:
1. Start new session
2. Load only essential memory
3. Re-approach task fresh

Enable Verbose Mode

{
  "thinking": "verbose",
  "reasoning": "on"
}

This shows internal reasoning, helping identify where logic fails.

Common Error Messages

| Error | Cause | Fix |
|-------|-------|-----|
| context_length_exceeded | Too much context | Compress, summarize, limit |
| rate_limit_exceeded | Too many requests | Backoff, queue, wait |
| tool_not_found | Wrong tool name | Check spelling, install skill |
| permission_denied | Insufficient access | Check permissions, ask user |
| invalid_parameters | Wrong params | Validate against schema |
| timeout | Slow response | Increase timeout, optimize |
| memory_not_found | No memory files | Create MEMORY.md |

Best Practices

1. Defensive Coding

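import os

def process_file(path):
    # Always check before acting
    if not os.path.exists(path):
        return "File not found"

    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as exc:  # catch only the errors you expect
        return f"Could not read file: {exc}"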

2. Progress Tracking

In agent instructions:
"Track your progress. After each major step, note what you've done and what's next."

3. Checkpointing

For long tasks:
- Save progress periodically
- Document current state
- Allow resuming from checkpoint
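
A checkpoint can be as simple as a JSON file that is written atomically, so that an interrupted save never leaves a half-written file. The sketch below assumes the task state is JSON-serializable; `save_checkpoint` and `load_checkpoint` are hypothetical helper names.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` (JSON-serializable) to `path` atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_checkpoint(path, default=None):
    """Return the saved state, or `default` if no checkpoint exists."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default
```

On startup, call `load_checkpoint` with a default initial state and resume from whatever step it reports.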

4. Logging

import logging

logger = logging.getLogger("agent")

# Add to critical operations
logger.info("Starting operation: %s", operation)
logger.info("Parameters: %s", params)
logger.info("Result: %s", result)
logger.error("Error: %s", error)

When to Ask for Help

Ask the user when:

  • Multiple fix attempts failed
  • Issue is intermittent
  • Would require destructive actions
  • Need information only user has
  • Configuration changes needed

Prevention Tips

  1. Set limits early - max iterations, max tokens, max retries
  2. Validate inputs - check parameters before calling tools
  3. Handle errors gracefully - don't crash, report and adapt
  4. Log important events - helps debugging later
  5. Test edge cases - empty inputs, large files, special characters
  6. Monitor resources - tokens, time, memory usage
  7. Document quirks - save lessons in MEMORY.md

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-30 09:37, security status: safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
