Cron Failure Runbook

Runbook for diagnosing failed cron jobs, LaunchAgents, heartbeats, and unattended automation by reproducing the scheduler context, preflighting dependencies,...
nissan
Uncategorized clawhub v1.0.0 1 version 99583.3 Key: not required

Overview

Use when a scheduled job, LaunchAgent, cron task, heartbeat step, or nightly automation fails, silently no-ops, produces incomplete output, or repeatedly generates dream-cycle failure proposals.

Goal

Turn unattended failures into reproducible evidence and one of three outcomes:

  1. Fixed and verified.
  2. Deferred with owner/date/reason.
  3. Escalated with the exact missing credential, approval, service, or runtime condition.
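One way to keep the three outcomes honest is to record each one in a small structure that knows which fields it needs. The sketch below is an illustration only: the `Resolution` class, its field names, and the `problems` helper are hypothetical and not part of the runbook.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    FIXED = "fixed"
    DEFERRED = "deferred"
    ESCALATED = "escalated"


@dataclass
class Resolution:
    """Hypothetical record for how a failed job was closed out."""
    job: str
    outcome: Outcome
    evidence: str = ""                 # FIXED: command output or log excerpt
    owner: str = ""                    # DEFERRED: who picks it up
    next_check: Optional[date] = None  # DEFERRED: when to revisit
    reason: str = ""                   # DEFERRED: why it is not fixed now
    missing: str = ""                  # ESCALATED: exact credential, approval, service, or condition

    def problems(self) -> List[str]:
        """Return the names of required fields that are still empty for this outcome."""
        required = {
            Outcome.FIXED: ["evidence"],
            Outcome.DEFERRED: ["owner", "next_check", "reason"],
            Outcome.ESCALATED: ["missing"],
        }[self.outcome]
        return [name for name in required if not getattr(self, name)]
```

A deferred entry that names only an owner would report `next_check` and `reason` as still missing, which matches the "owner/date/reason" requirement above.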

Procedure

  1. Identify the scheduler context.
    • Job name, plist/cron entry, command, cwd, shell, user, and expected environment.
    • Last successful run and last failed/no-op run.
  2. Reproduce in the same runtime lane.
    • Run the exact command manually with the same env source where practical.
    • Capture stdout, stderr, exit code, cwd, PATH, and relevant env variable presence without printing secret values.
    • If the job depends on OpenClaw model calls, verify that it uses gateway/Codex routing rather than a raw OPENAI_API_KEY.
  3. Run preflights before the expensive or external step.
    • Auth: prove the running process can read the needed secret and make the smallest live API call.
    • Files: prove input paths exist and output directories are writable.
    • Network/service: prove target health endpoint or API is reachable.
    • Approval: prove an external write has approval or a preapproved workflow flag.
  4. Classify the failure.
    • auth: missing/expired token, wrong vault, wrong runtime env, insufficient scope.
    • runtime: wrong shell, PATH, Python/Node version, cwd, launchd env, permissions.
    • input: missing/stale source files, empty queue, unexpected schema.
    • external: API outage, 401/403, rate limit, deploy provider issue.
    • logic: script exits zero but produces no expected artifact/action.
  5. Close the loop.
    • Fix code/config if local and reversible.
    • Add a dry-run or preflight mode if the job cannot be safely tested live.
    • Update the relevant STATUS/runbook/memory with evidence.
    • If unresolved, record blocker, owner, next command, and alert threshold.
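The reproduction step and the "logic" failure class can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the `WATCHED_ENV` tuple, and the artifact-path check are hypothetical, and passing `env` only approximates the scheduler's environment rather than reproducing launchd or cron exactly.

```python
import os
import subprocess
import sys

# Hypothetical list of variables whose presence matters; values are never recorded.
WATCHED_ENV = ("PATH", "HOME", "SHELL", "OPENAI_API_KEY")


def reproduce(command, cwd=None, env=None, watched=WATCHED_ENV):
    """Run the job's command once and return the evidence to attach to the ticket."""
    run_env = dict(os.environ if env is None else env)
    proc = subprocess.run(command, cwd=cwd, env=run_env,
                          capture_output=True, text=True)
    return {
        "command": command,
        "cwd": cwd or os.getcwd(),
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "path": run_env.get("PATH", ""),
        # Record presence only, never the secret value itself.
        "env_present": {name: name in run_env for name in watched},
    }


def is_silent_noop(evidence, artifact_path):
    """True when the job exited zero but left no (or an empty) expected artifact."""
    if evidence["exit_code"] != 0:
        return False
    return not os.path.isfile(artifact_path) or os.path.getsize(artifact_path) == 0


if __name__ == "__main__":
    # Stand-in command; replace with the exact command from the plist or cron entry.
    result = reproduce([sys.executable, "-c", "print('hello')"])
    print(result["exit_code"], result["env_present"])
```

A `True` result from `is_silent_noop` points at the "logic" class; the other classes still need a human to read stderr and the scheduler log.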

Verification Evidence

Every cron fix needs at least one of:

  • Manual reproduction command with exit code and expected output.
  • Preflight-only or dry-run output proving dependencies are healthy.
  • Scheduler log excerpt showing the next run succeeded.
  • A deliberate deferred/blocked entry with owner, reason, and next check date.
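Preflight output of the kind listed above could come from a small script like the following sketch. All function names and the `services` parameter are hypothetical. It covers only the file, secret-presence, and TCP reachability checks; the smallest live API call and the approval check depend on the specific job and are left out.

```python
import os
import socket
import tempfile


def check_inputs(paths):
    """Return the input paths that do not exist."""
    return [p for p in paths if not os.path.exists(p)]


def check_writable(directory):
    """Prove the output directory is writable by creating a temporary file in it."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def check_reachable(host, port, timeout=3.0):
    """Prove a TCP connection to the service can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(inputs, output_dir, secret_names, services=()):
    """Collect failures; an empty list means the expensive step may run."""
    failures = [f"input missing: {p}" for p in check_inputs(inputs)]
    if not check_writable(output_dir):
        failures.append(f"output dir not writable: {output_dir}")
    for name in secret_names:
        # Presence check only; the value is never printed.
        if not os.environ.get(name):
            failures.append(f"secret not visible to this process: {name}")
    for host, port in services:
        if not check_reachable(host, port):
            failures.append(f"service unreachable: {host}:{port}")
    return failures
```

Running this inside the scheduled job itself, rather than from an interactive shell, is what makes the output count as evidence, since the two environments often differ.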

Dream-Cycle Specific Checks

For dream-cycle failures:

  • bash -n scripts/dream-cycle.sh
  • python3 -m py_compile for every Python script touched by the cycle.
  • scripts/task-quality-judge.py --since 7 --dry-run
  • scripts/skill-evolver.py --since 7 --min-failures 2 --dry-run
  • scripts/dream-recurring-issues.py --since 7 --min-count 3 --dry-run
  • scripts/dream-cycle-action-summary.py --since-hours 26 --dry-run
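The checks above can be chained so that one failure does not hide the rest. The sketch below assumes it is run from the repository root and that the scripts are executable; `run_checks` and its `python_files` parameter are hypothetical, and the exit code 127 for a command that cannot be started is borrowed from the shell convention.

```python
import subprocess

# Commands copied from the checklist above.
DRY_RUN_CHECKS = [
    ["bash", "-n", "scripts/dream-cycle.sh"],
    ["scripts/task-quality-judge.py", "--since", "7", "--dry-run"],
    ["scripts/skill-evolver.py", "--since", "7", "--min-failures", "2", "--dry-run"],
    ["scripts/dream-recurring-issues.py", "--since", "7", "--min-count", "3", "--dry-run"],
    ["scripts/dream-cycle-action-summary.py", "--since-hours", "26", "--dry-run"],
]


def run_checks(checks, python_files=(), cwd=None):
    """Run every check and return (command, exit_code) pairs, failures included."""
    commands = []
    if python_files:
        # Syntax-check the touched Python scripts before any dry run.
        commands.append(["python3", "-m", "py_compile", *python_files])
    commands.extend(checks)
    results = []
    for cmd in commands:
        try:
            code = subprocess.run(cmd, cwd=cwd).returncode
        except OSError:
            code = 127  # command could not be started (missing or not executable)
        results.append((cmd, code))
    return results
```

Calling `run_checks(DRY_RUN_CHECKS, python_files=[...])` with the scripts touched by the cycle and keeping the pairs whose exit code is non-zero gives a compact failure list for the evidence record.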

Do not mark dream-cycle work complete if proposal files are merely pending. There must be a lifecycle status, a summary, and a next action.

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-23 16:45 Safe Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk