Boot Resume

Author: belugary · Source: clawhub · Version: v1.1.1 · Key: not required

Zero-cooperation session recovery after gateway restart. No checkpoints, no hooks, no agent involvement — just reads the evidence and picks up where it left off.

Problem

When the gateway restarts, any in-flight agent turn dies mid-execution. Session history is preserved on disk, but the agent doesn't know it needs to continue. Users must manually tell each interrupted session to resume.

Checkpoint-based approaches require the agent to save state _before_ dying. Unexpected kills (SIGKILL, OOM, power loss) bypass this entirely.

Solution

A deterministic shell script runs on every gateway start via systemd ExecStartPost. No LLM in the detection loop.

┌─────────┐     ┌──────────┐     ┌──────────┐
│  Scan   │ ──▶ │  Detect  │ ──▶ │  Resume  │
│sessions │     │  JSONL   │     │ cron add │
│ .json   │     │  tail    │     │--sys-evt │
└─────────┘     └──────────┘     └──────────┘
  1. Scan — finds sessions updated within the last 20 minutes
  2. Detect — reads the last 5 JSONL lines to classify session state
  3. Resume — schedules a one-shot openclaw cron add --system-event --wake now to inject a continuation prompt

Key insight: the JSONL session files already contain all the evidence needed to detect an interruption — no pre-save required.
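The tail read in the Detect step can be sketched as follows. This is a minimal illustration, not the skill's actual script: the function name `tail_entries` is hypothetical, and it assumes each session file holds one JSON object per line. Malformed lines are skipped because a kill in the middle of a write may leave a truncated final line.

```python
import json
from collections import deque
from pathlib import Path


def tail_entries(path: Path, count: int = 5) -> list:
    """Return up to `count` parsed JSON objects from the end of a JSONL file.

    Hypothetical helper: the real script's parsing logic is not shown in
    this document.
    """
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the final `count` lines in memory.
        last_lines = deque(handle, maxlen=count)

    entries = []
    for line in last_lines:
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: a truncated or corrupt line is ignored, not fatal.
            continue
        if isinstance(parsed, dict):
            entries.append(parsed)
    return entries
```

Reading only the tail keeps the check cheap even for long sessions, since the classification depends on the final entry rather than the full history.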

Detection Rules

| Last JSONL Entry | Status | Meaning |
|---|---|---|
| toolResult | INTERRUPTED | Tool returned, agent never processed it |
| assistant (empty text) | INTERRUPTED | Tool call dispatched, killed before response |
| user (non-trivial) | INTERRUPTED | Message received, never processed |
| assistant (with text) | COMPLETE | Session ended normally (skip) |
| user (trivial: "ok", emoji) | TRIVIAL | No meaningful request pending (skip) |
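The rules above map naturally onto a small classifier. The sketch below assumes a simplified, hypothetical entry shape (`{"role": ..., "text": ...}`); the real JSONL schema is not documented here and may nest these fields differently. The set of trivial replies and the "no letters or digits" emoji check are also illustrative assumptions, as is the `UNKNOWN` fallback.

```python
# Hypothetical list of short acknowledgements; the script's real list is unknown.
TRIVIAL_REPLIES = {"ok", "okay", "thanks", "thx", "k"}


def is_trivial(text: str) -> bool:
    """Treat empty text, short acknowledgements, and emoji-only text as trivial."""
    stripped = text.strip()
    if stripped.lower() in TRIVIAL_REPLIES:
        return True
    # No letters or digits at all: empty, punctuation-only, or emoji-only.
    return not any(ch.isalnum() for ch in stripped)


def classify(entry: dict) -> str:
    """Classify a session by its last entry, following the detection rules."""
    role = entry.get("role")
    text = (entry.get("text") or "").strip()

    if role == "toolResult":
        return "INTERRUPTED"  # tool returned, agent never processed it
    if role == "assistant":
        # Empty text means a tool call was dispatched but no reply followed.
        return "COMPLETE" if text else "INTERRUPTED"
    if role == "user":
        return "TRIVIAL" if is_trivial(text) else "INTERRUPTED"
    return "UNKNOWN"  # assumption: unrecognized roles are left alone
```

Only sessions classified as `INTERRUPTED` would go on to the Resume step; `COMPLETE`, `TRIVIAL`, and anything unrecognized would be skipped.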

Install

One command

bash {baseDir}/install.sh

Deploys three components:

  • boot-resume-check.sh → ~/.openclaw/workspace/scripts/
  • boot-resume.conf → systemd drop-in (triggers script on every gateway start)
  • boot-resume-wake.service → systemd user service (triggers script on system wake from sleep/suspend)

Manual

cp {baseDir}/scripts/boot-resume-check.sh ~/.openclaw/workspace/scripts/
chmod +x ~/.openclaw/workspace/scripts/boot-resume-check.sh

mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d
cp {baseDir}/templates/boot-resume.conf ~/.config/systemd/user/openclaw-gateway.service.d/
cp {baseDir}/templates/boot-resume-wake.service ~/.config/systemd/user/

systemctl --user daemon-reload
systemctl --user enable boot-resume-wake.service

Verify

systemctl --user restart openclaw-gateway
sleep 20
cat /tmp/openclaw/boot-resume.log

Expected output:

[boot-resume] now=... cut=... (20min window)
[boot-resume] scanning agent: main
[boot-resume] candidates: 0 (agent=main)
[boot-resume] done
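For scripted verification, the candidate counts can be pulled out of the log. The sketch below relies only on the `candidates:` line shape shown in the sample output above; any other log line formats are unknown and are ignored. The function name `candidate_counts` is hypothetical.

```python
import re

# Matches lines such as: [boot-resume] candidates: 0 (agent=main)
CANDIDATES_RE = re.compile(r"^\[boot-resume\] candidates: (\d+) \(agent=([^)]+)\)$")


def candidate_counts(log_text: str) -> dict:
    """Map each agent name to the candidate count from its latest log line."""
    counts = {}
    for line in log_text.splitlines():
        match = CANDIDATES_RE.match(line.strip())
        if match:
            # Later runs overwrite earlier ones, so the newest count wins.
            counts[match.group(2)] = int(match.group(1))
    return counts
```

A count of 0 for every agent after a clean restart matches the expected output; a non-zero count means at least one session fell inside the scan window and was inspected.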

Test

  1. Send a message that triggers a multi-step task (web search, code analysis, etc.)
  2. Wait for the agent to start processing (tool calls in flight)
  3. Restart the gateway: systemctl --user restart openclaw-gateway
  4. The agent should resume automatically within ~35 seconds

Slash Command

When invoked as /boot-resume, run the script with --no-wait to skip the startup delay:

bash {baseDir}/scripts/boot-resume-check.sh --no-wait

Report results to the user: which sessions were resumed, or that none were found.

Configuration

| Variable | Default | Description |
|---|---|---|
| WINDOW_MINUTES | 20 | How far back to scan for interrupted sessions |
| DELAY | 20s | Delay before injecting the resume event |

Edit these values at the top of scripts/boot-resume-check.sh.
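To make the effect of WINDOW_MINUTES concrete, here is a sketch of the cutoff arithmetic behind the `now=... cut=...` log line. It assumes each session has a known last-update time; the `updated_at` mapping and the function name `recent_sessions` are hypothetical, and how the real script obtains timestamps is not specified in this document.

```python
from datetime import datetime, timedelta, timezone

WINDOW_MINUTES = 20  # mirrors the documented default


def recent_sessions(updated_at: dict, now: datetime,
                    window_minutes: int = WINDOW_MINUTES) -> list:
    """Return session keys whose last update falls inside the scan window."""
    cutoff = now - timedelta(minutes=window_minutes)
    # Assumption: a session updated exactly at the cutoff still counts.
    return sorted(key for key, stamp in updated_at.items() if stamp >= cutoff)


# Usage with fixed timestamps so the result is reproducible.
now = datetime(2026, 3, 30, 19, 0, tzinfo=timezone.utc)
sessions = {
    "fresh": now - timedelta(minutes=5),
    "stale": now - timedelta(minutes=45),
}
print(recent_sessions(sessions, now))  # ['fresh']
```

Widening the window catches sessions that were interrupted longer ago, at the cost of possibly resuming work the user has since abandoned.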

Features

  • Dual trigger — covers both gateway restart (ExecStartPost) and system sleep/wake (systemd sleep.target)
  • Multi-agent support — scans all agents under ~/.openclaw/agents/, not just main
  • Smart filtering — skips system, heartbeat, cron, and subagent sessions automatically
  • Deduplication — respects restart-resume.json to avoid double-resuming planned restarts
  • Log rotation — auto-truncates log at 1000 lines
  • Error visibility — Python and cron errors are logged, not swallowed
  • Unique job names — timestamp-based to prevent conflicts on rapid restarts
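Two of the features above, log truncation and timestamp-based job names, are simple enough to sketch. Both functions are illustrative assumptions: the document does not say whether truncation keeps the newest lines or clears the file, and the `boot-resume-<session>-<millis>` name format is hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

MAX_LOG_LINES = 1000  # mirrors the documented limit


def truncate_log(path: Path, max_lines: int = MAX_LOG_LINES) -> None:
    """Keep only the newest `max_lines` lines of the log (assumed behavior)."""
    if not path.exists():
        return
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines(keepends=True)
    if len(lines) > max_lines:
        path.write_text("".join(lines[-max_lines:]), encoding="utf-8")


def job_name(session_id: str, now: Optional[float] = None) -> str:
    """Build a job name that differs between rapid restarts.

    Millisecond resolution is an assumption; the real format is not documented.
    """
    stamp = int((time.time() if now is None else now) * 1000)
    return f"boot-resume-{session_id}-{stamp}"
```

Including the timestamp in the name means two restarts a few seconds apart produce distinct jobs instead of colliding on a fixed name.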

Comparison

| Approach | Pre-save required | Survives SIGKILL | LLM-free |
|---|---|---|---|
| Checkpoint / snapshot files | Yes | No | No |
| Pre-restart state dump | Yes | No | No |
| Session history replay | Yes | Partial | No |
| Post-hoc JSONL detection (this skill) | No | Yes | Yes |

Logs

Output: /tmp/openclaw/boot-resume.log

Each run logs: timestamp, scan window, candidate count, per-session status, and whether a resume job was armed.

Limitations

  • 20-minute scan window (configurable) — sessions idle longer than this are not resumed
  • Resume prompt is generic — the agent relies on session context for continuity
  • Telegram/Discord message queues already handle unprocessed incoming messages — this skill targets mid-execution interruptions
  • Requires systemd (Linux); macOS users need manual launchd setup

Uninstall

rm ~/.config/systemd/user/openclaw-gateway.service.d/boot-resume.conf
systemctl --user disable boot-resume-wake.service 2>/dev/null
rm ~/.config/systemd/user/boot-resume-wake.service
systemctl --user daemon-reload
rm ~/.openclaw/workspace/scripts/boot-resume-check.sh
rm -rf ~/.openclaw/workspace/skills/boot-resume

Version History

1 version in total

  • v1.1.1 (current)
    2026-03-30 19:01 (security scans: safe)