> A shark that stops swimming dies. An agent that waits for tools wastes compute.
Works with: Claude Code · Codex · Gemini CLI · Cursor · Windsurf · Aider · OpenClaw · any LLM agent
Trigger this skill when the user says:
Every LLM turn must complete in under 30 seconds.
If any operation would take longer:
sessions_spawn with mode: "run")You are never in I/O wait. You are always reasoning about something.
┌─────────────┐
│ DECOMPOSE │ Break task into N independent subtasks
└──────┬──────┘
│ spawn N remoras (+ 1 pilot fish when first completes early)
▼
┌─────────────┐
│ SPAWN │ sessions_spawn × N, all parallel, record session IDs
└──────┬──────┘
│ main agent keeps reasoning (never waits)
▼
┌─────────────┐ timeout/crash
│ MONITOR │ ──────────────────► MARK ⏱/❌ (partial still useful)
└──────┬──────┘
│ all done OR deadline hit
▼
┌─────────────┐
│ AGGREGATE │ Collect results, note failures, merge pilot fish draft
└──────┬──────┘
│
▼
┌─────────────┐
│ REPORT │ Single coherent response with failure count noted
└─────────────┘
No nested remoras. If a remora is running, it executes inline — remoras cannot spawn their own remoras. Only the main shark spawns.
think → call slow tool → WAIT 60s → think → call slow tool → WAIT 45s → ...
think → spawn remora(slow tool) → think about something else
→ spawn remora(another tool) → synthesize partial results
→ receive remora result → incorporate → swim on
When applying the Shark Pattern, structure your work like this:
Before calling any tool, ask: "Will this take more than 20-30 seconds?"
Slow tools (always spawn):
Fast tools (run inline, never spawn):
sessions_spawn({
task: "Do the slow thing and return the result",
mode: "run",
runtime: "subagent",
streamTo: "parent" // optional: stream output back
})
Spawn multiple remoras in parallel when possible — don't serialize unless there's a data dependency.
After spawning, immediately continue:
When remora results arrive, weave them in and continue. Never re-do work a remora already completed.
If your runtime keeps subagents alive after completion, close them once you've incorporated their result. In Codex that means: wait for the remora, use its output, then close_agent(id) unless you intentionally plan to reuse that same agent.
| Operation | Budget | Action |
|---|---|---|
| ----------- | -------- | -------- |
| File read | < 2s | Inline |
| Web search | 5-30s | Spawn |
| SSH command | 10-120s | Spawn |
| Build/test | 30-300s | Spawn |
| Coding agent | 60-600s | Spawn |
| Memory search | < 3s | Inline |
Without Shark (blocking):
1. Search web for X [wait 15s]
2. Search web for Y [wait 12s]
3. Fetch page Z [wait 8s]
4. SSH check server [wait 30s]
Total: ~65 seconds blocked
With Shark (non-blocking):
1. Spawn: search X [0s - spawned]
2. Spawn: search Y [0s - spawned]
3. Spawn: fetch Z [0s - spawned]
4. Spawn: SSH check [0s - spawned]
5. Plan synthesis while waiting [15s of actual thinking]
6. All results arrive → synthesize
Total: ~15s of thinking + max(tool times) in parallel
> 🦈 Shark mode — spawning [N] remoras for [tasks], continuing...
Use this format after each remora or pilot fish completes. Works in Telegram, Discord, Signal, iMessage — anywhere.
🦈 3 remoras · 1 pilot fish
◉ [A] task name here ████████████ ✅ 9s
◉ [B] task name here ████████████ ✅ 33s
○ [C] task name here ░░░░░░░░░░░░ pending
◈ [P] Pilot fish ██████░░░░░░ ~14s left
↳ continuing...
Symbols:
◉ = remora (completed)○ = remora (pending)⊙ = remora (running)◈ = pilot fish (time-bounded)████████████ = done bar (12 blocks)██████░░░░░░ = partial (filled = elapsed / total budget)░░░░░░░░░░░░ = not startedProgress fill: filled = round(elapsed / timeout * 12) blocks of █, remainder ░
Only post an update when something changes (remora completes or pilot fish starts/ends). Don't spam — one update per event.
After all remoras done:
> 🦈 All fins in — synthesising [N] results + pilot draft
Then deliver the report.
> Pilot fish swim alongside sharks doing prep work. When you have idle time, use it.
When one remora returns early and others are still running:
remora A ──────► result (early)
remora B ────────────────────────────► result
remora C ──────────────────────────────────► result
main: spawn A, B, C
A done → spawn pilot-fish(A's result, timeout=est_remaining)
pilot-fish: pre-analyse A, draft partial report, validate data...
B done → pilot-fish still running, feed B's result in (or kill+reuse)
C done → kill pilot-fish, synthesise A+B+C+pilot-fish draft
runTimeoutSeconds equal to estimated remaining wait// remoras A (fast) and B (slow) both spawned
// A finishes in 10s, B will take another 30s
// Spawn pilot fish with 25s budget:
sessions_spawn({
task: "Pre-analyse these results from remora A.
Validate the data, note any gaps, draft the structure
of the final report. Stop after 25 seconds.",
runTimeoutSeconds: 25,
mode: "run"
})
// Main agent continues doing other work
// When B finishes → kill pilot fish → synthesise A + B + pilot draft
Before every tool call, ask: "Will this take more than 10 seconds?"
Estimated time < 10s? → run inline
Estimated time ≥ 10s? → spawn remora
Unknown latency? → spawn remora (assume slow)
Data dependency on another remora? → wait, then inline
Already at 8 remoras? → queue, don't stack
Always spawn: web search/fetch, SSH, build/test, coding agents, CI triggers, API calls with unknown latency
Always inline: file read, memory lookup, string ops, math, local config reads
remoras will fail, timeout, or return garbage. Plan for it.
◉ [A] task ████████████ ⏱ 30s [timeout]
◉ [A] task ████████████ ❌ [error: connection refused]
close_agent(id) once their output is delivered.⏱ = timeout with partial, ❌ = hard error with nothingsessions_spawn call with mode: "run", runtime: "subagent", and runTimeoutSeconds set. A remora is specifically a timed sub-agent — untimed subagents are not remoras.runTimeoutSeconds — confirmed realVerified against OpenClaw source: runTimeoutSeconds: z.number().int().min(0).optional() — maps to the subagent wait timeout. Use it. Hard-kills the sub-agent process after N seconds, partial output returned.
pilotFishTimeout = min(estimatedRemaining * 0.8, 25)
estimatedRemaining = how long you think the slowest remaining remora will takeExample: slowest remaining remora estimated at 30s → pilot fish timeout = min(24, 25) = 24s
yieldMs > 30000 in exec calls — this holds the main turn hostageprocess(action=poll, timeout > 20000) in the main session — same reasonsleep or wait loops in the main threadrunTimeoutSeconds on remoras — unbound sub-agents are not sharksThe 30s cap isn't just a guideline — here's how to actually enforce it per runtime.
sessions_spawn({
task: "...",
mode: "run",
runtime: "subagent",
runTimeoutSeconds: 30 // hard kill after 30s — agent gets SIGTERM
})
runTimeoutSeconds is enforced by the OpenClaw runtime — the sub-agent process is killed if it exceeds it. Partial output is still returned.
exec({
command: "some-slow-command",
timeout: 30, // hard kill in seconds
background: true, // don't block the main agent turn
yieldMs: 500 // poll back quickly to check
})
timeout kills the process. background: true means the main agent doesn't wait — it gets a session handle and can check back with process(poll).
timeout 30 gemini -p "task here"
# or on Windows:
Start-Process gemini -ArgumentList '-p "task"' -Wait -Timeout 30
Wrap the CLI invocation with OS-level timeout / Start-Process -Timeout.
runTimeoutSecondssessions_spawn({
task: "pre-analyse partial results, draft structure, flag gaps",
mode: "run",
runTimeoutSeconds: estimatedRemainingMs / 1000, // die before the last remora
})
Set it to slightly less than your estimated remaining wait — so the pilot fish always finishes before you need to synthesise.
[timeout] in the progress bar instead of ✅⊙ [A] slow task ████████████ ⏱ 30s [timeout — partial result]
You can't hard-kill an LLM mid-turn, but you can:
thinking: "none" for fast sub-tasks that don't need deep reasoningRule of thumb: if a task description is >3 sentences, it probably needs to be split into remoras.
The Shark Pattern is runtime-agnostic. remoras can be any agent type.
sessions_spawn({
task: "...",
mode: "run",
runtime: "subagent",
runTimeoutSeconds: 30 // hard cap for pilot fish
})
sessions_spawn({
task: "...",
runtime: "acp",
agentId: "codex",
mode: "run",
runTimeoutSeconds: 30
})
Codex-specific lifecycle:
spawn_agent(...) or the runtime-equivalent remora launcherwait_agent(...)send_input(...)close_agent(id) so the agent does not linger in the sessionGemini CLI is a local process — spawn via exec with a timeout:
exec({
command: "gemini -p \"task description here\"",
timeout: 30, // hard cap in seconds
background: true, // don't block main agent
yieldMs: 500 // check back quickly
})
For Gemini sub-tasks, use exec with timeout + background: true rather than sessions_spawn. Treat the process handle the same way — continue working, collect output when it lands.
You can mix runtimes in the same shark run:
spawn remora A → Codex (coding task)
spawn remora B → Gemini (web search / analysis)
spawn remora C → Claude subagent (reasoning)
spawn pilot fish → Claude subagent (pre-analysis, time-bounded)
| Task type | Best runtime |
|---|---|
| ----------- | ------------- |
| Code generation / editing | Codex |
| Web search / summarise | Gemini CLI |
| Multi-step reasoning | Claude subagent |
| File ops / SSH / shell | exec (background) |
| Pre-analysis / drafting | Claude subagent (pilot fish) |
For slow shell commands (>5s), use the shark-exec companion skill:
shark-exec/SKILL.md in this repoexec call in background + cron pollerThe 30-second rule is best enforced at the shell level, not inside a turn.
Use shark.sh (or shark.ps1 on Windows) to run Claude in a bounded loop:
./shark.sh "find the latest ChatterPC version, check pve3, summarise GitHub issues"
Each iteration:
claude --print with a hard timeout 25s shell wrapper.shark-done → loop exitsThis is identical to the Ralph Loop pattern, but with the Shark Pattern as the prompt — Claude spawns remoras for slow work, keeps each turn under 25s, and the shell loop enforces the hard cut.
| Use case | Approach |
|---|---|
| ---------- | ---------- |
| Single fast task (<30s total) | claude --print "..." directly |
| Multi-step task, slow tools | ./shark.sh "..." loop |
| CI/build watching | shark-exec (background + cron) |
| Interactive chat | OpenClaw main session |
| Variable | Default | Description |
|---|---|---|
| ---------- | --------- | ------------- |
SHARK_MAX_LOOPS | 50 | Maximum iterations before giving up |
SHARK_LOOP_TIMEOUT | 25 | Per-turn timeout in seconds (hard kill) |
When Claude determines the task is done, it writes to .shark-done:
TASK_COMPLETE
<brief summary of what was accomplished>
The loop detects this file and exits cleanly.
When the user invokes these commands, follow the instructions for each.
/shark Apply the Shark Pattern to the given task. Decompose, spawn remoras for slow ops, keep the main fin moving. Follow all rules in this SKILL.md.
/shark-loop [--max-loops N] [--timeout S] Run the external shark loop enforcer. Execute:
$env:SHARK_MAX_LOOPS = "<N>"
$env:SHARK_LOOP_TIMEOUT = "<S>"
powershell.exe -ExecutionPolicy Bypass -File "<skill_dir>/shark.ps1" "<task>"
Defaults: --max-loops 50, --timeout 25. On Linux/Mac use shark.sh instead.
/shark-statusCheck current shark state:
/shark-exec/state/pending.json — report active background jobs (label, command, elapsed time, whether overdue past maxSeconds).shark-done exists, show its contentsSHARK_LOG.md exists, show the last 10 lines/shark-cleanRemove shark state files: .shark-done, SHARK_LOG.md, shark-exec/state/pending.json. Report what was cleaned.
/shark-autotuneAnalyse timing history and recommend optimal settings.
/state/timings.jsonl — each line is:```json
{"ts":1710000000,"loop":1,"elapsed_s":12.3,"timeout_s":25,"result":"ok|timeout|done","task_hash":"abc123"}
```
```
Current: SHARK_LOOP_TIMEOUT=25 SHARK_MAX_LOOPS=50
Recommended: SHARK_LOOP_TIMEOUT=N SHARK_MAX_LOOPS=M
Rationale:
```
Both shark.sh and shark.ps1 automatically record per-loop timings to state/timings.jsonl. Each entry includes:
ts — Unix timestamploop — loop iteration numberelapsed_s — actual wall-clock seconds for this turntimeout_s — configured timeout for this runresult — "ok" (completed), "timeout" (hit limit), "done" (task finished)task_hash — 8-char hash correlating loops within a single runUse /shark-autotune to analyse this data and tune your settings.
mode: "run", runtime: "subagent"npm install -g @google/gemini-cli共 1 个版本