Repeatable, enforced process for executing a build queue across sessions. Same discipline as the heartbeat — mandatory checklists, verification gates, no shortcuts.
If a project has NEXT_TASKS.md, follow this protocol automatically. Ray only needs to say "work on [project]" or "keep going."
□ 0. Context Check (HARD STOP)
session_status FIRST□ 1. Read Project State (BEFORE model selection — you need to see the work first)
NEXT_TASKS.md — find the first unchecked [ ] itemSTATUS.md (if exists) — understand current build statememory/YYYY-MM-DD.md — know what happened recently□ 2. Model Selection (NOW that you've seen the tasks)
session_status(model="...")□ 3. Verify Previous Session's Work (DO THIS BEFORE NEW WORK)
□ 4. Baseline Test Run (NON-NEGOTIABLE)
```bash
python3 -m pytest -x -q --ignore=tests/integration
```
N passed, 0 failedtest-suite-repair skill□ 5. Git Checkpoint
git add -A && git commit -m "checkpoint: session start — N tests passing"□ 6. Check Jinx Status
ssh -i ~/.ssh/lucky_to_mac luckyai@100.90.7.148 "curl -s http://localhost:3001/status"□ 7. Plan the Session
coding-agent skill for the implementationFor EACH task, follow this loop without skipping steps:
□ Read the task description in NEXT_TASKS.md
□ Read the relevant source files (use the quick reference table at bottom of NEXT_TASKS.md)
□ Read the spec section — find the actual requirement in the spec and quote it:
```
SPEC REQUIREMENT: "The Judge worker evaluates 5-10 variants and selects
the best with a decision trace including confidence score."
— master_spec_full.txt, Section 4.2
```
Why: This is how we avoid building the wrong thing. If you can't find a spec reference, ask Ray before implementing.
□ Define "done" — one sentence: "This task is done when ___"
□ Jinx check: "Can Jinx do this?" If it doesn't need live coding/testing → queue for Jinx, skip to next task
□ Jinx split check: Before coding, break the task into parts. Can any part run on Jinx?
□ Write the code
□ Stay focused — one task only. Don't drift into adjacent work.
□ If you discover something broken, note it in NEXT_TASKS.md as a new item but don't fix it now unless it blocks you
□ Check skills — if this task type matches a skill (judge-worker-pattern, pipeline-debugging, ui-truth-mapping, etc.), read and follow that skill first
□ Write tests for the new code (if the task warrants tests)
□ Run the full fast test suite:
```bash
python3 -m pytest -x -q --ignore=tests/integration
```
□ Must pass with 0 failures before proceeding
□ Test count must be ≥ baseline — you cannot lose tests
This is the step that gets skipped. Don't skip it.
□ Switch to Opus model for verification: session_status(model="opus")
□ Read back the code you wrote — does it do what the task asked? Not "does it compile" — does it actually implement the requirement?
□ Check the output — call the function/endpoint/CLI and look at what it returns. Print it. Read it.
```python
# Actually run it and look at the output
result = the_thing_you_built(test_input)
print(json.dumps(result, indent=2))
```
□ Compare to spec — re-read the spec requirement you quoted in Step 1. Does the output match?
□ Edge cases — empty input, null, unexpected types, boundary values
□ If UI copy: Read the copy. Does it make sense to a human who isn't a developer?
□ Switch back to Sonnet after verification (don't burn Opus tokens on the next task's implementation)
Verification failures:
[partial] with notes on what's missing, add fix as new task□ Git checkpoint: git add -A && git commit -m "done: [task description] — N tests passing"
□ Queue Jinx follow-up BEFORE marking done: After EVERY completed task, ask "What can Jinx do now?"
□ Update NEXT_TASKS.md: Check the box [ ] → [x], add notes if needed
□ RUN THE ENFORCEMENT GATE (MANDATORY — NO EXCEPTIONS):
```bash
~/.openclaw/workspace/scripts/enforce.sh "
```
Update memory/YYYY-MM-DD.md using this format:
### Task: [name from NEXT_TASKS.md]
- **Status:** ✅ Done / ⚠️ Partial / ❌ Blocked
- **What was built:** [1-2 sentences]
- **Files changed:** [list]
- **Tests:** [before] → [after] (added N)
- **Verified:** [how — what you checked beyond tests]
- **Spec match:** [yes/no — quote spec if relevant]
- **Jinx queued:** [what follow-up work]
- **Surprises:** [anything unexpected, lessons learned]
Update STATUS.md if this was a significant milestone.
Log to Mission Control if available.
□ Did this task produce a reusable pattern?
□ If yes → note it in the Learning Capture section of today's memory
□ If it's substantial → create a skill (check existing skills first to avoid duplicates)
□ Run session_status — check context usage
□ If context > 60%: 1-2 more tasks max. Be selective.
□ If context > 70%: Finish current task, go to Phase 3 (wrap-up). STOP new work.
□ If context OK: Move to next unchecked task, repeat from Step 1.
When you hit a blocker:
□ Classify:
□ Handle by type:
[blocked: waiting on Ray — reason] in NEXT_TASKS.md, message Ray, skip to next task□ Never let one blocker stop all progress.
□ If ALL tasks blocked:
□ NEXT_TASKS.md current:
[x]□ Final test run:
```bash
python3 -m pytest -x -q --ignore=tests/integration
```
□ Final git checkpoint:
```bash
git add -A && git commit -m "session end: [N] tasks done, [M] tests passing"
```
□ Structured handoff note in memory/YYYY-MM-DD.md:
```markdown
## Session Wrap-Up [time]
```
□ Update STATUS.md with current test count and completion %
□ Queue Jinx overnight work if applicable
□ DO NOT leave work half-done. Finish or revert.
When to message Ray:
Don't message Ray for: routine task completion, passing tests, minor progress. Save it for batches.
A task is NOT done unless ALL of these are true:
Red Lines (Never Cross):
After completing any task, write one line:
If it's worth keeping long-term → update MEMORY.md at end of day.
共 1 个版本