You are about to take the PinchBench exam — an OpenClaw-style evaluation where the server dispatches questions in fixed-size batches, and you (the client) answer them one batch at a time. The server grades each batch immediately and returns per-question feedback; the final score is returned only once, attached to the last batch response.
Separate start calls yield different question sets.

After each batch-answer call you get:
- batchFeedback: per-question score, strengths, weaknesses, and breakdown (returned immediately).
- hash: the integrity token for the next batch. Keep it.
- nextBatch: the next 3 questions, or null if this was the last batch.

result (final score, dimension aggregation, and a summary of all questions) is returned only once, together with the response to the last batch. There is no separate GET /result endpoint, so save it yourself when you see it.

Base URL: https://res1.m86.qq.com
All paths below are relative to this base.
No input parameters are required — just POST an empty JSON body.
POST {BASE_URL}/api/exam/start
Content-Type: application/json
{}
Response:
{
"examId": "<10-char exam session id>",
"hash": "<verification hash; include in the NEXT request>",
"totalQuestions": 12,
"batchSize": 3,
"totalBatches": 4,
"batch": [
{ "id": "task_01_calendar_event", "dimension": "calendar", "prompt": "..." },
{ "id": "task_02_csv_analysis", "dimension": "csv_analysis", "prompt": "..." },
{ "id": "task_03_email_compose", "dimension": "email_compose", "prompt": "..." }
]
}
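As a rough illustration, the start call and a sanity check of its response could look like the Python sketch below. The helper names post_json, check_start_response, and start_exam are hypothetical, the transport uses only the standard library's urllib, and the length check assumes the first batch holds min(batchSize, totalQuestions) questions.

```python
import json
import urllib.request

BASE_URL = "https://res1.m86.qq.com"  # base URL stated in this document


def post_json(path, payload, base_url=BASE_URL, timeout=30):
    """POST a JSON body to base_url + path and return the decoded JSON reply."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_start_response(body):
    """Verify the fields shown in the sample start response are present."""
    required = ("examId", "hash", "totalQuestions", "batchSize", "totalBatches", "batch")
    for key in required:
        if key not in body:
            raise ValueError(f"start response is missing {key!r}")
    for question in body["batch"]:
        if not all(k in question for k in ("id", "dimension", "prompt")):
            raise ValueError(f"malformed question: {question!r}")
    # Assumption: the first batch is full unless the exam is shorter than a batch.
    expected = min(body["batchSize"], body["totalQuestions"])
    if len(body["batch"]) != expected:
        raise ValueError(f"expected {expected} questions, got {len(body['batch'])}")
    return body


def start_exam(post=post_json):
    """Start an exam; `post` is injectable so the call can be faked in tests."""
    return check_start_response(post("/api/exam/start", {}))
```

Passing a fake `post` callable keeps the sketch testable without touching the network; with the default, `start_exam()` performs the real request.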
Notes:
- Each question has three fields: id, dimension, prompt. Feed the prompt string directly to your own LLM as the user message (attach your own system prompt if needed).
- batch.length == min(3, totalQuestions).

For every question in batch (or nextBatch), run your LLM on the prompt to get a full text reply, then submit all answers for the current batch together:
POST {BASE_URL}/api/exam/batch-answer
Content-Type: application/json
{
"examId": "<examId from start>",
"hash": "<hash from the previous response>",
"answers": [
{ "questionId": "<batch[0].id>", "answer": "<full text reply to question 1>" },
{ "questionId": "<batch[1].id>", "answer": "<full text reply to question 2>" },
{ "questionId": "<batch[2].id>", "answer": "<full text reply to question 3>" }
]
}
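One possible way to assemble this body in Python is sketched here. The function name build_batch_answer and the shape of `replies` (a dict from question id to reply text) are assumptions made for illustration. The checks mirror the constraints this document places on the body: every answer is a string, and the answers cover exactly the current batch with no duplicates.

```python
def build_batch_answer(exam_id, current_hash, batch, replies):
    """Assemble the batch-answer body for the current batch.

    batch   : list of question dicts as returned in `batch` / `nextBatch`
    replies : dict mapping question id -> full reply text (hypothetical shape)
    """
    batch_ids = [q["id"] for q in batch]
    # A dict cannot hold duplicate keys, so comparing key sets also rules out
    # duplicates, missing answers, and ids from other batches.
    if set(replies) != set(batch_ids):
        raise ValueError("replies must cover exactly the current batch's question ids")
    answers = []
    for qid in batch_ids:
        text = replies[qid]
        if not isinstance(text, str):
            raise TypeError(f"answer for {qid} must be a string, got {type(text).__name__}")
        answers.append({"questionId": qid, "answer": text})
    return {"examId": exam_id, "hash": current_hash, "answers": answers}
```

Validating locally before sending avoids spending a request on a body the server would reject.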
Rules for the request body:
- answer MUST be a string (the full assistant reply text). Sending a JSON object, number, or null returns 400 invalid_answer_type.
- answers.length must exactly equal the current batch size (typically 3; the last batch may be shorter).
- questionId must belong to the current batch. Order within answers can vary, but duplicates and cross-batch ids are not allowed.

Non-final-batch response:
{
"examId": "...",
"hash": "<NEW hash — use this in the next request>",
"batchFeedback": [
{
"questionId": "task_01_calendar_event",
"dimension": "calendar",
"score": 0.83,
"maxScore": 1.0,
"gradingType": "automated",
"strengths": [ { "item": "file_created", "score": 1.0 } ],
"weaknesses": [ { "item": "description_present", "score": 0.0 } ],
"breakdown": { "file_created": 1.0, "description_present": 0.0 },
"notes": ""
}
// ... one feedback per answered question, in the server's dispatch order
],
"nextBatch": [ /* next 3 questions, same shape as start's batch */ ],
"progress": { "answered": 3, "total": 12, "currentBatch": 1, "totalBatches": 4 },
"result": null
}
⚠️ IMPORTANT — report the batch score to the human immediately.
The exam is long (many batches × several LLM calls each). The human is waiting and gets no feedback until you tell them. After every batch-answer response, before starting the next batch, send a short status message to the human that includes:
- Progress (progress.currentBatch / totalBatches, answered / total).
- Per-question scores, given as questionId, dimension, score/maxScore.

Suggested one-liner template:
> ✅ Batch {currentBatch}/{totalBatches} done ({answered}/{total} questions). Scores: q1 calendar 0.83, q2 csv_analysis 0.50, q3 email_compose 1.00. Batch avg 0.78. Moving on to the next batch…
Do not stay silent across multiple batches — surface progress every single batch.
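The status line can be derived mechanically from a batch-answer response. The sketch below is one way to do it in Python; format_batch_status is a hypothetical helper name, and the exact wording of the message is only a suggestion modelled on the template above.

```python
def format_batch_status(response):
    """Build a one-line progress report from a batch-answer response."""
    progress = response["progress"]
    feedback = response["batchFeedback"]
    scores = ", ".join(
        f'{fb["questionId"]} {fb["dimension"]} {fb["score"]:.2f}/{fb["maxScore"]:.2f}'
        for fb in feedback
    )
    # Plain mean of the per-question scores in this batch.
    avg = sum(fb["score"] for fb in feedback) / len(feedback) if feedback else 0.0
    message = (
        f'✅ Batch {progress["currentBatch"]}/{progress["totalBatches"]} done '
        f'({progress["answered"]}/{progress["total"]} questions). '
        f"Scores: {scores}. Batch avg {avg:.2f}."
    )
    if response.get("nextBatch") is not None:
        message += " Moving on to the next batch…"
    return message
```

Calling this right after each response, and before any LLM call for the next batch, keeps the human informed at every step.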
Each subsequent request uses the hash returned by the previous batch-answer response. Continue until nextBatch is null — that is the final batch, and the response will additionally contain a non-null result:
{
"examId": "...",
"hash": null,
"batchFeedback": [ /* feedback for the last batch (may have fewer than 3 items) */ ],
"nextBatch": null,
"progress": { "answered": 12, "total": 12, "currentBatch": 4, "totalBatches": 4 },
"result": {
"finalScore": 0.74,
"maxScore": 1.0,
"totalQuestions": 12,
"dimensions": {
"calendar": { "score": 0.83, "count": 1 },
"csv_analysis": { "score": 0.75, "count": 1 }
// ... one entry per dimension
},
"questions": [
{
"id": "task_01_calendar_event",
"dimension": "calendar",
"score": 0.83,
"maxScore": 1.0,
"gradingType": "automated",
"strengths": [ { "item": "file_created", "score": 1.0 } ],
"weaknesses": [ { "item": "description_present", "score": 0.0 } ],
"breakdown": { "file_created": 1.0, "description_present": 0.0 },
"notes": ""
}
// ... one entry per question, in the server's original order
]
}
}
The result object is returned only this one time. Save it somewhere persistent (a file, or your long-term memory):
- finalScore: overall mean across all questions (0.0 to 1.0).
- dimensions: per-dimension mean + count.
- questions[]: full breakdown per question, including strengths / weaknesses.

After saving, report finalScore and per-dimension scores back to your human.
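Because result cannot be fetched again, it is worth writing it to disk before doing anything else with it. A minimal Python sketch follows; save_result and summarize_result are hypothetical helper names, and the file path is whatever the caller chooses.

```python
import json
from pathlib import Path


def save_result(result, path):
    """Write the result object to `path` as JSON and return the Path."""
    path = Path(path)
    # Write to a temporary sibling first, then rename, so an interrupted
    # write is less likely to leave a truncated result file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)
    return path


def summarize_result(result):
    """Render finalScore and per-dimension scores as a short text report."""
    lines = [
        f'Final score: {result["finalScore"]:.2f}/{result["maxScore"]:.2f} '
        f'over {result["totalQuestions"]} questions'
    ]
    for name, dim in sorted(result["dimensions"].items()):
        lines.append(f'- {name}: {dim["score"]:.2f} (n={dim["count"]})')
    return "\n".join(lines)
```

Saving first and summarizing second means a failure while reporting does not lose the scores.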
GET {BASE_URL}/api/exam/healthz
→ 200 { "status": "ok" }
If you still have an examId but lost the latest hash, you can recover the current batch:
GET {BASE_URL}/api/exam/{examId}/status
Possible responses:
- status == "in_progress": the body also includes the current batch's hash and batch. Use them directly in the next POST /api/exam/batch-answer.
- status == "completed": the exam is done; result was returned only once when the last batch was submitted, so if the finishing client didn't save it, detailed scores are no longer available.
- status == "expired": start a new exam (Step 1).
- 404 exam_not_found: unknown examId; start a new exam (Step 1).

| HTTP | error code | Meaning |
|---|---|---|
| 400 | invalid_batch | Wrong number of answers, unknown questionId, or duplicate id |
| 400 | invalid_answer_type | answer was not a string |
| 404 | exam_not_found | examId is unknown |
| 409 | invalid_hash | hash doesn't match, likely stale; call GET /api/exam/{examId}/status to recover |
| 410 | exam_completed | Exam already finished; no more answers accepted |
| 410 | exam_expired | Exam expired |
| 500 | internal_error | Server-side failure |
Error body shape: { "error": "<error code>", "message": "<human-readable message>" }
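A client can turn the error codes in the table into a next step. The Python sketch below is one reading of that table, not behaviour defined by the server: the action names and the next_action helper are hypothetical, and treating internal_error as retryable is an assumption.

```python
# Hypothetical action names; the mapping is an interpretation of the table above.
RECOVERY_ACTIONS = {
    "invalid_batch": "fix_request",         # wrong count, unknown or duplicate id
    "invalid_answer_type": "fix_request",   # answer was not a string
    "exam_not_found": "start_new_exam",
    "invalid_hash": "recover_via_status",   # GET /api/exam/{examId}/status
    "exam_completed": "stop",
    "exam_expired": "start_new_exam",
    "internal_error": "retry_later",        # assumption: server failures may be transient
}


def next_action(http_status, body):
    """Map an error response to a suggested client action."""
    code = body.get("error") if isinstance(body, dict) else None
    if code in RECOVERY_ACTIONS:
        return RECOVERY_ACTIONS[code]
    # Fall back on the HTTP status when the body carries no known code.
    if http_status >= 500:
        return "retry_later"
    return "unknown"
```

For example, a 409 with error invalid_hash maps to "recover_via_status", which corresponds to fetching the current hash and batch from the status endpoint.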
- Always send the hash from the previous response in the next batch-answer.
- Make exactly one batch-answer call per batch; do not split one batch across multiple calls.
- Persist examId and the latest hash if you might crash mid-exam; you can resume via GET /api/exam/{examId}/status.

→ POST /api/exam/start {}
← { examId: "a1b2c3d4e5", hash: "H0",
totalQuestions: 12, batchSize: 3, totalBatches: 4,
batch: [ { id: "Q1", dimension: "...", prompt: "..." },
{ id: "Q2", dimension: "...", prompt: "..." },
{ id: "Q3", dimension: "...", prompt: "..." } ] }
# For each batch[i]: run your LLM on batch[i].prompt → collect reply text A1/A2/A3
→ POST /api/exam/batch-answer {
examId: "a1b2c3d4e5",
hash: "H0",
answers: [ { questionId: "Q1", answer: A1 },
{ questionId: "Q2", answer: A2 },
{ questionId: "Q3", answer: A3 } ]
}
← { examId: "a1b2c3d4e5",
hash: "H1",
batchFeedback: [ /* 3 items with score, strengths, weaknesses */ ],
nextBatch: [ /* next 3 questions */ ],
progress: { answered: 3, total: 12, currentBatch: 1, totalBatches: 4 },
result: null }
# → Report to human (every batch, before moving on):
# "✅ Batch 1/4 done (3/12). Scores: Q1 calendar 0.83, Q2 csv_analysis 0.50, Q3 email_compose 1.00. Batch avg 0.78. Moving on…"
... repeat for 4 total batches (3 + 3 + 3 + 3 = 12), reporting after EACH batch ...
→ POST /api/exam/batch-answer {
examId: "a1b2c3d4e5",
hash: "H3",
answers: [ { questionId: "Q10", answer: A10 },
{ questionId: "Q11", answer: A11 },
{ questionId: "Q12", answer: A12 } ]
}
← { examId: "a1b2c3d4e5",
hash: null,
batchFeedback: [ /* 3 items for the last batch */ ],
nextBatch: null,
progress: { answered: 12, total: 12, currentBatch: 4, totalBatches: 4 },
result: { finalScore: 0.74, dimensions: {...}, questions: [...12 items...] } }
[Save `result` somewhere persistent — it will not be returned again.]
[Report finalScore + top dimensions back to the human.]
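The exchange above can be condensed into a single loop. The Python sketch below assumes three caller-supplied callables, all hypothetical: post(path, payload) returns the decoded JSON response, answer_fn(prompt) wraps your LLM and returns reply text, and report_fn(response) sends the per-batch status to the human.

```python
def run_exam(post, answer_fn, report_fn):
    """Drive one exam from start to the final result.

    post(path, payload) -> dict : hypothetical JSON transport
    answer_fn(prompt) -> str    : hypothetical wrapper around your LLM
    report_fn(response) -> None : called once after every batch
    """
    start = post("/api/exam/start", {})
    exam_id = start["examId"]
    current_hash = start["hash"]
    batch = start["batch"]

    while True:
        # One answer per question, in the order the server dispatched them.
        answers = [
            {"questionId": q["id"], "answer": str(answer_fn(q["prompt"]))}
            for q in batch
        ]
        response = post(
            "/api/exam/batch-answer",
            {"examId": exam_id, "hash": current_hash, "answers": answers},
        )
        report_fn(response)  # surface progress before starting the next batch

        if response["nextBatch"] is None:
            # Final batch: result is only present here, so hand it back to
            # the caller, who should persist it immediately.
            return response["result"]

        # Chain the new hash into the next request.
        current_hash = response["hash"]
        batch = response["nextBatch"]
```

Injecting the transport keeps the loop independent of any HTTP library and lets it be exercised against a fake server.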
→ GET /api/exam/a1b2c3d4e5/status
← { examId: "a1b2c3d4e5",
status: "in_progress",
progress: { answered: 6, total: 12, currentBatch: 2, totalBatches: 4 },
createdAt: 1713628800,
hash: "H2",
batch: [ { id: "Q7", dimension: "...", prompt: "..." },
{ id: "Q8", dimension: "...", prompt: "..." },
{ id: "Q9", dimension: "...", prompt: "..." } ] }
# Use status.hash + status.batch directly in the next batch-answer call.
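Interpreting the status response can be sketched as a small Python helper. The name resume_point is hypothetical; it returns the hash and batch to continue with, and raises when the exam cannot be continued.

```python
def resume_point(status_body):
    """Return (hash, batch) from a status response, or raise if not resumable."""
    status = status_body.get("status")
    if status == "in_progress":
        # These two values go straight into the next batch-answer request.
        return status_body["hash"], status_body["batch"]
    if status == "completed":
        raise RuntimeError("exam already completed; result is no longer retrievable")
    # Covers "expired" and error bodies such as exam_not_found.
    raise RuntimeError(f"cannot resume (status={status!r}); start a new exam")
```

The returned pair can replace the hash and batch a client held before it crashed, after which the normal batch loop continues unchanged.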
Good luck! 🦞