Scrask

When the user sends a screenshot via any chat surface (Telegram, iMessage, Slack, etc.), parse it for events and tasks using OpenClaw's configured vision LLM...
devsandip
Communication & Collaboration · clawhub · v1.0.3 · 2 versions · 100000 · Key: not required

Scrask Bot

Overview

Scrask is a screenshot-to-intent parser. The user sends a screenshot via whatever chat surface they have wired into OpenClaw (Telegram, iMessage, Slack, etc.). Scrask:

  1. Decides whether the screenshot contains any actionable content (event, reminder, task). If not, ignores it.
  2. Extracts every actionable item — a single screenshot may yield both an event and a task.
  3. Emits structured intent JSON.
  4. The OpenClaw agent then delegates each item to the user's installed destination skill:
    • destination: "calendar" → calctl / accli / apple-calendar / brainz-calendar / gcal-pro / etc.
    • destination: "task" → apple-reminders / things-mac / notion / etc.

Scrask never writes to a store directly. No service account JSON, no OAuth, no API keys for the calendar/task layer: that's the destination skill's job.

Invocation

Scrask is invoked in two ways. The platform tries explicit invocation first; if no alias matches, it falls back to the implicit trigger conditions.

Explicit override (checked first)

If the user message begins with any of these aliases (case-insensitive, with or without a @ or / prefix), the platform dispatches to Scrask regardless of the implicit conditions below:

  • scrask
  • scrask this
  • screenshot
  • screenshot to calendar

Examples that force-route to Scrask:

  • scrask this (with an attached image)
  • @scrask (with an attached image)
  • /scrask (with an attached image)
  • screenshot to calendar (with an attached image)

When invoked explicitly with no image attached, Scrask responds with a brief prompt asking the user to attach a screenshot, then stops. Do not run the parser without an image.
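
The explicit check described above can be sketched in a few lines. This is only an illustration, not the platform's actual dispatcher: the function names and return codes are hypothetical, and the sketch assumes an alias must end at a word boundary (the text only says the message "begins with" an alias).

```python
import re

# Aliases listed above. The longer phrases also start with a shorter alias,
# but all four are kept so the list mirrors the documentation.
ALIASES = ("scrask this", "screenshot to calendar", "scrask", "screenshot")

# Optional "@" or "/" prefix, then an alias, then a word boundary.
_ALIAS_RE = re.compile(
    r"^\s*[@/]?(?:" + "|".join(re.escape(a) for a in ALIASES) + r")\b",
    re.IGNORECASE,
)


def matches_explicit_alias(message: str) -> bool:
    """Return True if the message begins with one of the Scrask aliases."""
    return bool(_ALIAS_RE.match(message))


def explicit_response(message: str, has_image: bool) -> str:
    """Decide what to do on the explicit path (hypothetical return codes)."""
    if not matches_explicit_alias(message):
        return "not_explicit"   # fall through to the implicit conditions
    if not has_image:
        return "ask_for_image"  # prompt the user and stop; never run the parser
    return "run_parser"
```

With this sketch, `explicit_response("/scrask", has_image=False)` yields `"ask_for_image"`, matching the rule that the parser must not run without an image.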

Implicit (default, used when no alias matches)

The OpenClaw agent reads the incoming message and activates Scrask when:

  1. The user sends a message in any connected chat surface that contains an image attachment.
  2. The image appears to be a screenshot — not a photo of a person, place, or physical object.
  3. No other skill has already claimed the image.

Do not activate (implicitly) for:

  • Photos of people, places, food, scenery.
  • Screenshots of code, errors, or UI bugs (leave for other skills).
  • Images the user explicitly asks to edit, describe, or analyze for another purpose.

The implicit path is the one users will hit by default. The explicit aliases exist for two cases:

  1. Debugging / power-user override — force Scrask to run on an ambiguous image the agent would otherwise route elsewhere (or skip).
  2. Recovery — if the agent misses an obvious screenshot, the user can recover with scrask this instead of resending.
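
The implicit conditions can be read as a simple predicate. The sketch below assumes the agent has already made the judgement calls (is it a screenshot, has another skill claimed it) and exposes them as booleans; the `ImageMessage` class and its field names are hypothetical, not part of OpenClaw.

```python
from dataclasses import dataclass


@dataclass
class ImageMessage:
    """Hypothetical summary of what the agent knows about an incoming message."""
    has_image: bool
    looks_like_screenshot: bool            # not a photo of a person, place, or object
    claimed_by_other_skill: bool = False
    shows_code_or_ui_bug: bool = False     # left for other skills
    user_asked_for_other_use: bool = False # e.g. "edit this", "describe this"


def should_activate_implicitly(msg: ImageMessage) -> bool:
    """Apply the three activation conditions, then the exclusions."""
    if not (msg.has_image and msg.looks_like_screenshot):
        return False
    if msg.claimed_by_other_skill:
        return False
    if msg.shows_code_or_ui_bug or msg.user_asked_for_other_use:
        return False
    return True
```

The explicit aliases bypass this predicate entirely, which is what makes them useful for the debugging and recovery cases.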

Step-by-Step Instructions

Step 1: Acknowledge Immediately

Reply on the user's current chat surface so they know the skill is working:

> "📸 Got it — analyzing your screenshot..."

Step 2: Run the Parser

python3 {baseDir}/scripts/scrask_bot.py \
  --image-path "<path-to-temp-image>" \
  --provider "$CONFIG_VISION_PROVIDER" \
  --timezone "$CONFIG_TIMEZONE" \
  --confidence-threshold "$CONFIG_CONFIDENCE_THRESHOLD" \
  --actionable-threshold "$CONFIG_ACTIONABLE_THRESHOLD" \
  --type-threshold "$CONFIG_TYPE_THRESHOLD" \
  --field-threshold "$CONFIG_FIELD_THRESHOLD"

The script reads credentials from the environment — never pass them on the command line.
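
For callers that drive the script from Python rather than a shell, the same invocation might look like the sketch below. It rests on two assumptions not confirmed by this document: that the `CONFIG_*` values correspond to the keys of the skill's `config` block, and that the script prints its JSON result to stdout. The helper names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def build_parser_argv(base_dir: str, image_path: str, cfg: dict) -> list:
    """Build the argument list for scrask_bot.py; no credentials are included."""
    script = str(Path(base_dir) / "scripts" / "scrask_bot.py")
    return [
        "python3", script,
        "--image-path", image_path,
        "--provider", str(cfg["vision_provider"]),
        "--timezone", str(cfg["timezone"]),
        "--confidence-threshold", str(cfg["confidence_threshold"]),
        "--actionable-threshold", str(cfg["actionable_threshold"]),
        "--type-threshold", str(cfg["type_threshold"]),
        "--field-threshold", str(cfg["field_threshold"]),
    ]


def run_parser(base_dir: str, image_path: str, cfg: dict) -> dict:
    """Run the parser; API keys are inherited from the process environment."""
    proc = subprocess.run(
        build_parser_argv(base_dir, image_path, cfg),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)  # assumption: the result is JSON on stdout
```

Passing the arguments as a list avoids shell quoting problems with image paths that contain spaces.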

In default auto mode it routes by what is available:

  • GEMINI_API_KEY set → Gemini-first, with Claude as fallback when ANTHROPIC_API_KEY is also set (cheap + fast path).
  • ANTHROPIC_API_KEY set (no Gemini key) → Claude only.
  • Neither set → OpenClaw's configured vision LLM, read from the platform-injected env vars OPENCLAW_VISION_PROVIDER, OPENCLAW_VISION_KEY, and optional OPENCLAW_VISION_MODEL.

So the skill works out of the box for any OpenClaw user with a vision-capable LLM configured at the platform level. Bringing your own Gemini key only adds the cost-and-speed optimisation on top.
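
The auto-mode routing above amounts to a short decision chain over environment variables. The sketch below is an illustration of that order, not the script's actual code: `resolve_provider_chain` is a hypothetical name, and it assumes the Claude fallback is only available when `ANTHROPIC_API_KEY` is set.

```python
def resolve_provider_chain(env: dict) -> list:
    """Return the providers to try, in order, for the default auto mode."""
    if env.get("GEMINI_API_KEY"):
        chain = ["gemini"]
        # Assumption: Claude can only act as fallback if its key is present.
        if env.get("ANTHROPIC_API_KEY"):
            chain.append("claude")
        return chain
    if env.get("ANTHROPIC_API_KEY"):
        return ["claude"]
    # Neither key set: use the platform-injected vision LLM, if configured.
    provider = env.get("OPENCLAW_VISION_PROVIDER")
    if provider and env.get("OPENCLAW_VISION_KEY"):
        return [provider]
    return []  # nothing usable is configured
```

In real use the argument would be `os.environ`; taking a plain dict keeps the routing logic easy to test.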

The script returns JSON with:

  • success: whether parsing worked
  • no_actionable_content: true if nothing actionable was found
  • actionable_confidence: 0.0–1.0, how sure the parser is that the screenshot is actionable
  • needs_actionable_confirmation: true if actionable_confidence is in the maybe band; the bot should confirm "is this actually an event or task?" before dispatching
  • items[]: one entry per detected item, with:
    • type, destination, confidence (legacy aggregate), type_confidence
    • confidences{}: per-field 0.0–1.0 scores (title, date, time, location, participants, description, priority, …)
    • needs_confirmation: true when there is at least one outstanding clarification
    • clarifications[]: targeted questions to ask the user, e.g. { "field": "time", "question": "What time is dinner with Priya?", "reason": "low_confidence" }
    • all the extracted fields (title, date, time, location, participants, etc.)
  • summary_text: chat-ready preview of what was found; send this verbatim, do not rephrase
  • screenshot_summary, parse_notes: context
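
To make the shape concrete, here is an illustrative payload and a small reader for it. Only the field names and the clarification entry come from the description above; every value (scores, dates, the `"event"` type string, the summary wording) is made up for the example, and the placement of `needs_confirmation` and `clarifications` inside each item is an assumption.

```python
import json

# Illustrative only: values are invented, field names follow the list above.
SAMPLE_OUTPUT = """
{
  "success": true,
  "no_actionable_content": false,
  "actionable_confidence": 0.92,
  "needs_actionable_confirmation": false,
  "summary_text": "Found 1 event: Dinner with Priya",
  "screenshot_summary": "Chat about dinner plans",
  "parse_notes": "",
  "items": [
    {
      "type": "event",
      "destination": "calendar",
      "confidence": 0.81,
      "type_confidence": 0.90,
      "title": "Dinner with Priya",
      "date": "2026-03-01",
      "time": "19:00",
      "confidences": {"title": 0.95, "date": 0.88, "time": 0.55},
      "needs_confirmation": true,
      "clarifications": [
        {"field": "time",
         "question": "What time is dinner with Priya?",
         "reason": "low_confidence"}
      ]
    }
  ]
}
"""


def pending_questions(result: dict) -> list:
    """Collect the user-facing questions from items that need confirmation."""
    questions = []
    for item in result.get("items", []):
        if item.get("needs_confirmation"):
            questions.extend(c["question"] for c in item.get("clarifications", []))
    return questions


result = json.loads(SAMPLE_OUTPUT)
```

Calling `pending_questions(result)` on this sample returns the single question about the dinner time.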

Step 3: Handle the Output

If no_actionable_content is true:

Silently ignore the screenshot or, if the user clearly meant for Scrask to act on it, reply with the summary_text field (which is a polite "couldn't find anything" message).

If success is true:

Send the summary_text value back to the user on the same chat surface. Then process each item.
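
A minimal sketch of this branching, assuming the caller knows whether the user invoked Scrask explicitly. The function name is hypothetical, and the behaviour when `success` is false is left open because this document does not specify it.

```python
from typing import Optional


def first_reply(result: dict, user_asked_explicitly: bool) -> Optional[str]:
    """Return the text to send back, or None to stay silent."""
    if result.get("no_actionable_content"):
        # Only speak up if the user clearly wanted Scrask to act.
        return result.get("summary_text") if user_asked_explicitly else None
    if result.get("success"):
        return result.get("summary_text")  # sent verbatim, never rephrased
    return None  # failure handling is not specified here
```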

Step 4: Route Each Item to a Destination Skill

Check the top-level needs_actionable_confirmation flag first, then handle every item in items[].

If needs_actionable_confirmation: true (top level):

Send summary_text (which already opens with "Is this actually an event or task?") and wait for the user. On "yes", proceed item-by-item below. On "no", reply "Got it, skipped ✓" and stop.

For each item — if needs_confirmation: false (no outstanding clarifications):

Invoke the appropriate destination skill without asking the user first.

  • destination: "calendar" → invoke the user's installed calendar skill. Preference order: calctl → accli → apple-calendar → brainz-calendar → gcal-pro (use the first one available).
  • destination: "task" → invoke the user's installed task skill. Preference order: apple-reminders → things-mac → notion (use the first one available).

Pass the item fields (title, date, time, end_time, end_date, location, participants, description, recurrence, online_link, etc.) to whatever creation command that skill exposes.

If end_date is present and different from date, treat the item as a multi-day event.
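
The preference order and the multi-day rule can be sketched as two small helpers. How the agent discovers which skills are installed is outside this document, so the `installed` set is simply passed in; both function names are hypothetical.

```python
from typing import Optional

# Preference order per destination, as listed above.
PREFERENCE = {
    "calendar": ["calctl", "accli", "apple-calendar", "brainz-calendar", "gcal-pro"],
    "task": ["apple-reminders", "things-mac", "notion"],
}


def pick_destination_skill(destination: str, installed: set) -> Optional[str]:
    """Return the first installed skill in preference order, or None."""
    for name in PREFERENCE.get(destination, []):
        if name in installed:
            return name
    return None  # caller replies with the missing-skill hint and stops


def is_multi_day(item: dict) -> bool:
    """True when end_date is present and differs from date."""
    end_date = item.get("end_date")
    return bool(end_date) and end_date != item.get("date")
```

For example, a user with `gcal-pro` and `apple-calendar` installed gets `apple-calendar`, because it appears earlier in the preference list.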

For each item — if needs_confirmation: true:

The clarifications[] array lists the specific things to ask. Each entry has:

  • field: which field needs clarification (e.g. "time", "date", "type")
  • question: the user-facing question (already pre-formatted with the item title)
  • reason: "missing" (value is null), "low_confidence" (extracted but uncertain), or "low_type_confidence" (unsure whether this is a calendar event or a task)

The summary_text already renders these as a bullet list. Ask the user the questions in order and patch the corresponding fields with their replies. Once every clarification is resolved, route the item to the destination skill as above. If the user says skip at any point, drop the item and confirm "Got it, skipped ✓".

For the special case of field: "type", the user's reply determines whether the item routes to calendar or task; update destination accordingly before dispatch.
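
The clarification loop might be sketched as follows. The `ask` callback stands in for sending a question on the chat surface and waiting for the reply; it, the function name, and the `REPLY_TO_DESTINATION` mapping are all hypothetical. Replies are stored as raw strings, so any date or time normalisation is left to the caller or the destination skill.

```python
from typing import Callable, Optional

# Hypothetical mapping from a user's reply to a destination value.
REPLY_TO_DESTINATION = {
    "event": "calendar",
    "calendar": "calendar",
    "task": "task",
    "reminder": "task",
}


def resolve_clarifications(item: dict, ask: Callable[[str], str]) -> Optional[dict]:
    """Ask each question in order; return the patched item, or None if skipped."""
    patched = dict(item)  # shallow copy so the original item is left untouched
    for c in item.get("clarifications", []):
        reply = ask(c["question"]).strip()
        if reply.lower() == "skip":
            return None  # caller drops the item and confirms the skip
        if c["field"] == "type":
            destination = REPLY_TO_DESTINATION.get(reply.lower())
            if destination is None:
                raise ValueError(f"unrecognised type reply: {reply!r}")
            patched["destination"] = destination
        else:
            patched[c["field"]] = reply  # stored as-is, no parsing
    patched["clarifications"] = []
    patched["needs_confirmation"] = False
    return patched
```

Returning a new dict rather than mutating the input makes it easy to retry the questions if the user's answers turn out to be unusable.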

Step 5: Confirm Saves

After each destination skill returns, relay a one-line confirmation to the user. Examples:

  • 📅 Added to Calendar via calctl: Team Standup — 2026-03-01 at 09:00
  • 🔔 Added to Reminders: Pay electricity bill (due 2026-02-28)
  • ✅ Added to Things: Send Sandip my resume

If the destination skill errors, surface the error and ask whether to retry with a different destination.

Edge Cases

| Scenario | Behavior |
|---|---|
| Single screenshot has both an event and a task | Process each independently; route to its own destination. |
| Event implies a prep step (e.g. dinner at a restaurant → book table) | The parser emits BOTH an event and a prep reminder. Inferred fields on the prep reminder land in the 0.65–0.80 band, so most prep reminders hit needs_confirmation: true with targeted clarifications (typically time and date). |
| Multi-day event (trip, conference) | end_date is set and differs from date. Pass both to the calendar skill (e.g. calctl add --date --end-date --all-day). |
| Rescheduled / cancelled event | Parser extracts the NEW date; parse_notes flags it as a reschedule. Confirm with user before overwriting any existing entry. |
| Screenshot is in Hindi, Tamil, or another language | Title and description are already in English; language holds the ISO code. Save as-is. |
| Recurring event ("every Monday") | Pass recurrence and recurrence_day to the calendar skill. |
| Date has already passed | Flag in the reply: "⚠️ This date has already passed. Save anyway?" |
| Screenshot of someone's calendar | already_in_calendar_hint: true; reply "Looks like this is already in your calendar 🗓️" and skip. |
| No calendar / task skill installed | Reply with the missing-skill hint and stop. |
| Zoom/Meet link found | Pass online_link to the calendar skill; it should set both location and description. |
| Meme / non-actionable screenshot | no_actionable_content: true; ignore silently unless user clearly asked for action. |
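
Two of these cases, the already-in-calendar hint and the past date, lend themselves to a check just before dispatch. The sketch below assumes dates are ISO `YYYY-MM-DD` strings (as in the confirmation examples) and that `already_in_calendar_hint` is set on the item; the function name and the `"skip"` / `"confirm"` / `"ok"` codes are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def pre_dispatch_check(item: dict, today: date) -> Tuple[str, Optional[str]]:
    """Return (action, message) where action is "skip", "confirm", or "ok"."""
    if item.get("already_in_calendar_hint"):
        return "skip", "Looks like this is already in your calendar 🗓️"
    raw = item.get("date")
    # Assumption: dates are ISO-formatted strings such as "2026-03-01".
    if raw and date.fromisoformat(raw) < today:
        return "confirm", "⚠️ This date has already passed. Save anyway?"
    return "ok", None
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and lets the caller apply the configured timezone first.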

Configuration

{
  "skills": {
    "entries": {
      "scrask-bot": {
        "enabled": true,
        "env": {
          // Both keys are OPTIONAL in v4.2+. Without either, Scrask uses
          // OpenClaw's configured vision LLM via the platform-injected
          // OPENCLAW_VISION_* env vars. Setting GEMINI_API_KEY opts into
          // the cheap+fast Gemini routing. Setting ANTHROPIC_API_KEY adds
          // Claude as a fallback (or as the primary if no Gemini key).
          "GEMINI_API_KEY": "AIza-your-gemini-key",
          "ANTHROPIC_API_KEY": "sk-ant-your-key-here"
        },
        "config": {
          "vision_provider": "auto",
          "fallback_threshold": 0.60,
          "timezone": "Asia/Kolkata",
          "confidence_threshold": 0.75,
          "actionable_threshold": 0.70,
          "type_threshold": 0.70,
          "field_threshold": 0.70
        }
      }
    }
  }
}

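ANTHROPIC_API_KEY is optional. With only GEMINI_API_KEY set, auto mode runs Gemini without a Claude fallback; with neither key set, Scrask uses OpenClaw's configured vision LLM.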

Permissions Required

  • image:read — to access the screenshot from the chat surface.
  • network:outbound — to call the vision model API (Gemini and optionally Claude).
  • chat:reply — to send confirmation messages back via the user's chat surface.
  • Whatever permissions the downstream calendar / task skill needs (handled by that skill).

Version history

2 versions in total

  • v1.0.3 (current)
    2026-05-26 22:43
  • v1.0.1
    2026-03-29 08:24 · Security: safe, safe
