
FlexArm Robot Arm: Physical Tap, Swipe, OCR for Real Phones

Give AI agents physical eyes and a hand to operate any smartphone. More than 50 built-in atomic actions (click, swipe, OCR, screenshot) can be called directly by an AI agent, and YAML scripts are also supported. Script Store: once one AI agent has learned an app, every other agent can reuse that script immediately. AI models with strong reasoning can explore new apps on their own through the camera and robot arm in a closed loop: see, reason, act, verify. This loop lets an agent operate almost any function on the phone.
Author: hamlet0168 · Category: Uncategorized · Source: clawhub v1.0.2 (1 version) · Key: not required

Overview

When to Use This Skill

An AI Agent MUST use this skill when the user wants to physically interact with a real smartphone screen using a robot arm. This includes:

  • Clicking, tapping, swiping, or typing on a phone.
  • Opening an app on a real phone.
  • Finding text or icons on the phone screen (OCR or template matching).
  • Automating a sequence of actions on a phone (e.g., daily check‑in, repeated workflow).
  • Writing, running, or managing YAML scripts for automation.
  • Any mention of: robot arm, FlexArm, calibration, camera view, physical click, or real phone.

Example user utterances that should trigger this skill:

  • “Click the WeChat icon on my phone.”
  • “Swipe down on the screen.”
  • “Open TikTok on the real device.”
  • “Find the ‘Settings’ button on the screen and tap it.”
  • “Write a YAML script to check weather every morning at 8 AM.”
  • “Run my automation script for Qishui Music.”
  • “Use the robot arm to type ‘Hello’ into the search box.”

Do NOT use this skill if:

  • The user asks a pure knowledge question (e.g., “What’s the capital of France?”).
  • The user wants to operate a virtual or simulated phone (e.g., Android emulator without physical arm).
  • The user simply asks for code generation without any intent to execute it on a real phone.

FlexArm Robot — AI Agent Skill Reference

> Phone screen automation via robot arm + camera vision. Uses a camera to detect the phone screen area, maps pixel coordinates to physical arm coordinates, and performs precise clicks and swipes.

Environment & Initialization

All API calls in this skill depend on the RobotArmServer.exe service.

Before using this skill, the following conditions must be met:

  • Calibration Tool: RobotArmCalibration.exe
      Download: Official Release Page
      Latest version: v2.0.0
      Size: ~160 MB (compressed)
  • Server Program: RobotArmServer.exe (included in RobotArmServer.zip)
      Download: FlexArm v2.0.1 Release
      Latest version: v2.0.0
      Size: ~231 MB (compressed)
  • Installation:
      1. Download RobotArmServer.zip from the link above
      2. Extract to any directory (e.g., D:\FlexArm); this becomes the project root
      3. Run RobotArmServer.exe as Administrator (first run requires admin privileges to install the driver)
      4. Verify: visit http://127.0.0.1:7826/api/health; it should return {"ok":true}
  • Directory Convention:
      All relative paths (e.g., scripts/, icons/) are relative to the project root above.
      Do not modify files inside the _internal/ directory.

> ⚠️ If the service is not running, this skill cannot perform any operations.

> Before starting a task, always check /api/health status.

Important: Fixed Port

All API requests must use port 7826, not 5000.

http://127.0.0.1:7826/api/*

The port is fixed at 7826 and cannot be changed. Flask's default port 5000 does not apply.
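
The pre-flight check can be sketched in Python using only the standard library. The helper names `api_url` and `service_is_healthy` are illustrative and not part of FlexArm; the only facts taken from this document are the fixed port 7826 and the `ok` field of the health response.

```python
import json
import urllib.request

# Port 7826 is fixed by the service; Flask's default 5000 does not apply.
BASE_URL = "http://127.0.0.1:7826"


def api_url(endpoint: str) -> str:
    """Join 'api/health' or '/api/health' onto the fixed base URL."""
    return f"{BASE_URL}/{endpoint.lstrip('/')}"


def service_is_healthy(url: str = "", timeout: float = 3.0) -> bool:
    """Return True only if the health endpoint answers with {"ok": true}."""
    target = url or api_url("/api/health")
    try:
        with urllib.request.urlopen(target, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error or a non-JSON body.
        return False
    return isinstance(payload, dict) and payload.get("ok") is True
```

An agent could call `service_is_healthy()` before every task and stop early, with a message to the user, when it returns `False`.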

Important: Chinese Characters in curl

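Do NOT use curl to send Chinese characters in JSON. Chinese text typed on the curl command line typically does not reach the server as valid UTF-8 (most likely because the Windows console does not pass the argument as UTF-8), so the server fails to recognize Chinese keywords and the lookup fails.

# ❌ Wrong: Chinese characters typed on the curl command line arrive garbled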
curl -X POST http://127.0.0.1:7826/api/find_text -d '{"text_keyword":"领取"}'

# ✅ Correct: use Python requests for Chinese parameters
python -c "import requests; r = requests.post('http://127.0.0.1:7826/api/find_text', json={'text_keyword': '领取'}); print(r.text)"

APIs with English-only parameters (e.g., detect_desktop, click_icon, run_script, click_at) may use curl. APIs involving Chinese keywords (find_text, click_text, detect_page page names) must use Python.
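
If the third-party `requests` package is not available, the same request can be built with the standard library. This is a sketch under assumptions: `build_post` and `post_json` are hypothetical helper names, and only the `/api/find_text` endpoint and the `text_keyword` field come from this document. With `json.dumps` left at its default `ensure_ascii=True`, non-ASCII characters are written as `\uXXXX` escapes, so the body on the wire is plain ASCII and no console or transport encoding can garble it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7826"  # fixed port, per this document


def build_post(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with an ASCII-only JSON body."""
    # Default ensure_ascii=True escapes every non-ASCII character.
    body = json.dumps(payload).encode("ascii")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (needs the service running)."""
    with urllib.request.urlopen(build_post(endpoint, payload), timeout=timeout) as resp:
        return json.load(resp)


# Example call, only meaningful while RobotArmServer.exe is running:
# post_json("/api/find_text", {"text_keyword": "领取"})
```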


Complete API Index (54 endpoints)

> All endpoints below are accessible via HTTP POST/GET at http://127.0.0.1:7826.

System & Status (8)

| # | Method | Endpoint | Description |
|---|--------|----------|-------------|
| 1 | GET | / | Service root |
| 2 | GET | /api/health | Health check (service, arm, camera status) |
| 3 | GET | /api/arm_status | Arm status (COM port, service, calibration) |
| 4 | GET | /api/get_frame_info | Get frame dimensions |
| 5 | GET | /api/get_overlay | Get current overlay (vision match result) |
| 6 | GET | /api/get_phone_corners | Get phone screen 4-corner coordinates |
| 7 | GET | /api/is_phone_present | Check if phone is in frame |
| 8 | GET | /api/is_screen_on | Check if screen is on |

Display & Control (5)

| # | Method | Endpoint | Description |
|---|--------|----------|-------------|
| 9 | GET/POST | /api/show_window | Open camera display window |
| 10 | GET/POST | /api/hide_window | Close camera display window |
| 11 | POST | /api/toggle_phone_corners | Toggle phone screen border overlay |
| 12 | POST | /api/change_focus | Adjust camera focus (delta value) |
| 13 | GET | /api/screenshots | List historical screenshot files |

Actions (21)

| # | Method | Endpoint | Description |
|---|--------|----------|-------------|
| 14 | POST | /api/go_home | Home (return to desktop) |
| 15 | POST | /api/go_back | Back navigation |
| 16 | POST | /api/go_forward | Forward navigation |
| 17 | POST | /api/reset | Reset robot arm to origin |
| 18 | POST | /api/clear_overlay | Clear vision match overlay boxes |
| 19 | GET/POST | /api/run_app | Launch a specified app |
| 20 | POST | /api/swipe_up | Swipe up (large/small) |
| 21 | POST | /api/swipe_down | Swipe down (large/small) |
| 22 | POST | /api/swipe_up_normal | Standard swipe up (~80% success) |
| 23 | POST | /api/swipe_down_normal | Standard swipe down |
| 24 | POST | /api/swipe | Custom swipe (start/end percentages) |
| 25 | POST | /api/close_all_apps | Close all background apps |
| 26 | POST | /api/click_icon | Template-matching icon click |
| 27 | POST | /api/click_icons | Click multiple icons sequentially |
| 28 | POST | /api/click_icon_many_times | Click same icon multiple times |
| 29 | POST | /api/click_text | OCR text search and click |
| 30 | POST | /api/click_at | Click at frame pixel coordinates |
| 31 | POST | /api/click | Click at phone percentage coordinates |
| 32 | POST | /api/click_roi | Click center of an ROI area |
| 33 | POST | /api/screenshot | Screenshot (save file / return base64) |
| 34 | POST | /api/reload_gestures | Reload gesture config (hot-reload) |

Vision Detection (12)

| # | Method | Endpoint | Description |
|---|--------|----------|-------------|
| 35 | POST | /api/find_template | Full-screen template matching |
| 36 | POST | /api/find_template_roi | ROI-based template matching |
| 37 | POST | /api/find_text | OCR text search |
| 38 | POST | /api/find_text_roi | ROI-based OCR text search |
| 39 | POST | /api/find_all_text | Recognize all text on screen |
| 40 | POST | /api/find_all_templates | All templates must match |
| 41 | POST | /api/find_any_template | Any template match is sufficient |
| 42 | POST | /api/count_template | Count template occurrences |
| 43 | POST | /api/detect_desktop | Detect current desktop page |
| 44 | POST | /api/detect_page | Detect current app page |
| 45 | POST | /api/wait_for_template | Poll until template appears |
| 46 | POST | /api/wait_for_page | Poll until target page appears |

Script Control (4)

| # | Method | Endpoint | Description |
|---|--------|----------|-------------|
| 47 | POST | /api/run_script | Execute YAML script (async) |
| 48 | GET | /api/script_status | Check if a script is running |
| 49 | GET | /api/script_progress | Get script execution progress |
| 50 | POST | /api/stop_script | Force-stop a running script |

Configuration (3)

| # | Method | Endpoint | Description |
|---|--------|----------|-------------|
| 51 | GET/PUT | /api/config/daily | Read/update daily automation config |
| 52 | GET/PUT | /api/config/app/ | Read/update app page config |
| 53 | GET/PUT | /api/config/gesture | Read/update gesture config |

System Management (1)

| # | Method | Endpoint | Description |
|---|--------|----------|-------------|
| 54 | POST | /api/shutdown | Graceful service shutdown |

System Architecture

User / AI Agent
    │
    ├── HTTP API (POST/GET http://127.0.0.1:7826/api/*)
    │       Controls robot arm, camera, script execution
    │
    └── YAML Scripts (scripts/*.yaml)
            Define automation workflows (click icons, find text, loops, conditions)

For AI Agents: You cannot see the camera feed directly. Use these APIs to understand the phone screen state:

  • GET /api/get_frame_info — frame metadata
  • GET /api/is_phone_present — detect if phone is in frame
  • GET /api/is_screen_on — detect if screen is lit
  • POST /api/screenshot {"return_base64": true} — get base64 image data
  • POST /api/detect_page — detect current page name

Core Principles:

  • Arm / camera / script engine are exclusive resources — only one task may use them at a time
  • All vision operations rely on template matching (icons) and OCR (text)
  • Synchronous blocking — except /api/run_script, all API commands are synchronous and blocking. You must wait for the HTTP response (with "ok": true indicating completion) before sending the next command. run_script launches a background thread and returns immediately; monitor it with script_status, script_progress, stop_script
  • Resource protection: action commands are rejected with "script is running" during script execution
| ✅ Safe to call while script runs | ❌ Rejected while script runs |
|---|---|
| health, arm_status, script_status, script_progress | run_script (only one at a time) |
| get_frame_info, get_overlay, get_phone_corners | click, click_icon, click_text, click_at, click_roi |
| is_phone_present, is_screen_on | swipe, swipe_up, swipe_down, swipe_up_normal, swipe_down_normal |
| find_template, find_all_templates, find_any_template, count_template (cv2, thread-safe) | go_home, go_back, go_forward |
| screenshot (base64 or file) | find_text, find_all_text, find_text_roi (PaddleOCR singleton, not thread-safe) |
| wait_for_template (uses cv2 internally) | detect_desktop, detect_page, wait_for_page (may call OCR) |
| | reset, clear_overlay, close_all_apps, run_app, reload_gestures |
  • Scripts are interpreted, no compilation needed, edit-and-run
  • The robot arm auto-resets after script completion
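
The resource-protection rule above can be mirrored on the client so that an agent does not send calls the server would reject. This is a minimal sketch: `allowed_now` and `endpoint_name` are hypothetical helpers, the set reproduces the "safe" column listed above, and `stop_script` is added as an assumption because the principles name it as the way to stop a running script.

```python
# Endpoints listed above as safe while a script is running.
SAFE_WHILE_SCRIPT_RUNS = frozenset({
    "health", "arm_status", "script_status", "script_progress",
    "get_frame_info", "get_overlay", "get_phone_corners",
    "is_phone_present", "is_screen_on",
    "find_template", "find_all_templates", "find_any_template", "count_template",
    "screenshot", "wait_for_template",
    "stop_script",  # assumption: not in the list, but needed to stop a script
})


def endpoint_name(path: str) -> str:
    """'/api/click_icon' -> 'click_icon'."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def allowed_now(path: str, script_running: bool) -> bool:
    """Client-side guard: True if the call is not expected to be rejected."""
    return (not script_running) or endpoint_name(path) in SAFE_WHILE_SCRIPT_RUNS
```

The value for `script_running` would come from `GET /api/script_status`; the server remains the authority, so a rejected call must still be handled.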

Quick Start: Hello FlexArm

Step 1: Confirm service is running

# Start the service
RobotArmServer.exe
# Listens on the fixed port 7826

On startup the program auto-detects and initializes:

  1. Checks if the robot arm Windows service is running
  2. If not, tries to auto-install and start it
  3. Auto-detects the arm's COM port and connects
  4. Auto-loads the latest calibration file (or starts auto-calibration if none exists)
  5. Checks license (shows activation dialog if unlicensed)

> Note: First-time use requires Administrator privileges to install the Windows service. Run RobotArmServer.exe as Administrator. In daily use, administrator rights are not needed if the service is already installed.

> If service installation fails, the program still starts but the arm is unavailable. You can manually run robot-arm-service\安装.bat as Administrator to install the service.

Step 2: Test connectivity

curl http://127.0.0.1:7826/api/health
# Returns: {"ok": true, "data": {"status": "running", ...}}

Step 3: Open camera window, confirm phone is visible

curl -X POST http://127.0.0.1:7826/api/show_window -H "Content-Type: application/json" -d '{}'

After opening the window, you should see the phone screen. Press ESC to close.

> For AI Agents: Check screen state without seeing the window
>
> AI Agents cannot see the window. Use these APIs instead:
>
> ```bash
> # Get frame metadata
> curl http://127.0.0.1:7826/api/get_frame_info
> # Returns: {"ok":true,"data":{"width":960,"height":540}}
>
> # Detect if phone is in frame (brightness check)
> curl http://127.0.0.1:7826/api/is_phone_present
> # Returns: {"ok":true,"data":{"present":true}}
>
> # Detect if screen is on
> curl http://127.0.0.1:7826/api/is_screen_on
> # Returns: {"ok":true,"data":{"screen_on":true}}
>
> # Get current frame (base64, parseable by AI)
> curl -X POST http://127.0.0.1:7826/api/screenshot -H "Content-Type: application/json" -d '{"return_base64":true,"phone_only":true}'
> # Returns: {"ok":true,"data":{"base64":"iVBORw0KGgoAAAANSUhEUg..."}}
>
> # Save screenshot to file
> curl -X POST http://127.0.0.1:7826/api/screenshot -H "Content-Type: application/json" -d '{"filename":"screenshots/test.png"}'
> # Returns: {"ok":true,"data":{"path":"E:\\robot_arm\\screenshots\\test.png"}}
>
> # List historical screenshots (URL quoted so the shell does not interpret "?")
> curl "http://127.0.0.1:7826/api/screenshots?limit=5"
> # Returns: {"ok":true,"data":[{"filename":"...","size":123456,"time":"2026-05-24 18:00:12"},...]}
> ```

Step 4: Execute your first action — click center of screen

curl -X POST http://127.0.0.1:7826/api/click \
  -H "Content-Type: application/json" \
  -d '{"x": 0.5, "y": 0.5}'

The robot arm automatically moves to and clicks the center of the phone screen (x: 0.5, y: 0.5 are percentage coordinates, range 0-1).

No icon templates or configuration needed — just calibrate and go.
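
A small sketch of how an agent might prepare the body for `/api/click`. Both helpers are hypothetical. `pixel_to_percent` assumes the pixel position was measured on a screenshot cropped to the phone area (for example one taken with `phone_only`), so that pixels map linearly onto the 0-1 range.

```python
def click_payload(x: float, y: float) -> dict:
    """Build the JSON body for POST /api/click, rejecting out-of-range values."""
    for name, value in (("x", x), ("y", y)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside the 0-1 percentage range")
    return {"x": x, "y": y}


def pixel_to_percent(px: int, py: int, screen_w: int, screen_h: int) -> dict:
    """Convert a pixel position on a phone-only screenshot to percentages."""
    return click_payload(px / screen_w, py / screen_h)


# The center of a 540 x 960 phone-only screenshot:
print(pixel_to_percent(270, 480, 540, 960))  # {'x': 0.5, 'y': 0.5}
```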

Step 5: Write your first script

> ⚠️ Important: All YAML script files must use UTF-8 without BOM encoding. UTF-8 with BOM causes parse failures or garbled Chinese parameters.

Create scripts/hello_flexarm.yaml:

name: hello_flexarm
description: "First FlexArm script — experience clicking, swiping, waiting"

steps:
  # 1. Click center of screen
  - action: click
    x: 0.5
    y: 0.5

  # 2. Wait 1 second
  - action: wait
    seconds: 1

  # 3. Click bottom of screen (back navigation)
  - action: click
    x: 0.5
    y: 0.95

  # 4. Wait 2 seconds
  - action: wait
    seconds: 2

  # 5. Large swipe up (page turn)
  - action: swipe_up
    large: true

  # 6. Wait 1 second
  - action: wait
    seconds: 1

  # 7. Small swipe down
  - action: swipe_down
    large: false

  # 8. Click top-right corner
  - action: click
    x: 0.85
    y: 0.1

Run it:

curl -X POST http://127.0.0.1:7826/api/run_script \
  -H "Content-Type: application/json" \
  -d '{"path": "scripts/hello_flexarm.yaml"}'

> The example above only needs calibration — no icon templates or page config required.

>

> To learn more advanced actions (icon clicks, OCR text clicks, page switching, conditional branches), continue reading to understand icon templates and page definitions.
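
When an agent generates scripts itself, the UTF-8 without BOM requirement is easy to meet from Python. The sketch below is illustrative (`write_script` and `has_bom` are hypothetical helpers, and the script text is a shortened variant of the example above): encoding with `"utf-8"` never emits a BOM, while `"utf-8-sig"` does, so the latter must be avoided.

```python
import codecs
from pathlib import Path

HELLO_SCRIPT = """\
name: hello_flexarm
description: "Click the center of the screen, then wait"

steps:
  - action: click
    x: 0.5
    y: 0.5
  - action: wait
    seconds: 1
"""


def write_script(path: Path, text: str) -> None:
    """Write a YAML script as UTF-8 without a BOM."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # 'utf-8' writes no BOM; 'utf-8-sig' would prepend EF BB BF.
    path.write_bytes(text.encode("utf-8"))


def has_bom(path: Path) -> bool:
    """True if the file starts with the UTF-8 byte order mark."""
    with path.open("rb") as fh:
        return fh.read(3) == codecs.BOM_UTF8
```

Running `has_bom` over an existing `scripts/` directory is a quick way to find files saved by an editor that adds a BOM.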


Configuration: Teach the Program About Your Phone

Phone Desktop Config — scripts/configs/app_desktop.yaml

Required, filename is fixed as app_desktop.yaml. The program uses it to identify pages on the phone desktop.

app_name: Phone Desktop

pages:
  - name: desktop_page0           # Page name, arbitrary string
    min_match: 2                  # At least 2 features must match
    must_features:                # Required (all must pass)
      - name: Phone
        type: image
        path: icons/app_phone.png
        mask: false               # false=4-corner sampling, loose matching
      - name: Camera
        type: image
        path: icons/app_camera.png
        mask: false
    features:                     # Optional (need min_match to pass)
      - name: Messages
        type: image
        path: icons/app_message.png
        mask: false
      - name: Settings
        type: image
        path: icons/app_settings.png
        mask: false

  - name: desktop_page1
    min_match: 3
    must_features:
      - name: Phone
        type: image
        path: icons/app_phone.png
        mask: false
      - name: Camera
        type: image
        path: icons/app_camera.png
        mask: false
    features:
      - name: Qishui Music
        type: image
        path: icons/app_qishui.png
        mask: false
      - name: WeChat
        type: image
        path: icons/app_wechat.png
        mask: false

  - name: TaskSwitcher          # Multi-task switching view
    min_match: 0
    must_features:
      - name: RecentApps
        type: image
        path: icons/task_show.png
        mask: false
      - name: Trash
        type: image
        path: icons/task_delete.png
        mask: false
    features: []

Key Fields:

  • must_features: all must match, or the page is skipped
  • features: optional; passes if matched count >= min_match
  • mask: false: uses 4-corner background sampling, tolerates icon size/position variation
  • mask: true (or omitted): strict template matching, suitable for fixed UI elements
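
The matching rule in the key fields can be written out as a short function. This is one reading of the rule, not the program's actual code: it assumes `min_match` counts optional `features` only (the text does not say whether `must_features` also count), and `matched` stands for the set of feature names the vision layer has already found on screen.

```python
def page_matches(page: dict, matched: set) -> bool:
    """Apply the must_features / features / min_match rule to one page entry."""
    must = [f["name"] for f in page.get("must_features", [])]
    optional = [f["name"] for f in page.get("features", [])]
    if any(name not in matched for name in must):
        return False  # a required feature is missing, so the page is skipped
    hits = sum(1 for name in optional if name in matched)
    return hits >= page.get("min_match", 0)


page = {
    "name": "desktop_page0",
    "min_match": 1,
    "must_features": [{"name": "Phone"}, {"name": "Camera"}],
    "features": [{"name": "Messages"}, {"name": "Settings"}],
}
print(page_matches(page, {"Phone", "Camera", "Messages"}))  # True
print(page_matches(page, {"Phone", "Messages", "Settings"}))  # False, Camera missing
```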

App Page Config — scripts/configs/app_xxx.yaml

One config file per app, defining all recognizable pages within that app.

app_name: Qishui Music

pages:
  - name: Music
    min_match: 1
    must_features: []
    features:
      - name: BottomPlayerBar
        type: image
        path: icons/qishui/music_playing.png
        mask: false

  - name: Rewards
    min_match: 1
    must_features:
      - name: RewardsTitle
        type: text
        text: "福利"
    features: []

> When switch_page in a script doesn't match any page, it auto-executes the default branch — no separate config needed.

How to Create Icon Templates

There are two ways to obtain icon templates:

  1. Send an email with your license code to hamlet0168@163.com to get the helper tools, which include screen capture and script testing features.
  2. Or do it manually: take a phone screenshot → crop the icon → place it in the icons/ directory.

Icon Requirements:

  • PNG format
  • Icon body should be complete with 2-3px empty border
  • Do not include dynamic content (countdowns, animations)
  • Use lowercase English + underscores for filenames, e.g., app_qishui.png

YAML Script Language

Basic Structure

name: script_name
description: "Script description"

steps:
  - action: action_type
    param1: value1
    param2: value2

Supported Actions

| action | Parameters | Description |
|--------|------------|-------------|
| click_icon | path, threshold, roi, mask | Template-matching icon click |
| click_icons | paths, interval | Click multiple icons sequentially (arm exits frame between clicks) |
| click_icon_many_times | path, count, interval | Click same spot multiple times without reset |
| dial_number | number, interval | Smart dialing (maps number to digit icons, supports # and *) |
| click_text | text, roi, min_score | OCR text search and click |
| click | x, y | Click phone percentage coordinates (0-1), ±30px random offset |
| click_at | cam_x, cam_y | Click frame pixel coordinates (precise, no offset) |
| click_roi | roi, label | Click center of ROI area (phone screen percentage) |
| find_all_text | roi, min_score | Recognize all text, return list + positions + confidence |
| swipe | sx, sy, ex, ey, steps, step_wait_ms | Custom swipe (start/end percentages) |
| swipe_up | large: true/false | Swipe up (large/small) |
| swipe_down | large: true/false | Swipe down (large/small) |
| swipe_up_normal | none | Standard swipe up (~80% success) |
| swipe_down_normal | none | Standard swipe down |
| go_home | max_retries | Return to desktop (detect → swipe up → detect loop) |
| go_back | none | Back navigation |
| go_forward | none | Forward navigation |
| reset | none | Reset robot arm |
| clear_overlay | none | Clear vision overlay boxes |
| run_app | app_name | Launch app (go_home → detect page → swipe → click icon) |
| close_all_apps | max_swipes | Close all background apps |
| screenshot | filename, phone_only, show_board, return_base64 | Screenshot |
| reload_gesture | none | Reload gesture config (hot-reload) |
| set_video_to_coin | value | Set video-to-coin earning mode |
| wait | seconds | Wait (supports ranges like 2-5) |
| loop | count, steps | Loop sub-steps (supports random ranges like count: 3-5) |
| if_found | type, path/text, then, else | Conditional: if target found, run then; else run else |
| if_found_roi | type, path/text, roi, then, else | Same as above but with ROI-limited search |
| if_progress_stop | template, roi, then, else | Progress bar stall detection |
| if_video_to_coin | then, else | Branch based on video-to-coin mode state |
| if_random | chance, then, else | Random probability branch |
| detect_desktop | config | Detect if currently on desktop (no assertion) |
| detect_page | config | Detect current page name (no assertion) |
| is_screen_on | none | Check if screen is lit |
| assert_desktop | config | Must be on desktop, error if not |
| switch_page | config, cases | Detect page → match cases → default if no match |
| run_script | path | Execute sub-script (synchronous, returns on completion) |
| stop_loop | none | Break current loop |
| stop_script | none | Stop current script level (sub-script only stops itself) |
| log | message | Print log message |

Parameter Details

click_icon

- action: click_icon
  path: icons/app_qishui.png     # Icon path (relative to project root)
  threshold: 0.75                # Match threshold (default 0.75)
  roi: [0.1, 0.2, 0.5, 0.6]     # Search area [sx, sy, ex, ey] (phone percentage, 0-1)
  mask: false                    # false=loose, true=strict (default true)

click_text

- action: click_text
  text: "领取"                   # Text to find
  roi: [0.3, 0.5, 0.7, 0.8]     # Optional search area
  min_score: 0.5                 # OCR minimum confidence (default 0.3)

click (percentage coordinates)

- action: click
  x: 0.5                         # X percentage (0=left, 1=right)
  y: 0.96                        # Y percentage (0=top, 1=bottom)

loop

- action: loop
  count: 10                      # Fixed count
  # count: 3-5                   # Random range also supported
  steps:
    - action: click_text
      text: "领取"
    - action: wait
      seconds: 2

if_found (conditional branch)

- action: if_found
  type: image                    # image=template matching, text=OCR (choose one)
  path: icons/qishui/cross.png   # Used when type is image
  # text: "继续观看"             # Used instead of path when type is text
  roi: [0.7, 0.0, 1.0, 0.15]    # Optional search area
  then:
    - action: click_icon
      path: icons/qishui/cross.png
      roi: [0.7, 0.0, 1.0, 0.15]
  else:
    - action: wait
      seconds: 2

if_random (random branch)

- action: if_random
  chance: 0.4                    # 40% chance to take then branch
  then:
    - action: log
      message: "Took then branch"
  else:
    - action: log
      message: "Took else branch"

switch_page (page switching)

- action: switch_page
  config: scripts/configs/app_qishui.yaml   # Page config file
  cases:
    Music:                                   # When "Music" page matches
      - action: click
        x: 0.5
        y: 0.96
    Rewards:
      - action: click_text
        text: "福利"
    default:                                 # When no page matches
      - action: swipe_up
        large: true

Script Nesting

steps:
  - action: run_script
    path: qishui/run_ad_card.yaml     # Execute sub-script
  - action: wait
    seconds: 5

After a sub-script finishes, execution returns to the parent script.

Random Wait

- action: wait
  seconds: 2-5          # Random wait between 2~5 seconds
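
For readers who want to reason about range values such as `2-5`, here is one way they could be resolved. How the script engine actually parses them is not documented here, so this is an assumption-laden sketch with hypothetical helper names; it handles non-negative numbers only.

```python
import random


def parse_range(value):
    """Return (low, high) for 3, "3" or "3-5"."""
    if isinstance(value, (int, float)):
        return value, value
    low, _, high = str(value).strip().partition("-")
    return float(low), float(high or low)


def resolve_wait(value, rng=random) -> float:
    """Pick a wait time in seconds, uniformly inside the range."""
    low, high = parse_range(value)
    return rng.uniform(low, high)


def resolve_count(value, rng=random) -> int:
    """Pick a whole-number loop count inside the range (inclusive)."""
    low, high = parse_range(value)
    return rng.randint(int(low), int(high))
```

A plain YAML scalar such as `2-5` is read as a string rather than a number, which is why the sketch accepts both types.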

Execution Model

  • Scripts execute sequentially — each action completes before the next begins
  • loop repeats its sub-steps for the specified count
  • switch_page iterates through all page definitions until a match is found
  • run_script is a sub-call — returns to the parent when done
  • stop_loop breaks the current loop
  • stop_script stops the current level (in a sub-script, only stops that sub-script)
  • After script completion or error, the robot arm auto-resets

HTTP API Reference

Basic Info:

  • Address: http://127.0.0.1:7826
  • All POST endpoints expect JSON body
  • Success: {"ok": true, "data": {...}}
  • Failure: {"ok": false, "error": "error message"}
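
Because every reply uses the same envelope, a client can unwrap it in one place. A minimal sketch, assuming the reply has already been decoded into a dict; `FlexArmError` and `unwrap` are hypothetical names.

```python
class FlexArmError(RuntimeError):
    """Raised when a reply carries "ok": false."""


def unwrap(response: dict):
    """Return the "data" payload on success, raise FlexArmError otherwise."""
    if response.get("ok") is True:
        return response.get("data")
    raise FlexArmError(response.get("error") or "unknown error")


print(unwrap({"ok": True, "data": {"present": True}}))  # {'present': True}
```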

1. Health Check

GET /api/health

Returns service status, port, uptime, etc.

2. Arm Status

GET /api/arm_status

Returns COM port, connection status, movement range, etc.

3. Camera Frame

Get frame info

GET /api/get_frame_info

Returns:

{"ok": true, "data": {"width": 540, "height": 960, "fps": 29.5}}

Show/hide window

POST /api/show_window
POST /api/hide_window

Detect phone presence

GET /api/is_phone_present
GET /api/is_phone_present?bright_threshold=60&bright_ratio=0.08

Returns:

{"ok": true, "data": {"present": true}}

Detect screen on/off

GET /api/is_screen_on
GET /api/is_screen_on?dark_threshold=30&dark_ratio=0.7

Returns:

{"ok": true, "data": {"screen_on": true}}
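
Query strings like the ones above are safer to build than to type by hand. The sketch below uses `urllib.parse.urlencode` from the standard library; `build_get_url` is a hypothetical helper. The same function percent-encodes non-ASCII values as UTF-8, so a Chinese parameter value ends up as plain ASCII in the URL.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:7826"  # fixed port, per this document


def build_get_url(endpoint: str, **params) -> str:
    """Append URL-encoded query parameters to an endpoint."""
    query = urlencode(params)
    return f"{BASE_URL}{endpoint}" + (f"?{query}" if query else "")


print(build_get_url("/api/is_screen_on", dark_threshold=30, dark_ratio=0.7))
# http://127.0.0.1:7826/api/is_screen_on?dark_threshold=30&dark_ratio=0.7
```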

Toggle phone corners

POST /api/toggle_phone_corners

Overlays a green phone screen border on the display window.

Screenshot

POST /api/screenshot {"path": "screenshots/test.png"}    # Save to file
POST /api/screenshot {"return_base64": true}             # Return base64 (recommended for AI Agents)
POST /api/screenshot {"phone_only": true}                # Crop to phone area only
POST /api/screenshot {"show_board": true}                # Full view with ruler
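
When `return_base64` is used, the image still has to be decoded before a vision model or an image library can read it. A sketch, assuming the reply shape shown earlier in this document (`{"ok": true, "data": {"base64": "..."}}`) and PNG output, which the sample prefix `iVBORw0KGgo` suggests; the helper names are hypothetical.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict) -> bytes:
    """Extract raw image bytes from a base64 screenshot reply."""
    if not response.get("ok"):
        raise ValueError(response.get("error") or "screenshot failed")
    return base64.b64decode(response["data"]["base64"])


def looks_like_png(image: bytes) -> bool:
    """Cheap sanity check on the first eight bytes."""
    return image.startswith(PNG_SIGNATURE)
```

The decoded bytes can be written to disk with `Path(...).write_bytes(...)` or handed directly to a model that accepts image input.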

List screenshots

GET /api/screenshots
GET /api/screenshots?limit=10

Returns:

{"ok":true,"data":[{"filename":"20260524_1800_phone.png","path":"E:\\robot_arm\\screenshots\\...","size":123456,"time":"2026-05-24 18:00:12"},...]}

4. Page Detection

Detect desktop

POST /api/detect_desktop {"desktop_config": "scripts/configs/app_desktop.yaml"}

Returns:

{"ok": true, "data": {"matched": true, "page_name": "desktop_page1", "score": 0.84}}

Detect specific page

POST /api/detect_page {"config_path": "scripts/configs/app_qishui.yaml", "threshold": 0.75}

Returns:

{"ok": true, "data": {"matched": true, "page_name": "Rewards", "score": 0.82}}

5. Vision Search

Find icon

POST /api/find_template
{"path": "icons/app_qishui.png", "threshold": 0.75, "roi": [0.1, 0.2, 0.5, 0.6], "auto_mask": false}

Returns:

{"ok": true, "data": {"x": 257, "y": 453, "w": 52, "h": 53, "score": 0.9446}}
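
A match box can be turned into a click target by taking its center. This sketch rests on two assumptions that this document does not confirm: that `x`/`y` are the top-left corner of the box in frame pixels, and that `/api/click_at` accepts the same `cam_x`/`cam_y` names as the YAML `click_at` action. Verify both against the running service before relying on it.

```python
def bbox_center(match: dict) -> tuple:
    """Center of a match box, assuming x/y are its top-left corner."""
    return (match["x"] + match["w"] // 2, match["y"] + match["h"] // 2)


def click_at_payload(match: dict) -> dict:
    """Body for a click at the box center (key names are an assumption)."""
    cx, cy = bbox_center(match)
    return {"cam_x": cx, "cam_y": cy}


sample = {"x": 257, "y": 453, "w": 52, "h": 53, "score": 0.9446}
print(bbox_center(sample))  # (283, 479)
```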

Find text

POST /api/find_text {"text_keyword": "领取", "roi": [0.3, 0.5, 0.7, 0.8], "min_score": 0.5}

Returns:

{"ok": true, "data": {"x": 300, "y": 600, "w": 40, "h": 20, "text": "领取奖励", "score": 0.91}}

ROI-based text search

POST /api/find_text_roi {"roi": [0.0, 0.6, 1.0, 1.0], "text_keyword": "夸克", "min_score": 0.3}

Similar to find_text but requires roi (array format [sx, sy, ex, ey], 0-1).

ROI-based template matching

POST /api/find_template_roi {"path": "icons/app_qishui.png", "roi": [0.1, 0.2, 0.5, 0.6], "threshold": 0.75}

Similar to find_template but requires a roi region.

Adjust camera focus

POST /api/change_focus {"value": 2}     # Focus near +2
POST /api/change_focus {"value": -2}    # Focus far -2

Incremental adjustment (range 0~500). Returns {"ok": true, "data": {"focus": 310.0}}.

Find all text

POST /api/find_all_text {"min_score": 0.5}

Returns all recognized text on screen.

Performance Warning: find_all_text does a full-screen OCR scan using CPU inference. Time varies by text density:

  • Sparse text (<20 items): ~15 seconds
  • Dense text (novel apps, etc.): 90~120+ seconds

Best Practice: Use find_text with a specific roi whenever possible — it's orders of magnitude faster.

Find all templates

POST /api/find_all_templates {"template_paths": ["icons/a.png", "icons/b.png"], "threshold": 0.75}

Returns true only if all templates are found.

Find any template

POST /api/find_any_template {"template_paths": ["icons/a.png", "icons/b.png"], "threshold": 0.75}

Returns the first matched icon.

6. Actions

Return to desktop

POST /api/go_home {"max_retries": 5}

max_retries: maximum retry attempts, default 5 (detect desktop → swipe up → re-detect).

Back / Forward

POST /api/go_back
POST /api/go_forward

Reset

POST /api/reset

Swipe up / down

POST /api/swipe_up {"large": true}     # Large swipe up
POST /api/swipe_down {"large": false}  # Small swipe down

Click icon

POST /api/click_icon
{"path": "icons/app_qishui.png", "threshold": 0.75, "roi": [0.1, 0.2, 0.5, 0.6], "mask": false, "reset": true}

Click multiple icons

POST /api/click_icons
{"paths": ["icons/phone/num1.png", "icons/phone/num3.png", "icons/phone/num2.png"], "interval": 1}

Clicks each icon sequentially. After each click the arm exits the frame, then resets after all clicks. Returns {"ok": true, "data": {"clicked": true, "success_count": N, "failed": []}}.

Click same icon multiple times

POST /api/click_icon_many_times
{"path": "icons/qishui/like.png", "count": 3, "interval": 0.5}

Searches for the icon once, then clicks the same position multiple times without moving or resetting. Resets only after all clicks. Returns {"ok": true, "data": {"clicked": true, "clicks": 3}}.

Click text

POST /api/click_text
{"text": "领取", "roi": [0.3, 0.5, 0.7, 0.8], "min_score": 0.5}

Click coordinates

POST /api/click {"x": 0.5, "y": 0.96}

Click ROI center

POST /api/click_roi {"roi": [0.3, 0.5, 0.7, 0.8]}

Custom swipe

POST /api/swipe
{"sx": 0.5, "sy": 0.8, "ex": 0.5, "ey": 0.1, "steps": 5, "duration": 0.3}

sx/sy/ex/ey are phone percentage coordinates, steps is the number of steps, duration is total swipe time (seconds).

Close all apps

POST /api/close_all_apps {"max_swipes": 15}

Launch app

GET /api/run_app?app_name=汽水音乐
# or
POST /api/run_app {"app_name": "汽水音乐"}

Looks up the app icon in app_desktop.yaml and clicks it.

7. Script Control

Run script

POST /api/run_script {"path": "scripts/qishui_daily.yaml"}

Returns:

{"ok": true, "data": {"script": "D:\\...\\scripts/qishui_daily.yaml", "status": "started"}}

Check script status

GET /api/script_status

Returns:

{"ok": true, "data": {"running": false, "current_script": null}}

Get script progress

GET /api/script_progress

Returns full execution log + stats:

{
  "ok": true,
  "data": {
    "running": true,
    "script": "scripts/xxx.yaml",
    "current_step": {"step_index": 5, "action": "switch_page", "target": "拨号页", "status": "ok", "detail": "Matched branch: 拨号页", "timestamp": 1780329450.91},
    "steps_log": [
      {"step_index": 0, "action": "script_start", "target": "test_66", "status": "ok", "detail": "5 top-level steps", "timestamp": ...},
      ...
    ],
    "stats": {
      "total_steps": 24,
      "completed_steps": 23,
      "failed_steps": 0,
      "elapsed": 61.6,
      "status": "running"
    }
  }
}
  • steps_log: complete step history with timestamps
  • stats: progress stats including elapsed time, completed/failed counts
  • Idle state: {"running": false, "script": null, "current_step": null, "steps_log": [], "stats": {}}
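
An agent usually needs a one-line summary rather than the full log. The sketch below condenses the `data` object shown above; `summarize_progress` is a hypothetical helper, and it treats `total_steps` as the denominator even though this document does not say whether that number can grow while the script runs.

```python
def summarize_progress(data: dict) -> str:
    """Condense the script_progress payload into a short status line."""
    stats = data.get("stats") or {}
    total = stats.get("total_steps", 0)
    done = stats.get("completed_steps", 0)
    if not data.get("running") and not total:
        return "idle"
    percent = 100 * done / total if total else 0.0
    failed = stats.get("failed_steps", 0)
    elapsed = stats.get("elapsed", 0)
    return f"{done}/{total} steps ({percent:.0f}%), {failed} failed, {elapsed}s elapsed"


example = {
    "running": True,
    "stats": {"total_steps": 24, "completed_steps": 23,
              "failed_steps": 0, "elapsed": 61.6, "status": "running"},
}
print(summarize_progress(example))  # 23/24 steps (96%), 0 failed, 61.6s elapsed
```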

Stop script

POST /api/stop_script

Wait for template

POST /api/wait_for_template {"path": "icons/qishui/reward_popup.png", "timeout": 10, "interval": 0.5}

Polls until the template appears or timeout. Checks every interval seconds within timeout seconds.
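
The same timeout-and-interval pattern is useful on the client, for example to wait until `script_status` reports that a script has finished. A generic sketch: `poll_until` is a hypothetical helper, and `check` is any zero-argument callable supplied by the caller.

```python
import time


def poll_until(check, timeout: float, interval: float):
    """Call check() every `interval` seconds until it returns a truthy value
    or `timeout` seconds have passed. Returns the last result of check()."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)
```

A caller might pass a function that fetches `/api/script_status` and returns `True` once `running` is `false`, with an interval long enough not to flood the service.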

Wait for page

POST /api/wait_for_page {"config_path": "scripts/configs/app_xxx.yaml", "target_name": "RewardsPage", "timeout": 15}

Polls until the specified page appears or timeout.

Clear overlay

POST /api/clear_overlay

8. Configuration

Read/update daily config

GET  /api/config/daily
PUT  /api/config/daily  {"windows": [...]}

Read/update app page config

GET  /api/config/app/qishui           # Returns app_qishui.yaml content
PUT  /api/config/app/qishui           # Update config (body is YAML text)

Read/update gesture config

GET  /api/config/gesture
PUT  /api/config/gesture  {...}

9. Shutdown

POST /api/shutdown

Graceful shutdown: detect running script → safe terminate → arm reset → release resources → process exit.

> Note: Do not kill the process directly — the COM port won't be released, and you'll need to reinstall the driver on next startup.

10. System Tray

RobotArmServer.exe minimizes to the system tray on launch:

  • Left click / double-click: Show/hide console window
  • Right-click menu: "Show/Hide Console", "Exit Service"
  • Tray exit = API /api/shutdown (same graceful shutdown flow)

Directory Structure

RobotArmServer/
├── RobotArmServer.exe          ← Main program
├── _internal/                  ← Program libraries (do not touch)
├── scripts/                    ← YAML scripts directory
│   ├── hello_flexarm.yaml      ← Your first script
│   ├── daily_config.yaml       ← Daily scheduled task config
│   ├── configs/
│   │   ├── app_desktop.yaml    ← Phone desktop config (required)
│   │   ├── app_qishui.yaml     ← Qishui Music page config
│   │   └── app_kuaishou.yaml   ← Kuaishou page config
│   └── qishui/                 ← Sub-scripts directory
│       ├── run_*.yaml
│       └── music_actions.yaml
├── icons/                      ← Icon templates directory
│   ├── app_phone.png
│   ├── app_camera.png
│   ├── app_qishui.png
│   └── qishui/
│       ├── cross.png
│       └── ...
├── calibrations/               ← Calibration results (auto-generated)
├── screenshots/                ← Screenshot save directory
├── camera_config.json          ← Camera focus config
├── device_config.json          ← Device config
├── gesture_config.json         ← Swipe gesture config
└── robot-arm-service/          ← Windows service driver

FAQ

Q: The robot arm doesn't move?

  1. Make sure robot-arm-service\安装.bat has been run as Administrator
  2. Verify GET /api/arm_status returns "connected": true
  3. Ensure calibration is complete (a JSON file exists in calibrations/)

Q: Icon not found?

  1. Confirm the icon file exists in icons/
  2. Lower the threshold (e.g., 0.65)
  3. Set mask: false (4-corner background sampling, more tolerant)
  4. Specify a roi to narrow the search area
  5. Check that the icon template matches the on-screen icon (size, color, background)

Q: Text not found?

  1. Improve image clarity (adjust camera focus: POST /api/change_focus {"value": 5})
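  2. Tune min_score: lower it if valid text is being filtered out, or raise it to 0.5+ if the matches returned are false positives
  3. Adjust the roi: narrow it to cut background noise, or widen it if the text may lie outside the current region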
  4. Use find_all_text to see what text the OCR actually recognizes

Q: Script execution interrupted?

  1. Check GET /api/script_progress to see which step is stuck
  2. Check log output (service console)
  3. Ensure the phone hasn't shown a system dialog (permissions, notifications, etc.)

Q: How to add automation for a new app?

  1. Capture the app icon → place it in icons/
  2. Update scripts/configs/app_desktop.yaml (add the new icon to features)
  3. Create scripts/configs/app_xxx.yaml (define each page inside the app)
  4. Write scripts/run_xxx.yaml (define the workflow)
  5. Run: POST /api/run_script {"path": "scripts/run_xxx.yaml"}
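
Before running a new workflow, it can help to confirm that the files from steps 1 to 4 are in place. The sketch below is illustrative only: the helper names are hypothetical, and the `app_<name>.png` icon name is an assumption based on the naming seen in the directory listing.

```python
from pathlib import Path


def new_app_files(app):
    """Relative paths that the steps above create or update for app `app`."""
    return [
        Path("icons") / f"app_{app}.png",             # step 1 (file name is an assumption)
        Path("scripts/configs/app_desktop.yaml"),     # step 2
        Path("scripts/configs") / f"app_{app}.yaml",  # step 3
        Path("scripts") / f"run_{app}.yaml",          # step 4
    ]


def missing_files(root, app):
    """Return the expected files (POSIX-style relative paths) not found under root."""
    root = Path(root)
    return [p.as_posix() for p in new_app_files(app) if not (root / p).is_file()]


# Usage: missing_files("RobotArmServer", "xxx") returns an empty list
# when every expected file exists.
```

This checks only that the files exist, not that app_desktop.yaml actually lists the new icon in its features.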

Error Handling Guide

All APIs return a uniform format: {"ok": true/false, "data": {...}, "error": "..."}
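
Because every response shares this envelope, a client can parse it in one place. A minimal sketch follows. `ApiResult` and `parse_response` are hypothetical names, and missing keys are treated as `None`, which is an assumption about how the service omits fields.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ApiResult:
    ok: bool
    data: Any = None
    error: Optional[str] = None


def parse_response(raw):
    """Parse a response body (str or bytes) into an ApiResult."""
    payload = json.loads(raw)
    return ApiResult(
        ok=bool(payload.get("ok")),
        data=payload.get("data"),
        error=payload.get("error"),
    )


# Usage:
#   result = parse_response('{"ok":true,"data":{"present":true}}')
#   result.ok is True and result.data["present"] is True
```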

Common Errors & Strategies

| Error | Cause | Resolution |
|---|---|---|
| "error": "Script is running" | A script is executing in the background | Check script_status, wait for completion, or call stop_script |
| "error": "RobotActions not initialized" | Arm not connected / service not started | Guide user to check robot-arm-service\安装.bat |
| "error": "Missing parameter: path" | Incomplete request parameters | Check API call parameters |
| "ok": false, "data": null (find_template) | Icon not found | Lower threshold or check icon file; do not retry indefinitely |
| "ok": false, "data": null (find_text) | OCR text not found | Widen roi or lower min_score; try at most 2-3 times, then report |
| "error": "Unauthorized" | License check failed | Guide user to activate |
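
An agent can turn these rows into a small dispatch function. The sketch below is one possible encoding. The action labels are hypothetical names invented for illustration, not values the service returns.

```python
# Error strings are taken from the rows above; the action labels are hypothetical.
ERROR_ACTIONS = {
    "Script is running": "wait_or_stop_script",
    "RobotActions not initialized": "ask_user_to_check_service",
    "Unauthorized": "ask_user_to_activate",
}


def next_action(response, endpoint=""):
    """Map a parsed response dict to the next step the agent should take."""
    if response.get("ok"):
        return "continue"
    error = response.get("error") or ""
    if error.startswith("Missing parameter"):
        return "fix_request_parameters"
    if error in ERROR_ACTIONS:
        return ERROR_ACTIONS[error]
    # A lookup that found nothing returns ok=false with data=null.
    if response.get("data") is None and endpoint in ("find_template", "find_text"):
        return "adjust_and_retry"
    return "report_to_user"
```

Unknown errors fall through to `report_to_user`, so the agent never retries blindly on a failure it does not recognize.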

Agent Retry Recommendations

  • Icon lookup: fail → lower threshold → retry once → still fail → report to user
  • Text lookup: fail → widen ROI → retry once → still fail → report to user
  • Page detection: no match → try go_back or go_home → re-detect → still no match → report to user
  • Script conflict: received "Script is running" → check script_status → if running, wait or report to user
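
The icon-lookup rule above (fail, lower the threshold, retry once, then report) can be sketched as a bounded loop. The function name is hypothetical. The 0.75 and 0.65 defaults are taken from the examples elsewhere in this document, and `find` stands for any callable that performs the find_template request and returns the parsed response.

```python
def find_icon_with_retry(find, path, threshold=0.75, lowered=0.65):
    """Try the icon lookup at `threshold`, then once more at `lowered`.

    `find(path, threshold)` must return the parsed response dict.
    Returns (data, thresholds_tried); data is None if both attempts failed,
    in which case the caller should report to the user instead of retrying.
    """
    tried = []
    for value in (threshold, lowered):
        response = find(path, value)
        tried.append(value)
        if response.get("ok") and response.get("data"):
            return response["data"], tried
    return None, tried
```

Passing the lookup in as a callable keeps the retry policy separate from the HTTP code, and the fixed two-attempt tuple makes it impossible to loop indefinitely.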

Agent Conversation Example: Find and Open Qishui Music

Below is a complete example showing how an AI Agent combines APIs to "find and open the Qishui Music app on the desktop":

Step 1: Check service status

curl http://127.0.0.1:7826/api/health
# Returns: {"ok":true,"data":{"status":"running","arm_connected":true,...}}

Step 2: Detect current desktop

curl -X POST http://127.0.0.1:7826/api/detect_desktop -H "Content-Type: application/json" -d '{}'
# Returns: {"ok":true,"data":{"page_name":"desktop_page1","score":0.94,"matched":true}}

Step 3: Find the Qishui Music icon

curl -X POST http://127.0.0.1:7826/api/find_template -H "Content-Type: application/json" -d '{"path":"icons/app_qishui.png","threshold":0.75}'
# Returns: {"ok":true,"data":{"x":242,"y":516,"w":55,"h":55,"score":0.92}}

Step 4: Click the icon

curl -X POST http://127.0.0.1:7826/api/click_icon -H "Content-Type: application/json" -d '{"path":"icons/app_qishui.png"}'
# Returns: {"ok":true,"data":{"clicked":true,"score":0.92,...}}

Step 5: Wait for app launch, detect page

python -c "import time; time.sleep(2)"
python -c "import requests; r=requests.post('http://127.0.0.1:7826/api/detect_page',json={'config_path':'scripts/configs/app_qishui.yaml'}); print(r.text)"
# Returns: {"ok":true,"data":{"page_name":"Music","score":0.85,"matched":true}}

Step 6: Confirm phone is in frame

curl http://127.0.0.1:7826/api/is_phone_present
# Returns: {"ok":true,"data":{"present":true}}

✅ Task complete: Qishui Music is open, currently on the Music page.

Alternatively, if app_desktop.yaml is properly configured and the Qishui Music icon exists, you can call the run_app API endpoint directly. It navigates automatically: it finds the correct desktop page and clicks the icon.
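
The six-step walkthrough above can also be written as one Python function. This is a sketch rather than a reference client: `http_call` and `open_qishui` are hypothetical names, the endpoints and request bodies are copied from the curl examples, and the function simply stops at the first step that fails.

```python
import json
import time
import urllib.request

# Host/port as used in the curl examples above.
BASE_URL = "http://127.0.0.1:7826"


def http_call(method, endpoint, body=None, timeout=30):
    """Send one API request and return the decoded JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    request = urllib.request.Request(
        f"{BASE_URL}/api/{endpoint}", data=data, method=method, headers=headers
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def _data(response):
    """Return the response's data dict, or an empty dict if it is missing/null."""
    return response.get("data") or {}


def open_qishui(call=http_call, wait_seconds=2.0, sleep=time.sleep):
    """Run steps 1-6; return the detected page name, or None if any step fails."""
    icon = "icons/app_qishui.png"
    health = call("GET", "health")                               # step 1
    if not (health.get("ok") and _data(health).get("arm_connected")):
        return None
    steps = [
        ("POST", "detect_desktop", {}),                          # step 2
        ("POST", "find_template", {"path": icon, "threshold": 0.75}),  # step 3
        ("POST", "click_icon", {"path": icon}),                  # step 4
    ]
    for method, endpoint, body in steps:
        if not call(method, endpoint, body).get("ok"):
            return None
    sleep(wait_seconds)                                          # step 5: wait for launch
    page = call("POST", "detect_page",
                {"config_path": "scripts/configs/app_qishui.yaml"})
    if not (page.get("ok") and _data(page).get("matched")):
        return None
    present = call("GET", "is_phone_present")                    # step 6
    if not (present.get("ok") and _data(present).get("present")):
        return None
    return _data(page).get("page_name")
```

With the service running, `open_qishui()` would return the page name reported by detect_page. The `call` and `sleep` parameters exist so the flow can be exercised without hardware.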

Version History

1 version in total

  • v1.0.2 (current)
    2026-06-09 19:56
