
adb-phone-control

Use when the user asks to control, operate, or automate an Android phone via ADB, such as tapping, swiping, typing, launching apps, or any other UI interaction.
txmonkey (source)
Uncategorized · clawhub · v1.0.1 · 1 version · 100000 · Key: not required
★ 0 Stars · 📥 425 Downloads · 💾 0 Installs · 1 Version · #latest

Overview

ADB Phone Control

Control Android devices through ADB with a structured observe-locate-act-verify loop.

Requirements

  • adb — Android Debug Bridge, must be in PATH
  • python3 — Required for app_explorer.py
  • ADB_OUTPUT_DIR (optional env var) — Directory for saving screenshots and UI dumps; defaults to current working directory

Permissions Used

This skill executes the following on the connected Android device:

  • adb shell input — tap, swipe, text input
  • adb shell uiautomator dump — UI hierarchy extraction
  • adb shell screencap — screen capture
  • adb shell am broadcast — ADBKeyboard IME input (for CJK text)
  • adb shell service call clipboard — clipboard-based text input fallback

Prerequisites

Before any operation, verify device connection:

adb devices

If no device is listed, instruct the user to either:

  1. Connect via USB and enable USB Debugging, or
  2. Connect wirelessly: adb connect <device-ip>:5555
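The connection check can be wrapped in a small guard that runs at the start of a session. The sketch below assumes the standard adb devices output, which has one "serial<TAB>state" line per device. adb_require_device is a hypothetical name and is not part of adb-helpers.sh.

```bash
# adb_require_device: hypothetical guard, not part of adb-helpers.sh.
# Prints the serial and succeeds only when exactly one device is in the
# "device" state; "unauthorized" or "offline" entries are not counted.
adb_require_device() {
  local serials count
  # Keep serials whose state column is "device" (the header line and any
  # daemon start-up messages have a different second field).
  serials=$(adb devices | awk '$2 == "device" { print $1 }')
  count=$(printf '%s\n' "$serials" | grep -c .)
  if [ "$count" -eq 0 ]; then
    echo "No authorized device: enable USB debugging or run adb connect <device-ip>:5555" >&2
    return 1
  elif [ "$count" -gt 1 ]; then
    echo "Multiple devices found; set ANDROID_SERIAL to choose one" >&2
    return 2
  fi
  echo "$serials"
}
```

Return code 1 means no usable device and 2 means ambiguity, so a caller can distinguish the two before prompting the user.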

Core Principle

NEVER guess coordinates from screenshots. ALWAYS use the UI hierarchy as the primary locator.

Screenshots are for human-readable context and visual verification. UI dumps give exact pixel bounds.

Operation Loop

Every interaction follows this cycle:

┌─────────────────────────────────────────┐
│  1. OBSERVE  — dump UI + screenshot     │
│  2. LOCATE   — find element by text/id  │
│  3. ACT      — tap / swipe / type       │
│  4. VERIFY   — screenshot + dump again  │
│  5. REPEAT   — next action or done      │
└─────────────────────────────────────────┘

Do NOT skip the VERIFY step. UI transitions may take time; always confirm before proceeding.
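One pass of the loop might look like the minimal sketch below. The names run_step, VERIFY_RETRIES and VERIFY_DELAY are hypothetical and are not part of adb-helpers.sh. The sketch repeats only the OBSERVE/VERIFY half, so a tap is never sent twice. It also checks the dump only, which means the screenshot half of VERIFY is still done separately.

```bash
# run_step EXPECTED_TEXT CMD [ARGS...]: hypothetical wrapper, not part of
# adb-helpers.sh. ACTs once with CMD, then OBSERVEs and VERIFYs up to
# VERIFY_RETRIES times (default 3) until a node with text="EXPECTED_TEXT"
# is present. The action itself is never repeated.
run_step() {
  local expected=$1 i out=${ADB_OUTPUT_DIR:-.}/ui_dump.xml
  shift
  "$@" || return 1                          # ACT
  for ((i = 1; i <= ${VERIFY_RETRIES:-3}; i++)); do
    sleep "${VERIFY_DELAY:-1}"              # let the transition settle
    # OBSERVE: dump on the device, then copy the XML locally.
    adb shell uiautomator dump /sdcard/window_dump.xml >/dev/null 2>&1 &&
      adb pull /sdcard/window_dump.xml "$out" >/dev/null 2>&1 || continue
    # VERIFY: exact text match against the fresh dump.
    if grep -qF " text=\"$expected\"" "$out"; then
      return 0
    fi
  done
  echo "run_step: \"$expected\" not on screen after ${VERIFY_RETRIES:-3} checks" >&2
  return 1
}
```

Usage: run_step "Settings" adb shell input tap 540 1200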

Helper Functions

Source the helper script before starting any operation session:

source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh

Available Functions

| Function | Usage | Description |
|---|---|---|
| adb_dump | adb_dump | Dump UI hierarchy to /tmp/ui_dump.xml |
| adb_screenshot | adb_screenshot | Capture screen to /tmp/adb_screen.png |
| adb_observe | adb_observe | Dump UI + screenshot in one call |
| adb_tap_text | adb_tap_text "Submit" | Find element by text, tap center |
| adb_tap_id | adb_tap_id "btn_send" | Find element by resource-id, tap center |
| adb_tap_xy | adb_tap_xy 540 1200 | Tap exact coordinates |
| adb_swipe | adb_swipe x1 y1 x2 y2 [ms] | Swipe between points (default 300ms) |
| adb_input_text | adb_input_text "hello" | Type text (supports spaces and CJK) |
| adb_key | adb_key <KEY> | Send keyevent (BACK, HOME, ENTER, etc.) |
| adb_hide_keyboard | adb_hide_keyboard | Press BACK to dismiss keyboard |
| adb_scroll_down | adb_scroll_down | Swipe up to scroll content down |
| adb_scroll_up | adb_scroll_up | Swipe down to scroll content up |
| adb_long_press | adb_long_press x y [ms] | Long press at coordinates (default 1000ms) |
| adb_wait | adb_wait [seconds] | Sleep before next action (default 1s) |
| adb_screen_size | adb_screen_size | Get device screen resolution |
| adb_launch_app | adb_launch_app <package> | Launch app by package name |
| adb_find_package | adb_find_package <keyword> | Search installed packages by keyword |
| adb_bounds_center | adb_bounds_center "bounds_string" | Parse "[x1,y1][x2,y2]" → center x y |
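The documented behavior of adb_bounds_center is to parse "[x1,y1][x2,y2]" and return the center point. That behavior can be sketched in plain bash as shown below. bounds_center is a hypothetical stand-in, and the real helper's implementation may differ.

```bash
# bounds_center BOUNDS: hypothetical sketch of adb_bounds_center's documented
# behavior. Parses "[x1,y1][x2,y2]" and prints "cx cy" (integer division).
bounds_center() {
  local re='^\[([0-9]+),([0-9]+)\]\[([0-9]+),([0-9]+)\]$'
  if [[ ! $1 =~ $re ]]; then
    echo "bounds_center: cannot parse '$1'" >&2
    return 1
  fi
  local x1=${BASH_REMATCH[1]} y1=${BASH_REMATCH[2]}
  local x2=${BASH_REMATCH[3]} y2=${BASH_REMATCH[4]}
  echo "$(( (x1 + x2) / 2 )) $(( (y1 + y2) / 2 ))"
}
```

For example, the bounds "[42,1960][1038,2058]" give the center 540 2009. The result can be passed straight to a tap: read -r x y < <(bounds_center "[42,1960][1038,2058]") && adb shell input tap "$x" "$y"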

Element Lookup Details

adb_tap_text and adb_tap_id work by:

  1. Running adb_dump to get fresh UI hierarchy
  2. Parsing the XML for matching text= or resource-id= attributes
  3. Extracting the bounds="[x1,y1][x2,y2]" attribute
  4. Computing center point: ((x1+x2)/2, (y1+y2)/2)
  5. Executing adb shell input tap <x> <y> at that center point

If multiple matches are found, the function taps the first match and prints a warning.

If no match is found, the function prints an error — fall back to adb_screenshot + Read tool for visual inspection.
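Steps 2 and 3 can be sketched with grep and sed. The sketch assumes the usual uiautomator dump layout, in which every element is a <node ...> tag carrying text=, resource-id= and bounds= attributes. find_bounds is hypothetical and matches the attribute value exactly. Resource IDs must therefore be given in full (for example com.example:id/btn_send), whereas the helper examples above use the short form.

```bash
# find_bounds ATTR VALUE XML: hypothetical sketch of steps 2-3, not the
# adb-helpers.sh implementation. Prints the bounds of the first <node>
# whose ATTR equals VALUE exactly and warns on stderr if several match.
find_bounds() {
  local attr=$1 value=$2 xml=$3 nodes count
  # uiautomator writes the tree on one line; grep -o splits it into one
  # opening <node ...> tag per line (self-closing "/>" leaves match too).
  # The leading space anchors the match to a whole attribute name.
  nodes=$(grep -o '<node [^>]*>' "$xml" | grep -F " $attr=\"$value\"")
  count=$(printf '%s\n' "$nodes" | grep -c .)
  if [ "$count" -eq 0 ]; then
    echo "find_bounds: no node with $attr=\"$value\"" >&2
    return 1
  fi
  if [ "$count" -gt 1 ]; then
    echo "find_bounds: $count matches, using the first" >&2
  fi
  printf '%s\n' "$nodes" | head -n 1 | sed -n 's/.* bounds="\([^"]*\)".*/\1/p'
}
```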

Standard Operating Procedure

Phase 1: Setup

# Source helpers
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh

# Verify connection
adb devices

# Get screen resolution (important for swipe calculations)
adb_screen_size
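Screen size matters because swipe endpoints are usually chosen as fractions of the screen. The minimal sketch below assumes a vertical swipe between 70% and 30% of the height through the horizontal center. The fractions and the name scroll_swipe_coords are hypothetical, and adb_scroll_down does not necessarily use the same values.

```bash
# scroll_swipe_coords W H [down|up]: hypothetical helper, not part of
# adb-helpers.sh. Prints "x1 y1 x2 y2" for a vertical swipe through the
# screen center, kept between 30% and 70% of the height to stay clear of
# the status and navigation bars (assumed fractions).
scroll_swipe_coords() {
  local w=$1 h=$2 dir=${3:-down}
  local x=$(( w / 2 )) top=$(( h * 30 / 100 )) bottom=$(( h * 70 / 100 ))
  if [ "$dir" = down ]; then
    # Scrolling content down means dragging the finger upward.
    echo "$x $bottom $x $top"
  else
    echo "$x $top $x $bottom"
  fi
}
```

For a 1080x2340 screen, adb shell input swipe $(scroll_swipe_coords 1080 2340) 300 swipes from (540, 1638) to (540, 702) in 300 ms.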

Phase 2: Navigate & Operate

For each interaction step:

# 1. Observe current state
adb_observe
# Then read /tmp/adb_screen.png with the Read tool to see the screen

# 2. Locate and act (prefer text/id over raw coordinates)
adb_tap_text "Create"
# or: adb_tap_id "iv_send"
# or as last resort: adb_tap_xy 540 2009

# 3. Wait for transition
adb_wait 2

# 4. Verify result
adb_screenshot
# Then read /tmp/adb_screen.png to confirm the action worked

Phase 3: Text Input

# Tap the input field first
adb_tap_text "Search..."
adb_wait 1

# Type text
adb_input_text "Hello World"

# Hide keyboard before tapping other elements
adb_hide_keyboard
adb_wait 1

# Now safe to tap other buttons
adb_tap_text "Send"
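adb shell input text passes its argument through the device's shell, and a space would split it into separate words. ASCII text therefore needs escaping before it is sent, which adb_input_text is documented to handle. The sketch below shows that escaping under a hypothetical name, escape_input_text. Spaces become %s, which input text decodes back to a space, and shell metacharacters get a backslash.

```bash
# escape_input_text TEXT: hypothetical sketch of escaping ASCII text for
# `adb shell input text`. Limitations: a literal "%s" in TEXT is still
# typed as a space, and non-ASCII text needs the IME route (rule 4).
escape_input_text() {
  local s=$1 out='' c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case $c in
      ' ') out+='%s' ;;                  # input text decodes %s as a space
      [A-Za-z0-9._,:/@%+=-]) out+=$c ;;  # safe for the device shell as-is
      *) out+="\\$c" ;;                  # backslash-escape everything else
    esac
  done
  printf '%s\n' "$out"
}
```

Usage: adb shell input text "$(escape_input_text "Hello World")"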

Critical Rules

1. UI Dump First, Screenshot Second

  • uiautomator dump gives exact bounds, element states (enabled/focused/clickable), text content, and resource IDs
  • Screenshots only for: visual verification, understanding layout context, or when UI dump fails (e.g., animations, WebView content)
  • When the UI dump returns elements with NAF="true" (Not Accessibility Friendly: clickable, but with no text or content-desc), they cannot be located by text; use the screenshot plus coordinates as a fallback
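NAF="true" nodes still carry exact bounds. They can therefore be listed straight from the dump and matched against the screenshot, instead of estimating pixels by eye. The minimal sketch below assumes the dump has already been pulled to a local file. list_naf_nodes is a hypothetical name.

```bash
# list_naf_nodes XML: hypothetical helper. Prints the bounds of every node
# that uiautomator flagged NAF="true" (clickable but without text or
# content-desc), one per line, for cross-checking against the screenshot.
list_naf_nodes() {
  grep -o '<node [^>]*>' "$1" |
    grep -F 'NAF="true"' |
    sed -n 's/.* bounds="\([^"]*\)".*/\1/p'
}
```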

2. Keyboard Awareness

  • Always hide keyboard before tapping non-input elements. The keyboard shifts the layout, making UI dump bounds stale.
  • After typing, call adb_hide_keyboard then adb_dump before tapping anything else.
  • If uiautomator dump returns ERROR: could not get idle state, the keyboard animation may still be running — wait 1s and retry.
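The retry rule can be automated. The sketch below assumes that uiautomator prints "ERROR: could not get idle state" on this failure and a line containing "dumped to" on success. dump_ui_with_retry and DUMP_RETRY_DELAY are hypothetical names.

```bash
# dump_ui_with_retry OUT [ATTEMPTS]: hypothetical sketch, not part of
# adb-helpers.sh. Retries `uiautomator dump` while it reports "could not
# get idle state" (e.g. keyboard animation still running), then pulls the
# XML to OUT. Any other error is reported immediately.
dump_ui_with_retry() {
  local out=$1 attempts=${2:-3} i msg
  for ((i = 1; i <= attempts; i++)); do
    msg=$(adb shell uiautomator dump /sdcard/window_dump.xml 2>&1)
    case $msg in
      *"dumped to"*)
        adb pull /sdcard/window_dump.xml "$out" >/dev/null
        return ;;
      *"could not get idle state"*)
        sleep "${DUMP_RETRY_DELAY:-1}" ;;   # wait, then retry
      *)
        echo "dump_ui_with_retry: $msg" >&2
        return 1 ;;
    esac
  done
  echo "dump_ui_with_retry: no idle state after $attempts attempts" >&2
  return 1
}
```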

3. Wait Strategy

  • After tap: wait 1s before next dump/screenshot
  • After launching app: wait 2-3s
  • After page navigation: wait 2s
  • After typing: wait 0.5s
  • If UI hasn't changed after action: wait longer, up to 5s, then re-check
  • Never blindly chain actions without verification
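The "wait longer, up to 5s, then re-check" rule can be implemented as a poll that compares the dump taken after the action with one taken before it. The sketch below uses the hypothetical names wait_for_ui_change and POLL_INTERVAL. Any element that changes on its own, such as a clock or progress indicator, will also count as a change.

```bash
# wait_for_ui_change BEFORE_XML [TIMEOUT_S]: hypothetical sketch. Polls the
# hierarchy about once per second until it differs from BEFORE_XML (a dump
# taken before the action); gives up after TIMEOUT_S polls (default 5).
wait_for_ui_change() {
  local before=$1 timeout=${2:-5} now i
  now=$(mktemp)
  for ((i = 0; i < timeout; i++)); do
    sleep "${POLL_INTERVAL:-1}"
    if adb shell uiautomator dump /sdcard/window_dump.xml >/dev/null 2>&1 &&
       adb pull /sdcard/window_dump.xml "$now" >/dev/null 2>&1 &&
       ! cmp -s "$before" "$now"; then
      rm -f "$now"
      return 0                              # the UI changed
    fi
  done
  rm -f "$now"
  echo "wait_for_ui_change: UI unchanged after $timeout checks" >&2
  return 1
}
```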

4. Chinese / CJK Text Input

adb shell input text does not support CJK characters natively. The helper adb_input_text handles this by:

  • Using adb shell am broadcast with ADBKeyboard if available
  • Falling back to clipboard-based input: copy to clipboard via adb shell service call clipboard, then paste

If the ADB Keyboard IME (com.android.adbkeyboard) is installed, enable and select it:

adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
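Once the IME is selected, text can be sent through ADBKeyboard's ADB_INPUT_B64 broadcast. That broadcast takes base64-encoded UTF-8, so CJK characters, spaces and quotes need no shell escaping. The sketch assumes the ADBKeyBoard app's broadcast interface. type_via_adbkeyboard is a hypothetical name, and the real adb_input_text may use a different broadcast action.

```bash
# type_via_adbkeyboard TEXT: sketch assuming the ADBKeyBoard IME is
# installed and selected. ADB_INPUT_B64 takes base64-encoded UTF-8, whose
# alphabet (A-Z a-z 0-9 + / =) is safe to pass through the device shell.
type_via_adbkeyboard() {
  local b64
  # tr removes the line wraps GNU base64 inserts every 76 characters.
  b64=$(printf '%s' "$1" | base64 | tr -d '\n')
  adb shell am broadcast -a ADB_INPUT_B64 --es msg "$b64"
}
```

Usage: type_via_adbkeyboard "你好 World"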

5. Coordinate System

  • All coordinates are in physical pixels matching the device resolution
  • adb shell wm size returns the canonical resolution (e.g., 1080x2340)
  • Screenshot pixel dimensions may differ from device resolution — never estimate coordinates from screenshot pixel positions
  • Always derive coordinates from uiautomator dump bounds
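Reading the resolution can be sketched as follows. device_size is hypothetical. The sketch assumes that wm size prints "Physical size: WxH", plus an "Override size: WxH" line when an override is active, and that coordinates should then follow the override. wm size typically reports the natural orientation, so W and H should be swapped when the device is in landscape.

```bash
# device_size: hypothetical sketch; prints "W H" from `adb shell wm size`.
# Assumption: when an "Override size" line is present, it is the size that
# UI-dump bounds and input coordinates use, so it takes precedence.
device_size() {
  adb shell wm size | tr -d '\r' | awk -F': *' '
    /^Physical size/ { phys = $2 }
    /^Override size/ { over = $2 }
    END {
      size = (over != "") ? over : phys
      if (size == "") exit 1
      split(size, wh, "x")
      print wh[1], wh[2]
    }'
}
```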

6. Handling Failures

If an action doesn't produce the expected result:

  1. Re-dump UI hierarchy — the element may have moved or state changed
  2. Take a screenshot — visual context may reveal popups, loading states, or errors
  3. Check if the element is enabled="true" and clickable="true" before tapping
  4. If element is not found by text, try partial match or search by resource-id
  5. If the app is in a WebView, UI dump may not capture web elements — use screenshot + coordinate estimation as fallback
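Check 3 can be run against the same dump, as in the minimal sketch below (node_is_tappable is hypothetical). A text label often has clickable="false" while its parent container is the clickable element. A failure here is therefore a cue to inspect the parent, not a reason to stop.

```bash
# node_is_tappable ATTR VALUE XML: hypothetical sketch of check 3.
# Returns 0 if the first node whose ATTR equals VALUE exactly is both
# enabled="true" and clickable="true", 1 if not, 2 if there is no match.
node_is_tappable() {
  local node
  node=$(grep -o '<node [^>]*>' "$3" | grep -F " $1=\"$2\"" | head -n 1)
  [ -n "$node" ] || return 2
  # The leading spaces keep long-clickable="true" from counting as clickable.
  [[ $node == *' enabled="true"'* && $node == *' clickable="true"'* ]]
}
```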

7. App Launch

Prefer adb_find_package + adb_launch_app over the monkey command:

# Find the app (the keyword must appear in the package name: WeChat is com.tencent.mm)
adb_find_package "tencent"
# Launch it
adb_launch_app "com.tencent.mm"
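For reference, package lookup and a launch that avoids monkey can be sketched with the standard pm and am commands. find_package and launch_package are hypothetical stand-ins for the helpers. The cmd package resolve-activity --brief call is an assumption that holds on recent Android releases but not on very old ones. pm list packages filters on the package name, so a display-name keyword such as "wechat" will not find com.tencent.mm; search for a fragment of the package name instead.

```bash
# find_package KEYWORD: hypothetical sketch. `pm list packages FILTER`
# prints "package:<name>" lines for names containing FILTER.
find_package() {
  adb shell pm list packages "$1" | tr -d '\r' | sed -n 's/^package://p'
}

# launch_package PKG: hypothetical sketch. Resolves the launcher activity
# (the last line of --brief output is "pkg/activity"), then starts it.
launch_package() {
  local component
  component=$(adb shell cmd package resolve-activity --brief \
    -c android.intent.category.LAUNCHER "$1" | tr -d '\r' | tail -n 1)
  case $component in
    */*) adb shell am start -n "$component" ;;
    *) echo "launch_package: no launcher activity for $1" >&2; return 1 ;;
  esac
}
```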

Limitations

  • uiautomator dump doesn't work during animations — wait for idle state
  • WebView/Flutter/game content may not appear in UI hierarchy — use screenshot-based approach
  • Some custom views may have empty text and no resource-id — use bounds + screenshot cross-reference
  • Maximum ~100 actions per task is a reasonable limit to avoid infinite loops
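The action limit is easiest to enforce with a counter around every ACT step, as in the minimal sketch below (act and MAX_ACTIONS are hypothetical names). The counter lives in the current shell, so calls made inside $(...) subshells are not counted.

```bash
# act CMD [ARGS...]: hypothetical wrapper enforcing the action budget.
# Runs CMD and counts it; once MAX_ACTIONS (default 100) is reached it
# refuses to act, so a stuck loop fails loudly instead of running forever.
ACTION_COUNT=0
act() {
  local max=${MAX_ACTIONS:-100}
  if (( ACTION_COUNT >= max )); then
    echo "act: budget of $max actions used up; stopping" >&2
    return 1
  fi
  ACTION_COUNT=$((ACTION_COUNT + 1))
  "$@"
}
```

Usage: act adb_tap_text "Create"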

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-03 09:35 · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
