Control Android devices through ADB with a structured observe-locate-act-verify loop.
This skill executes the following on the connected Android device:
- adb shell input: tap, swipe, text input
- adb shell uiautomator dump: UI hierarchy extraction
- adb shell screencap: screen capture
- adb shell am broadcast: ADBKeyboard IME input (for CJK text)
- adb shell service call clipboard: clipboard-based text input fallback

Before any operation, verify device connection:
adb devices
If no device is found, instruct the user to:
adb connect :5555

NEVER guess coordinates from screenshots. ALWAYS use UI hierarchy as the primary locator.
Screenshots are for human-readable context and visual verification. UI dumps give exact pixel bounds.
Every interaction follows this cycle:
┌─────────────────────────────────────────┐
│ 1. OBSERVE — dump UI + screenshot │
│ 2. LOCATE — find element by text/id │
│ 3. ACT — tap / swipe / type │
│ 4. VERIFY — screenshot + dump again │
│ 5. REPEAT — next action or done │
└─────────────────────────────────────────┘
Do NOT skip the VERIFY step. UI transitions may take time; always confirm before proceeding.
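The VERIFY step can be sketched as a small polling loop. This is a minimal sketch, not part of adb-helpers.sh: wait_until and dump_has_text are hypothetical names, and the check assumes the dump stores the expected label as a plain text="..." attribute (XML-escaped characters are not handled).

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until a check command succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical check: does a UI dump file contain a node with this exact text?
dump_has_text() {
  local text="$1" file="${2:-/tmp/ui_dump.xml}"
  grep -qF "text=\"$text\"" "$file"
}

# Example wiring (assumes adb-helpers.sh is sourced so adb_dump exists):
#   verify_sent() { adb_dump >/dev/null && dump_has_text "Sent"; }
#   wait_until 5 1 verify_sent || echo "action not confirmed" >&2
```

Polling a fresh dump rather than sleeping for a fixed time keeps the loop fast on quick transitions and tolerant of slow ones.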
Source the helper script before starting any operation session:
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh
| Function | Usage | Description |
|---|---|---|
| adb_dump | adb_dump | Dump UI hierarchy to /tmp/ui_dump.xml |
| adb_screenshot | adb_screenshot | Capture screen to /tmp/adb_screen.png |
| adb_observe | adb_observe | Dump UI + screenshot in one call |
| adb_tap_text | adb_tap_text "Submit" | Find element by text, tap center |
| adb_tap_id | adb_tap_id "btn_send" | Find element by resource-id, tap center |
| adb_tap_xy | adb_tap_xy 540 1200 | Tap exact coordinates |
| adb_swipe | adb_swipe x1 y1 x2 y2 [ms] | Swipe between points (default 300ms) |
| adb_input_text | adb_input_text "hello" | Type text (supports spaces and CJK) |
| adb_key | adb_key | Send keyevent (BACK, HOME, ENTER, etc.) |
| adb_hide_keyboard | adb_hide_keyboard | Press BACK to dismiss keyboard |
| adb_scroll_down | adb_scroll_down | Swipe up to scroll content down |
| adb_scroll_up | adb_scroll_up | Swipe down to scroll content up |
| adb_long_press | adb_long_press x y [ms] | Long press at coordinates (default 1000ms) |
| adb_wait | adb_wait [seconds] | Sleep before next action (default 1s) |
| adb_screen_size | adb_screen_size | Get device screen resolution |
| adb_launch_app | adb_launch_app | Launch app by package name |
| adb_find_package | adb_find_package | Search installed packages by keyword |
| adb_bounds_center | adb_bounds_center "bounds_string" | Parse "[x1,y1][x2,y2]" → center x y |
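adb_tap_text and adb_tap_id work by:
1. Running adb_dump to get a fresh UI hierarchy
2. Searching the dump for matching text= or resource-id= attributes
3. Extracting the bounds="[x1,y1][x2,y2]" attribute of the match
4. Computing the center ((x1+x2)/2, (y1+y2)/2)
5. Running adb shell input tap at that center

If multiple matches are found, the function taps the first match and prints a warning.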
If no match is found, the function prints an error — fall back to adb_screenshot + Read tool for visual inspection.
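The locate-and-center steps can be illustrated with a short sketch. It is not the code from adb-helpers.sh: first_bounds_for_text and bounds_center_sketch are hypothetical names, and the lookup assumes each node appears as a single `<node ...>` tag with an exact text="..." attribute.

```shell
#!/usr/bin/env bash
# Hypothetical: print the bounds string of the first node whose text matches exactly.
first_bounds_for_text() {
  local text="$1" file="$2"
  grep -o '<node [^>]*>' "$file" \
    | grep -F " text=\"$text\"" \
    | head -n 1 \
    | sed -n 's/.* bounds="\([^"]*\)".*/\1/p'
}

# Hypothetical: parse "[x1,y1][x2,y2]" and print the center as "x y".
bounds_center_sketch() {
  local re='^\[([0-9]+),([0-9]+)\]\[([0-9]+),([0-9]+)\]$'
  if [[ "$1" =~ $re ]]; then
    local x=$(( (BASH_REMATCH[1] + BASH_REMATCH[3]) / 2 ))
    local y=$(( (BASH_REMATCH[2] + BASH_REMATCH[4]) / 2 ))
    echo "$x $y"
  else
    echo "invalid bounds: $1" >&2
    return 1
  fi
}

# Example wiring on a real device (after a dump has been pulled to /tmp/ui_dump.xml):
#   read -r x y < <(bounds_center_sketch "$(first_bounds_for_text "Submit" /tmp/ui_dump.xml)")
#   adb shell input tap "$x" "$y"
```

Integer division rounds the center down by at most one pixel, which is harmless for tap targets.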
# Source helpers
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh
# Verify connection
adb devices
# Get screen resolution (important for swipe calculations)
adb_screen_size
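To show why the resolution matters for swipe calculations, here is a minimal sketch that derives scroll coordinates from the screen size. The function names are hypothetical, and the 70% and 30% start and end fractions are assumed values, not taken from adb-helpers.sh.

```shell
#!/usr/bin/env bash
# Hypothetical: extract "W H" from wm size output such as "Physical size: 1080x2340".
# If the output has several lines, the first WxH pair wins.
parse_wm_size() {
  local re='([0-9]+)x([0-9]+)'
  if [[ "$1" =~ $re ]]; then
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Hypothetical: print "x1 y1 x2 y2" for a swipe up the vertical center line,
# from 70% to 30% of the screen height (assumed fractions).
scroll_down_coords() {
  local w="$1" h="$2"
  echo "$((w / 2)) $((h * 70 / 100)) $((w / 2)) $((h * 30 / 100))"
}

# Example wiring on a real device:
#   read -r w h < <(parse_wm_size "$(adb shell wm size)")
#   adb shell input swipe $(scroll_down_coords "$w" "$h") 300
```

Deriving the points from the reported size avoids hard-coding coordinates that only fit one device.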
For each interaction step:
# 1. Observe current state
adb_observe
# Then read /tmp/adb_screen.png with the Read tool to see the screen
# 2. Locate and act (prefer text/id over raw coordinates)
adb_tap_text "Create"
# or: adb_tap_id "iv_send"
# or as last resort: adb_tap_xy 540 2009
# 3. Wait for transition
adb_wait 2
# 4. Verify result
adb_screenshot
# Then read /tmp/adb_screen.png to confirm the action worked
# Tap the input field first
adb_tap_text "Search..."
adb_wait 1
# Type text
adb_input_text "Hello World"
# Hide keyboard before tapping other elements
adb_hide_keyboard
adb_wait 1
# Now safe to tap other buttons
adb_tap_text "Send"
- uiautomator dump gives exact bounds, element states (enabled/focused/clickable), text content, and resource IDs.
- If a node has NAF="true" (Not Accessibility Friendly), it exposes no text or content description; use screenshot + coordinates as a fallback.
- After text input, run adb_hide_keyboard then adb_dump before tapping anything else.
- If uiautomator dump returns ERROR: could not get idle state, the keyboard animation may still be running; wait 1s and retry.

adb shell input text does not support CJK characters natively. The helper adb_input_text handles this by:
1. Using adb shell am broadcast with ADBKeyboard if available
2. Falling back to adb shell service call clipboard, then paste

If ADB Keyboard IME is installed (com.android.adbkeyboard), enable it:
adb shell ime set com.android.adbkeyboard/.AdbIME
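The routing between input methods can be sketched as follows. This is an illustration only, not how adb_input_text is actually written: both function names are hypothetical, and the sketch assumes ADBKeyboard availability is detected by looking for its IME id in the output of adb shell ime list -s.

```shell
#!/usr/bin/env bash
# Hypothetical: succeed only if the text is made of printable ASCII bytes.
is_plain_ascii() {
  # grep succeeds if any byte falls outside the space..tilde range.
  if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
    return 1
  fi
  return 0
}

# Hypothetical: decide which input path to use for a piece of text.
# $1 = text to type, $2 = list of installed IME ids (one per line).
choose_input_method() {
  local text="$1" ime_list="$2"
  if is_plain_ascii "$text"; then
    echo "input-text"      # adb shell input text is enough
  elif [[ "$ime_list" == *"com.android.adbkeyboard/.AdbIME"* ]]; then
    echo "adbkeyboard"     # broadcast through ADBKeyboard
  else
    echo "clipboard"       # clipboard-based fallback
  fi
}

# Example wiring on a real device:
#   choose_input_method "$text" "$(adb shell ime list -s)"
```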
- adb shell wm size returns the canonical resolution (e.g., 1080x2340)
- Tap and swipe coordinates use the same pixel space as uiautomator dump bounds

If an action doesn't produce the expected result:
- Check that the element has enabled="true" and clickable="true" before tapping

Prefer adb_find_package + adb_launch_app over the monkey command:
# Find the app
adb_find_package "wechat"
# Launch it
adb_launch_app "com.tencent.mm"
uiautomator dump doesn't work during animations; wait for idle state
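One way to wait for the idle state is to retry the dump while it reports the idle-state error. The wrapper below is a hypothetical sketch (dump_with_retry is not part of adb-helpers.sh) and assumes the error text contains "could not get idle state".

```shell
#!/usr/bin/env bash
# Hypothetical: run a dump command, retrying while its output reports the idle-state error.
# Usage: dump_with_retry <max_attempts> <delay_seconds> <command> [args...]
dump_with_retry() {
  local attempts="$1" delay="$2"
  shift 2
  local out i
  for ((i = 1; i <= attempts; i++)); do
    out="$("$@" 2>&1)"
    if [[ "$out" != *"could not get idle state"* ]]; then
      printf '%s\n' "$out"
      return 0
    fi
    # Give the animation time to finish before the next attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  printf '%s\n' "$out" >&2
  return 1
}

# Example wiring on a real device:
#   dump_with_retry 3 1 adb shell uiautomator dump /sdcard/window_dump.xml
```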