AI remote control for Windows desktop. Captures screen on-demand via POST request and provides an HTTP API for mouse/keyboard actions.
- /capture to save a screenshot; returns the file path
- screenshot.jpg saved in skill folder, overwritten each time, auto-deleted on exit

Run directly on Windows without WSL:
# Install dependencies (one-time)
pip install pywin32 pillow mss
# Start the server
cd %USERPROFILE%\.openclaw\workspace\skills\wincontrol
python server.py
Or use the provided batch file:
.\start.bat
Screenshots are saved as screenshot.jpg in the skill directory and auto-deleted when the server stops.
If you prefer WSL, the server runs on Windows Python but can be controlled from WSL:
# From WSL
cd ~/.openclaw/workspace/skills/wincontrol
./start.sh
Or start manually with PowerShell 7:
'/mnt/c/Program Files/PowerShell/7/pwsh.exe' -Command \
"python //wsl.localhost/Ubuntu/home/\$USER/.openclaw/workspace/skills/wincontrol/server.py" &
# Health check
curl http://localhost:8767/ping
# Output: {"ok": true}
# Capture a screenshot
curl -X POST http://localhost:8767/capture
# Output: {"ok": true, "path": ".../screenshot.jpg"}
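Before sending any actions, a client may want to wait until the health check above succeeds. The following is a minimal Python sketch that assumes only the `/ping` endpoint and the `{"ok": true}` reply shown above. The helper names `ping` and `wait_until_ready`, the retry count, and the delay are illustrative choices, not part of the skill.

```python
import json
import time
import urllib.error
import urllib.request

PING_URL = "http://localhost:8767/ping"  # endpoint from the health check above


def ping(url=PING_URL, timeout=2.0):
    """Return True if the server answers /ping with {"ok": true}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as "not ready".
        return False
    return isinstance(data, dict) and data.get("ok") is True


def wait_until_ready(check=ping, attempts=10, delay=0.5, sleep=time.sleep):
    """Call check() up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready()` right after starting the server gives it up to roughly five seconds to come up with these default values.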
Native Windows: Open screenshot.jpg in the skill directory
WSL: Access via the skill folder path
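If a client running in WSL receives a Windows-style path from `/capture`, it needs the equivalent `/mnt/<drive>/...` path to open the file. The sketch below assumes the default WSL drive mounts under `/mnt`; the function name `windows_to_wsl_path` is hypothetical, and it does not handle UNC paths such as `//wsl.localhost/...`. Inside WSL, the `wslpath -u` command is the more robust way to do this conversion.

```python
import re


def windows_to_wsl_path(path):
    r"""Map a drive-letter path such as C:\Users\me\shot.jpg to /mnt/c/Users/me/shot.jpg.

    Paths that do not start with a drive letter are returned unchanged.
    """
    match = re.match(r"^([A-Za-z]):[\\/](.*)$", path)
    if not match:
        return path
    drive, rest = match.groups()
    return f"/mnt/{drive.lower()}/" + rest.replace("\\", "/")


if __name__ == "__main__":
    print(windows_to_wsl_path(r"C:\Users\me\screenshot.jpg"))
```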
skills/wincontrol/
├── SKILL.md # This file
├── server.py # Main server (runs on Windows)
├── start.bat # Start script (Native Windows)
├── start.sh # Start script (WSL)
├── stop.sh # Stop script (WSL)
└── screenshot.jpg # Latest screenshot (auto-created, auto-cleaned)
curl -X POST http://localhost:8767/capture
Returns: {"ok": true, "path": ".../screenshot.jpg", "quality": 90}
Each capture overwrites screenshot.jpg in the skill directory. The file is automatically deleted when the server stops.
Optional quality override:
# Lower quality for faster capture/smaller file
curl -X POST http://localhost:8767/capture -H "Content-Type: application/json" -d '{"quality": 60}'
# Higher quality for detailed screenshots
curl -X POST http://localhost:8767/capture -d '{"quality": 95}'
| Quality | File Size (1080p) | Use Case |
|---|---|---|
| 60 | ~80KB | Fast capture, quick checks |
| 90 (default) | ~150KB | Clear UI text, general use |
| 95 | ~200KB | Maximum detail, presentations |
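A client can validate the quality value before sending it. The sketch below builds the JSON body for `POST /capture`, using the 1-100 range noted next to `QUALITY` in `server.py`. The function name `capture_body` is hypothetical, and how the server itself reacts to out-of-range values is not documented here.

```python
import json

DEFAULT_QUALITY = 90  # default listed in the table above


def capture_body(quality=None):
    """Return the request body for POST /capture.

    With quality=None the body is empty, so the server default applies.
    """
    if quality is None:
        return b""
    if isinstance(quality, bool) or not isinstance(quality, int):
        raise TypeError("quality must be an integer")
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    return json.dumps({"quality": quality}).encode("utf-8")


if __name__ == "__main__":
    print(capture_body(60))
```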
# Move cursor (no click)
curl -X POST http://localhost:8767/move -d '{"x": 500, "y": 300}'
# Click
curl -X POST http://localhost:8767/click -d '{"x": 500, "y": 300}'
# Drag
curl -X POST http://localhost:8767/drag -d '{"x1": 100, "y1": 200, "x2": 300, "y2": 400}'
# Scroll
curl -X POST http://localhost:8767/scroll -d '{"x": 500, "y": 300, "direction": "down", "amount": 3}'
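The same mouse calls can be issued from Python. This sketch mirrors the curl payloads above using only the standard library. The helper names (`build_request`, `move`, `drag`, `scroll`, `send`) are illustrative, and the assumption that every endpoint replies with JSON is based only on the `/ping` and `/capture` outputs shown in this document.

```python
import json
import urllib.request

API = "http://localhost:8767"


def build_request(endpoint, payload):
    """Build (but do not send) a JSON POST request for one endpoint."""
    return urllib.request.Request(
        f"{API}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def move(x, y):
    return build_request("move", {"x": x, "y": y})


def drag(x1, y1, x2, y2):
    return build_request("drag", {"x1": x1, "y1": y1, "x2": x2, "y2": y2})


def scroll(x, y, direction="down", amount=3):
    # Only "down" appears in the examples above; other values are passed through as-is.
    return build_request(
        "scroll", {"x": x, "y": y, "direction": direction, "amount": amount}
    )


def send(request, timeout=5.0):
    """Send a prepared request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `send(scroll(500, 300))` would issue the same request as the scroll command above.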
# Type text
curl -X POST http://localhost:8767/enter -d '{"keys": ["Hello World"]}'
# Press special key
curl -X POST http://localhost:8767/enter -d '{"keys": ["Enter"]}'
# Key combination (Ctrl+C)
curl -X POST http://localhost:8767/enter -d '{"keys": ["Ctrl", "C"]}'
# Mixed sequence: type, press key, then combo
curl -X POST http://localhost:8767/enter -d '{"keys": ["Hello", "Enter", "Ctrl", "A"]}'
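One plausible reading of the mixed sequence above is that modifiers bind to the next key, known key names are pressed, and everything else is typed as text. The sketch below illustrates that interpretation using the key and modifier names this document lists for `/enter`. It is an assumption about the behaviour, not the actual logic in `server.py`; the function name `group_keys` and the case-insensitive matching are hypothetical.

```python
MODIFIERS = {"ctrl", "control", "alt", "shift", "win", "windows"}
SPECIAL_KEYS = {
    "enter", "return", "escape", "esc", "backspace", "tab", "space",
    "delete", "del", "up", "down", "left", "right",
    "home", "end", "pageup", "pagedown",
} | {f"f{n}" for n in range(1, 13)}


def group_keys(keys):
    """Group a key list into ("text", s), ("key", k) and ("combo", [...]) actions."""
    actions = []
    pending = []  # modifiers waiting for the key they apply to
    for token in keys:
        name = token.lower()
        if name in MODIFIERS:
            pending.append(token)
        elif pending:
            # The first non-modifier after one or more modifiers completes the combo.
            actions.append(("combo", pending + [token]))
            pending = []
        elif name in SPECIAL_KEYS:
            actions.append(("key", token))
        else:
            actions.append(("text", token))
    if pending:
        # Trailing modifiers with no following key are treated as a combo on their own.
        actions.append(("combo", pending))
    return actions


if __name__ == "__main__":
    print(group_keys(["Hello", "Enter", "Ctrl", "A"]))
```

Under this reading, literal text that matches a key name (for example the word "Enter") could not be typed as-is, which is worth checking against the real server before relying on it.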
The /enter endpoint accepts a list of keys and handles them sequentially:
Single keys:
- Enter, Return, Escape, Esc
- Backspace, Tab, Space
- Delete, Del
- Up, Down, Left, Right
- Home, End, PageUp, PageDown
- F1 through F12

Modifiers (combine with other keys):
- Ctrl, Control
- Alt
- Shift
- Win, Windows

# Using PowerShell
$response = Invoke-RestMethod -Uri "http://localhost:8767/capture" -Method Post
Start-Process $response.path
FILE=$(curl -s -X POST http://localhost:8767/capture | python3 -c "import sys,json; print(json.load(sys.stdin)['path'])")
read "$FILE"
# Click somewhere
curl -X POST http://localhost:8767/click -d '{"x": 500, "y": 300}'
sleep 0.5
# Capture to see result
curl -X POST http://localhost:8767/capture
# Win+R to open Run
curl -X POST http://localhost:8767/enter -d '{"keys": ["Win", "R"]}'
sleep 0.5
# Type notepad and press Enter
curl -X POST http://localhost:8767/enter -d '{"keys": ["notepad", "Enter"]}'
sleep 1
# Type message
curl -X POST http://localhost:8767/enter -d '{"keys": ["Hello from WinControl!"]}'
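Workflows like the Notepad sequence above can be expressed as data and replayed by a small runner. This is a sketch only: `NOTEPAD_STEPS` and `run_steps` are hypothetical names, the pauses are copied from the `sleep` calls above, and it assumes every endpoint replies with an `"ok"` field, which this document shows only for `/ping` and `/capture`.

```python
import time

# (endpoint, payload, pause in seconds after the call)
NOTEPAD_STEPS = [
    ("enter", {"keys": ["Win", "R"]}, 0.5),
    ("enter", {"keys": ["notepad", "Enter"]}, 1.0),
    ("enter", {"keys": ["Hello from WinControl!"]}, 0.0),
]


def run_steps(steps, send, sleep=time.sleep):
    """Send each step in order; return the index of the first failing step, or None.

    `send(endpoint, payload)` must return the decoded JSON reply as a dict.
    """
    for index, (endpoint, payload, pause) in enumerate(steps):
        reply = send(endpoint, payload)
        if not reply.get("ok"):
            return index
        if pause:
            sleep(pause)
    return None
```

Passing `send` in as a parameter keeps the runner independent of the HTTP library, so it can be exercised with a stub before pointing it at a live desktop.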
```powershell
pip install pywin32 pillow mss
```
~/.openclaw/workspace/skills/wincontrol/
```powershell
python server.py
```
See WSL2 Configuration section below.
- POST /capture
- screenshot.jpg in skill directory

To change quality, edit server.py:
QUALITY = 90 # 1-100
The server automatically cleans up the screenshot when stopped.
Press Ctrl+C in the PowerShell window, or:
# Find and stop the process
Get-Process python | Where-Object {$_.CommandLine -like '*wincontrol*'} | Stop-Process
./stop.sh
Or manually:
# Kill Python process on Windows using PowerShell 7
'/mnt/c/Program Files/PowerShell/7/pwsh.exe' -Command \
"Get-Process python -ErrorAction SilentlyContinue | Where-Object {\$_.CommandLine -like '*wincontrol*'} | Stop-Process -Force"
Issue: pip install pywin32 fails
pip install pywin32 --upgrade --force-reinstall
Issue: Port 8767 already in use
netstat -ano | findstr :8767
taskkill /PID <PID> /F
Issue: Cannot access from WSL
Screenshots are saved as screenshot.jpg in the skill folder, accessible from both Windows and WSL.
Issue: Server starts but curl fails
lsof -i :8767
kill <PID>
Issue: Python module errors
'/mnt/c/Program Files/PowerShell/7/pwsh.exe' -Command "pip install pywin32 pillow mss"
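For the port-related issues above, a quick way to tell whether anything is listening on port 8767 is to attempt a TCP connection. This is a minimal sketch; the function name `port_in_use` is illustrative, and it checks `127.0.0.1` explicitly because `localhost` may resolve to an IPv6 address on some systems.

```python
import socket


def port_in_use(port=8767, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8767 in use:", port_in_use())
```

A `True` result before starting the server points to a port conflict; a `False` result while the server is supposedly running suggests it failed to bind or is listening elsewhere.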
┌─────────────┐ ┌──────────────┐ ┌────────────────┐
│ Client │────▶│ server.py │────▶│ screenshot.jpg│
│ (curl/ps) │ │ (localhost) │ │ (skill dir) │
└─────────────┘ └──────────────┘ └────────────────┘
│
┌─────┴─────┐
│ Port │
│ 8767 │
└───────────┘
The server runs on Windows Python. Screenshots are saved as screenshot.jpg in the skill directory and automatically deleted when the server stops (Ctrl+C).
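The source does not show how `server.py` performs this cleanup. One common way to get delete-on-exit behaviour in Python is an `atexit` handler, sketched below under that assumption; `remove_if_exists` and `register_cleanup` are hypothetical names.

```python
import atexit
import os


def remove_if_exists(path):
    """Delete the file at `path`; return True if it was removed, False if it was absent."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


def register_cleanup(path):
    """Arrange for `path` to be deleted when the interpreter exits normally."""
    atexit.register(remove_if_exists, path)
```

If cleanup works this way, it runs after Ctrl+C (which Python turns into `KeyboardInterrupt`) but not when the process is terminated forcibly, for example with `Stop-Process -Force`, so a stale `screenshot.jpg` could remain in that case.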
import requests
import time

API = "http://localhost:8767"

def capture():
    """Capture screen and return file path"""
    r = requests.post(f"{API}/capture")
    return r.json().get("path")

def click(x, y):
    requests.post(f"{API}/click", json={"x": x, "y": y})

def enter(keys):
    requests.post(f"{API}/enter", json={"keys": keys})

# Example workflow
if __name__ == "__main__":
    # Type text, press Enter, then select all
    enter(["Hello World", "Enter", "Ctrl", "A"])
    time.sleep(0.5)
    # Capture result
    screenshot_path = capture()
    print(f"Screenshot saved to: {screenshot_path}")
- localhost only (not accessible from network)
- screenshot.jpg in skill folder, not system directories

// Capture and view
const result = await exec("curl -s -X POST http://localhost:8767/capture");
const data = JSON.parse(result.stdout);
// Path will be Windows format: C:\Users\...
await read(data.path); // OpenClaw handles Windows paths
// Capture and view
const result = await exec("curl -s -X POST http://localhost:8767/capture");
const data = JSON.parse(result.stdout);
await read(data.path); // Path: /mnt/c/Users/...
// Or take action then capture
await exec("curl -X POST http://localhost:8767/click -d '{\"x\":500,\"y\":300}'");
await exec("sleep 0.5");
const screenshot = await exec("curl -s -X POST http://localhost:8767/capture");
License: MIT-0
Version: 2.0.1 (3 versions in total)