# desktop-control

Advanced desktop automation with mouse, keyboard, and screen control.
Install via ClawdBot CLI:

```bash
clawdbot install matagul/desktop-control
```

The most advanced desktop automation skill for OpenClaw. Provides pixel-perfect mouse control, lightning-fast keyboard input, screen capture, window management, and clipboard operations.
First, install required dependencies:
```bash
pip install pyautogui pillow opencv-python pygetwindow
```
Quick start:

```python
from skills.desktop_control import DesktopController
dc = DesktopController(failsafe=True)
dc.move_mouse(500, 300) # Move to coordinates
dc.click() # Left click at current position
dc.click(100, 200, button="right") # Right click at position
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c") # Copy
dc.press("enter")
screenshot = dc.screenshot()
position = dc.get_mouse_position()
```
### move_mouse(x, y, duration=0, smooth=True)

Move mouse to absolute screen coordinates.

Parameters:

- x (int): X coordinate (pixels from left)
- y (int): Y coordinate (pixels from top)
- duration (float): Movement time in seconds (0 = instant, 0.5 = smooth)
- smooth (bool): Use a bezier curve for natural movement

Example:
```python
dc.move_mouse(1000, 500)                # jump instantly
dc.move_mouse(1000, 500, duration=1.0)  # glide over one second
```
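Internally, a smooth move can be generated by sampling points along a bezier curve. Below is a minimal sketch of the idea; `quadratic_bezier` is a hypothetical helper for illustration, not part of the skill's API:

```python
def quadratic_bezier(p0, p1, p2, steps=20):
    """Return (x, y) points along a quadratic bezier from p0 to p2 via control point p1."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
        points.append((round(x), round(y)))
    return points

# A curve from (0, 0) to (1000, 500) that bows through a control point,
# giving the cursor a natural arc instead of a straight line.
path = quadratic_bezier((0, 0), (200, 500), (1000, 500))
```

Feeding each point to the mouse in sequence, with a small sleep between steps, produces the "natural movement" effect.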
### move_relative(x_offset, y_offset, duration=0)

Move mouse relative to current position.

Parameters:

- x_offset (int): Pixels to move horizontally (positive = right)
- y_offset (int): Pixels to move vertically (positive = down)
- duration (float): Movement time in seconds

Example:
```python
dc.move_relative(100, 50, duration=0.3)
```
### click(x=None, y=None, button='left', clicks=1, interval=0.1)

Perform mouse click.

Parameters:

- x, y (int, optional): Coordinates to click (None = current position)
- button (str): 'left', 'right', 'middle'
- clicks (int): Number of clicks (1 = single, 2 = double)
- interval (float): Delay between multiple clicks

Example:
```python
dc.click()                    # single left click at current position
dc.click(500, 300, clicks=2)  # double-click at (500, 300)
dc.click(button='right')      # right-click at current position
```
### drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')

Drag and drop operation.

Parameters:

- start_x, start_y (int): Starting coordinates
- end_x, end_y (int): Ending coordinates
- duration (float): Drag duration
- button (str): Mouse button to use

Example:
```python
dc.drag(100, 100, 500, 500, duration=1.0)
```
### scroll(clicks, direction='vertical', x=None, y=None)

Scroll mouse wheel.

Parameters:

- clicks (int): Scroll amount (positive = up/left, negative = down/right)
- direction (str): 'vertical' or 'horizontal'
- x, y (int, optional): Position to scroll at

Example:
```python
dc.scroll(-5)                         # scroll down 5 clicks
dc.scroll(10)                         # scroll up 10 clicks
dc.scroll(5, direction='horizontal')  # scroll left 5 clicks
```
### get_mouse_position()

Get current mouse coordinates.

Returns: (x, y) tuple

Example:
```python
x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")
```
### type_text(text, interval=0, wpm=None)

Type text with configurable speed.

Parameters:

- text (str): Text to type
- interval (float): Delay between keystrokes (0 = instant)
- wpm (int, optional): Words per minute (overrides interval)

Example:
```python
dc.type_text("Hello World")                # as fast as possible
dc.type_text("Hello World", wpm=60)        # human-like 60 words per minute
dc.type_text("Hello World", interval=0.1)  # 0.1 s between keystrokes
```
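The `wpm` option implies a per-keystroke delay. Assuming the common typing convention of 5 characters per word, the conversion might look like this (`wpm_to_interval` is a hypothetical helper shown for illustration):

```python
def wpm_to_interval(wpm, chars_per_word=5):
    """Convert a words-per-minute rate to the delay in seconds between keystrokes."""
    chars_per_second = wpm * chars_per_word / 60.0
    return 1.0 / chars_per_second

# 60 wpm ≈ 300 characters/minute → 5 chars/second → 0.2 s per keystroke
interval = wpm_to_interval(60)
```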
### press(key, presses=1, interval=0.1)

Press and release a key.

Parameters:

- key (str): Key name (see Key Names section)
- presses (int): Number of times to press
- interval (float): Delay between presses

Example:
```python
dc.press('enter')
dc.press('space', presses=3)  # press space three times
dc.press('down')              # arrow key
```
### hotkey(*keys, interval=0.05)

Execute keyboard shortcut.

Parameters:

- *keys (str): Keys to press together
- interval (float): Delay between key presses

Example:
```python
dc.hotkey('ctrl', 'c')  # copy
dc.hotkey('ctrl', 'v')  # paste
dc.hotkey('win', 'r')   # open the Run dialog (Windows)
dc.hotkey('ctrl', 's')  # save
dc.hotkey('ctrl', 'a')  # select all
```
### key_down(key) / key_up(key)

Manually control key state.

Example:
```python
dc.key_down('shift')
dc.type_text("hello") # Types "HELLO"
dc.key_up('shift')
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')
```
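If an exception fires between `key_down` and `key_up`, the modifier stays stuck down. A small context manager guarantees release; this is a hypothetical helper, not part of the skill, and it is shown here with a recording stub instead of a live controller:

```python
from contextlib import contextmanager

@contextmanager
def held(dc, key):
    """Hold `key` down for the duration of the with-block, releasing on exit."""
    dc.key_down(key)
    try:
        yield
    finally:
        dc.key_up(key)  # released even if the body raises

# Stub that records calls, standing in for DesktopController:
class _Recorder:
    def __init__(self):
        self.events = []
    def key_down(self, key):
        self.events.append(f"down:{key}")
    def key_up(self, key):
        self.events.append(f"up:{key}")

rec = _Recorder()
with held(rec, "ctrl"):
    rec.events.append("click")  # simulated click while ctrl is held
```

With a real controller the usage would be `with held(dc, 'ctrl'): dc.click(100, 200)`.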
### screenshot(region=None, filename=None)

Capture screen or region.

Parameters:

- region (tuple, optional): (left, top, width, height) for partial capture
- filename (str, optional): Path to save image

Returns: PIL Image object
Example:
```python
img = dc.screenshot()                             # full screen
dc.screenshot(filename="screenshot.png")          # save to file
img = dc.screenshot(region=(100, 100, 500, 300))  # partial capture
```
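Note that `region` uses `(left, top, width, height)`, while Pillow's `Image.crop` expects a `(left, upper, right, lower)` box. A small conversion helper (hypothetical, not part of the skill) avoids mixing the two up:

```python
def region_to_box(region):
    """Convert a (left, top, width, height) region to PIL's (left, upper, right, lower) box."""
    left, top, width, height = region
    return (left, top, left + width, top + height)

box = region_to_box((100, 100, 500, 300))  # (100, 100, 600, 400)
```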
### get_pixel_color(x, y)

Get color of pixel at coordinates.

Returns: RGB tuple (r, g, b)

Example:
```python
r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")
```
### find_on_screen(image_path, confidence=0.8)

Find image on screen (requires OpenCV).

Parameters:

- image_path (str): Path to template image
- confidence (float): Match threshold (0-1)

Returns: (x, y, width, height) or None

Example:
```python
location = dc.find_on_screen("button.png")
if location:
    x, y, w, h = location
    # Click center of found image
    dc.click(x + w // 2, y + h // 2)
```
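A common pattern is polling until the element appears and then clicking its center. The sketch below takes the finder and clicker as injected callables so it runs without a GUI; in practice you would pass `lambda: dc.find_on_screen("button.png")` and `dc.click`:

```python
import time

def click_when_found(find, click, attempts=10, delay=0.5):
    """Poll `find` until it returns (x, y, w, h), then click the center of that box."""
    for _ in range(attempts):
        location = find()
        if location is not None:
            x, y, w, h = location
            click(x + w // 2, y + h // 2)
            return True
        time.sleep(delay)
    return False

# Stubs standing in for dc.find_on_screen and dc.click:
clicks = []
ok = click_when_found(lambda: (100, 200, 40, 20),
                      lambda x, y: clicks.append((x, y)),
                      delay=0)
```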
### get_screen_size()

Get screen resolution.

Returns: (width, height) tuple

Example:
```python
width, height = dc.get_screen_size()
print(f"Screen: {width}x{height}")
```
### get_all_windows()

List all open windows.

Returns: List of window titles

Example:
```python
windows = dc.get_all_windows()
for title in windows:
    print(f"Window: {title}")
```
### activate_window(title_substring)

Bring window to front by title.

Parameters:

- title_substring (str): Part of window title to match

Example:
```python
dc.activate_window("Chrome")
dc.activate_window("Visual Studio Code")
```
### get_active_window()

Get currently focused window.

Returns: Window title (str)

Example:
```python
active = dc.get_active_window()
print(f"Active window: {active}")
```
### copy_to_clipboard(text)

Copy text to clipboard.

Example:
```python
dc.copy_to_clipboard("Hello from OpenClaw!")
```
### get_from_clipboard()

Get text from clipboard.

Returns: str

Example:
```python
text = dc.get_from_clipboard()
print(f"Clipboard: {text}")
```
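The clipboard calls combine with `hotkey` into a handy "read the current selection" pattern: send Ctrl+C, wait briefly for the OS, then read the clipboard. A sketch (`read_selection` is a hypothetical helper, demonstrated with a stub controller since the real one drives the GUI):

```python
import time

def read_selection(dc, settle=0.1):
    """Copy the current selection via Ctrl+C and return the clipboard contents."""
    dc.hotkey('ctrl', 'c')
    time.sleep(settle)  # give the OS a moment to update the clipboard
    return dc.get_from_clipboard()

# Stub controller, used here only to show the call sequence:
class _Stub:
    def hotkey(self, *keys):
        pass
    def get_from_clipboard(self):
        return "selected text"

text = read_selection(_Stub(), settle=0)
```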
## Key Names

- Letters: 'a' through 'z'
- Digits: '0' through '9'
- Function keys: 'f1' through 'f24'
- 'enter' / 'return'
- 'esc' / 'escape'
- 'space' / 'spacebar'
- 'tab'
- 'backspace'
- 'delete' / 'del'
- 'insert'
- 'home'
- 'end'
- 'pageup' / 'pgup'
- 'pagedown' / 'pgdn'
- 'up' / 'down' / 'left' / 'right'
- 'ctrl' / 'control'
- 'shift'
- 'alt'
- 'win' / 'winleft' / 'winright'
- 'cmd' / 'command' (Mac)
- 'capslock' / 'numlock' / 'scrolllock'
- Punctuation: '.' ',' '?' '!' ';' ':'
- Brackets: '[' ']' '{' '}' '(' ')'
- Math: '+' '-' '*' '/' '='

Failsafe: move the mouse to any corner of the screen to abort all automation.
```python
dc = DesktopController(failsafe=True)
```
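The corner abort boils down to a simple bounds check on the cursor position. A sketch of that check (hypothetical helper, with an assumed 5-pixel corner threshold):

```python
def in_failsafe_corner(x, y, screen_w, screen_h, threshold=5):
    """True if (x, y) lies within `threshold` pixels of any screen corner."""
    near_left = x <= threshold
    near_right = x >= screen_w - 1 - threshold
    near_top = y <= threshold
    near_bottom = y >= screen_h - 1 - threshold
    return (near_left or near_right) and (near_top or near_bottom)

in_failsafe_corner(0, 0, 1920, 1080)      # top-left corner: aborts
in_failsafe_corner(960, 540, 1920, 1080)  # screen center: safe
```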
Pause and safety checks:

```python
dc.pause(2.0)
if dc.is_safe():
    dc.click(500, 500)
```
Require user confirmation before actions:
```python
dc = DesktopController(require_approval=True)
dc.click(500, 500) # Prompt: "Allow click at (500, 500)? [y/n]"
```
Example: filling out a form:

```python
dc = DesktopController()
dc.click(300, 200)
dc.type_text("John Doe", wpm=80)
dc.press('tab')
dc.type_text("john@example.com", wpm=80)
dc.press('tab')
dc.type_text("SecurePassword123", wpm=60)
dc.press('enter')
```
Example: timestamped region capture:

```python
region = (100, 100, 800, 600) # left, top, width, height
img = dc.screenshot(region=region)
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
img.save(f"capture_{timestamp}.png")
```
Example: multi-select files while holding Ctrl:

```python
dc.key_down('ctrl')
dc.click(100, 200) # First file
dc.click(100, 250) # Second file
dc.click(100, 300) # Third file
dc.key_up('ctrl')
dc.hotkey('ctrl', 'c')
```
Example: driving the Calculator app:

```python
import time

dc.activate_window("Calculator")
time.sleep(0.5)  # wait for the window to take focus
dc.type_text("5+3=", interval=0.2)
time.sleep(0.5)
dc.screenshot(filename="calculation_result.png")
```
Example: drag and drop:

```python
dc.drag(
    start_x=200, start_y=300,  # file location
    end_x=800, end_y=500,      # folder location
    duration=1.0               # smooth 1-second drag
)
```
Tips:

- Use duration=0 for instant mouse movement.
- Call get_screen_size() to confirm screen dimensions before clicking absolute coordinates.
- Add an interval to keystrokes and repeated clicks for reliability.
- Disable the corner abort with DesktopController(failsafe=False) only when you are certain the automation cannot run away.

Install all dependencies:
```bash
pip install pyautogui pillow opencv-python pygetwindow
```
Built for OpenClaw - The ultimate desktop automation companion
Generated Feb 23, 2026
## Use Cases

This skill can automate UI testing for desktop applications by simulating mouse clicks, keyboard inputs, and verifying screen elements. It enables regression testing, reducing manual effort and improving software quality assurance.
It automates repetitive data entry tasks across spreadsheets, forms, and databases by typing text, navigating fields, and clicking buttons. This increases efficiency and reduces human error in administrative workflows.
The skill allows for remote control of desktop environments to perform troubleshooting, software installations, or user training. It supports multi-monitor setups and precise actions, enhancing IT support services.
It can automate in-game actions like movement, clicking, and key presses for testing or repetitive tasks in desktop games. Features like image recognition help detect on-screen elements for adaptive automation.
Automates editing tasks in graphic design or video software by controlling mouse movements, keyboard shortcuts, and window management. This streamlines repetitive steps in creative production pipelines.
## Revenue Models

Offer the skill as part of a subscription-based automation platform, providing regular updates, cloud integration, and premium support. Revenue is generated through monthly or annual fees from businesses and developers.
Provide custom automation solutions and integration services for enterprises using this skill. Revenue comes from project-based fees, training workshops, and ongoing maintenance contracts.
Release a free version with basic features to attract users, then monetize through a paid tier offering advanced capabilities like image recognition, multi-monitor support, and priority support. Revenue is generated from premium upgrades.
## Integration Tip
Ensure proper dependency installation and test failsafe features in controlled environments to prevent unintended actions during automation.