# Playwright Selector Finder
A powerful tool for finding and interacting with elements using Playwright's vision locators. This package makes web automation more resilient by using visual recognition and natural language descriptions.
## Features
- 👁️ Vision-based element location
- 🗣️ Natural language element descriptions
- ⚡ Simple and intuitive API
- 🎯 Configurable vision settings
- 🔄 Common browser automation actions
- 📝 Full TypeScript support
## Installation
```bash
npm install playwright-selector-finder @playwright/test playwright
```
## Usage
```typescript
import MCPServer from 'playwright-selector-finder';
async function example() {
  // Create a new instance
  const server = new MCPServer({
    headless: false,
    visionTimeout: 5000,
    visionConfidence: 0.7
  });

  try {
    // Start the server
    await server.start();

    // Navigate to a website
    await server.callTool('browser_navigate', {
      url: 'https://example.com'
    });

    // Click using natural language description
    await server.callTool('browser_click', {
      element: 'Login button in the top right corner'
    });

    // Type text
    await server.callTool('browser_type', {
      element: 'Username field',
      text: 'myusername',
      submit: true
    });
  } finally {
    // Always close the server
    await server.stop();
  }
}
```
## Available Tools
### Navigation
- `browser_navigate`: Navigate to a URL
- `browser_go_back`: Go back to the previous page
- `browser_go_forward`: Go forward to the next page
### Interactions
- `browser_click`: Click on an element
- `browser_hover`: Hover over an element
- `browser_type`: Type text into an element
### Keyboard
- `browser_press_key`: Press a keyboard key
### Utilities
- `browser_wait`: Wait for a specified time
- `browser_save_as_pdf`: Save page as PDF
- `browser_close`: Close the browser
## Configuration Options
```typescript
interface MCPServerOptions {
  headless?: boolean;        // Run in headless mode
  visionTimeout?: number;    // Vision locator timeout (ms)
  visionConfidence?: number; // Vision confidence threshold (0-1)
}
```
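
Because every option is optional, callers may want to fill in defaults and range-check `visionConfidence` before constructing the server. The sketch below is one way to do that. `resolveOptions` and `ASSUMED_DEFAULTS` are hypothetical names that are not part of the package, and the default values are illustrative assumptions rather than the package's documented defaults.

```typescript
// Mirrors the interface above; redeclared so the sketch is self-contained.
interface MCPServerOptions {
  headless?: boolean;
  visionTimeout?: number;
  visionConfidence?: number;
}

// Hypothetical defaults for illustration only; the package's real defaults may differ.
const ASSUMED_DEFAULTS: Required<MCPServerOptions> = {
  headless: true,
  visionTimeout: 5000,
  visionConfidence: 0.7,
};

// Hypothetical helper: fills in missing fields and rejects out-of-range values.
function resolveOptions(options: MCPServerOptions = {}): Required<MCPServerOptions> {
  const resolved = { ...ASSUMED_DEFAULTS, ...options };
  if (!Number.isFinite(resolved.visionTimeout) || resolved.visionTimeout <= 0) {
    throw new RangeError('visionTimeout must be a positive number of milliseconds');
  }
  if (!(resolved.visionConfidence >= 0 && resolved.visionConfidence <= 1)) {
    throw new RangeError('visionConfidence must be between 0 and 1');
  }
  return resolved;
}
```

The resolved object can then be passed to the constructor, for example `new MCPServer(resolveOptions({ headless: false }))`.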
## Using with Test Frameworks
### Jest Example
```typescript
import MCPServer from 'playwright-selector-finder';
describe('Website Tests', () => {
  let server: MCPServer;

  beforeAll(async () => {
    server = new MCPServer({ headless: true });
    await server.start();
  });

  afterAll(async () => {
    await server.stop();
  });

  test('can login', async () => {
    await server.callTool('browser_navigate', {
      url: 'https://example.com'
    });
    await server.callTool('browser_click', {
      element: 'Login button'
    });
  });
});
```
## Best Practices
1. Always use `try/finally` to ensure server cleanup
2. Configure appropriate timeouts for your use case
3. Use descriptive element descriptions
4. Handle errors appropriately
5. Consider using headless mode in CI/CD
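
The first practice can be wrapped in a small helper so that cleanup is never forgotten. This is a minimal sketch: `withServer` and the `ServerLike` interface are hypothetical names, and `ServerLike` only assumes the `start`, `stop`, and `callTool` methods used in the examples above.

```typescript
// Minimal shape of the server used in the examples above
// (an assumption for this sketch, not the package's exported type).
interface ServerLike {
  start(): Promise<void>;
  stop(): Promise<void>;
  callTool(name: string, args?: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical helper: starts the server, runs the task, and always stops
// the server afterwards, even when the task throws.
async function withServer<T>(
  server: ServerLike,
  task: (server: ServerLike) => Promise<T>,
): Promise<T> {
  try {
    await server.start();
    return await task(server);
  } finally {
    await server.stop();
  }
}
```

A call might look like `await withServer(new MCPServer({ headless: true }), (s) => s.callTool('browser_navigate', { url: 'https://example.com' }))`. Any error thrown by the task still propagates to the caller after `stop()` has run.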
## Requirements
- Node.js >= 18.0.0
- Playwright >= 1.42.0
## License
Apache-2.0 - see LICENSE for more details.
## Playwright MCP
A Model Context Protocol (MCP) server that provides browser automation capabilities using [Playwright](https://playwright.dev). This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.
### Key Features
- **Fast and lightweight**: Uses Playwright's accessibility tree, not pixel-based input.
- **LLM-friendly**: No vision models needed, operates purely on structured data.
- **Deterministic tool application**: Avoids ambiguity common with screenshot-based approaches.
### Use Cases
- Web navigation and form-filling
- Data extraction from structured content
- Automated testing driven by LLMs
- General-purpose browser interaction for agents
### Example config
```js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
```
### Running a headless browser (browser without GUI)
Headless mode is useful for background or batch operations.
```js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--headless"
      ]
    }
  }
}
```
### Running a headed browser on Linux without DISPLAY
When running a headed browser on a system without a display, or from the worker processes of an IDE,
you can run Playwright in a client-server manner. Start the Playwright server
from an environment that has `DISPLAY` set:
```sh
npx playwright run-server
```
Then, in the MCP config, add the following `env` entry:
```js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ],
      "env": {
        // Use the endpoint from the output of the server above.
        "PLAYWRIGHT_WS_ENDPOINT": "ws://localhost:<port>/"
      }
    }
  }
}
```
### Tool Modes
The tools are available in two modes:
1. **Snapshot Mode** (default): Uses accessibility snapshots for better performance and reliability
2. **Vision Mode**: Uses screenshots for visual-based interactions
To use Vision Mode, add the `--vision` flag when starting the server:
```js
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--vision"
      ]
    }
  }
}
```
Vision Mode works best with computer-use models that can interact with elements in
X/Y coordinate space, based on the provided screenshot.
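
The config variants shown in this document differ only in the `args` flags and the optional `env` entry, so they can be generated from one function. The sketch below is illustrative: `buildMcpConfig` and its option names are hypothetical, and combining `--headless` with `--vision` in one config is an assumption that this document does not confirm.

```typescript
// Hypothetical options selecting which config variant to generate.
interface McpConfigOptions {
  headless?: boolean;  // adds the --headless flag
  vision?: boolean;    // adds the --vision flag
  wsEndpoint?: string; // sets PLAYWRIGHT_WS_ENDPOINT in env
}

interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Builds the same object shape as the JSON configs in this document.
function buildMcpConfig(
  options: McpConfigOptions = {},
): { mcpServers: { playwright: McpServerEntry } } {
  const args = ['@playwright/mcp@latest'];
  if (options.headless) args.push('--headless');
  if (options.vision) args.push('--vision');

  const entry: McpServerEntry = { command: 'npx', args };
  if (options.wsEndpoint) {
    // Only emit env when an endpoint was supplied.
    entry.env = { PLAYWRIGHT_WS_ENDPOINT: options.wsEndpoint };
  }
  return { mcpServers: { playwright: entry } };
}
```

Serializing the result with `JSON.stringify(buildMcpConfig({ vision: true }), null, 2)` yields text that can be pasted into an MCP client's config file.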
### Snapshot Mode
The Playwright MCP provides a set of tools for browser automation. Here are all available tools:
- **browser_navigate**
  - Description: Navigate to a URL
  - Parameters:
    - `url` (string): The URL to navigate to
- **browser_go_back**
  - Description: Go back to the previous page
  - Parameters: None
- **browser_go_forward**
  - Description: Go forward to the next page
  - Parameters: None
- **browser_click**
  - Description: Perform click on a web page
  - Parameters:
    - `element` (string): Human-readable element description used to obtain permission to interact with the element
    - `ref` (string): Exact target element reference from the page snapshot
- **browser_hover**
  - Description: Hover over element on page
  - Parameters:
    - `element` (string): Human-readable element description used to obtain permission to interact with the element
    - `ref` (string): Exact target element reference from the page snapshot
- **browser_drag**
  - Description: Perform drag and drop between two elements
  - Parameters:
    - `startElement` (string): Human-readable source element description used to obtain permission to interact with the element
    - `startRef` (string): Exact source element reference from the page snapshot
    - `endElement` (string): Human-readable target element description used to obtain permission to interact with the element
    - `endRef` (string): Exact target element reference from the page snapshot
- **browser_type**
  - Description: Type text into editable element
  - Parameters:
    - `element` (string): Human-readable element description used to obtain permission to interact with the element
    - `ref` (string): Exact target element reference from the page snapshot
    - `text` (string): Text to type into the element
    - `submit` (boolean): Whether to submit entered text (press Enter after)
- **browser_press_key**
  - Description: Press a key on the keyboard
  - Parameters:
    - `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
- **browser_snapshot**
  - Description: Capture accessibility snapshot of the current page (better than screenshot)
  - Parameters: None
- **browser_save_as_pdf**
  - Description: Save page as PDF
  - Parameters: None
- **browser_wait**
  - Description: Wait for a specified time in seconds
  - Parameters:
    - `time` (number): The time to wait in seconds (capped at 10 seconds)
- **browser_close**
  - Description: Close the page
  - Parameters: None
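
The parameter lists above translate naturally into TypeScript types, which lets a client catch a missing `ref` or a misspelled tool name at compile time. The sketch below covers a subset of the Snapshot Mode tools. `SnapshotToolArgs`, `toolCall`, and `clampWaitSeconds` are hypothetical names, treating `submit` as optional is an assumption, and the `{ name, arguments }` shape follows the general form of an MCP tool call rather than anything stated in this document.

```typescript
// Argument shapes for a subset of Snapshot Mode tools, transcribed from the list above.
interface SnapshotToolArgs {
  browser_navigate: { url: string };
  browser_click: { element: string; ref: string };
  browser_hover: { element: string; ref: string };
  // `submit` is assumed to be optional; the list above does not say.
  browser_type: { element: string; ref: string; text: string; submit?: boolean };
  browser_press_key: { key: string };
  browser_wait: { time: number };
}

// Hypothetical helper: pairs a tool name with arguments of the matching shape.
function toolCall<K extends keyof SnapshotToolArgs>(
  name: K,
  args: SnapshotToolArgs[K],
): { name: K; arguments: SnapshotToolArgs[K] } {
  return { name, arguments: args };
}

// browser_wait is documented as capped at 10 seconds; clamping on the client
// side makes the effective wait explicit. Negative values are clamped to 0.
function clampWaitSeconds(seconds: number): number {
  return Math.min(Math.max(seconds, 0), 10);
}
```

For example, `toolCall('browser_click', { element: 'Login button', ref })` type-checks only when `ref` is supplied, where `ref` is whatever reference string the latest `browser_snapshot` reported for that element.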
### Vision Mode
Vision Mode provides tools for visual-based interactions using screenshots. Here are all available tools:
- **browser_navigate**
  - Description: Navigate to a URL
  - Parameters:
    - `url` (string): The URL to navigate to
- **browser_go_back**
  - Description: Go back to the previous page
  - Parameters: None
- **browser_go_forward**
  - Description: Go forward to the next page
  - Parameters: None
- **browser_screenshot**
  - Description: Capture screenshot of the current page
  - Parameters: None
- **browser_move_mouse**
  - Description: Move mouse to specified coordinates
  - Parameters:
    - `x` (number): X coordinate
    - `y` (number): Y coordinate
- **browser_click**
  - Description: Click at specified coordinates
  - Parameters:
    - `x` (number): X coordinate to click at
    - `y` (number): Y coordinate to click at
- **browser_drag**
  - Description: Perform drag and drop operation
  - Parameters:
    - `startX` (number): Start X coordinate
    - `startY` (number): Start Y coordinate
    - `endX` (number): End X coordinate
    - `endY` (number): End Y coordinate
- **browser_type**
  - Description: Type text at specified coordinates
  - Parameters:
    - `text` (string): Text to type
    - `submit` (boolean): Whether to submit entered text (press Enter after)
- **browser_press_key**
  - Description: Press a key on the keyboard
  - Parameters:
    - `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
- **browser_save_as_pdf**
  - Description: Save page as PDF
  - Parameters: None
- **browser_wait**
  - Description: Wait for a specified time in seconds
  - Parameters:
    - `time` (number): The time to wait in seconds (capped at 10 seconds)
- **browser_close**
  - Description: Close the page
  - Parameters: None
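
In Vision Mode, `browser_type` takes no coordinates, so a client will typically click a field first and then type. The sketch below plans that two-step sequence from a bounding box that a vision model might report. `Box`, `centerOf`, and `planClickAndType` are hypothetical names, and the assumption that a click focuses the field before `browser_type` runs is not confirmed by this document.

```typescript
// Bounding box in screenshot pixels, as a vision model might report it (assumed shape).
interface Box {
  x: number;      // left edge
  y: number;      // top edge
  width: number;
  height: number;
}

interface VisionToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Center point of a box, rounded to whole pixels.
function centerOf(box: Box): { x: number; y: number } {
  return {
    x: Math.round(box.x + box.width / 2),
    y: Math.round(box.y + box.height / 2),
  };
}

// Hypothetical planner: click the center of the field, then type into it.
function planClickAndType(box: Box, text: string, submit = false): VisionToolCall[] {
  const { x, y } = centerOf(box);
  return [
    { name: 'browser_click', arguments: { x, y } },
    { name: 'browser_type', arguments: { text, submit } },
  ];
}
```

Each planned call can then be sent to the server in order, taking a fresh `browser_screenshot` between steps if the page layout may have changed.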