@auto-browse/auto-browse

# Auto-Browse: AI Enabled Browser Automation **Auto Browse** is the easiest way to connect your AI agents with the browser using natural language. [![Auto-Browse Launch Video](https://img.youtube.com/vi/VxJg3RRShoY/maxresdefault.jpg)](https://youtu.be/VxJg3RRShoY) 🎥 [Watch the launch video](https://youtu.be/VxJg3RRShoY) ## Quick start An AI-powered browser automation agent for automating browser tasks and Write Playwright tests that enables natural language interactions with web pages. ## Examples Check out our [TypeScript BDD Example Repository](https://github.com/auto-browse/auto-browse-typescript-bdd-example) to see a complete implementation using Auto Browse with BDD testing patterns. ## Installation ```bash npm install @auto-browse/auto-browse ``` ## ⚠️ Important: Playwright Version Requirements > **Note:** Auto Browse currently requires specific versions of Playwright. This requirement will be relaxed in future versions. ### Required Versions ```bash "@playwright/test": "1.52.0-alpha-1743011787000" "playwright": "1.52.0-alpha-1743011787000" ``` ### Version Conflicts If you're using Auto Browse alongside an existing Playwright setup, you must upgrade to these specific versions. Here's how to handle common issues: 1. **Installation Conflicts** ```bash npm install --legacy-peer-deps ``` This flag helps resolve peer dependency conflicts during installation. 2. **Multiple Playwright Versions** - Remove existing Playwright installations - Clear npm cache if needed: `npm cache clean --force` - Reinstall with the required versions 3. **Project Compatibility** - Update your project's Playwright configuration - Ensure your existing tests are compatible with the alpha version - Consider using a separate test environment if needed > 🔄 Future releases will support a wider range of Playwright versions. Subscribe to our GitHub repository for updates. ## Configuration Auto Browse requires environment variables for the LLM (Language Model) configuration. Create a `.env` file in your project root: ```env # OpenAI (default) OPENAI_API_KEY=your_openai_api_key_here LLM_PROVIDER=openai # Optional, defaults to openai AUTOBROWSE_LLM_MODEL=gpt-4o-mini # Optional, defaults to gpt-4o-mini # Google AI GOOGLE_API_KEY=your_google_key_here LLM_PROVIDER=google AUTOBROWSE_LLM_MODEL=gemini-2.0-flash-lite # Azure OpenAI AZURE_OPENAI_API_KEY=your_azure_key_here AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/ AZURE_OPENAI_API_VERSION=2024-12-01-preview AZURE_OPENAI_API_DEPLOYMENT_NAME=your-deployment-name LLM_PROVIDER=azure # Anthropic ANTHROPIC_API_KEY=your_anthropic_key_here LLM_PROVIDER=anthropic AUTOBROWSE_LLM_MODEL=claude-3 # Google Vertex AI GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json LLM_PROVIDER=vertex # Ollama BASE_URL=http://localhost:11434 # Optional, defaults to this value LLM_PROVIDER=ollama AUTOBROWSE_LLM_MODEL=llama3.1 ``` You can find an example configuration in `example.env`. ### Environment Variables | Variable | Description | Default | Required For | | ---------------------------------- | -------------------------------------- | ------------------------ | ------------ | | `LLM_PROVIDER` | LLM provider to use | `openai` | No | | `AUTOBROWSE_LLM_MODEL` | The LLM model to use | `gpt-4o-mini` | No | | `OPENAI_API_KEY` | OpenAI API key | - | OpenAI | | `GOOGLE_API_KEY` | Google AI API key | - | Google AI | | `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | - | Azure | | `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | - | Azure | | `AZURE_OPENAI_API_VERSION` | Azure OpenAI API version | `2024-12-01-preview` | Azure | | `AZURE_OPENAI_API_DEPLOYMENT_NAME` | Azure OpenAI deployment name | - | Azure | | `ANTHROPIC_API_KEY` | Anthropic API key | - | Anthropic | | `GOOGLE_APPLICATION_CREDENTIALS` | Path to Google Vertex credentials file | - | Vertex AI | | `BASE_URL` | Ollama API endpoint | `http://localhost:11434` | No | ## Supported LLM Providers Auto Browse supports multiple LLM providers: - OpenAI (default) - GPT-4 and compatible models - Google AI - Gemini models - Azure OpenAI - GPT models on Azure - Anthropic - Claude models - Google Vertex AI - PaLM and Gemini models - Ollama - Run models locally ## Usage ### Standalone Mode (Without Playwright Test) Auto Browse can also be used outside of Playwright test context. Here's a complete form automation example: ```typescript import { auto } from "@auto-browse/auto-browse"; async function main() { try { // Navigate to the form await auto("go to https://httpbin.org/forms/post"); // Take a snapshot to analyze the page structure await auto("take a snapshot"); // Fill out the form await auto('type "John Doe" in the customer name field'); await auto('select "Large" for size'); await auto('select "Mushroom" for topping'); await auto('check "cheese" in extras'); // Submit the form await auto("click the Order button"); // Take a snapshot of the response page await auto("take a snapshot of the response page"); } catch (error) { console.error("Error:", error); } } // Run the script main().catch(console.error); ``` In standalone mode, Auto Browse automatically: - Manages browser lifecycle - Creates and configures pages - Handles cleanup To run standalone scripts: ```bash npx ts-node your-script.ts ``` ### Playwright Test Mode ```typescript import { test, expect } from "@playwright/test"; import { auto } from "@auto-browse/auto-browse"; test("example test", async ({ page }) => { await page.goto("https://example.com"); // Get text using natural language const headerText = await auto("get the header text", { page }); // Type in an input using natural language await auto('type "Hello World" in the search box', { page }); // Click elements using natural language await auto("click the login button", { page }); }); ``` ### Auto-Detection Mode The package automatically detects the current page context, so you can skip passing the page parameter: ```typescript import { test, expect } from "@playwright/test"; import { auto } from "@auto-browse/auto-browse"; test("simplified example", async ({ page }) => { await page.goto("https://example.com"); // No need to pass page parameter const headerText = await auto("get the header text"); await auto('type "Hello World" in the search box'); await auto("click the login button"); }); ``` ### BDD Mode with Playwright-BDD Auto Browse seamlessly integrates with [playwright-bdd](https://github.com/vitalets/playwright-bdd) for behavior-driven development. This allows you to write expressive feature files and implement steps using natural language commands. #### Example Feature File ```gherkin # features/homepage.feature Feature: Playwright Home Page Scenario: Check title Given navigate to https://playwright.dev When click link "Get started" Then assert title "Installation" ``` #### Step Definitions ```typescript import { auto } from "@auto-browse/auto-browse"; import { Given, When as aistep, Then } from "./fixtures"; // Generic step that handles any natural language action aistep(/^(.*)$/, async ({ page }, action: string) => { await auto(action, { page }); }); ``` #### Setup Requirements 1. Install dependencies: ```bash npm install --save-dev @playwright/test @cucumber/cucumber playwright-bdd ``` 2. Configure `playwright.config.ts`: ```typescript import { PlaywrightTestConfig } from "@playwright/test"; const config: PlaywrightTestConfig = { testDir: "./features", use: { baseURL: "https://playwright.dev" } }; export default config; ``` This integration enables: - Natural language test scenarios - Reusable step definitions - Cucumber reporter integration - Built-in Playwright context management ### Supported Actions 1. **Clicking Elements** ```typescript await auto("click the submit button"); await auto("click the link that says Learn More"); ``` 2. **Typing Text** ```typescript await auto('type "username" in the email field'); await auto('enter "password123" in the password input'); ``` ## Features Core Features: - Natural language commands for browser automation - AI-powered computer and browser agent - Automate any browser task - Automatic page/context detection - TypeScript support - Playwright test integration - Zero configuration required Supported Operations: - Page Navigation (goto URL, back, forward) - Element Interactions (click, type, hover, drag-and-drop) - Form Handling (select options, file uploads, form submission) - Visual Verification (snapshots, screenshots, PDF export) - Keyboard Control (key press, text input) - Wait and Timing Control - Assertions and Validation ## Best Practices 1. **Be Descriptive** ```typescript // Good await auto("click the submit button in the login form"); // Less Clear await auto("click submit"); ``` 2. **Use Quotes for Input Values** ```typescript // Good await auto('type "John Doe" in the name field'); // Not Recommended await auto("type John Doe in the name field"); ``` 3. **Leverage Existing Labels** - Use actual labels and text from your UI in commands - Maintain good accessibility practices in your app for better automation ## Contributing Contributions are welcome! Please feel free to submit a Pull Request. Thanks to Playwright Team for creating Playwright MCP and Playwright BDD. ## License MIT