@ejazullah/smart-browser-automation

A smart AI-driven browser automation library and REST API server using MCP (Model Context Protocol) and LangChain for multi-step task execution. It supports both programmatic library usage and an HTTP API server for remote automation.

github.com/Ejazullah42/smart-browser-automation

# Smart Browser Automation

An AI-driven browser automation library using MCP (Model Context Protocol) and LangChain. It executes complex multi-step browser automation tasks through programmatic library usage.

## 🚀 Features

- **AI-Powered**: Smart task execution using LangChain and LLM integration
- **Multi-step Automation**: Execute complex browser workflows
- **MCP Integration**: Model Context Protocol for advanced browser control
- **Multiple LLM Support**: HuggingFace, Ollama, and extensible architecture
- **Flexible Configuration**: Easy setup and customization options

## 📦 Installation

```bash
npm install @ejazullah/smart-browser-automation
```

## 🏃‍♂️ Quick Start

```javascript
import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

// Configuration
const llmConfig = new HuggingFaceConfig("your_huggingface_token");
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});

try {
  // Initialize
  await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);

  // Execute task
  const result = await automation.executeTask(
    "go to https://duckduckgo.com/ and search for 'AI tools'",
    { verbose: true }
  );

  console.log("Task completed:", result);
} finally {
  // Clean up
  await automation.close();
}
```

## 📚 Examples

### Basic Usage

```javascript
// examples/search-example.js
import { SmartBrowserAutomation, HuggingFaceConfig } from '../index.js';

// Placeholder endpoints; replace with your own MCP server and browser endpoint
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

async function searchExample() {
  const automation = new SmartBrowserAutomation({ maxSteps: 10 });
  const llmConfig = new HuggingFaceConfig("your_token");

  try {
    await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);

    return await automation.executeTask(
      "go to https://example.com and find the contact information"
    );
  } finally {
    // Close the connections even if the task throws
    await automation.close();
  }
}
```

### TypeScript Usage

This package includes TypeScript declarations, so configuration, options, and results are type-checked:

```typescript
import {
  SmartBrowserAutomation,
  HuggingFaceConfig,
  type TaskExecutionOptions,
  type TaskExecutionResult
} from '@ejazullah/smart-browser-automation';

// Placeholder endpoints; replace with your own MCP server and browser endpoint
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

async function typedExample() {
  // Configuration with type checking
  const config = new HuggingFaceConfig('your-api-key');

  const automation = new SmartBrowserAutomation({
    maxSteps: 10,
    temperature: 0.1
  });

  // Options with proper typing
  const options: TaskExecutionOptions = {
    verbose: true,
    onProgress: (update) => {
      console.log(`Step ${update.step}: ${update.message}`);
    }
  };

  await automation.initialize(config, mcpEndpoint, cdpEndpoint);

  // Result with proper typing
  const result: TaskExecutionResult = await automation.executeTask(
    "Navigate to Google and search for TypeScript tutorials",
    options
  );

  console.log(`Completed ${result.steps} steps, success: ${result.success}`);
  await automation.close();
}
```

## 🛠️ Configuration

### LLM Configurations

```javascript
import { HuggingFaceConfig, OllamaConfig } from '@ejazullah/smart-browser-automation';

// HuggingFace
const hfConfig = new HuggingFaceConfig("hf_token", {
  model: "microsoft/DialoGPT-medium",
  temperature: 0.0
});

// Ollama
const ollamaConfig = new OllamaConfig("ollama_endpoint", {
  model: "llama2",
  temperature: 0.1
});
```

## Publishing to NPM

1. **Login to npm:**

   ```bash
   npm login
   ```

2. **Use the publishing script:**

   ```bash
   ./publish.sh
   ```

3. **Or manually:**

   ```bash
   npm version patch  # or minor/major
   npm publish
   ```

## 📈 Use Cases

- **Web Scraping**: Automated data extraction from websites
- **E2E Testing**: End-to-end testing automation
- **Form Automation**: Automated form filling and submission
- **Social Media Management**: Automated posting and interactions
- **Website Monitoring**: Change detection and monitoring
- **Data Entry**: Bulk data processing and entry tasks

## 📖 Documentation

- [Examples](examples/)
- [License](LICENSE)

## 🤝 Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

- **Issues**: [GitHub Issues](https://github.com/Ejazullah42/smart-browser-automation/issues)
- **Documentation**: Check the `examples/` directory

---

Made with ❤️ by [Ejaz Ullah](https://github.com/Ejazullah42)

## Features

- 🤖 **AI-Driven**: Uses language models to understand and execute complex browser tasks
- 🔄 **Multi-Step Execution**: Automatically performs sequences of actions to complete tasks
- 🧠 **Smart Decision Making**: Analyzes page content and decides the next action
- 🔌 **Multiple LLM Support**: Works with Hugging Face, Ollama, OpenAI, and other providers
- 🎯 **Task Completion Detection**: Detects when a task is fully completed
- 📊 **Detailed Logging**: Provides comprehensive execution logs and results

## Installation

```bash
npm install @ejazullah/smart-browser-automation
```

## Quick Start

```javascript
import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

// Configure your LLM
const llmConfig = new HuggingFaceConfig("your-hugging-face-api-key");

// MCP and WebDriver configuration
const mcpEndpoint = 'http://your-mcp-server:8006/mcp';
const driverUrl = 'wss://your-webdriver-endpoint';

// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});

// Initialize and execute task
await automation.initialize(llmConfig, mcpEndpoint, driverUrl);

const result = await automation.executeTask(
  "go to https://example.com and fill out the contact form"
);

console.log(result);
await automation.close();
```

## Configuration Options

### LLM Configurations

#### Hugging Face

```javascript
import { HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

const config = new HuggingFaceConfig(
  "your-api-key",
  "Qwen/Qwen3-Coder-480B-A35B-Instruct" // optional model
);
```

#### Ollama

```javascript
import { OllamaConfig } from '@ejazullah/smart-browser-automation';

const config = new OllamaConfig(
  "http://localhost:11434", // optional base URL
  "llama2" // optional model
);
```

#### OpenAI

```javascript
import { OpenAIConfig } from '@ejazullah/smart-browser-automation';

const config = new OpenAIConfig("your-api-key", "gpt-4");
```

### Automation Options

```javascript
const automation = new SmartBrowserAutomation({
  maxSteps: 15,      // Maximum steps to execute
  temperature: 0.1,  // LLM temperature (0.0 = deterministic)
});
```

## API Reference

### SmartBrowserAutomation

#### Constructor

- `new SmartBrowserAutomation(config)`
  - `config.maxSteps` (number): Maximum execution steps (default: 10)
  - `config.temperature` (number): LLM temperature (default: 0.0)

#### Methods

##### `initialize(llmConfig, mcpEndpoint, driverUrl)`

Initialize the automation system.

- `llmConfig`: LLM configuration object
- `mcpEndpoint`: MCP server endpoint URL
- `driverUrl`: WebDriver WebSocket URL

##### `executeTask(taskDescription, options)`

Execute an automation task.

- `taskDescription` (string): Natural language description of the task
- `options.verbose` (boolean): Enable detailed logging (default: true)
- `options.systemPrompt` (string): Custom system prompt for the AI

Returns:

```javascript
{
  success: boolean,
  steps: number,
  results: Array,
  completed: boolean
}
```

##### `close()`

Clean up and close connections.

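The three methods above form a fixed lifecycle: `initialize`, then `executeTask`, then `close`. The following is a minimal sketch of a wrapper that runs one task through that lifecycle and summarizes the documented result shape. `runTask` and `describeResult` are hypothetical helper names, not exports of this package. The reference lists only the field types of the result, so the reading of `success` versus `completed` used here is an assumption.

```javascript
// Hypothetical helpers (not part of the package): they rely only on the
// method names and result fields listed in the API reference.

/**
 * Summarize a result object of the documented shape.
 * Assumption: `success` means execution did not fail, and `completed`
 * means the task was judged finished before the step limit was reached.
 */
function describeResult(result) {
  const stepCount = Number.isInteger(result.steps) ? result.steps : 0;
  if (!result.success) {
    return `failed after ${stepCount} step(s)`;
  }
  if (!result.completed) {
    return `stopped after ${stepCount} step(s) without finishing the task`;
  }
  return `completed in ${stepCount} step(s)`;
}

/**
 * Run a single task and always release the connections afterwards.
 * `automation` is any object exposing initialize/executeTask/close.
 */
async function runTask(automation, connection, taskDescription, options = {}) {
  const { llmConfig, mcpEndpoint, driverUrl } = connection;
  try {
    await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
    const result = await automation.executeTask(taskDescription, options);
    return { result, summary: describeResult(result) };
  } finally {
    // Runs whether the task resolved or threw; errors still propagate.
    await automation.close();
  }
}
```

With the real library, `automation` would be a `SmartBrowserAutomation` instance and `connection` would carry the same three values passed to `initialize` in the Quick Start. Keeping the helper independent of the import makes it easy to exercise with a stub object in unit tests.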
## Examples

### Search Example

```javascript
const result = await automation.executeTask(
  "go to https://duckduckgo.com/ and search for 'AI tools'"
);
```

### Form Filling Example

```javascript
const result = await automation.executeTask(
  "navigate to the contact page, fill out the form with name 'John Doe' and email 'john@example.com', then submit it"
);
```

### E-commerce Example

```javascript
const result = await automation.executeTask(
  "go to the online store, search for 'laptop', filter by price under $1000, and add the first result to cart"
);
```

## Error Handling

```javascript
try {
  await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
  const result = await automation.executeTask("your task here");

  if (!result.success) {
    console.error("Task failed:", result);
  }
} catch (error) {
  console.error("Automation error:", error);
} finally {
  await automation.close();
}
```

## Requirements

- Node.js 18+
- A running MCP server with browser capabilities
- Access to a WebDriver endpoint
- API key for your chosen LLM provider

## License

MIT

## Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.

## Support

For issues and questions, please visit our [GitHub Issues](https://github.com/Ejazullah42/smart-browser-automation/issues) page.