@ejazullah/smart-browser-automation
An AI-driven browser automation library and REST API server that uses MCP (Model Context Protocol) and LangChain for multi-step task execution. It supports both programmatic library usage and an HTTP API server for remote automation.
# Smart Browser Automation
An AI-driven browser automation library built on MCP (Model Context Protocol) and LangChain. It executes multi-step browser automation tasks, described in natural language, through a programmatic API.
## 🚀 Features
- **AI-Powered**: Smart task execution using LangChain and LLM integration
- **Multi-step Automation**: Execute complex browser workflows
- **MCP Integration**: Model Context Protocol for advanced browser control
- **Multiple LLM Support**: HuggingFace, Ollama, and extensible architecture
- **Flexible Configuration**: Easy setup and customization options
## 📦 Installation
```bash
npm install @ejazullah/smart-browser-automation
```
## 🏃‍♂️ Quick Start
```javascript
import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

// Configuration
const llmConfig = new HuggingFaceConfig("your_huggingface_token");
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});

try {
  // Initialize
  await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);

  // Execute task
  const result = await automation.executeTask(
    "go to https://duckduckgo.com/ and search for 'AI tools'",
    { verbose: true }
  );
  console.log("Task completed:", result);
} finally {
  // Clean up
  await automation.close();
}
```
## 📚 Examples
### Basic Usage
```javascript
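// examples/search-example.js
import { SmartBrowserAutomation, HuggingFaceConfig } from '../index.js';

// Placeholder endpoints; replace with your own
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

async function searchExample() {
  const automation = new SmartBrowserAutomation({ maxSteps: 10 });
  const llmConfig = new HuggingFaceConfig("your_token");
  try {
    await automation.initialize(llmConfig, mcpEndpoint, cdpEndpoint);
    const result = await automation.executeTask(
      "go to https://example.com and find the contact information"
    );
    console.log(result);
  } finally {
    // Always release the browser connection, even if the task throws
    await automation.close();
  }
}

searchExample().catch(console.error);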
```
### TypeScript Usage
This package includes full TypeScript declarations for a better development experience:
```typescript
import {
  SmartBrowserAutomation,
  HuggingFaceConfig,
  type TaskExecutionOptions,
  type TaskExecutionResult
} from '@ejazullah/smart-browser-automation';

// Placeholder endpoints; replace with your own
const mcpEndpoint = 'http://your-mcp-endpoint';
const cdpEndpoint = 'wss://your-cdp-endpoint';

async function typedExample() {
  // Configuration with type checking
  const config = new HuggingFaceConfig('your-api-key');
  const automation = new SmartBrowserAutomation({
    maxSteps: 10,
    temperature: 0.1
  });

  // Options with proper typing
  const options: TaskExecutionOptions = {
    verbose: true,
    onProgress: (update) => {
      console.log(`Step ${update.step}: ${update.message}`);
    }
  };

  await automation.initialize(config, mcpEndpoint, cdpEndpoint);

  // Result with proper typing
  const result: TaskExecutionResult = await automation.executeTask(
    "Navigate to Google and search for TypeScript tutorials",
    options
  );
  console.log(`Completed ${result.steps} steps, success: ${result.success}`);

  await automation.close();
}
```
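The `onProgress` callback above receives updates with `step` and `message` fields. As a minimal sketch, the updates can also be collected for later inspection. `createProgressRecorder` is a hypothetical helper, not part of this package, and it assumes only the two fields used in the example above.

```javascript
// Hypothetical helper (not part of the library): builds an onProgress
// callback that logs each update and also keeps it in an array.
function createProgressRecorder(log = console.log) {
  const updates = [];
  const onProgress = (update) => {
    // Only `step` and `message` are assumed to exist on the update object.
    updates.push({ step: update.step, message: update.message, at: Date.now() });
    log(`Step ${update.step}: ${update.message}`);
  };
  return { onProgress, updates };
}

// Usage (with an initialized automation instance):
//   const recorder = createProgressRecorder();
//   await automation.executeTask("your task here", { onProgress: recorder.onProgress });
//   console.log(recorder.updates.length, "progress updates received");
```

Keeping the recorder separate from the task call makes it easy to swap the logger or inspect the updates in tests.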
## 🛠️ Configuration
### LLM Configurations
```javascript
import { HuggingFaceConfig, OllamaConfig } from '@ejazullah/smart-browser-automation';

// HuggingFace
const hfConfig = new HuggingFaceConfig("hf_token", {
  model: "microsoft/DialoGPT-medium",
  temperature: 0.0
});

// Ollama
const ollamaConfig = new OllamaConfig("ollama_endpoint", {
  model: "llama2",
  temperature: 0.1
});
```
## Publishing to NPM
1. **Log in to npm:**
   ```bash
   npm login
   ```
2. **Use the publishing script:**
   ```bash
   ./publish.sh
   ```
3. **Or publish manually:**
   ```bash
   npm version patch  # or minor/major
   npm publish
   ```
## 📈 Use Cases
- **Web Scraping**: Automated data extraction from websites
- **E2E Testing**: End-to-end testing automation
- **Form Automation**: Automated form filling and submission
- **Social Media Management**: Automated posting and interactions
- **Website Monitoring**: Change detection and monitoring
- **Data Entry**: Bulk data processing and entry tasks
## 📖 Documentation
- [Examples](examples/)
- [License](LICENSE)
## 🤝 Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🆘 Support
- **Issues**: [GitHub Issues](https://github.com/Ejazullah42/smart-browser-automation/issues)
- **Documentation**: Check the `examples/` directory
---
Made with ❤️ by [Ejaz Ullah](https://github.com/Ejazullah42)
## Features
- 🤖 **AI-Driven**: Uses advanced language models to understand and execute complex browser tasks
- 🔄 **Multi-Step Execution**: Automatically performs sequences of actions to complete tasks
- 🧠 **Smart Decision Making**: Analyzes page content and decides next actions intelligently
- 🔌 **Multiple LLM Support**: Works with Hugging Face, Ollama, OpenAI, and other providers
- 🎯 **Task Completion Detection**: Knows when a task is fully completed
- 📊 **Detailed Logging**: Provides comprehensive execution logs and results
## Quick Start
```javascript
import { SmartBrowserAutomation, HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

// Configure your LLM
const llmConfig = new HuggingFaceConfig("your-hugging-face-api-key");

// MCP and WebDriver configuration
const mcpEndpoint = 'http://your-mcp-server:8006/mcp';
const driverUrl = 'wss://your-webdriver-endpoint';

// Create automation instance
const automation = new SmartBrowserAutomation({
  maxSteps: 10,
  temperature: 0.0
});

// Initialize and execute task
await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
const result = await automation.executeTask(
  "go to https://example.com and fill out the contact form"
);
console.log(result);
await automation.close();
```
## Configuration Options
### LLM Configurations
#### Hugging Face
```javascript
import { HuggingFaceConfig } from '@ejazullah/smart-browser-automation';

const config = new HuggingFaceConfig(
  "your-api-key",
  "Qwen/Qwen3-Coder-480B-A35B-Instruct" // optional model
);
```
#### Ollama
```javascript
import { OllamaConfig } from '@ejazullah/smart-browser-automation';

const config = new OllamaConfig(
  "http://localhost:11434", // optional base URL
  "llama2"                  // optional model
);
```
#### OpenAI
```javascript
import { OpenAIConfig } from '@ejazullah/smart-browser-automation';
const config = new OpenAIConfig("your-api-key", "gpt-4");
```
### Automation Options
```javascript
const automation = new SmartBrowserAutomation({
  maxSteps: 15,      // Maximum steps to execute
  temperature: 0.1,  // LLM temperature (0.0 = deterministic)
});
```
## API Reference
### SmartBrowserAutomation
#### Constructor
- `new SmartBrowserAutomation(config)`
  - `config.maxSteps` (number): Maximum execution steps (default: 10)
  - `config.temperature` (number): LLM temperature (default: 0.0)
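As a minimal sketch, the documented defaults can be applied and checked before the options reach the constructor. `normalizeAutomationOptions` is a hypothetical helper, not part of the library, and the validation rules (positive integer steps, non-negative temperature) are assumptions rather than documented constraints.

```javascript
// Hypothetical helper (not part of the library's API).
// Defaults mirror the documented ones: maxSteps 10, temperature 0.0.
function normalizeAutomationOptions(options = {}) {
  const { maxSteps = 10, temperature = 0.0 } = options;

  // Assumed rule: a step budget only makes sense as a positive integer.
  if (!Number.isInteger(maxSteps) || maxSteps < 1) {
    throw new RangeError(`maxSteps must be a positive integer, got ${maxSteps}`);
  }
  // Assumed rule: temperature is a finite, non-negative number.
  if (typeof temperature !== 'number' || !Number.isFinite(temperature) || temperature < 0) {
    throw new RangeError(`temperature must be a non-negative number, got ${temperature}`);
  }
  return { maxSteps, temperature };
}

// Usage:
//   const automation = new SmartBrowserAutomation(normalizeAutomationOptions({ maxSteps: 15 }));
```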
#### Methods
##### `initialize(llmConfig, mcpEndpoint, driverUrl)`
Initialize the automation system.
- `llmConfig`: LLM configuration object
- `mcpEndpoint`: MCP server endpoint URL
- `driverUrl`: WebDriver WebSocket URL
##### `executeTask(taskDescription, options)`
Execute an automation task.
- `taskDescription` (string): Natural language description of the task
- `options.verbose` (boolean): Enable detailed logging (default: true)
- `options.systemPrompt` (string): Custom system prompt for the AI
Returns:
```javascript
{
  success: boolean,
  steps: number,
  results: Array,
  completed: boolean
}
```
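A small sketch of how the returned fields might be turned into a single log line. `summarizeResult` is a hypothetical helper, not part of the library. It relies only on the four fields listed above and makes no assumption about what each entry of `results` contains.

```javascript
// Hypothetical helper (not part of the library): formats the documented
// result fields into one human-readable line.
function summarizeResult(result) {
  const status = result.success ? 'succeeded' : 'failed';
  const finished = result.completed ? 'task completed' : 'task not completed';
  // `results` is documented as an Array; its element shape is not specified.
  const resultCount = Array.isArray(result.results) ? result.results.length : 0;
  return `${status} after ${result.steps} step(s), ${finished}, ${resultCount} recorded result(s)`;
}

// Usage:
//   const result = await automation.executeTask("your task here");
//   console.log(summarizeResult(result));
```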
##### `close()`
Clean up and close connections.
## Examples
### Search Example
```javascript
const result = await automation.executeTask(
  "go to https://duckduckgo.com/ and search for 'AI tools'"
);
```
### Form Filling Example
```javascript
const result = await automation.executeTask(
  "navigate to the contact page, fill out the form with name 'John Doe' and email 'john@example.com', then submit it"
);
```
### E-commerce Example
```javascript
const result = await automation.executeTask(
  "go to the online store, search for 'laptop', filter by price under $1000, and add the first result to cart"
);
```
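Tasks like the ones above can also be run back to back on one instance. The sketch below assumes the browser session carries over between `executeTask` calls, which this README does not confirm. `runTasksInOrder` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of the library): runs tasks one at a time
// on an already-initialized automation instance and stops at the first
// task whose result reports success === false.
async function runTasksInOrder(automation, tasks, options = {}) {
  const outcomes = [];
  for (const task of tasks) {
    const result = await automation.executeTask(task, options);
    outcomes.push({ task, result });
    if (!result.success) break; // later tasks may depend on earlier ones
  }
  return outcomes;
}

// Usage (after automation.initialize(...)):
//   const outcomes = await runTasksInOrder(automation, [
//     "go to https://duckduckgo.com/ and search for 'AI tools'",
//     "open the first search result",
//   ]);
```

Stopping at the first failure is a design choice for dependent workflows; for independent tasks, removing the `break` would run all of them.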
## Error Handling
```javascript
try {
  await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
  const result = await automation.executeTask("your task here");

  if (!result.success) {
    console.error("Task failed:", result);
  }
} catch (error) {
  console.error("Automation error:", error);
} finally {
  await automation.close();
}
```
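The same pattern can be wrapped in a reusable function so callers get one outcome object instead of handling both thrown errors and `success: false`. This is a sketch under assumptions: `runTaskSafely` and the `{ ok, result, error }` shape are hypothetical and not part of the library. Only `initialize`, `executeTask`, and `close` come from the API described above.

```javascript
// Hypothetical helper (not part of the library): initializes, runs one
// task, and always calls close(), returning a single outcome object.
async function runTaskSafely(automation, endpoints, task, options = {}) {
  const { llmConfig, mcpEndpoint, driverUrl } = endpoints;
  try {
    await automation.initialize(llmConfig, mcpEndpoint, driverUrl);
    const result = await automation.executeTask(task, options);
    return { ok: Boolean(result.success), result, error: null };
  } catch (error) {
    // Thrown errors (connection problems, LLM failures, ...) end up here.
    return { ok: false, result: null, error };
  } finally {
    try {
      await automation.close();
    } catch (closeError) {
      // Log instead of rethrowing so a close() failure does not hide the outcome.
      console.error("close() failed:", closeError);
    }
  }
}

// Usage:
//   const outcome = await runTaskSafely(
//     automation, { llmConfig, mcpEndpoint, driverUrl }, "your task here");
//   if (!outcome.ok) console.error("Task failed:", outcome.error ?? outcome.result);
```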
## Requirements
- Node.js 18+
- A running MCP server with browser capabilities
- Access to a WebDriver endpoint
- API key for your chosen LLM provider
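The requirements above can be checked up front with a small preflight function. This is a sketch, not part of the library: `checkRequirements` is a hypothetical helper, and the URL scheme expectations (`http`/`https` for MCP, `ws`/`wss` for the driver) are inferred from the placeholder endpoints in this README. It does not test whether the servers are actually reachable.

```javascript
// Hypothetical preflight check (not part of the library). Returns a list
// of human-readable problems; an empty list means nothing obvious is wrong.
function checkRequirements({ mcpEndpoint, driverUrl, apiKey }, nodeVersion = process.versions.node) {
  const problems = [];

  // Node.js 18+ is listed as a requirement.
  const major = Number.parseInt(String(nodeVersion).split('.')[0], 10);
  if (!(major >= 18)) problems.push(`Node.js 18+ required, found ${nodeVersion}`);

  // Assumed URL schemes, based on the example endpoints in this README.
  const expectProtocol = (label, value, allowed) => {
    try {
      const { protocol } = new URL(value);
      if (!allowed.includes(protocol)) {
        problems.push(`${label} should use ${allowed.join(' or ')}, got ${protocol}`);
      }
    } catch {
      problems.push(`${label} is not a valid URL: ${value}`);
    }
  };
  expectProtocol('mcpEndpoint', mcpEndpoint, ['http:', 'https:']);
  expectProtocol('driverUrl', driverUrl, ['ws:', 'wss:']);

  // Assumption: the chosen provider needs a key; a local provider may not.
  if (!apiKey) problems.push('LLM API key is missing');
  return problems;
}

// Usage:
//   const problems = checkRequirements({ mcpEndpoint, driverUrl, apiKey: "your-api-key" });
//   if (problems.length > 0) throw new Error(problems.join("\n"));
```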
## License
MIT
## Contributing
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.
## Support
For issues and questions, please visit our [GitHub Issues](https://github.com/Ejazullah42/smart-browser-automation/issues) page.