@mements/gx

# GX - Galaxy Browser Automation 🌌

Universal CLI and library for browser automation and file editing.

A revolutionary browser automation system that bypasses Content Security Policy (CSP) restrictions without using `eval()` or the `Function` constructor. Features both programmatic API and interactive CLI modes.

## 🚀 Key Features

- **🔒 CSP-Safe**: Works with strict Content Security Policies
- **⚡ Synchronous API**: Get immediate results, not promises
- **🎯 30+ Methods**: Complete DOM manipulation, navigation, and form handling
- **🔄 Real-time Communication**: SSE-based browser extension communication
- **📝 Self-Documenting**: Built-in API documentation endpoint
- **🛡️ Secure**: No code injection, extension-only operation

## 🏗️ Architecture

```
┌─────────────────┐    HTTP POST    ┌─────────────────┐    SSE/Results     ┌─────────────────┐
│   Client/User   │ ──────────────► │   Server API    │ ◄────────────────► │  Browser Ext.   │
│                 │ ◄────────────── │   (Port 3113)   │                    │ (Content Script)│
└─────────────────┘  JSON Response  └─────────────────┘                    └─────────────────┘
                                             │                                      │
                                             ▼                                      ▼
                                    ┌─────────────────┐                    ┌─────────────────┐
                                    │ Pending Requests│                    │   DOM Methods   │
                                    │   Map (Sync)    │                    │  (30+ Methods)  │
                                    └─────────────────┘                    └─────────────────┘
```

## 🚦 Quick Start

### Installation

#### Global CLI Installation

```bash
npm install -g gx
# or
bun install -g gx
```

#### Programmatic Usage

```bash
npm install gx
# or
bun add gx
```

### Usage

#### Interactive Mode

```bash
gx --interactive
# or
gx -i
```

#### Server Mode

```bash
gx --server
# or
gx --server --port 3114
```

#### Single Commands

```bash
# Get browser tabs
gx tabs

# Navigate to URL
gx navigate-to "https://example.com"

# Click element with targeting
gx click "button" '{"tabId": 1325943024}'

# Type text
gx type "input[name='search']" "Hello World"
```

#### Programmatic API

```typescript
// Simple usage (default import)
import GalaxyAgent from "gx";

const agent = new GalaxyAgent('http://localhost:3113');
await agent.navigateTo('https://example.com');
await agent.click('.submit-btn');

// With options (named import). This is an alternative to the simple form
// above: use one or the other in a given module, not both.
import { GalaxyAgent } from "gx";

const agent = new GalaxyAgent({
  apiBase: 'http://localhost:3113',
  verbose: true,
  timeout: 15000
});

// Get browser tabs
const tabs = await agent.getTabs();

// Target specific tab and interact
const target = agent.targetById(tabs.tabs[0].id);
await agent.click('button', target);
await agent.type('input', 'Hello World', target);

// Open new browser window
await agent.open('http://localhost:3001');
```

### Browser Extension Setup

1. Open Chrome → `chrome://extensions/`
2. Enable "Developer mode"
3. Click "Load unpacked"
4. Select the `extension/` folder from this repository
5. Navigate to your target page

## 📚 API Methods

### DOM Manipulation

- `getElementById(id)` - Find element by ID
- `querySelector(selector)` - Find first matching element
- `querySelectorAll(selector)` - Find all matching elements
- `getElementText(selector)` - Get element text content
- `getElementAttribute(selector, attribute)` - Get attribute value

### Form Interaction

- `click(selector)` - Click element
- `type(selector, text)` - Type text into input
- `setValue(selector, value)` - Set input value
- `submit(selector)` - Submit form
- `check(selector)` - Check checkbox/radio
- `focus(selector)` - Focus element

### Navigation

- `navigateTo(url)` - Navigate to URL
- `goBack()` - Browser back button
- `goForward()` - Browser forward button
- `reload()` - Reload current page
- `getCurrentUrl()` - Get current URL

### Scrolling

- `scrollTo(x, y)` - Scroll to coordinates
- `scrollToElement(selector)` - Scroll element into view
- `scrollToTop()` - Scroll to page top
- `scrollToBottom()` - Scroll to page bottom

### Utility

- `showAlert(message)` - Show alert dialog
- `showConfirm(message)` - Show confirmation dialog
- `wait(milliseconds)` - Wait for specified time
- `waitForElement(selector, timeout)` - Wait for element to appear

[**View Complete API Documentation**](http://localhost:3113/help) (when server is running)

## 💡 Usage Examples

### Form Automation

```bash
# Fill login form
curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "type", "params": ["input[name=\"username\"]", "myuser"]}'

curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "type", "params": ["input[type=\"password\"]", "mypass"]}'

curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "click", "params": ["button[type=\"submit\"]"]}'
```

### Page Content Extraction

```bash
# Get clean page text
curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "getElementText", "params": ["main, .content, [role=\"main\"]"]}'

# Get page information
curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "getPageInfo", "params": []}'
```

### Advanced Workflows

```bash
# Search and navigate
curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "type", "params": ["input[type=\"search\"]", "my query"]}'

curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "submit", "params": ["form"]}'

curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "waitForElement", "params": [".search-results", 5000]}'
```

## 📋 Response Format

### Successful Response

```json
{
  "success": true,
  "result": "actual_data_here",
  "executionId": "unique_id"
}
```

### Error Response

```json
{
  "success": false,
  "error": "Error description",
  "executionId": "unique_id"
}
```

### Timeout Response

```json
{
  "success": false,
  "error": "API call timeout after 10000ms",
  "executionId": "unique_id"
}
```

## 🗂️ Project Structure

```
galaxy-claude/
├── README.md                    # This file
├── E2E_TESTING_GUIDE.md         # Complete testing guide
├── BROWSER_AUTOMATION_GUIDE.md  # Detailed API reference
├── .config.toml                 # Server configuration
├── server/
│   ├── index.ts                 # Main server with synchronous API
│   ├── package.json             # Server dependencies
│   └── logs/                    # API execution logs
└── extension/
    ├── manifest.json            # Chrome extension manifest
    ├── background.js            # Extension background script
    └── content.js               # Content script with API methods
```

## 🔧 Technical Details

### How It Works

1. **REST API Endpoint**: `/api/execute` accepts method calls via POST
2. **SSE Communication**: Server broadcasts commands to browser extension
3. **Synchronous Waiting**: Server waits for responses using execution ID matching
4. **Predefined Methods**: No eval() - all methods are pre-implemented functions
5. **Timeout Handling**: 10-second default timeout with error responses

### CSP Bypass Strategy

- **No Code Execution**: All methods are predefined, not dynamically generated
- **Extension Context**: Operations run in extension context, not page context
- **No Injection**: No script injection into target pages
- **Safe Communication**: Uses Chrome extension APIs and postMessage

### Performance

- **Response Times**: 50-300ms for most operations
- **Timeout Threshold**: 10 seconds (configurable)
- **Concurrent Support**: Multiple simultaneous API calls
- **Resource Efficient**: Minimal memory and CPU usage

## 🚨 Troubleshooting

### Common Issues

**Timeout Errors**

```bash
# Check server status
curl http://localhost:3113/status
# Expected: {"connectedClients": 1, ...}
# If 0 clients, reload browser extension
```

**Extension Not Loading**

1. Go to `chrome://extensions/`
2. Click the reload button on the extension
3. Hard refresh the target page (Ctrl+F5)

**Port Issues**

```bash
# Kill existing processes
lsof -ti:3113 | xargs kill -9

# Start server (first time)
bgr --name server --directory . --command "cd server && bun run index.ts"

# Restart server
bgr server --restart
```

**JavaScript Errors**

- Clear browser cache
- Reload extension
- Check browser console for specific errors

## 🧪 Testing

### Quick Verification

```bash
# Test connection
curl http://localhost:3113/status

# Test basic functionality
curl -X POST http://localhost:3113/api/execute \
  -H "Content-Type: application/json" \
  -d '{"method": "showAlert", "params": ["API Working!"]}'
```

### Complete Test Suite

See [E2E_TESTING_GUIDE.md](E2E_TESTING_GUIDE.md) for comprehensive testing instructions including:

- 9 test categories with 30+ test cases
- Real-world workflow examples
- Performance benchmarks
- Troubleshooting steps

## 🔐 Security Features

- ✅ **CSP Compliant**: No eval(), Function(), or dynamic code execution
- ✅ **Local Only**: Server runs on localhost, no external connections
- ✅ **Extension Sandboxed**: All operations run in Chrome extension context
- ✅ **No Code Injection**: No scripts injected into target pages
- ✅ **Controlled Access**: Only predefined API methods available
- ✅ **Request Validation**: All inputs validated against schemas

## 🎯 Use Cases

### Web Testing & Automation

- E2E testing of web applications
- Form submission automation
- Content extraction and validation
- Navigation flow testing

### Data Collection

- Automated content scraping
- Form data collection
- Page information extraction
- Dynamic content monitoring

### User Experience Testing

- Automated user workflows
- Performance timing measurement
- Accessibility testing
- Cross-browser compatibility

### Development & Debugging

- Automated testing during development
- Debug assistance for complex forms
- Content validation automation
- Integration testing

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Implement changes with tests
4. Update documentation
5. Submit a pull request

## 📄 License

MIT License - see LICENSE file for details

## 🙋‍♂️ Support

- **Documentation**: Check [BROWSER_AUTOMATION_GUIDE.md](BROWSER_AUTOMATION_GUIDE.md)
- **API Reference**: Visit http://localhost:3113/help
- **Testing Guide**: See [E2E_TESTING_GUIDE.md](E2E_TESTING_GUIDE.md)
- **Issues**: Create an issue in the repository

## 🌟 Why CSP-Safe Browser Automation?

Traditional browser automation tools often fail with modern web applications that use strict Content Security Policies (CSP). This project solves that problem by:

1. **No Dynamic Code Execution**: Uses predefined functions instead of eval()
2. **Extension-Based Architecture**: Leverages Chrome's trusted extension context
3. **Synchronous API**: Provides immediate results instead of complex async handling
4. **Production Ready**: Works with real-world CSP-protected applications

Perfect for modern web applications, SaaS platforms, and enterprise environments where CSP restrictions prevent traditional automation tools from working.

---

**Built with ❤️ for developers who need reliable browser automation that actually works with modern security policies.**
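
The `/api/execute` endpoint and the response shapes documented above can also be called from TypeScript without the curl commands. The following is a minimal sketch, not the package's own client: `buildExecuteBody`, `unwrapResult`, and `execute` are hypothetical names introduced for illustration, and it assumes a runtime with a global `fetch` (for example Node 18+ or Bun) and a server on the default port 3113.

```typescript
// Response shapes as documented in the Response Format section.
type ExecuteSuccess = { success: true; result: unknown; executionId: string };
type ExecuteFailure = { success: false; error: string; executionId: string };
type ExecuteResponse = ExecuteSuccess | ExecuteFailure;

// Hypothetical helper: builds the JSON body that /api/execute accepts.
function buildExecuteBody(method: string, params: unknown[] = []): string {
  return JSON.stringify({ method, params });
}

// Hypothetical helper: returns the result on success, or throws with the
// server's error text (which also covers the documented timeout response).
function unwrapResult(response: ExecuteResponse): unknown {
  if (response.success === false) {
    throw new Error(`${response.error} (executionId: ${response.executionId})`);
  }
  return response.result;
}

// Hypothetical wrapper around the documented endpoint.
// Assumption: apiBase points at a running server with a connected extension.
async function execute(
  method: string,
  params: unknown[] = [],
  apiBase: string = "http://localhost:3113",
): Promise<unknown> {
  const res = await fetch(`${apiBase}/api/execute`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildExecuteBody(method, params),
  });
  return unwrapResult((await res.json()) as ExecuteResponse);
}
```

With a server running, `await execute("click", ["button[type=\"submit\"]"])` would mirror the third curl call in the Form Automation example. Throwing on `success: false` keeps timeout and error handling in one place for the caller.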