@mements/gx
Galaxy Browser Automation CLI and Library
# GX - Galaxy Browser Automation
Universal CLI and library for browser automation and file editing. GX automates pages protected by a strict Content Security Policy (CSP) without using `eval()` or the `Function` constructor. It offers both a programmatic API and an interactive CLI mode.
## Key Features
- **CSP-Safe**: Works with strict Content Security Policies
- **Synchronous HTTP API**: Each request returns its result in the response, with no polling or callbacks
- **30+ Methods**: Complete DOM manipulation, navigation, and form handling
- **Real-time Communication**: SSE-based browser extension communication
- **Self-Documenting**: Built-in API documentation endpoint
- **Secure**: No code injection, extension-only operation
## Architecture
```
┌─────────────────┐    HTTP POST    ┌─────────────────┐   SSE/Results    ┌─────────────────┐
│   Client/User   │ ──────────────► │   Server API    │ ◄──────────────► │  Browser Ext.   │
│                 │ ◄────────────── │   (Port 3113)   │                  │ (Content Script)│
└─────────────────┘  JSON Response  └─────────────────┘                  └─────────────────┘
                                             │                                    │
                                             │                                    │
                                             ▼                                    ▼
                                    ┌─────────────────┐                  ┌─────────────────┐
                                    │ Pending Requests│                  │   DOM Methods   │
                                    │   Map (Sync)    │                  │  (30+ Methods)  │
                                    └─────────────────┘                  └─────────────────┘
```
## Quick Start
### Installation
#### Global CLI Installation
```bash
npm install -g gx
# or
bun install -g gx
```
#### Programmatic Usage
```bash
npm install gx
# or
bun add gx
```
### Usage
#### Interactive Mode
```bash
gx --interactive
# or
gx -i
```
#### Server Mode
```bash
gx --server
# or
gx --server --port 3114
```
#### Single Commands
```bash
# Get browser tabs
gx tabs
# Navigate to URL
gx navigate-to "https://example.com"
# Click element with targeting
gx click "button" '{"tabId": 1325943024}'
# Type text
gx type "input[name='search']" "Hello World"
```
#### Programmatic API
```typescript
import GalaxyAgent from "gx"; // or: import { GalaxyAgent } from "gx";

// Simple usage: pass the server URL directly
const simpleAgent = new GalaxyAgent('http://localhost:3113');
await simpleAgent.navigateTo('https://example.com');
await simpleAgent.click('.submit-btn');

// With options
const agent = new GalaxyAgent({
  apiBase: 'http://localhost:3113',
  verbose: true,
  timeout: 15000
});

// Get browser tabs
const tabs = await agent.getTabs();

// Target specific tab and interact
const target = agent.targetById(tabs.tabs[0].id);
await agent.click('button', target);
await agent.type('input', 'Hello World', target);

// Open new browser window
await agent.open('http://localhost:3001');
```
### Browser Extension Setup
1. In Chrome, open `chrome://extensions/`
2. Enable "Developer mode"
3. Click "Load unpacked"
4. Select the `extension/` folder from this repository
5. Navigate to your target page
## API Methods
### DOM Manipulation
- `getElementById(id)` - Find element by ID
- `querySelector(selector)` - Find first matching element
- `querySelectorAll(selector)` - Find all matching elements
- `getElementText(selector)` - Get element text content
- `getElementAttribute(selector, attribute)` - Get attribute value
### Form Interaction
- `click(selector)` - Click element
- `type(selector, text)` - Type text into input
- `setValue(selector, value)` - Set input value
- `submit(selector)` - Submit form
- `check(selector)` - Check checkbox/radio
- `focus(selector)` - Focus element
### Navigation
- `navigateTo(url)` - Navigate to URL
- `goBack()` - Browser back button
- `goForward()` - Browser forward button
- `reload()` - Reload current page
- `getCurrentUrl()` - Get current URL
### Scrolling
- `scrollTo(x, y)` - Scroll to coordinates
- `scrollToElement(selector)` - Scroll element into view
- `scrollToTop()` - Scroll to page top
- `scrollToBottom()` - Scroll to page bottom
### Utility
- `showAlert(message)` - Show alert dialog
- `showConfirm(message)` - Show confirmation dialog
- `wait(milliseconds)` - Wait for specified time
- `waitForElement(selector, timeout)` - Wait for element to appear
[**View Complete API Documentation**](http://localhost:3113/help) (when server is running)
## Usage Examples
### Form Automation
```bash
# Fill login form
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "type", "params": ["input[name=\"username\"]", "myuser"]}'
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "type", "params": ["input[type=\"password\"]", "mypass"]}'
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "click", "params": ["button[type=\"submit\"]"]}'
```
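The same sequence can be expressed in TypeScript. The sketch below is a hypothetical helper, not part of the `gx` package: `ExecuteCall`, `HttpRequest`, and `buildExecuteRequest` are names invented here. It only builds the request data that the curl commands send to `/api/execute`, assuming the default port 3113.

```typescript
// Hypothetical helper, not part of the gx package: builds the same
// POST /api/execute requests that the curl commands above send.
interface ExecuteCall {
  method: string;    // name of a predefined API method, e.g. "type"
  params: unknown[]; // positional arguments for that method
}

interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildExecuteRequest(
  call: ExecuteCall,
  apiBase: string = "http://localhost:3113"
): HttpRequest {
  return {
    url: `${apiBase}/api/execute`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(call),
    },
  };
}

// The login sequence from the curl example, expressed as data.
const loginSteps: ExecuteCall[] = [
  { method: "type", params: ['input[name="username"]', "myuser"] },
  { method: "type", params: ['input[type="password"]', "mypass"] },
  { method: "click", params: ['button[type="submit"]'] },
];

const loginRequests = loginSteps.map((step) => buildExecuteRequest(step));
```

Each entry can then be sent with `fetch(request.url, request.init)`. Awaiting each response before sending the next keeps the steps in order.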
### Page Content Extraction
```bash
# Get clean page text
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "getElementText", "params": ["main, .content, [role=\"main\"]"]}'
# Get page information
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "getPageInfo", "params": []}'
```
### Advanced Workflows
```bash
# Search and navigate
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "type", "params": ["input[type=\"search\"]", "my query"]}'
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "submit", "params": ["form"]}'
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "waitForElement", "params": [".search-results", 5000]}'
```
## Response Format
### Successful Response
```json
{
  "success": true,
  "result": "actual_data_here",
  "executionId": "unique_id"
}
```
### Error Response
```json
{
  "success": false,
  "error": "Error description",
  "executionId": "unique_id"
}
```
### Timeout Response
```json
{
  "success": false,
  "error": "API call timeout after 10000ms",
  "executionId": "unique_id"
}
```
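A client can model these three shapes as a discriminated union on `success`. The following is a minimal sketch based only on the JSON examples above. The type and function names (`ExecuteResponse`, `unwrap`, `isTimeout`) are illustrative and are not exported by `gx`, and matching timeouts on the word "timeout" in the error text is an assumption drawn from the sample message.

```typescript
// Response shapes taken from the JSON examples above; the names are
// illustrative and not part of the gx package.
interface ExecuteSuccess {
  success: true;
  result: unknown;
  executionId: string;
}

interface ExecuteFailure {
  success: false;
  error: string;
  executionId: string;
}

type ExecuteResponse = ExecuteSuccess | ExecuteFailure;

// Returns the result on success; throws with the server's message otherwise.
function unwrap(response: ExecuteResponse): unknown {
  if (response.success === true) {
    return response.result;
  }
  throw new Error(`[${response.executionId}] ${response.error}`);
}

// Timeouts arrive as ordinary error responses, so a caller that wants to
// retry has to inspect the message text (assumed to contain "timeout").
function isTimeout(response: ExecuteResponse): boolean {
  return response.success === false && response.error.includes("timeout");
}
```

Checking `success` first lets the compiler narrow the type, so `result` and `error` are only accessible on the branch where they exist.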
## Project Structure
```
galaxy-claude/
├── README.md                    # This file
├── E2E_TESTING_GUIDE.md         # Complete testing guide
├── BROWSER_AUTOMATION_GUIDE.md  # Detailed API reference
├── .config.toml                 # Server configuration
├── server/
│   ├── index.ts                 # Main server with synchronous API
│   ├── package.json             # Server dependencies
│   └── logs/                    # API execution logs
└── extension/
    ├── manifest.json            # Chrome extension manifest
    ├── background.js            # Extension background script
    └── content.js               # Content script with API methods
```
## Technical Details
### How It Works
1. **REST API Endpoint**: `/api/execute` accepts method calls via POST
2. **SSE Communication**: Server broadcasts commands to browser extension
3. **Synchronous Waiting**: Server waits for responses using execution ID matching
4. **Predefined Methods**: No eval() - all methods are pre-implemented functions
5. **Timeout Handling**: 10-second default timeout with error responses
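Steps 3 and 5 can be pictured as a map from execution ID to a pending promise. The sketch below is an illustration of that idea, not the code in `server/index.ts`. The class name `PendingRequests` and its methods are invented here, and the timeout message mirrors the sample timeout response.

```typescript
// Illustrative sketch only; the real server may be organised differently.
interface ExecutionResult {
  success: boolean;
  result?: unknown;
  error?: string;
  executionId: string;
}

interface PendingEntry {
  resolve: (result: ExecutionResult) => void;
  timer: ReturnType<typeof setTimeout>;
}

class PendingRequests {
  private pending = new Map<string, PendingEntry>();

  // Called when an HTTP request arrives. The handler awaits the returned
  // promise before writing its JSON response.
  register(executionId: string, timeoutMs: number = 10000): Promise<ExecutionResult> {
    return new Promise<ExecutionResult>((resolve) => {
      const timer = setTimeout(() => {
        this.pending.delete(executionId);
        resolve({
          success: false,
          error: `API call timeout after ${timeoutMs}ms`,
          executionId,
        });
      }, timeoutMs);
      this.pending.set(executionId, { resolve, timer });
    });
  }

  // Called when the extension reports a result. Returns false when no
  // request is waiting for this ID (for example, it already timed out).
  settle(result: ExecutionResult): boolean {
    const entry = this.pending.get(result.executionId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(result.executionId);
    entry.resolve(result);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Because every entry is keyed by its own execution ID, several HTTP calls can wait at once without their results being mixed up.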
### CSP Bypass Strategy
- **No Code Execution**: All methods are predefined, not dynamically generated
- **Extension Context**: Operations run in extension context, not page context
- **No Injection**: No script injection into target pages
- **Safe Communication**: Uses Chrome extension APIs and postMessage
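The "predefined methods" idea amounts to looking a method name up in a fixed table instead of evaluating code. Below is a minimal sketch of such a dispatcher. It is an assumption about the general shape, not the contents of `content.js`, and the two handlers are stubs that return canned values rather than touching a real page.

```typescript
// Illustrative sketch only: a fixed handler table stands in for eval().
// Both handlers are stubs; in the extension they would call real DOM or
// navigation APIs.
type Handler = (...params: unknown[]) => unknown;

const handlers: Record<string, Handler> = {
  getCurrentUrl: () => "https://example.com/",
  getElementText: (selector) => `text of ${String(selector)}`,
};

interface Command {
  executionId: string;
  method: string;
  params: unknown[];
}

interface CommandResult {
  success: boolean;
  result?: unknown;
  error?: string;
  executionId: string;
}

function dispatch(command: Command): CommandResult {
  const { executionId, method, params } = command;
  // Only own keys of the table are callable, so inherited names such as
  // "constructor" are rejected like any other unknown method.
  if (!Object.prototype.hasOwnProperty.call(handlers, method)) {
    return { success: false, error: `Unknown method: ${method}`, executionId };
  }
  try {
    return { success: true, result: handlers[method](...params), executionId };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { success: false, error: message, executionId };
  }
}
```

Since the incoming message carries only a method name and arguments, nothing in it is ever parsed or run as JavaScript, which is what keeps the approach compatible with a strict CSP.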
### Performance
- **Response Times**: 50-300ms for most operations
- **Timeout Threshold**: 10 seconds (configurable)
- **Concurrent Support**: Multiple simultaneous API calls
- **Resource Efficient**: Minimal memory and CPU usage
## Troubleshooting
### Common Issues
**Timeout Errors**
```bash
# Check server status
curl http://localhost:3113/status
# Expected: {"connectedClients": 1, ...}
# If 0 clients, reload browser extension
```
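The same check can be scripted. This sketch assumes only what the expected output above shows, namely that `/status` returns JSON with a numeric `connectedClients` field. `parseStatus` and `diagnose` are hypothetical helper names, and any other fields in the payload are ignored.

```typescript
// Only `connectedClients` appears in the documented /status output;
// other fields are unknown here and ignored.
interface ServerStatus {
  connectedClients: number;
}

function parseStatus(json: string): ServerStatus {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Unexpected /status payload");
  }
  const clients = (data as Record<string, unknown>).connectedClients;
  if (typeof clients !== "number") {
    throw new Error("Unexpected /status payload");
  }
  return { connectedClients: clients };
}

function diagnose(status: ServerStatus): string {
  return status.connectedClients > 0
    ? `${status.connectedClients} extension client(s) connected`
    : "No extension connected: reload the browser extension";
}
```

The body text of a `fetch("http://localhost:3113/status")` response can be passed straight to `parseStatus`.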
**Extension Not Loading**
1. Go to `chrome://extensions/`
2. Click the reload button on the extension
3. Hard-refresh the target page (Ctrl+F5)
**Port Issues**
```bash
# Kill existing processes
lsof -ti:3113 | xargs kill -9
# Start server (first time)
bgr --name server --directory . --command "cd server && bun run index.ts"
# Restart server
bgr server --restart
```
**JavaScript Errors**
- Clear browser cache
- Reload extension
- Check browser console for specific errors
## Testing
### Quick Verification
```bash
# Test connection
curl http://localhost:3113/status
# Test basic functionality
curl -X POST http://localhost:3113/api/execute \
-H "Content-Type: application/json" \
-d '{"method": "showAlert", "params": ["API Working!"]}'
```
### Complete Test Suite
See [E2E_TESTING_GUIDE.md](E2E_TESTING_GUIDE.md) for comprehensive testing instructions including:
- 9 test categories with 30+ test cases
- Real-world workflow examples
- Performance benchmarks
- Troubleshooting steps
## Security Features
- **CSP Compliant**: No eval(), Function(), or dynamic code execution
- **Local Only**: Server runs on localhost, no external connections
- **Extension Sandboxed**: All operations run in Chrome extension context
- **No Code Injection**: No scripts injected into target pages
- **Controlled Access**: Only predefined API methods available
- **Request Validation**: All inputs validated against schemas
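As an illustration of the last two points, a request body could be checked against an allow-list before it is forwarded. This is a sketch under stated assumptions, not the server's actual schema: `validateExecuteRequest` is a hypothetical name, and `ALLOWED_METHODS` holds only a few of the documented method names.

```typescript
// Hypothetical allow-list with a handful of the documented method names;
// the real server exposes 30+ methods.
const ALLOWED_METHODS = new Set<string>([
  "click",
  "type",
  "navigateTo",
  "getElementText",
  "waitForElement",
]);

// Returns a list of problems; an empty list means the body is acceptable.
function validateExecuteRequest(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const errors: string[] = [];
  const { method, params } = body as Record<string, unknown>;
  if (typeof method !== "string") {
    errors.push('"method" must be a string');
  } else if (!ALLOWED_METHODS.has(method)) {
    errors.push(`unknown method "${method}"`);
  }
  if (!Array.isArray(params)) {
    errors.push('"params" must be an array');
  }
  return errors;
}
```

Rejecting unknown method names at the HTTP boundary means a malformed or hostile request never reaches the browser extension.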
## Use Cases
### Web Testing & Automation
- E2E testing of web applications
- Form submission automation
- Content extraction and validation
- Navigation flow testing
### Data Collection
- Automated content scraping
- Form data collection
- Page information extraction
- Dynamic content monitoring
### User Experience Testing
- Automated user workflows
- Performance timing measurement
- Accessibility testing
- Cross-browser compatibility
### Development & Debugging
- Automated testing during development
- Debug assistance for complex forms
- Content validation automation
- Integration testing
## Contributing
1. Fork the repository
2. Create a feature branch
3. Implement changes with tests
4. Update documentation
5. Submit pull request
## License
MIT License. See the LICENSE file for details.
## Support
- **Documentation**: Check [BROWSER_AUTOMATION_GUIDE.md](BROWSER_AUTOMATION_GUIDE.md)
- **API Reference**: Visit http://localhost:3113/help
- **Testing Guide**: See [E2E_TESTING_GUIDE.md](E2E_TESTING_GUIDE.md)
- **Issues**: Create an issue in the repository
## Why CSP-Safe Browser Automation?
Traditional browser automation tools often fail with modern web applications that use strict Content Security Policies (CSP). This project addresses that problem with four design choices:
1. **No Dynamic Code Execution**: Uses predefined functions instead of eval()
2. **Extension-Based Architecture**: Leverages Chrome's trusted extension context
3. **Synchronous API**: Each HTTP call returns its result directly, so callers need no polling or callback handling
4. **Production Ready**: Designed to work with real-world CSP-protected applications

It suits modern web applications, SaaS platforms, and enterprise environments where CSP restrictions prevent traditional automation tools from working.
---
**Built with ❤️ for developers who need reliable browser automation that actually works with modern security policies.**