# DOM Engine
A simple, lightweight library that turns website DOMs into actionable context for browser agents.
## Installation
```bash
npm install @agentic-intelligence/dom-engine
```
## Usage
### Basic Usage
```typescript
import { getInteractiveContext, scrollToNewContent, executeActions } from '@agentic-intelligence/dom-engine';

// 1. Analyze page and get interactive elements
const domData = getInteractiveContext({ injectTrackers: true });
console.log('Interactive elements:', domData.interactiveElements);
console.log('Scroll info:', domData.scrollInfo);

// 2. Execute actions on elements
const actions = [
  {
    agenticPurposeId: domData.interactiveElements.inputs[0].agenticPurposeId,
    actionType: "type",
    value: "hello@example.com"
  },
  {
    agenticPurposeId: domData.interactiveElements.buttons[0].agenticPurposeId,
    actionType: "click"
  }
];
const result = executeActions(actions);
console.log('Actions executed:', result.results);

// 3. Navigate with smart scroll
const scrollResult = scrollToNewContent();
if (scrollResult.success) {
  console.log('Scrolled to:', scrollResult.scrolledTo);
}
```
### DOM Analysis
```typescript
import { getInteractiveContext } from '@agentic-intelligence/dom-engine';
// Analyze entire page
const domData = getInteractiveContext({ injectTrackers: true });
console.log('Buttons found:', domData.interactiveElements.buttons);
console.log('Inputs found:', domData.interactiveElements.inputs);
console.log('Links found:', domData.interactiveElements.links);
console.log('Total elements:', domData.interactiveElements.total);
// With custom context (for extensions, iframes, etc.)
// `someDocument` and `someWindow` are placeholders for the document/window you want to analyze
const customContext = { document: someDocument, window: someWindow };
const domDataCustom = getInteractiveContext({
  injectTrackers: true,
  context: customContext
});
```
### Example Response
Here's what a typical response looks like:
```typescript
const domData = getInteractiveContext({ injectTrackers: true });

// Example response structure:
// Note: agenticPurposeId is a unique identifier that AI agents can use to reference
// and interact with specific elements when performing actions on the page.
{
  interactiveElements: {
    total: 5,
    buttons: [
      {
        text: "Submit",
        agenticPurposeId: "a1b2c3d4",
        className: "btn btn-primary",
        onclick: "Yes",
        tabindex: 0
      }
    ],
    inputs: [
      {
        text: "Placeholder: Enter your email | Name: email",
        agenticPurposeId: "e5f6g7h8",
        type: "email",
        className: "form-control",
        onclick: "No",
        tabindex: 0
      }
    ],
    links: [
      {
        text: "Text: Learn more | Title: Documentation",
        agenticPurposeId: "i9j0k1l2",
        href: "/docs",
        className: "nav-link",
        onclick: "No",
        tabindex: 0
      }
    ],
    ...
  },
  scrollInfo: {
    totalHeight: 2000,
    viewportHeight: 800,
    scrollTop: 0,
    verticalScrollPercentage: 0,
    remainingHeight: 1200,
    nextContentPixel: 800
  }
}
```
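One way an agent might consume this structure is to flatten it into a compact, one-line-per-element listing for a language-model prompt. The following is a minimal sketch only: `ElementSummary`, `InteractiveElementsLike`, and `toPromptLines` are hypothetical names, and the field shapes are inferred from the example response above rather than taken from the package's own type definitions.

```typescript
// Hypothetical shapes inferred from the example response; not the package's own types.
interface ElementSummary {
  text: string;
  agenticPurposeId: string;
  type?: string;
  href?: string;
}

interface InteractiveElementsLike {
  buttons: ElementSummary[];
  inputs: ElementSummary[];
  links: ElementSummary[];
}

// Render each element as a single line: [id] kind "text" plus an optional type/href hint.
function toPromptLines(elements: InteractiveElementsLike): string[] {
  const describe = (kind: string) => (el: ElementSummary): string => {
    const extra = el.type ? ` type=${el.type}` : el.href ? ` href=${el.href}` : "";
    return `[${el.agenticPurposeId}] ${kind} "${el.text}"${extra}`;
  };
  return [
    ...elements.buttons.map(describe("button")),
    ...elements.inputs.map(describe("input")),
    ...elements.links.map(describe("link")),
  ];
}

// Sample data mirroring the example response above.
const sampleElements: InteractiveElementsLike = {
  buttons: [{ text: "Submit", agenticPurposeId: "a1b2c3d4" }],
  inputs: [{ text: "Placeholder: Enter your email | Name: email", agenticPurposeId: "e5f6g7h8", type: "email" }],
  links: [{ text: "Text: Learn more | Title: Documentation", agenticPurposeId: "i9j0k1l2", href: "/docs" }],
};

console.log(toPromptLines(sampleElements).join("\n"));
```

Keeping the `agenticPurposeId` at the start of every line makes it easy for the model to echo the ID back when it proposes an action.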
### Scroll Information
```typescript
import { getInteractiveContext, scrollToNewContent } from '@agentic-intelligence/dom-engine';
// Get scroll information (no parameters needed!)
const domData = getInteractiveContext();
console.log('Scroll percentage:', domData.scrollInfo.verticalScrollPercentage);
console.log('Remaining content:', domData.scrollInfo.remainingHeight);
// Scroll to new content (automatically handles scroll to top if no new content)
const result = scrollToNewContent();
console.log('Scrolled to:', result.scrolledTo);
```
### Action Execution
```typescript
import { executeActions } from '@agentic-intelligence/dom-engine';
// Execute multiple actions
// (IDs taken from the example response: "e5f6g7h8" is the email input, "a1b2c3d4" is the Submit button)
const actions = [
  {
    agenticPurposeId: "e5f6g7h8",
    actionType: "type",
    value: "user@example.com"
  },
  {
    agenticPurposeId: "a1b2c3d4",
    actionType: "click"
  }
];
const result = executeActions(actions);
console.log('Results:', result.results);

// With custom context (`someDoc` and `someWin` are placeholders)
const customContext = { document: someDoc, window: someWin };
const resultCustom = executeActions(actions, customContext);
```
**Available Action Types:**
- `click`: Click on buttons, links, or any clickable element
- `type`: Type text into inputs, textareas, or contentEditable elements
**Human-like Interaction:**
- Simulates realistic mouse events with coordinates
- Multiple fallback methods for reliable clicking
- Proper event sequences (mouseover, mousedown, mouseup, click)
- Keyboard events for activation
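The event sequence listed above can be sketched roughly as follows. This is not the library's implementation: `dispatchClickSequence`, `ClickableLike`, `PointerInit`, and `MouseEventFactory` are hypothetical names, and aiming at the element's center is an assumption. The event constructor is passed in as a factory so the sketch stays free of DOM typings and can be exercised with a fake target.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface PointerInit {
  bubbles: boolean;
  cancelable: boolean;
  clientX: number;
  clientY: number;
}

interface ClickableLike {
  getBoundingClientRect(): { left: number; top: number; width: number; height: number };
  dispatchEvent(event: unknown): boolean;
}

type MouseEventFactory = (type: string, init: PointerInit) => unknown;

// Order taken from the list above.
const CLICK_SEQUENCE = ["mouseover", "mousedown", "mouseup", "click"] as const;

// Dispatch the full sequence at the element's center and return the event types sent.
function dispatchClickSequence(target: ClickableLike, createEvent: MouseEventFactory): string[] {
  const rect = target.getBoundingClientRect();
  const init: PointerInit = {
    bubbles: true,
    cancelable: true,
    clientX: rect.left + rect.width / 2,
    clientY: rect.top + rect.height / 2,
  };
  const dispatched: string[] = [];
  for (const type of CLICK_SEQUENCE) {
    target.dispatchEvent(createEvent(type, init));
    dispatched.push(type);
  }
  return dispatched;
}
```

In a browser, the factory would typically be `(type, init) => new MouseEvent(type, init)` and the target any `HTMLElement`.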
### Scroll Management
```typescript
import { scrollToNewContent } from '@agentic-intelligence/dom-engine';
// Smart scroll to new content
const scrollResult = scrollToNewContent();
console.log('Scrolled to:', scrollResult.scrolledTo);
// With custom context
const customContext = { document: someDoc, window: someWin };
const result = scrollToNewContent(customContext);
```
**Smart Scroll Behavior:**
- If there's new content below: scrolls to the next unseen content
- If no new content available: scrolls back to the top (pixel 0)
- Always returns `success: true` with the scroll position
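The decision rule above can be expressed as a small pure function over the `scrollInfo` fields. This is a sketch under assumptions, not the library's code: `planNextScroll`, `ScrollInfoLike`, and `ScrollPlan` are hypothetical names, and the arithmetic is inferred from the example response (where `remainingHeight` is 1200 and `nextContentPixel` is 800 for a 2000px page and an 800px viewport).

```typescript
// Hypothetical shape covering only the fields this sketch needs.
interface ScrollInfoLike {
  totalHeight: number;
  viewportHeight: number;
  scrollTop: number;
}

interface ScrollPlan {
  remainingHeight: number;
  scrollTo: number;
}

// If unseen content remains below the viewport, scroll to its first pixel; otherwise return to the top.
function planNextScroll(info: ScrollInfoLike): ScrollPlan {
  const visibleBottom = info.scrollTop + info.viewportHeight;
  const remainingHeight = Math.max(0, info.totalHeight - visibleBottom);
  const scrollTo = remainingHeight > 0 ? visibleBottom : 0;
  return { remainingHeight, scrollTo };
}

// Top of a 2000px page: 1200px remain, next target is pixel 800.
console.log(planNextScroll({ totalHeight: 2000, viewportHeight: 800, scrollTop: 0 }));
// Bottom of the page: nothing remains, so the target is pixel 0.
console.log(planNextScroll({ totalHeight: 2000, viewportHeight: 800, scrollTop: 1200 }));
```

A real browser clamps the scroll position to `totalHeight - viewportHeight`, so the position actually reached can be lower than the requested target near the end of the page.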
## Project Structure
```
src/
├── core/
│   └── dom-engine.ts           # Main DOM analysis engine
├── read/
│   ├── element-analyzer.ts     # Element text extraction and analysis
│   └── interactive-finder.ts   # Interactive element detection
├── scroll/
│   └── scroll-manager.ts       # Scroll calculation and navigation
├── actions/
│   ├── executor.ts             # Action coordination and execution
│   ├── click.ts                # Click action implementation
│   ├── type.ts                 # Type action implementation
│   └── raw.ts                  # Original extracted code
├── utils/
│   └── helpers.ts              # Utility functions
├── types.ts                    # TypeScript type definitions
└── index.ts                    # Public API exports
```
## Use Cases
### 🤖 AI Agents & Automation
```typescript
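import { getInteractiveContext, executeActions } from '@agentic-intelligence/dom-engine';

// AI agent workflow (`aiAgent` is a placeholder for your own decision logic)
const domData = getInteractiveContext({ injectTrackers: true });
const actions = aiAgent.decideActions(domData.interactiveElements);
const result = executeActions(actions);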
```
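The decision step in this workflow is left to the agent. As a stand-in for an LLM-backed implementation, a rule-based version might look like the sketch below. Everything here is hypothetical: `decideLoginActions`, `FieldLike`, `PageElementsLike`, and `PlannedAction` are illustrative names, and the matching rules (an `email` input plus a button whose text looks like a submit label) are assumptions chosen for the example.

```typescript
// Hypothetical shapes inferred from the example response; not the package's own types.
interface FieldLike {
  text: string;
  agenticPurposeId: string;
  type?: string;
}

interface PageElementsLike {
  buttons: FieldLike[];
  inputs: FieldLike[];
}

interface PlannedAction {
  agenticPurposeId: string;
  actionType: "click" | "type";
  value?: string;
}

// Fill the first email input, then click the first submit-like button.
// Returns an empty plan when either element is missing.
function decideLoginActions(elements: PageElementsLike, email: string): PlannedAction[] {
  const emailInput = elements.inputs.find((el) => el.type === "email");
  const submit = elements.buttons.find((el) => /submit|sign in|log in/i.test(el.text));
  if (!emailInput || !submit) return [];
  return [
    { agenticPurposeId: emailInput.agenticPurposeId, actionType: "type", value: email },
    { agenticPurposeId: submit.agenticPurposeId, actionType: "click" },
  ];
}
```

The returned array has the same shape as the `actions` arrays used elsewhere in this README, so it could be handed to `executeActions` directly.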
### 🧪 E2E Testing
```typescript
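import { executeActions } from '@agentic-intelligence/dom-engine';

// Automated testing
// (the IDs below are placeholders; `assert` comes from your test framework)
const actions = [
  { agenticPurposeId: "email-input", actionType: "type", value: "test@example.com" },
  { agenticPurposeId: "submit-btn", actionType: "click" }
];
const result = executeActions(actions);
assert(result.results.every(r => r.success));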
```
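When the assertion above fails, it helps to know which action was responsible. The sketch below pairs each action with its result by position. It assumes that `results` is returned in the same order as the submitted actions and that each entry carries at least a `success` flag; neither is confirmed by this README. `findFailedActions`, `ActionLike`, and `ActionOutcomeLike` are hypothetical names.

```typescript
// Hypothetical shapes; only `success` is known from the examples in this README.
interface ActionLike {
  agenticPurposeId: string;
  actionType: "click" | "type";
  value?: string;
}

interface ActionOutcomeLike {
  success: boolean;
}

// Return the actions whose positional result is missing or unsuccessful.
// Assumption: results[i] corresponds to actions[i].
function findFailedActions(actions: ActionLike[], results: ActionOutcomeLike[]): ActionLike[] {
  return actions.filter((_, i) => !(results[i]?.success ?? false));
}

const submitted: ActionLike[] = [
  { agenticPurposeId: "email-input", actionType: "type", value: "test@example.com" },
  { agenticPurposeId: "submit-btn", actionType: "click" },
];
const failed = findFailedActions(submitted, [{ success: true }, { success: false }]);
console.log("Failed actions:", failed.map((a) => a.agenticPurposeId));
```

Treating a missing result as a failure keeps the helper conservative if the executor stops early.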
### 🔌 Browser Extensions
```typescript
// Extension content script
const customContext = { document, window };
const domData = getInteractiveContext({
  injectTrackers: true,
  context: customContext
});
```
## Features
### ✅ Core Functionality
- **Smart Element Analysis**: Automatically detects interactive elements (buttons, inputs, links)
- **Advanced Categorization**: Classifies elements by type and functionality
- **Human-like Actions**: Click and type with realistic event simulation
- **Smart Scroll Management**: Intelligent scroll control with automatic top return
- **Visibility Filtering**: Only processes actually visible elements
- **Element Tracking**: Inject unique IDs for agent tracking and interaction
### ✅ Technical Features
- **Zero Dependencies**: Pure JavaScript, no external libraries
- **Cross-Platform**: Works in modern browsers and Node.js
- **Custom DOM Context**: Support for analyzing different document contexts (extensions, iframes)
- **TypeScript Support**: Full type definitions and IntelliSense
- **Modular Architecture**: Clean separation of concerns
### 🔲 Planned Features
- **Interaction History**: Track and maintain history of interacted elements
- **Advanced Actions**: Hover, drag & drop, keyboard shortcuts
- **Iframe Processing**: Enhanced support for analyzing and interacting with iframe content
- **Performance Optimization**: Lazy loading and caching for large pages
## API Reference
### Core Functions
#### `getInteractiveContext(options?)`
Analyzes the DOM and returns interactive elements with scroll information.
**Parameters:**
- `options.injectTrackers?: boolean` - Inject unique IDs for action tracking
- `options.context?: DOMContext` - Custom DOM context for extensions/iframes
**Returns:** `DOMExtractionResult`
#### `executeActions(actions, context?)`
Executes multiple actions on DOM elements.
**Parameters:**
- `actions: Action[]` - Array of actions to execute
- `context?: DOMContext` - Custom DOM context
**Returns:** `ActionsResult`
#### `scrollToNewContent(context?)`
Scrolls to new content or returns to top if no new content available.
**Parameters:**
- `context?: DOMContext` - Custom DOM context
**Returns:** `ScrollResult`
### Types
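The type definitions themselves are not reproduced in this README; `src/types.ts` is the authoritative source. As a rough orientation, the shapes below are inferred from the usage examples in this document and may differ from the real definitions. All names carry a `Sketch` suffix to mark them as hypothetical, and `isActionSketch` is an illustrative runtime guard (useful, for instance, when actions arrive as untyped JSON from a model), not part of the package.

```typescript
// Inferred from the usage examples in this README; see src/types.ts for the real definitions.
interface DOMContextSketch<D = unknown, W = unknown> {
  document: D; // a Document in a browser
  window: W;   // a Window in a browser
}

interface ActionSketch {
  agenticPurposeId: string;
  actionType: "click" | "type";
  value?: string; // used by "type" actions
}

interface ScrollInfoSketch {
  totalHeight: number;
  viewportHeight: number;
  scrollTop: number;
  verticalScrollPercentage: number;
  remainingHeight: number;
  nextContentPixel: number;
}

interface ScrollResultSketch {
  success: boolean;
  scrolledTo: number; // assumed to be a pixel offset
}

interface ActionsResultSketch {
  results: Array<{ success: boolean }>;
}

// Runtime check that an unknown value has the shape of an action.
function isActionSketch(value: unknown): value is ActionSketch {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.agenticPurposeId !== "string") return false;
  if (v.actionType === "click") return true;
  return v.actionType === "type" && typeof v.value === "string";
}
```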
## Roadmap
- ✅ **Smart Element Analysis**: Automatically detects interactive elements (buttons, inputs, links)
- ✅ **Advanced Categorization**: Classifies elements by type and functionality
- ✅ **Smart Scroll Management**: Intelligent scroll control with automatic top return
- ✅ **Visibility Filtering**: Only processes actually visible elements
- ✅ **Zero Dependencies**: Pure JavaScript, no external libraries
- ✅ **Cross-Platform**: Works in modern browsers and Node.js
- ✅ **Custom DOM Context**: Support for analyzing different document contexts (extensions, iframes)
- ✅ **Element Tracking**: Inject unique IDs for agent tracking and interaction
- 🔲 **Interaction History**: Track and maintain history of interacted elements
- 🔲 **Iframe Processing**: Enhanced support for analyzing and interacting with iframe content
## Contributing
Contributions are welcome! Please:
1. Fork the project
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.