# Bear Tracker 🐻
A lightweight, zero-dependency npm package for tracking AI/LLM bots (OpenAI, Google, etc.) in web applications. Works with Express.js, Next.js, and other Node.js frameworks, including apps deployed on Vercel.
## Features
- 🤖 **AI Bot Detection**: Identifies OpenAI bots (GPTBot, ChatGPT-User, OAI-SearchBot), Googlebot, and other AI/LLM bots
- 🚀 **Minimal Integration**: Fewer than 5 lines of code to get started
- 📊 **Structured Logging**: Vercel-friendly JSON logs for easy parsing and analysis
- 🎯 **AI-Focused**: Specialized for tracking AI training, search, and user interaction bots
- 🔧 **Framework Agnostic**: Works with Express, Next.js, Fastify, and any Node.js middleware system
- 📦 **Zero Dependencies**: Lightweight with no external dependencies
## Installation
```bash
npm install bear-tracker
```
## Quick Start (< 5 lines)
### Express.js
```javascript
const express = require('express');
const { createBotTracker } = require('bear-tracker');
const app = express();
app.use(createBotTracker('info')); // Only this line needed!
// Your existing routes...
```
### Next.js API Routes
```javascript
// middleware.js
import { createBotTracker } from 'bear-tracker';
export const middleware = createBotTracker('warn'); // Only this line needed!
export const config = {
  matcher: '/api/:path*'
};
```
### Express with Custom Logging
```javascript
const { createCustomBotTracker } = require('bear-tracker');
app.use(createCustomBotTracker((botInfo) => {
  if (botInfo.isBot) console.log(`AI Bot detected: ${botInfo.name} - ${botInfo.description}`);
}));
```
## Detected AI/LLM Bots
The package specializes in detecting these AI and search bots:
### OpenAI Bots
- **OAI-SearchBot**: OpenAI SearchBot for linking and surfacing websites in ChatGPT search results
- **ChatGPT-User**: ChatGPT user actions and Custom GPTs web interactions
- **GPTBot**: OpenAI GPTBot for training generative AI foundation models
### Search Engines
- **Googlebot**: Google web crawler for search indexing
### Additional AI Bots
- **Claude-Web**: Anthropic Claude web interactions
- **Bard**: Google Bard AI interactions
- **AI Bot**: Generic AI or bot-like user agents
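The list above maps user-agent tokens to bot names. As a rough illustration of how such a lookup can work, here is a minimal sketch using case-insensitive substring matching. It is not bear-tracker's actual implementation: `BOT_PATTERNS` and `matchBot` are hypothetical names, and the `ai_user` and `search_engine` type values are assumptions (only `ai_training` and `ai_search` appear in this README).

```javascript
// Hypothetical pattern table. The tokens come from the list above; the
// structure, the matching order, and the 'ai_user' / 'search_engine' type
// values are assumptions, not bear-tracker's source.
const BOT_PATTERNS = [
  { token: 'OAI-SearchBot', name: 'OAI-SearchBot', type: 'ai_search' },
  { token: 'ChatGPT-User', name: 'ChatGPT-User', type: 'ai_user' },
  { token: 'GPTBot', name: 'GPTBot', type: 'ai_training' },
  { token: 'Googlebot', name: 'Googlebot', type: 'search_engine' },
];

// Returns the first pattern whose token occurs in the user-agent string,
// compared case-insensitively. Missing or empty input is treated as "not a bot".
function matchBot(userAgent) {
  const ua = String(userAgent || '').toLowerCase();
  const hit = BOT_PATTERNS.find((p) => ua.includes(p.token.toLowerCase()));
  return hit
    ? { isBot: true, name: hit.name, type: hit.type }
    : { isBot: false, name: null, type: null };
}
```

User-agent strings are trivially spoofable, so a match of this kind is a hint about the client rather than proof of its identity.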
## Advanced Usage
### Full Configuration
```javascript
const { BotTracker } = require('bear-tracker');
const tracker = new BotTracker({
  enableLogging: true,
  trackOnlyBots: true, // Only log when AI bots are detected
  includeIp: true,     // Include IP addresses in logs
  logLevel: 'warn',    // 'info', 'warn', or 'error'
  customLogger: (botInfo) => {
    // Your custom logging logic
    console.log(`${botInfo.name}: ${botInfo.description} from ${botInfo.ip}`);
  }
});
app.use(tracker.middleware());
```
### Accessing Bot Info in Routes
```javascript
app.get('/api/data', (req, res) => {
  const botInfo = res.locals.botInfo;
  // Guard against a missing botInfo in case the tracker middleware did not run
  if (botInfo && botInfo.isBot) {
    console.log(`API accessed by ${botInfo.name}: ${botInfo.description}`);
    // Handle different AI bot types; return after responding so that
    // only one response is sent per request
    if (botInfo.type === 'ai_training') {
      // GPTBot - you might want to limit what content is accessible
      return res.json({ message: 'Limited data for training bots' });
    }
    if (botInfo.type === 'ai_search') {
      // OAI-SearchBot - optimize for search indexing
      return res.json({ data: 'SEO-optimized content for search' });
    }
  }
  // Humans and all other bot types get the default response
  res.json({ data: 'your data' });
});
```
### Manual Bot Detection
```javascript
const { detectBotFromUserAgent } = require('bear-tracker');
const userAgent = 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot';
const result = detectBotFromUserAgent(userAgent);
console.log(result);
// {
//   name: 'GPTBot',
//   type: 'ai_training',
//   isBot: true,
//   userAgent: '...',
//   timestamp: 2024-01-01T12:00:00.000Z,
//   description: 'OpenAI GPTBot for training generative AI foundation models'
// }
```
## Log Output Format
Each detection is logged as structured JSON, which Vercel and other serverless platforms can parse and filter:
```json
{
  "timestamp": "2024-01-01T12:00:00.000Z",
  "bot_detected": true,
  "bot_name": "GPTBot",
  "bot_type": "ai_training",
  "bot_description": "OpenAI GPTBot for training generative AI foundation models",
  "user_agent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1...",
  "ip_address": "40.84.180.224"
}
```
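Because the log entries use fixed field names, they are easy to aggregate after the fact. The sketch below counts visits per bot from exported log text. It assumes each entry has been written on a single line (newline-delimited JSON), which this README does not confirm, and `countBotVisits` is a hypothetical helper rather than part of the package.

```javascript
// Counts detected-bot visits per bot name from exported log text.
// Assumption: one JSON object per line. Lines that are not valid JSON
// (for example, unrelated application logs) are skipped.
function countBotVisits(logText) {
  const counts = {};
  for (const line of logText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let entry;
    try {
      entry = JSON.parse(trimmed);
    } catch {
      continue; // not a JSON log line
    }
    // Field names match the log format shown above
    if (entry && entry.bot_detected === true && typeof entry.bot_name === 'string') {
      counts[entry.bot_name] = (counts[entry.bot_name] || 0) + 1;
    }
  }
  return counts;
}
```

The returned object maps each `bot_name` to its number of logged visits, which can then be fed into whatever reporting you already use.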
## Framework-Specific Examples
### Vercel/Next.js
```javascript
// middleware.js
import { createBotTracker } from 'bear-tracker';
export const middleware = createBotTracker('info');
export const config = {
  matcher: [
    '/api/:path*',
    '/((?!_next/static|favicon.ico).*)'
  ]
};
```
### Express.js with AI Bot Analytics
```javascript
const express = require('express');
const { BotTracker } = require('bear-tracker');
const app = express();
const aiTracker = new BotTracker({
  logLevel: 'warn',
  trackOnlyBots: true,
  customLogger: (botInfo) => {
    // Send AI bot data to your analytics service.
    // `analytics` is a placeholder for your own client; it is not part of bear-tracker.
    analytics.track('ai_bot_visit', {
      bot_name: botInfo.name,
      bot_type: botInfo.type,
      bot_description: botInfo.description,
      timestamp: botInfo.timestamp
    });
  }
});
app.use(aiTracker.middleware());
```
## Use Cases for AI Bot Tracking
- **AI Training Control**: Detect GPTBot and control what content is used for AI training
- **Search Optimization**: Optimize content delivery for OAI-SearchBot and Googlebot
- **Rate Limiting**: Apply different limits for AI bots vs human users
- **Content Strategy**: Track which AI services are accessing your content
- **Compliance**: Monitor and log AI bot access for regulatory requirements
- **Performance**: Serve optimized responses to different types of AI bots
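As an illustration of the rate-limiting use case, here is a minimal sketch of a fixed-window limiter keyed by bot type. It is not part of bear-tracker: `createLimiter`, the `human` category, and the limit values are all hypothetical. Only the `isBot` and `type` fields, and the `ai_training` and `ai_search` values, come from this README.

```javascript
// Hypothetical per-minute limits. 'ai_training' and 'ai_search' are type
// values shown in this README; the numbers and the 'human' key are assumptions.
const LIMITS_PER_MINUTE = { ai_training: 10, ai_search: 60, human: 300 };

// Returns an allow(key, botInfo, now) function implementing a fixed-window
// counter per (category, key) pair. `key` would typically be a client IP.
function createLimiter(limits = LIMITS_PER_MINUTE, windowMs = 60_000) {
  const windows = new Map(); // id -> { start, count }
  return function allow(key, botInfo, now = Date.now()) {
    const category = botInfo && botInfo.isBot ? botInfo.type : 'human';
    // Bot types without an explicit limit fall back to the 'human' limit
    const limit = limits[category] ?? limits.human;
    const id = `${category}:${key}`;
    let current = windows.get(id);
    if (!current || now - current.start >= windowMs) {
      current = { start: now, count: 0 }; // start a new window
      windows.set(id, current);
    }
    current.count += 1;
    return current.count <= limit;
  };
}
```

In an Express route this could be called with `req.ip` and `res.locals.botInfo`, responding with status 429 when it returns `false`. The map in this sketch grows without bound, so a real deployment would need to evict expired windows or use a shared store.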
## API Reference
### `createBotTracker(logLevel?)`
Quick setup function that tracks only AI bots.
- `logLevel`: `'info' | 'warn' | 'error'` (default: `'info'`)
### `createCustomBotTracker(customLogger)`
Setup with custom logging function.
- `customLogger`: `(botInfo: BotInfo) => void`
### `BotTracker(options?)`
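Full-featured class with configuration options. Call `middleware()` on the instance to obtain the middleware function. The options used in this README are:
- `enableLogging`: enable or disable log output
- `trackOnlyBots`: only log when AI bots are detected
- `includeIp`: include IP addresses in logs
- `logLevel`: `'info' | 'warn' | 'error'`
- `customLogger`: `(botInfo: BotInfo) => void`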
### `detectBotFromUserAgent(userAgent)`
Manually detect if a user-agent string belongs to an AI bot.
## TypeScript Support
Full TypeScript support with exported types:
```typescript
import { BotInfo, BotTrackerOptions, BotTracker } from 'bear-tracker';
const options: BotTrackerOptions = {
  enableLogging: true,
  trackOnlyBots: true
};
const tracker = new BotTracker(options);
```
## License
MIT
## Contributing
Contributions welcome! Please feel free to submit issues and pull requests.
---
**Perfect for Vercel deployments** - The structured JSON logs integrate seamlessly with Vercel's logging and analytics systems. Track OpenAI bots, Google crawlers, and other AI services with minimal code.