# Bear Tracker 🐻

A lightweight, zero-dependency npm package for tracking AI/LLM bots (OpenAI, Google, etc.) in web applications. It is designed for Vercel, Express.js, Next.js, and other Node.js frameworks.

## Features

- 🤖 **AI Bot Detection**: Identifies OpenAI bots (GPTBot, ChatGPT-User, OAI-SearchBot), Googlebot, and other AI/LLM bots
- 🚀 **Minimal Integration**: Less than 5 lines of code to get started
- 📊 **Structured Logging**: Vercel-friendly JSON logs for easy parsing and analysis
- 🎯 **AI-Focused**: Specialized for tracking AI training, search, and user interaction bots
- 🔧 **Framework Agnostic**: Works with Express, Next.js, Fastify, and any Node.js middleware system
- 📦 **Zero Dependencies**: Lightweight with no external dependencies

## Installation

```bash
npm install bear-tracker
```

## Quick Start (< 5 lines)

### Express.js

```javascript
const express = require('express');
const { createBotTracker } = require('bear-tracker');

const app = express();
app.use(createBotTracker('info')); // Only this line needed!

// Your existing routes...
```

### Next.js API Routes

```javascript
// middleware.js
import { createBotTracker } from 'bear-tracker';

export const middleware = createBotTracker('warn'); // Only this line needed!
export const config = { matcher: '/api/:path*' };
```

### Express with Custom Logging

```javascript
const { createCustomBotTracker } = require('bear-tracker');

app.use(createCustomBotTracker((botInfo) => {
  if (botInfo.isBot) console.log(`AI Bot detected: ${botInfo.name} - ${botInfo.description}`);
}));
```

## Detected AI/LLM Bots

The package specializes in detecting these AI and search bots:

### OpenAI Bots

- **OAI-SearchBot**: OpenAI SearchBot for linking and surfacing websites in ChatGPT search results
- **ChatGPT-User**: ChatGPT user actions and Custom GPTs web interactions
- **GPTBot**: OpenAI GPTBot for training generative AI foundation models

### Search Engines

- **Googlebot**: Google web crawler for search indexing

### Additional AI Bots

- **Claude-Web**: Anthropic Claude web interactions
- **Bard**: Google Bard AI interactions
- **AI Bot**: Generic AI or bot-like user agents

## Advanced Usage

### Full Configuration

```javascript
const { BotTracker } = require('bear-tracker');

const tracker = new BotTracker({
  enableLogging: true,
  trackOnlyBots: true, // Only log when AI bots are detected
  includeIp: true,     // Include IP addresses in logs
  logLevel: 'warn',    // 'info', 'warn', or 'error'
  customLogger: (botInfo) => {
    // Your custom logging logic
    console.log(`${botInfo.name}: ${botInfo.description} from ${botInfo.ip}`);
  }
});

app.use(tracker.middleware());
```

### Accessing Bot Info in Routes

The middleware stores the detection result on `res.locals.botInfo`, so route handlers can branch on it. Return after each bot-specific response so that only one response is sent per request.

```javascript
app.get('/api/data', (req, res) => {
  const botInfo = res.locals.botInfo;

  if (botInfo.isBot) {
    console.log(`API accessed by ${botInfo.name}: ${botInfo.description}`);

    // Handle different AI bot types
    if (botInfo.type === 'ai_training') {
      // GPTBot - you might want to limit what content is accessible
      return res.json({ message: 'Limited data for training bots' });
    } else if (botInfo.type === 'ai_search') {
      // OAI-SearchBot - optimize for search indexing
      return res.json({ data: 'SEO-optimized content for search' });
    }
  }

  // Default response for human visitors and all other bot types
  res.json({ data: 'your data' });
});
```

### Manual Bot Detection

```javascript
const { detectBotFromUserAgent } = require('bear-tracker');

const userAgent = 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot';
const result = detectBotFromUserAgent(userAgent);

console.log(result);
// {
//   name: 'GPTBot',
//   type: 'ai_training',
//   isBot: true,
//   userAgent: '...',
//   timestamp: 2024-01-01T12:00:00.000Z,
//   description: 'OpenAI GPTBot for training generative AI foundation models'
// }
```

## Log Output Format

Logs are emitted as structured JSON, which suits Vercel and other serverless platforms:

```json
{
  "timestamp": "2024-01-01T12:00:00.000Z",
  "bot_detected": true,
  "bot_name": "GPTBot",
  "bot_type": "ai_training",
  "bot_description": "OpenAI GPTBot for training generative AI foundation models",
  "user_agent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1...",
  "ip_address": "40.84.180.224"
}
```

## Framework-Specific Examples

### Vercel/Next.js

```javascript
// middleware.js
import { createBotTracker } from 'bear-tracker';

export const middleware = createBotTracker('info');

export const config = {
  matcher: [
    '/api/:path*',
    '/((?!_next/static|favicon.ico).*)'
  ]
};
```

### Express.js with AI Bot Analytics

```javascript
const express = require('express');
const { BotTracker } = require('bear-tracker');

const app = express();

const aiTracker = new BotTracker({
  logLevel: 'warn',
  trackOnlyBots: true,
  customLogger: (botInfo) => {
    // Send AI bot data to your analytics service.
    // `analytics` is a placeholder for your own analytics client.
    analytics.track('ai_bot_visit', {
      bot_name: botInfo.name,
      bot_type: botInfo.type,
      bot_description: botInfo.description,
      timestamp: botInfo.timestamp
    });
  }
});

app.use(aiTracker.middleware());
```

## Use Cases for AI Bot Tracking

- **AI Training Control**: Detect GPTBot and control what content is used for AI training
- **Search Optimization**: Optimize content delivery for OAI-SearchBot and Googlebot
- **Rate Limiting**: Apply different limits for AI bots vs human users
- **Content Strategy**: Track which AI services are accessing your content
- **Compliance**: Monitor and log AI bot access for regulatory requirements
- **Performance**: Serve optimized responses to different types of AI bots

## API Reference

### `createBotTracker(logLevel?)`

Quick setup function that tracks only AI bots.

- `logLevel`: `'info' | 'warn' | 'error'` (default: `'info'`)

### `createCustomBotTracker(customLogger)`

Setup with a custom logging function.

- `customLogger`: `(botInfo: BotInfo) => void`

### `BotTracker(options?)`

Full-featured class with configuration options.

### `detectBotFromUserAgent(userAgent)`

Manually detect whether a user-agent string belongs to an AI bot.

## TypeScript Support

Full TypeScript support with exported types:

```typescript
import { BotInfo, BotTrackerOptions, BotTracker } from 'bear-tracker';

const options: BotTrackerOptions = {
  enableLogging: true,
  trackOnlyBots: true
};

const tracker = new BotTracker(options);
```

## License

MIT

## Contributing

Contributions welcome! Please feel free to submit issues and pull requests.

---

**Perfect for Vercel deployments** - The structured JSON logs integrate seamlessly with Vercel's logging and analytics systems. Track OpenAI bots, Google crawlers, and other AI services with minimal code.
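
For readers curious how user-agent based detection of this kind can work in general, here is a minimal sketch. It is not the package's implementation: `sketchDetectBot` and `BOT_PATTERNS` are hypothetical names, the case-insensitive substring matching is an assumption, and the `'ai_user'`, `'search_engine'`, and `'human'` type labels are invented for illustration (only `'ai_training'` and `'ai_search'` appear in this README).

```javascript
// Hypothetical pattern table. Bot names mirror the list in this README,
// but the tokens and most type labels are assumptions for illustration.
const BOT_PATTERNS = [
  { token: 'oai-searchbot', name: 'OAI-SearchBot', type: 'ai_search' },
  { token: 'chatgpt-user', name: 'ChatGPT-User', type: 'ai_user' },
  { token: 'gptbot', name: 'GPTBot', type: 'ai_training' },
  { token: 'googlebot', name: 'Googlebot', type: 'search_engine' },
];

// Returns a result object shaped loosely like the BotInfo shown above.
function sketchDetectBot(userAgent) {
  // Normalize so the comparison is case-insensitive and tolerates a missing header.
  const ua = String(userAgent || '').toLowerCase();

  // First pattern whose token occurs anywhere in the user-agent string.
  const match = BOT_PATTERNS.find((pattern) => ua.includes(pattern.token));

  return {
    name: match ? match.name : 'Unknown',
    type: match ? match.type : 'human',
    isBot: Boolean(match),
    userAgent: userAgent || '',
    timestamp: new Date(),
  };
}

const sample = sketchDetectBot(
  'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot'
);
console.log(sample.name, sample.type, sample.isBot); // GPTBot ai_training true
```

User-agent strings are trivially spoofed, so a check like this is best treated as a logging and analytics signal and not as access control.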