Add AI bot classification for event enrichment #253
Open
jaredmixpanel wants to merge 7 commits into master
Part of AI bot classification feature for Node.js SDK.
Fix CI formatting check for AI bot classification files.
Fix CI lint check in AI bot middleware.
Pull request overview
This PR adds AI bot classification middleware to automatically detect and enrich events from AI crawler requests (GPTBot, ClaudeBot, PerplexityBot, etc.). It implements a pattern-based user-agent classifier that enriches Mixpanel events with bot detection properties when the $user_agent field is present.
Changes:
- Adds a classifier module that matches user-agent strings against 12 known AI bot patterns
- Implements middleware that wraps `send_event_request` to enrich events with classification properties
- Exports the new functionality via `Mixpanel.ai` and `Mixpanel.AiBotClassifier`
Reviewed changes
Copilot reviewed 5 out of 8 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| lib/ai_bot_classifier.js | Core classification logic with AI bot pattern database and matching functions |
| lib/ai_bot_classifier.d.ts | TypeScript type definitions for classifier functions and interfaces |
| lib/ai_bot_middleware.js | Middleware implementation that wraps send_event_request and helper for HTTP request tracking |
| lib/ai_bot_middleware.d.ts | TypeScript type definitions for middleware functions and options |
| lib/mixpanel-node.js | Exports the new ai middleware and AiBotClassifier modules |
| lib/mixpanel-node.d.ts | TypeScript module augmentation attempting to add new exports |
| test/ai_bot_classifier.js | Comprehensive tests for classifier covering all 12 bot patterns and edge cases |
| test/ai_bot_middleware.js | Integration tests for middleware including configuration options and limitations |
- Add missing `$ai_bot_category` assertions for 4 bot tests
- Prevent mutation of input properties in `track_request`
- Add double-wrapping guard for `enable_bot_classification`
- Fix JSDoc comment accuracy
Summary
Adds AI bot classification middleware/integration that automatically detects AI crawler requests (GPTBot, ClaudeBot, PerplexityBot, etc.) and enriches tracked events with classification properties.
What it does
- Enriches tracked events with `$is_ai_bot`, `$ai_bot_name`, `$ai_bot_provider`, and `$ai_bot_category` properties
AI Bots Detected
GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, Google-Extended, PerplexityBot, Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent, cohere-ai
Implementation Details
Architecture
- `enable_bot_classification(mixpanel)` wraps `send_event_request` on the client instance, the single chokepoint for `track()` and `import()` calls
- Enriches events only when a user-agent property is present (default `$user_agent`)
- Returns a `{ enable(), disable() }` controller for runtime toggling
- Sets an `_ai_bot_classification_enabled` guard on the mixpanel instance to prevent double-wrapping
Public API
`ai_bot_classifier` module:
- `classify_user_agent(userAgent)`: `(string | null | undefined) => AiBotClassification`
- `create_classifier(options)`: `({ additional_bots?: AiBotEntry[] }) => (userAgent: string) => AiBotClassification`
- `get_bot_database()`: `() => AiBotEntry[]`

`ai_bot_middleware` module:
- `enable_bot_classification(mixpanel, options?)`: `(mixpanel, BotClassificationOptions?) => BotClassificationController`. Wraps `send_event_request` to auto-enrich events; returns `{ enable(), disable() }`.
- `track_request(mixpanel, req, eventName, properties?, callback?)`: `(mixpanel, IncomingMessage, string, object?, Function?) => void`. Extracts `user-agent` and IP from an HTTP request and calls `mixpanel.track()`.

`BotClassificationOptions`:
- `user_agent_property` (default `"$user_agent"`): property key to read the UA from
- `property_prefix` (default `"$"`): prefix for injected classification properties
- `additional_bots`: array of `{ pattern: RegExp, name, provider, category }` checked before built-in bots
Notable Design Decisions
- Wraps `send_event_request`, not `track()`: this catches both `track()` and `import()` calls through a single interception point, avoiding the need to wrap multiple methods
- `create_classifier` spreads `additional_bots` before `AI_BOT_DATABASE` so custom patterns take priority over built-in ones
- Patterns match `BotName/` (e.g. `/GPTBot\//i`) to avoid false positives on substrings; the trailing slash is part of the standard bot UA version token
Usage Examples
Automatic Event Enrichment
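A minimal sketch of how wrapping `send_event_request` could enrich events, run against a stub client rather than the real SDK. The `(endpoint, event, properties, callback)` argument order, the camelCase helper names, the `_ai_bot_controller` property, and the choice to return the existing controller on a second call are all assumptions made for illustration; only the `_ai_bot_classification_enabled` guard and the `{ enable(), disable() }` controller come from the description above.

```javascript
// Sketch only: a one-pattern classifier stands in for the full bot database.
const GPTBOT = { pattern: /GPTBot\//i, name: 'GPTBot', provider: 'OpenAI' };

function classify(userAgent) {
  if (typeof userAgent === 'string' && GPTBOT.pattern.test(userAgent)) {
    return { $is_ai_bot: true, $ai_bot_name: GPTBOT.name, $ai_bot_provider: GPTBOT.provider };
  }
  return { $is_ai_bot: false };
}

function enableBotClassification(client) {
  // Guard against double-wrapping (reusing the controller is an assumption).
  if (client._ai_bot_classification_enabled) {
    return client._ai_bot_controller;
  }
  const original = client.send_event_request;
  let active = true;
  client.send_event_request = function (endpoint, event, properties, callback) {
    const ua = properties && properties.$user_agent;
    // Enrich a copy so the caller's object is left untouched.
    const enriched = active && ua ? { ...properties, ...classify(ua) } : properties;
    return original.call(this, endpoint, event, enriched, callback);
  };
  client._ai_bot_classification_enabled = true;
  client._ai_bot_controller = {
    enable() { active = true; },
    disable() { active = false; },
  };
  return client._ai_bot_controller;
}

// Stub client that records what would be sent instead of making HTTP calls.
const sent = [];
const client = {
  send_event_request(endpoint, event, properties, callback) {
    sent.push({ endpoint, event, properties });
    if (callback) callback();
  },
  track(event, properties, callback) {
    this.send_event_request('/track', event, properties, callback);
  },
};

const controller = enableBotClassification(client);
client.track('page_view', { $user_agent: 'Mozilla/5.0 (compatible; GPTBot/1.2)' });
controller.disable();
client.track('page_view', { $user_agent: 'Mozilla/5.0 (compatible; GPTBot/1.2)' });

console.log(sent[0].properties.$is_ai_bot); // true
console.log(sent[1].properties.$is_ai_bot); // undefined (classification disabled)
```

Because the wrapper sits on `send_event_request`, any method that funnels through it is covered without being wrapped individually.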
Standalone Classification
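As a rough illustration of classifying a user-agent string on its own, the sketch below uses a two-entry stand-in for the bot database. The return shape (keys prefixed with `$`), the `classifyUserAgent` name, and the `category` values are assumptions; the real `classify_user_agent` may differ.

```javascript
// Hypothetical subset of the bot database; category values are placeholders.
const BOTS = [
  { pattern: /GPTBot\//i, name: 'GPTBot', provider: 'OpenAI', category: 'placeholder' },
  { pattern: /ClaudeBot\//i, name: 'ClaudeBot', provider: 'Anthropic', category: 'placeholder' },
];

function classifyUserAgent(userAgent) {
  // null, undefined, and empty strings are treated as "not an AI bot".
  if (typeof userAgent !== 'string' || userAgent.length === 0) {
    return { $is_ai_bot: false };
  }
  const match = BOTS.find((bot) => bot.pattern.test(userAgent));
  if (!match) {
    return { $is_ai_bot: false };
  }
  return {
    $is_ai_bot: true,
    $ai_bot_name: match.name,
    $ai_bot_provider: match.provider,
    $ai_bot_category: match.category,
  };
}

const bot = classifyUserAgent('Mozilla/5.0 (compatible; ClaudeBot/1.0)');
const human = classifyUserAgent('curl/8.4.0');
// No "GPTBot/" token here, so the trailing-slash pattern does not match.
const lookalike = classifyUserAgent('GPTBotFan 1.0');

console.log(bot.$ai_bot_name);     // ClaudeBot
console.log(human.$is_ai_bot);     // false
console.log(lookalike.$is_ai_bot); // false
```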
Custom Bot Patterns
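The priority rule for custom patterns can be sketched as follows: custom entries are placed ahead of the built-in list, so the first match wins in their favor. `ExampleBot`, `Example Corp`, the category strings, and the one-entry `AI_BOT_DATABASE` are hypothetical stand-ins, and `createClassifier` is an illustrative reimplementation rather than the module's code.

```javascript
// Stand-in for the built-in database (the real one is described as having 12 entries).
const AI_BOT_DATABASE = [
  { pattern: /GPTBot\//i, name: 'GPTBot', provider: 'OpenAI', category: 'builtin-placeholder' },
];

function createClassifier(options = {}) {
  // Custom entries come first, so they are checked before built-in ones.
  const bots = [...(options.additional_bots || []), ...AI_BOT_DATABASE];
  return function classify(userAgent) {
    if (typeof userAgent !== 'string') {
      return { $is_ai_bot: false };
    }
    const match = bots.find((bot) => bot.pattern.test(userAgent));
    if (!match) {
      return { $is_ai_bot: false };
    }
    return {
      $is_ai_bot: true,
      $ai_bot_name: match.name,
      $ai_bot_provider: match.provider,
      $ai_bot_category: match.category,
    };
  };
}

const classify = createClassifier({
  additional_bots: [
    // Hypothetical in-house crawler.
    { pattern: /ExampleBot\//i, name: 'ExampleBot', provider: 'Example Corp', category: 'custom' },
    // Overrides the built-in GPTBot entry because it is checked first.
    { pattern: /GPTBot\//i, name: 'GPTBot (override)', provider: 'OpenAI', category: 'custom' },
  ],
});

console.log(classify('ExampleBot/2.0').$ai_bot_name); // ExampleBot
console.log(classify('GPTBot/1.0').$ai_bot_name);     // GPTBot (override)
```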
Framework Integration (Express)
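A hedged sketch of what a `track_request`-style helper might do inside a request handler. To stay runnable without Express installed, it uses a plain object shaped like Node's `IncomingMessage` (which an Express `req` extends). Reading the IP from `req.socket.remoteAddress` and sending it as the `ip` property are assumptions; the real helper may also consult proxy headers.

```javascript
function trackRequest(client, req, eventName, properties, callback) {
  // Copy rather than mutate the caller's properties object.
  const merged = { ...(properties || {}) };
  const userAgent = req.headers && req.headers['user-agent'];
  if (userAgent) {
    merged.$user_agent = userAgent;
  }
  const ip = req.socket && req.socket.remoteAddress; // assumption: no proxy handling
  if (ip) {
    merged.ip = ip;
  }
  client.track(eventName, merged, callback);
}

// Stub client and request, standing in for a Mixpanel instance and an Express req.
const calls = [];
const client = {
  track(event, properties, callback) {
    calls.push({ event, properties });
    if (callback) callback();
  },
};
const req = {
  headers: { 'user-agent': 'Mozilla/5.0 (compatible; PerplexityBot/1.0)' },
  socket: { remoteAddress: '203.0.113.7' },
};

const input = { path: '/docs' };
trackRequest(client, req, 'page_view', input);

console.log(calls[0].properties.$user_agent); // Mozilla/5.0 (compatible; PerplexityBot/1.0)
console.log(Object.keys(input));              // [ 'path' ]
```

In an Express app the same call would sit inside a route handler or middleware, with classification applied downstream if `enable_bot_classification` has been enabled on the client.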
Files Added
- lib/ai_bot_classifier.d.ts
- lib/ai_bot_classifier.js
- lib/ai_bot_middleware.d.ts
- lib/ai_bot_middleware.js
- test/ai_bot_classifier.js
- test/ai_bot_middleware.js
Files Modified
- lib/mixpanel-node.d.ts
- lib/mixpanel-node.js
Test Plan
- Non-AI user agents are classified as `$is_ai_bot: false` (Chrome, Googlebot, curl, etc.)