// mcp-talent-server: Model Context Protocol server for talent management tools
import { z } from 'zod';
import SheetVectorData from '../models/sheet-vector.model.js';
const StructuredSheetSearchSchema = z.object({
    aggregationPipeline: z.array(z.record(z.any()))
        .describe("Pre-generated MongoDB aggregation pipeline to execute directly"),
    limit: z.number().min(1).max(100).default(10),
    skip: z.number().min(0).default(0),
});
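// Illustrative sketch (not part of the published package): the schema above
// accepts any array of objects, so write stages such as $out or $merge would
// pass validation. A check along these lines could reject them before the
// pipeline reaches MongoDB. The helper name and the blocked-stage list are
// assumptions made for this example, and only top-level stages are inspected.
const BLOCKED_STAGES_EXAMPLE = new Set(["$out", "$merge"]);
function findBlockedStagesExample(pipeline) {
    // Each aggregation stage is an object whose single key names the stage.
    return pipeline
        .flatMap((stage) => Object.keys(stage))
        .filter((stageName) => BLOCKED_STAGES_EXAMPLE.has(stageName));
}
// Usage: findBlockedStagesExample([{ $match: {} }, { $out: "copy" }]) returns ["$out"].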
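export class StructuredSheetSearchTool {
    async searchInAllSheets(input, userId, sessionId) {
        try {
            // Re-validate so the limit/skip defaults and bounds apply even if the caller skipped parsing.
            const { aggregationPipeline, limit, skip } = StructuredSheetSearchSchema.parse(input);
            // The page query and the count query are independent, so run them concurrently.
            const [results, countResult] = await Promise.all([
                SheetVectorData.aggregate([...aggregationPipeline, { $skip: skip }, { $limit: limit }]),
                SheetVectorData.aggregate([...aggregationPipeline, { $count: "totalRecords" }]),
            ]);
            return {
                results,
                // $count emits no document when nothing matches, so fall back to 0.
                totalRecords: countResult[0]?.totalRecords ?? 0,
            };
        }
        catch (error) {
            console.error("Structured sheet search failed:", error);
            // Rethrow the original error so its message and stack are preserved.
            throw error instanceof Error ? error : new Error(String(error));
        }
    }
}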
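// Illustrative sketch (not part of the published package): the tool description
// asks callers to page through results with skip and limit. A loop like the one
// below would collect every page. fetchAllPagesExample and its search parameter
// are hypothetical; search stands in for a bound searchInAllSheets and is
// assumed to resolve to { results, totalRecords }.
async function fetchAllPagesExample(search, aggregationPipeline, limit = 100) {
    const collected = [];
    let skip = 0;
    while (true) {
        const { results, totalRecords } = await search({ aggregationPipeline, limit, skip });
        collected.push(...results);
        skip += limit;
        // Stop once every record has been fetched, or on an empty page as a safeguard.
        if (results.length === 0 || skip >= totalRecords) {
            return collected;
        }
    }
}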
export const StructuredSheetSearchDescription = `**Structured Sheets Search - Multi-Sheet Query Intelligence Tool**
**🎯 CRITICAL: Multi-Sheet Query Strategy**
When user queries require data from multiple sheets (e.g., "talents in CA who use Verizon"):
1. **ANALYZE**: Identify which sheets contain the required data fields
2. **DECOMPOSE**: Break query into sheet-specific components
3. **EXECUTE**: Call this tool MULTIPLE TIMES - once per relevant sheet
4. **COMBINE**: Merge results based on common identifiers (name, talent ID)
**Query Analysis Examples**:
• "talents in CA who use Verizon" → Query demographics sheet for CA residents + mobile carrier sheet for Verizon users
• "users aged 25-35 with pets" → Query demographics sheet for age + lifestyle sheet for pet ownership
• "influencers with 100K+ followers using iPhone" → Query social metrics sheet + device preferences sheet
**Primary Purpose**: Execute intelligent multi-sheet queries using MongoDB aggregation pipelines on structured talent and demographic data.
**Core Functionality**:
• **Multi-Sheet Intelligence**: Automatically identifies which sheets contain relevant data
• **Schema-Driven Queries**: Uses validated database schemas to ensure accurate field references
• **MongoDB Aggregation**: Executes powerful pipeline operations ($match, $group, $project, $sort)
• **Cross-Sheet Analysis**: Orchestrates multiple queries across different sheets
• **Demographic Filtering**: Advanced filtering on age, location, interests, platform metrics
• **Statistical Operations**: Aggregations, averages, counts, and data grouping
• **Pagination Support**: Handle large datasets with the **limit** and **skip** parameters (Important: pass **limit** and **skip** as separate parameters, not as pipeline stages; the tool appends them to the pipeline to paginate the results)
**Multi-Sheet Query Process**:
• **Sheet Identification**: Analyze query to identify which sheets contain required fields
• **Query Decomposition**: Break complex queries into sheet-specific sub-queries
• **Pipeline Generation**: Generate targeted aggregation for each relevant sheet
• **Result Coordination**: Use sheetName filtering and talent matching for result correlation
**Data Domains Available**:
• **Demographics**: Age distributions, geographic locations, ethnicity, languages
• **Contact Information**: Names, addresses, emails, phone numbers by region
• **Social Media Metrics**: Follower counts, engagement rates, platform activity
• **Audience Analytics**: Gender/age/location breakdowns across Instagram/TikTok/YouTube
• **Lifestyle Data**: Pets, health conditions, hobbies, interests, vehicle ownership
• **Professional Info**: Employment status, business ownership, brand partnerships
• **Platform Performance**: Views, clicks, reach, story/reel engagement
• **Subscription Data**: Service memberships, costs, billing frequencies
• **Content Categorization**: Tags, labels, and profile segmentation
**Advanced Query Capabilities**:
• **Complex Filtering**: Multi-condition searches with logical operators
• **Text Matching**: Case-insensitive regex searches with fuzzy matching
• **Numeric Ranges**: Age brackets, follower thresholds, engagement minimums
• **Array Operations**: Handle multi-value fields (interests, platforms, languages)
• **Relationship Queries**: Family structures, household demographics
• **Geographic Searches**: State codes, city names, timezone filtering
**Pipeline Generation Rules** (Critical):
✓ MUST use "originalData." prefix for all field names
✓ Generate complete MongoDB aggregation pipeline
✓ Validate all fields against provided schema
✓ End with $project stage containing only requested fields
✓ Use $regex with "$options": "i" for text searches
✓ Apply $elemMatch for complex array queries
✓ Handle person/relationship structures properly
To search structured sheets, use the following tools:
• structured_sheets
• structured_sheet_schema
• structured_sheets_schema
• structured_sheets_search
**Example Use Cases & Multi-Sheet Handling**:
**Single-Sheet Queries**:
✓ "Find influencers in California" (demographics sheet only)
✓ "Show Instagram followers >100K" (social metrics sheet only)
✓ "List users with pets" (lifestyle sheet only)
**Multi-Sheet Queries** (requires multiple tool calls):
✓ "Find influencers in California with >100K Instagram followers"
→ Call 1: Query demographics sheet for California residents
→ Call 2: Query social metrics sheet for >100K Instagram followers
→ Merge results by talent name/ID
✓ "Show talents aged 18-24 who use Verizon"
→ Call 1: Query demographics sheet for age 18-24
→ Call 2: Query mobile carrier sheet for Verizon users
→ Merge results by talent name/ID
✓ "List users with pets who are comfortable promoting health products"
→ Call 1: Query lifestyle sheet for pet ownership
→ Call 2: Query preferences sheet for health product promotion
→ Merge results by talent name/ID
**Response Features**:
• **Structured Results**: Clean, formatted data matching query requirements
• **Pagination Info**: Total counts, page numbers, result limits
• **Processing Stats**: Execution time, matched documents, performance metrics
• **Schema Information**: Available fields and data types for reference
• **Error Handling**: Detailed validation errors and pipeline debugging
**Parameters**:
• aggregationPipeline: Pre-generated MongoDB pipeline
• limit: Number of results to return (default: 10, max: 100)
• skip: Number of results to skip (default: 0)
**Dynamic Schema Support**:
• **Schema Flexibility**: Accept custom schemas for different data structures
• **Field Validation**: Ensure all pipeline fields exist in provided schema
• **Schema Sources**: Support provided schema, service-fetched schema, or default fallback
• **Runtime Adaptation**: Dynamically adjust pipeline generation based on schema structure
**Security & Validation**:
• Pipeline sanitization prevents dangerous operations
• Field validation against schema prevents injection attacks
• Result validation ensures data integrity
• Error boundaries for graceful failure handling
**Fallback Tool**:
To search unstructured or raw sheets, use the fallback tool "orignal_sheet_content_search". It also runs an aggregation pipeline against the sheets and is more powerful than this tool.
It belongs to a set of tools for searching unstructured or raw sheets.
Tool List:
• orignal_sheets_list
• orignal_sheet_columns
• orignal_sheet_content_search
**Important Note**:
• Always return the complete dataset by calling this tool multiple times with different **skip** and **limit** values
• Always paginate with the **limit** and **skip** parameters to keep each response small and reduce output tokens
• Before returning irrelevant results, use the fallback tool to search unstructured or raw sheets and verify them
`;
export const structuredSheetSearchSchema = StructuredSheetSearchSchema;
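// Illustrative sketch (not part of the published package): the COMBINE step in
// the description merges per-sheet results on a common identifier. Assuming each
// row carries that identifier under a key such as "talentId" (a hypothetical
// field name), an inner join over several result sets could look like this.
function mergeByKeyExample(resultSets, key = "talentId") {
    const merged = new Map();
    resultSets.forEach((rows, index) => {
        const seen = new Set();
        for (const row of rows) {
            const id = row[key];
            if (id === undefined || id === null) continue;
            if (index === 0) {
                // The first result set seeds the candidates.
                merged.set(id, { ...row });
            } else if (merged.has(id)) {
                // Later result sets add their fields to surviving candidates.
                Object.assign(merged.get(id), row);
            }
            seen.add(id);
        }
        // Drop candidates that are missing from this result set.
        for (const id of [...merged.keys()]) {
            if (!seen.has(id)) merged.delete(id);
        }
    });
    return [...merged.values()];
}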
//# sourceMappingURL=structured-sheets-search.js.map