# @anyshift/mcp-tools-common

Reusable utilities for building MCP (Model Context Protocol) servers. Provides production-ready tools for JSON processing and intelligent response handling.

## What's Included

### 🔧 JQ Query Tool

Execute [jq](https://jqlang.github.io/jq/) queries on JSON files with AI-optimized error messages and schema hints.

**Features:**
- Sandboxed jq execution with timeout protection
- Path validation for security (no arbitrary file access)
- Query sanitization (blocks environment variable access)
- Schema-aware error messages that help LLMs write better queries
- Comprehensive retry strategies in tool descriptions

### 📄 Smart File Writer

Automatically write large tool responses to files instead of returning them inline.

**Features:**
- Threshold-based file writing (configurable character limit)
- JSON schema analysis and quick reference generation
- Compact, timestamped filenames with tool abbreviations
- Nullable field detection for better JQ query guidance
- Returns file references with schema hints instead of massive text

### 🔍 JSON Schema Analyzer

Deep schema analysis for JSON data structures.

**Features:**
- Detects numeric string keys (common in Cypher results: `{"0": {...}, "1": {...}}`)
- Identifies nullable vs always-null fields
- Handles mixed-type arrays, nested objects, and complex structures
- Generates LLM-friendly access patterns and hints

### 🛡️ Path Validation

Secure file path validation with allowlist support.

**Features:**
- Requires absolute paths (prevents relative path ambiguity)
- Validates files exist and are within allowed directories
- Resolves symlinks for security
- Clear error messages with examples

### ✂️ Response Truncation

Automatic token-based response size limiting to prevent LLM context overflow.
**Features:**
- Token estimation using configurable chars/token ratio
- Configurable token limits (e.g., 10k for Datadog, 15k for Anyshift)
- Optional JSON logging to stderr for monitoring
- Customizable truncation notice messages
- Preserves as much content as possible before truncating

## Installation

```bash
npm install @anyshift/mcp-tools-common
# or
pnpm add @anyshift/mcp-tools-common
```

## Integration Guide

### Option 1: Using High-Level Factory Functions (Recommended)

The easiest way to integrate is to use `createJqTool()` and `createFileWriter()`:

```typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import {
  createJqTool,
  createFileWriter,
  truncateResponseIfNeeded,
  type TruncationConfig,
} from '@anyshift/mcp-tools-common';

const server = new McpServer({
  name: 'my-mcp-server',
  version: '1.0.0',
});

// 1. Configure File Writer
const fileWriter = createFileWriter({
  enabled: process.env.WRITE_TO_FILE === 'true',
  outputPath: process.env.OUTPUT_PATH || './output',
  minCharsForWrite: 1000, // Write to file if response > 1000 chars
  toolAbbreviations: {
    'my_data_tool': 'data',
    'my_search_tool': 'srch',
  }
});

// 2. Configure JQ Tool
const jqTool = createJqTool({
  allowedPaths: [
    process.cwd(), // Allow files in execution directory
    process.env.OUTPUT_PATH || './output', // Allow files in output directory
  ],
  timeoutMs: 30000, // 30 second timeout
});

// 3. Configure Response Truncation
const truncationConfig: TruncationConfig = {
  maxTokens: 15000, // Max tokens before truncation
  enableLogging: false, // Optional: log truncation events to stderr
  charsPerToken: 4, // Token estimation ratio (default: 4)
};

// 4. Register JQ Tool with MCP Server
server.tool(
  jqTool.toolDefinition.name,
  jqTool.toolDefinition.description,
  jqTool.toolDefinition.inputSchema,
  async (args) => {
    return await jqTool.handler({ params: { arguments: args } });
  }
);

// 5. Wrap Your Other Tools with File Writer and Truncation
server.tool(
  'my_data_tool',
  'Fetch large datasets',
  { query: { type: 'string' } },
  async ({ query }) => {
    const result = await fetchData(query);

    // handleResponse will:
    // - Return result inline if small
    // - Write to file and return file reference if large
    const response = await fileWriter.handleResponse(
      'my_data_tool',
      { query },
      result
    );

    // Apply truncation to prevent context overflow
    if (response.content?.[0]?.text) {
      response.content[0].text = truncateResponseIfNeeded(
        truncationConfig,
        response.content[0].text
      );
    }

    return response;
  }
);
```

### Option 2: Using Individual Functions (Advanced)

For more control, import and use functions directly:

```typescript
import {
  executeJqQuery,
  handleToolResponse,
  analyzeJsonSchema,
  generateCompactFilename,
  validatePathWithinAllowedDirs,
  type JqConfig,
  type FileWriterConfig
} from '@anyshift/mcp-tools-common';

// Define your configs
const jqConfig: JqConfig = {
  allowedPaths: ['/path/to/data'],
  timeoutMs: 30000,
};

const fileWriterConfig: FileWriterConfig = {
  enabled: true,
  outputPath: '/tmp/output',
  minCharsForWrite: 500,
  toolAbbreviations: { 'my_tool': 'mt' }
};

// Use functions directly
server.tool('execute_jq_query', 'Run jq queries', schema, async ({ jq_query, file_path }) => {
  return await executeJqQuery(jqConfig, jq_query, file_path);
});

server.tool('my_tool', 'Example tool', schema, async (args) => {
  const response = { content: [{ type: 'text', text: 'Large data...' }] };
  return await handleToolResponse(fileWriterConfig, 'my_tool', args, response);
});
```

### Option 3: Zero Dependencies on Environment Variables

Pass configuration explicitly for maximum flexibility:

```typescript
import { executeJqQuery } from '@anyshift/mcp-tools-common';

// No environment variables - pure configuration
const result = await executeJqQuery(
  {
    allowedPaths: ['/specific/path'],
    timeoutMs: 10000,
  },
  '.data[] | select(.active == true)',
  '/specific/path/data.json'
);
```

## API Reference

### JQ Tool

#### `executeJqQuery(config, jqQuery, filePath)`

Execute a jq query on a JSON file.

```typescript
import { executeJqQuery, type JqConfig } from '@anyshift/mcp-tools-common';

const config: JqConfig = {
  allowedPaths: ['/data'],
  timeoutMs: 30000,
};

const result = await executeJqQuery(
  config,
  '.users[] | select(.age > 18)',
  '/data/users.json'
);
// Returns: { content: [{ type: 'text', text: '...' }] }
```

**Parameters:**
- `config: JqConfig` - Configuration object
  - `allowedPaths: string[]` - Absolute paths where files can be accessed
  - `timeoutMs: number` - Maximum execution time in milliseconds
- `jqQuery: string` - The jq query to execute (will be sanitized)
- `filePath: string` - Absolute path to JSON file (must be in allowedPaths)

**Returns:** `Promise<{ content: Array<{ type: 'text'; text: string }> }>`

#### `createJqTool(config)`

Create a JQ tool with handler and definition.

```typescript
const jqTool = createJqTool({
  allowedPaths: ['/data'],
  timeoutMs: 30000
});

// Use with MCP SDK
server.tool(
  jqTool.toolDefinition.name,
  jqTool.toolDefinition.description,
  jqTool.toolDefinition.inputSchema,
  async (args) => jqTool.handler({ params: { arguments: args } })
);
```

### File Writer

#### `handleToolResponse(config, toolName, args, responseData)`

Intelligently handle tool responses - write to file if large, return inline if small.
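The "large or small" decision described above can be pictured as a simple character-count check. The following is only a minimal sketch under stated assumptions: `shouldWriteToFile`, `ThresholdConfig`, and `TextResponse` are hypothetical names used for illustration, and the package's real logic may differ (for example in how it measures response size).

```typescript
// Hypothetical sketch of a threshold check; not the package's actual code.
interface TextResponse {
  content: Array<{ type: 'text'; text: string }>;
}

interface ThresholdConfig {
  enabled: boolean;
  minCharsForWrite: number;
}

// Decide whether a response should be written to a file or returned inline.
function shouldWriteToFile(config: ThresholdConfig, response: TextResponse): boolean {
  if (!config.enabled) return false; // File writing switched off: always inline.
  const totalChars = response.content.reduce((sum, item) => sum + item.text.length, 0);
  return totalChars > config.minCharsForWrite;
}

const config: ThresholdConfig = { enabled: true, minCharsForWrite: 1000 };
const small: TextResponse = { content: [{ type: 'text', text: 'ok' }] };
const large: TextResponse = { content: [{ type: 'text', text: 'x'.repeat(5000) }] };

console.log(shouldWriteToFile(config, small)); // false
console.log(shouldWriteToFile(config, large)); // true
```

The package's own usage example follows.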
```typescript
import { handleToolResponse, type FileWriterConfig } from '@anyshift/mcp-tools-common';

const config: FileWriterConfig = {
  enabled: true,
  outputPath: '/tmp/output',
  minCharsForWrite: 1000,
  toolAbbreviations: { 'search_data': 'srch' }
};

const response = {
  content: [{ type: 'text', text: 'Very large dataset...' }],
  _rawText: 'Very large dataset...' // Optional: for better file writing
};

const result = await handleToolResponse(
  config,
  'search_data',
  { query: 'users' },
  response
);

// If large: { content: [{ type: 'text', text: '📄 File: /tmp/output/srch-20250115-1234-a1b2.json\n...' }] }
// If small: returns response as-is
```

**Parameters:**
- `config: FileWriterConfig` - Configuration object
  - `enabled: boolean` - Enable file writing
  - `outputPath: string` - Directory for output files
  - `minCharsForWrite: number` - Minimum characters to trigger file write
  - `toolAbbreviations: Record<string, string>` - Tool name abbreviations for filenames
- `toolName: string` - Name of the tool (for filename generation)
- `args: Record<string, unknown>` - Tool arguments (for filename hash)
- `responseData: unknown` - The response to potentially write to file

**Returns:** `Promise<FileWriterResult | unknown>` - Either file reference or original response

#### `createFileWriter(config)`

Create a file writer instance.

```typescript
const fileWriter = createFileWriter({
  enabled: true,
  outputPath: './output',
  minCharsForWrite: 1000,
  toolAbbreviations: { 'my_tool': 'mt' }
});

const result = await fileWriter.handleResponse('my_tool', { arg: 'value' }, response);
```

### Schema Analysis

#### `analyzeJsonSchema(data)`

Analyze JSON structure and generate schema with LLM-friendly hints.
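One of the hints the analyzer produces concerns objects whose keys are numeric strings, which jq must access as `.["0"]` rather than `.[0]`. A check of that kind might look like the sketch below; `hasNumericStringKeys` is a hypothetical helper written for illustration and is not part of the package's API.

```typescript
// Hypothetical helper: true when a plain object has only digit-string keys.
function hasNumericStringKeys(value: unknown): boolean {
  // Reject primitives, null, and arrays: arrays are indexed with .[0] in jq.
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return false;
  }
  const keys = Object.keys(value);
  return keys.length > 0 && keys.every((key) => /^\d+$/.test(key));
}

const cypherLike = { '0': { name: 'Alice' }, '1': { name: 'Bob' } };

console.log(hasNumericStringKeys(cypherLike)); // true
console.log(hasNumericStringKeys({ users: [] })); // false
console.log(hasNumericStringKeys([1, 2])); // false
```

The package's own usage example follows.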
```typescript
import { analyzeJsonSchema } from '@anyshift/mcp-tools-common';

const data = {
  "0": { Values: [1, "Alice", null], Keys: ["id", "name", "email"] },
  "1": { Values: [2, "Bob", "bob@example.com"], Keys: ["id", "name", "email"] }
};

const schema = analyzeJsonSchema(data);
console.log(schema);
// {
//   type: 'object',
//   _keysAreNumeric: true,
//   _accessPattern: 'Use .["0"] not .[0]',
//   properties: { ... }
// }
```

**Returns:** `JsonSchema` with special fields:
- `_keysAreNumeric: boolean` - Object has numeric string keys (e.g., Cypher results)
- `_accessPattern: string` - Hint for accessing data
- `_hasNulls: boolean` - Contains null values (suggests filtering)
- `_hint: string` - Suggestion for handling data

#### `extractNullableFields(schema)`

Extract nullable field information from schema.

```typescript
import { analyzeJsonSchema, extractNullableFields } from '@anyshift/mcp-tools-common';

const schema = analyzeJsonSchema(data);
const nullFields = extractNullableFields(schema);
// {
//   alwaysNull: ['email'],   // Fields that are always null
//   nullable: ['middleName'] // Fields that are sometimes null
// }
```

### Path Validation

#### `validatePathWithinAllowedDirs(filePath, allowedPaths)`

Validate that a file path is within allowed directories.

```typescript
import { validatePathWithinAllowedDirs } from '@anyshift/mcp-tools-common';

try {
  const realPath = validatePathWithinAllowedDirs(
    '/data/users.json',
    ['/data', '/tmp']
  );
  console.log('Access granted:', realPath);
} catch (error) {
  // `error` is `unknown` under strict TypeScript settings, so narrow it first
  const message = error instanceof Error ? error.message : String(error);
  console.error('Access denied:', message);
}
```

**Throws:** Error if:
- Path is not absolute
- File doesn't exist
- File is outside allowed directories

### Response Truncation

#### `truncateResponseIfNeeded(config, content)`

Truncate response content if it exceeds the configured token limit.
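Ratio-based truncation of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: `estimateTokensSketch`, `truncateSketch`, and `SketchTruncationConfig` are hypothetical names, and the wording of the notice is invented here. The package's actual notice format and cut-off rule may differ.

```typescript
// Hypothetical sketch of ratio-based truncation; not the package's actual code.
interface SketchTruncationConfig {
  maxTokens: number;
  charsPerToken?: number;
  messagePrefix?: string;
}

// Estimate tokens by dividing the character count by the ratio, rounding up.
function estimateTokensSketch(text: string, charsPerToken = 4): number {
  return Math.ceil(text.length / charsPerToken);
}

function truncateSketch(config: SketchTruncationConfig, content: string): string {
  const charsPerToken = config.charsPerToken ?? 4;
  if (estimateTokensSketch(content, charsPerToken) <= config.maxTokens) {
    return content; // Under the limit: return unchanged.
  }
  // Keep as many characters as the token budget allows, then append a notice.
  const maxChars = config.maxTokens * charsPerToken;
  const prefix = config.messagePrefix ?? 'RESPONSE TRUNCATED';
  const notice = `[${prefix}: kept ${maxChars} of ${content.length} characters]`;
  return `${content.slice(0, maxChars)}\n\n${notice}`;
}

console.log(estimateTokensSketch('Hello world')); // 3 (11 chars / 4, rounded up)
console.log(truncateSketch({ maxTokens: 3 }, 'Hello world')); // Hello world
console.log(truncateSketch({ maxTokens: 2 }, 'Hello world')); // starts with "Hello wo"
```

The package's own usage example follows.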
```typescript
import { truncateResponseIfNeeded, type TruncationConfig } from '@anyshift/mcp-tools-common';

const config: TruncationConfig = {
  maxTokens: 15000,
  enableLogging: false, // Set to true for JSON logs to stderr
  charsPerToken: 4, // Default: 4 chars per token
};

const content = 'Very large response text...';
const truncated = truncateResponseIfNeeded(config, content);
// Returns original if under limit, or truncated with notice if over
```

**Parameters:**
- `config: TruncationConfig` - Configuration object
  - `maxTokens: number` - Maximum allowed tokens (e.g., 10000, 15000)
  - `enableLogging?: boolean` - Log truncation events to stderr as JSON (default: false)
  - `messagePrefix?: string` - Custom prefix for truncation notice (default: "RESPONSE TRUNCATED")
  - `charsPerToken?: number` - Characters per token ratio (default: 4)
- `content: string` - The content to potentially truncate

**Returns:** `string` - Original content if under limit, or truncated content with notice

#### `estimateTokens(text, charsPerToken?)`

Estimate token count using chars/token ratio.

```typescript
import { estimateTokens } from '@anyshift/mcp-tools-common';

const tokens = estimateTokens('Hello world', 4);
// Returns: 3 (11 chars / 4 = 2.75, rounded up to 3)
```

**Parameters:**
- `text: string` - Text to estimate tokens for
- `charsPerToken?: number` - Characters per token ratio (default: 4)

**Returns:** `number` - Estimated token count

#### `wouldBeTruncated(content, maxTokens, charsPerToken?)`

Check if content would be truncated without actually truncating.
```typescript
import { wouldBeTruncated } from '@anyshift/mcp-tools-common';

if (wouldBeTruncated(content, 15000)) {
  console.log('Content exceeds 15k tokens, will be truncated');
}
```

**Parameters:**
- `content: string` - Content to check
- `maxTokens: number` - Maximum token limit
- `charsPerToken?: number` - Characters per token ratio (default: 4)

**Returns:** `boolean` - True if content exceeds token limit

### Utilities

#### `generateCompactFilename(toolName, args, abbreviations?)`

Generate compact, timestamped filenames for tool output. The hash suffix is derived from the tool arguments.

```typescript
import { generateCompactFilename } from '@anyshift/mcp-tools-common';

const filename = generateCompactFilename(
  'search_users',
  { query: 'active users', limit: 10 },
  { 'search_users': 'srch' }
);
// Returns: 'srch-20250115-1430-a3f9.json'
// Format: {abbrev}-{YYYYMMDD}-{HHMM}-{hash}.json
```

## Configuration Best Practices

### Environment Variables Pattern

```typescript
// config/toolsCommon.ts - Centralize config mapping
import type { FileWriterConfig, JqConfig, TruncationConfig } from '@anyshift/mcp-tools-common';

export const fileWriterConfig: FileWriterConfig = {
  enabled: process.env.WRITE_TO_FILE === 'true',
  outputPath: process.env.OUTPUT_PATH || './output',
  minCharsForWrite: Number(process.env.MIN_CHARS_FOR_FILE_WRITE) || 1000,
  toolAbbreviations: {
    'my_tool': 'mt',
    'search': 'srch',
  }
};

export const jqConfig: JqConfig = {
  allowedPaths: [
    process.cwd(),
    process.env.OUTPUT_PATH || './output',
  ].filter(Boolean),
  timeoutMs: 30000,
};

export const truncationConfig: TruncationConfig = {
  maxTokens: 15000, // Adjust based on your LLM's context window
  enableLogging: process.env.TRUNCATION_LOGGING === 'true',
  charsPerToken: 4,
};

// Use in tools (from another file):
// import { fileWriterConfig, jqConfig, truncationConfig } from './config/toolsCommon';
```

### Security Considerations

1. **Always use absolute paths** for `allowedPaths` - relative paths can be ambiguous
2. **Validate user input** - especially for file paths and jq queries
3. **Set reasonable timeouts** - default 30s prevents hanging queries
4. **Limit file access** - only allow necessary directories in `allowedPaths`
5. **JQ query sanitization is automatic** - blocks `$ENV` and `env` access

### Performance Tips

1. **Set appropriate `minCharsForWrite`** - balance between inline convenience and context limits
2. **Use tool abbreviations** - keeps filenames short and readable
3. **Consider pagination** - disable pagination when file writing is enabled
4. **Cache resolved paths** - validate allowedPaths at startup, not per-request
5. **Configure truncation limits** - set `maxTokens` based on your LLM's context window (e.g., 10k for Datadog, 15k for Anyshift)
6. **Enable truncation logging selectively** - use `enableLogging: true` in development/debugging, disable in production

## TypeScript Types

All major types are exported:

```typescript
import type {
  JqConfig,
  FileWriterConfig,
  FileWriterResult,
  JsonSchema,
  NullableFields,
  TruncationConfig,
} from '@anyshift/mcp-tools-common';
```

## Examples

### Complete MCP Server with Both Features

```typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { createJqTool, createFileWriter } from '@anyshift/mcp-tools-common';

const server = new McpServer({ name: 'example-server', version: '1.0.0' });

// Setup
const outputPath = process.env.OUTPUT_PATH || './output';

const fileWriter = createFileWriter({
  enabled: process.env.WRITE_TO_FILE === 'true',
  outputPath,
  minCharsForWrite: 1000,
  toolAbbreviations: { 'fetch_data': 'data', 'search': 'srch' }
});

const jqTool = createJqTool({
  allowedPaths: [process.cwd(), outputPath],
  timeoutMs: 30000,
});

// Register JQ tool
server.tool(
  jqTool.toolDefinition.name,
  jqTool.toolDefinition.description,
  jqTool.toolDefinition.inputSchema,
  async (args) => jqTool.handler({ params: { arguments: args } })
);

// Register custom tool with file writing
server.tool('fetch_data', 'Fetch large datasets', { query: { type: 'string' } }, async ({ query }) => {
  const data = await fetchLargeDataset(query);
  const response = {
    content: [{ type: 'text', text: JSON.stringify(data, null, 2) }],
    _rawText: JSON.stringify(data, null, 2)
  };
  return await fileWriter.handleResponse('fetch_data', { query }, response);
});

// connect() requires a transport; stdio is used here as an example
await server.connect(new StdioServerTransport());
```

## Migrating from Inline Implementations

If you have existing JQ or file writing code in your MCP server:

1. **Install the package:** `npm install @anyshift/mcp-tools-common`
2. **Create config adapter:** Map your env vars to `JqConfig` and `FileWriterConfig`
3. **Replace tool implementations:** Use `executeJqQuery()` or `createJqTool()`
4. **Wrap tool responses:** Replace inline file writing with `handleToolResponse()`
5. **Remove duplicate code:** Delete old implementations and helpers
6. **Test thoroughly:** Verify file writing thresholds and JQ queries work

See the [Anyshift MCP Server](https://github.com/anyshift/anyshift-mcp-server) for a real-world migration example.

## Requirements

- Node.js >= 18.0.0
- `jq` command-line tool installed on system (for JQ functionality)

### Installing jq

```bash
# macOS
brew install jq

# Ubuntu/Debian
sudo apt-get install jq

# Windows (via Chocolatey)
choco install jq

# Or download from https://jqlang.github.io/jq/download/
```

## License

MIT

## Contributing

Issues and pull requests welcome! This library is designed to be MCP-server agnostic and should work with any MCP implementation.

## Related

- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
- [MCP TypeScript SDK](https://github.com/modelcontextprotocol/typescript-sdk)
- [jq Manual](https://jqlang.github.io/jq/manual/)