Reusable JQ tool and file writing utilities for MCP servers
# @anyshift/mcp-tools-common
Reusable utilities for building MCP (Model Context Protocol) servers. Provides production-ready tools for JSON processing and intelligent response handling.
## What's Included
### 🔧 JQ Query Tool
Execute [jq](https://jqlang.github.io/jq/) queries on JSON files with AI-optimized error messages and schema hints.
**Features:**
- Sandboxed jq execution with timeout protection
- Path validation for security (no arbitrary file access)
- Query sanitization (blocks environment variable access)
- Schema-aware error messages that help LLMs write better queries
- Comprehensive retry strategies in tool descriptions
### 📄 Smart File Writer
Automatically write large tool responses to files instead of returning them inline.
**Features:**
- Threshold-based file writing (configurable character limit)
- JSON schema analysis and quick reference generation
- Compact, timestamped filenames with tool abbreviations
- Nullable field detection for better JQ query guidance
- Returns file references with schema hints instead of massive text
### 🔍 JSON Schema Analyzer
Deep schema analysis for JSON data structures.
**Features:**
- Detects numeric string keys (common in Cypher results: `{"0": {...}, "1": {...}}`)
- Identifies nullable vs always-null fields
- Handles mixed-type arrays, nested objects, and complex structures
- Generates LLM-friendly access patterns and hints
### 🛡️ Path Validation
Secure file path validation with allowlist support.
**Features:**
- Requires absolute paths (prevents relative path ambiguity)
- Validates files exist and are within allowed directories
- Resolves symlinks for security
- Clear error messages with examples
### ✂️ Response Truncation
Automatic token-based response size limiting to prevent LLM context overflow.
**Features:**
- Token estimation using configurable chars/token ratio
- Configurable token limits (e.g., 10k for Datadog, 15k for Anyshift)
- Optional JSON logging to stderr for monitoring
- Customizable truncation notice messages
- Preserves as much content as possible before truncating
## Installation
```bash
npm install @anyshift/mcp-tools-common
# or
pnpm add @anyshift/mcp-tools-common
```
## Integration Guide
### Option 1: Using High-Level Factory Functions (Recommended)
The easiest way to integrate is to use `createJqTool()` and `createFileWriter()`:
```typescript
import * as path from 'node:path';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import {
  createJqTool,
  createFileWriter,
  truncateResponseIfNeeded,
  type TruncationConfig,
} from '@anyshift/mcp-tools-common';

const server = new McpServer({
  name: 'my-mcp-server',
  version: '1.0.0',
});

// Resolve once so that allowedPaths only ever contains absolute paths
const outputPath = path.resolve(process.env.OUTPUT_PATH || './output');

// 1. Configure File Writer
const fileWriter = createFileWriter({
  enabled: process.env.WRITE_TO_FILE === 'true',
  outputPath,
  minCharsForWrite: 1000, // Write to file if response > 1000 chars
  toolAbbreviations: {
    'my_data_tool': 'data',
    'my_search_tool': 'srch',
  }
});

// 2. Configure JQ Tool
const jqTool = createJqTool({
  allowedPaths: [
    process.cwd(), // Allow files in execution directory
    outputPath,    // Allow files in output directory
  ],
  timeoutMs: 30000, // 30 second timeout
});

// 3. Configure Response Truncation
const truncationConfig: TruncationConfig = {
  maxTokens: 15000,     // Max tokens before truncation
  enableLogging: false, // Optional: log truncation events to stderr
  charsPerToken: 4,     // Token estimation ratio (default: 4)
};

// 4. Register JQ Tool with MCP Server
server.tool(
  jqTool.toolDefinition.name,
  jqTool.toolDefinition.description,
  jqTool.toolDefinition.inputSchema,
  async (args) => {
    return await jqTool.handler({ params: { arguments: args } });
  }
);

// 5. Wrap Your Other Tools with File Writer and Truncation
server.tool(
  'my_data_tool',
  'Fetch large datasets',
  { query: { type: 'string' } },
  async ({ query }) => {
    // fetchData stands in for your own data-access function
    const result = await fetchData(query);

    // handleResponse will:
    // - Return result inline if small
    // - Write to file and return file reference if large
    let response = await fileWriter.handleResponse(
      'my_data_tool',
      { query },
      result
    );

    // Apply truncation to prevent context overflow
    if (response.content?.[0]?.text) {
      response.content[0].text = truncateResponseIfNeeded(
        truncationConfig,
        response.content[0].text
      );
    }

    return response;
  }
);
```
### Option 2: Using Individual Functions (Advanced)
For more control, import and use functions directly:
```typescript
import {
  executeJqQuery,
  handleToolResponse,
  analyzeJsonSchema,
  generateCompactFilename,
  validatePathWithinAllowedDirs,
  type JqConfig,
  type FileWriterConfig
} from '@anyshift/mcp-tools-common';

// Define your configs
const jqConfig: JqConfig = {
  allowedPaths: ['/path/to/data'],
  timeoutMs: 30000,
};

const fileWriterConfig: FileWriterConfig = {
  enabled: true,
  outputPath: '/tmp/output',
  minCharsForWrite: 500,
  toolAbbreviations: { 'my_tool': 'mt' }
};

// Use functions directly
server.tool('execute_jq_query', 'Run jq queries', schema, async ({ jq_query, file_path }) => {
  return await executeJqQuery(jqConfig, jq_query, file_path);
});

server.tool('my_tool', 'Example tool', schema, async (args) => {
  const response = { content: [{ type: 'text', text: 'Large data...' }] };
  return await handleToolResponse(fileWriterConfig, 'my_tool', args, response);
});
```
### Option 3: Zero Dependencies on Environment Variables
Pass configuration explicitly for maximum flexibility:
```typescript
import { executeJqQuery } from '@anyshift/mcp-tools-common';

// No environment variables - pure configuration
const result = await executeJqQuery(
  {
    allowedPaths: ['/specific/path'],
    timeoutMs: 10000,
  },
  '.data[] | select(.active == true)',
  '/specific/path/data.json'
);
```
## API Reference
### JQ Tool
#### `executeJqQuery(config, jqQuery, filePath)`
Execute a jq query on a JSON file.
```typescript
import { executeJqQuery, type JqConfig } from '@anyshift/mcp-tools-common';

const config: JqConfig = {
  allowedPaths: ['/data'],
  timeoutMs: 30000,
};

const result = await executeJqQuery(
  config,
  '.users[] | select(.age > 18)',
  '/data/users.json'
);
// Returns: { content: [{ type: 'text', text: '...' }] }
```
**Parameters:**
- `config: JqConfig` - Configuration object
  - `allowedPaths: string[]` - Absolute paths where files can be accessed
  - `timeoutMs: number` - Maximum execution time in milliseconds
- `jqQuery: string` - The jq query to execute (will be sanitized)
- `filePath: string` - Absolute path to JSON file (must be in allowedPaths)
**Returns:** `Promise<{ content: Array<{ type: 'text'; text: string }> }>`
#### `createJqTool(config)`
Create a JQ tool with handler and definition.
```typescript
const jqTool = createJqTool({ allowedPaths: ['/data'], timeoutMs: 30000 });

// Use with MCP SDK
server.tool(
  jqTool.toolDefinition.name,
  jqTool.toolDefinition.description,
  jqTool.toolDefinition.inputSchema,
  async (args) => jqTool.handler({ params: { arguments: args } })
);
```
### File Writer
#### `handleToolResponse(config, toolName, args, responseData)`
Intelligently handle tool responses - write to file if large, return inline if small.
```typescript
import { handleToolResponse, type FileWriterConfig } from '@anyshift/mcp-tools-common';

const config: FileWriterConfig = {
  enabled: true,
  outputPath: '/tmp/output',
  minCharsForWrite: 1000,
  toolAbbreviations: { 'search_data': 'srch' }
};

const response = {
  content: [{ type: 'text', text: 'Very large dataset...' }],
  _rawText: 'Very large dataset...' // Optional: for better file writing
};

const result = await handleToolResponse(
  config,
  'search_data',
  { query: 'users' },
  response
);
// If large: { content: [{ type: 'text', text: '📄 File: /tmp/output/srch-20250115-1234-a1b2.json\n...' }] }
// If small: returns response as-is
```
**Parameters:**
- `config: FileWriterConfig` - Configuration object
  - `enabled: boolean` - Enable file writing
  - `outputPath: string` - Directory for output files
  - `minCharsForWrite: number` - Minimum characters to trigger file write
  - `toolAbbreviations: Record<string, string>` - Tool name abbreviations for filenames
- `toolName: string` - Name of the tool (for filename generation)
- `args: Record<string, unknown>` - Tool arguments (for filename hash)
- `responseData: unknown` - The response to potentially write to file
**Returns:** `Promise<FileWriterResult | unknown>` - Either file reference or original response
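The inline-versus-file decision can be pictured as a simple length check. The sketch below is a hypothetical illustration, not the library's implementation: `ThresholdConfig` and `shouldWriteToFile` are invented names, and treating `minCharsForWrite` as an inclusive bound is an assumption.

```typescript
// Hypothetical sketch of the threshold decision; names are illustrative only.
interface ThresholdConfig {
  enabled: boolean;
  minCharsForWrite: number;
}

// Assumption: a response goes to a file once its length reaches the threshold.
function shouldWriteToFile(config: ThresholdConfig, text: string): boolean {
  if (!config.enabled) return false; // file writing switched off: always inline
  return text.length >= config.minCharsForWrite;
}

const thresholdConfig: ThresholdConfig = { enabled: true, minCharsForWrite: 1000 };
const smallGoesToFile = shouldWriteToFile(thresholdConfig, 'ok');
const largeGoesToFile = shouldWriteToFile(thresholdConfig, 'x'.repeat(5000));
console.log(smallGoesToFile, largeGoesToFile);
```

With `enabled: false` the check short-circuits, which mirrors the documented behaviour of returning the response as-is.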
#### `createFileWriter(config)`
Create a file writer instance.
```typescript
const fileWriter = createFileWriter({
  enabled: true,
  outputPath: './output',
  minCharsForWrite: 1000,
  toolAbbreviations: { 'my_tool': 'mt' }
});

const result = await fileWriter.handleResponse('my_tool', { arg: 'value' }, response);
```
### Schema Analysis
#### `analyzeJsonSchema(data)`
Analyze JSON structure and generate schema with LLM-friendly hints.
```typescript
import { analyzeJsonSchema } from '@anyshift/mcp-tools-common';
const data = {
  "0": { Values: [1, "Alice", null], Keys: ["id", "name", "email"] },
  "1": { Values: [2, "Bob", "bob@example.com"], Keys: ["id", "name", "email"] }
};

const schema = analyzeJsonSchema(data);
console.log(schema);
// {
//   type: 'object',
//   _keysAreNumeric: true,
//   _accessPattern: 'Use .["0"] not .[0]',
//   properties: { ... }
// }
```
**Returns:** `JsonSchema` with special fields:
- `_keysAreNumeric: boolean` - Object has numeric string keys (e.g., Cypher results)
- `_accessPattern: string` - Hint for accessing data
- `_hasNulls: boolean` - Contains null values (suggests filtering)
- `_hint: string` - Suggestion for handling data
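To make the `_keysAreNumeric` and `_accessPattern` hints concrete, here is a minimal sketch of how numeric string keys might be detected and turned into a jq path. It is not the analyzer's actual code; `hasNumericStringKeys` and `jqPathForKey` are hypothetical helper names.

```typescript
// Hypothetical helper: true when every key of a non-empty object looks like "0", "1", ...
function hasNumericStringKeys(value: unknown): boolean {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) return false;
  const keys = Object.keys(value);
  return keys.length > 0 && keys.every((key) => /^\d+$/.test(key));
}

// Build a jq path for a key: plain field access for identifiers, quoted lookup otherwise.
function jqPathForKey(key: string): string {
  return /^[A-Za-z_][A-Za-z0-9_]*$/.test(key) ? `.${key}` : `.[${JSON.stringify(key)}]`;
}

const cypherLike = { '0': { Keys: ['id'] }, '1': { Keys: ['id'] } };
console.log(hasNumericStringKeys(cypherLike), jqPathForKey('0'), jqPathForKey('name'));
```

The quoted form matters because `.[0]` in jq indexes an array, while `.["0"]` looks up an object key named `"0"`.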
#### `extractNullableFields(schema)`
Extract nullable field information from schema.
```typescript
import { analyzeJsonSchema, extractNullableFields } from '@anyshift/mcp-tools-common';
const schema = analyzeJsonSchema(data);
const nullFields = extractNullableFields(schema);
// {
// alwaysNull: ['email'], // Fields that are always null
// nullable: ['middleName'] // Fields that are sometimes null
// }
```
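The distinction between always-null and sometimes-null fields can be sketched by counting nulls per field across a set of rows. This is an illustrative approximation that works on raw rows rather than on a `JsonSchema`; `summarizeNulls` and its types are hypothetical names.

```typescript
type Row = Record<string, unknown>;

interface NullSummary {
  alwaysNull: string[];
  nullable: string[];
}

// Hypothetical sketch: classify fields by how often they are null across rows.
function summarizeNulls(rows: Row[]): NullSummary {
  const seen: Record<string, number> = {};
  const nulls: Record<string, number> = {};
  for (const row of rows) {
    for (const field of Object.keys(row)) {
      seen[field] = (seen[field] ?? 0) + 1;
      if (row[field] === null) nulls[field] = (nulls[field] ?? 0) + 1;
    }
  }
  const result: NullSummary = { alwaysNull: [], nullable: [] };
  for (const field of Object.keys(nulls)) {
    // Null in every occurrence: always null. Null in only some: nullable.
    (nulls[field] === seen[field] ? result.alwaysNull : result.nullable).push(field);
  }
  return result;
}

const nullSummary = summarizeNulls([
  { id: 1, email: null, middleName: null },
  { id: 2, email: null, middleName: 'Lee' },
]);
console.log(nullSummary);
```

Fields that are always null can usually be dropped from a query, while nullable fields call for a guard such as `select(.middleName != null)`.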
### Path Validation
#### `validatePathWithinAllowedDirs(filePath, allowedPaths)`
Validate that a file path is within allowed directories.
```typescript
import { validatePathWithinAllowedDirs } from '@anyshift/mcp-tools-common';
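try {
  const realPath = validatePathWithinAllowedDirs(
    '/data/users.json',
    ['/data', '/tmp']
  );
  console.log('Access granted:', realPath);
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
  console.error('Access denied:', error instanceof Error ? error.message : error);
}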
```
**Throws:** Error if:
- Path is not absolute
- File doesn't exist
- File is outside allowed directories
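The "outside allowed directories" check has one classic pitfall: a plain prefix test would let `/database/x.json` pass for an allowed directory of `/data`. The sketch below shows only that containment step, under the assumption that paths are POSIX-style and have already been made absolute with symlinks resolved. `isWithinAllowedDirs` is a hypothetical name, not the library's function.

```typescript
// Hypothetical sketch of the containment step only. Assumes POSIX-style paths that
// are already absolute and symlink-resolved (for example via fs.realpathSync).
function isWithinAllowedDirs(realFilePath: string, allowedDirs: string[]): boolean {
  return allowedDirs.some((dir) => {
    // Compare against "dir/" so that "/data" does not match "/database/..."
    const base = dir.endsWith('/') ? dir : `${dir}/`;
    return realFilePath.startsWith(base);
  });
}

console.log(isWithinAllowedDirs('/data/users.json', ['/data', '/tmp']));
console.log(isWithinAllowedDirs('/database/users.json', ['/data']));
```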
### Response Truncation
#### `truncateResponseIfNeeded(config, content)`
Truncate response content if it exceeds the configured token limit.
```typescript
import { truncateResponseIfNeeded, type TruncationConfig } from '@anyshift/mcp-tools-common';
const config: TruncationConfig = {
  maxTokens: 15000,
  enableLogging: false, // Set to true for JSON logs to stderr
  charsPerToken: 4,     // Default: 4 chars per token
};

const content = 'Very large response text...';
const truncated = truncateResponseIfNeeded(config, content);
// Returns original if under limit, or truncated with notice if over
```
**Parameters:**
- `config: TruncationConfig` - Configuration object
  - `maxTokens: number` - Maximum allowed tokens (e.g., 10000, 15000)
  - `enableLogging?: boolean` - Log truncation events to stderr as JSON (default: false)
  - `messagePrefix?: string` - Custom prefix for truncation notice (default: "RESPONSE TRUNCATED")
  - `charsPerToken?: number` - Characters per token ratio (default: 4)
- `content: string` - The content to potentially truncate
**Returns:** `string` - Original content if under limit, or truncated content with notice
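As a rough mental model, truncation amounts to converting the token limit into a character budget and cutting the content there. The sketch below is an assumption-laden illustration, not the library's code: `SketchTruncationConfig`, `estimateTokensSketch`, and `truncateSketch` are hypothetical names, and the wording and placement of the notice are invented.

```typescript
interface SketchTruncationConfig {
  maxTokens: number;
  charsPerToken?: number;
  messagePrefix?: string;
}

// Rough token estimate: character count divided by the ratio, rounded up.
function estimateTokensSketch(text: string, charsPerToken = 4): number {
  return Math.ceil(text.length / charsPerToken);
}

// Keep the leading characters that fit the budget, then append a notice.
// Assumption: the real notice format may differ.
function truncateSketch(config: SketchTruncationConfig, content: string): string {
  const charsPerToken = config.charsPerToken ?? 4;
  if (estimateTokensSketch(content, charsPerToken) <= config.maxTokens) return content;
  const maxChars = config.maxTokens * charsPerToken;
  const prefix = config.messagePrefix ?? 'RESPONSE TRUNCATED';
  return `${content.slice(0, maxChars)}\n\n[${prefix}: showing first ${maxChars} of ${content.length} characters]`;
}

console.log(truncateSketch({ maxTokens: 3 }, 'Hello world')); // fits: 11 chars is 3 tokens
console.log(truncateSketch({ maxTokens: 2 }, 'Hello world')); // cut to 8 chars plus notice
```

With the default ratio of 4, a `maxTokens` of 15000 corresponds to a budget of 60,000 characters.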
#### `estimateTokens(text, charsPerToken?)`
Estimate token count using chars/token ratio.
```typescript
import { estimateTokens } from '@anyshift/mcp-tools-common';
const tokens = estimateTokens('Hello world', 4);
// Returns: 3 (11 chars / 4 = 2.75, rounded up to 3)
```
**Parameters:**
- `text: string` - Text to estimate tokens for
- `charsPerToken?: number` - Characters per token ratio (default: 4)
**Returns:** `number` - Estimated token count
#### `wouldBeTruncated(content, maxTokens, charsPerToken?)`
Check if content would be truncated without actually truncating.
```typescript
import { wouldBeTruncated } from '@anyshift/mcp-tools-common';
if (wouldBeTruncated(content, 15000)) {
  console.log('Content exceeds 15k tokens, will be truncated');
}
```
**Parameters:**
- `content: string` - Content to check
- `maxTokens: number` - Maximum token limit
- `charsPerToken?: number` - Characters per token ratio (default: 4)
**Returns:** `boolean` - True if content exceeds token limit
### Utilities
#### `generateCompactFilename(toolName, args, abbreviations?)`
Generate compact, timestamped filenames for tool output. The hash suffix is derived from the tool arguments.
```typescript
import { generateCompactFilename } from '@anyshift/mcp-tools-common';
const filename = generateCompactFilename(
  'search_users',
  { query: 'active users', limit: 10 },
  { 'search_users': 'srch' }
);
// Returns: 'srch-20250115-1430-a3f9.json'
// Format: {abbrev}-{YYYYMMDD}-{HHMM}-{hash}.json
```
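The `{abbrev}-{YYYYMMDD}-{HHMM}-{hash}.json` format can be reproduced with a few lines of string formatting. The sketch below is hypothetical: the hash function (FNV-1a), the use of UTC, and the fallback to the full tool name when no abbreviation exists are all assumptions, and `compactFilenameSketch` is an invented name.

```typescript
// FNV-1a, used here only as a stand-in; the library's actual hash is not documented above.
function shortHash(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0').slice(0, 4);
}

// Assumptions: UTC timestamps, and the tool name itself when no abbreviation is configured.
function compactFilenameSketch(
  toolName: string,
  args: Record<string, unknown>,
  abbreviations: Record<string, string> = {},
  now: Date = new Date(),
): string {
  const pad = (n: number) => String(n).padStart(2, '0');
  const abbrev = abbreviations[toolName] ?? toolName;
  const date = `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}`;
  const time = `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}`;
  return `${abbrev}-${date}-${time}-${shortHash(JSON.stringify(args))}.json`;
}

const fixedTime = new Date(Date.UTC(2025, 0, 15, 14, 30));
const exampleFilename = compactFilenameSketch(
  'search_users',
  { query: 'active users', limit: 10 },
  { search_users: 'srch' },
  fixedTime,
);
console.log(exampleFilename);
```

Passing the clock in as a parameter keeps the function easy to test: the same arguments and timestamp always yield the same name.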
## Configuration Best Practices
### Environment Variables Pattern
```typescript
// config/toolsCommon.ts - Centralize config mapping
import * as path from 'node:path';
import type { FileWriterConfig, JqConfig, TruncationConfig } from '@anyshift/mcp-tools-common';

// Resolve once so that allowedPaths only ever contains absolute paths
const outputPath = path.resolve(process.env.OUTPUT_PATH || './output');

export const fileWriterConfig: FileWriterConfig = {
  enabled: process.env.WRITE_TO_FILE === 'true',
  outputPath,
  minCharsForWrite: Number(process.env.MIN_CHARS_FOR_FILE_WRITE) || 1000,
  toolAbbreviations: {
    'my_tool': 'mt',
    'search': 'srch',
  }
};

export const jqConfig: JqConfig = {
  allowedPaths: [process.cwd(), outputPath],
  timeoutMs: 30000,
};

export const truncationConfig: TruncationConfig = {
  maxTokens: 15000, // Adjust based on your LLM's context window
  enableLogging: process.env.TRUNCATION_LOGGING === 'true',
  charsPerToken: 4,
};

// Use in tools:
import { fileWriterConfig, jqConfig, truncationConfig } from './config/toolsCommon';
```
### Security Considerations
1. **Always use absolute paths** for `allowedPaths` - relative paths can be ambiguous
2. **Validate user input** - especially for file paths and jq queries
3. **Set reasonable timeouts** - a bounded `timeoutMs` (the examples here use 30s) prevents hanging queries
4. **Limit file access** - only allow necessary directories in `allowedPaths`
5. **JQ query sanitization is automatic** - blocks `$ENV` and `env` access
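For a sense of what blocking `$ENV` and `env` involves, the sketch below rejects queries that reference either accessor. It is a deliberately naive, hypothetical illustration (`assertNoEnvAccess` and `isQueryAllowed` are invented names) and not the library's sanitizer; for instance, it also rejects the word `env` inside a string literal.

```typescript
// Hypothetical sketch: reject queries that reference jq's environment accessors.
const ENV_PATTERNS: RegExp[] = [
  /\$ENV\b/,          // the $ENV object
  /(^|[^.\w$])env\b/, // the bare `env` builtin, but not `.env` field access or `$env`
];

function assertNoEnvAccess(query: string): void {
  for (const pattern of ENV_PATTERNS) {
    if (pattern.test(query)) {
      throw new Error('jq query rejected: environment variable access is not allowed');
    }
  }
}

// Convenience wrapper that reports the outcome as a boolean.
function isQueryAllowed(query: string): boolean {
  try {
    assertNoEnvAccess(query);
    return true;
  } catch {
    return false;
  }
}

console.log(isQueryAllowed('.data[] | select(.active == true)'));
console.log(isQueryAllowed('$ENV.HOME'));
```

A pattern check like this is a first line of defence only; restricting `allowedPaths` and enforcing a timeout remain necessary.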
### Performance Tips
1. **Set appropriate `minCharsForWrite`** - balance between inline convenience and context limits
2. **Use tool abbreviations** - keeps filenames short and readable
3. **Reconsider pagination** - when file writing is enabled, large results go to a file, so pagination can usually be disabled
4. **Cache resolved paths** - validate allowedPaths at startup, not per-request
5. **Configure truncation limits** - set `maxTokens` based on your LLM's context window (e.g., 10k for Datadog, 15k for Anyshift)
6. **Enable truncation logging selectively** - use `enableLogging: true` in development/debugging, disable in production
## TypeScript Types
All major types are exported:
```typescript
import type {
  JqConfig,
  FileWriterConfig,
  FileWriterResult,
  JsonSchema,
  NullableFields,
  TruncationConfig,
} from '@anyshift/mcp-tools-common';
```
## Examples
### Complete MCP Server with Both Features
```typescript
import * as path from 'node:path';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { createJqTool, createFileWriter } from '@anyshift/mcp-tools-common';

const server = new McpServer({ name: 'example-server', version: '1.0.0' });

// Setup (resolve the output directory so that allowedPaths stays absolute)
const outputPath = path.resolve(process.env.OUTPUT_PATH || './output');

const fileWriter = createFileWriter({
  enabled: process.env.WRITE_TO_FILE === 'true',
  outputPath,
  minCharsForWrite: 1000,
  toolAbbreviations: { 'fetch_data': 'data', 'search': 'srch' }
});

const jqTool = createJqTool({
  allowedPaths: [process.cwd(), outputPath],
  timeoutMs: 30000,
});

// Register JQ tool
server.tool(
  jqTool.toolDefinition.name,
  jqTool.toolDefinition.description,
  jqTool.toolDefinition.inputSchema,
  async (args) => jqTool.handler({ params: { arguments: args } })
);

// Register custom tool with file writing
server.tool('fetch_data', 'Fetch large datasets', {
  query: { type: 'string' }
}, async ({ query }) => {
  // fetchLargeDataset stands in for your own data-access function
  const data = await fetchLargeDataset(query);
  const text = JSON.stringify(data, null, 2);
  const response = {
    content: [{ type: 'text', text }],
    _rawText: text
  };
  return await fileWriter.handleResponse('fetch_data', { query }, response);
});

// connect() requires a transport; stdio is assumed here
await server.connect(new StdioServerTransport());
```
## Migrating from Inline Implementations
If you have existing JQ or file writing code in your MCP server:
1. **Install the package:** `npm install @anyshift/mcp-tools-common`
2. **Create config adapter:** Map your env vars to `JqConfig` and `FileWriterConfig`
3. **Replace tool implementations:** Use `executeJqQuery()` or `createJqTool()`
4. **Wrap tool responses:** Replace inline file writing with `handleToolResponse()`
5. **Remove duplicate code:** Delete old implementations and helpers
6. **Test thoroughly:** Verify file writing thresholds and JQ queries work
See the [Anyshift MCP Server](https://github.com/anyshift/anyshift-mcp-server) for a real-world migration example.
## Requirements
- Node.js >= 18.0.0
- `jq` command-line tool installed on system (for JQ functionality)
### Installing jq
```bash
# macOS
brew install jq
# Ubuntu/Debian
sudo apt-get install jq
# Windows (via Chocolatey)
choco install jq
# Or download from https://jqlang.github.io/jq/download/
```
## License
MIT
## Contributing
Issues and pull requests welcome! This library is designed to be MCP-server agnostic and should work with any MCP implementation.
## Related
- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
- [MCP TypeScript SDK](https://github.com/modelcontextprotocol/typescript-sdk)
- [jq Manual](https://jqlang.github.io/jq/manual/)