xc-mcp

MCP server that wraps Xcode command-line tools for iOS/macOS development workflows

57 lines • 9.42 kB

TypeScript

import { ScreenshotSize } from '../../utils/screenshot-sizing.js';
/**
 * Capture screenshot and return as inline base64-encoded data for direct response transmission
 *
 * **Full documentation:** See simctl/screenshot-inline.md for detailed parameters and examples
 */
interface ScreenshotInlineToolArgs {
    udid?: string;
    size?: ScreenshotSize;
    appName?: string;
    screenName?: string;
    state?: string;
    enableCoordinateCaching?: boolean;
}
/**
 * Capture screenshot and return as optimized base64 image data (inline)
 *
 * Examples:
 * - Simple screenshot: udid: "device-123" (defaults to 256×512, 170 tokens)
 * - Full size: udid: "device-123", size: "full" (native resolution, 340 tokens)
 * - Quarter size: udid: "device-123", size: "quarter" (128×256, 170 tokens)
 * - Semantic naming: udid: "device-123", appName: "MyApp", screenName: "LoginScreen", state: "Empty"
 *
 * Screenshot size optimization (default: 'half' for 50% token savings):
 * - half: 256×512 pixels, 1 tile, 170 tokens (DEFAULT)
 * - full: Native resolution, 2 tiles, 340 tokens
 * - quarter: 128×256 pixels, 1 tile, 170 tokens
 * - thumb: 128×128 pixels, 1 tile, 170 tokens
 *
 * The tool automatically optimizes the screenshot:
 * - Resizes to tile-aligned dimensions (default: 256×512)
 * - Converts to WebP format for best compression (60% quality)
 * - Falls back to JPEG if WebP unavailable
 * - Returns base64-encoded data inline in response
 *
 * LLM Optimization:
 * For semantic naming, provide appName, screenName, and state to help agents
 * understand which screen was captured and track state progression.
 */
export declare function simctlScreenshotInlineTool(args: ScreenshotInlineToolArgs): Promise<{
    content: ({
        type: "image";
        data: string;
        mimeType: string;
        text?: undefined;
    } | {
        type: "text";
        text: string;
        data?: undefined;
        mimeType?: undefined;
    })[];
    isError: boolean;
}>;
export declare const SIMCTL_SCREENSHOT_INLINE_DOCS = "\n# simctl-screenshot-inline\n\nCapture optimized screenshots with inline base64 encoding for direct MCP response transmission.\n\n## What it does\n\nCaptures simulator screenshots and returns them as base64-encoded images directly in the\nMCP response. Automatically optimizes images for token efficiency with tile-aligned resizing\nand WebP/JPEG compression. Includes interactive element detection and coordinate transforms.\n\n## Parameters\n\n- **udid** (string, optional): Simulator UDID (auto-detects booted device if omitted)\n- **size** (string, optional): Screenshot size - half, full, quarter, thumb (default: half)\n- **appName** (string, optional): App name for semantic context\n- **screenName** (string, optional): Screen/view name for semantic context\n- **state** (string, optional): UI state for semantic context\n- **enableCoordinateCaching** (boolean, optional): Enable view fingerprinting for coordinate caching\n\n## Screenshot Size Optimization\n\nAutomatically optimizes screenshots for token efficiency:\n\n- **half** (default): 256\u00D7512 pixels, 1 tile, ~170 tokens (50% savings)\n- **full**: Native resolution, 2 tiles, ~340 tokens\n- **quarter**: 128\u00D7256 pixels, 1 tile, ~170 tokens\n- **thumb**: 128\u00D7128 pixels, 1 tile, ~170 tokens\n\n## Automatic Optimization Process\n\n1. **Capture**: Screenshot taken at native resolution\n2. **Resize**: Automatically resized to tile-aligned dimensions (unless size='full')\n3. **Compress**: Converted to WebP format at 60% quality (falls back to JPEG if unavailable)\n4. **Encode**: Base64-encoded for inline MCP response transmission\n5. **Extract**: Interactive elements detected from accessibility tree\n6. **Transform**: Coordinate mapping provided for resized screenshots\n\n## Returns\n\nMCP response with:\n- Base64-encoded optimized image (inline)\n- Screenshot optimization metadata (dimensions, tokens, savings)\n- Interactive elements with coordinates and properties\n- Coordinate transform for mapping screenshot to device coordinates\n- View fingerprint (if enableCoordinateCaching is true)\n- Semantic metadata (if provided)\n\n## Examples\n\n### Simple optimized screenshot (256\u00D7512)\n```typescript\nawait simctlScreenshotInlineTool({\n udid: 'device-123'\n})\n```\n\n### Full resolution screenshot\n```typescript\nawait simctlScreenshotInlineTool({\n udid: 'device-123',\n size: 'full'\n})\n```\n\n### Screenshot with semantic context\n```typescript\nawait simctlScreenshotInlineTool({\n udid: 'device-123',\n appName: 'MyApp',\n screenName: 'LoginScreen',\n state: 'Empty'\n})\n```\n\n### Screenshot with coordinate caching enabled\n```typescript\nawait simctlScreenshotInlineTool({\n udid: 'device-123',\n enableCoordinateCaching: true\n})\n```\n\n## Interactive Element Detection\n\nAutomatically extracts interactive elements from the accessibility tree:\n- Element type (Button, TextField, etc.)\n- Label and identifier\n- Bounds (x, y, width, height)\n- Tappability status\n\nLimited to top 20 elements to avoid token overflow. Elements are filtered to only\ninclude those with bounds and hittable status.\n\n## Coordinate Transform\n\nWhen screenshots are resized (size \u2260 'full'), provides automatic coordinate transformation:\n\n### Automatic Transformation (Recommended for Agents)\n\nUse the **coordinateTransformHelper** field in the response with **idb-ui-tap**:\n1. Identify element coordinates visually from the screenshot\n2. Call idb-ui-tap with **applyScreenshotScale: true** plus scale factors\n3. The tool automatically transforms screenshot coordinates to device coordinates\n\nExample:\n```\nidb-ui-tap {\n x: 256, // Screenshot coordinate\n y: 512, // Screenshot coordinate\n applyScreenshotScale: true,\n screenshotScaleX: 1.67,\n screenshotScaleY: 1.66\n}\n// Tool automatically calculates: deviceX = 256 * 1.67, deviceY = 512 * 1.66\n```\n\n### Manual Transformation (For Reference)\n\nIf not using automatic transformation:\n- **scaleX**: Multiply screenshot X coordinates by this to get device coordinates\n- **scaleY**: Multiply screenshot Y coordinates by this to get device coordinates\n- **coordinateTransform.guidance**: Human-readable instructions\n\n**Important**: Most agents should use the automatic transformation via idb-ui-tap's applyScreenshotScale parameter. Manual calculation is provided for reference only.\n\n## View Fingerprinting (Opt-in)\n\nWhen enableCoordinateCaching is true, computes a structural hash of the view:\n- **elementStructureHash**: SHA-256 hash of element hierarchy\n- **cacheable**: Whether view is stable enough to cache coordinates\n- **elementCount**: Number of elements in hierarchy\n- **orientation**: Device orientation\n\nExcludes loading states, animations, and dynamic content from caching.\n\n## Common Use Cases\n\n1. **Visual analysis**: LLM-based screenshot analysis with token optimization\n2. **UI automation**: Detect interactive elements and get tap coordinates\n3. **Bug reporting**: Capture and transmit screenshots inline\n4. **Test documentation**: Screenshot with semantic context for test tracking\n5. **Coordinate caching**: Store element coordinates for repeated interactions\n\n## Token Efficiency\n\nScreenshots are optimized for minimal token usage:\n- **Default (half)**: ~170 tokens (50% savings vs full)\n- **Full**: ~340 tokens (native resolution)\n- **Quarter**: ~170 tokens (75% savings vs full)\n- **Thumb**: ~170 tokens (smallest, for thumbnails)\n\nToken counts are estimates based on Claude's image processing (170 tokens per 512\u00D7512 tile).\n\n## Important Notes\n\n- **Auto-detection**: If udid is omitted, uses the currently booted device\n- **Temp files**: Uses temp directory for processing, auto-cleans up\n- **WebP fallback**: Attempts WebP compression, falls back to JPEG if unavailable\n- **Element extraction**: Requires app to be running with accessibility enabled\n- **Coordinate accuracy**: Transform provides pixel-perfect coordinate mapping\n\n## Error Handling\n\n- **Simulator not found**: Validates simulator exists in cache\n- **Simulator not booted**: Indicates simulator must be booted first\n- **Capture failure**: Reports if screenshot capture fails\n- **Optimization failure**: Falls back to original if optimization fails\n- **Element extraction**: Gracefully degrades if accessibility is unavailable\n\n## Next Steps After Screenshot\n\n1. **Analyze visually**: LLM processes inline image for visual analysis\n2. **Interact with elements**: Use coordinates from interactiveElements\n3. **Tap elements**: Apply coordinate transform if resized, then use simctl-tap\n4. **Query specific elements**: Use simctl-query-ui for targeted element discovery\n5. **Cache coordinates**: Store fingerprint for reuse on identical views\n\n## Comparison with simctl-io\n\n| Feature | screenshot-inline | simctl-io |\n|---------|------------------|-----------|\n| Returns | Base64 inline | File path |\n| Optimization | Automatic | Manual |\n| Elements | Auto-detected | Not included |\n| Transform | Included | Included |\n| Use case | MCP responses | File storage |\n| Token usage | Optimized | Depends on size |\n";
export declare const SIMCTL_SCREENSHOT_INLINE_DOCS_MINI = "Capture screenshot with base64 encoding. Use rtfm({ toolName: \"screenshot\" }) for docs.";
export {};
//# sourceMappingURL=screenshot-inline.d.ts.map
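
The "Manual Transformation" section of the embedded docs describes multiplying screenshot coordinates by `scaleX` and `scaleY` to get device coordinates. A minimal sketch of that arithmetic follows. The `Point` and `ScaleFactors` types and the `toDeviceCoordinates` helper are hypothetical names made up for illustration, not part of xc-mcp, and rounding to whole units is an assumption the docs do not state.

```typescript
// Hypothetical types for illustration only; xc-mcp's real response shape may differ.
interface Point {
  x: number;
  y: number;
}

interface ScaleFactors {
  scaleX: number;
  scaleY: number;
}

// Map a point picked on the resized screenshot to device coordinates by
// multiplying each axis by its scale factor, as the docs describe.
// Rounding to the nearest integer is an assumption, not documented behaviour.
function toDeviceCoordinates(point: Point, scale: ScaleFactors): Point {
  return {
    x: Math.round(point.x * scale.scaleX),
    y: Math.round(point.y * scale.scaleY),
  };
}

// Values taken from the idb-ui-tap example in the docs string.
const devicePoint = toDeviceCoordinates(
  { x: 256, y: 512 },
  { scaleX: 1.67, scaleY: 1.66 },
);
console.log(devicePoint);
```

The docs recommend letting idb-ui-tap apply the scale via `applyScreenshotScale: true`; a helper like this is only useful when checking the arithmetic by hand.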