@debugg-ai/debugg-ai-mcp

Zero-Config, Fully AI-Managed End-to-End Testing for all code gen platforms.

# Changelog

All notable changes to the DebuggAI MCP project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.5.1]

### Fixed — default OAuth issuer points at the Django AS

The default for `DEBUGGAI_OAUTH_ISSUER` (used in the RFC 9728 metadata's `authorization_servers`) was `https://auth.debugg.ai` — which is only the login UI. The actual Authorization Server is Django at `https://api.debugg.ai`, so the default is now `https://api.debugg.ai`. A deployment can still override it via the env var; this just makes a fresh container correct without one.

## [3.5.0]

### Added — Remote transport: Streamable HTTP + OAuth Resource Server (opt-in)

The server can now run as a hosted, multi-user remote MCP over **stateless Streamable HTTP**, in addition to the default stdio transport (which is unchanged). Enable with `DEBUGGAI_MCP_TRANSPORT=http` (+ `PORT`, default 3000).

As an OAuth **Resource Server** (MCP 2025-06-18):

- Each `POST /mcp` request must carry `Authorization: Bearer <token>`; the token is request-scoped (AsyncLocalStorage) and used as the backend credential — `api.debugg.ai` is the validator, so no token-verification keys live here.
- Missing/invalid token → `401` with `WWW-Authenticate: Bearer resource_metadata=…`.
- Serves RFC 9728 metadata at `/.well-known/oauth-protected-resource` advertising the authorization server (`auth.debugg.ai`), so clients run the OAuth flow and retry with a token.
- `GET /health` for load-balancer / ECS health checks.

Auth became request-scoped without touching the ~20 backend-client call sites: `config.api.key` resolves the per-request token when set (`utils/requestContext.ts`).

Config env: `DEBUGGAI_MCP_TRANSPORT`, `PORT`, `DEBUGGAI_MCP_PUBLIC_URL`, `DEBUGGAI_OAUTH_ISSUER`, `DEBUGGAI_TOKEN_TYPE=bearer`. stdio installs need none of these.
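
The request-scoped token described for 3.5.0 can be pictured with a small sketch built on Node's `AsyncLocalStorage`. This is not the project's `utils/requestContext.ts`. The names `requestContext`, `withToken`, `parseBearer`, and `challengeHeader` are hypothetical, and the fallback to a static `DEBUGGAI_API_KEY` is an assumption about how stdio installs keep working.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical store shape: one bearer token per in-flight request.
interface RequestContext {
  token: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Assumption: stdio installs read a static key from the environment.
const staticApiKey: string | undefined = process.env["DEBUGGAI_API_KEY"];

// Call sites keep reading config.api.key; the getter prefers the
// request-scoped token and falls back to the static key.
const config = {
  api: {
    get key(): string | undefined {
      return requestContext.getStore()?.token ?? staticApiKey;
    },
  },
};

// Extracts the token from an Authorization header, or null if absent or malformed.
function parseBearer(header: string | undefined): string | null {
  const match = /^Bearer\s+(\S+)$/i.exec(header ?? "");
  return match?.[1] ?? null;
}

// Builds the 401 challenge value that points clients at the RFC 9728 metadata.
function challengeHeader(publicUrl: string): string {
  const metadataUrl = new URL("/.well-known/oauth-protected-resource", publicUrl);
  return `Bearer resource_metadata="${metadataUrl.href}"`;
}

// Runs a handler with the token bound for the whole async call chain.
function withToken<T>(token: string, handler: () => Promise<T>): Promise<T> {
  return requestContext.run({ token }, handler);
}
```

Under these assumptions, an HTTP handler would call `parseBearer` on the incoming header, answer `401` with `challengeHeader(...)` when it returns `null`, and otherwise wrap the MCP dispatch in `withToken`, so that concurrent requests never see each other's credentials.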
## [3.4.0]

### Added — MCP Resources (browse projects / environments / executions)

The server now declares the `resources` capability and exposes the read-only entities as addressable resources, so clients can browse and @-mention them as context instead of only calling tools:

- **Collections** (`resources/list`): `debugg-ai://projects`, `debugg-ai://environments`, `debugg-ai://executions`
- **Templates** (`resources/templates/list`): `debugg-ai://project/{uuid}`, `debugg-ai://environment/{uuid}`, `debugg-ai://execution/{uuid}`
- **`resources/read`** dispatches each URI to the same entity handler the tools use — identical data + auth, no drift — and returns the JSON payload.

Additive: clients without resource support keep using the tools unchanged. Implementation in `handlers/resourcesHandler.ts`.

## [3.3.0]

### Added — Run artifacts returned as resource links

`check_app_in_browser` and `executions {action:"get"}` now surface execution artifacts — **run recording, HAR, console log** — as MCP [`resource_link`](https://modelcontextprotocol.io/specification/2025-06-18/server/tools) content blocks pointing at their presigned URLs, instead of base64-inlining them. Leaner responses, and the URLs stay renewable / fetchable on demand.

The legacy run-recording GIF (previously downloaded and inlined as multi-MB base64) is now a link; the `browserSession` presigned URLs are auto-detected and linked (deduped).

Screenshots are **deliberately kept inline** as image blocks so vision-capable clients can still see them — the core visual-verification workflow.

Helpers: `resourceLinkBlock` + `artifactResourceLinks` in `utils/imageUtils.ts`.

## [3.2.0]

### Added — Structured tool output (`structuredContent`)

Every successful tool result now carries [`structuredContent`](https://modelcontextprotocol.io/specification/2025-06-18/server/tools) — the parsed JSON payload — so clients can consume structured data directly instead of re-parsing the text blob.
The text block is kept for back-compat. Promoted centrally in the CallTool path (`withStructuredContent` in `utils/structuredContent.ts`) rather than touching every handler. No-op for errors, non-object payloads, or multi-text results.

`outputSchema` is intentionally not declared: the action tools return polymorphic shapes per action, a faithful schema would need top-level `oneOf` (which the Anthropic API rejects), and a permissive schema adds no value. `structuredContent` without a declared schema is spec-valid and is the win.

## [3.1.0]

### Added — Tool annotations (behavioral hints for clients)

Every tool now declares MCP [tool annotations](https://modelcontextprotocol.io/specification/2025-06-18/server/tools) so clients can reason about a tool before calling it (e.g. confirm-gate destructive ops, fast-path read-only ones):

- `environment`, `test_suite`, `test_case` → `destructiveHint: true` (expose a delete action)
- `executions`, `probe_page` → `readOnlyHint: true`
- `project`, `check_app_in_browser`, `trigger_crawl` → write but non-destructive
- all tools → `openWorldHint: true` (they reach the DebuggAI backend / live web)

Annotations are advisory; deletes are still enforced server-side via the existing confirmation gate. Presets live in `tools/annotations.ts`.

## [3.0.1]

### Fixed — 5 action tools were invisible in Claude Code (and any Anthropic-API client)

3.0.0's action tools (`project`, `environment`, `test_suite`, `test_case`, `executions`) declared a top-level `oneOf` in their JSON Schema to express per-action required fields. The Anthropic tool `input_schema` does not accept top-level `oneOf`/`anyOf`/`allOf`, so clients **silently dropped all 5 tools** — only the 3 browser tools showed up (the server still advertised all 8). Removed the `oneOf`; per-action required fields remain enforced by the Zod discriminated unions at call time and documented in each tool's description.
Added a registry regression test asserting no tool schema uses top-level `oneOf`/`anyOf`/`allOf`.

## [3.0.0]

### Changed — Tool surface consolidated to 8 action-based tools (BREAKING)

The 20 per-verb tools were consolidated into **8** tools: three browser tools (`check_app_in_browser`, `probe_page`, `trigger_crawl`) plus one action-based tool per entity — `project`, `environment`, `test_suite`, `test_case`, `executions` — each taking an `action` discriminator. Clients pick up the new surface on MCP restart.

Migration (old tool → new tool + action):

- `search_projects` → `project {action:"get"|"list"}`
- `create_project` → `project {action:"create"}`
- `search_environments` → `environment {action:"get"|"list"}`
- `create_environment` / `update_environment` / `delete_environment` → `environment {action:"create"|"update"|"delete"}`
- `create_test_suite` / `search_test_suites` / `run_test_suite` / `get_test_suite_results` / `delete_test_suite` → `test_suite {action:"create"|"list"|"run"|"results"|"delete"}`
- `create_test_case` / `update_test_case` / `delete_test_case` → `test_case {action:"create"|"update"|"delete"}`
- `search_executions` → `executions {action:"get"|"list"}`

### Removed

- `update_project` and `delete_project` — rename/delete a project from the DebuggAI web app (both were effectively unused).
- `trigger_crawl`'s `headless` parameter — the MCP now always runs headless (no opt-out).

### Added

- Destructive `delete` actions require confirmation: an elicitation prompt when the client supports it, otherwise a required `confirm: true` argument.
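
For callers that still hold pre-3.0.0 tool names, the migration list above can be expressed as a lookup table. The sketch below is illustrative only: `migrateCall` is a hypothetical helper, not part of the package, and choosing `"get"` when a `uuid` argument is present (otherwise `"list"`) is an assumption carried over from the dual-mode `search_*` behavior described for 2.0.0.

```typescript
type Entity = "project" | "environment" | "test_suite" | "test_case" | "executions";

interface ActionCall {
  tool: Entity;
  args: Record<string, unknown> & { action: string };
}

// Legacy search tools: the new action depends on the arguments (get vs. list).
const SEARCH_TOOLS: Record<string, Entity> = {
  search_projects: "project",
  search_environments: "environment",
  search_executions: "executions",
};

// Legacy tools that map one-to-one onto a tool + action pair.
const DIRECT: Record<string, [Entity, string]> = {
  create_project: ["project", "create"],
  create_environment: ["environment", "create"],
  update_environment: ["environment", "update"],
  delete_environment: ["environment", "delete"],
  create_test_suite: ["test_suite", "create"],
  search_test_suites: ["test_suite", "list"],
  run_test_suite: ["test_suite", "run"],
  get_test_suite_results: ["test_suite", "results"],
  delete_test_suite: ["test_suite", "delete"],
  create_test_case: ["test_case", "create"],
  update_test_case: ["test_case", "update"],
  delete_test_case: ["test_case", "delete"],
};

// Hypothetical helper: rewrites an old tool call into the 3.0.0 shape.
function migrateCall(oldTool: string, args: Record<string, unknown> = {}): ActionCall {
  const searchEntity = SEARCH_TOOLS[oldTool];
  if (searchEntity !== undefined) {
    // Assumption: a uuid argument means single-record "get", otherwise "list".
    const action = typeof args["uuid"] === "string" ? "get" : "list";
    return { tool: searchEntity, args: { ...args, action } };
  }
  const direct = DIRECT[oldTool];
  if (direct === undefined) {
    // Covers the removed tools, e.g. update_project and delete_project.
    throw new Error(`No 3.0.0 equivalent for tool "${oldTool}"`);
  }
  const [tool, action] = direct;
  return { tool, args: { ...args, action } };
}
```

The helper deliberately does not add `confirm: true` to migrated `delete` calls; per the note above, confirmation is meant to come from the user, either through elicitation or as an explicit argument.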
## [Unreleased]

### Added — E2E test suite management (8 new MCP tools)

Eight new tools for building and managing automated E2E test suites directly via MCP:

- `create_test_suite` — Create a named test suite for a project
- `search_test_suites` — List/search suites for a project with pagination and text filter
- `delete_test_suite` — Soft-delete (disable) a test suite
- `create_test_case` — Create a test case assigned to a suite (no auto-run)
- `update_test_case` — Update a test case's name, description, or agent task description
- `delete_test_case` — Soft-delete (disable) a test case
- `run_test_suite` — Trigger all test cases in a suite asynchronously
- `get_test_suite_results` — Fetch suite with per-test pass/fail outcomes and run history

All tools support name-based resolution (projectName, suiteName) with the same case-insensitive exact-match + ambiguity handling as existing tools. All backed by `/api/v1/test-suites/` and `/api/v1/e2e-tests/` endpoints on the DebuggAI backend. 80 new unit + integration tests added.

### Fixed — MCP now validates local reachability BEFORE hitting the backend (fixes 5-min false-pass regression)

- `check_app_in_browser` and `trigger_crawl` now do a pre-flight TCP probe to `127.0.0.1:<port>` before provisioning a backend tunnel key. If the dev server isn't listening, we return a structured `LocalServerUnreachable` error in ~ms instead of letting the browser agent burn its 5-minute step budget on `ERR_NGROK_8012`. Bead `1om`.
- After the tunnel is established, we do a second `GET /` probe through the tunnel itself and parse the body for `ERR_NGROK_*` markers. If ngrok received traffic but couldn't dial our backend (e.g., the dev server binds to 0.0.0.0/::1 but not 127.0.0.1), we tear down the tunnel, revoke the key, and return `TunnelTrafficBlocked` — again, fast, with a message that points at the actual cause.
- End-to-end proof: new eval flow `28-localhost-not-listening.mjs` against a guaranteed-free port, asserts response arrives in <10s with `error:'LocalServerUnreachable'`. Measured **9ms** in practice vs. the prior **5-minute false-pass**.

### Fixed — ngrok now dials IPv4 loopback explicitly (fixes ERR_NGROK_8012 on macOS Next.js)

- `ngrok.connect({addr})` now passes `127.0.0.1:<port>` instead of the bare port number for plain-http localhost URLs. Bare port / `localhost` could resolve to IPv6 `[::1]` first on modern macOS, but Next.js / Vite / most Node dev servers bind to `127.0.0.1` only. Result was a successful tunnel that dialed `[::1]:<port>` and got `connection refused`, surfacing to users as `ERR_NGROK_8012` inside the browser agent trace. Bead `fhg`. Evidenced by real incident log 2026-04-24T19:37Z.
- Docker (`DOCKER_CONTAINER=true`) and https-localhost paths unchanged.

### Fixed — concurrent callers joining a pending tunnel revoke their redundant key

- When caller B's request for a localhost URL arrives while caller A's tunnel for the same port is still provisioning, B used to silently join A's promise and throw away B's own minted ngrok key (and its `revokeKey` callback) — an orphan-key-on-backend leak. B now revokes its redundant key immediately on join. Bead `7qh` finding 2.

### Added — tunnel fault-injection + trace harness for diagnosis

- New `DEBUGG_TUNNEL_FAULT_MODE` env var (dev/test only — inert when `NODE_ENV=production`) lets developers force specific ngrok-side failures without mocking, to reproduce client-reported transient "Tunnel setup failed" incidents. Modes: `fail-connect-N:<count>`, `empty-url-N:<count>`, `delay-connect:<ms>`, combinable with commas. Bead `42g`.
- Structured `TunnelTrace` captures timestamped lifecycle events per tunnel-create call (start, each connect attempt, fault inject, agent reset, backoff, success/fail).
  Dumped to WARN logs on any tunnel creation failure so real-world flakes get a post-mortem trail instead of an opaque error message.

### Fixed — tunnel provisioning flakiness surfaces as user-facing errors

- `check_app_in_browser` / `trigger_crawl` now automatically retry transient tunnel-provision failures (5xx, 408, 429, network errors like ECONNRESET) with exponential backoff (500ms → 1500ms → 3000ms, 3 attempts). Previously a single ngrok/backend blip forced the caller to manually retry the tool call. Bead `7nx`.
- **ngrok.connect() retry widened from 2 to 3 attempts** with 500ms / 1500ms backoff. A client still hit "Tunnel setup failed" after `7nx` shipped — the failure was in the ngrok-listener-bringup path, not the backend-provision path. Auth errors still fail fast. Bead `ixh`.
- Tunnel-provision error messages now carry structured diagnostic context — HTTP status, ngrok error code, backend `x-request-id`, retryable flag — so users have something actionable to file bug reports against instead of opaque "Tunnel setup failed". Bead `5wz`.
- 4xx auth/quota errors (401/403/404) fail fast without retry to avoid loops against a bad API key.
- New posthog telemetry event `tunnel.provision_retry` fires per retry attempt with outcome, status, stage (`ngrok_connect` vs backend-provision), and diagnostic fields so flaky rates become measurable.

## [2.0.0] - 2026-04-23

> **Republish note:** Versions `1.0.64`, `1.0.65`, and `1.0.66` shipped with this
> same breaking surface but were incorrectly versioned as patches (CI auto-bumped
> patch regardless of commit type). All three are now deprecated on npm; consumers
> should upgrade to `^2.0.0`. The underlying code in `2.0.0` is functionally
> identical to `1.0.66`.

This is a **breaking release**. The MCP surface collapsed from 22 tools to 11 through a uniform `search_*` pattern plus credential-management consolidation into the environment tools. The full old→new mapping is below.
### ⚠️ BREAKING CHANGES — 14 tools removed, replaced by 11-tool surface

| Removed tool | Replacement |
|---|---|
| `list_projects` | `search_projects({q?, page?, pageSize?})` (filter mode) |
| `get_project` | `search_projects({uuid})` (uuid mode — returns the curated detail shape) |
| `list_environments` | `search_environments({projectUuid?, q?, page?, pageSize?})` — credentials inlined per env |
| `get_environment` | `search_environments({uuid, projectUuid})` |
| `list_credentials` | `search_environments(...)` — credentials are inlined on each returned env (never include password) |
| `get_credential` | `search_environments({uuid, projectUuid})` — pull from the env's `credentials[]` |
| `create_credential` | `create_environment({name, url, credentials: [...]})` (seed on env create), or `update_environment({uuid, addCredentials: [...]})` |
| `update_credential` | `update_environment({uuid, updateCredentials: [{uuid, ...patch}]})` |
| `delete_credential` | `update_environment({uuid, removeCredentialIds: [uuid]})` |
| `list_teams` | `create_project({teamName, ...})` — backend name-resolved with exact-match + ambiguity handling |
| `list_repos` | `create_project({repoName, ...})` — same pattern |
| `list_executions` | `search_executions({status?, projectUuid?, page?, pageSize?})` |
| `get_execution` | `search_executions({uuid})` — full detail with `nodeExecutions` + state |
| `cancel_execution` | Dropped — backend spin-down is now automatic; no client action needed |

All `search_*` tools use a dual-mode signature: pass `{uuid}` for a single-record detail response, or pass filter params for a paginated summary list. 404 from the backend surfaces as `isError: true` with `{error: 'NotFound', message, uuid}`.

Credential mutations on `update_environment` execute as `remove → update → add` in a single call, so a freed label can be re-bound in one request. Per-cred failures surface in `credentialWarnings[]` without blocking the env update.
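
Why the `remove → update → add` order matters is easiest to see on an in-memory model. The sketch below is a simplified illustration, not the server's implementation: `applyCredentialPatch` and its types are hypothetical, passwords are left out, and the rule that labels must be unique within an environment is an assumption inferred from the "freed label" wording above.

```typescript
// Simplified credential record; passwords are omitted on purpose.
interface Credential {
  uuid: string;
  label: string;
  username: string;
}

interface CredentialPatch {
  removeCredentialIds?: string[];
  updateCredentials?: Array<{ uuid: string; label?: string; username?: string }>;
  addCredentials?: Array<{ label: string; username: string }>;
}

interface PatchResult {
  credentials: Credential[];
  credentialWarnings: string[];
}

// Hypothetical helper: applies the three sub-actions in a fixed order and
// collects per-credential failures instead of aborting the whole update.
function applyCredentialPatch(
  current: readonly Credential[],
  patch: CredentialPatch,
  newUuid: () => string,
): PatchResult {
  const warnings: string[] = [];
  let creds = current.map((c) => ({ ...c }));

  // 1. Remove first, so labels held by removed credentials become free.
  for (const id of patch.removeCredentialIds ?? []) {
    const before = creds.length;
    creds = creds.filter((c) => c.uuid !== id);
    if (creds.length === before) warnings.push(`remove ${id}: not found`);
  }

  // 2. Update the credentials that survived step 1.
  for (const { uuid, label, username } of patch.updateCredentials ?? []) {
    const target = creds.find((c) => c.uuid === uuid);
    if (target === undefined) {
      warnings.push(`update ${uuid}: not found`);
      continue;
    }
    // Assumption: labels are unique within one environment.
    if (label !== undefined && creds.some((c) => c !== target && c.label === label)) {
      warnings.push(`update ${uuid}: label "${label}" already in use`);
      continue;
    }
    if (label !== undefined) target.label = label;
    if (username !== undefined) target.username = username;
  }

  // 3. Add last, so new credentials can claim labels freed in steps 1 and 2.
  for (const cred of patch.addCredentials ?? []) {
    if (creds.some((c) => c.label === cred.label)) {
      warnings.push(`add "${cred.label}": label already in use`);
      continue;
    }
    creds.push({ uuid: newUuid(), ...cred });
  }

  return { credentials: creds, credentialWarnings: warnings };
}
```

With this ordering, a single patch that removes the credential labeled `admin` and adds a new one with the same label succeeds; if the additions ran first, the same patch would produce a label conflict.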
### Added

- **`trigger_crawl` tool**: server-side browser-agent crawl to populate the project's knowledge graph. Returns `{executionId, status, targetUrl, durationMs, outcome?, crawlSummary?, knowledgeGraph?}` with `knowledgeGraph.imported` = true on successful KG ingestion. Supports localhost via automatic ngrok tunneling with per-process reuse.
- **`create_project` name-based resolution**: pass `teamName` instead of `teamUuid`, or `repoName` instead of `repoUuid`. Backend-side search with case-insensitive exact match. Returns `AmbiguousMatch` with candidates if multiple hits, `NotFound` if none.
- **`create_environment` credential seeding**: pass `credentials: [{label, username, password, role?}]` to create creds atomically with the env.
- **`update_environment` credential sub-actions**: `addCredentials[]`, `updateCredentials[]`, `removeCredentialIds[]` in one call.
- **`engines.node: ">=20.20.0"`** in `package.json`. Driven by `posthog-node@^5.26.0` requiring Node 20.20+.
- **Boot-smoke CI** (`.github/workflows/boot-smoke.yml`): matrix `{ubuntu, macos} × {Node 20, 22}` verifies the MCP server boots + completes `tools/list` with published-style spawn.
- **Eval runner tag filtering**: `--tag=<name>`, `--skip-tag=<name>`, `--flow=<csv>`; `--list` prints flows + tags. `--tag=fast` runs 12 non-browser flows in ~40s; `--tag=browser` runs heavy flows.
- **27 eval flows total** (up from 16 in prior unreleased work). New flows since the last published version: response-structure (20), tunnel reuse (21), long-running check (22), crawl triggers public + localhost + with-project (23/24/26), published-boot-smoke (25), localhost deep-path (27).
- **Response sanitization**: `check_app_in_browser` strips ngrok tunnel URLs from the full response including agent-authored `actionTrace[*].intent`.
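
The name-based resolution rule in the list above (case-insensitive exact match, `AmbiguousMatch` with candidates on multiple hits, `NotFound` on none) can be sketched as a pure function. The changelog says the real search runs on the backend, so this client-side version is only an illustration; `resolveByName`, `Named`, and `Resolution` are hypothetical names.

```typescript
interface Named {
  uuid: string;
  name: string;
}

// Result shape modeled on the error names in the changelog; the exact
// payload fields are an assumption.
type Resolution<T extends Named> =
  | { ok: true; match: T }
  | { ok: false; error: "NotFound"; query: string }
  | { ok: false; error: "AmbiguousMatch"; query: string; candidates: T[] };

// Case-insensitive exact match: "core" matches "Core" but not "Core Team".
function resolveByName<T extends Named>(items: readonly T[], query: string): Resolution<T> {
  const wanted = query.toLowerCase();
  const hits = items.filter((item) => item.name.toLowerCase() === wanted);

  const first = hits[0];
  if (first === undefined) {
    return { ok: false, error: "NotFound", query };
  }
  if (hits.length > 1) {
    // Return every candidate so the caller can retry with a uuid.
    return { ok: false, error: "AmbiguousMatch", query, candidates: hits };
  }
  return { ok: true, match: first };
}
```

Returning the candidates rather than picking the first hit keeps the choice with the caller, which fits the changelog's description of falling back to `teamUuid` or `repoUuid` when a name is not unique.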
### Changed

- **Deferred API-key validation**: missing `DEBUGGAI_API_KEY` no longer crashes the subprocess at boot (the bug that surfaced in Claude Code as "Failed to reconnect to debugg-ai"). The server starts, `tools/list` succeeds, and the error surfaces only when a tool is actually invoked — as a structured `isError: true` response pointing the caller at the missing env var.
- **Boot-time behavior**: `index.ts` no longer calls `resolveProjectContext()` at startup. Project context resolves lazily on first tool call that needs it.
- **`services/projectContext.ts`**: promise-dedup pattern replaces the failure-caching singleton. Concurrent callers share one in-flight promise; results cached on success only, so transient network errors don't permanently disable context resolution.
- **Pagination mandatory on every list response**: `search_projects` / `search_environments` / `search_executions` accept optional `page` (1-indexed) and `pageSize` (default 20, max 200, oversized clamped). Response shape: `{filter, pageInfo: {page, pageSize, totalCount, totalPages, hasMore}, <items>}`.
- **Axios error handling**: handlers map `err.statusCode` (surfaced by the transport's response interceptor) to tool-level `NotFound` errors instead of checking `err.response?.status` which the interceptor strips.

### Fixed

- **Progress-notification race** (bead `0bq`) in both `testPageChangesHandler` and `triggerCrawlHandler`: a progress callback firing after the handler resolved could tear down the stdio transport. Circuit breaker suppresses subsequent callbacks after the first throw; terminal-status detection emits the final `progress === total` notification inside `onUpdate` before the poll loop exits.
- **"Failed to reconnect to debugg-ai" UX** (bead `cma`): missing API key now surfaces as a per-tool-call error instead of a silent subprocess exit at boot. MCP clients see the server register normally and get a readable error only when a tool is actually invoked.
- **Credential role filter** (bead `hpo`): backend `?role=` filter on credentials list was returning all creds regardless. MCP now applies client-side role filtering as defense-in-depth.

### Security invariants

- Passwords are write-only. No response body from any tool contains a password (verified by unit tests + eval flows 06/10/12/15).
- Tunnel URLs (`*.ngrok.debugg.ai`) are stripped from all `check_app_in_browser` responses including agent-authored text (verified by flow 05).
- 404s from the backend surface as `isError: true` with structured `{error: 'NotFound', ...}`, never as thrown exceptions.

### Tool count

The server registers **11** tools (was 22 pre-collapse, 18 in the previous unreleased snapshot). Verified by eval flow `01-protocol.mjs` which locks the roster.

## [1.0.15] - 2025-08-18

### Added

- **Live Session Monitoring Tools**: Added 5 new MCP tools for real-time browser session monitoring
  - `debugg_ai_start_live_session`: Launch live remote browser sessions with real-time monitoring
  - `debugg_ai_stop_live_session`: Stop active live sessions
  - `debugg_ai_get_live_session_status`: Monitor session status and health
  - `debugg_ai_get_live_session_logs`: Retrieve console logs and network requests from live sessions
  - `debugg_ai_get_live_session_screenshot`: Capture screenshots from active sessions
- **Enhanced Tunnel Management**: Complete rewrite of tunnel infrastructure with improved ngrok integration
  - New `TunnelManager` service for high-level tunnel abstraction
  - Automatic localhost URL detection and tunnel creation
  - Better error handling and connection stability
  - Integrated tunnel support in live session handlers
- **Browser Sessions Service**: New dedicated service for managing browser automation sessions
- **Comprehensive Test Infrastructure**: Added extensive test suite covering unit, integration, and end-to-end scenarios
  - Handler tests for E2E suites and live sessions
  - Backend services integration tests
  - Network and MCP tools validation tests
  - Mock infrastructure for reliable testing
- **Enhanced Project Analysis**: New utilities for analyzing codebases and extracting context
- **Improved Error Handling**: Centralized error management with structured error types
- **URL Parser Utilities**: Robust URL parsing and localhost detection capabilities
- **Configuration Management**: Centralized configuration system with environment-based settings
- **API Specification**: Complete OpenAPI specification for backend integration
- **GitHub Actions Workflows**: Automated publishing, version bumping, and validation workflows

### Changed

- **Major Architecture Refactoring**: Reorganized services, handlers, and utilities into cleaner modular structure
- **Moved Tunnel Services**: Relocated tunnel management from `tunnels/` to `services/ngrok/` for better organization
- **Enhanced E2E Runner**: Improved test execution with better progress tracking and error handling
- **Updated Package Dependencies**: Upgraded to latest versions of core dependencies including MCP SDK
- **Improved Documentation**: Updated README with comprehensive setup and usage instructions
- **Enhanced Type Definitions**: Expanded type system with better validation schemas

### Fixed

- **API Endpoint Updates**: Resolved compatibility issues with backend API changes
- **Image Support Improvements**: Enhanced handling of screenshots and visual test artifacts
- **Tunnel Connection Stability**: Fixed issues with ngrok tunnel reliability and reconnection
- **ES Module Compatibility**: Resolved module resolution issues for better Node.js compatibility

### Security

- **License Addition**: Added Apache 2.0 license for proper open source compliance
- **Environment Variable Validation**: Enhanced validation of sensitive configuration data

## [1.0.14] - 2025-06-09

### Added

- Final screenshot included.

## [1.0.12] - 2025-06-02

### Added

- Readme docs issue

## [1.0.11] - 2025-06-02

### Added

- New readme with instructions on install, usage, etc.
## [1.0.10] - 2025-05-29

### Fixed

- Most MCP clients still don't support images. Removed that as a response.

## [1.0.7] - 2025-05-29

### Fixed

- Fixed tunneling issues
- Remove notifications when a token is not provided in the original request

## [1.0.2] - 2025-05-28

### Fixed

- Fixed ES module path resolution issues
- Added proper shebang line to executable files
- Ensured executable permissions are set during build

### Added

- Docker container support
- Improved error handling for E2E test runs

## [1.0.1] - 2025-05-28

### Fixed

- Fixed TypeScript configuration to target ES2022
- Resolved dependency issues with Zod library

### Added

- Initial implementation of E2E test runner
- Integration with DebuggAI server client

## [1.0.0] - 2025-05-28

### Added

- Initial release of DebuggAI MCP
- Support for running UI tests via MCP protocol
- Integration with ngrok for tunnel creation
- Basic test reporting functionality