@debugg-ai/debugg-ai-mcp
Zero-Config, Fully AI-Managed End-to-End Testing for all code gen platforms.
# Changelog
All notable changes to the DebuggAI MCP project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [3.5.1]
### Fixed — default OAuth issuer points at the Django AS
The default for `DEBUGGAI_OAUTH_ISSUER` (used in the RFC 9728 metadata's
`authorization_servers`) was `https://auth.debugg.ai` — which is only the login
UI. The actual Authorization Server is Django at `https://api.debugg.ai`, so the
default is now `https://api.debugg.ai`. A deployment can still override it via the
env var; the change just means a fresh container is correct without setting it.
## [3.5.0]
### Added — Remote transport: Streamable HTTP + OAuth Resource Server (opt-in)
The server can now run as a hosted, multi-user remote MCP over **stateless
Streamable HTTP**, in addition to the default stdio transport (which is
unchanged). Enable with `DEBUGGAI_MCP_TRANSPORT=http` (+ `PORT`, default 3000).
As an OAuth **Resource Server** (MCP 2025-06-18):
- Each `POST /mcp` request must carry `Authorization: Bearer <token>`; the token
is request-scoped (AsyncLocalStorage) and used as the backend credential —
`api.debugg.ai` is the validator, so no token-verification keys live here.
- Missing/invalid token → `401` with `WWW-Authenticate: Bearer resource_metadata=…`.
- Serves RFC 9728 metadata at `/.well-known/oauth-protected-resource` advertising
the authorization server (`auth.debugg.ai`), so clients run the OAuth flow and
retry with a token.
- `GET /health` for load-balancer / ECS health checks.
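Below is a minimal sketch of three pieces a Resource Server needs for this flow: the RFC 9728 metadata body, the `WWW-Authenticate` challenge on a `401`, and bearer-token extraction. The helper names are hypothetical. The issuer and public URL are passed in rather than read from `DEBUGGAI_OAUTH_ISSUER` / `DEBUGGAI_MCP_PUBLIC_URL`. Using the bare public URL as the `resource` identifier is also an assumption.

```typescript
// Sketch only: these helper names are hypothetical, not the server's exports.

interface ProtectedResourceMetadata {
  resource: string;
  authorization_servers: string[];
  bearer_methods_supported: string[];
}

const WELL_KNOWN_PATH = "/.well-known/oauth-protected-resource";

// Body served at GET /.well-known/oauth-protected-resource (RFC 9728).
function protectedResourceMetadata(publicUrl: string, issuer: string): ProtectedResourceMetadata {
  return {
    resource: publicUrl, // assumption: the public base URL is the resource identifier
    authorization_servers: [issuer],
    bearer_methods_supported: ["header"],
  };
}

// Value of the WWW-Authenticate header sent with a 401 from POST /mcp.
function wwwAuthenticate(publicUrl: string): string {
  const metadataUrl = new URL(WELL_KNOWN_PATH, publicUrl).toString();
  return `Bearer resource_metadata="${metadataUrl}"`;
}

// Returns the bearer token, or null so the caller can answer 401.
function bearerToken(authorization: string | undefined): string | null {
  const match = /^Bearer\s+(\S+)\s*$/i.exec(authorization ?? "");
  return match ? match[1] : null;
}
```

An HTTP handler would call `bearerToken(req.headers.authorization)` on each `POST /mcp`. On `null`, it would respond `401` with the header value from `wwwAuthenticate(...)`.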
Auth became request-scoped without touching the ~20 backend-client call sites:
`config.api.key` resolves the per-request token when set (`utils/requestContext.ts`).
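One way to make `config.api.key` request-aware without changing its call sites is a getter backed by Node's `AsyncLocalStorage`, sketched below. The store shape, `withRequestToken`, and the fallback to `DEBUGGAI_API_KEY` for stdio mode are assumptions for illustration. The entry only says that the key resolves to the per-request token when one is set.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical shape of the per-request context.
interface RequestContext {
  bearerToken: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Assumed stdio-mode credential, read once at startup.
const staticApiKey: string | undefined = process.env.DEBUGGAI_API_KEY;

// The getter is evaluated on every read, so existing `config.api.key`
// call sites see the token of whichever request they are running under.
const config = {
  api: {
    get key(): string | undefined {
      return requestContext.getStore()?.bearerToken ?? staticApiKey;
    },
  },
};

// In HTTP mode, each POST /mcp handler body would run inside its own store.
// The store follows the async chain, including across awaits.
function withRequestToken<T>(token: string, fn: () => T): T {
  return requestContext.run({ bearerToken: token }, fn);
}
```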
Config env: `DEBUGGAI_MCP_TRANSPORT`, `PORT`, `DEBUGGAI_MCP_PUBLIC_URL`,
`DEBUGGAI_OAUTH_ISSUER`, `DEBUGGAI_TOKEN_TYPE=bearer`. stdio installs need none of these.
## [3.4.0]
### Added — MCP Resources (browse projects / environments / executions)
The server now declares the `resources` capability and exposes the read-only
entities as addressable resources, so clients can browse and @-mention them as
context instead of only calling tools:
- **Collections** (`resources/list`): `debugg-ai://projects`, `debugg-ai://environments`, `debugg-ai://executions`
- **Templates** (`resources/templates/list`): `debugg-ai://project/{uuid}`, `debugg-ai://environment/{uuid}`, `debugg-ai://execution/{uuid}`
- **`resources/read`** dispatches each URI to the same entity handler the tools
use — identical data + auth, no drift — and returns the JSON payload.
Additive: clients without resource support keep using the tools unchanged.
Implementation in `handlers/resourcesHandler.ts`.
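The URI shapes above are regular enough to parse with one pattern before dispatching. This sketch (hypothetical `parseResourceUri`) only classifies a URI as a collection or a single item. The hand-off to the shared entity handlers is left out.

```typescript
type EntityKind = "project" | "environment" | "execution";

type ParsedResourceUri =
  | { kind: "collection"; entity: EntityKind }
  | { kind: "item"; entity: EntityKind; uuid: string };

// Maps are used instead of plain objects so names like "constructor" never match.
const COLLECTIONS = new Map<string, EntityKind>([
  ["projects", "project"],
  ["environments", "environment"],
  ["executions", "execution"],
]);
const ITEM_KINDS = new Set<string>(["project", "environment", "execution"]);

// debugg-ai://projects           -> collection
// debugg-ai://project/{uuid}     -> single item
function parseResourceUri(uri: string): ParsedResourceUri | null {
  const match = /^debugg-ai:\/\/([a-z]+)(?:\/([^/]+))?$/.exec(uri);
  if (!match) return null;
  const [, head, uuid] = match;
  if (uuid === undefined) {
    const entity = COLLECTIONS.get(head);
    return entity ? { kind: "collection", entity } : null;
  }
  return ITEM_KINDS.has(head) ? { kind: "item", entity: head as EntityKind, uuid } : null;
}
```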
## [3.3.0]
### Added — Run artifacts returned as resource links
`check_app_in_browser` and `executions {action:"get"}` now surface execution
artifacts — **run recording, HAR, console log** — as MCP
[`resource_link`](https://modelcontextprotocol.io/specification/2025-06-18/server/tools)
content blocks pointing at their presigned URLs, instead of base64-inlining them.
Leaner responses, and the URLs stay renewable / fetchable on demand. The legacy
run-recording GIF (previously downloaded and inlined as multi-MB base64) is now a
link; the `browserSession` presigned URLs are auto-detected and linked
(deduped).
Screenshots are **deliberately kept inline** as image blocks so vision-capable
clients can still see them — the core visual-verification workflow. Helpers:
`resourceLinkBlock` + `artifactResourceLinks` in `utils/imageUtils.ts`.
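This sketch shows how the two helpers named above might build `resource_link` blocks and dedupe presigned URLs. The `ResourceLinkBlock` fields follow the MCP 2025-06-18 content type (`uri` and `name` required, `mimeType` optional). The `Artifact` shape and both signatures are assumptions.

```typescript
// MCP `resource_link` content block (subset of optional fields).
interface ResourceLinkBlock {
  type: "resource_link";
  uri: string;
  name: string;
  mimeType?: string;
}

// Hypothetical artifact record; the real execution payload may differ.
interface Artifact {
  name: string;
  url?: string | null;
  mimeType?: string;
}

function resourceLinkBlock(uri: string, name: string, mimeType?: string): ResourceLinkBlock {
  return mimeType ? { type: "resource_link", uri, name, mimeType } : { type: "resource_link", uri, name };
}

// One link per distinct URL; artifacts without a URL are skipped.
function artifactResourceLinks(artifacts: Artifact[]): ResourceLinkBlock[] {
  const seen = new Set<string>();
  const links: ResourceLinkBlock[] = [];
  for (const artifact of artifacts) {
    if (!artifact.url || seen.has(artifact.url)) continue;
    seen.add(artifact.url);
    links.push(resourceLinkBlock(artifact.url, artifact.name, artifact.mimeType));
  }
  return links;
}
```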
## [3.2.0]
### Added — Structured tool output (`structuredContent`)
Every successful tool result now carries [`structuredContent`](https://modelcontextprotocol.io/specification/2025-06-18/server/tools)
— the parsed JSON payload — so clients can consume structured data directly
instead of re-parsing the text blob. The text block is kept for back-compat.
Promoted centrally in the CallTool path (`withStructuredContent` in
`utils/structuredContent.ts`) rather than touching every handler. No-op for
errors, non-object payloads, or multi-text results.
`outputSchema` is intentionally not declared: the action tools return
polymorphic shapes per action, a faithful schema would need top-level `oneOf`
(which the Anthropic API rejects), and a permissive schema adds no value.
`structuredContent` without a declared schema is spec-valid and is the win.
## [3.1.0]
### Added — Tool annotations (behavioral hints for clients)
Every tool now declares MCP [tool annotations](https://modelcontextprotocol.io/specification/2025-06-18/server/tools)
so clients can reason about a tool before calling it (e.g. confirm-gate
destructive ops, fast-path read-only ones):
- `environment`, `test_suite`, `test_case` → `destructiveHint: true` (expose a delete action)
- `executions`, `probe_page` → `readOnlyHint: true`
- `project`, `check_app_in_browser`, `trigger_crawl` → write but non-destructive
- all tools → `openWorldHint: true` (they reach the DebuggAI backend / live web)
Annotations are advisory; deletes are still enforced server-side via the existing
confirmation gate. Presets live in `tools/annotations.ts`.
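The per-tool hints listed above can be expressed as presets. The preset names are hypothetical; the field names are the MCP `ToolAnnotations` hints. The spec's default for `destructiveHint` is `true` when a tool is not read-only, so the non-destructive preset sets it to `false` explicitly.

```typescript
// Subset of the MCP ToolAnnotations hint fields.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Hypothetical presets; every tool reaches the backend or the live web.
const READ_ONLY: ToolAnnotations = { readOnlyHint: true, openWorldHint: true };
const WRITE: ToolAnnotations = { readOnlyHint: false, destructiveHint: false, openWorldHint: true };
const DESTRUCTIVE: ToolAnnotations = { readOnlyHint: false, destructiveHint: true, openWorldHint: true };

const TOOL_ANNOTATIONS: Record<string, ToolAnnotations> = {
  executions: READ_ONLY,
  probe_page: READ_ONLY,
  project: WRITE,
  check_app_in_browser: WRITE,
  trigger_crawl: WRITE,
  environment: DESTRUCTIVE, // exposes a delete action
  test_suite: DESTRUCTIVE,
  test_case: DESTRUCTIVE,
};
```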
## [3.0.1]
### Fixed — 5 action tools were invisible in Claude Code (and any Anthropic-API client)
3.0.0's action tools (`project`, `environment`, `test_suite`, `test_case`,
`executions`) declared a top-level `oneOf` in their JSON Schema to express
per-action required fields. The Anthropic tool `input_schema` does not accept
top-level `oneOf`/`anyOf`/`allOf`, so clients **silently dropped all 5 tools** —
only the 3 browser tools showed up (the server still advertised all 8). Removed
the `oneOf`; per-action required fields remain enforced by the Zod discriminated
unions at call time and documented in each tool's description. Added a registry
regression test asserting no tool schema uses top-level `oneOf`/`anyOf`/`allOf`.
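This sketch of the regression check collects every tool whose `inputSchema` has `oneOf`, `anyOf`, or `allOf` at the top level. Combinators nested under `properties` are left alone, because only the top level is rejected. The `ToolDefinition` shape is a simplified assumption.

```typescript
interface ToolDefinition {
  name: string;
  inputSchema: Record<string, unknown>;
}

const FORBIDDEN_TOP_LEVEL = ["oneOf", "anyOf", "allOf"] as const;

// Returns "tool.keyword" for each offending top-level combinator.
function topLevelCombinatorViolations(tools: ToolDefinition[]): string[] {
  const violations: string[] = [];
  for (const tool of tools) {
    for (const key of FORBIDDEN_TOP_LEVEL) {
      if (Object.prototype.hasOwnProperty.call(tool.inputSchema, key)) {
        violations.push(`${tool.name}.${key}`);
      }
    }
  }
  return violations;
}
```

In the registry test, the assertion would be that the result is an empty array.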
## [3.0.0]
### Changed — Tool surface consolidated to 8 action-based tools (BREAKING)
The 20 per-verb tools were consolidated into **8** tools: three browser tools
(`check_app_in_browser`, `probe_page`, `trigger_crawl`) plus one action-based
tool per entity — `project`, `environment`, `test_suite`, `test_case`,
`executions` — each taking an `action` discriminator. Clients pick up the new
surface on MCP restart.
Migration (old tool → new tool + action):
- `search_projects` → `project {action:"get"|"list"}`
- `create_project` → `project {action:"create"}`
- `search_environments` → `environment {action:"get"|"list"}`
- `create_environment` / `update_environment` / `delete_environment` → `environment {action:"create"|"update"|"delete"}`
- `create_test_suite` / `search_test_suites` / `run_test_suite` / `get_test_suite_results` / `delete_test_suite` → `test_suite {action:"create"|"list"|"run"|"results"|"delete"}`
- `create_test_case` / `update_test_case` / `delete_test_case` → `test_case {action:"create"|"update"|"delete"}`
- `search_executions` → `executions {action:"get"|"list"}`
### Removed
- `update_project` and `delete_project` — rename/delete a project from the DebuggAI web app (both were effectively unused).
- `trigger_crawl`'s `headless` parameter — the MCP now always runs headless (no opt-out).
### Added
- Destructive `delete` actions require confirmation: an elicitation prompt when the client supports it, otherwise a required `confirm: true` argument.
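The confirmation rule can be sketched with a hypothetical `elicit` callback. The callback exists only when the client supports elicitation and resolves to one of the MCP elicitation result actions (`accept`, `decline`, `cancel`). When elicitation is available, only an explicit `accept` proceeds. Otherwise the call must carry `confirm: true`.

```typescript
// The three result actions defined for MCP elicitation.
type ElicitAction = "accept" | "decline" | "cancel";

// Hypothetical callback, present only when the client supports elicitation.
type Elicit = (message: string) => Promise<{ action: ElicitAction }>;

interface ActionArgs {
  action: string;
  confirm?: boolean;
}

// Resolves true when the call may proceed. Non-delete actions are never gated.
async function isDeleteConfirmed(args: ActionArgs, target: string, elicit?: Elicit): Promise<boolean> {
  if (args.action !== "delete") return true;
  if (elicit) {
    // Ask the user directly; anything other than an explicit accept blocks.
    const reply = await elicit(`Delete ${target}?`);
    return reply.action === "accept";
  }
  // No elicitation support: require the explicit argument instead.
  return args.confirm === true;
}
```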
## [Unreleased]
### Added — E2E test suite management (8 new MCP tools)
Eight new tools for building and managing automated E2E test suites directly via MCP:
- `create_test_suite` — Create a named test suite for a project
- `search_test_suites` — List/search suites for a project with pagination and text filter
- `delete_test_suite` — Soft-delete (disable) a test suite
- `create_test_case` — Create a test case assigned to a suite (no auto-run)
- `update_test_case` — Update a test case's name, description, or agent task description
- `delete_test_case` — Soft-delete (disable) a test case
- `run_test_suite` — Trigger all test cases in a suite asynchronously
- `get_test_suite_results` — Fetch suite with per-test pass/fail outcomes and run history
All tools support name-based resolution (projectName, suiteName) with the same case-insensitive exact-match + ambiguity handling as existing tools. All backed by `/api/v1/test-suites/` and `/api/v1/e2e-tests/` endpoints on the DebuggAI backend. 80 new unit + integration tests added.
### Fixed — MCP now validates local reachability BEFORE hitting the backend (fixes 5-min false-pass regression)
- `check_app_in_browser` and `trigger_crawl` now do a pre-flight TCP probe to `127.0.0.1:<port>` before provisioning a backend tunnel key. If the dev server isn't listening, we return a structured `LocalServerUnreachable` error within milliseconds instead of letting the browser agent burn its 5-minute step budget on `ERR_NGROK_8012`. Bead `1om`.
- After the tunnel is established, we do a second `GET /` probe through the tunnel itself and parse the body for `ERR_NGROK_*` markers. If ngrok received traffic but couldn't reach the local dev server (e.g., the dev server listens on IPv6 `::1` but not on `127.0.0.1`), we tear down the tunnel, revoke the key, and return `TunnelTrafficBlocked`. This also fails fast, with a message that points at the actual cause.
- End-to-end proof: new eval flow `28-localhost-not-listening.mjs` against a guaranteed-free port, asserts response arrives in <10s with `error:'LocalServerUnreachable'`. Measured **9ms** in practice vs. the prior **5-minute false-pass**.
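The pre-flight check amounts to attempting a TCP connection to `127.0.0.1:<port>` with a short timeout. This minimal sketch uses Node's `net` module. `probeLocalPort` and its 1-second default timeout are assumptions, not the shipped values.

```typescript
import * as net from "node:net";

// Resolves true if something accepts a TCP connection on 127.0.0.1:<port>
// within timeoutMs; false on refusal, any socket error, or timeout. Never rejects.
function probeLocalPort(port: number, timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host: "127.0.0.1", port });
    const finish = (reachable: boolean) => {
      socket.destroy(); // we only needed the handshake
      resolve(reachable); // later calls are harmless no-ops
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.on("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}
```

A `false` result is the point at which the tool would return `LocalServerUnreachable` instead of provisioning a tunnel key.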
### Fixed — ngrok now dials IPv4 loopback explicitly (fixes ERR_NGROK_8012 on macOS Next.js)
- `ngrok.connect({addr})` now passes `127.0.0.1:<port>` instead of the bare port number for plain-http localhost URLs. Bare port / `localhost` could resolve to IPv6 `[::1]` first on modern macOS, but Next.js / Vite / most Node dev servers bind to `127.0.0.1` only. Result was a successful tunnel that dialed `[::1]:<port>` and got `connection refused`, surfacing to users as `ERR_NGROK_8012` inside the browser agent trace. Bead `fhg`. Evidenced by real incident log 2026-04-24T19:37Z.
- Docker (`DOCKER_CONTAINER=true`) and https-localhost paths unchanged.
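The address rule above can be written as a pure function. `ngrokAddrFor` is hypothetical. Returning `undefined` stands in for "keep the existing Docker / https handling", which this sketch does not reproduce. The `inDocker` flag would presumably come from `DOCKER_CONTAINER=true`.

```typescript
// Returns the addr to hand to ngrok.connect() for a plain-http localhost URL,
// or undefined when the caller should keep its existing handling.
function ngrokAddrFor(localUrl: string, inDocker: boolean): string | undefined {
  if (inDocker) return undefined; // Docker path unchanged
  const url = new URL(localUrl);
  if (url.protocol !== "http:") return undefined; // https-localhost path unchanged
  if (url.hostname !== "localhost" && url.hostname !== "127.0.0.1") return undefined;
  // Pin IPv4 loopback so the tunnel never dials [::1] first.
  // url.port is "" for the scheme default.
  return `127.0.0.1:${url.port || "80"}`;
}
```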
### Fixed — concurrent callers joining a pending tunnel revoke their redundant key
- When caller B's request for a localhost URL arrives while caller A's tunnel for the same port is still provisioning, B used to silently join A's promise and throw away B's own minted ngrok key (and its `revokeKey` callback) — an orphan-key-on-backend leak. B now revokes its redundant key immediately on join. Bead `7qh` finding 2.
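This sketch of the join-and-revoke rule assumes a hypothetical registry of in-flight tunnel promises keyed by port. It only tracks tunnels still being provisioned; reuse of an already-established tunnel is out of scope here. Revocation is treated as best-effort, so a failed revoke does not fail the joining caller.

```typescript
interface MintedKey {
  key: string;
  revokeKey: () => Promise<void>;
}

// Hypothetical registry of tunnels still being provisioned, keyed by local port.
const pendingTunnels = new Map<number, Promise<string>>();

async function getOrCreateTunnel(
  port: number,
  minted: MintedKey,
  connect: (port: number, key: string) => Promise<string>,
): Promise<string> {
  const pending = pendingTunnels.get(port);
  if (pending) {
    // Joining another caller's tunnel: our key will never be used, so revoke
    // it now rather than leaving an orphan on the backend (best-effort).
    await minted.revokeKey().catch(() => undefined);
    return pending;
  }
  const created = connect(port, minted.key);
  pendingTunnels.set(port, created);
  try {
    return await created;
  } finally {
    pendingTunnels.delete(port); // success or failure, it is no longer pending
  }
}
```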
### Added — tunnel fault-injection + trace harness for diagnosis
- New `DEBUGG_TUNNEL_FAULT_MODE` env var (dev/test only — inert when `NODE_ENV=production`) lets developers force specific ngrok-side failures without mocking, to reproduce client-reported transient "Tunnel setup failed" incidents. Modes: `fail-connect-N:<count>`, `empty-url-N:<count>`, `delay-connect:<ms>`, combinable with commas. Bead `42g`.
- Structured `TunnelTrace` captures timestamped lifecycle events per tunnel-create call (start, each connect attempt, fault inject, agent reset, backoff, success/fail). Dumped to WARN logs on any tunnel creation failure so real-world flakes get a post-mortem trail instead of an opaque error message.
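This sketch splits `DEBUGG_TUNNEL_FAULT_MODE` into mode/argument pairs and is inert under `NODE_ENV=production`. Mode names are kept verbatim, including the `-N` suffix as written above. How each mode is then injected into the connect path is not described here, so the sketch stops at parsing. `parseTunnelFaultModes` is hypothetical.

```typescript
// Parses e.g. "fail-connect-N:2,delay-connect:250" into mode -> numeric argument.
// Returns an empty map in production or when the variable is unset.
function parseTunnelFaultModes(raw: string | undefined, nodeEnv: string | undefined): Map<string, number> {
  const modes = new Map<string, number>();
  if (!raw || nodeEnv === "production") return modes;
  for (const entry of raw.split(",")) {
    const trimmed = entry.trim();
    const colon = trimmed.lastIndexOf(":");
    if (colon <= 0) continue; // no name or no argument: ignore the entry
    const valueText = trimmed.slice(colon + 1);
    const value = Number(valueText);
    if (valueText !== "" && Number.isInteger(value) && value >= 0) {
      modes.set(trimmed.slice(0, colon), value);
    }
  }
  return modes;
}
```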
### Fixed — tunnel provisioning flakiness surfaces as user-facing errors
- `check_app_in_browser` / `trigger_crawl` now automatically retry transient tunnel-provision failures (5xx, 408, 429, network errors like ECONNRESET) with exponential backoff (500ms → 1500ms → 3000ms, 3 attempts). Previously a single ngrok/backend blip forced the caller to manually retry the tool call. Bead `7nx`.
- **ngrok.connect() retry widened from 2 to 3 attempts** with 500ms / 1500ms backoff. A client still hit "Tunnel setup failed" after `7nx` shipped — the failure was in the ngrok-listener-bringup path, not the backend-provision path. Auth errors still fail fast. Bead `ixh`.
- Tunnel-provision error messages now carry structured diagnostic context — HTTP status, ngrok error code, backend `x-request-id`, retryable flag — so users have something actionable to file bug reports against instead of opaque "Tunnel setup failed". Bead `5wz`.
- 4xx auth/quota errors (401/403/404) fail fast without retry to avoid loops against a bad API key.
- New PostHog telemetry event `tunnel.provision_retry` fires per retry attempt with outcome, status, stage (`ngrok_connect` vs backend-provision), and diagnostic fields so flake rates become measurable.
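The retry policy above can be sketched as a classifier plus a loop with a caller-supplied delay schedule. The classifier retries 5xx, 408, 429, and a few network error codes; everything else, including 401/403/404, fails fast. The error shape and the exact set of network codes are assumptions. Total attempts are `delaysMs.length + 1`. The entry does not spell out how the listed 500/1500/3000 ms delays map onto attempts, so the schedule is a parameter. `onRetry` is where a `tunnel.provision_retry` event could be emitted.

```typescript
// Hypothetical error shape carrying the diagnostic fields.
interface ProvisionError extends Error {
  status?: number; // HTTP status from the backend, if any
  code?: string; // Node network error code, e.g. "ECONNRESET"
}

const RETRYABLE_STATUS = new Set([408, 429]);
const RETRYABLE_NETWORK_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]); // assumed set

function isRetryable(err: ProvisionError): boolean {
  if (err.status !== undefined) return err.status >= 500 || RETRYABLE_STATUS.has(err.status);
  return err.code !== undefined && RETRYABLE_NETWORK_CODES.has(err.code);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `attempt`, waiting delaysMs[i] before retry i+1; rethrows immediately
// on a non-retryable error or once the schedule is exhausted.
async function withRetry<T>(
  attempt: () => Promise<T>,
  delaysMs: number[],
  onRetry?: (retryNumber: number, err: ProvisionError) => void,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      const e = err as ProvisionError;
      if (i >= delaysMs.length || !isRetryable(e)) throw e;
      onRetry?.(i + 1, e);
      await sleep(delaysMs[i]);
    }
  }
}
```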
## [2.0.0] - 2026-04-23
> **Republish note:** Versions `1.0.64`, `1.0.65`, and `1.0.66` shipped with this
> same breaking surface but were incorrectly versioned as patches (CI auto-bumped
> patch regardless of commit type). All three are now deprecated on npm; consumers
> should upgrade to `^2.0.0`. The underlying code in `2.0.0` is functionally
> identical to `1.0.66`.
This is a **breaking release**. The MCP surface collapsed from 22 tools to 11 through a uniform `search_*` pattern plus credential-management consolidation into the environment tools. The full old→new mapping is below.
### ⚠️ BREAKING CHANGES — 14 tools removed, replaced by 11-tool surface
| Removed tool | Replacement |
|---|---|
| `list_projects` | `search_projects({q?, page?, pageSize?})` (filter mode) |
| `get_project` | `search_projects({uuid})` (uuid mode — returns the curated detail shape) |
| `list_environments` | `search_environments({projectUuid?, q?, page?, pageSize?})` — credentials inlined per env |
| `get_environment` | `search_environments({uuid, projectUuid})` |
| `list_credentials` | `search_environments(...)` — credentials are inlined on each returned env (never include password) |
| `get_credential` | `search_environments({uuid, projectUuid})` — pull from the env's `credentials[]` |
| `create_credential` | `create_environment({name, url, credentials: [...]})` (seed on env create), or `update_environment({uuid, addCredentials: [...]})` |
| `update_credential` | `update_environment({uuid, updateCredentials: [{uuid, ...patch}]})` |
| `delete_credential` | `update_environment({uuid, removeCredentialIds: [uuid]})` |
| `list_teams` | `create_project({teamName, ...})` — backend name-resolved with exact-match + ambiguity handling |
| `list_repos` | `create_project({repoName, ...})` — same pattern |
| `list_executions` | `search_executions({status?, projectUuid?, page?, pageSize?})` |
| `get_execution` | `search_executions({uuid})` — full detail with `nodeExecutions` + state |
| `cancel_execution` | Dropped — backend spin-down is now automatic; no client action needed |
All `search_*` tools use a dual-mode signature: pass `{uuid}` for a single-record detail response, or pass filter params for a paginated summary list. 404 from the backend surfaces as `isError: true` with `{error: 'NotFound', message, uuid}`.
Credential mutations on `update_environment` execute as `remove → update → add` in a single call, so a freed label can be re-bound in one request. Per-cred failures surface in `credentialWarnings[]` without blocking the env update.
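This sketch shows the `remove → update → add` ordering, with per-credential failures collected as warnings instead of aborting the call. `CredentialApi` is a hypothetical stand-in for the backend client. Warning strings use the credential label or UUID, never the password.

```typescript
interface NewCredential { label: string; username: string; password: string; role?: string }
interface CredentialPatch { uuid: string; label?: string; username?: string; password?: string; role?: string }

// Hypothetical backend client.
interface CredentialApi {
  remove(uuid: string): Promise<void>;
  update(patch: CredentialPatch): Promise<void>;
  add(cred: NewCredential): Promise<void>;
}

interface CredentialMutations {
  removeCredentialIds?: string[];
  updateCredentials?: CredentialPatch[];
  addCredentials?: NewCredential[];
}

// Returns credentialWarnings; one failure never blocks the others.
async function applyCredentialMutations(api: CredentialApi, m: CredentialMutations): Promise<string[]> {
  const warnings: string[] = [];
  const attempt = async (what: string, op: () => Promise<void>) => {
    try {
      await op();
    } catch (err) {
      warnings.push(`${what}: ${(err as Error).message}`);
    }
  };
  // Removes run first so a label they free can be re-bound by an add in the same call.
  for (const uuid of m.removeCredentialIds ?? []) await attempt(`remove ${uuid}`, () => api.remove(uuid));
  for (const patch of m.updateCredentials ?? []) await attempt(`update ${patch.uuid}`, () => api.update(patch));
  for (const cred of m.addCredentials ?? []) await attempt(`add ${cred.label}`, () => api.add(cred));
  return warnings;
}
```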
### Added
- **`trigger_crawl` tool**: server-side browser-agent crawl to populate the project's knowledge graph. Returns `{executionId, status, targetUrl, durationMs, outcome?, crawlSummary?, knowledgeGraph?}` with `knowledgeGraph.imported` = true on successful KG ingestion. Supports localhost via automatic ngrok tunneling with per-process reuse.
- **`create_project` name-based resolution**: pass `teamName` instead of `teamUuid`, or `repoName` instead of `repoUuid`. Backend-side search with case-insensitive exact match. Returns `AmbiguousMatch` with candidates if multiple hits, `NotFound` if none.
- **`create_environment` credential seeding**: pass `credentials: [{label, username, password, role?}]` to create creds atomically with the env.
- **`update_environment` credential sub-actions**: `addCredentials[]`, `updateCredentials[]`, `removeCredentialIds[]` in one call.
- **`engines.node: ">=20.20.0"`** in `package.json`. Driven by `posthog-node@^5.26.0` requiring Node 20.20+.
- **Boot-smoke CI** (`.github/workflows/boot-smoke.yml`): matrix `{ubuntu, macos} × {Node 20, 22}` verifies the MCP server boots + completes `tools/list` with published-style spawn.
- **Eval runner tag filtering**: `--tag=<name>`, `--skip-tag=<name>`, `--flow=<csv>`; `--list` prints flows + tags. `--tag=fast` runs 12 non-browser flows in ~40s; `--tag=browser` runs heavy flows.
- **27 eval flows total** (up from 16 in prior unreleased work). New flows since the last published version: response-structure (20), tunnel reuse (21), long-running check (22), crawl triggers public + localhost + with-project (23/24/26), published-boot-smoke (25), localhost deep-path (27).
- **Response sanitization**: `check_app_in_browser` strips ngrok tunnel URLs from the full response including agent-authored `actionTrace[*].intent`.
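This sketch of the name-resolution rule assumes `candidates` are the results of the backend search. Matching is case-insensitive and exact: one hit resolves, none is `NotFound`, and several is `AmbiguousMatch` with the candidates. `resolveByName` and the result type are hypothetical.

```typescript
interface Named {
  uuid: string;
  name: string;
}

type Resolution<T> =
  | { ok: true; match: T }
  | { ok: false; error: "NotFound"; name: string }
  | { ok: false; error: "AmbiguousMatch"; name: string; candidates: T[] };

// Case-insensitive exact match: "acme" matches "Acme" but not "acme-web".
function resolveByName<T extends Named>(name: string, candidates: T[]): Resolution<T> {
  const wanted = name.toLowerCase();
  const matches = candidates.filter((c) => c.name.toLowerCase() === wanted);
  if (matches.length === 1) return { ok: true, match: matches[0] };
  if (matches.length === 0) return { ok: false, error: "NotFound", name };
  return { ok: false, error: "AmbiguousMatch", name, candidates: matches };
}
```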
### Changed
- **Deferred API-key validation**: missing `DEBUGGAI_API_KEY` no longer crashes the subprocess at boot (the bug that surfaced in Claude Code as "Failed to reconnect to debugg-ai"). The server starts, `tools/list` succeeds, and the error surfaces only when a tool is actually invoked — as a structured `isError: true` response pointing the caller at the missing env var.
- **Boot-time behavior**: `index.ts` no longer calls `resolveProjectContext()` at startup. Project context resolves lazily on first tool call that needs it.
- **`services/projectContext.ts`**: promise-dedup pattern replaces the failure-caching singleton. Concurrent callers share one in-flight promise; results cached on success only, so transient network errors don't permanently disable context resolution.
- **Pagination mandatory on every list response**: `search_projects` / `search_environments` / `search_executions` accept optional `page` (1-indexed) and `pageSize` (default 20, max 200, oversized clamped). Response shape: `{filter, pageInfo: {page, pageSize, totalCount, totalPages, hasMore}, <items>}`.
- **Axios error handling**: handlers map `err.statusCode` (surfaced by the transport's response interceptor) to tool-level `NotFound` errors instead of checking `err.response?.status`, which the interceptor strips.
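The paging rules above look like this in code. The `pageInfo` fields come from the listed response shape. Treating a missing, zero, negative, or non-integer `page`/`pageSize` as the default is an assumption, and the function names are hypothetical.

```typescript
interface PageInfo {
  page: number;
  pageSize: number;
  totalCount: number;
  totalPages: number;
  hasMore: boolean;
}

const DEFAULT_PAGE_SIZE = 20;
const MAX_PAGE_SIZE = 200;

// 1-indexed page; pageSize defaults to 20 and oversized values clamp to 200.
function normalizePaging(page?: number, pageSize?: number): { page: number; pageSize: number } {
  const safePage = page !== undefined && Number.isInteger(page) && page >= 1 ? page : 1;
  const safeSize =
    pageSize !== undefined && Number.isInteger(pageSize) && pageSize >= 1
      ? Math.min(pageSize, MAX_PAGE_SIZE)
      : DEFAULT_PAGE_SIZE;
  return { page: safePage, pageSize: safeSize };
}

function buildPageInfo(page: number, pageSize: number, totalCount: number): PageInfo {
  const totalPages = Math.ceil(totalCount / pageSize);
  return { page, pageSize, totalCount, totalPages, hasMore: page < totalPages };
}
```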
### Fixed
- **Progress-notification race** (bead `0bq`) in both `testPageChangesHandler` and `triggerCrawlHandler`: a progress callback firing after the handler resolved could tear down the stdio transport. Circuit breaker suppresses subsequent callbacks after the first throw; terminal-status detection emits the final `progress === total` notification inside `onUpdate` before the poll loop exits.
- **"Failed to reconnect to debugg-ai" UX** (bead `cma`): missing API key now surfaces as a per-tool-call error instead of a silent subprocess exit at boot. MCP clients see the server register normally and get a readable error only when a tool is actually invoked.
- **Credential role filter** (bead `hpo`): backend `?role=` filter on credentials list was returning all creds regardless. MCP now applies client-side role filtering as defense-in-depth.
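This minimal sketch shows the circuit breaker described for the progress race. It wraps the notification sender so that once one send throws, later callbacks become no-ops instead of writing to a transport that may already be closed. `guardedProgress` is hypothetical, and the terminal-status handling is not shown.

```typescript
type ProgressSender = (progress: number, total: number) => Promise<void>;

// After the first failed send, every later notification is dropped.
function guardedProgress(send: ProgressSender): ProgressSender {
  let tripped = false;
  return async (progress, total) => {
    if (tripped) return;
    try {
      await send(progress, total);
    } catch {
      tripped = true; // swallow and stop: the handler has likely already resolved
    }
  };
}
```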
### Security invariants
- Passwords are write-only. No response body from any tool contains a password (verified by unit tests + eval flows 06/10/12/15).
- Tunnel URLs (`*.ngrok.debugg.ai`) are stripped from all `check_app_in_browser` responses including agent-authored text (verified by flow 05).
- 404s from the backend surface as `isError: true` with structured `{error: 'NotFound', ...}`, never as thrown exceptions.
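This sketch strips tunnel URLs recursively, so it also reaches agent-authored strings such as `actionTrace[*].intent`. The entry does not say what the real code substitutes, so `replacement` is a parameter. This version swaps only the tunnel origin and keeps any path. `stripTunnelUrls` is hypothetical.

```typescript
// Matches a tunnel origin under ngrok.debugg.ai, with an optional http(s) scheme.
const TUNNEL_ORIGIN = /(?:https?:\/\/)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\.ngrok\.debugg\.ai/gi;

// Walks strings, arrays, and plain objects; other values pass through unchanged.
function stripTunnelUrls(value: unknown, replacement: string): unknown {
  if (typeof value === "string") return value.replace(TUNNEL_ORIGIN, replacement);
  if (Array.isArray(value)) return value.map((v) => stripTunnelUrls(v, replacement));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, stripTunnelUrls(v, replacement)]),
    );
  }
  return value;
}
```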
### Tool count
The server registers **11** tools (was 22 pre-collapse, 18 in the previous unreleased snapshot). Verified by eval flow `01-protocol.mjs` which locks the roster.
## [1.0.15] - 2025-08-18
### Added
- **Live Session Monitoring Tools**: Added 5 new MCP tools for real-time browser session monitoring
  - `debugg_ai_start_live_session`: Launch live remote browser sessions with real-time monitoring
  - `debugg_ai_stop_live_session`: Stop active live sessions
  - `debugg_ai_get_live_session_status`: Monitor session status and health
  - `debugg_ai_get_live_session_logs`: Retrieve console logs and network requests from live sessions
  - `debugg_ai_get_live_session_screenshot`: Capture screenshots from active sessions
- **Enhanced Tunnel Management**: Complete rewrite of tunnel infrastructure with improved ngrok integration
  - New `TunnelManager` service for high-level tunnel abstraction
  - Automatic localhost URL detection and tunnel creation
  - Better error handling and connection stability
  - Integrated tunnel support in live session handlers
- **Browser Sessions Service**: New dedicated service for managing browser automation sessions
- **Comprehensive Test Infrastructure**: Added extensive test suite covering unit, integration, and end-to-end scenarios
  - Handler tests for E2E suites and live sessions
  - Backend services integration tests
  - Network and MCP tools validation tests
  - Mock infrastructure for reliable testing
- **Enhanced Project Analysis**: New utilities for analyzing codebases and extracting context
- **Improved Error Handling**: Centralized error management with structured error types
- **URL Parser Utilities**: Robust URL parsing and localhost detection capabilities
- **Configuration Management**: Centralized configuration system with environment-based settings
- **API Specification**: Complete OpenAPI specification for backend integration
- **GitHub Actions Workflows**: Automated publishing, version bumping, and validation workflows
### Changed
- **Major Architecture Refactoring**: Reorganized services, handlers, and utilities into cleaner modular structure
- **Moved Tunnel Services**: Relocated tunnel management from `tunnels/` to `services/ngrok/` for better organization
- **Enhanced E2E Runner**: Improved test execution with better progress tracking and error handling
- **Updated Package Dependencies**: Upgraded to latest versions of core dependencies including MCP SDK
- **Improved Documentation**: Updated README with comprehensive setup and usage instructions
- **Enhanced Type Definitions**: Expanded type system with better validation schemas
### Fixed
- **API Endpoint Updates**: Resolved compatibility issues with backend API changes
- **Image Support Improvements**: Enhanced handling of screenshots and visual test artifacts
- **Tunnel Connection Stability**: Fixed issues with ngrok tunnel reliability and reconnection
- **ES Module Compatibility**: Resolved module resolution issues for better Node.js compatibility
### Security
- **License Addition**: Added Apache 2.0 license for proper open source compliance
- **Environment Variable Validation**: Enhanced validation of sensitive configuration data
## [1.0.14] - 2025-06-09
### Added
- Final screenshot included.
## [1.0.12] - 2025-06-02
### Added
- Readme docs issue
## [1.0.11] - 2025-06-02
### Added
- New readme with instructions on install, usage, etc.
## [1.0.10] - 2025-05-29
### Fixed
- Most MCP clients still don't support images, so images were removed from the response.
## [1.0.7] - 2025-05-29
### Fixed
- Fixed tunneling issues
- Removed notifications when a token is not provided in the original request
## [1.0.2] - 2025-05-28
### Fixed
- Fixed ES module path resolution issues
- Added proper shebang line to executable files
- Ensured executable permissions are set during build
### Added
- Docker container support
- Improved error handling for E2E test runs
## [1.0.1] - 2025-05-28
### Fixed
- Fixed TypeScript configuration to target ES2022
- Resolved dependency issues with Zod library
### Added
- Initial implementation of E2E test runner
- Integration with DebuggAI server client
## [1.0.0] - 2025-05-28
### Added
- Initial release of DebuggAI MCP
- Support for running UI tests via MCP protocol
- Integration with ngrok for tunnel creation
- Basic test reporting functionality