# Multi-Repository Search

The Embark MCP server now supports searching across multiple Git repositories simultaneously, with flexible filtering options.

## Overview

When multiple workspace roots are provided by the MCP client, the server will:

1. **Discover** all Git repositories from the provided roots
2. **Filter** repositories based on environment variables (optional)
3. **Search** across all filtered repositories in parallel
4. **Aggregate** results with clear repository labels

## How It Works

### Default Behavior

By default, when multiple repositories are discovered from workspace roots, the semantic code search will search **all of them** and return aggregated results.

```bash
# No configuration needed - searches all discovered repositories
node dist/index.js
```

### Filtering Repositories

You can control which repositories are searched using environment variables:

#### Include Specific Repositories

Use `INCLUDE_REPOSITORY_URLS` to search only specific repositories:

```bash
export INCLUDE_REPOSITORY_URLS="https://github.com/owner/repo1.git,https://github.com/owner/repo2.git"
node dist/index.js
```

Only the repositories listed in `INCLUDE_REPOSITORY_URLS` will be searched.

#### Exclude Specific Repositories

Use `EXCLUDE_REPOSITORY_URLS` to skip certain repositories:

```bash
export EXCLUDE_REPOSITORY_URLS="https://github.com/owner/large-repo.git,https://github.com/owner/deprecated-repo.git"
node dist/index.js
```

All discovered repositories **except** those listed in `EXCLUDE_REPOSITORY_URLS` will be searched.

#### Combine Both Filters

You can use both filters together. The include filter is applied first, then the exclude filter:

```bash
export INCLUDE_REPOSITORY_URLS="https://github.com/owner/repo1.git,https://github.com/owner/repo2.git,https://github.com/owner/repo3.git"
export EXCLUDE_REPOSITORY_URLS="https://github.com/owner/repo3.git"
node dist/index.js
```

This will search only `repo1` and `repo2`.
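The include-then-exclude order described above can be sketched as a small pure function. This is an illustration only, not the server's actual code: the names `parseUrlList` and `filterRepositories` are hypothetical, and the sketch assumes that whitespace around commas is ignored and that an unset or empty variable means "no filter". URLs are compared by plain string equality.

```typescript
// Illustrative sketch only; function names are hypothetical, not the server's API.

/** Split a comma-separated value into URLs (assumption: blanks and padding are ignored). */
function parseUrlList(value: string | undefined): string[] {
  if (!value) return [];
  return value
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);
}

/** Apply the include filter first, then the exclude filter, using exact string matches. */
function filterRepositories(
  discovered: string[],
  includeValue?: string,
  excludeValue?: string,
): string[] {
  const include = new Set(parseUrlList(includeValue));
  const exclude = new Set(parseUrlList(excludeValue));

  // Assumption: an empty include list means "no include filter".
  const included =
    include.size > 0 ? discovered.filter((url) => include.has(url)) : discovered;

  return included.filter((url) => !exclude.has(url));
}

// Example mirroring the combined-filter scenario above.
const discovered = [
  "https://github.com/owner/repo1.git",
  "https://github.com/owner/repo2.git",
  "https://github.com/owner/repo3.git",
  "https://github.com/owner/other.git",
];

const selected = filterRepositories(
  discovered,
  "https://github.com/owner/repo1.git,https://github.com/owner/repo2.git,https://github.com/owner/repo3.git",
  "https://github.com/owner/repo3.git",
);

console.log(selected); // repo1 and repo2 remain
```

In a Node.js process, the two filter values would typically come from `process.env.INCLUDE_REPOSITORY_URLS` and `process.env.EXCLUDE_REPOSITORY_URLS`. Because the comparison is exact, an SSH-style URL such as `git@github.com:owner/repo1.git` would not match an HTTPS entry for the same repository.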

## Environment Variables

### `INCLUDE_REPOSITORY_URLS`

- **Format**: Comma-separated list of Git repository URLs
- **Effect**: Only repositories in this list will be searched
- **Example**: `"https://github.com/org/repo1.git,git@github.com:org/repo2.git"`

### `EXCLUDE_REPOSITORY_URLS`

- **Format**: Comma-separated list of Git repository URLs
- **Effect**: Repositories in this list will be skipped
- **Example**: `"https://github.com/org/large-repo.git"`

### Repository URL Matching

The URLs must **exactly match** the Git remote URLs discovered from your repositories. To see what URLs are discovered, check the server logs during initialization.

Common formats:

- HTTPS: `https://github.com/owner/repo.git`
- SSH: `git@github.com:owner/repo.git`
- Enterprise: `https://git.company.com/owner/repo.git`

## Search Behavior

### Single Repository

When only one repository is available (after filtering or discovery), the search behaves like a traditional single-repository search with clean, simple output.

### Multiple Repositories

When multiple repositories are available, the search:

1. Queries all repositories **in parallel** for better performance
2. Aggregates results by repository
3. Returns formatted results with repository headers:

```
Searched 3 of 3 repositories for "authentication handler". Found 15 total results.

## Repository: frontend-app
Found 5 results for "authentication handler" in repository "https://github.com/owner/frontend-app.git":

1. File=src/auth/handler.ts, offset=120:450, similarity=0.892, type=FUNCTION
...

---

## Repository: backend-api
Found 10 results for "authentication handler" in repository "https://github.com/owner/backend-api.git":

1. File=api/auth.py, offset=200:550, similarity=0.876, type=CLASS
...
```

## Backward Compatibility

This feature is fully backward compatible:

- ✅ Single-repository setups continue to work unchanged
- ✅ `REPOSITORY_GIT_REMOTE_URL` environment variable still supported as fallback
- ✅ No changes to tool parameters or API
- ✅ Existing integrations remain functional

## Use Cases

### Monorepo with Multiple Projects

Search across all projects in a monorepo:

```bash
# Search all projects
node dist/index.js
```

### Multi-Repo Development

Search across related repositories you're working on:

```bash
# Include only active development repos
export INCLUDE_REPOSITORY_URLS="https://github.com/org/frontend.git,https://github.com/org/backend.git"
node dist/index.js
```

### Exclude Large/Archived Repositories

Skip repositories that are large or not relevant:

```bash
# Exclude archived and large repos
export EXCLUDE_REPOSITORY_URLS="https://github.com/org/archived-legacy.git,https://github.com/org/huge-data-repo.git"
node dist/index.js
```

## Troubleshooting

### No Results Across All Repositories

If you're getting "No repositories available for search":

1. Check that Git repositories are present in your workspace roots
2. Verify repositories have a `git remote` configured
3. If using filters, ensure URLs exactly match the Git remote URLs
4. Check server logs for discovered repository URLs

### Repository Not Being Searched

If a repository isn't appearing in results:

1. Verify it was discovered during initialization (check logs)
2. If using `INCLUDE_REPOSITORY_URLS`, ensure the URL is listed
3. If using `EXCLUDE_REPOSITORY_URLS`, ensure the URL is not listed
4. Verify the repository URL format matches exactly (HTTPS vs SSH)

### Performance with Many Repositories

Searches across multiple repositories run in parallel, but:

- Consider using `INCLUDE_REPOSITORY_URLS` to limit scope
- Use `pathFilter` parameter to narrow search within repositories
- Large repositories may take longer to search

## Implementation Details

### Repository Discovery

1. Server requests roots from MCP client using `listRoots()`
2. Each root is checked for Git repositories using `git rev-parse --git-dir`
3. Git remote URL is extracted using `git remote get-url origin`
4. Repositories are cached for the duration of the server session

### Filtering Logic

```
All Discovered Repositories
    ↓
If INCLUDE_REPOSITORY_URLS is set:
    Keep only repositories in include list
    ↓
If EXCLUDE_REPOSITORY_URLS is set:
    Remove repositories in exclude list
    ↓
Filtered Repositories
```

### Search Execution

- **Single repo**: Direct search, clean output
- **Multiple repos**: Parallel searches, aggregated results
- **No repos**: Clear error message

## Future Enhancements

Potential future improvements:

- [ ] Support for repository name patterns (wildcards)
- [ ] Per-repository search configuration
- [ ] Result ranking across repositories
- [ ] Repository-specific pathFilter
- [ ] Dynamic repository discovery during runtime

## Testing

Run the test suite to see examples:

```bash
# Run multi-repository filter demonstration
node test-multi-repo-filters.mjs
```

This will show example scenarios and configuration patterns.
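
As a closing illustration of the parallel search and aggregation described under Search Execution, the sketch below shows one way per-repository searches could be run concurrently and summarized. It is not the server's actual code: the `SearchResult` and `RepoOutcome` shapes, the `SearchFn` callback, and the function names are all hypothetical. Using `Promise.allSettled` so that one failing repository does not discard the others is an assumption, as is reading "Searched N of M" as successful versus attempted searches.

```typescript
// Illustrative sketch only; types and function names are hypothetical.

interface SearchResult {
  file: string;
  similarity: number;
}

interface RepoOutcome {
  repositoryUrl: string;
  results: SearchResult[];
  error?: string;
}

// Hypothetical per-repository search callback supplied by the caller.
type SearchFn = (repositoryUrl: string, query: string) => Promise<SearchResult[]>;

/** Query every repository concurrently; a failure in one does not abort the rest. */
async function searchAll(
  repositoryUrls: string[],
  query: string,
  search: SearchFn,
): Promise<RepoOutcome[]> {
  const settled = await Promise.allSettled(
    repositoryUrls.map((url) => search(url, query)),
  );
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { repositoryUrl: repositoryUrls[i], results: outcome.value }
      : { repositoryUrl: repositoryUrls[i], results: [], error: String(outcome.reason) },
  );
}

/** Build a summary line in the style of the multi-repository output shown earlier. */
function summarize(query: string, outcomes: RepoOutcome[]): string {
  // Assumption: "searched" counts repositories whose search completed without error.
  const searched = outcomes.filter((o) => o.error === undefined).length;
  const total = outcomes.reduce((sum, o) => sum + o.results.length, 0);
  return `Searched ${searched} of ${outcomes.length} repositories for "${query}". Found ${total} total results.`;
}
```

A caller would pass the filtered repository URLs and its own search function to `searchAll`, then feed the outcomes to `summarize` before rendering the per-repository sections.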