embark-mcp
MCP server proxy for Embark code search
# Multi-Repository Search
The Embark MCP server now supports searching across multiple Git repositories simultaneously, with optional include and exclude filters.
## Overview
When multiple workspace roots are provided by the MCP client, the server will:
1. **Discover** all Git repositories from the provided roots
2. **Filter** repositories based on environment variables (optional)
3. **Search** across all filtered repositories in parallel
4. **Aggregate** results with clear repository labels
## How It Works
### Default Behavior
By default, when multiple repositories are discovered from workspace roots, semantic code search queries **all of them** and returns aggregated results.
```bash
# No configuration needed - searches all discovered repositories
node dist/index.js
```
### Filtering Repositories
You can control which repositories are searched using environment variables:
#### Include Specific Repositories
Use `INCLUDE_REPOSITORY_URLS` to search only specific repositories:
```bash
export INCLUDE_REPOSITORY_URLS="https://github.com/owner/repo1.git,https://github.com/owner/repo2.git"
node dist/index.js
```
Only the repositories listed in `INCLUDE_REPOSITORY_URLS` will be searched.
#### Exclude Specific Repositories
Use `EXCLUDE_REPOSITORY_URLS` to skip certain repositories:
```bash
export EXCLUDE_REPOSITORY_URLS="https://github.com/owner/large-repo.git,https://github.com/owner/deprecated-repo.git"
node dist/index.js
```
All discovered repositories **except** those listed in `EXCLUDE_REPOSITORY_URLS` will be searched.
#### Combine Both Filters
You can use both filters together. The include filter is applied first, then the exclude filter:
```bash
export INCLUDE_REPOSITORY_URLS="https://github.com/owner/repo1.git,https://github.com/owner/repo2.git,https://github.com/owner/repo3.git"
export EXCLUDE_REPOSITORY_URLS="https://github.com/owner/repo3.git"
node dist/index.js
```
This will search only `repo1` and `repo2`.
## Environment Variables
### `INCLUDE_REPOSITORY_URLS`
- **Format**: Comma-separated list of Git repository URLs
- **Effect**: Only repositories in this list will be searched
- **Example**: `"https://github.com/org/repo1.git,git@github.com:org/repo2.git"`
### `EXCLUDE_REPOSITORY_URLS`
- **Format**: Comma-separated list of Git repository URLs
- **Effect**: Repositories in this list will be skipped
- **Example**: `"https://github.com/org/large-repo.git"`
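Both variables share the same comma-separated format, so they can be parsed the same way. The following is a minimal sketch, not the server's actual code: `parseUrlList` is a hypothetical helper, and trimming whitespace and dropping empty entries are assumptions about how such a list might be handled.

```typescript
// Hypothetical helper: turn a comma-separated variable value into a list of URLs.
// Assumption: surrounding whitespace and empty entries are ignored.
function parseUrlList(value: string | undefined): string[] {
  if (!value) {
    return []; // unset or empty variable means "no filter"
  }
  return value
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);
}

// In the server this value would come from the environment,
// e.g. process.env.INCLUDE_REPOSITORY_URLS.
const includeUrls = parseUrlList(
  "https://github.com/org/repo1.git, git@github.com:org/repo2.git,"
);
console.log(includeUrls);
```

Taking the raw string as a parameter, rather than reading the environment inside the helper, keeps the parsing easy to test in isolation.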
### Repository URL Matching
The URLs must **exactly match** the Git remote URLs discovered from your repositories. To see what URLs are discovered, check the server logs during initialization.
Common formats:
- HTTPS: `https://github.com/owner/repo.git`
- SSH: `git@github.com:owner/repo.git`
- Enterprise: `https://git.company.com/owner/repo.git`
## Search Behavior
### Single Repository
When only one repository is available (after filtering or discovery), the search behaves like a traditional single-repository search with clean, simple output.
### Multiple Repositories
When multiple repositories are available, the search:
1. Queries all repositories **in parallel** for better performance
2. Aggregates results by repository
3. Returns formatted results with repository headers:
```
Searched 3 of 3 repositories for "authentication handler". Found 15 total results.
## Repository: frontend-app
Found 5 results for "authentication handler" in repository "https://github.com/owner/frontend-app.git":
1. File=src/auth/handler.ts, offset=120:450, similarity=0.892, type=FUNCTION
...
---
## Repository: backend-api
Found 10 results for "authentication handler" in repository "https://github.com/owner/backend-api.git":
1. File=api/auth.py, offset=200:550, similarity=0.876, type=CLASS
...
```
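The layout above can be approximated with a small formatter. This is an illustrative sketch only: the `SearchHit` and `RepoResult` shapes and the `formatResults` function are hypothetical names modeled on the sample output, and the server's real data structures may differ.

```typescript
// Hypothetical shapes modeled on the sample output; not the server's real types.
interface SearchHit {
  file: string;
  offset: string; // e.g. "120:450"
  similarity: number;
  type: string; // e.g. "FUNCTION", "CLASS"
}

interface RepoResult {
  label: string; // short repository label, e.g. "frontend-app"
  url: string; // Git remote URL
  hits: SearchHit[];
}

function formatResults(query: string, results: RepoResult[], totalRepos: number): string {
  const totalHits = results.reduce((sum, repo) => sum + repo.hits.length, 0);
  const header =
    `Searched ${results.length} of ${totalRepos} repositories for "${query}". ` +
    `Found ${totalHits} total results.`;

  // One section per repository, separated by "---" lines.
  const sections = results.map((repo) => {
    const hitLines = repo.hits.map(
      (hit, i) =>
        `${i + 1}. File=${hit.file}, offset=${hit.offset}, ` +
        `similarity=${hit.similarity.toFixed(3)}, type=${hit.type}`
    );
    return [
      `## Repository: ${repo.label}`,
      `Found ${repo.hits.length} results for "${query}" in repository "${repo.url}":`,
      ...hitLines,
    ].join("\n");
  });

  return header + "\n" + sections.join("\n---\n");
}

console.log(
  formatResults(
    "authentication handler",
    [
      {
        label: "frontend-app",
        url: "https://github.com/owner/frontend-app.git",
        hits: [{ file: "src/auth/handler.ts", offset: "120:450", similarity: 0.892, type: "FUNCTION" }],
      },
    ],
    1
  )
);
```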
## Backward Compatibility
This feature is fully backward compatible:
- ✅ Single-repository setups continue to work unchanged
- ✅ The `REPOSITORY_GIT_REMOTE_URL` environment variable is still supported as a fallback
- ✅ No changes to tool parameters or API
- ✅ Existing integrations remain functional
## Use Cases
### Monorepo with Multiple Projects
Search across all projects in a monorepo:
```bash
# Search all projects
node dist/index.js
```
### Multi-Repo Development
Search across related repositories you're working on:
```bash
# Include only active development repos
export INCLUDE_REPOSITORY_URLS="https://github.com/org/frontend.git,https://github.com/org/backend.git"
node dist/index.js
```
### Exclude Large/Archived Repositories
Skip repositories that are large or not relevant:
```bash
# Exclude archived and large repos
export EXCLUDE_REPOSITORY_URLS="https://github.com/org/archived-legacy.git,https://github.com/org/huge-data-repo.git"
node dist/index.js
```
## Troubleshooting
### No Results Across All Repositories
If you're getting "No repositories available for search":
1. Check that Git repositories are present in your workspace roots
2. Verify that each repository has an `origin` remote configured (`git remote get-url origin`)
3. If using filters, ensure URLs exactly match the Git remote URLs
4. Check server logs for discovered repository URLs
### Repository Not Being Searched
If a repository isn't appearing in results:
1. Verify it was discovered during initialization (check logs)
2. If using `INCLUDE_REPOSITORY_URLS`, ensure the URL is listed
3. If using `EXCLUDE_REPOSITORY_URLS`, ensure the URL is not listed
4. Verify the repository URL format matches exactly (HTTPS vs SSH)
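Because matching is exact, a quick way to debug a filter is to list the entries that match no discovered repository. The sketch below assumes you have copied the discovered URLs from the server logs; `unmatchedEntries` is a hypothetical helper, not part of the server.

```typescript
// Hypothetical diagnostic: which filter entries match no discovered remote URL?
function unmatchedEntries(filterList: string[], discoveredUrls: string[]): string[] {
  const discoveredSet = new Set(discoveredUrls);
  // Strict string equality: "https://..." and "git@..." forms never match each other.
  return filterList.filter((url) => !discoveredSet.has(url));
}

// The repository was cloned over SSH, but the filter uses the HTTPS form.
const discoveredFromLogs = ["git@github.com:owner/repo.git"];
const includeFilter = ["https://github.com/owner/repo.git"];

// The HTTPS entry is reported because it does not equal the SSH remote URL.
console.log(unmatchedEntries(includeFilter, discoveredFromLogs));
```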
### Performance with Many Repositories
Searches across multiple repositories run in parallel, but large repositories may still take longer to search. To reduce search time:
- Use `INCLUDE_REPOSITORY_URLS` to limit the scope
- Use the `pathFilter` parameter to narrow the search within repositories
## Implementation Details
### Repository Discovery
1. Server requests roots from MCP client using `listRoots()`
2. Each root is checked for Git repositories using `git rev-parse --git-dir`
3. Git remote URL is extracted using `git remote get-url origin`
4. Repositories are cached for the duration of the server session
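The discovery steps above can be sketched as follows. This is a simplified illustration under several assumptions: `GitRunner`, `discoverRepositories`, and `getRepositories` are hypothetical names, only the root directory itself is checked, and roots that fail either Git command are silently skipped. A real runner would typically wrap Node's `child_process.execFile`; a stub is used here so the example runs without Git.

```typescript
// Hypothetical abstraction: runs `git <args>` in `cwd` and resolves with trimmed stdout,
// rejecting when the command fails.
type GitRunner = (args: string[], cwd: string) => Promise<string>;

interface DiscoveredRepo {
  root: string;
  remoteUrl: string;
}

async function discoverRepositories(roots: string[], runGit: GitRunner): Promise<DiscoveredRepo[]> {
  const found: DiscoveredRepo[] = [];
  for (const root of roots) {
    try {
      await runGit(["rev-parse", "--git-dir"], root); // fails if root is not a Git repository
      const remoteUrl = await runGit(["remote", "get-url", "origin"], root);
      found.push({ root, remoteUrl });
    } catch {
      // Assumption: roots without a repository or an `origin` remote are skipped.
    }
  }
  return found;
}

// Session-level cache: discovery runs once and later calls reuse the same result.
let cachedRepos: Promise<DiscoveredRepo[]> | undefined;
function getRepositories(roots: string[], runGit: GitRunner): Promise<DiscoveredRepo[]> {
  if (!cachedRepos) {
    cachedRepos = discoverRepositories(roots, runGit);
  }
  return cachedRepos;
}

// Stub runner for demonstration: only "/work/app" behaves like a Git repository.
const fakeGit: GitRunner = async (args, cwd) => {
  if (cwd !== "/work/app") throw new Error("not a git repository");
  return args[0] === "rev-parse" ? ".git" : "https://github.com/owner/repo1.git";
};

getRepositories(["/work/app", "/work/notes"], fakeGit).then((repos) => console.log(repos));
```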
### Filtering Logic
```
All Discovered Repositories
    ↓
If INCLUDE_REPOSITORY_URLS is set:
    Keep only repositories in include list
    ↓
If EXCLUDE_REPOSITORY_URLS is set:
    Remove repositories in exclude list
    ↓
Filtered Repositories
```
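The same flow can be expressed as a short function. This is a sketch of the logic described above, not the server's source: `filterRepositories` is a hypothetical name, and treating an empty list as "filter not set" is an assumption.

```typescript
// Hypothetical implementation of the include-then-exclude flow shown above.
function filterRepositories(
  discoveredUrls: string[],
  includeList: string[],
  excludeList: string[]
): string[] {
  let kept = discoveredUrls;

  // Step 1: if an include list is given, keep only exact matches.
  if (includeList.length > 0) {
    const includeSet = new Set(includeList);
    kept = kept.filter((url) => includeSet.has(url));
  }

  // Step 2: if an exclude list is given, drop exact matches.
  if (excludeList.length > 0) {
    const excludeSet = new Set(excludeList);
    kept = kept.filter((url) => !excludeSet.has(url));
  }

  return kept;
}

const allRepos = [
  "https://github.com/owner/repo1.git",
  "https://github.com/owner/repo2.git",
  "https://github.com/owner/repo3.git",
  "https://github.com/owner/repo4.git",
];

// Include repo1..repo3, then exclude repo3: repo1 and repo2 remain.
console.log(
  filterRepositories(
    allRepos,
    allRepos.slice(0, 3),
    ["https://github.com/owner/repo3.git"]
  )
);
```

Applying the include list first means the exclude list always wins when a URL appears in both.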
### Search Execution
- **Single repo**: Direct search, clean output
- **Multiple repos**: Parallel searches, aggregated results
- **No repos**: Clear error message ("No repositories available for search")
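A minimal sketch of the parallel case is shown below. It assumes a per-repository search function (`SearchFn`, hypothetical) and assumes that a failure in one repository should not abort the others; the document does not state how the server handles partial failures.

```typescript
// Hypothetical per-repository search function; hits are plain strings for brevity.
type SearchFn = (repoUrl: string, query: string) => Promise<string[]>;

interface RepoOutcome {
  repoUrl: string;
  hits: string[];
  error?: string; // set when the search for this repository failed
}

async function searchAll(repoUrls: string[], query: string, search: SearchFn): Promise<RepoOutcome[]> {
  if (repoUrls.length === 0) {
    throw new Error("No repositories available for search");
  }
  // Start every search at once; each one catches its own failure so that
  // Promise.all never rejects because of a single repository.
  return Promise.all(
    repoUrls.map(async (repoUrl): Promise<RepoOutcome> => {
      try {
        return { repoUrl, hits: await search(repoUrl, query) };
      } catch (err) {
        return { repoUrl, hits: [], error: String(err) };
      }
    })
  );
}

// Stub search for demonstration: one repository fails, the other returns a hit.
const fakeSearch: SearchFn = async (repoUrl, query) => {
  if (repoUrl.includes("broken")) throw new Error("index unavailable");
  return [`${query} match in ${repoUrl}`];
};

searchAll(
  ["https://github.com/owner/frontend-app.git", "https://github.com/owner/broken.git"],
  "authentication handler",
  fakeSearch
).then((outcomes) => console.log(outcomes));
```

Results come back in the same order as `repoUrls`, which keeps the aggregated output stable between runs.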
## Future Enhancements
Potential future improvements:
- [ ] Support for repository name patterns (wildcards)
- [ ] Per-repository search configuration
- [ ] Result ranking across repositories
- [ ] Repository-specific pathFilter
- [ ] Dynamic repository discovery during runtime
## Testing
Run the demonstration script to see the filters in action:
```bash
# Run multi-repository filter demonstration
node test-multi-repo-filters.mjs
```
This will show example scenarios and configuration patterns.