@debugmcp/mcp-debugger
Version:
Run-time step-through debugging for LLM agents.
319 lines (242 loc) • 9.68 kB
Markdown
# Building MCP Debug Server with LLM Collaboration
This document captures the unique journey of building the MCP Debug Server through LLM-assisted development, showcasing patterns, challenges, and lessons learned that can help others undertaking similar projects.
## The Origin Story
The MCP Debug Server was born from a practical need: an LLM agent (Claude) needed step-through debugging capabilities to effectively analyze and fix code. Rather than waiting for such a tool to exist, we decided to build it together, creating a powerful example of human-LLM collaboration.
## The C2 (Command & Control) Pattern
### Why C2 Was Essential
LLMs have context window limitations - typically 100-200k tokens. A complex refactoring project quickly fills this window with:
- File contents being analyzed
- Test outputs and error messages
- Implementation details
- Historical context of changes
The C2 pattern emerged as a solution to manage this complexity effectively.
### How C2 Works
```
┌─────────────────────┐ ┌─────────────────────┐
│ C2 Thread (You) │ │ Implementation Thread│
│ - Strategic View │ │ - Tactical Work │
│ - Project Context │────▶│ - Code Changes │
│ - Task Planning │ │ - Testing │
│ - Quality Control │◀────│ - Deep Analysis │
└─────────────────────┘ └─────────────────────┘
▲ │
│ Human Orchestrator │
└──────────────────────────────┘
```
1. **C2 Thread** maintains the big picture, project goals, and strategic decisions
2. **Implementation Threads** dive deep into specific tasks with fresh context
3. **Human Orchestrator** bridges communication between threads
### C2 in Practice
Example from our development:
**C2 Thread**: "We need to fix memory leaks in event handlers"
→ Creates detailed task prompt with context
→ Human starts new implementation thread
**Implementation Thread**: Analyzes code, finds 15 event handler attachments without cleanup
→ Creates implementation plan
→ Human shares plan with C2
**C2 Thread**: Reviews plan, adds consideration about WeakMap pattern
→ Human shares feedback with implementation
**Implementation Thread**: Implements solution with WeakMap for automatic cleanup
→ Human reports success to C2
## The One-Week Transformation
### Starting Point (<20% Coverage)
- Jest test framework with compatibility issues
- Minimal test coverage
- Production code mixed with test logic
- Memory leaks and race conditions
- Hardcoded values throughout
### End Result (>90% Coverage)
- Vitest migration complete
- 657 passing tests
- Clean architecture with dependency injection
- Production-ready error handling
- Professional documentation
### Day-by-Day Progress
**Day 1-2: Vitest Migration**
- Migrated from Jest to Vitest for better ESM support
- Fixed compatibility issues
- Established testing foundation
**Day 3-4: Architecture Refactoring**
- Introduced dependency injection throughout
- Separated concerns (SessionManager → ProxyManager)
- Created clean interfaces for testability
**Day 5-6: Test Coverage Push**
- Implemented comprehensive unit tests
- Added integration tests
- Reached 90%+ coverage
**Day 7: Critical Issues & Documentation**
- Fixed memory leaks
- Resolved race conditions
- Created comprehensive documentation
## Key Patterns Discovered
### 1. Test-Production Separation
**The Problem**: Production code checking for test environment
```typescript
// ❌ Anti-pattern we found
if (process.env.NODE_ENV === 'test') {
// Different behavior in tests
}
```
**The Solution**: Three-layer architecture
```typescript
// ✅ Clean separation
dap-proxy-core.ts // Pure logic
dap-proxy-entry.ts // Production entry
dap-proxy.ts // Test exports
```
### 2. Dependency Injection for Testability
**The Problem**: Hardcoded dependencies made testing impossible
```typescript
// ❌ Before
class SessionManager {
private fs = require('fs'); // Can't test!
}
```
**The Solution**: Constructor injection
```typescript
// ✅ After
class SessionManager {
constructor(private fs: IFileSystem) {} // Fully testable
}
```
### 3. Event Handler Lifecycle Management
**The Problem**: Memory leaks from untracked event handlers
```typescript
// ❌ Leak-prone code
proxy.on('data', this.handleData);
// Never removed!
```
**The Solution**: Systematic cleanup
```typescript
// ✅ Proper lifecycle
private cleanupFunctions = new WeakMap();
setupHandlers(proxy) {
const handler = (data) => this.handleData(data);
proxy.on('data', handler);
this.cleanupFunctions.set(proxy, () => {
proxy.off('data', handler);
});
}
```
## LLM Collaboration Insights
### What Worked Well
1. **Incremental Refactoring**
- Breaking large changes into focused tasks
- Each task had clear acceptance criteria
- Progress was measurable and visible
2. **Test-Driven Fixes**
- Write test for desired behavior
- Make it pass
- Refactor with confidence
3. **Real-Time Feedback Loop**
- LLM provides implementation
- Human runs tests immediately
- Quick iteration on failures
4. **Pattern Recognition**
- LLMs excel at identifying patterns across codebase
- Suggested consistent approaches
- Spotted anti-patterns quickly
### Challenges and Solutions
1. **Challenge**: LLM sometimes over-engineers
- **Solution**: Human judgment on "good enough"
- Example: Simple timeout vs. complex retry logic
2. **Challenge**: Context loss between conversations
- **Solution**: C2 pattern maintains continuity
- Detailed task prompts preserve context
3. **Challenge**: Hallucination of APIs
- **Solution**: Always verify with actual execution
- Test immediately, trust but verify
4. **Challenge**: Inconsistent code style
- **Solution**: Established patterns early
- Reference existing code for consistency
## Lessons for LLM-Assisted Development
### 1. Structure is Critical
- Use C2 pattern for complex projects
- Maintain clear task boundaries
- Document decisions as you go
### 2. Verification Over Trust
- Run tests immediately
- Check generated code compiles
- Verify documentation matches implementation
### 3. Human Judgment Matters
- Know when to push back on suggestions
- Recognize over-engineering
- Make pragmatic decisions
### 4. Iterative Improvement
- Start with working code
- Improve incrementally
- Maintain test coverage throughout
### 5. Documentation as You Go
- Capture patterns immediately
- Document decisions while context is fresh
- Create examples from real code
## Anti-Patterns We Fixed
### 1. Environment Variable Filtering
```typescript
// ❌ What we found
const env = Object.entries(process.env)
.filter(([key]) => !['NODE_ENV', 'VITEST'].includes(key))
```
**Why it's bad**: Hides the real problem instead of fixing it
### 2. Race Condition "Fixes"
```typescript
// ❌ Timing hack
await new Promise(resolve => setTimeout(resolve, 500));
```
**Why it's bad**: Will fail intermittently under load
### 3. Silent Failures
```typescript
// ❌ Inconsistent error handling
getVariables() {
if (!session) return []; // Silent failure!
}
```
**Why it's bad**: Hides problems from users
## The Power of Human-LLM Collaboration
This project demonstrates that human-LLM collaboration can produce professional-quality software when:
1. **Humans provide**:
- Strategic direction
- Quality standards
- Pragmatic judgment
- Real-world testing
2. **LLMs provide**:
- Rapid implementation
- Pattern recognition
- Comprehensive analysis
- Tireless refactoring
3. **Together they achieve**:
- Fast development (1 week vs months)
- High quality (90%+ test coverage)
- Professional architecture
- Comprehensive documentation
## Recommendations for Your LLM Project
1. **Start with Clear Goals**
- Define what "done" looks like
- Set quality standards upfront
- Decide on key patterns early
2. **Use the C2 Pattern**
- One strategic thread
- Multiple implementation threads
- Human orchestration between them
3. **Test Everything Immediately**
- Don't accumulate untested code
- Fix issues as they arise
- Maintain coverage standards
4. **Document Patterns**
- Capture insights immediately
- Create templates for common tasks
- Build a knowledge base
5. **Trust but Verify**
- LLMs are powerful but not infallible
- Always run the code
- Check edge cases
## Conclusion
The MCP Debug Server project proves that LLM-assisted development can produce production-quality software quickly. The key is structured collaboration, immediate verification, and human judgment at decision points.
By following patterns like C2 and maintaining high standards throughout development, you can leverage LLM capabilities while avoiding common pitfalls. The result is professional software that would traditionally take months to develop, completed in days or weeks.
This project stands as both a useful tool for the community and a demonstration of what's possible when humans and LLMs collaborate effectively on software development.
## Resources
- [Test Coverage Report](../../coverage/index.html)
- [Architecture Overview](../architecture/system-overview.md)
- [Contributing Guidelines](../../CONTRIBUTING.md)
---
*Built with ❤️ through human-LLM collaboration*