# RuVector Isolation Audit - Quick Reference
## Status: CRITICAL - Multi-Project Data Leakage
### One-Minute Summary
The centralized RuVector database (`~/.local/share/ruvector/index_v2.db`) contains 783K+ entities from all projects but provides **zero isolation**. Any query returns results from ALL projects. Project B can access Project A's code by simply searching for common terms.
## Vulnerability Matrix
```
Query Method | Isolation | Risk | Impact
----------------------------------|-----------|---------|----------
QueryV2::search() | NONE | HIGH | Semantic search leaks all projects
QueryV2::search_similar_entities()| NONE | HIGH | Returns similar from ANY project
StoreV2::find_entities_by_name() | NONE | HIGH | Gets all matching names globally
StoreV2::find_entities_by_kind() | NONE | HIGH | Enumerates all classes/functions
StoreV2::search_entities() | NONE | HIGH | Fulltext search across all
find_entities_in_file() | PATH ONLY | MEDIUM | Directory traversal possible
```
**Summary**: 9 of 10 query methods are unfiltered. The table lists five of them plus the one path-filtered method.
## Leakage Demonstration
### Step 1: Setup
```
Centralized DB contains:
├── Project A: /home/user/project-a/ (783,891 entities)
│ ├── src/auth.ts (sensitive auth implementation)
│ ├── src/database.ts (connection details)
│ └── src/crypto.ts (encryption keys pattern)
└── Project B: /home/user/project-b/ (newly initialized, few entities)
```
### Step 2: Attack
```rust
// In Project B context:
let query = QueryV2::new(&db_path)?;
let results = query.search("authentication", 10, 0.5)?;
// search() has NO project_root parameter
// search() has NO WHERE clause filtering
```
### Step 3: Leak
```
Results returned:
[Project A] authenticate() in /home/user/project-a/src/auth.ts
[Project A] OAuth provider in /home/user/project-a/src/oauth.ts
[Project A] SessionManager in /home/user/project-a/src/session.ts
[Project B] User model in /home/user/project-b/src/user.ts ← legitimate
[Project A] CredentialManager in /home/user/project-a/src/crypto.ts
```
**Result**: Project A's entire auth system exposed to Project B.
## Root Causes (Top 3)
### 1. No Project Identifier Column
```sql
-- Current schema (BAD):
CREATE TABLE entities (
id INTEGER PRIMARY KEY,
kind TEXT,
name TEXT,
file_path TEXT, -- ← ONLY isolation mechanism
...
);
-- Single string field insufficient for reliable isolation
```
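Why a lone `file_path` string is a weak boundary: a plain string-prefix check treats a sibling directory such as `project-a-fork` as part of `project-a`. The sketch below is illustrative only. `naive_belongs_to` and `component_belongs_to` are hypothetical helpers, not RuVector functions, and the `project-a-fork` path is an assumed example.

```rust
use std::path::Path;

/// Naive check (hypothetical): treats `file_path` as an opaque string.
/// A sibling directory that shares the prefix is wrongly accepted.
fn naive_belongs_to(file_path: &str, project_root: &str) -> bool {
    file_path.starts_with(project_root)
}

/// Component-wise check (hypothetical): `Path::starts_with` only matches
/// whole path components, so `project-a-fork` is not inside `project-a`.
fn component_belongs_to(file_path: &str, project_root: &str) -> bool {
    Path::new(file_path).starts_with(Path::new(project_root))
}
```

Even the component-wise check relies on every stored path being absolute and normalized, which is one reason a dedicated `project_root` column is the more reliable option.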
### 2. No WHERE Clause in Core Queries
```rust
// Current QueryV2::search() (BAD):
let mut stmt = self.store.conn.prepare(
"SELECT e.id, e.kind, e.name, ... FROM entities e
JOIN entity_embeddings ee ON e.id = ee.entity_id"
// ↑ NO WHERE clause - returns ALL rows
)?;
```
### 3. No Project Context Passed to APIs
```rust
// main.rs parses project_dir, but the query API has no parameter to receive it:
pub fn search(&self, query: &str, max_results: usize, threshold: f32) -> Result<Vec<SearchResult>>
// ↑ NO project_root parameter
```
## Call Stack Analysis
```
main.rs:CommandQuery.execute()
↓
cli/query.rs:QueryCommand.execute()
├─ Captures project_dir: PathBuf
├─ Creates QueryV2 (no project context passed)
├─ Calls query_v2.search(query, max_results, threshold)
│ └─ ❌ search() has NO project filtering
│ ├─ Queries ALL rows: "SELECT ... FROM entities e JOIN entity_embeddings"
│ └─ Returns results from ANY project matching similarity
└─ Optional client-side filter (insufficient)
└─ if let Some(ref file_filter) = self.config.file_filter {
results.filter(...contains(file_filter)) // Substring, bypassed easily
}
```
**Gap**: Project context is lost between the CLI and the query layer.
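A minimal model of why the client-side substring filter is insufficient, assuming the filter runs after the unscoped search has already ranked and truncated its results. `Hit`, `unscoped_top_k`, `client_side_filter`, and the sample paths are hypothetical stand-ins, not the actual RuVector types.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    file_path: String,
    score: f32,
}

/// Stand-in for an unscoped search: rank by score, keep the top `max_results`.
fn unscoped_top_k(mut hits: Vec<Hit>, max_results: usize) -> Vec<Hit> {
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(max_results);
    hits
}

/// Client-side substring filter, mirroring the `contains(file_filter)` step.
fn client_side_filter(hits: Vec<Hit>, file_filter: &str) -> Vec<Hit> {
    hits.into_iter()
        .filter(|h| h.file_path.contains(file_filter))
        .collect()
}

/// Assumed sample data: three Project A entities and one Project B entity.
fn sample_hits() -> Vec<Hit> {
    let rows = [
        ("/home/user/project-a/src/auth.ts", 0.95),
        ("/home/user/project-a/src/session.ts", 0.90),
        ("/home/user/project-a/vendor/project-b-compat/login.ts", 0.85),
        ("/home/user/project-b/src/user.ts", 0.60),
    ];
    rows.iter()
        .map(|(path, score)| Hit { file_path: path.to_string(), score: *score })
        .collect()
}
```

Two failure modes show up in this model. With `max_results = 2`, filtering on `"project-b"` returns nothing, because Project A's entities fill the top-k before the filter runs. With a larger limit, the substring also matches a Project A path that happens to contain `project-b`.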
## Attack Vectors
### Vector 1: Direct Semantic Search
```rust
// No project parameter exists, so each call searches every project
query.search("password", 10, 0.5)?; // Returns password-related code from ALL projects
query.search("token", 10, 0.5)?;    // Returns token logic from ALL projects
query.search("secret", 10, 0.5)?;   // Returns secrets/keys from ALL projects
```
### Vector 2: Entity Kind Enumeration
```rust
// Enumerate all classes across projects
for kind in [Struct, Class, Function, Type] {
let results = store.find_entities_by_kind(kind, 1000);
// Gets 1000 results from ALL projects for each kind
}
```
### Vector 3: Name-Based Discovery
```rust
// Find all functions named "authenticate"
let results = store.find_entities_by_name("authenticate", 500);
// Returns authenticate() from Project A, B, C, ...
```
### Vector 4: Similar Entity Mapping
```rust
// If entity_id obtained (1 to 9B range):
let similar = query.search_similar_entities(entity_id, 10, 0.5);
// Maps ALL similar entities from ALL projects
```
### Vector 5: Directory Traversal
```rust
// No validation on path
store.find_entities_in_file("/home/user/project-a/src/secrets.rs")
// Directly accesses any project's file entities
```
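One possible shape for the missing path validation, sketched with the standard library only. `normalize` and `is_within_project` are hypothetical helpers. The normalization is purely lexical, so it resolves `..` but does not follow symlinks, and it assumes Unix-style absolute paths.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `.` and `..` without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            // `pop` is a no-op at the root, so `/..` stays `/`.
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Accept only absolute paths that stay inside `project_root`
/// after normalization (component-wise comparison).
fn is_within_project(requested: &str, project_root: &str) -> bool {
    let requested = Path::new(requested);
    if !requested.is_absolute() {
        return false;
    }
    normalize(requested).starts_with(normalize(Path::new(project_root)))
}
```

A check like this would run before `find_entities_in_file()` touches the database. It complements a `project_root` filter in the query itself and does not replace it.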
## Code Locations - What Needs Fixing
| File | Method | Line | Issue | Fix |
|------|--------|------|-------|-----|
| query_v2.rs | search() | 42-118 | No WHERE filter | Add `WHERE project_root = ?` |
| query_v2.rs | search_similar_entities() | 136-209 | No project filter | Add project_root param |
| store_v2.rs | find_entities_by_name() | 143-156 | No project filter | Add `AND project_root = ?` |
| store_v2.rs | find_entities_by_kind() | 158-171 | No project filter | Add `AND project_root = ?` |
| store_v2.rs | search_entities() | 187-208 | No project filter | Add `AND project_root = ?` |
| cli/query.rs | QueryCommand.execute() | 63-91 | Insufficient filtering | Enforce DB-level filtering |
| schema_v2.rs | SchemaV2.initialize() | 214-286 | Missing project_root column | Add column with constraint |
| main.rs | Query subcommand | 69-99 | Unused project_dir param | Pass to query methods |
## High-Level Fix (Pseudo-code)
### Step 1: Database
```sql
-- Add project isolation column
ALTER TABLE entities ADD COLUMN project_root TEXT NOT NULL DEFAULT '';
UPDATE entities SET project_root = ...; -- derive from file_path
-- Add indexes for performance
CREATE INDEX idx_entities_project_kind ON entities(project_root, kind);
```
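The `UPDATE ... SET project_root = ...` backfill needs some rule for deriving a root from `file_path`. One candidate rule, sketched here under the assumption that the set of indexed project roots is known up front, is the longest component-wise prefix match. `derive_project_root` and the `packages/sub` path are hypothetical.

```rust
use std::path::Path;

/// Pick the longest known root that is a component-wise prefix of `file_path`.
/// Returns `None` when no known root contains the file.
fn derive_project_root<'a>(file_path: &str, known_roots: &[&'a str]) -> Option<&'a str> {
    known_roots
        .iter()
        .copied()
        // Keep only roots that actually contain the file.
        .filter(|root| Path::new(file_path).starts_with(*root))
        // Prefer the deepest root so nested projects win over their parent.
        .max_by_key(|root| Path::new(*root).components().count())
}
```

Rows for which this returns `None` are probably better flagged for manual review than left at the `''` default, since an empty root would otherwise match no project at all.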
### Step 2: QueryV2
```rust
// Current:
pub fn search(&self, query: &str, max_results: usize, threshold: f32)
// Fixed:
pub fn search(&self, query: &str, max_results: usize, threshold: f32, project_root: &str)
// Add WHERE project_root = ? to SQL query
```
### Step 3: CLI
```rust
// Current:
let results = self.query_v2.search(&self.config.query, max_results, threshold)?;
// Fixed:
let results = self.query_v2.search(
&self.config.query,
max_results,
threshold,
&project_root_string // ← Pass project context
)?;
```
### Step 4: Tests
```rust
#[test]
fn test_search_isolation() {
// Add Project A and B entities to same DB
// Search from Project B context
// Assert: Results from Project A == 0
}
```
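A self-contained version of the isolation test, using an in-memory stand-in instead of the real database. `Entity`, `FakeIndex`, and `shared_index` are hypothetical, and `similarity` is a precomputed placeholder for the embedding score. The real test would run against `QueryV2` once it accepts `project_root`.

```rust
#[derive(Debug, Clone)]
struct Entity {
    name: &'static str,
    project_root: &'static str,
    similarity: f32, // placeholder for the embedding similarity
}

/// Hypothetical in-memory stand-in for the shared index.
struct FakeIndex {
    entities: Vec<Entity>,
}

impl FakeIndex {
    /// Scoped search: filter by project FIRST, then rank and truncate.
    fn search(&self, max_results: usize, threshold: f32, project_root: &str) -> Vec<&Entity> {
        let mut hits: Vec<&Entity> = self
            .entities
            .iter()
            .filter(|e| e.project_root == project_root && e.similarity >= threshold)
            .collect();
        hits.sort_by(|a, b| b.similarity.total_cmp(&a.similarity));
        hits.truncate(max_results);
        hits
    }
}

/// Project A and Project B entities stored side by side.
fn shared_index() -> FakeIndex {
    FakeIndex {
        entities: vec![
            Entity { name: "authenticate", project_root: "/home/user/project-a", similarity: 0.95 },
            Entity { name: "SessionManager", project_root: "/home/user/project-a", similarity: 0.90 },
            Entity { name: "User", project_root: "/home/user/project-b", similarity: 0.60 },
        ],
    }
}

#[test]
fn search_from_project_b_returns_no_project_a_entities() {
    let index = shared_index();
    let results = index.search(10, 0.5, "/home/user/project-b");
    assert!(!results.is_empty());
    assert!(results.iter().all(|e| e.project_root == "/home/user/project-b"));
}
```

Filtering before truncation matters here: if the limit were applied first, Project A's higher-scoring entities could crowd Project B's results out of the top-k.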
## Timeline
| Phase | Tasks | Effort |
|-------|-------|--------|
| **CRITICAL (Week 1)** | Add project_root column, fix 5 query methods, path validation | 3-4 days |
| **HIGH (Week 2)** | Fix remaining methods, audit logging, test suite | 2-3 days |
| **MEDIUM (Week 3-4)** | Indexes, documentation, performance | 1-2 days |
## Do NOT Use Until Fixed
```
❌ Multi-project environments
❌ Sensitive code repositories
❌ Regulated data (HIPAA, PCI-DSS, SOX, GDPR)
❌ Competitive projects
❌ Production deployments
```
## Safe Uses Only
```
✓ Single-project development
✓ Public codebases
✓ Internal company projects (with trust)
✓ Research/academic
```
## Test To Verify Fix
```bash
# Run the cross-project leakage test:
cargo test --lib query_v2::tests::test_cross_project_leakage
# Should FAIL before fix (demonstrating vulnerability)
# Should PASS after fix
```
## References
- **Full Audit**: `docs/RUVECTOR_ISOLATION_AUDIT.md`
- **Query Implementation**: `.claude/skills/cfn-local-ruvector-accelerator/src/query_v2.rs`
- **Store Layer**: `.claude/skills/cfn-local-ruvector-accelerator/src/store_v2.rs`
- **Database Location**: `~/.local/share/ruvector/index_v2.db`
## Key Takeaway
**Centralized database without per-project filtering = multi-project code leakage.**
Fix requires:
1. Add `project_root` column to schema
2. Add `project_root` parameter to all query APIs
3. Update CLI to pass project context
4. Add tests to enforce isolation
Estimated effort: **1-2 weeks** for safe production use.