claude-flow-novice

Claude Flow Novice - Advanced orchestration platform for multi-agent AI workflows with CFN Loop architecture. Includes Local RuVector Accelerator and all CFN skills for complete functionality.

github.com/cfn-dev/claude-flow-novice

================================================================================
RUVECTOR QUERY ISOLATION AUDIT - EXECUTIVE SUMMARY
================================================================================

ASSESSMENT DATE:  2025-12-11
AUDIT STATUS:     CRITICAL VULNERABILITIES IDENTIFIED
DATABASE:         ~/.local/share/ruvector/index_v2.db (Centralized, Multi-Project)
ENTITIES AT RISK: 783,891+ rows (Project A + Project B + others)

================================================================================
CRITICAL FINDING
================================================================================

The RuVector centralized database provides ZERO cross-project isolation.
Any query from Project B can access Project A's entire codebase.

No database-level filtering exists.
No WHERE clause project restrictions in core queries.
No project identifier column in schema.

Result: 100% DATA LEAKAGE in multi-project scenarios.

================================================================================
ISOLATION AUDIT RESULTS
================================================================================

QUERY ANALYSIS
--------------

Function                                Filtering  WHERE Clause          Risk Level
═══════════════════════════════════════════════════════════════════════
QueryV2::search()                       NONE       NO                    CRITICAL
QueryV2::search_similar_entities()      NONE       NO                    CRITICAL
StoreV2::find_entities_by_name()        NONE       NO                    CRITICAL
StoreV2::find_entities_by_kind()        NONE       NO                    CRITICAL
StoreV2::search_entities()              NONE       NO                    CRITICAL
StoreV2::find_references_to_entity()    NONE       NO                    CRITICAL
StoreV2::find_references_from_entity()  NONE       NO                    CRITICAL
StoreV2::find_entities_using_type()     NONE       NO                    CRITICAL
QueryCommand::execute()                 OPTIONAL   STRING (client-side)  HIGH
StoreV2::find_entities_in_file()        PATH ONLY  NO VALIDATION         MEDIUM

Total methods analyzed:     10
Unfiltered global results:  8/10 (80%)
Client-side filtering:      1/10 (insufficient)
Properly isolated:          0/10

================================================================================
PATH-BASED IDENTIFICATION ASSESSMENT
================================================================================

Implementation Type: File path string as sole isolation mechanism

Example Paths:
  /home/user/project-a/src/auth.rs
  /home/user/project-b/src/auth.rs
  /home/user/project-c/src/oauth.ts

Strengths:
  ✓ Path is recorded in database
  ✓ Can enable retrospective filtering
  ✓ Supports detection of cross-project leakage

Weaknesses:
  ✗ No validation that queries use current project's path
  ✗ No path canonicalization (symlinks, relative paths unreliable)
  ✗ No directory traversal protection (../../../)
  ✗ Substring matching vulnerable (file_filter.contains() easily bypassed)
  ✗ Case sensitivity issues on case-insensitive filesystems
  ✗ No structured project ID column
  ✗ No database-level constraint enforcement

Edge Cases:
  • Symlink attacks: Link to other project accessible via symlink
  • Relative path traversal: ../../project-a/src/ not validated
  • Path case issues: PROJECT-A vs project-a mismatch
  • Network paths: NFS, Samba mounts may resolve differently

CONCLUSION: Path-based isolation INSUFFICIENT for production use.

================================================================================
CROSS-PROJECT LEAKAGE RISKS (SEVERITY LEVELS)
================================================================================

RISK #1: SEMANTIC SEARCH LEAKAGE [CRITICAL]
────────────────────────────────────────────
Vector:   query.search("authentication", 10, 0.5)
Impact:   Returns auth code from ALL projects
Exposure: 100% of matching entities across all projects
Exploit:  0-click, runs on every search
Fix Time: Add WHERE project_root = ? clause (30 minutes)
Status:   EXPLOITABLE NOW

RISK #2: ENTITY ENUMERATION BY KIND [CRITICAL]
────────────────────────────────────────────
Vector:   store.find_entities_by_kind(Class, 500)
Impact:   Gets 500 classes from Project A, B, C...
Exposure: All class definitions across all projects
Exploit:  Loop over EntityKind enum (8 types total)
Fix Time: Add WHERE project_root = ? clause (30 minutes)
Status:   EXPLOITABLE NOW

RISK #3: NAME-BASED ENTITY DISCOVERY [CRITICAL]
────────────────────────────────────────────
Vector:   store.find_entities_by_name("authenticate", 500)
Impact:   Returns all authenticate() functions from all projects
Exposure: Function signatures, implementations, patterns
Exploit:  Guess common function names
Fix Time: Add WHERE project_root = ? clause (30 minutes)
Status:   EXPLOITABLE NOW

RISK #4: SIMILAR ENTITY MAPPING [CRITICAL]
────────────────────────────────────────────
Vector:   query.search_similar_entities(entity_id, 10, 0.5)
Impact:   Maps similar code across all projects
Exposure: Reveals architecture, design patterns, variable naming
Exploit:  Brute force entity IDs (1-9B range)
Fix Time: Add project_root param and WHERE filter (1 hour)
Status:   EXPLOITABLE NOW

RISK #5: DIRECTORY TRAVERSAL [CRITICAL]
────────────────────────────────────────
Vector:   store.find_entities_in_file("/home/user/project-a/src/auth.rs")
Impact:   Direct access to any project's file
Exposure: Complete file entity lists for any project
Exploit:  Guess project paths (easy with standard naming)
Fix Time: Add path validation and canonicalization (2 hours)
Status:   EXPLOITABLE NOW

RISK #6: BATCH QUERY UNFILTERING [HIGH]
────────────────────────────────────────
Vector:   BatchQueryCommand reads queries from external file
Impact:   No per-line project scoping in batch mode
Exposure: All queries return global results
Exploit:  Create batch file with cross-project queries
Fix Time: Apply same fixes as Risk #1
Status:   EXPLOITABLE NOW

================================================================================
VULNERABILITY TEST SCENARIO
================================================================================

SETUP:
  Centralized DB contains:
    Project A: /home/user/project-a/ (783,891 entities)
               Includes: auth.ts, database.ts, crypto.ts
    Project B: /home/user/project-b/ (empty, just initialized)

ACTION (from Project B context):
  query.search("authentication", 10, 0.5)

EXPECTED RESULT:
  Query returns 0 results (Project B empty)
  OR Query returns only Project B results if any match

ACTUAL RESULT:
  Query returns results from Project A:
    [1] authenticate() from /home/user/project-a/src/auth.ts
    [2] OAuth provider from /home/user/project-a/src/oauth.ts
    [3] SessionManager from /home/user/project-a/src/session.ts
    [4] CredentialManager from /home/user/project-a/src/crypto.ts
    ... (up to 10 results from Project A)

ASSESSMENT: 100% LEAKAGE CONFIRMED
  No filtering mechanism prevents Project B from accessing Project A's code.

================================================================================
DATABASE SCHEMA ISSUES
================================================================================

MISSING COLUMNS:
  □ project_root TEXT NOT NULL - Primary isolation mechanism
  □ project_id INTEGER         - Explicit project reference
  □ access_level               - Could support future fine-grained access control

MISSING CONSTRAINTS:
  □ UNIQUE(project_root, file_path)       - Prevent cross-project dupes
  □ CHECK(project_root LIKE '/home/%')    - Validate path format
  □ FK constraint on refs ensuring same-project references

MISSING INDEXES:
  □ idx_entities_project_kind ON entities(project_root, kind)
  □ idx_entities_project_name ON entities(project_root, name)
  □ idx_refs_project_source ON refs(project_root, source_entity_id)
  □ idx_type_usage_project_entity ON type_usage(project_root, entity_id)

CURRENT INDEXES (UNUSED):
  ✓ idx_entities_file_path - Created but queries don't use WHERE file_path
  ✓ idx_refs_file_path     - Created but queries don't filter on file_path

SCHEMA RISK LEVEL: CRITICAL
Requires migration to add isolation guarantees.
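
To illustrate what the missing project_root column and a composite index on (project_root, name) would provide, here is a minimal in-memory sketch in Rust. It is not RuVector code. Entity, EntityTable, sample_table and both lookup functions are hypothetical names introduced for this example, and a BTreeMap stands in for idx_entities_project_name. The sample data follows the test scenario above: Project A is populated and Project B is empty.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one row of the `entities` table.
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    project_root: String, // the column the audit reports as missing
    file_path: String,
    name: String,
}

/// Hypothetical in-memory table with a composite index on (project_root, name).
#[derive(Default)]
struct EntityTable {
    rows: Vec<Entity>,
    by_project_and_name: BTreeMap<(String, String), Vec<usize>>,
}

impl EntityTable {
    fn insert(&mut self, project_root: &str, file_path: &str, name: &str) {
        let idx = self.rows.len();
        let key = (project_root.to_string(), name.to_string());
        self.by_project_and_name.entry(key).or_default().push(idx);
        self.rows.push(Entity {
            project_root: project_root.to_string(),
            file_path: file_path.to_string(),
            name: name.to_string(),
        });
    }

    /// Unscoped lookup: matches on name only, like a query with no project predicate.
    fn find_by_name_unscoped(&self, name: &str) -> Vec<&Entity> {
        self.rows.iter().filter(|e| e.name == name).collect()
    }

    /// Scoped lookup: in-memory analogue of `WHERE project_root = ? AND name = ?`.
    fn find_by_name_scoped(&self, project_root: &str, name: &str) -> Vec<&Entity> {
        let key = (project_root.to_string(), name.to_string());
        match self.by_project_and_name.get(&key) {
            Some(ids) => ids.iter().map(|&i| &self.rows[i]).collect(),
            None => Vec::new(),
        }
    }
}

/// Sample data: Project A has entities, Project B has none.
fn sample_table() -> EntityTable {
    let mut t = EntityTable::default();
    t.insert(
        "/home/user/project-a",
        "/home/user/project-a/src/auth.ts",
        "authenticate",
    );
    t.insert(
        "/home/user/project-a",
        "/home/user/project-a/src/session.ts",
        "SessionManager",
    );
    t
}
```

Called from Project B's context, find_by_name_unscoped("authenticate") still returns Project A's entry, while find_by_name_scoped("/home/user/project-b", "authenticate") returns nothing. The second behaviour is what the isolation tests in this audit require.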
================================================================================
CENTRALIZED DB ISOLATION GUARANTEES
================================================================================

Database-Level Enforcement:  NONE
Application-Level Filtering: OPTIONAL (file_filter parameter)
Code-Level Validation:       NONE

Current Flow:
  CLI: QueryCommand.execute()
  ├─ Captures project_dir
  ├─ Calls QueryV2.search() ← NO project_root passed
  │  ├─ SQL: "SELECT * FROM entities e JOIN entity_embeddings ee ON ..."
  │  │       ↑ NO WHERE clause
  │  ├─ Returns ALL rows matching similarity
  │  └─ No project filtering
  ├─ Optional file_filter.contains() check ← Client-side, insufficient
  └─ Output results

Problems:
  1. Project context lost between CLI and query layer
  2. No WHERE clause enforces isolation in SQL
  3. Client-side filtering optional and easily bypassed
  4. file_filter uses substring match (false positives)
  5. Defense-in-depth violated (no DB-level security)

Fix Requires:
  1. Add project_root column to schema
  2. Update ALL 10 query methods to accept project_root parameter
  3. Add WHERE project_root = ? to every query
  4. Pass project_root from CLI to query layer
  5. Add path validation helper
  6. Remove optional file_filter (enforce at DB level)

================================================================================
QUERY METHOD FILTERING COVERAGE
================================================================================

Method                         File Path         Status
────────────────────────────────────────────────────────────────────
search()                       query_v2.rs:42    ✗ UNFILTERED
search_similar_entities()      query_v2.rs:136   ✗ UNFILTERED
find_entities_by_name()        store_v2.rs:143   ✗ UNFILTERED
find_entities_by_kind()        store_v2.rs:158   ✗ UNFILTERED
find_entities_in_file()        store_v2.rs:173   ⚠ PATH ONLY (no validation)
search_entities()              store_v2.rs:187   ✗ UNFILTERED
find_entities_using_type()     store_v2.rs:285   ✗ UNFILTERED
find_references_to_entity()    store_v2.rs:235   ✗ UNFILTERED
find_references_from_entity()  store_v2.rs:249   ✗ UNFILTERED
find_module_by_file()          store_v2.rs:321   ⚠ EXACT MATCH (if validated)

Coverage:       0/10 properly isolated
Unfiltered:     8/10 methods
Partially safe: 2/10 methods (if input validated, which isn't)

================================================================================
RECOMMENDATIONS - PRIORITY ORDER
================================================================================

CRITICAL (Week 1 - Must Fix Before Any Production Use):
────────────────────────────────────────────────────────
[1] Add project_root column to entities, refs, type_usage, modules tables
    Effort: 2 hours
    Impact: Enables database-level isolation

[2] Update QueryV2::search() to accept project_root parameter
    Add WHERE e.project_root = ? clause
    Effort: 1 hour
    Impact: Fixes semantic search leakage

[3] Update QueryV2::search_similar_entities() with project_root filter
    Effort: 1 hour
    Impact: Fixes similarity-based leakage

[4] Update all 8 StoreV2 unfiltered methods with project_root filtering
    Effort: 2 hours (repeated pattern)
    Impact: Closes 80% of query gaps

[5] Add path validation helper (canonicalize, traverse check)
    Effort: 1 hour
    Impact: Prevents directory traversal attacks

[6] Update QueryCommand to pass project_root to all query methods
    Effort: 1 hour
    Impact: Connects CLI context to database queries

[7] Create comprehensive test suite for isolation
    Effort: 2 hours
    Impact: Prevents regression

Total P0 Effort: ~10 hours (1-2 developer days)

HIGH (Week 2-3 - Important for Robustness):
──────────────────────────────────────────
[8]  Add project consistency check constraints (FKs within project)
     Effort: 2 hours

[9]  Create audit logging table for query operations
     Effort: 2 hours

[10] Create composite indexes on (project_root, kind), (project_root, name)
     Effort: 1 hour

[11] Add documentation on isolation assumptions and API contracts
     Effort: 2 hours

MEDIUM (Week 3-4 - Nice to Have):
────────────────────────────────
[12] Performance tuning for project-scoped queries
[13] Add rate limiting per project
[14] Create project-scoped access control layer

================================================================================
RECOMMENDATIONS - CODE CHANGES
================================================================================

SCHEMA CHANGE:
──────────────
ALTER TABLE entities ADD COLUMN project_root TEXT NOT NULL DEFAULT '';
-- Backfill assumes every file_path contains '/src/'; other rows need separate handling.
UPDATE entities SET project_root = SUBSTR(file_path, 1, INSTR(file_path, '/src/') - 1);
-- If index_v2.db is SQLite, ALTER TABLE cannot ADD CONSTRAINT; apply this CHECK via a table rebuild.
ALTER TABLE entities ADD CONSTRAINT entities_project_check CHECK(project_root != '');

API CHANGES:
────────────
OLD: pub fn search(&self, query: &str, max_results: usize, threshold: f32)
NEW: pub fn search(&self, query: &str, max_results: usize, threshold: f32,
                   project_root: &str)

OLD SQL: "SELECT ... FROM entities e
          JOIN entity_embeddings ee ON e.id = ee.entity_id"
NEW SQL: "SELECT ... FROM entities e
          JOIN entity_embeddings ee ON e.id = ee.entity_id
          WHERE e.project_root = ?"

CLI CHANGES:
────────────
// Keep the canonical PathBuf alive; to_str() returns an Option, so map None to an error.
let canonical_root = self.project_dir.canonicalize()?;
let project_root = canonical_root.to_str().ok_or_else(|| anyhow!("non-UTF-8 project path"))?;
let results = self.query_v2.search(query, max_results, threshold, project_root)?;

VALIDATION CHANGES:
───────────────────
use anyhow::{anyhow, Result};

fn validate_project_path(file_path: &str, project_root: &str) -> Result<()> {
    // canonicalize resolves symlinks and ".." segments; it fails if a path does not exist
    let canonical_file = std::fs::canonicalize(file_path)?;
    let canonical_project = std::fs::canonicalize(project_root)?;
    // Path::starts_with compares whole path components, not substrings
    if !canonical_file.starts_with(&canonical_project) {
        return Err(anyhow!("Path traversal detected"));
    }
    Ok(())
}

================================================================================
TESTING VERIFICATION
================================================================================

Before accepting ANY fix, create and run these tests:

TEST 1: test_cross_project_search_isolation
  Setup:  Add 100 entities to Project A, 10 to Project B in same DB
  Action: Search from Project B context
  Assert: Results contain ONLY Project B entities
  Status: MUST PASS to consider fix valid

TEST 2: test_cross_project_find_by_name_isolation
  Setup:  Same 100/10 entity distribution
  Action: find_entities_by_name from Project B context
  Assert: Results contain ONLY Project B entities with that name
  Status: MUST PASS

TEST 3: test_cross_project_find_by_kind_isolation
  Setup:  Same distribution
  Action: find_entities_by_kind(Class) from Project B
  Assert: Results contain ONLY Project B classes
  Status: MUST PASS

TEST 4: test_directory_traversal_blocked
  Setup:  Create Project A and B
  Action: Query with path containing "../../../project-a"
  Assert: Error or blocked traversal
  Status: MUST PASS

TEST 5: test_symlink_attack_blocked
  Setup:  Create symlink from Project B to Project A
  Action: Query through symlink path
  Assert: Path validation catches it OR canonicalizes correctly
  Status: MUST PASS

All tests MUST PASS in Standard mode (95%+ pass rate).

================================================================================
OVERALL ASSESSMENT
================================================================================

VULNERABILITY SEVERITY: CRITICAL
EXPLOITABILITY:         TRIVIAL (no special knowledge required)
BLAST RADIUS:           ALL projects using centralized DB
DETECTION DIFFICULTY:   HARD (appears as normal search results)
DATA AT RISK:           ALL entities in centralized database

BEFORE FIX:
  Status:   UNSAFE for multi-project use
  Safe for: Single project, public code, research only
  Risk:     100% code leakage between projects

AFTER FIX (assuming all P0 completed):
  Status:          SAFE for production multi-project use
  Added isolation: Database-level + application-level
  Testing:         Comprehensive isolation test suite
  Maintenance:     Required documentation updates

RECOMMENDATION:
  DO NOT DEPLOY to production multi-project environments.
  Implement critical fixes immediately (1-2 days).
  Re-audit after implementation.

================================================================================
SIGN-OFF
================================================================================

Audit Completed: 2025-12-11
Auditor:         Code Security Review
Status:          CRITICAL ISSUES IDENTIFIED
Recommendation:  URGENT FIX REQUIRED

Next Steps:
  1. Review this audit with development team
  2. Prioritize P0 fixes (Week 1)
  3. Create test suite before implementation
  4. Implement fixes following recommendations
  5. Run tests to verify isolation
  6. Request re-audit after implementation
  7. Update CHANGELOG with security fixes

Documentation:
  - Full audit:   docs/RUVECTOR_ISOLATION_AUDIT.md (835 lines)
  - Quick ref:    docs/RUVECTOR_QUICK_REFERENCE.md (287 lines)
  - This summary: docs/RUVECTOR_ISOLATION_SUMMARY.txt

================================================================================
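
The weaknesses listed under PATH-BASED IDENTIFICATION ASSESSMENT (substring matching, unvalidated ../ segments) can be made concrete with a small Rust sketch. It is an illustration under stated assumptions, not RuVector code. substring_filter, normalize_lexically and is_inside_project are hypothetical names, and project-b-archive is an invented sibling directory. The normalization here is purely lexical, so unlike the std::fs::canonicalize call recommended under VALIDATION CHANGES it does not resolve symlinks. It only shows why a component-wise prefix check behaves differently from contains().

```rust
use std::path::{Component, Path, PathBuf};

/// Substring check of the kind the audit flags as bypassable.
fn substring_filter(file_path: &str, file_filter: &str) -> bool {
    file_path.contains(file_filter)
}

/// Resolve "." and ".." lexically, without touching the filesystem.
/// Assumption: symlinks are out of scope for this sketch.
fn normalize_lexically(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // pop() is a no-op at the root, so "/.." stays "/"
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Component-wise containment: true only if file_path lies under project_root.
fn is_inside_project(file_path: &str, project_root: &str) -> bool {
    let file = normalize_lexically(Path::new(file_path));
    let root = normalize_lexically(Path::new(project_root));
    // Path::starts_with compares whole components, so "project-b-archive"
    // is not treated as being under "project-b".
    file.starts_with(&root)
}
```

With project root /home/user/project-b, the path /home/user/project-b/../project-a/src/auth.rs passes substring_filter but is rejected by is_inside_project, and the same holds for /home/user/project-b-archive/src/auth.rs. A production check would still need canonicalization to cover the symlink case in TEST 5.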