The Hexi-Memory Cognitive Architecture: A Production Blueprint for AI Technical Assistants
1. Executive Summary
This report presents a comprehensive analysis and implementation blueprint for the proposed Hexi-Memory architecture. The primary finding is that the 6-layer (Hexi-Memory) system is not overkill; it is the optimal, state-of-the-art design required to achieve Arela's vision of a "perfect memory" AI Technical Co-Founder. This architecture correctly evolves from a simple, stateless storage model to a sophisticated, stateful cognitive architecture.
The Hexi-Memory system's design is best understood as a 2-dimensional matrix, not a simple stack:
Temporal Layers (Access Speed & Context): Layer 4 (Session/Hot), Layer 5 (Project/Warm), and Layer 6 (User/Cold) provide the working, episodic, and procedural context.
Functional Layers (Data Type): Layer 1 (Vector/Semantic), Layer 2 (Graph/Structural), and Layer 3 (Governance/Audit) provide the semantic and factual knowledge base.
A review of the competitive landscape reveals a fragmented and immature market for AI memory. Competitors like Cursor rely on a patchwork of third-party plugins; GitHub Copilot is only now introducing basic native memory 2; Devin is a "black box" that appears to bet on a single, unified context, which risks losing long-term, cross-project knowledge 3; and Sourcegraph Cody focuses on repository-level RAG without true user learning.4
Arela's integrated, local-first, 6-layer design will be a significant and defensible market differentiator.
Production success is contingent on three critical engineering pillars:
A Local-First Storage Stack: A high-performance, low-dependency stack is non-negotiable. This report recommends LanceDB for Layer 1 (Vector) 5 and SQLite for Layers 2, 3, 5, and 6, leveraging the native VS Code workspaceState API for Layer 4 (Session).6
A Sub-200ms Fusion Engine: To meet the <200ms query goal, a "Memory Router" must query all 6 layers in parallel. Retrieval must be hierarchical (using L4/L5/L6 to filter L1/L2) and fused using Reciprocal Rank Fusion (RRF).7
An Automated "Learning & Forgetting" System: "Perfect memory" requires "perfect forgetting".8 An automated consolidation system is proposed, using Frequent Itemset Mining for learning patterns, Bayesian Updating for confidence scoring 9, and Exponential Decay (Half-Life) for graceful forgetting.10
The objective is not merely to remember facts but to synthesize them into a single, coherent context. The Hexi-Memory architecture, paired with the proposed Fusion Engine, achieves this, providing the "perfect context, every time" that Arela's vision demands.
2. Memory Architecture Analysis
Optimal Layering: From Storage to Cognitive Architecture
The core query of whether six layers is "overkill" is a question of architecture: is the goal to build a database or a "brain"? A simple 3-layer system (Vector, Graph, Governance) is a powerful, but stateless, Retrieval-Augmented Generation (RAG) system. It stores facts but possesses no experience. It can answer "what is in this file?" but not "what was I working on?" or "what patterns do I prefer?"
The addition of the Session, Project, and User layers transforms the system from a stateless database into a stateful, cognitive agent. This hierarchical, multi-layered approach is validated by contemporary AI research. For example, academic proposals for agent memory include 3-tier models (short-term, mid-term, long-term) like MemoryOS 11 and 4-layer models (Domain, Category, Trace, Episode) like H-MEM.12 These cognitive architectures confirm that layering based on temporality and contextual scope is a state-of-the-art practice.13
The proposed Hexi-Memory architecture is a practical, production-ready implementation of these cognitive theories, mapping directly to human memory functions 15:
Layer 4 (Session): Functions as Short-Term / Working Memory, holding the immediate task context.16
Layers 5 & 6 (Project & User): Function as Long-Term / Episodic & Procedural Memory, storing learned experiences, events, and skills ("how-to" patterns).15
Layers 1 & 2 (Vector & Graph): Function as Long-Term / Semantic Memory, storing generalized facts and relationships about the world (the codebase).15
The Hexi-Memory 2D Matrix: A Dual-Axis Design
The 6-layer stack should not be conceptualized as a linear list. A more powerful and accurate model is a 2-dimensional (2D) matrix. The "new" layers (4, 5, 6) are scoped by time and context (Session, Project, User), while the "old" layers (1, 2, 3) are scoped by data function (Semantic, Structural, Audit).
This "Dual-Axis" model is the key to solving both redundancy and retrieval challenges. Information is not redundantly stored; rather, different aspects of the same event are stored in the appropriate layer, providing a rich, 360-degree view.
Consider the user's example, "Why did we choose Postgres over MongoDB?"
The immutable fact of the decision is logged in L3 (Governance): "On 2025-11-14, 'Postgres' was chosen."
The rationale document (e.g., RESEARCH.md) is indexed for semantic search in L1 (Vector).
The learned pattern from this decision is consolidated into L5 (Project): "Pattern: Use Postgres for relational data needs."
The global preference may eventually be promoted to L6 (User): "User prefers Postgres for new projects."
This structure prevents redundancy and enables a sophisticated "summary-then-details" retrieval strategy.17 The retrieval engine can first query the high-level patterns (L5/L6) and then use that context to execute a much more precise, filtered query on the vast, low-level knowledge base (L1/L2).
The Hexi-Memory Architectural Blueprint
The following table serves as the primary technical specification for the Arela development team. It codifies the consensus from this report into a single, actionable reference, defining the strict separation of concerns for each layer.
| Layer (ID) | Primary Purpose | Storage Technology | Retrieval API | Lifespan | Consolidation |
| --- | --- | --- | --- | --- | --- |
| L4: Session | (Hot) Active task, open files, conversation history. | In-Memory (IDE) + VS Code workspaceState | Key-Value (memory.session.currentTask) | Single Session | Cleared on exit; workspaceState persists |
| L5: Project | (Warm) Project-specific patterns, decisions, tech stack. | SQLite (.arela/project.db) | SQL / Programmatic (memory.project.patterns) | Project Lifetime | Weekly (learned patterns merge) |
| L6: User | (Cold) Global preferences, expertise, cross-project patterns. | SQLite (~/.arela/user.db) | SQL / Programmatic (memory.user.preferences) | Forever | Monthly (promote patterns from L5) |
| L1: Vector | (Functional) Semantic search of code, docs. | LanceDB (.arela/rag.lancedb) | Vector Similarity (k-NN) | Project Lifetime | On file change (re-index) |
| L2: Graph | (Functional) Structural dependencies, imports, API calls. | SQLite (.arela/graph.db) | Graph/SQL (memory.graph.dependenciesOf(...)) | Project Lifetime | On file change (re-parse) |
| L3: Governance | (Functional) Immutable audit trail, decisions. | SQLite (.arela/audit.db) | Temporal SQL (SELECT... WHERE date >...) | Project Lifetime | Append-only (never merge) |
Memory Conflict Resolution: The "Arbiter" Pattern
A critical success criterion (Criterion #10) is handling contradictory patterns. A naive system will fail when L6 (User) "prefers functional programming" conflicts with L5 (Project) "uses class-based React components."
The system must not silently resolve this conflict. The conflict is the context. A human co-founder would identify this discrepancy and mediate it. Arela must do the same.
This is achieved via the "Arbiter" Pattern. Research into multi-agent systems discusses "specialised arbiter agents" that perform "semantic conflict resolution," considering intent and nuance, not just timestamps.18 In the Hexi-Memory architecture, the Fusion Engine (detailed in Part 6) acts as this arbiter.
Implementation:
When the Fusion Engine retrieves conflicting, high-relevance memories (e.g., user.preferences.style vs. project.conventions.style), it does not discard one.
It tags both memories with their source layer and metadata (e.g., source: L6-User vs. source: L5-Project).
It explicitly includes both tagged items in the final context prompt sent to the LLM, along with an instruction to mediate the conflict.
This instructs the LLM to generate a response like: "I know you generally prefer a functional style, but this project's conventions use class-based components. I will proceed using classes to maintain consistency. Is that correct?" This elevates the AI from a simple tool to a true technical partner that navigates trade-offs.
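As a concrete illustration, the sketch below shows what the conflict-tagging step might look like. The MemoryFragment shape, helper name, and tag wording are assumptions for illustration only (they are not the Part 9 types); the point is that contradictory, high-relevance items are surfaced together rather than silently dropped.

```typescript
// Sketch only: this fragment shape and the tag wording are illustrative, not a finalized API.
interface MemoryFragment {
  layer: 'L1' | 'L2' | 'L3' | 'L4' | 'L5' | 'L6';
  category: string;       // e.g. 'style', 'testing', 'database'
  value: string;          // e.g. 'functional components'
  confidence: number;
}

// Group fused results by category; if values disagree, keep both and tag the conflict.
function tagConflicts(fragments: MemoryFragment[]): string[] {
  const byCategory = new Map<string, MemoryFragment[]>();
  for (const f of fragments) {
    byCategory.set(f.category, [...(byCategory.get(f.category) ?? []), f]);
  }

  const contextLines: string[] = [];
  for (const [category, items] of byCategory) {
    const conflict = new Set(items.map(i => i.value)).size > 1;
    for (const item of items) {
      contextLines.push(`[source: ${item.layer}]${conflict ? ` [CONFLICT: ${category}]` : ''} ${item.value}`);
    }
    if (conflict) {
      contextLines.push(`Instruction: the ${category} memories above disagree; mediate and confirm with the user.`);
    }
  }
  return contextLines;
}
```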
3. Competitive Analysis
Competitor Memory Models (Inferred)
An analysis of the memory systems of key competitors reveals a fragmented and immature landscape, validating Arela's opportunity to build a best-in-class, integrated system.
Cursor: Cursor's memory system is highly fragmented and places a significant configuration burden on the user. It natively supports the Model Context Protocol (MCP), encouraging users to connect and manage their own third-party memory servers, such as mcp-memory-bank 19 or OpenMemory.20 Its native, out-of-the-box memory is limited to session history and user-defined @-file context.
GitHub Copilot: Historically stateless (limited to open tabs), Copilot is now introducing a native memory feature. A new setting, github.copilot.chat.tools.memory.enabled, has been identified for managing persistent memory across conversations.2 Copilot also supports MCP as an enhancement to its "agent mode" 21, signaling a clear industry trend toward extensible, persistent memory.
Windsurf: Similar to Cursor, Windsurf is often paired with MCP-compatible memory layers like OpenMemory 23 to provide persistent context. Its native capabilities are not publicly documented but appear focused on session-level context.
Devin: While Devin's architecture is a "black box," its public-facing philosophy provides the most significant strategic counter-argument. Cognition AI (Devin's creator) has argued against fragmenting context across multiple agents, warning that it leads to "conflicting decisions and compounding errors".3 This suggests Devin employs a single, unified, long-running context for each task. Their system can "recall relevant context at every step," "learn over time" 24, and actively prompts users to save information to its "ongoing memory".25 This appears to be a highly stateful, single-agent model that lacks the explicit cross-project learning of Arela's L6.
Cody (Sourcegraph): Cody's primary strength is its deep, repository-level context. It uses a combination of local IDE context and a powerful remote Sourcegraph search (Hybrid Keyword + Semantic) to pull context from an entire codebase.4 However, it appears to be stateless in terms of learning. It excels at reading the current state of the repository but does not store learned patterns (L5) or user preferences (L6) over time. It relies on a large LLM context window 27 rather than a consolidated, persistent memory.
Strategic Gaps and Opportunities for Arela
Addressing the "Devin Counter-Argument": Devin's warning about fragmented context 3 is the single greatest technical risk to the Hexi-Memory design. A poorly fused 6-layer system will be worse than Devin's unified context. The opportunity for Arela is to build a world-class Fusion Engine (Part 6). By fusing the 6 layers into one coherent, deduplicated, and conflict-tagged context block, Arela achieves the best of both worlds: Devin's unified context plus the persistent, long-term, cross-project learning that a single-task context inherently lacks.
The MCP Ecosystem Platform Play: The convergence of Cursor, Copilot, and OpenMemory on the Model Context Protocol (MCP) 1 is a clear market signal. Arela should not just have a memory system; it should be a memory system. By exposing the Hexi-Memory architecture via a built-in, local-first MCP server 20, Arela could become the central memory "brain" for all AI tools in the user's IDE, including Copilot and Cursor. This is a powerful platform strategy.
Local-First as a Differentiator: Competitors are either cloud-first or entirely opaque. Arela's local-first (.arela/, ~/.arela/) model is a powerful differentiator for privacy, security, and performance. This is a critical trust-building feature, especially for enterprises concerned with GDPR 28 and data sovereignty.
Competitive Memory Architecture Comparison
This table summarizes the competitive landscape, highlighting Arela's strategic advantages.
| Tool | Memory Layers (Inferred) | Storage | Retrieval | Persistence | Key Gaps / Weakness |
| --- | --- | --- | --- | --- | --- |
| Cursor | Session + 3rd-Party (MCP) | JSON, Vector DB (via MCP) | File-based, Semantic | Session, Project (via plugins) | Fragmented. High user setup cost. |
| G. Copilot | File + Session + User (Emerging) | In-memory, Local files 2 | Embedding (file), Keyword | Session, User (emerging) | Opaque. Limited project-level learning. |
| Devin | Unified Session + Task Context | Sandboxed FS, Vector DB | Agent-driven, Semantic | Session, Task 25 | Black Box. Unproven long-term/cross-project. |
| Cody | File + Repository (Vector) | In-memory, Remote Index 4 | Hybrid (Keyword + Semantic) | Session (via large context) | Stateless. No user/project learning. |
| Arela (Hexi) | Session, Project, User, Vector, Graph, Governance | Local-First (LanceDB, SQLite) | Hybrid-Hierarchical (RRF) | Session, Project, User, Lifetime | (N/A - Proposed) |
4. Phased Implementation Plan
This plan breaks down the development of the Hexi-Memory architecture into four logical, deliverable phases.
Phase 1 (v4.1.0): Session Memory & Continuity
Goal: Achieve 100% session continuity (Success Criterion #1). This is a high-impact, low-effort feature that immediately solves a core user pain point.
Focus: Implement Layer 4 (Session).
Actions:
IDE Integration: Utilize the VS Code Extension API 29 to track the active IDE state.
State Capture:
Hook vscode.workspace.textDocuments 30 to get a list of all open files.
Hook vscode.window.activeTextEditor 31 to capture the lastEdit file and line.
Integrate the native VS Code Git extension API 32 to read the current branch. Use this to infer the currentTask or activeTicket (e.g., parsing "CLAUDE-001" from a branch named feature/CLAUDE-001-login).
Conversation History: Store the conversationHistory array in a simple in-memory variable within the extension's context.
Persistence: For persistence across IDE restarts, serialize the entire L4 session object to the VS Code workspaceState. On activation, check if context.workspaceState.get('arela.session') exists and reload it. This Memento API is designed for exactly this purpose and requires no external dependencies.6
Effort Estimate: 2 engineer-weeks.
Phase 2 (v4.2.0): Project & Governance Memory
Goal: Enable Use Case 3 (Decision Recall) and begin building the long-term project-level "brain."
Focus: Implement Layers 3 (Governance) and 5 (Project).
Actions:
Storage: Implement the SQLite schemas (defined in Part 9) for audit.db (L3) and project.db (L5). These files will be stored locally in the .arela/memory/ directory.
API: Create the internal "write" APIs (e.g., Memory.addDecision(), Memory.learnPattern()). Initially, these can be triggered by explicit user commands (e.g., "/remember").
Retrieval: Integrate L3 and L5 into a basic "Memory Router" so that queries can begin pulling from these new sources.
Effort Estimate: 3 engineer-weeks.
Phase 3 (v4.3.0): User & Vector Memory
Goal: Enable Use Case 2 (Cross-project Pattern Recognition) and migrate from a toy RAG system to a production-grade one.
Focus: Implement Layers 1 (Vector) and 6 (User).
Actions:
Storage (User): Implement the SQLite schema (Part 9) for user.db (L6). This file must be stored in a global, user-specific directory (e.g., ~/.arela/user.db).
Storage (Vector): This is a critical migration. The existing 46MB .rag-index.json is not scalable. It must be replaced with a high-performance, embedded vector database. As detailed in Part 5, LanceDB is the recommendation.5 This phase involves building the LanceDB store at .arela/rag.lancedb.
Indexing: Create the background file-watcher process. On file save, this process must trigger an asynchronous re-parsing and re-indexing of the changed file for both L1 (Vector) and L2 (Graph).
Effort Estimate: 4 engineer-weeks.
Phase 4 (v4.4.0): Fusion, Consolidation & Learning
Goal: Enable Use Case 4 (Complex Context Assembly) and achieve all remaining success criteria. This is the "brain" phase that ties all layers together.
Focus: Implement the advanced algorithms from Parts 6 & 7.
Actions:
Retrieval & Fusion (Part 6): Build the production-grade "Memory Router." This includes:
The parallel "fan-out" query architecture.
The hierarchical retrieval/filtering logic (L4/L5/L6 -> L1/L2).
The Reciprocal Rank Fusion (RRF) algorithm for merging heterogeneous results.7
Learning (Part 7): Implement the implicit feedback loop.34 This involves:
Logging user actions (saves, commits, edits) as "transactions."
Running a periodic (e.g., nightly) job using the FP-Growth algorithm 35 to mine transactions for new patterns.
Consolidation (Part 7): Implement the "learning and forgetting" mechanisms:
Bayesian Updating 9 to adjust pattern confidence scores based on user feedback.
The Exponential Decay (Half-Life) 10 algorithm to "forget" old, irrelevant patterns and prevent memory bloat.
The "promotion" job to scan L5 databases and move common patterns to L6.
Effort Estimate: 6-8 engineer-weeks.
5. Storage Technology Recommendations
The choice of storage technology is critical for a local-first application. The stack must be high-performance, low-dependency, and portable.
Layer 4 (Session): In-Memory + VS Code workspaceState
Recommendation: In-memory variable + context.workspaceState.
Rationale: The user's proposal of Redis is an excellent choice for a cloud-hosted or multi-user agent. However, for a local-first IDE extension, it is a poor choice. It introduces a heavy, complex external dependency that the user must install, configure, and maintain, creating a high-friction experience (e.g., port conflicts, service management).
The VS Code Extension API provides the exact, dependency-free tools for this use case. A simple in-memory object provides <1ms access for the "hot" data. On deactivation, this object is serialized to context.workspaceState.6 This Memento API is a simple key-value store managed by VS Code itself, designed specifically for persisting extension state across restarts.33
Layer 1 (Vector): LanceDB
Recommendation: LanceDB.36
Rationale: The current 46MB JSON file is a bottleneck and not scalable. A production-grade vector database is required.
Analysis (ChromaDB vs. LanceDB): While ChromaDB is a popular embedded-first option 37, LanceDB is the superior choice for Arela's specific use case.5
Embedded & Portable: LanceDB is "like SQLite but for vectors".38 It is built in Rust, runs in-process with no server required, and can be bundled with the application, making it perfect for a desktop/Electron environment.5
Performance & Format: It uses the custom "Lance" data format, an evolution of Apache Parquet. This format is optimized for vector workloads and enables significantly faster (up to 100x) queries than Parquet-based stores.5
Zero-Copy Versioning: The Lance format's architecture allows for "zero-copy versioning." This is a critical feature for a coding assistant, as it makes it extremely storage-efficient to update the vector index incrementally as files change, rather than performing costly re-writes.5
Alternative: If introducing no new binary dependency is a hard constraint, sqlite-vec 39 is an emerging extension for SQLite that adds vector search. However, LanceDB is a purpose-built, SOTA solution for this exact problem.5
Layers 2, 3, 5, 6 (Graph, Governance, Project, User): SQLite
Recommendation: SQLite.
Rationale: The user is already employing SQLite for Layers 2 and 3. This is the correct choice and should be extended to Layers 5 and 6.
Using a single, robust technology for 4 of the 6 layers dramatically simplifies the engineering stack. SQLite is battle-tested, serverless, transactional, and transparent (users can inspect their own .db files). It is the standard for local, structured data.40
Alternatives like PostgreSQL or MongoDB are massive overkill. They are designed as client-server databases and would introduce the same dependency and friction problems as Redis, which is antithetical to a local-first tool.
6. Retrieval & Fusion Strategy (<200ms)
Meeting the <200ms retrieval goal (Success Criterion #3) while querying six disparate data sources is a non-trivial engineering challenge. A simple sequential query (L4 -> L5 -> L6...) will fail, taking seconds. Success requires a parallel, hierarchical, and robustly-fused architecture.
The <200ms Challenge & Parallel Fan-Out Query
The "Memory Router" must act as a scatter-gather orchestrator. When it receives a user query, it must simultaneously query all relevant layers.
Implementation: Use a Promise.allSettled() pattern to "fan-out" six asynchronous query promises.
Fault Tolerance: Each query (especially L1 vector search and L2 graph traversal) must have a hard timeout (e.g., 150ms). This ensures that one slow or failed layer does not block the entire response, a common failure mode in production multi-agent systems.41 The router can then proceed with whatever data has returned within the time budget.
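A minimal sketch of this scatter-gather pattern is shown below, assuming each layer exposes an async query function; the LayerResult shape and the 150ms budget are illustrative.

```typescript
// Sketch: per-layer timeout keeps one slow layer from blocking the whole response.
type LayerResult = { layer: string; fragments: unknown[] };

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) => setTimeout(() => reject(new Error('layer timeout')), ms)),
  ]);
}

async function fanOutQuery(
  layerQueries: Array<() => Promise<LayerResult>>,
  budgetMs = 150 // illustrative per-layer budget
): Promise<LayerResult[]> {
  // Fire all six layer queries simultaneously.
  const settled = await Promise.allSettled(layerQueries.map(q => withTimeout(q(), budgetMs)));
  // Proceed with whatever returned inside the budget; failed or slow layers are simply skipped.
  return settled
    .filter((r): r is PromiseFulfilledResult<LayerResult> => r.status === 'fulfilled')
    .map(r => r.value);
}
```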
Hierarchical Retrieval: "Summary-then-Details"
The single most important optimization is to stop "dumb RAG" (blindly vector-searching all of L1) and adopt hierarchical, "summary-then-details" retrieval.17 This is the "Dual-Axis" architecture (Part 2) in practice.
The "Hot" temporal layers (L4, L5, L6) provide fast, high-level context that is used to prune the search space for the "Cold" functional layers (L1, L2).
Algorithm:
Step 1 (Parallel Fan-out < 20ms): Query L4 (Session), L5 (Project), and L6 (User) for keywords, patterns, and active context. These are simple key-value or SQL lookups and are extremely fast.
Result: L4.activeTicket="CLAUDE-001", L5.techStack="Node.js, Prisma", L6.expertise="intermediate".
Step 2 (Hierarchical Filtering): Use these results to construct intelligent queries for the slow layers.
L1 (Vector) Query: The query is no longer just "find 'add auth' snippets." It becomes: "Find 'add auth' snippets semantically similar to the query, AND where metadata tags match techStack="Node.js"."
L2 (Graph) Query: The query is no longer "search the whole graph." It becomes: "Find dependencies of files related to L4.activeTicket="CLAUDE-001"."
Result: This hierarchical approach transforms a massive, slow, and irrelevant search into a small, fast, and highly-relevant one. This is the core principle of production-grade hybrid Graph-Vector RAG systems.44
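A sketch of this two-step flow follows. The layer-query callbacks (queryFastLayers, vectorSearch, graphSearch) are hypothetical and injected as parameters so the example stays self-contained; in the real Memory Router they would be wired to the L4/L5/L6 lookups and the LanceDB/SQLite stores.

```typescript
// Illustrative sketch of "summary-then-details" retrieval; callback names are assumptions.
interface FastContext {
  activeTicket?: string;   // from L4 (Session)
  techStack?: string[];    // from L5 (Project)
  expertise?: string;      // from L6 (User)
}

async function hierarchicalRetrieve(
  queryText: string,
  deps: {
    queryFastLayers: () => Promise<FastContext>;
    vectorSearch: (q: string, filter: { techStack?: string[] }) => Promise<string[]>;
    graphSearch: (filter: { ticket?: string }) => Promise<string[]>;
  }
) {
  // Step 1: cheap key-value/SQL lookups on the temporal layers (L4/L5/L6).
  const ctx = await deps.queryFastLayers();

  // Step 2: use that context to prune the expensive functional layers (L1/L2).
  const [vectorHits, graphHits] = await Promise.all([
    deps.vectorSearch(queryText, { techStack: ctx.techStack }),
    deps.graphSearch({ ticket: ctx.activeTicket }),
  ]);

  return { ctx, vectorHits, graphHits };
}
```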
Heterogeneous Fusion: Reciprocal Rank Fusion (RRF)
The Memory Router now has multiple ranked lists from heterogeneous sources: L1 (vector distance), L2 (graph hops), L4 (recency), L5 (confidence score), etc. These scores are not comparable and cannot be simply added.
The solution is Reciprocal Rank Fusion (RRF).7 RRF is a simple, robust, and score-free algorithm. It combines lists based only on the rank of items, not their scores.
Algorithm: Each retrieved memory fragment is given a final RRF score calculated as: $RRF_{score} = \sum_{i} \frac{1}{k + rank_{i}}$
$rank_{i}$ is the item's rank in retrieval list $i$.
$k$ is a constant (e.g., $k=60$) that diminishes the impact of low-ranked items.
Why RRF: It is battle-tested, requires no score normalization, and strongly favors items that appear in the top results of multiple lists.47 For example, a code snippet retrieved by both L1 (Vector) and L5 (Project Patterns) will receive a very high score, correctly identifying it as highly relevant.
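A minimal RRF implementation over ranked lists of memory IDs, using the k = 60 constant cited above:

```typescript
// RRF: sum 1/(k + rank) over every list an item appears in; no score normalization needed.
function reciprocalRankFusion(rankedLists: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score); // highest fused score first
}

// Example: a snippet ranked by both L1 (vector) and L5 (patterns) rises to the top.
reciprocalRankFusion([
  ['snippet-A', 'snippet-B', 'snippet-C'], // L1 vector results
  ['pattern-X', 'snippet-A'],              // L5 project patterns
]);
```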
Post-Fusion Processing: Deduplication and Conflict Tagging
After RRF, the system has a single, ranked list of memory fragments. Two final steps are required to prepare this context for the LLM:
Semantic Deduplication: The list may contain semantic duplicates (e.g., "Always use Prisma" from L5 and "User prefers Prisma" from L6). Before passing this to the LLM, a final pass using a lightweight algorithm like MinHash 48 or SemHash 49 can identify and merge these near-duplicates, reducing token count and noise.
Conflict Tagging: This is the final implementation of the Arbiter Pattern (Part 2). As the list is built, if two high-ranking items are identified as contradictory (e.g., the same category, such as "style," with different values from different sources, such as L5 vs. L6), both are included and explicitly tagged as a conflict for the LLM to mediate.
7. Learning & Adaptation
A static memory is a "dumb" memory. Arela's memory must learn, adapt, and forget, just like a human's. This is achieved by building an automated feedback loop.
How to Detect Patterns: Implicit Feedback & Frequent Itemset Mining
Arela should learn implicitly from user behavior, not just explicitly from commands ("remember this"). The user's actions are a constant stream of implicit feedback.34
Data Collection (Implicit): Use VS Code API hooks like onDidSaveTextDocument 51 and workspace.onDidCreateFiles. Every save event is a "transaction," and each transaction contains a "basket of items."
Example Transaction: A user adds import { PrismaClient } from '@prisma/client' to src/db.ts and saves. The transaction is: {'file': 'src/db.ts', 'imports': '@prisma/client', 'language': 'TypeScript', 'git_branch': 'feature/db-setup'}.
Pattern Mining: A periodic (e.g., nightly or weekly) background job runs a Frequent Itemset Mining (FIM) algorithm on the transaction logs. Algorithms like Apriori or the more efficient FP-Growth 35 are designed for this.
Insight Generation: The FIM algorithm will discover Association Rules 53 with a high confidence score.
Example Rule: (language: 'TypeScript', file: '*.ts') => (imports: '@prisma/client') [Confidence: 80%].
This rule is then translated into a human-readable pattern and stored in L5 (Project) memory: "Pattern: This project uses Prisma for database access." with an initial confidence=0.80.
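A sketch of the collection half of this loop is shown below, assuming hypothetical log and getBranch callbacks are provided by the extension host; the import-extraction regex is a deliberate simplification of what a real parser would do.

```typescript
import * as vscode from 'vscode';

// Illustrative transaction shape for the nightly FP-Growth job.
interface SaveTransaction {
  file: string;
  language: string;
  imports: string[];
  gitBranch?: string;
  timestamp: string;
}

export function registerTransactionLogger(
  context: vscode.ExtensionContext,
  log: (tx: SaveTransaction) => void,          // assumed: appends to the local transaction log
  getBranch: () => string | undefined          // assumed: reads the current Git branch
) {
  context.subscriptions.push(
    vscode.workspace.onDidSaveTextDocument(doc => {
      if (doc.uri.scheme !== 'file') return;
      const text = doc.getText();
      // Naive import mining: good enough to feed frequent-itemset mining, not a real parser.
      const imports = [...text.matchAll(/from\s+['"]([^'"]+)['"]/g)].map(m => m[1]);
      log({
        file: vscode.workspace.asRelativePath(doc.uri),
        language: doc.languageId,
        imports,
        gitBranch: getBranch(),
        timestamp: new Date().toISOString(),
      });
    })
  );
}
```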
How to Update Confidence: Bayesian Updating
A newly learned pattern is only a hypothesis. Its confidence must be updated based on continuous user feedback.
Implicit Feedback (Positive): When Arela uses a pattern (e.g., it auto-imports Prisma) and the user does not revert that change (i.e., they save again or commit), that is a positive signal.
Explicit Feedback (Negative): When the user corrects the AI ("No, use Drizzle instead") or reverts its change, that is a strong negative signal.55
Algorithm: A simple, effective Bayesian updating model can be used to adjust confidence scores.9 Each pattern in L5 and L6 will have a confidence score.
On Positive Feedback (Use/Approval): new_confidence = min(0.95, current_confidence * 1.20)
On Negative Feedback (Revert/Correction): new_confidence = max(0.05, current_confidence * 0.85)
New patterns from FIM (7.A) start at the confidence level derived from the association rule.
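A direct translation of these update rules into code (a multiplicative heuristic with clamping, rather than a full Bayesian posterior):

```typescript
// Mirrors the stated factors: +20% on positive signals, -15% on negative, clamped to [0.05, 0.95].
function updateConfidence(current: number, positive: boolean): number {
  return positive
    ? Math.min(0.95, current * 1.20)   // pattern used and not reverted
    : Math.max(0.05, current * 0.85);  // pattern corrected or reverted
}

// e.g. a pattern learned at 0.80 that survives two uses: 0.80 -> 0.95 (capped)
```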
How to Handle Changing Preferences: Graceful Forgetting (Decay)
"Perfect memory" is a bug if it remembers outdated patterns. The system must "gracefully forget" irrelevant information to avoid cognitive overload and memory bloat.8
The Problem: A user switches from Vitest to Jest. The L6 memory "User prefers Vitest" is now wrong, but its confidence is 0.95. It will pollute every prompt.
The Solution: Exponential Decay (Half-Life).57 A memory's "importance" or "relevance" score should decay over time if it is not accessed.
Algorithm: A background job periodically re-calculates an "importance score" for all patterns in L5 and L6. A proven model for this is 10:
$score = (n_{use})^{\beta} \cdot e^{-\lambda \cdot \Delta t}$
$n_{use}$: Access count (incremented on positive feedback).
$\beta$: Sub-linear weighting (e.g., 0.6) to reward usage.
$\Delta t$: Time (in seconds) since last access.
$\lambda$: The decay constant, set via a "half-life" (e.g., $t_{1/2} = 30$ days).
Implementation: Patterns with a score below a certain threshold (e.g., 0.1) are "forgotten" (archived or deleted). This automatically and gracefully prunes old, unused preferences (like "prefers Vitest") while keeping active, frequently used patterns (like "prefers Prisma") at the top.
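A sketch of the scoring function, using the β = 0.6 and 30-day half-life values suggested above:

```typescript
// Importance score from the formula above; lambda is derived from the half-life as ln(2) / t_half.
function importanceScore(
  accessCount: number,
  lastAccessMs: number,            // epoch ms of the last positive use
  nowMs: number = Date.now(),
  beta = 0.6,
  halfLifeDays = 30
): number {
  const halfLifeSeconds = halfLifeDays * 24 * 60 * 60;
  const lambda = Math.log(2) / halfLifeSeconds;
  const deltaSeconds = Math.max(0, (nowMs - lastAccessMs) / 1000);
  return Math.pow(accessCount, beta) * Math.exp(-lambda * deltaSeconds);
}

// Patterns scoring below the threshold (e.g. 0.1) become candidates for archival in the consolidation job.
```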
Cross-Project Learning: Promotion from L5 to L6
This mechanism enables Use Case 2 (Cross-project pattern recognition).
Algorithm: A weekly "global consolidation" job.
The job scans all L5 (Project) databases (.arela/project.db) across all known project directories.
It looks for identical patterns (e.g., "Uses Prisma for DB") that have a high confidence score (e.g., $> 0.8$) in multiple projects (e.g., 3 or more).
When this threshold is met, it "promotes" the pattern, copying it to the global L6 (User) database (~/.arela/user.db).
The next time the user starts a new project, Arela queries L6 and immediately asks, "I notice you always use Prisma. Should I set that up?"
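A sketch of the promotion job follows, assuming better-sqlite3 and the Part 9 schemas; how the list of project database paths is discovered is out of scope here.

```typescript
import Database from 'better-sqlite3';

// Promote patterns that hold with high confidence in several projects into the global L6 store.
function promotePatterns(projectDbPaths: string[], userDbPath: string, minConfidence = 0.8, minProjects = 3) {
  const seen = new Map<string, { count: number; type: string; projects: string[] }>();

  for (const dbPath of projectDbPaths) {
    const db = new Database(dbPath, { readonly: true });
    const rows = db
      .prepare('SELECT type, value FROM patterns WHERE confidence > ?')
      .all(minConfidence) as Array<{ type: string; value: string }>;
    for (const row of rows) {
      const entry = seen.get(row.value) ?? { count: 0, type: row.type, projects: [] };
      entry.count += 1;
      entry.projects.push(dbPath);
      seen.set(row.value, entry);
    }
    db.close();
  }

  const userDb = new Database(userDbPath);
  const insert = userDb.prepare(
    'INSERT OR IGNORE INTO global_patterns (type, value, confidence, source_projects) VALUES (?, ?, 0.8, ?)'
  );
  for (const [value, entry] of seen) {
    if (entry.count >= minProjects) {
      insert.run(entry.type, value, JSON.stringify(entry.projects));
    }
  }
  userDb.close();
}
```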
8. Privacy & Security
Handling project and user data locally introduces significant trust and security responsibilities.
The Local-First Advantage: GDPR & Privacy-by-Design
This architecture is Arela's single greatest compliance advantage. Cloud-based AI tools must expend enormous effort to comply with GDPR, which places strict rules on automated decision-making and data processing.28
Arela's local-first (.arela/, ~/.arela/) model is "Data Protection by Design" (Article 25 GDPR).61 The data never leaves the user's machine, meaning Arela is a "data processor" for the user's own data, not a "data controller" storing that data in the cloud. This massively simplifies compliance and should be a core marketing message to build user trust.
What Data to NEVER Store: Data Minimization
The system must enforce the GDPR principle of data minimization.60 A hardcoded, non-negotiable filter must be applied synchronously to all memory write operations.
Blacklist: This filter must use regular expressions to detect and strip sensitive data before it is ever written to disk.
Do Not Store:
Secrets: sk_live_..., ghp_..., AWS_SECRET_..., API_KEY, JWTs.
PII: Common formats for emails, phone numbers, credit cards, SSNs.
Sensitive Files: The contents of *.env, *.pem, id_rsa, *.key.
This filter is the primary line of defense against privacy violations and catastrophic secret leaks.62
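A minimal sketch of such a gate is shown below; the patterns are illustrative, not an exhaustive secret/PII rule set.

```typescript
// Illustrative blacklist; a real deployment needs a broader, maintained rule set.
const BLOCKED_PATTERNS: RegExp[] = [
  /sk_live_[0-9a-zA-Z]+/g,                                // Stripe live keys
  /ghp_[0-9a-zA-Z]{36}/g,                                 // GitHub personal access tokens
  /AKIA[0-9A-Z]{16}/g,                                    // AWS access key IDs
  /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g,   // JWT-shaped strings
  /[\w.+-]+@[\w-]+\.[\w.]+/g,                             // email addresses
  /\b(?:\d[ -]?){13,16}\b/g,                              // credit-card-like digit runs
];

const BLOCKED_FILES = [/\.env$/, /\.pem$/, /id_rsa$/, /\.key$/];

// Synchronous gate applied to every memory write: redact matches, refuse blocked files outright.
function sanitizeForMemory(filePath: string, content: string): string | null {
  if (BLOCKED_FILES.some(rx => rx.test(filePath))) return null; // never store these files
  return BLOCKED_PATTERNS.reduce((text, rx) => text.replace(rx, '[REDACTED]'), content);
}
```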
Encryption Strategy: At Rest
Project Memory (L1-L5): This data is stored in the .arela/ directory, inside the user's project folder. It is as safe as the user's source code. Standard disk-level encryption (e.g., BitLocker, FileVault) is sufficient.
User Memory (L6): This is the high-risk file. The ~/.arela/user.db file contains global preferences and patterns learned across all projects. This file must be encrypted at rest.62
Implementation: Do not store the encryption key in plain text. Use the native OS secret manager. For a VS Code extension, this is straightforward via the context.secrets API, which stores credentials in the OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service) and can securely hold the AES key used to encrypt/decrypt user.db.
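A sketch of the key-management and encryption path, using the SecretStorage API and Node's built-in crypto module (AES-256-GCM); the key name and file layout are assumptions, and error handling is omitted.

```typescript
import * as crypto from 'crypto';
import * as vscode from 'vscode';

// Fetch or lazily create the AES key, kept only in the OS secret store.
async function getOrCreateKey(secrets: vscode.SecretStorage): Promise<Buffer> {
  const existing = await secrets.get('arela.userdb.key');
  if (existing) return Buffer.from(existing, 'base64');
  const key = crypto.randomBytes(32);
  await secrets.store('arela.userdb.key', key.toString('base64'));
  return key;
}

// Encrypt the serialized user.db contents before writing them to disk.
function encryptUserDb(plaintext: Buffer, key: Buffer): Buffer {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // Persist IV and auth tag alongside the ciphertext so decryption can verify integrity.
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}
```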
User Control: The "Right to be Forgotten"
GDPR guarantees user control, including the right to access and delete data.59 The local-first architecture makes this transparent and trivial to implement.
Implementation: Provide simple CLI commands or UI buttons:
arela memory export --layer=all --format=json: Exports all memory data for user review.
arela memory delete --layer=user: Deletes the global ~/.arela/user.db file.
arela memory delete --layer=project: Deletes the local .arela/memory directory.
A UI in the Arela sidebar must also allow users to review, edit, and delete individual patterns from L5 and L6.
Sandboxing: Project-level Isolation
The memory system must be strictly sandboxed to prevent data leakage across projects.63
Implementation: The Memory Router must enforce a strict file-path security boundary. A query originating from Project_A (e.g., /path/to/Project_A) must be sandboxed to only read files from /path/to/Project_A/.arela/ and the global ~/.arela/ store.
It must be impossible for Project_A's agent to query or access /path/to/Project_B/.arela/. This prevents cross-project data contamination.
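A sketch of the path boundary check the Memory Router could apply before opening any memory file:

```typescript
import * as os from 'os';
import * as path from 'path';

// Allow access only to this project's .arela/ directory and the global ~/.arela/ store.
function isPathAllowed(requestedPath: string, projectRoot: string): boolean {
  const resolved = path.resolve(requestedPath);
  const allowedRoots = [
    path.join(path.resolve(projectRoot), '.arela'), // this project's memory
    path.join(os.homedir(), '.arela'),              // the global user store
  ];
  // path.relative() starting with '..' means the target escapes the allowed root.
  return allowedRoots.some(root => {
    const rel = path.relative(root, resolved);
    return rel === '' || (!rel.startsWith('..') && !path.isAbsolute(rel));
  });
}

// isPathAllowed('/path/to/Project_B/.arela/project.db', '/path/to/Project_A') === false
```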
9. Code Examples & Schemas
This section provides actionable schemas and code snippets to accelerate implementation.
SQL Schemas (SQLite)
These schemas are based on the user's JSON examples, but normalized and extended to support the learning and consolidation algorithms.
```sql
-- Layer 5: Project Memory (located at .arela/memory/project.db)
CREATE TABLE IF NOT EXISTS decisions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    date TEXT NOT NULL DEFAULT (STRFTIME('%Y-%m-%d', 'now')),
    decision TEXT NOT NULL,
    rationale TEXT,
    alternatives TEXT -- Stored as JSON array
);

CREATE TABLE IF NOT EXISTS patterns (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL, -- 'pattern', 'anti-pattern', 'convention'
    value TEXT NOT NULL UNIQUE,
    confidence REAL DEFAULT 0.5, -- Bayesian score 0.0-1.0
    last_used TEXT NOT NULL DEFAULT (STRFTIME('%Y-%m-%dT%H:%M:%SZ', 'now')),
    access_count INTEGER DEFAULT 1
);

CREATE TABLE IF NOT EXISTS tech_stack (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    key TEXT NOT NULL UNIQUE,
    value TEXT NOT NULL
);

-- Layer 3: Governance Memory (located at .arela/memory/audit.db)
-- (User-provided schema is sufficient, append-only)

-- Layer 6: User Memory (located at ~/.arela/user.db)
CREATE TABLE IF NOT EXISTS preferences (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    key TEXT NOT NULL UNIQUE, -- 'language', 'framework', 'style'
    value TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS expertise (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain TEXT NOT NULL UNIQUE, -- 'frontend', 'backend'
    level TEXT NOT NULL -- 'beginner', 'intermediate', 'expert'
);

CREATE TABLE IF NOT EXISTS global_patterns (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL, -- 'pattern', 'anti-pattern'
    value TEXT NOT NULL UNIQUE,
    confidence REAL DEFAULT 0.8, -- Promoted patterns start high
    last_used TEXT NOT NULL DEFAULT (STRFTIME('%Y-%m-%dT%H:%M:%SZ', 'now')),
    access_count INTEGER DEFAULT 1,
    source_projects TEXT -- Stored as JSON array of project IDs
);
```
VS Code API Integration (Session Tracking & Persistence)
This snippet demonstrates the core of the L4 (Session) implementation described in Phase 1.
```typescript
// In extension.ts - activate() function
import * as vscode from 'vscode';
// Assumes a utility function to get the branch via VS Code's Git API
import { getGitBranch } from './git-api';

// Define the L4 schema
interface SessionMemory {
  currentTask?: string;
  filesOpen?: string[];
  conversationHistory?: { role: string; content: string }[];
  activeTicket?: string;
  lastEdit?: { file: string; line: number; timestamp: string };
  timestamp?: string;
}

// Global in-memory cache for L4
let session: SessionMemory = {};

export async function activate(context: vscode.ExtensionContext) {
  // 1. Reload persisted session on activation
  const persistedSession = context.workspaceState.get<SessionMemory>('arela.session');
  if (persistedSession) {
    session = persistedSession;
    console.log("Arela: Reloaded session state.");
  }

  // 2. Hook file edits to update 'lastEdit'
  context.subscriptions.push(
    vscode.workspace.onDidChangeTextDocument(event => {
      if (event.document.uri.scheme === 'file') {
        session.lastEdit = {
          file: event.document.fileName,
          line: event.contentChanges[0]?.range.start.line ?? 0,
          timestamp: new Date().toISOString()
        };
        session.timestamp = new Date().toISOString();
      }
    })
  );

  // 3. Persist session on deactivation
  context.subscriptions.push(
    new vscode.Disposable(() => {
      // VS Code's Memento API handles the actual write
      context.workspaceState.update('arela.session', session);
    })
  );

  // 4. Infer Active Task from Git Branch [32, 65]
  try {
    const workspaceRoot = vscode.workspace.workspaceFolders?.[0]?.uri.fsPath;
    const branch = workspaceRoot ? await getGitBranch(workspaceRoot) : undefined;
    if (branch) {
      const ticketMatch = branch.match(/([A-Z]+-\d+)/); // Parses "CLAUDE-001"
      if (ticketMatch) {
        session.activeTicket = ticketMatch[1];
        session.currentTask = `Working on ticket ${ticketMatch[1]}`;
      }
    }
  } catch (e) {
    console.warn("Arela: Could not read Git branch.");
  }

  // 5. Update open files on load
  session.filesOpen = vscode.workspace.textDocuments
    .filter(doc => doc.uri.scheme === 'file')
    .map(doc => doc.fileName);
}
```
Programmatic API (Internal Interface)
This illustrates a potential internal, strongly-typed API for Arela's agents to interact with the Memory Router.
```typescript
// Internal API for Arela's agents
interface ArelaMemory {
  /** High-level query: Fuses all 6 layers */
  query(queryText: string, options: QueryOptions): Promise<MemoryFragment[]>;

  /** Direct Layer Access (Read-Only) */
  session: Readonly<SessionMemory>;
  project: {
    getPatterns: () => Promise<Pattern[]>;
    getDecisions: () => Promise<Decision[]>;
    getTechStack: () => Promise<TechStack>;
  };
  user: {
    getPreferences: () => Promise<Preference[]>;
    getExpertise: () => Promise<Expertise[]>;
  };

  /** Direct Layer Access (Write) */
  logDecision(decision: Omit<Decision, 'id'>): Promise<void>;            // L3/L5
  learnPattern(pattern: Omit<Pattern, 'id'>): Promise<void>;             // L5
  updatePatternConfidence(id: number, success: boolean): Promise<void>;  // L5/L6
  logConversation(role: 'user' | 'assistant', content: string): void;    // L4
}
```
Natural Language Query (NLQ) -> Text-to-Query
The user's example of a programmatic query (memory.project.patterns.filter(...)) is useful for agents, but not for users. Users will use natural language. This requires an internal "Text-to-Query" agent that translates NL into calls to the programmatic API above.
This translation is a well-understood problem, similar to Text-to-SQL or Text-to-Cypher.66
User: "What's my preferred testing framework?"
NLQ Agent: -> Translates to ->
arela.memory.user.getPreferences().then(prefs => prefs.find(p => p.key === 'testing'))
User: "Why did we choose Postgres?"
NLQ Agent: -> Translates to ->
arela.memory.project.getDecisions().then(decs => decs.find(d => d.decision.includes('Postgres')))
10. Risk Assessment
This section outlines primary risks associated with the Hexi-Memory architecture and their mitigation strategies.
Risk 1: Memory Bloat / Irrelevance
Likelihood: High.
Impact: This is the primary failure mode of "perfect memory." The AI's context becomes polluted with thousands of outdated, irrelevant facts (e.g., "User was working on login.ts 3 months ago"). This breaks "smart fusion" (Criterion #4) and "graceful forgetting" (Criterion #6), making the AI unusable.
Mitigation: The Graceful Forgetting (Half-Life) algorithm (Part 7.C) is the primary mitigation.10 This automated, periodic consolidation, based on access frequency and recency, is non-negotiable.8 It automatically prunes irrelevant memories.
Risk 2: Performance Degradation (>200ms)
Likelihood: Medium.
Impact: The AI assistant feels "slow," breaking the user experience and violating Success Criterion #3.
Mitigation:
Storage: Use of LanceDB 5 and SQLite, which are orders of magnitude faster than the current JSON file-based system.
Retrieval: A Parallel Fan-Out query architecture (Part 6.A) with hard timeouts.42
Filtering: Hierarchical Retrieval (Part 6.B) to prune the search space before running expensive L1/L2 queries.17
Risk 3: Privacy Violation / Secret Leakage
Likelihood: Low (if mitigated), High (if ignored).
Impact: Catastrophic loss of user trust. Storing an AWS_SECRET_ACCESS_KEY in project.db (which may be committed to Git) is a company-ending bug.
Mitigation:
The Data Minimization Blacklist (Part 8.B) must be a hard-blocking, synchronous check on every memory-write operation.62
Encryption at Rest for the L6 user.db file (Part 8.C) using the native OS secret store is a mandatory second layer of defense.
Risk 4: "False" or Overfitted Learning
Likelihood: Medium.
Impact: The AI learns a "coincidence" as a "pattern" (e.g., user renames a file once, AI thinks it's a permanent rule). This makes the AI annoying, untrustworthy, and violates the "pattern recognition in 3 uses" goal (Criterion #2).
Mitigation:
Thresholds: Do not learn a pattern from a single event. Use Frequent Itemset Mining (Part 7.A) to find frequent patterns (e.g., 3+ occurrences).35
Confidence: Use Bayesian Confidence Scoring (Part 7.B). A new pattern starts at a modest confidence and must be "proven" over time through positive implicit feedback.9
User Control: Always provide a one-click "No, don't suggest that again" option (explicit feedback) that immediately drops a pattern's confidence to near-zero.
11. References
Cognitive Architectures
15 ResearchGate. (n.d.). Memory Architectures in Long-Term AI Agents: Beyond Simple State Representation.
12 arXiv. (n.d.). Hierarchical memory architecture for LLM agents (H-MEM).
11 arXiv. (n.d.). MemoryOS: A hierarchical storage architecture for AI agents.
14 Medium. (n.d.). Memory Types in Agentic AI: A Breakdown.
13 Various. (n.d.). Cognitive architectures for conversational AI memory.
71 TechSee. (n.d.). Understanding AI Memory: A Deep Dive into the Cognitive Layers of Service Automation.
In-Market Memory Systems
72 Mem0.ai & arXiv. (n.d.). Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory.
20 GitHub (CaviraOSS) & VS Code Marketplace. (n.d.). OpenMemory: Persistent Memory for AI Assistants.
Retrieval, Fusion & Consolidation Algorithms
18 Various. (n.d.). Memory conflict resolution and consolidation strategies.
8 Various. (n.d.). Forgetting mechanisms in artificial intelligence memory systems.
10 Various. (n.d.). Automatic decay function for AI memory ('exponential decay', 'half-life').
96 arXiv & AAAI. (n.d.). Similarity-based merging algorithms for AI memory patterns ('LLM-driven fact merging').
44 Memgraph & Neo4j. (n.d.). Hybrid retrieval algorithms for vector and graph databases.
41 Various. (n.d.). Parallel query strategies for multiple memory systems.
7 arXiv & EmergentMind. (n.d.). Ranking and fusion algorithms (Reciprocal Rank Fusion) for heterogeneous RAG sources.
48 Zilliz & Anthropic. (n.d.). Semantic deduplication of memory search results.
17 arXiv & Medium. (n.d.). Hierarchical retrieval strategy ('summary then details').
Learning & Adaptation Algorithms
35 Various. (n.d.). Pattern mining algorithms ('frequent itemsets', 'association rules').
34 Various. (n.d.). Inferring user preferences from 'implicit feedback'.
9 GitHub & PMC. (n.d.). 'Bayesian updating' for AI agent 'confidence scoring'.
Natural Language Querying
66 Neo4j & arXiv. (n.d.). Natural language to hybrid 'SQL and Cypher' query for RAG.
66 Neo4j & IEEE. (n.d.). 'Text-to-Cypher' LLM techniques for graph database querying.
Storage Technologies
5 Various. (n.d.). ChromaDB vs. LanceDB performance comparison.
39 Various. (n.d.). Local-first and embedded databases for Electron apps ('SQLite-vec').
5 LanceDB.com & Reddit. (n.d.). LanceDB vs. 'sqlite-vec' for local-first vector search.
Privacy & IDE Integration
62 Google Cloud & EDPB. (n.d.). Privacy and security best practices for AI agent local memory.
63 AWS & Daytona.io. (n.d.). Sandboxing AI agent project memory.
28 Various. (n.d.). GDPR compliance for local-first AI coding assistants.
29 VS Code API Docs. (n.d.). VS Code extension API for session tracking.
31 VS Code API Docs & Stack Overflow. (n.d.). Accessing 'git branch' in VS Code extension.
6 VS Code API Docs & Stack Overflow. (n.d.). Persisting VS Code extension state ('globalState', 'workspaceState').
Competitor Tooling
2 GitHub & VS Code Docs. (n.d.). GitHub Copilot native memory features.
1 Reddit & Cursor Forum. (n.d.). Cursor AI 'native memory' vs. 'OpenMemory' integration.
3 Devin.ai & Cognition.ai. (n.d.). Devin AI agent memory architecture.
4 Sourcegraph. (n.d.). Sourcegraph Cody native memory and context recall.
Works cited
9 MCP memory servers/frameworks that actually make Cursor agent useful - Reddit, accessed on November 14, 2025, https://www.reddit.com/r/cursor/comments/1na4sv9/9_mcp_memory_serversframeworks_that_actually_make/
Github Copilot native memory tool is now available in Visual Code ..., accessed on November 14, 2025, https://www.reddit.com/r/GithubCopilot/comments/1oo4ty9/github_copilot_native_memory_tool_is_now/
Are we over-engineering coding agents? Thoughts on the Devin multi-agent blog - Reddit, accessed on November 14, 2025, https://www.reddit.com/r/ChatGPTCoding/comments/1latkqz/are_we_overengineering_coding_agents_thoughts_on/
How Cody understands your codebase | Sourcegraph Blog, accessed on November 14, 2025, https://sourcegraph.com/blog/how-cody-understands-your-codebase
Vector Databases: Lance vs Chroma | by Patrick Lenert - GenAI Fullstack Developer, accessed on November 14, 2025, https://medium.com/@patricklenert/vector-databases-lance-vs-chroma-cc8d124372e9
Common Capabilities | Visual Studio Code Extension API, accessed on November 14, 2025, https://code.visualstudio.com/api/extension-capabilities/common-capabilities
Rank Fusion Technique - Emergent Mind, accessed on November 14, 2025, https://www.emergentmind.com/topics/rank-fusion-technique
Forgetting in AI Agent Memory Systems | by Volodymyr Pavlyshyn, accessed on November 14, 2025, https://ai.plainenglish.io/forgetting-in-ai-agent-memory-systems-7049181798c4
ReasoningBank: Persistent Memory System for AI Agents - Complete Documentation and Pre-Trained Models · Issue #811 · ruvnet/claude-flow - GitHub, accessed on November 14, 2025, https://github.com/ruvnet/claude-flow/issues/811
prefrontal-systems/cortexgraph: Temporal memory system for AI assistants with human-like forgetting curves - GitHub, accessed on November 14, 2025, https://github.com/mnemexai/mnemex
Memory OS of AI Agent - arXiv, accessed on November 14, 2025, https://arxiv.org/abs/2506.06326
H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents, accessed on November 14, 2025, https://arxiv.org/html/2507.22925v1
What is Cognitive Architecture? How Intelligent Agents Think, Learn, and Adapt - Quiq, accessed on November 14, 2025, https://quiq.com/blog/what-is-cognitive-architecture/
Building AI Agents with Memory Systems: Cognitive Architectures for LLMs, accessed on November 14, 2025, https://bluetickconsultants.medium.com/building-ai-agents-with-memory-systems-cognitive-architectures-for-llms-176d17e642e7
(PDF) Memory Architectures in Long-Term AI Agents: Beyond Simple State Representation, accessed on November 14, 2025, https://www.researchgate.net/publication/388144017_Memory_Architectures_in_Long-Term_AI_Agents_Beyond_Simple_State_Representation
Memory Types in Agentic AI: A Breakdown | by Gokcer Belgusen - Medium, accessed on November 14, 2025, https://medium.com/@gokcerbelgusen/memory-types-in-agentic-ai-a-breakdown-523c980921ec
Hierarchical Indices: Enhancing RAG Systems | by Nirdiamant - Medium, accessed on November 14, 2025, https://medium.com/@nirdiamant21/hierarchical-indices-enhancing-rag-systems-43c06330c085
Memory in multi-agent systems: technical implementations | by cauri - Medium, accessed on November 14, 2025, https://medium.com/@cauri/memory-in-multi-agent-systems-technical-implementations-770494c0eca7
One-Shot Memory Bank for Cursor that makes a difference, accessed on November 14, 2025, https://forum.cursor.com/t/one-shot-memory-bank-for-cursor-that-makes-a-difference/87411
How to make your clients more context-aware with OpenMemory MCP - DEV Community, accessed on November 14, 2025, https://dev.to/anmolbaranwal/how-to-make-your-clients-more-context-aware-with-openmemory-mcp-4h71
GitHub Copilot features, accessed on November 14, 2025, https://docs.github.com/en/copilot/get-started/features
GitHub Copilot in VS Code, accessed on November 14, 2025, https://code.visualstudio.com/docs/copilot/overview
OpenMemory for VS Code - Visual Studio Marketplace, accessed on November 14, 2025, https://marketplace.visualstudio.com/items?itemName=Nullure.openmemory-vscode
Introducing Devin, the first AI software engineer - Cognition, accessed on November 14, 2025, https://cognition.ai/blog/introducing-devin
Coding Agents 101: The Art of Actually Getting Things Done - Devin AI, accessed on November 14, 2025, https://devin.ai/agents101
How Cody provides remote repository awareness for codebases of every size - Sourcegraph, accessed on November 14, 2025, https://sourcegraph.com/blog/how-cody-provides-remote-repository-context
Cody for VS Code v1.14.0: Now with bigger context windows and a refreshed chat UI, accessed on November 14, 2025, https://sourcegraph.com/blog/cody-vscode-1-14-0-release
GDPR-Compliant AI Coding Tools: Enterprise Comparison, accessed on November 14, 2025, https://www.augmentcode.com/guides/gdpr-compliant-ai-coding-tools-enterprise-comparison
Extension API - Visual Studio Code, accessed on November 14, 2025, https://code.visualstudio.com/api
[extension api] vscode.workspace.onDidOpenTextDocument can't be triggered · Issue #1203 - GitHub, accessed on November 14, 2025, https://github.com/microsoft/vscode/issues/1203
VS Code API | Visual Studio Code Extension API, accessed on November 14, 2025, https://code.visualstudio.com/api/references/vscode-api
Read current git branch natively using vscode extension - Stack Overflow, accesse