@adarsh6938/mcp-knowledge-graph-semantic

Private MCP Server for semantic knowledge graph with persistent memory

# Personal Knowledge Graph with Semantic Search

A powerful MCP (Model Context Protocol) server that provides persistent memory using a local knowledge graph with semantic search capabilities. Built for personal use with Claude/Cursor to maintain context across conversations.

## Features

- 🧠 **Persistent Memory**: Store and retrieve information across chat sessions
- 🔍 **Semantic Search**: Find relevant information based on meaning, not just keywords
- 🔗 **Knowledge Graph**: Entities and relationships for structured knowledge storage
- 📄 **Pagination**: Handle large datasets without response size limits
- 🚀 **Local & Private**: All data stays on your machine
- 💰 **Cost-Free**: Uses open-source Transformers.js models (no API costs)
- 🛡️ **Smart Entity Management**: Automatic entity health monitoring and bloat prevention
- 🤖 **Auto-Split Entities**: Automatically reorganize oversized entities without interruption
- 🎯 **Configurable**: Customize limits and categories for any domain or use case
- **Temporal Tracking**: Automatic timestamps and session tracking for all activities
- 🔄 **Session Continuity**: Smart context retrieval across chat sessions
- 📊 **Activity Detection**: Automatic categorization of work types (coding, debugging, planning)
- 💬 **Chat Transitions**: 🆕 Save/restore complete session summaries between chat windows
- 🔗 **Context Carryover**: 🆕 Perfect continuity when switching chat sessions

## Quick Start

### Installation

```bash
npm install -g @adarsh6938/mcp-knowledge-graph-semantic
```

### Configuration

Add to your `.cursor/mcp.json` or `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "knowledge-graph-semantic": {
      "command": "npx",
      "args": [
        "-y",
        "@adarsh6938/mcp-knowledge-graph-semantic",
        "--memory-path",
        "/path/to/your/memory.jsonl"
      ]
    }
  }
}
```

## Core Concepts

### Enhanced Entities with Temporal Tracking

Primary nodes in your knowledge graph with automatic temporal tracking:

```json
{
  "name": "John_Doe",
  "entityType": "person",
  "sessionId": "session_2025_01_19_10_15",
  "observations": [
    {
      "content": "Software engineer specializing in contract testing",
      "timestamp": "2025-01-19T10:15:32.123Z",
      "sessionId": "session_2025_01_19_10_15",
      "activityType": "discussion"
    },
    "Uses TypeScript and Java for development" // Legacy format still supported
  ]
}
```

### Relations with Session Tracking

Connections between entities with temporal awareness:

```json
{
  "from": "John_Doe",
  "to": "Alpha_workspace",
  "relationType": "works_in",
  "sessionId": "session_2025_01_19_10_15"
}
```

### Automatic Session Management

- **Session Detection**: Automatically creates new sessions after 30+ minute gaps
- **Session IDs**: Auto-generated format `session_YYYY_MM_DD_HH_MM`
- **Activity Types**: Automatically detects coding, planning, debugging, discussion, completion, learning, research
- **Backward Compatibility**: Supports both enhanced observations and legacy string format

## Available Tools

### Core Operations

- `create_entities` - Add new entities to the graph with names, types, and observations
- `create_relations` - Connect entities with typed relationships
- `add_observations` - Add facts/information to existing entities (with smart suggestions)
- `update_entities` - Modify existing entity names, types, or observations
- `update_relations` - Modify existing relationship types or connections

### Deletion & Cleanup

- `delete_entities` - Remove entities and their connections
- `delete_observations` - Remove specific facts from entities
- `delete_relations` - Remove connections between entities

### Reading & Discovery

- `read_graph` - Get limited view (first 5 entities) for quick overview
- `read_graph_paginated` - Browse large datasets with pagination control
- `open_nodes` - Get specific entities by name with all their relationships
- `search_nodes` - Keyword-based search across entity names and observations
- `semantic_search` - AI-powered semantic search with temporal awareness
- `hybrid_search` - Combined keyword + semantic search for comprehensive results

### 🆕 Temporal Context & Session Continuity

- `get_recent_context` - Retrieve last 24 hours of activity with recency priority
- `get_related_work` - Find semantically related work from specified time windows
- `get_historical_overview` - Get key entities and session summaries from older timeframes
- `get_session_continuity_context` - Smart context for new chat sessions with confidence scoring
- `save_current_session_summary` - 🆕 Save comprehensive chat session summary for context carryover
- `get_last_session_summary` - 🆕 Retrieve previous session summary when starting new chat

### Smart Entity Management

- `analyze_entity_health` - Identify bloated entities that need splitting
- `split_entity` - Break down oversized entities into organized components
- `configure_entity_management` - Customize behavior, limits, and categories
- `get_entity_management_config` - View current configuration settings

### System Maintenance

- `rebuild_semantic_index` - Refresh semantic search index for better performance

## 🔄 Session Continuity & Temporal Tracking

**NEW**: Complete solution for context window exhaustion with automatic session continuity!

### The Problem

When AI context windows fill up, users start new chats and lose work continuity. This system solves that with intelligent temporal tracking and context retrieval.

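
As a rough illustration of the session rules listed under Automatic Session Management (a new session after a gap of 30 minutes or more, IDs shaped like `session_YYYY_MM_DD_HH_MM`), here is a minimal sketch. The names `makeSessionId`, `resolveSession`, and `SessionState` are hypothetical and not part of this package's API, and the use of UTC for the ID fields is an assumption based on the example timestamps in this README.

```typescript
// Hypothetical sketch: these names are illustrative, not the package's actual API.
const SESSION_GAP_MS = 30 * 60 * 1000; // 30-minute gap, as stated in the README

// Left-pads a number to two digits, e.g. 5 -> "05".
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Formats a date as session_YYYY_MM_DD_HH_MM (UTC is an assumption).
function makeSessionId(start: Date): string {
  return [
    "session",
    String(start.getUTCFullYear()),
    pad2(start.getUTCMonth() + 1),
    pad2(start.getUTCDate()),
    pad2(start.getUTCHours()),
    pad2(start.getUTCMinutes()),
  ].join("_");
}

interface SessionState {
  sessionId: string;
  lastActivity: Date;
}

// Keeps the previous session if the gap is under 30 minutes,
// otherwise starts a new session named after the current time.
function resolveSession(prev: SessionState | null, now: Date): SessionState {
  if (prev !== null && now.getTime() - prev.lastActivity.getTime() < SESSION_GAP_MS) {
    return { sessionId: prev.sessionId, lastActivity: now };
  }
  return { sessionId: makeSessionId(now), lastActivity: now };
}
```

Calling `resolveSession` once per recorded activity keeps the session ID stable while work is continuous and rolls it over after a long pause; the real server may track boundaries differently.
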
### Automatic Temporal Tracking ⏰

Every observation now includes:

- **Timestamp**: Precise creation time
- **Session ID**: Automatically assigned session identifier
- **Activity Type**: Auto-detected work category
- **Content**: The actual information

```typescript
interface ObservationData {
  content: string;
  timestamp: string;
  sessionId: string;
  activityType: 'coding' | 'planning' | 'debugging' | 'discussion' | 'completion' | 'learning' | 'research';
}
```

### Session Boundary Detection 🎯

- **Automatic Sessions**: New session created after 30+ minute gaps
- **No Manual Work**: Everything happens automatically
- **Session Progression**: Clear tracking of work evolution
- **Activity Categorization**: Smart detection of work types

### Tiered Context Retrieval System 📊

#### 1. Recent Context (`get_recent_context`)

- **Purpose**: Last 24 hours of activity
- **Prioritization**: Recent activities weighted higher
- **Session Awareness**: Groups activities by session
- **Use Case**: Quick continuity for ongoing work

```typescript
// Example usage
get_recent_context({
  hoursBack: 24,
  maxResults: 20
})
```

#### 2. Related Work (`get_related_work`)

- **Purpose**: Semantically similar work from time windows
- **Session Clustering**: Groups related activities across sessions
- **Semantic Matching**: Finds conceptually similar work
- **Use Case**: Finding past work relevant to current task

```typescript
// Example usage
get_related_work({
  query: "temporal tracking implementation",
  daysBack: 7,
  maxResults: 15
})
```

#### 3. Historical Overview (`get_historical_overview`)

- **Purpose**: Key entities and patterns from older work
- **Session Summaries**: High-level view of past sessions
- **Entity Importance**: Focuses on most significant entities
- **Use Case**: Understanding long-term patterns and key topics

```typescript
// Example usage
get_historical_overview({
  excludeDays: 7,
  maxResults: 10
})
```

#### 4. Session Continuity Context (`get_session_continuity_context`)

- **Purpose**: Intelligent automatic context for new sessions
- **Confidence Scoring**: Rates context availability quality
- **Smart Recommendations**: Suggests most relevant context
- **Use Case**: Zero-effort session continuation

```typescript
// Example usage
get_session_continuity_context({
  query: "continue work on session continuity"
})
```

### Enhanced Semantic Search 🔍

The regular `semantic_search` now includes temporal awareness:

- **Combined Scoring**: 70% semantic similarity + 30% recency score
- **Session Tracking**: Results show session progression
- **Activity Types**: Each result includes detected activity type
- **Temporal Ordering**: Recent relevant results prioritized

### Activity Type Detection 🤖

Automatically categorizes observations:

- **coding**: Implementation, debugging, code reviews
- **planning**: Architecture decisions, task planning, requirements
- **debugging**: Problem solving, error investigation, testing
- **discussion**: Conversations, explanations, knowledge sharing
- **completion**: Finished tasks, achievements, milestones
- **learning**: New concepts, research, skill development
- **research**: Information gathering, analysis, exploration

### Session Management

- **Format**: `session_YYYY_MM_DD_HH_MM`
- **Auto-Creation**: New session after 30+ minute gaps
- **Preservation**: All temporal data preserved during entity operations
- **Audit Trail**: Complete history of all activities

### Use Cases for Session Continuity

1. **Context Window Exhaustion**: Start new chat with full context
2. **Work Resumption**: Pick up where you left off after breaks
3. **Project Evolution**: Track how work develops over time
4. **Knowledge Retention**: Never lose important context or decisions
5. **Collaboration**: Share complete context with team members

## 🆕 Chat Window Transitions & Session Summaries

**NEW**: Perfect solution for maintaining context when switching between chat windows!

### The Challenge

When context windows fill up or you need to start fresh chats, you lose conversation continuity. Session summaries solve this by capturing and restoring complete context.

### Session Summary System 💬

#### When User Says "End of Chat"

```typescript
save_current_session_summary({
  summary: "Comprehensive summary of what happened in this chat session..."
})
```

**Automatically Captures:**

- **All Session Entities**: Every entity created/modified in this session
- **All Session Relations**: Connections between entities from this session
- **Activity Analysis**: Work types (coding, planning, debugging, etc.)
- **Time Boundaries**: Auto-detected session start/end times
- **Rich Metadata**: Entity count, relation count, observation count

#### When User Says "New Chat"

```typescript
get_last_session_summary()
```

**Instantly Restores:**

- **Complete Previous Context**: Full summary of last session
- **Recent Session History**: Last 5 session summaries for broader context
- **Metadata Overview**: Quick stats about previous work
- **Seamless Continuation**: Pick up exactly where you left off

### Session Summary Features

- **Automatic Storage**: Saves to `*_session_summaries.json` alongside memory
- **Smart Cleanup**: Keeps last 50 summaries, auto-removes older ones
- **Zero Configuration**: Works with existing session management
- **Rich Context**: Captures all entities, relations, and temporal data
- **Perfect Carryover**: Complete context restoration for new chats

### Use Cases for Session Summaries

1. **Context Window Exhaustion**: Save before hitting limits, restore in new chat
2. **Daily Work Transitions**: End work sessions, resume next day with full context
3. **Project Handoffs**: Share complete session context with team members
4. **Multi-Chat Workflows**: Switch between different chat windows seamlessly
5. **Long-Term Projects**: Maintain continuity across weeks/months of work

## Smart Entity Management

The MCP now includes intelligent entity management to prevent bloated entities and maintain clean knowledge graphs.

### Automatic Entity Splitting 🤖

**NEW**: The system can now automatically split oversized entities to prevent bloat:

- **Auto-Split**: When entities exceed the limit (20 observations), they're automatically reorganized
- **Smart Categorization**: Observations are grouped by type (activities, tools, problems, etc.)
- **Preserved Relationships**: Original connections are maintained through new semantic relations
- **Seamless Operation**: No interruption to your workflow - splitting happens transparently
- **Configurable**: Can be disabled via `enableAutoSplit: false` if manual control preferred
- **Temporal Preservation**: All timestamps and session data preserved during splits

Example auto-split behavior:

```bash
🤖 Auto-splitting entity "John_Doe" (15 + 8 = 23 observations exceed limit)
Created: John_Doe_activities_and_actions (8 observations)
Created: John_Doe_tools_and_technologies (6 observations)
Created: John_Doe_problem_solving (4 observations)
🔗 Relations: John_Doe has_activities_and_actions John_Doe_activities_and_actions
Temporal data preserved across all split entities
```

### Default Protection

The system automatically:

- **Warns** when entities exceed 12 observations
- **Auto-splits** entities exceeding 20 observations (when enabled)
- **Suggests** splitting entities with mixed content types
- **Categorizes** observations into 7 universal types
- **Maintains** semantic relationships during reorganization
- **Preserves** temporal tracking data during all operations

### Universal Categories

The system recognizes these domain-agnostic patterns:

1. **activities_and_actions** - Things people do (working on, developing, managing, learning)
2. **tools_and_technologies** - Software, frameworks, systems they use
3. **problem_solving** - Issues encountered and solutions found
4. **knowledge_and_learning** - Things learned, studied, or understood
5. **projects_and_goals** - Projects, objectives, milestones
6. **relationships_and_interactions** - People interactions, meetings, collaborations
7. **processes_and_workflows** - Procedures, methodologies, best practices

### Configuration Options

- **Basic limits**: Adjust `maxObservationsPerEntity`, `warningThreshold`, `optimalObservationsCount`
- **Smart suggestions**: Enable/disable automatic categorization suggestions
- **Custom categories**: Define domain-specific patterns for specialized use cases
- **Domain templates**: Pre-configured setups for medical, software development, research, etc.

### Key Features

- **Health Analysis**: `analyze_entity_health` identifies bloated entities with category breakdowns
- **Smart Splitting**: `split_entity` reorganizes oversized entities into logical components
- **Automatic Relationships**: Creates proper semantic connections between split entities
- **Intelligent Warnings**: Provides actionable suggestions when adding observations
- **Universal Design**: Works across any domain without configuration
- **Temporal Awareness**: All operations preserve session and timestamp data

## Search & Discovery

- **Semantic Search**: Find information by meaning with temporal awareness
- **Keyword Search**: Traditional text-based search across entities and observations
- **Hybrid Search**: Combines semantic and keyword search for comprehensive results
- **Pagination**: Handle large knowledge graphs efficiently
- **Temporal Context**: Specialized tools for time-aware context retrieval
- **Session Continuity**: Smart context for seamless chat transitions

## Technical Details

- **Storage**: JSONL format for entities/relations with temporal extensions
- **Embeddings**: Transformers.js with `all-MiniLM-L6-v2` model
- **Search**: Cosine similarity with temporal scoring and configurable thresholds
- **Memory**: Automatic indexing when entities are created/modified
- **Sessions**: Automatic session boundary detection with 30+ minute gaps
- **Temporal Scoring**: Combined semantic similarity (70%) + recency score (30%)
- **Activity Detection**: Pattern-based automatic categorization
- **Backward Compatibility**: Supports both enhanced and legacy observation formats

## Use Cases

- **Personal Assistant**: Remember preferences, goals, and context across sessions
- **Project Memory**: Track technical decisions and implementations over time
- **Learning**: Store and connect knowledge across domains with temporal context
- **Development**: Maintain context about codebases and architectures
- **Session Continuity**: Seamlessly resume work after context window exhaustion
- **Work Evolution**: Track how projects and understanding develop over time
- **Knowledge Retention**: Never lose important context or decisions
- **Collaboration**: Share complete temporal context with team members

## Configuration Options

### Memory Path

```json
"args": ["--memory-path", "/Users/you/projects/memory.jsonl"]
```

### Multiple Projects

Use different memory files for different contexts:

```json
// Work project
"--memory-path", "/Users/you/work/work-memory.jsonl"

// Personal project
"--memory-path", "/Users/you/personal/personal-memory.jsonl"
```

## System Prompt Recommendation

Add this to your Claude/Cursor configuration:

```
Follow these steps for each interaction:

1. User Identification:
   - You should assume that you are interacting with default_user
   - If you have not identified default_user, proactively try to do so.

2. Memory Retrieval:
   - Always begin your chat by saying only "Remembering..." and retrieve relevant information from your knowledge graph
   - Always refer to your knowledge graph as your "memory"
   - **🆕 CHAT TRANSITIONS:**
     - **When user says "new chat"**: Use get_last_session_summary to restore previous session context
     - **When user says "end of chat"**: Use save_current_session_summary to preserve session for next chat
   - **CHOOSE ONE PRIMARY TOOL for normal memory retrieval (do not call multiple):**
     - **BEST CHOICE**: Use get_session_continuity_context for comprehensive automatic context with confidence scoring
     - **OR** get_recent_context if you only need last 24 hours
     - **OR** semantic_search if you have a specific query
     - **OR** hybrid_search for keyword + semantic combined search
   - **Additional tools only if needed:**
     - get_related_work, get_historical_overview, search_nodes, open_nodes, read_graph_paginated
   - Available tools: create_entities, create_relations, add_observations, update_entities, update_relations, delete_entities, delete_observations, delete_relations, read_graph, read_graph_paginated, search_nodes, semantic_search, hybrid_search, open_nodes, rebuild_semantic_index, analyze_entity_health, split_entity, configure_entity_management, get_entity_management_config, get_recent_context, get_related_work, get_historical_overview, get_session_continuity_context, save_current_session_summary, get_last_session_summary

3. Memory Health:
   - Periodically use analyze_entity_health to check for bloated entities and get category breakdowns
   - If entities exceed recommended sizes (12+ observations), suggest using split_entity to reorganize them
   - Use configure_entity_management to customize limits and categories for domain-specific use cases
   - Use get_entity_management_config to check current settings
   - Monitor for smart suggestions when using add_observations to prevent entity bloat

4. Memory:
   - While conversing with the user, be attentive to any new information that falls into these categories:
     a) Basic Identity (age, gender, location, job title, education level, etc.)
     b) Behaviors (interests, habits, etc.)
     c) Preferences (communication style, preferred language, etc.)
     d) Goals (goals, targets, aspirations, etc.)
     e) Relationships (personal and professional relationships up to 3 degrees of separation)
     f) Technical knowledge (implementations, decisions, learnings)

5. Memory Update:
   - If any new information was gathered during the interaction, update your memory as follows:
     a) Create entities for recurring organizations, people, and significant events
     b) Connect them to the current entities using relations
     c) Store facts about them as observations (automatically gets timestamps and session tracking)
     d) Update entities and relations as information evolves
     e) Delete outdated entities, relations, or observations when needed
     f) Pay attention to smart suggestions when adding observations to prevent entity bloat
     g) When warnings appear about entity size, consider splitting into domain-specific entities
     h) Use the universal categorization system to organize information appropriately
     i) All observations automatically include temporal tracking and activity type detection
```
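
For readers curious how the temporal scoring listed under Technical Details (70% semantic similarity + 30% recency, with cosine similarity over embeddings) might fit together, here is a minimal sketch. The README does not say how the recency score is computed, so the exponential half-life curve in `recencyScore` is purely an assumption, and all function names are hypothetical rather than the package's API.

```typescript
// Weights taken from the README: 70% semantic similarity, 30% recency.
const SEMANTIC_WEIGHT = 0.7;
const RECENCY_WEIGHT = 0.3;

// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// ASSUMPTION: recency decays exponentially, halving every `halfLifeHours`.
// The actual package may use a different curve.
function recencyScore(timestamp: string, now: Date, halfLifeHours: number = 24): number {
  const ageHours = Math.max(0, (now.getTime() - Date.parse(timestamp)) / 3600000);
  return Math.pow(0.5, ageHours / halfLifeHours);
}

// Weighted blend of the two signals.
function combinedScore(similarity: number, recency: number): number {
  return SEMANTIC_WEIGHT * similarity + RECENCY_WEIGHT * recency;
}
```

With this blend, an older observation can still outrank a fresh one when it is a much closer semantic match, which is in line with the README's description of recent relevant results being prioritized rather than recent results in general.
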