UNPKG

autosnippet

Version:

Extract code patterns into a knowledge base for AI coding assistants

301 lines (208 loc) 17.2 kB
<div align="center"> # AutoSnippet Extract patterns from your codebase into a knowledge base that AI coding assistants can query in your IDE — so generated code actually follows your team's conventions. [![npm version](https://img.shields.io/npm/v/autosnippet.svg?style=flat-square)](https://www.npmjs.com/package/autosnippet) [![License](https://img.shields.io/npm/l/autosnippet.svg?style=flat-square)](https://github.com/GxFn/AutoSnippet/blob/main/LICENSE) [![Node](https://img.shields.io/badge/node-%E2%89%A522-brightgreen?style=flat-square)](https://nodejs.org) [中文](README_CN.md) </div> --- - [Why](#why) · [Getting Started](#getting-started) · [Using in IDE](#using-in-ide) · [Evolution Architecture](#evolution-architecture) · [Engineering Capabilities](#engineering-capabilities) · [IDE Support](#ide-support) · [Deep Dive](#deep-dive) ## Why Copilot and Cursor don't know how your team writes code. What they generate works, but doesn't look like yours — wrong naming, wrong patterns, wrong abstractions. You end up rewriting AI output or explaining the same conventions in every Code Review. AutoSnippet builds a layer of **localized project memory**. It scans your codebase, extracts valuable patterns (with your approval), and makes them searchable by all AI tools via [MCP](https://modelcontextprotocol.io/). Knowledge persists locally, never consuming the LLM context window — it's injected on-demand when the AI needs it. The more knowledge accumulates, the more the generated code matches your conventions. ``` Your code → AI extracts patterns → You review → Knowledge base ↓ Cursor / Copilot / VS Code / Xcode ↓ AI generates your way ``` ## Getting Started ```bash npm install -g autosnippet cd your-project asd setup # Initialize workspace + database + MCP config (auto-detects Cursor / VS Code / Trae / Qoder) asd ui # Start background service (MCP Server + Dashboard) — IDE and MCP tools depend on this ``` > **Trae / Qoder users:** After `asd setup`, run `asd mirror` to sync `.cursor/` config to `.trae/` / `.qoder/`. ## Using in IDE `asd setup` configures everything. Open your IDE's **Agent Mode** (Cursor Composer / VS Code Copilot Chat / Trae) and start chatting. > **First time:** Manually enable the `autosnippet` service in your IDE's MCP settings. > **Tip:** Stronger models work better. We recommend Claude Opus 4.6 / Sonnet 4.6, GPT-5.4, or Gemini 3.1 Pro in Cursor / Copilot for more accurate patterns and fewer false positives. ### Cold Start: Build Your Knowledge Base > 💬 *"Cold start — build the project knowledge base"* The Agent scans your entire project, extracting coding patterns, architecture conventions, and call habits, while generating a project Wiki. Cold start runs once; after that, it's daily use. ### Daily Use: Just Ask | You say | You get | |---------|---------| | ① *"How do we write API endpoints in this project?"* | Code following your project's actual style, not generic examples | | ② *"Write a user registration endpoint"* | Generated code automatically follows the API conventions just retrieved | | ③ *"Check if this file follows our conventions"* | Pre-commit convention check — fewer back-and-forths in Code Review | | ④ *"Save this error handling pattern as a project convention"* | One-time capture — every team member's AI learns this pattern | After the Agent finishes writing code, the Guard compliance engine auto-checks the diff — violations trigger self-repair, no manual intervention needed. ### Gets Better Over Time Review and approve candidates in Dashboard (`asd ui`) → they become **Recipes** → AI references them when generating code → you spot new good patterns → keep capturing → AI increasingly writes like a team member. Knowledge is local Markdown files, travels with git, never disappears with conversations, and doesn't consume context window — no matter how large the knowledge base grows, it won't slow down AI. --- ## Evolution Architecture AutoSnippet isn't a static knowledge tool — it's a **knowledge organism**. Recipes are its cells — the IDE Agent is the external driving force, and each interaction triggers coordinated responses from different organs inside the organism. ``` IDE Agent (Cursor / Copilot / Trae) │ │ Capture · Write · Search · Shift · Complete · Boundary │ ═════════════════▼══════════════════════════════════════ ║ AutoSnippet Knowledge Organism ║ ║ ║ ║ ┌─ Panorama (Skeleton) ──── Project Structure ───┐ ║ ║ │ │ ║ ║ │ Signal (Nerves) ◄────► Governance (Digest) │ ║ ║ │ ↕ ↕ │ ║ ║ │ ┌──────────┐ │ ║ ║ │ │ Recipe │ │ ║ ║ │ │ Living │ │ ║ ║ │ │Knowledge │ │ ║ ║ │ └──────────┘ │ ║ ║ │ ↕ ↕ │ ║ ║ │ Guard (Immunity) ◄────► Tool Forge (Create) │ ║ ║ │ │ ║ ║ └────────────────────────────────────────────────┘ ║ ║ ║ ══════════════════════════════════════════════════════════ ``` ### Agent Actions × Organism Responses Each IDE Agent action triggers coordinated responses from different organs: | Agent Action | Organism Response | Organs Involved | |-------------|------------------|-----------------| | **Capture knowledge** — extract and submit patterns | Digestive system metabolizes internally: confidence routing → staging observation → evolves or decays. Developer retains full intervention rights | Digest → Nerves | | **Write code** — start coding | Nervous system analyzes intent, auto-injects relevant Recipes with sourceRefs source evidence for higher trust | Nerves → Recipe | | **Search knowledge** — active search | Precise retrieval based on current intent + file context, multi-path fusion ranking, dynamic weight adjustment per scenario | Nerves → Recipe | | **Shift intent** — change direction | Nervous system records drift signals, senses problems; immune system reverse-checks whether Recipes are still valid | Nerves → Immunity | | **Complete task** — finish writing code | Immune system triggers Guard Review, attaches relevant Recipes for Agent to fix violations | Immunity → Recipe | | **Capability boundary** — hit an unsolvable problem | Creation system calls LLM to forge temporary tools, vm-sandboxed execution, auto-reclaimed on expiry | Create | ### Five Organs **Skeleton — Panorama** The organism's structural awareness. AST + call graphs infer module roles & layers (four-signal fusion, 13 role types), Tarjan SCC computes coupling, Kahn topological sort infers layering, DimensionAnalyzer generates 11-dimension health radar, outputting coverage heatmaps and gap reports. All organs share this project overview. **Digest — Governance** The metabolic engine for new knowledge entering the organism. ContradictionDetector finds conflicts, RedundancyAnalyzer flags duplication, DecayDetector scores decay (6 strategies + 4-dimension scoring), ConfidenceRouter numerically routes (≥ 0.85 auto-publishes, < 0.2 rejects). ProposalExecutor auto-executes evolution proposals on expiry (7 types, differentiated observation windows). Six-state lifecycle: `pending → staging → active → evolving/decaying → deprecated`. **Nerves — Signal + Intent** Senses all Agent behavior. IntentExtractor extracts terms, infers language and module, cross-language synonym expansion, identifies 4 scenarios. SignalBus unifies 12 signal types (guard / search / usage / lifecycle / quality / exploration / panorama / decay / forge / intent / anomaly / guard_blind_spot), HitRecorder batches usage events. When the Agent shifts intent, nerves record drift signals and coordinate the immune system for reverse checking. **Immunity — Guard** Bidirectional immune system. Forward: four-layer detection (regex → code-level multi-line → tree-sitter AST → cross-file), built-in 8-language rules, three-state output (pass / violation / uncertain). Backward: ReverseGuard verifies Recipe-referenced API symbols still exist (5 drift types). Auto-triggers Review when Agent completes a task, handing violations along with relevant Recipes to the Agent for fixing. RuleLearner tracks P/R/F1 for auto-tuning. **Create — Tool Forge** Creativity at capability boundaries. Three progressive modes — Reuse (0ms) → Compose (10ms, atomic tool assembly) → Generate (~5s, LLM writes code → vm sandbox validation: 5s timeout + 18 security rules). Temporary tools have 30min TTL, auto-reclaimed on expiry. LLM participates only during forging; execution is fully deterministic. ### Design Philosophy 1. **AI Compile-Time + Engineering Runtime** — LLM produces deterministic artifacts; runtime is pure engineering logic 2. **Deterministic Marking + Probabilistic Resolution** — Each layer does its deterministic part; uncertainty escalates to AI 3. **Orthogonal Composition > Specialized Subclasses** — Capability × Strategy × Policy replaces N subclasses 4. **Signal-Driven > Time-Driven** — Trigger on signal saturation, not scheduled scans 5. **Defense in Depth** — Constitution → Gateway → Permission → SafetyPolicy → PathGuard → ConfidenceRouter > Organ implementation details, engineering metrics, and defense chain breakdown in [Technical Book](https://docs.gaoxuefeng.com/visual-tour) --- ## Engineering Capabilities The above is the organism itself. Below are the engineering integration capabilities it exposes. ### Guard CLI ```bash asd guard src/ # Check directory asd guard:staged # pre-commit: staged files only asd guard:ci --min-score 90 # CI quality gate ``` ### Multi-Language AST 11-language tree-sitter: Go · Python · Java · Kotlin · Swift · JS · TS · Rust · ObjC · Dart · C#. 5-stage CallGraph, incremental analysis, 8 project types auto-detected. ### 6-Channel IDE Delivery Knowledge changes auto-deliver to IDE-consumable formats: | Channel | Path | Content | |---------|------|---------| | **A** | `.cursor/rules/autosnippet-project-rules.mdc` | alwaysApply one-liner rules | | **B** | `.cursor/rules/autosnippet-patterns-{topic}.mdc` | When/Do/Don't themed rules | | **C · D** | `.cursor/skills/` | Project Skills + development docs | | **F** | `AGENTS.md` / `CLAUDE.md` / `.github/copilot-instructions.md` | Agent instructions | | **Mirror** | `.qoder/` / `.trae/` | IDE mirrors | ### More - **Bootstrap Cold Start** — 6-phase · 10-dimension analysis, one-time knowledge base build - **Knowledge Graph** — 14 relationship types, query impact paths and dependency depth - **Semantic Search** — HNSW vector index + field-weighted scoring hybrid, RRF fusion + 7-signal ranking - **sourceRefs** — Recipes carry source evidence, Agent trusts without self-verification - **Lark Remote** — Message from phone, intent routes to Bot or IDE - **Remote Repository** — Recipe directory as git sub-repo, shared across projects > AI-driven features require an LLM API Key. Supports Google / OpenAI / Claude / DeepSeek / Ollama with automatic fallback. --- ## Project Structure After `asd setup`, your project gains these: ``` your-project/ ├── AutoSnippet/ # Knowledge data (git-tracked) │ ├── recipes/ # Reviewed patterns (Markdown) │ ├── candidates/ # Pending review │ ├── skills/ # Project-specific Agent instructions │ └── wiki/ # Project Wiki ├── .autosnippet/ # Runtime cache (gitignored) │ ├── autosnippet.db # SQLite (WAL mode) │ └── context/ # Vector index (HNSW) ├── .cursor/ │ ├── mcp.json # Cursor MCP config │ ├── rules/ # Channel A + B rules │ └── skills/ # Channel C + D Skills ├── .vscode/mcp.json # VS Code MCP config ├── .github/copilot-instructions.md ├── AGENTS.md └── CLAUDE.md ``` Recipes are Markdown files. SQLite is just a read cache. If the database breaks, `asd sync` rebuilds it. --- ## IDE Support | IDE | Integration | Details | |-----|------------|---------| | **VS Code** | Extension + MCP | `#asd` tool references in Agent Mode; search, directives, CodeLens, Guard diagnostic squiggles, light-bulb fixes | | **Cursor** | MCP + Rules | `.cursor/mcp.json` + `.cursor/rules/` + `.cursor/skills/` | | **Claude Code** | MCP + CLAUDE.md | `CLAUDE.md` + MCP tools; supports hooks | | **Trae / Qoder** | MCP | `asd setup` auto-generates, `asd mirror` syncs config | | **Xcode** | File watching | `asd watch` + file directives + Snippet sync | | **Lark** | Bot + WebSocket | Message from phone → intent recognition → Bot Agent or IDE Agent Mode execution | ### VS Code Extension - **Comment Directives**: `// as:s <query>` search & insert, `// as:c` create candidate from selection, `// as:a` audit current file - **CodeLens**: Clickable actions above directives - **Guard Diagnostics**: Violations shown as squiggles + light-bulb quick fixes - **Status Bar**: Live API Server connection status All configuration auto-generated by `asd setup`. Run `asd upgrade` after updates. --- ## Deep Dive > **[Visual Tour — Understand the entire system in 5 minutes](https://docs.gaoxuefeng.com/visual-tour)** · 25 hand-drawn architecture diagrams from workflow to Agent loop | Chapter | Content | |---------|--------| | [Introduction](https://docs.gaoxuefeng.com/part1/ch01-introduction) | Problem definition, solution overview, quick start | | [SOUL Principles](https://docs.gaoxuefeng.com/part1/ch02-soul) | 3 hard constraints + 5 design philosophies | | [Architecture](https://docs.gaoxuefeng.com/part2/ch03-architecture) | 7-layer DDD with module topology | | [Security Pipeline](https://docs.gaoxuefeng.com/part2/ch04-security) | Six-layer defense in depth | | [Code Understanding](https://docs.gaoxuefeng.com/part2/ch05-ast) | 10-language Tree-sitter AST analysis | | [Knowledge Domain](https://docs.gaoxuefeng.com/part3/ch06-knowledge-entry) | Unified entity, lifecycle, quality scoring | | [Core Services](https://docs.gaoxuefeng.com/part4/ch09-bootstrap) | Bootstrap, Guard, Search, Metabolism | | [Agent Intelligence](https://docs.gaoxuefeng.com/part5/ch13-agent-runtime) | ReAct loop, orthogonal composition, 61+ tools | | [Platform & Delivery](https://docs.gaoxuefeng.com/part6/ch16-infrastructure) | Data infrastructure, MCP, four-interface access | | [BiliDili Cold Start](https://docs.gaoxuefeng.com/part7/ch19-bilidili-coldstart) | Real data: 8.4M tokens, 101 candidates | --- ## Requirements - Node.js ≥ 22 - macOS recommended (Xcode features require it; other features are cross-platform) - better-sqlite3 (bundled) ### Recommended: Local Embedding for Semantic Search AutoSnippet has a built-in hybrid search engine (keyword + vector semantic). Install a local embedding model to unlock semantic search — concept-level matching that finds relevant recipes even when exact keywords don't match. ```bash # Install Ollama (https://ollama.com) brew install ollama && ollama serve # Pull the recommended model (~639MB, supports Chinese + English + code) ollama pull qwen3-embedding:0.6b ``` Then add to your project's `.env`: ```bash ASD_EMBED_PROVIDER=ollama ASD_EMBED_MODEL=qwen3-embedding:0.6b ``` Or configure it in Dashboard (`asd ui`) → Settings → Embedding Model. After configuring, run `asd embed` to build the vector index. Semantic search adds ~200–400ms per query (local inference, no API calls, no data leaves your machine). > **Without a local model**, search still works — it uses field-weighted keyword matching, which is fast and accurate for exact terms. Semantic search is a bonus layer for concept-level queries like *"how to avoid data races"* or *"cookie persistence"*. ## Contributing 1. Run `npm test` before submitting 2. Follow existing code patterns (ESM, domain-driven structure) ## License [MIT](LICENSE) © gaoxuefeng