# Software Architecture Document (SAD)

## AIWG Research Framework

**Version:** 1.0.0
**Status:** DRAFT
**Last Updated:** 2026-01-25
**Owner:** Architecture Designer

---

## Document Control

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2026-01-25 | Architecture Designer | Initial architecture |

---

## Table of Contents

1. [Introduction](#1-introduction)
2. [Architectural Goals and Constraints](#2-architectural-goals-and-constraints)
3. [System Overview](#3-system-overview)
4. [Architectural Views](#4-architectural-views)
5. [Key Architectural Decisions](#5-key-architectural-decisions)
6. [Cross-Cutting Concerns](#6-cross-cutting-concerns)
7. [Integration Architecture](#7-integration-architecture)
8. [Quality Attributes](#8-quality-attributes)
9. [Appendices](#9-appendices)

---

## 1. Introduction

### 1.1 Purpose

This Software Architecture Document describes the high-level architecture of the AIWG Research Framework, a CLI-based research management system designed to automate discovery, acquisition, documentation, and citation of academic papers within the AIWG ecosystem.

The document provides:

- A comprehensive view of the system architecture
- Component decomposition and responsibilities
- Data flow and integration patterns
- Guidance for implementation decisions

### 1.2 Scope

The Research Framework covers the following functional areas:

| Area | Use Cases | Primary Agent |
|------|-----------|---------------|
| Discovery | UC-RF-001: Paper discovery via Semantic Scholar | Discovery Agent |
| Acquisition | UC-RF-002: PDF download with FAIR validation | Acquisition Agent |
| Documentation | UC-RF-003: LLM summarization with GRADE scoring | Documentation Agent |
| Citation | UC-RF-004: Claims backing and bibliography generation | Citation Agent |
| Provenance | UC-RF-005: W3C PROV compliance tracking | Provenance Agent |
| Quality | UC-RF-006: Quality validation and FAIR assessment | Quality Agent |
| Reporting | UC-RF-007: Research status and progress reporting | Reporting Agent |
| Integration | UC-RF-008-010: External tool integration | Integration Agent |

### 1.3 Definitions, Acronyms, and Terms

| Term | Definition |
|------|------------|
| AIWG | AI Writing Guide - parent framework |
| ADR | Architectural Decision Record |
| CLI | Command Line Interface |
| CSL | Citation Style Language |
| DOI | Digital Object Identifier |
| FAIR | Findable, Accessible, Interoperable, Reusable |
| GRADE | Grading of Recommendations Assessment, Development and Evaluation |
| LLM | Large Language Model |
| MCP | Model Context Protocol |
| MoC | Map of Content (Zettelkasten pattern) |
| NFR | Non-Functional Requirement |
| PROV | W3C Provenance standard |
| RAG | Retrieval-Augmented Generation |
| REF-XXX | Research Framework persistent identifier format |
| SAD | Software Architecture Document |
| SDLC | Software Development Lifecycle |

### 1.4 References

| Document | Location |
|----------|----------|
| Vision Document | @$AIWG_ROOT/agentic/code/frameworks/research-complete/inception/vision-document.md |
| NFR Specifications | @$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/nfr/NFR-RF-specifications.md |
| Use Cases (UC-RF-001-010) | @$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/use-cases/ |
| Risk Assessment | @$AIWG_ROOT/agentic/code/frameworks/research-complete/inception/initial-risk-assessment.md |
| AIWG Extension System | @$AIWG_ROOT/docs/extensions/overview.md |
| W3C PROV Overview | https://www.w3.org/TR/prov-overview/ |
| FAIR Principles | https://www.go-fair.org/fair-principles/ |

---

## 2. Architectural Goals and Constraints

### 2.1 Key Architectural Drivers

The NFR specification contains 45 requirements across 9 categories. The architecture is driven primarily by the following categories:

| Priority | Driver | Source NFRs | Impact |
|----------|--------|-------------|--------|
| **Critical** | Provenance Compliance | NFR-RF-P-01 to P-08 | W3C PROV tracking for all operations |
| **Critical** | Quality Assessment | NFR-RF-Q-01 to Q-06 | FAIR/GRADE scoring for all sources |
| **Critical** | LLM Accuracy | NFR-RF-D-01 to D-08 | RAG pattern with hallucination detection |
| **High** | Performance | NFR-RF-A-01 to A-10 | < 60 s acquisition, < 5 min documentation |
| **High** | Usability | NFR-RF-U-01 to U-05 | Natural language commands, progress visibility |
| **Medium** | Integration | NFR-RF-I-01 to I-05 | Semantic Scholar, Zotero, Obsidian APIs |

### 2.2 Constraints

#### 2.2.1 Organizational Constraints

| Constraint | Description | Impact |
|------------|-------------|--------|
| C-01: Solo Developer | Single developer for v1.0 | Prioritize simplicity over sophistication |
| C-02: AIWG Integration | Must follow AIWG patterns | Use existing extension system |
| C-03: Open Source | MIT license, public repository | No proprietary dependencies |

#### 2.2.2 Technical Constraints

| Constraint | Description | Impact |
|------------|-------------|--------|
| C-04: CLI-First | Primary interface is command line | No web UI in v1.0 |
| C-05: Node.js Runtime | AIWG uses Node.js | TypeScript implementation |
| C-06: Local Storage | All data in `.aiwg/research/` | File-based, no database server |
| C-07: API Rate Limits | Semantic Scholar: 100 req/min | Implement rate limiting |
| C-08: LLM Token Costs | Claude/OpenAI API costs | Efficient prompt design |

#### 2.2.3 Quality Constraints

| Constraint | Description | Target |
|------------|-------------|--------|
| C-09: Reproducibility | External replication success | >90% |
| C-10: FAIR Compliance | Source quality assessment | >80% high/moderate |
| C-11: Citation Accuracy | Citation format correctness | >95% |
| C-12: Hallucination Detection | LLM output validation | >95% recall |

### 2.3 Architectural Principles

| Principle | Rationale |
|-----------|-----------|
| **P-01: Agent-Based Architecture** | Aligns with AIWG's multi-agent patterns |
| **P-02: File-Based Persistence** | Simple, portable, Git-friendly |
| **P-03: Pipeline Processing** | Clear data flow through stages |
| **P-04: Provenance by Default** | Every operation logged automatically |
| **P-05: Graceful Degradation** | Failures don't block the workflow |
| **P-06: Extension Points** | Custom agents, templates, integrations |

---

## 3. System Overview

### 3.1 Context Diagram (C4 Level 1)

```
                              SYSTEM CONTEXT

    ┌───────────────┐                              ┌──────────────────┐
    │  Developer/   │                              │     Semantic     │
    │  Researcher   │                              │   Scholar API    │
    │   (Primary)   │                              │    (External)    │
    └───────┬───────┘                              └────────┬─────────┘
            │ CLI Commands                                  │ Paper Search
            │ Natural Language                              │ Metadata
            ▼                                               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         AIWG Research Framework                         │
│                                                                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │  Discovery  │ → │ Acquisition │ → │Documentation│ → │  Citation   │  │
│  │    Agent    │   │    Agent    │   │    Agent    │   │    Agent    │  │
│  └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘  │
│                                    ▲                                    │
│                                    │                                    │
│                            ┌───────┴───────┐                            │
│                            │  Provenance   │                            │
│                            │     Agent     │                            │
│                            └───────────────┘                            │
│                                                                         │
└───────────┬───────────────────────────────────────────────┬─────────────┘
            │ Summaries, Citations                          │ API Calls
            │ Knowledge Notes                               │ Metadata
            ▼                                               ▼
    ┌───────────────┐                              ┌──────────────────┐
    │   SDLC Docs   │                              │     LLM API      │
    │   (.aiwg/)    │                              │ (Claude/OpenAI)  │
    └───────────────┘                              └──────────────────┘

    ┌───────────────┐   ┌───────────────┐   ┌───────────────────────────┐
    │   CrossRef    │   │     arXiv     │   │   Optional Integrations   │
    │      API      │   │      API      │   │    (Zotero, Obsidian)     │
    └───────────────┘   └───────────────┘   └───────────────────────────┘
```

### 3.2 Container Diagram (C4 Level 2)

```
                          AIWG Research Framework

┌─────────────────────────────────────────────────────────────────────────┐
│                           CLI Interface Layer                           │
│                                                                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │    aiwg     │   │ /research-  │   │   Natural   │   │ MCP Server  │  │
│  │  research   │   │    slash    │   │  Language   │   │ (Protocol)  │  │
│  │  commands   │   │  commands   │   │   Parser    │   │             │  │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘  │
│         └─────────────────┴────────┬────────┴─────────────────┘         │
└────────────────────────────────────┼────────────────────────────────────┘
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        Agent Orchestration Layer                        │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                       Research Orchestrator                       │  │
│  │ - Workflow coordination                                           │  │
│  │ - Agent lifecycle management                                      │  │
│  │ - State machine (SDLC phases)                                     │  │
│  └─────────────────────────────────┬─────────────────────────────────┘  │
│                                    │                                    │
│        ┌─────────────┬─────────────┼─────────────┬─────────────┐        │
│        ▼             ▼             ▼             ▼             ▼        │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   │
│   │ Discov. │   │ Acquis. │   │ Docum.  │   │ Citat.  │   │ Proven. │   │
│   │  Agent  │   │  Agent  │   │  Agent  │   │  Agent  │   │  Agent  │   │
│   └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘   │
│        ▼             ▼             ▼             ▼             ▼        │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   │
│   │ Quality │   │ Report  │   │ Integr. │   │ Search  │   │ Metadata│   │
│   │  Agent  │   │  Agent  │   │  Agent  │   │  Agent  │   │  Agent  │   │
│   └─────────┘   └─────────┘   └─────────┘   └─────────┘   └─────────┘   │
│                                                                         │
└────────────────────────────────────┬────────────────────────────────────┘
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                              Service Layer                              │
│                                                                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │ API Client  │   │     PDF     │   │     LLM     │   │  Citation   │  │
│  │   Service   │   │  Processor  │   │   Service   │   │  Formatter  │  │
│  │             │   │             │   │    (RAG)    │   │    (CSL)    │  │
│  └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘  │
│                                                                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │    FAIR     │   │    GRADE    │   │ Provenance  │   │  Checksum   │  │
│  │  Validator  │   │   Scorer    │   │   Logger    │   │   Service   │  │
│  └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘  │
│                                                                         │
└────────────────────────────────────┬────────────────────────────────────┘
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                            Data Access Layer                            │
│                                                                         │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │   Source    │   │  Knowledge  │   │ Provenance  │   │   Config    │  │
│  │ Repository  │   │ Repository  │   │ Repository  │   │ Repository  │  │
│  └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘  │
│                                                                         │
└────────────────────────────────────┬────────────────────────────────────┘
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          File System (Storage)                          │
│                                                                         │
│  .aiwg/research/                                                        │
│  ├── sources/      # PDFs and metadata                                  │
│  ├── knowledge/    # Summaries, extractions, notes                      │
│  ├── discovery/    # Search results, acquisition queue                  │
│  ├── provenance/   # W3C PROV logs, lineage graph                       │
│  ├── networks/     # Citation network, knowledge graph                  │
│  └── config/       # Framework configuration                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## 4. Architectural Views

### 4.1 Logical View (Component Decomposition)

#### 4.1.1 Agent Components

The Research Framework employs 8 specialized agents following AIWG's agent pattern:

| Agent | Responsibility | Use Case(s) | Primary Tools |
|-------|---------------|-------------|---------------|
| **Discovery Agent** | Search academic databases, rank results | UC-RF-001 | Semantic Scholar API, CrossRef API |
| **Acquisition Agent** | Download PDFs, validate FAIR compliance | UC-RF-002 | HTTP client, PDF validator, FAIR scorer |
| **Documentation Agent** | LLM summarization, GRADE scoring | UC-RF-003 | LLM API (RAG), PDF extractor, GRADE scorer |
| **Citation Agent** | Format citations, build bibliography | UC-RF-004 | CSL processor, claims index, BibTeX exporter |
| **Provenance Agent** | Track operations, W3C PROV logging | UC-RF-005 | PROV-JSON logger, lineage graph |
| **Quality Agent** | FAIR validation, quality assessment | UC-RF-006 | FAIR validator, quality metrics |
| **Reporting Agent** | Status reports, progress tracking | UC-RF-007 | Report generator, metrics aggregator |
| **Integration Agent** | External tool sync (Zotero, Obsidian) | UC-RF-008-010 | Zotero API, Obsidian vault writer |

#### 4.1.2 Agent Component Diagram

```
                                Agent Layer

┌─────────────────────────────────────────────────────────────────────────┐
│                          Research Orchestrator                          │
│                                                                         │
│ - Manages agent lifecycle (spawn, coordinate, terminate)                │
│ - Implements research workflow state machine                            │
│ - Handles inter-agent communication                                     │
│ - Enforces quality gates between stages                                 │
└────────────────────────────────────┬────────────────────────────────────┘
                                     │ orchestrates
                                     ▼
Stage 1: Discovery        Stage 2: Acquisition      Stage 3: Documentation
┌─────────────────────┐   ┌─────────────────────┐   ┌─────────────────────┐
│ Discovery Agent     │ → │ Acquisition Agent   │ → │ Documentation Agent │
│                     │   │                     │   │                     │
│ - searchPapers      │   │ - downloadPDF       │   │ - summarize         │
│ - rankResults       │   │ - extractMeta       │   │ - extractData       │
│ - buildQueue        │   │ - validateFAIR      │   │ - gradeSource       │
│ - savePRISMA        │   │ - assignREFID       │   │ - createNote        │
└─────────────────────┘   └─────────────────────┘   └─────────────────────┘

Stage 4: Citation         Cross-Cutting: Provenance
┌─────────────────────┐   ┌───────────────────────────────────────────────┐
│ Citation Agent      │   │ Provenance Agent                              │
│                     │   │                                               │
│ - formatCitation    │   │ - logOperation (all stages)                   │
│ - insertClaim       │   │ - buildLineageGraph                           │
│ - buildBiblio       │   │ - computeChecksums                            │
│ - exportBibTeX      │   │ - validatePROV                                │
└─────────────────────┘   └───────────────────────────────────────────────┘

Supporting Agents
┌─────────────────────┐   ┌─────────────────────┐   ┌─────────────────────┐
│ Quality Agent       │   │ Reporting Agent     │   │ Integration Agent   │
│                     │   │                     │   │                     │
│ - validateFAIR      │   │ - genStatus         │   │ - syncZotero        │
│ - auditQuality      │   │ - genProgress       │   │ - syncObsidian      │
└─────────────────────┘   └─────────────────────┘   └─────────────────────┘
```

#### 4.1.3 Service Components

| Service | Responsibility | Interfaces |
|---------|---------------|------------|
| **APIClientService** | HTTP client for external APIs | SemanticScholarClient, CrossRefClient, ArXivClient |
| **PDFProcessorService** | PDF download, extraction, validation | download(), extractText(), validateMagicBytes() |
| **LLMService** | RAG-based summarization, hallucination detection | summarize(), extractData(), validateGrounding() |
| **CitationFormatterService** | CSL-based citation formatting | formatInline(), formatFull(), exportBibTeX() |
| **FAIRValidatorService** | FAIR compliance scoring | scoreFindable(), scoreAccessible(), scoreInteroperable(), scoreReusable() |
| **GRADEScorerService** | Evidence quality scoring | scoreRiskOfBias(), scoreConsistency(), scoreDirectness(), scorePrecision() |
| **ProvenanceLoggerService** | W3C PROV-JSON logging | logEntity(), logActivity(), logAgent(), logRelationship() |
| **ChecksumService** | SHA-256 hash computation and verification | compute(), verify() |

#### 4.1.4 Component Interfaces

```typescript
// Agent Interface (all agents implement)
interface ResearchAgent {
  readonly id: string;
  readonly name: string;
  readonly capabilities: string[];

  execute(context: AgentContext): Promise<AgentResult>;
  validate(input: unknown): ValidationResult;
  rollback(context: AgentContext): Promise<void>;
}

// Discovery Agent Interface
interface DiscoveryAgent extends ResearchAgent {
  searchPapers(query: SearchQuery): Promise<SearchResults>;
  rankResults(results: SearchResults, criteria: RankingCriteria): Promise<RankedResults>;
  buildAcquisitionQueue(selected: Paper[]): Promise<AcquisitionQueue>;
  savePRISMAFlow(protocol: PRISMAProtocol): Promise<void>;
}

// Acquisition Agent Interface
interface AcquisitionAgent extends ResearchAgent {
  downloadPDF(paper: Paper): Promise<PDFFile>;
  extractMetadata(pdf: PDFFile): Promise<Metadata>;
  validateFAIR(metadata: Metadata): Promise<FAIRScore>;
  assignREFID(paper: Paper): Promise<string>;
}

// Documentation Agent Interface
interface DocumentationAgent extends ResearchAgent {
  summarize(source: AcquiredSource, llm: LLMConfig): Promise<Summary>;
  extractStructuredData(source: AcquiredSource): Promise<Extraction>;
  gradeSource(source: AcquiredSource): Promise<GRADEScore>;
  createLiteratureNote(source: AcquiredSource): Promise<LiteratureNote>;
}

// Citation Agent Interface
interface CitationAgent extends ResearchAgent {
  formatCitation(source: AcquiredSource, style: CitationStyle): Promise<Citation>;
  insertClaim(claim: Claim, source: AcquiredSource, document: Document): Promise<void>;
  buildBibliography(sources: AcquiredSource[]): Promise<Bibliography>;
  exportBibTeX(bibliography: Bibliography): Promise<string>;
}

// Provenance Agent Interface
interface ProvenanceAgent extends ResearchAgent {
  logOperation(operation: Operation): Promise<PROVRecord>;
  buildLineageGraph(entity: Entity): Promise<LineageGraph>;
  computeChecksum(file: File): Promise<string>;
  validatePROV(record: PROVRecord): Promise<ValidationResult>;
  exportReproducibilityPackage(): Promise<Package>;
}
```

### 4.2 Process View (Runtime Behavior)

#### 4.2.1 Research Workflow State Machine

```
                        Research Workflow States

    ┌──────────────────────┐
    │       INITIAL        │
    └──────────┬───────────┘
               │ aiwg use research
               ▼
    ┌──────────────────────┐
 ┌─▶│        READY         │◄───────────────────────────┐
 │  └──────────┬───────────┘                            │
 │             │ aiwg research search                   │
 │             ▼                                        │
 │  ┌──────────────────────┐                            │
 │  │     DISCOVERING      │──error───────┐             │
 │  └──────────┬───────────┘              │             │
 │             ▼                          │             │
 │  ┌──────────────────────┐              │             │
 │  │  DISCOVERY COMPLETE  │              │             │
 │  └──────────┬───────────┘              │             │
 │             │ aiwg research acquire    │             │
 │             ▼                          │             │
 │  ┌──────────────────────┐              │             │
 │  │      ACQUIRING       │──error───────┤             │
 │  └──────────┬───────────┘              │             │
 │             ▼                          │             │
 │  ┌──────────────────────┐              │             │
 │  │ ACQUISITION COMPLETE │              │             │
 │  └──────────┬───────────┘              │             │
 │             │ aiwg research summarize  │             │
 │             ▼                          │             │
 │  ┌──────────────────────┐              │             │
 │  │     DOCUMENTING      │──error───────┤             │
 │  └──────────┬───────────┘              │             │
 │             ▼                          │             │
 │  ┌──────────────────────┐              │             │
 │  │     DOC COMPLETE     │              │             │
 │  └──────────┬───────────┘              │             │
 │             │ cite                     │             │
 │             ▼                          │             │
 │  ┌──────────────────────┐              │             │
 │  │        CITING        │──error───────┤             │
 │  └──────────┬───────────┘              │             │
 │             ▼                          │             │
 │  ┌──────────────────────┐              │             │
 └──┤  CITATION COMPLETE   │              │             │
    └──────────────────────┘              │             │
                                          ▼             │
                               ┌──────────────────────┐ │ retry
                               │        ERROR         │─┘
                               └──────────────────────┘

  Legend:
  ────────▶  State transition
  ◄────────  Error recovery
```

#### 4.2.2 Agent Orchestration Pattern

```
                     Agent Orchestration Sequence

  User          Orchestrator        Discovery        Provenance            API
    │                 │                 │                 │                 │
    │ search "OAuth"  │                 │                 │                 │
    │────────────────▶│                 │                 │                 │
    │                 │ spawn           │                 │                 │
    │                 │────────────────▶│                 │                 │
    │                 │                 │ logStart        │                 │
    │                 │                 │────────────────▶│                 │
    │                 │                 │ searchPapers    │                 │
    │                 │                 │──────────────────────────────────▶│
    │                 │                 │ results         │                 │
    │                 │                 │◀──────────────────────────────────│
    │                 │                 │ logEntity       │                 │
    │                 │                 │────────────────▶│                 │
    │                 │ results         │                 │                 │
    │                 │◀────────────────│                 │                 │
    │ display         │                 │                 │                 │
    │◀────────────────│                 │                 │                 │
    │                 │                 │                 │                 │
```

#### 4.2.3 Concurrency Model

| Operation Type | Concurrency Approach | Rationale |
|----------------|---------------------|-----------|
| **API Calls** | Parallel (rate-limited) | Maximize throughput within limits |
| **PDF Downloads** | 5 concurrent connections | Balance speed vs. server load |
| **LLM Summarization** | Sequential | Control token costs, avoid rate limits |
| **Provenance Logging** | Async (non-blocking) | Don't slow primary operations |
| **File I/O** | Sequential with mutex | Prevent corruption |

```typescript
// Concurrency configuration
const ConcurrencyConfig = {
  api: {
    semanticScholar: { maxConcurrent: 5, rateLimit: 100 }, // 100 req/min
    crossRef: { maxConcurrent: 3, rateLimit: 50 },         // 50 req/min
    arXiv: { maxConcurrent: 2, rateLimit: 30 }             // 30 req/min
  },
  download: {
    maxConcurrent: 5,
    retryAttempts: 3,
    exponentialBackoff: [5000, 10000, 20000] // ms
  },
  llm: {
    maxConcurrent: 1, // Sequential to control costs
    timeout: 120000   // 2 minutes per request
  },
  provenance: {
    mode: 'async',
    bufferSize: 100,    // Buffer before flush
    flushInterval: 5000 // 5 seconds
  }
};
```

### 4.3 Data View (Data Model)

#### 4.3.1 Directory Structure

```
.aiwg/research/
├── sources/                      # Acquired research materials
│   ├── pdfs/                     # PDF files
│   │   └── REF-001-oauth-security.pdf
│   ├── metadata/                 # Source metadata (JSON)
│   │   └── REF-001-metadata.json
│   └── checksums.txt             # SHA-256 hashes
│
├── knowledge/                    # Processed knowledge
│   ├── summaries/                # LLM-generated summaries
│   │   └── REF-001-summary.md
│   ├── extractions/              # Structured data extractions
│   │   └── REF-001-extraction.json
│   ├── notes/                    # Literature notes (Zettelkasten)
│   │   ├── REF-001-literature-note.md
│   │   └── permanent/            # Permanent notes (synthesis)
│   │       └── llm-caching-patterns.md
│   ├── maps/                     # Maps of Content
│   │   └── llm-evaluation-methods.md
│   └── claims-index.md           # Claims tracking
│
├── discovery/                    # Discovery artifacts
│   ├── search-results/           # Raw API responses
│   │   └── search-2026-01-25T10-30-00.json
│   ├── acquisition-queue.json    # Papers queued for download
│   └── prisma-protocols/         # PRISMA flow documents
│       └── oauth-security-review.md
│
├── provenance/                   # Provenance tracking
│   ├── prov-2026-01.json         # Monthly PROV-JSON logs
│   ├── lineage-graph.json        # Entity relationships
│   ├── failed-logs/              # Recovery for failed logs
│   └── reproducibility-packages/ # Export packages
│
├── networks/                     # Relationship graphs
│   ├── citation-network.json     # Paper citation relationships
│   └── knowledge-graph.json      # Concept connections
│
├── bibliography.md               # Generated bibliography
├── bibliography.bib              # BibTeX export
│
└── config/                       # Framework configuration
    ├── research-config.json      # Framework settings
    ├── api-credentials.json      # API keys (gitignored)
    └── citation-styles/          # Custom CSL files
```

#### 4.3.2 Entity Relationship Model

```
                          Entity Relationships

  ┌───────────────┐             ┌───────────────┐
  │ SearchQuery   │  1       *  │ SearchResult  │
  │ - query       │────────────▶│ - paperId     │
  │ - filters     │             │ - title       │
  │ - timestamp   │             │ - authors     │
  │ - source      │             │ - relevance   │
  └───────────────┘             └───────┬───────┘
                                        │ selected
                                        ▼
  ┌───────────────┐             ┌───────────────┐
  │ AcquisitionQ  │  *       1  │ AcquiredSource│
  │ - queuedAt    │◀────────────│ - refId       │─────────────────────┐
  │ - priority    │             │ - pdfPath     │                     │
  │ - status      │             │ - metadata    │                     │
  └───────────────┘             │ - fairScore   │                     │
                                │ - checksum    │                     │
                                └───────┬───────┘                     │
                                        │ documents                   │
                                        ▼                             │
                                ┌───────────────┐                     │
                                │ Summary       │                     │
                                │ - executive   │                     │
                                │ - findings    │                     │
                                │ - methodology │                     │
                                │ - gradeScore  │                     │
                                └───────┬───────┘                     │
                                        │ extracts                    │
                                        ▼                             │
                                ┌───────────────┐                     │
                                │ Extraction    │                     │
                                │ - claims[]    │                     │
                                │ - methods[]   │                     │
                                │ - datasets[]  │                     │
                                │ - findings[]  │                     │
                                └───────┬───────┘                     │
                                        │ backs                       │ cites
                                        ▼                             ▼
  ┌───────────────┐             ┌───────────────┐             ┌───────────────┐
  │ LiteratureNote│             │ Claim         │             │ Citation      │
  │ - title       │◀────────────│ - text        │             │ - inline      │
  │ - source      │ references  │ - status      │             │ - full        │
  │ - tags[]      │             │ - sourceRef   │             │ - style       │
  │ - keyPoints[] │             │ - documentLoc │             │ - sourceRef   │
  └───────────────┘             └───────────────┘             └───────────────┘

  ┌───────────────┐             ┌───────────────┐
  │ PROVRecord    │  1       *  │ LineageEdge   │
  │ - entity      │────────────▶│ - source      │
  │ - activity    │             │ - target      │
  │ - agent       │             │ - relationship│
  │ - timestamp   │             │ - timestamp   │
  └───────────────┘             └───────────────┘
```

#### 4.3.3 Data Schemas

**Source Metadata Schema (JSON)**

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["ref_id", "title", "authors", "year", "acquisition_timestamp"],
  "properties": {
    "ref_id": {
      "type": "string",
      "pattern": "^REF-\\d{3}$",
      "description": "Persistent identifier (REF-001 to REF-999)"
    },
    "title": { "type": "string", "minLength": 1 },
    "title_slug": { "type": "string", "pattern": "^[a-z0-9-]+$" },
    "authors": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["name"],
        "properties": {
          "name": { "type": "string" },
          "affiliation": { "type": "string" }
        }
      }
    },
    "year": { "type": "integer", "minimum": 1900, "maximum": 2100 },
    "venue": { "type": "string" },
    "venue_tier": { "type": "string", "enum": ["A*", "A", "B", "C", "preprint"] },
    "doi": { "type": "string", "pattern": "^10\\.\\d+/.+" },
    "abstract": { "type": "string" },
    "license": { "type": "string" },
    "url": { "type": "string", "format": "uri" },
    "pdf_url": { "type": "string", "format": "uri" },
    "citations": { "type": "integer", "minimum": 0 },
    "acquisition_timestamp": { "type": "string", "format": "date-time" },
    "acquisition_source": {
      "type": "string",
      "enum": ["semantic-scholar-api", "crossref-api", "arxiv-api", "manual"]
    },
    "fair_score": {
      "type": "object",
      "properties": {
        "findable": { "type": "integer", "minimum": 0, "maximum": 100 },
        "accessible": { "type": "integer", "minimum": 0, "maximum": 100 },
        "interoperable": { "type": "integer", "minimum": 0, "maximum": 100 },
        "reusable": { "type": "integer", "minimum": 0, "maximum": 100 },
        "overall": { "type": "integer", "minimum": 0, "maximum": 100 }
      }
    },
    "checksum_sha256": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "file_size_bytes": { "type": "integer", "minimum": 0 },
    "provenance": {
      "type": "object",
      "properties": {
        "discovery_query": { "type": "string" },
        "discovery_timestamp": { "type": "string", "format": "date-time" },
        "selected_by": { "type": "string" }
      }
    }
  }
}
```

**Structured Extraction Schema (JSON)**

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["ref_id", "extraction_timestamp", "claims"],
  "properties": {
    "ref_id": { "type": "string", "pattern": "^REF-\\d{3}$" },
    "extraction_timestamp": { "type": "string", "format": "date-time" },
    "llm_model": { "type": "string" },
    "claims": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
    "methods": { "type": "array", "items": { "type": "string" } },
    "datasets": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "size": { "type": "string" },
          "source": { "type": "string" }
        }
      }
    },
    "metrics": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "baseline": { "type": "string" },
          "intervention": { "type": "string" }
        }
      }
    },
    "findings": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "claim": { "type": "string" },
          "statistic": { "type": "string" },
          "confidence_interval": { "type": "string" }
        }
      }
    },
    "related_work": { "type": "array", "items": { "type": "string" } }
  }
}
```

**W3C PROV Record Schema (PROV-JSON subset)**

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["prefix", "entity", "activity", "agent"],
  "properties": {
    "prefix": {
      "type": "object",
      "required": ["prov", "aiwg"],
      "properties": {
        "prov": { "const": "http://www.w3.org/ns/prov#" },
        "aiwg": { "const": "https://aiwg.io/research#" }
      }
    },
    "entity": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["prov:type"],
        "properties": {
          "prov:type": { "const": "prov:Entity" },
          "aiwg:entityType": { "type": "string" },
          "aiwg:filePath": { "type": "string" },
          "aiwg:checksum": { "type": "string" },
          "prov:generatedAtTime": { "type": "string", "format": "date-time" }
        }
      }
    },
    "activity": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["prov:type", "prov:startTime", "prov:endTime"],
        "properties": {
          "prov:type": { "const": "prov:Activity" },
          "aiwg:activityType": { "type": "string" },
          "aiwg:command": { "type": "string" },
          "prov:startTime": { "type": "string", "format": "date-time" },
          "prov:endTime": { "type": "string", "format": "date-time" }
        }
      }
    },
    "agent": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["prov:type"],
        "properties": {
          "prov:type": { "type": "string", "enum": ["prov:Agent", "prov:SoftwareAgent"] },
          "aiwg:agentType": { "type": "string" },
          "aiwg:version": { "type": "string" }
        }
      }
    },
    "wasGeneratedBy": { "type": "object" },
    "used": { "type": "object" },
    "wasAssociatedWith": { "type": "object" },
    "wasAttributedTo": { "type": "object" },
    "wasDerivedFrom": { "type": "object" }
  }
}
```

### 4.4 Deployment View

#### 4.4.1 Installation Model

```
                          Deployment Architecture

Developer Machine
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ Node.js Runtime (v18+)                                            │  │
│  │  ┌─────────────────────────────────────────────────────────────┐  │  │
│  │  │ AIWG CLI                                                    │  │  │
│  │  │ npm install -g aiwg                                         │  │  │
│  │  │                                                             │  │  │
│  │  │  ┌───────────────────────────────────────────────────────┐  │  │  │
│  │  │  │ Research Framework Plugin                             │  │  │  │
│  │  │  │ aiwg use research                                     │  │  │  │
│  │  │  │                                                       │  │  │  │
│  │  │  │ • research-discovery-agent                            │  │  │  │
│  │  │  │ • research-acquisition-agent                          │  │  │  │
│  │  │  │ • research-documentation-agent                        │  │  │  │
│  │  │  │ • research-citation-agent                             │  │  │  │
│  │  │  │ • research-provenance-agent                           │  │  │  │
│  │  │  │ • research-quality-agent                              │  │  │  │
│  │  │  │ • research-reporting-agent                            │  │  │  │
│  │  │  │ • research-integration-agent                          │  │  │  │
│  │  │  └───────────────────────────────────────────────────────┘  │  │  │
│  │  └─────────────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐  │
│  │ Project Repo      │   │ API Credentials   │   │ Optional Tools    │  │
│  │ .aiwg/research/   │   │ (environment)     │   │                   │  │
│  │                   │   │                   │   │                   │  │
│  │ • sources/        │   │ • SEMANTIC_KEY    │   │ • Tesseract OCR   │  │
│  │ • knowledge/      │   │ • OPENAI_KEY      │   │ • pdftotext       │  │
│  │ • provenance/     │   │ • ANTHROPIC_KEY   │   │ • GraphViz        │  │
│  └───────────────────┘   └───────────────────┘   └───────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

External Services (Cloud)
┌─────────────────────────────────────────────────────────────────────────┐
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │  Semantic   │   │  CrossRef   │   │    arXiv    │   │   LLM API   │  │
│  │ Scholar API │   │     API     │   │     API     │   │(Claude/GPT) │  │
│  │             │   │             │   │             │   │             │  │
│  │ Rate: 100/m │   │ Rate: 50/m  │   │ Rate: 30/m  │   │ Token-based │  │
│  └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘

Optional Integrations
┌────────────────────────────