# Software Architecture Document (SAD)
## AIWG Research Framework
**Version:** 1.0.0
**Status:** DRAFT
**Last Updated:** 2026-01-25
**Owner:** Architecture Designer
---
## Document Control
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2026-01-25 | Architecture Designer | Initial architecture |
---
## Table of Contents
1. [Introduction](#1-introduction)
2. [Architectural Goals and Constraints](#2-architectural-goals-and-constraints)
3. [System Overview](#3-system-overview)
4. [Architectural Views](#4-architectural-views)
5. [Key Architectural Decisions](#5-key-architectural-decisions)
6. [Cross-Cutting Concerns](#6-cross-cutting-concerns)
7. [Integration Architecture](#7-integration-architecture)
8. [Quality Attributes](#8-quality-attributes)
9. [Appendices](#9-appendices)
---
## 1. Introduction
### 1.1 Purpose
This Software Architecture Document describes the high-level architecture of the AIWG Research Framework, a CLI-based research management system designed to automate discovery, acquisition, documentation, and citation of academic papers within the AIWG ecosystem.
The document provides:
- A comprehensive view of the system architecture
- Component decomposition and responsibilities
- Data flow and integration patterns
- Guidance for implementation decisions
### 1.2 Scope
The Research Framework covers the following functional areas:
| Area | Use Cases | Primary Agent |
|------|-----------|---------------|
| Discovery | UC-RF-001: Paper discovery via Semantic Scholar | Discovery Agent |
| Acquisition | UC-RF-002: PDF download with FAIR validation | Acquisition Agent |
| Documentation | UC-RF-003: LLM summarization with GRADE scoring | Documentation Agent |
| Citation | UC-RF-004: Claims backing and bibliography generation | Citation Agent |
| Provenance | UC-RF-005: W3C PROV compliance tracking | Provenance Agent |
| Quality | UC-RF-006: Quality validation and FAIR assessment | Quality Agent |
| Reporting | UC-RF-007: Research status and progress reporting | Reporting Agent |
| Integration | UC-RF-008-010: External tool integration | Integration Agent |
### 1.3 Definitions, Acronyms, and Terms
| Term | Definition |
|------|------------|
| AIWG | AI Writing Guide - parent framework |
| ADR | Architectural Decision Record |
| CLI | Command Line Interface |
| CSL | Citation Style Language |
| DOI | Digital Object Identifier |
| FAIR | Findable, Accessible, Interoperable, Reusable |
| GRADE | Grading of Recommendations Assessment, Development and Evaluation |
| LLM | Large Language Model |
| MCP | Model Context Protocol |
| MoC | Map of Content (Zettelkasten pattern) |
| NFR | Non-Functional Requirement |
| PROV | W3C Provenance standard |
| RAG | Retrieval-Augmented Generation |
| REF-XXX | Research Framework persistent identifier format |
| SAD | Software Architecture Document |
| SDLC | Software Development Lifecycle |
### 1.4 References
| Document | Location |
|----------|----------|
| Vision Document | @$AIWG_ROOT/agentic/code/frameworks/research-complete/inception/vision-document.md |
| NFR Specifications | @$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/nfr/NFR-RF-specifications.md |
| Use Cases (UC-RF-001-010) | @$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/use-cases/ |
| Risk Assessment | @$AIWG_ROOT/agentic/code/frameworks/research-complete/inception/initial-risk-assessment.md |
| AIWG Extension System | @$AIWG_ROOT/docs/extensions/overview.md |
| W3C PROV Overview | https://www.w3.org/TR/prov-overview/ |
| FAIR Principles | https://www.go-fair.org/fair-principles/ |
---
## 2. Architectural Goals and Constraints
### 2.1 Key Architectural Drivers
The following drivers, drawn from the NFR specification (45 requirements across 9 categories), shape the architecture:
| Priority | Driver | Source NFRs | Impact |
|----------|--------|-------------|--------|
| **Critical** | Provenance Compliance | NFR-RF-P-01 to P-08 | W3C PROV tracking for all operations |
| **Critical** | Quality Assessment | NFR-RF-Q-01 to Q-06 | FAIR/GRADE scoring for all sources |
| **Critical** | LLM Accuracy | NFR-RF-D-01 to D-08 | RAG pattern with hallucination detection |
| **High** | Performance | NFR-RF-A-01 to A-10 | <60s acquisition, <5min documentation |
| **High** | Usability | NFR-RF-U-01 to U-05 | Natural language commands, progress visibility |
| **Medium** | Integration | NFR-RF-I-01 to I-05 | Semantic Scholar, Zotero, Obsidian APIs |
### 2.2 Constraints
#### 2.2.1 Organizational Constraints
| Constraint | Description | Impact |
|------------|-------------|--------|
| C-01: Solo Developer | Single developer for v1.0 | Prioritize simplicity over sophistication |
| C-02: AIWG Integration | Must follow AIWG patterns | Use existing extension system |
| C-03: Open Source | MIT license, public repository | No proprietary dependencies |
#### 2.2.2 Technical Constraints
| Constraint | Description | Impact |
|------------|-------------|--------|
| C-04: CLI-First | Primary interface is command line | No web UI in v1.0 |
| C-05: Node.js Runtime | AIWG uses Node.js | TypeScript implementation |
| C-06: Local Storage | All data in `.aiwg/research/` | File-based, no database server |
| C-07: API Rate Limits | Semantic Scholar: 100 req/min | Implement rate limiting |
| C-08: LLM Token Costs | Claude/OpenAI API costs | Efficient prompt design |
#### 2.2.3 Quality Constraints
| Constraint | Description | Target |
|------------|-------------|--------|
| C-09: Reproducibility | External replication success | >90% |
| C-10: FAIR Compliance | Source quality assessment | >80% high/moderate |
| C-11: Citation Accuracy | Citation format correctness | >95% |
| C-12: Hallucination Detection | LLM output validation | >95% recall |
### 2.3 Architectural Principles
| Principle | Rationale |
|-----------|-----------|
| **P-01: Agent-Based Architecture** | Aligns with AIWG's multi-agent patterns |
| **P-02: File-Based Persistence** | Simple, portable, Git-friendly |
| **P-03: Pipeline Processing** | Clear data flow through stages |
| **P-04: Provenance by Default** | Every operation logged automatically |
| **P-05: Graceful Degradation** | Failures don't block workflow |
| **P-06: Extension Points** | Custom agents, templates, integrations |
---
## 3. System Overview
### 3.1 Context Diagram (C4 Level 1)
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ SYSTEM CONTEXT │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌──────────────────┐ │
│ │ Developer/ │ │ Semantic │ │
│ │ Researcher │ │ Scholar API │ │
│ │ (Primary) │ │ (External) │ │
│ └───────┬───────┘ └────────┬─────────┘ │
│ │ │ │
│ │ CLI Commands Paper Search │ │
│ │ Natural Language Metadata │ │
│ ▼ ▼ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ AIWG Research Framework │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Discovery │→ │ Acquisition │→ │Documentation│→ │ Citation │ │ │
│ │ │ Agent │ │ Agent │ │ Agent │ │ Agent │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ ▲ │ │
│ │ │ │ │
│ │ ┌───────┴───────┐ │ │
│ │ │ Provenance │ │ │
│ │ │ Agent │ │ │
│ │ └───────────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │ │
│ │ Summaries, Citations API Calls │ │
│ │ Knowledge Notes Metadata │ │
│ ▼ ▼ │
│ ┌───────────────┐ ┌──────────────────┐ │
│ │ SDLC Docs │ │ LLM API │ │
│ │ (.aiwg/) │ │ (Claude/OpenAI) │ │
│ └───────────────┘ └──────────────────┘ │
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────────────────┐ │
│ │ CrossRef │ │ arXiv │ │ Optional Integrations │ │
│ │ API │ │ API │ │ (Zotero, Obsidian) │ │
│ └───────────────┘ └───────────────┘ └───────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### 3.2 Container Diagram (C4 Level 2)
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AIWG Research Framework │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ CLI Interface Layer ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ││
│ │ │ aiwg │ │ /research- │ │ Natural │ │ MCP Server │ ││
│ │ │ research │ │ slash │ │ Language │ │ (Protocol) │ ││
│ │ │ commands │ │ commands │ │ Parser │ │ │ ││
│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ ││
│ └─────────┼────────────────┼────────────────┼────────────────┼───────────┘│
│ └────────────────┴────────────────┴────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ Agent Orchestration Layer ││
│ │ ││
│ │ ┌───────────────────────────────────────────────────────────────────┐ ││
│ │ │ Research Orchestrator │ ││
│ │ │ - Workflow coordination │ ││
│ │ │ - Agent lifecycle management │ ││
│ │ │ - State machine (SDLC phases) │ ││
│ │ └───────────────────────────────────────────────────────────────────┘ ││
│ │ │ ││
│ │ ┌─────────┼─────────┬───────────────┬────────────────┬──────────────┐ ││
│ │ ▼ ▼ ▼ ▼ ▼ │ ││
│ │ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │ ││
│ │ │Discov.│ │Acquis.│ │Docum. │ │Citat. │ │Proven.│ │ ││
│ │ │Agent │ │Agent │ │Agent │ │Agent │ │Agent │ │ ││
│ │ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ │ ││
│ │ │ │ │ │ │ │ ││
│ │ ▼ ▼ ▼ ▼ ▼ │ ││
│ │ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │ ││
│ │ │Quality│ │Report │ │Integr.│ │Search │ │Metadata│ │ ││
│ │ │Agent │ │Agent │ │Agent │ │Agent │ │Agent │ │ ││
│ │ └───────┘ └───────┘ └───────┘ └───────┘ └─────────┘ │ ││
│ │ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ Service Layer ││
│ │ ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ││
│ │ │ API Client │ │ PDF │ │ LLM │ │ Citation │ ││
│ │ │ Service │ │ Processor │ │ Service │ │ Formatter │ ││
│ │ │ │ │ │ │ (RAG) │ │ (CSL) │ ││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ ││
│ │ ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ││
│ │ │ FAIR │ │ GRADE │ │ Provenance │ │ Checksum │ ││
│ │ │ Validator │ │ Scorer │ │ Logger │ │ Service │ ││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ Data Access Layer ││
│ │ ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ││
│ │ │ Source │ │ Knowledge │ │ Provenance │ │ Config │ ││
│ │ │ Repository │ │ Repository │ │ Repository │ │ Repository │ ││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ File System (Storage) ││
│ │ ││
│ │ .aiwg/research/ ││
│ │ ├── sources/ # PDFs and metadata ││
│ │ ├── knowledge/ # Summaries, extractions, notes ││
│ │ ├── discovery/ # Search results, acquisition queue ││
│ │ ├── provenance/ # W3C PROV logs, lineage graph ││
│ │ ├── networks/ # Citation network, knowledge graph ││
│ │ └── config/ # Framework configuration ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 4. Architectural Views
### 4.1 Logical View (Component Decomposition)
#### 4.1.1 Agent Components
The Research Framework employs 8 specialized agents following AIWG's agent pattern:
| Agent | Responsibility | Use Case(s) | Primary Tools |
|-------|---------------|-------------|---------------|
| **Discovery Agent** | Search academic databases, rank results | UC-RF-001 | Semantic Scholar API, CrossRef API |
| **Acquisition Agent** | Download PDFs, validate FAIR compliance | UC-RF-002 | HTTP client, PDF validator, FAIR scorer |
| **Documentation Agent** | LLM summarization, GRADE scoring | UC-RF-003 | LLM API (RAG), PDF extractor, GRADE scorer |
| **Citation Agent** | Format citations, build bibliography | UC-RF-004 | CSL processor, claims index, BibTeX exporter |
| **Provenance Agent** | Track operations, W3C PROV logging | UC-RF-005 | PROV-JSON logger, lineage graph |
| **Quality Agent** | FAIR validation, quality assessment | UC-RF-006 | FAIR validator, quality metrics |
| **Reporting Agent** | Status reports, progress tracking | UC-RF-007 | Report generator, metrics aggregator |
| **Integration Agent** | External tool sync (Zotero, Obsidian) | UC-RF-008-010 | Zotero API, Obsidian vault writer |
#### 4.1.2 Agent Component Diagram
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Agent Layer │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Research Orchestrator │ │
│ │ │ │
│ │ - Manages agent lifecycle (spawn, coordinate, terminate) │ │
│ │ - Implements research workflow state machine │ │
│ │ - Handles inter-agent communication │ │
│ │ - Enforces quality gates between stages │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ orchestrates │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ ││
│ │ Stage 1: Discovery Stage 2: Acquisition Stage 3: Doc ││
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────┐ ││
│ │ │ Discovery Agent │ → │ Acquisition │ → │ Documentation │ ││
│ │ │ │ │ Agent │ │ Agent │ ││
│ │ │ - searchPapers │ │ - downloadPDF │ │ - summarize │ ││
│ │ │ - rankResults │ │ - extractMeta │ │ - extractData │ ││
│ │ │ - buildQueue │ │ - validateFAIR │ │ - gradeSource │ ││
│ │ │ - savePRISMA │ │ - assignREFID │ │ - createNote │ ││
│ │ └─────────────────┘ └─────────────────┘ └────────────────┘ ││
│ │ ││
│ │ Stage 4: Citation Cross-Cutting: Provenance ││
│ │ ┌─────────────────┐ ┌─────────────────────────────────────────┐ ││
│ │ │ Citation Agent │ │ Provenance Agent │ ││
│ │ │ │ │ │ ││
│ │ │ - formatCitation│ │ - logOperation (all stages) │ ││
│ │ │ - insertClaim │ │ - buildLineageGraph │ ││
│ │ │ - buildBiblio │ │ - computeChecksums │ ││
│ │ │ - exportBibTeX │ │ - validatePROV │ ││
│ │ └─────────────────┘ └─────────────────────────────────────────┘ ││
│ │ ││
│ │ Supporting Agents ││
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ││
│ │ │ Quality Agent │ │ Reporting │ │ Integration │ ││
│ │ │ │ │ Agent │ │ Agent │ ││
│ │ │ - validateFAIR│ │ - genStatus │ │ - syncZotero │ ││
│ │ │ - auditQuality│ │ - genProgress │ │ - syncObsidian│ ││
│ │ └───────────────┘ └───────────────┘ └───────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
#### 4.1.3 Service Components
| Service | Responsibility | Interfaces |
|---------|---------------|------------|
| **APIClientService** | HTTP client for external APIs | SemanticScholarClient, CrossRefClient, ArXivClient |
| **PDFProcessorService** | PDF download, extraction, validation | download(), extractText(), validateMagicBytes() |
| **LLMService** | RAG-based summarization, hallucination detection | summarize(), extractData(), validateGrounding() |
| **CitationFormatterService** | CSL-based citation formatting | formatInline(), formatFull(), exportBibTeX() |
| **FAIRValidatorService** | FAIR compliance scoring | scoreFindable(), scoreAccessible(), scoreInteroperable(), scoreReusable() |
| **GRADEScorerService** | Evidence quality scoring | scoreRiskOfBias(), scoreConsistency(), scoreDirectness(), scorePrecision() |
| **ProvenanceLoggerService** | W3C PROV-JSON logging | logEntity(), logActivity(), logAgent(), logRelationship() |
| **ChecksumService** | SHA-256 hash computation and verification | compute(), verify() |
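As an illustration of the `compute()` / `verify()` pair listed for ChecksumService, here is a minimal in-memory sketch built on Node.js's built-in `node:crypto` module. The function names `sha256Hex` and `verifySha256` are hypothetical, and a real service would probably stream file contents rather than hash a buffer held in memory.

```typescript
import { createHash } from "node:crypto";

// Returns the lowercase hex SHA-256 digest (64 characters) of the input.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// True when the digest of `data` equals `expected` (compared case-insensitively).
function verifySha256(data: string | Uint8Array, expected: string): boolean {
  return sha256Hex(data) === expected.trim().toLowerCase();
}
```

The output of `sha256Hex` is 64 lowercase hex characters, the same shape that the `checksum_sha256` field in the source metadata schema requires.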
#### 4.1.4 Component Interfaces
```typescript
// Agent Interface (all agents implement)
interface ResearchAgent {
  readonly id: string;
  readonly name: string;
  readonly capabilities: string[];
  execute(context: AgentContext): Promise<AgentResult>;
  validate(input: unknown): ValidationResult;
  rollback(context: AgentContext): Promise<void>;
}

// Discovery Agent Interface
interface DiscoveryAgent extends ResearchAgent {
  searchPapers(query: SearchQuery): Promise<SearchResults>;
  rankResults(results: SearchResults, criteria: RankingCriteria): Promise<RankedResults>;
  buildAcquisitionQueue(selected: Paper[]): Promise<AcquisitionQueue>;
  savePRISMAFlow(protocol: PRISMAProtocol): Promise<void>;
}

// Acquisition Agent Interface
interface AcquisitionAgent extends ResearchAgent {
  downloadPDF(paper: Paper): Promise<PDFFile>;
  extractMetadata(pdf: PDFFile): Promise<Metadata>;
  validateFAIR(metadata: Metadata): Promise<FAIRScore>;
  assignREFID(paper: Paper): Promise<string>;
}

// Documentation Agent Interface
interface DocumentationAgent extends ResearchAgent {
  summarize(source: AcquiredSource, llm: LLMConfig): Promise<Summary>;
  extractStructuredData(source: AcquiredSource): Promise<Extraction>;
  gradeSource(source: AcquiredSource): Promise<GRADEScore>;
  createLiteratureNote(source: AcquiredSource): Promise<LiteratureNote>;
}

// Citation Agent Interface
interface CitationAgent extends ResearchAgent {
  formatCitation(source: AcquiredSource, style: CitationStyle): Promise<Citation>;
  insertClaim(claim: Claim, source: AcquiredSource, document: Document): Promise<void>;
  buildBibliography(sources: AcquiredSource[]): Promise<Bibliography>;
  exportBibTeX(bibliography: Bibliography): Promise<string>;
}

// Provenance Agent Interface
interface ProvenanceAgent extends ResearchAgent {
  logOperation(operation: Operation): Promise<PROVRecord>;
  buildLineageGraph(entity: Entity): Promise<LineageGraph>;
  computeChecksum(file: File): Promise<string>;
  validatePROV(record: PROVRecord): Promise<ValidationResult>;
  exportReproducibilityPackage(): Promise<Package>;
}
```
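The interfaces above leave `AgentContext`, `AgentResult`, and `ValidationResult` undefined. The sketch below fills them in with hypothetical shapes (assumptions, not the framework's actual types) to show how a stub agent might satisfy the `ResearchAgent` contract, validating input before doing any work.

```typescript
// Hypothetical shapes for the types the interfaces above leave undefined.
interface AgentContext { input: unknown; workDir: string }
interface AgentResult { ok: boolean; output?: unknown; error?: string }
interface ValidationResult { valid: boolean; errors: string[] }

interface ResearchAgent {
  readonly id: string;
  readonly name: string;
  readonly capabilities: string[];
  execute(context: AgentContext): Promise<AgentResult>;
  validate(input: unknown): ValidationResult;
  rollback(context: AgentContext): Promise<void>;
}

// Stub agent: checks its input, then echoes the query instead of calling an API.
class StubDiscoveryAgent implements ResearchAgent {
  readonly id = "research-discovery-agent";
  readonly name = "Discovery Agent";
  readonly capabilities = ["searchPapers"];

  validate(input: unknown): ValidationResult {
    const errors: string[] = [];
    if (typeof input !== "string" || input.trim() === "") {
      errors.push("query must be a non-empty string");
    }
    return { valid: errors.length === 0, errors };
  }

  async execute(context: AgentContext): Promise<AgentResult> {
    const check = this.validate(context.input);
    if (!check.valid) return { ok: false, error: check.errors.join("; ") };
    // A real agent would call the search API here.
    return { ok: true, output: { query: context.input, results: [] } };
  }

  async rollback(_context: AgentContext): Promise<void> {
    // Nothing to undo: the stub writes no files.
  }
}
```

Keeping `validate()` synchronous, as the interface specifies, lets an orchestrator reject bad input before it spawns any asynchronous work.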
### 4.2 Process View (Runtime Behavior)
#### 4.2.1 Research Workflow State Machine
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Research Workflow States │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────┐ │
│ │ INITIAL │ │
│ └─────┬─────┘ │
│ │ aiwg use research │
│ ▼ │
│ ┌───────────┐ │
│ │ READY │◄──────────────────────────────────────────────────┐ │
│ └─────┬─────┘ │ │
│ │ aiwg research search │ │
│ ▼ │ │
│ ┌───────────┐ ┌───────────┐ │ │
│ │DISCOVERING│──▶│ DISCOVERY │ │ │
│ │ │ │ COMPLETE │ │ │
│ └─────┬─────┘ └─────┬─────┘ │ │
│ │ │ aiwg research acquire │ │
│ ▼ error ▼ │ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ ERROR │ │ ACQUIRING │──▶│ACQUISITION│ │ │
│ │ │ │ │ │ COMPLETE │ │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │ │
│ │ │ │ aiwg research summarize │ │
│ ▼ retry ▼ error ▼ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ └────────▶│ READY │ │DOCUMENTING│──▶│ DOC │ │ │
│ │ │ │ │ │ COMPLETE │ │ │
│ └───────────┘ └─────┬─────┘ └─────┬─────┘ │ │
│ │ │ cite │ │
│ ▼ error ▼ │ │
│ ┌───────────┐ ┌───────────┐ │ │
│ │ ERROR │ │ CITING │──▶│ │
│ └───────────┘ │ │ │ │
│ └─────┬─────┘ │ │
│ │ │ │
│ ▼ │ │
│ ┌───────────┐ │ │
│ │ CITATION │───┘ │
│ │ COMPLETE │ │
│ └───────────┘ │
│ │
│ Legend: │
│ ────────▶ State transition │
│ ◄──────── Error recovery │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
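One way to encode the workflow above is a transition table. This is a sketch only: the event names (`use`, `search`, `done`, and so on) are hypothetical labels for the commands and outcomes in the diagram, the error edges are modeled as read from the diagram, and the trigger for returning from CITATION_COMPLETE to READY is an assumption because the diagram leaves it unlabeled.

```typescript
type WorkflowState =
  | "INITIAL" | "READY" | "DISCOVERING" | "DISCOVERY_COMPLETE"
  | "ACQUIRING" | "ACQUISITION_COMPLETE" | "DOCUMENTING" | "DOC_COMPLETE"
  | "CITING" | "CITATION_COMPLETE" | "ERROR";

// Hypothetical event labels; the diagram names commands, not events.
type WorkflowEvent =
  | "use" | "search" | "acquire" | "summarize" | "cite"
  | "done" | "error" | "retry";

const transitions: Record<WorkflowState, Partial<Record<WorkflowEvent, WorkflowState>>> = {
  INITIAL: { use: "READY" },                          // aiwg use research
  READY: { search: "DISCOVERING" },                   // aiwg research search
  DISCOVERING: { done: "DISCOVERY_COMPLETE", error: "ERROR" },
  DISCOVERY_COMPLETE: { acquire: "ACQUIRING" },       // aiwg research acquire
  ACQUIRING: { done: "ACQUISITION_COMPLETE", error: "ERROR" },
  ACQUISITION_COMPLETE: { summarize: "DOCUMENTING" }, // aiwg research summarize
  DOCUMENTING: { done: "DOC_COMPLETE", error: "ERROR" },
  DOC_COMPLETE: { cite: "CITING" },
  CITING: { done: "CITATION_COMPLETE" },
  CITATION_COMPLETE: { done: "READY" },               // assumed trigger
  ERROR: { retry: "READY" },
};

// Returns the next state, or throws if the event is not allowed in this state.
function nextState(state: WorkflowState, event: WorkflowEvent): WorkflowState {
  const target = transitions[state][event];
  if (target === undefined) {
    throw new Error(`Invalid transition: ${state} on ${event}`);
  }
  return target;
}
```

Because the table is keyed by every member of `WorkflowState`, the compiler flags any state that is added to the union without a row of transitions.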
#### 4.2.2 Agent Orchestration Pattern
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Agent Orchestration Sequence │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ User Orchestrator Discovery Provenance API │
│ │ │ │ │ │ │
│ │ search "OAuth" │ │ │ │ │
│ │───────────────▶│ │ │ │ │
│ │ │ │ │ │ │
│ │ │ spawn │ │ │ │
│ │ │───────────────▶│ │ │ │
│ │ │ │ │ │ │
│ │ │ │ logStart │ │ │
│ │ │ │─────────────▶│ │ │
│ │ │ │ │ │ │
│ │ │ │ searchPapers │ │ │
│ │ │ │─────────────────────────▶│ │
│ │ │ │ │ │ │
│ │ │ │◀─────────────────────────│ │
│ │ │ │ │ results │ │
│ │ │ │ │ │ │
│ │ │ │ logEntity │ │ │
│ │ │ │─────────────▶│ │ │
│ │ │ │ │ │ │
│ │ │◀───────────────│ │ │ │
│ │ │ results │ │ │ │
│ │ │ │ │ │ │
│ │◀───────────────│ │ │ │ │
│ │ display │ │ │ │ │
│ │ │ │ │ │ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
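The call order in the sequence above can be sketched as a single function. The interfaces `SearchApi` and `ProvenanceLog` and the function `runDiscovery` are hypothetical stand-ins for the Discovery Agent's collaborators, reduced to what the diagram shows.

```typescript
// Hypothetical collaborators; the real agents and services are richer.
interface PaperHit { paperId: string; title: string }
interface SearchApi { searchPapers(query: string): Promise<PaperHit[]> }
interface ProvenanceLog {
  logStart(activity: string): void;
  logEntity(kind: string, count: number): void;
}

// Mirrors the diagram: logStart, searchPapers, logEntity, then return results.
async function runDiscovery(
  query: string,
  api: SearchApi,
  prov: ProvenanceLog,
): Promise<PaperHit[]> {
  prov.logStart("discovery");                        // record the activity first
  const results = await api.searchPapers(query);     // external API call
  prov.logEntity("search-results", results.length);  // record what was produced
  return results;                                    // orchestrator displays these
}
```

Passing the API client and provenance log as parameters keeps the function testable with in-memory fakes and keeps the logging calls explicit at each step.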
#### 4.2.3 Concurrency Model
| Operation Type | Concurrency Approach | Rationale |
|----------------|---------------------|-----------|
| **API Calls** | Parallel (rate-limited) | Maximize throughput within limits |
| **PDF Downloads** | 5 concurrent connections | Balance speed vs. server load |
| **LLM Summarization** | Sequential | Control token costs, avoid rate limits |
| **Provenance Logging** | Async (non-blocking) | Don't slow primary operations |
| **File I/O** | Sequential with mutex | Prevent corruption |
```typescript
// Concurrency configuration
const ConcurrencyConfig = {
  api: {
    semanticScholar: { maxConcurrent: 5, rateLimit: 100 }, // 100 req/min
    crossRef: { maxConcurrent: 3, rateLimit: 50 },
    arXiv: { maxConcurrent: 2, rateLimit: 30 }
  },
  download: {
    maxConcurrent: 5,
    retryAttempts: 3,
    exponentialBackoff: [5000, 10000, 20000] // ms
  },
  llm: {
    maxConcurrent: 1, // Sequential to control costs
    timeout: 120000 // 2 minutes per request
  },
  provenance: {
    mode: 'async',
    bufferSize: 100, // Buffer before flush
    flushInterval: 5000 // 5 seconds
  }
};
```
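To show how a `maxConcurrent` value like those above might be enforced, here is a minimal worker-pool sketch. The function `runWithLimit` is hypothetical and not part of AIWG. It omits retries and backoff, and when a task rejects, the returned promise rejects while tasks already in flight are left to finish.

```typescript
// Runs tasks with at most `maxConcurrent` in flight; results keep input order.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrent: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let nextIndex = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (nextIndex < tasks.length) {
      const index = nextIndex++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(maxConcurrent, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `maxConcurrent` set to 5 this matches the PDF download setting, and with 1 it degenerates to the sequential behavior specified for LLM summarization.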
### 4.3 Data View (Data Model)
#### 4.3.1 Directory Structure
```
.aiwg/research/
├── sources/ # Acquired research materials
│ ├── pdfs/ # PDF files
│ │ └── REF-001-oauth-security.pdf
│ ├── metadata/ # Source metadata (JSON)
│ │ └── REF-001-metadata.json
│ └── checksums.txt # SHA-256 hashes
│
├── knowledge/ # Processed knowledge
│ ├── summaries/ # LLM-generated summaries
│ │ └── REF-001-summary.md
│ ├── extractions/ # Structured data extractions
│ │ └── REF-001-extraction.json
│ ├── notes/ # Literature notes (Zettelkasten)
│ │ ├── REF-001-literature-note.md
│ │ └── permanent/ # Permanent notes (synthesis)
│ │ └── llm-caching-patterns.md
│ ├── maps/ # Maps of Content
│ │ └── llm-evaluation-methods.md
│ └── claims-index.md # Claims tracking
│
├── discovery/ # Discovery artifacts
│ ├── search-results/ # Raw API responses
│ │ └── search-2026-01-25T10-30-00.json
│ ├── acquisition-queue.json # Papers queued for download
│ └── prisma-protocols/ # PRISMA flow documents
│ └── oauth-security-review.md
│
├── provenance/ # Provenance tracking
│ ├── prov-2026-01.json # Monthly PROV-JSON logs
│ ├── lineage-graph.json # Entity relationships
│ ├── failed-logs/ # Recovery for failed logs
│ └── reproducibility-packages/ # Export packages
│
├── networks/ # Relationship graphs
│ ├── citation-network.json # Paper citation relationships
│ └── knowledge-graph.json # Concept connections
│
├── bibliography.md # Generated bibliography
├── bibliography.bib # BibTeX export
│
└── config/ # Framework configuration
├── research-config.json # Framework settings
├── api-credentials.json # API keys (gitignored)
└── citation-styles/ # Custom CSL files
```
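The naming pattern in the tree above can be captured in a small helper. This is a sketch: `sourcePaths` and `SourcePaths` are hypothetical names, and the two regular expressions are the `ref_id` and `title_slug` patterns from the source metadata schema in section 4.3.3.

```typescript
// Hypothetical helper; file-name patterns follow the example tree above.
const RESEARCH_ROOT = ".aiwg/research";

interface SourcePaths {
  pdf: string;
  metadata: string;
  summary: string;
  extraction: string;
  literatureNote: string;
}

// Derives the per-source artifact paths from a REF id and a title slug.
function sourcePaths(refId: string, titleSlug: string): SourcePaths {
  if (!/^REF-\d{3}$/.test(refId)) throw new Error(`Invalid REF id: ${refId}`);
  if (!/^[a-z0-9-]+$/.test(titleSlug)) throw new Error(`Invalid title slug: ${titleSlug}`);
  return {
    pdf: `${RESEARCH_ROOT}/sources/pdfs/${refId}-${titleSlug}.pdf`,
    metadata: `${RESEARCH_ROOT}/sources/metadata/${refId}-metadata.json`,
    summary: `${RESEARCH_ROOT}/knowledge/summaries/${refId}-summary.md`,
    extraction: `${RESEARCH_ROOT}/knowledge/extractions/${refId}-extraction.json`,
    literatureNote: `${RESEARCH_ROOT}/knowledge/notes/${refId}-literature-note.md`,
  };
}
```

For example, `sourcePaths("REF-001", "oauth-security").pdf` yields `.aiwg/research/sources/pdfs/REF-001-oauth-security.pdf`, matching the sample entry in the tree.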
#### 4.3.2 Entity Relationship Model
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Entity Relationships │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ SearchQuery │ │ SearchResult │ │
│ │ │ 1 * │ │ │
│ │ - query │──────────▶│ - paperId │ │
│ │ - filters │ │ - title │ │
│ │ - timestamp │ │ - authors │ │
│ │ - source │ │ - relevance │ │
│ └───────────────┘ └───────┬───────┘ │
│ │ selected │
│ ▼ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ AcquisitionQ │ * 1 │ AcquiredSource│ │
│ │ │◀─────────│ │ │
│ │ - queuedAt │ │ - refId │─────────┐ │
│ │ - priority │ │ - pdfPath │ │ │
│ │ - status │ │ - metadata │ │ │
│ └───────────────┘ │ - fairScore │ │ │
│ │ - checksum │ │ │
│ └───────┬───────┘ │ │
│ │ documents │ │
│ ▼ │ │
│ ┌───────────────┐ │ │
│ │ Summary │ │ │
│ │ │ │ │
│ │ - executive │ │ │
│ │ - findings │ │ │
│ │ - methodology │ │ │
│ │ - gradeScore │ │ │
│ └───────┬───────┘ │ │
│ │ extracts │ │
│ ▼ │ │
│ ┌───────────────┐ │ │
│ │ Extraction │ │ │
│ │ │ │ │
│ │ - claims[] │ │ │
│ │ - methods[] │ │ │
│ │ - datasets[] │ │ │
│ │ - findings[] │ │ │
│ └───────┬───────┘ │ │
│ │ backs │ cites │
│ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ LiteratureNote│◀──────────│ Claim │ │ Citation │ │
│ │ │ references│ │ │ │ │
│ │ - title │ │ - text │ │ - inline │ │
│ │ - source │ │ - status │ │ - full │ │
│ │ - tags[] │ │ - sourceRef │ │ - style │ │
│ │ - keyPoints[] │ │ - documentLoc │ │ - sourceRef │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ PROVRecord │ │ LineageEdge │ │
│ │ │ 1 * │ │ │
│ │ - entity │──────────▶│ - source │ │
│ │ - activity │ │ - target │ │
│ │ - agent │ │ - relationship│ │
│ │ - timestamp │ │ - timestamp │ │
│ └───────────────┘ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
#### 4.3.3 Data Schemas
**Source Metadata Schema (JSON)**
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["ref_id", "title", "authors", "year", "acquisition_timestamp"],
  "properties": {
    "ref_id": {
      "type": "string",
      "pattern": "^REF-\\d{3}$",
      "description": "Persistent identifier (REF-001 to REF-999)"
    },
    "title": { "type": "string", "minLength": 1 },
    "title_slug": { "type": "string", "pattern": "^[a-z0-9-]+$" },
    "authors": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["name"],
        "properties": {
          "name": { "type": "string" },
          "affiliation": { "type": "string" }
        }
      }
    },
    "year": { "type": "integer", "minimum": 1900, "maximum": 2100 },
    "venue": { "type": "string" },
    "venue_tier": { "type": "string", "enum": ["A*", "A", "B", "C", "preprint"] },
    "doi": { "type": "string", "pattern": "^10\\.\\d+/.+" },
    "abstract": { "type": "string" },
    "license": { "type": "string" },
    "url": { "type": "string", "format": "uri" },
    "pdf_url": { "type": "string", "format": "uri" },
    "citations": { "type": "integer", "minimum": 0 },
    "acquisition_timestamp": { "type": "string", "format": "date-time" },
    "acquisition_source": {
      "type": "string",
      "enum": ["semantic-scholar-api", "crossref-api", "arxiv-api", "manual"]
    },
    "fair_score": {
      "type": "object",
      "properties": {
        "findable": { "type": "integer", "minimum": 0, "maximum": 100 },
        "accessible": { "type": "integer", "minimum": 0, "maximum": 100 },
        "interoperable": { "type": "integer", "minimum": 0, "maximum": 100 },
        "reusable": { "type": "integer", "minimum": 0, "maximum": 100 },
        "overall": { "type": "integer", "minimum": 0, "maximum": 100 }
      }
    },
    "checksum_sha256": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "file_size_bytes": { "type": "integer", "minimum": 0 },
    "provenance": {
      "type": "object",
      "properties": {
        "discovery_query": { "type": "string" },
        "discovery_timestamp": { "type": "string", "format": "date-time" },
        "selected_by": { "type": "string" }
      }
    }
  }
}
```
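The sketch below hand-checks a few of the constraints in the source metadata schema: the required fields, the string patterns, and the `year` range. It is an illustration under stated assumptions, not a JSON Schema validator; `checkSourceMetadata` and `MetadataIssue` are hypothetical names, and a real implementation would more likely run the schema through a validation library.

```typescript
interface MetadataIssue { field: string; message: string }

// Copied from the schema's "required" list and "pattern" keywords.
const REQUIRED_FIELDS = ["ref_id", "title", "authors", "year", "acquisition_timestamp"];
const FIELD_PATTERNS: Record<string, RegExp> = {
  ref_id: /^REF-\d{3}$/,
  title_slug: /^[a-z0-9-]+$/,
  doi: /^10\.\d+\/.+/,
  checksum_sha256: /^[a-f0-9]{64}$/,
};

// Returns a list of issues; an empty list means these partial checks passed.
function checkSourceMetadata(record: Record<string, unknown>): MetadataIssue[] {
  const issues: MetadataIssue[] = [];
  for (const field of REQUIRED_FIELDS) {
    if (record[field] === undefined) {
      issues.push({ field, message: "required field is missing" });
    }
  }
  for (const [field, pattern] of Object.entries(FIELD_PATTERNS)) {
    const value = record[field];
    if (value === undefined) continue; // optional fields may be absent
    if (typeof value !== "string" || !pattern.test(value)) {
      issues.push({ field, message: `does not match ${pattern}` });
    }
  }
  const year = record["year"];
  if (year !== undefined &&
      (typeof year !== "number" || !Number.isInteger(year) || year < 1900 || year > 2100)) {
    issues.push({ field: "year", message: "must be an integer between 1900 and 2100" });
  }
  return issues;
}
```

Returning a list of issues instead of throwing on the first one lets a caller report every problem in a metadata file at once.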
**Structured Extraction Schema (JSON)**
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["ref_id", "extraction_timestamp", "claims"],
  "properties": {
    "ref_id": { "type": "string", "pattern": "^REF-\\d{3}$" },
    "extraction_timestamp": { "type": "string", "format": "date-time" },
    "llm_model": { "type": "string" },
    "claims": {
      "type": "array",
      "items": { "type": "string" },
      "minItems": 1
    },
    "methods": {
      "type": "array",
      "items": { "type": "string" }
    },
    "datasets": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "size": { "type": "string" },
          "source": { "type": "string" }
        }
      }
    },
    "metrics": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "baseline": { "type": "string" },
          "intervention": { "type": "string" }
        }
      }
    },
    "findings": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "claim": { "type": "string" },
          "statistic": { "type": "string" },
          "confidence_interval": { "type": "string" }
        }
      }
    },
    "related_work": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}
```
**W3C PROV Record Schema (PROV-JSON subset)**
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["prefix", "entity", "activity", "agent"],
  "properties": {
    "prefix": {
      "type": "object",
      "required": ["prov", "aiwg"],
      "properties": {
        "prov": { "const": "http://www.w3.org/ns/prov#" },
        "aiwg": { "const": "https://aiwg.io/research#" }
      }
    },
    "entity": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["prov:type"],
        "properties": {
          "prov:type": { "const": "prov:Entity" },
          "aiwg:entityType": { "type": "string" },
          "aiwg:filePath": { "type": "string" },
          "aiwg:checksum": { "type": "string" },
          "prov:generatedAtTime": { "type": "string", "format": "date-time" }
        }
      }
    },
    "activity": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["prov:type", "prov:startTime", "prov:endTime"],
        "properties": {
          "prov:type": { "const": "prov:Activity" },
          "aiwg:activityType": { "type": "string" },
          "aiwg:command": { "type": "string" },
          "prov:startTime": { "type": "string", "format": "date-time" },
          "prov:endTime": { "type": "string", "format": "date-time" }
        }
      }
    },
    "agent": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["prov:type"],
        "properties": {
          "prov:type": {
            "type": "string",
            "enum": ["prov:Agent", "prov:SoftwareAgent"]
          },
          "aiwg:agentType": { "type": "string" },
          "aiwg:version": { "type": "string" }
        }
      }
    },
    "wasGeneratedBy": { "type": "object" },
    "used": { "type": "object" },
    "wasAssociatedWith": { "type": "object" },
    "wasAttributedTo": { "type": "object" },
    "wasDerivedFrom": { "type": "object" }
  }
}
```
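For illustration, the function below builds one record that satisfies the schema's required keys for a single PDF acquisition. Everything specific in it is an assumption: the identifier scheme (`aiwg:REF-001`, `aiwg:acquire-REF-001`), the `aiwg:entityType` and `aiwg:activityType` values, and the names `buildAcquisitionRecord` and `ProvJsonRecord` are all hypothetical. The relation entries use the `prov:entity`, `prov:activity`, and `prov:agent` keys from the PROV-JSON serialization.

```typescript
type ProvAttributes = Record<string, string>;

interface ProvJsonRecord {
  prefix: { prov: string; aiwg: string };
  entity: Record<string, ProvAttributes>;
  activity: Record<string, ProvAttributes>;
  agent: Record<string, ProvAttributes>;
  wasGeneratedBy: Record<string, ProvAttributes>;
  wasAssociatedWith: Record<string, ProvAttributes>;
}

// Builds a record describing "the acquisition agent produced this PDF".
function buildAcquisitionRecord(
  refId: string,
  filePath: string,
  checksum: string,
  startTime: string, // ISO 8601 date-time
  endTime: string,   // ISO 8601 date-time
): ProvJsonRecord {
  const entityId = `aiwg:${refId}`;                  // hypothetical id scheme
  const activityId = `aiwg:acquire-${refId}`;        // hypothetical id scheme
  const agentId = "aiwg:research-acquisition-agent"; // hypothetical id scheme
  return {
    prefix: { prov: "http://www.w3.org/ns/prov#", aiwg: "https://aiwg.io/research#" },
    entity: {
      [entityId]: {
        "prov:type": "prov:Entity",
        "aiwg:entityType": "pdf",
        "aiwg:filePath": filePath,
        "aiwg:checksum": checksum,
        "prov:generatedAtTime": endTime,
      },
    },
    activity: {
      [activityId]: {
        "prov:type": "prov:Activity",
        "aiwg:activityType": "acquisition",
        "prov:startTime": startTime,
        "prov:endTime": endTime,
      },
    },
    agent: {
      [agentId]: { "prov:type": "prov:SoftwareAgent", "aiwg:agentType": "acquisition" },
    },
    wasGeneratedBy: {
      "_:wgb1": { "prov:entity": entityId, "prov:activity": activityId },
    },
    wasAssociatedWith: {
      "_:waw1": { "prov:activity": activityId, "prov:agent": agentId },
    },
  };
}
```

Serializing the result with `JSON.stringify(record, null, 2)` gives a document that could be appended to a monthly log such as `prov-2026-01.json`, although how entries are merged into that file is not specified here.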
### 4.4 Deployment View
#### 4.4.1 Installation Model
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Deployment Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Developer Machine │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Node.js Runtime (v18+) │ │ │
│ │ │ ┌───────────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ AIWG CLI │ │ │ │
│ │ │ │ npm install -g aiwg │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ ┌─────────────────────────────────────────────────────┐ │ │ │ │
│ │ │ │ │ Research Framework Plugin │ │ │ │ │
│ │ │ │ │ aiwg use research │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ • research-discovery-agent │ │ │ │ │
│ │ │ │ │ • research-acquisition-agent │ │ │ │ │
│ │ │ │ │ • research-documentation-agent │ │ │ │ │
│ │ │ │ │ • research-citation-agent │ │ │ │ │
│ │ │ │ │ • research-provenance-agent │ │ │ │ │
│ │ │ │ │ • research-quality-agent │ │ │ │ │
│ │ │ │ │ • research-reporting-agent │ │ │ │ │
│ │ │ │ │ • research-integration-agent │ │ │ │ │
│ │ │ │ └─────────────────────────────────────────────────────┘ │ │ │ │
│ │ │ └───────────────────────────────────────────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ Project Repo │ │ API Credentials │ │ Optional Tools │ │ │
│ │ │ .aiwg/research/ │ │ (environment) │ │ │ │ │
│ │ │ │ │ │ │ • Tesseract OCR │ │ │
│ │ │ • sources/ │ │ • SEMANTIC_KEY │ │ • pdftotext │ │ │
│ │ │ • knowledge/ │ │ • OPENAI_KEY │ │ • GraphViz │ │ │
│ │ │ • provenance/ │ │ • ANTHROPIC_KEY │ │ │ │ │
│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ External Services (Cloud) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Semantic │ │ CrossRef │ │ arXiv │ │ LLM API │ │ │
│ │ │ Scholar API │ │ API │ │ API │ │(Claude/GPT) │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ Rate: 100/m │ │ Rate: 50/m │ │ Rate: 30/m │ │ Token-based │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Optional Integrations │
│ ┌────────────────────────────