aiwg

Version:

Deployment tool and support utility for AI context. Copies agents, skills, commands, rules, and behaviors into the paths each AI platform reads (Claude Code, Codex, Copilot, Cursor, Warp, OpenClaw, and 6 more) so one source of truth works across 10 platfo

aiwg.io

jmagly/aiwg

492 lines (371 loc) • 15.5 kB

Markdown

# Research Framework **Version:** 1.0.0 **Status:** Beta **Installation:** `aiwg use research` ## Overview The Research Framework provides a complete workflow for academic research management: discovering papers via semantic search, acquiring PDFs with FAIR validation, documenting papers using RAG-grounded LLM summarization, and synthesizing knowledge into literature notes. It automates 75% of the research process while maintaining high quality standards and preventing hallucinations. ## Why Use This Framework | Problem | Solution | Time Savings | |---------|----------|--------------| | Manual paper search | Semantic search + gap detection | 60% faster | | Tracking paper sources | Persistent REF-XXX identifiers | Always organized | | Paywalled papers | Unpaywall integration + manual upload | Streamlined access | | Reading papers | RAG-based summarization | 75% faster (20min → 5min) | | Preventing hallucinations | Claim validation against source | 0% vs 56% hallucination rate | | Quality assessment | Automated GRADE scoring | Consistent standards | | Note organization | Zettelkasten literature notes | Scalable knowledge base | ## Research Workflow Stages The framework implements an 8-stage workflow: ``` 1. Discovery → Find papers via semantic search 2. Acquisition → Download PDFs and extract metadata 3. Documentation → Summarize using RAG and extract structured data 4. Citation → Track citations and generate bibliographies 5. Quality → Assess source quality using GRADE methodology 6. Synthesis → Create literature notes and link related work 7. Gap Analysis → Identify research gaps and contradictions 8. Archival → Version control and corpus management ``` **Current Implementation:** All 8 stages (Discovery, Acquisition, Documentation, Citation, Quality, Archival, Provenance, Workflow) ## Agent Catalog ### Implemented Agents (v1.0.0) | Agent | Purpose | Model | Tools | |-------|---------|-------|-------| | **Discovery Agent** | Search academic databases, rank by relevance, detect gaps | sonnet | Bash, Read, Write, Grep, Glob, WebFetch | | **Acquisition Agent** | Download PDFs, validate FAIR compliance, assign REF-XXX IDs | sonnet | Bash, Read, Write, Glob, Grep, WebFetch | | **Documentation Agent** | RAG summarization, structured extraction, literature notes | sonnet | Bash, Read, Write, Grep, Glob, WebFetch | | **Citation Agent** | Format citations (9000+ styles), build citation networks, manage bibliographies | sonnet | Bash, Read, Write, Grep, Glob, WebFetch | | **Quality Agent** | GRADE assessments, FAIR validation, multi-dimensional quality scoring | sonnet | Bash, Read, Write, Grep, Glob, WebFetch | | **Archival Agent** | OAIS-compliant archival, SIP/AIP/DIP packages, integrity verification | sonnet | Bash, Read, Write, Glob, Grep | | **Provenance Agent** | W3C PROV logging, lineage tracking, reproducibility packages | sonnet | Bash, Read, Write, Glob, Grep | | **Workflow Agent** | DAG-based orchestration, multi-stage pipelines, progress tracking | sonnet | Bash, Read, Write, Glob, Grep, Task | ## Installation ### Quick Start ```bash # Install AIWG CLI npm install -g aiwg # Deploy Research Framework aiwg use research # Verify installation aiwg research --version ``` ### What Gets Installed ``` .claude/agents/ ├── discovery-agent.md # Semantic search and gap detection ├── research-acquisition-agent.md # PDF download and FAIR validation ├── documentation-agent.md # RAG summarization and note creation ├── citation-agent.md # Citation formatting and networks ├── quality-agent.md # GRADE assessments and FAIR validation ├── archival-agent.md # OAIS-compliant archival ├── provenance-agent.md # W3C PROV lineage tracking └── workflow-agent.md # Multi-stage orchestration .aiwg/research/ ├── discovery/ # Search results and strategies ├── sources/ # PDFs and metadata │ ├── pdfs/ │ ├── metadata/ │ └── checksums.txt ├── knowledge/ # Summaries and notes │ ├── summaries/ │ ├── extractions/ │ └── notes/ └── config/ # Agent configurations ├── discovery-agent.yaml ├── research-acquisition-agent.yaml ├── documentation-agent.yaml ├── citation-agent.yaml ├── quality-agent.yaml ├── archival-agent.yaml ├── provenance-agent.yaml └── workflow-agent.yaml ``` ## Usage Examples ### 1. Discover Papers ```bash # Basic search aiwg research search "OAuth2 security best practices" # Advanced search with filters aiwg research search "LLM agent safety" --year 2022-2024 --venue conference --limit 100 # With citation network traversal aiwg research search "retrieval augmented generation" --citation-network # Systematic review (PRISMA-compliant) aiwg research discover --preregister ``` **Output:** - `.aiwg/research/discovery/search-results-{timestamp}.json` - `.aiwg/research/discovery/search-strategy.md` - `.aiwg/research/discovery/gap-report-{timestamp}.md` - `.aiwg/research/discovery/acquisition-queue.json` ### 2. Acquire Papers ```bash # Single paper aiwg research acquire REF-025 # Bulk acquisition from discovery queue aiwg research acquire --from-queue # Manual upload for paywalled paper aiwg research acquire --upload /path/to/paper.pdf --ref REF-027 # Retry failed downloads aiwg research acquire --retry-failed ``` **Output:** - `.aiwg/research/sources/pdfs/REF-XXX-{slug}.pdf` - `.aiwg/research/sources/metadata/REF-XXX-metadata.json` - `.aiwg/research/sources/acquisition-report-{timestamp}.md` - `.aiwg/research/sources/checksums.txt` (updated) ### 3. Document Papers ```bash # Summarize single paper aiwg research summarize REF-025 # Progressive summarization (multi-level) aiwg research summarize REF-025 --progressive # Bulk documentation aiwg research summarize --from-acquired # Create permanent note from literature note aiwg research note-create --permanent --based-on REF-025 # Generate Map of Content (topic overview) aiwg research moc-create "LLM Evaluation Methods" ``` **Output:** - `.aiwg/research/knowledge/summaries/REF-XXX-summary.md` - `.aiwg/research/knowledge/extractions/REF-XXX-extraction.json` - `.aiwg/research/knowledge/notes/REF-XXX-literature-note.md` ## Complete Workflow Example ```bash # Step 1: Discover papers on LLM agent safety aiwg research search "LLM agent safety alignment" --year 2022-2024 --limit 50 # Output: Found 50 papers, identified 2 research gaps, created acquisition queue # Step 2: Acquire papers from queue aiwg research acquire --from-queue # Output: Acquired 45/50 papers (90%), 5 paywalled (flagged for manual upload) # Step 3: Document acquired papers aiwg research summarize --from-acquired # Output: Generated summaries for 45 papers, average time 4min 30sec per paper # Total time saved: ~11 hours vs manual reading # Step 4: Review gap analysis cat .aiwg/research/discovery/gap-report-latest.md # Output: Identifies under-researched topics for future investigation ``` ## Key Features ### Semantic Search with Gap Detection - Natural language queries (no boolean operators required) - Relevance ranking: similarity (40%), citations (30%), venue (20%), recency (10%) - Automatic gap detection via topic clustering - Citation network traversal (forward and backward) - PRISMA-compliant search documentation ### FAIR-Compliant Acquisition - REF-XXX persistent identifiers - SHA-256 checksum integrity verification - FAIR scoring: Findable, Accessible, Interoperable, Reusable (0-100) - Shared corpus deduplication (avoid duplicate downloads) - Paywalled paper workflow with manual upload support ### RAG-Based Documentation - **Zero hallucination target**: All claims validated against source text - Multi-level summarization: 1-sentence, 1-paragraph, 1-page - Structured extraction: Claims, methods, datasets, findings - GRADE quality assessment (High/Moderate/Low/Very Low) - Zettelkasten literature notes with atomic note principles ## Configuration ### Discovery Agent Settings ```yaml # .aiwg/research/config/discovery-agent.yaml discovery_agent: api: primary: semantic-scholar fallback: [arxiv, crossref] timeout_ms: 30000 ranking: relevance_weight: 0.40 citation_weight: 0.30 venue_weight: 0.20 recency_weight: 0.10 gap_detection: sparse_cluster_threshold: 5 contradiction_variance: 0.50 ``` ### Acquisition Agent Settings ```yaml # .aiwg/research/config/research-acquisition-agent.yaml acquisition_agent: download: timeout_seconds: 60 concurrent_downloads: 5 fair_scoring: findable: doi_present: 40 metadata_complete: 50 accessible: persistent_url: 50 clear_license: 50 ``` ### Documentation Agent Settings ```yaml # .aiwg/research/config/documentation-agent.yaml documentation_agent: llm: default_model: claude-opus-4 fallback_model: claude-sonnet-4 temperature: 0.3 hallucination: enabled: true confidence_threshold: 0.9 user_review_required: true ``` ## Architecture ### Data Flow ``` Discovery Agent ↓ (search results + acquisition queue) Acquisition Agent ↓ (PDFs + metadata + checksums) Documentation Agent ↓ (summaries + extractions + literature notes) Knowledge Base ``` ### Storage Structure ``` .aiwg/research/ ├── discovery/ # Stage 1: Discovery │ ├── search-results-*.json │ ├── search-strategy.md │ ├── gap-report-*.md │ └── acquisition-queue.json │ ├── sources/ # Stage 2: Acquisition │ ├── pdfs/ │ │ └── REF-XXX-{slug}.pdf │ ├── metadata/ │ │ └── REF-XXX-metadata.json │ ├── checksums.txt │ └── ref-counter.txt │ ├── knowledge/ # Stage 3: Documentation │ ├── summaries/ │ │ └── REF-XXX-summary.md │ ├── extractions/ │ │ └── REF-XXX-extraction.json │ └── notes/ │ ├── REF-XXX-literature-note.md │ └── permanent-*.md │ └── config/ # Configuration ├── discovery-agent.yaml ├── research-acquisition-agent.yaml └── documentation-agent.yaml ``` ## Quality Standards ### Discovery Quality Targets - Search completion time: <10 seconds for 100 results - Gap analysis time: <30 seconds - API success rate: >95% - Relevance precision: >80% (user-validated) ### Acquisition Quality Targets - Download time per paper: <60 seconds (median) - Success rate: >90% - FAIR score average: >80/100 - Duplicate detection: 100% accuracy ### Documentation Quality Targets - Summarization time: <5 minutes per paper - Hallucination detection rate: >95% recall - Extraction completeness: >90% fields populated - GRADE consistency: >80% agreement with expert assessments ## Research Foundations This framework's design is grounded in academic research: | Practice | Source | Key Finding | |----------|--------|-------------| | Semantic Search | Semantic Scholar API | 200M+ papers indexed, relevance ranking | | Gap Detection | Systematic review methodology | Cluster analysis identifies under-researched topics | | FAIR Principles | GO-FAIR Initiative | Findability, Accessibility, Interoperability, Reusability | | RAG Pattern | Lewis et al. (2020) | Retrieval-Augmented Generation reduces hallucinations | | GRADE Framework | GRADE Working Group | Standardized quality assessment for evidence | | Zettelkasten | Ahrens (2017) | Atomic notes with linking enables knowledge synthesis | ## Integration with SDLC Framework The Research Framework can be used alongside the SDLC Framework: ```bash # During Inception: Discover research on requirements elicitation aiwg research search "requirements elicitation agile" --year 2018-2024 # During Elaboration: Research architectural patterns aiwg research search "microservices patterns" --venue conference # During Construction: Find test automation best practices aiwg research search "test automation best practices" --citation-network # Documentation: Cite acquired research aiwg research cite REF-025 --format bibtex ``` ## Troubleshooting ### Discovery Issues **Problem:** Empty search results ```bash # Check API connectivity aiwg research discovery --health-check # Try alternative query formulation aiwg research search "your query" --refine-from last ``` **Problem:** Rate limit hit ```bash # Configure API key for higher limits export SEMANTIC_SCHOLAR_API_KEY="your-api-key" # Use fallback APIs # Automatic fallback to arXiv and CrossRef ``` ### Acquisition Issues **Problem:** Download failures ```bash # Retry failed downloads aiwg research acquire --retry-failed # Check disk space df -h .aiwg/research/sources/ ``` **Problem:** Paywalled papers ```bash # Manual upload workflow aiwg research acquire --upload /path/to/paper.pdf --ref REF-XXX ``` ### Documentation Issues **Problem:** Hallucination detected ```bash # Review flagged content cat .aiwg/research/knowledge/logs/REF-XXX-hallucination-detected.log # Regenerate with stricter validation aiwg research summarize REF-XXX --strict-validation ``` **Problem:** Poor text extraction ```bash # Enable OCR for scanned PDFs aiwg research summarize REF-XXX --ocr ``` ## Roadmap ### v1.0.0 (Current - Beta) - ✅ Discovery Agent (semantic search, gap detection, citation network traversal) - ✅ Acquisition Agent (PDF download, FAIR validation, REF-XXX assignment) - ✅ Documentation Agent (RAG summarization, literature notes, concept mapping) - ✅ Citation Agent (9000+ styles, citation networks, bibliography management) - ✅ Quality Agent (GRADE assessments, FAIR validation, quality reporting) - ✅ Archival Agent (OAIS compliance, SIP/AIP/DIP packages, integrity verification) - ✅ Provenance Agent (W3C PROV logging, lineage tracking, reproducibility) - ✅ Workflow Agent (DAG orchestration, parallel execution, failure recovery) ### v1.1.0 (Q2 2026) - Enhanced gap detection (contradiction analysis) - Multi-language support (non-English papers) - Corpus health monitoring dashboard ### v1.2.0 (Q3 2026) - Synthesis Agent (permanent notes, knowledge graphs) - ML-based recommendation engine - Advanced deduplication with fuzzy matching ## Contributing Research Framework is part of the AIWG project. See main project documentation for contribution guidelines. ## References - @$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/ - Complete elaboration specifications - @$AIWG_ROOT/agentic/code/frameworks/research-complete/inception/vision-document.md - Framework vision and goals - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/thought-protocol.md - Agent reasoning protocol - [Semantic Scholar API](https://www.semanticscholar.org/product/api) - [FAIR Principles](https://www.go-fair.org/fair-principles/) - [GRADE Framework](https://www.gradeworkinggroup.org/) ## Support - **Documentation**: `@$AIWG_ROOT/agentic/code/frameworks/research-complete/` - **Issues**: https://github.com/jmagly/aiwg/issues - **Discord**: https://discord.gg/BuAusFMxdA - **Telegram**: https://t.me/+oJg9w2lE6A5lOGFh --- **License:** MIT **Maintainers:** AIWG Research Team **Last Updated:** 2026-02-03