# Agent Specification: Documentation Agent
## 1. Agent Overview
| Attribute | Value |
|-----------|-------|
| **Name** | Documentation Agent |
| **ID** | research-documentation-agent |
| **Purpose** | Summarize papers using an LLM with a RAG pattern, extract structured data, grade source quality, and create Zettelkasten-style literature notes |
| **Lifecycle Stage** | Documentation (Stage 3 of Research Framework) |
| **Model** | opus (for summarization quality) or sonnet (for efficiency) |
| **Version** | 1.0.0 |
| **Status** | Draft |
### Description
The Documentation Agent transforms raw PDFs into actionable knowledge. It extracts text from PDFs, generates summaries using RAG (Retrieval-Augmented Generation) to reduce hallucinations, extracts structured data (claims, methods, findings), calculates GRADE-inspired quality scores, and creates Zettelkasten literature notes with proper attribution. The target is a 75% time saving over manual documentation (5 minutes vs. 20 minutes per paper).
## 2. Capabilities
### Primary Capabilities
| Capability | Description | NFR Reference |
|------------|-------------|---------------|
| PDF Text Extraction | Extract readable text preserving structure | NFR-RF-D-03 |
| RAG Summarization | Generate summaries grounded in source text | NFR-RF-D-01 |
| Hallucination Detection | Validate claims against source content | NFR-RF-D-02 |
| Structured Extraction | Extract claims, methods, datasets, findings | NFR-RF-D-03 |
| GRADE Scoring | Assess evidence quality (risk of bias, consistency, etc.) | NFR-RF-D-05 |
| Literature Notes | Create atomic, tagged, linked notes (Zettelkasten) | NFR-RF-D-06 |
### Secondary Capabilities
| Capability | Description |
|------------|-------------|
| Progressive Summarization | Generate multi-level summaries (1-page, 1-paragraph, 1-sentence) |
| OCR Fallback | Extract text from scanned/image-based PDFs |
| Bulk Processing | Document multiple papers sequentially |
| Map of Content | Generate topic-based indexes across notes |
## 3. Tools
### Required Tools
| Tool | Purpose | Permission |
|------|---------|------------|
| Bash | Execute PDF tools, manage files | Execute |
| Read | Access PDFs, metadata, existing notes | Read |
| Write | Save summaries, extractions, notes | Write |
| Glob | Find related notes for linking | Read |
| Grep | Search for hallucination validation | Read |
### System Tools
| Tool | Purpose | Required |
|------|---------|----------|
| `pdftotext` | PDF text extraction (poppler-utils) | Yes |
| `tesseract` | OCR for scanned PDFs | Optional |
| `PyPDF2` | Python PDF manipulation | Optional |
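The extraction-then-OCR decision implied by these tools can be sketched as follows. This is a minimal illustration, not the agent's actual implementation: the function names are hypothetical, the threshold mirrors `min_text_length` from the agent configuration, and treating that threshold as a character count is an assumption (the spec does not state the unit). The OCR step is passed in as a callable because `tesseract` works on images, so a real pipeline would need to rasterize pages first.

```python
import subprocess
from typing import Callable, Optional, Tuple

MIN_TEXT_LENGTH = 100  # mirrors pdf.min_text_length; unit assumed to be characters


def run_pdftotext(pdf_path: str) -> str:
    """Run poppler's pdftotext and return the text written to stdout ("-")."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def needs_ocr(text: str, min_text_length: int = MIN_TEXT_LENGTH) -> bool:
    """Very short output suggests a scanned or image-only PDF."""
    return len(text.strip()) < min_text_length


def extract_text(
    pdf_path: str,
    extractor: Callable[[str], str] = run_pdftotext,
    ocr: Optional[Callable[[str], str]] = None,
    min_text_length: int = MIN_TEXT_LENGTH,
) -> Tuple[str, str]:
    """Return (text, method); fall back to OCR only when one is provided."""
    text = extractor(pdf_path)
    if needs_ocr(text, min_text_length) and ocr is not None:
        return ocr(pdf_path), "ocr"
    return text, "pdftotext"
```

Keeping the extractor and OCR steps injectable lets the fallback logic be unit-tested without poppler or tesseract installed.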
### LLM Integration
| Model | Purpose | Settings |
|-------|---------|----------|
| Claude (opus/sonnet) | Summarization, extraction | Temperature: 0.3 |
| Local LLM (optional) | Fallback for privacy/cost | Temperature: 0.3 |
## 4. Triggers
### Automatic Triggers
| Trigger | Condition | Action |
|---------|-----------|--------|
| Acquisition Complete | Paper acquired (UC-RF-002) | Document paper |
| Workflow Stage | UC-RF-008 initiates Stage 3 | Process workflow papers |
### Manual Triggers
| Trigger | Command | Description |
|---------|---------|-------------|
| Single Paper | `aiwg research summarize REF-XXX` | Document one paper |
| Bulk Processing | `aiwg research summarize --from-acquired` | Document all acquired |
| Progressive Mode | `aiwg research summarize REF-XXX --progressive` | Multi-level summaries |
| Create Note | `aiwg research note-create --permanent --based-on REF-XXX` | Permanent note |
| Create MoC | `aiwg research moc-create "Topic Name"` | Map of Content |
## 5. Inputs/Outputs
### Inputs
| Input | Format | Source | Validation |
|-------|--------|--------|------------|
| REF-XXX Identifier | String | Command argument | Valid REF-XXX exists |
| LLM Model Selection | Enum | Optional flag `--llm` | Valid model name |
| Progressive Levels | Integer (1-3) | Optional flag | 1=page, 2=para, 3=sentence |
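Input validation for these arguments might look like the sketch below. The identifier pattern is an assumption inferred from examples such as `REF-025` (the spec does not define the exact format), and the function names are hypothetical. Checking that the reference actually exists in the metadata store is left out because the metadata file naming is not specified here.

```python
import re

# Assumption: identifiers are "REF-" followed by digits, as in REF-025.
REF_PATTERN = re.compile(r"REF-\d+")

# Level mapping taken from the inputs table: 1=page, 2=paragraph, 3=sentence.
PROGRESSIVE_LEVELS = {1: "page", 2: "paragraph", 3: "sentence"}


def is_valid_ref_id(value: str) -> bool:
    """Check only the identifier's shape, not whether the paper exists."""
    return REF_PATTERN.fullmatch(value) is not None


def progressive_level_name(level: int) -> str:
    """Translate the integer flag into the summary granularity it selects."""
    if level not in PROGRESSIVE_LEVELS:
        raise ValueError(f"progressive level must be 1-3, got {level}")
    return PROGRESSIVE_LEVELS[level]
```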
### Outputs
| Output | Format | Location | Retention |
|--------|--------|----------|-----------|
| Summary | Markdown | `.aiwg/research/knowledge/summaries/{REF-XXX}-summary.md` | Permanent |
| Extraction | JSON | `.aiwg/research/knowledge/extractions/{REF-XXX}-extraction.json` | Permanent |
| Literature Note | Markdown | `.aiwg/research/knowledge/notes/{REF-XXX}-literature-note.md` | Permanent |
| Permanent Note | Markdown | `.aiwg/research/knowledge/notes/permanent-{topic}-{timestamp}.md` | Permanent |
| Map of Content | Markdown | `.aiwg/research/knowledge/maps/{topic-slug}.md` | Permanent |
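The path templates above can be expressed as a small helper. This is a sketch only: the function names are hypothetical, and the slug rule (lowercase, non-alphanumeric runs replaced by hyphens) is an assumption inferred from the `llm-evaluation-methods.md` example later in this document.

```python
import re
from pathlib import PurePosixPath

KNOWLEDGE_ROOT = PurePosixPath(".aiwg/research/knowledge")


def slugify(topic: str) -> str:
    """Assumed slug rule: lowercase, collapse non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def output_paths(ref_id: str) -> dict:
    """Per-paper outputs, following the templates in the outputs table."""
    return {
        "summary": KNOWLEDGE_ROOT / "summaries" / f"{ref_id}-summary.md",
        "extraction": KNOWLEDGE_ROOT / "extractions" / f"{ref_id}-extraction.json",
        "literature_note": KNOWLEDGE_ROOT / "notes" / f"{ref_id}-literature-note.md",
    }


def moc_path(topic: str) -> PurePosixPath:
    """Map of Content location for a topic name."""
    return KNOWLEDGE_ROOT / "maps" / f"{slugify(topic)}.md"
```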
### Output Schema: Structured Extraction JSON
```json
{
  "ref_id": "REF-025",
  "extraction_timestamp": "2026-01-25T16:00:00Z",
  "llm_model": "claude-opus-4",
  "claims": [
    "Token rotation reduces CSRF risk by 80% compared to static tokens",
    "OAuth 2.0 with PKCE prevents authorization code interception",
    "Refresh token rotation improves security without UX degradation"
  ],
  "methods": [
    "Controlled experiment with 10,000 users",
    "Security analysis using formal verification",
    "User study measuring UX impact (SUS score)"
  ],
  "datasets": [
    {
      "name": "OAuth Security Dataset",
      "size": "10,000 user sessions",
      "source": "Production deployment (anonymized)"
    }
  ],
  "metrics": [
    {"name": "CSRF attack success rate", "baseline": "12%", "intervention": "2.4%"},
    {"name": "SUS usability score", "baseline": "78", "intervention": "76"}
  ],
  "findings": [
    {
      "claim": "Token rotation reduces CSRF risk by 80%",
      "statistic": "p < 0.001",
      "confidence_interval": "95% CI: [75%, 85%]"
    }
  ],
  "related_work": [
    "10.1145/3133956.3133980",
    "10.1145/3243734.3243820"
  ]
}
```
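A useful property of this schema is that extracted claims can be cross-checked against extracted metrics. In the example, the 80% claim corresponds to the relative drop from a 12% to a 2.4% attack success rate. The helper below is a hypothetical sketch of that arithmetic, not part of the specified agent.

```python
def relative_reduction(baseline_pct: str, intervention_pct: str) -> float:
    """Relative reduction between two percentage strings such as "12%"."""
    baseline = float(baseline_pct.rstrip("%"))
    intervention = float(intervention_pct.rstrip("%"))
    return (baseline - intervention) / baseline


# Values copied from the "metrics" entry in the example extraction.
metric = {"name": "CSRF attack success rate", "baseline": "12%", "intervention": "2.4%"}
reduction = relative_reduction(metric["baseline"], metric["intervention"])
print(f"{metric['name']}: {reduction:.0%} relative reduction")
```

Here (12 - 2.4) / 12 = 0.8, which agrees with the 80% figure in `claims` and falls inside the reported confidence interval.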
### Output Schema: Summary Frontmatter (YAML)
```yaml
ref_id: REF-025
title: "OAuth 2.0 Security Best Practices"
authors: ["Smith, J.", "Doe, J."]
year: 2023
summarized_date: 2026-01-25
llm_model: claude-opus-4
summary_type: full  # or progressive
grade_quality_score:
  risk_of_bias: 20
  consistency: 20
  directness: 20
  precision: 15
  publication_bias: 15
  overall_score: 90
  overall_grade: "High"
tags: [oauth, security, authentication, tokens]
```
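The `overall_score` in this example equals the sum of the five dimension scores. One plausible reading, assumed here and not confirmed by the spec, is that each GRADE scoring weight in the agent configuration acts as the maximum number of points for its dimension. Under that assumption the calculation can be sketched as:

```python
# Weights from grade_scoring in the agent configuration; treating each weight
# as the per-dimension maximum is an assumption of this sketch.
GRADE_WEIGHTS = {
    "risk_of_bias": 25,
    "consistency": 20,
    "directness": 20,
    "precision": 20,
    "publication_bias": 15,
}


def overall_score(components: dict) -> int:
    """Sum the dimension scores after checking each against its maximum."""
    total = 0
    for dimension, maximum in GRADE_WEIGHTS.items():
        points = components[dimension]
        if not 0 <= points <= maximum:
            raise ValueError(f"{dimension}={points} is outside 0..{maximum}")
        total += points
    return total


example = {
    "risk_of_bias": 20,
    "consistency": 20,
    "directness": 20,
    "precision": 15,
    "publication_bias": 15,
}
print(overall_score(example))  # 90, matching overall_score in the frontmatter
```

The mapping from a numeric score to a label such as "High" is not defined in this document, so it is intentionally left out.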
## 6. Dependencies
### Agent Dependencies
| Agent | Relationship | Interaction |
|-------|--------------|-------------|
| Acquisition Agent | Upstream | Receives PDFs and metadata |
| Citation Agent | Downstream | Provides extractions for citations |
| Quality Agent | Collaborative | Shares GRADE scoring |
| Workflow Agent | Orchestrator | Receives task assignments |
| Provenance Agent | Observer | Logs documentation operations |
### Service Dependencies
| Service | Purpose | Fallback |
|---------|---------|----------|
| LLM API | Summarization, extraction | Local model or manual |
| PDF Tools | Text extraction | OCR if extraction fails |
| File System | Storage | Abort if unavailable |
### Data Dependencies
| Data | Location | Required |
|------|----------|----------|
| PDF Files | `.aiwg/research/sources/pdfs/` | Yes |
| Metadata JSON | `.aiwg/research/sources/metadata/` | Yes |
| Existing Notes | `.aiwg/research/knowledge/notes/` | Optional (for linking) |
## 7. Configuration Options
### Agent Configuration
```yaml
# .aiwg/research/config/documentation-agent.yaml
documentation_agent:
  # LLM Configuration
  llm:
    default_model: claude-opus-4      # Best quality
    fallback_model: claude-sonnet-4   # Faster, cheaper
    local_model: null                 # e.g., llama-3
    temperature: 0.3
    max_tokens: 4000
    timeout_seconds: 120

  # PDF Extraction
  pdf:
    tool: pdftotext                   # or pypdf2
    ocr_fallback: true
    ocr_tool: tesseract
    min_text_length: 100              # Below this triggers OCR

  # GRADE Scoring Weights
  grade_scoring:
    risk_of_bias: 25
    consistency: 20
    directness: 20
    precision: 20
    publication_bias: 15

  # Hallucination Detection
  hallucination:
    enabled: true
    confidence_threshold: 0.9         # Flag if match < 90%
    user_review_required: true

  # Zettelkasten Settings
  notes:
    max_length_words: 500             # Atomic notes
    auto_link: true
    tag_extraction: true
```
### Environment Variables
| Variable | Purpose | Default |
|----------|---------|---------|
| `AIWG_RESEARCH_LLM_MODEL` | Default LLM for summarization | claude-opus-4 |
| `AIWG_RESEARCH_LLM_TIMEOUT` | LLM request timeout | 120 |
| `AIWG_RESEARCH_OCR_ENABLED` | Enable OCR fallback | true |
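Resolving these variables with their documented defaults could look like the following sketch. The function name, the returned key names, and the set of strings accepted as "true" are assumptions of this example. The timeout is treated as seconds, in line with `timeout_seconds` in the YAML configuration.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the environment variables table.
DEFAULTS = {
    "AIWG_RESEARCH_LLM_MODEL": "claude-opus-4",
    "AIWG_RESEARCH_LLM_TIMEOUT": "120",
    "AIWG_RESEARCH_OCR_ENABLED": "true",
}

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings for boolean flags


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "llm_model": raw["AIWG_RESEARCH_LLM_MODEL"],
        "llm_timeout_seconds": int(raw["AIWG_RESEARCH_LLM_TIMEOUT"]),
        "ocr_enabled": raw["AIWG_RESEARCH_OCR_ENABLED"].strip().lower() in TRUTHY,
    }
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function easy to test.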
## 8. Error Handling
### Error Categories
| Error Type | Severity | Handling Strategy |
|------------|----------|-------------------|
| PDF Extraction Failed | Warning | Try OCR, then prompt for manual input |
| LLM API Unavailable | Warning | Retry, fall back to local model, then manual |
| Hallucination Detected | Warning | Flag for user review |
| Incomplete Extraction | Warning | Prompt for manual completion |
| GRADE Score Incomplete | Info | Proceed with partial score |
### Error Response Template
```json
{
  "error_code": "DOCUMENTATION_LLM_HALLUCINATION",
  "severity": "warning",
  "ref_id": "REF-025",
  "message": "Potential hallucination detected in summary",
  "details": {
    "flagged_claim": "Paper cites Smith et al. 2020",
    "evidence": "Citation not found in paper text"
  },
  "remediation": "Review flagged content and approve or reject",
  "user_action_required": true
}
```
### Recovery Procedures
| Scenario | Procedure |
|----------|-----------|
| LLM rate limit | Wait and retry with exponential backoff |
| PDF is image-only | Trigger OCR workflow |
| Partial extraction | Save partial results, allow manual completion |
| User rejects flagged content | Regenerate without flagged content |
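The rate-limit procedure (wait and retry with exponential backoff) can be sketched as below. `RateLimitError` and `flaky_summarize` are hypothetical stand-ins for whatever the LLM client raises and calls, and the attempt count and base delay are illustrative values not taken from the spec.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the LLM client's rate-limit exception."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry on rate limits, doubling the wait after every failed attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


# Demonstration: fail twice, then succeed; record delays instead of sleeping.
attempts = {"count": 0}
delays = []


def flaky_summarize() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("rate limited")
    return "summary text"


result = retry_with_backoff(flaky_summarize, sleep=delays.append)
print(result, delays)  # summary text [1.0, 2.0]
```

A production version would likely add random jitter to the delay so that parallel workers do not retry in lockstep.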
## 9. Metrics/Observability
### Performance Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Summarization time | <5 minutes | Timer from start to save |
| Hallucination detection rate | >95% recall | Correctly flagged hallucinations / actual hallucinations |
| Extraction completeness | >90% fields | Populated fields / expected |
| GRADE consistency | >80% agreement | Agent score vs. expert |
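Two of these measurements are simple ratios and can be sketched directly. The list of expected fields is an assumption taken from the top-level arrays in the example extraction JSON, and returning 1.0 when there are no actual hallucinations is a convention chosen for this sketch.

```python
# Assumed "expected" fields: the array-valued keys of the example extraction.
EXPECTED_FIELDS = ("claims", "methods", "datasets", "metrics", "findings", "related_work")


def extraction_completeness(extraction: dict) -> float:
    """Populated fields / expected fields; empty lists count as unpopulated."""
    populated = sum(1 for field in EXPECTED_FIELDS if extraction.get(field))
    return populated / len(EXPECTED_FIELDS)


def hallucination_recall(correctly_flagged: int, actual: int) -> float:
    """Share of real hallucinations that the detector flagged."""
    if actual == 0:
        return 1.0  # nothing to find, so nothing was missed
    return correctly_flagged / actual
```

For example, an extraction with five of the six fields populated scores 5/6 ≈ 0.83, which would miss the >90% target.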
### Logging
| Log Level | Events |
|-----------|--------|
| INFO | Documentation start, summary saved, completion |
| DEBUG | PDF extraction steps, LLM prompts, GRADE calculations |
| WARNING | OCR triggered, hallucination flagged, incomplete extraction |
| ERROR | LLM failure, PDF unreadable, validation error |
### Telemetry
```json
{
  "event": "documentation_complete",
  "timestamp": "2026-01-25T16:00:00Z",
  "metrics": {
    "ref_id": "REF-025",
    "pdf_pages": 12,
    "extraction_time_ms": 15000,
    "summarization_time_ms": 45000,
    "llm_tokens_used": 8500,
    "hallucinations_flagged": 0,
    "grade_score": 90,
    "extraction_completeness": 0.95
  }
}
```
## 10. Example Usage
### Basic Paper Summarization
```bash
# Document a single paper
aiwg research summarize REF-025
# Output:
# Processing REF-025: "OAuth 2.0 Security Best Practices"
# Extracting text from PDF... 12 pages, 8,500 words
# Generating summary via Claude opus...
# Validating for hallucinations... PASSED
# Extracting structured data...
# - Claims: 5 extracted
# - Methods: 3 extracted
# - Findings: 4 extracted
# Calculating GRADE score... 90/100 (High)
# Creating literature note...
#
# Documentation Complete:
# - Summary: .aiwg/research/knowledge/summaries/REF-025-summary.md
# - Extraction: .aiwg/research/knowledge/extractions/REF-025-extraction.json
# - Literature Note: .aiwg/research/knowledge/notes/REF-025-literature-note.md
# - GRADE Score: 90/100 (High quality evidence)
```
### Progressive Summarization
```bash
# Generate multi-level summaries
aiwg research summarize REF-025 --progressive
# Output:
# Generating progressive summaries for REF-025...
#
# Level 1 (1-page summary): Complete
# Level 2 (1-paragraph summary): Complete
# Level 3 (1-sentence summary): Complete
#
# 1-Sentence: "This paper demonstrates that OAuth 2.0 token rotation reduces CSRF attacks by 80% with minimal UX impact."
#
# All levels saved to: .aiwg/research/knowledge/summaries/REF-025-summary.md
```
### Bulk Documentation
```bash
# Document all acquired papers
aiwg research summarize --from-acquired
# Output:
# Processing 20 acquired papers...
# [1/20] REF-001: Summarizing... OK (4 min 30 sec)
# [2/20] REF-002: Summarizing... OK (3 min 45 sec)
# [3/20] REF-003: HALLUCINATION FLAGGED - Review required
# ...
# [20/20] REF-020: Summarizing... OK (5 min 10 sec)
#
# Batch Summary:
# - Documented: 19/20 (95%)
# - Flagged for review: 1
# - Average time: 4 min 15 sec
# - Total LLM tokens: 170,000
```
### Map of Content Creation
```bash
# Create topic overview
aiwg research moc-create "LLM Evaluation Methods"
# Output:
# Scanning knowledge base for related notes...
# Found 15 notes tagged "llm-evaluation"
#
# Generating Map of Content...
# - Overview: LLM evaluation landscape
# - Subtopic: Benchmark datasets (5 notes)
# - Subtopic: Human evaluation (4 notes)
# - Subtopic: Automatic metrics (6 notes)
#
# MoC saved: .aiwg/research/knowledge/maps/llm-evaluation-methods.md
```
## 11. Related Use Cases
| Use Case | Relationship | Description |
|----------|--------------|-------------|
| UC-RF-003 | Primary | Document Research Paper with LLM Summarization |
| UC-RF-002 | Upstream | Acquire Research Source (provides PDFs) |
| UC-RF-004 | Downstream | Integrate Citations (uses extractions) |
| UC-RF-006 | Collaborative | Assess Source Quality (shares GRADE) |
| UC-RF-008 | Orchestrated | Execute Research Workflow (Stage 3) |
## 12. Implementation Notes
### Architecture Considerations
1. **RAG Pattern Critical**: All summarization must use paper text as context
2. **Hallucination Prevention**: Validate every claim against source text
3. **Atomic Notes**: Literature notes should contain one main idea
4. **Idempotent Operations**: Re-documenting a paper updates its existing outputs rather than creating duplicates
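The claim-validation requirement can be illustrated with a deliberately crude lexical check. The spec does not say how a "match" is computed, so the token-overlap score below is only a stand-in for the real method (which might use embeddings or an LLM judge); the function names are hypothetical, and the 0.9 default mirrors `confidence_threshold` in the configuration.

```python
import re
from typing import List


def tokens(text: str) -> List[str]:
    """Lowercased alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def support_score(claim: str, source_text: str) -> float:
    """Fraction of the claim's tokens that also occur in the source text."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 1.0
    vocabulary = set(tokens(source_text))
    hits = sum(1 for token in claim_tokens if token in vocabulary)
    return hits / len(claim_tokens)


def flag_unsupported(claims: List[str], source_text: str, threshold: float = 0.9) -> List[str]:
    """Return the claims whose support falls below the threshold."""
    return [claim for claim in claims if support_score(claim, source_text) < threshold]
```

Token overlap cannot detect a claim that reuses the paper's vocabulary while reversing its meaning, which is one reason flagged and unflagged content may both still need user review.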
### Performance Optimizations
1. **Chunked Processing**: Process large PDFs in semantic chunks
2. **Parallel Extraction**: Extract structure while summarizing
3. **Caching**: Cache LLM responses for retry scenarios
4. **Streaming**: Stream summary generation for long papers
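Chunked processing can be approximated by grouping whole paragraphs up to a word budget, as in the sketch below. Treating blank-line-separated paragraphs as the "semantic" unit and the 800-word default are assumptions of this example; the spec does not define either.

```python
from typing import List


def chunk_paragraphs(text: str, max_words: int = 800) -> List[str]:
    """Group whole paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept intact as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for paragraph in (part.strip() for part in text.split("\n\n")):
        if not paragraph:
            continue
        words = len(paragraph.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries rather than at fixed offsets avoids cutting a sentence in half, at the cost of uneven chunk sizes.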
### Security Considerations
1. **No External Knowledge**: LLM must only use provided paper content
2. **Prompt Injection**: Sanitize paper content before LLM input
3. **API Key Security**: Use environment variables for LLM keys
4. **Content Privacy**: Don't send sensitive papers to external LLMs
### Testing Strategy
| Test Type | Coverage Target | Focus Areas |
|-----------|-----------------|-------------|
| Unit Tests | 80% | Text extraction, GRADE calculation, note formatting |
| Integration Tests | 70% | LLM interaction, file I/O, hallucination detection |
| E2E Tests | Key workflows | Full PDF to notes workflow |
### Known Limitations
1. **OCR Quality**: Scanned PDFs may have extraction errors
2. **LLM Costs**: Opus model is expensive for bulk operations
3. **Hallucination Risk**: RAG reduces but doesn't eliminate hallucinations
4. **Complex Tables**: Table extraction may be imperfect
## References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/use-cases/UC-RF-003-document-research-paper.md
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/inception/vision-document.md - Section 5.4 (Goal 4: Synthesize Knowledge)
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/inception/initial-risk-assessment.md - T-01 (LLM Hallucination)
- [GRADE Framework](https://www.gradeworkinggroup.org/)
- [Zettelkasten Method](https://zettelkasten.de/introduction/)
## Document Metadata
**Version:** 1.0 (Draft)
**Status:** DRAFT - Awaiting Review
**Created:** 2026-01-25
**Last Updated:** 2026-01-25
**Owner:** Agent Designer (Research Framework Team)