aiwg
Version:
Deployment tool and support utility for AI context. Copies agents, skills, commands, rules, and behaviors into the paths each AI platform reads (Claude Code, Codex, Copilot, Cursor, Warp, OpenClaw, and 6 more) so one source of truth works across 10 platfo
348 lines (264 loc) • 9.83 kB
Markdown
---
description: Acquire research papers from acquisition queue, download PDFs, validate integrity, and organize artifacts
category: research-management
argument-hint: [--queue <file>] [--doi <identifier>] [--all] [--validate-only]
allowed-tools: Bash, Read, Write, Grep, Glob
model: sonnet
---
# Research Acquisition Command
## Task
Download and validate research papers from acquisition queue. Retrieves PDFs from open access sources, institutional repositories, or DOI resolvers. Validates file integrity and organizes artifacts for documentation.
When invoked with `/research-acquire [options]`:
1. **Load** acquisition queue (from discovery or manual DOI list)
2. **Resolve** DOI URLs to PDF download locations
3. **Download** PDFs with retry logic and validation
4. **Verify** file integrity (checksum, PDF validity)
5. **Organize** files in `.aiwg/research/papers/`
6. **Update** acquisition status and metadata
## Parameters
- **`--queue <file>`** (optional): Custom acquisition queue (default: `.aiwg/research/discovery/acquisition-queue.json`)
- **`--doi <identifier>`** (optional): Acquire single paper by DOI (e.g., `10.1145/example`)
- **`--all`** (optional): Acquire all papers in queue (no confirmation)
- **`--validate-only`** (optional): Validate existing downloads without re-downloading
## Inputs
- **Acquisition queue**: `.aiwg/research/discovery/acquisition-queue.json`
- **Configuration**: `.aiwg/research/config.yaml` (download settings, retry limits)
- **Manual DOI** (if --doi): User-provided DOI identifier
## Outputs
- **Downloaded PDFs**: `.aiwg/research/papers/{paper-id}.pdf`
- **Metadata files**: `.aiwg/research/papers/{paper-id}.json`
- **Acquisition log**: `.aiwg/research/logs/acquisition-{timestamp}.log`
- **Updated queue**: Marks papers as `acquired`, `failed`, or `pending`
## Workflow
### Step 1: Load Acquisition Queue
```bash
# Read queue from discovery or custom file
QUEUE_FILE="${1:-.aiwg/research/discovery/acquisition-queue.json}"
if [ ! -f "$QUEUE_FILE" ]; then
echo "Error: Acquisition queue not found: $QUEUE_FILE"
echo "Run /research-discover first to create queue"
exit 1
fi
# Parse paper list
PAPERS=$(jq -r '.papers[] | select(.status == "pending") | .paper_id' "$QUEUE_FILE")
TOTAL=$(echo "$PAPERS" | wc -l)
echo "Acquisition queue: $TOTAL papers pending"
```
### Step 2: Resolve DOI to PDF
```typescript
// DOI resolution strategy (priority order)
async function resolveDOI(doi: string): Promise<string> {
// 1. Check Unpaywall API for open access PDFs
const unpaywall = await fetchUnpaywall(doi);
if (unpaywall.bestOaLocation?.pdfUrl) {
return unpaywall.bestOaLocation.pdfUrl;
}
// 2. Try DOI.org resolver (may redirect to publisher)
const doiUrl = `https://doi.org/${doi}`;
const response = await fetch(doiUrl, { redirect: 'follow' });
if (response.ok && isPDF(response)) {
return response.url;
}
// 3. Check arXiv if paper has arXiv ID
const arxivId = extractArxivId(doi);
if (arxivId) {
return `https://arxiv.org/pdf/${arxivId}.pdf`;
}
// 4. Check institutional repositories (if configured)
const institutional = await checkInstitutionalRepos(doi);
if (institutional) {
return institutional;
}
throw new Error(`No open access PDF found for DOI: ${doi}`);
}
```
### Step 3: Download PDF with Validation
**Security Note**: Download URLs from trusted sources only (Unpaywall, arXiv, DOI.org). Validate file integrity.
```bash
bash <<'EOF'
# Download paper PDF
DOI="10.1145/example"
PAPER_ID="abc123def456"
PDF_URL=$(resolve_doi "$DOI")
# Download with retry
for i in {1..3}; do
curl -L -o ".aiwg/research/papers/${PAPER_ID}.pdf" "$PDF_URL" && break
echo "Download failed, retry $i/3..."
sleep 5
done
# Validate PDF integrity
if ! file ".aiwg/research/papers/${PAPER_ID}.pdf" | grep -q "PDF"; then
echo "Error: Downloaded file is not a valid PDF"
rm ".aiwg/research/papers/${PAPER_ID}.pdf"
exit 1
fi
# Compute checksum
sha256sum ".aiwg/research/papers/${PAPER_ID}.pdf" > ".aiwg/research/papers/${PAPER_ID}.sha256"
echo "✓ Acquired: ${PAPER_ID}.pdf"
EOF
```
### Step 4: Save Metadata
```json
{
"paper_id": "abc123def456",
"doi": "10.1145/example",
"title": "OAuth 2.0 Security Best Practices",
"authors": ["Smith, J.", "Doe, A."],
"year": 2023,
"venue": "ACM CCS",
"acquired_date": "2026-01-25T10:30:00Z",
"source_url": "https://doi.org/10.1145/example",
"pdf_path": ".aiwg/research/papers/abc123def456.pdf",
"checksum_sha256": "abc123...",
"file_size_bytes": 2048576,
"status": "acquired"
}
```
### Step 5: Update Acquisition Queue
```typescript
// Mark paper as acquired in queue
function updateQueue(paperId: string, status: 'acquired' | 'failed') {
const queue = JSON.parse(fs.readFileSync(QUEUE_FILE, 'utf-8'));
const paper = queue.papers.find(p => p.paper_id === paperId);
if (paper) {
paper.status = status;
paper.acquired_date = new Date().toISOString();
}
fs.writeFileSync(QUEUE_FILE, JSON.stringify(queue, null, 2));
}
```
## Examples
### Acquire All Papers in Queue
```bash
# Download all pending papers
aiwg research acquire --all
```
**Output**:
```
Acquisition queue: 10 papers pending
Acquiring papers...
✓ [1/10] abc123def456 - OAuth 2.0 Security Best Practices
✓ [2/10] def456ghi789 - Token Refresh Vulnerabilities
✗ [3/10] ghi789jkl012 - No open access PDF found
✓ [4/10] jkl012mno345 - PKCE Extension for OAuth 2.0
...
Summary:
Acquired: 8 papers
Failed: 2 papers (no open access)
Total size: 45.2 MB
Failed papers:
- ghi789jkl012: No open access PDF
- xyz789abc123: Publisher paywall
Next steps:
- Review acquired papers: ls .aiwg/research/papers/
- Document papers: /research-document
```
### Acquire Single Paper by DOI
```bash
# Download specific paper
aiwg research acquire --doi 10.1145/3491102.3501874
```
**Output**:
```
Resolving DOI: 10.1145/3491102.3501874
✓ Found open access PDF via Unpaywall
✓ Downloaded: abc123def456.pdf (2.1 MB)
✓ Validated: PDF integrity confirmed
✓ Metadata saved: abc123def456.json
Paper details:
Title: OAuth 2.0 Security Best Practices
Authors: Smith, J., Doe, A.
Year: 2023
Venue: ACM CCS
Path: .aiwg/research/papers/abc123def456.pdf
```
### Validate Existing Downloads
```bash
# Check integrity of downloaded papers
aiwg research acquire --validate-only
```
**Output**:
```
Validating 8 downloaded papers...
✓ abc123def456.pdf - Valid (checksum matches)
✓ def456ghi789.pdf - Valid (checksum matches)
✗ jkl012mno345.pdf - Corrupted (checksum mismatch)
...
Summary:
Valid: 7 papers
Corrupted: 1 paper
Missing metadata: 0 papers
Recommendations:
- Re-download corrupted paper: aiwg research acquire --doi 10.1145/jkl012
```
## Related Agents
- **Acquisition Agent** (@$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/agents/acquisition-agent-spec.md): Primary agent executing this command
- **Quality Agent** (@$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/agents/quality-agent-spec.md): Validates file integrity
- **Documentation Agent** (@$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/agents/documentation-agent-spec.md): Next step after acquisition
## Skill Definition (Natural Language Triggers)
**Skill Name**: Research Acquisition
**Natural Language Patterns**:
- "Download the papers"
- "Acquire research papers"
- "Get PDFs for selected papers"
- "Download paper {DOI}"
- "Fetch research sources"
**Example Invocations**:
```
User: "Download the papers from the queue"
Agent: Executing /research-acquire --all
User: "Get the PDF for 10.1145/3491102.3501874"
Agent: Executing /research-acquire --doi 10.1145/3491102.3501874
```
## Success Criteria
- [ ] All pending papers processed (acquired or failed status)
- [ ] PDFs validated as genuine PDF files
- [ ] Checksums computed and stored
- [ ] Metadata saved for each paper
- [ ] Acquisition queue updated with status
- [ ] User can proceed to `/research-document` (UC-RF-003)
## Error Handling
### No Open Access PDF Found
```
Warning: No open access PDF for DOI: 10.1145/example
Options:
1. Check institutional access (if configured)
2. Request via inter-library loan
3. Contact authors directly
4. Mark as "manual acquisition required"
Continuing with next paper...
```
### Download Failed (Network Error)
```
Error: Download failed for abc123def456
Retrying... (Attempt 2/3)
✓ Retry successful
```
### PDF Validation Failed
```
Error: Downloaded file is not a valid PDF
File type: text/html (likely publisher paywall page)
Deleting invalid file...
Marking paper as "failed - paywall"
```
## Best Practices
1. **Prefer open access**: Use Unpaywall API to find free PDFs
2. **Validate integrity**: Always compute checksums for downloaded files
3. **Respect rate limits**: Space downloads to avoid throttling
4. **Batch acquisition**: Use `--all` for efficient bulk downloads
5. **Manual fallback**: Document papers requiring institutional access
## References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/use-cases/UC-RF-002-acquire-research-source.md - Use case specification
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/elaboration/agents/acquisition-agent-spec.md - Acquisition Agent
- [Unpaywall API](https://unpaywall.org/products/api) - Open access discovery
- [DOI System](https://www.doi.org/) - Digital Object Identifier resolution
## Security Notes
- Download URLs from trusted sources only (Unpaywall, arXiv, DOI.org)
- Validate file integrity with checksums
- No API keys hardcoded (loaded from config.yaml)
- Avoid executing downloaded content (PDFs are data, not code)
---
**Status**: DRAFT
**Created**: 2026-01-25
**Owner**: Acquisition Agent Designer
**UC Mapping**: UC-RF-002 (Acquire Research Source)