aiwg
Version:
Cognitive architecture for AI-augmented software development with structured memory, ensemble validation, and closed-loop correction. FAIR-aligned artifacts, 84% cost reduction via human-in-the-loop, standards adopted by 100+ organizations.
434 lines (380 loc) • 16.4 kB
YAML
# Structured Extraction Template
# Machine-readable extraction of claims, methods, findings from research papers
template_id: extraction
version: 1.0.0
reasoning_required: true
framework: research-complete
format: yaml
# USAGE NOTES:
# This template extracts structured data from papers for machine processing.
# Complete REASONING section in markdown comment first, then populate YAML fields.
# Use this for: citation networks, claim verification, method comparison, finding synthesis.
# REASONING (Complete before extraction):
#
# 1. **Extraction Scope**: What information do we need from this paper?
# > [Define which claims, methods, findings are relevant to AIWG - not everything]
#
# EXAMPLE:
# For REF-018 ReAct, extract: TAO loop structure, performance metrics on tool-use tasks,
# hallucination reduction data, reasoning trace format. Skip: Benchmark-specific details
# not applicable to SDLC workflows.
#
# 2. **Claim Classification**: How do we categorize extracted claims?
# > [Decide taxonomy: empirical vs theoretical, quantitative vs qualitative, causal vs correlational]
#
# EXAMPLE:
# - Empirical claims: "ReAct reduces hallucinations to 0%" (measured)
# - Theoretical claims: "Reasoning traces enable better oversight" (proposed)
# - Causal claims: "TAO loop CAUSES performance improvement" (proven causation)
# - Correlational claims: "TAO loop ASSOCIATED with fewer errors" (correlation observed)
#
# 3. **Evidence Strength Assessment**: How do we rate evidence quality?
# > [Use GRADE levels: HIGH/MODERATE/LOW/VERY LOW based on methodology]
#
# EXAMPLE:
# HIGH: Controlled experiment, multiple baselines, reproducible = "34% improvement"
# MODERATE: Single comparison, limited scope = "Improved performance observed"
# LOW: Case study, no controls = "Practitioner reports benefits"
#
# 4. **Method Abstraction**: What level of detail for methods?
# > [Balance: enough detail to evaluate applicability, not so much it's unusable]
#
# EXAMPLE:
# Extract: Core algorithm (TAO loop), key parameters (max iterations), evaluation approach
# Skip: Benchmark-specific implementation details, dataset preprocessing minutiae
#
# 5. **Applicability Mapping**: How does this apply to AIWG?
# > [For each extraction, note AIWG component(s) affected]
#
# EXAMPLE:
# TAO loop → All tool-using agents (@.claude/rules/tao-loop.md)
# Thought types → Agent reasoning (@.claude/rules/thought-protocol.md)
# Hallucination reduction → Citation verification (@.claude/rules/citation-policy.md)
metadata:
extraction_id: "extraction-REF-XXX"
paper_ref: "REF-XXX"
paper_title: "Full Paper Title"
authors:
- "Author 1"
- "Author 2"
year: YYYY
extraction_date: "YYYY-MM-DD"
extractor: "Agent or Human Name"
extraction_confidence: 0.95 # 0-1 scale
# EXAMPLE:
# extraction_id: "extraction-REF-018"
# paper_ref: "REF-018"
# paper_title: "ReAct: Synergizing Reasoning and Acting in Language Models"
# authors:
# - "Yao, S."
# - "Zhao, J."
# - "Yu, D."
# year: 2022
# extraction_date: "2026-02-03"
# extractor: "Discovery Agent"
# extraction_confidence: 0.95
claims:
# List of specific claims made in the paper with evidence
# Each claim is independently verifiable
- claim_id: "claim-001"
claim_text: "[Exact claim or close paraphrase]"
claim_type: "empirical" # empirical | theoretical | methodological
evidence_type: "quantitative" # quantitative | qualitative | mixed
causality: "causal" # causal | correlational | descriptive
# EXAMPLE:
# claim_id: "claim-001"
# claim_text: "ReAct improves success rate by 34% on HotpotQA compared to Act-only baseline"
# claim_type: "empirical"
# evidence_type: "quantitative"
# causality: "causal"
evidence:
data_points:
- metric: "[Metric name]"
baseline: "[Baseline value]"
result: "[Result value]"
improvement: "[% or absolute improvement]"
# EXAMPLE:
# metric: "HotpotQA Success Rate"
# baseline: "49%"
# result: "66%"
# improvement: "+34% relative"
source_location:
page: "X"
section: "Section Name"
figure_table: "Figure/Table Y"
# EXAMPLE:
# page: "4"
# section: "4.1 Question Answering Results"
# figure_table: "Table 1"
quote: "[Exact quote from paper if available]"
# EXAMPLE:
# quote: "ReAct achieves 66% success rate on HotpotQA, compared to 49% for Act-only, representing a 34% relative improvement"
grade_quality: "HIGH" # HIGH | MODERATE | LOW | VERY_LOW
quality_rationale: "[Why this quality rating?]"
# EXAMPLE:
# grade_quality: "HIGH"
# quality_rationale: "Controlled experiment, clear baselines, multiple task evaluation, reproducible methodology"
applicability_to_aiwg:
relevance: "HIGH" # HIGH | MODERATE | LOW | NONE
affected_components:
- "@.claude/rules/tao-loop.md"
- "@agentic/code/frameworks/sdlc-complete/agents/*.md"
implementation_notes: "[How this applies to AIWG]"
# EXAMPLE:
# relevance: "HIGH"
# affected_components:
# - "@.claude/rules/tao-loop.md"
# - "@agentic/code/frameworks/sdlc-complete/agents/requirements-analyst.md"
# implementation_notes: "Standardize TAO loop across all tool-using agents to achieve similar performance gains"
limitations:
- "[Limitation 1 of this claim]"
- "[Limitation 2 of this claim]"
# EXAMPLE:
# - "Tested only on QA tasks, not full SDLC workflows"
# - "Single-agent context, multi-agent coordination not evaluated"
# ANTI-PATTERN EXAMPLE:
# - claim_id: "claim-bad"
# claim_text: "The method works well" # Too vague
# claim_type: "empirical"
# evidence_type: "qualitative"
# causality: "descriptive"
# evidence:
# quote: "It improved performance" # No specifics
# grade_quality: "LOW"
# BETTER:
# - claim_id: "claim-good"
# claim_text: "ReAct reduces hallucination rate from 56% to 0% on FEVER fact verification"
# claim_type: "empirical"
# evidence_type: "quantitative"
# causality: "causal"
# evidence:
# data_points:
# - metric: "Hallucination Rate"
# baseline: "56%"
# result: "0%"
# improvement: "100% reduction"
# source_location:
# page: "6"
# figure_table: "Figure 3"
methods:
# Methodologies, algorithms, techniques introduced or used
- method_id: "method-001"
method_name: "[Short descriptive name]"
method_type: "algorithm" # algorithm | framework | evaluation_protocol | experimental_design
# EXAMPLE:
# method_id: "method-001"
# method_name: "ReAct Loop (Thought→Action→Observation)"
# method_type: "algorithm"
description: "[Detailed description of the method]"
# EXAMPLE:
# description: "Iterative loop interleaving reasoning traces with tool actions and observations. Each iteration: 1) THOUGHT - reasoning about current state, 2) ACTION - tool invocation, 3) OBSERVATION - result capture."
pseudocode: |
# Optional: pseudocode representation
# EXAMPLE:
# while not task_complete and iterations < max:
# thought = generate_reasoning(state)
# action = select_tool_and_params(thought)
# observation = execute_tool(action)
# state = update_state(observation)
key_parameters:
- param: "[Parameter name]"
value: "[Value or range]"
description: "[What this parameter controls]"
# EXAMPLE:
# param: "max_iterations"
# value: "5-10"
# description: "Maximum TAO loop iterations before termination"
- param: "[Parameter name]"
value: "[Value or range]"
description: "[What this parameter controls]"
# EXAMPLE:
# param: "temperature"
# value: "0.7"
# description: "LLM sampling temperature for thought generation"
evaluation:
benchmarks:
- name: "[Benchmark name]"
result: "[Result on this benchmark]"
# EXAMPLE:
# name: "HotpotQA"
# result: "66% success rate"
baselines:
- name: "[Baseline name]"
result: "[Baseline result]"
comparison: "[How method compares]"
# EXAMPLE:
# name: "Act-only (no reasoning)"
# result: "49% success rate"
# comparison: "+34% improvement with ReAct"
reproducibility:
code_available: true # true | false
code_url: "https://github.com/..."
data_available: true # true | false
data_url: "https://..."
# EXAMPLE:
# code_available: true
# code_url: "https://github.com/ysymyth/ReAct"
# data_available: true
# data_url: "https://hotpotqa.github.io/"
applicability_to_aiwg:
can_implement: true # true | false | partial
implementation_complexity: "moderate" # low | moderate | high
dependencies:
- "[Dependency 1]"
- "[Dependency 2]"
# EXAMPLE:
# can_implement: true
# implementation_complexity: "moderate"
# dependencies:
# - "LLM API (GPT-4, Claude, etc.)"
# - "Tool execution environment (Bash, Read, Write, etc.)"
implementation_status:
- component: "[AIWG component]"
status: "implemented" # planned | in_progress | implemented | not_applicable
reference: "@path/to/implementation"
# EXAMPLE:
# component: "TAO Loop Standardization"
# status: "implemented"
# reference: "@.claude/rules/tao-loop.md"
findings:
# Key findings, insights, discoveries from the research
- finding_id: "finding-001"
finding_text: "[Clear statement of the finding]"
finding_type: "performance" # performance | insight | limitation | recommendation
# EXAMPLE:
# finding_id: "finding-001"
# finding_text: "Tool grounding (external observations) reduces hallucinations to near-zero"
# finding_type: "insight"
supporting_evidence:
- claim_ref: "claim-001" # Reference to claim ID above
- claim_ref: "claim-002"
# EXAMPLE:
# - claim_ref: "claim-002" # (hypothetical) "0% hallucinations with tool use"
significance: "HIGH" # HIGH | MODERATE | LOW
significance_rationale: "[Why this finding matters]"
# EXAMPLE:
# significance: "HIGH"
# significance_rationale: "Directly addresses critical failure mode in LLM systems. Provides actionable pattern for reducing fabricated information."
implications:
- domain: "[Domain this affects]"
implication: "[What this means for that domain]"
# EXAMPLE:
# domain: "Agent Reliability"
# implication: "Agents must ground claims in tool observations (Read, Grep results) rather than generating from parametric knowledge alone"
- domain: "[Domain this affects]"
implication: "[What this means for that domain]"
# EXAMPLE:
# domain: "Citation Verification"
# implication: "Before citing sources, agents must use Read tool to verify file exists and extract exact quote"
limitations:
- "[Limitation 1]"
- "[Limitation 2]"
# EXAMPLE:
# - "Finding based on QA tasks; applicability to code generation unknown"
# - "Tool reliability assumed; unreliable tools may introduce new errors"
future_work:
- "[Research gap 1]"
- "[Research gap 2]"
# EXAMPLE:
# - "Evaluate tool grounding in multi-agent coordination scenarios"
# - "Develop tool reliability metrics and selection strategies"
relationships:
# Connections to other papers in the corpus
builds_on:
- paper_ref: "REF-XXX"
relationship: "[How this builds on that paper]"
# EXAMPLE:
# paper_ref: "REF-016"
# relationship: "Extends Chain-of-Thought by adding action execution and observation phases"
extends:
- paper_ref: "REF-XXX"
relationship: "[How this extends that paper]"
# EXAMPLE:
# paper_ref: "REF-019"
# relationship: "Extends Toolformer by adding explicit reasoning traces before tool use"
contradicts:
- paper_ref: "REF-XXX"
relationship: "[How this contradicts that paper]"
contradiction_type: "methodology" # methodology | findings | interpretation
# EXAMPLE:
# paper_ref: "REF-XXX"
# relationship: "Contradicts assumption that more tool use always improves performance; shows reasoning-first is critical"
# contradiction_type: "findings"
cited_by:
- paper_ref: "REF-XXX"
relationship: "[How that paper uses this work]"
# EXAMPLE:
# paper_ref: "REF-022"
# relationship: "AutoGen adopts ReAct patterns for agent communication"
synthesis:
# High-level synthesis across claims, methods, findings
core_contribution: "[The single most important contribution of this paper]"
# EXAMPLE:
# core_contribution: "Demonstrates that interleaving reasoning with actions significantly improves LLM task performance and enables tool grounding that eliminates hallucinations"
practical_takeaways:
- "[Actionable takeaway 1]"
- "[Actionable takeaway 2]"
- "[Actionable takeaway 3]"
# EXAMPLE:
# - "Implement TAO loop structure in all agents that use tools"
# - "Track thought types (goal, progress, extraction, reasoning, exception, synthesis)"
# - "Require agents to ground claims in tool observations before stating facts"
open_questions:
- "[Unanswered question 1]"
- "[Unanswered question 2]"
# EXAMPLE:
# - "How does TAO loop scale to 10+ iteration sessions (Ralph loops)?"
# - "Can reasoning quality be measured automatically?"
# - "How do multiple agents coordinate with ReAct patterns?"
confidence_assessment:
overall_confidence: 0.90 # 0-1 scale
confidence_factors:
methodology_rigor: 0.95
evidence_strength: 0.90
generalizability: 0.85
confidence_notes: "[Why this confidence level?]"
# EXAMPLE:
# overall_confidence: 0.90
# confidence_factors:
# methodology_rigor: 0.95 # Excellent experimental design
# evidence_strength: 0.90 # Strong quantitative results
# generalizability: 0.85 # QA tasks, not full SDLC workflows
# confidence_notes: "High confidence in findings for tool-use tasks; moderate confidence for complex SDLC workflows"
references:
# Links to related artifacts
literature_note: "@.aiwg/research/findings/REF-XXX.md"
summary: "@.aiwg/research/summaries/REF-XXX-summary.md"
source_pdf: "@.aiwg/research/sources/REF-XXX.pdf"
provenance_record: "@.aiwg/research/provenance/records/REF-XXX.prov.yaml"
aiwg_implementations:
- "@.claude/rules/tao-loop.md"
- "@.claude/rules/thought-protocol.md"
- "@agentic/code/frameworks/sdlc-complete/agents/*.md"
# EXAMPLE:
# literature_note: "@.aiwg/research/findings/REF-018-react.md"
# summary: "@.aiwg/research/summaries/REF-018-summary.md"
# source_pdf: "@.aiwg/research/sources/yao-2022-react.pdf"
# provenance_record: "@.aiwg/research/provenance/records/REF-018.prov.yaml"
# aiwg_implementations:
# - "@.claude/rules/tao-loop.md"
# - "@.claude/rules/thought-protocol.md"
metadata_schema_version: "1.0.0"
extraction_schema: "@agentic/code/frameworks/research-complete/schemas/extraction-schema.yaml"
# VALIDATION CHECKLIST (verify before finalizing):
# [ ] All claims have evidence with source location
# [ ] All claims have GRADE quality assessment
# [ ] All methods have applicability to AIWG
# [ ] All findings reference supporting claims
# [ ] Relationships to other papers documented
# [ ] Confidence assessment completed with rationale
# [ ] References to AIWG implementations included
# [ ] No vague claims ("improves performance" → specify metric and magnitude)
# [ ] No unsupported quality ratings (justify HIGH/MODERATE/LOW)
# ANTI-PATTERNS TO AVOID:
# ❌ Extracting every claim (focus on AIWG-relevant only)
# ❌ Vague evidence ("page 5" without specific quote or data)
# ❌ Missing quality assessment (every claim needs GRADE level)
# ❌ Copy-pasting abstract (synthesize, don't duplicate)
# ❌ No applicability notes (always map to AIWG components)