aiwg

Cognitive architecture for AI-augmented software development with structured memory, ensemble validation, and closed-loop correction. FAIR-aligned artifacts, 84% cost reduction via human-in-the-loop, standards adopted by 100+ organizations.

aiwg.io

jmagly/aiwg

# GRADE Quality Assessment Template

---
template_id: quality-assessment
version: 1.0.0
reasoning_required: true
framework: research-complete
---

## Ownership & Collaboration

- Document Owner: Research Analyst
- Contributor Roles: Domain Expert, Quality Auditor
- Automation Inputs: Paper metadata, methodology sections, results tables
- Automation Outputs: `quality-assessment-REF-XXX.md` with GRADE rating and justification

## Phase 1: Core (ESSENTIAL)

### Paper Identification

**Reference ID:** REF-XXX

**Title:** [Full paper title]

**Authors:** [Author list]

**Year:** YYYY

**Source:** [Journal/Conference/Preprint]

### Quality Rating Summary

**GRADE Level:** HIGH | MODERATE | LOW | VERY LOW

**Baseline (Source Type):** [Initial quality based on publication venue]

**Adjustments:** [Factors that raised or lowered from baseline]

**One-Line Rationale:** [Brief justification]

## Reasoning

> Complete this section BEFORE detailed assessment. Per @.claude/rules/reasoning-sections.md

1. **Baseline Determination**: What is the starting quality level?
   > [Assess publication venue quality per GRADE guidelines]

2. **Study Design Evaluation**: How rigorous is the methodology?
   > [Assess experimental design, controls, baselines, reproducibility]

3. **Evidence Strength Assessment**: How strong is the evidence for claims?
   > [Evaluate sample size, statistical significance, effect sizes]

4. **Generalizability Analysis**: How broadly do findings apply?
   > [Assess scope: single domain vs cross-domain, single model vs multiple]

5. **Risk of Bias Assessment**: Are there methodological biases?
   > [Check for selection bias, reporting bias, funding conflicts]

## Phase 2: Detailed GRADE Assessment (EXPAND WHEN READY)

<details>
<summary>Click to expand detailed GRADE criteria evaluation</summary>

### GRADE Framework

GRADE (Grading of Recommendations Assessment, Development and Evaluation) assesses evidence quality starting from study design, then applies five factors that may downgrade the rating and three that may upgrade it:

1. **Study Design** (starting point)
2. **Risk of Bias** (may downgrade)
3. **Inconsistency** (may downgrade)
4. **Indirectness** (may downgrade)
5. **Imprecision** (may downgrade)
6. **Publication Bias** (may downgrade)
7. **Large Effect Size** (may upgrade)
8. **Dose-Response** (may upgrade)
9. **Confounders** (may upgrade)

### 1. Study Design (Baseline)

| Design Type | Baseline GRADE | This Paper |
|-------------|----------------|------------|
| Systematic review | HIGH | ☐ |
| RCT | HIGH | ☐ |
| Cohort study | MODERATE | ☐ |
| Case-control | MODERATE | ☐ |
| Case series | LOW | ☐ |
| Expert opinion | VERY LOW | ☐ |
| **Experimental (ML)** | **HIGH** | **☑** |
| Preprint/unreviewed | LOW | ☐ |

**Assessment:**

### 2. Risk of Bias (Downgrade?)

| Bias Type | Risk Level | Impact on Rating |
|-----------|------------|------------------|
| Selection bias | LOW / MODERATE / HIGH | 0 / -1 / -2 |
| Performance bias | LOW / MODERATE / HIGH | 0 / -1 / -2 |
| Detection bias | LOW / MODERATE / HIGH | 0 / -1 / -2 |
| Attrition bias | LOW / MODERATE / HIGH | 0 / -1 / -2 |
| Reporting bias | LOW / MODERATE / HIGH | 0 / -1 / -2 |

**Assessment:**

### 3. Inconsistency (Downgrade?)

| Consistency Check | Status | Impact |
|-------------------|--------|--------|
| Results consistent across tasks? | YES / NO / MIXED | 0 / -1 / -2 |
| Results align with prior work? | YES / NO / MIXED | 0 / -1 |
| Effect sizes consistent? | YES / NO / MIXED | 0 / -1 |

**Assessment:**

### 4. Indirectness (Downgrade?)

| Indirectness Check | Status | Impact |
|--------------------|--------|--------|
| Population matches our target? | YES / NO / PARTIAL | 0 / -1 / -2 |
| Intervention matches our use? | YES / NO / PARTIAL | 0 / -1 / -2 |
| Outcomes match our needs? | YES / NO / PARTIAL | 0 / -1 |

**Assessment:**

### 5. Imprecision (Downgrade?)

| Precision Check | Status | Impact |
|-----------------|--------|--------|
| Large sample size? | YES / NO | 0 / -1 |
| Narrow confidence intervals? | YES / NO / UNKNOWN | 0 / -1 / -1 |
| Statistical significance reported? | YES / NO | 0 / -1 |

**Assessment:**

### 6. Publication Bias (Downgrade?)

| Bias Check | Status | Impact |
|------------|--------|--------|
| Preregistration? | YES / NO | N/A for ML |
| Negative results published? | YES / NO / UNKNOWN | 0 / -1 |
| File drawer problem likely? | YES / NO | 0 / -1 |

**Assessment:**

### 7. Large Effect Size (Upgrade?)

| Effect Check | Status | Impact |
|--------------|--------|--------|
| Large effect (>2x baseline)? | YES / NO | +1 / 0 |
| Very large effect (>5x baseline)? | YES / NO | +2 / 0 |

**Assessment:**

### 8. Dose-Response Gradient (Upgrade?)

| Gradient Check | Status | Impact |
|----------------|--------|--------|
| Clear dose-response? | YES / NO / N/A | +1 / 0 |

**Assessment:**

### 9. Confounders (Upgrade?)

| Confounder Check | Status | Impact |
|------------------|--------|--------|
| Plausible confounders? | YES / NO | 0 / +1 |
| Confounders would reduce effect? | YES / NO / N/A | 0 / +1 |

**Assessment:**

### GRADE Calculation

**Starting Point:** HIGH (peer-reviewed experimental study)

**Downgrades:**
- Risk of Bias: -0 (LOW to MODERATE, no serious issues)
- Inconsistency: -0 (results consistent)
- Indirectness: -0 (extrapolation reasonable)
- Imprecision: -0 (large effects compensate for lack of CIs)
- Publication Bias: -0 (no evidence of bias)

**Upgrades:**
- Large Effect: +0 (effects moderate, not >2x)
- Dose-Response: +0 (N/A)
- Confounders: +0 (none that would strengthen)

**Final GRADE Level:** HIGH

</details>

## Phase 3: Applicability Analysis (ADVANCED)

<details>
<summary>Click to expand AIWG-specific applicability assessment</summary>

### Applicability to AIWG

**Overall Applicability:** HIGH | MODERATE | LOW

**Rationale:**

### Implementation Confidence

| AIWG Component | Confidence | Rationale |
|----------------|------------|-----------|
| [Component 1] | HIGH / MODERATE / LOW | [Why this confidence level] |
| [Component 2] | HIGH / MODERATE / LOW | [Why this confidence level] |

### Evidence Gaps for AIWG

**Gap 1:** [Specific AIWG use case not covered by research]

**Gap 2:** [Another AIWG-specific gap]

### Recommendations

**For implementation:**

**For monitoring:**

**For future research:**

</details>

## Quality Summary Table

| GRADE Criterion | Assessment | Impact | Notes |
|-----------------|------------|--------|-------|
| **Baseline (Study Design)** | HIGH | Starting point | Peer-reviewed experimental |
| Risk of Bias | LOW-MODERATE | -0 | Minor concerns (OpenAI affiliation) |
| Inconsistency | LOW | -0 | Results consistent |
| Indirectness | MODERATE | -0 | Extrapolation reasonable |
| Imprecision | MODERATE | -0 | Large effects compensate |
| Publication Bias | LOW | -0 | No evidence of selective reporting |
| Large Effect | NO | +0 | 1.35x not >2x threshold |
| Dose-Response | N/A | +0 | Not applicable |
| Confounders | UNLIKELY | +0 | None that strengthen effect |
| **Final GRADE** | **HIGH** | **HIGH** | **High confidence in findings** |

## References

- @.aiwg/research/sources/[PDF-filename].pdf - Original paper
- @.aiwg/research/findings/REF-XXX.md - Literature note
- @.claude/rules/citation-policy.md - GRADE-based citation language
- @.claude/rules/research-metadata.md - Baseline quality by source type
- @agentic/code/frameworks/research-complete/schemas/grade-schema.yaml - GRADE assessment schema

## Template Usage Notes

**When to perform GRADE assessment:**
- When adding paper to corpus
- Before citing paper in implementation decisions
- When updating quality assessments (annually)
- Before recommending paper as "definitive" on topic

**Assessment approach:**
1. Determine baseline quality (publication venue)
2. Evaluate each GRADE criterion systematically
3. Apply downgrades/upgrades per framework
4. Calculate final GRADE level
5. Assess AIWG-specific applicability
6. Document confidence and gaps

**Common pitfalls:**
- Conflating quality with relevance (HIGH quality ≠ HIGH relevance)
- Ignoring indirect evidence (extrapolation often reasonable)
- Over-penalizing for missing statistical tests (common in ML)
- Not documenting applicability gaps

**GRADE levels in citation language:**
- HIGH: "demonstrates", "shows", "establishes"
- MODERATE: "suggests", "indicates", "supports"
- LOW: "limited evidence", "preliminary findings"
- VERY LOW: "anecdotal", "exploratory"

Per @.claude/rules/citation-policy.md

## Metadata

- **Template Type:** research-quality-assessment
- **Framework:** research-complete
- **Primary Agent:** @agentic/code/frameworks/research-complete/agents/quality-agent.md
- **Related Templates:**
  - @agentic/code/frameworks/research-complete/templates/literature-note.md
  - @agentic/code/frameworks/research-complete/templates/extraction.yaml
- **Version:** 1.0.0
- **Last Updated:** 2026-02-03
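
The GRADE Calculation and the "Assessment approach" steps above can be sketched as a small helper. This is only an illustration, not part of the template or of any AIWG tooling: the names `GRADE_LEVELS`, `CITATION_LANGUAGE`, `final_grade`, and `citation_verbs` are hypothetical. The template does not spell out the arithmetic, so the sketch assumes that each impact point moves the rating one level and that the result is clamped to the VERY LOW to HIGH scale.

```python
# Levels ordered from lowest to highest, as named in the template.
GRADE_LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]

# Citation wording per level, copied from "GRADE levels in citation language".
CITATION_LANGUAGE = {
    "HIGH": ["demonstrates", "shows", "establishes"],
    "MODERATE": ["suggests", "indicates", "supports"],
    "LOW": ["limited evidence", "preliminary findings"],
    "VERY LOW": ["anecdotal", "exploratory"],
}


def final_grade(baseline: str, impacts: dict[str, int]) -> str:
    """Apply signed impacts (downgrades < 0, upgrades > 0) to a baseline level.

    Assumption: one level per impact point, clamped to the four-level scale.
    """
    if baseline not in GRADE_LEVELS:
        raise ValueError(f"unknown baseline level: {baseline!r}")
    start = GRADE_LEVELS.index(baseline)
    net = sum(impacts.values())
    # Clamp so the result never leaves the VERY LOW..HIGH range.
    index = max(0, min(len(GRADE_LEVELS) - 1, start + net))
    return GRADE_LEVELS[index]


def citation_verbs(level: str) -> list[str]:
    """Return the citation wording the template lists for a GRADE level."""
    return CITATION_LANGUAGE[level]


if __name__ == "__main__":
    # Impacts taken from the worked GRADE Calculation: every criterion is 0.
    worked_example = {
        "risk_of_bias": 0,
        "inconsistency": 0,
        "indirectness": 0,
        "imprecision": 0,
        "publication_bias": 0,
        "large_effect": 0,
        "dose_response": 0,
        "confounders": 0,
    }
    level = final_grade("HIGH", worked_example)
    print(level, citation_verbs(level))
```

With the worked example's all-zero impacts the baseline is returned unchanged. A MODERATE baseline with a single -1 for imprecision would come out as LOW under the same assumed rule.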