# SDLC Extensions Evaluation Plan

## Overview

This document defines the evaluation criteria, test scenarios, and quality gates for the language- and platform-specific SDLC extensions.

## Extensions Covered

| Extension | Type | Skills | Agents |
|-----------|------|--------|--------|
| Python | Language | 6 | 1 |
| JavaScript | Language | 5 | 1 |
| GitHub | Platform | 5 | 1 |

## Research Compliance Validation

Each extension skill must demonstrate compliance with:

- **REF-001**: Production-Grade Agentic Workflows
- **REF-002**: LLM Failure Modes in Agentic Scenarios

### Python Extension Compliance

| Skill | Archetype 1 | Archetype 2 | Archetype 3 | Archetype 4 |
|-------|-------------|-------------|-------------|-------------|
| pytest-runner | | | | |
| venv-manager | | | | |
| pip-auditor | | | | |
| pylint-checker | | | | |
| mypy-validator | | | | |
| poetry-manager | | | | |

### JavaScript Extension Compliance

| Skill | Archetype 1 | Archetype 2 | Archetype 3 | Archetype 4 |
|-------|-------------|-------------|-------------|-------------|
| vitest-runner | | | | |
| eslint-checker | | | | |
| typescript-validator | | | | |
| npm-auditor | | | | |
| bundle-analyzer | | | | |

### GitHub Extension Compliance

| Skill | Archetype 1 | Archetype 2 | Archetype 3 | Archetype 4 |
|-------|-------------|-------------|-------------|-------------|
| repo-analyzer | | | | |
| pr-reviewer | | | | |
| actions-checker | | | | |
| issue-tracker | | | | |
| release-manager | | | | |

## Python Extension Evaluation

### pytest-runner Scenarios

**Test Case PY-001: Basic Test Execution**

```
Input: Python project with pytest
Expected: Test results with pass/fail counts
Grounding: venv activated, pytest installed
Recovery: Handle missing dependencies gracefully
```

**Test Case PY-002: Coverage Report**

```
Input: Tests with coverage flag
Expected: Coverage report in multiple formats
Grounding: pytest-cov installed
Recovery: Skip coverage if not available
```

### venv-manager Scenarios

**Test Case VM-001: Create Virtual Environment**

```
Input: Project directory
Expected: venv/ created with correct Python
Grounding: Python version confirmed
Recovery: Handle creation failures
```

**Test Case VM-002: Dependency Installation**

```
Input: requirements.txt
Expected: All packages installed
Grounding: pip upgraded first
Recovery: Report failed packages
```

## JavaScript Extension Evaluation

### vitest-runner Scenarios

**Test Case JS-001: Basic Test Execution**

```
Input: TypeScript project with Vitest
Expected: Test results with pass/fail counts
Grounding: node_modules present
Recovery: Handle missing dependencies
```

**Test Case JS-002: Coverage Report**

```
Input: Tests with coverage flag
Expected: Coverage report generated
Grounding: v8 coverage available
Recovery: Skip if not configured
```

### eslint-checker Scenarios

**Test Case ES-001: Lint Execution**

```
Input: TypeScript source files
Expected: Lint results with error/warning counts
Grounding: ESLint config validated
Recovery: Handle config errors
```

**Test Case ES-002: Auto-Fix**

```
Input: Files with fixable issues
Expected: Issues fixed, report generated
Grounding: User confirmation for --fix
Recovery: No destructive changes without approval
```

## GitHub Extension Evaluation

### repo-analyzer Scenarios

**Test Case GH-001: Repository Analysis**

```
Input: GitHub repository URL
Expected: Structure analysis, health report
Grounding: gh CLI authenticated
Recovery: Handle API rate limits
```

**Test Case GH-002: Private Repository**

```
Input: Private repo URL
Expected: Analysis with proper auth
Grounding: Token permissions verified
Recovery: Clear error on auth failure
```

### pr-reviewer Scenarios

**Test Case PR-001: PR Review**

```
Input: PR number
Expected: Review with findings, suggestions
Grounding: PR exists, user has access
Recovery: Handle large diffs gracefully
```

**Test Case PR-002: Security Scan**

```
Input: PR with potential vulnerabilities
Expected: Security issues flagged
Escalation: User decision on blocking issues
Recovery: Continue review if scan fails
```

## Quality Gates

### Gate 1: Extension Structure

- [ ] extension.json valid
- [ ] All declared skills exist
- [ ] Agent orchestrator present
- [ ] Templates defined (if any)

### Gate 2: Skill Structure

- [ ] SKILL.md follows template
- [ ] Grounding checkpoint present
- [ ] Recovery protocol defined
- [ ] Context scope documented

### Gate 3: Research Compliance

- [ ] BP-4 Single Responsibility
- [ ] BP-9 KISS principle
- [ ] All 4 archetypes addressed
- [ ] Uncertainty escalation clear

### Gate 4: Integration Testing

- [ ] Orchestrator coordinates skills
- [ ] Cross-skill data flow works
- [ ] Checkpoint handoff successful
- [ ] Recovery cascade works

## Test Execution Matrix

### Python Extension

```bash
# Setup
python3 -m venv test_venv
source test_venv/bin/activate
pip install pytest pytest-cov pylint mypy

# Tests
pytest tests/ -v --cov=src
pylint src/ --output-format=json
mypy src/ --output json  # JSON diagnostics; needs a mypy release that supports --output
```

### JavaScript Extension

```bash
# Setup
npm install

# Tests
npx vitest run --coverage
npx eslint src/ --format json
npx tsc --noEmit
```

### GitHub Extension

```bash
# Setup
gh auth status

# Tests
gh repo view owner/repo --json name,languages
gh pr view 42 --json title,files,state  # state is OPEN, CLOSED, or MERGED
gh run list --limit 10
```

## CI/CD Integration

```yaml
# .github/workflows/extension-evaluation.yml
name: Extension Evaluation

on: [push, pull_request]

jobs:
  python-extension:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Validate Extension
        run: |
          test -f "agentic/code/frameworks/sdlc-complete/extensions/python/extension.json"
          for skill in pytest-runner venv-manager; do
            test -f "agentic/code/frameworks/sdlc-complete/extensions/python/skills/$skill/SKILL.md"
          done

  javascript-extension:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Validate Extension
        run: |
          test -f "agentic/code/frameworks/sdlc-complete/extensions/javascript/extension.json"
          for skill in vitest-runner eslint-checker; do
            test -f "agentic/code/frameworks/sdlc-complete/extensions/javascript/skills/$skill/SKILL.md"
          done

  github-extension:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate Extension
        run: |
          test -f "agentic/code/frameworks/sdlc-complete/extensions/github/extension.json"
          for skill in repo-analyzer pr-reviewer; do
            test -f "agentic/code/frameworks/sdlc-complete/extensions/github/skills/$skill/SKILL.md"
          done
```

## Metrics

### Extension Quality Score

```
Structure (30 points)
- extension.json valid: 10
- All skills present: 10
- Agent present: 5
- Templates present: 5

Skills (40 points)
- SKILL.md per skill: 10
- Grounding checkpoints: 10
- Recovery protocols: 10
- Context scopes: 10

Integration (30 points)
- Orchestrator functional: 15
- Cross-skill flow: 10
- Checkpoint support: 5

Total: 100 points
PASS: ≥80 | WARN: 60-79 | FAIL: <60
```

## Revision History

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2025-01-15 | Initial evaluation plan |
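## Appendix: Quality Score Sketch

The Extension Quality Score rubric in the Metrics section is simple arithmetic, so it may help to see it as code. The following is a minimal sketch, not part of the evaluated extensions. The point values and thresholds come from the rubric, but the criterion key names, the `quality_score` and `verdict` functions, and the all-or-nothing treatment of each criterion are assumptions made for illustration (the plan does not say whether partial credit is allowed).

```python
# Point values copied from the Extension Quality Score rubric.
# The dictionary keys are hypothetical names chosen for this sketch.
RUBRIC = {
    "structure": {
        "extension_json_valid": 10,
        "all_skills_present": 10,
        "agent_present": 5,
        "templates_present": 5,
    },
    "skills": {
        "skill_md_per_skill": 10,
        "grounding_checkpoints": 10,
        "recovery_protocols": 10,
        "context_scopes": 10,
    },
    "integration": {
        "orchestrator_functional": 15,
        "cross_skill_flow": 10,
        "checkpoint_support": 5,
    },
}


def quality_score(passed):
    """Sum the points of every rubric criterion named in `passed`.

    Assumption: each criterion is all-or-nothing (no partial credit).
    """
    return sum(
        points
        for criteria in RUBRIC.values()
        for name, points in criteria.items()
        if name in passed
    )


def verdict(score):
    """Map a score to the rubric's bands: PASS >= 80, WARN 60-79, FAIL < 60."""
    if score >= 80:
        return "PASS"
    if score >= 60:
        return "WARN"
    return "FAIL"


if __name__ == "__main__":
    # Example: an extension with no templates and no checkpoint support.
    all_criteria = {name for criteria in RUBRIC.values() for name in criteria}
    passed = all_criteria - {"templates_present", "checkpoint_support"}
    score = quality_score(passed)
    print(score, verdict(score))  # prints: 90 PASS
```

Keeping the rubric as data rather than hard-coded arithmetic means a change to a point value needs only one edit, and the category subtotals (30, 40, 30) can be checked against the plan directly.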