@blundergoat/goat-flow

Version:

AI coding agent harness and local dashboard for Claude Code, OpenAI Codex, Google Antigravity, and GitHub Copilot - setup audits, guardrails, structured skills, deny hooks, and persistent learning loops.

github.com/blundergoat/goat-flow

blundergoat/goat-flow

87 lines (69 loc) • 4.58 kB

Markdown

View Raw

# Skill Quality Config goat-flow ships a zero-config skill-quality rubric. Consumer projects can override parts of it through `.goat-flow/config.yaml` under `quality:` without editing `workflow/manifest.json`. Use config only when the project has real conventions that differ from goat-flow defaults: custom artifact roots, different verification-gate vocabulary, project-specific subtype profiles, or extra fixture corpora. ## Minimal Example ```yaml # .goat-flow/config.yaml quality: gate-vocabulary: verification-gate: - BLOCKING GATE - Release Gate - SLO Gate additional-fixtures: - test/fixtures/skill-quality/team-expected-scores.json ``` With no `quality:` block, the default rubric is unchanged. ## Custom Existing Subtype Customize an existing subtype when the project has a repeatable artifact shape that should adjust goat-flow's default scoring or detection rules. The current evaluator supports these subtype keys: `workflow`, `dispatcher`, `report`, `playbook`, `index`, and `meta`. ```yaml quality: subtypes: report: detection: kinds: - skill heading-patterns: - "^## Audit Mode\\b" must-not-have: - "^## Step 0\\b" name-patterns: [] profile: trigger-clarity: 15 workflow-completeness: 5 gate-quality: 10 evidence-testability: 10 cold-start: 10 token-cost: 10 tool-deps: 5 write-risk: 0 skill-reference-fit: 10 notes: "Audit-only skills score like reports but use a domain-specific Audit Mode marker." ``` Subtype detection contributes to the classification confidence shown in the dashboard. `subtype` remains the applied scoring profile, so existing reports and fixtures keep a stable contract. Newer reports may also include additive shape fields such as `detectedShape`, `shapeConfidence`, and `shapeMismatch`; these describe what the content reads like without changing the profile used for scoring. For example, an uploaded skill file can keep `subtype: workflow` while reporting `detectedShape: playbook` when the content is really a runbook. A high structure score with low subtype confidence still returns `consider-reclassifying`. Fallback-only subtype matches are intentionally low confidence. If a subtype only matched because it is the default fallback for a kind, the evaluator must not report that as certain. ## Supported Keys | Key | Purpose | |---|---| | `walk-roots.skills` | Skill directories to inventory. | | `walk-roots.references` | Reference directories to inventory. | | `composition` | Shared preamble/conventions paths, skill-reference-fit regex, and composed byte cap. | | `gate-vocabulary` | Regex sources for verification gates, explicit pass/fail language, and human-stop language. | | `tool-keywords-regex` | Regex source for external tool dependencies. | | `subtypes` | Detection rules, metric profile caps, and notes for artifact subtypes. | | `fixture-path` | Primary expected-score fixture for the current project. | | `additional-fixtures` | Extra expected-score fixtures for consumer corpora. | ## Guardrails - Keep config project-scoped. Do not store consumer overrides in `workflow/manifest.json`. - Add a fixture when changing subtype profiles so score drift is intentional. - Prefer extending defaults. Replace defaults only when the project convention is incompatible. - Avoid brittle provider-specific regexes unless a real project artifact needs them. - Keep the scoring rubric portable. Generic or uploaded skills must earn cold-start and evidence credit through explicit context, prerequisites, gates, and evidence rules; they should not be required to reference goat-flow's shared preamble unless they are installed goat-flow skills that actually inherit it. - Browser, MCP, and GitHub CLI dependencies count as external tools. Defaults include `browser-use`, `Playwright MCP`, `browser_*` commands, `mcp__*` tool names, and `gh`; ordinary shell/runtime commands such as `npm`, `git`, `node`, or `bash` do not trigger tool-dependency deductions by themselves. - Do not cite gitignored task, scratchpad, or log paths from committed fixtures or docs. When a local artifact exposes a useful failure shape, sanitize it into tracked test content without private domains, accounts, credentials, or `.goat-flow/plans/**` references. ## Verification Run the focused tests after config changes: ```bash node --import tsx --test test/unit/quality-config.test.ts node --import tsx --test test/unit/skill-quality/*.test.ts ```