@blundergoat/goat-flow
AI coding agent harness and local dashboard for Claude Code, OpenAI Codex, Google Antigravity, and GitHub Copilot - setup audits, guardrails, structured skills, deny hooks, and persistent learning loops.
# Skill Quality Config
goat-flow ships a zero-config skill-quality rubric. Consumer projects can override parts of it through `.goat-flow/config.yaml` under `quality:` without editing `workflow/manifest.json`.
Use config only when the project has real conventions that differ from goat-flow defaults: custom artifact roots, different verification-gate vocabulary, project-specific subtype profiles, or extra fixture corpora.
## Minimal Example
```yaml
# .goat-flow/config.yaml
quality:
  gate-vocabulary:
    verification-gate:
      - BLOCKING GATE
      - Release Gate
      - SLO Gate
  additional-fixtures:
    - test/fixtures/skill-quality/team-expected-scores.json
```
With no `quality:` block, the default rubric is unchanged.
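The zero-config behaviour can be pictured as a resolver that returns the defaults untouched when no overrides exist. This is only a sketch: `resolveQuality`, the `DEFAULTS` values, and the assumption that a key set in config replaces the default list (rather than extending it) are all hypothetical and not taken from goat-flow's source.

```typescript
// Hypothetical shapes: only the two keys from the minimal example are modelled.
interface QualityConfig {
  "gate-vocabulary": { "verification-gate": string[] };
  "additional-fixtures": string[];
}

interface QualityOverrides {
  "gate-vocabulary"?: { "verification-gate"?: string[] };
  "additional-fixtures"?: string[];
}

// Placeholder defaults for illustration, not goat-flow's real values.
const DEFAULTS: QualityConfig = {
  "gate-vocabulary": { "verification-gate": ["BLOCKING GATE"] },
  "additional-fixtures": [],
};

// With no overrides the defaults come back unchanged; otherwise only the
// keys the project actually set differ from the defaults.
// Assumption: a key that is set replaces the default list outright.
function resolveQuality(overrides?: QualityOverrides): QualityConfig {
  if (!overrides) return DEFAULTS;
  return {
    "gate-vocabulary": {
      "verification-gate":
        overrides["gate-vocabulary"]?.["verification-gate"] ??
        DEFAULTS["gate-vocabulary"]["verification-gate"],
    },
    "additional-fixtures":
      overrides["additional-fixtures"] ?? DEFAULTS["additional-fixtures"],
  };
}
```

Calling `resolveQuality(undefined)` models a config file without a `quality:` block, while passing only `additional-fixtures` leaves the gate vocabulary at its default.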
## Custom Existing Subtype
Customize an existing subtype when the project has a repeatable artifact shape that should adjust goat-flow's default scoring or detection rules. The current evaluator supports these subtype keys: `workflow`, `dispatcher`, `report`, `playbook`, `index`, and `meta`.
```yaml
quality:
  subtypes:
    report:
      detection:
        kinds:
          - skill
        heading-patterns:
          - "^## Audit Mode\\b"
        must-not-have:
          - "^## Step 0\\b"
        name-patterns: []
      profile:
        trigger-clarity: 15
        workflow-completeness: 5
        gate-quality: 10
        evidence-testability: 10
        cold-start: 10
        token-cost: 10
        tool-deps: 5
        write-risk: 0
        skill-reference-fit: 10
      notes: "Audit-only skills score like reports but use a domain-specific Audit Mode marker."
```
Subtype detection contributes to the classification confidence shown in the dashboard. `subtype` remains the applied scoring profile, so existing reports and fixtures keep a stable contract. Newer reports may also include additive shape fields such as `detectedShape`, `shapeConfidence`, and `shapeMismatch`; these describe what the content reads like without changing the profile used for scoring. For example, an uploaded skill file can keep `subtype: workflow` while reporting `detectedShape: playbook` when the content is really a runbook. A high structure score with low subtype confidence still returns `consider-reclassifying`.
Fallback-only subtype matches are intentionally low confidence. If a subtype only matched because it is the default fallback for a kind, the evaluator must not report that as certain.
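The interplay between detection rules and fallback confidence might look roughly like the sketch below. Everything here is illustrative: `classify`, the `Match` shape, the two-level confidence, the choice of `workflow` as the fallback, and the use of the multiline regex flag are assumptions, not the evaluator's actual implementation.

```typescript
// Hypothetical rule shape mirroring two of the `detection:` keys.
interface DetectionRule {
  headingPatterns: string[]; // regex sources; at least one must match
  mustNotHave: string[]; // regex sources; none may match
}

interface Match {
  subtype: string;
  confidence: "high" | "low";
  viaFallback: boolean;
}

function classify(
  content: string,
  rules: Record<string, DetectionRule>,
  fallback: string,
): Match {
  for (const [subtype, rule] of Object.entries(rules)) {
    // The "m" flag lets ^ anchor at the start of every line, not just the file.
    const hit = rule.headingPatterns.some((src) => new RegExp(src, "m").test(content));
    const blocked = rule.mustNotHave.some((src) => new RegExp(src, "m").test(content));
    if (hit && !blocked) {
      return { subtype, confidence: "high", viaFallback: false };
    }
  }
  // No rule matched: the fallback applies, but it is never reported as certain.
  return { subtype: fallback, confidence: "low", viaFallback: true };
}

const rules: Record<string, DetectionRule> = {
  report: {
    headingPatterns: ["^## Audit Mode\\b"],
    mustNotHave: ["^## Step 0\\b"],
  },
};

const audit = classify("# Skill\n## Audit Mode\nRead-only checks.", rules, "workflow");
const plain = classify("# Skill\nSome notes.", rules, "workflow");
```

Here `audit` matches the `report` rule directly, while `plain` only lands on the fallback and therefore carries low confidence.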
## Supported Keys
| Key | Purpose |
|---|---|
| `walk-roots.skills` | Skill directories to inventory. |
| `walk-roots.references` | Reference directories to inventory. |
| `composition` | Shared preamble/conventions paths, skill-reference-fit regex, and composed byte cap. |
| `gate-vocabulary` | Regex sources for verification gates, explicit pass/fail language, and human-stop language. |
| `tool-keywords-regex` | Regex source for external tool dependencies. |
| `subtypes` | Detection rules, metric profile caps, and notes for artifact subtypes. |
| `fixture-path` | Primary expected-score fixture for the current project. |
| `additional-fixtures` | Extra expected-score fixtures for consumer corpora. |
## Guardrails
- Keep config project-scoped. Do not store consumer overrides in `workflow/manifest.json`.
- Add a fixture when changing subtype profiles so score drift is intentional.
- Prefer extending defaults. Replace defaults only when the project convention is incompatible.
- Avoid brittle provider-specific regexes unless a real project artifact needs them.
- Keep the scoring rubric portable. Generic or uploaded skills must earn cold-start and evidence credit through explicit context, prerequisites, gates, and evidence rules; they should not be required to reference goat-flow's shared preamble unless they are installed goat-flow skills that actually inherit it.
- Browser, MCP, and GitHub CLI dependencies count as external tools. Defaults include `browser-use`, `Playwright MCP`, `browser_*` commands, `mcp__*` tool names, and `gh`; ordinary shell/runtime commands such as `npm`, `git`, `node`, or `bash` do not trigger tool-dependency deductions by themselves.
- Do not cite gitignored task, scratchpad, or log paths from committed fixtures or docs. When a local artifact exposes a useful failure shape, sanitize it into tracked test content without private domains, accounts, credentials, or `.goat-flow/plans/**` references.
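As an illustration of the external-tool guardrail above, the default keywords could be matched with a single regex. The pattern and the `usesExternalTool` helper are hypothetical, assembled only from the names listed in that bullet; the shipped `tool-keywords-regex` default may be written differently.

```typescript
// Hypothetical regex built from the listed defaults: browser-use,
// Playwright MCP, browser_* commands, mcp__* tool names, and gh.
const TOOL_KEYWORDS = /\b(?:browser-use|Playwright MCP|browser_\w+|mcp__\w+|gh)\b/;

// Returns true when a line of skill text mentions an external tool.
// Ordinary shell/runtime commands (npm, git, node, bash) are not in the
// pattern, so they do not count on their own.
function usesExternalTool(line: string): boolean {
  return TOOL_KEYWORDS.test(line);
}

const samples = [
  "gh pr list --state open",
  "call mcp__github__search_issues",
  "npm run build && git push",
  "node script.js inside bash",
];

for (const line of samples) {
  console.log(`${usesExternalTool(line)}\t${line}`);
}
```

The word boundaries keep `gh` from matching inside longer words such as `high`, which is one way to follow the advice about avoiding brittle patterns.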
## Verification
Run the focused tests after config changes:
```bash
node --import tsx --test test/unit/quality-config.test.ts
node --import tsx --test test/unit/skill-quality/*.test.ts
```