Update quality-playbook skill to v1.5.6 + add agent (#1402)

Rebuilds branch from upstream/staged (was previously merged from
upstream/main, which brought in materialized plugin files that
fail Check Plugin Structure on PRs targeting staged).

Changes vs. staged:
- Update skills/quality-playbook/ to v1.5.6 (31 bundled assets:
  SKILL.md + LICENSE.txt + 16 references/ + 9 phase_prompts/ +
  3 agents/ + bin/citation_verifier.py + quality_gate.py).
- Add agents/quality-playbook.agent.md (top-level orchestrator).
  name: quality-playbook (validator-compliant).
- Update docs/README.skills.md quality-playbook row description
  + bundled-assets list to v1.5.6.
- Fix 'unparseable' → 'unparsable' in quality_gate.py (5 instances;
  codespell preference, both spellings valid).

Closes the v1.4.0 → v1.5.6 update in a single clean commit on top of
upstream/staged. The preserved backup branch backup-bedbe84-pre-rebuild
(SHA bedbe848fa3c0f0eda8e653c42b599a17dd2e354) holds the prior history for reference.
Author: Andrew Stellman
Date: 2026-05-10 21:31:53 -04:00
Committed by: GitHub
Parent: e7755069e9
Commit: b8441d218b
32 changed files with 9639 additions and 543 deletions
@@ -0,0 +1,47 @@
# phase_prompts/
Externalized phase prompt bodies for the Quality Playbook.
v1.5.4 F-1 (Bootstrap_Findings 2026-04-30) extracted these from
`bin/run_playbook.py`'s inline string templates so both execution
modes — UI-context skill-direct (a coding agent walking through
SKILL.md inline) and CLI-automation runner-driven (`python -m
bin.run_playbook`) — read from the same single source of truth.
Without externalization the two modes drift; with it, an edit to a
phase prompt lands once and benefits both.
## File layout
- `phase1.md` ... `phase6.md` — one file per pipeline phase. Loaded
by `bin/run_playbook.py::_load_phase_prompt`.
- `single_pass.md` — the legacy single-prompt invocation (used when
the operator wants the LLM to drive all six phases inline rather
than via the per-phase orchestrator).
- `iteration.md` — the iteration-strategy prompt (gap, unfiltered,
parity, adversarial — see `bin/run_playbook.py::next_strategy`).
## Substitution conventions
Most files are pure-literal markdown — the loader returns them
unchanged. Three files use `str.format()` substitution with named
placeholders:
- `phase1.md` — `{seed_instruction}` (skip Phase 0/0b prelude when
`--no-seeds`) and `{role_taxonomy}` (rendered from
`bin.role_map.ROLE_DESCRIPTIONS`).
- `single_pass.md` — `{skill_fallback_guide}` and
`{seed_instruction}`.
- `iteration.md` — `{skill_fallback_guide}` and `{strategy}`.
Inside files that go through `.format()`, JSON braces and other
literal `{` / `}` characters MUST be doubled (`{{` / `}}`) per
Python's format-string escaping rules. Pure-literal files do not
need any escaping.
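The escaping rule can be seen in a minimal loader sketch (the function name here is illustrative; the real loader is `bin/run_playbook.py::_load_phase_prompt`):

```python
from pathlib import Path

# Illustrative loader sketch, not the real _load_phase_prompt.
def load_prompt(path: Path, **subs: str) -> str:
    text = path.read_text(encoding="utf-8")
    # Pure-literal files take no substitutions and pass through unchanged;
    # files with placeholders go through str.format().
    return text.format(**subs) if subs else text

# Doubling renders a literal brace: "{{" becomes "{", "}}" becomes "}".
template = 'Run the {strategy} strategy. Emit JSON like {{"ok": true}}.'
assert template.format(strategy="parity") == \
    'Run the parity strategy. Emit JSON like {"ok": true}.'
```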
## Editing discipline
When you change a phase prompt, the loader picks up the new content
at the next invocation — there is no caching layer to invalidate. The
test suite at `bin/tests/test_phase_prompts_externalized.py` pins the
loader's contract; if you add a new substitution variable, extend
those tests.
@@ -0,0 +1 @@
{skill_fallback_guide} Run the next iteration using the {strategy} strategy. Any updates to quality/PROGRESS.md must keep the existing phase tracker in checkbox format (`- [x] Phase N - <name>`) — do not rewrite it as a table. The orchestrator appends `## Iteration: <strategy> started/complete` sections itself; iteration work should not touch the existing phase tracker lines.
@@ -0,0 +1,229 @@
You are a quality engineer. {skill_fallback_guide} For this phase read ONLY the sections up through Phase 1 (stop at the "---" line before "Phase 2"). Also read the reference files (under whichever references/ directory matches the install path you resolved) that are relevant to exploration.
{seed_instruction}
Execute Phase 1: Explore the codebase. The reference_docs/ directory contains gathered documentation - read it to supplement your exploration. Top-level files are Tier 4 context (AI chats, design notes, retrospectives). Files under reference_docs/cite/ are citable sources (project specs, RFCs). If reference_docs/ is missing or empty, proceed with Tier 3 evidence (source tree) alone and note this in EXPLORATION.md.
### MANDATORY FILE-ROLE TAGGING (v1.5.4 Part 1)
Before (or as part of) writing EXPLORATION.md, produce `quality/exploration_role_map.json`. Begin by reading `SKILL.md` at the repository root if present (also check for any other top-level skill-shaped entry file — the indicator is content + name, not extension; a `README.md` is NOT a skill-shaped entry just because it sits at the root). The prose context informs every subsequent file's role tag.
**File source (v1.5.4 Phase 3.6.1, codex-prevention).** Use `git ls-files` as the canonical file list when the target is a git repo — this respects `.gitignore` automatically and is the ONLY supported enumeration source. Do NOT use `os.walk`, `find`, `os.listdir`, or any recursive directory walker — those will pull in `.git/`, `.venv/`, `node_modules/`, build outputs, and vendored dependencies, all of which are FORBIDDEN in the role map (the validator rejects them and aborts the run). When the target is not a git repo, use a filesystem walk that explicitly skips the disallowed paths listed below; record this fallback in the role map's `provenance` field.
**Disallowed paths (MUST NOT appear in the role map under any role):** `.git/`, `.venv/`, `venv/`, `node_modules/`, `__pycache__/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `.tox/`, plus any path with a component ending in `.egg-info` or `.dist-info`. The validator at `bin/role_map.py::DISALLOWED_PATH_PREFIXES` enforces this — if your role map contains any such path, the run aborts. There is also a hard ceiling of 2000 entries; a role map with more is treated as evidence Phase 1 walked .gitignored content.
**Provenance (v1.5.4 Phase 3.6.1).** The role map's top-level `provenance` field MUST be one of:
- `"git-ls-files"` — preferred. Target is a git repo; you ran `git ls-files` to enumerate.
- `"filesystem-walk-with-skips"` — fallback. Target is not a git repo; you walked the filesystem with explicit skips for every entry in the disallowed-paths list above.
- `"unknown"` — accepted only on legacy role maps; do NOT emit this for fresh runs.
For each in-scope file, emit a record with the role taxonomy below. The judgment is content-based: read the file (or enough of it to judge), do NOT pattern-match on extension or directory name alone.
**Sentinel files (v1.5.4 Phase 3.6.1).** Files named `.gitkeep` (or similar empty-directory markers) in the repository's tracked tree MUST NOT be deleted. They keep otherwise-empty directories present in git history. If you find such a file and don't understand its purpose, leave it alone. The pre-flight check verifies all `.gitignore !`-rule sentinels are present and aborts the run if any are missing.
**If you encounter a bug in QPB itself during this run** (e.g., an exception from `bin/run_playbook.py`, a missing import, a broken assertion in QPB source), STOP the run immediately and report:
1. The exact error and where it occurred (file:line + traceback)
2. A diagnosis of the likely root cause
3. A proposed fix shape (do NOT apply it)
Do NOT patch QPB source code yourself. QPB source changes go through Council review (see `~/Documents/AI-Driven Development/CLAUDE.md`). A structural backstop captures the QPB source tree's git SHA at run start and verifies it unchanged at every phase boundary; an autonomous source patch will fail the gate with a diagnostic naming the modified files.
Role taxonomy (single source of truth: `bin/role_map.py::ROLE_DESCRIPTIONS`):
{role_taxonomy}
If a file genuinely doesn't fit any of these, you may add a new role — but document the addition in your role map's first entry as a comment-style rationale.
The output file `quality/exploration_role_map.json` MUST conform to this schema:
```
{{
"schema_version": "1.0",
"timestamp_start": "<ISO 8601 UTC timestamp at the start of Phase 1>",
"provenance": "git-ls-files",
"files": [
{{
"path": "<repo-relative POSIX path>",
"role": "<one of the role taxonomy values>",
"size_bytes": <int>,
"rationale": "<one or two sentences justifying the tag, content-based>"
}}
// ... one entry per in-scope file. When role == "skill-tool", also
// include a "skill_prose_reference" string pointing at the SKILL.md /
// reference-file location that names this script (e.g., "SKILL.md:47"
// or "references/forms.md:section-3"); the prose-to-code divergence
// check in Phase 4 reads this back to find the cited prose.
]
}}
```
**You only produce `files[]` and `provenance`.** The two mechanically-derivable fields — `breakdown` and `summary` — are computed by the runner between Phase 1 LLM exit and the Phase 2 entry-gate (v1.5.6 cluster 047 architectural fix). The runner calls `bin.role_map.compute_breakdown(files)` and `bin.role_map.summarize_role_map(...)` and writes the canonical values into the on-disk file before validation. Don't include `breakdown` or `summary` in your output — even if you do, the runner will overwrite them. Your job is the analytical work (per-file role tagging in `files[]` plus `provenance`); the deterministic aggregations are runner-owned. (Pre-v1.5.6 the LLM was instructed to compute these too, which produced a class of failures where the LLM reverted to intuitive summarization that drifted from the strict mechanical contract; runner-side computation removes the failure mode.)
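For illustration, a sketch of the runner-owned aggregation, under the assumption that `breakdown` groups a count and byte total per role (the real contract lives in `bin/role_map.py`; brace literals are deliberately avoided because this file goes through `.format()`):

```python
from collections import Counter

# Sketch only; the exact breakdown shape is an assumption, and the
# canonical implementation is bin.role_map.compute_breakdown.
def compute_breakdown(files):
    counts = Counter(f["role"] for f in files)
    sizes = Counter()
    for f in files:
        sizes[f["role"]] += f["size_bytes"]
    return dict((role, dict(count=counts[role], size_bytes=sizes[role]))
                for role in sorted(counts))

files = [
    dict(path="bin/a.py", role="code", size_bytes=100),
    dict(path="SKILL.md", role="skill-prose", size_bytes=50),
    dict(path="bin/b.py", role="code", size_bytes=40),
]
assert compute_breakdown(files)["code"] == dict(count=2, size_bytes=140)
```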
Tagging discipline:
1. `skill-tool` vs. `code` is the load-bearing distinction. A script is only `skill-tool` if SKILL.md (or a doc SKILL.md cites) explicitly names it and tells the agent to invoke it. Independent code modules — even small ones in a `scripts/` directory — are `code` if no SKILL.md prose directs the agent to use them.
2. Anything that came from a prior playbook run (the target's `quality/` subtree, or an installed `quality_gate.py` from QPB itself — the file the installer copies next to SKILL.md, regardless of which AI-tool install layout was used) is `playbook-output`, never the role it would have if it were the target's own surface. This prevents the v1.5.3 LOC-pollution failure mode where a target's apparent code surface was inflated by QPB's own infrastructure.
3. If SKILL.md is absent at the root and no other skill-shaped entry file exists, the role map will have zero `skill-prose` entries. That's fine — the four-pass derivation pipeline will no-op for this target.
Handling edge cases (v1.5.4 Phase 1 edge-case discipline):
- **No SKILL.md at root, no other skill-shaped entry.** Tag every file by content as usual. The role map will carry zero `skill-prose` and `skill-reference` entries; the four-pass pipeline will no-op. Do NOT invent a synthetic SKILL.md or label something `skill-prose` for a project that genuinely has no skill surface.
- **SKILL.md references a script that does not exist.** Add a top-level `broken_references` array to the role map carrying `{{"prose_location": "<file>:<line>", "missing_script": "<path-as-cited>"}}` entries. Do NOT add a synthetic file entry for the missing script. Note the broken reference in EXPLORATION.md so Phase 4's prose-to-code divergence check can register it as a known gap. (This field is additive; the gate's role-map validator does not require it.)
- **Target with a very large file count (1000+).** Process in batches. The `files` array can grow incrementally as you walk the tree; once you've made all per-file judgments, write the file once. Do not write a partial role map mid-walk — the validator considers the file complete when it appears, and the runner-side `normalize_role_map_for_gate` step (v1.5.6 cluster 047) computes `breakdown` and `summary` after you exit Phase 1.
- **Ambiguous prose ("the helper script", "the validator").** Default to `code`. `skill-tool` requires an unambiguous citation: SKILL.md or a referenced doc must name the file (or a path-suffix that uniquely identifies it) AND direct the agent to invoke it. When in doubt, tag `code` and capture the ambiguity in `rationale` — it's better to under-tag `skill-tool` than to inflate the surface area Phase 4's prose-to-code check operates on.
- **Generated files (build outputs, vendored dependencies, lockfiles).** Skip them at the ignore-rule layer; do not include them in the role map. If you can't tell whether a file is generated, look for a generation marker (header comment naming the generator, sibling `.generated` file, presence in `.gitignore`); if generated, omit from the role map.
When Phase 1 is complete, write your full exploration findings to
`quality/EXPLORATION.md`. The file MUST contain ALL of the following
section titles VERBATIM (the Phase 1 gate at SKILL.md:1257-1273 enforces
each mechanically; `bin/run_state_lib.validate_phase_artifacts(quality_dir, phase=1)`
is the programmatic enforcer — your artifact has to pass it before
Phase 2 will start). The exact titles are load-bearing — do NOT
substitute "equivalent" headings:
1. `## Open Exploration Findings` — at least 8 numbered entries
(`1.`, `2.`, ...). Each entry has at least one file:line citation
in the body (e.g., `bin/foo.py:120-135`). At least 3 of these
entries trace behavior across 2 or more distinct file:line
locations (multi-location traces — the entry cites two or more
different file:line ranges).
2. `## Quality Risks` — domain-knowledge risk analysis. Numbered or
bulleted; cite file:line where risks are concretely visible in
code or docs.
3. `## Pattern Applicability Matrix` — a Markdown table with one row
per exploration pattern from `references/exploration_patterns.md`.
Decision column values are `FULL` or `SKIP`. Between 3 and 4
patterns must be marked `FULL` (inclusive — the gate rejects
below 3 because exploration didn't pick enough patterns, and
above 4 because exploration ran every pattern instead of
selecting). Skipped patterns are still listed with `SKIP` and a
brief reason, so the matrix is exhaustive.
4. `## Pattern Deep Dive — <pattern-name>` — at least 3 sections,
one per `FULL` pattern. Each deep dive enumerates concrete
findings with file:line citations. At least 2 of these sections
trace code paths across 2 or more distinct identifiers (e.g.,
backtick-quoted function or symbol names like `\`docs_present\``,
`\`_evaluate_documentation_state\``) OR across 2 or more distinct
file:line locations — that's how the gate detects "multi-function
trace" rather than a one-anchor finding.
5. `## Candidate Bugs for Phase 2` — numbered list of bug
hypotheses promoted from the deep dives + open exploration. Each
entry has a `Stage:` line attributing the source (e.g., `Stage:
open exploration`, `Stage: quality risks`, or
`Stage: <Pattern Name>`). At least 2 entries must be sourced from
`open exploration` / `quality risks` AND at least 1 entry must be
sourced from a pattern deep dive. Combo stages
(`Stage: open exploration + Cross-Implementation Consistency`)
count toward both buckets.
6. `## Gate Self-Check` — proves you ran the Phase 1 gate. List each
of the 13 checks (≥120 lines + six required headings + ≥3 Pattern
Deep Dive sections + PROGRESS.md mark + ≥8 findings with citations
+ ≥3 multi-location findings + 3-4 FULL pattern matrix rows + ≥2
multi-function deep dives + candidate-bug source mix) and mark
whether the artifact satisfies each.
In addition, ensure `quality/PROGRESS.md` exists and its Phase 1
line is marked `[x]` (the gate's check 8) before declaring Phase 1
complete.
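A minimal sketch of what a verbatim-title check plausibly looks like; the authoritative enforcer is `bin/run_state_lib.validate_phase_artifacts(quality_dir, phase=1)`, and everything below is an assumption about its shape, not its code:

```python
# Illustrative only; set/dict literals avoided since this file is a
# format() template.
REQUIRED_TITLES = [
    "## Open Exploration Findings",
    "## Quality Risks",
    "## Pattern Applicability Matrix",
    "## Candidate Bugs for Phase 2",
    "## Gate Self-Check",
]

def missing_titles(exploration_md):
    lines = set(ln.strip() for ln in exploration_md.splitlines())
    missing = [t for t in REQUIRED_TITLES if t not in lines]
    # Deep dives match on the exact prefix, one section per FULL pattern.
    deep_dives = sum(ln.startswith("## Pattern Deep Dive — ") for ln in lines)
    if deep_dives < 3:
        missing.append("at least 3 Pattern Deep Dive sections")
    return missing
```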
The exploration content the prior versions of this prompt asked for
(domain and stack identification, architecture map, existing test
inventory, specification summary, skeleton/dispatch analysis,
derived requirements `REQ-NNN`, derived use cases `UC-NN`,
file-role tagging summary) lives WITHIN these required sections —
for example, the architecture map and module enumeration belong
under `## Open Exploration Findings` as multi-location findings;
the file-role tagging summary and the `exploration_role_map.json`
breakdown summary belong under `## Open Exploration Findings` or
`## Quality Risks` as analytical content; derived REQ-NNN and UC-NN
sections may appear after `## Gate Self-Check` as additional
analytical material the playbook downstream phases consume. Do NOT
use these alternative names as TOP-level section titles — the gate
requires the six exact titles above and the Pattern Deep Dive
prefix; additional `## ` sections beyond these are tolerated for
analytical extension but the six gate-required titles MUST appear
verbatim.
### MANDATORY CARTESIAN UC RULE (Lever 1, v1.5.2)
For every requirement with a `References` field naming ≥2 files (or ≥2 file:line ranges in distinct files), apply the **Cartesian eligibility check** before deciding whether to emit a single umbrella UC or per-site UCs:
**Gate 1 — Path-suffix match.** At least two references must share a path-suffix role: the last segment before the extension, or a matching function-name pattern that appears across the files.
- Example of a match: `virtio_mmio.c`, `virtio_vdpa.c`, `virtio_pci_modern.c` all implement `_finalize_features`. The `_finalize_features` function is the shared role.
- Example of a non-match: `CONFIG_FOO`, `CONFIG_BAR` flags in the same kconfig file — same kind of thing, but not parallel implementations.
**Gate 2 — Function-level similarity.** Each matching reference must cite a line range of similar size (within 2× of the median) and each range must be inside a function body — not a file-header, a kconfig block, or a macro expansion list.
**Decision:**
- **Both gates pass →** emit one UC per site, numbered `UC-N.a`, `UC-N.b`, `UC-N.c`, … Each per-site UC has its own Actors, Preconditions, Flow, Postconditions. The parent REQ-N remains as the umbrella.
- **Only Gate 1 passes →** keep a single umbrella UC and mark the reference cluster `heterogeneous` in a `<!-- cluster: heterogeneous -->` HTML comment in the UC body. Phase 3 can still override if it finds per-site divergence.
- **Neither gate passes →** single umbrella UC, no special marking.
### Worked example — REQ-010 / VIRTIO_F_RING_RESET (virtio)
Suppose Phase 1 derives:
### REQ-010: Virtio transports must honor VIRTIO_F_RING_RESET negotiation
- References: drivers/virtio/virtio_mmio.c, drivers/virtio/virtio_vdpa.c, drivers/virtio/virtio_pci_modern.c
- Pattern: whitelist
Applying the Cartesian check:
- Gate 1: all three files contain `_finalize_features` functions — matches.
- Gate 2: each cited range is inside a function body of similar size — matches.
Both gates pass → emit per-site UCs:
### UC-10.a: VIRTIO_F_RING_RESET on PCI modern transport
- Actors: virtio_pci_modern driver, guest kernel
- Preconditions: device advertises VIRTIO_F_RING_RESET
- Flow: vp_modern_finalize_features propagates bit through config space …
- Postconditions: feature_bit reflected in final config
### UC-10.b: VIRTIO_F_RING_RESET on MMIO transport
- Actors: virtio_mmio driver, guest kernel
- Preconditions: device advertises VIRTIO_F_RING_RESET
- Flow: vm_finalize_features must mirror PCI modern behavior …
- Postconditions: feature_bit survives finalize call
### UC-10.c: VIRTIO_F_RING_RESET on vDPA transport
- Actors: virtio_vdpa driver, vdpa device backend
- Preconditions: device advertises VIRTIO_F_RING_RESET
- Flow: virtio_vdpa_finalize_features forwards through set_driver_features …
- Postconditions: feature_bit visible to vdpa backend
### CONFIRMATION CHECKLIST (Cartesian UC rule)
Before completing Phase 1, confirm each item explicitly in EXPLORATION.md under a section titled "Cartesian UC rule confirmation":
1. For every REQ with ≥2 References, I ran Gate 1 (path-suffix match).
2. For every REQ that passed Gate 1, I ran Gate 2 (function-level similarity).
3. Where both gates passed, I emitted per-site UCs (UC-N.a, UC-N.b, …).
4. Where only Gate 1 passed, I marked the cluster `<!-- cluster: heterogeneous -->`.
5. Where neither gate passed, I kept a single umbrella UC without marking.
6. For each REQ with a pattern match in Gate 1, I added `Pattern: whitelist|parity|compensation` to the REQ block.
Also initialize quality/PROGRESS.md with the run metadata and the phase tracker in the EXACT checkbox format below. This format is a hard contract: the Phase 5 gate checks for the substring `- [x] Phase 4` before allowing reconciliation to start, and it only matches the checkbox form. Do NOT substitute a Markdown table, bulleted prose, or any other layout — table-format runs have aborted mid-pipeline because the gate does not see "Complete" in a table cell as equivalent.
Template for the phase tracker section of PROGRESS.md (fill in the Skill version from SKILL.md metadata):
```
# Quality Playbook Progress
Skill version: <vX.Y.Z>
Date: <YYYY-MM-DD>
## Phase tracker
- [x] Phase 1 - Explore
- [ ] Phase 2 - Generate
- [ ] Phase 3 - Code Review
- [ ] Phase 4 - Spec Audit
- [ ] Phase 5 - Reconciliation
- [ ] Phase 6 - Verify
```
As each later phase completes it will flip its own `- [ ]` to `- [x]` — keep the line text (including the phase name after the dash) stable so substring matching in the Phase 5 gate and downstream tooling works.
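The contract reduces to a literal substring match, which a one-line sketch makes concrete (the function name is illustrative, not the gate's actual API):

```python
# Sketch of the Phase 5 entry gate's tracker check: a plain substring
# match, which is why the checkbox line text must stay byte-stable.
def phase4_complete(progress_md):
    return "- [x] Phase 4" in progress_md

assert phase4_complete("- [x] Phase 4 - Spec Audit")
assert not phase4_complete("| Phase 4 | Complete |")  # table form never matches
```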
IMPORTANT: Do NOT proceed to Phase 2. Your only job is exploration and writing findings to disk. Write thorough, detailed findings - the next phase will read EXPLORATION.md to generate artifacts, so everything important must be captured in that file.
@@ -0,0 +1,27 @@
{skill_fallback_guide}
You are a quality engineer continuing a phase-by-phase quality playbook run. Phase 1 (exploration) is already complete.
Read these files to get context:
1. quality/EXPLORATION.md - your Phase 1 findings (requirements, risks, architecture)
2. quality/PROGRESS.md - run metadata and phase status
3. SKILL.md - read the Phase 2 section (from "Phase 2: Generate the Quality Playbook" through the "Checkpoint: Update PROGRESS.md after artifact generation" section). Also read the reference files cited in that section. Resolve SKILL.md and reference files via the documented fallback list above; do NOT assume any single install layout (`.github/skills/`, `.claude/skills/quality-playbook/`, `.cursor/skills/quality-playbook/`, `.continue/skills/quality-playbook/`, or root).
**Field preservation rule (v1.5.2, Lever 2).** When transcribing REQ hypotheses from EXPLORATION.md into `quality/REQUIREMENTS.md` and `quality/requirements_manifest.json`, every `- Pattern: <value>` field present on the source hypothesis MUST appear on the corresponding REQ in both output files. Pattern values are `whitelist | parity | compensation`. Phase 1's Cartesian UC rule (confirmation checklist item 6) requires Pattern tagging for every REQ where both UC gates match; Phase 2 must not silently drop these tags. If a hypothesis lacks Pattern but you believe it should have one (per-site UCs emitted with `UC-N.a`/`UC-N.b` suffixes, multi-file `References` suggesting a parallel structure), add Pattern during Phase 2 — do not omit the field. The Phase 5 cardinality gate cannot enforce coverage on a REQ it doesn't know is pattern-tagged; silent omission is a documented v1.4.5-regression vector.
Execute Phase 2: Generate all quality artifacts. Use the exploration findings in EXPLORATION.md as your source - do not re-explore the codebase from scratch. Generate:
- quality/QUALITY.md (quality constitution)
- quality/CONTRACTS.md (behavioral contracts)
- quality/REQUIREMENTS.md (with REQ-NNN and UC-NN identifiers from EXPLORATION.md)
- quality/COVERAGE_MATRIX.md
- Functional tests (quality/test_functional.*)
- quality/RUN_CODE_REVIEW.md (code review protocol)
- quality/RUN_INTEGRATION_TESTS.md (integration test protocol)
- quality/RUN_SPEC_AUDIT.md (spec audit protocol)
- quality/RUN_TDD_TESTS.md (TDD verification protocol)
- quality/COMPLETENESS_REPORT.md (baseline, without verdict)
- If dispatch/enumeration contracts exist: quality/mechanical/ with verify.sh and extraction artifacts. Run verify.sh immediately and save receipts.
Update PROGRESS.md: mark Phase 2 complete (use the checkbox format `- [x] Phase 2 - Generate` — do NOT switch to a table), update artifact inventory.
IMPORTANT: Do NOT proceed to Phase 3 (code review). Your job is artifact generation only. The next phase will execute the review protocols you generated.
@@ -0,0 +1,154 @@
{skill_fallback_guide}
You are a quality engineer continuing a phase-by-phase quality playbook run. Phases 1-2 are complete.
Read these files to get context:
1. quality/PROGRESS.md - run metadata, phase status, artifact inventory
2. quality/EXPLORATION.md - Phase 1 findings (especially the "Candidate Bugs for Phase 2" section)
3. quality/REQUIREMENTS.md - derived requirements and use cases
4. quality/CONTRACTS.md - behavioral contracts
5. SKILL.md - read the Phase 3 section ("Phase 3: Code Review and Regression Tests"). Also read references/review_protocols.md. Resolve SKILL.md and the references/ directory via the documented fallback list above; do NOT assume any single install layout.
Execute Phase 3: Code Review + Regression Tests.
Run the 3-pass code review per quality/RUN_CODE_REVIEW.md. For every confirmed bug:
- Add to quality/BUGS.md with ### BUG-NNN heading format
- Write a regression test (xfail-marked)
- Generate quality/patches/BUG-NNN-regression-test.patch (MANDATORY for every confirmed bug)
- Generate quality/patches/BUG-NNN-fix.patch (strongly encouraged)
- Write code review reports to quality/code_reviews/
- Update PROGRESS.md BUG tracker
### MANDATORY GRID STEP (Lever 2, v1.5.2) — pattern-tagged REQs only
For every REQ in quality/REQUIREMENTS.md that has a `Pattern:` field (`whitelist`, `parity`, or `compensation`), you MUST produce a compensation grid BEFORE writing any BUG entries for that REQ.
**Step 1. Enumerate the authoritative item set.** Mechanical extraction from source — uapi header, spec section, documented constants. Do NOT invent. Example: for VIRTIO_F_RING_RESET-family, grep `include/uapi/linux/virtio_config.h` for `VIRTIO_F_*` and list the bits the REQ covers.
**Step 2. Enumerate the sites.** From the REQ's per-site UCs (UC-N.a, UC-N.b, …). If the REQ has a single umbrella UC but is pattern-tagged, the grid is 1-dimensional over items.
**Step 3. Produce the grid.** Write `quality/compensation_grid.json` with one entry per REQ:
```json
{
"schema_version": "1.5.2",
"reqs": {
"REQ-010": {
"pattern": "whitelist",
"items": ["RING_RESET", "ADMIN_VQ", "NOTIF_CONFIG_DATA", "SR_IOV"],
"sites": ["PCI", "MMIO", "vDPA"],
"cells": [
{"cell_id": "REQ-010/cell-RING_RESET-PCI", "item": "RING_RESET", "site": "PCI", "present": true, "evidence": "drivers/virtio/virtio_pci_modern.c:XXX-YYY"},
{"cell_id": "REQ-010/cell-RING_RESET-MMIO", "item": "RING_RESET", "site": "MMIO", "present": false, "evidence": "drivers/virtio/virtio_mmio.c: no match for RING_RESET"}
]
}
}
}
```
Cell IDs are mechanical: `REQ-<N>/cell-<item>-<site>`. No whitespace, uppercase item/site identifiers where natural.
**Step 4. Apply the BUG-default rule.** For every cell where:
- the item is defined in authoritative source AND
- the item is absent from any shared filter AND
- the item is absent from the site's compensation path
→ the cell DEFAULTS to BUG. Emit one `### BUG-NNN` entry with the cell's file:line citation, spec basis, and expected-vs-actual behavior. Include a `- Covers: [REQ-N/cell-<item>-<site>]` line (see schemas.md §8 for the field contract).
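The default rule can be sketched mechanically; the grid's `present` flag stands in here for the three absence conditions combined, and the function name is an assumption:

```python
# Sketch of the BUG-default sweep for one REQ; field names follow the
# compensation_grid.json schema above.
def default_bug_cells(req):
    # Every absent cell defaults to BUG until a downgrade record exists.
    return [c["cell_id"] for c in req["cells"] if not c["present"]]

req = {
    "pattern": "whitelist",
    "cells": [
        {"cell_id": "REQ-010/cell-RING_RESET-PCI", "present": True},
        {"cell_id": "REQ-010/cell-RING_RESET-MMIO", "present": False},
    ],
}
assert default_bug_cells(req) == ["REQ-010/cell-RING_RESET-MMIO"]
```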
**Step 5. Downgrade to QUESTION requires a structured JSON record.** Append one record per downgraded cell to `quality/compensation_grid_downgrades.json`:
```json
{
"schema_version": "1.5.2",
"downgrades": [
{
"cell_id": "REQ-010/cell-RING_RESET-MMIO",
"authority_ref": "include/uapi/linux/virtio_config.h:116",
"site_citation": "drivers/virtio/virtio_mmio.c:109-131",
"reason_class": "intentionally-partial",
"falsifiable_claim": "MMIO does not support RING_RESET because the MMIO transport predates the feature bit and kernel docs at Documentation/virtio/virtio_mmio.rst:42-55 state the transport is frozen at its v1.0 feature set; falsifiable by showing MMIO re-sets bit 40 under any kernel release."
}
]
}
```
- `reason_class` enum: `out-of-scope | deprecated | platform-gated | handled-upstream | intentionally-partial`.
- `authority_ref`, `site_citation`, `falsifiable_claim` are required and non-empty.
- `falsifiable_claim` must state an observable condition that would make the claim wrong.
- Missing any required field, or `reason_class` outside the enum, or zero-length `falsifiable_claim` → cell REVERTS to BUG at Phase 5 gate time. There is no re-prompt loop.
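A sketch of the field validation these rules imply (names are illustrative; the authoritative check runs at the Phase 5 gate):

```python
REASON_CLASSES = {"out-of-scope", "deprecated", "platform-gated",
                  "handled-upstream", "intentionally-partial"}
REQUIRED_FIELDS = ("cell_id", "authority_ref", "site_citation",
                   "reason_class", "falsifiable_claim")

def downgrade_holds(rec):
    """True if the record stands; False means the cell reverts to BUG."""
    if any(not rec.get(field) for field in REQUIRED_FIELDS):
        return False  # missing or zero-length required field
    return rec["reason_class"] in REASON_CLASSES

# reason_class outside the enum reverts the cell, per the rule above.
assert not downgrade_holds({"cell_id": "x", "reason_class": "seems-fine"})
```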
**Step 6. Self-check.** Before finalizing BUGS.md for this REQ, verify that every cell in the grid appears in either:
- some BUG's `- Covers: [...]` list, OR
- a downgrade record in `quality/compensation_grid_downgrades.json`.
Any cell missing from both will fail the Phase 5 cardinality gate. This self-check is advisory in Phase 3; the blocking gate runs in Phase 5.
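The self-check is a set difference; a sketch with illustrative names (the blocking version is the Phase 5 cardinality gate):

```python
# Sketch of the Step 6 union check for one REQ: every cell must appear
# in some BUG's Covers list or in a downgrade record.
def uncovered_cells(cell_ids, bugs, downgrades):
    covered = set(cid for bug in bugs for cid in bug["covers"])
    covered |= set(d["cell_id"] for d in downgrades)
    return sorted(set(cell_ids) - covered)

cells = ["REQ-010/cell-RING_RESET-MMIO", "REQ-010/cell-ADMIN_VQ-MMIO"]
bugs = [{"covers": ["REQ-010/cell-RING_RESET-MMIO"]}]
downs = [{"cell_id": "REQ-010/cell-ADMIN_VQ-MMIO"}]
assert uncovered_cells(cells, bugs, downs) == []
```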
### Worked example — RING_RESET grid (virtio)
REQ-010 pattern: whitelist. Items: {RING_RESET, ADMIN_VQ, NOTIF_CONFIG_DATA, SR_IOV}. Sites: {PCI, MMIO, vDPA}. Grid: 4 × 3 = 12 cells.
Code inspection reveals PCI implements all four; MMIO implements none of the four (frozen at v1.0 feature set); vDPA implements NOTIF_CONFIG_DATA but not the other three.
Grid (present=T, absent=F):
| | PCI | MMIO | vDPA |
|-----------------------|-----|------|------|
| RING_RESET | T | F | F |
| ADMIN_VQ | T | F | F |
| NOTIF_CONFIG_DATA | T | F | T |
| SR_IOV | T | F | F |
BUG-default applies to every F cell (7 total). Possible consolidation:
### BUG-001: MMIO ignores VIRTIO_F_RING_RESET
- Primary requirement: REQ-010
- Covers: [REQ-010/cell-RING_RESET-MMIO]
### BUG-002: vDPA ignores VIRTIO_F_RING_RESET
- Primary requirement: REQ-010
- Covers: [REQ-010/cell-RING_RESET-vDPA]
### BUG-003: vDPA missing ADMIN_VQ hookup
- Primary requirement: REQ-010
- Covers: [REQ-010/cell-ADMIN_VQ-vDPA]
### BUG-004: MMIO ignores NOTIF_CONFIG_DATA negotiation (common filter gap)
- Primary requirement: REQ-010
- Covers: [REQ-010/cell-NOTIF_CONFIG_DATA-MMIO]
### BUG-005: MMIO + vDPA both miss SR_IOV propagation
- Primary requirement: REQ-010
- Covers: [REQ-010/cell-SR_IOV-MMIO, REQ-010/cell-SR_IOV-vDPA]
- Consolidation rationale: shared fix path in both transports goes through the same feature-bit filter; single patch on the shared helper closes both cells.
If the reviewer concluded MMIO ADMIN_VQ is intentionally out-of-scope because ADMIN_VQ is a PCI-only spec feature, the downgrade record would be:
```json
{
"cell_id": "REQ-010/cell-ADMIN_VQ-MMIO",
"authority_ref": "include/uapi/linux/virtio_pci.h:NN",
"site_citation": "drivers/virtio/virtio_mmio.c: no admin virtqueue implementation",
"reason_class": "out-of-scope",
"falsifiable_claim": "ADMIN_VQ is PCI-scoped — falsifiable by citing any virtio-spec normative text requiring ADMIN_VQ on non-PCI transports."
}
```
Union check: 6 BUG-covered cells + 1 downgrade cell = 7. Grid has 12 cells; the 5 present cells don't need coverage. Total: 6 F cells covered via BUGs + 1 via downgrade = all 7 absent cells accounted for. Grid → clean.
### ITERATION mode addendum (MANDATORY INCREMENTAL WRITE, Phase 8)
When running in iteration mode (gap / unfiltered / parity / adversarial), write candidate BUG stubs to disk immediately on identification, not at end-of-review. Path: `quality/code_reviews/<iteration>-candidates.md`. One `### CANDIDATE-NNN` heading per candidate, with at least a file:line citation. Reviewer upgrades candidates to confirmed BUGs in BUGS.md only after full triage.
### CONFIRMATION CHECKLIST (Lever 2, v1.5.2)
Before writing the Phase 3 completion checkpoint to PROGRESS.md, confirm each item explicitly in your Phase 3 summary:
1. For every pattern-tagged REQ, I produced a compensation grid in `quality/compensation_grid.json`.
2. For every grid, I applied the BUG-default rule mechanically.
3. Every BUG emitted for a pattern-tagged REQ has a `- Covers: [...]` field with valid cell IDs.
4. Every BUG whose Covers list has ≥2 entries has a non-empty `- Consolidation rationale: ...` field.
5. For every downgraded cell, I wrote a complete structured record in `quality/compensation_grid_downgrades.json` with all five required fields and a valid `reason_class`.
6. For every pattern-tagged REQ, the union of Covers lists + downgrade cells equals the grid's cell set.
Mark Phase 3 (Code review + regression tests) complete in PROGRESS.md (use the checkbox format `- [x] Phase 3 - Code Review` — do NOT switch to a table).
IMPORTANT: Do NOT proceed to Phase 4 (spec audit). The next phase will run the spec audit with a fresh context window.
@@ -0,0 +1,54 @@
{skill_fallback_guide}
You are a quality engineer continuing a phase-by-phase quality playbook run. Phases 1-3 are complete.
Read these files to get context:
1. quality/PROGRESS.md - run metadata, phase status, BUG tracker
2. quality/REQUIREMENTS.md - derived requirements
3. quality/BUGS.md - bugs found in Phase 3 (code review)
4. SKILL.md - read the Phase 4 section ("Phase 4: Spec Audit and Triage"). Also read references/spec_audit.md. Resolve SKILL.md and the references/ directory via the documented fallback list above; do NOT assume any single install layout.
Execute Phase 4: Spec Audit + Triage + Layer-2 semantic citation check.
Part A — spec audit:
Run the spec audit per quality/RUN_SPEC_AUDIT.md. Produce:
- Individual auditor reports at quality/spec_audits/YYYY-MM-DD-auditor-N.md (one per auditor)
- Triage synthesis at quality/spec_audits/YYYY-MM-DD-triage.md
- Executable triage probes at quality/spec_audits/triage_probes.sh
- Regression tests and patches for any net-new spec audit bugs
- Update BUGS.md and PROGRESS.md BUG tracker with any new findings
Part B — Layer-2 semantic citation check (v1.5.1):
The gate's invariant #17 (schemas.md §10) requires three Council members to
vote on each Tier 1/2 REQ's citation_excerpt. Execute these steps:
1. Generate per-Council-member prompts:
python3 -m bin.quality_playbook semantic-check plan .
This writes one or more prompt files to
quality/council_semantic_check_prompts/<member>.txt per member in the
Council roster (bin/council_config.py: claude-opus-4.7, gpt-5.4,
gemini-2.5-pro). For >15 Tier 1/2 REQs, prompts are split into batches
of 5 (<member>-batch<N>.txt).
If no Tier 1/2 REQs exist (Spec Gap run), this step writes an empty
quality/citation_semantic_check.json directly — skip steps 2-4.
2. For each Council member's prompt file, feed the prompt to that model
(the same roster that ran Part A) and capture its JSON-array response
to quality/council_semantic_check_responses/<member>.json. If the
member was batched, concatenate the per-batch responses into a single
array in the response file. Every entry must have req_id, verdict
(supports|overreaches|unclear), and reasoning.
3. Assemble the semantic-check output:
python3 -m bin.quality_playbook semantic-check assemble . \
--member claude-opus-4.7 --response quality/council_semantic_check_responses/claude-opus-4.7.json \
--member gpt-5.4 --response quality/council_semantic_check_responses/gpt-5.4.json \
--member gemini-2.5-pro --response quality/council_semantic_check_responses/gemini-2.5-pro.json
This writes quality/citation_semantic_check.json per schemas.md §9.
4. Verify the output file exists. Phase 6's gate invariant #17 requires
it on every Tier 1/2 run.
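The step-2 batch concatenation can be sketched as follows (per-batch response file naming is an assumption mirroring the prompt naming; verify against your actual layout):

```python
import json
from pathlib import Path

def concat_member_batches(resp_dir: Path, member: str) -> list:
    """Merge <member>-batch<N>.json arrays into a single <member>.json array,
    checking the three fields every entry must carry (sketch only)."""
    def batch_no(p: Path) -> int:
        return int(p.stem.rsplit("batch", 1)[1])

    merged = []
    # Sort numerically so batch10 does not land before batch2.
    for batch in sorted(resp_dir.glob(f"{member}-batch*.json"), key=batch_no):
        for entry in json.loads(batch.read_text()):
            assert entry["verdict"] in ("supports", "overreaches", "unclear")
            assert entry["req_id"] and entry["reasoning"]
            merged.append(entry)
    (resp_dir / f"{member}.json").write_text(json.dumps(merged, indent=2))
    return merged
```

The per-entry assertions catch a malformed model response before it reaches the assemble step, where it would fail less legibly.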
Mark Phase 4 (Spec audit + triage + semantic check) complete in PROGRESS.md (use the checkbox format `- [x] Phase 4 - Spec Audit` — the Phase 5 entry gate looks for that exact substring and will abort if it finds a table row or any other layout).
IMPORTANT: Do NOT proceed to Phase 5 (reconciliation). The next phase will handle reconciliation and TDD.
@@ -0,0 +1,119 @@
{skill_fallback_guide}
You are a quality engineer continuing a phase-by-phase quality playbook run. Phases 1-4 are complete.
Read these files to get context:
1. quality/PROGRESS.md - run metadata, phase status, cumulative BUG tracker
2. quality/BUGS.md - all confirmed bugs from code review and spec audit
3. quality/REQUIREMENTS.md - derived requirements
4. SKILL.md - read the Phase 5 section ("Phase 5: Post-Review Reconciliation and Closure Verification"). Also read references/requirements_pipeline.md, references/review_protocols.md, and references/spec_audit.md. Resolve SKILL.md and the references/ directory via the documented fallback list above; do NOT assume any single install layout.
Execute Phase 5: Reconciliation + TDD + Closure.
1. Run the Post-Review Reconciliation per references/requirements_pipeline.md. Update COMPLETENESS_REPORT.md.
2. Run closure verification: every BUG in the tracker must have either a regression test or an explicit exemption.
3. Write bug writeups at quality/writeups/BUG-NNN.md for EVERY confirmed bug. The canonical template is the "Bug writeup generation" section of SKILL.md (resolve via the fallback list above) — read that section before writing. Use the exact field headings listed there: **Summary, Spec reference, The code, Observable consequence, Depth judgment, The fix, The test, Related issues**. Sections 1-4, 6, and 7 are required in every writeup; section 5 (Depth judgment) fires only when the consequence isn't self-evident from the immediate code; section 8 (Related issues) is included only when related bugs exist. Do NOT introduce fields that aren't in the template (no "Minimal reproduction" as a top-level field, no "Patch path:" as a top-level field — those belong inside Spec reference and The test respectively).
**MANDATORY HYDRATION STEP.** Before writing a writeup, re-open quality/BUGS.md and locate the `### BUG-NNN:` entry for the bug you are about to write up. Every confirmed bug in BUGS.md already has the content you need — your job is to copy it into the writeup's sections, not to invent it. If a field is missing from BUGS.md, that is a reconciliation error to surface in PROGRESS.md, not a field to fabricate. Use this field map:
| BUGS.md field | Writeup section | How to use it |
|----------------------------|------------------------------|-------------------------------------------------------------------------------|
| Title line (### BUG-NNN:…) | Summary | One sentence naming the function/code path and the observable failure. |
| Primary requirement | Spec reference | `- Requirement: REQ-NNN` |
| Spec basis | Spec reference | `- Spec basis: <doc path + line range(s), semicolon-separated if multiple>` plus a ≤15-word contract quote copied verbatim from the cited lines. |
| Location | The code | Cite `file:line` and describe what the current path does there. |
| Minimal reproduction | Observable consequence | Weave into the consequence paragraph as the triggering input. |
| Expected + Actual behavior | Observable consequence | The actual behavior is the observable failure; the expected defines the gap. |
| Regression test | The test | `- Regression test: <function name>` — verbatim from BUGS.md. |
| Patches (regression) | The test | `- Regression patch: <path>` — verbatim from BUGS.md. |
| Patches (fix) | The fix + The test | If a fix patch file exists, read it and paste the unified diff inside ```diff; also list the patch path as `- Fix patch: <path>` under The test. If no fix patch exists (confirmed-open bug), write the minimal concrete unified diff directly in The fix anyway — SKILL.md requires an inline diff in every writeup. In the no-patch case, omit the `Fix patch:` bullet from The test. |
| Red/green logs | The test | `- Red receipt: quality/results/BUG-NNN.red.log` and the matching green path. |
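Hydration is mechanical extraction, which a few lines can illustrate (a sketch only: multi-line field values, such as the two-line Minimal reproduction in the worked example, need continuation handling this omits):

```python
import re

def parse_bug_fields(entry: str) -> dict:
    """Extract the title and single-line '- Field: value' pairs from one
    '### BUG-NNN:' block, keyed by lowercased field name."""
    title = re.search(r"^### BUG-\d+: (.+)$", entry, re.M)
    fields = {"title": title.group(1).strip()} if title else {}
    for m in re.finditer(r"^- ([^:\n]+): (.+)$", entry, re.M):
        fields[m.group(1).strip().lower().replace(" ", "_")] = m.group(2).strip()
    return fields
```

If a key you need is absent from the returned dict, that is precisely the "missing from BUGS.md" reconciliation error to surface in PROGRESS.md rather than fabricate.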
**Worked example.** The BUGS.md entry for BUG-004 is:
### BUG-004: naive upstream timestamps crash ETA math
- Source: Code Review
- Severity: HIGH
- Primary requirement: REQ-006
- Location: bus_tracker.py:138-144
- Spec basis: quality/REQUIREMENTS.md:163-172; quality/QUALITY.md:57-65
- Minimal reproduction: Return a visit whose ExpectedArrivalTime is an ISO string
without timezone information, such as 2026-04-21T12:00:00.
- Expected behavior: The affected arrival degrades to unknown-time while the rest
of the stop remains usable.
- Actual behavior: datetime.fromisoformat() returns a naive datetime and
subtracting it from datetime.now(timezone.utc) raises TypeError, aborting the
stop/request path.
- Regression test: quality.test_regression.TestPhase3Regressions.test_bug_004_fetch_stop_arrivals_degrades_naive_timestamps
- Patches: quality/patches/BUG-004-regression-test.patch, quality/patches/BUG-004-fix.patch
The hydrated writeup sections look like this (sketch — paste the real diff from the
fix patch file into ```diff, don't make one up):
## Summary
fetch_stop_arrivals() crashes the whole stop/request path when an upstream visit
carries a naive ExpectedArrivalTime, instead of degrading that arrival to
unknown-time.
## Spec reference
- Requirement: REQ-006
- Spec basis: quality/REQUIREMENTS.md:163-172; quality/QUALITY.md:57-65
- Behavioral contract quote: "degrade a bad per-arrival timestamp to unknown-time instead of aborting the whole response path"
## The code
At bus_tracker.py:138-144, the parser calls datetime.fromisoformat(...) on
ExpectedArrivalTime and subtracts the result from datetime.now(timezone.utc)…
## Observable consequence
When the upstream visit returns ExpectedArrivalTime="2026-04-21T12:00:00"
(no timezone), fromisoformat() returns a naive datetime, the subtraction
raises TypeError, and the entire stop/request path aborts rather than the
single affected arrival degrading to unknown-time.
## The fix
```diff
<paste the real unified diff from quality/patches/BUG-004-fix.patch here>
```
## The test
- Regression test: quality.test_regression.TestPhase3Regressions.test_bug_004_fetch_stop_arrivals_degrades_naive_timestamps
- Regression patch: quality/patches/BUG-004-regression-test.patch
- Fix patch: quality/patches/BUG-004-fix.patch
- Red receipt: quality/results/BUG-004.red.log
- Green receipt: quality/results/BUG-004.green.log
**Confirmation checklist (per writeup, before moving to the next bug).** (a) Every
required section has populated content copied from BUGS.md or the patch files —
no empty backticks, no sentinel filler like "is a confirmed code bug in ``" or
"The affected implementation lives at ``" or "Patch path: ``". (b) The ```diff
fence contains at least one `+` or `-` line from the actual fix patch. (c) The
Summary names a real function or code path, not the BUG identifier. (d) No
angle-bracket placeholders (e.g., `<...>`) remain in the final writeup — those are
pedagogical markers from the worked example and from SKILL.md, never acceptable
output.
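Parts of this checklist can be screened automatically. An advisory sketch (the authoritative checks live in quality_gate.py, and the angle-bracket heuristic can false-positive on legitimate uses of `<...>`):

```python
import re

SENTINELS = (
    "is a confirmed code bug in ``",
    "The affected implementation lives at ``",
    "Patch path: ``",
)

def writeup_smells(text: str) -> list:
    """Return advisory problem strings for one writeup's full text."""
    problems = [f"sentinel: {s}" for s in SENTINELS if s in text]
    if re.search(r"<[^<>\n]+>", text):
        problems.append("angle-bracket placeholder")
    if not re.search(r"```diff\n.*?^[+-]", text, re.S | re.M):
        problems.append("no +/- line inside a ```diff fence")
    return problems
```

Run it per writeup before moving to the next bug; an empty list is necessary but not sufficient (it cannot check that the Summary names a real function).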
4. Run the TDD red-green cycle: for each confirmed bug, run the regression test against unpatched code -> quality/results/BUG-NNN.red.log. If a fix patch exists, run against patched code -> quality/results/BUG-NNN.green.log. If the test runner is unavailable, create the log with NOT_RUN on the first line.
5. Generate sidecar JSON: quality/results/tdd-results.json and quality/results/integration-results.json (schema_version "1.1", canonical fields: id, requirement, red_phase, green_phase, verdict, fix_patch_present, writeup_path).
6. If mechanical verification artifacts exist, run quality/mechanical/verify.sh and save receipts.
7. Run terminal gate verification, write it to PROGRESS.md.
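For step 5, a minimal sidecar-entry sketch (the field values and the top-level "results" wrapper are illustrative assumptions; schemas.md governs the actual shape):

```python
import json

CANONICAL_FIELDS = ("id", "requirement", "red_phase", "green_phase",
                    "verdict", "fix_patch_present", "writeup_path")

entry = {
    "id": "BUG-004",                 # canonical: 'id', never 'bug_id'
    "requirement": "REQ-006",
    "red_phase": "FAIL",             # illustrative phase/verdict values
    "green_phase": "PASS",
    "verdict": "fixed",
    "fix_patch_present": True,
    "writeup_path": "quality/writeups/BUG-004.md",
}
sidecar = {"schema_version": "1.1", "results": [entry]}
assert all(field in entry for field in CANONICAL_FIELDS)
doc = json.dumps(sidecar, indent=2)
```

The field-name assertion is the point: Phase 6 rejects alias fields, so enforcing the canonical names at generation time is cheaper than fixing them at the gate.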
### MANDATORY CARDINALITY GATE (Lever 3, v1.5.2)
Before finalizing this phase, run the cardinality reconciliation gate against the current repo state. Locate `quality_gate.py` via the same fallback list used for SKILL.md (it sits in the same directory as SKILL.md in every install layout), then invoke it as a script — `quality_gate.py` runs `check_v1_5_2_cardinality_gate(repo_dir)` as part of its standard pass:
python3 <resolved_quality_gate_path> .
Where `<resolved_quality_gate_path>` is the first hit when walking the documented install-location fallback list, with `SKILL.md` swapped for `quality_gate.py` (e.g., `quality_gate.py`, `.claude/skills/quality-playbook/quality_gate.py`, `.github/skills/quality_gate.py`, `.cursor/skills/quality-playbook/quality_gate.py`, `.continue/skills/quality-playbook/quality_gate.py`, `.github/skills/quality-playbook/quality_gate.py`).
If the gate output contains any line beginning with `cardinality gate:`, or reports uncovered cells, malformed cell IDs, missing consolidation rationale on multi-cell Covers, or malformed downgrade records, STOP. Fix the BUGS.md entries or the `compensation_grid_downgrades.json` file. Do NOT proceed to completion until those failure lines no longer appear.
For every pattern-tagged REQ, the Phase 5 contract is:
- Every grid cell with `"present": false` appears in either a BUG's `Covers:` list or a downgrade record.
- Every `Covers:` entry uses the canonical cell ID form `REQ-N/cell-<item>-<site>`.
- Every BUG with ≥2 `Covers:` entries has a non-empty `Consolidation rationale:` line.
- Every downgrade record has `cell_id`, `authority_ref`, `site_citation`, `reason_class` (in the enum), `falsifiable_claim` (non-empty).
The cardinality gate is blocking. It is intentionally stricter than the Phase 3 advisory self-check; the advisory check is meant to surface problems early, but Phase 5 is where they become fatal.
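The per-record contract can be sketched directly (the reason_class enum here is illustrative; only "out-of-scope" appears in this prompt's examples, and the gate's enum is authoritative):

```python
import re

CELL_ID = re.compile(r"^REQ-\d+/cell-\w+-\w+$")
REQUIRED = ("cell_id", "authority_ref", "site_citation",
            "reason_class", "falsifiable_claim")

def downgrade_errors(rec: dict, reason_enum=("out-of-scope",)) -> list:
    """Return contract violations for one downgrade record (sketch)."""
    errs = [f"missing or empty: {k}" for k in REQUIRED if not rec.get(k)]
    if rec.get("cell_id") and not CELL_ID.match(rec["cell_id"]):
        errs.append("cell_id not in canonical REQ-N/cell-<item>-<site> form")
    if rec.get("reason_class") and rec["reason_class"] not in reason_enum:
        errs.append("reason_class not in enum")
    return errs
```

An empty error list per record, plus the union rule over the grid, is what the blocking gate demands.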
Mark Phase 5 complete in PROGRESS.md (use the checkbox format `- [x] Phase 5 - Reconciliation` — do NOT switch to a table).
IMPORTANT: quality_gate.py will FAIL Phase 5 if any writeup is missing a non-empty ```diff block or contains any of these sentinel phrases verbatim: "is a confirmed code bug in ``", "The affected implementation lives at ``", "Patch path: ``", "- Regression test: ``", "- Regression patch: ``". Those two checks are the hard gate. Skipping the BUGS.md hydration step above is not gate-enforced but will produce writeups that read as unpopulated stubs and fail a human review — do not skip it.
@@ -0,0 +1,23 @@
{skill_fallback_guide}
You are a quality engineer doing the verification phase of a quality playbook run. Phases 1-5 are complete.
Read SKILL.md - the Phase 6 section ("Phase 6: Verify"). Resolve SKILL.md via the documented fallback list above; do NOT assume any single install layout. Follow the incremental verification steps (6.1 through 6.5).
Step 6.1: If quality/mechanical/verify.sh exists, run it. Record exit code.
Step 6.2: Run quality_gate.py. Locate it via the same fallback list used for SKILL.md (`quality_gate.py` sits in the same directory as SKILL.md in every install layout — e.g., `quality_gate.py`, `.claude/skills/quality-playbook/quality_gate.py`, `.github/skills/quality_gate.py`, `.cursor/skills/quality-playbook/quality_gate.py`, `.continue/skills/quality-playbook/quality_gate.py`, `.github/skills/quality-playbook/quality_gate.py`). Then run:
python3 <resolved_quality_gate_path> .
Read the output carefully. For every FAIL result, fix the issue:
- Missing regression-test patches: generate quality/patches/BUG-NNN-regression-test.patch
- Missing inline diffs in writeups: add a ```diff block
- Non-canonical JSON fields: fix tdd-results.json (use 'id' not 'bug_id', etc.)
- Missing files: create them
After fixing all FAILs, run quality_gate.py again. Repeat until 0 FAIL.
Save final output to quality/results/quality-gate.log.
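The fix-and-rerun loop above can be framed as follows (a sketch: `run_gate` and `fix` are placeholders for invoking `python3 <resolved_quality_gate_path> .` and performing the actual repairs, and the FAIL-line format is assumed from the gate's output conventions):

```python
def rerun_until_clean(run_gate, fix, max_rounds=5):
    """Step 6.2 loop: run_gate() returns the gate's text output; fix(fails)
    repairs the reported problems. Stops once no FAIL lines remain; the
    clean final output is what gets saved to quality-gate.log."""
    for _ in range(max_rounds):
        output = run_gate()
        fails = [ln for ln in output.splitlines()
                 if ln.lstrip().startswith("FAIL")]
        if not fails:
            return output
        fix(fails)
    raise RuntimeError(f"gate still failing after {max_rounds} rounds")
```

The bounded round count guards against a fix that never converges; hitting the bound means a FAIL class the listed remedies do not cover.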
Step 6.3: Run functional tests if a test runner is available.
Step 6.4: File-by-file verification checklist (read one file at a time, check, move on).
Step 6.5: Metadata consistency check.
Append each step's result to quality/results/phase6-verification.log.
Mark Phase 6 complete in PROGRESS.md (use the checkbox format `- [x] Phase 6 - Verify` — do NOT switch to a table).
@@ -0,0 +1 @@
{skill_fallback_guide} Execute the quality playbook for this project.{seed_instruction}