awesome-copilot/skills/quality-playbook/references/run_state_schema.md
Andrew Stellman b8441d218b Update quality-playbook skill to v1.5.6 + add agent (#1402)
Rebuilds branch from upstream/staged (was previously merged from
upstream/main, which brought in materialized plugin files that
fail Check Plugin Structure on PRs targeting staged).

Changes vs. staged:
- Update skills/quality-playbook/ to v1.5.6 (31 bundled assets:
  SKILL.md + LICENSE.txt + 16 references/ + 9 phase_prompts/ +
  3 agents/ + bin/citation_verifier.py + quality_gate.py).
- Add agents/quality-playbook.agent.md (top-level orchestrator).
  name: quality-playbook (validator-compliant).
- Update docs/README.skills.md quality-playbook row description
  + bundled-assets list to v1.5.6.
- Fix 'unparseable' → 'unparsable' in quality_gate.py (5 instances;
  codespell preference, both spellings valid).

Closes the v1.4.0 → v1.5.6 update in a single clean commit on top of
upstream/staged. The preserved backup branch backup-bedbe84-pre-rebuild
(SHA bedbe848fa3c0f0eda8e653c42b599a17dd2e354) holds the prior history for reference.
2026-05-11 11:31:53 +10:00


# Run-State Schema (v1.5.6)
*Authoritative schema for `quality/run_state.jsonl`, `quality/PROGRESS.md`, and `Calibration Cycles/<cycle>/run_state.jsonl`. The playbook AI writes these files directly via the file-tool layer; the orchestrator AI reads them to drive multi-benchmark calibration cycles.*
*Companion to: `docs/design/QPB_v1.5.5_Design.md` ("Design — Run-state event taxonomy" section).*
---
## File locations and ownership
- `<benchmark>/quality/run_state.jsonl` — per-run event log. Append-only. Written by the AI executing the playbook.
- `<benchmark>/quality/PROGRESS.md` — human-readable run status. Atomically rewritten by the AI on each event.
- `Calibration Cycles/<cycle>/run_state.jsonl` — cycle-level event log. Append-only. Written by the orchestrator AI.
All three live in the bind-mounted workspace owned by the user. The AI writes via Edit/Write file tools, never via shell redirection or `tee` (which routes through a different UID layer in some sandbox runtimes).
---
## Schema versioning
Every `run_state.jsonl` opens with an `_index` event recording `schema_version`. Current version: `"1.5.6"`. Minor and patch bumps preserve backward compatibility: older files remain readable by newer parsers. Only breaking schema changes bump the major number.
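A reader could apply the versioning rule roughly as follows. This is a minimal sketch, not the shipped parser: the names `major`, `is_readable`, and `PARSER_VERSION` are hypothetical, and the rule "accept any file whose major number matches the parser's" is an assumption drawn from the paragraph above.

```python
import json

PARSER_VERSION = "1.5.6"  # hypothetical: the schema version this reader targets


def major(version: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version.split(".")[0])


def is_readable(first_line: str, parser_version: str = PARSER_VERSION) -> bool:
    """Check the first line of a run_state.jsonl against the parser's major version."""
    index = json.loads(first_line)
    if index.get("event") != "_index":
        return False  # the file does not open with an `_index` event
    # Assumption: only a major-number mismatch signals a breaking change.
    return major(index["schema_version"]) == major(parser_version)
```

Under this assumption a `"1.5.5"` file is accepted by a `"1.5.6"` reader, while a `"2.0.0"` file is rejected.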
---
## Required fields (every event)
Every event object MUST have:
- `ts` — ISO 8601 UTC timestamp with `Z` suffix (e.g. `"2026-05-15T14:32:01Z"`). Sub-second precision allowed but not required.
- `event` — string, the event-type name. Must match one of the names listed in `_index.event_types`.
Events MAY have additional fields per their type's spec below. Unknown fields are tolerated by readers (forward-compatible).
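For illustration, an event line with the two required fields could be built like this. The helper name `make_event` is hypothetical; the sketch only shows the shape of a line and says nothing about how the playbook AI actually emits events through its file tools.

```python
import json
from datetime import datetime, timezone


def make_event(event: str, **fields) -> str:
    """Serialize one event as a single JSON line carrying `ts` and `event`."""
    # ISO 8601 UTC with a `Z` suffix, whole-second precision.
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    record = {"ts": ts, "event": event, **fields}
    return json.dumps(record)


# Example: a phase_start event for Phase 1.
line = make_event("phase_start", phase=1)
```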
---
## Per-run events (`<benchmark>/quality/run_state.jsonl`)
### `_index`
ALWAYS the first line. Records schema metadata.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | Always `"_index"` |
| `ts` | string | yes | ISO 8601 UTC |
| `schema_version` | string | yes | `"1.5.6"` |
| `event_types` | array of string | yes | Every event type this file uses |
| `benchmark` | string | yes | E.g. `"chi-1.3.45"`, `"virtio-1.5.1"` |
| `lever_state` | string | yes | E.g. `"pre-pattern7"`, `"post-pattern7"`, `"baseline"` |
| `started_at` | string | yes | ISO 8601 UTC, equals `ts` of this event |
### `run_start`
Marks the beginning of a playbook run.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"run_start"` |
| `ts` | string | yes | |
| `runner` | string | yes | One of `"claude"`, `"codex"`, `"copilot"`, `"cursor"` |
| `playbook_version` | string | yes | E.g. `"1.5.6-pre"`, `"1.5.6"` (matches `bin.benchmark_lib.RELEASE_VERSION`) |
| `target_path` | string | yes | Relative path to benchmark target |
### `phase_start`
Marks the beginning of one of the six playbook phases.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"phase_start"` |
| `ts` | string | yes | |
| `phase` | integer | yes | 1, 2, 3, 4, 5, or 6 |
### `pattern_walked`
Phase 1 only. Records that one of the seven exploration patterns was walked.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"pattern_walked"` |
| `ts` | string | yes | |
| `phase` | integer | yes | Always 1 |
| `pattern` | integer | yes | 1 through 7 |
| `findings_count` | integer | yes | Number of findings produced by this pattern |
| `duration_seconds` | number | optional | Wall-clock for this pattern walk |
### `pass_started` / `pass_ended`
Phase 4 only. Records start/end of one of the four skill-derivation passes.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"pass_started"` or `"pass_ended"` |
| `ts` | string | yes | |
| `phase` | integer | yes | Always 4 |
| `pass` | string | yes | One of `"A"`, `"B"`, `"C"`, `"D"` |
| `output_artifact` | string | optional | Relative path to pass artifact (on `pass_ended`) |
### `finding_logged`
Records that a finding (skill-divergence, code-bug, etc.) was logged in the current phase.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"finding_logged"` |
| `ts` | string | yes | |
| `phase` | integer | yes | 1-6 |
| `finding_id` | string | yes | E.g. `"BUG-007"`, `"REQ-042"` |
| `category` | string | yes | E.g. `"code-bug"`, `"skill-divergence"`, `"missing-citation"`, `"prose-to-code-mismatch"` |
### `artifact_written`
Records that an artifact file was produced/updated.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"artifact_written"` |
| `ts` | string | yes | |
| `relative_path` | string | yes | Path relative to benchmark target (e.g. `"quality/EXPLORATION.md"`) |
| `byte_size` | integer | optional | Size of the file at write time |
| `line_count` | integer | optional | Line count |
### `gate_check`
Records the outcome of a single quality-gate check.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"gate_check"` |
| `ts` | string | yes | |
| `gate_name` | string | yes | Identifier from `quality_gate.py` |
| `verdict` | string | yes | One of `"pass"`, `"fail"`, `"warn"`, `"skip"` |
| `reason` | string | optional | Human-readable explanation |
### `phase_end`
Marks the end of a phase. Cross-validated against the phase's expected artifacts before being written (see "Cross-validation rules" below).
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"phase_end"` |
| `ts` | string | yes | |
| `phase` | integer | yes | 1-6 |
| `key_counts` | object | yes | Phase-specific counts (see below) |
| `artifacts_produced` | array of string | yes | Relative paths of artifacts produced this phase |
| `duration_seconds` | number | optional | Wall-clock for the whole phase |
`key_counts` per phase:
- Phase 1: `{"findings_total": N, "patterns_walked": M}` (M should be 7 for full Phase 1)
- Phase 2: `{"findings_promoted": N, "findings_dropped": M}`
- Phase 3: `{"bugs_identified": N, "bug_writeups": M}`
- Phase 4: `{"req_count": N, "uc_count": M, "passes_complete": K}` (K should be 4)
- Phase 5: `{"gate_checks_total": N, "gate_failures": M}`
- Phase 6: `{"bugs_md_count": N, "gate_verdict": "pass|fail|partial"}`
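The per-phase key lists above lend themselves to a simple lookup table. The following is a sketch under the assumption that a consumer wants to detect missing keys; `EXPECTED_KEYS` and `missing_key_counts` are hypothetical names, not part of `bin/run_state_lib.py`.

```python
# Expected key_counts keys per phase, transcribed from the list above.
EXPECTED_KEYS = {
    1: {"findings_total", "patterns_walked"},
    2: {"findings_promoted", "findings_dropped"},
    3: {"bugs_identified", "bug_writeups"},
    4: {"req_count", "uc_count", "passes_complete"},
    5: {"gate_checks_total", "gate_failures"},
    6: {"bugs_md_count", "gate_verdict"},
}


def missing_key_counts(phase_end: dict) -> set:
    """Return the key_counts keys that a phase_end event lacks for its phase."""
    expected = EXPECTED_KEYS[phase_end["phase"]]
    return expected - set(phase_end.get("key_counts", {}))
```

Extra keys are deliberately ignored, in line with the rule that readers tolerate unknown fields.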
### `error`
Records an error during the run.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"error"` |
| `ts` | string | yes | |
| `phase` | integer | optional | If error is phase-scoped |
| `message` | string | yes | Human-readable description |
| `recoverable` | boolean | yes | If true, the run will retry the affected phase; if false, the run is aborting |
### `documentation_state`
v1.5.6+. Records the documentation-availability state at Phase 1 entry. Currently the only emitted state is `"code_only"`, indicating that `reference_docs/` and `reference_docs/cite/` carry no recognized plaintext content (`.md` or `.txt`) and Phase 1 is proceeding in code-only mode (see `references/code-only-mode.md`). A `"with_docs"` value is reserved for future explicit emission; today the absence of a `documentation_state` event implies docs were present.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"documentation_state"` |
| `ts` | string | yes | |
| `state` | string | yes | Currently `"code_only"`. Future values may include `"with_docs"`. |
| `reason` | string | yes | Free-form (e.g. `"reference_docs/ empty"`) |
When `documentation_state state="code_only"` is emitted, the playbook also prepends a "Documentation status: code-only mode" section to `quality/EXPLORATION.md` and adds a "Documentation state: code_only" line to `quality/PROGRESS.md` so the downgrade is visible to anyone reading either artifact. New runs adding the `documentation_state` event must include it in the `_index.event_types` list.
### `aborted_missing_docs`
v1.5.6+. Records that the run aborted at Phase 1 entry because `--require-docs` was set and `reference_docs/` was empty. Mutually exclusive with `documentation_state state="code_only"` for the same Phase 1 entry — `--require-docs` is the opt-IN abort path; the absence of the flag preserves the documented code-only-mode downgrade. After this event the runner returns non-zero without invoking any LLM work, so no `phase_start phase=1` is recorded.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"aborted_missing_docs"` |
| `ts` | string | yes | |
| `reason` | string | yes | Free-form (e.g. `"reference_docs/ empty and --require-docs set"`) |
When `aborted_missing_docs` is emitted, the playbook also writes an `ERROR: aborted_missing_docs — <reason>` block to `quality/PROGRESS.md` so the abort is visible without reading the JSONL. New runs that pass `--require-docs` against an empty `reference_docs/` must include `aborted_missing_docs` in the `_index.event_types` list.
### `run_end`
Marks the end of the playbook run.
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"run_end"` |
| `ts` | string | yes | |
| `status` | string | yes | One of `"success"`, `"aborted"`, `"failed"` |
| `total_findings` | integer | optional | Sum across all phases |
| `final_verdict` | string | optional | The Phase 6 gate verdict |
---
## Cycle-level events (`Calibration Cycles/<cycle>/run_state.jsonl`)
### `_index` (cycle-level)
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"_index"` |
| `ts` | string | yes | |
| `schema_version` | string | yes | `"1.5.6"` |
| `event_types` | array of string | yes | |
| `cycle_name` | string | yes | E.g. `"2026-05-15-pattern7-displacement-recovery"` |
| `lever_under_test` | string | yes | E.g. `"lever-1-exploration-breadth-depth"` |
| `benchmarks` | array of string | yes | Cycle's pinned benchmark list |
| `iteration` | integer | yes | Iteration ordinal (1, 2, or 3 — see iterate-cap) |
### `cycle_start`
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"cycle_start"` |
| `ts` | string | yes | |
| `hypothesis` | string | yes | The cycle's testable hypothesis |
| `noise_floor_threshold` | number | yes | Recall delta below this is treated as noise (default 0.05) |
### `benchmark_start`
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"benchmark_start"` |
| `ts` | string | yes | |
| `benchmark` | string | yes | |
| `lever_state` | string | yes | `"pre-lever"` or `"post-lever"` |
### `lever_change_applied`
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"lever_change_applied"` |
| `ts` | string | yes | |
| `lever_id` | string | yes | E.g. `"lever-1-exploration-breadth-depth"` |
| `files_changed` | array of string | yes | Paths relative to QPB repo root |
| `commit_sha` | string | yes | Commit SHA on the implementing branch |
| `description` | string | yes | What the change is (e.g. `"Pattern 7 budget cap 3-5 → 2-3"`) |
### `lever_change_reverted`
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"lever_change_reverted"` |
| `ts` | string | yes | |
| `files_changed` | array of string | yes | |
| `commit_sha` | string | optional | Null/absent if revert is uncommitted |
### `benchmark_end`
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"benchmark_end"` |
| `ts` | string | yes | |
| `benchmark` | string | yes | |
| `lever_state` | string | yes | |
| `recall` | number | yes | 0.0-1.0 |
| `bugs_found` | array of string | yes | Bug IDs found this run |
| `bugs_missed` | array of string | yes | Bug IDs in baseline missed this run |
| `historical_baseline_path` | string | yes | Path to the baseline BUGS.md used for recall computation |
### `cycle_end`
| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"cycle_end"` |
| `ts` | string | yes | |
| `verdict` | string | yes | One of `"ship"`, `"revert"`, `"iterate"`, `"halt-iterate-cap"` |
| `recall_before` | object | yes | Per-benchmark recall before lever change |
| `recall_after` | object | yes | Per-benchmark recall after lever change |
| `delta` | object | yes | Per-benchmark delta (recall_after - recall_before) |
| `cross_benchmark_check` | object | yes | `{"clean": bool, "regressions": [list of bench/bug pairs that regressed]}` |
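The `delta` object is a per-benchmark subtraction, which can be sketched as below. The function name `recall_delta` is hypothetical, and comparing the absolute delta against `noise_floor_threshold` is an assumption; the schema does not say how the verdict is derived from these numbers, so the sketch stops short of choosing one.

```python
def recall_delta(recall_before: dict, recall_after: dict, noise_floor: float = 0.05) -> dict:
    """Compute per-benchmark recall deltas and list those inside the noise floor."""
    delta = {}
    within_noise = []
    for bench, before in recall_before.items():
        # Round to suppress floating-point residue such as 0.019999999999999907.
        change = round(recall_after[bench] - before, 6)
        delta[bench] = change
        # Assumption: the threshold applies to the magnitude of the change.
        if abs(change) < noise_floor:
            within_noise.append(bench)
    return {"delta": delta, "within_noise": within_noise}
```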
---
## Cross-validation rules (per `phase_end`)
The AI verifies these conditions before appending a `phase_end` event. If any check fails, the AI appends an `error` event with `recoverable: true` and re-runs the failing phase.
| Phase | Required conditions |
|---|---|
| 1 | `quality/EXPLORATION.md` exists, ≥ 120 lines (aligned with the Phase 2 startup gate in `bin/run_playbook.check_phase_gate`), contains at least one finding section (regex `^##\s+(Finding\|Open Exploration Findings\|\d+\.)` — accepts `## Finding ...`, the SKILL-prescribed exact heading `## Open Exploration Findings`, and numbered `## N.` headings) |
| 2 | All nine fixed-name Generate-contract artifacts exist non-empty under `quality/`: `REQUIREMENTS.md`, `QUALITY.md`, `CONTRACTS.md`, `COVERAGE_MATRIX.md`, `COMPLETENESS_REPORT.md`, `RUN_CODE_REVIEW.md`, `RUN_INTEGRATION_TESTS.md`, `RUN_SPEC_AUDIT.md`, `RUN_TDD_TESTS.md`. Plus at least one non-empty `quality/test_functional.<ext>` (extension varies by primary language). Pre-v1.5.6 this row described the v1.5.5-design triage model (`EXPLORATION_MERGED.md` / `triage.md`); that mapping was never adopted by shipped SKILL.md / orchestrator_protocol.md / agent files, which always documented Phase 2 as Generate. |
| 3 | `quality/code_reviews/` directory contains at least one review file. If `quality/BUGS.md` has any `### BUG-` heading, `quality/patches/` contains at least one `BUG-*-regression-test.patch` file. Pre-v1.5.6 this row checked `quality/RUN_CODE_REVIEW.md` (a Phase 2 Generate output, not a Phase 3 review result) — same v1.5.5-design / shipped-Generate drift class as the Phase 2 row. Cluster B reconciled. |
| 4 | `quality/spec_audits/` directory contains at least one `*-triage.md` file AND at least one `*-auditor-*.md` file (per orchestrator_protocol.md naming convention). When neither name pattern matches, the validator falls back to a weaker "≥2 files" check — older bootstrap runs with arbitrary `.md` names still pass; the gate at Phase 6 enforces deeper conformance. Pre-v1.5.6 this row checked `quality/REQUIREMENTS.md` + `COVERAGE_MATRIX.md` (Phase 2 outputs) — same v1.5.5-design drift class. Cluster B reconciled. |
| 5 | If `quality/BUGS.md` has confirmed `### BUG-` entries: `quality/results/tdd-results.json` exists non-empty; for every confirmed bug, `quality/writeups/BUG-NNN.md` exists AND `quality/results/BUG-NNN.red.log` exists. With no confirmed bugs the row is vacuously satisfied. Pre-v1.5.6 this row checked `quality/results/quality-gate.log` (a Phase 6 output) — same v1.5.5-design drift class. Cluster B reconciled. |
| 6 | `quality/results/quality-gate.log` exists non-empty AND `quality/PROGRESS.md` contains a `Terminal Gate Verification` section (the orchestrator-protocol marker that Phase 6 ran the script-verified gate to completion). Pre-v1.5.6 this row checked `quality/BUGS.md` + `quality/INDEX.md` — BUGS.md is a Phase 3 output, INDEX.md was never adopted in the shipped contract. Same v1.5.5-design drift class. Cluster B reconciled. |
The `run_end` event additionally requires: all 6 `phase_end` events present in the log; the final BUGS.md count matches `phase_end phase=6 key_counts.bugs_md_count`.
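As an illustration of one row, the Phase 1 conditions could be checked along these lines. This is a sketch, not the code in `bin/run_playbook.check_phase_gate`; the function name `phase1_artifacts_ok` is hypothetical, and the regex is the one quoted in the table with the table-escaping backslashes removed.

```python
import re
from pathlib import Path

# Accepts "## Finding ...", "## Open Exploration Findings", and numbered "## N." headings.
FINDING_HEADING = re.compile(r"^##\s+(Finding|Open Exploration Findings|\d+\.)", re.MULTILINE)


def phase1_artifacts_ok(benchmark_root: Path, min_lines: int = 120) -> bool:
    """Check the Phase 1 row: EXPLORATION.md exists, is long enough, has a finding heading."""
    path = benchmark_root / "quality" / "EXPLORATION.md"
    if not path.is_file():
        return False
    text = path.read_text(encoding="utf-8")
    if len(text.splitlines()) < min_lines:
        return False
    return FINDING_HEADING.search(text) is not None
```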
---
## Resume semantics
When an AI session starts on a run directory:
1. If `quality/run_state.jsonl` does not exist: fresh run. Write `_index` + `run_start` + `phase_start phase=1`.
2. If it exists: read all events. Find the last `phase_start` not followed by a matching `phase_end`. Call it the "in-progress phase".
3. Verify the in-progress phase's expected artifacts (per cross-validation rules above):
   - If artifacts complete: append the missing `phase_end` event and proceed to the next phase. Note: this is the "session crashed mid-phase but the work is done" recovery path.
   - If artifacts incomplete: re-run that phase from scratch. The prior session left a partial state that can't be safely resumed.
4. If all 6 `phase_end` events are present but no `run_end`: append `run_end status=success` and finalize.
The policy is "trust artifacts more than events." If events claim phase 2 is done but `REQUIREMENTS.md` doesn't exist, the AI re-runs phase 2. If events stop mid-phase but artifacts are complete, the AI catches up the events.
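Step 2 of the resume procedure, finding the in-progress phase, can be sketched as follows. The function name `in_progress_phase` is hypothetical, and the sketch assumes the log lines have already been read into a list of strings.

```python
import json


def in_progress_phase(lines):
    """Return the last phase with a phase_start and no later phase_end, else None."""
    open_phases = []
    for line in lines:
        event = json.loads(line)
        if event["event"] == "phase_start":
            open_phases.append(event["phase"])
        elif event["event"] == "phase_end" and event["phase"] in open_phases:
            # A matching phase_end closes the earlier phase_start.
            open_phases.remove(event["phase"])
    return open_phases[-1] if open_phases else None
```

A `None` result with fewer than six `phase_end` events means the next phase has not started yet; the artifact checks in step 3 still decide whether any event is trusted.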
---
## PROGRESS.md format
Atomically rewritten on every event. Markdown.
```markdown
# QPB Run Progress
**Started:** 2026-05-15T14:32:01Z **Benchmark:** chi-1.5.1 **Lever:** post-pattern7
**Runner:** claude **Playbook version:** 1.5.6
## Phases
- [x] Phase 1 — Explore (10:10, 12 findings, patterns 1-7 walked)
- [x] Phase 2 — Generate (0:42, 9 artifacts produced)
- [x] Phase 3 — Code Review (15:31, 6 bugs identified)
- [x] Phase 4 — Spec Audit (3 auditors, 1 triage)
- [ ] Phase 5 — Reconciliation *(in progress, started 14:58:31Z)*
- [ ] Phase 6 — Verify
## Recent events (last 10)
- 2026-05-15T14:58:31Z — phase_start phase=5
- 2026-05-15T14:58:30Z — phase_end phase=4 passes=[A,B,C,D] req_count=89
- 2026-05-15T14:42:11Z — phase_end phase=1 findings=12
## Artifacts produced
- quality/EXPLORATION.md (12,034 bytes)
- quality/REQUIREMENTS.md (28,891 bytes)
- quality/COVERAGE_MATRIX.md (3,022 bytes)
```
All four sections (header, phase checklist, recent events, artifacts produced) are required. The phase checklist uses `[x]` for complete phases (with summary stats) and `[ ]` for incomplete ones; the in-progress phase is marked explicitly with its start time. "Recent events" shows the last 10 event lines from `run_state.jsonl` in human-readable form. "Artifacts produced" lists the files written this run with their byte sizes.
---
## Format invariants (enforced by `bin/run_state_lib.py` validators)
1. `_index` is line 1.
2. Every line is valid JSON (one object per line).
3. Every event has `ts` and `event` fields.
4. Every `event` value appears in `_index.event_types`.
5. Append-only: events are added, never edited. Editing a prior event is a schema violation.
6. `phase_start` and `phase_end` events for a given phase appear at most once per run (no out-of-order or duplicate phase markers).
7. `run_start` is the second line (after `_index`); `run_end` is the last line if the run completed.
Validators are read-only checks. They surface violations as findings; they don't auto-correct.
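A read-only check for invariants 1 through 4 and 6 might look like the sketch below. It is not the code in `bin/run_state_lib.py`; the function name `validate_lines` and the wording of the violation strings are hypothetical, and the sketch assumes the `_index` line itself is exempt from the `event_types` membership check.

```python
import json


def validate_lines(lines):
    """Return violation strings for invariants 1-4 and 6; never modifies the input."""
    violations = []
    events = []
    for n, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            violations.append(f"line {n}: not valid JSON")
            continue
        if not isinstance(obj, dict):
            violations.append(f"line {n}: not a JSON object")
            continue
        if "ts" not in obj or "event" not in obj:
            violations.append(f"line {n}: missing ts or event")
        events.append((n, obj))
    if not events or events[0][0] != 1 or events[0][1].get("event") != "_index":
        violations.append("line 1: _index must be the first line")
        return violations
    allowed = set(events[0][1].get("event_types", []))
    seen = set()
    for n, obj in events[1:]:
        name = obj.get("event")
        if name is not None and name not in allowed:
            violations.append(f"line {n}: event {name!r} not in _index.event_types")
        if name in ("phase_start", "phase_end"):
            key = (name, obj.get("phase"))
            if key in seen:
                violations.append(f"line {n}: duplicate {name} for phase {obj.get('phase')}")
            seen.add(key)
    return violations
```

The function only reports; in keeping with the rule above, repairing the log is left to whoever reads the findings.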