# Run-State Schema (v1.5.6)

*Authoritative schema for `quality/run_state.jsonl`, `quality/PROGRESS.md`, and `Calibration Cycles/<cycle>/run_state.jsonl`. The playbook AI writes the per-run files and the orchestrator AI writes the cycle-level log, both directly via the file-tool layer. The orchestrator AI reads these files to drive multi-benchmark calibration cycles.*

*Companion to: `docs/design/QPB_v1.5.5_Design.md` ("Design — Run-state event taxonomy" section).*

---

## File locations and ownership

- `<benchmark>/quality/run_state.jsonl`: per-run event log. Append-only. Written by the AI executing the playbook.
- `<benchmark>/quality/PROGRESS.md`: human-readable run status. Atomically rewritten by the AI on each event.
- `Calibration Cycles/<cycle>/run_state.jsonl`: cycle-level event log. Append-only. Written by the orchestrator AI.

All three live in the bind-mounted workspace owned by the user. The AI writes them with its Edit/Write file tools. It never uses shell redirection or `tee`, because in some sandbox runtimes both of these route writes through a different UID layer.

---

## Schema versioning

Every `run_state.jsonl` opens with an `_index` event recording `schema_version`. Current version: `"1.5.6"`. Non-breaking schema bumps preserve backward compatibility, so older files remain readable by newer parsers. Breaking schema changes bump the major number.

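The versioning rule can be sketched as follows. A reader accepts any file whose major number matches its own and relies on the forward-compatible field handling for everything else. This is a minimal sketch under stated assumptions:

- Versions are plain dotted `MAJOR.MINOR.PATCH` strings.
- `major` and `can_read` are hypothetical helpers, not functions from the QPB codebase.

```python
def major(version: str) -> int:
    """Return the major component of a dotted version such as "1.5.6"."""
    return int(version.split(".")[0])


def can_read(file_schema_version: str, parser_schema_version: str) -> bool:
    """Hypothetical compatibility check: a matching major number means readable.

    Assumes only breaking changes bump the major number, and that fields added
    by other minor/patch versions are tolerated as unknown fields.
    """
    return major(file_schema_version) == major(parser_schema_version)


print(can_read("1.4.0", "1.5.6"))  # True: older file, same major
print(can_read("2.0.0", "1.5.6"))  # False: breaking change
```
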
---

## Required fields (every event)

Every event object MUST have:

- `ts`: ISO 8601 UTC timestamp with `Z` suffix (e.g. `"2026-05-15T14:32:01Z"`). Sub-second precision allowed but not required.
- `event`: string, the event-type name. Must match one of the names listed in `_index.event_types`.

Events MAY have additional fields per their type's spec below. Unknown fields are tolerated by readers (forward-compatible).

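Both required fields can be produced and checked with the Python standard library alone. This sketch is illustrative only:

- `make_event_line` and `required_field_problems` are hypothetical names.
- The timestamp uses second precision.
- The check does not validate any per-type fields.

```python
import json
import re
from datetime import datetime, timezone

# ISO 8601 UTC with a literal "Z" suffix; fractional seconds are optional.
TS_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")


def utc_timestamp() -> str:
    """Current UTC time at second precision, e.g. "2026-05-15T14:32:01Z"."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def make_event_line(event: str, **fields) -> str:
    """Serialize one event as a single-line JSON object (no trailing newline)."""
    record = {"ts": utc_timestamp(), "event": event, **fields}
    return json.dumps(record)


def required_field_problems(line: str, declared_types: set) -> list:
    """List violations of the required-field rules for one JSONL line."""
    obj = json.loads(line)
    problems = []
    if not TS_PATTERN.match(str(obj.get("ts", ""))):
        problems.append("ts missing or not ISO 8601 UTC with Z suffix")
    if obj.get("event") not in declared_types:
        problems.append(f"event {obj.get('event')!r} not listed in _index.event_types")
    return problems


print(make_event_line("phase_start", phase=1))
```
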
---

## Per-run events (`<benchmark>/quality/run_state.jsonl`)

### `_index`

ALWAYS the first line. Records schema metadata.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | Always `"_index"` |
| `ts` | string | yes | ISO 8601 UTC |
| `schema_version` | string | yes | `"1.5.6"` |
| `event_types` | array of string | yes | Every event type this file uses, including `"_index"` itself |
| `benchmark` | string | yes | E.g. `"chi-1.3.45"`, `"virtio-1.5.1"` |
| `lever_state` | string | yes | E.g. `"pre-pattern7"`, `"post-pattern7"`, `"baseline"` |
| `started_at` | string | yes | ISO 8601 UTC, equals `ts` of this event |

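Two `_index` constraints are easy to get wrong: `started_at` must equal `ts`, and `"_index"` itself must appear in `event_types`, because every `event` value has to be declared there. Both can be enforced when the line is built. `index_line` is a hypothetical helper used only for illustration.

```python
import json
from datetime import datetime, timezone


def index_line(benchmark: str, lever_state: str, other_event_types: list) -> str:
    """Build the first line of a per-run run_state.jsonl (hypothetical helper)."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # "_index" must itself be declared, since every event value must appear in event_types.
    event_types = ["_index"] + [t for t in other_event_types if t != "_index"]
    record = {
        "event": "_index",
        "ts": ts,
        "schema_version": "1.5.6",
        "event_types": event_types,
        "benchmark": benchmark,
        "lever_state": lever_state,
        "started_at": ts,  # must equal ts of this event
    }
    return json.dumps(record)


print(index_line("chi-1.3.45", "baseline", ["run_start", "phase_start", "phase_end", "run_end"]))
```
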
### `run_start`

Marks the beginning of a playbook run.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"run_start"` |
| `ts` | string | yes | |
| `runner` | string | yes | One of `"claude"`, `"codex"`, `"copilot"`, `"cursor"` |
| `playbook_version` | string | yes | E.g. `"1.5.6-pre"`, `"1.5.6"` (matches `bin.benchmark_lib.RELEASE_VERSION`) |
| `target_path` | string | yes | Relative path to benchmark target |

### `phase_start`

Marks the beginning of one of the six playbook phases.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"phase_start"` |
| `ts` | string | yes | |
| `phase` | integer | yes | 1, 2, 3, 4, 5, or 6 |

### `pattern_walked`

Phase 1 only. Records that one of the seven exploration patterns was walked.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"pattern_walked"` |
| `ts` | string | yes | |
| `phase` | integer | yes | Always 1 |
| `pattern` | integer | yes | 1 through 7 |
| `findings_count` | integer | yes | Number of findings produced by this pattern |
| `duration_seconds` | number | optional | Wall-clock for this pattern walk |

### `pass_started` / `pass_ended`

Phase 4 only. Records the start or end of one of the four skill-derivation passes.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"pass_started"` or `"pass_ended"` |
| `ts` | string | yes | |
| `phase` | integer | yes | Always 4 |
| `pass` | string | yes | One of `"A"`, `"B"`, `"C"`, `"D"` |
| `output_artifact` | string | optional | Relative path to pass artifact (on `pass_ended`) |

### `finding_logged`

Records that a finding (skill-divergence, code-bug, etc.) was logged in the current phase.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"finding_logged"` |
| `ts` | string | yes | |
| `phase` | integer | yes | 1-6 |
| `finding_id` | string | yes | E.g. `"BUG-007"`, `"REQ-042"` |
| `category` | string | yes | E.g. `"code-bug"`, `"skill-divergence"`, `"missing-citation"`, `"prose-to-code-mismatch"` |

### `artifact_written`

Records that an artifact file was produced or updated.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"artifact_written"` |
| `ts` | string | yes | |
| `relative_path` | string | yes | Path relative to benchmark target (e.g. `"quality/EXPLORATION.md"`) |
| `byte_size` | integer | optional | Size of the file at write time |
| `line_count` | integer | optional | Line count |

### `gate_check`

Records the outcome of a single quality-gate check.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"gate_check"` |
| `ts` | string | yes | |
| `gate_name` | string | yes | Identifier from `quality_gate.py` |
| `verdict` | string | yes | One of `"pass"`, `"fail"`, `"warn"`, `"skip"` |
| `reason` | string | optional | Human-readable explanation |

### `phase_end`

Marks the end of a phase. Cross-validated against the phase's expected artifacts before being written (see "Cross-validation rules" below).

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"phase_end"` |
| `ts` | string | yes | |
| `phase` | integer | yes | 1-6 |
| `key_counts` | object | yes | Phase-specific counts (see below) |
| `artifacts_produced` | array of string | yes | Relative paths of artifacts produced this phase |
| `duration_seconds` | number | optional | Wall-clock for the whole phase |

`key_counts` per phase:

- Phase 1: `{"findings_total": N, "patterns_walked": M}` (M should be 7 for a full Phase 1)
- Phase 2: `{"findings_promoted": N, "findings_dropped": M}`
- Phase 3: `{"bugs_identified": N, "bug_writeups": M}`
- Phase 4: `{"req_count": N, "uc_count": M, "passes_complete": K}` (K should be 4)
- Phase 5: `{"gate_checks_total": N, "gate_failures": M}`
- Phase 6: `{"bugs_md_count": N, "gate_verdict": "pass|fail|partial"}`

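A validator could check `key_counts` against the per-phase list above. This is a sketch, not the `bin/run_state_lib.py` implementation. `key_counts_problems` is hypothetical. Reporting the "should be" targets (7 patterns, 4 passes) as warnings rather than hard failures is an assumption.

```python
# key_counts keys per phase, transcribed from the list above.
EXPECTED_KEY_COUNTS = {
    1: {"findings_total", "patterns_walked"},
    2: {"findings_promoted", "findings_dropped"},
    3: {"bugs_identified", "bug_writeups"},
    4: {"req_count", "uc_count", "passes_complete"},
    5: {"gate_checks_total", "gate_failures"},
    6: {"bugs_md_count", "gate_verdict"},
}
GATE_VERDICTS = {"pass", "fail", "partial"}


def key_counts_problems(phase_end: dict) -> list:
    """Check one phase_end event's key_counts; "should" targets come back as warnings."""
    phase = phase_end.get("phase")
    expected = EXPECTED_KEY_COUNTS.get(phase)
    if expected is None:
        return [f"error: unknown phase {phase!r}"]
    counts = phase_end.get("key_counts") or {}
    problems = [f"error: missing key {key}" for key in sorted(expected - set(counts))]
    if phase == 1 and counts.get("patterns_walked") != 7:
        problems.append("warn: patterns_walked != 7 (not a full Phase 1)")
    if phase == 4 and counts.get("passes_complete") != 4:
        problems.append("warn: passes_complete != 4")
    if phase == 6 and counts.get("gate_verdict") not in GATE_VERDICTS:
        problems.append("error: gate_verdict must be pass, fail, or partial")
    return problems
```
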
### `error`

Records an error during the run.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"error"` |
| `ts` | string | yes | |
| `phase` | integer | optional | If error is phase-scoped |
| `message` | string | yes | Human-readable description |
| `recoverable` | boolean | yes | If true, the run will retry the affected phase; if false, the run is aborting |

### `documentation_state`

v1.5.6+. Records the documentation-availability state at Phase 1 entry. Currently the only emitted state is `"code_only"`. It indicates that `reference_docs/` and `reference_docs/cite/` carry no recognized plaintext content (`.md` or `.txt`), and that Phase 1 is proceeding in code-only mode (see `references/code-only-mode.md`). A `"with_docs"` value is reserved for future explicit emission. Today, the absence of a `documentation_state` event implies docs were present.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"documentation_state"` |
| `ts` | string | yes | |
| `state` | string | yes | Currently `"code_only"`. Future values may include `"with_docs"`. |
| `reason` | string | yes | Free-form (e.g. `"reference_docs/ empty"`) |

When `documentation_state state="code_only"` is emitted, the playbook also makes the downgrade visible in both human-readable artifacts:

- It prepends a "Documentation status: code-only mode" section to `quality/EXPLORATION.md`.
- It adds a "Documentation state: code_only" line to `quality/PROGRESS.md`.

New runs adding the `documentation_state` event must include it in the `_index.event_types` list.

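The detection step can be sketched as below. The scan scope comes from the text above: `reference_docs/` and `reference_docs/cite/`, `.md` or `.txt` files. The following are assumptions, not part of the schema:

- zero-byte files are treated as absent;
- suffixes are matched case-insensitively;
- the scan does not recurse any further;
- both helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: "recognized plaintext content" means a non-empty .md or .txt file.
PLAINTEXT_SUFFIXES = {".md", ".txt"}


def has_plaintext_docs(target: Path) -> bool:
    """True if reference_docs/ or reference_docs/cite/ holds a non-empty .md/.txt file."""
    for directory in (target / "reference_docs", target / "reference_docs" / "cite"):
        if not directory.is_dir():
            continue
        for entry in directory.iterdir():
            if (entry.is_file()
                    and entry.suffix.lower() in PLAINTEXT_SUFFIXES
                    and entry.stat().st_size > 0):
                return True
    return False


def documentation_state_event(target: Path, ts: str) -> Optional[dict]:
    """Event to append at Phase 1 entry, or None when docs are present."""
    if has_plaintext_docs(target):
        return None  # absence of the event implies docs were present
    return {"ts": ts, "event": "documentation_state",
            "state": "code_only", "reason": "reference_docs/ empty"}
```
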
### `aborted_missing_docs`

v1.5.6+. Records that the run aborted at Phase 1 entry because `--require-docs` was set and `reference_docs/` was empty. For the same Phase 1 entry, this event is mutually exclusive with `documentation_state state="code_only"`. `--require-docs` is the opt-IN abort path; without the flag, the run keeps the documented code-only-mode downgrade. After this event the runner returns non-zero without invoking any LLM work, so no `phase_start phase=1` is recorded.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"aborted_missing_docs"` |
| `ts` | string | yes | |
| `reason` | string | yes | Free-form (e.g. `"reference_docs/ empty and --require-docs set"`) |

When `aborted_missing_docs` is emitted, the playbook also writes an `ERROR: aborted_missing_docs — <reason>` block to `quality/PROGRESS.md`, so the abort is visible without reading the JSONL. New runs that pass `--require-docs` against an empty `reference_docs/` must include `aborted_missing_docs` in the `_index.event_types` list.

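The mutual exclusivity between the two Phase 1 entry events can be expressed as a single decision. In this hypothetical sketch, `phase1_entry` returns the event to append, if any, and whether Phase 1 should start. The `reason` strings reuse the examples from the tables above.

```python
from typing import Optional


def phase1_entry(has_docs: bool, require_docs: bool, ts: str) -> tuple:
    """Return (event to append or None, whether Phase 1 should start)."""
    if has_docs:
        # Docs present: today no documentation_state event is emitted.
        return None, True
    if require_docs:
        # Opt-in abort: no LLM work and no phase_start phase=1 after this.
        event: Optional[dict] = {"ts": ts, "event": "aborted_missing_docs",
                                 "reason": "reference_docs/ empty and --require-docs set"}
        return event, False
    # Default: downgrade to code-only mode and continue.
    return {"ts": ts, "event": "documentation_state", "state": "code_only",
            "reason": "reference_docs/ empty"}, True
```
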
### `run_end`

Marks the end of the playbook run.

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"run_end"` |
| `ts` | string | yes | |
| `status` | string | yes | One of `"success"`, `"aborted"`, `"failed"` |
| `total_findings` | integer | optional | Sum across all phases |
| `final_verdict` | string | optional | The Phase 6 gate verdict |

---

## Cycle-level events (`Calibration Cycles/<cycle>/run_state.jsonl`)

### `_index` (cycle-level)

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"_index"` |
| `ts` | string | yes | |
| `schema_version` | string | yes | `"1.5.6"` |
| `event_types` | array of string | yes | |
| `cycle_name` | string | yes | E.g. `"2026-05-15-pattern7-displacement-recovery"` |
| `lever_under_test` | string | yes | E.g. `"lever-1-exploration-breadth-depth"` |
| `benchmarks` | array of string | yes | Cycle's pinned benchmark list |
| `iteration` | integer | yes | Iteration ordinal (1, 2, or 3; see iterate-cap) |

### `cycle_start`

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"cycle_start"` |
| `ts` | string | yes | |
| `hypothesis` | string | yes | The cycle's testable hypothesis |
| `noise_floor_threshold` | number | yes | Recall delta below this is treated as noise (default 0.05) |

### `benchmark_start`

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"benchmark_start"` |
| `ts` | string | yes | |
| `benchmark` | string | yes | |
| `lever_state` | string | yes | `"pre-lever"` or `"post-lever"` |

### `lever_change_applied`

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"lever_change_applied"` |
| `ts` | string | yes | |
| `lever_id` | string | yes | E.g. `"lever-1-exploration-breadth-depth"` |
| `files_changed` | array of string | yes | Paths relative to QPB repo root |
| `commit_sha` | string | yes | Commit SHA on the implementing branch |
| `description` | string | yes | What the change is (e.g. `"Pattern 7 budget cap 3-5 → 2-3"`) |

### `lever_change_reverted`

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"lever_change_reverted"` |
| `ts` | string | yes | |
| `files_changed` | array of string | yes | |
| `commit_sha` | string | optional | Null/absent if revert is uncommitted |

### `benchmark_end`

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"benchmark_end"` |
| `ts` | string | yes | |
| `benchmark` | string | yes | |
| `lever_state` | string | yes | |
| `recall` | number | yes | 0.0-1.0 |
| `bugs_found` | array of string | yes | Bug IDs found this run |
| `bugs_missed` | array of string | yes | Bug IDs in baseline missed this run |
| `historical_baseline_path` | string | yes | Path to the baseline BUGS.md used for recall computation |

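The schema does not spell out how `recall` is computed. This sketch assumes the usual reading: the number of baseline bugs the run found, divided by the number of baseline bugs. It also assumes bugs from this run have already been mapped onto the baseline's bug IDs; how that matching is done is outside this schema. `benchmark_end_fields` is a hypothetical helper.

```python
def benchmark_end_fields(baseline_ids, found_ids) -> dict:
    """Compute recall, bugs_found and bugs_missed for a benchmark_end event.

    Assumes found_ids have already been matched to baseline bug IDs.
    """
    baseline = set(baseline_ids)
    if not baseline:
        raise ValueError("recall is undefined for an empty baseline")
    found = set(found_ids)
    hits = baseline & found
    return {
        "recall": len(hits) / len(baseline),
        "bugs_found": sorted(found),
        "bugs_missed": sorted(baseline - found),
    }


fields = benchmark_end_fields(
    ["BUG-001", "BUG-002", "BUG-003", "BUG-004"], ["BUG-001", "BUG-003"])
print(fields["recall"], fields["bugs_missed"])  # 0.5 ['BUG-002', 'BUG-004']
```
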
### `cycle_end`

| Field | Type | Required | Notes |
|---|---|---|---|
| `event` | string | yes | `"cycle_end"` |
| `ts` | string | yes | |
| `verdict` | string | yes | One of `"ship"`, `"revert"`, `"iterate"`, `"halt-iterate-cap"` |
| `recall_before` | object | yes | Per-benchmark recall before lever change |
| `recall_after` | object | yes | Per-benchmark recall after lever change |
| `delta` | object | yes | Per-benchmark delta (`recall_after - recall_before`) |
| `cross_benchmark_check` | object | yes | `{"clean": bool, "regressions": [list of bench/bug pairs that regressed]}` |

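The `delta` and `cross_benchmark_check` fields can be derived from per-benchmark results. The schema leaves several points open, so this sketch assumes:

- a regression is a baseline bug found before the lever change and missed after it;
- pairs are emitted as two-element lists;
- "below the noise floor" compares the absolute delta against `noise_floor_threshold`.

The sketch does not choose the `verdict`, whose decision rule is not specified here, and both helper names are hypothetical.

```python
def cycle_end_fields(recall_before, recall_after, found_before, found_after) -> dict:
    """Build delta and cross_benchmark_check for a cycle_end event.

    recall_*: benchmark -> recall; found_*: benchmark -> set of baseline bug IDs found.
    Assumes a "regression" is a bug found pre-lever but missed post-lever.
    """
    delta = {b: round(recall_after[b] - recall_before[b], 6) for b in recall_before}
    regressions = [
        [bench, bug]
        for bench in sorted(found_before)
        for bug in sorted(found_before[bench] - found_after.get(bench, set()))
    ]
    return {
        "recall_before": recall_before,
        "recall_after": recall_after,
        "delta": delta,
        "cross_benchmark_check": {"clean": not regressions, "regressions": regressions},
    }


def within_noise_floor(delta: dict, threshold: float = 0.05) -> list:
    """Benchmarks whose absolute recall delta falls below the noise floor."""
    return sorted(b for b, d in delta.items() if abs(d) < threshold)
```
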
---

## Cross-validation rules (per `phase_end`)

The AI verifies these conditions before appending a `phase_end` event. If any check fails, the AI appends an `error` event with `recoverable: true` and re-runs the failing phase.

| Phase | Required conditions |
|---|---|
| 1 | `quality/EXPLORATION.md` exists, ≥ 120 lines (aligned with the Phase 2 startup gate in `bin/run_playbook.check_phase_gate`), and contains at least one finding section. The finding-section regex is `^##\s+(Finding\|Open Exploration Findings\|\d+\.)`. It accepts `## Finding ...`, the SKILL-prescribed exact heading `## Open Exploration Findings`, and numbered `## N.` headings. |
| 2 | All nine fixed-name Generate-contract artifacts exist non-empty under `quality/`: `REQUIREMENTS.md`, `QUALITY.md`, `CONTRACTS.md`, `COVERAGE_MATRIX.md`, `COMPLETENESS_REPORT.md`, `RUN_CODE_REVIEW.md`, `RUN_INTEGRATION_TESTS.md`, `RUN_SPEC_AUDIT.md`, `RUN_TDD_TESTS.md`. In addition, at least one non-empty `quality/test_functional.<ext>` exists (extension varies by primary language). Pre-v1.5.6 this row described the v1.5.5-design triage model (`EXPLORATION_MERGED.md` / `triage.md`). That mapping was never adopted by the shipped SKILL.md / orchestrator_protocol.md / agent files, which always documented Phase 2 as Generate. |
| 3 | `quality/code_reviews/` directory contains at least one review file. If `quality/BUGS.md` has any `### BUG-` heading, `quality/patches/` contains at least one `BUG-*-regression-test.patch` file. Pre-v1.5.6 this row checked `quality/RUN_CODE_REVIEW.md`, which is a Phase 2 Generate output, not a Phase 3 review result. Same v1.5.5-design / shipped-Generate drift class as the Phase 2 row. Cluster B reconciled. |
| 4 | `quality/spec_audits/` directory contains at least one `*-triage.md` file AND at least one `*-auditor-*.md` file (per orchestrator_protocol.md naming convention). When neither name pattern matches, the validator falls back to a weaker "≥2 files" check. Older bootstrap runs with arbitrary `.md` names therefore still pass; the gate at Phase 6 enforces deeper conformance. Pre-v1.5.6 this row checked `quality/REQUIREMENTS.md` + `COVERAGE_MATRIX.md`, which are Phase 2 outputs. Same v1.5.5-design drift class. Cluster B reconciled. |
| 5 | If `quality/BUGS.md` has confirmed `### BUG-` entries, `quality/results/tdd-results.json` exists non-empty. In addition, for every confirmed bug, `quality/writeups/BUG-NNN.md` exists AND `quality/results/BUG-NNN.red.log` exists. With no confirmed bugs, the row is vacuously satisfied. Pre-v1.5.6 this row checked `quality/results/quality-gate.log`, which is a Phase 6 output. Same v1.5.5-design drift class. Cluster B reconciled. |
| 6 | `quality/results/quality-gate.log` exists non-empty AND `quality/PROGRESS.md` contains a `Terminal Gate Verification` section. That section is the orchestrator-protocol marker showing that Phase 6 ran the script-verified gate to completion. Pre-v1.5.6 this row checked `quality/BUGS.md` + `quality/INDEX.md`. BUGS.md is a Phase 3 output, and INDEX.md was never adopted in the shipped contract. Same v1.5.5-design drift class. Cluster B reconciled. |

The `run_end` event additionally requires two conditions. All 6 `phase_end` events must be present in the log, and the final BUGS.md count must match `phase_end phase=6 key_counts.bugs_md_count`.

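The Phase 1 row translates directly into a check like the one below. Counting lines with `str.splitlines()` is an assumption about how the 120-line threshold is measured. `phase1_artifacts_ok` is a hypothetical name, not the actual `bin/run_playbook.check_phase_gate` code.

```python
import re
from pathlib import Path

# Regex from the Phase 1 row, without the \| escapes the Markdown table needs.
FINDING_HEADING = re.compile(
    r"^##\s+(Finding|Open Exploration Findings|\d+\.)", re.MULTILINE)
MIN_EXPLORATION_LINES = 120


def phase1_artifacts_ok(target: Path) -> bool:
    """Phase 1 row: EXPLORATION.md exists, has >= 120 lines and a finding section."""
    exploration = target / "quality" / "EXPLORATION.md"
    if not exploration.is_file():
        return False
    text = exploration.read_text(encoding="utf-8")
    if len(text.splitlines()) < MIN_EXPLORATION_LINES:
        return False
    return FINDING_HEADING.search(text) is not None
```
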
---

## Resume semantics

When an AI session starts on a run directory:

1. If `quality/run_state.jsonl` does not exist: fresh run. Write `_index` + `run_start` + `phase_start phase=1`.
2. If it exists: read all events. Find the last `phase_start` not followed by a matching `phase_end`. Call it the "in-progress phase".
3. Verify the in-progress phase's expected artifacts (per the cross-validation rules above):
   - If the artifacts are complete: append the missing `phase_end` event and proceed to the next phase. This is the "session crashed mid-phase but the work is done" recovery path.
   - If the artifacts are incomplete: re-run that phase from scratch. The prior session left a partial state that can't be safely resumed.
4. If all 6 `phase_end` events are present but no `run_end`: append `run_end status=success` and finalize.

The policy is "trust artifacts more than events." For example, if events claim Phase 2 is done but `quality/REQUIREMENTS.md` doesn't exist, the AI re-runs Phase 2. Conversely, if events stop mid-phase but artifacts are complete, the AI catches up the events.

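The four resume steps can be sketched as a pure decision function over the parsed events. This is illustrative only:

- The event dicts are assumed to be already parsed from `run_state.jsonl`.
- `artifacts_complete` stands in for the cross-validation rules.
- The returned strings are descriptions, not a real API.
- Cases the steps do not cover fall through to a placeholder result.

```python
from typing import Callable, Optional


def resume_action(events: Optional[list], artifacts_complete: Callable[[int], bool]) -> str:
    """Decide the next step for a session starting on a run directory.

    events is None when quality/run_state.jsonl does not exist.
    artifacts_complete(phase) applies the cross-validation rules for that phase.
    """
    if events is None:
        return "fresh run: write _index, run_start, phase_start phase=1"
    ended = {e["phase"] for e in events if e["event"] == "phase_end"}
    open_phases = [e["phase"] for e in events
                   if e["event"] == "phase_start" and e["phase"] not in ended]
    if open_phases:
        phase = open_phases[-1]  # last phase_start without a matching phase_end
        if artifacts_complete(phase):
            return f"append phase_end phase={phase}, then continue"
        return f"re-run phase {phase} from scratch"
    has_run_end = any(e["event"] == "run_end" for e in events)
    if ended == set(range(1, 7)) and not has_run_end:
        return "append run_end status=success"
    return "no resume action defined"
```
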
---

## PROGRESS.md format

Atomically rewritten on every event. Markdown.

```markdown
# QPB Run Progress

**Started:** 2026-05-15T14:32:01Z **Benchmark:** chi-1.5.1 **Lever:** post-pattern7
**Runner:** claude **Playbook version:** 1.5.6

## Phases

- [x] Phase 1 — Explore (10:10, 12 findings, patterns 1-7 walked)
- [x] Phase 2 — Generate (0:42, 9 artifacts produced)
- [x] Phase 3 — Code Review (15:31, 6 bugs identified)
- [x] Phase 4 — Spec Audit (3 auditors, 1 triage)
- [ ] Phase 5 — Reconciliation *(in progress, started 14:58:31Z)*
- [ ] Phase 6 — Verify

## Recent events (last 10)

- 2026-05-15T14:58:31Z — phase_start phase=5
- 2026-05-15T14:58:30Z — phase_end phase=4 passes=[A,B,C,D] req_count=89
- 2026-05-15T14:42:11Z — phase_end phase=1 findings=12

## Artifacts produced

- quality/EXPLORATION.md (12,034 bytes)
- quality/REQUIREMENTS.md (28,891 bytes)
- quality/COVERAGE_MATRIX.md (3,022 bytes)
```

All four sections (header, phase checklist, recent events, artifacts produced) are required:

- The phase checklist uses `[x]` for complete phases (with summary stats) and `[ ]` for incomplete ones. The in-progress phase is noted explicitly with its start time.
- Recent events shows the last 10 event lines from `run_state.jsonl` in human-readable form.
- Artifacts produced lists the files written this run, with their byte sizes.

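Script-side tooling, such as a test harness, may need the same atomic-rewrite guarantee. The usual technique is to write a temporary file in the same directory and rename it over the target with `os.replace`. This is a generic sketch, not a description of how the AI's Edit/Write tools behave. `atomic_write_text` is a hypothetical helper.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Replace path's contents so readers see either the old or the new file, never a mix."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```
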
---

## Format invariants (enforced by `bin/run_state_lib.py` validators)

1. `_index` is line 1.
2. Every line is valid JSON (one object per line).
3. Every event has `ts` and `event` fields.
4. Every `event` value appears in `_index.event_types`.
5. Append-only: events are added, never edited. Editing a prior event is a schema violation.
6. `phase_start` and `phase_end` events for a given phase appear at most once per run (no out-of-order or duplicate phase markers).
7. `run_start` is the second line (after `_index`); `run_end` is the last line if the run completed.

Validators are read-only checks. They surface violations as findings; they don't auto-correct.

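A read-only validator in the spirit of these invariants might look like the sketch below. It is not the `bin/run_state_lib.py` code, and the function name and messages are hypothetical. It has two known gaps:

- It skips invariant 5, because detecting edits needs an earlier copy of the file to compare against.
- For invariant 6, it flags only duplicate markers, not ordering.

```python
import json


def invariant_violations(lines: list) -> list:
    """Read-only check of format invariants 1-4, 6 (duplicates only) and 7."""
    violations = []
    events = []  # (line number, parsed object)
    for number, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            violations.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(obj, dict):
            violations.append(f"line {number}: not a JSON object")
            continue
        if "ts" not in obj or "event" not in obj:
            violations.append(f"line {number}: missing ts or event")
        events.append((number, obj))

    if not events or events[0][0] != 1 or events[0][1].get("event") != "_index":
        violations.append("line 1 is not _index")
        return violations
    second = events[1][1] if len(events) >= 2 and events[1][0] == 2 else None
    if len(lines) >= 2 and (second is None or second.get("event") != "run_start"):
        violations.append("line 2 is not run_start")

    declared = set(events[0][1].get("event_types", []))
    seen_markers = set()
    for number, obj in events:
        name = obj.get("event")
        if name not in declared:
            violations.append(f"line {number}: event {name!r} not in _index.event_types")
        if name in ("phase_start", "phase_end"):
            marker = (name, obj.get("phase"))
            if marker in seen_markers:
                violations.append(f"line {number}: duplicate {name} phase={marker[1]}")
            seen_markers.add(marker)
        if name == "run_end" and number != len(lines):
            violations.append(f"line {number}: run_end is not the last line")
    return violations
```
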