Update quality-playbook skill to v1.5.6 + add agent (#1402)

Rebuilds branch from upstream/staged (was previously merged from
upstream/main, which brought in materialized plugin files that
fail Check Plugin Structure on PRs targeting staged).

Changes vs. staged:
- Update skills/quality-playbook/ to v1.5.6 (32 bundled assets:
  SKILL.md + LICENSE.txt + 16 references/ + 9 phase_prompts/ +
  3 agents/ + bin/citation_verifier.py + quality_gate.py).
- Add agents/quality-playbook.agent.md (top-level orchestrator).
  name: quality-playbook (validator-compliant).
- Update docs/README.skills.md quality-playbook row description
  + bundled-assets list to v1.5.6.
- Fix 'unparseable' → 'unparsable' in quality_gate.py (5 instances;
  codespell preference, both spellings valid).

Closes the v1.4.0 → v1.5.6 update in a single clean commit on top of
upstream/staged. The preserved backup branch backup-bedbe84-pre-rebuild
(SHA bedbe848fa3c0f0eda8e653c42b599a17dd2e354) holds the prior history for reference.
Andrew Stellman
2026-05-10 21:31:53 -04:00
committed by GitHub
parent e7755069e9
commit b8441d218b
32 changed files with 9639 additions and 543 deletions
@@ -32,94 +32,26 @@ For a medium-sized project (5-15 source files), this typically yields 35-50
Before writing any test code, read 2-3 existing test files and identify how they import project modules. This is critical — projects handle imports differently, and getting it wrong means every test fails with resolution errors.
Identify the import convention used in the project. Whatever pattern the existing tests use, copy it exactly. Do not guess or invent a different pattern.
Common patterns by language (a Python example follows the list):
**Python:**
- `sys.path.insert(0, "src/")` then bare imports (`from module import func`)
- Package imports (`from myproject.module import func`)
- Relative imports with conftest.py path manipulation
**Java:**
- `import com.example.project.Module;` matching the package structure
- Test source root must mirror main source root
**Scala:**
- `import com.example.project._` or `import com.example.project.{ClassA, ClassB}`
- SBT project layout: `src/test/scala/` mirrors `src/main/scala/`
**TypeScript/JavaScript:**
- `import { func } from '../src/module'` with relative paths
- Path aliases from `tsconfig.json` (e.g., `@/module`)
**Go:**
- Same package: test files in the same directory with `package mypackage`, giving access to unexported identifiers
- Black-box testing: `package mypackage_test` with explicit imports, exercising only the exported API
- Internal packages may require specific import paths
**Rust:**
- `use crate::module::function;` for unit tests in the same crate
- `use myproject::module::function;` for integration tests in `tests/`
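For example, a test file following the first Python pattern might look like this (module and function names hypothetical):
```python
# Hypothetical test file copying the sys.path.insert convention
import sys
sys.path.insert(0, "src/")  # matches the path manipulation used by existing tests

from pipeline import run_pipeline  # bare import, same as existing tests

def test_pipeline_runs():
    assert run_pipeline({}) is not None
```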
## Create Test Setup BEFORE Writing Tests
Every test framework has a mechanism for shared setup. If your tests use shared fixtures or test data, you MUST create the setup file before writing tests. Test frameworks do not auto-discover fixtures from other directories.
**By language:**
**Python (pytest):** Create `quality/conftest.py` defining every fixture. Fixtures in `tests/conftest.py` are NOT available to `quality/test_functional.py`. Preferred: write tests that create data inline using `tmp_path` to eliminate conftest dependency.
**Java (JUnit):** Use `@BeforeEach`/`@BeforeAll` methods in the test class, or create a shared `TestFixtures` utility class in the same package.
**Scala (ScalaTest):** Mix in a trait with `before`/`after` blocks, or use inline data builders. If using SBT, ensure the test file is in the correct source tree.
**TypeScript (Jest):** Use `beforeAll`/`beforeEach` in the test file, or create a `quality/testUtils.ts` with factory functions.
**Go (testing):** Helper functions in the same `_test.go` file with `t.Helper()`. Use `t.TempDir()` for temporary directories. Go convention strongly prefers inline setup — avoid shared test state.
**Rust (cargo test):** Helper functions in a `#[cfg(test)] mod tests` block or a `test_utils.rs` module. Use builder patterns for constructing test data. For integration tests, place files in `tests/`.
Identify your framework's setup mechanism (fixtures, `@BeforeEach`, `beforeAll`, helper functions, builder patterns, etc.) and follow the conventions already used in the project's existing tests.
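For pytest, for example, a minimal sketch of such a setup file (fixture name and data shape hypothetical):
```python
# Hypothetical quality/conftest.py. Fixtures defined here are visible to
# quality/test_functional.py; fixtures in tests/conftest.py are NOT.
import pytest

@pytest.fixture
def sample_config(tmp_path):
    config_file = tmp_path / "config.json"  # inline data, no cross-file dependency
    config_file.write_text('{"pipeline": {"name": "Test", "steps": []}}')
    return config_file
```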
**Rule: Every fixture or test helper referenced must be defined.** If a test depends on shared setup that doesn't exist, the test will error during setup (not fail during assertion) — producing broken tests that look like they pass.
**Preferred approach across all languages:** Write tests that create their own data inline, directly in each test function, using the framework's temporary directory support and literal data structures. This eliminates cross-file dependencies:
```python
# Python
def test_config_validation(tmp_path):
    config = {"pipeline": {"name": "Test", "steps": [...]}}
```
```java
// Java
@Test
void testConfigValidation(@TempDir Path tempDir) {
    var config = Map.of("pipeline", Map.of("name", "Test"));
}
```
```typescript
// TypeScript
test('config validation', () => {
  const config = { pipeline: { name: 'Test', steps: [] } };
});
```
```go
// Go
func TestConfigValidation(t *testing.T) {
    tmpDir := t.TempDir()
    config := Config{Pipeline: Pipeline{Name: "Test"}}
}
```
```rust
// Rust
#[test]
fn test_config_validation() {
    let config = Config { pipeline: Pipeline { name: "Test".into() } };
}
```
**After writing all tests, run the test suite and check for setup errors.** Setup errors (fixture not found, import failures) count as broken tests regardless of how the framework categorizes them.
@@ -133,14 +65,14 @@ If you genuinely cannot write a meaningful test for a defensive pattern (e.g., i
Before writing a single test, build a function call map. For every function you plan to test:
1. **Read the function/method signature** — not just the name, but every parameter, its type, and default value. In Python, read the `def` line and type hints. In Java, read the method signature and generics. In Scala, read the method definition and implicit parameters. In TypeScript, read the type annotations.
2. **Read the documentation** — docstrings, Javadoc, TSDoc, ScalaDoc. They often specify return types, exceptions, and edge case behavior.
3. **Read one existing test that calls it** — existing tests show you the exact calling convention, fixture shape, and assertion pattern.
4. **Read real data files** — if the function processes configs, schemas, or data files, read an actual file from the project. Your test fixtures must match this shape exactly.
**Common failure pattern:** The agent explores the architecture, understands conceptually what a function does, then writes a test call with guessed parameters. The test fails because the real function takes `(config, items_data, limit)` not `(items, seed, strategy)`. Reading the actual signature takes 5 seconds and prevents this entirely.
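A minimal Python sketch of the difference, echoing the hypothetical parameter names above:
```python
# Hypothetical signature, discovered by reading the actual def line
def select_items(config: dict, items_data: list, limit: int = 10) -> list:
    return items_data[:limit]

# WRONG: a call with guessed parameters fails immediately
#   select_items(items, seed=42, strategy="random")

# RIGHT: the call matches the signature that was actually read
result = select_items({"mode": "strict"}, ["a", "b", "c"], limit=2)
assert result == ["a", "b"]
```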
**Library version awareness:** Check the project's dependency manifest (`requirements.txt`, `build.sbt`, `package.json`, `pom.xml`, `build.gradle`, `Cargo.toml`) to verify what's available. Use the test framework's skip mechanism for optional dependencies: Python `pytest.importorskip()`, JUnit `Assumptions.assumeTrue()`, ScalaTest `assume()`, Jest conditional `describe.skip`, Go `t.Skip()`, Rust `#[ignore]` with a comment explaining the prerequisite.
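For example, in pytest an optional dependency can gate a whole test module (the dependency shown is illustrative):
```python
import pytest

# Skips this module's tests (rather than erroring) when PyYAML is absent
yaml = pytest.importorskip("yaml")

def test_yaml_config_round_trip():
    data = {"pipeline": {"name": "Test"}}
    assert yaml.safe_load(yaml.safe_dump(data)) == data
```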
## Writing Spec-Derived Tests
@@ -151,68 +83,9 @@ Each test should:
2. **Execute** — Call the function, run the pipeline, make the request
3. **Assert specific properties** the spec requires
Each test should include a traceability annotation (via docstring, display name, or comment) citing the spec section it verifies, e.g., `[Req: formal — Design Doc §N] X should produce Y`.
```python
# Python (pytest)
class TestSpecRequirements:
    def test_requirement_from_spec_section_N(self, fixture):
        """[Req: formal — Design Doc §N] X should produce Y."""
        result = process(fixture)
        assert result.property == expected_value
```
```java
// Java (JUnit 5)
class SpecRequirementsTest {
    @Test
    @DisplayName("[Req: formal — Design Doc §N] X should produce Y")
    void testRequirementFromSpecSectionN() {
        var result = process(fixture);
        assertEquals(expectedValue, result.getProperty());
    }
}
```
```scala
// Scala (ScalaTest)
class SpecRequirements extends FlatSpec with Matchers {
  // [Req: formal — Design Doc §N] X should produce Y
  "Section N requirement" should "produce Y from X" in {
    val result = process(fixture)
    result.property should equal (expectedValue)
  }
}
```
```typescript
// TypeScript (Jest)
describe('Spec Requirements', () => {
  test('[Req: formal — Design Doc §N] X should produce Y', () => {
    const result = process(fixture);
    expect(result.property).toBe(expectedValue);
  });
});
```
```go
// Go (testing)
func TestSpecRequirement_SectionN_XProducesY(t *testing.T) {
    // [Req: formal — Design Doc §N] X should produce Y
    result := Process(fixture)
    if result.Property != expectedValue {
        t.Errorf("expected %v, got %v", expectedValue, result.Property)
    }
}
```
```rust
// Rust (cargo test)
#[test]
fn test_spec_requirement_section_n_x_produces_y() {
    // [Req: formal — Design Doc §N] X should produce Y
    let result = process(&fixture);
    assert_eq!(result.property, expected_value);
}
```
## What Makes a Good Functional Test
@@ -226,72 +99,9 @@ fn test_spec_requirement_section_n_x_produces_y() {
If the project handles multiple input types, cross-variant coverage is where silent bugs hide. Aim for roughly 30% of tests exercising all variants — the exact percentage matters less than ensuring every cross-cutting property is tested across all variants.
Use your framework's parametrization mechanism (e.g., `@pytest.mark.parametrize`, `@ParameterizedTest`, `test.each`, table-driven tests, iterating over cases) to run the same assertion logic across all variants:
```python
# Python (pytest)
@pytest.mark.parametrize("variant", [variant_a, variant_b, variant_c])
def test_feature_works(variant):
    output = process(variant.input)
    assert output.has_expected_property
```
```java
// Java (JUnit 5)
@ParameterizedTest
@MethodSource("variantProvider")
void testFeatureWorks(Variant variant) {
    var output = process(variant.getInput());
    assertTrue(output.hasExpectedProperty());
}
```
```scala
// Scala (ScalaTest)
Seq(variantA, variantB, variantC).foreach { variant =>
  it should s"work for ${variant.name}" in {
    val output = process(variant.input)
    output should have ('expectedProperty (true))
  }
}
```
```typescript
// TypeScript (Jest)
test.each([variantA, variantB, variantC])(
  'feature works for %s', (variant) => {
    const output = process(variant.input);
    expect(output).toHaveProperty('expectedProperty');
  });
```
```go
// Go (testing) — table-driven tests
func TestFeatureWorksAcrossVariants(t *testing.T) {
    variants := []Variant{variantA, variantB, variantC}
    for _, v := range variants {
        t.Run(v.Name, func(t *testing.T) {
            output := Process(v.Input)
            if !output.HasExpectedProperty() {
                t.Errorf("variant %s: missing expected property", v.Name)
            }
        })
    }
}
```
```rust
// Rust (cargo test) — iterate over cases
#[test]
fn test_feature_works_across_variants() {
    let variants = [variant_a(), variant_b(), variant_c()];
    for v in &variants {
        let output = process(&v.input);
        assert!(output.has_expected_property(),
            "variant {}: missing expected property", v.name);
    }
}
```
If parametrization doesn't fit, loop explicitly within a single test.
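In Python, for example, such an explicit loop might look like this (variant names follow the earlier sketches):
```python
# Explicit loop when parametrization doesn't fit
def test_feature_works_across_variants():
    for variant in (variant_a, variant_b, variant_c):
        output = process(variant.input)
        assert output.has_expected_property, f"variant failed: {variant}"
```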
@@ -312,68 +122,15 @@ These patterns look like tests but don't catch real bugs:
### The Exception-Catching Anti-Pattern in Detail
```java
// Java — WRONG: tests the validation mechanism
@Test
void testBadValueRejected() {
    fixture.setField("invalid"); // Schema rejects this!
    assertThrows(ValidationException.class, () -> process(fixture));
    // Tells you nothing about output
}

// Java — RIGHT: tests the requirement
@Test
void testBadValueNotInOutput() {
    fixture.setField(null); // Schema accepts null for Optional
    var output = process(fixture);
    assertFalse(output.contains(badProperty)); // Bad data absent
    assertTrue(output.contains(expectedType)); // Rest still works
}
```
```scala
// Scala — WRONG: tests the decoder, not the requirement
"bad value" should "be rejected" in {
  val input = fixture.copy(field = "invalid") // Circe decoder fails!
  a [DecodingFailure] should be thrownBy process(input)
  // Tells you nothing about output
}

// Scala — RIGHT: tests the requirement
"missing optional field" should "not produce bad output" in {
  val input = fixture.copy(field = None) // Option[String] accepts None
  val output = process(input)
  output should not contain badProperty // Bad data absent
  output should contain (expectedType) // Rest still works
}
```
```typescript
// TypeScript — WRONG: tests the validation mechanism
test('bad value rejected', () => {
  fixture.field = 'invalid'; // Zod schema rejects this!
  expect(() => process(fixture)).toThrow(ZodError);
  // Tells you nothing about output
});

// TypeScript — RIGHT: tests the requirement
test('bad value not in output', () => {
  fixture.field = undefined; // Schema accepts undefined for optional
  const output = process(fixture);
  expect(output).not.toContain(badProperty); // Bad data absent
  expect(output).toContain(expectedType); // Rest still works
});
```
```python
# Python — WRONG: tests the validation mechanism
def test_bad_value_rejected(fixture):
    fixture.field = "invalid"  # Schema rejects this!
    with pytest.raises(ValidationError):
        process(fixture)
    # Tells you nothing about output

# Python — RIGHT: tests the requirement
def test_bad_value_not_in_output(fixture):
    fixture.field = None  # Schema accepts None for Optional
    output = process(fixture)
@@ -381,42 +138,9 @@ def test_bad_value_not_in_output(fixture):
    assert bad_property not in output  # Bad data absent
    assert expected_type in output  # Rest still works
```
```go
// Go — WRONG: tests the error, not the outcome
func TestBadValueRejected(t *testing.T) {
    fixture.Field = "invalid" // Validator rejects this!
    _, err := Process(fixture)
    if err == nil {
        t.Fatal("expected error")
    }
    // Tells you nothing about output
}

// Go — RIGHT: tests the requirement
func TestBadValueNotInOutput(t *testing.T) {
    fixture.Field = "" // Zero value is valid
    output, err := Process(fixture)
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if containsBadProperty(output) {
        t.Error("bad data should be absent")
    }
    if !containsExpectedType(output) {
        t.Error("expected data should be present")
    }
}
```
```rust
// Rust — WRONG: tests the error, not the outcome
#[test]
fn test_bad_value_rejected() {
    let input = Fixture { field: "invalid".into(), ..default() };
    assert!(process(&input).is_err()); // Tells you nothing about output
}

// Rust — RIGHT: tests the requirement
#[test]
fn test_bad_value_not_in_output() {
    let input = Fixture { field: None, ..default() }; // Option accepts None
    let output = process(&input).expect("should succeed");
    assert!(!output.contains(bad_property)); // Bad data absent
    assert!(output.contains(expected_type)); // Rest still works
}
```
The pattern is the same in every language: don't test that the validation mechanism rejects bad input — test that the system produces correct output when given edge-case input the schema accepts. The WRONG approach tests the implementation (the validator); the RIGHT approach tests the requirement (the output).
Always check your Step 5b schema map before choosing mutation values.
@@ -428,154 +152,20 @@ Ask: "What does the *spec* say should happen?" The spec says "invalid data shoul
## Fitness-to-Purpose Scenario Tests
For each scenario in QUALITY.md, write a test. This is a 1:1 mapping. Each test should include a traceability annotation citing the scenario, e.g., `[Req: formal — QUALITY.md Scenario 1]`, and be named to match the scenario's memorable name:
```scala
// Scala (ScalaTest)
class FitnessScenarios extends FlatSpec with Matchers {
  // [Req: formal — QUALITY.md Scenario 1]
  "Scenario 1: [Name]" should "prevent [failure mode]" in {
    val result = process(fixture)
    result.property should equal (expectedValue)
  }
}
```
```python
# Python (pytest)
class TestFitnessScenarios:
    """Tests for fitness-to-purpose scenarios from QUALITY.md."""

    def test_scenario_1_memorable_name(self, fixture):
        """[Req: formal — QUALITY.md Scenario 1] [Name].
        Requirement: [What the code must do].
        """
        result = process(fixture)
        assert condition_that_prevents_the_failure
```
```java
// Java (JUnit 5)
class FitnessScenariosTest {
    @Test
    @DisplayName("[Req: formal — QUALITY.md Scenario 1] [Name]")
    void testScenario1MemorableName() {
        var result = process(fixture);
        assertTrue(conditionThatPreventsFailure(result));
    }
}
```
```typescript
// TypeScript (Jest)
describe('Fitness Scenarios', () => {
  test('[Req: formal — QUALITY.md Scenario 1] [Name]', () => {
    const result = process(fixture);
    expect(conditionThatPreventsFailure(result)).toBe(true);
  });
});
```
```go
// Go (testing)
func TestScenario1_MemorableName(t *testing.T) {
    // [Req: formal — QUALITY.md Scenario 1] [Name]
    // Requirement: [What the code must do]
    result := Process(fixture)
    if !conditionThatPreventsFailure(result) {
        t.Error("scenario 1 failed: [describe expected behavior]")
    }
}
```
```rust
// Rust (cargo test)
#[test]
fn test_scenario_1_memorable_name() {
    // [Req: formal — QUALITY.md Scenario 1] [Name]
    // Requirement: [What the code must do]
    let result = process(&fixture);
    assert!(condition_that_prevents_the_failure(&result));
}
```
## Boundary and Negative Tests
One test per defensive pattern from Step 5. Each test should include a traceability annotation citing the defensive pattern, e.g., `[Req: inferred — from function_name() guard] guards against X`:
```typescript
// TypeScript (Jest)
describe('Boundaries and Edge Cases', () => {
  test('[Req: inferred — from functionName() guard] guards against X', () => {
    const input = { ...validFixture, field: null };
    const result = process(input);
    expect(result).not.toContainBadOutput();
  });
});
```
For each boundary test:
1. Mutate input to trigger the defensive code path (using a value the schema accepts)
2. Process the mutated input
3. Assert graceful handling — the result is valid despite the edge-case input
```python
# Python (pytest)
class TestBoundariesAndEdgeCases:
    """Tests for boundary conditions, malformed input, error handling."""

    def test_defensive_pattern_name(self, fixture):
        """[Req: inferred — from function_name() guard] guards against X."""
        # Mutate to trigger defensive code path
        # Assert graceful handling
```
```java
// Java (JUnit 5)
class BoundariesAndEdgeCasesTest {
    @Test
    @DisplayName("[Req: inferred — from methodName() guard] guards against X")
    void testDefensivePatternName() {
        fixture.setField(null); // Trigger defensive code path
        var result = process(fixture);
        assertNotNull(result); // Assert graceful handling
        assertFalse(result.containsBadData());
    }
}
```
```scala
// Scala (ScalaTest)
class BoundariesAndEdgeCases extends FlatSpec with Matchers {
  // [Req: inferred — from methodName() guard]
  "defensive pattern: methodName()" should "guard against X" in {
    val input = fixture.copy(field = None) // Trigger defensive code path
    val result = process(input)
    result shouldBe defined
    result.get should not contain badData
  }
}
```
```go
// Go (testing)
func TestDefensivePattern_FunctionName_GuardsAgainstX(t *testing.T) {
    // [Req: inferred — from FunctionName() guard] guards against X
    input := defaultFixture()
    input.Field = nil // Trigger defensive code path
    result, err := Process(input)
    if err != nil {
        t.Fatalf("expected graceful handling, got: %v", err)
    }
    // Assert result is valid despite edge-case input
}
```
```rust
// Rust (cargo test)
#[test]
fn test_defensive_pattern_function_name_guards_against_x() {
    // [Req: inferred — from function_name() guard] guards against X
    let input = Fixture { field: None, ..default_fixture() };
    let result = process(&input).expect("expected graceful handling");
    // Assert result is valid despite edge-case input
}
```
Use your Step 5b schema map when choosing mutation values. Every mutation must use a value the schema accepts.