Update quality-playbook skill to v1.5.6 + add agent (#1402)

Rebuilds branch from upstream/staged (was previously merged from
upstream/main, which brought in materialized plugin files that
fail Check Plugin Structure on PRs targeting staged).

Changes vs. staged:
- Update skills/quality-playbook/ to v1.5.6 (32 bundled assets:
  SKILL.md + LICENSE.txt + 16 references/ + 9 phase_prompts/ +
  3 agents/ + bin/citation_verifier.py + quality_gate.py).
- Add agents/quality-playbook.agent.md (top-level orchestrator).
  name: quality-playbook (validator-compliant).
- Update docs/README.skills.md quality-playbook row description
  + bundled-assets list to v1.5.6.
- Fix 'unparseable' → 'unparsable' in quality_gate.py (5 instances;
  codespell preference, both spellings valid).

Closes the v1.4.0 → v1.5.6 update in a single clean commit on top of
upstream/staged. The preserved backup branch backup-bedbe84-pre-rebuild
(SHA bedbe848fa3c0f0eda8e653c42b599a17dd2e354) holds the prior history for reference.
Andrew Stellman
2026-05-10 21:31:53 -04:00
committed by GitHub
parent e7755069e9
commit b8441d218b
32 changed files with 9639 additions and 543 deletions
@@ -32,94 +32,26 @@ For a medium-sized project (5-15 source files), this typically yields 35-50
Before writing any test code, read 2-3 existing test files and identify how they import project modules. This is critical — projects handle imports differently, and getting it wrong means every test fails with resolution errors.
Identify the import convention used in the project. Whatever pattern the existing tests use, copy it exactly. Do not guess or invent a different pattern.
Common patterns by language (a Python example follows the list):
**Python:**
- `sys.path.insert(0, "src/")` then bare imports (`from module import func`)
- Package imports (`from myproject.module import func`)
- Relative imports with conftest.py path manipulation
**Java:**
- `import com.example.project.Module;` matching the package structure
- Test source root must mirror main source root
**Scala:**
- `import com.example.project._` or `import com.example.project.{ClassA, ClassB}`
- SBT project layout: `src/test/scala/` mirrors `src/main/scala/`
**TypeScript/JavaScript:**
- `import { func } from '../src/module'` with relative paths
- Path aliases from `tsconfig.json` (e.g., `@/module`)
**Go:**
- Same package: test files in the same directory with `package mypackage`, giving access to unexported identifiers
- Black-box testing: `package mypackage_test` with explicit imports, exercising only the exported API
- Internal packages may require specific import paths
**Rust:**
- `use crate::module::function;` for unit tests in the same crate
- `use myproject::module::function;` for integration tests in `tests/`
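For example, a test file following the first Python pattern might look like this (module and function names hypothetical):
```python
# Hypothetical test file copying the sys.path.insert convention
import sys
sys.path.insert(0, "src/")  # matches the path manipulation used by existing tests

from pipeline import run_pipeline  # bare import, same as existing tests

def test_pipeline_runs():
    assert run_pipeline({}) is not None
```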
## Create Test Setup BEFORE Writing Tests
Every test framework has a mechanism for shared setup. If your tests use shared fixtures or test data, you MUST create the setup file before writing tests. Test frameworks do not auto-discover fixtures from other directories.
**By language:**
**Python (pytest):** Create `quality/conftest.py` defining every fixture. Fixtures in `tests/conftest.py` are NOT available to `quality/test_functional.py`. Preferred: write tests that create data inline using `tmp_path` to eliminate conftest dependency.
**Java (JUnit):** Use `@BeforeEach`/`@BeforeAll` methods in the test class, or create a shared `TestFixtures` utility class in the same package.
**Scala (ScalaTest):** Mix in a trait with `before`/`after` blocks, or use inline data builders. If using SBT, ensure the test file is in the correct source tree.
**TypeScript (Jest):** Use `beforeAll`/`beforeEach` in the test file, or create a `quality/testUtils.ts` with factory functions.
**Go (testing):** Helper functions in the same `_test.go` file with `t.Helper()`. Use `t.TempDir()` for temporary directories. Go convention strongly prefers inline setup — avoid shared test state.
**Rust (cargo test):** Helper functions in a `#[cfg(test)] mod tests` block or a `test_utils.rs` module. Use builder patterns for constructing test data. For integration tests, place files in `tests/`.
Identify your framework's setup mechanism (fixtures, `@BeforeEach`, `beforeAll`, helper functions, builder patterns, etc.) and follow the conventions already used in the project's existing tests.
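For pytest, for example, a minimal sketch of such a setup file (fixture name and data shape hypothetical):
```python
# Hypothetical quality/conftest.py. Fixtures defined here are visible to
# quality/test_functional.py; fixtures in tests/conftest.py are NOT.
import pytest

@pytest.fixture
def sample_config(tmp_path):
    config_file = tmp_path / "config.json"  # inline data, no cross-file dependency
    config_file.write_text('{"pipeline": {"name": "Test", "steps": []}}')
    return config_file
```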
**Rule: Every fixture or test helper referenced must be defined.** If a test depends on shared setup that doesn't exist, the test will error during setup (not fail during assertion) — producing broken tests that look like they pass.
**Preferred approach across all languages:** Write tests that create their own data inline, directly in each test function, using the framework's temporary directory support and literal data structures. This eliminates cross-file dependencies:
```python
# Python
def test_config_validation(tmp_path):
    config = {"pipeline": {"name": "Test", "steps": [...]}}
```
```java
// Java
@Test
void testConfigValidation(@TempDir Path tempDir) {
    var config = Map.of("pipeline", Map.of("name", "Test"));
}
```
```typescript
// TypeScript
test('config validation', () => {
  const config = { pipeline: { name: 'Test', steps: [] } };
});
```
```go
// Go
func TestConfigValidation(t *testing.T) {
    tmpDir := t.TempDir()
    config := Config{Pipeline: Pipeline{Name: "Test"}}
}
```
```rust
// Rust
#[test]
fn test_config_validation() {
    let config = Config { pipeline: Pipeline { name: "Test".into() } };
}
```
**After writing all tests, run the test suite and check for setup errors.** Setup errors (fixture not found, import failures) count as broken tests regardless of how the framework categorizes them.
@@ -133,14 +65,14 @@ If you genuinely cannot write a meaningful test for a defensive pattern (e.g., i
Before writing a single test, build a function call map. For every function you plan to test:
1. **Read the function/method signature** — not just the name, but every parameter, its type, and default value. In Python, read the `def` line and type hints. In Java, read the method signature and generics. In Scala, read the method definition and implicit parameters. In TypeScript, read the type annotations.
2. **Read the documentation** — docstrings, Javadoc, TSDoc, ScalaDoc. They often specify return types, exceptions, and edge case behavior.
3. **Read one existing test that calls it** — existing tests show you the exact calling convention, fixture shape, and assertion pattern.
4. **Read real data files** — if the function processes configs, schemas, or data files, read an actual file from the project. Your test fixtures must match this shape exactly.
**Common failure pattern:** The agent explores the architecture, understands conceptually what a function does, then writes a test call with guessed parameters. The test fails because the real function takes `(config, items_data, limit)` not `(items, seed, strategy)`. Reading the actual signature takes 5 seconds and prevents this entirely.
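A minimal Python sketch of the difference, echoing the hypothetical parameter names above:
```python
# Hypothetical signature, discovered by reading the actual def line
def select_items(config: dict, items_data: list, limit: int = 10) -> list:
    return items_data[:limit]

# WRONG: a call with guessed parameters fails immediately
#   select_items(items, seed=42, strategy="random")

# RIGHT: the call matches the signature that was actually read
result = select_items({"mode": "strict"}, ["a", "b", "c"], limit=2)
assert result == ["a", "b"]
```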
**Library version awareness:** Check the project's dependency manifest (`requirements.txt`, `build.sbt`, `package.json`, `pom.xml`, `build.gradle`, `Cargo.toml`) to verify what's available. Use the test framework's skip mechanism for optional dependencies: Python `pytest.importorskip()`, JUnit `Assumptions.assumeTrue()`, ScalaTest `assume()`, Jest conditional `describe.skip`, Go `t.Skip()`, Rust `#[ignore]` with a comment explaining the prerequisite.
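For example, in pytest an optional dependency can gate a whole test module (the dependency shown is illustrative):
```python
import pytest

# Skips this module's tests (rather than erroring) when PyYAML is absent
yaml = pytest.importorskip("yaml")

def test_yaml_config_round_trip():
    data = {"pipeline": {"name": "Test"}}
    assert yaml.safe_load(yaml.safe_dump(data)) == data
```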
## Writing Spec-Derived Tests
@@ -151,68 +83,9 @@ Each test should:
2. **Execute** — Call the function, run the pipeline, make the request
3. **Assert specific properties** the spec requires
Each test should include a traceability annotation (via docstring, display name, or comment) citing the spec section it verifies, e.g., `[Req: formal — Design Doc §N] X should produce Y`.
```python
# Python (pytest)
class TestSpecRequirements:
    def test_requirement_from_spec_section_N(self, fixture):
        """[Req: formal — Design Doc §N] X should produce Y."""
        result = process(fixture)
        assert result.property == expected_value
```
```java
// Java (JUnit 5)
class SpecRequirementsTest {
    @Test
    @DisplayName("[Req: formal — Design Doc §N] X should produce Y")
    void testRequirementFromSpecSectionN() {
        var result = process(fixture);
        assertEquals(expectedValue, result.getProperty());
    }
}
```
```scala
// Scala (ScalaTest)
class SpecRequirements extends FlatSpec with Matchers {
  // [Req: formal — Design Doc §N] X should produce Y
  "Section N requirement" should "produce Y from X" in {
    val result = process(fixture)
    result.property should equal (expectedValue)
  }
}
```
```typescript
// TypeScript (Jest)
describe('Spec Requirements', () => {
  test('[Req: formal — Design Doc §N] X should produce Y', () => {
    const result = process(fixture);
    expect(result.property).toBe(expectedValue);
  });
});
```
```go
// Go (testing)
func TestSpecRequirement_SectionN_XProducesY(t *testing.T) {
    // [Req: formal — Design Doc §N] X should produce Y
    result := Process(fixture)
    if result.Property != expectedValue {
        t.Errorf("expected %v, got %v", expectedValue, result.Property)
    }
}
```
```rust
// Rust (cargo test)
#[test]
fn test_spec_requirement_section_n_x_produces_y() {
    // [Req: formal — Design Doc §N] X should produce Y
    let result = process(&fixture);
    assert_eq!(result.property, expected_value);
}
```
## What Makes a Good Functional Test
@@ -226,72 +99,9 @@ fn test_spec_requirement_section_n_x_produces_y() {
If the project handles multiple input types, cross-variant coverage is where silent bugs hide. Aim for roughly 30% of tests exercising all variants — the exact percentage matters less than ensuring every cross-cutting property is tested across all variants.
Use your framework's parametrization mechanism (e.g., `@pytest.mark.parametrize`, `@ParameterizedTest`, `test.each`, table-driven tests, iterating over cases) to run the same assertion logic across all variants:
```python
# Python (pytest)
@pytest.mark.parametrize("variant", [variant_a, variant_b, variant_c])
def test_feature_works(variant):
    output = process(variant.input)
    assert output.has_expected_property
```
```java
// Java (JUnit 5)
@ParameterizedTest
@MethodSource("variantProvider")
void testFeatureWorks(Variant variant) {
    var output = process(variant.getInput());
    assertTrue(output.hasExpectedProperty());
}
```
```scala
// Scala (ScalaTest)
Seq(variantA, variantB, variantC).foreach { variant =>
  it should s"work for ${variant.name}" in {
    val output = process(variant.input)
    output should have ('expectedProperty (true))
  }
}
```
```typescript
// TypeScript (Jest)
test.each([variantA, variantB, variantC])(
  'feature works for %s', (variant) => {
    const output = process(variant.input);
    expect(output).toHaveProperty('expectedProperty');
  });
```
```go
// Go (testing) — table-driven tests
func TestFeatureWorksAcrossVariants(t *testing.T) {
    variants := []Variant{variantA, variantB, variantC}
    for _, v := range variants {
        t.Run(v.Name, func(t *testing.T) {
            output := Process(v.Input)
            if !output.HasExpectedProperty() {
                t.Errorf("variant %s: missing expected property", v.Name)
            }
        })
    }
}
```
```rust
// Rust (cargo test) — iterate over cases
#[test]
fn test_feature_works_across_variants() {
    let variants = [variant_a(), variant_b(), variant_c()];
    for v in &variants {
        let output = process(&v.input);
        assert!(output.has_expected_property(),
            "variant {}: missing expected property", v.name);
    }
}
```
If parametrization doesn't fit, loop explicitly within a single test.
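In Python, for example, such an explicit loop might look like this (variant names follow the earlier sketches):
```python
# Explicit loop when parametrization doesn't fit
def test_feature_works_across_variants():
    for variant in (variant_a, variant_b, variant_c):
        output = process(variant.input)
        assert output.has_expected_property, f"variant failed: {variant}"
```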
@@ -312,68 +122,15 @@ These patterns look like tests but don't catch real bugs:
### The Exception-Catching Anti-Pattern in Detail
```java
// Java — WRONG: tests the validation mechanism
@Test
void testBadValueRejected() {
    fixture.setField("invalid"); // Schema rejects this!
    assertThrows(ValidationException.class, () -> process(fixture));
    // Tells you nothing about output
}

// Java — RIGHT: tests the requirement
@Test
void testBadValueNotInOutput() {
    fixture.setField(null); // Schema accepts null for Optional
    var output = process(fixture);
    assertFalse(output.contains(badProperty)); // Bad data absent
    assertTrue(output.contains(expectedType)); // Rest still works
}
```
```scala
// Scala — WRONG: tests the decoder, not the requirement
"bad value" should "be rejected" in {
  val input = fixture.copy(field = "invalid") // Circe decoder fails!
  a [DecodingFailure] should be thrownBy process(input)
  // Tells you nothing about output
}

// Scala — RIGHT: tests the requirement
"missing optional field" should "not produce bad output" in {
  val input = fixture.copy(field = None) // Option[String] accepts None
  val output = process(input)
  output should not contain badProperty // Bad data absent
  output should contain (expectedType) // Rest still works
}
```
```typescript
// TypeScript — WRONG: tests the validation mechanism
test('bad value rejected', () => {
  fixture.field = 'invalid'; // Zod schema rejects this!
  expect(() => process(fixture)).toThrow(ZodError);
  // Tells you nothing about output
});

// TypeScript — RIGHT: tests the requirement
test('bad value not in output', () => {
  fixture.field = undefined; // Schema accepts undefined for optional
  const output = process(fixture);
  expect(output).not.toContain(badProperty); // Bad data absent
  expect(output).toContain(expectedType); // Rest still works
});
```
```python
# Python — WRONG: tests the validation mechanism
def test_bad_value_rejected(fixture):
    fixture.field = "invalid"  # Schema rejects this!
    with pytest.raises(ValidationError):
        process(fixture)
    # Tells you nothing about output

# Python — RIGHT: tests the requirement
def test_bad_value_not_in_output(fixture):
    fixture.field = None  # Schema accepts None for Optional
    output = process(fixture)
@@ -381,42 +138,9 @@ def test_bad_value_not_in_output(fixture):
    assert bad_property not in output  # Bad data absent
    assert expected_type in output  # Rest still works
```
```go
// Go — WRONG: tests the error, not the outcome
func TestBadValueRejected(t *testing.T) {
    fixture.Field = "invalid" // Validator rejects this!
    _, err := Process(fixture)
    if err == nil {
        t.Fatal("expected error")
    }
    // Tells you nothing about output
}

// Go — RIGHT: tests the requirement
func TestBadValueNotInOutput(t *testing.T) {
    fixture.Field = "" // Zero value is valid
    output, err := Process(fixture)
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if containsBadProperty(output) {
        t.Error("bad data should be absent")
    }
    if !containsExpectedType(output) {
        t.Error("expected data should be present")
    }
}
```
```rust
// Rust — WRONG: tests the error, not the outcome
#[test]
fn test_bad_value_rejected() {
    let input = Fixture { field: "invalid".into(), ..default() };
    assert!(process(&input).is_err()); // Tells you nothing about output
}

// Rust — RIGHT: tests the requirement
#[test]
fn test_bad_value_not_in_output() {
    let input = Fixture { field: None, ..default() }; // Option accepts None
    let output = process(&input).expect("should succeed");
    assert!(!output.contains(bad_property)); // Bad data absent
    assert!(output.contains(expected_type)); // Rest still works
}
```
The pattern is the same in every language: don't test that the validation mechanism rejects bad input — test that the system produces correct output when given edge-case input the schema accepts. The WRONG approach tests the implementation (the validator); the RIGHT approach tests the requirement (the output).
Always check your Step 5b schema map before choosing mutation values.
@@ -428,154 +152,20 @@ Ask: "What does the *spec* say should happen?" The spec says "invalid data shoul
## Fitness-to-Purpose Scenario Tests
For each scenario in QUALITY.md, write a test. This is a 1:1 mapping. Each test should include a traceability annotation citing the scenario, e.g., `[Req: formal — QUALITY.md Scenario 1]`, and be named to match the scenario's memorable name:
```scala
// Scala (ScalaTest)
class FitnessScenarios extends FlatSpec with Matchers {
  // [Req: formal — QUALITY.md Scenario 1]
  "Scenario 1: [Name]" should "prevent [failure mode]" in {
    val result = process(fixture)
    result.property should equal (expectedValue)
  }
}
```
```python
# Python (pytest)
class TestFitnessScenarios:
    """Tests for fitness-to-purpose scenarios from QUALITY.md."""

    def test_scenario_1_memorable_name(self, fixture):
        """[Req: formal — QUALITY.md Scenario 1] [Name].
        Requirement: [What the code must do].
        """
        result = process(fixture)
        assert condition_that_prevents_the_failure
```
```java
// Java (JUnit 5)
class FitnessScenariosTest {
    @Test
    @DisplayName("[Req: formal — QUALITY.md Scenario 1] [Name]")
    void testScenario1MemorableName() {
        var result = process(fixture);
        assertTrue(conditionThatPreventsFailure(result));
    }
}
```
```typescript
// TypeScript (Jest)
describe('Fitness Scenarios', () => {
  test('[Req: formal — QUALITY.md Scenario 1] [Name]', () => {
    const result = process(fixture);
    expect(conditionThatPreventsFailure(result)).toBe(true);
  });
});
```
```go
// Go (testing)
func TestScenario1_MemorableName(t *testing.T) {
    // [Req: formal — QUALITY.md Scenario 1] [Name]
    // Requirement: [What the code must do]
    result := Process(fixture)
    if !conditionThatPreventsFailure(result) {
        t.Error("scenario 1 failed: [describe expected behavior]")
    }
}
```
```rust
// Rust (cargo test)
#[test]
fn test_scenario_1_memorable_name() {
    // [Req: formal — QUALITY.md Scenario 1] [Name]
    // Requirement: [What the code must do]
    let result = process(&fixture);
    assert!(condition_that_prevents_the_failure(&result));
}
```
## Boundary and Negative Tests
One test per defensive pattern from Step 5. Each test should include a traceability annotation citing the defensive pattern, e.g., `[Req: inferred — from function_name() guard] guards against X`:
```typescript
// TypeScript (Jest)
describe('Boundaries and Edge Cases', () => {
  test('[Req: inferred — from functionName() guard] guards against X', () => {
    const input = { ...validFixture, field: null };
    const result = process(input);
    expect(result).not.toContainBadOutput();
  });
});
```
For each boundary test:
1. Mutate input to trigger the defensive code path (using a value the schema accepts)
2. Process the mutated input
3. Assert graceful handling — the result is valid despite the edge-case input
```python
# Python (pytest)
class TestBoundariesAndEdgeCases:
    """Tests for boundary conditions, malformed input, error handling."""

    def test_defensive_pattern_name(self, fixture):
        """[Req: inferred — from function_name() guard] guards against X."""
        # Mutate to trigger defensive code path
        # Assert graceful handling
```
```java
// Java (JUnit 5)
class BoundariesAndEdgeCasesTest {
    @Test
    @DisplayName("[Req: inferred — from methodName() guard] guards against X")
    void testDefensivePatternName() {
        fixture.setField(null); // Trigger defensive code path
        var result = process(fixture);
        assertNotNull(result); // Assert graceful handling
        assertFalse(result.containsBadData());
    }
}
```
```scala
// Scala (ScalaTest)
class BoundariesAndEdgeCases extends FlatSpec with Matchers {
  // [Req: inferred — from methodName() guard]
  "defensive pattern: methodName()" should "guard against X" in {
    val input = fixture.copy(field = None) // Trigger defensive code path
    val result = process(input)
    result shouldBe defined
    result.get should not contain badData
  }
}
```
```go
// Go (testing)
func TestDefensivePattern_FunctionName_GuardsAgainstX(t *testing.T) {
    // [Req: inferred — from FunctionName() guard] guards against X
    input := defaultFixture()
    input.Field = nil // Trigger defensive code path
    result, err := Process(input)
    if err != nil {
        t.Fatalf("expected graceful handling, got: %v", err)
    }
    // Assert result is valid despite edge-case input
}
```
```rust
// Rust (cargo test)
#[test]
fn test_defensive_pattern_function_name_guards_against_x() {
    // [Req: inferred — from function_name() guard] guards against X
    let input = Fixture { field: None, ..default_fixture() };
    let result = process(&input).expect("expected graceful handling");
    // Assert result is valid despite edge-case input
}
```
Use your Step 5b schema map when choosing mutation values. Every mutation must use a value the schema accepts.