---
name: harness-engineering
description: 'Adopt repository-level harness engineering for coding agents. Use when a user wants to prevent repeated AI coding-agent mistakes by turning failures into durable instructions, drift checks, regression tests, failure memory, and adoption reports tailored to the target repository.'
---

# Harness Engineering

Harness engineering turns repeated coding-agent mistakes into durable repository artifacts:

```text
Harness = Instructions + Constraints + Feedback + Memory + Evaluation + Governance
```

Use this skill when the user asks to:

- make a repository more reliable for GitHub Copilot or other coding agents
- add durable agent instructions, repository rules, or guardrails
- prevent repeated AI coding-agent mistakes
- record known failure paths and the checks that prevent recurrence
- add lightweight drift checks for project rules
- review, refresh, or update an existing agent harness

Do not use this skill for ordinary feature implementation unless the user asks to improve the repository's agent operating environment.

## Core Principles

- Treat the target repository as the source of truth.
- Inspect before editing. Preserve the existing stack, package manager, CI, docs, naming, and architecture.
- Add the smallest useful harness. Prefer updating existing files over adding duplicate guidance.
- Make important rules enforceable where practical through tests, linters, type checks, CI, pre-commit hooks, or drift scripts.
- Use manual review points only when automation would be brittle or misleading.
- Record high-risk failures that should not recur, and name the check or review point that catches recurrence.
- Do not copy generic templates blindly. Adapt every artifact to real evidence in the target repository.

## Discovery

Before proposing or making harness changes, inspect the repository for existing rules and evidence.

Read these files and folders when they exist:

- `README.md`
- `AGENTS.md`
- `.github/copilot-instructions.md`
- `.github/instructions/`
- `.github/workflows/`
- `CONTRIBUTING.md`
- package manifests such as `package.json`, `pyproject.toml`, `go.mod`, `Cargo.toml`, `pom.xml`, or `build.gradle`
- existing docs under `docs/`
- existing scripts under `scripts/`
- existing tests and CI checks

Then summarize:

- stack, package manager, and entry points
- existing development and verification commands
- current agent instructions or repository conventions
- known failures, incidents, flaky paths, or repeated review comments
- gaps where project rules are not enforced

## Adoption Workflow

Follow this sequence:

1. Choose the harness surface that fits the target repository.
2. Write target-specific agent instructions.
3. Add enforceable checks for high-value rules.
4. Record failure memory for high-risk or recurring failures.
5. Add drift checks for guidance that can silently become stale.
6. Report the adoption with evidence, assumptions, and follow-up.

### 1. Choose the Harness Surface

Pick only the surfaces that fit the target repository:

| Need | Preferred artifact |
| --- | --- |
| Always-on agent behavior | `AGENTS.md` or `.github/copilot-instructions.md` |
| File-scoped guidance | `.github/instructions/*.instructions.md` |
| Recurring project checks | `scripts/check_*.py`, shell scripts, or package scripts |
| CI enforcement | existing workflow files or a small new workflow |
| Known failures | `docs/failures/*.md` |
| Architecture or process decisions | `docs/decisions/*.md` |
| Adoption evidence | `docs/harness/adoption-report.md` or similar |

If the repository already has an equivalent location, update it instead of creating a parallel system.

### 2. Write Agent Instructions

Agent instructions should be concrete and operational.

Include:

- project purpose and major ownership boundaries
- setup, test, lint, build, and verification commands
- package manager and dependency rules
- safe editing rules, generated file rules, and forbidden paths
- testing expectations for changed code
- PR and commit conventions if the repo has them
- how to record new failures or decisions

Avoid broad personality guidance, generic best practices, and rules that cannot be checked or reviewed.

### 3. Add Enforceable Checks

Convert high-value rules into checks. Good harness checks are:

- narrow enough to avoid false positives
- fast enough to run locally and in CI
- named clearly so agents can run them before finishing
- documented with the rule they protect

Examples:

```text
Rule: Do not edit generated API clients.
Check: script scans diffs for generated paths and fails with a clear message.

Rule: Every failure memory note names a regression check.
Check: script validates docs/failures/*.md for a "Prevention" section.

Rule: Profile docs and templates must stay aligned.
Check: test compares profile README files to expected template files.
```

### 4. Record Failure Memory

Record failures when they are user-visible, high-risk, or likely to recur. Use a new file under `docs/failures/` unless an existing note already covers the same root cause.

Recommended structure:

```markdown
# Short Failure Title

## Summary

What failed, who saw it, and why it matters.

## Root Cause

The technical or process cause. Avoid blame.

## Prevention

Instruction, test, drift check, CI gate, fixture, or manual review point that prevents or detects recurrence.

## Evidence

Links to issue, PR, test, log, command output, or file paths.
```

If no automated check is practical, record the manual review point and why automation would be unsafe or misleading.

### 5. Add Drift Checks

Use drift checks for guidance that can silently become stale.

Common examples:

- docs mention commands that no longer exist
- profile snippets and generated examples diverge
- failure notes omit regression checks
- decision records are missing for structural changes
- CI references stale scripts or package commands

Prefer small scripts using the repository's existing language. If the repo has no scripting convention, Python with only the standard library is a portable default.

### 6. Report the Adoption

Finish substantial harness work with an adoption report that includes:

- files changed
- rules added or updated
- checks added or reused
- commands run and results
- assumptions and manual follow-up
- failure memory created or intentionally skipped
- how effectiveness will be measured

## Review Workflow

When asked to review a harness change, take an opposing perspective. Look for:

- generic rules copied without evidence from the target repository
- duplicate or conflicting instruction files
- broad checks that are likely to fail on valid changes
- unenforced high-risk rules
- missing failure memory for repeated mistakes or runtime failures
- generated docs not refreshed after source changes
- CI gates that do not run the relevant checks
- target repository conventions being overwritten by harness defaults

Report findings first, ordered by severity, with file and line references when available. Do not modify files during a review unless the user explicitly asks for fixes.

## Output Contract

Before finishing harness adoption work, verify:

- the target repository was inspected before edits
- new guidance is specific to the target repository
- changed checks can be run locally or have a documented manual substitute
- failure memory was recorded when required, or the final response explains why it was skipped
- generated docs or indexes are refreshed
- the final report names every command run and its result
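
## Example Drift Check

The drift check for failure notes described under "Add Drift Checks" could look roughly like the sketch below. It uses only the Python standard library (3.9 or later) and assumes that notes live in `docs/failures/` and follow the recommended structure from "Record Failure Memory". The function names `missing_sections`, `check_failure_notes`, and `main` are hypothetical and chosen for illustration; adapt paths and section names to the target repository.

```python
"""Drift check sketch: every failure note should contain the recommended sections."""
import re
import sys
from pathlib import Path

# Section headings taken from the recommended failure-note structure.
REQUIRED_SECTIONS = ("Summary", "Root Cause", "Prevention", "Evidence")

# Assumed default location; change it if the repository keeps notes elsewhere.
DEFAULT_FAILURES_DIR = Path("docs/failures")

# Matches level-2 markdown headings such as "## Root Cause".
HEADING_RE = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(text: str) -> list[str]:
    """Return the required section names that have no '## ' heading in text."""
    found = {m.group(1).lower() for m in HEADING_RE.finditer(text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


def check_failure_notes(directory: Path) -> list[str]:
    """Return one readable problem line per note that is missing sections."""
    problems = []
    for note in sorted(directory.glob("*.md")):
        missing = missing_sections(note.read_text(encoding="utf-8"))
        if missing:
            problems.append(f"{note}: missing section(s): {', '.join(missing)}")
    return problems


def main(argv: list[str]) -> int:
    """Return 0 when all notes pass (or none exist yet), 1 otherwise."""
    directory = Path(argv[1]) if len(argv) > 1 else DEFAULT_FAILURES_DIR
    if not directory.is_dir():
        print(f"{directory} not found; skipping failure-note check")
        return 0
    problems = check_failure_notes(directory)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


# Small in-memory demonstration: a note that stops after "Root Cause".
example_note = "# Title\n\n## Summary\n...\n\n## Root Cause\n...\n"
print(missing_sections(example_note))  # ['Prevention', 'Evidence']
```

To use this as a local or CI check, one option is to save it under a name such as `scripts/check_failure_notes.py` (a hypothetical path), drop the demonstration lines, and end the file with `sys.exit(main(sys.argv))` inside an `if __name__ == "__main__":` guard so a non-zero exit code fails the build. The check is deliberately narrow: it only confirms that the headings exist, and it leaves judging the quality of each section to review.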