---
name: harness-engineering
description: 'Adopt repository-level harness engineering for coding agents. Use when a user wants to prevent repeated AI coding-agent mistakes by turning failures into durable instructions, drift checks, regression tests, failure memory, and adoption reports tailored to the target repository.'
---

# Harness Engineering

Harness engineering turns repeated coding-agent mistakes into durable
repository artifacts:

```text
Harness = Instructions + Constraints + Feedback + Memory + Evaluation + Governance
```

Use this skill when the user asks to:

- make a repository more reliable for GitHub Copilot or other coding agents
- add durable agent instructions, repository rules, or guardrails
- prevent repeated AI coding-agent mistakes
- record known failure paths and the checks that prevent recurrence
- add lightweight drift checks for project rules
- review, refresh, or update an existing agent harness

Do not use this skill for ordinary feature implementation unless the user asks
to improve the repository's agent operating environment.

## Core Principles

- Treat the target repository as the source of truth.
- Inspect before editing. Preserve the existing stack, package manager, CI,
  docs, naming, and architecture.
- Add the smallest useful harness. Prefer updating existing files over adding
  duplicate guidance.
- Make important rules enforceable where practical through tests, linters,
  type checks, CI, pre-commit hooks, or drift scripts.
- Use manual review points only when automation would be brittle or misleading.
- Record high-risk failures that should not recur, and name the check or review
  point that catches recurrence.
- Do not copy generic templates blindly. Adapt every artifact to real evidence
  in the target repository.

## Discovery

Before proposing or making harness changes, inspect the repository for existing
rules and evidence.

Read these files and folders when they exist:

- `README.md`
- `AGENTS.md`
- `.github/copilot-instructions.md`
- `.github/instructions/`
- `.github/workflows/`
- `CONTRIBUTING.md`
- package manifests such as `package.json`, `pyproject.toml`, `go.mod`,
  `Cargo.toml`, `pom.xml`, or `build.gradle`
- existing docs under `docs/`
- existing scripts under `scripts/`
- existing tests and CI checks

Then summarize:

- stack, package manager, and entry points
- existing development and verification commands
- current agent instructions or repository conventions
- known failures, incidents, flaky paths, or repeated review comments
- gaps where project rules are not enforced

## Adoption Workflow

Follow this sequence:

1. Choose the harness surface that fits the target repository.
2. Write target-specific agent instructions.
3. Add enforceable checks for high-value rules.
4. Record failure memory for high-risk or recurring failures.
5. Add drift checks for guidance that can silently become stale.
6. Report the adoption with evidence, assumptions, and follow-up.

### 1. Choose the Harness Surface

Pick only the surfaces that fit the target repository:

| Need | Preferred artifact |
| --- | --- |
| Always-on agent behavior | `AGENTS.md` or `.github/copilot-instructions.md` |
| File-scoped guidance | `.github/instructions/*.instructions.md` |
| Recurring project checks | `scripts/check_*.py`, shell scripts, or package scripts |
| CI enforcement | existing workflow files or a small new workflow |
| Known failures | `docs/failures/*.md` |
| Architecture or process decisions | `docs/decisions/*.md` |
| Adoption evidence | `docs/harness/adoption-report.md` or similar |

If the repository already has an equivalent location, update it instead of
creating a parallel system.

### 2. Write Agent Instructions

Agent instructions should be concrete and operational. Include:

- project purpose and major ownership boundaries
- setup, test, lint, build, and verification commands
- package manager and dependency rules
- safe editing rules, generated file rules, and forbidden paths
- testing expectations for changed code
- PR and commit conventions if the repo has them
- how to record new failures or decisions

Avoid broad personality guidance, generic best practices, and rules that cannot
be checked or reviewed.

### 3. Add Enforceable Checks

Convert high-value rules into checks. Good harness checks are:

- narrow enough to avoid false positives
- fast enough to run locally and in CI
- named clearly so agents can run them before finishing
- documented with the rule they protect

Examples:

```text
Rule: Do not edit generated API clients.
Check: script scans diffs for generated paths and fails with a clear message.

Rule: Every failure memory note names a regression check.
Check: script validates docs/failures/*.md for a "Detection" section.

Rule: Profile docs and templates must stay aligned.
Check: test compares profile README files to expected template files.
```

### 4. Record Failure Memory

Record failures when they are user-visible, high-risk, or likely to recur.
Use a new file under `docs/failures/` unless an existing note already covers
the same root cause.

Recommended structure:

```markdown
# Short Failure Title

## Summary

What failed, who saw it, and why it matters.

## Root Cause

The technical or process cause. Avoid blame.

## Prevention

Instruction, test, drift check, CI gate, fixture, or manual review point that
prevents or detects recurrence.

## Evidence

Links to issue, PR, test, log, command output, or file paths.
```

If no automated check is practical, record the manual review point and why
automation would be unsafe or misleading.

### 5. Add Drift Checks

Use drift checks for guidance that can silently become stale. Common examples:

- docs mention commands that no longer exist
- profile snippets and generated examples diverge
- failure notes omit regression checks
- decision records are missing for structural changes
- CI references stale scripts or package commands

Prefer small scripts using the repository's existing language. If the repo has
no scripting convention, Python with only the standard library is a portable
default.

### 6. Report the Adoption

Finish substantial harness work with an adoption report that includes:

- files changed
- rules added or updated
- checks added or reused
- commands run and results
- assumptions and manual follow-up
- failure memory created or intentionally skipped
- how effectiveness will be measured

## Review Workflow

When asked to review a harness change, take an opposing perspective. Look for:

- generic rules copied without evidence from the target repository
- duplicate or conflicting instruction files
- broad checks that are likely to fail on valid changes
- unenforced high-risk rules
- missing failure memory for repeated mistakes or runtime failures
- generated docs not refreshed after source changes
- CI gates that do not run the relevant checks
- target repository conventions being overwritten by harness defaults

Report findings first, ordered by severity, with file and line references when
available. Do not modify files during a review unless the user explicitly asks
for fixes.

## Output Contract

Before finishing harness adoption work, verify:

- the target repository was inspected before edits
- new guidance is specific to the target repository
- changed checks can be run locally or have a documented manual substitute
- failure memory was recorded when required, or the final response explains why
  it was skipped
- generated docs or indexes are refreshed
- the final report names every command run and its result