---
name: harness-engineering
description: 'Adopt repository-level harness engineering for coding agents. Use when a user wants to prevent repeated AI coding-agent mistakes by turning failures into durable instructions, drift checks, regression tests, failure memory, and adoption reports tailored to the target repository.'
---
# Harness Engineering
Harness engineering turns repeated coding-agent mistakes into durable
repository artifacts:
```text
Harness = Instructions + Constraints + Feedback + Memory + Evaluation + Governance
```
Use this skill when the user asks to:
- make a repository more reliable for GitHub Copilot or other coding agents
- add durable agent instructions, repository rules, or guardrails
- prevent repeated AI coding-agent mistakes
- record known failure paths and the checks that prevent recurrence
- add lightweight drift checks for project rules
- review, refresh, or update an existing agent harness

Do not use this skill for ordinary feature implementation unless the user asks
to improve the repository's agent operating environment.
## Core Principles
- Treat the target repository as the source of truth.
- Inspect before editing. Preserve the existing stack, package manager, CI,
docs, naming, and architecture.
- Add the smallest useful harness. Prefer updating existing files over adding
duplicate guidance.
- Make important rules enforceable where practical through tests, linters,
type checks, CI, pre-commit hooks, or drift scripts.
- Use manual review points only when automation would be brittle or misleading.
- Record high-risk failures that should not recur, and name the check or review
point that catches recurrence.
- Do not copy generic templates blindly. Adapt every artifact to real evidence
in the target repository.
## Discovery
Before proposing or making harness changes, inspect the repository for existing
rules and evidence.

Read these files and folders when they exist:
- `README.md`
- `AGENTS.md`
- `.github/copilot-instructions.md`
- `.github/instructions/`
- `.github/workflows/`
- `CONTRIBUTING.md`
- package manifests such as `package.json`, `pyproject.toml`, `go.mod`,
`Cargo.toml`, `pom.xml`, or `build.gradle`
- existing docs under `docs/`
- existing scripts under `scripts/`
- existing tests and CI checks

Then summarize:
- stack, package manager, and entry points
- existing development and verification commands
- current agent instructions or repository conventions
- known failures, incidents, flaky paths, or repeated review comments
- gaps where project rules are not enforced
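
The first part of discovery can be a quick inventory of which of the listed locations exist. The following is a minimal sketch, assuming Python with only the standard library; the function name `inventory_harness_files` and the `CANDIDATES` constant are illustrative and not part of any existing tool.

```python
from pathlib import Path

# Candidate locations taken from the discovery list above; adjust per repository.
CANDIDATES = [
    "README.md",
    "AGENTS.md",
    ".github/copilot-instructions.md",
    ".github/instructions",
    ".github/workflows",
    "CONTRIBUTING.md",
    "package.json",
    "pyproject.toml",
    "go.mod",
    "Cargo.toml",
    "pom.xml",
    "build.gradle",
    "docs",
    "scripts",
]


def inventory_harness_files(repo_root: str) -> dict[str, bool]:
    """Map each candidate path to whether it exists under repo_root."""
    root = Path(repo_root)
    return {name: (root / name).exists() for name in CANDIDATES}
```

An inventory like this only shows what is present. The files still need to be read to produce the summary.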
## Adoption Workflow
Follow this sequence:
1. Choose the harness surface that fits the target repository.
2. Write target-specific agent instructions.
3. Add enforceable checks for high-value rules.
4. Record failure memory for high-risk or recurring failures.
5. Add drift checks for guidance that can silently become stale.
6. Report the adoption with evidence, assumptions, and follow-up.
### 1. Choose the Harness Surface
Pick only the surfaces that fit the target repository:
| Need | Preferred artifact |
| --- | --- |
| Always-on agent behavior | `AGENTS.md` or `.github/copilot-instructions.md` |
| File-scoped guidance | `.github/instructions/*.instructions.md` |
| Recurring project checks | `scripts/check_*.py`, shell scripts, or package scripts |
| CI enforcement | existing workflow files or a small new workflow |
| Known failures | `docs/failures/*.md` |
| Architecture or process decisions | `docs/decisions/*.md` |
| Adoption evidence | `docs/harness/adoption-report.md` or similar |

If the repository already has an equivalent location, update it instead of
creating a parallel system.
### 2. Write Agent Instructions
Agent instructions should be concrete and operational. Include:
- project purpose and major ownership boundaries
- setup, test, lint, build, and verification commands
- package manager and dependency rules
- safe editing rules, generated file rules, and forbidden paths
- testing expectations for changed code
- PR and commit conventions if the repo has them
- how to record new failures or decisions

Avoid broad personality guidance, generic best practices, and rules that cannot
be checked or reviewed.
### 3. Add Enforceable Checks
Convert high-value rules into checks. Good harness checks are:
- narrow enough to avoid false positives
- fast enough to run locally and in CI
- named clearly so agents can run them before finishing
- documented with the rule they protect

Examples:
```text
Rule: Do not edit generated API clients.
Check: script scans diffs for generated paths and fails with a clear message.

Rule: Every failure memory note names a regression check.
Check: script validates docs/failures/*.md for a "Prevention" section.

Rule: Profile docs and templates must stay aligned.
Check: test compares profile README files to expected template files.
```
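
The first rule above could be enforced with a check along these lines. This is a sketch under assumptions: the `GENERATED_PREFIXES` values and both function names are hypothetical, and the list of changed paths is assumed to come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath

# Hypothetical generated locations; replace with the paths the target
# repository actually generates.
GENERATED_PREFIXES = ("src/generated/", "clients/api/")


def find_generated_edits(changed_paths: list[str]) -> list[str]:
    """Return the changed paths that fall under a generated prefix."""
    violations = []
    for raw in changed_paths:
        # Normalize separators so Windows-style paths match the same prefixes.
        path = PurePosixPath(raw.replace("\\", "/")).as_posix()
        if path.startswith(GENERATED_PREFIXES):
            violations.append(path)
    return violations


def format_failure(violations: list[str]) -> str:
    """Build the clear failure message the rule asks for."""
    lines = ["Generated files must not be edited by hand:"]
    lines += [f"  - {p}" for p in violations]
    lines.append("Regenerate them from their source instead.")
    return "\n".join(lines)
```

Matching on explicit prefixes keeps the check narrow, which fits the criterion above about avoiding false positives.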
### 4. Record Failure Memory
Record failures when they are user-visible, high-risk, or likely to recur.
Use a new file under `docs/failures/` unless an existing note already covers
the same root cause.

Recommended structure:
```markdown
# Short Failure Title
## Summary
What failed, who saw it, and why it matters.
## Root Cause
The technical or process cause. Avoid blame.
## Prevention
Instruction, test, drift check, CI gate, fixture, or manual review point that
prevents or detects recurrence.
## Evidence
Links to issue, PR, test, log, command output, or file paths.
```
If no automated check is practical, record the manual review point and why
automation would be unsafe or misleading.
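
A note in this shape can be validated mechanically. The sketch below assumes Python's standard library and the four section headings from the recommended structure. `missing_sections` is an illustrative name, not an existing script.

```python
import re

# Section headings from the recommended structure above.
REQUIRED_SECTIONS = ("Summary", "Root Cause", "Prevention", "Evidence")

# Matches level-2 Markdown headings such as "## Root Cause".
HEADING_RE = re.compile(r"^##\s+(.+?)\s*$", re.MULTILINE)


def missing_sections(note_text: str) -> list[str]:
    """Return required section names that do not appear as '## ' headings."""
    present = {m.group(1) for m in HEADING_RE.finditer(note_text)}
    return [name for name in REQUIRED_SECTIONS if name not in present]
```

A check script could run this over each file in `docs/failures/` and fail when any list is non-empty.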
### 5. Add Drift Checks
Use drift checks for guidance that can silently become stale. Common examples:
- docs mention commands that no longer exist
- profile snippets and generated examples diverge
- failure notes omit regression checks
- decision records are missing for structural changes
- CI references stale scripts or package commands

Prefer small scripts using the repository's existing language. If the repo has
no scripting convention, Python with only the standard library is a portable
default.
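
As one illustration of the first example, the sketch below flags documented package scripts that no longer exist. It assumes a Node-style repository where docs reference scripts as `npm run <name>` and `package.json` has a `scripts` object. The function name is hypothetical, and other stacks would need a different pattern.

```python
import json
import re

# Assumption: docs reference package scripts as "npm run <name>".
NPM_RUN_RE = re.compile(r"npm run ([A-Za-z0-9:_-]+)")


def stale_script_references(doc_text: str, package_json_text: str) -> list[str]:
    """Return script names mentioned in docs but absent from package.json."""
    defined = set(json.loads(package_json_text).get("scripts", {}))
    # A dict preserves first-seen order while dropping duplicates.
    stale: dict[str, None] = {}
    for name in NPM_RUN_RE.findall(doc_text):
        if name not in defined:
            stale.setdefault(name, None)
    return list(stale)
```

Taking file contents as strings keeps the function easy to test. A thin wrapper can read the real files and exit non-zero when the list is non-empty.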
### 6. Report the Adoption
Finish substantial harness work with an adoption report that includes:
- files changed
- rules added or updated
- checks added or reused
- commands run and results
- assumptions and manual follow-up
- failure memory created or intentionally skipped
- how effectiveness will be measured
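
If a repository has no existing report format, the list above can double as a skeleton. The following is a sketch only: the section titles mirror the bullets, while the report title and function name are assumptions to adapt.

```python
# Section titles mirror the report contents listed above.
REPORT_SECTIONS = (
    "Files changed",
    "Rules added or updated",
    "Checks added or reused",
    "Commands run and results",
    "Assumptions and manual follow-up",
    "Failure memory created or intentionally skipped",
    "How effectiveness will be measured",
)


def render_adoption_report(entries: dict[str, list[str]]) -> str:
    """Render a Markdown report; empty sections are flagged, not dropped."""
    lines = ["# Harness Adoption Report"]
    for title in REPORT_SECTIONS:
        lines.append(f"## {title}")
        items = entries.get(title, [])
        if items:
            lines.extend(f"- {item}" for item in items)
        else:
            # Make the gap visible so a reviewer can ask about it.
            lines.append("- (not recorded)")
    return "\n".join(lines) + "\n"
```

Flagging empty sections instead of omitting them makes a skipped item, such as failure memory, an explicit statement rather than a silent gap.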
## Review Workflow
When asked to review a harness change, take an adversarial perspective. Look for:
- generic rules copied without evidence from the target repository
- duplicate or conflicting instruction files
- broad checks that are likely to fail on valid changes
- unenforced high-risk rules
- missing failure memory for repeated mistakes or runtime failures
- generated docs not refreshed after source changes
- CI gates that do not run the relevant checks
- target repository conventions being overwritten by harness defaults

Report findings first, ordered by severity, with file and line references when
available. Do not modify files during a review unless the user explicitly asks
for fixes.
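
The reporting order can be sketched as a small sort. The three-level severity scale, the `Finding` type, and the output format below are assumptions for illustration; use whatever scale the target repository already has.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed severity scale; lower rank sorts first.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Finding:
    severity: str  # one of the SEVERITY_RANK keys
    message: str
    file: Optional[str] = None
    line: Optional[int] = None


def format_findings(findings: list[Finding]) -> list[str]:
    """Order findings by severity and add file:line references when known."""
    ordered = sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])
    rendered = []
    for f in ordered:
        location = ""
        if f.file:
            location = f" ({f.file}:{f.line})" if f.line else f" ({f.file})"
        rendered.append(f"[{f.severity}] {f.message}{location}")
    return rendered
```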
## Output Contract
Before finishing harness adoption work, verify:
- the target repository was inspected before edits
- new guidance is specific to the target repository
- changed checks can be run locally or have a documented manual substitute
- failure memory was recorded when required, or the final response explains why
it was skipped
- generated docs or indexes are refreshed
- the final report names every command run and its result