mirror of
https://github.com/github/awesome-copilot.git
synced 2026-04-30 20:25:55 +00:00
47 lines
2.4 KiB
Markdown
47 lines
2.4 KiB
Markdown
# Self-Reflection Quality Gate
|
||
|
||
After completing analysis, internally score the output across domain-relevant categories (1–10 scale).
|
||
|
||
## Scoring Rules
|
||
|
||
- **Pass**: All categories ≥ 8
|
||
- **Fail**: Any score < 8 → revisit the failing dimension before delivering output. Max 2 rework iterations.
|
||
- **If unresolvable after 2 iterations**: Deliver output with an explicit confidence note stating which dimension fell short and why.
|
||
|
||
## Base Categories (All Agents)
|
||
|
||
| Category | Question | Threshold |
|
||
| ----------------- | --------------------------------------------------------------------------------------- | :-------: |
|
||
| **Completeness** | Were all required phases/categories evaluated with evidence? | ≥ 8 |
|
||
| **Accuracy** | Are findings backed by concrete references (code, architecture, CVEs), not speculation? | ≥ 8 |
|
||
| **Actionability** | Does every Critical/High finding have a specific, implementable fix or mitigation? | ≥ 8 |
|
||
| **Consistency** | Are severity ratings, mappings, and verdicts internally consistent? | ≥ 8 |
|
||
| **Coverage** | Were all entry points, trust boundaries, modules, or manifests identified and analyzed? | ≥ 8 |
|
||
|
||
## Domain-Specific Extensions
|
||
|
||
### Multi-tool Pipeline — add:
|
||
|
||
| **Deduplication** | Are cross-tool duplicates properly merged with corroboration notes? | ≥ 8 |
|
||
|
||
### Code Quality (SonarQube-style) — adapt Completeness to:
|
||
|
||
| **Completeness** | Were all issue types (Bugs, Vulnerabilities, Hotspots, Smells, Duplication) evaluated? | ≥ 8 |
|
||
|
||
### SAST/SCA — adapt Coverage to:
|
||
|
||
| **Coverage** | Were all entry points taint-traced and all dependency manifests audited? | ≥ 8 |
|
||
|
||
### STRIDE Threat Modeling — adapt Completeness to:
|
||
|
||
| **Completeness** | Were all six STRIDE categories evaluated for every trust boundary and data flow? | ≥ 8 |
|
||
|
||
### STRIDE-LM — adapt Completeness and Coverage to:
|
||
|
||
| **Completeness** | Were all seven STRIDE-LM categories evaluated for every asset and trust boundary? | ≥ 8 |
|
||
| **Coverage** | Were all lateral movement paths, trust boundaries, and post-exploitation chains assessed? | ≥ 8 |
|
||
|
||
### Code Review — adapt Coverage to:
|
||
|
||
| **Coverage** | Were all entry points, trust boundaries, and data flows traced from source to sink? | ≥ 8 |
|