feat: add SAST/SCA Security Analyzer agent and audit-integrity skill (#1458)

Co-authored-by: Vijay Bandi <vijay.bandi@hp.com>
This commit is contained in:
Vijay Bandi
2026-04-27 20:46:05 -05:00
committed by GitHub
parent ca56e9577d
commit ba16533333
11 changed files with 682 additions and 0 deletions

@@ -0,0 +1,50 @@
---
name: 'audit-integrity'
description: 'Shared audit integrity framework for all AppSec agents — enforces output quality, intellectual honesty, and continuous improvement through anti-rationalization guards, self-critique loops, retry protocols, non-negotiable behaviors, self-reflection quality gates (1-10 scoring, ≥8 threshold), and a self-learning system with lesson/memory governance for security analysis agents.'
compatibility: 'Cross-platform. Works with any language or framework analyzed by AppSec agents.'
metadata:
version: '1.0'
---
# Audit Integrity Skill
Enforces output quality, intellectual honesty, and continuous improvement across all AppSec agents.
## When to Use
- Every security analysis, code review, threat model, or quality scan agent run
- Applied automatically as a post-analysis quality gate
- Applicable to any agent performing SAST, SCA, threat modeling, or code quality analysis
## Components
This skill provides 7 reusable capabilities. Agents apply all 7 unless their scope excludes a specific component.
| Component | Reference File | Purpose |
|-----------|---------------|---------|
| Clarification Protocol | [clarification-protocol.md](references/clarification-protocol.md) | Ask ≤2 targeted questions before analysis when scope is ambiguous |
| Anti-Rationalization Guard | [anti-rationalization-guard.md](references/anti-rationalization-guard.md) | Table of prohibited rationalizations with mandatory responses |
| Self-Critique Loop | [self-critique-loop.md](references/self-critique-loop.md) | Mandatory second-pass review after initial analysis |
| Retry Protocol | [retry-protocol.md](references/retry-protocol.md) | Tool failure handling — retry once, then document |
| Non-Negotiable Behaviors | [non-negotiable-behaviors.md](references/non-negotiable-behaviors.md) | Hard rules: never fabricate, always cite evidence, report gaps |
| Self-Reflection Quality Gate | [self-reflection-quality-gate.md](references/self-reflection-quality-gate.md) | 1-10 scoring rubric with ≥8 threshold per category |
| Self-Learning System | [self-learning-system.md](references/self-learning-system.md) | Lesson/Memory templates and governance rules |
## Execution Flow
1. **Before analysis**: Apply Clarification Protocol if scope is ambiguous
2. **During analysis**: Apply Anti-Rationalization Guard at every decision point
3. **After initial pass**: Execute Self-Critique Loop (mandatory second pass)
4. **On tool failure**: Apply Retry Protocol
5. **Before delivery**: Run Self-Reflection Quality Gate (all categories must score ≥8)
6. **After delivery**: Create Lessons/Memories for novel findings, false positives, or methodology gaps (see Self-Learning System)
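As an orientation aid, a minimal Python sketch of this flow follows. Every callable here is a hypothetical placeholder, not part of this skill's API, and step 4 (Retry Protocol) is assumed to run inside the analysis callable:

```python
from typing import Callable, Dict, List

def run_audited_analysis(
    scope_is_ambiguous: bool,
    clarify: Callable[[], None],                          # Clarification Protocol
    analyze: Callable[[], List[dict]],                    # guarded analysis; the
                                                          # Retry Protocol applies inside
    critique: Callable[[List[dict]], List[dict]],         # Self-Critique Loop
    gate_scores: Callable[[List[dict]], Dict[str, int]],  # Self-Reflection Quality Gate
    record_learning: Callable[[List[dict]], None],        # Self-Learning System
) -> List[dict]:
    if scope_is_ambiguous:                 # step 1: before analysis
        clarify()
    findings = analyze()                   # step 2: guard at every decision point
    findings = critique(findings)          # step 3: mandatory second pass
    scores = gate_scores(findings)         # step 5: all categories must score >= 8
    for _ in range(2):                     # max 2 rework iterations
        if all(s >= 8 for s in scores.values()):
            break
        findings = critique(findings)
        scores = gate_scores(findings)
    record_learning(findings)              # step 6: after delivery
    return findings
```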
## Agent-Specific Adaptation
Each agent customizes the **Self-Critique Loop** checklist and **Self-Reflection Quality Gate** categories to match its domain. The reference files provide the base templates; agents extend them with domain-specific items.
### Example extensions per agent type
- **SAST/SCA agents**: Add taint trace completeness and manifest coverage checks
- **SonarQube-style agents**: Add rating sanity check (A-E consistency with findings)
- **Threat modeling agents**: Add STRIDE category completeness per trust boundary
- **Code review agents**: Add trust boundary audit with data flow tracing

@@ -0,0 +1,38 @@
# Anti-Rationalization Guard
These rationalizations are **never** valid justifications for skipping, omitting, or downgrading findings:
## Universal Rationalizations (All Agents)
| If you think... | Mandatory response |
| ---------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| "No issues/threats found on first pass" | Systematic evaluation across all categories is required before concluding clean. Expand scope and complete the full matrix. |
| "This looks fine, skip deep analysis" | "Looks fine" is not evidence. Evidence = code trace, architecture reference, or rule match. Run checks. |
| "The risk is probably lower in practice" | Risk level is based on impact × likelihood (CVSS/exploitability). Justify any downgrade with explicit evidence. |
| "This is a false positive" | Flag it as a potential false positive but include it — do not silently suppress. Document the rationale for human review. |
| "This is outside scope" | State explicitly why, with a reference to the declared scope or assessment boundary. |
| "No controls/mitigations needed here" | State "No gap identified — rationale: [X]" explicitly. Silence is not assurance. |
## SAST/SCA-Specific
| If you think... | Mandatory response |
| ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| "SCA CVE isn't exploitable here" | Include the CVE with a documented context note — do not silently suppress. |
| "This phase can be skipped" | All phases are mandatory. Document any phase that genuinely cannot be completed due to missing inputs. |
| "Severity should be lower given context" | Severity is based on CVSS/exploitability. Justify any downgrade with explicit evidence. Document, don't suppress. |
## Code Quality-Specific
| If you think... | Mandatory response |
| ------------------------------------------ | ---------------------------------------------------------------------------------------------------- |
| "The team will refactor this later" | Technical debt still counts toward the debt ratio today. Document it accurately. |
| "Quality Gate failure is a false positive" | Include it as a finding, document the suspected false positive rationale, and mark for human review. |
## Threat Modeling-Specific
| If you think... | Mandatory response |
| ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| "This threat is mitigated by the architecture" | Document the specific compensating control and verify it is actually implemented — do not assume. |
| "This category has no applicable threats here" | State "No applicable threats identified — rationale: [X]" explicitly. Do not silently omit. |
| "Lateral movement is unlikely here" | Document the specific architectural control that prevents pivoting and verify it is implemented — do not assume. |
| "This threat actor wouldn't target this" | Document the basis for that exclusion. Insider threats and supply chain actors must always be considered. |

@@ -0,0 +1,15 @@
# Clarification Protocol
Before beginning analysis, pause and ask the user at most **2 targeted questions** when:
- The system scope, asset boundary, or target module is ambiguous and cannot be inferred from the provided context
- A critical trust boundary, privilege tier, or authentication zone is undefined and the analysis would significantly change depending on the interpretation
- The business context required for impact prioritization or compliance framework selection is entirely absent
- The language or framework cannot be auto-detected from the workspace
**Rules:**
1. State your working assumptions explicitly, then proceed
2. Do not wait for confirmation unless the ambiguity would fundamentally alter the attack surface definition, trust boundary map, or which phases are executed
3. Maximum 2 questions — if more ambiguity exists, infer from available evidence and document assumptions
4. If no ambiguity exists, proceed directly without questions
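A minimal sketch of the question budget in Python, assuming ambiguities were already collected during discovery; the `Ambiguity` type and its field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Ambiguity:
    question: str
    blocks_analysis: bool  # would the answer change the attack surface,
                           # trust boundary map, or executed phases?

def clarification_step(ambiguities: list[Ambiguity]) -> dict:
    # Rule 3: at most 2 questions; everything else becomes an assumption.
    ask = [a.question for a in ambiguities if a.blocks_analysis][:2]
    assume = [a.question for a in ambiguities if a.question not in ask]
    return {
        "ask_and_wait": ask,                    # Rule 2: wait only for blockers
        "documented_assumptions": assume,       # Rules 1 and 3
        "proceed_without_questions": not ask,   # Rule 4
    }
```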

@@ -0,0 +1,17 @@
# Non-Negotiable Behaviors
These rules apply to **all** AppSec agents with no exceptions:
1. **Never fabricate findings**: Do not report vulnerabilities, threats, bugs, code smells, or risk assessments without direct evidence from the analyzed source code, architecture, manifests, or threat intelligence.
2. **Always cite evidence**: Every finding must reference a specific file path, line number, CVE ID, component, trust boundary, data flow, or rule key. Generic findings without precise traceability are prohibited.
3. **Explain rationale for risk decisions**: When assigning severity, risk levels, quality ratings, policy compliance verdicts, or composite risk scores, state the reasoning based on exploitability, impact, and evidence — do not rely on unexplained judgment.
4. **Do not modify source files**: Do not alter code, configuration, dependency files, or deployment manifests unless explicitly requested by the user.
5. **Report honestly on coverage gaps**: If any analysis phase, STRIDE category, scan type, or methodology step could not be completed (missing files, unsupported language, inaccessible components), state it explicitly rather than silently omitting.
6. **Complete all phases**: Partial runs are not acceptable. If a phase is blocked, document why and continue with remaining phases.
7. **Provide progress summaries**: For multi-phase analysis, summarize findings after completing each major phase before proceeding to the next.
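To make rules 1-3 concrete, here is a hypothetical finding record that refuses to exist without evidence and rationale; all field names are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    severity: str    # e.g. "High"; must be justified, never asserted
    evidence: str    # file:line, CVE ID, component, or rule key (rule 2)
    rationale: str   # exploitability/impact reasoning (rule 3)

    def __post_init__(self):
        if not self.evidence.strip():   # rules 1-2: no evidence, no finding
            raise ValueError(f"'{self.title}' lacks cited evidence")
        if not self.rationale.strip():  # rule 3: risk decisions need reasons
            raise ValueError(f"'{self.title}' lacks a risk rationale")
```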

@@ -0,0 +1,8 @@
# Retry Protocol
On tool failure or empty results:
1. **Retry once** with a refined query or a different search pattern.
2. **If second attempt fails**, state the failure explicitly and continue with available evidence.
3. **Never silently skip** a phase because a tool call returned no results — distinguish "tool found nothing" from "tool failed to execute."
4. **Document the gap**: If a phase is genuinely blocked (missing manifests, unsupported language, inaccessible files), state it explicitly in the output rather than silently omitting the phase.
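A minimal sketch of this protocol in Python, assuming a hypothetical `run_tool` callable; it retries once and distinguishes "found nothing" from "failed to execute":

```python
from typing import Callable, List

def query_with_retry(run_tool: Callable[[str], list], query: str,
                     refined_query: str, gaps: List[str]) -> list:
    last_error = None
    for q in (query, refined_query):      # step 1: retry once, refined
        try:
            results = run_tool(q)
        except Exception as exc:          # tool failed to execute
            last_error = exc
            continue
        if results:
            return results
        last_error = None                 # step 3: empty is not failure
    if last_error is not None:            # step 2: state failure, move on
        gaps.append(f"Tool failure for '{query}': {last_error}")
    else:
        gaps.append(f"No results for '{query}' even after refinement")
    return []                             # continue with available evidence
```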

@@ -0,0 +1,46 @@
# Self-Critique Loop
After completing the initial analysis, perform a **mandatory second pass** before delivering output.
## Universal Checks (All Agents)
1. **Evidence check**: Every finding must cite a concrete reference (file:line, component, architecture element, CVE ID, rule key). Remove any finding without supporting evidence.
2. **Coverage check**: Verify that all categories, phases, or scan types relevant to the agent's methodology were explicitly evaluated. State "None detected" for each clean category rather than silently omitting.
3. **Mitigation/remediation check**: Every Critical and High finding must have a specific, implementable fix — not a generic recommendation.
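A sketch of the three universal checks over plain finding dicts; the keys (`evidence`, `severity`, `fix`) are hypothetical, chosen only to mirror the checks above:

```python
def universal_critique(findings: list[dict],
                       required_categories: set[str]) -> list[dict]:
    # Check 1: drop anything without a concrete reference.
    kept = [f for f in findings if f.get("evidence")]
    # Check 2: state "None detected" for clean categories, never omit.
    for cat in required_categories - {f.get("category") for f in kept}:
        kept.append({"category": cat, "status": "None detected",
                     "evidence": "category evaluated, no findings"})
    # Check 3: Critical/High findings need a specific, implementable fix.
    for f in kept:
        if f.get("severity") in ("Critical", "High") and not f.get("fix"):
            f["status"] = "BLOCKED: missing implementable remediation"
    return kept
```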
## Domain-Specific Extensions
Each agent adds domain checks to the universal list above:
### STRIDE Threat Modeling
4. **STRIDE completeness**: Did you evaluate all six STRIDE categories (S/T/R/I/D/E) for every trust boundary and data flow?
5. **Trust boundary audit**: Re-verify that every identified trust boundary has at least one evaluated data flow crossing it.
### STRIDE-LM (Lateral Movement)
4. **STRIDE-LM completeness**: Did you evaluate all seven categories (S/T/R/I/D/E/LM) for every asset and trust boundary?
5. **Control coverage**: Every Critical/High threat maps to a control function (Inventory/Collect/Detect/Protect/Manage/Respond).
6. **Lateral movement audit**: Re-trace all identified pivot paths. Verify no uncontrolled path exists from compromised entry point to high-value asset.
### Code Review Threat Modeling
4. **STRIDE completeness**: All six STRIDE categories evaluated for every trust boundary and data flow.
5. **Trust boundary audit**: Every trust boundary has evaluated data flows crossing it.
### Code Quality (SonarQube-style)
4. **Issue type coverage**: All five issue types (Bug, Vulnerability, Hotspot, Smell, Duplication) explicitly evaluated.
5. **Rating sanity check**: A-E ratings are consistent with finding counts before finalizing Quality Gate verdict.
### SAST/SCA
4. **Taint trace completeness**: Every entry point identified in discovery was taint-traced through to sinks.
5. **Manifest coverage**: All dependency manifests identified in discovery were audited.
### Multi-tool Pipeline
4. **Phase coverage**: All deliverable files generated and saved.
5. **Cross-correlation**: SAST findings corroborated by SCA findings → elevate corroborated items.
6. **Deduplication**: Same finding doesn't appear under multiple tool outputs.
7. **Roadmap completeness**: Every Critical/High finding appears in the immediate remediation tier.
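As an illustration of checks 5 and 6, a hypothetical cross-correlation sketch; the dict keys and the one-level elevation policy are assumptions, not prescribed behavior:

```python
SEVERITIES = ["Low", "Medium", "High", "Critical"]

def correlate(findings: list[dict]) -> list[dict]:
    merged: dict[tuple, dict] = {}
    for f in findings:
        key = (f["rule"], f["file"], f["line"])   # check 6: dedup key
        if key not in merged:
            merged[key] = {**f, "tools": {f["tool"]}}
        elif f["tool"] not in merged[key]["tools"]:
            item = merged[key]
            item["tools"].add(f["tool"])
            # check 5: corroborated by another tool -> elevate one level
            idx = SEVERITIES.index(item["severity"])
            item["severity"] = SEVERITIES[min(idx + 1, len(SEVERITIES) - 1)]
    return list(merged.values())
```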

@@ -0,0 +1,92 @@
# Self-Learning System
Maintain project learning artifacts under a designated lessons/memories directory (e.g., `.github/SecurityLessons` and `.github/SecurityMemories`).
## When to Create
### Lesson
Create a lesson when:
- A scan produces a false positive that required manual correction
- A finding category, STRIDE category, or flaw type is missed on first pass and caught by the self-critique loop
- A tool or methodology limitation is discovered
- A language-specific rule misfires
- An SCA dependency cannot be resolved
### Memory
Create a memory when:
- An architecture decision, security convention, or technology stack detail is discovered
- A dependency management pattern, domain-specific threat pattern, or threat actor profile is identified
- A project coding convention, framework idiom, or known false-positive pattern is found
- Any codebase-specific knowledge would be useful for future scans of the same codebase
## Lesson Template
```markdown
# Security Lesson: <short-title>
## Metadata
- CreatedAt: <date>
- Status: active | deprecated
- Supersedes: <previous lesson if any>
## Context
- Triggering scan/task:
- Component analyzed:
## Issue
- What went wrong or was missed:
- Expected behavior:
- Actual behavior:
## Root Cause
- Why was this missed or incorrect:
## Resolution
- How it was corrected:
## Preventive Guidance
- How to avoid this in future scans:
```
## Memory Template
```markdown
# Security Memory: <short-title>
## Metadata
- CreatedAt: <date>
- Status: active | deprecated
- Supersedes: <previous memory if any>
## Context
- Triggering scan/task:
- Scope/system:
## Key Fact
- What was discovered:
- Why it matters for security analysis:
## Reuse Guidance
- When to apply this knowledge:
- Related components:
```
## Governance Rules
1. **Dedup check**: Before creating a new lesson or memory, search existing files for similar content. Update existing records rather than creating duplicates.
2. **Conflict resolution**: If new evidence conflicts with an existing active lesson/memory, mark the older one as `deprecated` and create the updated version with a `Supersedes` reference.
3. **Reuse at scan start**: At the start of every analysis, check the lessons/memories directory for applicable context. Apply relevant guidance before beginning analysis.
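A sketch of rules 1-2 over lesson files, assuming a title match is a good-enough similarity check; the paths and file naming are illustrative:

```python
from datetime import date
from pathlib import Path

def upsert_lesson(lessons_dir: Path, title: str, body: str) -> Path:
    lessons_dir.mkdir(parents=True, exist_ok=True)
    supersedes = None
    for existing in sorted(lessons_dir.glob("*.md")):
        text = existing.read_text()
        # Rule 1 (dedup): treat a matching active title as the same record.
        if title in text and "Status: active" in text:
            # Rule 2 (conflict): deprecate the older record, supersede it.
            existing.write_text(
                text.replace("Status: active", "Status: deprecated"))
            supersedes = existing.name
            break
    slug = title.lower().replace(" ", "-")
    new_file = lessons_dir / f"{slug}-{date.today().isoformat()}.md"
    header = f"- Supersedes: {supersedes}\n" if supersedes else ""
    new_file.write_text(header + body)
    return new_file
```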

@@ -0,0 +1,46 @@
# Self-Reflection Quality Gate
After completing analysis, internally score the output across domain-relevant categories (1-10 scale).
## Scoring Rules
- **Pass**: All categories ≥ 8
- **Fail**: Any score < 8 → revisit the failing dimension before delivering output. Max 2 rework iterations.
- **If unresolvable after 2 iterations**: Deliver output with an explicit confidence note stating which dimension fell short and why.
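A minimal sketch of the pass/fail loop, assuming the agent can produce per-category scores and rework failing dimensions; both callables and the return shape are hypothetical:

```python
from typing import Callable, Dict, Optional, Tuple

def quality_gate(
    scores: Dict[str, int],
    rework: Callable[[Dict[str, int]], Dict[str, int]],
    threshold: int = 8,
    max_iterations: int = 2,
) -> Tuple[Dict[str, int], Optional[str]]:
    for _ in range(max_iterations):
        failing = {c: s for c, s in scores.items() if s < threshold}
        if not failing:
            return scores, None                 # Pass: deliver normally
        scores = {**scores, **rework(failing)}  # revisit failing dimensions
    failing = [c for c, s in scores.items() if s < threshold]
    note = (f"Confidence note: {', '.join(failing)} stayed below "
            f"{threshold} after {max_iterations} rework iterations"
            ) if failing else None
    return scores, note                         # deliver with explicit caveat
```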
## Base Categories (All Agents)
| Category | Question | Threshold |
| ----------------- | --------------------------------------------------------------------------------------- | :-------: |
| **Completeness** | Were all required phases/categories evaluated with evidence? | ≥ 8 |
| **Accuracy** | Are findings backed by concrete references (code, architecture, CVEs), not speculation? | ≥ 8 |
| **Actionability** | Does every Critical/High finding have a specific, implementable fix or mitigation? | ≥ 8 |
| **Consistency** | Are severity ratings, mappings, and verdicts internally consistent? | ≥ 8 |
| **Coverage** | Were all entry points, trust boundaries, modules, or manifests identified and analyzed? | ≥ 8 |
## Domain-Specific Extensions
### Multi-tool Pipeline — add:
| Category | Question | Threshold |
| ----------------- | -------------------------------------------------------------------------------------------- | :-------: |
| **Deduplication** | Are cross-tool duplicates properly merged with corroboration notes? | ≥ 8 |
### Code Quality (SonarQube-style) — adapt Completeness to:
| Category | Question | Threshold |
| ---------------- | --------------------------------------------------------------------------------------- | :-------: |
| **Completeness** | Were all issue types (Bugs, Vulnerabilities, Hotspots, Smells, Duplication) evaluated? | ≥ 8 |
### SAST/SCA — adapt Coverage to:
| Category | Question | Threshold |
| ------------ | -------------------------------------------------------------------------- | :-------: |
| **Coverage** | Were all entry points taint-traced and all dependency manifests audited? | ≥ 8 |
### STRIDE Threat Modeling — adapt Completeness to:
| Category | Question | Threshold |
| ---------------- | ---------------------------------------------------------------------------------- | :-------: |
| **Completeness** | Were all six STRIDE categories evaluated for every trust boundary and data flow? | ≥ 8 |
### STRIDE-LM — adapt Completeness and Coverage to:
| Category | Question | Threshold |
| ---------------- | ------------------------------------------------------------------------------------------- | :-------: |
| **Completeness** | Were all seven STRIDE-LM categories evaluated for every asset and trust boundary? | ≥ 8 |
| **Coverage** | Were all lateral movement paths, trust boundaries, and post-exploitation chains assessed? | ≥ 8 |
### Code Review — adapt Coverage to:
| Category | Question | Threshold |
| ------------ | ------------------------------------------------------------------------------------- | :-------: |
| **Coverage** | Were all entry points, trust boundaries, and data flows traced from source to sink? | ≥ 8 |