feat: add SAST/SCA Security Analyzer agent and audit-integrity skill (#1458)

Co-authored-by: Vijay Bandi <vijay.bandi@hp.com>
This commit is contained in:
Vijay Bandi
2026-04-27 20:46:05 -05:00
committed by GitHub
parent ca56e9577d
commit ba16533333
11 changed files with 682 additions and 0 deletions

@@ -0,0 +1,50 @@
---
name: 'audit-integrity'
description: 'Shared audit integrity framework for all AppSec agents — enforces output quality, intellectual honesty, and continuous improvement through anti-rationalization guards, self-critique loops, retry protocols, non-negotiable behaviors, self-reflection quality gates (1-10 scoring, ≥8 threshold), and a self-learning system with lesson/memory governance for security analysis agents.'
compatibility: 'Cross-platform. Works with any language or framework analyzed by AppSec agents.'
metadata:
version: '1.0'
---
# Audit Integrity Skill
Enforces output quality, intellectual honesty, and continuous improvement across all AppSec agents.
## When to Use
- Every security analysis, code review, threat model, or quality scan agent run
- Applied automatically as a post-analysis quality gate
- Applicable to any agent performing SAST, SCA, threat modeling, or code quality analysis
## Components
This skill provides 7 reusable capabilities. Agents apply all 7 unless their scope excludes a specific component.
| Component | Reference File | Purpose |
|-----------|---------------|---------|
| Clarification Protocol | [clarification-protocol.md](references/clarification-protocol.md) | Ask ≤2 targeted questions before analysis when scope is ambiguous |
| Anti-Rationalization Guard | [anti-rationalization-guard.md](references/anti-rationalization-guard.md) | Table of prohibited rationalizations with mandatory responses |
| Self-Critique Loop | [self-critique-loop.md](references/self-critique-loop.md) | Mandatory second-pass review after initial analysis |
| Retry Protocol | [retry-protocol.md](references/retry-protocol.md) | Tool failure handling — retry once, then document |
| Non-Negotiable Behaviors | [non-negotiable-behaviors.md](references/non-negotiable-behaviors.md) | Hard rules: never fabricate, always cite evidence, report gaps |
| Self-Reflection Quality Gate | [self-reflection-quality-gate.md](references/self-reflection-quality-gate.md) | 1-10 scoring rubric with ≥8 threshold per category |
| Self-Learning System | [self-learning-system.md](references/self-learning-system.md) | Lesson/Memory templates and governance rules |
## Execution Flow
1. **Before analysis**: Apply Clarification Protocol if scope is ambiguous
2. **During analysis**: Apply Anti-Rationalization Guard at every decision point
3. **After initial pass**: Execute Self-Critique Loop (mandatory second pass)
4. **On tool failure**: Apply Retry Protocol
5. **Before delivery**: Run Self-Reflection Quality Gate (all categories must score ≥8)
6. **After delivery**: Create Lessons/Memories for novel findings, false positives, or methodology gaps (see Self-Learning System)
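As an orientation aid, a minimal Python sketch of this flow follows. Every callable here is a hypothetical placeholder, not part of this skill's API, and step 4 (Retry Protocol) is assumed to run inside the analysis callable:

```python
from typing import Callable, Dict, List

def run_audited_analysis(
    scope_is_ambiguous: bool,
    clarify: Callable[[], None],                          # Clarification Protocol
    analyze: Callable[[], List[dict]],                    # guarded analysis; the
                                                          # Retry Protocol applies inside
    critique: Callable[[List[dict]], List[dict]],         # Self-Critique Loop
    gate_scores: Callable[[List[dict]], Dict[str, int]],  # Self-Reflection Quality Gate
    record_learning: Callable[[List[dict]], None],        # Self-Learning System
) -> List[dict]:
    if scope_is_ambiguous:                 # step 1: before analysis
        clarify()
    findings = analyze()                   # step 2: guard at every decision point
    findings = critique(findings)          # step 3: mandatory second pass
    scores = gate_scores(findings)         # step 5: all categories must score >= 8
    for _ in range(2):                     # max 2 rework iterations
        if all(s >= 8 for s in scores.values()):
            break
        findings = critique(findings)
        scores = gate_scores(findings)
    record_learning(findings)              # step 6: after delivery
    return findings
```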
## Agent-Specific Adaptation
Each agent customizes the **Self-Critique Loop** checklist and **Self-Reflection Quality Gate** categories to match its domain. The reference files provide the base templates; agents extend them with domain-specific items.
### Example extensions per agent type
- **SAST/SCA agents**: Add taint trace completeness and manifest coverage checks
- **SonarQube-style agents**: Add rating sanity check (A-E consistency with findings)
- **Threat modeling agents**: Add STRIDE category completeness per trust boundary
- **Code review agents**: Add trust boundary audit with data flow tracing

@@ -0,0 +1,38 @@
# Anti-Rationalization Guard
These rationalizations are **never** valid justifications for skipping, omitting, or downgrading findings:
## Universal Rationalizations (All Agents)
| If you think... | Mandatory response |
| ---------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| "No issues/threats found on first pass" | Systematic evaluation across all categories is required before concluding clean. Expand scope and complete the full matrix. |
| "This looks fine, skip deep analysis" | "Looks fine" is not evidence. Evidence = code trace, architecture reference, or rule match. Run checks. |
| "The risk is probably lower in practice" | Risk level is based on impact × likelihood (CVSS/exploitability). Justify any downgrade with explicit evidence. |
| "This is a false positive" | Flag it as a potential false positive but include it — do not silently suppress. Document the rationale for human review. |
| "This is outside scope" | State explicitly why, with a reference to the declared scope or assessment boundary. |
| "No controls/mitigations needed here" | State "No gap identified — rationale: [X]" explicitly. Silence is not assurance. |
## SAST/SCA-Specific
| If you think... | Mandatory response |
| ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| "SCA CVE isn't exploitable here" | Include the CVE with a documented context note — do not silently suppress. |
| "This phase can be skipped" | All phases are mandatory. Document any phase that genuinely cannot be completed due to missing inputs. |
| "Severity should be lower given context" | Severity is based on CVSS/exploitability. Justify any downgrade with explicit evidence. Document, don't suppress. |
## Code Quality-Specific
| If you think... | Mandatory response |
| ------------------------------------------ | ---------------------------------------------------------------------------------------------------- |
| "The team will refactor this later" | Technical debt still counts toward the debt ratio today. Document it accurately. |
| "Quality Gate failure is a false positive" | Include it as a finding, document the suspected false positive rationale, and mark for human review. |
## Threat Modeling-Specific
| If you think... | Mandatory response |
| ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| "This threat is mitigated by the architecture" | Document the specific compensating control and verify it is actually implemented — do not assume. |
| "This category has no applicable threats here" | State "No applicable threats identified — rationale: [X]" explicitly. Do not silently omit. |
| "Lateral movement is unlikely here" | Document the specific architectural control that prevents pivoting and verify it is implemented — do not assume. |
| "This threat actor wouldn't target this" | Document the basis for that exclusion. Insider threats and supply chain actors must always be considered. |

@@ -0,0 +1,15 @@
# Clarification Protocol
Before beginning analysis, pause and ask the user at most **2 targeted questions** when:
- The system scope, asset boundary, or target module is ambiguous and cannot be inferred from the provided context
- A critical trust boundary, privilege tier, or authentication zone is undefined and the analysis would significantly change depending on the interpretation
- The business context required for impact prioritization or compliance framework selection is entirely absent
- The language or framework cannot be auto-detected from the workspace
**Rules:**
1. State your working assumptions explicitly, then proceed
2. Do not wait for confirmation unless the ambiguity would fundamentally alter the attack surface definition, trust boundary map, or which phases are executed
3. Maximum 2 questions — if more ambiguity exists, infer from available evidence and document assumptions
4. If no ambiguity exists, proceed directly without questions
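A minimal sketch of the question budget in Python, assuming ambiguities were already collected during discovery; the `Ambiguity` type and its field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Ambiguity:
    question: str
    blocks_analysis: bool  # would the answer change the attack surface,
                           # trust boundary map, or executed phases?

def clarification_step(ambiguities: list[Ambiguity]) -> dict:
    # Rule 3: at most 2 questions; everything else becomes an assumption.
    ask = [a.question for a in ambiguities if a.blocks_analysis][:2]
    assume = [a.question for a in ambiguities if a.question not in ask]
    return {
        "ask_and_wait": ask,                    # Rule 2: wait only for blockers
        "documented_assumptions": assume,       # Rules 1 and 3
        "proceed_without_questions": not ask,   # Rule 4
    }
```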

@@ -0,0 +1,17 @@
# Non-Negotiable Behaviors
These rules apply to **all** AppSec agents with no exceptions:
1. **Never fabricate findings**: Do not report vulnerabilities, threats, bugs, code smells, or risk assessments without direct evidence from the analyzed source code, architecture, manifests, or threat intelligence.
2. **Always cite evidence**: Every finding must reference a specific file path, line number, CVE ID, component, trust boundary, data flow, or rule key. Generic findings without precise traceability are prohibited.
3. **Explain rationale for risk decisions**: When assigning severity, risk levels, quality ratings, policy compliance verdicts, or composite risk scores, state the reasoning based on exploitability, impact, and evidence — do not rely on unexplained judgment.
4. **Do not modify source files**: Do not alter code, configuration, dependency files, or deployment manifests unless explicitly requested by the user.
5. **Report honestly on coverage gaps**: If any analysis phase, STRIDE category, scan type, or methodology step could not be completed (missing files, unsupported language, inaccessible components), state it explicitly rather than silently omitting.
6. **Complete all phases**: Partial runs are not acceptable. If a phase is blocked, document why and continue with remaining phases.
7. **Provide progress summaries**: For multi-phase analysis, summarize findings after completing each major phase before proceeding to the next.
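To make rules 1-3 concrete, here is a hypothetical finding record that refuses to exist without evidence and rationale; all field names are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    severity: str    # e.g. "High"; must be justified, never asserted
    evidence: str    # file:line, CVE ID, component, or rule key (rule 2)
    rationale: str   # exploitability/impact reasoning (rule 3)

    def __post_init__(self):
        if not self.evidence.strip():   # rules 1-2: no evidence, no finding
            raise ValueError(f"'{self.title}' lacks cited evidence")
        if not self.rationale.strip():  # rule 3: risk decisions need reasons
            raise ValueError(f"'{self.title}' lacks a risk rationale")
```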

@@ -0,0 +1,8 @@
# Retry Protocol
On tool failure or empty results:
1. **Retry once** with a refined query or a different search pattern.
2. **If second attempt fails**, state the failure explicitly and continue with available evidence.
3. **Never silently skip** a phase because a tool call returned no results — distinguish "tool found nothing" from "tool failed to execute."
4. **Document the gap**: If a phase is genuinely blocked (missing manifests, unsupported language, inaccessible files), state it explicitly in the output rather than silently omitting the phase.
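A minimal sketch of this protocol in Python, assuming a hypothetical `run_tool` callable; it retries once and distinguishes "found nothing" from "failed to execute":

```python
from typing import Callable, List

def query_with_retry(run_tool: Callable[[str], list], query: str,
                     refined_query: str, gaps: List[str]) -> list:
    last_error = None
    for q in (query, refined_query):      # step 1: retry once, refined
        try:
            results = run_tool(q)
        except Exception as exc:          # tool failed to execute
            last_error = exc
            continue
        if results:
            return results
        last_error = None                 # step 3: empty is not failure
    if last_error is not None:            # step 2: state failure, move on
        gaps.append(f"Tool failure for '{query}': {last_error}")
    else:
        gaps.append(f"No results for '{query}' even after refinement")
    return []                             # continue with available evidence
```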

@@ -0,0 +1,46 @@
# Self-Critique Loop
After completing the initial analysis, perform a **mandatory second pass** before delivering output.
## Universal Checks (All Agents)
1. **Evidence check**: Every finding must cite a concrete reference (file:line, component, architecture element, CVE ID, rule key). Remove any finding without supporting evidence.
2. **Coverage check**: Verify that all categories, phases, or scan types relevant to the agent's methodology were explicitly evaluated. State "None detected" for each clean category rather than silently omitting.
3. **Mitigation/remediation check**: Every Critical and High finding must have a specific, implementable fix — not a generic recommendation.
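A sketch of the three universal checks over plain finding dicts; the keys (`evidence`, `severity`, `fix`) are hypothetical, chosen only to mirror the checks above:

```python
def universal_critique(findings: list[dict],
                       required_categories: set[str]) -> list[dict]:
    # Check 1: drop anything without a concrete reference.
    kept = [f for f in findings if f.get("evidence")]
    # Check 2: state "None detected" for clean categories, never omit.
    for cat in required_categories - {f.get("category") for f in kept}:
        kept.append({"category": cat, "status": "None detected",
                     "evidence": "category evaluated, no findings"})
    # Check 3: Critical/High findings need a specific, implementable fix.
    for f in kept:
        if f.get("severity") in ("Critical", "High") and not f.get("fix"):
            f["status"] = "BLOCKED: missing implementable remediation"
    return kept
```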
## Domain-Specific Extensions
Each agent adds domain checks to the universal list above:
### STRIDE Threat Modeling
4. **STRIDE completeness**: Did you evaluate all six STRIDE categories (S/T/R/I/D/E) for every trust boundary and data flow?
5. **Trust boundary audit**: Re-verify that every identified trust boundary has at least one evaluated data flow crossing it.
### STRIDE-LM (Lateral Movement)
4. **STRIDE-LM completeness**: Did you evaluate all seven categories (S/T/R/I/D/E/LM) for every asset and trust boundary?
5. **Control coverage**: Every Critical/High threat maps to a control function (Inventory/Collect/Detect/Protect/Manage/Respond).
6. **Lateral movement audit**: Re-trace all identified pivot paths. Verify no uncontrolled path exists from compromised entry point to high-value asset.
### Code Review Threat Modeling
4. **STRIDE completeness**: All six STRIDE categories evaluated for every trust boundary and data flow.
5. **Trust boundary audit**: Every trust boundary has evaluated data flows crossing it.
### Code Quality (SonarQube-style)
4. **Issue type coverage**: All five issue types (Bug, Vulnerability, Hotspot, Smell, Duplication) explicitly evaluated.
5. **Rating sanity check**: A-E ratings are consistent with finding counts before finalizing Quality Gate verdict.
### SAST/SCA
4. **Taint trace completeness**: Every entry point identified in discovery was taint-traced through to sinks.
5. **Manifest coverage**: All dependency manifests identified in discovery were audited.
### Multi-tool Pipeline
4. **Phase coverage**: All deliverable files generated and saved.
5. **Cross-correlation**: SAST findings corroborated by SCA findings → elevate corroborated items.
6. **Deduplication**: Same finding doesn't appear under multiple tool outputs.
7. **Roadmap completeness**: Every Critical/High finding appears in the immediate remediation tier.
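As an illustration of checks 5 and 6, a hypothetical cross-correlation sketch; the dict keys and the one-level elevation policy are assumptions, not prescribed behavior:

```python
SEVERITIES = ["Low", "Medium", "High", "Critical"]

def correlate(findings: list[dict]) -> list[dict]:
    merged: dict[tuple, dict] = {}
    for f in findings:
        key = (f["rule"], f["file"], f["line"])   # check 6: dedup key
        if key not in merged:
            merged[key] = {**f, "tools": {f["tool"]}}
        elif f["tool"] not in merged[key]["tools"]:
            item = merged[key]
            item["tools"].add(f["tool"])
            # check 5: corroborated by another tool -> elevate one level
            idx = SEVERITIES.index(item["severity"])
            item["severity"] = SEVERITIES[min(idx + 1, len(SEVERITIES) - 1)]
    return list(merged.values())
```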

@@ -0,0 +1,92 @@
# Self-Learning System
Maintain project learning artifacts under a designated lessons/memories directory (e.g., `.github/SecurityLessons` and `.github/SecurityMemories`).
## When to Create
### Lesson
Create a lesson when:
- A scan produces a false positive that required manual correction
- A finding category, STRIDE category, or flaw type is missed on first pass and caught by the self-critique loop
- A tool or methodology limitation is discovered
- A language-specific rule misfires
- An SCA dependency cannot be resolved
### Memory
Create a memory when:
- An architecture decision, security convention, or technology stack detail is discovered
- A dependency management pattern, domain-specific threat pattern, or threat actor profile is identified
- A project coding convention, framework idiom, or known false-positive pattern is found
- Any codebase-specific knowledge would be useful for future scans of the same codebase
## Lesson Template
```markdown
# Security Lesson: <short-title>
## Metadata
- CreatedAt: <date>
- Status: active | deprecated
- Supersedes: <previous lesson if any>
## Context
- Triggering scan/task:
- Component analyzed:
## Issue
- What went wrong or was missed:
- Expected behavior:
- Actual behavior:
## Root Cause
- Why was this missed or incorrect:
## Resolution
- How it was corrected:
## Preventive Guidance
- How to avoid this in future scans:
```
## Memory Template
```markdown
# Security Memory: <short-title>
## Metadata
- CreatedAt: <date>
- Status: active | deprecated
- Supersedes: <previous memory if any>
## Context
- Triggering scan/task:
- Scope/system:
## Key Fact
- What was discovered:
- Why it matters for security analysis:
## Reuse Guidance
- When to apply this knowledge:
- Related components:
```
## Governance Rules
1. **Dedup check**: Before creating a new lesson or memory, search existing files for similar content. Update existing records rather than creating duplicates.
2. **Conflict resolution**: If new evidence conflicts with an existing active lesson/memory, mark the older one as `deprecated` and create the updated version with a `Supersedes` reference.
3. **Reuse at scan start**: At the start of every analysis, check the lessons/memories directory for applicable context. Apply relevant guidance before beginning analysis.
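A sketch of rules 1-2 over lesson files, assuming a title match is a good-enough similarity check; the paths and file naming are illustrative:

```python
from datetime import date
from pathlib import Path

def upsert_lesson(lessons_dir: Path, title: str, body: str) -> Path:
    lessons_dir.mkdir(parents=True, exist_ok=True)
    supersedes = None
    for existing in sorted(lessons_dir.glob("*.md")):
        text = existing.read_text()
        # Rule 1 (dedup): treat a matching active title as the same record.
        if title in text and "Status: active" in text:
            # Rule 2 (conflict): deprecate the older record, supersede it.
            existing.write_text(
                text.replace("Status: active", "Status: deprecated"))
            supersedes = existing.name
            break
    slug = title.lower().replace(" ", "-")
    new_file = lessons_dir / f"{slug}-{date.today().isoformat()}.md"
    header = f"- Supersedes: {supersedes}\n" if supersedes else ""
    new_file.write_text(header + body)
    return new_file
```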

@@ -0,0 +1,46 @@
# Self-Reflection Quality Gate
After completing analysis, internally score the output across domain-relevant categories (1-10 scale).
## Scoring Rules
- **Pass**: All categories ≥ 8
- **Fail**: Any score < 8 → revisit the failing dimension before delivering output. Max 2 rework iterations.
- **If unresolvable after 2 iterations**: Deliver output with an explicit confidence note stating which dimension fell short and why.
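A minimal sketch of the pass/fail loop, assuming the agent can produce per-category scores and rework failing dimensions; both callables and the return shape are hypothetical:

```python
from typing import Callable, Dict, Optional, Tuple

def quality_gate(
    scores: Dict[str, int],
    rework: Callable[[Dict[str, int]], Dict[str, int]],
    threshold: int = 8,
    max_iterations: int = 2,
) -> Tuple[Dict[str, int], Optional[str]]:
    for _ in range(max_iterations):
        failing = {c: s for c, s in scores.items() if s < threshold}
        if not failing:
            return scores, None                 # Pass: deliver normally
        scores = {**scores, **rework(failing)}  # revisit failing dimensions
    failing = [c for c, s in scores.items() if s < threshold]
    note = (f"Confidence note: {', '.join(failing)} stayed below "
            f"{threshold} after {max_iterations} rework iterations"
            ) if failing else None
    return scores, note                         # deliver with explicit caveat
```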
## Base Categories (All Agents)
| Category | Question | Threshold |
| ----------------- | --------------------------------------------------------------------------------------- | :-------: |
| **Completeness** | Were all required phases/categories evaluated with evidence? | ≥ 8 |
| **Accuracy** | Are findings backed by concrete references (code, architecture, CVEs), not speculation? | ≥ 8 |
| **Actionability** | Does every Critical/High finding have a specific, implementable fix or mitigation? | ≥ 8 |
| **Consistency** | Are severity ratings, mappings, and verdicts internally consistent? | ≥ 8 |
| **Coverage** | Were all entry points, trust boundaries, modules, or manifests identified and analyzed? | ≥ 8 |
## Domain-Specific Extensions
### Multi-tool Pipeline — add:
| Category | Question | Threshold |
| ----------------- | -------------------------------------------------------------------------------------------- | :-------: |
| **Deduplication** | Are cross-tool duplicates properly merged with corroboration notes? | ≥ 8 |
### Code Quality (SonarQube-style) — adapt Completeness to:
| Category | Question | Threshold |
| ---------------- | --------------------------------------------------------------------------------------- | :-------: |
| **Completeness** | Were all issue types (Bugs, Vulnerabilities, Hotspots, Smells, Duplication) evaluated? | ≥ 8 |
### SAST/SCA — adapt Coverage to:
| Category | Question | Threshold |
| ------------ | -------------------------------------------------------------------------- | :-------: |
| **Coverage** | Were all entry points taint-traced and all dependency manifests audited? | ≥ 8 |
### STRIDE Threat Modeling — adapt Completeness to:
| Category | Question | Threshold |
| ---------------- | ---------------------------------------------------------------------------------- | :-------: |
| **Completeness** | Were all six STRIDE categories evaluated for every trust boundary and data flow? | ≥ 8 |
### STRIDE-LM — adapt Completeness and Coverage to:
| Category | Question | Threshold |
| ---------------- | ------------------------------------------------------------------------------------------- | :-------: |
| **Completeness** | Were all seven STRIDE-LM categories evaluated for every asset and trust boundary? | ≥ 8 |
| **Coverage** | Were all lateral movement paths, trust boundaries, and post-exploitation chains assessed? | ≥ 8 |
### Code Review — adapt Coverage to:
| Category | Question | Threshold |
| ------------ | ------------------------------------------------------------------------------------- | :-------: |
| **Coverage** | Were all entry points, trust boundaries, and data flows traced from source to sink? | ≥ 8 |