# Blast Radius Calculator
Formulas, scoring matrices, and estimation heuristics for quantifying how many people, records, and systems would be affected by a data breach in the codebase under analysis.
## Core Blast Radius Formula

```
Blast Radius Score (BRS) = Tier_Weight × Exposure_Likelihood × Population_Scale × Completeness_Factor × Context_Multiplier
```
Score ranges:
- 0–25: Low — limited exposure, few records
- 26–50: Medium — meaningful exposure, focused population
- 51–75: High — significant exposure, broad regulatory consequences
- 76–100: Critical — catastrophic exposure, immediate action required
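The band boundaries above can be sketched as a small lookup (the function name is illustrative, not part of the skill):

```python
def severity_band(brs: int) -> str:
    """Map a normalized 0-100 Blast Radius Score to its severity label."""
    if brs <= 25:
        return "LOW"
    if brs <= 50:
        return "MEDIUM"
    if brs <= 75:
        return "HIGH"
    return "CRITICAL"
```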
## Factor 1: Tier Weight (T)
Based on the data classification tier from data-classification.md:
| Tier | Label | Weight |
|---|---|---|
| T1 | Catastrophic | 5.0 |
| T2 | Critical | 4.0 |
| T3 | High | 3.0 |
| T4 | Elevated | 2.0 |
| T5 | Standard | 1.0 |
Rule: When multiple tiers exist in the same exposure vector, use the highest tier weight.
Aggregation uplift: If 3+ fields from different tiers are exposed together, add +0.5 to the highest tier weight (aggregation attack risk).
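The highest-tier rule and aggregation uplift can be sketched as follows. The function name is illustrative, and "3+ fields from different tiers" is read as three or more fields spanning more than one tier — an assumption about the rule's intent:

```python
TIER_WEIGHTS = {"T1": 5.0, "T2": 4.0, "T3": 3.0, "T4": 2.0, "T5": 1.0}

def tier_weight(field_tiers: list[str]) -> float:
    """Highest tier weight among the exposed fields, plus the +0.5
    aggregation uplift when 3+ fields span more than one tier
    (assumed interpretation of the aggregation rule)."""
    base = max(TIER_WEIGHTS[t] for t in field_tiers)
    if len(field_tiers) >= 3 and len(set(field_tiers)) > 1:
        base += 0.5
    return base
```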
## Factor 2: Exposure Likelihood (E)
How likely is this vector to be exploited in a realistic breach scenario?
| Likelihood Score | Label | Criteria |
|---|---|---|
| 1.0 | Certain | Data is publicly accessible today (no auth required) |
| 0.9 | Near Certain | Auth bypass is trivial (e.g., IDOR on sequential IDs, broken JWT validation) |
| 0.8 | Very Likely | Auth required but missing for this specific endpoint; or data leaked in logs accessible by most engineers |
| 0.7 | Likely | Auth required but over-broad access (all users can see all data); missing field-level access control |
| 0.6 | Moderate | Requires privilege escalation or chaining with another bug; internal system with broad developer access |
| 0.5 | Possible | Requires significant attacker effort but no defense-in-depth; DB accessible from dev environment |
| 0.3 | Unlikely | Multiple security controls in place, but those controls are not verified by the codebase review |
| 0.1 | Remote | Strong defense-in-depth: encryption, field masking, proper authz, rate limiting, anomaly detection all present |
## Factor 3: Population Scale (P)
Normalize the estimated number of affected records to a 0–1 scale.
### Estimating Record Counts
**Step 1: Look for explicit signals in the codebase**

Strong signals (use these if found):

- README mentions user count ("serves 5M users")
- Seeder/fixture files with record counts
- Migration comments ("adding index for 50K users")
- Analytics dashboards or monitoring configs mentioning scale
- Infrastructure configs (DB instance size implies scale):
  - db.t3.micro → < 10K active users
  - db.r5.large → 10K–500K users
  - db.r5.4xlarge / Aurora Serverless → > 500K users

Medium signals:

- App category (SaaS product → higher, internal tool → lower)
- Multi-tenant vs. single-tenant architecture
- Presence of sharding or partitioning in the DB schema

Weak signals:

- Tech stack alone (no reliable correlation to user count)
**Step 2: Apply default estimates when no signals are found**
| Application Type | Conservative Estimate | Typical Estimate |
|---|---|---|
| Internal corporate tool | 100–1,000 | 500 |
| B2B SaaS (small/startup) | 1,000–10,000 | 5,000 |
| B2B SaaS (established) | 10,000–100,000 | 50,000 |
| B2C app (consumer startup) | 10,000–100,000 | 50,000 |
| B2C app (growth stage) | 100,000–1,000,000 | 500,000 |
| B2C app (scale) | 1,000,000–100,000,000 | 10,000,000 |
| Healthcare system | 1,000–100,000 | 20,000 |
| Financial services | 5,000–500,000 | 50,000 |
| Government / public sector | 10,000–10,000,000 | 1,000,000 |
Always state the assumption used.
### Population Scale Score (P)
| Records at Risk | Score |
|---|---|
| < 100 | 0.1 |
| 100–1,000 | 0.2 |
| 1,000–10,000 | 0.3 |
| 10,000–50,000 | 0.4 |
| 50,000–100,000 | 0.5 |
| 100,000–500,000 | 0.6 |
| 500,000–1,000,000 | 0.7 |
| 1M–10M | 0.8 |
| 10M–100M | 0.9 |
| > 100M | 1.0 |
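The bands in the table share boundary values (e.g. 1,000 appears in two rows), so a sketch has to pick a convention; treating each upper bound as exclusive matches the worked examples (2,000 employees → 0.3; 100K users → 0.6). Names are illustrative:

```python
# Upper bounds (exclusive, by assumption) for each population band.
P_BANDS = [
    (100, 0.1), (1_000, 0.2), (10_000, 0.3), (50_000, 0.4),
    (100_000, 0.5), (500_000, 0.6), (1_000_000, 0.7),
    (10_000_000, 0.8), (100_000_000, 0.9),
]

def population_scale(records: int) -> float:
    """Map an estimated record count to its 0-1 Population Scale score."""
    for upper, score in P_BANDS:
        if records < upper:
            return score
    return 1.0  # > 100M records
```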
## Factor 4: Completeness Factor (C)
How complete/useful is the exposed data for an attacker?
| Factor | Score | Description |
|---|---|---|
| Full Profile | 1.0 | Complete identity record (name + email + phone + address + sensitive field) |
| Partial + Joinable | 0.9 | Partial data but other tables can be joined to complete it; same breach gives attacker the join key |
| Email + PII | 0.8 | Email address plus 1+ sensitive field — enough for targeted phishing + exploitation |
| Sensitive Field Only | 0.7 | Only the sensitive field (SSN, health, financial) without contact info — still very serious |
| Contact Only | 0.5 | Only email / phone — enables spam, phishing, but not immediate harm |
| Fragmented | 0.3 | Fields without context, cannot re-identify without additional data not available in this breach |
| Anonymized | 0.1 | Properly anonymized — re-identification requires significant external data linking |
## Factor 5: Context Multipliers (M)
Apply these multipliers to the final score for specific contexts:
| Context | Multiplier | Rationale |
|---|---|---|
| Children's data present (COPPA / GDPR Art 8) | × 2.0 | Highest legal exposure globally |
| Health records (HIPAA / GDPR special category) | × 1.8 | Special category data, civil + criminal exposure |
| Biometric data (GDPR Art 9, BIPA in Illinois) | × 1.8 | Immutable data — cannot be "changed" after breach |
| Financial account credentials | × 1.7 | Direct financial theft possible |
| Government IDs (SSN, passport) | × 1.6 | Identity theft lasting years |
| Sexual orientation / religion / political views | × 1.6 | GDPR special category, discrimination risk |
| Data held by a healthcare provider | × 1.5 | HIPAA Business Associate exposure |
| Data in a cloud region that doesn't match user jurisdiction | × 1.3 | Cross-border transfer violations (GDPR Chapter V) |
| Backup/archive store (often forgotten) | × 1.2 | Backups frequently missed in breach containment |
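The table doesn't say how to combine several applicable contexts; both worked examples apply exactly one multiplier, so the sketch below takes the largest applicable one rather than the product — an assumption, as are the dictionary keys:

```python
CONTEXT_MULTIPLIERS = {
    "children": 2.0, "health": 1.8, "biometric": 1.8,
    "financial_credentials": 1.7, "government_id": 1.6,
    "special_category": 1.6, "healthcare_provider": 1.5,
    "cross_border": 1.3, "backup_store": 1.2,
}

def context_multiplier(contexts: list[str]) -> float:
    """Largest applicable multiplier; 1.0 when none apply.
    Taking max (not the product) is an assumption based on the
    worked examples, which each apply a single multiplier."""
    return max((CONTEXT_MULTIPLIERS[c] for c in contexts), default=1.0)
```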
## Blast Radius Score Calculation Examples
### Example 1: E-commerce checkout system
Exposure vector: API endpoint /api/users/{id}/payment-methods — no ownership check (IDOR)
- Tier: T2 (card last 4 + billing address) = 4.0
- Exposure Likelihood: 0.9 (IDOR on sequential IDs, near-certain exploitation)
- Population Scale: 100K users = 0.6
- Completeness: Partial profile + joinable to user table = 0.9
- Context Multiplier: Payment data = 1.7
BRS = 4.0 × 0.9 × 0.6 × 0.9 × 1.7 = 3.30 (raw) → normalized to 66/100 → HIGH
### Example 2: Internal HR tool
Exposure vector: Employees table visible to all company users via /api/employees
- Tier: T1 (SSN; salary + home address are T2) = 5.0 (highest-tier rule)
- Exposure Likelihood: 0.7 (auth required, but no RBAC; any employee can see all)
- Population Scale: 2,000 employees = 0.3
- Completeness: Full profile = 1.0
- Context Multiplier: Government IDs (SSN) = 1.6
BRS = 5.0 × 0.7 × 0.3 × 1.0 × 1.6 = 1.68 (raw) → normalized to 34/100 → MEDIUM
However, the data tier overrides the numeric score here: SSN exposure is Tier 1, so flag this vector as HIGH regardless of score.
## Score Normalization

The raw formula output typically falls between 0 and 10. Normalize to 0–100 (raw values of 5.0 or more cap at 100):

```
Normalized_BRS = min(100, (raw_BRS / 5.0) × 100)
```

A divisor of 5.0 reproduces the worked examples above (3.30 → 66, 1.68 → 34).
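Putting the factors together, the full calculation can be sketched as below. The divisor 5.0 is the value implied by the worked examples (3.30 → 66, 1.68 → 34); function and parameter names are illustrative:

```python
def blast_radius_score(t: float, e: float, p: float,
                       c: float, m: float) -> int:
    """Raw BRS = T × E × P × C × M, normalized to 0-100 using the
    divisor 5.0 implied by the worked examples, capped at 100."""
    raw = t * e * p * c * m
    return min(100, round(raw / 5.0 * 100))
```

For instance, Example 1's factors (4.0, 0.9, 0.6, 0.9, 1.7) yield 66, and Example 2's (5.0, 0.7, 0.3, 1.0, 1.6) yield 34.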
## Blast Radius Summary Table (per exposure vector)
Use this format when reporting:
| # | Exposure Vector | Tier | Likelihood | Pop. at Risk | BRS | Severity | Jurisdiction |
|---|----------------|------|-----------|-------------|-----|----------|--------------|
| 1 | /api/users endpoint - SSN returned in response | T1 | 0.9 | 50K | 87 | CRITICAL | GDPR, CCPA |
| 2 | Logs contain plaintext emails | T3 | 0.6 | 50K | 45 | MEDIUM | GDPR |
| 3 | Redis cache stores full user objects | T2 | 0.5 | 50K | 38 | MEDIUM | GDPR, CCPA |
| 4 | S3 bucket - public read on user avatars | T4 | 1.0 | 50K | 28 | MEDIUM | - |
## Total Organizational Blast Radius
After scoring all exposure vectors, compute:
- **Maximum Simultaneous Exposure (MSE):** the number of unique individuals that could be affected if a single attacker gained broad DB access (worst case). This is the number used in regulatory reporting.
- **Expected Breach Exposure (EBE):** the typical exposure based on the most likely attack vector (the highest-likelihood finding, not the highest-impact one).
- **Regulatory Trigger Count:** the number of distinct regulatory regimes triggered (each has its own notification obligation and fine formula).
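A minimal roll-up sketch, assuming per-vector findings carry a record count, a likelihood, and a regulation set (all field names are assumptions). MSE is approximated as the largest single vector's population, since per-individual overlap data is not available from static analysis:

```python
from dataclasses import dataclass

@dataclass
class Vector:
    name: str
    records_at_risk: int
    likelihood: float
    regulations: frozenset[str]

def org_blast_radius(vectors: list[Vector]) -> dict:
    """Roll per-vector findings up into the three organizational metrics."""
    # Worst case: the largest single-vector population (approximation).
    mse = max(v.records_at_risk for v in vectors)
    # Expected: the population of the highest-likelihood vector.
    ebe = max(vectors, key=lambda v: v.likelihood).records_at_risk
    # Distinct regulatory regimes across all vectors.
    regimes = set().union(*(v.regulations for v in vectors))
    return {"MSE": mse, "EBE": ebe, "regulatory_triggers": len(regimes)}
```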
## Organizational Blast Radius Summary
| Metric | Value |
|--------|-------|
| Maximum records at risk | [number] |
| Users with Tier 1 data | [number] |
| Users with Tier 2 data | [number] |
| Users with Tier 3+ data | [number] |
| Regulations triggered | GDPR, CCPA, [others] |
| Worst-case BRS | [score] |
| Most likely attack vector | [description] |
| Time to detect (estimated) | [industry avg: 194 days if no SIEM] |
| Time to contain (estimated) | [industry avg: 73 days] |
## Breach Cost Benchmarks (IBM Cost of a Data Breach Report)

Use these when no breach-specific cost data is available. Figures below are from the IBM 2024 edition. IBM publishes a new edition annually at https://www.ibm.com/reports/data-breach; the 2025 edition reports a roughly 9% decrease in global average cost.
| Metric | Value (IBM 2024) |
|---|---|
| Global average cost per breach | $4.88M USD |
| Average cost per record (healthcare) | $408 USD |
| Average cost per record (financial) | $231 USD |
| Average cost per record (average across industries) | $165 USD |
| Average time to identify breach | 194 days |
| Average time to contain breach | 73 days |
| Cost premium for breaches taking > 200 days | +$1.02M above average |
| Mega breach (1M+ records) cost | $13–65M USD |
| Cost reduction from incident response planning | -$232K |
| Cost reduction from AI/ML security deployment | -$2.22M |
| Cost reduction from employee training | -$258K |
Source: IBM Cost of a Data Breach Report 2024. State these as benchmarks, not guarantees. Update this table when a new edition is released.
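A simple linear estimate from the per-record benchmarks above can be sketched as follows (names are illustrative; treat the result as a benchmark, not a guarantee):

```python
# IBM 2024 per-record benchmarks (USD); update with each new edition.
PER_RECORD_COST = {"healthcare": 408, "financial": 231, "average": 165}

def estimated_breach_cost(records: int, industry: str = "average") -> int:
    """Rough cost = records × per-record benchmark (USD). Note that
    mega breaches (1M+ records) do not scale linearly; the report's
    $13-65M mega-breach range is the better reference there."""
    rate = PER_RECORD_COST.get(industry, PER_RECORD_COST["average"])
    return records * rate
```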