# Blast Radius Calculator
Formulas, scoring matrices, and estimation heuristics for quantifying how many people, records, and systems would be affected by a data breach in the codebase under analysis.
---
## Core Blast Radius Formula
```
Blast Radius Score (BRS) = Tier_Weight × Exposure_Likelihood × Population_Scale × Completeness_Factor × Context_Multiplier
```
**Score ranges:**
- 025: **Low** — limited exposure, few records
- 2650: **Medium** — meaningful exposure, focused population
- 5175: **High** — significant exposure, broad regulatory consequences
- 76100: **Critical** — catastrophic exposure, immediate action required
---
## Factor 1: Tier Weight (T)
Based on the data classification tier from `data-classification.md`:
| Tier | Label | Weight |
|------|-------|--------|
| T1 | Catastrophic | 5.0 |
| T2 | Critical | 4.0 |
| T3 | High | 3.0 |
| T4 | Elevated | 2.0 |
| T5 | Standard | 1.0 |
**Rule:** When multiple tiers exist in the same exposure vector, use the **highest** tier weight.
**Aggregation uplift:** If 3+ fields from different tiers are exposed together, add +0.5 to the highest tier weight (aggregation attack risk).
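The highest-tier rule and the aggregation uplift can be sketched as follows. This assumes one reading of the uplift condition — at least three exposed fields spanning at least two distinct tiers; the helper name is illustrative:

```python
TIER_WEIGHTS = {"T1": 5.0, "T2": 4.0, "T3": 3.0, "T4": 2.0, "T5": 1.0}

def effective_tier_weight(field_tiers: list[str]) -> float:
    """Highest tier weight among exposed fields, plus the aggregation uplift."""
    weight = max(TIER_WEIGHTS[t] for t in field_tiers)
    # Assumed reading: 3+ exposed fields spanning 2+ different tiers
    if len(field_tiers) >= 3 and len(set(field_tiers)) >= 2:
        weight += 0.5  # aggregation attack risk
    return weight
```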
---
## Factor 2: Exposure Likelihood (E)
How likely is this vector to be exploited in a realistic breach scenario?
| Likelihood Score | Label | Criteria |
|-----------------|-------|---------|
| 1.0 | **Certain** | Data is publicly accessible today (no auth required) |
| 0.9 | **Near Certain** | Auth bypass is trivial (e.g., IDOR on sequential IDs, broken JWT validation) |
| 0.8 | **Very Likely** | Auth required but missing for this specific endpoint; or data leaked in logs accessible by most engineers |
| 0.7 | **Likely** | Auth required but over-broad access (all users can see all data); missing field-level access control |
| 0.6 | **Moderate** | Requires privilege escalation or chaining with another bug; internal system with broad developer access |
| 0.5 | **Possible** | Requires significant attacker effort but no defense-in-depth; DB accessible from dev environment |
| 0.3 | **Unlikely** | Multiple security controls in place, but the controls are not verified by this codebase review |
| 0.1 | **Remote** | Strong defense-in-depth: encryption, field masking, proper authz, rate limiting, anomaly detection all present |
---
## Factor 3: Population Scale (P)
Normalize the estimated number of affected records to a 0–1 scale.
### Estimating Record Counts
**Step 1: Look for explicit signals in the codebase**
```
# Strong signals (use these if found):
- README mentions user count ("serves 5M users")
- Seeder/fixture files with record counts
- Migration comments ("adding index for 50K users")
- Analytics dashboards or monitoring configs mentioning scale
- Infrastructure configs (DB instance size implies scale):
  - db.t3.micro → < 10K active users
  - db.r5.large → 10K–500K users
  - db.r5.4xlarge / Aurora Serverless → > 500K users
# Medium signals:
- App category (SaaS product → higher, internal tool → lower)
- Multi-tenant vs. single-tenant architecture
- Presence of sharding or partitioning in DB schema
# Weak signals:
- Tech stack alone (no reliable correlation to user count)
```
**Step 2: Apply default estimates when no signals are found**
| Application Type | Conservative Estimate | Typical Estimate |
|-----------------|----------------------|-----------------|
| Internal corporate tool | 100–1,000 | 500 |
| B2B SaaS (small/startup) | 1,000–10,000 | 5,000 |
| B2B SaaS (established) | 10,000–100,000 | 50,000 |
| B2C app (consumer startup) | 10,000–100,000 | 50,000 |
| B2C app (growth stage) | 100,000–1,000,000 | 500,000 |
| B2C app (scale) | 1,000,000–100,000,000 | 10,000,000 |
| Healthcare system | 1,000–100,000 | 20,000 |
| Financial services | 5,000–500,000 | 50,000 |
| Government / public sector | 10,000–10,000,000 | 1,000,000 |
**Always state the assumption used.**
### Population Scale Score (P)
| Records at Risk | Score |
|----------------|-------|
| < 100 | 0.1 |
| 100–1,000 | 0.2 |
| 1,000–10,000 | 0.3 |
| 10,000–50,000 | 0.4 |
| 50,000–100,000 | 0.5 |
| 100,000–500,000 | 0.6 |
| 500,000–1,000,000 | 0.7 |
| 1M–10M | 0.8 |
| 10M–100M | 0.9 |
| > 100M | 1.0 |
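The table above is a simple banded lookup, sketched here for reference (band boundaries follow each row's upper bound; exact boundary handling is a judgment call):

```python
# Upper bound of each band (exclusive) paired with its P score
_P_BANDS = [
    (100, 0.1), (1_000, 0.2), (10_000, 0.3), (50_000, 0.4),
    (100_000, 0.5), (500_000, 0.6), (1_000_000, 0.7),
    (10_000_000, 0.8), (100_000_000, 0.9),
]

def population_scale(records: int) -> float:
    """Map estimated records at risk to the 0-1 Population Scale score."""
    for upper, score in _P_BANDS:
        if records < upper:
            return score
    return 1.0  # > 100M records
```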
---
## Factor 4: Completeness Factor (C)
How complete/useful is the exposed data for an attacker?
| Factor | Score | Description |
|--------|-------|-------------|
| **Full Profile** | 1.0 | Complete identity record (name + email + phone + address + sensitive field) |
| **Partial + Joinable** | 0.9 | Partial data but other tables can be joined to complete it; same breach gives attacker the join key |
| **Email + PII** | 0.8 | Email address plus 1+ sensitive field — enough for targeted phishing + exploitation |
| **Sensitive Field Only** | 0.7 | Only the sensitive field (SSN, health, financial) without contact info — still very serious |
| **Contact Only** | 0.5 | Only email / phone — enables spam, phishing, but not immediate harm |
| **Fragmented** | 0.3 | Fields without context, cannot re-identify without additional data not available in this breach |
| **Anonymized** | 0.1 | Properly anonymized — re-identification requires significant external data linking |
---
## Factor 5: Context Multipliers (M)
Apply these multipliers to the final score for specific contexts:
| Context | Multiplier | Rationale |
|---------|-----------|-----------|
| Children's data present (COPPA / GDPR Art 8) | × 2.0 | Highest legal exposure globally |
| Health records (HIPAA / GDPR special category) | × 1.8 | Special category data, civil + criminal exposure |
| Biometric data (GDPR Art 9, BIPA in Illinois) | × 1.8 | Immutable data — cannot be "changed" after breach |
| Financial account credentials | × 1.7 | Direct financial theft possible |
| Government IDs (SSN, passport) | × 1.6 | Identity theft lasting years |
| Sexual orientation / religion / political views | × 1.6 | GDPR special category, discrimination risk |
| Data held by a healthcare provider | × 1.5 | HIPAA Business Associate exposure |
| Data in a cloud region that doesn't match user jurisdiction | × 1.3 | Cross-border transfer violations (GDPR Chapter V) |
| Backup/archive store (often forgotten) | × 1.2 | Backups frequently missed in breach containment |
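A sketch of applying the multipliers. The source does not say whether multipliers stack when several contexts apply (each worked example uses only one); this sketch assumes they compound multiplicatively, and the dictionary keys are our own shorthand:

```python
import math

# Shorthand keys for the context table above (illustrative names)
CONTEXT_MULTIPLIERS = {
    "children": 2.0,
    "health": 1.8,
    "biometric": 1.8,
    "financial_credentials": 1.7,
    "government_id": 1.6,
    "special_category": 1.6,
    "healthcare_provider": 1.5,
    "cross_border": 1.3,
    "backup_store": 1.2,
}

def apply_context(raw_score: float, contexts: list[str]) -> float:
    # Assumption: multipliers compound when several contexts apply
    return raw_score * math.prod(CONTEXT_MULTIPLIERS[c] for c in contexts)
```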
---
## Blast Radius Score Calculation Examples
### Example 1: E-commerce checkout system
**Exposure vector:** API endpoint `/api/users/{id}/payment-methods` — no ownership check (IDOR)
- Tier: T2 (card last 4 + billing address) = 4.0
- Exposure Likelihood: 0.9 (IDOR on sequential IDs, near-certain exploitation)
- Population Scale: 100K users = 0.6
- Completeness: Partial profile + joinable to user table = 0.9
- Context Multiplier: Payment data = 1.7
```
BRS = 4.0 × 0.9 × 0.6 × 0.9 × 1.7 = 3.30 (raw) → normalized to 41/100 → MEDIUM
```
### Example 2: Internal HR tool
**Exposure vector:** Employees table visible to all company users via `/api/employees`
- Tier: T1 (SSN, plus T2 salary + home address) = 5.0 (highest tier present governs)
- Exposure Likelihood: 0.7 (auth required, but no RBAC; any employee can see all)
- Population Scale: 2,000 employees = 0.3
- Completeness: Full profile = 1.0
- Context Multiplier: Government IDs (SSN) = 1.6
```
BRS = 5.0 × 0.7 × 0.3 × 1.0 × 1.6 = 1.68 (raw) → normalized to 21/100 → LOW
```
However — **tier severity** overrides the numeric score here: SSN exposure is Tier 1. Flag as HIGH regardless of score.
---
## Score Normalization
The raw formula output typically ranges 0–8. Normalize to 0–100:
```
Normalized_BRS = min(100, (raw_BRS / 8.0) × 100)
```
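Putting the five factors and the normalization together (a sketch; the input values below are hypothetical, not taken from the worked examples):

```python
def blast_radius_score(tier_w: float, likelihood: float, pop_scale: float,
                       completeness: float, multiplier: float) -> float:
    """Raw BRS = T × E × P × C × M, normalized to 0-100 and capped."""
    raw = tier_w * likelihood * pop_scale * completeness * multiplier
    return min(100.0, raw / 8.0 * 100.0)

# Hypothetical vector: T1 data (5.0), likely exploitation (0.7),
# ~50K records (0.5), full profile (1.0), government-ID context (1.6)
score = blast_radius_score(5.0, 0.7, 0.5, 1.0, 1.6)  # raw 2.8 → 35.0
```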
---
## Blast Radius Summary Table (per exposure vector)
Use this format when reporting:
```markdown
| # | Exposure Vector | Tier | Likelihood | Pop. at Risk | BRS | Severity | Jurisdiction |
|---|----------------|------|-----------|-------------|-----|----------|--------------|
| 1 | /api/users endpoint - SSN returned in response | T1 | 0.9 | 50K | 87 | CRITICAL | GDPR, CCPA |
| 2 | Logs contain plaintext emails | T3 | 0.6 | 50K | 45 | MEDIUM | GDPR |
| 3 | Redis cache stores full user objects | T2 | 0.5 | 50K | 38 | MEDIUM | GDPR, CCPA |
| 4 | S3 bucket - public read on user avatars | T4 | 1.0 | 50K | 28 | LOW | - |
```
---
## Total Organizational Blast Radius
After scoring all exposure vectors, compute:
**Maximum Simultaneous Exposure (MSE):** The number of unique individuals that could be affected if a single attacker gained broad DB access (worst case). This is the number used in regulatory reporting.
**Expected Breach Exposure (EBE):** The typical exposure based on the most likely attack vector (the highest-likelihood finding, not the highest-impact one).
**Regulatory Trigger Count:** The number of distinct regulatory regimes triggered (each one has its own notification obligation and fine formula).
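One way to sketch the three metrics from a list of scored vectors. The field names are ours, and MSE is approximated as the largest single vector — a true worst case would union the distinct individuals reachable across all stores:

```python
def org_blast_radius(vectors: list[dict]) -> dict:
    """vectors: [{"records": int, "likelihood": float, "regimes": set}, ...]"""
    return {
        # Approximation: largest single vector stands in for broad DB access
        "mse": max(v["records"] for v in vectors),
        # EBE follows the highest-likelihood vector, not the highest-impact one
        "ebe": max(vectors, key=lambda v: v["likelihood"])["records"],
        "regulatory_triggers": len(set().union(*(v["regimes"] for v in vectors))),
    }
```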
```markdown
## Organizational Blast Radius Summary
| Metric | Value |
|--------|-------|
| Maximum records at risk | [number] |
| Users with Tier 1 data | [number] |
| Users with Tier 2 data | [number] |
| Users with Tier 3+ data | [number] |
| Regulations triggered | GDPR, CCPA, [others] |
| Worst-case BRS | [score] |
| Most likely attack vector | [description] |
| Time to detect (estimated) | [industry avg: 194 days if no SIEM] |
| Time to contain (estimated) | [industry avg: 73 days] |
```
---
## Breach Cost Benchmarks (IBM Data — Verify Current Edition)
Use these when no specific cost data is available. Figures below are from the **IBM 2024 edition**. IBM publishes a new edition annually at https://www.ibm.com/reports/data-breach — the 2025 edition reports a ~9% decrease in global average cost.
| Metric | Value (IBM 2024) |
|--------|------------------|
| Global average cost per breach | $4.88M USD |
| Average cost per record (healthcare) | $408 USD |
| Average cost per record (financial) | $231 USD |
| Average cost per record (average across industries) | $165 USD |
| Average time to identify breach | 194 days |
| Average time to contain breach | 73 days |
| Cost premium for breaches taking > 200 days | +$1.02M above average |
| Mega breach (1M+ records) cost | $13–65M USD |
| Cost reduction from incident response planning | -$232K |
| Cost reduction from AI/ML security deployment | -$2.22M |
| Cost reduction from employee training | -$258K |
> Source: IBM Cost of a Data Breach Report 2024. State these as benchmarks, not guarantees. Update this table when a new edition is released.
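A back-of-envelope helper combining the per-record benchmarks above with an estimated record count (illustrative only; these are benchmarks, not predictions):

```python
# Per-record cost benchmarks from the IBM 2024 table above (USD)
PER_RECORD_USD = {"healthcare": 408, "financial": 231, "average": 165}

def estimated_breach_cost(records: int, industry: str = "average") -> int:
    """Back-of-envelope: records at risk × per-record benchmark."""
    return records * PER_RECORD_USD.get(industry, PER_RECORD_USD["average"])
```

For example, 50,000 records at the cross-industry average of $165/record suggests roughly $8.25M — state the assumption alongside the figure, as with population estimates.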