mirror of
https://github.com/github/awesome-copilot.git
synced 2026-04-30 20:25:55 +00:00
feat: add data-breach-blast-radius skill for pre-breach impact analysis (#1487)
* feat: add data-breach-blast-radius skill for pre-breach impact analysis
* fix: resolve codespell false positives (ZAR currency code, SME abbreviation)
* fix: remove ZAR abbreviation to pass codespell check
186
skills/data-breach-blast-radius/references/SOURCES.md
Normal file
@@ -0,0 +1,186 @@
|
||||
# Sources & Validation
|
||||
|
||||
Every number, formula, and classification in this skill is sourced from a publicly verifiable primary source. This file exists so contributors, reviewers, and users can independently verify all claims before trusting the output.
|
||||
|
||||
**If you find a number that is wrong, outdated, or missing a citation — please open a PR against this file.**
|
||||
|
||||
---
|
||||
|
||||
## Data Classification Standards
|
||||
|
||||
### GDPR Special Categories (Tier 1 classification basis)
|
||||
- **Source:** Regulation (EU) 2016/679 — Article 9 "Processing of special categories of personal data"
|
||||
- **URL:** https://gdpr-info.eu/art-9-gdpr/
|
||||
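- **What it says:** Personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, plus genetic data, biometric data used to uniquely identify a person, health data, and data on sex life or sexual orientation are "special categories". Processing them is prohibited unless an Article 9(2) exception applies, such as explicit consent.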
|
||||
- **Our use:** These map directly to Tier 1 in `data-classification.md`
|
||||
|
||||
### PCI-DSS Data Classification
|
||||
- **Source:** PCI Security Standards Council — PCI DSS v4.0 (March 2022)
|
||||
- **URL:** https://www.pcisecuritystandards.org/document_library/
|
||||
- **What it says:** Primary Account Number (PAN), cardholder name, expiration date, service code = cardholder data. CVV = sensitive authentication data. Both must be protected.
|
||||
- **Our use:** Maps to Tier 2 PCI-DSS in `data-classification.md`
|
||||
|
||||
### HIPAA Protected Health Information (PHI) Definition
|
||||
- **Source:** 45 CFR Part 160 and Part 164 (Health Insurance Portability and Accountability Act)
|
||||
- **URL:** https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
|
||||
- **What it says:** The 18 HIPAA identifiers that make health data "protected" — includes names, geographic data, dates, phone numbers, emails, SSNs, medical record numbers, health plan IDs, etc.
|
||||
- **Our use:** Tier 1 PHI fields in `data-classification.md`
|
||||
|
||||
---
|
||||
|
||||
## GDPR Fine Formulas
|
||||
|
||||
**Source:** Regulation (EU) 2016/679 — Article 83 "General conditions for imposing administrative fines"
|
||||
**URL:** https://gdpr-info.eu/art-83-gdpr/
|
||||
|
||||
**Exact legal text (Article 83.4):**
|
||||
> "Infringements of the following provisions shall...be subject to administrative fines up to 10 000 000 EUR, or in the case of an undertaking, up to 2 % of the total worldwide annual turnover of the preceding financial year, whichever is higher..."
|
||||
|
||||
**Exact legal text (Article 83.5):**
|
||||
> "Infringements of the following provisions shall...be subject to administrative fines up to 20 000 000 EUR, or in the case of an undertaking, up to 4 % of the total worldwide annual turnover of the preceding financial year, whichever is higher..."
|
||||
|
||||
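**Our formula:** Directly transcribed from Article 83.4 (Tier 1 violations, the lower fine band) and Article 83.5 (Tier 2 violations, the upper fine band). No interpretation added. These violation tiers are unrelated to the data sensitivity tiers in `data-classification.md`.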
|
||||
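The "whichever is higher" rule quoted above can be sketched in a few lines. This is a minimal illustration, not part of the skill: the function name `gdpr_max_fine` and the `upper_band` flag are hypothetical, and the result is the statutory ceiling, not a predicted fine.

```python
def gdpr_max_fine(annual_turnover_eur: float, upper_band: bool = True) -> float:
    """Statutory ceiling under Art. 83.5 (upper band) or Art. 83.4 (lower band)."""
    if upper_band:
        fixed_cap, turnover_share = 20_000_000, 0.04
    else:
        fixed_cap, turnover_share = 10_000_000, 0.02
    # "whichever is higher": the ceiling is the larger of the two amounts
    return max(fixed_cap, turnover_share * annual_turnover_eur)


# Small undertaking: the fixed amount dominates.
print(gdpr_max_fine(50_000_000))
# Large undertaking: the turnover percentage dominates.
print(gdpr_max_fine(2_000_000_000))
```

Regulators set actual fines well below the ceiling in most cases, so treat the output as an upper bound only.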
|
||||
**Historic fines for calibration (all publicly verified):**
|
||||
|
||||
| Fine | Organization | Year | Source URL |
|------|-------------|------|------------|
| €1.2B | Meta (Ireland DPC) | 2023 | https://www.dataprotection.ie/en/news-media/press-releases/data-protection-commission-announces-decision-in-meta-ireland-inquiry |
| €746M | Amazon (Luxembourg) | 2021 | https://iapp.org/news/a/amazon-hit-with-887m-fine-for-gdpr-violations/ |
| €225M | WhatsApp (Ireland DPC) | 2021 | https://www.dataprotection.ie/en/news-media/press-releases/data-protection-commission-announces-decision-in-whatsapp-inquiry |
| €150M | Google (France CNIL) | 2022 | https://www.cnil.fr/en/cookies-cnil-fines-google-150-million-euros-and-facebook-60-million-euros |
| €35.3M | H&M (Hamburg DPA) | 2020 | https://www.datenschutz-hamburg.de/news/detail/article/hamburgische-beauftragte-fuer-datenschutz-und-informationsfreiheit-verhaengt-bussgeld-gegen-hm.html |
| £20M (≈ €22M) | British Airways (UK ICO) | 2020 | https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2020/10/ico-fines-british-airways-20m-for-data-breach-affecting-more-than-400-000-customers/ |
| £18.4M | Marriott (UK ICO) | 2020 | https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2020/10/ico-fines-marriott-international-inc18-4million-for-failing-to-keep-customers-personal-data-secure/ |
|
||||
|
||||
---
|
||||
|
||||
## CCPA / CPRA Fine Formula
|
||||
|
||||
**Source:** California Civil Code § 1798.155(a) — California Consumer Privacy Act
|
||||
**URL:** https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.155
|
||||
|
||||
> **Note (as of June 30, 2025):** Stats. 2025, Ch. 20, Sec. 1 (AB 137) amended § 1798.155. The administrative fine amounts are now in **subsection (a)**. Old references to `§ 1798.155(b)` for fine amounts are incorrect under the amended text. Verify at the URL above for any future changes.
|
||||
|
||||
**Exact statutory text (§ 1798.155(a) as amended):**
|
||||
> "Any business, service provider, contractor, or other person that violates this title shall be liable for an administrative fine of not more than two thousand five hundred dollars ($2,500) for each violation or seven thousand five hundred dollars ($7,500) for each intentional violation..."
|
||||
|
||||
**Private Right of Action source:** California Civil Code § 1798.150
|
||||
**URL:** https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.150
|
||||
|
||||
**Exact statutory text:**
|
||||
> "Any consumer whose nonencrypted and nonredacted personal information...is subject to an unauthorized access and exfiltration...may institute a civil action for...damages in an amount not less than one hundred dollars ($100) and not greater than seven hundred and fifty ($750) per consumer per incident or actual damages, whichever is greater..."
|
||||
|
||||
**Our formula:** Directly transcribed. $2,500 / $7,500 per violation comes verbatim from § 1798.155(a) (as amended June 30, 2025). $100–$750 private right of action comes verbatim from § 1798.150.
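As a rough sketch of how the two statutory amounts translate into ranges, assuming one violation per counted event (how violations are counted is a legal question this code does not answer). Both function names are hypothetical.

```python
def ccpa_admin_fine_ceiling(violations: int, intentional: bool = False) -> int:
    """Upper bound on administrative fines under § 1798.155(a)."""
    per_violation = 7_500 if intentional else 2_500
    return violations * per_violation


def ccpa_statutory_damages_range(consumers: int, incidents: int = 1) -> tuple[int, int]:
    """(low, high) statutory damages under § 1798.150, per consumer per incident.

    Actual damages replace these amounts when they are greater; that case
    is not modeled here.
    """
    return (100 * consumers * incidents, 750 * consumers * incidents)


print(ccpa_admin_fine_ceiling(1_000, intentional=True))
print(ccpa_statutory_damages_range(50_000))
```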
|
||||
|
||||
---
|
||||
|
||||
## HIPAA Fine Formula
|
||||
|
||||
**Source:** 45 CFR § 160.404 — Civil Money Penalties
|
||||
**URL:** https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-160/subpart-D/section-160.404
|
||||
|
||||
**Source (HHS penalty tiers explained):** HHS Office for Civil Rights
|
||||
**URL:** https://www.hhs.gov/hipaa/for-professionals/compliance-enforcement/agreements/index.html
|
||||
|
||||
**HHS OCR penalty tiers (current inflation-adjusted 2024 amounts):**
|
||||
- Tier A (no knowledge): $137–$68,928 per violation, $2,067,813 annual cap
- Tier B (reasonable cause): $1,379–$68,928 per violation, $2,067,813 annual cap
- Tier C (willful, corrected): $13,785–$68,928 per violation, $2,067,813 annual cap
- Tier D (willful, not corrected): $68,928–$2,067,813 per violation, $2,067,813 annual cap
|
||||
|
||||
**URL for current amounts:** https://www.hhs.gov/hipaa/for-professionals/compliance-enforcement/examples/all-cases/index.html
|
||||
|
||||
**Note on our figures:** The dollar amounts in `regulatory-impact.md` match HHS's inflation-adjusted 2024 penalty tiers. HHS adjusts these annually. Always verify against the HHS OCR website for the current year.
|
||||
|
||||
**Criminal penalties source:** 42 U.S.C. § 1320d-6
|
||||
**URL:** https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title42-section1320d-6
|
||||
|
||||
---
|
||||
|
||||
## LGPD Fine Formula
|
||||
|
||||
**Source:** Lei Geral de Proteção de Dados Pessoais (LGPD) — Lei nº 13.709/2018, Article 52
|
||||
**URL:** https://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei/l13709.htm
|
||||
|
||||
**Text (Art. 52, II, translated from Portuguese):** Simple fine of up to 2% of the revenue of the private legal entity, group, or conglomerate in Brazil in its last fiscal year, excluding taxes, limited in total to R$50,000,000 (fifty million reais) per infraction.
|
||||
|
||||
**Our formula:** Verbatim from Article 52.
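Unlike the GDPR rule, the LGPD figure is a percentage with a hard cap, so the calculation uses `min` rather than `max`. A minimal sketch with a hypothetical function name:

```python
def lgpd_max_fine(revenue_brazil_brl: float) -> float:
    """Ceiling per infraction: 2% of Brazilian revenue, capped at R$50,000,000."""
    return min(0.02 * revenue_brazil_brl, 50_000_000)


print(lgpd_max_fine(100_000_000))     # 2% applies, below the cap
print(lgpd_max_fine(10_000_000_000))  # cap applies
```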
|
||||
|
||||
---
|
||||
|
||||
## Singapore PDPA Fine Formula
|
||||
|
||||
**Source:** Personal Data Protection Act 2012 (Singapore) — Section 48J
|
||||
**URL:** https://sso.agc.gov.sg/Act/PDPA2012
|
||||
|
||||
**Maximum fine:** S$1,000,000 per breach OR 10% of annual turnover in Singapore (if turnover > S$10M) — whichever is higher, per the 2021 amendment.
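The turnover threshold makes this rule slightly different from the GDPR one. A minimal sketch, assuming turnover is already expressed in Singapore dollars; the function name is hypothetical.

```python
def pdpa_max_fine(turnover_singapore_sgd: float) -> float:
    """Ceiling: S$1M, or 10% of Singapore turnover when turnover exceeds S$10M."""
    if turnover_singapore_sgd > 10_000_000:
        return max(1_000_000, 0.10 * turnover_singapore_sgd)
    return 1_000_000


print(pdpa_max_fine(5_000_000))    # below the threshold: fixed S$1M ceiling
print(pdpa_max_fine(200_000_000))  # above the threshold: 10% of turnover
```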
|
||||
|
||||
---
|
||||
|
||||
## Breach Cost Benchmarks
|
||||
|
||||
**Source:** IBM Security — "Cost of a Data Breach Report" (published annually since 2005)
|
||||
**URL:** https://www.ibm.com/reports/data-breach
|
||||
**Publisher:** IBM Security + Ponemon Institute
|
||||
**Methodology (2024 edition):** Survey of 604 organizations across 17 industries in 16 countries/regions. Each breach involved 2,170–113,954 compromised records.
|
||||
|
||||
**Last-verified figures (IBM 2024 edition):**
|
||||
| Metric | Value | Source |
|--------|-------|--------|
| Global average total cost | $4.88M | IBM 2024, p.4 |
| Healthcare cost per record | $408 | IBM 2024, p.12 |
| Average cost per record (all industries) | $165 | IBM 2024, p.11 |
| Average time to identify breach | 194 days | IBM 2024, p.15 |
| Average time to contain breach | 64 days | IBM 2024, p.15 |
| Cost premium for breaches > 200 days | +$1.02M | IBM 2024, p.16 |
| Cost reduction from AI/ML security | -$2.22M | IBM 2024, p.20 |
| Cost reduction from IR planning | -$232K | IBM 2024, p.21 |
| Cost reduction from employee training | -$258K | IBM 2024, p.21 |
|
||||
|
||||
**2025 update:** The IBM 2025 edition (available at the URL above) shows a 9% decrease in the global average cost from $4.88M. Confirm the exact 2025 figure against the report PDF before citing it. **Skill maintainers: update this table annually when a new edition is published.**
|
||||
|
||||
---
|
||||
|
||||
## Breach Notification Timelines
|
||||
|
||||
| Regulation | Timeline | Source |
|-----------|---------|--------|
| GDPR | 72 hours | GDPR Article 33.1 (https://gdpr-info.eu/art-33-gdpr/) |
| UK GDPR | 72 hours | UK GDPR Article 33, retained EU law (https://ico.org.uk/for-organisations/report-a-breach/) |
| HIPAA | 60 days | 45 CFR § 164.404 (https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-D/section-164.404) |
| CCPA | "Most expedient time" | Cal. Civ. Code § 1798.82 (https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.82) |
| Singapore PDPA | 3 calendar days | PDPA Section 26D (https://sso.agc.gov.sg/Act/PDPA2012) |
| Australia Privacy Act | Assess within 30 days; notify as soon as practicable | Privacy Act 1988, Part IIIC, NDB Scheme (https://www.oaic.gov.au/privacy/notifiable-data-breaches) |
| LGPD Brazil | 3 business days | ANPD Resolution CD/ANPD nº 15/2024 (https://www.gov.br/anpd/pt-br) |
| Japan APPI | 3–5 business days (2022 amendment) | Act on Protection of Personal Information Art. 26 (https://www.ppc.go.jp/en/legal/) |
| PIPEDA Canada | "As soon as feasible" | PIPEDA s.10.1 (https://www.priv.gc.ca/en/privacy-topics/privacy-laws-in-canada/the-personal-information-protection-and-electronic-documents-act-pipeda/) |
|
||||
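For the regulations with a fixed-length window, the deadline is simple date arithmetic. The sketch below is a simplification: it assumes a single `clock_start` timestamp, whereas each regime defines its own trigger (awareness, discovery, or completion of an assessment). `FIXED_WINDOWS` and `notification_deadlines` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Only the fixed-length windows from the table; "most expedient time" style
# rules have no computable deadline and are left out.
FIXED_WINDOWS = {
    "GDPR": timedelta(hours=72),
    "UK GDPR": timedelta(hours=72),
    "Singapore PDPA": timedelta(days=3),
    "HIPAA": timedelta(days=60),
}


def notification_deadlines(clock_start: datetime) -> dict[str, datetime]:
    """Map each fixed-window regulation to its latest notification time."""
    return {reg: clock_start + window for reg, window in FIXED_WINDOWS.items()}


start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
for regulation, deadline in notification_deadlines(start).items():
    print(f"{regulation}: {deadline.isoformat()}")
```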
|
||||
---
|
||||
|
||||
## Blast Radius Formula Basis
|
||||
|
||||
The scoring formula structure is adapted from established risk quantification frameworks:
|
||||
|
||||
| Component | Based on |
|-----------|---------|
| Tier Weight × Exposure Likelihood | OWASP Risk Rating Methodology (https://owasp.org/www-community/OWASP_Risk_Rating_Methodology) |
| Completeness Factor | FAIR (Factor Analysis of Information Risk) model (https://www.fairinstitute.org/) |
| Population Scale normalization | CVSS v4.0, as a loose analogy only: CVSS has no population-scale metric, and the nearest concept is the supplemental Value Density metric (https://www.first.org/cvss/v4-0/) |
| Context multipliers | GDPR recitals 75, 91: special categories increase risk level (https://gdpr-info.eu/recital-75-gdpr/) |
|
||||
|
||||
**What the formula is NOT:** It is not a legally recognized standard. It is a planning heuristic based on accepted risk frameworks, producing a relative score to compare exposure vectors — not an absolute prediction of breach cost.
|
||||
|
||||
---
|
||||
|
||||
## What Is Estimated vs. What Is Exact
|
||||
|
||||
| Item | Status | Notes |
|------|--------|-------|
| GDPR fine maximum (€20M / 4% turnover) | **Exact**: verbatim from Art. 83.5 | This is the law |
| CCPA fine ($2,500 / $7,500) | **Exact**: verbatim from § 1798.155(a) (as amended June 30, 2025) | This is the law |
| HIPAA tier amounts | **Exact for 2024**: HHS inflation-adjusted | Update annually |
| Blast Radius Score | **Estimate**: heuristic planning tool | Not a legal or insurance figure |
| Financial impact range ($X–$Y) | **Estimate**: IBM benchmarks + fine formula applied to population | Not a prediction |
| "Probable" fine amount | **Estimate**: based on historic fine patterns | Real fines vary enormously by regulator |
| Notification timeline | **Exact**: verbatim from law | These are hard legal deadlines |
|
||||
@@ -0,0 +1,253 @@
|
||||
# Blast Radius Calculator
|
||||
|
||||
Formulas, scoring matrices, and estimation heuristics for quantifying how many people, records, and systems would be affected by a data breach in the codebase under analysis.
|
||||
|
||||
---
|
||||
|
||||
## Core Blast Radius Formula
|
||||
|
||||
```
Blast Radius Score (BRS) = Tier_Weight × Exposure_Likelihood × Population_Scale × Completeness_Factor × Context_Multiplier
```
|
||||
|
||||
**Score ranges:**
|
||||
- 0–25: **Low** — limited exposure, few records
|
||||
- 26–50: **Medium** — meaningful exposure, focused population
|
||||
- 51–75: **High** — significant exposure, broad regulatory consequences
|
||||
- 76–100: **Critical** — catastrophic exposure, immediate action required
|
||||
|
||||
---
|
||||
|
||||
## Factor 1: Tier Weight (T)
|
||||
|
||||
Based on the data classification tier from `data-classification.md`:
|
||||
|
||||
| Tier | Label | Weight |
|------|-------|--------|
| T1 | Catastrophic | 5.0 |
| T2 | Critical | 4.0 |
| T3 | High | 3.0 |
| T4 | Elevated | 2.0 |
| T5 | Standard | 1.0 |
|
||||
|
||||
**Rule:** When multiple tiers exist in the same exposure vector, use the **highest** tier weight.
|
||||
|
||||
**Aggregation uplift:** If 3+ fields from different tiers are exposed together, add +0.5 to the highest tier weight (aggregation attack risk).
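The highest-tier rule and the aggregation uplift can be sketched as follows. The uplift wording is open to interpretation; this sketch assumes it applies when the exposed fields span at least three distinct tiers. `TIER_WEIGHTS` and `tier_weight` are hypothetical names.

```python
TIER_WEIGHTS = {"T1": 5.0, "T2": 4.0, "T3": 3.0, "T4": 2.0, "T5": 1.0}


def tier_weight(field_tiers: list[str]) -> float:
    """Weight for one exposure vector; field_tiers has one tier label per exposed field."""
    weight = max(TIER_WEIGHTS[tier] for tier in field_tiers)
    # Assumed reading of the uplift rule: fields from three or more distinct tiers.
    if len(set(field_tiers)) >= 3:
        weight += 0.5
    return weight


print(tier_weight(["T2", "T3"]))        # highest tier only
print(tier_weight(["T1", "T2", "T4"]))  # highest tier plus uplift
```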
|
||||
|
||||
---
|
||||
|
||||
## Factor 2: Exposure Likelihood (E)
|
||||
|
||||
How likely is this vector to be exploited in a realistic breach scenario?
|
||||
|
||||
| Likelihood Score | Label | Criteria |
|-----------------|-------|---------|
| 1.0 | **Certain** | Data is publicly accessible today (no auth required) |
| 0.9 | **Near Certain** | Auth bypass is trivial (e.g., IDOR on sequential IDs, broken JWT validation) |
| 0.8 | **Very Likely** | Auth required but missing for this specific endpoint; or data leaked in logs accessible by most engineers |
| 0.7 | **Likely** | Auth required but over-broad access (all users can see all data); missing field-level access control |
| 0.6 | **Moderate** | Requires privilege escalation or chaining with another bug; internal system with broad developer access |
| 0.5 | **Possible** | Requires significant attacker effort but no defense-in-depth; DB accessible from dev environment |
| 0.3 | **Unlikely** | Multiple security controls in place; but controls are not verified by the codebase review |
| 0.1 | **Remote** | Strong defense-in-depth: encryption, field masking, proper authz, rate limiting, anomaly detection all present |
|
||||
|
||||
---
|
||||
|
||||
## Factor 3: Population Scale (P)
|
||||
|
||||
Normalize the estimated number of affected records to a 0–1 scale.
|
||||
|
||||
### Estimating Record Counts
|
||||
|
||||
**Step 1: Look for explicit signals in the codebase**
|
||||
```
# Strong signals (use these if found):
- README mentions user count ("serves 5M users")
- Seeder/fixture files with record counts
- Migration comments ("adding index for 50K users")
- Analytics dashboards or monitoring configs mentioning scale
- Infrastructure configs (DB instance size implies scale):
  - db.t3.micro → < 10K active users
  - db.r5.large → 10K–500K users
  - db.r5.4xlarge / Aurora Serverless → > 500K users

# Medium signals:
- App category (SaaS product → higher, internal tool → lower)
- Multi-tenant vs. single-tenant architecture
- Presence of sharding or partitioning in DB schema

# Weak signals:
- Tech stack alone (no reliable correlation to user count)
```
|
||||
|
||||
**Step 2: Apply default estimates when no signals are found**
|
||||
|
||||
| Application Type | Conservative Estimate | Typical Estimate |
|-----------------|----------------------|-----------------|
| Internal corporate tool | 100–1,000 | 500 |
| B2B SaaS (small/startup) | 1,000–10,000 | 5,000 |
| B2B SaaS (established) | 10,000–100,000 | 50,000 |
| B2C app (consumer startup) | 10,000–100,000 | 50,000 |
| B2C app (growth stage) | 100,000–1,000,000 | 500,000 |
| B2C app (scale) | 1,000,000–100,000,000 | 10,000,000 |
| Healthcare system | 1,000–100,000 | 20,000 |
| Financial services | 5,000–500,000 | 50,000 |
| Government / public sector | 10,000–10,000,000 | 1,000,000 |
|
||||
|
||||
**Always state the assumption used.**
|
||||
|
||||
### Population Scale Score (P)
|
||||
|
||||
| Records at Risk | Score |
|----------------|-------|
| < 100 | 0.1 |
| 100–1,000 | 0.2 |
| 1,000–10,000 | 0.3 |
| 10,000–50,000 | 0.4 |
| 50,000–100,000 | 0.5 |
| 100,000–500,000 | 0.6 |
| 500,000–1,000,000 | 0.7 |
| 1M–10M | 0.8 |
| 10M–100M | 0.9 |
| > 100M | 1.0 |

Where a count sits exactly on a boundary, use the higher band (Example 1 scores 100K users as 0.6).
|
||||
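The band lookup above maps naturally onto a sorted-boundary search. A minimal sketch, assuming a count exactly on a boundary falls into the higher band; `population_scale` and the two module-level lists are hypothetical names.

```python
from bisect import bisect_right

# Upper edge of each band except the last ("> 100M"), in ascending order.
_BOUNDARIES = [100, 1_000, 10_000, 50_000, 100_000,
               500_000, 1_000_000, 10_000_000, 100_000_000]
_SCORES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def population_scale(records: int) -> float:
    """Return the Population Scale score (P) for an estimated record count."""
    # bisect_right places a value equal to a boundary into the higher band.
    return _SCORES[bisect_right(_BOUNDARIES, records)]


print(population_scale(2_000))    # 1,000–10,000 band
print(population_scale(100_000))  # boundary value goes to the higher band
```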
|
||||
---
|
||||
|
||||
## Factor 4: Completeness Factor (C)
|
||||
|
||||
How complete/useful is the exposed data for an attacker?
|
||||
|
||||
| Factor | Score | Description |
|--------|-------|-------------|
| **Full Profile** | 1.0 | Complete identity record (name + email + phone + address + sensitive field) |
| **Partial + Joinable** | 0.9 | Partial data but other tables can be joined to complete it; same breach gives attacker the join key |
| **Email + PII** | 0.8 | Email address plus 1+ sensitive field; enough for targeted phishing + exploitation |
| **Sensitive Field Only** | 0.7 | Only the sensitive field (SSN, health, financial) without contact info; still very serious |
| **Contact Only** | 0.5 | Only email / phone; enables spam and phishing, but not immediate harm |
| **Fragmented** | 0.3 | Fields without context; cannot re-identify without additional data not available in this breach |
| **Anonymized** | 0.1 | Properly anonymized; re-identification requires significant external data linking |
|
||||
|
||||
---
|
||||
|
||||
## Factor 5: Context Multipliers (M)
|
||||
|
||||
Apply these multipliers to the final score for specific contexts:
|
||||
|
||||
| Context | Multiplier | Rationale |
|---------|-----------|-----------|
| Children's data present (COPPA / GDPR Art 8) | × 2.0 | Highest legal exposure globally |
| Health records (HIPAA / GDPR special category) | × 1.8 | Special category data, civil + criminal exposure |
| Biometric data (GDPR Art 9, BIPA in Illinois) | × 1.8 | Immutable data: cannot be "changed" after a breach |
| Financial account credentials | × 1.7 | Direct financial theft possible |
| Government IDs (SSN, passport) | × 1.6 | Identity theft lasting years |
| Sexual orientation / religion / political views | × 1.6 | GDPR special category, discrimination risk |
| Data held by a healthcare provider | × 1.5 | HIPAA Covered Entity exposure |
| Data in a cloud region that doesn't match user jurisdiction | × 1.3 | Cross-border transfer violations (GDPR Chapter V) |
| Backup/archive store (often forgotten) | × 1.2 | Backups frequently missed in breach containment |
|
||||
|
||||
---
|
||||
|
||||
## Blast Radius Score Calculation Examples
|
||||
|
||||
### Example 1: E-commerce checkout system
|
||||
|
||||
**Exposure vector:** API endpoint `/api/users/{id}/payment-methods` — no ownership check (IDOR)
|
||||
- Tier: T2 (card last 4 + billing address) = 4.0
|
||||
- Exposure Likelihood: 0.9 (IDOR on sequential IDs, near-certain exploitation)
|
||||
- Population Scale: 100K users = 0.6
|
||||
- Completeness: Partial profile + joinable to user table = 0.9
|
||||
- Context Multiplier: Payment data = 1.7
|
||||
|
||||
```
BRS = 4.0 × 0.9 × 0.6 × 0.9 × 1.7 = 3.30 (raw) → 3.30 / 8.0 × 100 = 41/100 → MEDIUM
```
|
||||
|
||||
### Example 2: Internal HR tool
|
||||
|
||||
**Exposure vector:** Employees table visible to all company users via `/api/employees`
|
||||
- Tier: T1 (SSN; salary and home address fall in lower tiers) = 5.0, plus 0.5 aggregation uplift (fields from three different tiers) = 5.5
- Exposure Likelihood: 0.7 (auth required, but no RBAC; any employee can see all)
- Population Scale: 2,000 employees = 0.3
- Completeness: Full profile = 1.0
- Context Multiplier: Government IDs (SSN) = 1.6

```
BRS = 5.5 × 0.7 × 0.3 × 1.0 × 1.6 = 1.85 (raw) → 1.85 / 8.0 × 100 = 23/100 → LOW
```

However, **financial impact** overrides the score here because SSN exposure is Tier 1. Flag as HIGH regardless of score.
|
||||
|
||||
---
|
||||
|
||||
## Score Normalization
|
||||
|
||||
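The raw formula output typically ranges 0–8. With a single context multiplier its theoretical maximum is 11.0 (tier weight 5.0 + 0.5 uplift, multiplier 2.0, all other factors at 1.0), so the `min` below caps any raw value above 8 at 100. Normalize to 0–100: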
|
||||
|
||||
```
Normalized_BRS = min(100, (raw_BRS / 8.0) × 100)
```
|
||||
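Putting the core formula, the normalization, and the score ranges together gives a short calculation. This is a sketch under stated assumptions: the normalized value is rounded to the nearest integer before the severity band is chosen, only one context multiplier is applied, and the function names and input values are hypothetical.

```python
def blast_radius_score(tier_weight: float, likelihood: float, population: float,
                       completeness: float, context_multiplier: float = 1.0):
    """Return (raw, normalized) BRS using the formula and 8.0 divisor above."""
    raw = tier_weight * likelihood * population * completeness * context_multiplier
    normalized = min(100.0, raw / 8.0 * 100.0)
    return raw, normalized


def severity(normalized: float) -> str:
    """Map a normalized score to the published 0-25 / 26-50 / 51-75 / 76-100 bands."""
    score = round(normalized)  # assumption: bands are applied to whole numbers
    if score <= 25:
        return "Low"
    if score <= 50:
        return "Medium"
    if score <= 75:
        return "High"
    return "Critical"


# Hypothetical vector: T1 data, public access, 1M-10M records, full profile, SSN.
raw, normalized = blast_radius_score(5.0, 1.0, 0.8, 1.0, 1.6)
print(round(raw, 2), round(normalized), severity(normalized))
```

Keeping the raw and normalized values separate makes it easy to report both, as the worked examples do.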
|
||||
---
|
||||
|
||||
## Blast Radius Summary Table (per exposure vector)
|
||||
|
||||
Use this format when reporting:
|
||||
|
||||
```markdown
| # | Exposure Vector | Tier | Likelihood | Pop. at Risk | BRS | Severity | Jurisdiction |
|---|----------------|------|-----------|-------------|-----|----------|--------------|
| 1 | /api/users endpoint - SSN returned in response | T1 | 0.9 | 50K | 87 | CRITICAL | GDPR, CCPA |
| 2 | Logs contain plaintext emails | T3 | 0.6 | 50K | 45 | MEDIUM | GDPR |
| 3 | Redis cache stores full user objects | T2 | 0.5 | 50K | 38 | MEDIUM | GDPR, CCPA |
| 4 | S3 bucket - public read on user avatars | T4 | 1.0 | 50K | 28 | MEDIUM | - |
```
|
||||
|
||||
---
|
||||
|
||||
## Total Organizational Blast Radius
|
||||
|
||||
After scoring all exposure vectors, compute:
|
||||
|
||||
**Maximum Simultaneous Exposure (MSE):** The number of unique individuals that could be affected if a single attacker gained broad DB access (worst case). This is the number used in regulatory reporting.
|
||||
|
||||
**Expected Breach Exposure (EBE):** The typical exposure based on the most likely attack vector (the highest-likelihood finding, not the highest-impact one).
|
||||
|
||||
**Regulatory Trigger Count:** The number of distinct regulatory regimes triggered (each one has its own notification obligation and fine formula).
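One way to derive the three metrics from a list of scored vectors is sketched below. It assumes each vector carries the set of affected individual identifiers, and it treats MSE as the union of those sets across all vectors, which is one reading of "broad DB access". The `Vector` class and `organizational_summary` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    name: str
    likelihood: float
    affected_ids: frozenset  # identifiers of individuals reachable via this vector
    regulations: frozenset   # regulatory regimes this vector triggers


def organizational_summary(vectors: list) -> dict:
    """Compute MSE, EBE, and the regulatory trigger count for scored vectors."""
    everyone = frozenset().union(*(v.affected_ids for v in vectors))
    regimes = frozenset().union(*(v.regulations for v in vectors))
    most_likely = max(vectors, key=lambda v: v.likelihood)
    return {
        "mse": len(everyone),                    # unique individuals, worst case
        "ebe": len(most_likely.affected_ids),    # exposure via the likeliest vector
        "most_likely_vector": most_likely.name,
        "regulatory_trigger_count": len(regimes),
    }


vectors = [
    Vector("IDOR on /api/users", 0.9, frozenset({1, 2, 3}), frozenset({"GDPR"})),
    Vector("Plaintext logs", 0.5, frozenset({3, 4, 5, 6}), frozenset({"GDPR", "CCPA"})),
]
print(organizational_summary(vectors))
```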
|
||||
|
||||
```markdown
## Organizational Blast Radius Summary

| Metric | Value |
|--------|-------|
| Maximum records at risk | [number] |
| Users with Tier 1 data | [number] |
| Users with Tier 2 data | [number] |
| Users with Tier 3+ data | [number] |
| Regulations triggered | GDPR, CCPA, [others] |
| Worst-case BRS | [score] |
| Most likely attack vector | [description] |
| Time to detect (estimated) | [industry avg: 194 days if no SIEM] |
| Time to contain (estimated) | [industry avg: 64 days] |
```
|
||||
|
||||
---
|
||||
|
||||
## Breach Cost Benchmarks (IBM Data — Verify Annual Edition)
|
||||
|
||||
Use these when no specific cost data is available. Figures below are from the **IBM 2024 edition**. IBM publishes a new edition annually at https://www.ibm.com/reports/data-breach; the 2025 edition shows a ~9% decrease in global average cost.
|
||||
|
||||
| Metric | Value (IBM 2024) |
|--------|------------------|
| Global average cost per breach | $4.88M USD |
| Average cost per record (healthcare) | $408 USD |
| Average cost per record (financial) | $231 USD |
| Average cost per record (average across industries) | $165 USD |
| Average time to identify breach | 194 days |
| Average time to contain breach | 64 days |
| Cost premium for breaches taking > 200 days | +$1.02M above average |
| Mega breach (1M+ records) cost | $13–65M USD |
| Cost reduction from incident response planning | -$232K |
| Cost reduction from AI/ML security deployment | -$2.22M |
| Cost reduction from employee training | -$258K |
|
||||
|
||||
> Source: IBM Cost of a Data Breach Report 2024. State these as benchmarks, not guarantees. Update this table when a new edition is released.
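A per-record benchmark estimate is a single multiplication. The sketch below uses the per-record figures from the table; the dictionary and function names are hypothetical. Linear scaling is itself an assumption and is least reliable for very large breaches, which the table lists separately as mega breaches.

```python
# Per-record benchmarks from the table above (IBM 2024 edition), in USD.
COST_PER_RECORD_USD = {"healthcare": 408, "financial": 231, "default": 165}


def benchmark_cost_estimate(records: int, industry: str = "default") -> int:
    """Rough benchmark: records at risk × per-record cost for the industry."""
    per_record = COST_PER_RECORD_USD.get(industry, COST_PER_RECORD_USD["default"])
    return records * per_record


print(benchmark_cost_estimate(20_000, "healthcare"))
print(benchmark_cost_estimate(50_000))
```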
|
||||
@@ -0,0 +1,250 @@
|
||||
# Data Classification Taxonomy
|
||||
|
||||
A comprehensive taxonomy for identifying sensitive data in codebases. Every field, column, model property, or variable matching these patterns should be inventoried and assigned the appropriate sensitivity tier.
|
||||
|
||||
---
|
||||
|
||||
## Tier 1 — Catastrophic (Irreversible harm if exposed)
|
||||
|
||||
### Biometric Data
|
||||
**Detection patterns (field names / column names):**
|
||||
- `fingerprint`, `thumbprint`, `retina_scan`, `iris_scan`, `face_id`, `facial_recognition`
|
||||
- `voice_print`, `voice_biometric`, `gait_analysis`, `dna_profile`, `genetic_data`
|
||||
- `biometric_template`, `biometric_hash`, `faceEmbedding`, `face_vector`
|
||||
|
||||
**Detection patterns (data values / format):**
|
||||
- Base64-encoded blobs > 512 bytes in biometric-named fields
|
||||
- Binary columns in tables named `biometric_*`, `face_*`, `fingerprint_*`
|
||||
|
||||
### Government-Issued Identifiers
|
||||
**Detection patterns:**
|
||||
- `ssn`, `social_security_number`, `social_security`, `sin` (Canada), `nino` (UK), `tfn` (Australia)
|
||||
- `passport_number`, `passport_no`, `passport_id`
|
||||
- `drivers_license`, `drivers_licence`, `dl_number`, `license_number`
|
||||
- `national_id`, `national_identification`, `id_number`, `id_card_number`
|
||||
- `tax_id`, `tin`, `ein`, `itin`, `vat_number`, `fiscal_code`
|
||||
- `aadhaar`, `pan_number` (India), `cpf`, `cnpj` (Brazil), `rut` (Chile/Colombia)
|
||||
- `nric`, `fin` (Singapore), `my_kad` (Malaysia), `nik` (Indonesia)
|
||||
|
||||
**Regex patterns for values:**
|
||||
```
SSN:          \b\d{3}-\d{2}-\d{4}\b
UK NINO:      \b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b
CPF (Brazil): \b\d{3}\.\d{3}\.\d{3}-\d{2}\b
Aadhaar:      \b\d{4}\s\d{4}\s\d{4}\b
```
|
||||
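The value patterns above can be applied with Python's standard `re` module. A minimal sketch: the patterns are copied from the list, while `ID_PATTERNS`, `find_government_ids`, and the sample string are hypothetical. A format match alone does not prove a value is a real identifier.

```python
import re

ID_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "UK NINO": re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b"),
    "CPF (Brazil)": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),
    "Aadhaar": re.compile(r"\b\d{4}\s\d{4}\s\d{4}\b"),
}


def find_government_ids(text: str) -> list:
    """Return (label, matched_value) pairs for every pattern hit in text."""
    return [(label, match.group())
            for label, pattern in ID_PATTERNS.items()
            for match in pattern.finditer(text)]


sample = "ssn=078-05-1120 cpf=123.456.789-09 nino=AB123456C"
print(find_government_ids(sample))
```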
|
||||
### Health & Medical Data (PHI under HIPAA)
|
||||
**Detection patterns:**
|
||||
- `diagnosis`, `icd_code`, `icd10`, `icd11`, `snomed`, `loinc_code`
|
||||
- `medication`, `prescription`, `drug_name`, `dosage`, `treatment`
|
||||
- `medical_record_number`, `mrn`, `patient_id`, `encounter_id`
|
||||
- `lab_result`, `test_result`, `pathology`, `radiology`
|
||||
- `mental_health`, `psychiatric`, `therapy_notes`, `counseling`
|
||||
- `hiv_status`, `std_status`, `substance_abuse`, `addiction`
|
||||
- `insurance_id`, `insurance_member_id`, `health_plan_id`, `claim_number`
|
||||
- `fhir_resource`, `hl7_message`, `dicom_data`
|
||||
- `disability`, `handicap`, `chronic_condition`
|
||||
- `pregnancy`, `reproductive_health`, `fertility`
|
||||
|
||||
### Authentication Credentials
|
||||
**Detection patterns:**
|
||||
- `password`, `passwd`, `pwd`, `hashed_password`, `password_hash`, `password_digest`
|
||||
- `private_key`, `secret_key`, `api_key`, `api_secret`, `api_token`
|
||||
- `access_token`, `refresh_token`, `bearer_token`, `id_token`, `jwt_token`
|
||||
- `oauth_token`, `oauth_secret`, `oauth_access_token`
|
||||
- `mfa_secret`, `totp_secret`, `otp_secret`, `backup_codes`
|
||||
- `session_token`, `session_id`, `auth_token`
|
||||
- `client_secret`, `client_credential`
|
||||
- `private_key_pem`, `rsa_private`, `ecdsa_private`
|
||||
|
||||
---
|
||||
|
||||
## Tier 2 — Critical (High regulatory exposure)
|
||||
|
||||
### Payment Card Data (PCI-DSS)
|
||||
**Detection patterns:**
|
||||
- `card_number`, `pan`, `primary_account_number`, `credit_card`, `debit_card`
|
||||
- `cvv`, `cvc`, `cvv2`, `card_verification`, `security_code`
|
||||
- `card_expiry`, `expiration_date`, `exp_date`, `expiry_month`, `expiry_year`
|
||||
- `cardholder_name`, `card_holder`
|
||||
- `iban`, `bic`, `swift_code`, `routing_number`, `account_number`, `sort_code`
|
||||
- `bank_account`, `bank_details`, `wire_transfer`
|
||||
|
||||
**Regex patterns for values:**
|
||||
```
Visa:        \b4[0-9]{12}(?:[0-9]{3})?\b
Mastercard:  \b5[1-5][0-9]{14}\b
Amex:        \b3[47][0-9]{13}\b
Generic PAN: \b[0-9]{13,19}\b  (in a PAN-named field)
CVV:         \b[0-9]{3,4}\b  (in a cvv-named field)
IBAN:        \b[A-Z]{2}\d{2}[A-Z0-9]{4}\d{7}([A-Z0-9]?){0,16}\b
```
|
||||
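Digit-only patterns match many non-card values, such as order numbers. The sketch below pairs the brand patterns above with a Luhn checksum to cut false positives. The Luhn check is an addition for illustration and is not part of this taxonomy; `CARD_PATTERNS`, `luhn_valid`, and `classify_card` are hypothetical names, and the inputs are widely published test numbers.

```python
import re

CARD_PATTERNS = {
    "Visa": r"\b4[0-9]{12}(?:[0-9]{3})?\b",
    "Mastercard": r"\b5[1-5][0-9]{14}\b",
    "Amex": r"\b3[47][0-9]{13}\b",
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify_card(value: str):
    """Return the brand if value matches a pattern and passes Luhn, else None."""
    for brand, pattern in CARD_PATTERNS.items():
        if re.fullmatch(pattern, value) and luhn_valid(value):
            return brand
    return None


print(classify_card("4111111111111111"))
print(classify_card("4111111111111112"))
```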
|
||||
### Identity Combinations (High re-identification risk when combined)
|
||||
**Combinations that together constitute Tier 2:**
|
||||
- Full name + date of birth
|
||||
- Full name + address (street level)
|
||||
- Email + date of birth + gender
|
||||
- Phone number + address
|
||||
|
||||
**Detection patterns:**
|
||||
- `full_name`, `first_name` + `last_name` (as separate fields — note both present)
|
||||
- `date_of_birth`, `dob`, `birth_date`, `birthdate`, `birthday`
|
||||
- `home_address`, `street_address`, `address_line1`, `postal_address`
|
||||
- `gender`, `sex`, `pronoun` (when combined with other identifiers)
|
||||
|
||||
---
|
||||
|
||||
## Tier 3 — High (Regulatory notification triggers)
|
||||
|
||||
### Contact Information
|
||||
**Detection patterns:**
|
||||
- `email`, `email_address`, `user_email`, `contact_email`, `primary_email`
|
||||
- `phone`, `phone_number`, `mobile`, `mobile_number`, `cell_phone`, `telephone`
|
||||
- `whatsapp_number`, `signal_number`
|
||||
|
||||
**Regex patterns:**
|
||||
```
Email: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
Phone: \+?[0-9\s\-\(\)]{7,20}  (in a phone-named field)
```
|
||||
|
||||
### Precise Location Data
|
||||
**Detection patterns:**
|
||||
- `latitude`, `longitude`, `lat`, `lng`, `lat_lng`, `coordinates`, `geo_point`
|
||||
- `gps_location`, `precise_location`, `real_time_location`
|
||||
- `home_location`, `work_location`
|
||||
|
||||
**Note:** City-level location is Tier 4; street-level or GPS coordinates are Tier 3.
|
||||
|
||||
### Network Identifiers
|
||||
**Detection patterns:**
|
||||
- `ip_address`, `ip`, `client_ip`, `remote_addr`, `x_forwarded_for`
|
||||
- `mac_address`, `device_mac`, `hardware_id`
|
||||
- `imei`, `imsi`, `device_id`, `advertising_id`, `idfa`, `gaid`
|
||||
|
||||
### Authentication Artifacts
|
||||
**Detection patterns:**
|
||||
- `session_id`, `cookie_value`, `csrf_token` (if long-lived and user-identifying)
|
||||
- `remember_me_token`, `persistent_session`
|
||||
|
||||
---
|
||||
|
||||
## Tier 4 — Elevated (Privacy relevant)
|
||||
|
||||
### Partial Personal Identifiers
|
||||
**Detection patterns:**
|
||||
- `first_name`, `last_name`, `display_name`, `username` (when alone)
|
||||
- `profile_picture`, `avatar_url`
|
||||
- `city`, `state`, `country`, `region`, `zip_code`, `postal_code`
|
||||
- `time_zone`, `locale`, `language_preference`
|
||||
|
||||
### Behavioral & Analytics Data
|
||||
**Detection patterns:**
|
||||
- `user_agent`, `browser`, `device_type`, `os`
|
||||
- `search_query`, `search_history`, `browsing_history`
|
||||
- `purchase_history`, `order_history`, `transaction_history`
|
||||
- `click_event`, `page_view`, `session_duration`
|
||||
- `preferences`, `interests`, `tags`, `segments`
|
||||
|
||||
### Financial Context (non-card)
|
||||
**Detection patterns:**
|
||||
- `salary`, `income`, `net_worth`, `credit_score`, `credit_rating`
|
||||
- `account_balance`, `wallet_balance`, `subscription_tier`
|
||||
|
||||
---
|
||||
|
||||
## Tier 5 — Standard (No direct privacy impact)
|
||||
|
||||
- System configuration values (non-secret)
|
||||
- Public user-facing content (blog posts, public profiles)
|
||||
- Anonymized aggregated statistics
|
||||
- Non-personal reference data (product catalog, country codes)
|
||||
- Internal system identifiers with no external exposure
|
||||
|
||||
---
|
||||
|
||||
## Detection Guidance for AI Analysis
|
||||
|
||||
### Framework-Specific Patterns
|
||||
|
||||
**Django / Python:**
|
||||
```python
from django.db import models

# Sensitive fields typically appear in models.py
class User(models.Model):
    email = models.EmailField()             # Tier 3
    date_of_birth = models.DateField()      # Tier 2 (combined with name)
    ssn = models.CharField(max_length=11)   # Tier 1
```
|
||||
|
||||
**TypeScript / Prisma:**
|
||||
```prisma
|
||||
model User {
|
||||
email String // Tier 3
|
||||
phoneNumber String? // Tier 3
|
||||
dateOfBirth DateTime? // Tier 2 (when combined)
|
||||
cardNumber String? // Tier 2 PCI-DSS
|
||||
}
|
||||
```
|
||||
|
||||
**Java / Spring / JPA:**
|
||||
```java
|
||||
@Entity
|
||||
public class Patient {
|
||||
@Column(name = "diagnosis") // Tier 1 PHI
|
||||
private String diagnosis;
|
||||
|
||||
@Column(name = "ssn") // Tier 1
|
||||
private String ssn;
|
||||
}
|
||||
```
|
||||
|
||||
**C# / EF Core:**
|
||||
```csharp
|
||||
public class UserProfile {
|
||||
public string Email { get; set; } // Tier 3
|
||||
public string PassportNumber { get; set; } // Tier 1
|
||||
public DateTime DateOfBirth { get; set; } // Tier 2
|
||||
}
|
||||
```
|
||||
|
||||
### Log Statement Patterns (High Risk — often overlooked)
|
||||
```python
|
||||
# BAD — logs PII
|
||||
logger.info(f"User {user.email} logged in from {request.remote_addr}")
|
||||
logger.debug(f"Payment for card {card_number}")
|
||||
|
||||
# Look for these in logging calls:
|
||||
# .info(), .debug(), .warn(), .error(), console.log(), System.out.println()
|
||||
```
|
||||
|
||||
### API Response Leakage (Serializer/DTO patterns)
|
||||
```typescript
|
||||
// Check if these fields are included in response objects
|
||||
// even if not requested — over-fetching is a common exposure vector
|
||||
{
|
||||
"id": "...",
|
||||
"email": "...", // Tier 3
|
||||
"phone": "...", // Tier 3
|
||||
"dateOfBirth": "...", // Tier 2 — should this be returned?
|
||||
"passwordHash": "...", // Tier 1 — should NEVER be returned
|
||||
"ssn": "...", // Tier 1 — should NEVER be returned
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Aggregation Risk Assessment
|
||||
|
||||
Combination attacks — data that becomes more sensitive when combined:
|
||||
|
||||
| Alone | Combined With | Combined Tier | Risk |
|
||||
|-------|--------------|---------------|------|
|
||||
| Email (T3) | Password hash (T1) | T1 | Account takeover |
|
||||
| Name (T4) | DOB (T2) + Address (T2) | T2 | Full identity reconstruction |
|
||||
| IP address (T3) | Timestamps + User ID | T2 | Behavioral profiling |
|
||||
| City (T4) | Purchase history (T4) | T3 | De-anonymization risk |
|
||||
| Health category (T4) | Name + Email | T1 | HIPAA triggering |
|
||||
|
||||
**Rule:** Always assess fields in combination, not just in isolation.
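The combination rule can be sketched as a lookup over the set of fields found together in one table, response, or log line. The base tiers and rule list below are an illustrative subset taken from this file's lists and table; the names `BASE_TIER`, `COMBINATION_RULES`, and `effective_tier` are hypothetical.

```python
from typing import Iterable

# Illustrative base tiers (lower number = more sensitive).
BASE_TIER = {
    "password_hash": 1, "ssn": 1,
    "date_of_birth": 2, "address": 2,
    "email": 3, "phone": 3, "ip_address": 3,
    "full_name": 4, "city": 4, "gender": 4, "purchase_history": 4,
}

# Field sets that escalate to a more sensitive tier when all are present.
COMBINATION_RULES = [
    (frozenset({"email", "password_hash"}), 1),
    (frozenset({"full_name", "date_of_birth"}), 2),
    (frozenset({"full_name", "address"}), 2),
    (frozenset({"email", "date_of_birth", "gender"}), 2),
    (frozenset({"phone", "address"}), 2),
    (frozenset({"city", "purchase_history"}), 3),
]


def effective_tier(fields: Iterable[str], default_tier: int = 5) -> int:
    """Most sensitive tier reached by the fields, alone or in combination."""
    present = set(fields)
    tier = min((BASE_TIER.get(f, default_tier) for f in present),
               default=default_tier)
    for combo, combined_tier in COMBINATION_RULES:
        if combo <= present:  # every field of the combination is present
            tier = min(tier, combined_tier)
    return tier
```

For example, `effective_tier(["city", "purchase_history"])` escalates two Tier 4 fields to Tier 3, matching the de-anonymization row in the table.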
|
||||
449
skills/data-breach-blast-radius/references/hardening-playbook.md
Normal file
@@ -0,0 +1,449 @@
|
||||
# Hardening Playbook
|
||||
|
||||
Prioritized controls to reduce data breach blast radius. Controls are organized by **impact category** and include tech-stack-specific implementation patterns. Each control includes a **blast radius reduction estimate**.
|
||||
|
||||
> **How to use:** After identifying exposure vectors, match each to a control below. Sort your hardening roadmap by `(Blast_Radius_Reduction × Severity) / Effort`.
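
The sort key can be computed as in the sketch below. The playbook only gives Low/Medium/High effort labels, so the numeric `EFFORT_WEIGHT` mapping and the 1 to 5 severity scale are assumptions chosen for illustration; the `Control` class and `rank_controls` helper are likewise hypothetical.

```python
from dataclasses import dataclass

# Assumed numeric weights for the Effort column.
EFFORT_WEIGHT = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Control:
    name: str
    blast_radius_reduction: float  # fraction, e.g. 0.90 for 90%
    severity: int                  # assumed scale: 1 (low) to 5 (critical)
    effort: str                    # "Low", "Medium" or "High"

    @property
    def score(self) -> float:
        # (Blast_Radius_Reduction x Severity) / Effort
        return (self.blast_radius_reduction * self.severity
                / EFFORT_WEIGHT[self.effort])


def rank_controls(controls):
    """Return controls ordered from highest to lowest priority score."""
    return sorted(controls, key=lambda c: c.score, reverse=True)
```

With these weights, a low-effort 90% fix on a critical vector scores 4.5 and sorts ahead of a medium-effort 80% fix, which scores 2.0.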
|
||||
|
||||
---
|
||||
|
||||
## Control Priority Matrix
|
||||
|
||||
| Priority | Control | Blast Radius Reduction | Effort | Category |
|
||||
|----------|---------|----------------------|--------|---------|
|
||||
| P0 | Fix IDOR/BOLA — add ownership checks | 90% for affected vector | Low | Authorization |
|
||||
| P0 | Remove sensitive fields from API responses | 85% for affected fields | Low | Data Minimization |
|
||||
| P0 | Revoke publicly accessible storage (S3/Blob) | 100% for affected store | Low | Access Control |
|
||||
| P0 | Remove plaintext credentials from code/logs | 100% for affected secret | Low | Secrets |
|
||||
| P1 | Add field-level encryption for T1 data | 80% for encrypted fields | Medium | Encryption |
|
||||
| P1 | Mask/tokenize PCI card data | 95% for card exposure | Medium | Tokenization |
|
||||
| P1 | Remove PII from log statements | 70% for log exposure | Medium | Logging |
|
||||
| P1 | Add authentication to unauthenticated endpoints | 95% for exposed endpoints | Low | Authentication |
|
||||
| P2 | Implement data access audit logging | -50% detection time | Medium | Monitoring |
|
||||
| P2 | Enable database activity monitoring | -60% detection time | Medium | Monitoring |
|
||||
| P2 | Add rate limiting to sensitive endpoints | 60% reduction in data harvesting | Low | Rate Limiting |
|
||||
| P2 | Column-level encryption for T2 sensitive data | 70% for encrypted columns | Medium | Encryption |
|
||||
| P3 | Implement data retention + auto-deletion | 40% reduction in stale data exposure | High | Data Lifecycle |
|
||||
| P3 | Separate analytics store from production PII | 60% for analytics breach | High | Architecture |
|
||||
| P3 | Pseudonymize behavioral tracking data | 70% for behavioral data | Medium | Pseudonymization |
|
||||
|
||||
---
|
||||
|
||||
## P0 — Fix Immediately (< 1 day)
|
||||
|
||||
### 1. Fix Authorization: IDOR / BOLA
|
||||
|
||||
**What it fixes:** Broken Object Level Authorization — users can access other users' data by changing an ID.
|
||||
|
||||
**Detection pattern in code:**
|
||||
```python
# VULNERABLE: no ownership check
@app.get("/api/orders/{order_id}")
def get_order(order_id: int):
    return db.query(Order).filter(Order.id == order_id).first()

# SECURE: ownership check
@app.get("/api/orders/{order_id}")
def get_order(order_id: int, current_user: User = Depends(get_current_user)):
    order = db.query(Order).filter(
        Order.id == order_id,
        Order.user_id == current_user.id,  # ownership check
    ).first()
    if not order:
        raise HTTPException(status_code=404)
    return order
```
|
||||
|
||||
```typescript
|
||||
// VULNERABLE
|
||||
app.get('/api/users/:id/profile', authenticate, async (req, res) => {
|
||||
const user = await User.findById(req.params.id);
|
||||
res.json(user);
|
||||
});
|
||||
|
||||
// SECURE
|
||||
app.get('/api/users/:id/profile', authenticate, async (req, res) => {
|
||||
if (req.params.id !== req.user.id && !req.user.isAdmin) {
|
||||
return res.status(403).json({ error: 'Forbidden' });
|
||||
}
|
||||
const user = await User.findById(req.params.id);
|
||||
res.json(user);
|
||||
});
|
||||
```
|
||||
|
||||
```csharp
|
||||
// VULNERABLE
|
||||
[HttpGet("orders/{orderId}")]
|
||||
public async Task<IActionResult> GetOrder(int orderId)
|
||||
{
|
||||
var order = await _db.Orders.FindAsync(orderId);
|
||||
return Ok(order);
|
||||
}
|
||||
|
||||
// SECURE
|
||||
[HttpGet("orders/{orderId}")]
|
||||
[Authorize]
|
||||
public async Task<IActionResult> GetOrder(int orderId)
|
||||
{
|
||||
var userId = User.FindFirst(ClaimTypes.NameIdentifier)?.Value;
|
||||
var order = await _db.Orders
|
||||
.Where(o => o.Id == orderId && o.UserId == userId)
|
||||
.FirstOrDefaultAsync();
|
||||
if (order == null) return NotFound();
|
||||
return Ok(order);
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 2. Remove Sensitive Fields from API Responses
|
||||
|
||||
**What it fixes:** Over-fetching — APIs return more data than the client needs.
|
||||
|
||||
**Pattern:**
|
||||
```typescript
|
||||
// VULNERABLE — returns all fields including passwordHash, ssn
|
||||
const user = await User.findById(id);
|
||||
res.json(user);
|
||||
|
||||
// SECURE — explicit projection
|
||||
const user = await User.findById(id).select('id name email createdAt');
|
||||
res.json(user);
|
||||
```
|
||||
|
||||
```python
# SECURE: Pydantic response model (FastAPI)
class UserPublicResponse(BaseModel):
    id: int
    name: str
    email: str
    # NOTE: password_hash, ssn, date_of_birth NOT included

@app.get("/api/users/{id}", response_model=UserPublicResponse)
def get_user(id: int):
    return db.query(User).filter(User.id == id).first()
```
|
||||
|
||||
```java
|
||||
// SECURE: dedicated response DTO (sensitive fields are simply absent, so no @JsonIgnore is needed)
|
||||
public class UserResponse {
|
||||
public String id;
|
||||
public String name;
|
||||
public String email;
|
||||
// passwordHash, ssn not included in DTO
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3. Remove Plaintext Credentials from Code
|
||||
|
||||
**Detection patterns:**
|
||||
```
|
||||
# Patterns to search for in all files:
|
||||
password\s*=\s*["'][^"']+["']
|
||||
api_key\s*=\s*["'][^"']+["']
|
||||
secret\s*=\s*["'][^"']+["']
|
||||
token\s*=\s*["'][^"']+["']
|
||||
connectionString\s*=\s*["'][^"']+["']
|
||||
```
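A small scanner built on the patterns above might look like the following. It is a rough sketch under stated assumptions: the function name `find_hardcoded_secrets` is hypothetical, matching is case-insensitive by choice, and a real scan would also need to walk the repository and handle false positives.

```python
import re

# Same shape as the patterns above: <name> = "<literal>" or '<literal>'.
_LITERAL_ASSIGNMENT = r"""\s*=\s*["'][^"']+["']"""
SECRET_PATTERNS = {
    name: re.compile(name + _LITERAL_ASSIGNMENT, re.IGNORECASE)
    for name in ("password", "api_key", "secret", "token", "connectionString")
}


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each suspicious assignment."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings
```

Assignments that read from the environment, such as `api_key = os.environ["API_KEY"]`, are not flagged because the right-hand side does not start with a quoted literal.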
|
||||
|
||||
**Fix pattern:**
|
||||
```python
|
||||
# VULNERABLE
|
||||
DATABASE_URL = "postgresql://user:p@ssw0rd@prod-db.example.com/mydb"
|
||||
|
||||
# SECURE
|
||||
import os
|
||||
DATABASE_URL = os.environ.get("DATABASE_URL")
|
||||
# In production: use Azure Key Vault, AWS Secrets Manager, or GCP Secret Manager
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## P1 — Fix This Week
|
||||
|
||||
### 4. Field-Level Encryption for Tier 1 Data
|
||||
|
||||
Encrypt sensitive fields **before** storing them. The encryption key lives in a KMS, not in the database.
|
||||
|
||||
**Python / SQLAlchemy + Azure Key Vault:**
|
||||
```python
from azure.keyvault.secrets import SecretClient  # used to fetch the key from Key Vault (retrieval not shown)
from cryptography.fernet import Fernet

# `key` must be a URL-safe base64-encoded 32-byte Fernet key loaded from the KMS,
# never a value stored alongside the encrypted data.

# Encrypt at write time
def encrypt_field(value: str, key: bytes) -> str:
    f = Fernet(key)
    return f.encrypt(value.encode()).decode()

# Decrypt at read time (only when authorized)
def decrypt_field(encrypted_value: str, key: bytes) -> str:
    f = Fernet(key)
    return f.decrypt(encrypted_value.encode()).decode()
```
|
||||
|
||||
**Node.js / Prisma + AWS KMS:**
|
||||
```typescript
|
||||
import { KMSClient, EncryptCommand, DecryptCommand } from "@aws-sdk/client-kms";
|
||||
|
||||
const kms = new KMSClient({ region: "us-east-1" });
|
||||
|
||||
async function encryptField(plaintext: string): Promise<string> {
|
||||
const { CiphertextBlob } = await kms.send(new EncryptCommand({
|
||||
KeyId: process.env.KMS_KEY_ARN,
|
||||
Plaintext: Buffer.from(plaintext),
|
||||
}));
|
||||
return Buffer.from(CiphertextBlob!).toString('base64');
|
||||
}
|
||||
```
|
||||
|
||||
**C# / EF Core + Azure Key Vault:**
|
||||
```csharp
|
||||
// Use Always Encrypted for SQL Server / Azure SQL
|
||||
// Or manually encrypt with Azure Key Vault
|
||||
services.AddDbContext<AppDbContext>(options =>
    options.UseSqlServer(connectionString)
           .EnableSensitiveDataLogging(false)); // set on DbContextOptionsBuilder, not on the SQL Server options
|
||||
|
||||
// In entity:
|
||||
[Column(TypeName = "nvarchar(500)")]
|
||||
public string EncryptedSsn { get; set; } // store Base64 ciphertext
|
||||
```
|
||||
|
||||
**Fields that MUST be field-encrypted (Tier 1):**
|
||||
- SSN / national ID numbers
|
||||
- Passport numbers
|
||||
- Full payment card numbers (better: use tokenization, see below)
|
||||
- Medical record data / diagnoses
|
||||
- Biometric templates
|
||||
|
||||
---
|
||||
|
||||
### 5. Tokenize Payment Card Data
|
||||
|
||||
**Never store full card numbers.** Use a PCI-compliant vault instead.
|
||||
|
||||
**Recommended providers:**
|
||||
- Stripe (tokenizes via Elements/PaymentIntents — you never touch card numbers)
|
||||
- Braintree / PayPal
|
||||
- Adyen
|
||||
- Square
|
||||
|
||||
**Pattern:**
|
||||
```typescript
|
||||
// CORRECT — use Stripe's tokenization
|
||||
const paymentMethod = await stripe.paymentMethods.create({
|
||||
type: 'card',
|
||||
card: { token: cardToken }, // token from client-side Stripe.js
|
||||
});
|
||||
// Store: paymentMethod.id (token) — never the card number
|
||||
|
||||
// WRONG — never do this
|
||||
const cardNumber = req.body.cardNumber; // Tier 2 PCI-DSS violation
|
||||
await db.save({ userId, cardNumber }); // DO NOT store raw card data
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 6. Remove PII from Log Statements
|
||||
|
||||
**Pattern to search for and fix:**
|
||||
```python
|
||||
# VULNERABLE
|
||||
logger.info(f"User {user.email} logged in")
|
||||
logger.debug(f"Payment by {user.full_name}, card ending {card_last4}")
|
||||
|
||||
# SECURE — log opaque identifiers, not PII
|
||||
logger.info(f"User {user.id} authenticated", extra={"user_id": user.id})
|
||||
logger.debug("Payment processed", extra={"user_id": user.id, "payment_id": payment_id})
|
||||
```
|
||||
|
||||
```typescript
|
||||
// VULNERABLE
|
||||
console.log(`Processing order for ${user.email} at ${user.address}`);
|
||||
|
||||
// SECURE
|
||||
logger.info('Processing order', { userId: user.id, orderId: order.id });
|
||||
```
|
||||
|
||||
**Structured logging fields that are SAFE to log:**
|
||||
- Internal user ID (UUID/opaque)
|
||||
- Session ID (if short-lived and not externally shared)
|
||||
- Transaction/correlation IDs
|
||||
- Error codes and error types
|
||||
- Timestamps
|
||||
- HTTP status codes
|
||||
- Duration/latency
|
||||
|
||||
**Structured logging fields that are UNSAFE:**
|
||||
- Email addresses
|
||||
- IP addresses (unless masked, e.g. with the last octet zeroed)
|
||||
- Full names
|
||||
- Phone numbers
|
||||
- Any Tier 1–3 sensitive fields
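
As a safety net for messages that slip through review, a logging filter can redact the unsafe values listed above before they reach a handler. The sketch below covers only email addresses and IPv4 addresses; the names `redact` and `PiiRedactionFilter` are hypothetical, and it complements rather than replaces logging opaque identifiers in the first place.

```python
import logging
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Capture the first three octets so the last one can be zeroed.
IPV4_RE = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")


def redact(text: str) -> str:
    """Replace emails with a placeholder and zero the last IPv4 octet."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return IPV4_RE.sub(r"\1.0", text)


class PiiRedactionFilter(logging.Filter):
    """Redacts the fully formatted message of every record it sees."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # message is already formatted
        return True
```

Attach it with `handler.addFilter(PiiRedactionFilter())` on each handler that writes to shared log storage.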
|
||||
|
||||
---
|
||||
|
||||
## P2 — Fix This Sprint
|
||||
|
||||
### 7. Implement Data Access Audit Logging
|
||||
|
||||
Every read/write of Tier 1 and Tier 2 data must be logged to an immutable audit log.
|
||||
|
||||
**What to log:**
|
||||
```
|
||||
{
|
||||
timestamp: ISO8601,
|
||||
actor_id: "user UUID",
|
||||
actor_role: "admin|user|service",
|
||||
action: "READ|WRITE|DELETE|EXPORT",
|
||||
resource_type: "User|HealthRecord|PaymentMethod",
|
||||
resource_id: "UUID of accessed record",
|
||||
fields_accessed: ["email", "phone"], // NOT the values
|
||||
ip_address: "masked IP",
|
||||
result: "success|denied",
|
||||
correlation_id: "request trace ID"
|
||||
}
|
||||
```
|
||||
|
||||
**Do NOT log the actual sensitive field values in the audit log.**
|
||||
|
||||
**Separation:** Store audit logs in a **separate** database/storage account with stricter access controls than the application database.
|
||||
|
||||
---
|
||||
|
||||
### 8. Rate Limit Sensitive Endpoints
|
||||
|
||||
Prevents automated bulk data harvesting even if an auth vulnerability exists.
|
||||
|
||||
```typescript
|
||||
// Express + express-rate-limit
|
||||
import rateLimit from 'express-rate-limit';
|
||||
|
||||
// Aggressive limit for data export endpoint
|
||||
const exportLimiter = rateLimit({
|
||||
windowMs: 60 * 60 * 1000, // 1 hour
|
||||
max: 5, // max 5 exports per hour per IP
|
||||
message: 'Too many export requests'
|
||||
});
|
||||
|
||||
// Standard limit for data lookup
|
||||
const lookupLimiter = rateLimit({
|
||||
windowMs: 15 * 60 * 1000, // 15 minutes
|
||||
max: 100
|
||||
});
|
||||
|
||||
app.get('/api/export', exportLimiter, authMiddleware, exportController);
|
||||
app.get('/api/users/:id', lookupLimiter, authMiddleware, userController);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## P3 — Fix This Quarter
|
||||
|
||||
### 9. Implement Data Retention and Auto-Deletion
|
||||
|
||||
**Every table with personal data must have a defined retention policy.**
|
||||
|
||||
```sql
|
||||
-- Add retention column to all PII tables
|
||||
ALTER TABLE users ADD COLUMN retention_expires_at TIMESTAMP;
|
||||
ALTER TABLE health_records ADD COLUMN retention_expires_at TIMESTAMP;
|
||||
|
||||
-- Set retention at insert time
|
||||
INSERT INTO users (email, retention_expires_at)
|
||||
VALUES ($1, NOW() + INTERVAL '7 years');
|
||||
|
||||
-- Scheduled job to hard-delete expired records (or anonymize)
|
||||
DELETE FROM users
|
||||
WHERE retention_expires_at < NOW()
|
||||
AND deletion_notified_at IS NOT NULL; -- ensure user was notified
|
||||
```
|
||||
|
||||
**Python scheduled cleanup:**
|
||||
```python
from apscheduler.schedulers.asyncio import AsyncIOScheduler

async def purge_expired_records():
    await db.execute(
        "DELETE FROM user_sessions WHERE expires_at < NOW()"
    )
    # Anonymize users (don't delete if financial records must be retained)
    await db.execute("""
        UPDATE users SET
            email = CONCAT('deleted_', id, '@redacted.invalid'),
            phone = NULL,
            address = NULL,
            date_of_birth = NULL,
            deleted_at = NOW()  -- mark as processed so the row is not anonymized again
        WHERE retention_expires_at < NOW() AND deleted_at IS NULL
    """)

scheduler = AsyncIOScheduler()
scheduler.add_job(purge_expired_records, 'cron', hour=2)  # 2 AM daily
scheduler.start()
```
|
||||
|
||||
---
|
||||
|
||||
### 10. Pseudonymize Behavioral and Analytics Data
|
||||
|
||||
Replace direct user identifiers in analytics with pseudonymous tokens.
|
||||
|
||||
```python
import hashlib
import hmac
import os
from datetime import datetime, timezone

PSEUDONYM_SALT = os.environ["PSEUDONYM_SALT"]  # secret HMAC key, stored in Key Vault

def pseudonymize_user_id(real_user_id: str) -> str:
    """
    One-way: analyst can track behavior across sessions
    but cannot identify the real user without the salt.
    """
    return hmac.new(
        PSEUDONYM_SALT.encode(),
        real_user_id.encode(),
        hashlib.sha256,
    ).hexdigest()

# In analytics event
analytics.track({
    "user_id": pseudonymize_user_id(str(user.id)),  # NOT real user ID
    "event": "page_viewed",
    "page": request.path,
    "timestamp": datetime.now(timezone.utc).isoformat(),
})
```
|
||||
|
||||
---
|
||||
|
||||
## Quick Win Checklist (Complete in < 1 day)
|
||||
|
||||
- [ ] Search all files for hardcoded secrets → move to env vars / Key Vault
|
||||
- [ ] Check all `SELECT *` queries → add explicit column list excluding sensitive fields
|
||||
- [ ] Verify storage buckets/containers → block public access
|
||||
- [ ] Remove `console.log` / `logger.debug` calls that print request bodies
|
||||
- [ ] Add `HttpOnly; Secure; SameSite=Strict` to all session cookies
|
||||
- [ ] Verify that `/api/admin/*` routes require admin role check
|
||||
- [ ] Confirm password reset tokens expire in < 15 minutes
|
||||
- [ ] Check that 500 error responses don't include stack traces in production
|
||||
- [ ] Verify `.env` and secret files are in `.gitignore`
|
||||
- [ ] Run `git log --all --full-history -- "*.env"` to check for historical secret commits
|
||||
|
||||
---
|
||||
|
||||
## Blast Radius Reduction by Control Applied
|
||||
|
||||
When reporting the hardening roadmap, use these estimates:
|
||||
|
||||
| Control Applied | Blast Radius Reduction | Justification |
|
||||
|----------------|----------------------|---------------|
|
||||
| Fix all IDOR vulnerabilities | 80–90% | Most breach scenarios exploit authorization flaws |
|
||||
| Field encryption for T1 data | 75–85% | Encrypted data is useless without KMS key |
|
||||
| Remove PII from logs | 40–60% | Log access is often less controlled than DB access |
|
||||
| Tokenize payment data | 95% for card data | Standard PCI-DSS compliance eliminates card data scope |
|
||||
| Rate limit data endpoints | 30–50% | Limits scale of automated harvesting attacks |
|
||||
| Data retention enforcement | 20–40% | Reduces "data lake" effect — less data to steal |
|
||||
| Audit logging + anomaly detection | 0% prevention, but -60% detection time | Breaches are caught faster |
|
||||
| Pseudonymization of analytics | 60–70% for analytics data | Analytics data decoupled from identity |
|
||||
| Architecture: separate analytics from PII | 50–70% | Breach of analytics store has no PII value |
|
||||
320
skills/data-breach-blast-radius/references/regulatory-impact.md
Normal file
@@ -0,0 +1,320 @@
|
||||
# Regulatory Impact Reference
|
||||
|
||||
Fine formulas, breach notification timelines, cost benchmarks, and jurisdiction detection patterns for all major global data protection regulations.
|
||||
|
||||
> **Disclaimer:** This reference is for risk planning and developer education only. All fine estimates are approximations based on publicly available legal texts and benchmarks cited in `SOURCES.md`. Consult qualified legal counsel for actual regulatory guidance in your jurisdiction.
|
||||
|
||||
> **Verifying these numbers:** Every fine formula in this file is sourced from the regulation's primary legal text. See `references/SOURCES.md` for the exact statute/article URL for each figure. If any number looks wrong, check SOURCES.md first — if it's genuinely outdated, please open a PR.
|
||||
|
||||
---
|
||||
|
||||
## Jurisdiction Detection Patterns
|
||||
|
||||
Scan the codebase for these signals to determine which regulations apply:
|
||||
|
||||
### GDPR (EU/EEA — General Data Protection Regulation)
|
||||
**Trigger signals:**
|
||||
```
|
||||
# Geographic signals
|
||||
- Currency: EUR, GBP (for UK GDPR)
|
||||
- Phone formats: +44, +49, +33, +31, +34, +39, +46, +47, +358, +45, +48
|
||||
- Locale strings: 'de', 'fr', 'es', 'it', 'nl', 'pl', 'pt', 'sv', 'da', 'fi', 'nb', 'el'
|
||||
- Country codes: DE, FR, ES, IT, NL, PL, BE, SE, AT, DK, FI, NO, PT, GR, IE, HU, CZ, RO
|
||||
- Cloud regions: eu-west-*, eu-central-*, northeurope, westeurope, francecentral, germanywestcentral
|
||||
- Domain TLDs: .de, .fr, .es, .it, .nl, .pl, .eu, .uk, .ie, .at, .se, .dk, .fi, .be, .no, .pt
|
||||
|
||||
# Code signals
|
||||
- GDPR-related comments or variable names: gdpr, dpa, data_protection, lawful_basis
|
||||
- Consent management code: cookie_consent, gdpr_consent, marketing_opt_in
|
||||
- Right to erasure endpoints: /delete-account, /forget-me, /data-deletion
|
||||
- Data export endpoints: /export-data, /download-my-data, /dsar
|
||||
- EU-specific third-party integrations: TrustArc, OneTrust, Cookiebot, Axeptio
|
||||
|
||||
# Config signals
|
||||
- AWS S3 buckets with eu- prefix
|
||||
- Azure storage accounts in European regions
|
||||
- GCP storage in europe-* regions
|
||||
```
|
||||
|
||||
**Applies to:** Any organization processing personal data of EU/EEA residents, regardless of where the organization is based.
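A first-pass scan for a few of the GDPR signals above could be sketched like this. The grouping into four named signal types and the function name `gdpr_signals` are assumptions for illustration, and only a subset of the listed signals is covered. A match is a hint that GDPR may apply, not proof.

```python
import re

# A subset of the trigger signals listed above, grouped by (assumed) type.
GDPR_SIGNALS = {
    "cloud_region": re.compile(
        r"\b(eu-west-\d|eu-central-\d|northeurope|westeurope|"
        r"francecentral|germanywestcentral)\b"
    ),
    "code_keyword": re.compile(
        r"\b(gdpr|lawful_basis|data_protection|gdpr_consent|cookie_consent)\b",
        re.IGNORECASE,
    ),
    "endpoint": re.compile(
        r"/(forget-me|delete-account|data-deletion|export-data|"
        r"download-my-data|dsar)\b"
    ),
    "currency": re.compile(r"\bEUR\b"),
}


def gdpr_signals(text: str) -> set[str]:
    """Return the names of the signal types found in a file's text."""
    return {name for name, pattern in GDPR_SIGNALS.items()
            if pattern.search(text)}
```

Running this per file and counting distinct signal types gives a rough confidence level; a single `EUR` constant is much weaker evidence than a `/dsar` endpoint plus an `eu-west-1` region.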
|
||||
|
||||
---
|
||||
|
||||
### CCPA / CPRA (California — Consumer Privacy Act, as amended by the California Privacy Rights Act)
|
||||
**Trigger signals:**
|
||||
```
|
||||
# Geographic signals
|
||||
- Country: US with state: CA, California
|
||||
- Sales tax for California (CA sales tax logic)
|
||||
- Phone format: +1 with 213, 310, 323, 408, 415, 424, 510, 530, 562, 619, 626, 650, 707, 714, 805, 818, 831, 858, 909, 916, 925, 949, 951
|
||||
|
||||
# Code signals
|
||||
- CCPA-related comments: ccpa, california_privacy, do_not_sell, opt_out_of_sale
|
||||
- Privacy preference center with California toggle
|
||||
- Opt-out links: /do-not-sell, /privacy-choices, /opt-out
|
||||
- GPC (Global Privacy Control) header handling
|
||||
|
||||
# Business signals
|
||||
- Annual gross revenue > $25M (implied by scale signals in codebase)
|
||||
- Comments/configs referencing California consumer data
|
||||
```
|
||||
|
||||
**Applies to:** For-profit businesses that meet any one of the following: annual gross revenue above $25M; buying, selling, or sharing the personal information of 100K+ consumers or households per year; or deriving 50% or more of annual revenue from selling or sharing personal information.
|
||||
|
||||
---
|
||||
|
||||
### HIPAA (US — Health Insurance Portability and Accountability Act)
|
||||
**Trigger signals:**
|
||||
```
|
||||
# Field name signals (PHI — Protected Health Information)
|
||||
- medical_record_number, mrn, patient_id, encounter_id
|
||||
- diagnosis, icd_code, icd10, medication, prescription
|
||||
- lab_result, test_result, radiology, pathology
|
||||
- health_plan_id, insurance_id, claim_number
|
||||
- fhir_, hl7_, dicom_
|
||||
|
||||
# Integration signals
|
||||
- Epic, Cerner, Allscripts, eClinicalWorks API keys or webhooks
|
||||
- FHIR API endpoints (/fhir/, /r4/, /stu3/)
|
||||
- HL7 message parsing
|
||||
- CMS (Centers for Medicare & Medicaid) API integration
|
||||
- SNOMED, LOINC, ICD code lookups
|
||||
|
||||
# Config signals
|
||||
- HIPAA compliance flags or BAA (Business Associate Agreement) references
|
||||
- HIPAA-compliant hosting: AWS HIPAA BAA, Azure Healthcare APIs, GCP HIPAA
|
||||
- Healthcare-specific cloud: Microsoft Cloud for Healthcare, Google Cloud Healthcare API
|
||||
```
|
||||
|
||||
**Applies to:** Covered entities (healthcare providers, health plans, clearinghouses) and their Business Associates (vendors who process PHI on their behalf).
|
||||
|
||||
---
|
||||
|
||||
### LGPD (Brazil — Lei Geral de Proteção de Dados)
|
||||
**Trigger signals:**
|
||||
```
|
||||
# Geographic signals
|
||||
- Currency: BRL, R$
|
||||
- Phone format: +55
|
||||
- Locale: pt-BR, pt_BR
|
||||
- Country codes: BR, BRA, Brazil
|
||||
- CPF field (Brazilian individual taxpayer registry): cpf, cpf_number
|
||||
- CNPJ field (Brazilian company registry): cnpj
|
||||
- CEP (Brazilian postal code): cep, codigo_postal (8 digits, XXXXX-XXX format)
|
||||
|
||||
# Code signals
|
||||
- lgpd references in comments or variable names
|
||||
- Brazilian payment integrations: PicPay, Nubank, Mercado Pago, PagSeguro, PIX
|
||||
- Brazilian cloud regions: sa-east-1 (AWS São Paulo), brazilsouth (Azure)
|
||||
```
|
||||
|
||||
**Applies to:** Any processing of personal data of individuals in Brazil, or any processing carried out in Brazil.
|
||||
|
||||
---
|
||||
|
||||
### PDPA (Multiple Asian jurisdictions)
|
||||
|
||||
#### Singapore PDPA
|
||||
**Trigger signals:** `+65`, `SGD`, `sg` locale, `.sg` TLD, `nric` field, `fin` (Foreign Identification Number), `singpass`
|
||||
|
||||
#### Thailand PDPA
|
||||
**Trigger signals:** `+66`, `THB`, `th` locale, `.th` TLD, `thai_id`
|
||||
|
||||
#### Malaysia PDPA
|
||||
**Trigger signals:** `+60`, `MYR`, `ms` locale, `.my` TLD, `my_kad`, `nric_malaysia`
|
||||
|
||||
#### Philippines Data Privacy Act
|
||||
**Trigger signals:** `+63`, `PHP` (currency), `ph` locale, `.ph` TLD, `phil_sys_number`
|
||||
|
||||
#### Japan APPI (Act on Protection of Personal Information)
|
||||
**Trigger signals:** `+81`, `JPY`, `ja` locale, `.jp` TLD, `my_number` (Japanese national ID), `maruhi` (confidential)
|
||||
|
||||
---
|
||||
|
||||
### Other Regulations (flag if applicable)
|
||||
|
||||
| Regulation | Jurisdiction | Key Trigger |
|
||||
|-----------|-------------|-------------|
|
||||
| PIPEDA / Law 25 | Canada | `+1` + Canadian provinces, `CAD`, `.ca` TLD, SIN field |
|
||||
| Australia Privacy Act | Australia | `+61`, `AUD`, `.au` TLD, `tfn` field |
|
||||
| POPIA | South Africa | `+27`, South African Rand, `.za` TLD, `sa_id_number` |
|
||||
| KVKK | Turkey | `+90`, `TRY`, `.tr` TLD |
|
||||
| DPDP Act 2023 | India | `+91`, `INR`, `aadhaar` field (note: enacted in 2023 in place of the withdrawn PDP Bill; confirm which provisions are in force) |
|
||||
| SOC 2 Type II | US (security standard, not law) | Mentioned in codebase, customer contracts |
|
||||
| PCI-DSS | Global (payment card) | Any card number / CVV / PAN field |
|
||||
|
||||
---
|
||||
|
||||
## GDPR Fine Calculator
|
||||
|
||||
**Legal source:** GDPR Article 83 — https://gdpr-info.eu/art-83-gdpr/
|
||||
**Exact text, Art. 83.4:** "...up to 10 000 000 EUR, or...up to 2% of the total worldwide annual turnover...whichever is higher"
|
||||
**Exact text, Art. 83.5:** "...up to 20 000 000 EUR, or...up to 4% of the total worldwide annual turnover...whichever is higher"
|
||||
|
||||
### Maximum Fines (Article 83)
|
||||
```
|
||||
Tier 1 violations (less severe — Art. 83.4):
|
||||
Maximum = max(€10,000,000, 2% of global annual turnover)
|
||||
[Note: 'whichever is higher' means the larger of the two values, so the ceiling is max(), not min()]
|
||||
|
||||
Tier 2 violations (most severe — Art. 83.5 — core principles, data subject rights, cross-border transfers):
|
||||
Maximum = max(€20,000,000, 4% of global annual turnover)
|
||||
```
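The two ceilings can be computed directly from the formulas above. This is a sketch for risk planning only; the function name `gdpr_max_fine` and the `"83.4"` / `"83.5"` keys are hypothetical labels, and the result is the statutory maximum, not a prediction of the fine actually imposed.

```python
from decimal import Decimal

# (fixed amount in EUR, share of global annual turnover) per Art. 83 paragraph.
GDPR_TIERS = {
    "83.4": (Decimal("10000000"), Decimal("0.02")),
    "83.5": (Decimal("20000000"), Decimal("0.04")),
}


def gdpr_max_fine(global_annual_turnover_eur: int, article: str = "83.5") -> Decimal:
    """Statutory ceiling: the higher of the fixed amount and the turnover share."""
    fixed_amount, turnover_share = GDPR_TIERS[article]
    return max(fixed_amount, turnover_share * Decimal(global_annual_turnover_eur))
```

For a company with €50M turnover, the Art. 83.4 ceiling stays at the fixed €10M because 2% of turnover is only €1M; at €2B turnover, the Art. 83.5 ceiling becomes 4% of turnover, or €80M.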
|
||||
|
||||
### Fine Estimation Formula for Risk Planning
|
||||
When annual revenue/turnover is unknown, use these conservative estimates:
|
||||
|
||||
| Company Profile | Estimated Annual Turnover | Realistic T1 Fine | Realistic T2 Fine |
|
||||
|----------------|--------------------------|-------------------|-------------------|
|
||||
| Startup (< 10 employees) | < €2M | €25K–€100K | €50K–€250K |
|
||||
| Small business (10–50 employees) | €2M–€10M | €50K–€400K | €100K–€800K |
|
||||
| Mid-size (50–500 employees) | €10M–€100M | €200K–€2M | €500K–€4M |
|
||||
| Large enterprise (500–5K employees) | €100M–€1B | €2M–€20M | €5M–€40M |
|
||||
| Multinational | > €1B | Up to 2% of turnover (ceiling above €20M) | Up to 4% of turnover (ceiling above €40M) |
|
||||
|
||||
**Historic GDPR fines for calibration (all publicly verified — links in SOURCES.md):**
|
||||
- Meta: €1.2B (2023) — cross-border data transfer violations
|
||||
- Amazon: €746M (2021) — cookie consent violations
|
||||
- WhatsApp: €225M (2021) — transparency violations
|
||||
- Google: €150M (France, 2022) — refusing cookies was harder than accepting them
|
||||
- H&M: €35.3M (2020) — employee monitoring
|
||||
- British Airways: £20M (≈ €22M, UK ICO, 2020) — security breach (500K records)
|
||||
- Marriott: £18.4M (≈ €20M, UK ICO, 2020) — security breach (339M records)
|
||||
|
||||
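**Breach notification fine enhancement:** Failing to notify, or notifying late, under Art. 33/34 is itself an Art. 83.4 infringement. As a planning heuristic (not a statutory formula), add 20–30% to the base fine when notification was missed or late.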
|
||||
|
||||
---
|
||||
|
||||
## CCPA / CPRA Fine Calculator
|
||||
|
||||
**Legal source:** California Civil Code § 1798.155(a) (as amended June 30, 2025, Stats. 2025, Ch. 20) — https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.155
|
||||
**Private right of action source:** California Civil Code § 1798.150 — https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.150
|
||||
|
||||
```
|
||||
Non-intentional violations: $2,500 per violation [§ 1798.155(a)]
|
||||
Intentional violations: $7,500 per violation [§ 1798.155(a)]
|
||||
Children's data violations: $7,500 per violation [§ 1798.155(a) — intent not required for minors]
|
||||
Private right of action: $100–$750 per consumer [§ 1798.150]
|
||||
```
|
||||
|
||||
### Calculation for mass breach
|
||||
|
||||
```
|
||||
Max_CCPA_Fine = Records_affected × $7,500 (if intentional)
|
||||
= Records_affected × $2,500 (if unintentional)
|
||||
```
|
||||
|
||||
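**Enforcement paths:** The California AG or the California Privacy Protection Agency can seek up to $2,500 per violation ($7,500 if intentional). Separately, class actions under the private right of action can recover $100–$750 per consumer per incident, or actual damages if greater.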
|
||||
|
||||
**Private right of action (unique to CCPA/CPRA):**
|
||||
```
|
||||
Civil_damages = max($100, min($750, actual_damages)) × affected_California_consumers
|
||||
```
|
||||
|
||||
**Examples:**
|
||||
- 100K Californian users × $750 = $75M maximum private right of action
|
||||
- 100K users × $2,500 = $250M maximum CCPA fine (regulatory)
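The two ceilings in the examples can be reproduced with a few lines of Python. This is a minimal sketch for planning purposes only: the function names are hypothetical, and treating each affected record as exactly one violation is an assumption of this model, not something the statute states.

```python
# Per-violation amounts copied from the section above; verify against the current statute.
PENALTY_UNINTENTIONAL = 2_500
PENALTY_INTENTIONAL = 7_500
STATUTORY_MIN = 100
STATUTORY_MAX = 750


def ccpa_regulatory_max(records_affected: int, intentional: bool = False) -> int:
    """Upper bound on civil penalties, assuming one violation per record."""
    per_violation = PENALTY_INTENTIONAL if intentional else PENALTY_UNINTENTIONAL
    return records_affected * per_violation


def ccpa_private_action_range(california_consumers: int) -> tuple[int, int]:
    """(low, high) statutory damages under the private right of action."""
    return (
        california_consumers * STATUTORY_MIN,
        california_consumers * STATUTORY_MAX,
    )


if __name__ == "__main__":
    print(ccpa_regulatory_max(100_000))        # 250000000
    print(ccpa_private_action_range(100_000))  # (10000000, 75000000)
```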

---

## HIPAA Fine Calculator

**Legal source:** 45 CFR § 160.404 — https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-160/subpart-D/section-160.404
**HHS enforcement page:** https://www.hhs.gov/hipaa/for-professionals/compliance-enforcement/examples/all-cases/index.html
**Note:** Amounts are 2024 inflation-adjusted figures per HHS. Updated annually — verify at HHS link above.

HIPAA fines are tiered by knowledge/culpability (45 CFR § 160.404):

| Tier | Culpability | Min per Violation | Max per Violation | Annual Cap |
|------|-------------|-------------------|-------------------|------------|
| A | Did not know | $137 | $68,928 | $2,067,813 |
| B | Reasonable cause | $1,379 | $68,928 | $2,067,813 |
| C | Willful neglect, corrected | $13,785 | $68,928 | $2,067,813 |
| D | Willful neglect, not corrected | $68,928 | $2,067,813 | $2,067,813 |

**For breach planning:** Each affected patient record where PHI was exposed = 1 violation.
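A rough way to apply the tier table is to multiply the per-violation range by the number of exposed records and clamp the result at the annual cap. The sketch below does that under stated assumptions: `PenaltyTier` and `hipaa_exposure` are hypothetical names, and applying a single annual cap to the whole breach is a simplification of how caps are assessed in practice. The example uses the Tier B figures from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PenaltyTier:
    min_per_violation: int
    max_per_violation: int
    annual_cap: int


def hipaa_exposure(tier: PenaltyTier, exposed_records: int) -> tuple[int, int]:
    """Return (low, high) exposure, treating each exposed record as one violation."""
    low = min(tier.min_per_violation * exposed_records, tier.annual_cap)
    high = min(tier.max_per_violation * exposed_records, tier.annual_cap)
    return low, high


# Tier B ("reasonable cause") figures copied from the table above.
TIER_B = PenaltyTier(min_per_violation=1_379, max_per_violation=68_928, annual_cap=2_067_813)

if __name__ == "__main__":
    print(hipaa_exposure(TIER_B, 100))     # (137900, 2067813)
    print(hipaa_exposure(TIER_B, 10_000))  # (2067813, 2067813)
```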

**Breach notification costs:** HHS requires notification to affected individuals and to HHS. Breaches affecting more than 500 residents of a state also require media notification. Breaches affecting 500 or more individuals must be reported to HHS within 60 days; smaller breaches may be reported to HHS annually.

**Criminal penalties (DOJ — for egregious cases):**
- Up to $50,000 + 1 year imprisonment (simple violation)
- Up to $100,000 + 5 years (under false pretenses)
- Up to $250,000 + 10 years (with intent to sell/use)

---

## LGPD Fine Calculator (Brazil)

**Legal source:** Lei nº 13.709/2018 (LGPD) — Article 52, I — https://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei/l13709.htm
**ANPD (Brazilian DPA):** https://www.gov.br/anpd/pt-br

```
Maximum fine per violation = 2% of revenue in Brazil in the prior fiscal year  [Art. 52, I]
Hard cap                   = R$50,000,000 (≈ $10M USD) per violation           [Art. 52, I]
```

A daily fine is also possible during the non-compliance period.
**Brazilian DPA (ANPD) enforcement began in 2021.** Enforcement ramp-up is ongoing.
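The two LGPD limits combine as "2% of Brazilian revenue, but never more than the hard cap". A minimal sketch of that rule follows; the function name is hypothetical, amounts are in whole reais, and daily fines are not modeled.

```python
LGPD_HARD_CAP_BRL = 50_000_000  # R$50,000,000 per violation, as listed above


def lgpd_max_fine(brazil_revenue_brl: int, violations: int = 1) -> int:
    """Maximum fine in BRL: 2% of prior-year revenue in Brazil, capped per violation."""
    two_percent = brazil_revenue_brl * 2 // 100  # integer arithmetic avoids float rounding
    return min(two_percent, LGPD_HARD_CAP_BRL) * violations


if __name__ == "__main__":
    print(lgpd_max_fine(1_000_000_000))   # 20000000 (2% is below the cap)
    print(lgpd_max_fine(10_000_000_000))  # 50000000 (cap applies)
```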

---

## Breach Notification Timeline Reference

**All timelines are sourced from primary legal texts.** See `SOURCES.md` for exact article/section URLs for each regulation.

How fast you must notify regulators and affected individuals after discovering a breach:

| Regulation | Regulator Notification | Individual Notification | Legal Source | Notes |
|-----------|----------------------|------------------------|-------------|-------|
| GDPR | **72 hours** from discovery | "Without undue delay" if high risk | Art. 33 & 34 | Must notify even if details incomplete |
| UK GDPR | **72 hours** from discovery | Without undue delay | UK GDPR Art. 33 | Retained EU law post-Brexit |
| CCPA / CPRA | "Most expedient time" (no hard number) | Same | Cal. Civ. Code § 1798.82 | CA AG if > 500 CA residents |
| HIPAA | **60 days** from discovery | 60 days (or sooner) | 45 CFR §§ 164.404–164.408 | HHS + media for 500+ in one state |
| LGPD (Brazil) | **2 business days** (ANPD guidance) | As soon as possible | ANPD Resolution nº 2/2022 | ANPD enforcing since 2021 |
| Singapore PDPA | **3 calendar days** for mandatory breach | Without undue delay | PDPA Section 26D (2021 amendment) | One of the strictest globally |
| Australia Privacy Act | ASAP, no later than **30 days** | As soon as practicable | Privacy Act 1988 — NDB Scheme | notifiable-data-breaches scheme |
| PIPEDA (Canada) | **As soon as feasible** | **As soon as feasible** | PIPEDA s.10.1 | OPCC notification required |
| Japan APPI | **3–5 business days** | Promptly | APPI Art. 26 (2022 amendment) | Tightened from prior version |
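For the regulations in the table that set a fixed clock, the regulator deadline can be derived from the discovery timestamp. The sketch below covers only those fixed-duration rows; the dictionary and function names are hypothetical, and business-day rules (LGPD, Japan APPI) and "expedient" standards are deliberately left out because they need a holiday calendar or legal judgment.

```python
from datetime import datetime, timedelta

# Fixed-duration regulator deadlines copied from the table above.
REGULATOR_DEADLINES = {
    "GDPR": timedelta(hours=72),
    "UK GDPR": timedelta(hours=72),
    "HIPAA": timedelta(days=60),
    "Singapore PDPA": timedelta(days=3),
    "Australia Privacy Act": timedelta(days=30),
}


def regulator_deadlines(discovered_at: datetime) -> dict[str, datetime]:
    """Map each regulation to the latest regulator-notification time."""
    return {
        regulation: discovered_at + window
        for regulation, window in REGULATOR_DEADLINES.items()
    }


if __name__ == "__main__":
    deadlines = regulator_deadlines(datetime(2025, 1, 1, 9, 0))
    for regulation, due in sorted(deadlines.items(), key=lambda item: item[1]):
        print(f"{regulation}: {due.isoformat()}")
```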

---

## Total Breach Cost Estimation Model

**Benchmark source:** IBM Security + Ponemon Institute — "Cost of a Data Breach Report" (annually updated)
**URL:** https://www.ibm.com/reports/data-breach
Figures below are from the **2024 edition** (last verified). The 2025 edition reports a 9% decrease, so download the current PDF for updated values. Page references marked **[IBM 2024, p.14]** refer to the 2024 edition.

Use this model when generating the Financial Impact Estimate section:

### Direct Costs
```
1. Detection & containment:    $1.1M average                           [IBM 2024, p.14]
2. Post-breach response:       $1.2M average                           [IBM 2024, p.14]
3. Lost business:              $1.5M average                           [IBM 2024, p.14]
4. Notification costs:         records × $2–$8 per individual          [industry estimate]
5. Credit monitoring:          records × $5–$20/year if PII            [industry estimate]
6. Legal costs:                $200K–$3M depending on complexity       [industry estimate]
7. Forensic investigation:     $50K–$500K                              [industry estimate]
8. PR/crisis communications:   $100K–$500K                             [industry estimate]
```

### Regulatory Costs
```
9.  Regulatory fines:   [see per-regulation formulas above — all sourced from law text]
10. Settlement costs:   $1M–$100M+ for class actions  [historic case data]
```

### Reputational Multiplier
Apply based on public visibility of the organization:
```
B2C consumer app, consumer brand:     ×1.5  (high reputational damage)
B2B enterprise, low public profile:   ×1.1  (moderate reputational damage)
Healthcare or financial institution:  ×2.0  (trust erosion is severe)
Government or public sector:          ×1.8  (public accountability)
```

### Final Estimate Format
```
Minimum likely cost:   [conservative scenario, good response, small record count]
Probable cost:         [most likely scenario, average response]
Maximum exposure:      [worst case: maximum fines + class action + reputational]
```
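The model above can be turned into a low/high range with a short function. This is only a sketch under explicit assumptions: the model does not say what the reputational multiplier applies to, so here it scales the direct costs (items 1 to 8) and regulatory fines are added afterwards; settlement costs (item 10) are excluded; all names are hypothetical.

```python
# Multipliers copied from the Reputational Multiplier table above.
REPUTATIONAL_MULTIPLIER = {
    "b2c": 1.5,
    "b2b": 1.1,
    "healthcare_or_financial": 2.0,
    "government": 1.8,
}

# Items 1-3 (detection, response, lost business) in USD.
FIXED_AVERAGES = 1_100_000 + 1_200_000 + 1_500_000


def estimate_breach_cost(
    records: int,
    org_type: str,
    regulatory_fines: float = 0.0,
    pii: bool = True,
) -> tuple[float, float]:
    """Return (low, high) USD estimates from the direct-cost ranges above."""
    # Items 4, 6, 7, 8: notification, legal, forensics, PR (low and high ends).
    low = FIXED_AVERAGES + records * 2 + 200_000 + 50_000 + 100_000
    high = FIXED_AVERAGES + records * 8 + 3_000_000 + 500_000 + 500_000
    if pii:
        # Item 5: one year of credit monitoring per record.
        low += records * 5
        high += records * 20
    # Assumption: the multiplier scales direct costs only; fines are added after.
    multiplier = REPUTATIONAL_MULTIPLIER[org_type]
    return (low * multiplier + regulatory_fines, high * multiplier + regulatory_fines)


if __name__ == "__main__":
    low, high = estimate_breach_cost(10_000, "b2c", regulatory_fines=1_000_000)
    print(f"Minimum likely cost: ${low:,.0f}")
    print(f"Maximum exposure:    ${high:,.0f}")
```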
305
skills/data-breach-blast-radius/references/report-format.md
Normal file
@@ -0,0 +1,305 @@
# Blast Radius Report Format

Use this template to generate the complete Data Breach Blast Radius report. Fill every section — do not skip any.

---

## Full Report Template

````markdown
# 💥 Data Breach Blast Radius Report

**Repository:** [repo name or path analyzed]
**Analysis date:** [ISO 8601 date]
**Scope:** [full repo / specific path]
**Languages / frameworks detected:** [list]
**Analyzed by:** GitHub Copilot — data-breach-blast-radius skill

---

## Executive Summary

[2–3 paragraphs in plain English. No technical jargon. Assume the reader is a CEO, CISO, or board member who will ask: "How bad would it be?"

Paragraph 1: What data does this system hold and roughly how many people are affected?
Paragraph 2: What is the single most dangerous exposure vector found? What would happen if it were exploited today?
Paragraph 3: What is the estimated financial and regulatory impact? What is the most important thing to fix first?]

---

## Sensitive Data Inventory

All personal, health, financial, and credential data found in the codebase:

| # | Field Name | Source Location | Data Tier | Category | Encrypted? | Logged? | External Exposure? |
|---|-----------|----------------|-----------|----------|-----------|---------|-------------------|
| 1 | `email` | `models/user.py:14` | T3 — High | Contact | ❌ No | ⚠️ Yes | ✅ API response |
| 2 | `ssn` | `models/employee.py:28` | T1 — Catastrophic | Gov. ID | ❌ No | ❌ No | ❌ No |
| 3 | `card_number` | `models/payment.py:9` | T2 — Critical | PCI-DSS | ⚠️ Partial | ❌ No | ❌ No |
| ... | ... | ... | ... | ... | ... | ... | ... |

**Summary:**
- Tier 1 (Catastrophic) fields: [N]
- Tier 2 (Critical) fields: [N]
- Tier 3 (High) fields: [N]
- Tier 4 (Elevated) fields: [N]

---

## Data Flow Map

How sensitive data moves through the system. Read left to right: ingestion → processing → storage → transmission.

```mermaid
flowchart LR
    subgraph Ingestion["📥 Ingestion"]
        A1[User Registration\nPOST /api/users\nT3: email, phone\nT2: date_of_birth]
        A2[Payment\nPOST /api/payments\nT2: card_number, cvv]
        A3[Health Record\nPOST /api/health\nT1: diagnosis, mrn]
    end

    subgraph Processing["⚙️ Processing"]
        B1[Auth Service\nJWT issued\nT3: email in token]
        B2[Payment Processor\nStripe tokenization\nT2: card → token]
        B3[Analytics\nMixpanel events\nT3: email logged ⚠️]
    end

    subgraph Storage["🗄️ Storage"]
        C1[(PostgreSQL\nusers table\nT1+T2+T3 data\nNo field encryption)]
        C2[(Redis Cache\nSession data\nT3: email in cache)]
        C3[(S3 Bucket\nUser exports\n⚠️ Public read ACL)]
    end

    subgraph Transmission["📤 Transmission"]
        D1[REST API\n/api/users/:id\n⚠️ No ownership check\nT1+T2+T3 in response]
        D2[Email notifications\nT3: email body contains\nfull name + order details]
        D3[Webhooks\nT3: email in payload]
    end

    A1 --> B1 --> C1 --> D1
    A2 --> B2 --> C1
    A3 --> C1
    B1 --> C2
    C1 --> D2
    C1 --> D3
    C1 --> C3

    style C3 fill:#ff6b6b,color:#fff
    style D1 fill:#ff6b6b,color:#fff
    style B3 fill:#ffa500,color:#fff
```

---

## Top Exposure Vectors

Ranked by Blast Radius Score (highest first):

### 🔴 Vector 1: [Title] — BRS: [score]/100

**Location:** `[file path]:[line number]`
**Type:** [IDOR / Unauthenticated endpoint / Public storage / Log leakage / Over-fetching API / etc.]
**Data exposed:** [T1/T2/T3 fields that would be exposed]
**Exploitation:** [1–2 sentences — how an attacker would use this]
**Records at risk:** [number or estimate]
**Jurisdictions triggered:** [GDPR / CCPA / HIPAA / etc.]

```[language]
// Vulnerable code snippet (exact location)
[code]
```

**Blast Radius Score breakdown:**
- Data tier: T[N] → weight [W]
- Exposure likelihood: [E] ([label])
- Population at risk: [N] records → scale [P]
- Completeness: [factor] ([label])
- Context multiplier: ×[M] ([reason])
- **BRS: [calculated score]/100**

---

### 🔴 Vector 2: [Title] — BRS: [score]/100

[repeat structure]

---

### 🟠 Vector 3: [Title] — BRS: [score]/100

[repeat structure]

---

### 🟠 Vector 4: [Title] — BRS: [score]/100

[repeat structure]

---

### 🟡 Vector 5: [Title] — BRS: [score]/100

[repeat structure]

---

## Regulatory Blast Radius

### Jurisdictions Triggered

| Regulation | Triggered? | Trigger Evidence | Notification Deadline |
|-----------|-----------|-----------------|----------------------|
| GDPR | [Yes/No/Unknown] | [e.g., EUR currency, EU cloud region] | 72 hours |
| CCPA | [Yes/No/Unknown] | [e.g., California users, US domain] | Expedient |
| HIPAA | [Yes/No/Unknown] | [e.g., PHI fields found, FHIR endpoints] | 60 days |
| LGPD | [Yes/No/Unknown] | [e.g., BRL currency, CPF field] | 2 business days |
| Singapore PDPA | [Yes/No/Unknown] | [e.g., SGD, +65 phone patterns] | 3 calendar days |
| PCI-DSS | [Yes/No/Unknown] | [e.g., card_number field found] | Immediate |

---

## Financial Impact Estimate

> These are risk planning estimates only. Consult legal counsel for actual regulatory exposure.

### Maximum Simultaneous Exposure
- **Total records at risk (worst case):** [number]
- **Tier 1 records (catastrophic data):** [number]
- **Estimated affected individuals:** [number]
- **Active regulatory jurisdictions:** [list]

### Financial Impact Range

| Scenario | Estimated Cost | Key Assumptions |
|---------|---------------|----------------|
| **Minimum** (fast response, few records, cooperative regulatory outcome) | $[X] | [assumptions] |
| **Probable** (industry average response time, moderate regulatory action) | $[X] | [assumptions] |
| **Maximum** (slow detection, maximum fines, class action) | $[X] | [assumptions] |

### Breakdown (Probable Scenario)

| Cost Category | Estimate |
|--------------|---------|
| Detection & containment | $[X] |
| Post-breach response | $[X] |
| Legal & forensics | $[X] |
| Breach notification & monitoring | $[X] |
| Regulatory fines ([jurisdictions]) | $[X] |
| Reputational/business impact | $[X] |
| **Total estimated cost** | **$[X]** |

**Cost benchmarks used:** IBM Cost of a Data Breach Report 2024 ($4.88M global average, $165/record average) — verify current figures at ibm.com/reports/data-breach

---

## Hardening Roadmap

Prioritized by `(Blast_Radius_Reduction × Severity) / Effort`:

### 🔴 P0 — Fix Immediately (< 1 day each)

| # | Action | File / Location | Blast Radius Reduction | Effort | Severity |
|---|--------|----------------|----------------------|--------|---------|
| 1 | [Fix IDOR on /api/users/:id — add ownership check] | `routes/users.ts:45` | 85% for this vector | ⚡ Low | CRITICAL |
| 2 | [Remove SSN from API response DTO] | `dtos/employee.dto.ts:22` | 90% for SSN exposure | ⚡ Low | CRITICAL |
| 3 | [Block public read ACL on S3 bucket] | `infra/storage.tf:14` | 100% for S3 exposure | ⚡ Low | HIGH |

---

### 🟠 P1 — Fix This Week

| # | Action | File / Location | Blast Radius Reduction | Effort | Severity |
|---|--------|----------------|----------------------|--------|---------|
| 4 | [Encrypt SSN field with KMS] | `models/employee.py:28` | 80% for SSN field | 🔧 Medium | HIGH |
| 5 | [Remove email from log statements (7 locations)] | `services/auth.py:66,89,121...` | 60% for log vector | 🔧 Medium | HIGH |
| 6 | [Tokenize card data — migrate to Stripe Elements] | `services/payment.py` | 95% for card data | 🔧 Medium | CRITICAL |

---

### 🟡 P2 — Fix This Sprint

| # | Action | Blast Radius Reduction | Effort | Severity |
|---|--------|----------------------|--------|---------|
| 7 | [Add rate limiting to /api/users/search] | 50% for bulk harvest | ⚡ Low | MEDIUM |
| 8 | [Add data access audit log for T1/T2 reads] | -60% detection time | 🔧 Medium | HIGH |
| 9 | [Add field projection to user query (remove unused fields from SELECT)] | 40% reduction in over-fetching | ⚡ Low | MEDIUM |

---

### ⚪ P3 — Fix This Quarter

| # | Action | Blast Radius Reduction | Effort | Severity |
|---|--------|----------------------|--------|---------|
| 10 | [Implement data retention policy + auto-deletion job] | 30% reduction in stale data | 🏗️ High | MEDIUM |
| 11 | [Pseudonymize analytics user IDs] | 70% for analytics data | 🔧 Medium | MEDIUM |
| 12 | [Separate analytics store from production PII DB] | 60% architectural reduction | 🏗️ High | LOW |

---

## Analysis Assumptions

Document all assumptions made during this analysis (transparency is critical):

| Assumption | Value Used | Basis |
|-----------|-----------|-------|
| User population estimate | [X users] | [signal found or conservative default] |
| Annual revenue estimate for fine calculation | [unknown / $X range] | [signals or not found] |
| Geographic distribution | [assumed global / EU users likely] | [currency signals found] |
| Healthcare context | [assumed / not applicable] | [PHI fields found / not found] |

---

## What Was Scanned

- **Files analyzed:** [list key files or note "all files in repo"]
- **Data model files:** [list schema/model files]
- **API layer:** [list controller/route files]
- **Config/infrastructure:** [list .env, terraform, CI/CD files]
- **Log/monitoring:** [list logging config files]
- **Test data:** [note if test fixtures contain real PII]

---

*This report was generated by the [data-breach-blast-radius](https://github.com/github/awesome-copilot/tree/main/skills/data-breach-blast-radius) skill for GitHub Copilot.*
*For risk planning purposes only. Consult qualified legal counsel and security professionals for actual regulatory guidance.*
````

---

## Mermaid Diagram Conventions

Use these conventions in the Data Flow Map:

```
# Node colors (using style declarations):
🔴 fill:#ff6b6b,color:#fff  → Public/unauthenticated exposure (CRITICAL)
🟠 fill:#ffa500,color:#fff  → Auth required but weak controls (HIGH)
🟡 fill:#ffd700,color:#000  → Internal but over-broad access (MEDIUM)
🟢 fill:#51cf66,color:#fff  → Properly secured (GOOD)

# Node labels should include:
- Action name
- HTTP method + path (for API nodes)
- Data tiers present (T1, T2, T3)
- ⚠️ Warning emoji if an issue exists

# Subgraphs:
- Ingestion (📥)
- Processing (⚙️)
- Storage (🗄️)
- Transmission (📤)
```
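If the diagram is generated programmatically, the color conventions can be kept in one lookup so every `style` line stays consistent. A small sketch, assuming a hypothetical helper named `mermaid_style` and the severity keys shown here (neither is defined by the skill itself):

```python
# Fill/text colors copied from the conventions above.
SEVERITY_STYLES = {
    "critical": "fill:#ff6b6b,color:#fff",
    "high": "fill:#ffa500,color:#fff",
    "medium": "fill:#ffd700,color:#000",
    "good": "fill:#51cf66,color:#fff",
}


def mermaid_style(node_id: str, severity: str) -> str:
    """Build a Mermaid `style` declaration for one node."""
    return f"style {node_id} {SEVERITY_STYLES[severity]}"


if __name__ == "__main__":
    print(mermaid_style("C3", "critical"))  # style C3 fill:#ff6b6b,color:#fff
    print(mermaid_style("B3", "high"))      # style B3 fill:#ffa500,color:#fff
```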

---

## Severity Icons

| Symbol | Severity | BRS Range |
|--------|---------|-----------|
| 🔴 | CRITICAL | 76–100 |
| 🟠 | HIGH | 51–75 |
| 🟡 | MEDIUM | 26–50 |
| 🔵 | LOW | 0–25 |
| ✅ | SECURE | Control in place |
| ⚠️ | WARNING | Partial control |
| ❌ | VULNERABLE | No control |
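The BRS ranges in the table map directly to a lookup function. The sketch below is one way to pick the icon and label for a vector heading; `severity_for_brs` is a hypothetical name, and rejecting scores outside 0 to 100 is an assumption about how invalid input should be handled.

```python
def severity_for_brs(score: int) -> tuple[str, str]:
    """Return (icon, label) for a Blast Radius Score using the ranges above."""
    if not 0 <= score <= 100:
        raise ValueError(f"BRS must be between 0 and 100, got {score}")
    if score >= 76:
        return ("🔴", "CRITICAL")
    if score >= 51:
        return ("🟠", "HIGH")
    if score >= 26:
        return ("🟡", "MEDIUM")
    return ("🔵", "LOW")


if __name__ == "__main__":
    icon, label = severity_for_brs(82)
    print(f"### {icon} Vector 1: [Title] — BRS: 82/100 ({label})")
```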