feat: add data-breach-blast-radius skill for pre-breach impact analysis (#1487)

* feat: add data-breach-blast-radius skill for pre-breach impact analysis
* fix: resolve codespell false positives (ZAR currency code, SME abbreviation)
* fix: remove ZAR abbreviation to pass codespell check
2026-06-19 06:01:27 +00:00 · 2026-04-27 21:26:20 -07:00
parent 8d182ae78d
commit 8ca38ffb9e
8 changed files with 2023 additions and 0 deletions
@@ -0,0 +1,186 @@
+# Sources & Validation
+
+Every number, formula, and classification in this skill is sourced from a publicly verifiable primary source. This file exists so contributors, reviewers, and users can independently verify all claims before trusting the output.
+
+**If you find a number that is wrong, outdated, or missing a citation — please open a PR against this file.**
+
+---
+
+## Data Classification Standards
+
+### GDPR Special Categories (Tier 1 classification basis)
+- **Source:** Regulation (EU) 2016/679 — Article 9 "Processing of special categories of personal data"
+- **URL:** https://gdpr-info.eu/art-9-gdpr/
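+- **What it says:** Data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, plus genetic data, biometric data used to uniquely identify a person, health data, and data about sex life or sexual orientation are "special categories." Processing them is prohibited unless an Article 9(2) exception applies; explicit consent is one such exception.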
+- **Our use:** These map directly to Tier 1 in `data-classification.md`
+
+### PCI-DSS Data Classification
+- **Source:** PCI Security Standards Council — PCI DSS v4.0 (March 2022)
+- **URL:** https://www.pcisecuritystandards.org/document_library/
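+- **What it says:** Primary Account Number (PAN), cardholder name, expiration date, service code = cardholder data. CVV = sensitive authentication data. Cardholder data must be protected wherever it is stored, processed, or transmitted; sensitive authentication data must not be stored after authorization.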
+- **Our use:** Maps to Tier 2 PCI-DSS in `data-classification.md`
+
+### HIPAA Protected Health Information (PHI) Definition
+- **Source:** 45 CFR Part 160 and Part 164 (Health Insurance Portability and Accountability Act)
+- **URL:** https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
+- **What it says:** The 18 HIPAA identifiers that make health data "protected" — includes names, geographic data, dates, phone numbers, emails, SSNs, medical record numbers, health plan IDs, etc.
+- **Our use:** Tier 1 PHI fields in `data-classification.md`
+
+---
+
+## GDPR Fine Formulas
+
+**Source:** Regulation (EU) 2016/679 — Article 83 "General conditions for imposing administrative fines"
+**URL:** https://gdpr-info.eu/art-83-gdpr/
+
+**Exact legal text (Article 83.4):**
+> "Infringements of the following provisions shall...be subject to administrative fines up to 10 000 000 EUR, or in the case of an undertaking, up to 2 % of the total worldwide annual turnover of the preceding financial year, whichever is higher..."
+
+**Exact legal text (Article 83.5):**
+> "Infringements of the following provisions shall...be subject to administrative fines up to 20 000 000 EUR, or in the case of an undertaking, up to 4 % of the total worldwide annual turnover of the preceding financial year, whichever is higher..."
+
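+**Our formula:** Directly transcribed from Article 83.4 (Tier 1 violations: the lower ceiling, €10M / 2%) and Article 83.5 (Tier 2 violations: the upper ceiling, €20M / 4%). No interpretation added. These fine tiers are unrelated to the data sensitivity tiers in `data-classification.md`.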
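A minimal sketch of the Article 83 ceiling arithmetic quoted above. The function name and the `upper_tier` flag are illustrative assumptions, not part of the skill, and the result is the statutory maximum for an undertaking, not a predicted fine.

```python
def gdpr_max_fine_eur(annual_turnover_eur: float, upper_tier: bool = True) -> float:
    """Statutory ceiling under GDPR Art. 83(5) (upper_tier) or Art. 83(4).

    Hypothetical helper for illustration; turnover is assumed to be in EUR.
    """
    if annual_turnover_eur < 0:
        raise ValueError("turnover must be non-negative")
    fixed_cap_eur, percent = (20_000_000, 4) if upper_tier else (10_000_000, 2)
    # "whichever is higher": the percentage only matters for large undertakings.
    return max(fixed_cap_eur, annual_turnover_eur * percent / 100)


small = gdpr_max_fine_eur(100_000_000)    # 4% is 4M, so the fixed 20M amount applies
large = gdpr_max_fine_eur(2_000_000_000)  # 4% is 80M, which exceeds 20M
print(small, large)
```

Because the rule takes the higher of the two values, the fixed amount dominates for any undertaking with turnover below €500M, in both tiers.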
+
+**Historic fines for calibration (all publicly verified):**
+
+| Fine | Organization | Year | Source URL |
+|------|-------------|------|------------|
+| €1.2B | Meta (Ireland DPC) | 2023 | https://www.dataprotection.ie/en/news-media/press-releases/data-protection-commission-announces-decision-in-meta-ireland-inquiry |
+| €746M | Amazon (Luxembourg) | 2021 | https://iapp.org/news/a/amazon-hit-with-887m-fine-for-gdpr-violations/ |
+| €225M | WhatsApp (Ireland DPC) | 2021 | https://www.dataprotection.ie/en/news-media/press-releases/data-protection-commission-announces-decision-in-whatsapp-inquiry |
+| €150M | Google (France CNIL; cookie rules under the French Data Protection Act rather than GDPR Art. 83) | 2022 | https://www.cnil.fr/en/cookies-cnil-fines-google-150-million-euros-and-facebook-60-million-euros |
+| €35.3M | H&M (Hamburg DPA) | 2020 | https://www.datenschutz-hamburg.de/news/detail/article/hamburgische-beauftragte-fuer-datenschutz-und-informationsfreiheit-verhaengt-bussgeld-gegen-hm.html |
+| £20M (approx. €22M) | British Airways (ICO) | 2020 | https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2020/10/ico-fines-british-airways-20m-for-data-breach-affecting-more-than-400-000-customers/ |
+| £18.4M | Marriott (ICO) | 2020 | https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2020/10/ico-fines-marriott-international-inc18-4million-for-failing-to-keep-customers-personal-data-secure/ |
+
+---
+
+## CCPA / CPRA Fine Formula
+
+**Source:** California Civil Code § 1798.155(a) — California Consumer Privacy Act
+**URL:** https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.155
+
+> **Note (as of June 30, 2025):** Stats. 2025, Ch. 20, Sec. 1 (AB 137) amended § 1798.155. The administrative fine amounts are now in **subsection (a)**. Old references to `§ 1798.155(b)` for fine amounts are incorrect under the amended text. Verify at the URL above for any future changes.
+
+**Exact statutory text (§ 1798.155(a) as amended):**
+> "Any business, service provider, contractor, or other person that violates this title shall be liable for an administrative fine of not more than two thousand five hundred dollars ($2,500) for each violation or seven thousand five hundred dollars ($7,500) for each intentional violation..."
+
+**Private Right of Action source:** California Civil Code § 1798.150
+**URL:** https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.150
+
+**Exact statutory text:**
+> "Any consumer whose nonencrypted and nonredacted personal information...is subject to an unauthorized access and exfiltration...may institute a civil action for...damages in an amount not less than one hundred dollars ($100) and not greater than seven hundred and fifty ($750) per consumer per incident or actual damages, whichever is greater..."
+
+**Our formula:** Directly transcribed. $2,500 / $7,500 per violation comes verbatim from § 1798.155(a) (as amended June 30, 2025). $100–$750 private right of action comes verbatim from § 1798.150.
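The two statutory ranges above can be sketched as simple multiplications. This is an illustration under stated assumptions: the function name is hypothetical, and how "violations" and affected consumers are counted is left to the caller, since the quoted text does not settle it.

```python
def ccpa_exposure_usd(violations: int, consumers: int, intentional: bool = False) -> dict:
    """Rough statutory ranges from § 1798.155(a) and § 1798.150.

    Hypothetical helper: counting rules are an assumption of the caller.
    """
    if violations < 0 or consumers < 0:
        raise ValueError("counts must be non-negative")
    per_violation = 7_500 if intentional else 2_500
    return {
        # Administrative fine ceiling, § 1798.155(a).
        "admin_fine_max": violations * per_violation,
        # Statutory damages per consumer per incident, § 1798.150.
        # Actual damages may exceed the upper figure ("whichever is greater").
        "private_action_statutory_min": consumers * 100,
        "private_action_statutory_max": consumers * 750,
    }


print(ccpa_exposure_usd(violations=10, consumers=1_000))
```

Keeping the administrative fine and the private right of action as separate keys mirrors the fact that they come from different sections and are pursued by different parties.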
+
+---
+
+## HIPAA Fine Formula
+
+**Source:** 45 CFR § 160.404 — Civil Money Penalties
+**URL:** https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-160/subpart-D/section-160.404
+
+**Source (HHS penalty tiers explained):** HHS Office for Civil Rights
+**URL:** https://www.hhs.gov/hipaa/for-professionals/compliance-enforcement/agreements/index.html
+
+**HHS OCR penalty tiers (current inflation-adjusted 2024 amounts):**
+- Tier A (no knowledge): $137–$68,928 per violation, $2,067,813 annual cap
+- Tier B (reasonable cause): $1,379–$68,928, $2,067,813 annual cap
+- Tier C (willful, corrected): $13,785–$68,928, $2,067,813 annual cap
+- Tier D (willful, not corrected): $68,928–$2,067,813, $2,067,813 annual cap
+
+**URL for current amounts:** https://www.hhs.gov/hipaa/for-professionals/compliance-enforcement/examples/all-cases/index.html
+
+**Note on our figures:** The dollar amounts in `regulatory-impact.md` match HHS's inflation-adjusted 2024 penalty tiers. HHS adjusts these annually. Always verify against the HHS OCR website for the current year.
+
+**Criminal penalties source:** 42 U.S.C. § 1320d-6
+**URL:** https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title42-section1320d-6
+
+---
+
+## LGPD Fine Formula
+
+**Source:** Lei Geral de Proteção de Dados Pessoais (LGPD) — Lei nº 13.709/2018, Article 52
+**URL:** https://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei/l13709.htm
+
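+**Text (Art. 52, II, paraphrased in English):** Simple fine of up to 2% of the revenue of the private legal entity, group, or conglomerate in Brazil in its last fiscal year, excluding taxes, limited in total to R$50,000,000 (fifty million reais) per infraction.
+
+**Our formula:** Transcribed from Article 52, II (item I of the same article is a warning, not a fine).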
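Unlike the GDPR "whichever is higher" rule, the LGPD text describes a percentage that is capped by a fixed amount. A minimal sketch, with a hypothetical function name and revenue assumed to be in BRL:

```python
def lgpd_max_fine_brl(revenue_in_brazil_brl: float) -> float:
    """Ceiling per infraction under LGPD Art. 52: 2% of revenue, capped at R$50M.

    Hypothetical helper for illustration, not a prediction of an actual fine.
    """
    if revenue_in_brazil_brl < 0:
        raise ValueError("revenue must be non-negative")
    # The R$50M figure is an upper limit, so this is min(), not max().
    return min(revenue_in_brazil_brl * 2 / 100, 50_000_000)


print(lgpd_max_fine_brl(1_000_000_000))   # 2% is R$20M, below the cap
print(lgpd_max_fine_brl(10_000_000_000))  # 2% is R$200M, so the cap applies
```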
+
+---
+
+## Singapore PDPA Fine Formula
+
+**Source:** Personal Data Protection Act 2012 (Singapore) — Section 48J
+**URL:** https://sso.agc.gov.sg/Act/PDPA2012
+
+**Maximum fine:** S$1,000,000 per breach OR 10% of annual turnover in Singapore (if turnover > S$10M) — whichever is higher, per the 2021 amendment.
+
+---
+
+## Breach Cost Benchmarks
+
+**Source:** IBM Security — "Cost of a Data Breach Report" (published annually since 2005)
+**URL:** https://www.ibm.com/reports/data-breach
+**Publisher:** IBM Security + Ponemon Institute
+**Methodology (2024 edition):** Survey of 604 organizations across 17 industries in 16 countries/regions. Each breach involved 2,170–113,954 compromised records.
+
+**Last-verified figures (IBM 2024 edition):**
+| Metric | Value | Source |
+|--------|-------|--------|
+| Global average total cost | $4.88M | IBM 2024, p.4 |
+| Healthcare cost per record | $408 | IBM 2024, p.12 |
+| Average cost per record (all industries) | $165 | IBM 2024, p.11 |
+| Average time to identify breach | 194 days | IBM 2024, p.15 |
+| Average time to contain breach | 73 days | IBM 2024, p.15 |
+| Cost premium for breaches > 200 days | +$1.02M | IBM 2024, p.16 |
+| Cost reduction from AI/ML security | -$2.22M | IBM 2024, p.20 |
+| Cost reduction from IR planning | -$232K | IBM 2024, p.21 |
+| Cost reduction from employee training | -$258K | IBM 2024, p.21 |
+
+**2025 update:** The IBM 2025 edition (live at the URL above) shows a 9% decrease in the global average from $4.88M. The exact 2025 figure requires downloading the report PDF. **Skill maintainers: update this table annually when a new edition is published.**
+
+---
+
+## Breach Notification Timelines
+
+| Regulation | Timeline | Source |
+|-----------|---------|--------|
+| GDPR | 72 hours | GDPR Article 33.1 — https://gdpr-info.eu/art-33-gdpr/ |
+| UK GDPR | 72 hours | UK GDPR Article 33 (retained EU law) — https://ico.org.uk/for-organisations/report-a-breach/ |
+| HIPAA | Without unreasonable delay, at most 60 calendar days after discovery | 45 CFR § 164.404(b) — https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-D/section-164.404 |
+| CCPA | "Most expedient time" | Cal. Civ. Code § 1798.82 — https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.82 |
+| Singapore PDPA | 3 calendar days | PDPA Section 26D — https://sso.agc.gov.sg/Act/PDPA2012 |
+| Australia Privacy Act | 30 days to assess a suspected breach; notify as soon as practicable | Privacy Act 1988, Part IIIC (NDB scheme) — https://www.oaic.gov.au/privacy/notifiable-data-breaches |
+| LGPD Brazil | 3 business days | ANPD Resolution CD/ANPD nº 15/2024 — https://www.gov.br/anpd/pt-br |
+| Japan APPI | 3–5 business days (2022 amendment) | Act on Protection of Personal Information Art. 26 — https://www.ppc.go.jp/en/legal/ |
+| PIPEDA Canada | "As soon as feasible" | PIPEDA s.10.1 — https://www.priv.gc.ca/en/privacy-topics/privacy-laws-in-canada/the-personal-information-protection-and-electronic-documents-act-pipeda/ |
+
+---
+
+## Blast Radius Formula Basis
+
+The scoring formula structure is adapted from established risk quantification frameworks:
+
+| Component | Based on |
+|-----------|---------|
+| Tier Weight × Exposure Likelihood | OWASP Risk Rating Methodology — https://owasp.org/www-community/OWASP_Risk_Rating_Methodology |
+| Completeness Factor | FAIR (Factor Analysis of Information Risk) model — https://www.fairinstitute.org/ |
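+| Population Scale normalization | CVSS v4.0, by loose analogy only: CVSS defines no "Attack Scale" metric, and its supplemental Value Density metric is the nearest concept — https://www.first.org/cvss/v4-0/ |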
+| Context multipliers | GDPR recitals 75, 91 (special categories increase risk level) — https://gdpr-info.eu/recital-75-gdpr/ |
+
+**What the formula is NOT:** It is not a legally recognized standard. It is a planning heuristic based on accepted risk frameworks, producing a relative score to compare exposure vectors — not an absolute prediction of breach cost.
+
+---
+
+## What Is Estimated vs. What Is Exact
+
+| Item | Status | Notes |
+|------|--------|-------|
+| GDPR fine maximum (€20M / 4% turnover) | **Exact** — verbatim from Art. 83.5 | This is the law |
+| CCPA fine ($2,500 / $7,500) | **Exact** — verbatim from § 1798.155(a) (as amended June 30, 2025) | This is the law |
+| HIPAA tier amounts | **Exact for 2024** — HHS inflation-adjusted | Update annually |
+| Blast Radius Score | **Estimate** — heuristic planning tool | Not a legal or insurance figure |
+| Financial impact range ($X–$Y) | **Estimate** — IBM benchmarks + fine formula applied to population | Not a prediction |
+| "Probable" fine amount | **Estimate** — based on historic fine patterns | Real fines vary enormously by regulator |
+| Notification timeline | **Exact** — verbatim from law | These are hard legal deadlines |
@@ -0,0 +1,253 @@
+# Blast Radius Calculator
+
+Formulas, scoring matrices, and estimation heuristics for quantifying how many people, records, and systems would be affected by a data breach in the codebase under analysis.
+
+---
+
+## Core Blast Radius Formula
+
+```
+Blast Radius Score (BRS) = Tier_Weight × Exposure_Likelihood × Population_Scale × Completeness_Factor × Context_Multiplier
+```
+
+**Score ranges:**
+- 0–25: **Low** — limited exposure, few records
+- 26–50: **Medium** — meaningful exposure, focused population
+- 51–75: **High** — significant exposure, broad regulatory consequences
+- 76–100: **Critical** — catastrophic exposure, immediate action required
+
+---
+
+## Factor 1: Tier Weight (T)
+
+Based on the data classification tier from `data-classification.md`:
+
+| Tier | Label | Weight |
+|------|-------|--------|
+| T1 | Catastrophic | 5.0 |
+| T2 | Critical | 4.0 |
+| T3 | High | 3.0 |
+| T4 | Elevated | 2.0 |
+| T5 | Standard | 1.0 |
+
+**Rule:** When multiple tiers exist in the same exposure vector, use the **highest** tier weight.
+
+**Aggregation uplift:** If 3+ fields from different tiers are exposed together, add +0.5 to the highest tier weight (aggregation attack risk).
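The highest-tier rule and the aggregation uplift can be sketched as follows. One assumption is labeled in the code: "3+ fields from different tiers" is read here as fields spanning at least three distinct tiers, which is only one possible reading of the rule.

```python
TIER_WEIGHTS = {"T1": 5.0, "T2": 4.0, "T3": 3.0, "T4": 2.0, "T5": 1.0}


def tier_weight(field_tiers: list[str]) -> float:
    """Tier Weight (T) for one exposure vector, given the tier of each exposed field."""
    if not field_tiers:
        raise ValueError("at least one field tier is required")
    # Rule: the highest tier present sets the weight.
    weight = max(TIER_WEIGHTS[tier] for tier in field_tiers)
    # Assumption: the uplift applies when three or more distinct tiers are present.
    if len(set(field_tiers)) >= 3:
        weight += 0.5
    return weight


print(tier_weight(["T1", "T2", "T4"]))  # highest tier plus uplift
print(tier_weight(["T2", "T2", "T3"]))  # only two distinct tiers, no uplift
```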
+
+---
+
+## Factor 2: Exposure Likelihood (E)
+
+How likely is this vector to be exploited in a realistic breach scenario?
+
+| Likelihood Score | Label | Criteria |
+|-----------------|-------|---------|
+| 1.0 | **Certain** | Data is publicly accessible today (no auth required) |
+| 0.9 | **Near Certain** | Auth bypass is trivial (e.g., IDOR on sequential IDs, broken JWT validation) |
+| 0.8 | **Very Likely** | Auth required but missing for this specific endpoint; or data leaked in logs accessible by most engineers |
+| 0.7 | **Likely** | Auth required but over-broad access (all users can see all data); missing field-level access control |
+| 0.6 | **Moderate** | Requires privilege escalation or chaining with another bug; internal system with broad developer access |
+| 0.5 | **Possible** | Requires significant attacker effort but no defense-in-depth; DB accessible from dev environment |
+| 0.3 | **Unlikely** | Multiple security controls in place; but controls are not verified by the codebase review |
+| 0.1 | **Remote** | Strong defense-in-depth: encryption, field masking, proper authz, rate limiting, anomaly detection all present |
+
+---
+
+## Factor 3: Population Scale (P)
+
+Normalize the estimated number of affected records to a 0–1 scale.
+
+### Estimating Record Counts
+
+**Step 1: Look for explicit signals in the codebase**
+```
+# Strong signals (use these if found):
+- README mentions user count ("serves 5M users")
+- Seeder/fixture files with record counts
+- Migration comments ("adding index for 50K users")
+- Analytics dashboards or monitoring configs mentioning scale
+- Infrastructure configs (DB instance size implies scale):
+  - db.t3.micro → < 10K active users
+  - db.r5.large → 10K–500K users
+  - db.r5.4xlarge / Aurora Serverless → > 500K users
+
+# Medium signals:
+- App category (SaaS product → higher, internal tool → lower)
+- Multi-tenant vs. single-tenant architecture
+- Presence of sharding or partitioning in DB schema
+
+# Weak signals:
+- Tech stack alone (no reliable correlation to user count)
+```
+
+**Step 2: Apply default estimates when no signals are found**
+
+| Application Type | Conservative Estimate | Typical Estimate |
+|-----------------|----------------------|-----------------|
+| Internal corporate tool | 100–1,000 | 500 |
+| B2B SaaS (small/startup) | 1,000–10,000 | 5,000 |
+| B2B SaaS (established) | 10,000–100,000 | 50,000 |
+| B2C app (consumer startup) | 10,000–100,000 | 50,000 |
+| B2C app (growth stage) | 100,000–1,000,000 | 500,000 |
+| B2C app (scale) | 1,000,000–100,000,000 | 10,000,000 |
+| Healthcare system | 1,000–100,000 | 20,000 |
+| Financial services | 5,000–500,000 | 50,000 |
+| Government / public sector | 10,000–10,000,000 | 1,000,000 |
+
+**Always state the assumption used.**
+
+### Population Scale Score (P)
+
+| Records at Risk (lower bound inclusive) | Score |
+|----------------|-------|
+| < 100 | 0.1 |
+| 100–1,000 | 0.2 |
+| 1,000–10,000 | 0.3 |
+| 10,000–50,000 | 0.4 |
+| 50,000–100,000 | 0.5 |
+| 100,000–500,000 | 0.6 |
+| 500,000–1,000,000 | 0.7 |
+| 1M–10M | 0.8 |
+| 10M–100M | 0.9 |
+| > 100M | 1.0 |
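The band lookup above can be sketched with a sorted list of thresholds. The table's ranges share their endpoints, so this sketch assumes each band includes its lower bound (100,000 records scores 0.6), while the last band follows the table literally and starts strictly above 100M. The names are illustrative.

```python
from bisect import bisect_right

# Lower bounds of each band after the first; the final entry encodes "> 100M".
_BAND_STARTS = [100, 1_000, 10_000, 50_000, 100_000, 500_000,
                1_000_000, 10_000_000, 100_000_001]
_BAND_SCORES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def population_scale(records_at_risk: int) -> float:
    """Population Scale (P) for an estimated record count."""
    if records_at_risk < 0:
        raise ValueError("record count must be non-negative")
    # bisect_right counts how many band starts are <= the record count.
    return _BAND_SCORES[bisect_right(_BAND_STARTS, records_at_risk)]


print(population_scale(2_000))    # falls in the 1,000-10,000 band
print(population_scale(100_000))  # boundary value, assumed to belong to the upper band
```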
+
+---
+
+## Factor 4: Completeness Factor (C)
+
+How complete/useful is the exposed data for an attacker?
+
+| Factor | Score | Description |
+|--------|-------|-------------|
+| **Full Profile** | 1.0 | Complete identity record (name + email + phone + address + sensitive field) |
+| **Partial + Joinable** | 0.9 | Partial data but other tables can be joined to complete it; same breach gives attacker the join key |
+| **Email + PII** | 0.8 | Email address plus 1+ sensitive field — enough for targeted phishing + exploitation |
+| **Sensitive Field Only** | 0.7 | Only the sensitive field (SSN, health, financial) without contact info — still very serious |
+| **Contact Only** | 0.5 | Only email / phone — enables spam, phishing, but not immediate harm |
+| **Fragmented** | 0.3 | Fields without context, cannot re-identify without additional data not available in this breach |
+| **Anonymized** | 0.1 | Properly anonymized — re-identification requires significant external data linking |
+
+---
+
+## Factor 5: Context Multipliers (M)
+
+Use the multiplier for the applicable context as the `Context_Multiplier` term of the raw score (before normalization); use 1.0 when no context applies:
+
+| Context | Multiplier | Rationale |
+|---------|-----------|-----------|
+| Children's data present (COPPA / GDPR Art 8) | × 2.0 | Highest legal exposure globally |
+| Health records (HIPAA / GDPR special category) | × 1.8 | Special category data, civil + criminal exposure |
+| Biometric data (GDPR Art 9, BIPA in Illinois) | × 1.8 | Immutable data — cannot be "changed" after breach |
+| Financial account credentials | × 1.7 | Direct financial theft possible |
+| Government IDs (SSN, passport) | × 1.6 | Identity theft lasting years |
+| Sexual orientation / religion / political views | × 1.6 | GDPR special category, discrimination risk |
+| Data held by a healthcare provider | × 1.5 | HIPAA covered entity exposure |
+| Data in a cloud region that doesn't match user jurisdiction | × 1.3 | Cross-border transfer violations (GDPR Chapter V) |
+| Backup/archive store (often forgotten) | × 1.2 | Backups frequently missed in breach containment |
+
+---
+
+## Blast Radius Score Calculation Examples
+
+### Example 1: E-commerce checkout system
+
+**Exposure vector:** API endpoint `/api/users/{id}/payment-methods` — no ownership check (IDOR)
+- Tier: T2 (card last 4 + billing address) = 4.0
+- Exposure Likelihood: 0.9 (IDOR on sequential IDs, near-certain exploitation)
+- Population Scale: 100K users = 0.6
+- Completeness: Partial profile + joinable to user table = 0.9
+- Context Multiplier: Payment data = 1.7
+
+```
+BRS = 4.0 × 0.9 × 0.6 × 0.9 × 1.7 = 3.30 (raw) → normalized to 41/100 (3.30 / 8.0 × 100) → MEDIUM
+```
+
+### Example 2: Internal HR tool
+
+**Exposure vector:** Employees table visible to all company users via `/api/employees`
+- Tier: T1 (salary + home address + SSN; SSN is T1, and the highest tier wins) = 5.0
+- Exposure Likelihood: 0.7 (auth required, but no RBAC; any employee can see all)
+- Population Scale: 2,000 employees = 0.3
+- Completeness: Full profile = 1.0
+- Context Multiplier: Government IDs (SSN) = 1.6
+
+```
+BRS = 5.0 × 0.7 × 0.3 × 1.0 × 1.6 = 1.68 (raw) → normalized to 21/100 (1.68 / 8.0 × 100) → LOW
+```
+
+However, the data tier overrides the score here: SSN exposure is Tier 1, so flag this vector as HIGH regardless of the normalized score.
+
+---
+
+## Score Normalization
+
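+The raw formula output typically ranges 0–8. With a single context multiplier the theoretical maximum is 5.5 × 1.0 × 1.0 × 1.0 × 2.0 = 11, which the `min` below caps. Normalize to 0–100: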
+
+```
+Normalized_BRS = min(100, (raw_BRS / 8.0) × 100)
+```
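A minimal end-to-end sketch of the core formula and the normalization above. The class and function names are illustrative, the input values describe a hypothetical vector, and two assumptions are labeled in the code: a default multiplier of 1.0 and rounding to the nearest integer before picking a severity band.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureVector:
    tier_weight: float
    exposure_likelihood: float
    population_scale: float
    completeness: float
    context_multiplier: float = 1.0  # assumption: 1.0 when no context applies


def raw_brs(v: ExposureVector) -> float:
    """Product of the five factors, before normalization."""
    return (v.tier_weight * v.exposure_likelihood * v.population_scale
            * v.completeness * v.context_multiplier)


def normalized_brs(v: ExposureVector) -> float:
    """Scale the raw score to 0-100 using the 8.0 divisor, capped at 100."""
    return min(100.0, raw_brs(v) / 8.0 * 100)


def severity(score: float) -> str:
    """Map a normalized score to a band (assumption: round to an integer first)."""
    rounded = round(score)
    if rounded <= 25:
        return "Low"
    if rounded <= 50:
        return "Medium"
    if rounded <= 75:
        return "High"
    return "Critical"


# Hypothetical vector: T1 data, near-certain exposure, 50K-100K records,
# full profile, government ID multiplier.
vector = ExposureVector(5.0, 0.9, 0.5, 1.0, 1.6)
score = normalized_brs(vector)
print(round(raw_brs(vector), 2), round(score), severity(score))
```

For this vector the raw score is about 3.6, which normalizes to about 45 and lands in the Medium band.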
+
+---
+
+## Blast Radius Summary Table (per exposure vector)
+
+Use this format when reporting:
+
+```markdown
+| # | Exposure Vector | Tier | Likelihood | Pop. at Risk | BRS | Severity | Jurisdiction |
+|---|----------------|------|-----------|-------------|-----|----------|--------------|
+| 1 | /api/users endpoint - SSN returned in response | T1 | 0.9 | 50K | 87 | CRITICAL | GDPR, CCPA |
+| 2 | Logs contain plaintext emails | T3 | 0.6 | 50K | 45 | MEDIUM | GDPR |
+| 3 | Redis cache stores full user objects | T2 | 0.5 | 50K | 38 | MEDIUM | GDPR, CCPA |
+| 4 | S3 bucket - public read on user avatars | T4 | 1.0 | 50K | 28 | LOW | - |
+```
+
+---
+
+## Total Organizational Blast Radius
+
+After scoring all exposure vectors, compute:
+
+**Maximum Simultaneous Exposure (MSE):** The number of unique individuals that could be affected if a single attacker gained broad DB access (worst case). This is the number used in regulatory reporting.
+
+**Expected Breach Exposure (EBE):** The typical exposure based on the most likely attack vector (the highest-likelihood finding, not the highest-impact one).
+
+**Regulatory Trigger Count:** The number of distinct regulatory regimes triggered (each one has its own notification obligation and fine formula).
+
+```markdown
+## Organizational Blast Radius Summary
+
+| Metric | Value |
+|--------|-------|
+| Maximum records at risk | [number] |
+| Users with Tier 1 data | [number] |
+| Users with Tier 2 data | [number] |
+| Users with Tier 3+ data | [number] |
+| Regulations triggered | GDPR, CCPA, [others] |
+| Worst-case BRS | [score] |
+| Most likely attack vector | [description] |
+| Time to detect (estimated) | [industry avg: 194 days if no SIEM] |
+| Time to contain (estimated) | [industry avg: 73 days] |
+```
+
+---
+
+## Breach Cost Benchmarks (IBM Data — Verify Annual Edition)
+
+Use these when no specific cost data is available. Figures below are from the **IBM 2024 edition**. IBM publishes a new edition annually at https://www.ibm.com/reports/data-breach; the 2025 edition reports a ~9% decrease in global average cost.
+
+| Metric | Value (IBM 2024) |
+|--------|------------------|
+| Global average cost per breach | $4.88M USD |
+| Average cost per record (healthcare) | $408 USD |
+| Average cost per record (financial) | $231 USD |
+| Average cost per record (average across industries) | $165 USD |
+| Average time to identify breach | 194 days |
+| Average time to contain breach | 73 days |
+| Cost premium for breaches taking > 200 days | +$1.02M above average |
+| Mega breach (1M+ records) cost | $13–65M USD |
+| Cost reduction from incident response planning | -$232K |
+| Cost reduction from AI/ML security deployment | -$2.22M |
+| Cost reduction from employee training | -$258K |
+
+> Source: IBM Cost of a Data Breach Report 2024. State these as benchmarks, not guarantees. Update this table when a new edition is released.
@@ -0,0 +1,250 @@
+# Data Classification Taxonomy
+
+A comprehensive taxonomy for identifying sensitive data in codebases. Every field, column, model property, or variable matching these patterns should be inventoried and assigned the appropriate sensitivity tier.
+
+---
+
+## Tier 1 — Catastrophic (Irreversible harm if exposed)
+
+### Biometric Data
+**Detection patterns (field names / column names):**
+- `fingerprint`, `thumbprint`, `retina_scan`, `iris_scan`, `face_id`, `facial_recognition`
+- `voice_print`, `voice_biometric`, `gait_analysis`, `dna_profile`, `genetic_data`
+- `biometric_template`, `biometric_hash`, `faceEmbedding`, `face_vector`
+
+**Detection patterns (data values / format):**
+- Base64-encoded blobs > 512 bytes in biometric-named fields
+- Binary columns in tables named `biometric_*`, `face_*`, `fingerprint_*`
+
+### Government-Issued Identifiers
+**Detection patterns:**
+- `ssn`, `social_security_number`, `social_security`, `sin` (Canada), `nino` (UK), `tfn` (Australia)
+- `passport_number`, `passport_no`, `passport_id`
+- `drivers_license`, `drivers_licence`, `dl_number`, `license_number`
+- `national_id`, `national_identification`, `id_number`, `id_card_number`
+- `tax_id`, `tin`, `ein`, `itin`, `vat_number`, `fiscal_code`
+- `aadhaar`, `pan_number` (India), `cpf`, `cnpj` (Brazil), `rut` (Chile/Colombia)
+- `nric`, `fin` (Singapore), `my_kad` (Malaysia), `nik` (Indonesia)
+
+**Regex patterns for values:**
+```
+SSN:          \b\d{3}-\d{2}-\d{4}\b
+UK NINO:      \b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b
+CPF (Brazil): \b\d{3}\.\d{3}\.\d{3}-\d{2}\b
+Aadhaar:      \b\d{4}\s\d{4}\s\d{4}\b
+```
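As a sketch of how the value patterns above might be applied, the snippet below compiles them with Python's `re` module and scans a string. The dictionary, the function name, and the sample values are made up for illustration; a match is only a candidate and still needs the field-name context described earlier.

```python
import re

# Patterns copied from the list above.
ID_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "UK NINO": re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b"),
    "CPF (Brazil)": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),
    "Aadhaar": re.compile(r"\b\d{4}\s\d{4}\s\d{4}\b"),
}


def find_government_ids(text: str) -> list[tuple[str, str]]:
    """Return (label, matched value) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in ID_PATTERNS.items():
        hits.extend((label, match.group()) for match in pattern.finditer(text))
    return hits


# Fabricated sample values, not real identifiers.
sample = "ssn=123-45-6789 cpf=123.456.789-09 note=AB123456C"
for label, value in find_government_ids(sample):
    print(label, value)
```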
+
+### Health & Medical Data (PHI under HIPAA)
+**Detection patterns:**
+- `diagnosis`, `icd_code`, `icd10`, `icd11`, `snomed`, `loinc_code`
+- `medication`, `prescription`, `drug_name`, `dosage`, `treatment`
+- `medical_record_number`, `mrn`, `patient_id`, `encounter_id`
+- `lab_result`, `test_result`, `pathology`, `radiology`
+- `mental_health`, `psychiatric`, `therapy_notes`, `counseling`
+- `hiv_status`, `std_status`, `substance_abuse`, `addiction`
+- `insurance_id`, `insurance_member_id`, `health_plan_id`, `claim_number`
+- `fhir_resource`, `hl7_message`, `dicom_data`
+- `disability`, `handicap`, `chronic_condition`
+- `pregnancy`, `reproductive_health`, `fertility`
+
+### Authentication Credentials
+**Detection patterns:**
+- `password`, `passwd`, `pwd`, `hashed_password`, `password_hash`, `password_digest`
+- `private_key`, `secret_key`, `api_key`, `api_secret`, `api_token`
+- `access_token`, `refresh_token`, `bearer_token`, `id_token`, `jwt_token`
+- `oauth_token`, `oauth_secret`, `oauth_access_token`
+- `mfa_secret`, `totp_secret`, `otp_secret`, `backup_codes`
+- `session_token`, `session_id`, `auth_token`
+- `client_secret`, `client_credential`
+- `private_key_pem`, `rsa_private`, `ecdsa_private`
+
+---
+
+## Tier 2 — Critical (High regulatory exposure)
+
+### Payment Card Data (PCI-DSS)
+**Detection patterns:**
+- `card_number`, `pan`, `primary_account_number`, `credit_card`, `debit_card`
+- `cvv`, `cvc`, `cvv2`, `card_verification`, `security_code`
+- `card_expiry`, `expiration_date`, `exp_date`, `expiry_month`, `expiry_year`
+- `cardholder_name`, `card_holder`
+- `iban`, `bic`, `swift_code`, `routing_number`, `account_number`, `sort_code`
+- `bank_account`, `bank_details`, `wire_transfer`
+
+**Regex patterns for values:**
+```
+Visa:            \b4[0-9]{12}(?:[0-9]{3})?\b
+Mastercard:      \b(?:5[1-5][0-9]{14}|2(?:22[1-9]|2[3-9][0-9]|[3-6][0-9]{2}|7[01][0-9]|720)[0-9]{12})\b
+Amex:            \b3[47][0-9]{13}\b
+Generic PAN:     \b[0-9]{13,19}\b (in a PAN-named field)
+CVV:             \b[0-9]{3,4}\b (in a cvv-named field)
+IBAN:            \b[A-Z]{2}\d{2}[A-Z0-9]{4}\d{7}([A-Z0-9]?){0,16}\b
+```
+
+### Identity Combinations (High re-identification risk when combined)
+**Combinations that together constitute Tier 2:**
+- Full name + date of birth
+- Full name + address (street level)
+- Email + date of birth + gender
+- Phone number + address
+
+**Detection patterns:**
+- `full_name`, `first_name` + `last_name` (as separate fields — note both present)
+- `date_of_birth`, `dob`, `birth_date`, `birthdate`, `birthday`
+- `home_address`, `street_address`, `address_line1`, `postal_address`
+- `gender`, `sex`, `pronoun` (when combined with other identifiers)
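Because these fields only reach Tier 2 in combination, a scanner has to look at a model's field names together. A minimal sketch, assuming a small alias table built from the detection patterns above; the canonical names and the function are illustrative, not part of the skill.

```python
from collections.abc import Iterable

# Combinations listed above, expressed with assumed canonical field names.
TIER2_COMBINATIONS = [
    {"full_name", "date_of_birth"},
    {"full_name", "street_address"},
    {"email", "date_of_birth", "gender"},
    {"phone", "street_address"},
]

# Assumed mapping from detected field names to canonical names.
ALIASES = {
    "dob": "date_of_birth", "birth_date": "date_of_birth",
    "birthdate": "date_of_birth", "birthday": "date_of_birth",
    "home_address": "street_address", "address_line1": "street_address",
    "postal_address": "street_address",
    "sex": "gender", "phone_number": "phone", "email_address": "email",
}


def tier2_combinations(field_names: Iterable[str]) -> list[list[str]]:
    """Return every Tier 2 combination fully covered by the given field names."""
    names = {name.lower() for name in field_names}
    canonical = {ALIASES.get(name, name) for name in names}
    # first_name + last_name as separate fields count as a full name.
    if {"first_name", "last_name"} <= names:
        canonical.add("full_name")
    return [sorted(combo) for combo in TIER2_COMBINATIONS if combo <= canonical]


print(tier2_combinations(["first_name", "last_name", "dob", "email"]))
```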
+
+---
+
+## Tier 3 — High (Regulatory notification triggers)
+
+### Contact Information
+**Detection patterns:**
+- `email`, `email_address`, `user_email`, `contact_email`, `primary_email`
+- `phone`, `phone_number`, `mobile`, `mobile_number`, `cell_phone`, `telephone`
+- `whatsapp_number`, `signal_number`
+
+**Regex patterns:**
+```
+Email:  \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
+Phone:  \+?[0-9\s\-\(\)]{7,20}  (in a phone-named field)
+```
+
+### Precise Location Data
+**Detection patterns:**
+- `latitude`, `longitude`, `lat`, `lng`, `lat_lng`, `coordinates`, `geo_point`
+- `gps_location`, `precise_location`, `real_time_location`
+- `home_location`, `work_location`
+
+**Note:** City-level location is Tier 4; street-level or GPS coordinates are Tier 3.
+
+### Network Identifiers
+**Detection patterns:**
+- `ip_address`, `ip`, `client_ip`, `remote_addr`, `x_forwarded_for`
+- `mac_address`, `device_mac`, `hardware_id`
+- `imei`, `imsi`, `device_id`, `advertising_id`, `idfa`, `gaid`
+
+### Authentication Artifacts
+**Detection patterns:**
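+- `session_id`, `cookie_value`, `csrf_token` (if long-lived and user-identifying). Note that `session_id` is also listed under Tier 1 Authentication Credentials; use Tier 1 when the value alone grants access.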
+- `remember_me_token`, `persistent_session`
+
+---
+
+## Tier 4 — Elevated (Privacy relevant)
+
+### Partial Personal Identifiers
+**Detection patterns:**
+- `first_name`, `last_name`, `display_name`, `username` (when alone)
+- `profile_picture`, `avatar_url`
+- `city`, `state`, `country`, `region`, `zip_code`, `postal_code`
+- `time_zone`, `locale`, `language_preference`
+
+### Behavioral & Analytics Data
+**Detection patterns:**
+- `user_agent`, `browser`, `device_type`, `os`
+- `search_query`, `search_history`, `browsing_history`
+- `purchase_history`, `order_history`, `transaction_history`
+- `click_event`, `page_view`, `session_duration`
+- `preferences`, `interests`, `tags`, `segments`
+
+### Financial Context (non-card)
+**Detection patterns:**
+- `salary`, `income`, `net_worth`, `credit_score`, `credit_rating`
+- `account_balance`, `wallet_balance`, `subscription_tier`
+
+---
+
+## Tier 5 — Standard (No direct privacy impact)
+
+- System configuration values (non-secret)
+- Public user-facing content (blog posts, public profiles)
+- Anonymized aggregated statistics
+- Non-personal reference data (product catalog, country codes)
+- Internal system identifiers with no external exposure
+
+---
+
+## Detection Guidance for AI Analysis
+
+### Framework-Specific Patterns
+
+**Django / Python:**
+```python
+# Sensitive fields typically appear in models.py
+class User(models.Model):
+    email = models.EmailField()           # Tier 3
+    date_of_birth = models.DateField()    # Tier 2 (combined with name)
+    ssn = models.CharField(max_length=11) # Tier 1
+```
+
+**TypeScript / Prisma:**
+```prisma
+model User {
+  id          Int       @id @default(autoincrement()) // required unique identifier
+  email       String    // Tier 3
+  phoneNumber String?   // Tier 3
+  dateOfBirth DateTime? // Tier 2 (when combined)
+  cardNumber  String?   // Tier 2 PCI-DSS
+}
+```
+
+**Java / Spring / JPA:**
+```java
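+import jakarta.persistence.Column;  // javax.persistence on older Spring Boot versions
+import jakarta.persistence.Entity;
+import jakarta.persistence.Id;
+
+@Entity
+public class Patient {
+    @Id                          // JPA entities require a primary key
+    private Long id;
+
+    @Column(name = "diagnosis")  // Tier 1 PHI
+    private String diagnosis;
+
+    @Column(name = "ssn")        // Tier 1
+    private String ssn;
+}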
+```
+
+**C# / EF Core:**
+```csharp
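+using System;
+
+public class UserProfile {
+    public int Id { get; set; }                 // primary key (EF Core convention)
+    public string Email { get; set; }           // Tier 3
+    public string PassportNumber { get; set; }  // Tier 1
+    public DateTime DateOfBirth { get; set; }   // Tier 2 (when combined with name)
+}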
+```
+
+### Log Statement Patterns (High Risk — often overlooked)
+```python
+# BAD — logs PII
+logger.info(f"User {user.email} logged in from {request.remote_addr}")
+logger.debug(f"Payment for card {card_number}")
+
+# Look for these in logging calls:
+# .info(), .debug(), .warn(), .error(), console.log(), System.out.println()
+```
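One way to reduce this kind of leakage is a logging filter that rewrites the message before any handler sees it. The sketch below uses only the standard `logging` module; the class name and placeholder text are illustrative, and it covers email addresses only. Pattern-based redaction is a backstop, not a substitute for keeping PII out of log calls.

```python
import logging
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


class RedactEmailFilter(logging.Filter):
    """Replace email addresses in log messages with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so %-style arguments are redacted too.
        record.msg = EMAIL_RE.sub("[REDACTED_EMAIL]", record.getMessage())
        record.args = None
        return True  # keep the record, only its text changes


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("redaction_demo")  # hypothetical logger name
logger.addFilter(RedactEmailFilter())
logger.info("User %s logged in", "alice@example.com")
```

A filter attached to a logger applies only to records created through that logger, so attaching it to handlers instead may be preferable when many loggers share one output.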
+
+### API Response Leakage (Serializer/DTO patterns)
+```typescript
+// Check if these fields are included in response objects
+// even if not requested; over-fetching is a common exposure vector
+const userResponse = {
+  "id": "...",
+  "email": "...",          // Tier 3
+  "phone": "...",          // Tier 3
+  "dateOfBirth": "...",    // Tier 2 — should this be returned?
+  "passwordHash": "...",   // Tier 1 — should NEVER be returned
+  "ssn": "...",            // Tier 1 — should NEVER be returned
+};
+```
+
+---
+
+## Aggregation Risk Assessment
+
+Combination attacks — data that becomes more sensitive when combined:
+
+| Alone | Combined With | Combined Tier | Risk |
+|-------|--------------|---------------|------|
+| Email (T3) | Password hash (T1) | T1 | Account takeover |
+| Name (T4) | DOB (T2) + Address (T2) | T2 | Full identity reconstruction |
+| IP address (T3) | Timestamps + User ID | T2 | Behavioral profiling |
+| City (T4) | Purchase history (T4) | T3 | De-anonymization risk |
+| Health category (T4) | Name + Email | T1 | HIPAA triggering |
+
+**Rule:** Always assess fields in combination, not just in isolation.
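
The rule can be illustrated with a small lookup. This is a minimal sketch, not the skill's actual implementation: `FIELD_TIERS`, `COMBINATION_RULES`, and `effective_tier` are hypothetical names, the field names are illustrative, and only three rows of the table above are encoded.

```python
# Hypothetical per-field tiers (1 = most severe); names are illustrative.
FIELD_TIERS = {
    "email": 3, "password_hash": 1, "name": 4, "date_of_birth": 2,
    "address": 2, "city": 4, "purchase_history": 4,
}

# Combinations that escalate sensitivity, copied from rows of the table above.
COMBINATION_RULES = [
    ({"email", "password_hash"}, 1),            # account takeover
    ({"name", "date_of_birth", "address"}, 2),  # identity reconstruction
    ({"city", "purchase_history"}, 3),          # de-anonymization
]


def effective_tier(fields: set[str]) -> int:
    """Return the most severe tier implied by the fields, alone or combined."""
    tiers = [FIELD_TIERS[f] for f in fields if f in FIELD_TIERS]
    tiers += [tier for combo, tier in COMBINATION_RULES if combo <= fields]
    # Assumption: unknown fields fall back to Tier 4 (least severe).
    return min(tiers) if tiers else 4
```

Here `city` and `purchase_history` are each Tier 4 in isolation, but the pair resolves to Tier 3 because the combination rule applies.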
@@ -0,0 +1,449 @@
+# Hardening Playbook
+
+Prioritized controls to reduce data breach blast radius. Controls are organized by **impact category** and include tech-stack-specific implementation patterns. Each control includes a **blast radius reduction estimate**.
+
+> **How to use:** After identifying exposure vectors, match each to a control below. Sort your hardening roadmap by `(Blast_Radius_Reduction × Severity) / Effort`.
+
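
One way to apply the sorting formula is sketched below. The numeric effort weights and the 1-10 severity scale are assumptions (the playbook only gives Low/Medium/High labels), and the `Control` class is hypothetical.

```python
from dataclasses import dataclass

# Assumed numeric weights for the playbook's Low/Medium/High effort labels.
EFFORT_WEIGHT = {"Low": 1, "Medium": 2, "High": 4}


@dataclass
class Control:
    name: str
    reduction: float  # blast radius reduction as a fraction, 0.0-1.0
    severity: int     # assumed 1-10 severity of the matched exposure vector
    effort: str       # "Low", "Medium", or "High"

    @property
    def score(self) -> float:
        # (Blast_Radius_Reduction x Severity) / Effort
        return (self.reduction * self.severity) / EFFORT_WEIGHT[self.effort]


def roadmap(controls: list[Control]) -> list[Control]:
    """Sort controls so the highest-value work comes first."""
    return sorted(controls, key=lambda c: c.score, reverse=True)


controls = [
    Control("Data retention + auto-deletion", 0.40, 5, "High"),
    Control("Fix IDOR/BOLA", 0.90, 9, "Low"),
    Control("Field-level encryption for T1 data", 0.80, 9, "Medium"),
]
for control in roadmap(controls):
    print(f"{control.score:5.2f}  {control.name}")
```

With these assumed inputs the IDOR fix ranks first, because it pairs a large reduction with low effort.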
+---
+
+## Control Priority Matrix
+
+| Priority | Control | Blast Radius Reduction | Effort | Category |
+|----------|---------|----------------------|--------|---------|
+| P0 | Fix IDOR/BOLA — add ownership checks | 90% for affected vector | Low | Authorization |
+| P0 | Remove sensitive fields from API responses | 85% for affected fields | Low | Data Minimization |
+| P0 | Revoke publicly accessible storage (S3/Blob) | 100% for affected store | Low | Access Control |
+| P0 | Remove plaintext credentials from code/logs | 100% for affected secret | Low | Secrets |
+| P1 | Add field-level encryption for T1 data | 80% for encrypted fields | Medium | Encryption |
+| P1 | Mask/tokenize PCI card data | 95% for card exposure | Medium | Tokenization |
+| P1 | Remove PII from log statements | 70% for log exposure | Medium | Logging |
+| P1 | Add authentication to unauthenticated endpoints | 95% for exposed endpoints | Low | Authentication |
+| P2 | Implement data access audit logging | -50% detection time | Medium | Monitoring |
+| P2 | Enable database activity monitoring | -60% detection time | Medium | Monitoring |
+| P2 | Add rate limiting to sensitive endpoints | 60% reduction in data harvesting | Low | Rate Limiting |
+| P2 | Column-level encryption for T2 sensitive data | 70% for encrypted columns | Medium | Encryption |
+| P3 | Implement data retention + auto-deletion | 40% reduction in stale data exposure | High | Data Lifecycle |
+| P3 | Separate analytics store from production PII | 60% for analytics breach | High | Architecture |
+| P3 | Pseudonymize behavioral tracking data | 70% for behavioral data | Medium | Pseudonymization |
+
+---
+
+## P0 — Fix Immediately (< 1 day)
+
+### 1. Fix Authorization: IDOR / BOLA
+
+**What it fixes:** Broken Object Level Authorization — users can access other users' data by changing an ID.
+
+**Detection pattern in code:**
+```python
+# VULNERABLE — no ownership check
+@app.get("/api/orders/{order_id}")
+def get_order(order_id: int):
+    return db.query(Order).filter(Order.id == order_id).first()
+
+# SECURE — ownership check
+@app.get("/api/orders/{order_id}")
+def get_order(order_id: int, current_user: User = Depends(get_current_user)):
+    order = db.query(Order).filter(
+        Order.id == order_id,
+        Order.user_id == current_user.id  # ownership check
+    ).first()
+    if not order:
+        raise HTTPException(status_code=404)
+    return order
+```
+
+```typescript
+// VULNERABLE
+app.get('/api/users/:id/profile', authenticate, async (req, res) => {
+  const user = await User.findById(req.params.id);
+  res.json(user);
+});
+
+// SECURE
+app.get('/api/users/:id/profile', authenticate, async (req, res) => {
+  if (req.params.id !== req.user.id && !req.user.isAdmin) {
+    return res.status(403).json({ error: 'Forbidden' });
+  }
+  const user = await User.findById(req.params.id);
+  res.json(user);
+});
+```
+
+```csharp
+// VULNERABLE
+[HttpGet("orders/{orderId}")]
+public async Task<IActionResult> GetOrder(int orderId)
+{
+    var order = await _db.Orders.FindAsync(orderId);
+    return Ok(order);
+}
+
+// SECURE
+[HttpGet("orders/{orderId}")]
+[Authorize]
+public async Task<IActionResult> GetOrder(int orderId)
+{
+    var userId = User.FindFirst(ClaimTypes.NameIdentifier)?.Value;
+    var order = await _db.Orders
+        .Where(o => o.Id == orderId && o.UserId == userId)
+        .FirstOrDefaultAsync();
+    if (order == null) return NotFound();
+    return Ok(order);
+}
+```
+
+---
+
+### 2. Remove Sensitive Fields from API Responses
+
+**What it fixes:** Over-fetching — APIs return more data than the client needs.
+
+**Pattern:**
+```typescript
+// VULNERABLE — returns all fields including passwordHash, ssn
+const user = await User.findById(id);
+res.json(user);
+
+// SECURE — explicit projection
+const user = await User.findById(id).select('id name email createdAt');
+res.json(user);
+```
+
+```python
+# SECURE — Pydantic response model (FastAPI)
+class UserPublicResponse(BaseModel):
+    id: int
+    name: str
+    email: str
+    # NOTE: password_hash, ssn, date_of_birth NOT included
+
+@app.get("/api/users/{id}", response_model=UserPublicResponse)
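+def get_user(id: int):
+    user = db.query(User).filter(User.id == id).first()
+    if user is None:
+        raise HTTPException(status_code=404)  # None would fail response validation
+    return user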
+```
+
+```java
+// SECURE: response DTO that simply omits sensitive fields (no @JsonIgnore needed)
+public class UserResponse {
+    public String id;
+    public String name;
+    public String email;
+    // passwordHash, ssn not included in DTO
+}
+```
+
+---
+
+### 3. Remove Plaintext Credentials from Code
+
+**Detection patterns:**
+```
+# Patterns to search for in all files:
+password\s*=\s*["'][^"']+["']
+api_key\s*=\s*["'][^"']+["']
+secret\s*=\s*["'][^"']+["']
+token\s*=\s*["'][^"']+["']
+connectionString\s*=\s*["'][^"']+["']
+```
+
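
The patterns above can be run over a source tree with a few lines of Python. This is a minimal sketch using only the standard library; `scan_text` and `scan_tree` are hypothetical helper names, and case-insensitive matching is an assumption. Matches are candidates for review, since test fixtures and placeholders will also match.

```python
import re
from pathlib import Path

# Value part shared by every pattern: = "..." or = '...'
VALUE = r"""\s*=\s*["'][^"']+["']"""

# Same key names as the patterns listed above.
SECRET_PATTERNS = [
    re.compile(name + VALUE, re.IGNORECASE)
    for name in ("password", "api_key", "secret", "token", "connectionString")
]


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like hardcoded secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every file under root, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Calling `scan_tree(".")` from the repository root returns a mapping of file path to suspicious lines.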
+**Fix pattern:**
+```python
+# VULNERABLE
+DATABASE_URL = "postgresql://user:p@ssw0rd@prod-db.example.com/mydb"
+
+# SECURE
+import os
+DATABASE_URL = os.environ["DATABASE_URL"]  # raises KeyError at startup if the variable is unset
+# In production: use Azure Key Vault, AWS Secrets Manager, or GCP Secret Manager
+```
+
+---
+
+## P1 — Fix This Week
+
+### 4. Field-Level Encryption for Tier 1 Data
+
+Encrypt sensitive fields **before** storing them. The encryption key lives in a KMS, not in the database.
+
+**Python / SQLAlchemy + Azure Key Vault:**
+```python
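+from azure.identity import DefaultAzureCredential
+from azure.keyvault.secrets import SecretClient
+from cryptography.fernet import Fernet
+
+# Load the Fernet key from Key Vault (vault URL and secret name are placeholders you supply)
+def load_key(vault_url: str, secret_name: str) -> bytes:
+    client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())
+    return client.get_secret(secret_name).value.encode()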
+
+# Encrypt at write time
+def encrypt_field(value: str, key: bytes) -> str:
+    f = Fernet(key)
+    return f.encrypt(value.encode()).decode()
+
+# Decrypt at read time (only when authorized)
+def decrypt_field(encrypted_value: str, key: bytes) -> str:
+    f = Fernet(key)
+    return f.decrypt(encrypted_value.encode()).decode()
+```
+
+**Node.js / Prisma + AWS KMS:**
+```typescript
+import { KMSClient, EncryptCommand, DecryptCommand } from "@aws-sdk/client-kms";
+
+const kms = new KMSClient({ region: "us-east-1" });
+
+async function encryptField(plaintext: string): Promise<string> {
+  const { CiphertextBlob } = await kms.send(new EncryptCommand({
+    KeyId: process.env.KMS_KEY_ARN,
+    Plaintext: Buffer.from(plaintext),
+  }));
+  return Buffer.from(CiphertextBlob!).toString('base64');
+}
+```
+
+**C# / EF Core + Azure Key Vault:**
+```csharp
+// Use Always Encrypted for SQL Server / Azure SQL
+// Or manually encrypt with Azure Key Vault
+services.AddDbContext<AppDbContext>(options =>
+    options.UseSqlServer(connectionString)
+           .EnableSensitiveDataLogging(false)); // never log parameter values
+
+// In entity:
+[Column(TypeName = "nvarchar(500)")]
+public string EncryptedSsn { get; set; } // store Base64 ciphertext
+```
+
+**Fields that MUST be field-encrypted (Tier 1):**
+- SSN / national ID numbers
+- Passport numbers
+- Full payment card numbers (classified as Tier 2 PCI-DSS elsewhere in this skill, but encrypt them as if Tier 1; better: use tokenization, see below)
+- Medical record data / diagnoses
+- Biometric templates
+
+---
+
+### 5. Tokenize Payment Card Data
+
+**Never store full card numbers.** Use a PCI-compliant vault instead.
+
+**Recommended providers:**
+- Stripe (tokenizes via Elements/PaymentIntents — you never touch card numbers)
+- Braintree / PayPal
+- Adyen
+- Square
+
+**Pattern:**
+```typescript
+// CORRECT — use Stripe's tokenization
+const paymentMethod = await stripe.paymentMethods.create({
+  type: 'card',
+  card: { token: cardToken }, // token from client-side Stripe.js
+});
+// Store: paymentMethod.id (token) — never the card number
+
+// WRONG — never do this
+const cardNumber = req.body.cardNumber; // Tier 2 PCI-DSS violation
+await db.save({ userId, cardNumber });   // DO NOT store raw card data
+```
+
+---
+
+### 6. Remove PII from Log Statements
+
+**Pattern to search for and fix:**
+```python
+# VULNERABLE
+logger.info(f"User {user.email} logged in")
+logger.debug(f"Payment by {user.full_name}, card ending {card_last4}")
+
+# SECURE — log opaque identifiers, not PII
+logger.info("User authenticated", extra={"user_id": user.id})
+logger.debug("Payment processed", extra={"user_id": user.id, "payment_id": payment_id})
+```
+
+```typescript
+// VULNERABLE
+console.log(`Processing order for ${user.email} at ${user.address}`);
+
+// SECURE
+logger.info('Processing order', { userId: user.id, orderId: order.id });
+```
+
+**Structured logging fields that are SAFE to log:**
+- Internal user ID (UUID/opaque)
+- Session ID (if short-lived and not externally shared)
+- Transaction/correlation IDs
+- Error codes and error types
+- Timestamps
+- HTTP status codes
+- Duration/latency
+
+**Structured logging fields that are UNSAFE:**
+- Email addresses
+- IP addresses (must be masked — last octet)
+- Full names
+- Phone numbers
+- Any Tier 1–3 sensitive fields
+
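
As a backstop for the lists above, a logging filter can scrub common PII shapes before a record is emitted. This is a minimal sketch using only the standard library; the regexes are deliberately simple assumptions and will miss some formats, so the filter complements rather than replaces removing PII at the call site. `RedactPII` and `redact` are hypothetical names.

```python
import logging
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IPV4_RE = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")


def redact(message: str) -> str:
    """Replace email addresses and mask the last octet of IPv4 addresses."""
    message = EMAIL_RE.sub("[email]", message)
    return IPV4_RE.sub(r"\1.xxx", message)


class RedactPII(logging.Filter):
    """Rewrite each record's message in place before handlers see it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True


logger = logging.getLogger("app")
logger.addFilter(RedactPII())
```

The filter is attached to a logger here; attaching it to a handler instead would also cover records that propagate from child loggers.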
+---
+
+## P2 — Fix This Sprint
+
+### 7. Implement Data Access Audit Logging
+
+Every read/write of Tier 1 and Tier 2 data must be logged to an immutable audit log.
+
+**What to log:**
+```
+{
+  timestamp: ISO8601,
+  actor_id: "user UUID",
+  actor_role: "admin|user|service",
+  action: "READ|WRITE|DELETE|EXPORT",
+  resource_type: "User|HealthRecord|PaymentMethod",
+  resource_id: "UUID of accessed record",
+  fields_accessed: ["email", "phone"],  // NOT the values
+  ip_address: "masked IP",
+  result: "success|denied",
+  correlation_id: "request trace ID"
+}
+```
+
+**Do NOT log the actual sensitive field values in the audit log.**
+
+**Separation:** Store audit logs in a **separate** database/storage account with stricter access controls than the application database.
+
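
The record shape above can be produced by a small helper. This is a minimal sketch: `audit_event` and `mask_ip` are hypothetical names, the last-octet masking scheme is an assumption, and writing the record to an immutable store is out of scope here.

```python
import json
from datetime import datetime, timezone


def mask_ip(ip: str) -> str:
    """Mask the last octet of an IPv4 address (assumed masking scheme)."""
    head, sep, _ = ip.rpartition(".")
    return f"{head}.xxx" if sep else "masked"


def audit_event(actor_id: str, actor_role: str, action: str,
                resource_type: str, resource_id: str,
                fields_accessed: list[str], ip: str,
                result: str, correlation_id: str) -> str:
    """Build one JSON audit record containing field names, never values."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor_id": actor_id,
        "actor_role": actor_role,
        "action": action,
        "resource_type": resource_type,
        "resource_id": resource_id,
        "fields_accessed": fields_accessed,
        "ip_address": mask_ip(ip),
        "result": result,
        "correlation_id": correlation_id,
    })
```

Passing only field names into `fields_accessed` keeps the audit trail useful for investigations without turning it into a second copy of the sensitive data.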
+---
+
+### 8. Rate Limit Sensitive Endpoints
+
+Prevents automated bulk data harvesting even if an auth vulnerability exists.
+
+```typescript
+// Express + express-rate-limit
+import rateLimit from 'express-rate-limit';
+
+// Aggressive limit for data export endpoint
+const exportLimiter = rateLimit({
+  windowMs: 60 * 60 * 1000, // 1 hour
+  max: 5, // max 5 exports per hour per IP
+  message: 'Too many export requests'
+});
+
+// Standard limit for data lookup
+const lookupLimiter = rateLimit({
+  windowMs: 15 * 60 * 1000, // 15 minutes
+  max: 100
+});
+
+app.get('/api/export', exportLimiter, authMiddleware, exportController);
+app.get('/api/users/:id', lookupLimiter, authMiddleware, userController);
+```
+
+---
+
+## P3 — Fix This Quarter
+
+### 9. Implement Data Retention and Auto-Deletion
+
+**Every table with personal data must have a defined retention policy.**
+
+```sql
+-- Add retention column to all PII tables
+ALTER TABLE users ADD COLUMN retention_expires_at TIMESTAMP;
+ALTER TABLE health_records ADD COLUMN retention_expires_at TIMESTAMP;
+
+-- Set retention at insert time
+INSERT INTO users (email, retention_expires_at) 
+VALUES ($1, NOW() + INTERVAL '7 years');
+
+-- Scheduled job to hard-delete expired records (or anonymize).
+-- Assumes users also has a deletion_notified_at column, set when the user is notified.
+DELETE FROM users 
+WHERE retention_expires_at < NOW() 
+AND deletion_notified_at IS NOT NULL; -- ensure user was notified
+```
+
+**Python scheduled cleanup:**
+```python
+from apscheduler.schedulers.asyncio import AsyncIOScheduler
+
+async def purge_expired_records():
+    await db.execute(
+        "DELETE FROM user_sessions WHERE expires_at < NOW()"
+    )
+    # Anonymize users (don't delete if financial records must be retained)
+    await db.execute("""
+        UPDATE users SET 
+            email = CONCAT('deleted_', id, '@redacted.invalid'),
+            phone = NULL,
+            address = NULL,
+            date_of_birth = NULL,
+            deleted_at = NOW()  -- mark as processed so the row is not re-selected every night
+        WHERE retention_expires_at < NOW() AND deleted_at IS NULL
+    """)
+
+scheduler = AsyncIOScheduler()
+scheduler.add_job(purge_expired_records, 'cron', hour=2)  # 2 AM daily
+scheduler.start()
+```
+
+---
+
+### 10. Pseudonymize Behavioral and Analytics Data
+
+Replace direct user identifiers in analytics with pseudonymous tokens.
+
+```python
+import hashlib
+import hmac
+import os
+from datetime import datetime, timezone
+
+# Secret HMAC key (the "salt"), stored in Key Vault; raises KeyError if unset
+PSEUDONYM_SALT = os.environ["PSEUDONYM_SALT"]
+
+def pseudonymize_user_id(real_user_id: str) -> str:
+    """
+    One-way: analyst can track behavior across sessions
+    but cannot identify the real user without the salt.
+    """
+    return hmac.new(
+        PSEUDONYM_SALT.encode(),
+        str(real_user_id).encode(),
+        hashlib.sha256
+    ).hexdigest()
+
+# In analytics event
+analytics.track({
+    "user_id": pseudonymize_user_id(user.id),  # NOT real user ID
+    "event": "page_viewed",
+    "page": request.path,
+    "timestamp": datetime.now(timezone.utc).isoformat()
+})
+```
+
+---
+
+## Quick Win Checklist (Complete in < 1 day)
+
+- [ ] Search all files for hardcoded secrets → move to env vars / Key Vault
+- [ ] Check all `SELECT *` queries → add explicit column list excluding sensitive fields
+- [ ] Verify storage buckets/containers → block public access
+- [ ] Remove `console.log` / `logger.debug` calls that print request bodies
+- [ ] Add `HttpOnly; Secure; SameSite=Strict` to all session cookies
+- [ ] Verify that `/api/admin/*` routes require admin role check
+- [ ] Confirm password reset tokens expire in < 15 minutes
+- [ ] Check that 500 error responses don't include stack traces in production
+- [ ] Verify `.env` and secret files are in `.gitignore`
+- [ ] Run `git log --all --full-history -- "*.env"` to check for historical secret commits
+
+---
+
+## Blast Radius Reduction by Control Applied
+
+When reporting the hardening roadmap, use these estimates:
+
+| Control Applied | Blast Radius Reduction | Justification |
+|----------------|----------------------|---------------|
+| Fix all IDOR vulnerabilities | 80–90% | Most breach scenarios exploit authorization flaws |
+| Field encryption for T1 data | 75–85% | Encrypted data is useless without KMS key |
+| Remove PII from logs | 40–60% | Log access is often less controlled than DB access |
+| Tokenize payment data | 95% for card data | Standard PCI-DSS compliance eliminates card data scope |
+| Rate limit data endpoints | 30–50% | Limits scale of automated harvesting attacks |
+| Data retention enforcement | 20–40% | Reduces "data lake" effect — less data to steal |
+| Audit logging + anomaly detection | 0% prevention, but -60% detection time | Breaches are caught faster |
+| Pseudonymization of analytics | 60–70% for analytics data | Analytics data decoupled from identity |
+| Architecture: separate analytics from PII | 50–70% | Breach of analytics store has no PII value |
@@ -0,0 +1,320 @@
+# Regulatory Impact Reference
+
+Fine formulas, breach notification timelines, cost benchmarks, and jurisdiction detection patterns for all major global data protection regulations.
+
+> **Disclaimer:** This reference is for risk planning and developer education only. All fine estimates are approximations based on publicly available legal texts and benchmarks cited in `SOURCES.md`. Consult qualified legal counsel for actual regulatory guidance in your jurisdiction.
+
+> **Verifying these numbers:** Every fine formula in this file is sourced from the regulation's primary legal text. See `references/SOURCES.md` for the exact statute/article URL for each figure. If any number looks wrong, check SOURCES.md first — if it's genuinely outdated, please open a PR.
+
+---
+
+## Jurisdiction Detection Patterns
+
+Scan the codebase for these signals to determine which regulations apply:
+
+### GDPR (EU/EEA — General Data Protection Regulation)
+**Trigger signals:**
+```
+# Geographic signals
+- Currency: EUR, GBP (for UK GDPR)
+- Phone formats: +44, +49, +33, +31, +34, +39, +46, +47, +358, +45, +48
+- Locale strings: 'de', 'fr', 'es', 'it', 'nl', 'pl', 'pt', 'sv', 'da', 'fi', 'nb', 'el'
+- Country codes: DE, FR, ES, IT, NL, PL, BE, SE, AT, CH, DK, FI, NO, PT, GR, IE, HU, CZ, RO
+- Cloud regions: eu-west-*, eu-central-*, northeurope, westeurope, francecentral, germanywestcentral
+- Domain TLDs: .de, .fr, .es, .it, .nl, .pl, .eu, .uk, .ie, .at, .se, .dk, .fi, .be, .no, .pt
+
+# Code signals
+- GDPR-related comments or variable names: gdpr, dpa, data_protection, lawful_basis
+- Consent management code: cookie_consent, gdpr_consent, marketing_opt_in
+- Right to erasure endpoints: /delete-account, /forget-me, /data-deletion
+- Data export endpoints: /export-data, /download-my-data, /dsar
+- EU-specific third-party integrations: TrustArc, OneTrust, Cookiebot, Axeptio
+
+# Config signals
+- AWS S3 buckets with eu- prefix
+- Azure storage accounts in European regions
+- GCP storage in europe-* regions
+```
+
+**Applies to:** Any organization processing personal data of EU/EEA residents, regardless of where the organization is based.
+
+---
+
+### CCPA / CPRA (California Consumer Privacy Act, as amended by the California Privacy Rights Act)
+**Trigger signals:**
+```
+# Geographic signals
+- Country: US with state: CA, California
+- Sales tax for California (CA sales tax logic)
+- Phone format: +1 with 213, 310, 323, 408, 415, 424, 510, 530, 562, 619, 626, 650, 707, 714, 805, 818, 831, 858, 909, 916, 925, 949, 951
+
+# Code signals
+- CCPA-related comments: ccpa, california_privacy, do_not_sell, opt_out_of_sale
+- Privacy preference center with California toggle
+- Opt-out links: /do-not-sell, /privacy-choices, /opt-out
+- GPC (Global Privacy Control) header handling
+
+# Business signals
+- Annual gross revenue > $25M (implied by scale signals in codebase)
+- Comments/configs referencing California consumer data
+```
+
+**Applies to:** For-profit businesses meeting any of: annual gross revenue > $25M (a threshold that is adjusted periodically for inflation), buys/sells/shares personal data of 100K+ consumers/households annually, or derives 50%+ of revenue from selling or sharing personal data.
+
+---
+
+### HIPAA (US — Health Insurance Portability and Accountability Act)
+**Trigger signals:**
+```
+# Field name signals (PHI — Protected Health Information)
+- medical_record_number, mrn, patient_id, encounter_id
+- diagnosis, icd_code, icd10, medication, prescription
+- lab_result, test_result, radiology, pathology
+- health_plan_id, insurance_id, claim_number
+- fhir_, hl7_, dicom_
+
+# Integration signals
+- Epic, Cerner, Allscripts, eClinicalWorks API keys or webhooks
+- FHIR API endpoints (/fhir/, /r4/, /stu3/)
+- HL7 message parsing
+- CMS (Centers for Medicare & Medicaid) API integration
+- SNOMED, LOINC, ICD code lookups
+
+# Config signals
+- HIPAA compliance flags or BAA (Business Associate Agreement) references
+- HIPAA-compliant hosting: AWS HIPAA BAA, Azure Healthcare APIs, GCP HIPAA
+- Healthcare-specific cloud: Microsoft Cloud for Healthcare, Google Cloud Healthcare API
+```
+
+**Applies to:** Covered entities (healthcare providers, health plans, clearinghouses) and their Business Associates (vendors who process PHI on their behalf).
+
+---
+
+### LGPD (Brazil — Lei Geral de Proteção de Dados)
+**Trigger signals:**
+```
+# Geographic signals
+- Currency: BRL, R$
+- Phone format: +55
+- Locale: pt-BR, pt_BR
+- Country codes: BR, BRA, Brazil
+- CPF field (Brazilian individual taxpayer registry): cpf, cpf_number
+- CNPJ field (Brazilian company registry): cnpj
+- CEP (Brazilian postal code): cep, codigo_postal (8 digits, XXXXX-XXX format)
+
+# Code signals
+- lgpd references in comments or variable names
+- Brazilian payment integrations: PicPay, Nubank, Mercado Pago, PagSeguro, PIX
+- Brazilian cloud regions: sa-east-1 (AWS São Paulo), brazilsouth (Azure)
+```
+
+**Applies to:** Any processing of personal data of individuals in Brazil, or any processing carried out in Brazil.
+
+---
+
+### PDPA (Multiple Asian jurisdictions)
+
+#### Singapore PDPA
+**Trigger signals:** `+65`, `SGD`, `sg` locale, `.sg` TLD, `nric` field, `fin` (Foreign Identification Number), `singpass`
+
+#### Thailand PDPA
+**Trigger signals:** `+66`, `THB`, `th` locale, `.th` TLD, `thai_id`
+
+#### Malaysia PDPA
+**Trigger signals:** `+60`, `MYR`, `ms` locale, `.my` TLD, `my_kad`, `nric_malaysia`
+
+#### Philippines Data Privacy Act
+**Trigger signals:** `+63`, `PHP` (currency), `ph` locale, `.ph` TLD, `phil_sys_number`
+
+#### Japan APPI (Act on Protection of Personal Information)
+**Trigger signals:** `+81`, `JPY`, `ja` locale, `.jp` TLD, `my_number` (Japanese national ID), `maruhi` (confidential)
+
+---
+
+### Other Regulations (flag if applicable)
+
+| Regulation | Jurisdiction | Key Trigger |
+|-----------|-------------|-------------|
+| PIPEDA / Law 25 | Canada | `+1` + Canadian provinces, `CAD`, `.ca` TLD, SIN field |
+| Australia Privacy Act | Australia | `+61`, `AUD`, `.au` TLD, `tfn` field |
+| POPIA | South Africa | `+27`, South African Rand, `.za` TLD, `sa_id_number` |
+| KVKK | Turkey | `+90`, `TRY`, `.tr` TLD |
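+| DPDP Act 2023 | India | `+91`, `INR`, `aadhaar` field; note: enacted in 2023 (replacing the earlier PDPB drafts) with phased implementation, so verify which provisions are in force |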
+| SOC 2 Type II | US (security standard, not law) | Mentioned in codebase, customer contracts |
+| PCI-DSS | Global (payment card) | Any card number / CVV / PAN field |
+
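
The detection patterns above can be turned into a simple text scan. This is a minimal sketch covering only a handful of the listed signals; the `SIGNALS` table and `detect_jurisdictions` are hypothetical names, and a hit is a prompt for review rather than proof that a regulation applies.

```python
import re

# A small, illustrative subset of the signals listed above.
SIGNALS = {
    "GDPR": [r"gdpr", r"lawful_basis", r"eu-(?:west|central)-\d", r"/forget-me"],
    "CCPA/CPRA": [r"ccpa", r"do_not_sell", r"/do-not-sell"],
    "HIPAA": [r"(?<![a-z])mrn(?![a-z])", r"icd10", r"/fhir/", r"hl7_"],
    "LGPD": [r"lgpd", r"(?<![a-z])cpf(?![a-z])", r"pt[-_]br", r"sa-east-1"],
}


def detect_jurisdictions(text: str) -> dict[str, list[str]]:
    """Map each regulation to the signal patterns found in the text."""
    found = {}
    for regulation, patterns in SIGNALS.items():
        hits = [p for p in patterns if re.search(p, text, re.IGNORECASE)]
        if hits:
            found[regulation] = hits
    return found
```

Short tokens such as `mrn` and `cpf` are guarded with lookarounds so they do not match inside longer words.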
+---
+
+## GDPR Fine Calculator
+
+**Legal source:** GDPR Article 83 — https://gdpr-info.eu/art-83-gdpr/  
+**Exact text, Art. 83.4:** "...up to 10 000 000 EUR, or...up to 2% of the total worldwide annual turnover...whichever is higher"  
+**Exact text, Art. 83.5:** "...up to 20 000 000 EUR, or...up to 4% of the total worldwide annual turnover...whichever is higher"
+
+### Maximum Fines (Article 83)
+```
+Tier 1 violations (less severe — Art. 83.4):
+  Maximum = max(€10,000,000, 2% of global annual turnover)
+  [Note: "whichever is higher" means the LARGER of the two, i.e. max(), not min()]
+
+Tier 2 violations (most severe — Art. 83.5 — core principles, data subject rights, cross-border transfers):
+  Maximum = max(€20,000,000, 4% of global annual turnover)
+```
+
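
The two Article 83 ceilings reduce to a one-line calculation. This is a minimal sketch of the statutory maximum only; `gdpr_max_fine` is a hypothetical helper name, and actual fines are set case by case and are usually far below the ceiling.

```python
def gdpr_max_fine(turnover_eur: int, severe: bool) -> float:
    """Statutory ceiling under Art. 83(4) (severe=False) or Art. 83(5) (severe=True)."""
    fixed_eur, percent = (20_000_000, 4) if severe else (10_000_000, 2)
    # "Whichever is higher": the fixed amount acts as a floor on the ceiling.
    return max(fixed_eur, turnover_eur * percent / 100)


for turnover in (5_000_000, 500_000_000, 2_000_000_000):
    print(turnover, gdpr_max_fine(turnover, severe=False), gdpr_max_fine(turnover, severe=True))
```

For a small company the fixed amount dominates, while above €500M in turnover the percentage term takes over.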
+### Fine Estimation Formula for Risk Planning
+When annual revenue/turnover is unknown, use these conservative estimates:
+
+| Company Profile | Estimated Annual Turnover | Realistic T1 Fine | Realistic T2 Fine |
+|----------------|--------------------------|-------------------|-------------------|
+| Startup (< 10 employees) | < €2M | €25K–€100K | €50K–€250K |
+| Small business (10–50 employees) | €2M–€10M | €50K–€400K | €100K–€800K |
+| Mid-size (50–500 employees) | €10M–€100M | €200K–€2M | €500K–€4M |
+| Large enterprise (500–5K employees) | €100M–€1B | €2M–€20M | €5M–€40M |
+| Multinational | > €1B | Up to 2% of turnover (> €20M; the €10M figure no longer binds) | Up to 4% of turnover (> €40M; the €20M figure no longer binds) |
+
+**Historic GDPR fines for calibration (all publicly verified — links in SOURCES.md):**
+- Meta: €1.2B (2023) — cross-border data transfer violations
+- Amazon: €746M (2021) — targeted advertising without valid consent
+- WhatsApp: €225M (2021) — transparency violations
+- Google: €150M (France, 2022) — refusing cookies was harder than accepting them (imposed by the CNIL under French cookie rules)
+- H&M: €35.3M (2020) — employee monitoring
+- British Airways: £20M (≈ €22M, UK ICO, 2020) — security breach (about 430K records)
+- Marriott: £18.4M (UK ICO, 2020) — security breach (339M records)
+
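+**Breach notification fine enhancement:** For planning purposes, assume non-notification or late notification adds 20–30% to the base fine. This is a modeling heuristic, not a figure from the GDPR text; failure to notify is itself a separate infringement of Art. 33.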
+
+---
+
+## CCPA / CPRA Fine Calculator
+
+**Legal source:** California Civil Code § 1798.155(a) (as amended June 30, 2025, Stats. 2025, Ch. 20) — https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.155  
+**Private right of action source:** California Civil Code § 1798.150 — https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1798.150
+
+```
+Non-intentional violations: $2,500 per violation    [§ 1798.155(a)]
+Intentional violations: $7,500 per violation         [§ 1798.155(a)]
+Children's data violations: $7,500 per violation    [§ 1798.155(a) — intent not required for minors]
+Private right of action: $100–$750 per consumer per incident  [§ 1798.150]
+```
+
+### Calculation for mass breach
+
+```
+Max_CCPA_Fine = Records_affected × $7,500 (if intentional)
+             = Records_affected × $2,500 (if unintentional)
+```
+
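+**Who enforces what:** The California AG and the California Privacy Protection Agency can seek the per-violation penalties above ($2,500, or $7,500 if intentional or involving minors). Separately, consumers can sue under the private right of action for $100–$750 per consumer per incident.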
+
+**Private right of action (unique to CCPA/CPRA):**
+```
+Civil_damages = max(statutory_damages, actual_damages) × affected_California_consumers
+  where $100 ≤ statutory_damages ≤ $750 per consumer per incident
+```
+
+**Examples:**
+- 100K Californian users × $750 = $75M maximum private right of action
+- 100K users × $2,500 = $250M maximum CCPA fine (regulatory)
+
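
The worked examples above can be reproduced with two small functions. This is a minimal sketch with hypothetical helper names; it covers statutory amounts only and leaves out actual damages, which a court may award instead if they are greater.

```python
def ccpa_regulatory_exposure(violations: int, intentional_or_minor: bool = False) -> int:
    """Regulatory penalty ceiling: $2,500 per violation, $7,500 if intentional or involving minors."""
    per_violation = 7_500 if intentional_or_minor else 2_500
    return violations * per_violation


def ccpa_statutory_damages(consumers: int, per_consumer: int = 750) -> int:
    """Private right of action: statutory damages of $100-$750 per consumer per incident."""
    if not 100 <= per_consumer <= 750:
        raise ValueError("statutory damages are $100-$750 per consumer per incident")
    return consumers * per_consumer


print(ccpa_regulatory_exposure(100_000))
print(ccpa_statutory_damages(100_000))
```

Treating each affected record as one violation is the same planning assumption used in the examples above.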
+---
+
+## HIPAA Fine Calculator
+
+**Legal source:** 45 CFR § 160.404 — https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-160/subpart-D/section-160.404  
+**HHS enforcement page:** https://www.hhs.gov/hipaa/for-professionals/compliance-enforcement/examples/all-cases/index.html  
+**Note:** Amounts are HHS inflation-adjusted figures and are updated annually, so confirm the current values at the HHS link above before relying on them.
+
+HIPAA fines are tiered by knowledge/culpability (45 CFR § 160.404):
+
+| Tier | Culpability | Min per Violation | Max per Violation | Annual Cap |
+|------|-------------|-------------------|-------------------|------------|
+| A | Did not know | $137 | $68,928 | $2,067,813 |
+| B | Reasonable cause | $1,379 | $68,928 | $2,067,813 |
+| C | Willful neglect, corrected | $13,785 | $68,928 | $2,067,813 |
+| D | Willful neglect, not corrected | $68,928 | $2,067,813 | $2,067,813 |
+
+**For breach planning:** Each affected patient record where PHI was exposed = 1 violation.
+
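
Under that planning assumption, exposure for one tier is the per-violation amount times the record count, limited by the annual cap. This is a minimal sketch; `hipaa_exposure` is a hypothetical helper, and the amounts are passed in as arguments so they can be updated after each annual adjustment.

```python
def hipaa_exposure(violations: int, per_violation: int, annual_cap: int) -> int:
    """Penalty exposure for identical violations in one calendar year."""
    return min(violations * per_violation, annual_cap)


# Tier A minimum ($137) with the annual cap from the table above.
print(hipaa_exposure(1_000, 137, 2_067_813))
print(hipaa_exposure(100_000, 137, 2_067_813))
```

For large breaches the annual cap, not the record count, determines the result.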
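+**Breach notification costs:** HHS requires notification to affected individuals and to HHS. Breaches affecting more than 500 residents of a state or jurisdiction also require media notification. Breaches affecting 500 or more individuals must be reported to HHS within 60 days of discovery; smaller breaches may be logged and reported to HHS annually.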
+
+**Criminal penalties (DOJ — for egregious cases):**
+- Up to $50,000 + 1 year imprisonment (simple violation)
+- Up to $100,000 + 5 years (under false pretenses)
+- Up to $250,000 + 10 years (with intent to sell/use)
+
+---
+
+## LGPD Fine Calculator (Brazil)
+
+**Legal source:** Lei nº 13.709/2018 (LGPD) — Article 52, II — https://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei/l13709.htm  
+**ANPD (Brazilian DPA):** https://www.gov.br/anpd/pt-br
+
+```
+Maximum fine per violation = 2% of revenue in Brazil in the prior fiscal year  [Art. 52, II]
+Hard cap = R$50,000,000 (≈ $10M USD) per violation                            [Art. 52, II]
+```
+
+Daily fine possible during non-compliance period.  
+**Brazilian DPA (ANPD) enforcement began 2021.** Enforcement ramp-up is ongoing.
+
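
The LGPD ceiling combines a percentage with a hard cap. This is a minimal sketch; `lgpd_max_fine` is a hypothetical helper, and it ignores daily fines and the non-monetary sanctions the ANPD can also apply.

```python
LGPD_CAP_BRL = 50_000_000  # hard cap per violation


def lgpd_max_fine(revenue_brazil_brl: int) -> float:
    """Ceiling per violation: 2% of prior-year revenue in Brazil, capped at R$50M."""
    return min(revenue_brazil_brl * 2 / 100, LGPD_CAP_BRL)


print(lgpd_max_fine(1_000_000_000))
print(lgpd_max_fine(10_000_000_000))
```

Unlike the GDPR formula, the fixed amount here is an upper limit, so `min()` is used rather than `max()`.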
+---
+
+## Breach Notification Timeline Reference
+
+**All timelines are sourced from primary legal texts.** See `SOURCES.md` for exact article/section URLs for each regulation.
+
+How fast you must notify regulators and affected individuals after discovering a breach:
+
+| Regulation | Regulator Notification | Individual Notification | Legal Source | Notes |
+|-----------|----------------------|------------------------|-------------|-------|
+| GDPR | **72 hours** from discovery | "Without undue delay" if high risk | Art. 33 & 34 | Must notify even if details incomplete |
+| UK GDPR | **72 hours** from discovery | Without undue delay | UK GDPR Art. 33 | Retained EU law post-Brexit |
+| CCPA / CPRA | "Most expedient time" (no hard number) | Same | Cal. Civ. Code § 1798.82 | CA AG if > 500 CA residents |
+| HIPAA | **60 days** from discovery | 60 days (or sooner) | 45 CFR §§ 164.404–164.408 | HHS + media for 500+ in one state |
+| LGPD (Brazil) | **3 business days** | As soon as possible | Resolução CD/ANPD nº 15/2024 | ANPD enforcing since 2021 |
+| Singapore PDPA | **3 calendar days** for mandatory breach | Without undue delay | PDPA Section 26D (2021 amendment) | One of the strictest globally |
+| Australia Privacy Act | ASAP, no later than **30 days** | As soon as practicable | Privacy Act 1988 — NDB Scheme | notifiable-data-breaches scheme |
+| PIPEDA (Canada) | **As soon as feasible** | **As soon as feasible** | PIPEDA s.10.1 | OPC (Office of the Privacy Commissioner) notification required |
+| Japan APPI | **3–5 business days** | Promptly | APPI Art. 26 (2022 amendment) | Tightened from prior version |
+
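
For the regimes with a fixed clock, the regulator deadline can be computed directly from the discovery time. This is a minimal sketch with hypothetical names; it covers only the fixed-offset rows of the table and deliberately omits business-day rules and "as soon as feasible" standards, which need legal judgment.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset regulator deadlines taken from the table above.
DEADLINES = {
    "GDPR": timedelta(hours=72),
    "UK GDPR": timedelta(hours=72),
    "HIPAA": timedelta(days=60),
}


def regulator_deadline(regulation: str, discovered_at: datetime) -> datetime:
    """Latest time by which the regulator must be notified."""
    return discovered_at + DEADLINES[regulation]


def hours_left(regulation: str, discovered_at: datetime, now: datetime) -> float:
    """Hours remaining before the deadline (negative once it has passed)."""
    remaining = regulator_deadline(regulation, discovered_at) - now
    return remaining.total_seconds() / 3600


discovered = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
print(regulator_deadline("GDPR", discovered).isoformat())
```

Using timezone-aware datetimes avoids off-by-hours errors when the incident team and the regulator sit in different time zones.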
+---
+
+## Total Breach Cost Estimation Model
+
+**Benchmark source:** IBM Security + Ponemon Institute — "Cost of a Data Breach Report" (annually updated)  
+**URL:** https://www.ibm.com/reports/data-breach  
+Figures below are from the **2024 edition** (last verified). IBM 2025 shows a 9% decrease — download the current PDF for updated values. **[IBM 2024, p.14]** page references refer to the 2024 edition.
+
+Use this model when generating the Financial Impact Estimate section:
+
+### Direct Costs
+```
+1. Detection & escalation:  $1.63M average     [IBM 2024, p.14]
+2. Post-breach response:     $1.35M average     [IBM 2024, p.14]
+3. Lost business:            $1.47M average     [IBM 2024, p.14]
+4. Notification costs:       records × $2–$8 per individual  [industry estimate]
+5. Credit monitoring:        records × $5–$20/year if PII    [industry estimate]
+6. Legal costs:              $200K–$3M depending on complexity [industry estimate]
+7. Forensic investigation:   $50K–$500K                      [industry estimate]
+8. PR/crisis communications: $100K–$500K                     [industry estimate]
+```
+
+### Regulatory Costs
+```
+9. Regulatory fines:         [see per-regulation formulas above — all sourced from law text]
+10. Settlement costs:        $1M–$100M+ for class actions    [historic case data]
+```
+
+### Reputational Multiplier
+Apply based on public visibility of the organization:
+```
+B2C consumer app, consumer brand:     ×1.5 (high reputational damage)
+B2B enterprise, low public profile:  ×1.1 (moderate reputational damage)
+Healthcare or financial institution:  ×2.0 (trust erosion is severe)
+Government or public sector:         ×1.8 (public accountability)
+```
+
+### Final Estimate Format
+```
+Minimum likely cost:   [conservative scenario, good response, small record count]
+Probable cost:         [most likely scenario, average response]
+Maximum exposure:      [worst case: maximum fines + class action + reputational]
+```
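
The pieces of the model can be combined as sketched below. The source does not say what the reputational multiplier applies to, so this sketch assumes it scales direct costs only and that regulatory costs are added afterwards; `breach_cost` and the profile keys are hypothetical names.

```python
# Multipliers from the Reputational Multiplier section above.
REPUTATION_MULTIPLIER = {
    "b2c": 1.5,
    "b2b": 1.1,
    "healthcare_or_financial": 2.0,
    "public_sector": 1.8,
}


def breach_cost(records: int, fixed_costs: float, per_record_cost: float,
                regulatory_costs: float, profile: str) -> float:
    """Estimate total cost: (fixed + per-record direct costs) x multiplier + regulatory costs."""
    direct = fixed_costs + records * per_record_cost
    # Assumption: the multiplier scales direct costs, not fines or settlements.
    return direct * REPUTATION_MULTIPLIER[profile] + regulatory_costs


print(breach_cost(10_000, 1_000_000, 10, 500_000, "healthcare_or_financial"))
```

Running the function three times with conservative, typical, and worst-case inputs yields the minimum, probable, and maximum figures for the final estimate.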
@@ -0,0 +1,305 @@
+# Blast Radius Report Format
+
+Use this template to generate the complete Data Breach Blast Radius report. Fill every section — do not skip any.
+
+---
+
+## Full Report Template
+
+````markdown
+# 💥 Data Breach Blast Radius Report
+
+**Repository:** [repo name or path analyzed]  
+**Analysis date:** [ISO 8601 date]  
+**Scope:** [full repo / specific path]  
+**Languages / frameworks detected:** [list]  
+**Analyzed by:** GitHub Copilot — data-breach-blast-radius skill  
+
+---
+
+## Executive Summary
+
+[2–3 paragraphs in plain English. No technical jargon. Assume the reader is a CEO, CISO, or board member who will ask: "How bad would it be?"
+
+Paragraph 1: What data does this system hold and roughly how many people are affected?
+Paragraph 2: What is the single most dangerous exposure vector found? What would happen if it were exploited today?
+Paragraph 3: What is the estimated financial and regulatory impact? What is the most important thing to fix first?]
+
+---
+
+## Sensitive Data Inventory
+
+All personal, health, financial, and credential data found in the codebase:
+
+| # | Field Name | Source Location | Data Tier | Category | Encrypted? | Logged? | External Exposure? |
+|---|-----------|----------------|-----------|----------|-----------|---------|-------------------|
+| 1 | `email` | `models/user.py:14` | T3 — High | Contact | ❌ No | ⚠️ Yes | ✅ API response |
+| 2 | `ssn` | `models/employee.py:28` | T1 — Catastrophic | Gov. ID | ❌ No | ❌ No | ❌ No |
+| 3 | `card_number` | `models/payment.py:9` | T2 — Critical | PCI-DSS | ⚠️ Partial | ❌ No | ❌ No |
+| ... | ... | ... | ... | ... | ... | ... | ... |
+
+**Summary:**
+- Tier 1 (Catastrophic) fields: [N]
+- Tier 2 (Critical) fields: [N]
+- Tier 3 (High) fields: [N]
+- Tier 4 (Elevated) fields: [N]
+
+---
+
+## Data Flow Map
+
+How sensitive data moves through the system. Read left to right: ingestion → processing → storage → transmission.
+
+```mermaid
+flowchart LR
+    subgraph Ingestion["📥 Ingestion"]
+        A1[User Registration\nPOST /api/users\nT3: email, phone\nT2: date_of_birth]
+        A2[Payment\nPOST /api/payments\nT2: card_number, cvv]
+        A3[Health Record\nPOST /api/health\nT1: diagnosis, mrn]
+    end
+
+    subgraph Processing["⚙️ Processing"]
+        B1[Auth Service\nJWT issued\nT3: email in token]
+        B2[Payment Processor\nStripe tokenization\nT2: card → token]
+        B3[Analytics\nMixpanel events\nT3: email logged ⚠️]
+    end
+
+    subgraph Storage["🗄️ Storage"]
+        C1[(PostgreSQL\nusers table\nT1+T2+T3 data\nNo field encryption)]
+        C2[(Redis Cache\nSession data\nT3: email in cache)]
+        C3[(S3 Bucket\nUser exports\n⚠️ Public read ACL)]
+    end
+
+    subgraph Transmission["📤 Transmission"]
+        D1[REST API\n/api/users/:id\n⚠️ No ownership check\nT1+T2+T3 in response]
+        D2[Email notifications\nT3: email body contains\nfull name + order details]
+        D3[Webhooks\nT3: email in payload]
+    end
+
+    A1 --> B1 --> C1 --> D1
+    A2 --> B2 --> C1
+    A3 --> C1
+    B1 --> C2
+    A1 --> B3
+    C1 --> D2
+    C1 --> D3
+    C1 --> C3
+
+    style C3 fill:#ff6b6b,color:#fff
+    style D1 fill:#ff6b6b,color:#fff
+    style B3 fill:#ffa500,color:#fff
+```
+
+---
+
+## Top Exposure Vectors
+
+Ranked by Blast Radius Score (highest first):
+
+### 🔴 Vector 1: [Title] — BRS: [score]/100
+
+**Location:** `[file path]:[line number]`  
+**Type:** [IDOR / Unauthenticated endpoint / Public storage / Log leakage / Over-fetching API / etc.]  
+**Data exposed:** [T1/T2/T3 fields that would be exposed]  
+**Exploitation:** [1–2 sentences — how an attacker would use this]  
+**Records at risk:** [number or estimate]  
+**Jurisdictions triggered:** [GDPR / CCPA / HIPAA / etc.]
+
+```[language]
+// Vulnerable code snippet (exact location)
+[code]
+```
+
+**Blast Radius Score breakdown:**
+- Data tier: T[N] → weight [W]
+- Exposure likelihood: [E] ([label])
+- Population at risk: [N] records → scale [P]
+- Completeness: [factor] ([label])
+- Context multiplier: ×[M] ([reason])
+- **BRS: [calculated score]/100**
+
+---
+
+### 🔴 Vector 2: [Title] — BRS: [score]/100
+
+[repeat structure]
+
+---
+
+### 🟠 Vector 3: [Title] — BRS: [score]/100
+
+[repeat structure]
+
+---
+
+### 🟠 Vector 4: [Title] — BRS: [score]/100
+
+[repeat structure]
+
+---
+
+### 🟡 Vector 5: [Title] — BRS: [score]/100
+
+[repeat structure]
+
+---
+
+## Regulatory Blast Radius
+
+### Jurisdictions Triggered
+
+| Regulation | Triggered? | Trigger Evidence | Notification Deadline |
+|-----------|-----------|-----------------|----------------------|
+| GDPR | [Yes/No/Unknown] | [e.g., EUR currency, EU cloud region] | 72 hours |
+| CCPA | [Yes/No/Unknown] | [e.g., California users, US domain] | Expedient |
+| HIPAA | [Yes/No/Unknown] | [e.g., PHI fields found, FHIR endpoints] | 60 days |
+| LGPD | [Yes/No/Unknown] | [e.g., BRL currency, CPF field] | 2 business days |
+| Singapore PDPA | [Yes/No/Unknown] | [e.g., SGD, +65 phone patterns] | 3 calendar days |
+| PCI-DSS | [Yes/No/Unknown] | [e.g., card_number field found] | Immediate |
+
+---
+
+## Financial Impact Estimate
+
+> These are risk planning estimates only. Consult legal counsel for actual regulatory exposure.
+
+### Maximum Simultaneous Exposure
+- **Total records at risk (worst case):** [number]
+- **Tier 1 records (catastrophic data):** [number]
+- **Estimated affected individuals:** [number]
+- **Active regulatory jurisdictions:** [list]
+
+### Financial Impact Range
+
+| Scenario | Estimated Cost | Key Assumptions |
+|---------|---------------|----------------|
+| **Minimum** (fast response, few records, cooperative regulatory outcome) | $[X] | [assumptions] |
+| **Probable** (industry average response time, moderate regulatory action) | $[X] | [assumptions] |
+| **Maximum** (slow detection, maximum fines, class action) | $[X] | [assumptions] |
+
+### Breakdown (Probable Scenario)
+
+| Cost Category | Estimate |
+|--------------|---------|
+| Detection & containment | $[X] |
+| Post-breach response | $[X] |
+| Legal & forensics | $[X] |
+| Breach notification & monitoring | $[X] |
+| Regulatory fines ([jurisdictions]) | $[X] |
+| Reputational/business impact | $[X] |
+| **Total estimated cost** | **$[X]** |
+
+**Cost benchmarks used:** IBM Cost of a Data Breach Report 2024 ($4.88M global average, $165/record average) — verify current figures at ibm.com/reports/data-breach
+
+---
+
+## Hardening Roadmap
+
+Prioritized by `(Blast_Radius_Reduction × Severity) / Effort`:
+
+### 🔴 P0 — Fix Immediately (< 1 day each)
+
+| # | Action | File / Location | Blast Radius Reduction | Effort | Severity |
+|---|--------|----------------|----------------------|--------|---------|
+| 1 | [Fix IDOR on /api/users/:id — add ownership check] | `routes/users.ts:45` | 85% for this vector | ⚡ Low | CRITICAL |
+| 2 | [Remove SSN from API response DTO] | `dtos/employee.dto.ts:22` | 90% for SSN exposure | ⚡ Low | CRITICAL |
+| 3 | [Block public read ACL on S3 bucket] | `infra/storage.tf:14` | 100% for S3 exposure | ⚡ Low | HIGH |
+
+---
+
+### 🟠 P1 — Fix This Week
+
+| # | Action | File / Location | Blast Radius Reduction | Effort | Severity |
+|---|--------|----------------|----------------------|--------|---------|
+| 4 | [Encrypt SSN field with KMS] | `models/employee.py:28` | 80% for SSN field | 🔧 Medium | HIGH |
+| 5 | [Remove email from log statements (7 locations)] | `services/auth.py:66,89,121...` | 60% for log vector | 🔧 Medium | HIGH |
+| 6 | [Tokenize card data — migrate to Stripe Elements] | `services/payment.py` | 95% for card data | 🔧 Medium | CRITICAL |
+
+---
+
+### 🟡 P2 — Fix This Sprint
+
+| # | Action | Blast Radius Reduction | Effort | Severity |
+|---|--------|----------------------|--------|---------|
+| 7 | [Add rate limiting to /api/users/search] | 50% for bulk harvest | ⚡ Low | MEDIUM |
+| 8 | [Add data access audit log for T1/T2 reads] | 60% reduction in detection time | 🔧 Medium | HIGH |
+| 9 | [Add field projection to user query (remove unused fields from SELECT)] | 40% reduction in over-fetching | ⚡ Low | MEDIUM |
+
+---
+
+### ⚪ P3 — Fix This Quarter
+
+| # | Action | Blast Radius Reduction | Effort | Severity |
+|---|--------|----------------------|--------|---------|
+| 10 | [Implement data retention policy + auto-deletion job] | 30% reduction in stale data | 🏗️ High | MEDIUM |
+| 11 | [Pseudonymize analytics user IDs] | 70% for analytics data | 🔧 Medium | MEDIUM |
+| 12 | [Separate analytics store from production PII DB] | 60% architectural reduction | 🏗️ High | LOW |
+
+---
+
+## Analysis Assumptions
+
+Document all assumptions made during this analysis (transparency is critical):
+
+| Assumption | Value Used | Basis |
+|-----------|-----------|-------|
+| User population estimate | [X users] | [signal found or conservative default] |
+| Annual revenue estimate for fine calculation | [unknown / $X range] | [signals or not found] |
+| Geographic distribution | [assumed global / EU users likely] | [currency signals found] |
+| Healthcare context | [assumed / not applicable] | [PHI fields found / not found] |
+
+---
+
+## What Was Scanned
+
+- **Files analyzed:** [list key files or note "all files in repo"]
+- **Data model files:** [list schema/model files]
+- **API layer:** [list controller/route files]
+- **Config/infrastructure:** [list .env, terraform, CI/CD files]
+- **Log/monitoring:** [list logging config files]
+- **Test data:** [note if test fixtures contain real PII]
+
+---
+
+*This report was generated by the [data-breach-blast-radius](https://github.com/github/awesome-copilot/tree/main/skills/data-breach-blast-radius) skill for GitHub Copilot.*  
+*For risk planning purposes only. Consult qualified legal counsel and security professionals for actual regulatory guidance.*
+````
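
The Hardening Roadmap in the template orders actions by `(Blast_Radius_Reduction × Severity) / Effort`. The template gives only labels for severity and effort, so the sketch below assumes numeric weights (`SEVERITY_WEIGHT`, `EFFORT_WEIGHT`) purely for illustration; the `Action` class and the weights are hypothetical and should be replaced with whatever scale the analysis actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed numeric scales; the template only names the labels.
SEVERITY_WEIGHT = {"CRITICAL": 4, "HIGH": 3, "MEDIUM": 2, "LOW": 1}
EFFORT_WEIGHT = {"Low": 1, "Medium": 2, "High": 4}


@dataclass(frozen=True)
class Action:
    title: str
    reduction_pct: float  # blast radius reduction for the vector, 0-100
    severity: str         # key of SEVERITY_WEIGHT
    effort: str           # key of EFFORT_WEIGHT

    def priority(self) -> float:
        """(Blast_Radius_Reduction x Severity) / Effort."""
        return (
            self.reduction_pct
            * SEVERITY_WEIGHT[self.severity]
            / EFFORT_WEIGHT[self.effort]
        )


def rank(actions: list[Action]) -> list[Action]:
    """Return actions ordered from highest to lowest priority."""
    return sorted(actions, key=lambda a: a.priority(), reverse=True)


# Sample rows taken from the roadmap tables in the template.
actions = [
    Action("Encrypt SSN field with KMS", 80, "HIGH", "Medium"),
    Action("Fix IDOR on /api/users/:id", 85, "CRITICAL", "Low"),
    Action("Implement data retention policy", 30, "MEDIUM", "High"),
]
ranked = rank(actions)
for action in ranked:
    print(f"{action.priority():6.1f}  {action.title}")
```

With these assumed weights, a low-effort critical fix outranks a medium-effort high-severity one, which matches the P0 and P1 ordering shown in the template.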
+
+---
+
+## Mermaid Diagram Conventions
+
+Use these conventions in the Data Flow Map:
+
+```
+# Node colors (using style declarations):
+🔴 fill:#ff6b6b,color:#fff  → Public/unauthenticated exposure (CRITICAL)
+🟠 fill:#ffa500,color:#fff  → Auth required but weak controls (HIGH)
+🟡 fill:#ffd700,color:#000  → Internal but over-broad access (MEDIUM)
+🟢 fill:#51cf66,color:#fff  → Properly secured (GOOD)
+
+# Node labels should include:
+- Action name
+- HTTP method + path (for API nodes)
+- Data tiers present (T1, T2, T3, T4)
+- ⚠️ Warning emoji if an issue exists
+
+# Subgraphs:
+- Ingestion (📥)
+- Processing (⚙️)
+- Storage (🗄️)
+- Transmission (📤)
+```
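
The color conventions above can be applied consistently by generating the `style` lines instead of typing them by hand. This is a minimal Python sketch; the helper name `style_lines` and the level keys are assumptions, while the color values are copied from the conventions above.

```python
from __future__ import annotations

# Fill and text colors from the conventions above, keyed by exposure level.
STYLE_BY_LEVEL = {
    "CRITICAL": "fill:#ff6b6b,color:#fff",  # public/unauthenticated exposure
    "HIGH": "fill:#ffa500,color:#fff",      # auth required but weak controls
    "MEDIUM": "fill:#ffd700,color:#000",    # internal but over-broad access
    "GOOD": "fill:#51cf66,color:#fff",      # properly secured
}


def style_lines(node_levels: dict[str, str], indent: str = "    ") -> list[str]:
    """Return one Mermaid `style` declaration per node id."""
    return [
        f"{indent}style {node_id} {STYLE_BY_LEVEL[level]}"
        for node_id, level in node_levels.items()
    ]


# Node ids match the example Data Flow Map in the template.
lines = style_lines({"C3": "CRITICAL", "D1": "CRITICAL", "B3": "HIGH"})
print("\n".join(lines))
```

An unknown level raises `KeyError`, which keeps a typo from silently producing an unstyled node.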
+
+---
+
+## Severity Icons
+
+| Symbol | Severity / Status | BRS Range or Meaning |
+|--------|---------|-----------|
+| 🔴 | CRITICAL | 76–100 |
+| 🟠 | HIGH | 51–75 |
+| 🟡 | MEDIUM | 26–50 |
+| 🔵 | LOW | 0–25 |
+| ✅ | SECURE | Control in place |
+| ⚠️ | WARNING | Partial control |
+| ❌ | VULNERABLE | No control |
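
The BRS ranges in the table map directly to a lookup. The sketch below is one way to express it in Python; the function name `severity_for_brs` is hypothetical, and it assumes that a fractional score between two bands (for example 75.5) belongs to the lower band, which the table does not specify.

```python
from __future__ import annotations


def severity_for_brs(score: float) -> tuple[str, str]:
    """Map a Blast Radius Score (0-100) to its (icon, severity) pair."""
    if not 0 <= score <= 100:
        raise ValueError(f"BRS must be between 0 and 100, got {score}")
    if score >= 76:
        return ("🔴", "CRITICAL")
    if score >= 51:
        return ("🟠", "HIGH")
    if score >= 26:
        return ("🟡", "MEDIUM")
    return ("🔵", "LOW")


icon, label = severity_for_brs(82)
print(f"### {icon} Vector 1: [Title] - BRS: 82/100 ({label})")
```

Using this lookup for the vector headings keeps the icon consistent with the computed score rather than with the vector's position in the list.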