# Hardening Playbook
Prioritized controls to reduce data breach blast radius. Controls are organized by **impact category** and include tech-stack-specific implementation patterns. Each control includes a **blast radius reduction estimate**.
> **How to use:** After identifying exposure vectors, match each to a control below. Sort your hardening roadmap by `(Blast_Radius_Reduction × Severity) / Effort`.
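The sort key above can be turned into a small scoring helper. A minimal sketch, assuming a reduction expressed as a fraction and 1-5 scales for severity and effort (the field names and scales are illustrative, not part of the playbook):

```python
# Rank hardening controls by (blast_radius_reduction * severity) / effort.
# Scales are assumptions: reduction in [0, 1], severity 1-5, effort 1-5 (higher = harder).

def priority_score(reduction: float, severity: int, effort: int) -> float:
    """Higher score = do it sooner."""
    return (reduction * severity) / effort

controls = [
    {"name": "Fix IDOR on /api/orders", "reduction": 0.90, "severity": 5, "effort": 1},
    {"name": "Encrypt T1 fields",       "reduction": 0.80, "severity": 5, "effort": 3},
    {"name": "Rate limit exports",      "reduction": 0.60, "severity": 3, "effort": 1},
]

roadmap = sorted(
    controls,
    key=lambda c: priority_score(c["reduction"], c["severity"], c["effort"]),
    reverse=True,
)
for c in roadmap:
    print(c["name"], round(priority_score(c["reduction"], c["severity"], c["effort"]), 2))
```

With these numbers the quick authorization fix outranks the (higher-effort) encryption work, which matches the P0/P1 ordering in the matrix.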
---
## Control Priority Matrix
| Priority | Control | Blast Radius Reduction | Effort | Category |
|----------|---------|----------------------|--------|---------|
| P0 | Fix IDOR/BOLA — add ownership checks | 90% for affected vector | Low | Authorization |
| P0 | Remove sensitive fields from API responses | 85% for affected fields | Low | Data Minimization |
| P0 | Revoke publicly accessible storage (S3/Blob) | 100% for affected store | Low | Access Control |
| P0 | Remove plaintext credentials from code/logs | 100% for affected secret | Low | Secrets |
| P1 | Add field-level encryption for T1 data | 80% for encrypted fields | Medium | Encryption |
| P1 | Mask/tokenize PCI card data | 95% for card exposure | Medium | Tokenization |
| P1 | Remove PII from log statements | 70% for log exposure | Medium | Logging |
| P1 | Add authentication to unauthenticated endpoints | 95% for exposed endpoints | Low | Authentication |
| P2 | Implement data access audit logging | -50% detection time | Medium | Monitoring |
| P2 | Enable database activity monitoring | -60% detection time | Medium | Monitoring |
| P2 | Add rate limiting to sensitive endpoints | 60% reduction in data harvesting | Low | Rate Limiting |
| P2 | Column-level encryption for T2 sensitive data | 70% for encrypted columns | Medium | Encryption |
| P3 | Implement data retention + auto-deletion | 40% reduction in stale data exposure | High | Data Lifecycle |
| P3 | Separate analytics store from production PII | 60% for analytics breach | High | Architecture |
| P3 | Pseudonymize behavioral tracking data | 70% for behavioral data | Medium | Pseudonymization |
---
## P0 — Fix Immediately (< 1 day)
### 1. Fix Authorization: IDOR / BOLA
**What it fixes:** Broken Object Level Authorization — users can access other users' data by changing an ID.
**Detection pattern in code:**
```python
# VULNERABLE — no ownership check
@app.get("/api/orders/{order_id}")
def get_order(order_id: int):
    return db.query(Order).filter(Order.id == order_id).first()

# SECURE — ownership check
@app.get("/api/orders/{order_id}")
def get_order(order_id: int, current_user: User = Depends(get_current_user)):
    order = db.query(Order).filter(
        Order.id == order_id,
        Order.user_id == current_user.id  # ownership check
    ).first()
    if not order:
        raise HTTPException(status_code=404)
    return order
```
```typescript
// VULNERABLE
app.get('/api/users/:id/profile', authenticate, async (req, res) => {
  const user = await User.findById(req.params.id);
  res.json(user);
});

// SECURE
app.get('/api/users/:id/profile', authenticate, async (req, res) => {
  if (req.params.id !== req.user.id && !req.user.isAdmin) {
    return res.status(403).json({ error: 'Forbidden' });
  }
  const user = await User.findById(req.params.id);
  res.json(user);
});
```
```csharp
// VULNERABLE
[HttpGet("orders/{orderId}")]
public async Task<IActionResult> GetOrder(int orderId)
{
    var order = await _db.Orders.FindAsync(orderId);
    return Ok(order);
}

// SECURE
[HttpGet("orders/{orderId}")]
[Authorize]
public async Task<IActionResult> GetOrder(int orderId)
{
    var userId = User.FindFirst(ClaimTypes.NameIdentifier)?.Value;
    var order = await _db.Orders
        .Where(o => o.Id == orderId && o.UserId == userId)
        .FirstOrDefaultAsync();
    if (order == null) return NotFound();
    return Ok(order);
}
```
---
### 2. Remove Sensitive Fields from API Responses
**What it fixes:** Over-fetching — APIs return more data than the client needs.
**Pattern:**
```typescript
// VULNERABLE — returns all fields including passwordHash, ssn
const user = await User.findById(id);
res.json(user);

// SECURE — explicit projection
const user = await User.findById(id).select('id name email createdAt');
res.json(user);
```
```python
# SECURE — Pydantic response model (FastAPI)
class UserPublicResponse(BaseModel):
    id: int
    name: str
    email: str
    # NOTE: password_hash, ssn, date_of_birth NOT included

@app.get("/api/users/{id}", response_model=UserPublicResponse)
def get_user(id: int):
    return db.query(User).filter(User.id == id).first()
```
```java
// SECURE — response DTO that exposes only safe fields
// (alternatively, annotate sensitive entity fields with @JsonIgnore)
public class UserResponse {
    public String id;
    public String name;
    public String email;
    // passwordHash, ssn not included in DTO
}
```
---
### 3. Remove Plaintext Credentials from Code
**Detection patterns:**
```
# Patterns to search for in all files:
password\s*=\s*["'][^"']+["']
api_key\s*=\s*["'][^"']+["']
secret\s*=\s*["'][^"']+["']
token\s*=\s*["'][^"']+["']
connectionString\s*=\s*["'][^"']+["']
```
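These patterns can be wired into a quick one-off scanner. A minimal sketch (for real repositories, dedicated tools such as gitleaks or trufflehog are a better fit):

```python
import re

# The detection patterns from above, compiled case-insensitively
# so API_KEY = "..." and apiKey = "..." are both caught.
SECRET_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in [
        r"password\s*=\s*[\"'][^\"']+[\"']",
        r"api_key\s*=\s*[\"'][^\"']+[\"']",
        r"secret\s*=\s*[\"'][^\"']+[\"']",
        r"token\s*=\s*[\"'][^\"']+[\"']",
        r"connectionString\s*=\s*[\"'][^\"']+[\"']",
    ]
]

def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that match any secret pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

Expect false positives (test fixtures, placeholder values); the point is to produce a short list to review by hand.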
**Fix pattern:**
```python
# VULNERABLE
DATABASE_URL = "postgresql://user:p@ssw0rd@prod-db.example.com/mydb"
# SECURE
import os
DATABASE_URL = os.environ.get("DATABASE_URL")
# In production: use Azure Key Vault, AWS Secrets Manager, or GCP Secret Manager
```
---
## P1 — Fix This Week
### 4. Field-Level Encryption for Tier 1 Data
Encrypt sensitive fields **before** storing them. The encryption key lives in a KMS, not in the database.
**Python / SQLAlchemy + Azure Key Vault:**
```python
from azure.keyvault.secrets import SecretClient  # key is fetched from Key Vault at startup
from cryptography.fernet import Fernet

# Encrypt at write time
def encrypt_field(value: str, key: bytes) -> str:
    f = Fernet(key)
    return f.encrypt(value.encode()).decode()

# Decrypt at read time (only when authorized)
def decrypt_field(encrypted_value: str, key: bytes) -> str:
    f = Fernet(key)
    return f.decrypt(encrypted_value.encode()).decode()
```
**Node.js / Prisma + AWS KMS:**
```typescript
import { KMSClient, EncryptCommand, DecryptCommand } from "@aws-sdk/client-kms";

const kms = new KMSClient({ region: "us-east-1" });

async function encryptField(plaintext: string): Promise<string> {
  const { CiphertextBlob } = await kms.send(new EncryptCommand({
    KeyId: process.env.KMS_KEY_ARN,
    Plaintext: Buffer.from(plaintext),
  }));
  return Buffer.from(CiphertextBlob!).toString('base64');
}
```
**C# / EF Core + Azure Key Vault:**
```csharp
// Use Always Encrypted for SQL Server / Azure SQL,
// or manually encrypt with Azure Key Vault
services.AddDbContext<AppDbContext>(options =>
    options.UseSqlServer(connectionString)
           .EnableSensitiveDataLogging(false)); // lives on the options builder, not the SQL Server options

// In entity:
[Column(TypeName = "nvarchar(500)")]
public string EncryptedSsn { get; set; } // store Base64 ciphertext
```
**Fields that MUST be field-encrypted (Tier 1):**
- SSN / national ID numbers
- Passport numbers
- Full payment card numbers (better: use tokenization, see below)
- Medical record data / diagnoses
- Biometric templates
---
### 5. Tokenize Payment Card Data
**Never store full card numbers.** Use a PCI-compliant vault instead.
**Recommended providers:**
- Stripe (tokenizes via Elements/PaymentIntents — you never touch card numbers)
- Braintree / PayPal
- Adyen
- Square
**Pattern:**
```typescript
// CORRECT — use Stripe's tokenization
const paymentMethod = await stripe.paymentMethods.create({
  type: 'card',
  card: { token: cardToken }, // token from client-side Stripe.js
});
// Store: paymentMethod.id (token) — never the card number

// WRONG — never do this
const cardNumber = req.body.cardNumber; // Tier 1 data; PCI-DSS violation
await db.save({ userId, cardNumber }); // DO NOT store raw card data
```
---
### 6. Remove PII from Log Statements
**Pattern to search for and fix:**
```python
# VULNERABLE
logger.info(f"User {user.email} logged in")
logger.debug(f"Payment by {user.full_name}, card ending {card_last4}")
# SECURE — log opaque identifiers, not PII
logger.info(f"User {user.id} authenticated", extra={"user_id": user.id})
logger.debug("Payment processed", extra={"user_id": user.id, "payment_id": payment_id})
```
```typescript
// VULNERABLE
console.log(`Processing order for ${user.email} at ${user.address}`);
// SECURE
logger.info('Processing order', { userId: user.id, orderId: order.id });
```
**Structured logging fields that are SAFE to log:**
- Internal user ID (UUID/opaque)
- Session ID (if short-lived and not externally shared)
- Transaction/correlation IDs
- Error codes and error types
- Timestamps
- HTTP status codes
- Duration/latency
**Structured logging fields that are UNSAFE:**
- Email addresses
- IP addresses (must be masked — drop the last octet before logging)
- Full names
- Phone numbers
- Any Tier 1-3 sensitive fields
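The IP-masking rule above can be sketched as a small helper (an illustrative sketch; it assumes IPv4 input and falls back to redacting anything else rather than guessing at IPv6 structure):

```python
def mask_ip(ip: str) -> str:
    """Mask the last octet of an IPv4 address before logging."""
    parts = ip.split(".")
    if len(parts) != 4:
        # Not dotted-quad IPv4: redact entirely rather than leak part of it
        return "masked"
    return ".".join(parts[:3] + ["0"])
```

Applied once in a logging filter or middleware, this keeps subnet-level information for debugging while removing the host-identifying octet.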
---
## P2 — Fix This Sprint
### 7. Implement Data Access Audit Logging
Every read/write of Tier 1 and Tier 2 data must be logged to an immutable audit log.
**What to log:**
```
{
  timestamp: ISO8601,
  actor_id: "user UUID",
  actor_role: "admin|user|service",
  action: "READ|WRITE|DELETE|EXPORT",
  resource_type: "User|HealthRecord|PaymentMethod",
  resource_id: "UUID of accessed record",
  fields_accessed: ["email", "phone"], // NOT the values
  ip_address: "masked IP",
  result: "success|denied",
  correlation_id: "request trace ID"
}
```
**Do NOT log the actual sensitive field values in the audit log.**
**Separation:** Store audit logs in a **separate** database/storage account with stricter access controls than the application database.
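A sketch of serializing one such event (the structure follows the schema above; how the JSON line reaches the separate audit store is left to your logging pipeline and is an assumption here):

```python
import json
from datetime import datetime, timezone

def audit_event(actor_id: str, actor_role: str, action: str,
                resource_type: str, resource_id: str,
                fields_accessed: list[str], ip_address: str,
                result: str, correlation_id: str) -> str:
    """Serialize one audit record. Only field NAMES are recorded, never values."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor_id": actor_id,
        "actor_role": actor_role,
        "action": action,
        "resource_type": resource_type,
        "resource_id": resource_id,
        "fields_accessed": fields_accessed,  # names only, not values
        "ip_address": ip_address,            # caller masks before passing in
        "result": result,
        "correlation_id": correlation_id,
    }
    return json.dumps(record)
```

Emitting the record as a single JSON line keeps it friendly to append-only sinks and log shippers.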
---
### 8. Rate Limit Sensitive Endpoints
Prevents automated bulk data harvesting even if an auth vulnerability exists.
```typescript
// Express + express-rate-limit
import rateLimit from 'express-rate-limit';

// Aggressive limit for data export endpoint
const exportLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  max: 5, // max 5 exports per hour per IP
  message: 'Too many export requests'
});

// Standard limit for data lookup
const lookupLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100
});

app.get('/api/export', exportLimiter, authMiddleware, exportController);
app.get('/api/users/:id', lookupLimiter, authMiddleware, userController);
```
---
## P3 — Fix This Quarter
### 9. Implement Data Retention and Auto-Deletion
**Every table with personal data must have a defined retention policy.**
```sql
-- Add retention column to all PII tables
ALTER TABLE users ADD COLUMN retention_expires_at TIMESTAMP;
ALTER TABLE health_records ADD COLUMN retention_expires_at TIMESTAMP;
-- Set retention at insert time
INSERT INTO users (email, retention_expires_at)
VALUES ($1, NOW() + INTERVAL '7 years');
-- Scheduled job to hard-delete expired records (or anonymize)
DELETE FROM users
WHERE retention_expires_at < NOW()
AND deletion_notified_at IS NOT NULL; -- ensure user was notified
```
**Python scheduled cleanup:**
```python
from apscheduler.schedulers.asyncio import AsyncIOScheduler

async def purge_expired_records():
    await db.execute(
        "DELETE FROM user_sessions WHERE expires_at < NOW()"
    )
    # Anonymize users (don't delete if financial records must be retained)
    await db.execute("""
        UPDATE users SET
            email = CONCAT('deleted_', id, '@redacted.invalid'),
            phone = NULL,
            address = NULL,
            date_of_birth = NULL
        WHERE retention_expires_at < NOW() AND deleted_at IS NULL
    """)

scheduler = AsyncIOScheduler()
scheduler.add_job(purge_expired_records, 'cron', hour=2)  # 2 AM daily
scheduler.start()
```
---
### 10. Pseudonymize Behavioral and Analytics Data
Replace direct user identifiers in analytics with pseudonymous tokens.
```python
import hashlib
import hmac
import os
from datetime import datetime

PSEUDONYM_SALT = os.environ.get("PSEUDONYM_SALT")  # stored in Key Vault

def pseudonymize_user_id(real_user_id: str) -> str:
    """
    One-way: analyst can track behavior across sessions
    but cannot identify the real user without the salt.
    """
    return hmac.new(
        PSEUDONYM_SALT.encode(),
        real_user_id.encode(),
        hashlib.sha256
    ).hexdigest()

# In analytics event
analytics.track({
    "user_id": pseudonymize_user_id(user.id),  # NOT real user ID
    "event": "page_viewed",
    "page": request.path,
    "timestamp": datetime.utcnow().isoformat()
})
```
---
## Quick Win Checklist (Complete in < 1 day)
- [ ] Search all files for hardcoded secrets → move to env vars / Key Vault
- [ ] Check all `SELECT *` queries → add explicit column list excluding sensitive fields
- [ ] Verify storage buckets/containers → block public access
- [ ] Remove `console.log` / `logger.debug` calls that print request bodies
- [ ] Add `HttpOnly; Secure; SameSite=Strict` to all session cookies
- [ ] Verify that `/api/admin/*` routes require admin role check
- [ ] Confirm password reset tokens expire in < 15 minutes
- [ ] Check that 500 error responses don't include stack traces in production
- [ ] Verify `.env` and secret files are in `.gitignore`
- [ ] Run `git log --all --full-history -- "*.env"` to check for historical secret commits
---
## Blast Radius Reduction by Control Applied
When reporting the hardening roadmap, use these estimates:
| Control Applied | Blast Radius Reduction | Justification |
|----------------|----------------------|---------------|
| Fix all IDOR vulnerabilities | 80-90% | Most breach scenarios exploit authorization flaws |
| Field encryption for T1 data | 75-85% | Encrypted data is useless without KMS key |
| Remove PII from logs | 40-60% | Log access is often less controlled than DB access |
| Tokenize payment data | 95% for card data | Standard PCI-DSS compliance eliminates card data scope |
| Rate limit data endpoints | 30-50% | Limits scale of automated harvesting attacks |
| Data retention enforcement | 20-40% | Reduces "data lake" effect — less data to steal |
| Audit logging + anomaly detection | 0% prevention, but -60% detection time | Breaches are caught faster |
| Pseudonymization of analytics | 60-70% for analytics data | Analytics data decoupled from identity |
| Architecture: separate analytics from PII | 50-70% | Breach of analytics store has no PII value |