9.3 KiB
9.3 KiB
Data Protection Reference
Overview
Data protection encompasses safeguarding sensitive information throughout its lifecycle: collection, processing, storage, transmission, and disposal. Security failures at any stage can lead to data breaches.
Sensitive Data Categories
Personal Identifiable Information (PII)
- Full names, addresses, phone numbers
- Email addresses
- Social Security Numbers, national IDs
- Dates of birth
- Biometric data
Financial Information
- Credit card numbers (PAN)
- Bank account numbers
- Financial transactions
- Payment credentials
Authentication Credentials
- Passwords (plaintext or weakly hashed)
- API keys and tokens
- Session identifiers
- Private keys
Health Information (PHI/HIPAA)
- Medical records
- Health conditions
- Treatment information
- Insurance data
Sensitive Data Exposure Prevention
1. Data Classification
Classify all data by sensitivity level:
| Level | Examples | Handling |
|---|---|---|
| Public | Marketing content | No restrictions |
| Internal | Employee directory | Access controls |
| Confidential | Customer data | Encryption + access controls |
| Restricted | Passwords, keys, PCI data | Strong encryption + audit logs |
2. Minimize Data Collection
# VULNERABLE: Collecting unnecessary data
user_data = {
'name': form.name,
'email': form.email,
'ssn': form.ssn, # Why do you need this?
'mother_maiden_name': form.mother_maiden_name, # Security risk
'password': form.password, # Never store plaintext
}
# SAFE: Collect only what's needed
user_data = {
'name': form.name,
'email': form.email,
}
3. Encryption at Rest
# Database-level encryption
# Configure in database settings (TDE for SQL Server, etc.)
# Application-level encryption for specific fields
from cryptography.fernet import Fernet
def encrypt_ssn(ssn):
f = Fernet(get_encryption_key())
return f.encrypt(ssn.encode())
def decrypt_ssn(encrypted_ssn):
f = Fernet(get_encryption_key())
return f.decrypt(encrypted_ssn).decode()
4. Encryption in Transit
# VULNERABLE: HTTP endpoint
app.run(host='0.0.0.0', port=80)
# SAFE: HTTPS required
app.run(host='0.0.0.0', port=443, ssl_context='adhoc')
# BETTER: Proper TLS configuration
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ssl_context.load_cert_chain('cert.pem', 'key.pem')
ssl_context.minimum_version = ssl.TLSVersion.TLSv1_2
Information Disclosure Prevention
Error Messages
# VULNERABLE: Detailed error messages
@app.errorhandler(Exception)
def handle_error(e):
return {
'error': str(e),
'traceback': traceback.format_exc(),
'sql_query': last_query,
'server': socket.gethostname()
}, 500
# SAFE: Generic error messages
@app.errorhandler(Exception)
def handle_error(e):
# Log full details server-side
app.logger.error(f"Error: {e}", exc_info=True)
# Return generic message to client
return {'error': 'An unexpected error occurred'}, 500
Stack Traces
# VULNERABLE: Debug mode in production
app.run(debug=True)
# SAFE: Debug off, custom error pages
app.run(debug=False)
@app.errorhandler(404)
def not_found(e):
return render_template('404.html'), 404
@app.errorhandler(500)
def server_error(e):
return render_template('500.html'), 500
API Response Filtering
# VULNERABLE: Returning all fields
@app.route('/api/users/<id>')
def get_user(id):
user = User.query.get(id)
return jsonify(user.__dict__) # Includes password_hash, internal_id, etc.
# SAFE: Explicit field selection
@app.route('/api/users/<id>')
def get_user(id):
user = User.query.get(id)
return jsonify({
'id': user.public_id,
'name': user.name,
'email': user.email
})
Server Headers
# VULNERABLE: Technology disclosure
# Response headers reveal:
# Server: Apache/2.4.41 (Ubuntu)
# X-Powered-By: PHP/7.4.3
# X-AspNet-Version: 4.0.30319
# SAFE: Remove or genericize headers
# In nginx:
# server_tokens off;
# In Express.js:
app.disable('x-powered-by');
# In Flask:
@app.after_request
def remove_headers(response):
response.headers.pop('Server', None)
return response
Logging Security
What NOT to Log
# VULNERABLE: Logging sensitive data
logger.info(f"User login: {username}, password: {password}")
logger.info(f"API call with key: {api_key}")
logger.info(f"Credit card: {card_number}")
logger.debug(f"Session token: {session_id}")
# SAFE: Sanitized logging
logger.info(f"User login: {username}")
logger.info(f"API call with key: {api_key[:4]}****")
logger.info(f"Credit card: ****{card_number[-4:]}")
logger.debug(f"Session token: {hash_for_logging(session_id)}")
Sensitive Data Patterns to Avoid in Logs
| Data Type | Pattern |
|---|---|
| Passwords | password, passwd, pwd, secret |
| API Keys | api_key, apikey, token, bearer |
| Credit Cards | 16-digit numbers, card_number |
| SSN | \d{3}-\d{2}-\d{4}, ssn, social |
| Session IDs | session, sess_id, jsessionid |
Log Injection Prevention
# VULNERABLE: User input directly in logs
logger.info(f"Search query: {user_input}")
# Attack: user_input = "test\nINFO: Admin logged in"
# SAFE: Sanitize before logging
def sanitize_for_log(text):
return text.replace('\n', '\\n').replace('\r', '\\r')
logger.info(f"Search query: {sanitize_for_log(user_input)}")
Secure Data Disposal
Memory Handling
# Python strings are immutable - difficult to clear
# Use bytearray for sensitive data when possible
# BETTER: Clear sensitive data
import ctypes
def secure_zero(data):
"""Zero out sensitive data in memory."""
if isinstance(data, bytearray):
for i in range(len(data)):
data[i] = 0
elif isinstance(data, bytes):
# Can't modify bytes, but can overwrite the reference
pass
# In Java:
# char[] password = getPassword();
# try { ... }
# finally { Arrays.fill(password, '\0'); }
File Deletion
# VULNERABLE: Simple delete (data recoverable)
os.remove(sensitive_file)
# SAFER: Overwrite before delete
def secure_delete(filepath):
with open(filepath, 'ba+') as f:
length = f.tell()
f.seek(0)
f.write(os.urandom(length)) # Random overwrite
f.flush()
os.fsync(f.fileno())
os.remove(filepath)
Database Retention
# Implement data retention policies
def cleanup_old_data():
cutoff = datetime.now() - timedelta(days=RETENTION_DAYS)
# Delete old records
OldRecord.query.filter(OldRecord.created_at < cutoff).delete()
# Or anonymize instead of delete
User.query.filter(User.last_login < cutoff).update({
'email': func.concat('deleted_', User.id, '@example.com'),
'name': 'Deleted User',
'phone': None
})
Cache Security
# VULNERABLE: Caching sensitive data
@cache.cached(timeout=3600)
def get_user_with_ssn(user_id):
return User.query.get(user_id) # Includes SSN
# SAFE: Don't cache sensitive data
def get_user_with_ssn(user_id):
return User.query.get(user_id) # Not cached
# Or cache only non-sensitive parts
@cache.cached(timeout=3600)
def get_user_profile(user_id):
user = User.query.get(user_id)
return {
'id': user.id,
'name': user.name,
# SSN excluded
}
Cache Headers
# For sensitive pages
response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
response.headers['Pragma'] = 'no-cache'
response.headers['Expires'] = '0'
Grep Patterns for Detection
# Sensitive data in logs
grep -rn "logger.*password\|log.*password\|print.*password" --include="*.py" --include="*.js"
grep -rn "logger.*token\|log.*api_key\|print.*secret" --include="*.py" --include="*.js"
# Debug mode
grep -rn "debug.*[Tt]rue\|DEBUG.*=.*1" --include="*.py" --include="*.js" --include="*.env"
# Stack traces in responses
grep -rn "traceback\|stack_trace\|exc_info" --include="*.py" | grep -i "return\|response\|json"
# Verbose errors
grep -rn "str(e)\|str(exception)" --include="*.py" | grep -i "return\|response"
# Technology disclosure
grep -rn "X-Powered-By\|Server:" --include="*.py" --include="*.js" --include="*.conf"
# Missing cache headers
grep -rn "Set-Cookie\|session" --include="*.py" | grep -v "Cache-Control"
Testing Checklist
- Sensitive data encrypted at rest
- All transmissions over TLS 1.2+
- Error messages are generic (no stack traces, SQL errors, paths)
- Logging excludes sensitive data (passwords, tokens, PII)
- API responses filtered to necessary fields only
- Server headers don't reveal technology stack
- Sensitive pages have no-cache headers
- Data retention policies implemented
- Secure deletion procedures for sensitive files
- Debug mode disabled in production