admin 2a3dedde11

2026-01-30 03:04:10 +00:00

9.3 KiB

Data Protection Reference

Overview

Data protection means safeguarding sensitive information throughout its lifecycle: collection, processing, storage, transmission, and disposal. A failure at any one of these stages can expose data, so each stage needs its own controls.

Sensitive Data Categories

Personally Identifiable Information (PII)

Full names, addresses, phone numbers
Email addresses
Social Security Numbers, national IDs
Dates of birth
Biometric data

Financial Information

Credit card numbers (PAN)
Bank account numbers
Financial transactions
Payment credentials

Authentication Credentials

Passwords (plaintext or weakly hashed)
API keys and tokens
Session identifiers
Private keys

Health Information (PHI/HIPAA)

Medical records
Health conditions
Treatment information
Insurance data

Sensitive Data Exposure Prevention

1. Data Classification

Classify all data by sensitivity level:

Level	Examples	Handling
Public	Marketing content	No restrictions
Internal	Employee directory	Access controls
Confidential	Customer data	Encryption + access controls
Restricted	Passwords, keys, PCI data	Strong encryption + audit logs
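
One way to make the table actionable is to tag each field with a level and derive its handling from that tag. The sketch below is illustrative only: the `Sensitivity` enum, the `FIELD_LEVELS` mapping, and both helper functions are hypothetical names, and a real mapping would come from your own data inventory.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Sensitivity levels from the table above, ordered least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical field-to-level mapping; a real one comes from a data inventory.
FIELD_LEVELS = {
    "marketing_copy": Sensitivity.PUBLIC,
    "employee_directory": Sensitivity.INTERNAL,
    "customer_email": Sensitivity.CONFIDENTIAL,
    "password_hash": Sensitivity.RESTRICTED,
}


def requires_encryption(field: str) -> bool:
    """Unknown fields default to RESTRICTED so unclassified data fails safe."""
    level = FIELD_LEVELS.get(field, Sensitivity.RESTRICTED)
    return level >= Sensitivity.CONFIDENTIAL


def requires_audit_log(field: str) -> bool:
    """Only the Restricted level calls for audit logging in the table."""
    level = FIELD_LEVELS.get(field, Sensitivity.RESTRICTED)
    return level >= Sensitivity.RESTRICTED
```

Defaulting unknown fields to the strictest level is a design choice: a field nobody classified is treated as sensitive until someone says otherwise.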

2. Minimize Data Collection

# VULNERABLE: Collecting unnecessary data
user_data = {
    'name': form.name,
    'email': form.email,
    'ssn': form.ssn,  # Why do you need this?
    'mother_maiden_name': form.mother_maiden_name,  # Security risk
    'password': form.password,  # Never store plaintext
}

# SAFE: Collect only what's needed
user_data = {
    'name': form.name,
    'email': form.email,
}
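
Minimization is easier to enforce with an allowlist than by remembering which fields to leave out. A minimal sketch, assuming form input arrives as a plain dict; `ALLOWED_SIGNUP_FIELDS` and `minimize` are hypothetical names:

```python
# Hypothetical allowlist for a signup form; adjust per form and per purpose.
ALLOWED_SIGNUP_FIELDS = frozenset({"name", "email"})


def minimize(form_data: dict) -> dict:
    """Keep only allowlisted fields; anything else is dropped before storage."""
    return {k: v for k, v in form_data.items() if k in ALLOWED_SIGNUP_FIELDS}
```

With this in place, a new form field is discarded by default until someone deliberately adds it to the allowlist.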

3. Encryption at Rest

# Database-level encryption
# Configure in database settings (TDE for SQL Server, etc.)

# Application-level encryption for specific fields
from cryptography.fernet import Fernet

def encrypt_ssn(ssn):
    f = Fernet(get_encryption_key())
    return f.encrypt(ssn.encode())

def decrypt_ssn(encrypted_ssn):
    f = Fernet(get_encryption_key())
    return f.decrypt(encrypted_ssn).decode()
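
The example above calls `get_encryption_key()` without defining it. One possible sketch loads the key from an environment variable and checks its shape before use. The variable name `FIELD_ENCRYPTION_KEY` is an assumption, and a production system would more likely fetch the key from a secrets manager or KMS.

```python
import base64
import os


def get_encryption_key() -> bytes:
    """Load a Fernet key from the environment (variable name is an assumption)."""
    raw = os.environ.get("FIELD_ENCRYPTION_KEY")
    if not raw:
        raise RuntimeError("FIELD_ENCRYPTION_KEY is not set")
    key = raw.encode("ascii")
    # A Fernet key is 32 random bytes, URL-safe base64-encoded.
    if len(base64.urlsafe_b64decode(key)) != 32:
        raise ValueError("FIELD_ENCRYPTION_KEY must decode to 32 bytes")
    return key
```

Failing loudly at startup on a missing or malformed key is preferable to discovering the problem on the first decrypt.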

4. Encryption in Transit

# VULNERABLE: HTTP endpoint
app.run(host='0.0.0.0', port=80)

# DEVELOPMENT ONLY: 'adhoc' generates a throwaway self-signed certificate
app.run(host='0.0.0.0', port=443, ssl_context='adhoc')

# BETTER: Proper TLS configuration with a real certificate
import ssl

ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ssl_context.load_cert_chain('cert.pem', 'key.pem')
ssl_context.minimum_version = ssl.TLSVersion.TLSv1_2
app.run(host='0.0.0.0', port=443, ssl_context=ssl_context)

Information Disclosure Prevention

Error Messages

# VULNERABLE: Detailed error messages
@app.errorhandler(Exception)
def handle_error(e):
    return {
        'error': str(e),
        'traceback': traceback.format_exc(),
        'sql_query': last_query,
        'server': socket.gethostname()
    }, 500

# SAFE: Generic error messages
@app.errorhandler(Exception)
def handle_error(e):
    # Log full details server-side
    app.logger.error(f"Error: {e}", exc_info=True)

    # Return generic message to client
    return {'error': 'An unexpected error occurred'}, 500
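
A generic message is harder to support unless the client can quote something that maps back to the server-side log entry. The sketch below adds a random error ID for that purpose. `safe_error_response` and the logger name are hypothetical, and the function is framework-neutral so it could be called from the handler above.

```python
import logging
import uuid

logger = logging.getLogger("app.errors")


def safe_error_response(exc: Exception) -> tuple[dict, int]:
    """Log full details server-side; return only a generic message and an ID."""
    error_id = uuid.uuid4().hex
    # exc_info carries the traceback into the log record, not the response.
    logger.error("Unhandled error %s", error_id, exc_info=exc)
    return {"error": "An unexpected error occurred", "error_id": error_id}, 500
```

The ID is random and carries no information about the failure, so it is safe to show to the client.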

Stack Traces

# VULNERABLE: Debug mode in production
app.run(debug=True)

# SAFE: Debug off, custom error pages
app.run(debug=False)

@app.errorhandler(404)
def not_found(e):
    return render_template('404.html'), 404

@app.errorhandler(500)
def server_error(e):
    return render_template('500.html'), 500

API Response Filtering

# VULNERABLE: Returning all fields
@app.route('/api/users/<id>')
def get_user(id):
    user = User.query.get(id)
    return jsonify(user.__dict__)  # Includes password_hash, internal_id, etc.

# SAFE: Explicit field selection
@app.route('/api/users/<id>')
def get_user(id):
    user = User.query.get(id)
    return jsonify({
        'id': user.public_id,
        'name': user.name,
        'email': user.email
    })
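
Explicit field selection can be centralized so every endpoint shares one allowlist. This is a sketch under assumptions: the `User` dataclass stands in for the ORM model, and `PUBLIC_USER_FIELDS` and `to_public_dict` are hypothetical names.

```python
from dataclasses import asdict, dataclass


@dataclass
class User:
    """Stand-in for the ORM model; field names follow the example above."""
    internal_id: int
    public_id: str
    name: str
    email: str
    password_hash: str


# Only these fields may ever leave the API.
PUBLIC_USER_FIELDS = ("public_id", "name", "email")


def to_public_dict(user: User) -> dict:
    """Serialize only allowlisted fields, so new columns stay private by default."""
    data = asdict(user)
    return {field: data[field] for field in PUBLIC_USER_FIELDS}
```

Because the allowlist names what is included, adding a sensitive column to the model later does not change any API response.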

Server Headers

# VULNERABLE: Technology disclosure
# Response headers reveal:
# Server: Apache/2.4.41 (Ubuntu)
# X-Powered-By: PHP/7.4.3
# X-AspNet-Version: 4.0.30319

# SAFE: Remove or genericize headers
# In nginx:
# server_tokens off;

# In Express.js:
app.disable('x-powered-by');

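# In Flask: the Server header is normally added by the WSGI server
# (Werkzeug dev server, gunicorn, ...) after this hook runs, so configure
# it there or strip it at the reverse proxy. This hook only removes a
# Server header that application code has already set:
@app.after_request
def remove_headers(response):
    response.headers.pop('Server', None)
    return response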

Logging Security

What NOT to Log

# VULNERABLE: Logging sensitive data
logger.info(f"User login: {username}, password: {password}")
logger.info(f"API call with key: {api_key}")
logger.info(f"Credit card: {card_number}")
logger.debug(f"Session token: {session_id}")

# SAFE: Sanitized logging
logger.info(f"User login: {username}")
logger.info(f"API call with key: {api_key[:4]}****")
logger.info(f"Credit card: ****{card_number[-4:]}")
logger.debug(f"Session token: {hash_for_logging(session_id)}")
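
The sanitized example calls `hash_for_logging`, which is not defined in this reference. One possible sketch uses a keyed HMAC so that log lines for the same session can be correlated without the digest being usable as the session ID. `LOG_HASH_KEY` is a placeholder and would need to come from configuration.

```python
import hashlib
import hmac

# Assumption: a per-deployment secret loaded from configuration, never hard-coded.
LOG_HASH_KEY = b"replace-with-secret-from-config"


def hash_for_logging(value: str, length: int = 12) -> str:
    """Return a short keyed digest that correlates log lines without exposing the value."""
    digest = hmac.new(LOG_HASH_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]
```

A keyed digest is used instead of a plain hash so that someone with log access cannot confirm a guessed token by hashing it themselves.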

Sensitive Data Patterns to Avoid in Logs

Data Type	Pattern
Passwords	`password`, `passwd`, `pwd`, `secret`
API Keys	`api_key`, `apikey`, `token`, `bearer`
Credit Cards	13 to 19-digit numbers, `card_number`
SSN	`\d{3}-\d{2}-\d{4}`, `ssn`, `social`
Session IDs	`session`, `sess_id`, `jsessionid`
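
These patterns can also be applied as a safety net at logging time. The sketch below is a `logging.Filter` that redacts matches before any handler sees the record. The regexes are illustrative approximations of the table and will produce both false positives and misses, so treat this as a backstop and keep sensitive values out of log calls in the first place.

```python
import logging
import re

# Patterns drawn from the table above; tune them for your own log formats.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,18}\b"), "[REDACTED-PAN]"),
    (
        re.compile(r"(?i)\b(password|passwd|pwd|secret|api_?key|token)\b(\s*[=:]\s*)\S+"),
        r"\1\2[REDACTED]",
    ),
]


def redact(text: str) -> str:
    """Apply every redaction pattern in order."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Rewrite each record's final message before any handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # message is already fully formatted
        return True
```

Attach it with `logger.addFilter(RedactingFilter())` on the loggers that should be covered, or on a handler to cover everything routed through it.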

Log Injection Prevention

# VULNERABLE: User input directly in logs
logger.info(f"Search query: {user_input}")
# Attack: user_input = "test\nINFO: Admin logged in"

# SAFE: Sanitize before logging
def sanitize_for_log(text):
    return text.replace('\n', '\\n').replace('\r', '\\r')

logger.info(f"Search query: {sanitize_for_log(user_input)}")
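
Newlines are not the only characters that can break log lines: other control characters and Unicode line separators can have the same effect in some viewers. A slightly broader sketch, with the hypothetical name `escape_for_log`, escapes every non-printable character and caps the length:

```python
def escape_for_log(text: str, max_len: int = 500) -> str:
    """Escape non-printable characters and cap length so input stays on one line."""
    escaped = "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )
    if len(escaped) > max_len:
        escaped = escaped[:max_len] + "...[truncated]"
    return escaped
```

Like the shorter version above, this does not escape backslashes themselves, so a literal backslash-n typed by the user looks the same as an escaped newline. Structured (for example JSON) logging avoids that ambiguity.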

Secure Data Disposal

Memory Handling

# Python strings are immutable - difficult to clear
# Use bytearray for sensitive data when possible

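# BETTER: Keep secrets in a bytearray and clear it after use
def secure_zero(data):
    """Zero out sensitive data in memory (best effort in CPython)."""
    if isinstance(data, bytearray):
        for i in range(len(data)):
            data[i] = 0
    else:
        # bytes and str are immutable: they cannot be wiped in place and
        # linger until garbage-collected, so refuse rather than pretend
        raise TypeError("secure_zero() requires a bytearray")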

# In Java:
# char[] password = getPassword();
# try { ... }
# finally { Arrays.fill(password, '\0'); }
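
The Java try/finally pattern has a rough Python counterpart. The sketch below holds the secret in a `bytearray` and wipes it even when the consumer raises. `with_secret` and its two callables are hypothetical names. This is best effort only: any `bytes` or `str` copy made along the way (including what `load_secret` returns, if it returns `bytes`) is not cleared.

```python
def secure_zero(buf: bytearray) -> None:
    """Overwrite a mutable buffer in place with zero bytes."""
    buf[:] = bytes(len(buf))


def with_secret(load_secret, use_secret):
    """Hold a secret in a bytearray and wipe it even if use_secret raises."""
    secret = bytearray(load_secret())
    try:
        return use_secret(secret)
    finally:
        secure_zero(secret)
```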

File Deletion

# VULNERABLE: Simple delete (data recoverable)
os.remove(sensitive_file)

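# SAFER: Overwrite before delete (best effort: SSD wear leveling, journaling
# or copy-on-write filesystems, and backups may still retain the data)
def secure_delete(filepath):
    length = os.path.getsize(filepath)
    # 'r+b' overwrites in place; append mode ('a') would write past the end
    with open(filepath, 'r+b') as f:
        f.write(os.urandom(length))  # Random overwrite
        f.flush()
        os.fsync(f.fileno())
    os.remove(filepath)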

Database Retention

# Implement data retention policies
def cleanup_old_data():
    cutoff = datetime.now() - timedelta(days=RETENTION_DAYS)

    # Delete old records
    OldRecord.query.filter(OldRecord.created_at < cutoff).delete()

    # Or anonymize instead of delete
    User.query.filter(User.last_login < cutoff).update({
        'email': func.concat('deleted_', User.id, '@example.com'),
        'name': 'Deleted User',
        'phone': None
    })

    # Persist both changes (assumes a Flask-SQLAlchemy session named `db`)
    db.session.commit()
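
The anonymization rule can be expressed independently of the ORM, which makes it easy to unit test. A minimal sketch with hypothetical names (`UserRecord`, `anonymize_if_stale`) and an assumed `RETENTION_DAYS` value:

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 365  # Assumed policy value; set from your own retention schedule.


@dataclass(frozen=True)
class UserRecord:
    id: int
    name: str
    email: str
    phone: Optional[str]
    last_login: datetime


def anonymize_if_stale(user: UserRecord, now: datetime) -> UserRecord:
    """Return an anonymized copy when last_login is older than the retention window."""
    cutoff = now - timedelta(days=RETENTION_DAYS)
    if user.last_login >= cutoff:
        return user
    return replace(
        user,
        name="Deleted User",
        email=f"deleted_{user.id}@example.com",
        phone=None,
    )
```

Passing `now` in as a parameter keeps the function deterministic, so the cutoff logic can be tested without patching the clock.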

Cache Security

# VULNERABLE: Caching sensitive data
@cache.cached(timeout=3600)
def get_user_with_ssn(user_id):
    return User.query.get(user_id)  # Includes SSN

# SAFE: Don't cache sensitive data
def get_user_with_ssn(user_id):
    return User.query.get(user_id)  # Not cached

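# Or cache only non-sensitive parts
# (assuming `cache` is a Flask-Caching instance: memoize includes the
# function arguments in the cache key, which cached does not)
@cache.memoize(timeout=3600)
def get_user_profile(user_id):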
    user = User.query.get(user_id)
    return {
        'id': user.id,
        'name': user.name,
        # SSN excluded
    }

Cache Headers

# For sensitive pages
response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
response.headers['Pragma'] = 'no-cache'
response.headers['Expires'] = '0'
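
To avoid repeating these three lines in every view, the decision can live in one helper that a response hook calls. The sketch below is framework-neutral. `SENSITIVE_PREFIXES` and `cache_headers_for` are hypothetical names, and the path prefixes are placeholders.

```python
# Header values as given above.
NO_STORE_HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}

# Hypothetical path prefixes that serve sensitive content.
SENSITIVE_PREFIXES = ("/account", "/api/users")


def cache_headers_for(path: str) -> dict:
    """Return no-store headers for sensitive paths, nothing for the rest."""
    if path.startswith(SENSITIVE_PREFIXES):
        return dict(NO_STORE_HEADERS)
    return {}
```

A response hook can then call `response.headers.update(cache_headers_for(request.path))` or the equivalent in your framework.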

Grep Patterns for Detection

# Sensitive data in logs
grep -rn "logger.*password\|log.*password\|print.*password" --include="*.py" --include="*.js"
grep -rn "logger.*token\|log.*api_key\|print.*secret" --include="*.py" --include="*.js"

# Debug mode
grep -rn "debug.*[Tt]rue\|DEBUG.*=.*1" --include="*.py" --include="*.js" --include="*.env"

# Stack traces in responses
grep -rn "traceback\|stack_trace\|exc_info" --include="*.py" | grep -i "return\|response\|json"

# Verbose errors
grep -rn "str(e)\|str(exception)" --include="*.py" | grep -i "return\|response"

# Technology disclosure
grep -rn "X-Powered-By\|Server:" --include="*.py" --include="*.js" --include="*.conf"

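# Session/cookie handling to review for missing cache headers
# (grep -v filters line by line, so confirm Cache-Control in each handler manually)
grep -rn "Set-Cookie\|session" --include="*.py" | grep -v "Cache-Control"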
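
Where grep is unavailable, or the checks need to run in CI, a few of these patterns can be ported to a small Python scanner. This is a sketch covering only a subset of the patterns. `CHECKS`, `scan_text`, and `scan_tree` are hypothetical names, and like the grep commands it reports candidates for review, not confirmed findings.

```python
import re
from pathlib import Path

# A subset of the grep patterns above, expressed as Python regexes.
CHECKS = {
    "sensitive-log": re.compile(r"(logger|log|print).*(password|token|api_key|secret)"),
    "debug-mode": re.compile(r"debug.*[Tt]rue|DEBUG.*=.*1"),
    "verbose-error": re.compile(r"str\((e|exception)\)"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, check_name) for every line that trips a check."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_tree(root: str, suffixes=(".py", ".js")) -> dict[str, list[tuple[int, str]]]:
    """Scan matching files under root; undecodable bytes are replaced, not fatal."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(errors="replace"))
            if hits:
                results[str(path)] = hits
    return results
```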

Testing Checklist

Sensitive data encrypted at rest
All transmissions over TLS 1.2+
Error messages are generic (no stack traces, SQL errors, paths)
Logging excludes sensitive data (passwords, tokens, PII)
API responses filtered to necessary fields only
Server headers don't reveal technology stack
Sensitive pages have no-cache headers
Data retention policies implemented
Secure deletion procedures for sensitive files
Debug mode disabled in production
