sales-data-analysis/.cursor/rules/code_quality.md

# Code Quality & Best Practices

**Comprehensive guide for writing Cursor-optimized code in the sales analysis template.**

This document combines code quality standards and Cursor best practices to ensure AI assistants can effectively understand, modify, and extend the codebase.

## Type Hints

### When to Use Type Hints

Use type hints for:
- Function parameters
- Return values
- Class attributes
- Complex data structures

### Example Pattern

```python
from typing import Callable, Optional
import pandas as pd

def calculate_annual_metrics(
    df: pd.DataFrame,
    metrics_func: Callable[[pd.DataFrame], dict],
    ltm_start: Optional[pd.Period] = None,
    ltm_end: Optional[pd.Period] = None
) -> pd.DataFrame:
    """
    Calculate annual metrics for all years.

    Args:
        df: DataFrame with 'Year' and 'YearMonth' columns
        metrics_func: Function that takes a DataFrame and returns a dict of metrics
        ltm_start: LTM start period (defaults to config if None)
        ltm_end: LTM end period (defaults to config if None)

    Returns:
        DataFrame with 'Year' index and metric columns
    """
    # Implementation
```
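
The pattern above covers parameters and return values. For class attributes and complex data structures, a minimal sketch might look like the following. `LtmWindow` and `MetricsByYear` are hypothetical names used only for illustration; they are not part of the template.

```python
from dataclasses import dataclass
from typing import Dict, Optional

import pandas as pd

# Hypothetical alias: maps a year label to its metric name/value pairs.
MetricsByYear = Dict[str, Dict[str, float]]


@dataclass
class LtmWindow:
    """Hypothetical container for an LTM window (illustration only)."""

    start: pd.Period
    end: pd.Period
    label: Optional[str] = None  # Optional display label for charts

    def month_count(self) -> int:
        """Return the number of months covered, inclusive of both ends."""
        # Assumes both periods use monthly frequency, so ordinals count months.
        return self.end.ordinal - self.start.ordinal + 1
```

Typed attributes and a named alias give both readers and AI assistants the shape of the data without having to trace where each value is assigned.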

## Docstrings

### Docstring Format

All functions should use Google-style docstrings:

```python
def function_name(param1: type, param2: type) -> return_type:
    """
    Brief description of what the function does.

    More detailed explanation if needed. Can span multiple lines.
    Explain any complex logic or important considerations.

    Args:
        param1: Description of param1
        param2: Description of param2

    Returns:
        Description of return value

    Raises:
        ValueError: When and why this exception is raised

    Example:
        >>> result = function_name(value1, value2)
        >>> print(result)
        expected_output
    """
```

### Required Elements

- Brief one-line summary
- Detailed description (if needed)
- Args section (all parameters)
- Returns section (return value)
- Raises section (if exceptions raised)
- Example section (for complex functions)
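
As an illustration of all six elements in one place, here is a small sketch. `calculate_growth_rate` is a hypothetical function, not one of the template utilities.

```python
def calculate_growth_rate(current: float, prior: float) -> float:
    """
    Calculate period-over-period growth as a fraction.

    Growth is measured relative to the prior period, so the prior
    value must be non-zero.

    Args:
        current: Value for the current period
        prior: Value for the prior period (must be non-zero)

    Returns:
        Growth as a fraction, e.g. 0.25 for 25% growth

    Raises:
        ValueError: If prior is zero, because growth is undefined

    Example:
        >>> calculate_growth_rate(125.0, 100.0)
        0.25
    """
    if prior == 0:
        raise ValueError(
            "Cannot compute growth: prior period value is 0. "
            "Filter out periods with no prior revenue first."
        )
    return (current - prior) / prior
```

Because the Example section uses doctest syntax, it can be checked with `python -m doctest` on the module that contains the function.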

## Variable Naming

### Conventions

- **Descriptive names:** `customer_revenue` not `cr`
- **Consistent prefixes:** `df_` for DataFrames, `annual_` for annual metrics
- **Clear abbreviations:** `ltm` for Last Twelve Months (well-known)
- **Avoid single letters:** Except for loop variables (`i`, `j`, `k`)

### Good Examples

```python
# Good
customer_revenue_by_year = df.groupby(['Customer', 'Year'])[REVENUE_COLUMN].sum()
annual_metrics_df = calculate_annual_metrics(df, metrics_func)
ltm_start_period, ltm_end_period = get_ltm_period_config()

# Bad
cr = df.groupby(['C', 'Y'])['R'].sum()
am = calc(df, mf)
s, e = get_ltm()
```

## Error Messages

### Structure

Error messages should be:
1. **Specific:** What exactly went wrong
2. **Actionable:** How to fix it
3. **Contextual:** Where it occurred
4. **Helpful:** Reference to documentation

### Good Error Messages

```python
# Good
raise ValueError(
    f"Required column '{REVENUE_COLUMN}' not found in data.\n"
    f"Available columns: {list(df.columns)}\n"
    f"Please update config.py REVENUE_COLUMN to match your data.\n"
    f"See .cursor/rules/data_loading.md for more help."
)

# Bad
raise ValueError("Column not found")
```
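
One way to keep messages consistent is to build them in a single helper. The sketch below assumes a hypothetical `build_missing_column_error` function; the template may organize this differently.

```python
from typing import Iterable


def build_missing_column_error(
    column: str,
    available: Iterable[str],
    config_name: str,
    doc_path: str,
) -> ValueError:
    """Build a ValueError that is specific, contextual, actionable, and helpful."""
    message = (
        f"Required column '{column}' not found in data.\n"  # Specific
        f"Available columns: {list(available)}\n"  # Contextual
        f"Please update config.py {config_name} to match your data.\n"  # Actionable
        f"See {doc_path} for more help."  # Helpful
    )
    return ValueError(message)


# Usage: build the error, then raise it where the check fails.
error = build_missing_column_error(
    column="USD",
    available=["Customer", "Amount"],
    config_name="REVENUE_COLUMN",
    doc_path=".cursor/rules/data_loading.md",
)
```

Returning the exception instead of raising it inside the helper keeps the `raise` statement at the call site, so tracebacks point at the failing check.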

## Code Comments

### When to Comment

- Complex logic that isn't immediately obvious
- Business rules or domain-specific knowledge
- Workarounds or non-obvious solutions
- Performance considerations
- TODO items with context

### Comment Style

```python
# Good: Explains WHY, not WHAT
# Use LTM for most recent year to enable apples-to-apples comparison
# with full calendar years (avoids partial year bias)
if year == LTM_END_YEAR and LTM_ENABLED:
    year_data = get_ltm_data(df, ltm_start, ltm_end)

# Bad: States the obvious
# Check if year equals LTM_END_YEAR
if year == LTM_END_YEAR:
    year_data = get_ltm_data(df, ltm_start, ltm_end)
```

## Function Design

### Single Responsibility

Each function should do one thing well:

```python
# Good: Single responsibility
def calculate_revenue(df: pd.DataFrame) -> float:
    """Calculate total revenue from DataFrame."""
    return df[REVENUE_COLUMN].sum()

def calculate_customer_count(df: pd.DataFrame) -> int:
    """Calculate unique customer count."""
    return df[CUSTOMER_COLUMN].nunique()

# Bad: Multiple responsibilities
def calculate_metrics(df):
    """Calculate revenue and customer count"""
    revenue = df[REVENUE_COLUMN].sum()
    customers = df[CUSTOMER_COLUMN].nunique()
    return revenue, customers
```

### Function Length

- Keep functions under 50 lines when possible
- Break complex functions into smaller helper functions
- Use descriptive function names that explain purpose
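
A rough sketch of the helper-function approach: small single-purpose functions do the work, and a short orchestrating function only combines their results. The function names and the explicit column parameters are assumptions for illustration.

```python
import pandas as pd


def total_revenue(df: pd.DataFrame, revenue_col: str) -> float:
    """Sum the revenue column."""
    return float(df[revenue_col].sum())


def unique_customers(df: pd.DataFrame, customer_col: str) -> int:
    """Count distinct customers."""
    return int(df[customer_col].nunique())


def summarize_year(df: pd.DataFrame, revenue_col: str, customer_col: str) -> dict:
    """Combine the helper results into one summary dict."""
    revenue = total_revenue(df, revenue_col)
    customers = unique_customers(df, customer_col)
    return {
        "Revenue": revenue,
        "Customers": customers,
        # Guard against division by zero for a year with no customers
        "RevenuePerCustomer": revenue / customers if customers else 0.0,
    }
```

Each helper stays a few lines long and can be tested on its own, while `summarize_year` contains no calculation logic beyond the final ratio.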

## Import Organization

### Standard Order

1. Standard library imports
2. Third-party imports (pandas, numpy, matplotlib)
3. Local/template imports (data_loader, analysis_utils, config)

### Example

```python
# Standard library
from pathlib import Path
from typing import Dict, Optional
from datetime import datetime

# Third-party
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Template imports
from data_loader import load_sales_data, validate_data_structure
from analysis_utils import calculate_annual_metrics, setup_revenue_chart
from config import REVENUE_COLUMN, CHART_SIZES, COMPANY_NAME
```

## Constants and Configuration

### Use Config Values

```python
# Good: From config
from config import REVENUE_COLUMN, DATE_COLUMN
revenue = df[REVENUE_COLUMN].sum()

# Bad: Hardcoded
revenue = df['USD'].sum()
```

### Magic Numbers

Avoid magic numbers; use named constants or config values instead:

```python
# Good: Named constant
MILLIONS_DIVISOR = 1e6
revenue_millions = revenue / MILLIONS_DIVISOR

# Or define the value once in config.py (e.g. CHART_DPI = 300) and import it
from config import CHART_DPI

# Bad: Magic number
revenue_millions = revenue / 1000000
```

## Testing Considerations

### Testable Code

Write code that's easy to test:
- Pure functions when possible (no side effects)
- Dependency injection for external dependencies
- Clear inputs and outputs

### Example

```python
# Good: Testable
def calculate_metrics(year_data: pd.DataFrame, revenue_col: str) -> Dict:
    """Calculate metrics - easy to test with sample data"""
    return {
        'Revenue': year_data[revenue_col].sum(),
        'Count': len(year_data)
    }

# Harder to test: Depends on global config
def calculate_metrics(year_data):
    """Uses global REVENUE_COLUMN - harder to test"""
    return {'Revenue': year_data[REVENUE_COLUMN].sum()}
```
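
To show why the first version is easier to test, here is a minimal test sketch that needs no config and no data files. The column name `Amount` and the sample values are made up for the example.

```python
from typing import Dict

import pandas as pd


def calculate_metrics(year_data: pd.DataFrame, revenue_col: str) -> Dict:
    """Calculate metrics - easy to test with sample data"""
    return {
        "Revenue": year_data[revenue_col].sum(),
        "Count": len(year_data),
    }


def test_calculate_metrics_with_sample_data() -> None:
    # Small in-memory sample; the column name is passed in, not read from config
    sample = pd.DataFrame({"Amount": [100.0, 250.0, 50.0]})

    result = calculate_metrics(sample, revenue_col="Amount")

    assert result["Revenue"] == 400.0
    assert result["Count"] == 3
```

The test function follows the `test_` naming convention that pytest discovers, but it can also be called directly.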

## AI-Friendly Patterns

### Clear Intent

Code should clearly express intent:

```python
# Good: Intent is clear
customers_with_revenue = df[df[REVENUE_COLUMN] > 0][CUSTOMER_COLUMN].unique()

# Less clear: Requires understanding of pandas
customers_with_revenue = df.loc[df[REVENUE_COLUMN] > 0, CUSTOMER_COLUMN].unique()
```

### Explicit Over Implicit

```python
# Good: Explicit
if LTM_ENABLED and ltm_start is not None and ltm_end is not None:
    use_ltm = True
else:
    use_ltm = False

# Less clear: Implicit truthiness (the result may not even be a bool)
use_ltm = LTM_ENABLED and ltm_start and ltm_end
```
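
The two forms also differ in what they evaluate to. The sketch below assumes the LTM bounds are monthly `pd.Period` values, as in the type hints earlier in this document; `should_use_ltm` is a hypothetical helper.

```python
from typing import Optional

import pandas as pd


def should_use_ltm(
    ltm_enabled: bool,
    ltm_start: Optional[pd.Period],
    ltm_end: Optional[pd.Period],
) -> bool:
    """Return True only when LTM is enabled and both bounds are set."""
    return ltm_enabled and ltm_start is not None and ltm_end is not None


start = pd.Period("2024-10", freq="M")
end = pd.Period("2025-09", freq="M")

explicit = should_use_ltm(True, start, end)
# `and` returns one of its operands, so this is a Period, not a bool
implicit = True and start and end

print(type(explicit).__name__)  # bool
print(type(implicit).__name__)  # Period
```

A flag that silently holds a `Period` still works in an `if`, but it is surprising when logged, compared with `is True`, or serialized.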

## Cursor-Specific Optimizations

### AI-Friendly Code Structure

Code should be structured so Cursor AI can:
1. **Understand intent** - Clear function names and comments
2. **Generate code** - Follow established patterns
3. **Fix errors** - Actionable error messages
4. **Extend functionality** - Modular, reusable functions

### Example: AI-Generated Code Pattern

When AI generates a new analysis script, it should follow the established template pattern:

```python
# AI recognizes this pattern and replicates it
def main():
    # 1. Load data (AI knows to use data_loader)
    df = load_sales_data(get_data_path())

    # 2. Validate (AI knows to check structure)
    is_valid, msg = validate_data_structure(df)
    if not is_valid:
        print(f"ERROR: {msg}")
        return

    # 3. Apply filters (AI knows exclusion filters)
    df = apply_exclusion_filters(df)

    # 4. Analysis logic (AI follows template patterns)
    # ...

    # 5. Create charts (AI knows formatting rules)
    # ...

    # 6. Validate revenue (AI knows to validate)
    validate_revenue(df, ANALYSIS_NAME)
```
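
Step 2 relies on a validator that returns an `(is_valid, msg)` pair. The sketch below shows one way such a contract could be implemented. `validate_data_structure_sketch` and its `required_columns` parameter are hypothetical stand-ins, not the template's actual `validate_data_structure`.

```python
from typing import List, Tuple

import pandas as pd


def validate_data_structure_sketch(
    df: pd.DataFrame, required_columns: List[str]
) -> Tuple[bool, str]:
    """Hypothetical stand-in that returns (is_valid, message)."""
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        return False, (
            f"Missing required columns: {missing}. "
            f"Available columns: {list(df.columns)}"
        )
    if df.empty:
        return False, "Data loaded but contains no rows."
    return True, "OK"


# Usage mirrors step 2 of the pattern above
sample = pd.DataFrame({"Customer": ["A"], "Amount": [10.0]})
is_valid, msg = validate_data_structure_sketch(sample, ["Customer", "Amount", "Date"])
if not is_valid:
    print(f"ERROR: {msg}")
```

Returning a message alongside the boolean lets the caller decide whether to print, log, or raise, which keeps the validator free of side effects.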

### Help AI Generate Better Code

Add context comments that explain the business rule behind the code:

```python
# LTM (Last Twelve Months) is used for the most recent partial year
# to enable fair comparison with full calendar years.
# Example: If latest data is through Sep 2025, use Oct 2024 - Sep 2025
# This avoids partial-year bias in year-over-year comparisons.
if year == LTM_END_YEAR and LTM_ENABLED:
    # Use 12-month rolling period instead of partial calendar year
    year_data = get_ltm_data(df, ltm_start, ltm_end)
    year_label = get_ltm_label()  # Returns "2025 (LTM 9/2025)"
```
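
For readers unfamiliar with the LTM logic, the following sketch shows how the window, filter, and label in the comments above could be derived. The `_sketch` functions are hypothetical, and the code assumes `YearMonth` holds monthly `pd.Period` values; the template's own `get_ltm_data` and `get_ltm_label` may work differently.

```python
from typing import Tuple

import pandas as pd


def get_ltm_window_sketch(latest: pd.Period) -> Tuple[pd.Period, pd.Period]:
    """Return (start, end) for the 12 months ending at `latest`."""
    return latest - 11, latest


def get_ltm_data_sketch(
    df: pd.DataFrame, ltm_start: pd.Period, ltm_end: pd.Period
) -> pd.DataFrame:
    """Keep rows whose 'YearMonth' falls inside the window, inclusive."""
    in_window = (df["YearMonth"] >= ltm_start) & (df["YearMonth"] <= ltm_end)
    return df[in_window]


def get_ltm_label_sketch(ltm_end: pd.Period) -> str:
    """Format a label such as '2025 (LTM 9/2025)'."""
    return f"{ltm_end.year} (LTM {ltm_end.month}/{ltm_end.year})"


# Sample data: one row per month from Jan 2024 through Sep 2025
months = pd.period_range("2024-01", "2025-09", freq="M")
df = pd.DataFrame({"YearMonth": months, "Amount": 1.0})

ltm_start, ltm_end = get_ltm_window_sketch(pd.Period("2025-09", freq="M"))
ltm_rows = get_ltm_data_sketch(df, ltm_start, ltm_end)

print(ltm_start, ltm_end, len(ltm_rows))  # 2024-10 2025-09 12
print(get_ltm_label_sketch(ltm_end))      # 2025 (LTM 9/2025)
```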

## Summary Checklist

For Cursor-optimized code:
- ✅ Comprehensive docstrings with examples
- ✅ Type hints on functions
- ✅ Descriptive variable names
- ✅ Clear comments for business logic
- ✅ Structured error messages
- ✅ Consistent code patterns
- ✅ Use config values (never hardcode)
- ✅ Follow template utilities
- ✅ Include validation steps
- ✅ Reference documentation

## Summary

Follow these standards to ensure:
1. AI can understand code structure
2. AI can modify code safely
3. AI can generate new code following patterns
4. Code is maintainable and readable
5. Errors are clear and actionable
6. Cursor AI can assist effectively

---

**Last Updated:** January 2026
**For:** Cursor AI optimization and human developers