sales-data-analysis/.cursor/rules/code_quality.md
Jonathan Pressnell cf0b596449 Initial commit: sales analysis template
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-06 09:16:34 -05:00

Code Quality & Best Practices

Comprehensive guide for writing Cursor-optimized code in the sales analysis template.

This document combines code quality standards and Cursor best practices to ensure AI assistants can effectively understand, modify, and extend the codebase.

Type Hints

When to Use Type Hints

Use type hints for:

  • Function parameters
  • Return values
  • Class attributes
  • Complex data structures

Example Pattern

from typing import Callable, Dict, Optional
import pandas as pd

def calculate_annual_metrics(
    df: pd.DataFrame,
    metrics_func: Callable[[pd.DataFrame], Dict[str, float]],
    ltm_start: Optional[pd.Period] = None,
    ltm_end: Optional[pd.Period] = None
) -> pd.DataFrame:
    """
    Calculate annual metrics for all years
    
    Args:
        df: DataFrame with 'Year' and 'YearMonth' columns
        metrics_func: Function that takes a DataFrame and returns a dict of metrics
        ltm_start: LTM start period (defaults to config if None)
        ltm_end: LTM end period (defaults to config if None)
    
    Returns:
        DataFrame with 'Year' index and metric columns
    """
    # Implementation
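To make the `metrics_func` contract concrete, the sketch below gives it a type alias and applies it year by year. `MetricsFunc`, `revenue_metrics`, and `annual_metrics_sketch` are hypothetical names, the `Revenue` column is an assumption, and LTM handling is deliberately left out. This is not the template's actual `calculate_annual_metrics`.

```python
from typing import Callable, Dict

import pandas as pd

# Hypothetical alias: a metrics function maps one year's rows to named values
MetricsFunc = Callable[[pd.DataFrame], Dict[str, float]]


def revenue_metrics(year_data: pd.DataFrame) -> Dict[str, float]:
    """Example metrics function; the 'Revenue' column name is an assumption."""
    return {
        "Revenue": float(year_data["Revenue"].sum()),
        "Transactions": float(len(year_data)),
    }


def annual_metrics_sketch(df: pd.DataFrame, metrics_func: MetricsFunc) -> pd.DataFrame:
    """Apply metrics_func to each calendar year (LTM handling omitted)."""
    rows = {year: metrics_func(group) for year, group in df.groupby("Year")}
    result = pd.DataFrame.from_dict(rows, orient="index")
    result.index.name = "Year"
    return result


sales = pd.DataFrame({"Year": [2023, 2023, 2024], "Revenue": [100.0, 50.0, 70.0]})
print(annual_metrics_sketch(sales, revenue_metrics))
```

Because the alias names the contract once, any function with the same signature can be swapped in and type checkers will flag one that returns the wrong shape.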

Docstrings

Docstring Format

All functions should use Google-style docstrings:

def function_name(param1: type, param2: type) -> return_type:
    """
    Brief description of what the function does.
    
    More detailed explanation if needed. Can span multiple lines.
    Explain any complex logic or important considerations.
    
    Args:
        param1: Description of param1
        param2: Description of param2
    
    Returns:
        Description of return value
    
    Raises:
        ValueError: When and why this exception is raised
    
    Example:
        >>> result = function_name(value1, value2)
        >>> print(result)
        expected_output
    """

Required Elements

  • Brief one-line summary
  • Detailed description (if needed)
  • Args section (all parameters)
  • Returns section (return value)
  • Raises section (if exceptions raised)
  • Example section (for complex functions)
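A complete docstring in this format might look like the following. `calculate_growth_rate` is a hypothetical helper, used only to show every section filled in, including a `Raises` entry and a doctest-style `Example`.

```python
def calculate_growth_rate(current: float, prior: float) -> float:
    """
    Calculate period-over-period growth as a decimal fraction.

    Growth is measured relative to the prior period, so a zero prior
    value makes the rate undefined and is rejected explicitly.

    Args:
        current: Value for the current period
        prior: Value for the prior period; must be non-zero

    Returns:
        Growth rate as a fraction, e.g. 0.25 for 25% growth

    Raises:
        ValueError: If prior is zero, since growth is undefined

    Example:
        >>> calculate_growth_rate(125.0, 100.0)
        0.25
    """
    if prior == 0:
        raise ValueError("Prior-period value is zero; growth rate is undefined.")
    return (current - prior) / prior
```

Because the `Example` uses `>>>` prompts, it can be checked with Python's built-in `doctest` module (for instance `python -m doctest -v module.py`), which keeps the example accurate as the code changes.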

Variable Naming

Conventions

  • Descriptive names: customer_revenue, not cr
  • Consistent affixes: _df suffix for DataFrames, annual_ prefix for annual metrics (e.g. annual_metrics_df)
  • Clear abbreviations: ltm for Last Twelve Months (well-known)
  • Avoid single letters: except for loop counters (i, j, k)

Examples

# Good
customer_revenue_by_year = df.groupby([CUSTOMER_COLUMN, 'Year'])[REVENUE_COLUMN].sum()
annual_metrics_df = calculate_annual_metrics(df, metrics_func)
ltm_start_period, ltm_end_period = get_ltm_period_config()

# Bad
cr = df.groupby(['C', 'Y'])['R'].sum()
am = calc(df, mf)
s, e = get_ltm()

Error Messages

Structure

Error messages should be:

  1. Specific: What exactly went wrong
  2. Actionable: How to fix it
  3. Contextual: Where it occurred
  4. Helpful: Points to relevant documentation

Good Error Messages

# Good
raise ValueError(
    f"Required column '{REVENUE_COLUMN}' not found in data.\n"
    f"Available columns: {list(df.columns)}\n"
    "Please update config.py REVENUE_COLUMN to match your data.\n"
    "See .cursor/rules/data_loading.md for more help."
)

# Bad
raise ValueError("Column not found")
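The same four-part message can be produced by a small helper, so every missing-column error reads the same way. `require_columns` is a hypothetical name, and the template's `validate_data_structure` may already cover part of this.

```python
from typing import Iterable

import pandas as pd


def require_columns(df: pd.DataFrame, required: Iterable[str]) -> None:
    """Raise a specific, actionable ValueError if any required column is missing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(
            f"Required column(s) {missing} not found in data.\n"   # specific
            f"Available columns: {list(df.columns)}\n"             # contextual
            "Please update config.py to match your data.\n"        # actionable
            "See .cursor/rules/data_loading.md for more help."     # helpful
        )
```

Listing every missing column at once, rather than failing on the first, saves the user a fix-and-rerun cycle per column.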

Code Comments

When to Comment

  • Complex logic that isn't immediately obvious
  • Business rules or domain-specific knowledge
  • Workarounds or non-obvious solutions
  • Performance considerations
  • TODO items with context

Comment Style

# Good: Explains WHY, not WHAT
# Use LTM for most recent year to enable apples-to-apples comparison
# with full calendar years (avoids partial year bias)
if year == LTM_END_YEAR and LTM_ENABLED:
    year_data = get_ltm_data(df, ltm_start, ltm_end)

# Bad: States the obvious
# Check if year equals LTM_END_YEAR and LTM is enabled
if year == LTM_END_YEAR and LTM_ENABLED:
    year_data = get_ltm_data(df, ltm_start, ltm_end)

Function Design

Single Responsibility

Each function should do one thing well:

# Good: Single responsibility
def calculate_revenue(df: pd.DataFrame) -> float:
    """Calculate total revenue from DataFrame"""
    return df[REVENUE_COLUMN].sum()

def calculate_customer_count(df: pd.DataFrame) -> int:
    """Calculate unique customer count"""
    return df[CUSTOMER_COLUMN].nunique()

# Bad: Multiple responsibilities
def calculate_metrics(df):
    """Calculate revenue and customer count"""
    revenue = df[REVENUE_COLUMN].sum()
    customers = df[CUSTOMER_COLUMN].nunique()
    return revenue, customers

Function Length

  • Keep functions under 50 lines when possible
  • Break complex functions into smaller helper functions
  • Use descriptive function names that explain purpose
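Small single-purpose functions compose into higher-level metrics without duplicating logic. In this sketch, `REVENUE_COLUMN` and `CUSTOMER_COLUMN` are local stand-ins for the config.py values (their column labels are assumptions), and `calculate_revenue_per_customer` is a hypothetical helper.

```python
import pandas as pd

# Stand-ins for config.py values (assumed column labels)
REVENUE_COLUMN = "USD"
CUSTOMER_COLUMN = "Customer"


def calculate_revenue(df: pd.DataFrame) -> float:
    """Calculate total revenue from DataFrame."""
    return float(df[REVENUE_COLUMN].sum())


def calculate_customer_count(df: pd.DataFrame) -> int:
    """Calculate unique customer count."""
    return int(df[CUSTOMER_COLUMN].nunique())


def calculate_revenue_per_customer(df: pd.DataFrame) -> float:
    """Average revenue per unique customer, built from the two helpers."""
    customer_count = calculate_customer_count(df)
    if customer_count == 0:
        return 0.0  # no customers: report zero rather than divide by zero
    return calculate_revenue(df) / customer_count
```

Returning 0.0 for an empty frame is a design choice; raising a `ValueError` would be equally reasonable if empty input signals a data problem.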

Import Organization

Standard Order

  1. Standard library imports
  2. Third-party imports (pandas, numpy, matplotlib)
  3. Local/template imports (data_loader, analysis_utils, config)

Example

# Standard library
from pathlib import Path
from typing import Dict, Optional
from datetime import datetime

# Third-party
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Template imports
from data_loader import load_sales_data, validate_data_structure
from analysis_utils import calculate_annual_metrics, setup_revenue_chart
from config import REVENUE_COLUMN, CHART_SIZES, COMPANY_NAME

Constants and Configuration

Use Config Values

# Good: From config
from config import REVENUE_COLUMN, DATE_COLUMN
revenue = df[REVENUE_COLUMN].sum()

# Bad: Hardcoded
revenue = df['USD'].sum()

Magic Numbers

Avoid magic numbers - use named constants or config:

# Good: Named constant
MILLIONS_DIVISOR = 1e6
revenue_millions = revenue / MILLIONS_DIVISOR

# Or define it once in config.py and import it where needed
from config import CHART_DPI  # config.py: CHART_DPI = 300

# Bad: Magic number
revenue_millions = revenue / 1000000
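Named constants also keep display formatting consistent across charts and tables. `format_millions` and `CURRENCY_DECIMALS` are hypothetical; in the template, values like these would more naturally live in config.py.

```python
MILLIONS_DIVISOR = 1e6  # named constant instead of a bare 1000000
CURRENCY_DECIMALS = 1   # hypothetical display precision


def format_millions(revenue: float) -> str:
    """Format a revenue figure as a millions label, e.g. '$2.5M'."""
    return f"${revenue / MILLIONS_DIVISOR:.{CURRENCY_DECIMALS}f}M"


print(format_millions(2_500_000))  # $2.5M
print(format_millions(1_234_567))  # $1.2M
```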

Testing Considerations

Testable Code

Write code that's easy to test:

  • Pure functions when possible (no side effects)
  • Dependency injection for external dependencies
  • Clear inputs and outputs

Example

# Good: Testable
def calculate_metrics(year_data: pd.DataFrame, revenue_col: str) -> Dict:
    """Calculate metrics - easy to test with sample data"""
    return {
        'Revenue': year_data[revenue_col].sum(),
        'Count': len(year_data)
    }

# Harder to test: Depends on global config
def calculate_metrics(year_data):
    """Uses global REVENUE_COLUMN - harder to test"""
    return {'Revenue': year_data[REVENUE_COLUMN].sum()}
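The testable version can be exercised with a tiny in-memory DataFrame. The sketch below is written in pytest style (plain `test_*` functions with bare `assert`); that the project uses pytest is an assumption. The `Amount` column name is arbitrary, which is exactly what injecting `revenue_col` makes possible.

```python
from typing import Dict

import pandas as pd


def calculate_metrics(year_data: pd.DataFrame, revenue_col: str) -> Dict[str, float]:
    """Calculate metrics - easy to test with sample data."""
    return {
        "Revenue": float(year_data[revenue_col].sum()),
        "Count": float(len(year_data)),
    }


def test_calculate_metrics_uses_injected_column() -> None:
    # Small sample built in memory: no files, config, or globals involved
    sample = pd.DataFrame({"Amount": [100.0, 250.0, 50.0]})
    metrics = calculate_metrics(sample, revenue_col="Amount")
    assert metrics["Revenue"] == 400.0
    assert metrics["Count"] == 3.0


def test_calculate_metrics_empty_frame() -> None:
    # Edge case: an empty year should yield zeros, not an error
    empty = pd.DataFrame({"Amount": pd.Series([], dtype=float)})
    assert calculate_metrics(empty, revenue_col="Amount") == {"Revenue": 0.0, "Count": 0.0}
```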

AI-Friendly Patterns

Clear Intent

Code should clearly express intent:

# Good: Intent is clear
customers_with_revenue = df[df[REVENUE_COLUMN] > 0][CUSTOMER_COLUMN].unique()

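# Less clear to readers new to pandas: .loc filters rows and selects the column in one step
# (same result here; always use .loc when assigning, to avoid chained-assignment issues)
customers_with_revenue = df.loc[df[REVENUE_COLUMN] > 0, CUSTOMER_COLUMN].unique()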

Explicit Over Implicit

# Good: Explicit
if LTM_ENABLED and ltm_start is not None and ltm_end is not None:
    use_ltm = True
else:
    use_ltm = False

# Less clear: Implicit truthiness (evaluates to ltm_end itself, not True, when all values are set)
use_ltm = LTM_ENABLED and ltm_start and ltm_end
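Implicit truthiness also treats legitimate falsy values such as `0` or `0.0` as missing. In the hypothetical threshold helpers below (`DEFAULT_MIN_REVENUE` and both functions are illustrative, not part of the template), the `or` shortcut silently replaces an explicit zero, while the `is None` check does not.

```python
from typing import Optional

DEFAULT_MIN_REVENUE = 1000.0  # hypothetical default threshold


def resolve_min_revenue_implicit(min_revenue: Optional[float]) -> float:
    # Pitfall: 0.0 is falsy, so an explicit zero threshold falls back to the default
    return min_revenue or DEFAULT_MIN_REVENUE


def resolve_min_revenue_explicit(min_revenue: Optional[float]) -> float:
    # Fall back only when the caller truly passed nothing
    return DEFAULT_MIN_REVENUE if min_revenue is None else min_revenue


print(resolve_min_revenue_implicit(0.0))  # 1000.0 (the zero was lost)
print(resolve_min_revenue_explicit(0.0))  # 0.0
```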

Documentation for AI

Help AI Understand Context

Add comments that help AI understand business context:

# LTM (Last Twelve Months) is used for the most recent partial year
# to enable fair comparison with full calendar years.
# Example: If latest data is through Sep 2025, use Oct 2024 - Sep 2025
if year == LTM_END_YEAR and LTM_ENABLED:
    # Use 12-month rolling period instead of partial calendar year
    year_data = get_ltm_data(df, ltm_start, ltm_end)

Cursor-Specific Optimizations

AI-Friendly Code Structure

Code should be structured so Cursor AI can:

  1. Understand intent - Clear function names and comments
  2. Generate code - Follow established patterns
  3. Fix errors - Actionable error messages
  4. Extend functionality - Modular, reusable functions

Example: AI-Generated Code Pattern

When AI generates a new analysis script, it should automatically follow this structure:

# AI recognizes this pattern and replicates it
def main():
    # 1. Load data (AI knows to use data_loader)
    df = load_sales_data(get_data_path())
    
    # 2. Validate (AI knows to check structure)
    is_valid, msg = validate_data_structure(df)
    if not is_valid:
        print(f"ERROR: {msg}")
        return
    
    # 3. Apply filters (AI knows exclusion filters)
    df = apply_exclusion_filters(df)
    
    # 4. Analysis logic (AI follows template patterns)
    # ...
    
    # 5. Create charts (AI knows formatting rules)
    # ...
    
    # 6. Validate revenue (AI knows to validate)
    validate_revenue(df, ANALYSIS_NAME)

Help AI Generate Better Code

Add context comments that help AI:

# LTM (Last Twelve Months) is used for the most recent partial year
# to enable fair comparison with full calendar years.
# Example: If latest data is through Sep 2025, use Oct 2024 - Sep 2025
# This avoids partial-year bias in year-over-year comparisons.
if year == LTM_END_YEAR and LTM_ENABLED:
    # Use 12-month rolling period instead of partial calendar year
    year_data = get_ltm_data(df, ltm_start, ltm_end)
    year_label = get_ltm_label()  # Returns "2025 (LTM 9/2025)"
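The LTM window described in these comments is simple `pandas.Period` arithmetic: the start month is the end month minus 11, giving 12 months inclusive. The helpers below are a hypothetical sketch (they are not the template's `get_ltm_data` or `get_ltm_label`) and assume a monthly `YearMonth` column of Period dtype.

```python
from typing import Tuple

import pandas as pd


def ltm_window(ltm_end: pd.Period) -> Tuple[pd.Period, pd.Period]:
    """Return the 12-month window ending at ltm_end, both ends inclusive."""
    return ltm_end - 11, ltm_end


def ltm_label(ltm_end: pd.Period) -> str:
    """Build a label in the '2025 (LTM 9/2025)' style."""
    return f"{ltm_end.year} (LTM {ltm_end.month}/{ltm_end.year})"


def filter_ltm(df: pd.DataFrame, ltm_start: pd.Period, ltm_end: pd.Period) -> pd.DataFrame:
    """Keep rows whose monthly 'YearMonth' period falls inside the window."""
    mask = (df["YearMonth"] >= ltm_start) & (df["YearMonth"] <= ltm_end)
    return df[mask]


start, end = ltm_window(pd.Period("2025-09", freq="M"))
print(start, end, ltm_label(end))  # 2024-10 2025-09 2025 (LTM 9/2025)
```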

Summary Checklist

For Cursor-optimized code:

  • Comprehensive docstrings with examples
  • Type hints on functions
  • Descriptive variable names
  • Clear comments for business logic
  • Structured error messages
  • Consistent code patterns
  • Use config values (never hardcode)
  • Follow template utilities
  • Include validation steps
  • Reference documentation

Summary

Follow these standards to ensure:

  1. AI can understand code structure
  2. AI can modify code safely
  3. AI can generate new code following patterns
  4. Code is maintainable and readable
  5. Errors are clear and actionable
  6. Cursor AI can assist effectively

Last Updated: January 2026
For: Cursor AI optimization and human developers