Files
sales-data-analysis/.cursor/rules/analysis_patterns.md
Jonathan Pressnell cf0b596449 Initial commit: sales analysis template
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-06 09:16:34 -05:00

162 lines
4.1 KiB
Markdown

# Common Analysis Patterns
## ⭐ RECOMMENDED: Use Utilities
**Always prefer `analysis_utils.py` and `config.py` over manual implementations:**
- Consistent formatting
- Fewer errors
- Easier maintenance
- Standardized output
## Standard Script Structure (Using Utilities)
**RECOMMENDED:** Use `analysis_utils.py` and `config.py` for consistency:
```python
# 1. IMPORTS
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from data_loader import load_sales_data, validate_data_structure
from validate_revenue import validate_revenue
from analysis_utils import (
get_ltm_period_config, get_annual_data, calculate_annual_metrics,
get_millions_formatter, setup_revenue_chart, save_chart,
format_currency, print_annual_summary, sort_mixed_years,
apply_exclusion_filters
)
from config import (
DATA_FILE, OUTPUT_DIR, CHART_SIZES, ensure_directories,
get_data_path, REVENUE_COLUMN, COMPANY_NAME
)
# 2. LOAD DATA (ALWAYS use data_loader)
df = load_sales_data(get_data_path())
# 3. VALIDATE DATA STRUCTURE
is_valid, msg = validate_data_structure(df)
if not is_valid:
print(f"ERROR: {msg}")
return
# 4. APPLY EXCLUSION FILTERS (if configured)
df = apply_exclusion_filters(df)
# 5. SETUP LTM (if doing annual comparisons and LTM is enabled)
ltm_start, ltm_end = get_ltm_period_config()
# 6. DATA PREPARATION
# Convert columns, filter data, create derived columns
# 7. ANALYSIS LOGIC
# Use calculate_annual_metrics() for annual aggregations
# 8. VISUALIZATIONS
# Use setup_revenue_chart() and save_chart() from analysis_utils
# 9. VALIDATION
validate_revenue(df, "Analysis Name")
```
## Annual Aggregation Pattern
**RECOMMENDED:** Use `calculate_annual_metrics()` from `analysis_utils.py`:
```python
from analysis_utils import calculate_annual_metrics, get_ltm_period_config
from config import REVENUE_COLUMN
ltm_start, ltm_end = get_ltm_period_config()
def calculate_metrics(year_data):
"""Calculate metrics for a single year"""
return {
'Revenue': year_data[REVENUE_COLUMN].sum(),
# ... other metrics
}
annual_df = calculate_annual_metrics(df, calculate_metrics, ltm_start, ltm_end)
```
## Chart Formatting Pattern
**ALWAYS use this pattern for revenue charts:**
```python
from analysis_utils import setup_revenue_chart, save_chart
from config import CHART_SIZES
fig, ax = plt.subplots(figsize=CHART_SIZES['medium'])
# Divide data by 1e6 BEFORE plotting
ax.plot(data / 1e6, ...)
# OR
ax.bar(x, values / 1e6, ...)
# Apply formatter automatically
setup_revenue_chart(ax)
# Save chart
save_chart(fig, 'chart_name.png')
plt.close()
```
## Mixed Type Handling
When dealing with year columns that may contain mixed int/str types (e.g., "2025 (LTM 9/2025)"):
```python
from analysis_utils import sort_mixed_years
# Sort DataFrame by year
df_sorted = sort_mixed_years(df, year_col='Year')
# For chart labels
years = df_sorted['Year'].tolist()
x_pos = range(len(years))
ax.set_xticks(x_pos)
ax.set_xticklabels(years, rotation=45, ha='right')
```
## Price Calculation Pattern
```python
from analysis_utils import calculate_price_per_unit
from config import QUANTITY_COLUMN, REVENUE_COLUMN
# Calculate average price per unit (excludes outliers automatically)
price_per_unit = calculate_price_per_unit(df, QUANTITY_COLUMN, REVENUE_COLUMN)
```
## Exclusion Filters Pattern
If you need to exclude specific segments (e.g., test accounts, business units):
```python
from analysis_utils import apply_exclusion_filters
# Configure in config.py:
# EXCLUSION_FILTERS = {
# 'enabled': True,
# 'exclude_by_column': 'Country',
# 'exclude_values': ['KVT', 'Test']
# }
df = apply_exclusion_filters(df)
```
## Using Configuration Values
**ALWAYS use config values instead of hardcoding:**
```python
from config import (
REVENUE_COLUMN, # Use this instead of 'USD' or 'Amount'
CUSTOMER_COLUMN, # Use this instead of 'Customer'
DATE_COLUMN, # Use this instead of 'InvoiceDate'
COMPANY_NAME, # Use this for titles
ANALYSIS_YEARS, # Use this for year filtering
CHART_SIZES, # Use this for figure sizes
)
```