# Common Analysis Patterns ## ⭐ RECOMMENDED: Use Utilities **Always prefer `analysis_utils.py` and `config.py` over manual implementations:** - Consistent formatting - Fewer errors - Easier maintenance - Standardized output ## Standard Script Structure (Using Utilities) **RECOMMENDED:** Use `analysis_utils.py` and `config.py` for consistency: ```python # 1. IMPORTS import pandas as pd import numpy as np import matplotlib.pyplot as plt from data_loader import load_sales_data, validate_data_structure from validate_revenue import validate_revenue from analysis_utils import ( get_ltm_period_config, get_annual_data, calculate_annual_metrics, get_millions_formatter, setup_revenue_chart, save_chart, format_currency, print_annual_summary, sort_mixed_years, apply_exclusion_filters ) from config import ( DATA_FILE, OUTPUT_DIR, CHART_SIZES, ensure_directories, get_data_path, REVENUE_COLUMN, COMPANY_NAME ) # 2. LOAD DATA (ALWAYS use data_loader) df = load_sales_data(get_data_path()) # 3. VALIDATE DATA STRUCTURE is_valid, msg = validate_data_structure(df) if not is_valid: print(f"ERROR: {msg}") return # 4. APPLY EXCLUSION FILTERS (if configured) df = apply_exclusion_filters(df) # 5. SETUP LTM (if doing annual comparisons and LTM is enabled) ltm_start, ltm_end = get_ltm_period_config() # 6. DATA PREPARATION # Convert columns, filter data, create derived columns # 7. ANALYSIS LOGIC # Use calculate_annual_metrics() for annual aggregations # 8. VISUALIZATIONS # Use setup_revenue_chart() and save_chart() from analysis_utils # 9. VALIDATION validate_revenue(df, "Analysis Name") ``` ## Annual Aggregation Pattern **RECOMMENDED:** Use `calculate_annual_metrics()` from `analysis_utils.py`: ```python from analysis_utils import calculate_annual_metrics, get_ltm_period_config from config import REVENUE_COLUMN ltm_start, ltm_end = get_ltm_period_config() def calculate_metrics(year_data): """Calculate metrics for a single year""" return { 'Revenue': year_data[REVENUE_COLUMN].sum(), # ... other metrics } annual_df = calculate_annual_metrics(df, calculate_metrics, ltm_start, ltm_end) ``` ## Chart Formatting Pattern **ALWAYS use this pattern for revenue charts:** ```python from analysis_utils import setup_revenue_chart, save_chart from config import CHART_SIZES fig, ax = plt.subplots(figsize=CHART_SIZES['medium']) # Divide data by 1e6 BEFORE plotting ax.plot(data / 1e6, ...) # OR ax.bar(x, values / 1e6, ...) # Apply formatter automatically setup_revenue_chart(ax) # Save chart save_chart(fig, 'chart_name.png') plt.close() ``` ## Mixed Type Handling When dealing with year columns that may contain mixed int/str types (e.g., "2025 (LTM 9/2025)"): ```python from analysis_utils import sort_mixed_years # Sort DataFrame by year df_sorted = sort_mixed_years(df, year_col='Year') # For chart labels years = df_sorted['Year'].tolist() x_pos = range(len(years)) ax.set_xticks(x_pos) ax.set_xticklabels(years, rotation=45, ha='right') ``` ## Price Calculation Pattern ```python from analysis_utils import calculate_price_per_unit from config import QUANTITY_COLUMN, REVENUE_COLUMN # Calculate average price per unit (excludes outliers automatically) price_per_unit = calculate_price_per_unit(df, QUANTITY_COLUMN, REVENUE_COLUMN) ``` ## Exclusion Filters Pattern If you need to exclude specific segments (e.g., test accounts, business units): ```python from analysis_utils import apply_exclusion_filters # Configure in config.py: # EXCLUSION_FILTERS = { # 'enabled': True, # 'exclude_by_column': 'Country', # 'exclude_values': ['KVT', 'Test'] # } df = apply_exclusion_filters(df) ``` ## Using Configuration Values **ALWAYS use config values instead of hardcoding:** ```python from config import ( REVENUE_COLUMN, # Use this instead of 'USD' or 'Amount' CUSTOMER_COLUMN, # Use this instead of 'Customer' DATE_COLUMN, # Use this instead of 'InvoiceDate' COMPANY_NAME, # Use this for titles ANALYSIS_YEARS, # Use this for year filtering CHART_SIZES, # Use this for figure sizes ) ```