# Advanced Analysis Patterns

This document provides patterns for sophisticated, production-grade analyses that leverage the full capabilities of the template framework.

## ⭐ Using Cursor AI Effectively

When working in Cursor, you can ask the AI to:

- "Create a cohort analysis script using the template patterns"
- "Add statistical significance testing to this analysis"
- "Generate a multi-dimensional analysis with product, customer, and geography"
- "Create a forecasting analysis with confidence intervals"

The AI will automatically use these patterns and utilities.

## Advanced Analysis Types

### 1. Multi-Dimensional Analysis

**Pattern:** Analyze across multiple dimensions simultaneously (e.g., Product × Customer × Geography).

```python
from data_loader import load_sales_data, get_data_path
from config import REVENUE_COLUMN, ITEM_COLUMN, CUSTOMER_COLUMN, REGION_COLUMN

df = load_sales_data(get_data_path())

# Multi-dimensional pivot
pivot = df.pivot_table(
    index=[ITEM_COLUMN, CUSTOMER_COLUMN],
    columns=REGION_COLUMN,
    values=REVENUE_COLUMN,
    aggfunc='sum',
    fill_value=0
)

# Or use the data_processing helper
from data_processing import create_pivot_table

pivot = create_pivot_table(
    df,
    index=[ITEM_COLUMN, CUSTOMER_COLUMN],
    columns=REGION_COLUMN,
    values=REVENUE_COLUMN
)
```

### 2. Cohort Analysis with Retention Metrics

**Pattern:** Track customer cohorts over time with retention and revenue metrics.

```python
from examples.cohort_analysis import create_cohorts, calculate_cohort_metrics

df_cohort = create_cohorts(df)
cohort_metrics = calculate_cohort_metrics(df_cohort)

# Net Revenue Retention (NRR): latest retention value per cohort
nrr = cohort_metrics.groupby('Cohort').agg({
    'Revenue_Retention': lambda x: x.iloc[-1] if len(x) > 0 else 0
})
```
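If the `examples.cohort_analysis` helpers are not available, the same cohort logic can be sketched with plain pandas. The column names (`Customer`, `Date`, `Revenue`) and the toy data below are illustrative assumptions, not part of the template framework:

```python
import pandas as pd

# Toy transactions: both customers start in Jan 2024, one churns afterwards
df = pd.DataFrame({
    'Customer': ['A', 'A', 'B'],
    'Date': pd.to_datetime(['2024-01-15', '2024-02-10', '2024-01-20']),
    'Revenue': [100.0, 120.0, 50.0],
})

# Cohort = month of each customer's first purchase
df['Order_Month'] = df['Date'].dt.to_period('M')
df['Cohort'] = df.groupby('Customer')['Date'].transform('min').dt.to_period('M')

# Months elapsed since the cohort's first month (Period subtraction yields offsets)
df['Months_Since'] = (df['Order_Month'] - df['Cohort']).apply(lambda p: p.n)

# Revenue by cohort and months since first purchase
cohort_revenue = df.pivot_table(index='Cohort', columns='Months_Since',
                                values='Revenue', aggfunc='sum', fill_value=0)

# Net Revenue Retention: latest-period revenue / first-period revenue per cohort
nrr = cohort_revenue.iloc[:, -1] / cohort_revenue.iloc[:, 0]
```

With the toy data the Jan 2024 cohort books 150 in month 0 and 120 in month 1, giving an NRR of 0.8.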
### 3. Statistical Significance Testing

**Pattern:** Compare segments with statistical tests.

```python
from statistical_utils import test_statistical_significance
from config import REVENUE_COLUMN

# Compare two groups
group1 = df[df['Segment'] == 'A'][REVENUE_COLUMN]
group2 = df[df['Segment'] == 'B'][REVENUE_COLUMN]

result = test_statistical_significance(group1, group2)
if result['significant']:
    print(f"Significant difference (p={result['p_value']:.4f})")
```

### 4. Price-Volume-Mix (PVM) Decomposition

**Pattern:** Decompose revenue changes into price, volume, and mix effects.

```python
from config import QUANTITY_COLUMN, REVENUE_COLUMN

def pvm_decomposition(df_base, df_current):
    """Decompose revenue change into price, volume, and mix effects."""
    base_price = df_base[REVENUE_COLUMN].sum() / df_base[QUANTITY_COLUMN].sum()
    current_price = df_current[REVENUE_COLUMN].sum() / df_current[QUANTITY_COLUMN].sum()
    base_volume = df_base[QUANTITY_COLUMN].sum()
    current_volume = df_current[QUANTITY_COLUMN].sum()

    # Price effect: price change valued at the base volume
    price_effect = (current_price - base_price) * base_volume

    # Volume effect: volume change valued at the base price
    volume_effect = (current_volume - base_volume) * base_price

    # Mix effect: residual after price and volume effects
    total_change = df_current[REVENUE_COLUMN].sum() - df_base[REVENUE_COLUMN].sum()
    mix_effect = total_change - price_effect - volume_effect

    return {
        'price_effect': price_effect,
        'volume_effect': volume_effect,
        'mix_effect': mix_effect,
        'total_change': total_change
    }
```
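A quick end-to-end check of the decomposition, restated with plain column names (`Revenue`, `Quantity`) instead of the `config` constants and run on toy base/current periods. The data is illustrative; the point is that the three effects always reconcile to the total change by construction:

```python
import pandas as pd

def pvm_decomposition(df_base, df_current, revenue_col='Revenue', qty_col='Quantity'):
    """Split the revenue change between two periods into price, volume, and mix effects."""
    base_price = df_base[revenue_col].sum() / df_base[qty_col].sum()
    current_price = df_current[revenue_col].sum() / df_current[qty_col].sum()
    base_volume = df_base[qty_col].sum()
    current_volume = df_current[qty_col].sum()

    price_effect = (current_price - base_price) * base_volume
    volume_effect = (current_volume - base_volume) * base_price
    total_change = df_current[revenue_col].sum() - df_base[revenue_col].sum()
    mix_effect = total_change - price_effect - volume_effect
    return {'price_effect': price_effect, 'volume_effect': volume_effect,
            'mix_effect': mix_effect, 'total_change': total_change}

base = pd.DataFrame({'Revenue': [100.0, 200.0], 'Quantity': [10, 10]})
current = pd.DataFrame({'Revenue': [150.0, 220.0], 'Quantity': [12, 10]})
result = pvm_decomposition(base, current)

# Mix is defined as the residual, so the effects sum exactly to the total change
assert abs(result['price_effect'] + result['volume_effect'] + result['mix_effect']
           - result['total_change']) < 1e-9
```

Here total revenue moves from 300 to 370, so `total_change` is 70, split across the three effects.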
### 5. Time Series Forecasting

**Pattern:** Forecast future revenue with confidence intervals.

```python
import numpy as np
from scipy import stats

from data_processing import prepare_time_series
from statistical_utils import calculate_confidence_interval

# Prepare time series
ts = prepare_time_series(df, freq='M')

# Simple forecast (extend the linear trend)
x = np.arange(len(ts))
slope, intercept, r_value, p_value, std_err = stats.linregress(x, ts.values)

# Forecast the next 12 months
future_x = np.arange(len(ts), len(ts) + 12)
forecast = slope * future_x + intercept

# Confidence interval around the historical series
ci = calculate_confidence_interval(ts, confidence=0.95)
```

### 6. Customer Lifetime Value (CLV) Analysis

**Pattern:** Calculate CLV using historical data.

```python
from config import CUSTOMER_COLUMN, REVENUE_COLUMN, DATE_COLUMN

def calculate_clv(df, years=3):
    """Calculate customer lifetime value."""
    customer_metrics = df.groupby(CUSTOMER_COLUMN).agg({
        REVENUE_COLUMN: 'sum',
        DATE_COLUMN: ['min', 'max', 'count']
    }).reset_index()

    customer_metrics.columns = [CUSTOMER_COLUMN, 'Total_Revenue',
                                'First_Purchase', 'Last_Purchase', 'Order_Count']

    # Customer age in years
    customer_metrics['Customer_Age_Years'] = (
        (customer_metrics['Last_Purchase'] - customer_metrics['First_Purchase']).dt.days / 365.25
    )

    # Annualized revenue (treat same-day customers as one-year customers)
    customer_metrics['Annual_Revenue'] = (
        customer_metrics['Total_Revenue']
        / customer_metrics['Customer_Age_Years'].replace(0, 1)
    )

    # Projected CLV
    customer_metrics['CLV'] = customer_metrics['Annual_Revenue'] * years

    return customer_metrics
```
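The same CLV logic can be exercised without the `config` constants, using pandas named aggregation and plain column names. The data and column names below are illustrative assumptions; note how a single-purchase customer (age zero) falls back to a one-year annualization:

```python
import pandas as pd

def calculate_clv(df, years=3):
    """Project CLV from annualized historical revenue per customer."""
    m = df.groupby('Customer').agg(
        Total_Revenue=('Revenue', 'sum'),
        First_Purchase=('Date', 'min'),
        Last_Purchase=('Date', 'max'),
        Order_Count=('Date', 'count'),
    ).reset_index()

    age_years = (m['Last_Purchase'] - m['First_Purchase']).dt.days / 365.25
    # Single-purchase customers have age 0; treat them as one-year customers
    m['Annual_Revenue'] = m['Total_Revenue'] / age_years.replace(0, 1)
    m['CLV'] = m['Annual_Revenue'] * years
    return m

df = pd.DataFrame({
    'Customer': ['A', 'A', 'B'],
    'Date': pd.to_datetime(['2022-01-01', '2024-01-01', '2024-06-01']),
    'Revenue': [300.0, 300.0, 50.0],
})
clv = calculate_clv(df)
```

Customer A spends 600 over roughly two years (~300/year, CLV ≈ 900 over three years); customer B has a single 50 purchase, annualized to 50 and projected to a CLV of 150.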
### 7. Market Basket Analysis

**Pattern:** Find product associations and cross-sell opportunities.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

from config import INVOICE_NUMBER_COLUMN, ITEM_COLUMN

# Prepare transaction data
transactions = df.groupby(INVOICE_NUMBER_COLUMN)[ITEM_COLUMN].apply(list).tolist()

# One-hot encode transactions
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

# Find frequent itemsets
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
```

### 8. Segmentation with Machine Learning

**Pattern:** Advanced customer segmentation using clustering.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

from config import CUSTOMER_COLUMN, REVENUE_COLUMN, DATE_COLUMN

# Prepare features
features = df.groupby(CUSTOMER_COLUMN).agg({
    REVENUE_COLUMN: ['sum', 'mean', 'count'],
    DATE_COLUMN: lambda x: (x.max() - x.min()).days
}).reset_index()

features.columns = [CUSTOMER_COLUMN, 'Total_Revenue', 'Avg_Order',
                    'Order_Count', 'Customer_Tenure']

# Scale features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(
    features[['Total_Revenue', 'Avg_Order', 'Order_Count', 'Customer_Tenure']]
)

# Cluster
kmeans = KMeans(n_clusters=5, random_state=42)
features['Segment'] = kmeans.fit_predict(features_scaled)
```

### 9. Anomaly Detection

**Pattern:** Identify unusual patterns in data.

```python
from statistical_utils import calculate_z_score
from config import REVENUE_COLUMN

# Calculate z-scores for revenue
mean_revenue = df[REVENUE_COLUMN].mean()
std_revenue = df[REVENUE_COLUMN].std()

df['Revenue_Z_Score'] = df[REVENUE_COLUMN].apply(
    lambda x: calculate_z_score(x, mean_revenue, std_revenue)
)

# Flag anomalies (|z| > 3)
df['Is_Anomaly'] = df['Revenue_Z_Score'].abs() > 3
```
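Without the `calculate_z_score` utility, the same flagging can be done with a vectorized pandas expression; the toy data below is illustrative. Note that with a sample of size n the sample z-score can never exceed (n − 1)/√n (about 2.47 for n = 8), so the threshold here is lowered from the usual 3 to 2 purely for the tiny example:

```python
import pandas as pd

# Toy revenue series with one obvious outlier
df = pd.DataFrame({'Revenue': [100.0, 105.0, 95.0, 98.0, 102.0, 101.0, 99.0, 1000.0]})

# Vectorized z-score: no row-wise apply needed
mean_revenue = df['Revenue'].mean()
std_revenue = df['Revenue'].std()
df['Revenue_Z_Score'] = (df['Revenue'] - mean_revenue) / std_revenue

# Flag anomalies; on real data use the conventional |z| > 3
df['Is_Anomaly'] = df['Revenue_Z_Score'].abs() > 2
```

Only the 1000.0 row is flagged (z ≈ 2.47); the clustered values near 100 all sit around z ≈ −0.35.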
### 10. Competitive Analysis Framework

**Pattern:** Compare performance across dimensions.

```python
from statistical_utils import calculate_yoy_growth
from config import REVENUE_COLUMN

def competitive_analysis(df, dimension_col):
    """Compare performance across a dimension (e.g., products, regions)."""
    analysis = df.groupby(dimension_col).agg({
        REVENUE_COLUMN: ['sum', 'mean', 'count']
    }).reset_index()
    analysis.columns = [dimension_col, 'Total_Revenue', 'Avg_Order', 'Order_Count']

    # Year-over-year growth per dimension value
    for year in sorted(df['Year'].unique())[1:]:
        prev_year = year - 1
        current = df[df['Year'] == year].groupby(dimension_col)[REVENUE_COLUMN].sum()
        previous = df[df['Year'] == prev_year].groupby(dimension_col)[REVENUE_COLUMN].sum()
        growth = calculate_yoy_growth(current, previous)
        # Align growth (indexed by dimension value) with the analysis rows
        analysis[f'Growth_{year}'] = analysis[dimension_col].map(growth)

    return analysis
```

## Best Practices for Advanced Analyses

1. **Always validate data quality first:**

   ```python
   from data_quality import generate_data_quality_report

   report = generate_data_quality_report(df)
   ```

2. **Use logging for complex analyses:**

   ```python
   from logger_config import get_logger

   logger = get_logger('advanced_analysis')
   logger.info("Starting complex analysis...")
   ```

3. **Export intermediate results:**

   ```python
   from export_utils import export_to_excel

   export_to_excel(intermediate_df, 'intermediate_results.xlsx')
   ```

4. **Generate comprehensive reports:**

   ```python
   from report_generator import generate_pdf_report

   generate_pdf_report(charts=['chart1.png', 'chart2.png'], summary_data=summary)
   ```
5. **Test statistical significance:**

   ```python
   from statistical_utils import test_statistical_significance

   # Always test before drawing conclusions
   ```

## Cursor AI Prompts for Advanced Analyses

When using Cursor, try these prompts:

- **"Create a cohort retention analysis with heatmaps"**
- **"Build a price-volume-mix decomposition analysis"**
- **"Generate a customer lifetime value analysis with segmentation"**
- **"Create a forecasting model with confidence intervals"**
- **"Build a multi-dimensional analysis across product, customer, and geography"**
- **"Create an anomaly detection analysis for unusual transactions"**

The AI will automatically use these patterns and the template utilities.

---

**Last Updated:** January 2026
**For:** Advanced users and AI-assisted development