# Advanced Analysis Patterns

This document provides patterns for sophisticated, production-grade analyses that leverage the full capabilities of the template framework.

## ⭐ Using Cursor AI Effectively

When working in Cursor, you can ask the AI to:

- "Create a cohort analysis script using the template patterns"
- "Add statistical significance testing to this analysis"
- "Generate a multi-dimensional analysis with product, customer, and geography"
- "Create a forecasting analysis with confidence intervals"

The AI will automatically use these patterns and utilities.

## Advanced Analysis Types

### 1. Multi-Dimensional Analysis

**Pattern:** Analyze across multiple dimensions simultaneously (e.g., Product × Customer × Geography).

```python
from data_loader import load_sales_data, get_data_path  # get_data_path assumed to live alongside the loader
from config import REVENUE_COLUMN, ITEM_COLUMN, CUSTOMER_COLUMN, REGION_COLUMN

df = load_sales_data(get_data_path())

# Multi-dimensional pivot
pivot = df.pivot_table(
    index=[ITEM_COLUMN, CUSTOMER_COLUMN],
    columns=REGION_COLUMN,
    values=REVENUE_COLUMN,
    aggfunc='sum',
    fill_value=0
)

# Or use the data_processing helper
from data_processing import create_pivot_table
pivot = create_pivot_table(
    df,
    index=[ITEM_COLUMN, CUSTOMER_COLUMN],
    columns=REGION_COLUMN,
    values=REVENUE_COLUMN
)
```

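The template helpers above depend on the project's `config` values. As a self-contained illustration with made-up column names and toy data, the same multi-dimensional pivot in plain pandas:

```python
import pandas as pd

# Toy transactions; the column names here are illustrative, not the template's config values
df = pd.DataFrame({
    "Item": ["A", "A", "B", "B"],
    "Customer": ["X", "Y", "X", "Y"],
    "Region": ["North", "South", "North", "North"],
    "Revenue": [100, 200, 50, 75],
})

# Rows indexed by (Item, Customer), one column per Region, missing cells filled with 0
pivot = df.pivot_table(
    index=["Item", "Customer"],
    columns="Region",
    values="Revenue",
    aggfunc="sum",
    fill_value=0,
)
print(pivot.loc[("A", "X"), "North"])  # → 100
```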
### 2. Cohort Analysis with Retention Metrics

**Pattern:** Track customer cohorts over time with retention and revenue metrics.

```python
from examples.cohort_analysis import create_cohorts, calculate_cohort_metrics

df_cohort = create_cohorts(df)
cohort_metrics = calculate_cohort_metrics(df_cohort)

# Calculate Net Revenue Retention (NRR): the latest retention value per cohort
nrr = cohort_metrics.groupby('Cohort').agg({
    'Revenue_Retention': lambda x: x.iloc[-1] if len(x) > 0 else 0
})
```

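The helpers above are specific to the template's `examples` module. The core idea — assign each customer to the cohort of their first purchase, then measure how many remain active later — can be sketched in plain pandas with toy data (column names are illustrative):

```python
import pandas as pd

# Toy purchase log: each row is one purchase in a given month
df = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "c"],
    "month": ["2024-01", "2024-02", "2024-01", "2024-01", "2024-02"],
})

# Cohort = month of the customer's first purchase
df["cohort"] = df.groupby("customer")["month"].transform("min")

# Retention of the 2024-01 cohort in 2024-02
cohort_jan = df[df["cohort"] == "2024-01"]
cohort_size = cohort_jan["customer"].nunique()
active_feb = cohort_jan[cohort_jan["month"] == "2024-02"]["customer"].nunique()
retention = active_feb / cohort_size
print(retention)  # → 0.5  (only "a" of {"a", "b"} came back)
```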
### 3. Statistical Significance Testing

**Pattern:** Compare segments with statistical tests.

```python
from statistical_utils import test_statistical_significance

# Compare two groups
group1 = df[df['Segment'] == 'A'][REVENUE_COLUMN]
group2 = df[df['Segment'] == 'B'][REVENUE_COLUMN]

result = test_statistical_significance(group1, group2)
if result['significant']:
    print(f"Significant difference (p={result['p_value']:.4f})")
```

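`test_statistical_significance` is a template helper whose internals aren't shown here. A common choice for this kind of two-group comparison is Welch's t-test; as a dependency-free sketch, here is the t statistic computed by hand (converting it to a p-value normally uses a t-distribution, e.g. via `scipy.stats`):

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)  # sample variance
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

t = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t)  # → -1.0
```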
### 4. Price-Volume-Mix (PVM) Decomposition

**Pattern:** Decompose revenue changes into price, volume, and mix effects.

```python
from config import QUANTITY_COLUMN, REVENUE_COLUMN

def pvm_decomposition(df_base, df_current):
    """Decompose revenue change into price, volume, and mix effects."""
    base_price = df_base[REVENUE_COLUMN].sum() / df_base[QUANTITY_COLUMN].sum()
    current_price = df_current[REVENUE_COLUMN].sum() / df_current[QUANTITY_COLUMN].sum()

    base_volume = df_base[QUANTITY_COLUMN].sum()
    current_volume = df_current[QUANTITY_COLUMN].sum()

    # Price effect: price change valued at base volume
    price_effect = (current_price - base_price) * base_volume

    # Volume effect: volume change valued at base price
    volume_effect = (current_volume - base_volume) * base_price

    # Mix effect (residual)
    total_change = df_current[REVENUE_COLUMN].sum() - df_base[REVENUE_COLUMN].sum()
    mix_effect = total_change - price_effect - volume_effect

    return {
        'price_effect': price_effect,
        'volume_effect': volume_effect,
        'mix_effect': mix_effect,
        'total_change': total_change
    }
```

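To sanity-check the decomposition above, a toy run with hard-coded numbers — by construction the three effects must sum to the total revenue change:

```python
# Base period: 100 units at an average price of 10; current: 110 units at 12
base = {"revenue": 1000.0, "quantity": 100.0}
current = {"revenue": 1320.0, "quantity": 110.0}

base_price = base["revenue"] / base["quantity"]          # 10.0
current_price = current["revenue"] / current["quantity"]  # 12.0

price_effect = (current_price - base_price) * base["quantity"]        # 2 * 100 = 200
volume_effect = (current["quantity"] - base["quantity"]) * base_price  # 10 * 10 = 100
total_change = current["revenue"] - base["revenue"]                    # 320
mix_effect = total_change - price_effect - volume_effect               # 20

print(price_effect, volume_effect, mix_effect)  # → 200.0 100.0 20.0
```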
### 5. Time Series Forecasting

**Pattern:** Forecast future revenue with confidence intervals.

```python
import numpy as np
from scipy import stats

from data_processing import prepare_time_series
from statistical_utils import calculate_confidence_interval

# Prepare the monthly time series
ts = prepare_time_series(df, freq='M')

# Simple forecast: fit a linear trend and extend it
x = np.arange(len(ts))
slope, intercept, r_value, p_value, std_err = stats.linregress(x, ts.values)

# Forecast the next 12 months
future_x = np.arange(len(ts), len(ts) + 12)
forecast = slope * future_x + intercept

# Confidence interval around the historical series
ci = calculate_confidence_interval(ts, confidence=0.95)
```

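The trend-extension step can be demonstrated end to end on a toy series. This sketch uses `numpy.polyfit` in place of `scipy.stats.linregress` (both fit the same least-squares line); with perfectly linear input, the forecast is exact:

```python
import numpy as np

# Perfectly linear toy series: y = 2x + 1
ts = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
x = np.arange(len(ts))

# Degree-1 least-squares fit; coefficients come back highest degree first
slope, intercept = np.polyfit(x, ts, 1)

# Extend the fitted line three periods past the end of the series
future_x = np.arange(len(ts), len(ts) + 3)
forecast = slope * future_x + intercept
print(forecast)  # ≈ [13. 15. 17.]
```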
### 6. Customer Lifetime Value (CLV) Analysis

**Pattern:** Calculate CLV using historical data.

```python
from config import CUSTOMER_COLUMN, REVENUE_COLUMN, DATE_COLUMN

def calculate_clv(df, years=3):
    """Calculate customer lifetime value."""
    customer_metrics = df.groupby(CUSTOMER_COLUMN).agg({
        REVENUE_COLUMN: 'sum',
        DATE_COLUMN: ['min', 'max', 'count']
    }).reset_index()

    customer_metrics.columns = [
        CUSTOMER_COLUMN, 'Total_Revenue', 'First_Purchase', 'Last_Purchase', 'Order_Count'
    ]

    # Customer age in years; single-purchase customers get an age of 0,
    # which is treated as one year below to avoid division by zero
    customer_metrics['Customer_Age_Years'] = (
        (customer_metrics['Last_Purchase'] - customer_metrics['First_Purchase']).dt.days / 365.25
    )

    # Annualized revenue
    customer_metrics['Annual_Revenue'] = (
        customer_metrics['Total_Revenue']
        / customer_metrics['Customer_Age_Years'].replace(0, 1)
    )

    # Projected CLV over the given horizon
    customer_metrics['CLV'] = customer_metrics['Annual_Revenue'] * years

    return customer_metrics
```

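The projection itself is simple arithmetic; on hard-coded toy numbers:

```python
# One customer: 6000 total revenue over a 2-year purchase history,
# projected forward 3 years (all values illustrative)
total_revenue = 6000.0
customer_age_years = 2.0   # time between first and last purchase
projection_years = 3

annual_revenue = total_revenue / customer_age_years  # 3000.0 per year
clv = annual_revenue * projection_years
print(clv)  # → 9000.0
```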
### 7. Market Basket Analysis

**Pattern:** Find product associations and cross-sell opportunities.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

from config import INVOICE_NUMBER_COLUMN, ITEM_COLUMN

# Prepare transaction data: one list of items per invoice
transactions = df.groupby(INVOICE_NUMBER_COLUMN)[ITEM_COLUMN].apply(list).tolist()

# One-hot encode transactions
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

# Find frequent itemsets
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
```

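If `mlxtend` isn't available, the underlying "support" metric — the fraction of baskets containing a given pair — is easy to compute with the standard library. A minimal sketch on toy baskets:

```python
from collections import Counter
from itertools import combinations

# Toy invoices (illustrative product names)
transactions = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "jam"],
]

# Count how often each unordered product pair appears in the same basket
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# Support = co-occurrence count / number of baskets
support = {pair: n / len(transactions) for pair, n in pair_counts.items()}
print(support[("bread", "butter")])  # → 0.5
```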
### 8. Segmentation with Machine Learning

**Pattern:** Advanced customer segmentation using clustering.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

from config import CUSTOMER_COLUMN, REVENUE_COLUMN, DATE_COLUMN

# Prepare per-customer features
features = df.groupby(CUSTOMER_COLUMN).agg({
    REVENUE_COLUMN: ['sum', 'mean', 'count'],
    DATE_COLUMN: lambda x: (x.max() - x.min()).days
}).reset_index()
features.columns = [CUSTOMER_COLUMN, 'Total_Revenue', 'Avg_Order', 'Order_Count', 'Customer_Tenure']

# Scale features so no single feature dominates the distance metric
scaler = StandardScaler()
features_scaled = scaler.fit_transform(
    features[['Total_Revenue', 'Avg_Order', 'Order_Count', 'Customer_Tenure']]
)

# Cluster into five segments
kmeans = KMeans(n_clusters=5, random_state=42)
features['Segment'] = kmeans.fit_predict(features_scaled)
```

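Scaling matters here because k-means clusters on Euclidean distance, and unscaled revenue would swamp order counts. What `StandardScaler` does is just center each feature and divide by its (population) standard deviation; by hand in NumPy:

```python
import numpy as np

# Two toy features on very different scales
X = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [5.0, 50.0]])

# Center each column and divide by its population std (ddof=0, matching StandardScaler)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled[:, 0])  # ≈ [-1.2247  0.  1.2247]
```

After scaling, both columns have mean 0 and unit variance, so each feature contributes comparably to the distance metric.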
### 9. Anomaly Detection

**Pattern:** Identify unusual patterns in data.

```python
from statistical_utils import calculate_z_score

# Calculate z-scores for revenue
mean_revenue = df[REVENUE_COLUMN].mean()
std_revenue = df[REVENUE_COLUMN].std()

df['Revenue_Z_Score'] = df[REVENUE_COLUMN].apply(
    lambda x: calculate_z_score(x, mean_revenue, std_revenue)
)

# Flag anomalies (|z| > 3)
df['Is_Anomaly'] = df['Revenue_Z_Score'].abs() > 3
```

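Assuming `calculate_z_score` implements the standard `(x - mean) / std` formula, the whole flagging step can be shown self-contained with the standard library:

```python
import statistics

# Flag values more than 3 standard deviations from the mean (|z| > 3)
values = [10.0] * 20 + [100.0]  # one obvious outlier

mean = statistics.fmean(values)
std = statistics.stdev(values)   # sample standard deviation

anomalies = [v for v in values if abs((v - mean) / std) > 3]
print(anomalies)  # → [100.0]
```

Note that a single extreme value inflates the standard deviation it is measured against; with few data points this can mask outliers, which is one reason robust variants (e.g. median-based scores) are sometimes preferred.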
### 10. Competitive Analysis Framework

**Pattern:** Compare performance across dimensions.

```python
from statistical_utils import calculate_yoy_growth

def competitive_analysis(df, dimension_col):
    """Compare performance across a dimension (e.g., products, regions)."""
    analysis = df.groupby(dimension_col).agg({
        REVENUE_COLUMN: ['sum', 'mean', 'count']
    }).reset_index()
    analysis.columns = [dimension_col, 'Total_Revenue', 'Avg_Order', 'Order_Count']

    # Year-over-year growth rates per dimension value
    for year in sorted(df['Year'].unique())[1:]:
        prev_year = year - 1
        current = df[df['Year'] == year].groupby(dimension_col)[REVENUE_COLUMN].sum()
        previous = df[df['Year'] == prev_year].groupby(dimension_col)[REVENUE_COLUMN].sum()

        growth = calculate_yoy_growth(current, previous)
        # Align growth (indexed by dimension value) with the analysis rows
        analysis[f'Growth_{year}'] = analysis[dimension_col].map(growth)

    return analysis
```

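Assuming `calculate_yoy_growth` implements the standard formula, the growth arithmetic in the loop reduces to:

```python
# Year-over-year growth on toy numbers: (current - previous) / previous
current_revenue = 1200.0
previous_revenue = 1000.0

yoy_growth = (current_revenue - previous_revenue) / previous_revenue
print(yoy_growth)  # → 0.2, i.e. 20% growth
```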
## Best Practices for Advanced Analyses

1. **Always validate data quality first:**
   ```python
   from data_quality import generate_data_quality_report
   report = generate_data_quality_report(df)
   ```

2. **Use logging for complex analyses:**
   ```python
   from logger_config import get_logger
   logger = get_logger('advanced_analysis')
   logger.info("Starting complex analysis...")
   ```

3. **Export intermediate results:**
   ```python
   from export_utils import export_to_excel
   export_to_excel(intermediate_df, 'intermediate_results.xlsx')
   ```

4. **Generate comprehensive reports:**
   ```python
   from report_generator import generate_pdf_report
   generate_pdf_report(charts=['chart1.png', 'chart2.png'], summary_data=summary)
   ```

5. **Test statistical significance:**
   ```python
   from statistical_utils import test_statistical_significance
   # Always test before drawing conclusions
   ```

## Cursor AI Prompts for Advanced Analyses

When using Cursor, try these prompts:

- **"Create a cohort retention analysis with heatmaps"**
- **"Build a price-volume-mix decomposition analysis"**
- **"Generate a customer lifetime value analysis with segmentation"**
- **"Create a forecasting model with confidence intervals"**
- **"Build a multi-dimensional analysis across product, customer, and geography"**
- **"Create an anomaly detection analysis for unusual transactions"**

The AI will automatically use these patterns and the template utilities.

---

**Last Updated:** January 2026
**For:** Advanced users and AI-assisted development