162 lines
4.1 KiB
Markdown
162 lines
4.1 KiB
Markdown
# Common Analysis Patterns
|
|
|
|
## ⭐ RECOMMENDED: Use Utilities
|
|
|
|
**Always prefer `analysis_utils.py` and `config.py` over manual implementations:**
|
|
- Consistent formatting
|
|
- Fewer errors
|
|
- Easier maintenance
|
|
- Standardized output
|
|
|
|
## Standard Script Structure (Using Utilities)
|
|
|
|
**RECOMMENDED:** Use `analysis_utils.py` and `config.py` for consistency:
|
|
|
|
```python
|
|
# 1. IMPORTS
|
|
import pandas as pd
|
|
import numpy as np
|
|
import matplotlib.pyplot as plt
|
|
from data_loader import load_sales_data, validate_data_structure
|
|
from validate_revenue import validate_revenue
|
|
from analysis_utils import (
|
|
get_ltm_period_config, get_annual_data, calculate_annual_metrics,
|
|
get_millions_formatter, setup_revenue_chart, save_chart,
|
|
format_currency, print_annual_summary, sort_mixed_years,
|
|
apply_exclusion_filters
|
|
)
|
|
from config import (
|
|
DATA_FILE, OUTPUT_DIR, CHART_SIZES, ensure_directories,
|
|
get_data_path, REVENUE_COLUMN, COMPANY_NAME
|
|
)
|
|
|
|
# 2. LOAD DATA (ALWAYS use data_loader)
|
|
df = load_sales_data(get_data_path())
|
|
|
|
# 3. VALIDATE DATA STRUCTURE
|
|
is_valid, msg = validate_data_structure(df)
|
|
if not is_valid:
|
|
print(f"ERROR: {msg}")
|
|
return
|
|
|
|
# 4. APPLY EXCLUSION FILTERS (if configured)
|
|
df = apply_exclusion_filters(df)
|
|
|
|
# 5. SETUP LTM (if doing annual comparisons and LTM is enabled)
|
|
ltm_start, ltm_end = get_ltm_period_config()
|
|
|
|
# 6. DATA PREPARATION
|
|
# Convert columns, filter data, create derived columns
|
|
|
|
# 7. ANALYSIS LOGIC
|
|
# Use calculate_annual_metrics() for annual aggregations
|
|
|
|
# 8. VISUALIZATIONS
|
|
# Use setup_revenue_chart() and save_chart() from analysis_utils
|
|
|
|
# 9. VALIDATION
|
|
validate_revenue(df, "Analysis Name")
|
|
```
|
|
|
|
## Annual Aggregation Pattern
|
|
|
|
**RECOMMENDED:** Use `calculate_annual_metrics()` from `analysis_utils.py`:
|
|
|
|
```python
|
|
from analysis_utils import calculate_annual_metrics, get_ltm_period_config
|
|
from config import REVENUE_COLUMN
|
|
|
|
ltm_start, ltm_end = get_ltm_period_config()
|
|
|
|
def calculate_metrics(year_data):
|
|
"""Calculate metrics for a single year"""
|
|
return {
|
|
'Revenue': year_data[REVENUE_COLUMN].sum(),
|
|
# ... other metrics
|
|
}
|
|
|
|
annual_df = calculate_annual_metrics(df, calculate_metrics, ltm_start, ltm_end)
|
|
```
|
|
|
|
## Chart Formatting Pattern
|
|
|
|
**ALWAYS use this pattern for revenue charts:**
|
|
|
|
```python
|
|
from analysis_utils import setup_revenue_chart, save_chart
|
|
from config import CHART_SIZES
|
|
|
|
fig, ax = plt.subplots(figsize=CHART_SIZES['medium'])
|
|
|
|
# Divide data by 1e6 BEFORE plotting
|
|
ax.plot(data / 1e6, ...)
|
|
# OR
|
|
ax.bar(x, values / 1e6, ...)
|
|
|
|
# Apply formatter automatically
|
|
setup_revenue_chart(ax)
|
|
|
|
# Save chart
|
|
save_chart(fig, 'chart_name.png')
|
|
plt.close()
|
|
```
|
|
|
|
## Mixed Type Handling
|
|
|
|
When dealing with year columns that may contain mixed int/str types (e.g., "2025 (LTM 9/2025)"):
|
|
|
|
```python
|
|
from analysis_utils import sort_mixed_years
|
|
|
|
# Sort DataFrame by year
|
|
df_sorted = sort_mixed_years(df, year_col='Year')
|
|
|
|
# For chart labels
|
|
years = df_sorted['Year'].tolist()
|
|
x_pos = range(len(years))
|
|
ax.set_xticks(x_pos)
|
|
ax.set_xticklabels(years, rotation=45, ha='right')
|
|
```
|
|
|
|
## Price Calculation Pattern
|
|
|
|
```python
|
|
from analysis_utils import calculate_price_per_unit
|
|
from config import QUANTITY_COLUMN, REVENUE_COLUMN
|
|
|
|
# Calculate average price per unit (excludes outliers automatically)
|
|
price_per_unit = calculate_price_per_unit(df, QUANTITY_COLUMN, REVENUE_COLUMN)
|
|
```
|
|
|
|
## Exclusion Filters Pattern
|
|
|
|
If you need to exclude specific segments (e.g., test accounts, business units):
|
|
|
|
```python
|
|
from analysis_utils import apply_exclusion_filters
|
|
|
|
# Configure in config.py:
|
|
# EXCLUSION_FILTERS = {
|
|
# 'enabled': True,
|
|
# 'exclude_by_column': 'Country',
|
|
# 'exclude_values': ['KVT', 'Test']
|
|
# }
|
|
|
|
df = apply_exclusion_filters(df)
|
|
```
|
|
|
|
## Using Configuration Values
|
|
|
|
**ALWAYS use config values instead of hardcoding:**
|
|
|
|
```python
|
|
from config import (
|
|
REVENUE_COLUMN, # Use this instead of 'USD' or 'Amount'
|
|
CUSTOMER_COLUMN, # Use this instead of 'Customer'
|
|
DATE_COLUMN, # Use this instead of 'InvoiceDate'
|
|
COMPANY_NAME, # Use this for titles
|
|
ANALYSIS_YEARS, # Use this for year filtering
|
|
CHART_SIZES, # Use this for figure sizes
|
|
)
|
|
```
|