590 lines
16 KiB
Markdown
590 lines
16 KiB
Markdown
# Sales Analysis Template
|
|
|
|
**A best-in-class, reusable template for sales invoice detail analysis**
|
|
|
|
**Optimized for Cursor AI** - Just ask the AI to create analyses and it handles everything automatically.
|
|
|
|
This template provides a complete framework for analyzing sales data from any company. It's designed to be:
|
|
- **Flexible:** Works with different column names, date formats, and data structures
|
|
- **Automated:** Interactive setup wizard configures everything for your company
|
|
- **AI-Optimized:** Fully optimized for Cursor - AI knows all patterns and generates code automatically
|
|
- **Production-Ready:** Includes error handling, validation, and best practices
|
|
|
|
---
|
|
|
|
## 🚀 Quick Start
|
|
|
|
### 1. Setup (Automated)
|
|
|
|
Run the interactive setup wizard:
|
|
|
|
```bash
|
|
python setup_wizard.py
|
|
```
|
|
|
|
The wizard will ask you about:
|
|
- Company name and analysis date
|
|
- Data file location
|
|
- Column names in your CSV
|
|
- Date range and LTM configuration
|
|
- Exclusion filters (if needed)
|
|
|
|
### 2. Manual Setup (Alternative)
|
|
|
|
If you prefer to configure manually:
|
|
|
|
1. **Update `config.py`** with your company-specific settings:
|
|
- `COMPANY_NAME`: Your company name
|
|
- `DATA_FILE`: Your CSV filename
|
|
- `REVENUE_COLUMN`: Your revenue/amount column name
|
|
- `DATE_COLUMN`: Your primary date column
|
|
- Column mappings for Customer, Item, etc.
|
|
- Date range and LTM settings
|
|
|
|
2. **Place your data file** in the template directory (or update `DATA_DIR` in config.py)
|
|
|
|
### 3. Test Data Loading
|
|
|
|
Verify your configuration works:
|
|
|
|
```bash
|
|
python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'Loaded {len(df):,} rows')"
|
|
```
|
|
|
|
### 4. Create Your First Analysis
|
|
|
|
Copy the template and customize:
|
|
|
|
```bash
|
|
cp analysis_template.py my_first_analysis.py
|
|
# Edit my_first_analysis.py with your analysis logic
|
|
python my_first_analysis.py
|
|
```
|
|
|
|
---
|
|
|
|
## 📁 Project Structure
|
|
|
|
```
|
|
sales_analysis_template/
|
|
├── README.md # This file
|
|
├── QUICK_START.md # Quick start guide
|
|
├── TEMPLATE_OVERVIEW.md # High-level overview
|
|
├── TEMPLATE_SUMMARY.md # Comprehensive template summary
|
|
├── EXAMPLES.md # Example scripts guide
|
|
├── SETUP_CHECKLIST.md # Setup verification checklist
|
|
├── requirements.txt # Python dependencies
|
|
├── setup_wizard.py # Interactive setup wizard
|
|
│
|
|
├── config.py # ⭐ Configuration (customize for your company)
|
|
├── config_validator.py # Configuration validation utility
|
|
│
|
|
├── data_loader.py # ⭐ Data loading with fallback logic
|
|
├── data_quality.py # Data quality reporting
|
|
├── data_processing.py # Data transformation utilities
|
|
│
|
|
├── analysis_utils.py # ⭐ Common utilities (formatters, LTM, helpers)
|
|
├── statistical_utils.py # Statistical analysis utilities
|
|
├── validate_revenue.py # Revenue validation utility
|
|
│
|
|
├── export_utils.py # Export to CSV/Excel
|
|
├── report_generator.py # PDF report generation
|
|
├── logger_config.py # Logging configuration
|
|
│
|
|
├── analysis_template.py # Template for creating new analyses
|
|
├── run_all_analyses.py # Batch runner for all scripts
|
|
├── generate_sample_data.py # Generate sample data for testing
|
|
│
|
|
├── examples/ # Example analysis scripts
|
|
│ ├── annual_revenue_trend.py # Simple annual revenue analysis
|
|
│ ├── customer_segmentation.py # RFM customer segmentation
|
|
│ ├── cohort_analysis.py # Customer cohort analysis
|
|
│ └── product_performance.py # Product performance analysis
|
|
│
|
|
├── tests/ # Unit tests
|
|
│ ├── test_data_loader.py # Data loader tests
|
|
│ ├── test_analysis_utils.py # Analysis utils tests
|
|
│ └── test_config_validator.py # Config validator tests
|
|
│
|
|
└── .cursor/
|
|
└── rules/ # Cursor IDE rules (auto-loaded)
|
|
├── ai_assistant_guide.md # Complete AI assistant guide
|
|
├── advanced_analysis_patterns.md # Advanced techniques
|
|
├── analysis_patterns.md # Common analysis patterns
|
|
├── chart_formatting.md # Chart formatting rules
|
|
├── code_quality.md # Code quality standards
|
|
├── common_errors.md # Error troubleshooting
|
|
├── data_loading.md # Data loading patterns
|
|
├── error_handling.md # Error handling patterns
|
|
└── ltm_methodology.md # LTM methodology
|
|
```
|
|
|
|
---
|
|
|
|
## 🔧 Configuration Guide
|
|
|
|
### Required Configuration
|
|
|
|
**In `config.py`, you MUST configure:**
|
|
|
|
1. **Company Information:**
|
|
```python
|
|
COMPANY_NAME = "Your Company Name"
|
|
```
|
|
|
|
2. **Data File:**
|
|
```python
|
|
DATA_FILE = 'your_sales_data.csv'
|
|
```
|
|
|
|
3. **Column Mappings:**
|
|
```python
|
|
REVENUE_COLUMN = 'USD' # Your revenue column name
|
|
DATE_COLUMN = 'InvoiceDate' # Your date column name
|
|
CUSTOMER_COLUMN = 'Customer' # Your customer column name
|
|
```
|
|
|
|
4. **Date Range:**
|
|
```python
|
|
MIN_YEAR = 2021
|
|
MAX_DATE = pd.Timestamp('2025-09-30')
|
|
ANALYSIS_YEARS = [2021, 2022, 2023, 2024, 2025]
|
|
```
|
|
|
|
### Optional Configuration
|
|
|
|
**LTM (Last Twelve Months):**
|
|
```python
|
|
LTM_ENABLED = True # Set to False if all years are complete
|
|
LTM_START_MONTH = 10
|
|
LTM_START_YEAR = 2024
|
|
LTM_END_MONTH = 9
|
|
LTM_END_YEAR = 2025
|
|
```
|
|
|
|
**Exclusion Filters:**
|
|
```python
|
|
EXCLUSION_FILTERS = {
|
|
'enabled': True,
|
|
'exclude_by_column': 'Country',
|
|
'exclude_values': ['Test', 'KVT']
|
|
}
|
|
```
|
|
|
|
**See `config.py` for all available options and detailed comments.**
|
|
|
|
---
|
|
|
|
## 📊 Data Requirements
|
|
|
|
### Required Columns
|
|
|
|
Your CSV file must have:
|
|
- **Revenue column:** A numeric column with sales amounts (configured as `REVENUE_COLUMN`)
|
|
- **Date column:** At least one date column (configured as `DATE_COLUMN`)
|
|
|
|
### Recommended Columns
|
|
|
|
For full analysis capabilities, include:
|
|
- **Customer/Account:** For customer segmentation and analysis
|
|
- **Item/Product:** For product analysis
|
|
- **Quantity:** For price calculations
|
|
- **Geographic:** Region, Country for geographic analysis
|
|
- **Segments:** Technology, EndMarket, ProductGroup for segmentation
|
|
|
|
### Date Column Fallback
|
|
|
|
The data loader supports fallback logic:
|
|
1. **Primary:** Uses `DATE_COLUMN` (e.g., InvoiceDate)
|
|
2. **Fallback 1:** Uses columns in `DATE_FALLBACK_COLUMNS` (e.g., Month, Year)
|
|
3. **Fallback 2:** Constructs from Year column if available
|
|
|
|
This ensures maximum date coverage even if some rows have missing dates.
|
|
|
|
---
|
|
|
|
## 💻 Creating Analysis Scripts
|
|
|
|
### Using the Template
|
|
|
|
1. **Copy the template:**
|
|
```bash
|
|
cp analysis_template.py my_analysis.py
|
|
```
|
|
|
|
2. **Update configuration:**
|
|
```python
|
|
ANALYSIS_NAME = "My Analysis"
|
|
DESCRIPTION = "Description of what this analysis does"
|
|
```
|
|
|
|
3. **Implement your logic:**
|
|
- Use `calculate_annual_metrics()` for annual aggregations
|
|
- Use `setup_revenue_chart()` and `save_chart()` for visualizations
|
|
- Follow patterns from `.cursor/rules/analysis_patterns.md`
|
|
|
|
4. **Run your analysis:**
|
|
```bash
|
|
python my_analysis.py
|
|
```
|
|
|
|
### Standard Pattern
|
|
|
|
```python
|
|
from data_loader import load_sales_data, validate_data_structure
|
|
from analysis_utils import (
|
|
get_ltm_period_config, calculate_annual_metrics,
|
|
setup_revenue_chart, save_chart, apply_exclusion_filters
|
|
)
|
|
from config import get_data_path, REVENUE_COLUMN, CHART_SIZES
|
|
|
|
# Load and validate
|
|
df = load_sales_data(get_data_path())
|
|
is_valid, msg = validate_data_structure(df)
|
|
if not is_valid:
|
|
print(f"ERROR: {msg}")
|
|
return
|
|
|
|
# Apply filters
|
|
df = apply_exclusion_filters(df)
|
|
|
|
# Calculate metrics
|
|
ltm_start, ltm_end = get_ltm_period_config()
|
|
annual_df = calculate_annual_metrics(df, calculate_metrics, ltm_start, ltm_end)
|
|
|
|
# Create charts
|
|
fig, ax = plt.subplots(figsize=CHART_SIZES['medium'])
|
|
ax.plot(data / 1e6, ...)
|
|
setup_revenue_chart(ax)
|
|
save_chart(fig, 'chart.png')
|
|
```
|
|
|
|
---
|
|
|
|
## 🎯 Key Features
|
|
|
|
### 1. Flexible Data Loading
|
|
|
|
- Handles different column names via configuration
|
|
- Fallback logic for date parsing (100% coverage)
|
|
- Automatic validation and error reporting
|
|
|
|
### 2. LTM (Last Twelve Months) Support
|
|
|
|
- Automatic LTM calculation for partial years
|
|
- Apples-to-apples comparison with full calendar years
|
|
- Configurable LTM periods
|
|
|
|
### 3. Standardized Chart Formatting
|
|
|
|
- Automatic millions formatter for revenue charts
|
|
- Consistent styling and sizing
|
|
- Professional output ready for reports
|
|
- Optional interactive charts with Plotly
|
|
|
|
### 4. Exclusion Filters
|
|
|
|
- Easy configuration for excluding segments
|
|
- Useful for excluding test accounts, business units, etc.
|
|
|
|
### 5. Revenue Validation
|
|
|
|
- Automatic validation after each analysis
|
|
- Ensures data loading is working correctly
|
|
- Optional validation against expected values
|
|
|
|
### 6. Example Scripts
|
|
|
|
- Working examples for common analyses
|
|
- Demonstrates best practices
|
|
- Easy to customize and extend
|
|
|
|
### 7. Data Export
|
|
|
|
- Export results to CSV and Excel
|
|
- Formatted summary tables
|
|
- Multiple sheet support
|
|
|
|
### 8. Data Quality Reporting
|
|
|
|
- Comprehensive data quality checks
|
|
- Missing value analysis
|
|
- Outlier detection
|
|
- Data profiling
|
|
|
|
### 9. Configuration Validation
|
|
|
|
- Early error detection
|
|
- Validates column mappings
|
|
- Checks date ranges and LTM configuration
|
|
|
|
### 10. Statistical Utilities
|
|
|
|
- Year-over-year growth calculations
|
|
- CAGR (Compound Annual Growth Rate)
|
|
- Correlation analysis
|
|
- Statistical significance testing
|
|
|
|
### 11. Report Generation
|
|
|
|
- Combine multiple charts into PDF reports
|
|
- Professional formatting
|
|
- Summary tables and metadata
|
|
|
|
### 12. Logging Infrastructure
|
|
|
|
- Structured logging with file and console output
|
|
- Analysis execution tracking
|
|
- Configurable log levels
|
|
|
|
---
|
|
|
|
## 📚 Documentation
|
|
|
|
### For AI Agents (Cursor IDE)
|
|
|
|
The `.cursor/rules/` directory contains comprehensive rules that are automatically loaded by Cursor:
|
|
|
|
- **`ai_assistant_guide.md`:** Complete guide with ready-to-use prompts
|
|
- **`advanced_analysis_patterns.md`:** Advanced techniques (cohort, PVM, forecasting, etc.)
|
|
- **`analysis_patterns.md`:** Standard patterns for creating analyses
|
|
- **`data_loading.md`:** Always use `data_loader.py`, never `pd.read_csv()` directly
|
|
- **`chart_formatting.md`:** How to format charts correctly
|
|
- **`ltm_methodology.md`:** LTM implementation and usage
|
|
- **`common_errors.md`:** Troubleshooting guide
|
|
- **`code_quality.md`:** Code quality standards and Cursor best practices
|
|
- **`error_handling.md`:** How to write AI-friendly error messages
|
|
|
|
### For Developers
|
|
|
|
- **`config.py`:** Heavily commented with all configuration options
|
|
- **`analysis_template.py`:** Template with examples and comments
|
|
- **`analysis_utils.py`:** Well-documented utility functions
|
|
|
|
---
|
|
|
|
## 🔍 Common Analysis Types
|
|
|
|
This template supports all standard sales analyses:
|
|
|
|
### Revenue Analyses
|
|
- Annual revenue trends
|
|
- Monthly revenue analysis
|
|
- Revenue by segment/product/geography
|
|
|
|
### Customer Analyses
|
|
- Customer segmentation (RFM)
|
|
- Customer concentration
|
|
- Churn analysis
|
|
- Cohort analysis
|
|
- Customer lifetime value (CLV)
|
|
|
|
### Product Analyses
|
|
- Product performance
|
|
- Product lifecycle
|
|
- BCG matrix
|
|
- Market basket analysis
|
|
|
|
### Financial Analyses
|
|
- Price elasticity
|
|
- Contribution margin
|
|
- Price vs volume analysis
|
|
|
|
### Advanced Analyses
|
|
- Seasonality analysis
|
|
- Time series forecasting
|
|
- Customer churn prediction
|
|
|
|
**See `examples/` directory for working example scripts, or the original Dukane project for 24+ production analysis scripts.**
|
|
|
|
---
|
|
|
|
## 🛠️ Dependencies
|
|
|
|
Install required packages:
|
|
|
|
```bash
|
|
pip install -r requirements.txt
|
|
```
|
|
|
|
**Core dependencies:**
|
|
- `pandas` - Data manipulation
|
|
- `numpy` - Numerical operations
|
|
- `matplotlib` - Charting
|
|
- `seaborn` - Enhanced visualizations
|
|
|
|
**Optional dependencies** (uncomment in requirements.txt if needed):
|
|
- `openpyxl` - Excel export (export_utils.py)
|
|
- `plotly` - Interactive charts (analysis_utils.py)
|
|
- `reportlab` - PDF reports (report_generator.py)
|
|
- `scipy` - Statistical analysis (statistical_utils.py)
|
|
- `pytest` - Unit testing
|
|
- `pmdarima` - Time series forecasting
|
|
- `mlxtend` - Market basket analysis
|
|
- `scikit-learn` - Machine learning
|
|
|
|
---
|
|
|
|
## ⚠️ Important Notes
|
|
|
|
### Always Use Utilities
|
|
|
|
**✅ DO:**
|
|
```python
|
|
from data_loader import load_sales_data
|
|
from analysis_utils import setup_revenue_chart, save_chart
|
|
from config import REVENUE_COLUMN, CHART_SIZES
|
|
```
|
|
|
|
**❌ DON'T:**
|
|
```python
|
|
df = pd.read_csv('data.csv') # Use data_loader instead
|
|
ax.plot(revenue, ...) # Divide by 1e6 first, use setup_revenue_chart()
|
|
```
|
|
|
|
### Chart Formatting
|
|
|
|
**ALWAYS divide revenue by 1e6 before plotting:**
|
|
```python
|
|
ax.plot(revenue / 1e6, ...) # Convert to millions
|
|
setup_revenue_chart(ax) # Apply formatter
|
|
```
|
|
|
|
### LTM Labeling
|
|
|
|
**ALWAYS label LTM years correctly:**
|
|
```python
|
|
from config import get_ltm_label
|
|
ltm_label = get_ltm_label() # Returns "2025 (LTM 9/2025)" or None
|
|
if ltm_label:
|
|
title += f'\n({ltm_label})'
|
|
```
|
|
|
|
---
|
|
|
|
## 🐛 Troubleshooting
|
|
|
|
### Data Loading Issues
|
|
|
|
**Problem:** "Data file not found"
|
|
- **Solution:** Check `DATA_FILE` path in config.py
|
|
- **Solution:** Ensure file is in template directory or update `DATA_DIR`
|
|
|
|
**Problem:** "Required column 'USD' not found"
|
|
- **Solution:** Update `REVENUE_COLUMN` in config.py to match your CSV
|
|
- **Solution:** Check all column mappings in config.py
|
|
|
|
**Problem:** "All dates are NaN"
|
|
- **Solution:** Add fallback date columns to `DATE_FALLBACK_COLUMNS`
|
|
- **Solution:** Check date format in your CSV
|
|
|
|
### Analysis Issues
|
|
|
|
**Problem:** Charts show scientific notation (1e8)
|
|
- **Solution:** Divide by 1e6 before plotting: `ax.plot(data / 1e6, ...)`
|
|
- **Solution:** Use `setup_revenue_chart(ax)` to apply formatter
|
|
|
|
**Problem:** "DataFrame is empty" after filtering
|
|
- **Solution:** Check `MIN_YEAR` and `MAX_DATE` in config.py
|
|
- **Solution:** Verify `ANALYSIS_YEARS` includes years in your data
|
|
|
|
**See `.cursor/rules/common_errors.md` for more troubleshooting help.**
|
|
|
|
---
|
|
|
|
## 📝 Example Workflow
|
|
|
|
### Complete Analysis Workflow
|
|
|
|
1. **Setup:**
|
|
```bash
|
|
python setup_wizard.py
|
|
```
|
|
|
|
2. **Test data loading:**
|
|
```bash
|
|
python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'✓ Loaded {len(df):,} rows')"
|
|
```
|
|
|
|
3. **Create analysis:**
|
|
```bash
|
|
cp analysis_template.py revenue_analysis.py
|
|
# Edit revenue_analysis.py
|
|
```
|
|
|
|
4. **Run analysis:**
|
|
```bash
|
|
python revenue_analysis.py
|
|
```
|
|
|
|
5. **Add to batch runner:**
|
|
```python
|
|
# In run_all_analyses.py:
|
|
ANALYSIS_SCRIPTS = [
|
|
'revenue_analysis.py',
|
|
# ... other analyses
|
|
]
|
|
```
|
|
|
|
6. **Run all analyses:**
|
|
```bash
|
|
python run_all_analyses.py
|
|
```
|
|
|
|
---
|
|
|
|
## 🤝 Best Practices
|
|
|
|
1. **Always validate data** after loading:
|
|
```python
|
|
is_valid, msg = validate_data_structure(df)
|
|
```
|
|
|
|
2. **Use configuration values** instead of hardcoding:
|
|
```python
|
|
from config import REVENUE_COLUMN # ✅
|
|
revenue = df['USD'].sum() # ❌ Hardcoded
|
|
```
|
|
|
|
3. **Apply exclusion filters** if configured:
|
|
```python
|
|
df = apply_exclusion_filters(df)
|
|
```
|
|
|
|
4. **Validate revenue** at end of each analysis:
|
|
```python
|
|
validate_revenue(df, "Analysis Name")
|
|
```
|
|
|
|
5. **Use utility functions** for consistency:
|
|
```python
|
|
from analysis_utils import calculate_annual_metrics, setup_revenue_chart
|
|
```
|
|
|
|
---
|
|
|
|
## 📄 License
|
|
|
|
This template is provided as-is for use in sales analysis projects.
|
|
|
|
---
|
|
|
|
## 🙏 Acknowledgments
|
|
|
|
This template is based on best practices developed during the Dukane Corporation sales analysis project, which included 24+ production-ready analysis scripts and comprehensive documentation.
|
|
|
|
---
|
|
|
|
## 📞 Support
|
|
|
|
For questions or issues:
|
|
1. Check `.cursor/rules/` for detailed patterns and troubleshooting
|
|
2. Review `config.py` comments for configuration options
|
|
3. See example analyses in the original Dukane project
|
|
|
|
---
|
|
|
|
**Last Updated:** January 2026
|
|
**Template Version:** 1.0
|
|
**Status:** Production Ready
|