Files
sales-data-analysis/README.md
Jonathan Pressnell cf0b596449 Initial commit: sales analysis template
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-06 09:16:34 -05:00

590 lines
16 KiB
Markdown

# Sales Analysis Template
**A best-in-class, reusable template for sales invoice detail analysis**
**Optimized for Cursor AI** - Just ask the AI to create analyses and it handles everything automatically.
This template provides a complete framework for analyzing sales data from any company. It's designed to be:
- **Flexible:** Works with different column names, date formats, and data structures
- **Automated:** Interactive setup wizard configures everything for your company
- **AI-Optimized:** Fully optimized for Cursor - AI knows all patterns and generates code automatically
- **Production-Ready:** Includes error handling, validation, and best practices
---
## 🚀 Quick Start
### 1. Setup (Automated)
Run the interactive setup wizard:
```bash
python setup_wizard.py
```
The wizard will ask you about:
- Company name and analysis date
- Data file location
- Column names in your CSV
- Date range and LTM configuration
- Exclusion filters (if needed)
### 2. Manual Setup (Alternative)
If you prefer to configure manually:
1. **Update `config.py`** with your company-specific settings:
- `COMPANY_NAME`: Your company name
- `DATA_FILE`: Your CSV filename
- `REVENUE_COLUMN`: Your revenue/amount column name
- `DATE_COLUMN`: Your primary date column
- Column mappings for Customer, Item, etc.
- Date range and LTM settings
2. **Place your data file** in the template directory (or update `DATA_DIR` in config.py)
### 3. Test Data Loading
Verify your configuration works:
```bash
python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'Loaded {len(df):,} rows')"
```
### 4. Create Your First Analysis
Copy the template and customize:
```bash
cp analysis_template.py my_first_analysis.py
# Edit my_first_analysis.py with your analysis logic
python my_first_analysis.py
```
---
## 📁 Project Structure
```
sales_analysis_template/
├── README.md # This file
├── QUICK_START.md # Quick start guide
├── TEMPLATE_OVERVIEW.md # High-level overview
├── TEMPLATE_SUMMARY.md # Comprehensive template summary
├── EXAMPLES.md # Example scripts guide
├── SETUP_CHECKLIST.md # Setup verification checklist
├── requirements.txt # Python dependencies
├── setup_wizard.py # Interactive setup wizard
├── config.py # ⭐ Configuration (customize for your company)
├── config_validator.py # Configuration validation utility
├── data_loader.py # ⭐ Data loading with fallback logic
├── data_quality.py # Data quality reporting
├── data_processing.py # Data transformation utilities
├── analysis_utils.py # ⭐ Common utilities (formatters, LTM, helpers)
├── statistical_utils.py # Statistical analysis utilities
├── validate_revenue.py # Revenue validation utility
├── export_utils.py # Export to CSV/Excel
├── report_generator.py # PDF report generation
├── logger_config.py # Logging configuration
├── analysis_template.py # Template for creating new analyses
├── run_all_analyses.py # Batch runner for all scripts
├── generate_sample_data.py # Generate sample data for testing
├── examples/ # Example analysis scripts
│ ├── annual_revenue_trend.py # Simple annual revenue analysis
│ ├── customer_segmentation.py # RFM customer segmentation
│ ├── cohort_analysis.py # Customer cohort analysis
│ └── product_performance.py # Product performance analysis
├── tests/ # Unit tests
│ ├── test_data_loader.py # Data loader tests
│ ├── test_analysis_utils.py # Analysis utils tests
│ └── test_config_validator.py # Config validator tests
└── .cursor/
└── rules/ # Cursor IDE rules (auto-loaded)
├── ai_assistant_guide.md # Complete AI assistant guide
├── advanced_analysis_patterns.md # Advanced techniques
├── analysis_patterns.md # Common analysis patterns
├── chart_formatting.md # Chart formatting rules
├── code_quality.md # Code quality standards
├── common_errors.md # Error troubleshooting
├── data_loading.md # Data loading patterns
├── error_handling.md # Error handling patterns
└── ltm_methodology.md # LTM methodology
```
---
## 🔧 Configuration Guide
### Required Configuration
**In `config.py`, you MUST configure:**
1. **Company Information:**
```python
COMPANY_NAME = "Your Company Name"
```
2. **Data File:**
```python
DATA_FILE = 'your_sales_data.csv'
```
3. **Column Mappings:**
```python
REVENUE_COLUMN = 'USD' # Your revenue column name
DATE_COLUMN = 'InvoiceDate' # Your date column name
CUSTOMER_COLUMN = 'Customer' # Your customer column name
```
4. **Date Range:**
```python
MIN_YEAR = 2021
MAX_DATE = pd.Timestamp('2025-09-30')
ANALYSIS_YEARS = [2021, 2022, 2023, 2024, 2025]
```
### Optional Configuration
**LTM (Last Twelve Months):**
```python
LTM_ENABLED = True # Set to False if all years are complete
LTM_START_MONTH = 10
LTM_START_YEAR = 2024
LTM_END_MONTH = 9
LTM_END_YEAR = 2025
```
**Exclusion Filters:**
```python
EXCLUSION_FILTERS = {
'enabled': True,
'exclude_by_column': 'Country',
'exclude_values': ['Test', 'KVT']
}
```
**See `config.py` for all available options and detailed comments.**
---
## 📊 Data Requirements
### Required Columns
Your CSV file must have:
- **Revenue column:** A numeric column with sales amounts (configured as `REVENUE_COLUMN`)
- **Date column:** At least one date column (configured as `DATE_COLUMN`)
### Recommended Columns
For full analysis capabilities, include:
- **Customer/Account:** For customer segmentation and analysis
- **Item/Product:** For product analysis
- **Quantity:** For price calculations
- **Geographic:** Region, Country for geographic analysis
- **Segments:** Technology, EndMarket, ProductGroup for segmentation
### Date Column Fallback
The data loader supports fallback logic:
1. **Primary:** Uses `DATE_COLUMN` (e.g., InvoiceDate)
2. **Fallback 1:** Uses columns in `DATE_FALLBACK_COLUMNS` (e.g., Month, Year)
3. **Fallback 2:** Constructs from Year column if available
This ensures maximum date coverage even if some rows have missing dates.
---
## 💻 Creating Analysis Scripts
### Using the Template
1. **Copy the template:**
```bash
cp analysis_template.py my_analysis.py
```
2. **Update configuration:**
```python
ANALYSIS_NAME = "My Analysis"
DESCRIPTION = "Description of what this analysis does"
```
3. **Implement your logic:**
- Use `calculate_annual_metrics()` for annual aggregations
- Use `setup_revenue_chart()` and `save_chart()` for visualizations
- Follow patterns from `.cursor/rules/analysis_patterns.md`
4. **Run your analysis:**
```bash
python my_analysis.py
```
### Standard Pattern
```python
from data_loader import load_sales_data, validate_data_structure
from analysis_utils import (
get_ltm_period_config, calculate_annual_metrics,
setup_revenue_chart, save_chart, apply_exclusion_filters
)
from config import get_data_path, REVENUE_COLUMN, CHART_SIZES
# Load and validate
df = load_sales_data(get_data_path())
is_valid, msg = validate_data_structure(df)
if not is_valid:
print(f"ERROR: {msg}")
return
# Apply filters
df = apply_exclusion_filters(df)
# Calculate metrics
ltm_start, ltm_end = get_ltm_period_config()
annual_df = calculate_annual_metrics(df, calculate_metrics, ltm_start, ltm_end)
# Create charts
fig, ax = plt.subplots(figsize=CHART_SIZES['medium'])
ax.plot(data / 1e6, ...)
setup_revenue_chart(ax)
save_chart(fig, 'chart.png')
```
---
## 🎯 Key Features
### 1. Flexible Data Loading
- Handles different column names via configuration
- Fallback logic for date parsing (100% coverage)
- Automatic validation and error reporting
### 2. LTM (Last Twelve Months) Support
- Automatic LTM calculation for partial years
- Apples-to-apples comparison with full calendar years
- Configurable LTM periods
### 3. Standardized Chart Formatting
- Automatic millions formatter for revenue charts
- Consistent styling and sizing
- Professional output ready for reports
- Optional interactive charts with Plotly
### 4. Exclusion Filters
- Easy configuration for excluding segments
- Useful for excluding test accounts, business units, etc.
### 5. Revenue Validation
- Automatic validation after each analysis
- Ensures data loading is working correctly
- Optional validation against expected values
### 6. Example Scripts
- Working examples for common analyses
- Demonstrates best practices
- Easy to customize and extend
### 7. Data Export
- Export results to CSV and Excel
- Formatted summary tables
- Multiple sheet support
### 8. Data Quality Reporting
- Comprehensive data quality checks
- Missing value analysis
- Outlier detection
- Data profiling
### 9. Configuration Validation
- Early error detection
- Validates column mappings
- Checks date ranges and LTM configuration
### 10. Statistical Utilities
- Year-over-year growth calculations
- CAGR (Compound Annual Growth Rate)
- Correlation analysis
- Statistical significance testing
### 11. Report Generation
- Combine multiple charts into PDF reports
- Professional formatting
- Summary tables and metadata
### 12. Logging Infrastructure
- Structured logging with file and console output
- Analysis execution tracking
- Configurable log levels
---
## 📚 Documentation
### For AI Agents (Cursor IDE)
The `.cursor/rules/` directory contains comprehensive rules that are automatically loaded by Cursor:
- **`ai_assistant_guide.md`:** Complete guide with ready-to-use prompts
- **`advanced_analysis_patterns.md`:** Advanced techniques (cohort, PVM, forecasting, etc.)
- **`analysis_patterns.md`:** Standard patterns for creating analyses
- **`data_loading.md`:** Always use `data_loader.py`, never `pd.read_csv()` directly
- **`chart_formatting.md`:** How to format charts correctly
- **`ltm_methodology.md`:** LTM implementation and usage
- **`common_errors.md`:** Troubleshooting guide
- **`code_quality.md`:** Code quality standards and Cursor best practices
- **`error_handling.md`:** How to write AI-friendly error messages
### For Developers
- **`config.py`:** Heavily commented with all configuration options
- **`analysis_template.py`:** Template with examples and comments
- **`analysis_utils.py`:** Well-documented utility functions
---
## 🔍 Common Analysis Types
This template supports all standard sales analyses:
### Revenue Analyses
- Annual revenue trends
- Monthly revenue analysis
- Revenue by segment/product/geography
### Customer Analyses
- Customer segmentation (RFM)
- Customer concentration
- Churn analysis
- Cohort analysis
- Customer lifetime value (CLV)
### Product Analyses
- Product performance
- Product lifecycle
- BCG matrix
- Market basket analysis
### Financial Analyses
- Price elasticity
- Contribution margin
- Price vs volume analysis
### Advanced Analyses
- Seasonality analysis
- Time series forecasting
- Customer churn prediction
**See `examples/` directory for working example scripts, or the original Dukane project for 24+ production analysis scripts.**
---
## 🛠️ Dependencies
Install required packages:
```bash
pip install -r requirements.txt
```
**Core dependencies:**
- `pandas` - Data manipulation
- `numpy` - Numerical operations
- `matplotlib` - Charting
- `seaborn` - Enhanced visualizations
**Optional dependencies** (uncomment in requirements.txt if needed):
- `openpyxl` - Excel export (export_utils.py)
- `plotly` - Interactive charts (analysis_utils.py)
- `reportlab` - PDF reports (report_generator.py)
- `scipy` - Statistical analysis (statistical_utils.py)
- `pytest` - Unit testing
- `pmdarima` - Time series forecasting
- `mlxtend` - Market basket analysis
- `scikit-learn` - Machine learning
---
## ⚠️ Important Notes
### Always Use Utilities
**✅ DO:**
```python
from data_loader import load_sales_data
from analysis_utils import setup_revenue_chart, save_chart
from config import REVENUE_COLUMN, CHART_SIZES
```
**❌ DON'T:**
```python
df = pd.read_csv('data.csv') # Use data_loader instead
ax.plot(revenue, ...) # Divide by 1e6 first, use setup_revenue_chart()
```
### Chart Formatting
**ALWAYS divide revenue by 1e6 before plotting:**
```python
ax.plot(revenue / 1e6, ...) # Convert to millions
setup_revenue_chart(ax) # Apply formatter
```
### LTM Labeling
**ALWAYS label LTM years correctly:**
```python
from config import get_ltm_label
ltm_label = get_ltm_label() # Returns "2025 (LTM 9/2025)" or None
if ltm_label:
title += f'\n({ltm_label})'
```
---
## 🐛 Troubleshooting
### Data Loading Issues
**Problem:** "Data file not found"
- **Solution:** Check `DATA_FILE` path in config.py
- **Solution:** Ensure file is in template directory or update `DATA_DIR`
**Problem:** "Required column 'USD' not found"
- **Solution:** Update `REVENUE_COLUMN` in config.py to match your CSV
- **Solution:** Check all column mappings in config.py
**Problem:** "All dates are NaN"
- **Solution:** Add fallback date columns to `DATE_FALLBACK_COLUMNS`
- **Solution:** Check date format in your CSV
### Analysis Issues
**Problem:** Charts show scientific notation (1e8)
- **Solution:** Divide by 1e6 before plotting: `ax.plot(data / 1e6, ...)`
- **Solution:** Use `setup_revenue_chart(ax)` to apply formatter
**Problem:** "DataFrame is empty" after filtering
- **Solution:** Check `MIN_YEAR` and `MAX_DATE` in config.py
- **Solution:** Verify `ANALYSIS_YEARS` includes years in your data
**See `.cursor/rules/common_errors.md` for more troubleshooting help.**
---
## 📝 Example Workflow
### Complete Analysis Workflow
1. **Setup:**
```bash
python setup_wizard.py
```
2. **Test data loading:**
```bash
python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'✓ Loaded {len(df):,} rows')"
```
3. **Create analysis:**
```bash
cp analysis_template.py revenue_analysis.py
# Edit revenue_analysis.py
```
4. **Run analysis:**
```bash
python revenue_analysis.py
```
5. **Add to batch runner:**
```python
# In run_all_analyses.py:
ANALYSIS_SCRIPTS = [
'revenue_analysis.py',
# ... other analyses
]
```
6. **Run all analyses:**
```bash
python run_all_analyses.py
```
---
## 🤝 Best Practices
1. **Always validate data** after loading:
```python
is_valid, msg = validate_data_structure(df)
```
2. **Use configuration values** instead of hardcoding:
```python
from config import REVENUE_COLUMN # ✅
revenue = df['USD'].sum() # ❌ Hardcoded
```
3. **Apply exclusion filters** if configured:
```python
df = apply_exclusion_filters(df)
```
4. **Validate revenue** at end of each analysis:
```python
validate_revenue(df, "Analysis Name")
```
5. **Use utility functions** for consistency:
```python
from analysis_utils import calculate_annual_metrics, setup_revenue_chart
```
---
## 📄 License
This template is provided as-is for use in sales analysis projects.
---
## 🙏 Acknowledgments
This template is based on best practices developed during the Dukane Corporation sales analysis project, which included 24+ production-ready analysis scripts and comprehensive documentation.
---
## 📞 Support
For questions or issues:
1. Check `.cursor/rules/` for detailed patterns and troubleshooting
2. Review `config.py` comments for configuration options
3. See example analyses in the original Dukane project
---
**Last Updated:** January 2026
**Template Version:** 1.0
**Status:** Production Ready