Jonathan Pressnell cf0b596449 Initial commit: sales analysis template
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-06 09:16:34 -05:00

Sales Analysis Template

A best-in-class, reusable template for sales invoice detail analysis

Optimized for Cursor AI - Just ask the AI to create analyses and it handles everything automatically.

This template provides a complete framework for analyzing sales data from any company. It's designed to be:

  • Flexible: Works with different column names, date formats, and data structures
  • Automated: Interactive setup wizard configures everything for your company
  • AI-Optimized: Fully optimized for Cursor - AI knows all patterns and generates code automatically
  • Production-Ready: Includes error handling, validation, and best practices

🚀 Quick Start

1. Setup (Automated)

Run the interactive setup wizard:

python setup_wizard.py

The wizard will ask you about:

  • Company name and analysis date
  • Data file location
  • Column names in your CSV
  • Date range and LTM configuration
  • Exclusion filters (if needed)

2. Manual Setup (Alternative)

If you prefer to configure manually:

  1. Update config.py with your company-specific settings:

    • COMPANY_NAME: Your company name
    • DATA_FILE: Your CSV filename
    • REVENUE_COLUMN: Your revenue/amount column name
    • DATE_COLUMN: Your primary date column
    • Column mappings for Customer, Item, etc.
    • Date range and LTM settings
  2. Place your data file in the template directory (or update DATA_DIR in config.py)

3. Test Data Loading

Verify your configuration works:

python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'Loaded {len(df):,} rows')"

4. Create Your First Analysis

Copy the template and customize:

cp analysis_template.py my_first_analysis.py
# Edit my_first_analysis.py with your analysis logic
python my_first_analysis.py

📁 Project Structure

sales_analysis_template/
├── README.md                    # This file
├── QUICK_START.md               # Quick start guide
├── TEMPLATE_OVERVIEW.md         # High-level overview
├── TEMPLATE_SUMMARY.md          # Comprehensive template summary
├── EXAMPLES.md                  # Example scripts guide
├── SETUP_CHECKLIST.md           # Setup verification checklist
├── requirements.txt             # Python dependencies
├── setup_wizard.py              # Interactive setup wizard
│
├── config.py                    # ⭐ Configuration (customize for your company)
├── config_validator.py          # Configuration validation utility
│
├── data_loader.py               # ⭐ Data loading with fallback logic
├── data_quality.py              # Data quality reporting
├── data_processing.py           # Data transformation utilities
│
├── analysis_utils.py           # ⭐ Common utilities (formatters, LTM, helpers)
├── statistical_utils.py         # Statistical analysis utilities
├── validate_revenue.py          # Revenue validation utility
│
├── export_utils.py              # Export to CSV/Excel
├── report_generator.py          # PDF report generation
├── logger_config.py             # Logging configuration
│
├── analysis_template.py         # Template for creating new analyses
├── run_all_analyses.py          # Batch runner for all scripts
├── generate_sample_data.py      # Generate sample data for testing
│
├── examples/                    # Example analysis scripts
│   ├── annual_revenue_trend.py  # Simple annual revenue analysis
│   ├── customer_segmentation.py # RFM customer segmentation
│   ├── cohort_analysis.py       # Customer cohort analysis
│   └── product_performance.py   # Product performance analysis
│
├── tests/                       # Unit tests
│   ├── test_data_loader.py      # Data loader tests
│   ├── test_analysis_utils.py   # Analysis utils tests
│   └── test_config_validator.py # Config validator tests
│
└── .cursor/
    └── rules/                   # Cursor IDE rules (auto-loaded)
        ├── ai_assistant_guide.md # Complete AI assistant guide
        ├── advanced_analysis_patterns.md # Advanced techniques
        ├── analysis_patterns.md  # Common analysis patterns
        ├── chart_formatting.md   # Chart formatting rules
        ├── code_quality.md       # Code quality standards
        ├── common_errors.md      # Error troubleshooting
        ├── data_loading.md       # Data loading patterns
        ├── error_handling.md     # Error handling patterns
        └── ltm_methodology.md    # LTM methodology

🔧 Configuration Guide

Required Configuration

In config.py, you MUST configure:

  1. Company Information:

    COMPANY_NAME = "Your Company Name"
    
  2. Data File:

    DATA_FILE = 'your_sales_data.csv'
    
  3. Column Mappings:

    REVENUE_COLUMN = 'USD'  # Your revenue column name
    DATE_COLUMN = 'InvoiceDate'  # Your date column name
    CUSTOMER_COLUMN = 'Customer'  # Your customer column name
    
  4. Date Range:

    MIN_YEAR = 2021
    MAX_DATE = pd.Timestamp('2025-09-30')
    ANALYSIS_YEARS = [2021, 2022, 2023, 2024, 2025]
    

Optional Configuration

LTM (Last Twelve Months):

LTM_ENABLED = True  # Set to False if all years are complete
LTM_START_MONTH = 10
LTM_START_YEAR = 2024
LTM_END_MONTH = 9
LTM_END_YEAR = 2025

Exclusion Filters:

EXCLUSION_FILTERS = {
    'enabled': True,
    'exclude_by_column': 'Country',
    'exclude_values': ['Test', 'KVT']
}

See config.py for all available options and detailed comments.


📊 Data Requirements

Required Columns

Your CSV file must have:

  • Revenue column: A numeric column with sales amounts (configured as REVENUE_COLUMN)
  • Date column: At least one date column (configured as DATE_COLUMN)

For full analysis capabilities, include:

  • Customer/Account: For customer segmentation and analysis
  • Item/Product: For product analysis
  • Quantity: For price calculations
  • Geographic: Region, Country for geographic analysis
  • Segments: Technology, EndMarket, ProductGroup for segmentation

Date Column Fallback

The data loader supports fallback logic:

  1. Primary: Uses DATE_COLUMN (e.g., InvoiceDate)
  2. Fallback 1: Uses columns in DATE_FALLBACK_COLUMNS (e.g., Month, Year)
  3. Fallback 2: Constructs from Year column if available

This ensures maximum date coverage even if some rows have missing dates.


💻 Creating Analysis Scripts

Using the Template

  1. Copy the template:

    cp analysis_template.py my_analysis.py
    
  2. Update configuration:

    ANALYSIS_NAME = "My Analysis"
    DESCRIPTION = "Description of what this analysis does"
    
  3. Implement your logic:

    • Use calculate_annual_metrics() for annual aggregations
    • Use setup_revenue_chart() and save_chart() for visualizations
    • Follow patterns from .cursor/rules/analysis_patterns.md
  4. Run your analysis:

    python my_analysis.py
    

Standard Pattern

from data_loader import load_sales_data, validate_data_structure
from analysis_utils import (
    get_ltm_period_config, calculate_annual_metrics,
    setup_revenue_chart, save_chart, apply_exclusion_filters
)
from config import get_data_path, REVENUE_COLUMN, CHART_SIZES

# Load and validate
df = load_sales_data(get_data_path())
is_valid, msg = validate_data_structure(df)
if not is_valid:
    print(f"ERROR: {msg}")
    return

# Apply filters
df = apply_exclusion_filters(df)

# Calculate metrics
ltm_start, ltm_end = get_ltm_period_config()
annual_df = calculate_annual_metrics(df, calculate_metrics, ltm_start, ltm_end)

# Create charts
fig, ax = plt.subplots(figsize=CHART_SIZES['medium'])
ax.plot(data / 1e6, ...)
setup_revenue_chart(ax)
save_chart(fig, 'chart.png')

🎯 Key Features

1. Flexible Data Loading

  • Handles different column names via configuration
  • Fallback logic for date parsing (100% coverage)
  • Automatic validation and error reporting

2. LTM (Last Twelve Months) Support

  • Automatic LTM calculation for partial years
  • Apples-to-apples comparison with full calendar years
  • Configurable LTM periods

3. Standardized Chart Formatting

  • Automatic millions formatter for revenue charts
  • Consistent styling and sizing
  • Professional output ready for reports
  • Optional interactive charts with Plotly

4. Exclusion Filters

  • Easy configuration for excluding segments
  • Useful for excluding test accounts, business units, etc.

5. Revenue Validation

  • Automatic validation after each analysis
  • Ensures data loading is working correctly
  • Optional validation against expected values

6. Example Scripts

  • Working examples for common analyses
  • Demonstrates best practices
  • Easy to customize and extend

7. Data Export

  • Export results to CSV and Excel
  • Formatted summary tables
  • Multiple sheet support

8. Data Quality Reporting

  • Comprehensive data quality checks
  • Missing value analysis
  • Outlier detection
  • Data profiling

9. Configuration Validation

  • Early error detection
  • Validates column mappings
  • Checks date ranges and LTM configuration

10. Statistical Utilities

  • Year-over-year growth calculations
  • CAGR (Compound Annual Growth Rate)
  • Correlation analysis
  • Statistical significance testing

11. Report Generation

  • Combine multiple charts into PDF reports
  • Professional formatting
  • Summary tables and metadata

12. Logging Infrastructure

  • Structured logging with file and console output
  • Analysis execution tracking
  • Configurable log levels

📚 Documentation

For AI Agents (Cursor IDE)

The .cursor/rules/ directory contains comprehensive rules that are automatically loaded by Cursor:

  • ai_assistant_guide.md: Complete guide with ready-to-use prompts
  • advanced_analysis_patterns.md: Advanced techniques (cohort, PVM, forecasting, etc.)
  • analysis_patterns.md: Standard patterns for creating analyses
  • data_loading.md: Always use data_loader.py, never pd.read_csv() directly
  • chart_formatting.md: How to format charts correctly
  • ltm_methodology.md: LTM implementation and usage
  • common_errors.md: Troubleshooting guide
  • code_quality.md: Code quality standards and Cursor best practices
  • error_handling.md: How to write AI-friendly error messages

For Developers

  • config.py: Heavily commented with all configuration options
  • analysis_template.py: Template with examples and comments
  • analysis_utils.py: Well-documented utility functions

🔍 Common Analysis Types

This template supports all standard sales analyses:

Revenue Analyses

  • Annual revenue trends
  • Monthly revenue analysis
  • Revenue by segment/product/geography

Customer Analyses

  • Customer segmentation (RFM)
  • Customer concentration
  • Churn analysis
  • Cohort analysis
  • Customer lifetime value (CLV)

Product Analyses

  • Product performance
  • Product lifecycle
  • BCG matrix
  • Market basket analysis

Financial Analyses

  • Price elasticity
  • Contribution margin
  • Price vs volume analysis

Advanced Analyses

  • Seasonality analysis
  • Time series forecasting
  • Customer churn prediction

See examples/ directory for working example scripts, or the original Dukane project for 24+ production analysis scripts.


🛠️ Dependencies

Install required packages:

pip install -r requirements.txt

Core dependencies:

  • pandas - Data manipulation
  • numpy - Numerical operations
  • matplotlib - Charting
  • seaborn - Enhanced visualizations

Optional dependencies (uncomment in requirements.txt if needed):

  • openpyxl - Excel export (export_utils.py)
  • plotly - Interactive charts (analysis_utils.py)
  • reportlab - PDF reports (report_generator.py)
  • scipy - Statistical analysis (statistical_utils.py)
  • pytest - Unit testing
  • pmdarima - Time series forecasting
  • mlxtend - Market basket analysis
  • scikit-learn - Machine learning

⚠️ Important Notes

Always Use Utilities

DO:

from data_loader import load_sales_data
from analysis_utils import setup_revenue_chart, save_chart
from config import REVENUE_COLUMN, CHART_SIZES

DON'T:

df = pd.read_csv('data.csv')  # Use data_loader instead
ax.plot(revenue, ...)  # Divide by 1e6 first, use setup_revenue_chart()

Chart Formatting

ALWAYS divide revenue by 1e6 before plotting:

ax.plot(revenue / 1e6, ...)  # Convert to millions
setup_revenue_chart(ax)  # Apply formatter

LTM Labeling

ALWAYS label LTM years correctly:

from config import get_ltm_label
ltm_label = get_ltm_label()  # Returns "2025 (LTM 9/2025)" or None
if ltm_label:
    title += f'\n({ltm_label})'

🐛 Troubleshooting

Data Loading Issues

Problem: "Data file not found"

  • Solution: Check DATA_FILE path in config.py
  • Solution: Ensure file is in template directory or update DATA_DIR

Problem: "Required column 'USD' not found"

  • Solution: Update REVENUE_COLUMN in config.py to match your CSV
  • Solution: Check all column mappings in config.py

Problem: "All dates are NaN"

  • Solution: Add fallback date columns to DATE_FALLBACK_COLUMNS
  • Solution: Check date format in your CSV

Analysis Issues

Problem: Charts show scientific notation (1e8)

  • Solution: Divide by 1e6 before plotting: ax.plot(data / 1e6, ...)
  • Solution: Use setup_revenue_chart(ax) to apply formatter

Problem: "DataFrame is empty" after filtering

  • Solution: Check MIN_YEAR and MAX_DATE in config.py
  • Solution: Verify ANALYSIS_YEARS includes years in your data

See .cursor/rules/common_errors.md for more troubleshooting help.


📝 Example Workflow

Complete Analysis Workflow

  1. Setup:

    python setup_wizard.py
    
  2. Test data loading:

    python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'✓ Loaded {len(df):,} rows')"
    
  3. Create analysis:

    cp analysis_template.py revenue_analysis.py
    # Edit revenue_analysis.py
    
  4. Run analysis:

    python revenue_analysis.py
    
  5. Add to batch runner:

    # In run_all_analyses.py:
    ANALYSIS_SCRIPTS = [
        'revenue_analysis.py',
        # ... other analyses
    ]
    
  6. Run all analyses:

    python run_all_analyses.py
    

🤝 Best Practices

  1. Always validate data after loading:

    is_valid, msg = validate_data_structure(df)
    
  2. Use configuration values instead of hardcoding:

    from config import REVENUE_COLUMN  # ✅
    revenue = df['USD'].sum()  # ❌ Hardcoded
    
  3. Apply exclusion filters if configured:

    df = apply_exclusion_filters(df)
    
  4. Validate revenue at end of each analysis:

    validate_revenue(df, "Analysis Name")
    
  5. Use utility functions for consistency:

    from analysis_utils import calculate_annual_metrics, setup_revenue_chart
    

📄 License

This template is provided as-is for use in sales analysis projects.


🙏 Acknowledgments

This template is based on best practices developed during the Dukane Corporation sales analysis project, which included 24+ production-ready analysis scripts and comprehensive documentation.


📞 Support

For questions or issues:

  1. Check .cursor/rules/ for detailed patterns and troubleshooting
  2. Review config.py comments for configuration options
  3. See example analyses in the original Dukane project

Last Updated: January 2026
Template Version: 1.0
Status: Production Ready

Description
No description provided
Readme 114 KiB
Languages
Python 100%