# Sales Analysis Template **A best-in-class, reusable template for sales invoice detail analysis** **Optimized for Cursor AI** - Just ask the AI to create analyses and it handles everything automatically. This template provides a complete framework for analyzing sales data from any company. It's designed to be: - **Flexible:** Works with different column names, date formats, and data structures - **Automated:** Interactive setup wizard configures everything for your company - **AI-Optimized:** Fully optimized for Cursor - AI knows all patterns and generates code automatically - **Production-Ready:** Includes error handling, validation, and best practices --- ## 🚀 Quick Start ### 1. Setup (Automated) Run the interactive setup wizard: ```bash python setup_wizard.py ``` The wizard will ask you about: - Company name and analysis date - Data file location - Column names in your CSV - Date range and LTM configuration - Exclusion filters (if needed) ### 2. Manual Setup (Alternative) If you prefer to configure manually: 1. **Update `config.py`** with your company-specific settings: - `COMPANY_NAME`: Your company name - `DATA_FILE`: Your CSV filename - `REVENUE_COLUMN`: Your revenue/amount column name - `DATE_COLUMN`: Your primary date column - Column mappings for Customer, Item, etc. - Date range and LTM settings 2. **Place your data file** in the template directory (or update `DATA_DIR` in config.py) ### 3. Test Data Loading Verify your configuration works: ```bash python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'Loaded {len(df):,} rows')" ``` ### 4. Create Your First Analysis Copy the template and customize: ```bash cp analysis_template.py my_first_analysis.py # Edit my_first_analysis.py with your analysis logic python my_first_analysis.py ``` --- ## 📁 Project Structure ``` sales_analysis_template/ ├── README.md # This file ├── QUICK_START.md # Quick start guide ├── TEMPLATE_OVERVIEW.md # High-level overview ├── TEMPLATE_SUMMARY.md # Comprehensive template summary ├── EXAMPLES.md # Example scripts guide ├── SETUP_CHECKLIST.md # Setup verification checklist ├── requirements.txt # Python dependencies ├── setup_wizard.py # Interactive setup wizard │ ├── config.py # ⭐ Configuration (customize for your company) ├── config_validator.py # Configuration validation utility │ ├── data_loader.py # ⭐ Data loading with fallback logic ├── data_quality.py # Data quality reporting ├── data_processing.py # Data transformation utilities │ ├── analysis_utils.py # ⭐ Common utilities (formatters, LTM, helpers) ├── statistical_utils.py # Statistical analysis utilities ├── validate_revenue.py # Revenue validation utility │ ├── export_utils.py # Export to CSV/Excel ├── report_generator.py # PDF report generation ├── logger_config.py # Logging configuration │ ├── analysis_template.py # Template for creating new analyses ├── run_all_analyses.py # Batch runner for all scripts ├── generate_sample_data.py # Generate sample data for testing │ ├── examples/ # Example analysis scripts │ ├── annual_revenue_trend.py # Simple annual revenue analysis │ ├── customer_segmentation.py # RFM customer segmentation │ ├── cohort_analysis.py # Customer cohort analysis │ └── product_performance.py # Product performance analysis │ ├── tests/ # Unit tests │ ├── test_data_loader.py # Data loader tests │ ├── test_analysis_utils.py # Analysis utils tests │ └── test_config_validator.py # Config validator tests │ └── .cursor/ └── rules/ # Cursor IDE rules (auto-loaded) ├── ai_assistant_guide.md # Complete AI assistant guide ├── advanced_analysis_patterns.md # Advanced techniques ├── analysis_patterns.md # Common analysis patterns ├── chart_formatting.md # Chart formatting rules ├── code_quality.md # Code quality standards ├── common_errors.md # Error troubleshooting ├── data_loading.md # Data loading patterns ├── error_handling.md # Error handling patterns └── ltm_methodology.md # LTM methodology ``` --- ## 🔧 Configuration Guide ### Required Configuration **In `config.py`, you MUST configure:** 1. **Company Information:** ```python COMPANY_NAME = "Your Company Name" ``` 2. **Data File:** ```python DATA_FILE = 'your_sales_data.csv' ``` 3. **Column Mappings:** ```python REVENUE_COLUMN = 'USD' # Your revenue column name DATE_COLUMN = 'InvoiceDate' # Your date column name CUSTOMER_COLUMN = 'Customer' # Your customer column name ``` 4. **Date Range:** ```python MIN_YEAR = 2021 MAX_DATE = pd.Timestamp('2025-09-30') ANALYSIS_YEARS = [2021, 2022, 2023, 2024, 2025] ``` ### Optional Configuration **LTM (Last Twelve Months):** ```python LTM_ENABLED = True # Set to False if all years are complete LTM_START_MONTH = 10 LTM_START_YEAR = 2024 LTM_END_MONTH = 9 LTM_END_YEAR = 2025 ``` **Exclusion Filters:** ```python EXCLUSION_FILTERS = { 'enabled': True, 'exclude_by_column': 'Country', 'exclude_values': ['Test', 'KVT'] } ``` **See `config.py` for all available options and detailed comments.** --- ## 📊 Data Requirements ### Required Columns Your CSV file must have: - **Revenue column:** A numeric column with sales amounts (configured as `REVENUE_COLUMN`) - **Date column:** At least one date column (configured as `DATE_COLUMN`) ### Recommended Columns For full analysis capabilities, include: - **Customer/Account:** For customer segmentation and analysis - **Item/Product:** For product analysis - **Quantity:** For price calculations - **Geographic:** Region, Country for geographic analysis - **Segments:** Technology, EndMarket, ProductGroup for segmentation ### Date Column Fallback The data loader supports fallback logic: 1. **Primary:** Uses `DATE_COLUMN` (e.g., InvoiceDate) 2. **Fallback 1:** Uses columns in `DATE_FALLBACK_COLUMNS` (e.g., Month, Year) 3. **Fallback 2:** Constructs from Year column if available This ensures maximum date coverage even if some rows have missing dates. --- ## 💻 Creating Analysis Scripts ### Using the Template 1. **Copy the template:** ```bash cp analysis_template.py my_analysis.py ``` 2. **Update configuration:** ```python ANALYSIS_NAME = "My Analysis" DESCRIPTION = "Description of what this analysis does" ``` 3. **Implement your logic:** - Use `calculate_annual_metrics()` for annual aggregations - Use `setup_revenue_chart()` and `save_chart()` for visualizations - Follow patterns from `.cursor/rules/analysis_patterns.md` 4. **Run your analysis:** ```bash python my_analysis.py ``` ### Standard Pattern ```python from data_loader import load_sales_data, validate_data_structure from analysis_utils import ( get_ltm_period_config, calculate_annual_metrics, setup_revenue_chart, save_chart, apply_exclusion_filters ) from config import get_data_path, REVENUE_COLUMN, CHART_SIZES # Load and validate df = load_sales_data(get_data_path()) is_valid, msg = validate_data_structure(df) if not is_valid: print(f"ERROR: {msg}") return # Apply filters df = apply_exclusion_filters(df) # Calculate metrics ltm_start, ltm_end = get_ltm_period_config() annual_df = calculate_annual_metrics(df, calculate_metrics, ltm_start, ltm_end) # Create charts fig, ax = plt.subplots(figsize=CHART_SIZES['medium']) ax.plot(data / 1e6, ...) setup_revenue_chart(ax) save_chart(fig, 'chart.png') ``` --- ## 🎯 Key Features ### 1. Flexible Data Loading - Handles different column names via configuration - Fallback logic for date parsing (100% coverage) - Automatic validation and error reporting ### 2. LTM (Last Twelve Months) Support - Automatic LTM calculation for partial years - Apples-to-apples comparison with full calendar years - Configurable LTM periods ### 3. Standardized Chart Formatting - Automatic millions formatter for revenue charts - Consistent styling and sizing - Professional output ready for reports - Optional interactive charts with Plotly ### 4. Exclusion Filters - Easy configuration for excluding segments - Useful for excluding test accounts, business units, etc. ### 5. Revenue Validation - Automatic validation after each analysis - Ensures data loading is working correctly - Optional validation against expected values ### 6. Example Scripts - Working examples for common analyses - Demonstrates best practices - Easy to customize and extend ### 7. Data Export - Export results to CSV and Excel - Formatted summary tables - Multiple sheet support ### 8. Data Quality Reporting - Comprehensive data quality checks - Missing value analysis - Outlier detection - Data profiling ### 9. Configuration Validation - Early error detection - Validates column mappings - Checks date ranges and LTM configuration ### 10. Statistical Utilities - Year-over-year growth calculations - CAGR (Compound Annual Growth Rate) - Correlation analysis - Statistical significance testing ### 11. Report Generation - Combine multiple charts into PDF reports - Professional formatting - Summary tables and metadata ### 12. Logging Infrastructure - Structured logging with file and console output - Analysis execution tracking - Configurable log levels --- ## 📚 Documentation ### For AI Agents (Cursor IDE) The `.cursor/rules/` directory contains comprehensive rules that are automatically loaded by Cursor: - **`ai_assistant_guide.md`:** Complete guide with ready-to-use prompts - **`advanced_analysis_patterns.md`:** Advanced techniques (cohort, PVM, forecasting, etc.) - **`analysis_patterns.md`:** Standard patterns for creating analyses - **`data_loading.md`:** Always use `data_loader.py`, never `pd.read_csv()` directly - **`chart_formatting.md`:** How to format charts correctly - **`ltm_methodology.md`:** LTM implementation and usage - **`common_errors.md`:** Troubleshooting guide - **`code_quality.md`:** Code quality standards and Cursor best practices - **`error_handling.md`:** How to write AI-friendly error messages ### For Developers - **`config.py`:** Heavily commented with all configuration options - **`analysis_template.py`:** Template with examples and comments - **`analysis_utils.py`:** Well-documented utility functions --- ## 🔍 Common Analysis Types This template supports all standard sales analyses: ### Revenue Analyses - Annual revenue trends - Monthly revenue analysis - Revenue by segment/product/geography ### Customer Analyses - Customer segmentation (RFM) - Customer concentration - Churn analysis - Cohort analysis - Customer lifetime value (CLV) ### Product Analyses - Product performance - Product lifecycle - BCG matrix - Market basket analysis ### Financial Analyses - Price elasticity - Contribution margin - Price vs volume analysis ### Advanced Analyses - Seasonality analysis - Time series forecasting - Customer churn prediction **See `examples/` directory for working example scripts, or the original Dukane project for 24+ production analysis scripts.** --- ## 🛠️ Dependencies Install required packages: ```bash pip install -r requirements.txt ``` **Core dependencies:** - `pandas` - Data manipulation - `numpy` - Numerical operations - `matplotlib` - Charting - `seaborn` - Enhanced visualizations **Optional dependencies** (uncomment in requirements.txt if needed): - `openpyxl` - Excel export (export_utils.py) - `plotly` - Interactive charts (analysis_utils.py) - `reportlab` - PDF reports (report_generator.py) - `scipy` - Statistical analysis (statistical_utils.py) - `pytest` - Unit testing - `pmdarima` - Time series forecasting - `mlxtend` - Market basket analysis - `scikit-learn` - Machine learning --- ## ⚠️ Important Notes ### Always Use Utilities **✅ DO:** ```python from data_loader import load_sales_data from analysis_utils import setup_revenue_chart, save_chart from config import REVENUE_COLUMN, CHART_SIZES ``` **❌ DON'T:** ```python df = pd.read_csv('data.csv') # Use data_loader instead ax.plot(revenue, ...) # Divide by 1e6 first, use setup_revenue_chart() ``` ### Chart Formatting **ALWAYS divide revenue by 1e6 before plotting:** ```python ax.plot(revenue / 1e6, ...) # Convert to millions setup_revenue_chart(ax) # Apply formatter ``` ### LTM Labeling **ALWAYS label LTM years correctly:** ```python from config import get_ltm_label ltm_label = get_ltm_label() # Returns "2025 (LTM 9/2025)" or None if ltm_label: title += f'\n({ltm_label})' ``` --- ## 🐛 Troubleshooting ### Data Loading Issues **Problem:** "Data file not found" - **Solution:** Check `DATA_FILE` path in config.py - **Solution:** Ensure file is in template directory or update `DATA_DIR` **Problem:** "Required column 'USD' not found" - **Solution:** Update `REVENUE_COLUMN` in config.py to match your CSV - **Solution:** Check all column mappings in config.py **Problem:** "All dates are NaN" - **Solution:** Add fallback date columns to `DATE_FALLBACK_COLUMNS` - **Solution:** Check date format in your CSV ### Analysis Issues **Problem:** Charts show scientific notation (1e8) - **Solution:** Divide by 1e6 before plotting: `ax.plot(data / 1e6, ...)` - **Solution:** Use `setup_revenue_chart(ax)` to apply formatter **Problem:** "DataFrame is empty" after filtering - **Solution:** Check `MIN_YEAR` and `MAX_DATE` in config.py - **Solution:** Verify `ANALYSIS_YEARS` includes years in your data **See `.cursor/rules/common_errors.md` for more troubleshooting help.** --- ## 📝 Example Workflow ### Complete Analysis Workflow 1. **Setup:** ```bash python setup_wizard.py ``` 2. **Test data loading:** ```bash python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'✓ Loaded {len(df):,} rows')" ``` 3. **Create analysis:** ```bash cp analysis_template.py revenue_analysis.py # Edit revenue_analysis.py ``` 4. **Run analysis:** ```bash python revenue_analysis.py ``` 5. **Add to batch runner:** ```python # In run_all_analyses.py: ANALYSIS_SCRIPTS = [ 'revenue_analysis.py', # ... other analyses ] ``` 6. **Run all analyses:** ```bash python run_all_analyses.py ``` --- ## 🤝 Best Practices 1. **Always validate data** after loading: ```python is_valid, msg = validate_data_structure(df) ``` 2. **Use configuration values** instead of hardcoding: ```python from config import REVENUE_COLUMN # ✅ revenue = df['USD'].sum() # ❌ Hardcoded ``` 3. **Apply exclusion filters** if configured: ```python df = apply_exclusion_filters(df) ``` 4. **Validate revenue** at end of each analysis: ```python validate_revenue(df, "Analysis Name") ``` 5. **Use utility functions** for consistency: ```python from analysis_utils import calculate_annual_metrics, setup_revenue_chart ``` --- ## 📄 License This template is provided as-is for use in sales analysis projects. --- ## 🙏 Acknowledgments This template is based on best practices developed during the Dukane Corporation sales analysis project, which included 24+ production-ready analysis scripts and comprehensive documentation. --- ## 📞 Support For questions or issues: 1. Check `.cursor/rules/` for detailed patterns and troubleshooting 2. Review `config.py` comments for configuration options 3. See example analyses in the original Dukane project --- **Last Updated:** January 2026 **Template Version:** 1.0 **Status:** Production Ready