Co-authored-by: Cursor <cursoragent@cursor.com>
Sales Analysis Template
A best-in-class, reusable template for sales invoice detail analysis
Optimized for Cursor AI - Just ask the AI to create analyses and it handles everything automatically.
This template provides a complete framework for analyzing sales data from any company. It's designed to be:
- Flexible: Works with different column names, date formats, and data structures
- Automated: Interactive setup wizard configures everything for your company
- AI-Optimized: Fully optimized for Cursor - AI knows all patterns and generates code automatically
- Production-Ready: Includes error handling, validation, and best practices
🚀 Quick Start
1. Setup (Automated)
Run the interactive setup wizard:
python setup_wizard.py
The wizard will ask you about:
- Company name and analysis date
- Data file location
- Column names in your CSV
- Date range and LTM configuration
- Exclusion filters (if needed)
2. Manual Setup (Alternative)
If you prefer to configure manually:
-
Update
config.pywith your company-specific settings:COMPANY_NAME: Your company nameDATA_FILE: Your CSV filenameREVENUE_COLUMN: Your revenue/amount column nameDATE_COLUMN: Your primary date column- Column mappings for Customer, Item, etc.
- Date range and LTM settings
-
Place your data file in the template directory (or update
DATA_DIRin config.py)
3. Test Data Loading
Verify your configuration works:
python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'Loaded {len(df):,} rows')"
4. Create Your First Analysis
Copy the template and customize:
cp analysis_template.py my_first_analysis.py
# Edit my_first_analysis.py with your analysis logic
python my_first_analysis.py
📁 Project Structure
sales_analysis_template/
├── README.md # This file
├── QUICK_START.md # Quick start guide
├── TEMPLATE_OVERVIEW.md # High-level overview
├── TEMPLATE_SUMMARY.md # Comprehensive template summary
├── EXAMPLES.md # Example scripts guide
├── SETUP_CHECKLIST.md # Setup verification checklist
├── requirements.txt # Python dependencies
├── setup_wizard.py # Interactive setup wizard
│
├── config.py # ⭐ Configuration (customize for your company)
├── config_validator.py # Configuration validation utility
│
├── data_loader.py # ⭐ Data loading with fallback logic
├── data_quality.py # Data quality reporting
├── data_processing.py # Data transformation utilities
│
├── analysis_utils.py # ⭐ Common utilities (formatters, LTM, helpers)
├── statistical_utils.py # Statistical analysis utilities
├── validate_revenue.py # Revenue validation utility
│
├── export_utils.py # Export to CSV/Excel
├── report_generator.py # PDF report generation
├── logger_config.py # Logging configuration
│
├── analysis_template.py # Template for creating new analyses
├── run_all_analyses.py # Batch runner for all scripts
├── generate_sample_data.py # Generate sample data for testing
│
├── examples/ # Example analysis scripts
│ ├── annual_revenue_trend.py # Simple annual revenue analysis
│ ├── customer_segmentation.py # RFM customer segmentation
│ ├── cohort_analysis.py # Customer cohort analysis
│ └── product_performance.py # Product performance analysis
│
├── tests/ # Unit tests
│ ├── test_data_loader.py # Data loader tests
│ ├── test_analysis_utils.py # Analysis utils tests
│ └── test_config_validator.py # Config validator tests
│
└── .cursor/
└── rules/ # Cursor IDE rules (auto-loaded)
├── ai_assistant_guide.md # Complete AI assistant guide
├── advanced_analysis_patterns.md # Advanced techniques
├── analysis_patterns.md # Common analysis patterns
├── chart_formatting.md # Chart formatting rules
├── code_quality.md # Code quality standards
├── common_errors.md # Error troubleshooting
├── data_loading.md # Data loading patterns
├── error_handling.md # Error handling patterns
└── ltm_methodology.md # LTM methodology
🔧 Configuration Guide
Required Configuration
In config.py, you MUST configure:
-
Company Information:
COMPANY_NAME = "Your Company Name" -
Data File:
DATA_FILE = 'your_sales_data.csv' -
Column Mappings:
REVENUE_COLUMN = 'USD' # Your revenue column name DATE_COLUMN = 'InvoiceDate' # Your date column name CUSTOMER_COLUMN = 'Customer' # Your customer column name -
Date Range:
MIN_YEAR = 2021 MAX_DATE = pd.Timestamp('2025-09-30') ANALYSIS_YEARS = [2021, 2022, 2023, 2024, 2025]
Optional Configuration
LTM (Last Twelve Months):
LTM_ENABLED = True # Set to False if all years are complete
LTM_START_MONTH = 10
LTM_START_YEAR = 2024
LTM_END_MONTH = 9
LTM_END_YEAR = 2025
Exclusion Filters:
EXCLUSION_FILTERS = {
'enabled': True,
'exclude_by_column': 'Country',
'exclude_values': ['Test', 'KVT']
}
See config.py for all available options and detailed comments.
📊 Data Requirements
Required Columns
Your CSV file must have:
- Revenue column: A numeric column with sales amounts (configured as
REVENUE_COLUMN) - Date column: At least one date column (configured as
DATE_COLUMN)
Recommended Columns
For full analysis capabilities, include:
- Customer/Account: For customer segmentation and analysis
- Item/Product: For product analysis
- Quantity: For price calculations
- Geographic: Region, Country for geographic analysis
- Segments: Technology, EndMarket, ProductGroup for segmentation
Date Column Fallback
The data loader supports fallback logic:
- Primary: Uses
DATE_COLUMN(e.g., InvoiceDate) - Fallback 1: Uses columns in
DATE_FALLBACK_COLUMNS(e.g., Month, Year) - Fallback 2: Constructs from Year column if available
This ensures maximum date coverage even if some rows have missing dates.
💻 Creating Analysis Scripts
Using the Template
-
Copy the template:
cp analysis_template.py my_analysis.py -
Update configuration:
ANALYSIS_NAME = "My Analysis" DESCRIPTION = "Description of what this analysis does" -
Implement your logic:
- Use
calculate_annual_metrics()for annual aggregations - Use
setup_revenue_chart()andsave_chart()for visualizations - Follow patterns from
.cursor/rules/analysis_patterns.md
- Use
-
Run your analysis:
python my_analysis.py
Standard Pattern
from data_loader import load_sales_data, validate_data_structure
from analysis_utils import (
get_ltm_period_config, calculate_annual_metrics,
setup_revenue_chart, save_chart, apply_exclusion_filters
)
from config import get_data_path, REVENUE_COLUMN, CHART_SIZES
# Load and validate
df = load_sales_data(get_data_path())
is_valid, msg = validate_data_structure(df)
if not is_valid:
print(f"ERROR: {msg}")
return
# Apply filters
df = apply_exclusion_filters(df)
# Calculate metrics
ltm_start, ltm_end = get_ltm_period_config()
annual_df = calculate_annual_metrics(df, calculate_metrics, ltm_start, ltm_end)
# Create charts
fig, ax = plt.subplots(figsize=CHART_SIZES['medium'])
ax.plot(data / 1e6, ...)
setup_revenue_chart(ax)
save_chart(fig, 'chart.png')
🎯 Key Features
1. Flexible Data Loading
- Handles different column names via configuration
- Fallback logic for date parsing (100% coverage)
- Automatic validation and error reporting
2. LTM (Last Twelve Months) Support
- Automatic LTM calculation for partial years
- Apples-to-apples comparison with full calendar years
- Configurable LTM periods
3. Standardized Chart Formatting
- Automatic millions formatter for revenue charts
- Consistent styling and sizing
- Professional output ready for reports
- Optional interactive charts with Plotly
4. Exclusion Filters
- Easy configuration for excluding segments
- Useful for excluding test accounts, business units, etc.
5. Revenue Validation
- Automatic validation after each analysis
- Ensures data loading is working correctly
- Optional validation against expected values
6. Example Scripts
- Working examples for common analyses
- Demonstrates best practices
- Easy to customize and extend
7. Data Export
- Export results to CSV and Excel
- Formatted summary tables
- Multiple sheet support
8. Data Quality Reporting
- Comprehensive data quality checks
- Missing value analysis
- Outlier detection
- Data profiling
9. Configuration Validation
- Early error detection
- Validates column mappings
- Checks date ranges and LTM configuration
10. Statistical Utilities
- Year-over-year growth calculations
- CAGR (Compound Annual Growth Rate)
- Correlation analysis
- Statistical significance testing
11. Report Generation
- Combine multiple charts into PDF reports
- Professional formatting
- Summary tables and metadata
12. Logging Infrastructure
- Structured logging with file and console output
- Analysis execution tracking
- Configurable log levels
📚 Documentation
For AI Agents (Cursor IDE)
The .cursor/rules/ directory contains comprehensive rules that are automatically loaded by Cursor:
ai_assistant_guide.md: Complete guide with ready-to-use promptsadvanced_analysis_patterns.md: Advanced techniques (cohort, PVM, forecasting, etc.)analysis_patterns.md: Standard patterns for creating analysesdata_loading.md: Always usedata_loader.py, neverpd.read_csv()directlychart_formatting.md: How to format charts correctlyltm_methodology.md: LTM implementation and usagecommon_errors.md: Troubleshooting guidecode_quality.md: Code quality standards and Cursor best practiceserror_handling.md: How to write AI-friendly error messages
For Developers
config.py: Heavily commented with all configuration optionsanalysis_template.py: Template with examples and commentsanalysis_utils.py: Well-documented utility functions
🔍 Common Analysis Types
This template supports all standard sales analyses:
Revenue Analyses
- Annual revenue trends
- Monthly revenue analysis
- Revenue by segment/product/geography
Customer Analyses
- Customer segmentation (RFM)
- Customer concentration
- Churn analysis
- Cohort analysis
- Customer lifetime value (CLV)
Product Analyses
- Product performance
- Product lifecycle
- BCG matrix
- Market basket analysis
Financial Analyses
- Price elasticity
- Contribution margin
- Price vs volume analysis
Advanced Analyses
- Seasonality analysis
- Time series forecasting
- Customer churn prediction
See examples/ directory for working example scripts, or the original Dukane project for 24+ production analysis scripts.
🛠️ Dependencies
Install required packages:
pip install -r requirements.txt
Core dependencies:
pandas- Data manipulationnumpy- Numerical operationsmatplotlib- Chartingseaborn- Enhanced visualizations
Optional dependencies (uncomment in requirements.txt if needed):
openpyxl- Excel export (export_utils.py)plotly- Interactive charts (analysis_utils.py)reportlab- PDF reports (report_generator.py)scipy- Statistical analysis (statistical_utils.py)pytest- Unit testingpmdarima- Time series forecastingmlxtend- Market basket analysisscikit-learn- Machine learning
⚠️ Important Notes
Always Use Utilities
✅ DO:
from data_loader import load_sales_data
from analysis_utils import setup_revenue_chart, save_chart
from config import REVENUE_COLUMN, CHART_SIZES
❌ DON'T:
df = pd.read_csv('data.csv') # Use data_loader instead
ax.plot(revenue, ...) # Divide by 1e6 first, use setup_revenue_chart()
Chart Formatting
ALWAYS divide revenue by 1e6 before plotting:
ax.plot(revenue / 1e6, ...) # Convert to millions
setup_revenue_chart(ax) # Apply formatter
LTM Labeling
ALWAYS label LTM years correctly:
from config import get_ltm_label
ltm_label = get_ltm_label() # Returns "2025 (LTM 9/2025)" or None
if ltm_label:
title += f'\n({ltm_label})'
🐛 Troubleshooting
Data Loading Issues
Problem: "Data file not found"
- Solution: Check
DATA_FILEpath in config.py - Solution: Ensure file is in template directory or update
DATA_DIR
Problem: "Required column 'USD' not found"
- Solution: Update
REVENUE_COLUMNin config.py to match your CSV - Solution: Check all column mappings in config.py
Problem: "All dates are NaN"
- Solution: Add fallback date columns to
DATE_FALLBACK_COLUMNS - Solution: Check date format in your CSV
Analysis Issues
Problem: Charts show scientific notation (1e8)
- Solution: Divide by 1e6 before plotting:
ax.plot(data / 1e6, ...) - Solution: Use
setup_revenue_chart(ax)to apply formatter
Problem: "DataFrame is empty" after filtering
- Solution: Check
MIN_YEARandMAX_DATEin config.py - Solution: Verify
ANALYSIS_YEARSincludes years in your data
See .cursor/rules/common_errors.md for more troubleshooting help.
📝 Example Workflow
Complete Analysis Workflow
-
Setup:
python setup_wizard.py -
Test data loading:
python -c "from data_loader import load_sales_data; from config import get_data_path; df = load_sales_data(get_data_path()); print(f'✓ Loaded {len(df):,} rows')" -
Create analysis:
cp analysis_template.py revenue_analysis.py # Edit revenue_analysis.py -
Run analysis:
python revenue_analysis.py -
Add to batch runner:
# In run_all_analyses.py: ANALYSIS_SCRIPTS = [ 'revenue_analysis.py', # ... other analyses ] -
Run all analyses:
python run_all_analyses.py
🤝 Best Practices
-
Always validate data after loading:
is_valid, msg = validate_data_structure(df) -
Use configuration values instead of hardcoding:
from config import REVENUE_COLUMN # ✅ revenue = df['USD'].sum() # ❌ Hardcoded -
Apply exclusion filters if configured:
df = apply_exclusion_filters(df) -
Validate revenue at end of each analysis:
validate_revenue(df, "Analysis Name") -
Use utility functions for consistency:
from analysis_utils import calculate_annual_metrics, setup_revenue_chart
📄 License
This template is provided as-is for use in sales analysis projects.
🙏 Acknowledgments
This template is based on best practices developed during the Dukane Corporation sales analysis project, which included 24+ production-ready analysis scripts and comprehensive documentation.
📞 Support
For questions or issues:
- Check
.cursor/rules/for detailed patterns and troubleshooting - Review
config.pycomments for configuration options - See example analyses in the original Dukane project
Last Updated: January 2026
Template Version: 1.0
Status: Production Ready