sales-data-analysis/.cursor/rules/data_loading.md
Commit cf0b596449 by Jonathan Pressnell (co-authored by Cursor <cursoragent@cursor.com>), 2026-02-06: Initial commit: sales analysis template


# Data Loading Rules

## CRITICAL: Always Use data_loader.py

**NEVER** load data directly with `pd.read_csv()`. Always use:

```python
from data_loader import load_sales_data
from config import get_data_path

df = load_sales_data(get_data_path())
```

## Why This Matters

`data_loader.py` implements a fallback chain to maximize date coverage:

1. **Primary:** parse the primary date column (`config.DATE_COLUMN`).
2. **Fallback 1:** if the primary column is missing, try the fallback date columns (`config.DATE_FALLBACK_COLUMNS`).
3. **Fallback 2:** if both are missing, fall back to the `Year` column.
4. **Result:** the maximum date coverage possible.
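The chain above could be sketched roughly like this. This is an illustration only; the function name `parse_dates_with_fallback` and the column names `OrderDate` and `ShipDate` are hypothetical, and the real implementation lives in `data_loader.py`:

```python
import pandas as pd

def parse_dates_with_fallback(df, primary, fallbacks=(), year_col="Year"):
    """Try the primary date column, then each fallback column,
    then a bare Year column (parsed as January 1 of that year)."""
    dates = pd.Series(pd.NaT, index=df.index)  # starts as all-missing datetimes
    for col in (primary, *fallbacks):
        if col in df.columns:
            dates = dates.fillna(pd.to_datetime(df[col], errors="coerce"))
    if year_col in df.columns:
        # Year alone still anchors the row to a point in time.
        years = pd.to_datetime(df[year_col].astype(str), format="%Y",
                               errors="coerce")
        dates = dates.fillna(years)
    return dates
```

Each `fillna` only touches rows still missing a date, so earlier (more precise) sources always win over later fallbacks.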

## What data_loader.py Provides

- **Date column:** properly parsed datetimes, with fallback logic
- **`Year`:** extracted year (full coverage via the fallback chain)
- **`YearMonth`:** period format for monthly aggregations
- **Revenue column:** converted to numeric (`config.REVENUE_COLUMN`)
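As an illustration of those derived columns (the `Date` and `Revenue` names here are placeholders; `load_sales_data` uses the names configured in `config.py`):

```python
import pandas as pd

# Hypothetical raw frame; load_sales_data derives columns like these.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-15", "2024-03-02"]),
    "Revenue": ["1200.50", "980"],  # revenue often arrives as strings
})
df["Year"] = df["Date"].dt.year                 # extracted year
df["YearMonth"] = df["Date"].dt.to_period("M")  # monthly Period for groupby
df["Revenue"] = pd.to_numeric(df["Revenue"], errors="coerce")  # numeric revenue
```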

## Column Configuration

Before using the loader, configure the column names in `config.py`:

- `REVENUE_COLUMN`: your revenue/amount column name
- `DATE_COLUMN`: primary date column name
- `DATE_FALLBACK_COLUMNS`: list of fallback date columns
- `CUSTOMER_COLUMN`: customer/account column name
- other columns as needed
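A `config.py` might look like the sketch below; every value is an example for illustration, not a default shipped with the template:

```python
# config.py -- illustrative values only; adjust to your dataset's columns.
REVENUE_COLUMN = "Revenue"                           # revenue/amount column
DATE_COLUMN = "OrderDate"                            # primary date column
DATE_FALLBACK_COLUMNS = ["ShipDate", "InvoiceDate"]  # tried in order
CUSTOMER_COLUMN = "CustomerID"                       # customer/account column
```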

## Common Mistakes

**WRONG:**

```python
df = pd.read_csv('sales_data.csv')
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df = df.dropna(subset=['Date'])  # May silently drop rows whose dates live in a fallback column!
```

**CORRECT:**

```python
from data_loader import load_sales_data
from config import get_data_path

df = load_sales_data(get_data_path())  # Uses fallback logic
```

## Data File Location

The data file path is configured in `config.py`:

- `DATA_FILE`: filename (e.g., `'sales_data.csv'`)
- `DATA_DIR`: optional subdirectory (defaults to the current directory)
- use `get_data_path()` to build the full path
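`get_data_path()` presumably just joins these two settings; a minimal sketch under that assumption (the real helper lives in `config.py`):

```python
import os

DATA_FILE = "sales_data.csv"  # example filename
DATA_DIR = ""                 # optional subdirectory; "" means current directory

def get_data_path():
    """Join DATA_DIR and DATA_FILE into the path handed to the loader."""
    return os.path.join(DATA_DIR, DATA_FILE) if DATA_DIR else DATA_FILE
```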

## Validation

After loading, validate the data structure:

```python
from data_loader import validate_data_structure

is_valid, msg = validate_data_structure(df)
if not is_valid:
    print(f"ERROR: {msg}")
```
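The `(is_valid, msg)` contract suggests an implementation along these lines. This is a sketch only: `REQUIRED_COLUMNS` and the message strings are assumptions, not the template's actual code.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Date", "Year", "Revenue"]  # assumed minimum set

def validate_data_structure(df):
    """Return (is_valid, message) for the loaded frame."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return False, f"missing required columns: {missing}"
    if df.empty:
        return False, "no rows loaded"
    return True, "ok"
```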