# Database Setup and Management

This document describes the database setup, migrations, and management for the CIM Document Processor backend.

## Database Schema

The application uses PostgreSQL with the following tables:

### Users Table
- `id` (UUID, Primary Key)
- `email` (VARCHAR, Unique)
- `name` (VARCHAR)
- `password_hash` (VARCHAR)
- `role` (VARCHAR, 'user' or 'admin')
- `created_at` (TIMESTAMP)
- `updated_at` (TIMESTAMP)
- `last_login` (TIMESTAMP, nullable)
- `is_active` (BOOLEAN)

### Documents Table
- `id` (UUID, Primary Key)
- `user_id` (UUID, Foreign Key to users.id)
- `original_file_name` (VARCHAR)
- `file_path` (VARCHAR)
- `file_size` (BIGINT)
- `uploaded_at` (TIMESTAMP)
- `status` (VARCHAR, processing status)
- `extracted_text` (TEXT, nullable)
- `generated_summary` (TEXT, nullable)
- `summary_markdown_path` (VARCHAR, nullable)
- `summary_pdf_path` (VARCHAR, nullable)
- `processing_started_at` (TIMESTAMP, nullable)
- `processing_completed_at` (TIMESTAMP, nullable)
- `error_message` (TEXT, nullable)
- `created_at` (TIMESTAMP)
- `updated_at` (TIMESTAMP)

### Document Feedback Table
- `id` (UUID, Primary Key)
- `document_id` (UUID, Foreign Key to documents.id)
- `user_id` (UUID, Foreign Key to users.id)
- `feedback` (TEXT)
- `regeneration_instructions` (TEXT, nullable)
- `created_at` (TIMESTAMP)

### Document Versions Table
- `id` (UUID, Primary Key)
- `document_id` (UUID, Foreign Key to documents.id)
- `version_number` (INTEGER)
- `summary_markdown` (TEXT)
- `summary_pdf_path` (VARCHAR)
- `feedback` (TEXT, nullable)
- `created_at` (TIMESTAMP)

### Processing Jobs Table
- `id` (UUID, Primary Key)
- `document_id` (UUID, Foreign Key to documents.id)
- `type` (VARCHAR, job type)
- `status` (VARCHAR, job status)
- `progress` (INTEGER, 0-100)
- `error_message` (TEXT, nullable)
- `created_at` (TIMESTAMP)
- `started_at` (TIMESTAMP, nullable)
- `completed_at` (TIMESTAMP, nullable)
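
As a rough sketch, rows from the tables above might be typed as follows in TypeScript. The interface names, the mapping of UUID and TIMESTAMP columns to `string` and `Date`, and the two helper functions are assumptions for illustration, not part of the codebase.

```typescript
// Hypothetical row types mirroring the Users and Processing Jobs tables.
// Column names come from the schema above; the TypeScript types are assumed.
type UserRole = "user" | "admin";

interface UserRow {
  id: string; // UUID
  email: string;
  name: string;
  password_hash: string;
  role: UserRole;
  created_at: Date;
  updated_at: Date;
  last_login: Date | null; // nullable column
  is_active: boolean;
}

interface ProcessingJobRow {
  id: string; // UUID
  document_id: string; // references documents.id
  type: string;
  status: string;
  progress: number; // integer, 0-100
  error_message: string | null;
  created_at: Date;
  started_at: Date | null;
  completed_at: Date | null;
}

// Narrow an arbitrary string to the two roles the schema allows.
function isUserRole(value: string): value is UserRole {
  return value === "user" || value === "admin";
}

// Check that a progress value fits the documented 0-100 integer range.
function isValidProgress(value: number): boolean {
  return Number.isInteger(value) && value >= 0 && value <= 100;
}
```

Nullable columns are modelled as `T | null` so that callers have to handle the missing case explicitly.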

## Setup Instructions

### 1. Install Dependencies
```bash
npm install
```

### 2. Configure Environment Variables
Copy the example environment file and configure your database settings:
```bash
cp .env.example .env
```

Update the following variables in `.env`:
- `DATABASE_URL` - PostgreSQL connection string
- `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD` - Database credentials
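
The document does not say which setting wins when both `DATABASE_URL` and the individual `DB_*` variables are present. The sketch below assumes `DATABASE_URL` takes precedence and that the other variables serve as a fallback; `resolveDatabaseUrl` and its defaults are hypothetical.

```typescript
// Hypothetical helper: use DATABASE_URL if set, otherwise assemble a URL
// from the individual DB_* variables. The precedence is an assumption.
type Env = Record<string, string | undefined>;

function resolveDatabaseUrl(env: Env): string {
  const direct = env["DATABASE_URL"];
  if (direct !== undefined && direct.trim() !== "") {
    return direct;
  }

  const host = env["DB_HOST"] ?? "localhost"; // assumed default
  const port = env["DB_PORT"] ?? "5432"; // PostgreSQL's default port
  const name = env["DB_NAME"];
  const user = env["DB_USER"];
  const password = env["DB_PASSWORD"];

  if (!name || !user) {
    throw new Error("Set DATABASE_URL, or at least DB_NAME and DB_USER");
  }

  // Percent-encode credentials so characters like '@' or ':' stay intact.
  const auth = password
    ? `${encodeURIComponent(user)}:${encodeURIComponent(password)}`
    : encodeURIComponent(user);

  return `postgresql://${auth}@${host}:${port}/${encodeURIComponent(name)}`;
}
```

In a Node.js process this could be called as `resolveDatabaseUrl(process.env)`.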

### 3. Create Database
Create a PostgreSQL database:
```sql
CREATE DATABASE cim_processor;
```

### 4. Run Migrations and Seed Data
```bash
npm run db:setup
```

This command will:
- Run all database migrations to create tables
- Seed the database with initial test data
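
The ordering matters: seeding only makes sense once every migration has succeeded. A minimal sketch of that sequencing, assuming migrations are identified by sortable names and tracked in a list of already-applied names (the function names and file names here are hypothetical, and the real `db:setup` script may work differently):

```typescript
// Return the migrations that have not been applied yet, in run order.
function pendingMigrations(available: string[], applied: string[]): string[] {
  const done = new Set(applied);
  return available
    .filter((name) => !done.has(name))
    .sort(); // assumes names sort in the order they should run
}

// Run migrations first and seed only if every migration succeeded.
function setupDatabase(
  available: string[],
  applied: string[],
  runMigration: (name: string) => void,
  seed: () => void
): string[] {
  const ran: string[] = [];
  for (const name of pendingMigrations(available, applied)) {
    runMigration(name); // an exception here stops the setup before seeding
    ran.push(name);
  }
  seed();
  return ran;
}
```

Because `seed` is called after the loop, a failing migration leaves the database unseeded rather than half-populated.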

## Available Scripts

### Database Management
- `npm run db:migrate` - Run database migrations
- `npm run db:seed` - Seed database with test data
- `npm run db:setup` - Run migrations and seed data

### Development
- `npm run dev` - Start development server
- `npm run build` - Build for production
- `npm run test` - Run tests
- `npm run lint` - Run linting

## Database Models

The application includes the following models:

### UserModel
- `create(userData)` - Create new user
- `findById(id)` - Find user by ID
- `findByEmail(email)` - Find user by email
- `findAll(limit, offset)` - Get all users (admin)
- `update(id, updates)` - Update user
- `delete(id)` - Soft delete user
- `emailExists(email)` - Check if email exists
- `count()` - Count total users
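
To illustrate what a soft delete means, the sketch below uses an in-memory array in place of the users table. It assumes that `delete(id)` marks the row inactive through `is_active` rather than removing it; the document does not spell out the mechanism, and `softDeleteUser` and `countActiveUsers` are hypothetical names.

```typescript
// Hypothetical in-memory stand-in for the users table.
interface UserRecord {
  id: string;
  email: string;
  is_active: boolean;
}

// Soft delete: keep the row but mark it inactive (assumed behaviour).
function softDeleteUser(users: UserRecord[], id: string): boolean {
  const user = users.find((u) => u.id === id);
  if (!user || !user.is_active) {
    return false; // unknown id or already deleted
  }
  user.is_active = false;
  return true;
}

// Count only the users that have not been soft deleted.
function countActiveUsers(users: UserRecord[]): number {
  return users.filter((u) => u.is_active).length;
}
```

The row stays in the table, so foreign keys from documents and feedback remain valid after the user is deactivated.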

### DocumentModel
- `create(documentData)` - Create new document
- `findById(id)` - Find document by ID
- `findByUserId(userId, limit, offset)` - Get user's documents
- `findAll(limit, offset)` - Get all documents (admin)
- `updateStatus(id, status)` - Update document status
- `updateExtractedText(id, text)` - Update extracted text
- `updateGeneratedSummary(id, summary, markdownPath, pdfPath)` - Update summary
- `delete(id)` - Delete document
- `countByUser(userId)` - Count user's documents
- `findByStatus(status, limit, offset)` - Get documents by status
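
Several of these methods take `limit` and `offset`. A small sketch of how those two parameters select one page of results, applied to an in-memory array; the default values and the helper names are assumptions:

```typescript
// Hypothetical helper showing how limit/offset select one page of results.
function paginate<T>(rows: T[], limit = 20, offset = 0): T[] {
  // Guard against negative or fractional inputs; the defaults are assumed.
  const safeLimit = Math.max(0, Math.floor(limit));
  const safeOffset = Math.max(0, Math.floor(offset));
  return rows.slice(safeOffset, safeOffset + safeLimit);
}

// Translate a 1-based page number into the offset expected by the models.
function pageToOffset(page: number, limit: number): number {
  return (Math.max(1, Math.floor(page)) - 1) * limit;
}
```

With a page size of 10, page 3 corresponds to `offset = 20`, so a call such as `findByUserId(userId, 10, pageToOffset(3, 10))` would request the third page.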

### DocumentFeedbackModel
- `create(feedbackData)` - Create new feedback
- `findByDocumentId(documentId)` - Get document feedback
- `findByUserId(userId, limit, offset)` - Get user's feedback
- `update(id, updates)` - Update feedback
- `delete(id)` - Delete feedback

### DocumentVersionModel
- `create(versionData)` - Create new version
- `findByDocumentId(documentId)` - Get document versions
- `findLatestByDocumentId(documentId)` - Get latest version
- `getNextVersionNumber(documentId)` - Get next version number
- `update(id, updates)` - Update version
- `delete(id)` - Delete version
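
One plausible way to compute the next version number is to take the highest existing `version_number` for the document and add one. The sketch below does this over an in-memory list; starting at 1 for a document without versions is an assumption, as are the names `VersionRecord`, `nextVersionNumber`, and `latestVersion`.

```typescript
// Hypothetical in-memory stand-in for rows of the document versions table.
interface VersionRecord {
  document_id: string;
  version_number: number;
}

// Next version = highest existing version for the document, plus one.
function nextVersionNumber(versions: VersionRecord[], documentId: string): number {
  const numbers = versions
    .filter((v) => v.document_id === documentId)
    .map((v) => v.version_number);
  // Start at 1 for a document with no versions (assumed convention).
  return numbers.length === 0 ? 1 : Math.max(...numbers) + 1;
}

// Latest version = the record with the highest version_number, if any.
function latestVersion(versions: VersionRecord[], documentId: string): VersionRecord | undefined {
  return versions
    .filter((v) => v.document_id === documentId)
    .sort((a, b) => b.version_number - a.version_number)[0];
}
```

In a real database, two concurrent regenerations could compute the same number, so a unique constraint on `(document_id, version_number)` would be a sensible safeguard if one is not already in place.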

### ProcessingJobModel
- `create(jobData)` - Create new job
- `findByDocumentId(documentId)` - Get document jobs
- `findByType(type, limit, offset)` - Get jobs by type
- `findByStatus(status, limit, offset)` - Get jobs by status
- `findPendingJobs(limit)` - Get pending jobs
- `updateStatus(id, status)` - Update job status
- `updateProgress(id, progress)` - Update job progress
- `delete(id)` - Delete job
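
A sketch of how a worker might pick up jobs and report progress, using an in-memory list. The `"pending"` status value and the oldest-first ordering are assumptions, since the document does not list the status values; `pickPendingJobs` and `clampProgress` are hypothetical helpers.

```typescript
// Hypothetical in-memory stand-in for rows of the processing jobs table.
interface JobRecord {
  id: string;
  status: string;
  progress: number;
  created_at: Date;
}

// Oldest pending jobs first, capped at `limit` (ordering is assumed).
function pickPendingJobs(jobs: JobRecord[], limit: number): JobRecord[] {
  return jobs
    .filter((job) => job.status === "pending")
    .sort((a, b) => a.created_at.getTime() - b.created_at.getTime())
    .slice(0, Math.max(0, limit));
}

// Keep progress inside the 0-100 integer range from the schema.
function clampProgress(progress: number): number {
  if (!Number.isFinite(progress)) {
    return 0; // treat NaN or infinite input as no progress
  }
  return Math.min(100, Math.max(0, Math.round(progress)));
}
```

Clamping before calling `updateProgress(id, progress)` keeps a miscalculated percentage from violating the documented range.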

## Seeded Data

The database is seeded with the following test data:

### Users
- `admin@example.com` / `admin123` (Admin role)
- `user1@example.com` / `user123` (User role)
- `user2@example.com` / `user123` (User role)

### Sample Documents
- Sample CIM documents with different processing statuses
- Associated processing jobs for testing

## Indexes

The following indexes are created to speed up common queries:

### Users Table
- `idx_users_email` - Email lookups
- `idx_users_role` - Role-based queries
- `idx_users_is_active` - Active user filtering

### Documents Table
- `idx_documents_user_id` - User document queries
- `idx_documents_status` - Status-based queries
- `idx_documents_uploaded_at` - Date-based queries
- `idx_documents_user_status` - Composite index for user + status

### Other Tables
- Foreign key indexes on all relationship columns
- Composite indexes for common query patterns
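
For illustration, the Documents indexes above could be described as data and turned into `CREATE INDEX` statements. The column lists are inferred from the index names and may not match the actual migrations; `IndexSpec` and `createIndexSql` are hypothetical.

```typescript
// Hypothetical description of an index; not taken from the migrations.
interface IndexSpec {
  name: string;
  table: string;
  columns: string[];
}

// Column lists are inferred from the index names listed above.
const documentIndexes: IndexSpec[] = [
  { name: "idx_documents_user_id", table: "documents", columns: ["user_id"] },
  { name: "idx_documents_status", table: "documents", columns: ["status"] },
  { name: "idx_documents_uploaded_at", table: "documents", columns: ["uploaded_at"] },
  // Composite index: the column order (user_id, then status) is assumed.
  { name: "idx_documents_user_status", table: "documents", columns: ["user_id", "status"] },
];

// Render one spec as an idempotent PostgreSQL statement.
function createIndexSql(spec: IndexSpec): string {
  return `CREATE INDEX IF NOT EXISTS ${spec.name} ON ${spec.table} (${spec.columns.join(", ")});`;
}

const statements: string[] = documentIndexes.map(createIndexSql);
```

A composite index on `(user_id, status)` suits queries that filter on both columns, such as listing one user's documents in a given status.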

## Triggers

- `update_users_updated_at` - Automatically updates the `updated_at` timestamp whenever a user row is updated
- `update_documents_updated_at` - Automatically updates the `updated_at` timestamp whenever a document row is updated
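
Triggers like these are commonly built from one shared PL/pgSQL function plus one trigger per table. The sketch below generates such DDL as a string. The function name `set_updated_at` is hypothetical, the trigger name follows the `update_<table>_updated_at` pattern listed above, and `EXECUTE FUNCTION` assumes PostgreSQL 11 or later; the actual migrations may define the triggers differently.

```typescript
// Build the DDL for an updated_at trigger on one table.
// set_updated_at is a hypothetical function name, not taken from the migrations.
function updatedAtTriggerSql(table: string): string {
  // Only allow plain identifiers, since the name is interpolated into SQL.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`Unsafe table name: ${table}`);
  }
  return [
    "CREATE OR REPLACE FUNCTION set_updated_at() RETURNS TRIGGER AS $$",
    "BEGIN",
    "  NEW.updated_at = NOW();",
    "  RETURN NEW;",
    "END;",
    "$$ LANGUAGE plpgsql;",
    "",
    `CREATE TRIGGER update_${table}_updated_at`,
    `  BEFORE UPDATE ON ${table}`,
    "  FOR EACH ROW EXECUTE FUNCTION set_updated_at();",
  ].join("\n");
}
```

Calling `updatedAtTriggerSql("users")` yields a statement pair whose trigger is named `update_users_updated_at`.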

## Backup and Recovery

### Backup
```bash
pg_dump -h localhost -U username -d cim_processor > backup.sql
```

### Restore
```bash
psql -h localhost -U username -d cim_processor < backup.sql
```
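
If backups are scripted from Node.js, a timestamped file name avoids overwriting earlier dumps. The sketch below only builds the file name and the `pg_dump` argument list; both helper names are hypothetical, and `-f` is `pg_dump`'s output-file option, used here in place of the shell redirect.

```typescript
// Build a file name such as cim_processor_2024-01-15T10-30-00.sql (UTC).
function backupFileName(database: string, at: Date): string {
  // Drop milliseconds and replace ':' so the name is valid on all platforms.
  const stamp = at.toISOString().slice(0, 19).replace(/:/g, "-");
  return `${database}_${stamp}.sql`;
}

// Arguments equivalent to the pg_dump command above, writing to `file`.
function pgDumpArgs(host: string, user: string, database: string, file: string): string[] {
  return ["-h", host, "-U", user, "-d", database, "-f", file];
}
```

The resulting array could be passed to `execFile("pg_dump", args)` from Node's `child_process` module, which avoids shell quoting issues.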

## Troubleshooting

### Common Issues

1. **Connection refused**: Check the connection settings in `.env` (host, port, credentials) and ensure PostgreSQL is running
2. **Permission denied**: Ensure the database user has the required privileges on the `cim_processor` database
3. **Migration errors**: Check that the migrations table exists and is accessible
4. **Seed data errors**: Ensure all required tables exist (run the migrations) before seeding
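
The issues above often surface as error codes. The sketch below maps a few of them to hints: `ECONNREFUSED` is the Node.js network error code, and the five-character values are standard PostgreSQL SQLSTATE codes. It assumes the database driver exposes the code on a `code` property of the error; `troubleshootingHint` is a hypothetical helper.

```typescript
// Minimal shape of an error that carries a code (assumed driver behaviour).
interface DbErrorLike {
  code?: string;
}

// Map well-known error codes to the troubleshooting steps listed above.
function troubleshootingHint(error: DbErrorLike): string {
  switch (error.code) {
    case "ECONNREFUSED": // nothing is listening on the configured host/port
      return "Connection refused: check that PostgreSQL is running and the host and port are correct";
    case "28P01": // invalid_password
      return "Authentication failed: check DB_USER and DB_PASSWORD";
    case "3D000": // invalid_catalog_name
      return "Database does not exist: create it before running migrations";
    case "42501": // insufficient_privilege
      return "Permission denied: grant the database user the required privileges";
    case "42P01": // undefined_table
      return "Missing table: run the migrations before seeding";
    default:
      return "Unrecognized error: check the application logs";
  }
}
```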

### Logs
The application logs contain detailed error information, including:
- Database connection errors
- Migration execution logs
- Seed data creation logs