# Database Setup and Management

This document describes the database setup, migrations, and management for the CIM Document Processor backend.

## Database Schema

The application uses PostgreSQL with the following tables:

### Users Table

- `id` (UUID, Primary Key)
- `email` (VARCHAR, Unique)
- `name` (VARCHAR)
- `password_hash` (VARCHAR)
- `role` (VARCHAR, 'user' or 'admin')
- `created_at` (TIMESTAMP)
- `updated_at` (TIMESTAMP)
- `last_login` (TIMESTAMP, nullable)
- `is_active` (BOOLEAN)

### Documents Table

- `id` (UUID, Primary Key)
- `user_id` (UUID, Foreign Key to users.id)
- `original_file_name` (VARCHAR)
- `file_path` (VARCHAR)
- `file_size` (BIGINT)
- `uploaded_at` (TIMESTAMP)
- `status` (VARCHAR, processing status)
- `extracted_text` (TEXT, nullable)
- `generated_summary` (TEXT, nullable)
- `summary_markdown_path` (VARCHAR, nullable)
- `summary_pdf_path` (VARCHAR, nullable)
- `processing_started_at` (TIMESTAMP, nullable)
- `processing_completed_at` (TIMESTAMP, nullable)
- `error_message` (TEXT, nullable)
- `created_at` (TIMESTAMP)
- `updated_at` (TIMESTAMP)

### Document Feedback Table

- `id` (UUID, Primary Key)
- `document_id` (UUID, Foreign Key to documents.id)
- `user_id` (UUID, Foreign Key to users.id)
- `feedback` (TEXT)
- `regeneration_instructions` (TEXT, nullable)
- `created_at` (TIMESTAMP)

### Document Versions Table

- `id` (UUID, Primary Key)
- `document_id` (UUID, Foreign Key to documents.id)
- `version_number` (INTEGER)
- `summary_markdown` (TEXT)
- `summary_pdf_path` (VARCHAR)
- `feedback` (TEXT, nullable)
- `created_at` (TIMESTAMP)

### Processing Jobs Table

- `id` (UUID, Primary Key)
- `document_id` (UUID, Foreign Key to documents.id)
- `type` (VARCHAR, job type)
- `status` (VARCHAR, job status)
- `progress` (INTEGER, 0-100)
- `error_message` (TEXT, nullable)
- `created_at` (TIMESTAMP)
- `started_at` (TIMESTAMP, nullable)
- `completed_at` (TIMESTAMP, nullable)

## Setup Instructions

### 1. Install Dependencies

```bash
npm install
```

### 2. Configure Environment Variables

Copy the example environment file and configure your database settings:

```bash
cp .env.example .env
```

Update the following variables in `.env`:

- `DATABASE_URL` - PostgreSQL connection string
- `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD` - Database credentials

### 3. Create Database

Create a PostgreSQL database:

```sql
CREATE DATABASE cim_processor;
```

### 4. Run Migrations and Seed Data

```bash
npm run db:setup
```

This command will:

- Run all database migrations to create tables
- Seed the database with initial test data

## Available Scripts

### Database Management

- `npm run db:migrate` - Run database migrations
- `npm run db:seed` - Seed database with test data
- `npm run db:setup` - Run migrations and seed data

### Development

- `npm run dev` - Start development server
- `npm run build` - Build for production
- `npm run test` - Run tests
- `npm run lint` - Run linting

## Database Models

The application includes the following models:

### UserModel

- `create(userData)` - Create new user
- `findById(id)` - Find user by ID
- `findByEmail(email)` - Find user by email
- `findAll(limit, offset)` - Get all users (admin)
- `update(id, updates)` - Update user
- `delete(id)` - Soft delete user
- `emailExists(email)` - Check if email exists
- `count()` - Count total users

### DocumentModel

- `create(documentData)` - Create new document
- `findById(id)` - Find document by ID
- `findByUserId(userId, limit, offset)` - Get user's documents
- `findAll(limit, offset)` - Get all documents (admin)
- `updateStatus(id, status)` - Update document status
- `updateExtractedText(id, text)` - Update extracted text
- `updateGeneratedSummary(id, summary, markdownPath, pdfPath)` - Update summary
- `delete(id)` - Delete document
- `countByUser(userId)` - Count user's documents
- `findByStatus(status, limit, offset)` - Get documents by status

### DocumentFeedbackModel

- `create(feedbackData)` - Create new feedback
- `findByDocumentId(documentId)` - Get document feedback
- `findByUserId(userId, limit, offset)` - Get user's feedback
- `update(id, updates)` - Update feedback
- `delete(id)` - Delete feedback

### DocumentVersionModel

- `create(versionData)` - Create new version
- `findByDocumentId(documentId)` - Get document versions
- `findLatestByDocumentId(documentId)` - Get latest version
- `getNextVersionNumber(documentId)` - Get next version number
- `update(id, updates)` - Update version
- `delete(id)` - Delete version

### ProcessingJobModel

- `create(jobData)` - Create new job
- `findByDocumentId(documentId)` - Get document jobs
- `findByType(type, limit, offset)` - Get jobs by type
- `findByStatus(status, limit, offset)` - Get jobs by status
- `findPendingJobs(limit)` - Get pending jobs
- `updateStatus(id, status)` - Update job status
- `updateProgress(id, progress)` - Update job progress
- `delete(id)` - Delete job

## Seeded Data

The database is seeded with the following test data:

### Users

- `admin@example.com` / `admin123` (Admin role)
- `user1@example.com` / `user123` (User role)
- `user2@example.com` / `user123` (User role)

### Sample Documents

- Sample CIM documents with different processing statuses
- Associated processing jobs for testing

## Indexes

The following indexes are created for optimal performance:

### Users Table

- `idx_users_email` - Email lookups
- `idx_users_role` - Role-based queries
- `idx_users_is_active` - Active user filtering

### Documents Table

- `idx_documents_user_id` - User document queries
- `idx_documents_status` - Status-based queries
- `idx_documents_uploaded_at` - Date-based queries
- `idx_documents_user_status` - Composite index for user + status

### Other Tables

- Foreign key indexes on all relationship columns
- Composite indexes for common query patterns

## Triggers

- `update_users_updated_at` - Automatically updates `updated_at` timestamp on user updates
- `update_documents_updated_at` - Automatically updates `updated_at` timestamp on document updates

## Backup and Recovery

### Backup

```bash
pg_dump -h localhost -U username -d cim_processor > backup.sql
```

### Restore

```bash
psql -h localhost -U username -d cim_processor < backup.sql
```

## Troubleshooting

### Common Issues

1. **Connection refused**: Check database credentials and ensure PostgreSQL is running
2. **Permission denied**: Ensure database user has proper permissions
3. **Migration errors**: Check if migrations table exists and is accessible
4. **Seed data errors**: Ensure all required tables exist before seeding

### Logs

Check the application logs for detailed error information:

- Database connection errors
- Migration execution logs
- Seed data creation logs
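
When a connection error shows up in the logs, it often helps to see which host, port, and database the `DATABASE_URL` actually resolves to. The following is a minimal sketch of such a check, not code from this project: `describeDatabaseUrl` and `DatabaseTarget` are hypothetical names, the example URL is a placeholder, and the only assumptions are the standard `URL` class available in Node.js and PostgreSQL's default port of 5432.

```typescript
// Hypothetical helper for troubleshooting; not part of the documented models or scripts.
interface DatabaseTarget {
  host: string;
  port: number;
  database: string;
  user: string;
  hasPassword: boolean; // reports presence only, never the password itself
}

// Split a PostgreSQL connection string into its parts without exposing the password.
function describeDatabaseUrl(connectionString: string): DatabaseTarget {
  const url = new URL(connectionString); // throws TypeError on a malformed URL
  if (url.protocol !== "postgres:" && url.protocol !== "postgresql:") {
    throw new Error(`Unexpected protocol "${url.protocol}" in connection string`);
  }
  return {
    host: url.hostname,
    // Fall back to PostgreSQL's default port when none is given.
    port: url.port === "" ? 5432 : Number(url.port),
    database: decodeURIComponent(url.pathname.replace(/^\//, "")),
    user: decodeURIComponent(url.username),
    hasPassword: url.password !== "",
  };
}

// Example with a placeholder connection string (assumed values, not real credentials).
const target = describeDatabaseUrl("postgresql://username@localhost:5432/cim_processor");
console.log(`Target: ${target.database} on ${target.host}:${target.port} as ${target.user}`);
```

In practice the argument would come from the `DATABASE_URL` environment variable. Logging the parsed target once at startup makes a wrong host or database name visible before any migration or seed step runs.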