# Database Schema Documentation
## Complete Database Structure for CIM Document Processor

### 🎯 Overview

This document describes the database schema for the CIM Document Processor: its tables, relationships, indexes, and data structures.

---

## 🗄️ Database Architecture

### Technology Stack
- **Database**: PostgreSQL (via Supabase)
- **ORM**: Supabase Client (TypeScript)
- **Migrations**: SQL migration files
- **Backup**: Supabase automated backups

### Database Features
- **JSONB Support**: For flexible analysis data storage
- **UUID Primary Keys**: For secure document identification
- **Row Level Security**: For user data isolation
- **Full-Text Search**: For document content search
- **Vector Storage**: For AI embeddings and similarity search

---

## 📊 Core Tables

### Documents Table
**Purpose**: Primary table for storing document metadata and processing results

```sql
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  original_file_name TEXT NOT NULL,
  file_path TEXT NOT NULL,
  file_size INTEGER NOT NULL,
  status TEXT NOT NULL DEFAULT 'uploaded',
  extracted_text TEXT,
  generated_summary TEXT,
  summary_pdf_path TEXT,
  analysis_data JSONB,
  error_message TEXT,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
```

**Columns**:
- `id` - Unique document identifier (UUID)
- `user_id` - User who owns the document
- `original_file_name` - Original uploaded file name
- `file_path` - Storage path for the document
- `file_size` - File size in bytes
- `status` - Processing status (uploaded, processing, completed, failed, cancelled)
- `extracted_text` - Text extracted from document
- `generated_summary` - AI-generated summary
- `summary_pdf_path` - Path to generated PDF report
- `analysis_data` - Structured analysis results (JSONB)
- `error_message` - Error message if processing failed
- `created_at` - Document creation timestamp
- `updated_at` - Last update timestamp

**Indexes**:
```sql
CREATE INDEX idx_documents_user_id ON documents(user_id);
CREATE INDEX idx_documents_status ON documents(status);
CREATE INDEX idx_documents_created_at ON documents(created_at);
CREATE INDEX idx_documents_analysis_data ON documents USING GIN (analysis_data);
```
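
On the TypeScript side, a row of this table could be typed roughly as follows. This is a minimal sketch, not the project's actual types: `DocumentRow`, `isDocumentStatus`, and `summarizeDocument` are hypothetical names, and timestamps are assumed to arrive as ISO strings. Because `status` is stored as `TEXT`, the sketch checks values at runtime.

```typescript
// Status values listed in the column description of documents.status.
const DOCUMENT_STATUSES = ["uploaded", "processing", "completed", "failed", "cancelled"] as const;
type DocumentStatus = (typeof DOCUMENT_STATUSES)[number];

// Hypothetical row shape mirroring the documents table; nullable columns map to `| null`.
interface DocumentRow {
  id: string; // UUID
  user_id: string;
  original_file_name: string;
  file_path: string;
  file_size: number; // bytes
  status: DocumentStatus;
  extracted_text: string | null;
  generated_summary: string | null;
  summary_pdf_path: string | null;
  analysis_data: unknown; // JSONB, shape not specified by the schema
  error_message: string | null;
  created_at: string; // assumed ISO timestamp string
  updated_at: string;
}

// Runtime type guard: the column is TEXT, so the database does not restrict its values.
function isDocumentStatus(value: string): value is DocumentStatus {
  return (DOCUMENT_STATUSES as readonly string[]).indexOf(value) !== -1;
}

// Small helper that builds a one-line description of a document row.
function summarizeDocument(doc: DocumentRow): string {
  return `${doc.original_file_name} (${doc.file_size} bytes): ${doc.status}`;
}
```

A guard like this is most useful at the boundary where rows are read from the database, so the rest of the code can rely on the narrower `DocumentStatus` type.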

### Users Table
**Purpose**: User authentication and profile information

```sql
CREATE TABLE users (
  id TEXT PRIMARY KEY,
  name TEXT,
  email TEXT UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
```

**Columns**:
- `id` - Firebase user ID
- `name` - User display name
- `email` - User email address
- `created_at` - Account creation timestamp
- `updated_at` - Last update timestamp

**Indexes**:
```sql
-- Note: the UNIQUE constraint on email already creates an index, so this one is optional
CREATE INDEX idx_users_email ON users(email);
```

### Processing Jobs Table
**Purpose**: Background job tracking and management

```sql
CREATE TABLE processing_jobs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  user_id TEXT NOT NULL,
  job_type TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  priority INTEGER DEFAULT 0,
  attempts INTEGER DEFAULT 0,
  max_attempts INTEGER DEFAULT 3,
  started_at TIMESTAMP,
  completed_at TIMESTAMP,
  error_message TEXT,
  result_data JSONB,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
```

**Columns**:
- `id` - Unique job identifier
- `document_id` - Associated document
- `user_id` - User who initiated the job
- `job_type` - Type of processing job
- `status` - Job status (pending, running, completed, failed)
- `priority` - Job priority (higher = more important)
- `attempts` - Number of processing attempts
- `max_attempts` - Maximum allowed attempts
- `started_at` - Job start timestamp
- `completed_at` - Job completion timestamp
- `error_message` - Error message if failed
- `result_data` - Job result data (JSONB)
- `created_at` - Job creation timestamp
- `updated_at` - Last update timestamp

**Indexes**:
```sql
CREATE INDEX idx_processing_jobs_document_id ON processing_jobs(document_id);
CREATE INDEX idx_processing_jobs_user_id ON processing_jobs(user_id);
CREATE INDEX idx_processing_jobs_status ON processing_jobs(status);
CREATE INDEX idx_processing_jobs_priority ON processing_jobs(priority);
```
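
The `status`, `priority`, `attempts`, and `max_attempts` columns suggest how a worker might choose its next job. The sketch below is one possible reading, not the project's scheduler: `JobRow` and `pickNextJob` are hypothetical names, and breaking priority ties by the oldest `created_at` is an assumption the schema does not state.

```typescript
// Hypothetical subset of processing_jobs columns needed for scheduling.
interface JobRow {
  id: string;
  status: "pending" | "running" | "completed" | "failed";
  priority: number; // higher = more important
  attempts: number;
  max_attempts: number;
  created_at: string; // assumed ISO timestamp string
}

// Pick the next job to run: pending, with attempts remaining, highest priority first.
// Ties are broken by oldest created_at (an assumption, see lead-in).
function pickNextJob(jobs: readonly JobRow[]): JobRow | undefined {
  const runnable = jobs.filter(
    (job) => job.status === "pending" && job.attempts < job.max_attempts
  );
  const sorted = runnable.slice().sort((a, b) => {
    if (a.priority !== b.priority) return b.priority - a.priority;
    return Date.parse(a.created_at) - Date.parse(b.created_at);
  });
  return sorted[0];
}
```

In SQL the same selection would filter on `status` and order by `priority DESC`, which is the access pattern the `status` and `priority` indexes above are meant to serve.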

---

## 🤖 AI Processing Tables

### Agentic RAG Sessions Table
**Purpose**: Track AI processing sessions and results

```sql
CREATE TABLE agentic_rag_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  user_id TEXT NOT NULL,
  strategy TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  total_agents INTEGER DEFAULT 0,
  completed_agents INTEGER DEFAULT 0,
  failed_agents INTEGER DEFAULT 0,
  overall_validation_score DECIMAL(3,2),
  processing_time_ms INTEGER,
  api_calls_count INTEGER DEFAULT 0,
  total_cost DECIMAL(10,4),
  reasoning_steps JSONB,
  final_result JSONB,
  created_at TIMESTAMP DEFAULT NOW(),
  completed_at TIMESTAMP
);
```

**Columns**:
- `id` - Unique session identifier
- `document_id` - Associated document
- `user_id` - User who initiated processing
- `strategy` - Processing strategy used
- `status` - Session status
- `total_agents` - Total number of AI agents
- `completed_agents` - Successfully completed agents
- `failed_agents` - Failed agents
- `overall_validation_score` - Quality validation score
- `processing_time_ms` - Total processing time
- `api_calls_count` - Number of API calls made
- `total_cost` - Total cost of processing
- `reasoning_steps` - AI reasoning process (JSONB)
- `final_result` - Final analysis result (JSONB)
- `created_at` - Session creation timestamp
- `completed_at` - Session completion timestamp

**Indexes**:
```sql
CREATE INDEX idx_agentic_rag_sessions_document_id ON agentic_rag_sessions(document_id);
CREATE INDEX idx_agentic_rag_sessions_user_id ON agentic_rag_sessions(user_id);
CREATE INDEX idx_agentic_rag_sessions_status ON agentic_rag_sessions(status);
CREATE INDEX idx_agentic_rag_sessions_strategy ON agentic_rag_sessions(strategy);
```

### Agent Executions Table
**Purpose**: Track individual AI agent executions

```sql
CREATE TABLE agent_executions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  session_id UUID REFERENCES agentic_rag_sessions(id) ON DELETE CASCADE,
  agent_name TEXT NOT NULL,
  agent_type TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  input_data JSONB,
  output_data JSONB,
  error_message TEXT,
  execution_time_ms INTEGER,
  api_calls INTEGER DEFAULT 0,
  cost DECIMAL(10,4),
  validation_score DECIMAL(3,2),
  created_at TIMESTAMP DEFAULT NOW(),
  completed_at TIMESTAMP
);
```

**Columns**:
- `id` - Unique execution identifier
- `session_id` - Associated processing session
- `agent_name` - Name of the AI agent
- `agent_type` - Type of agent
- `status` - Execution status
- `input_data` - Input data for agent (JSONB)
- `output_data` - Output data from agent (JSONB)
- `error_message` - Error message if failed
- `execution_time_ms` - Execution time in milliseconds
- `api_calls` - Number of API calls made
- `cost` - Cost of this execution
- `validation_score` - Quality validation score
- `created_at` - Execution creation timestamp
- `completed_at` - Execution completion timestamp

**Indexes**:
```sql
CREATE INDEX idx_agent_executions_session_id ON agent_executions(session_id);
CREATE INDEX idx_agent_executions_agent_name ON agent_executions(agent_name);
CREATE INDEX idx_agent_executions_status ON agent_executions(status);
```
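
The session counters (`total_agents`, `completed_agents`, `failed_agents`, `api_calls_count`, `total_cost`) look like roll-ups of the per-agent rows. The sketch below shows that aggregation under stated assumptions: the schema does not say the session values are derived this way, the status strings `"completed"` and `"failed"` are assumed, and `AgentExecution`, `SessionTotals`, and `summarizeExecutions` are hypothetical names.

```typescript
// Hypothetical subset of agent_executions columns used for the roll-up.
interface AgentExecution {
  status: string; // "completed" and "failed" are assumed values
  api_calls: number;
  cost: number | null; // DECIMAL(10,4), nullable
}

// Counters matching the agentic_rag_sessions columns of the same names.
interface SessionTotals {
  total_agents: number;
  completed_agents: number;
  failed_agents: number;
  api_calls_count: number;
  total_cost: number;
}

// Aggregate per-agent rows into session-level counters.
function summarizeExecutions(executions: readonly AgentExecution[]): SessionTotals {
  const totals: SessionTotals = {
    total_agents: 0,
    completed_agents: 0,
    failed_agents: 0,
    api_calls_count: 0,
    total_cost: 0,
  };
  for (const execution of executions) {
    totals.total_agents += 1;
    if (execution.status === "completed") totals.completed_agents += 1;
    if (execution.status === "failed") totals.failed_agents += 1;
    totals.api_calls_count += execution.api_calls;
    totals.total_cost += execution.cost ?? 0; // a NULL cost counts as zero
  }
  // Round to four decimal places to match the DECIMAL(10,4) column scale.
  totals.total_cost = Math.round(totals.total_cost * 10000) / 10000;
  return totals;
}
```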

### Quality Metrics Table
**Purpose**: Track quality metrics for AI processing

```sql
CREATE TABLE quality_metrics (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  session_id UUID REFERENCES agentic_rag_sessions(id) ON DELETE CASCADE,
  metric_name TEXT NOT NULL,
  metric_value DECIMAL(10,4),
  metric_type TEXT NOT NULL,
  threshold_value DECIMAL(10,4),
  passed BOOLEAN,
  details JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);
```

**Columns**:
- `id` - Unique metric identifier
- `session_id` - Associated processing session
- `metric_name` - Name of the quality metric
- `metric_value` - Actual metric value
- `metric_type` - Type of metric (accuracy, completeness, etc.)
- `threshold_value` - Threshold for passing
- `passed` - Whether metric passed threshold
- `details` - Additional metric details (JSONB)
- `created_at` - Metric creation timestamp

**Indexes**:
```sql
CREATE INDEX idx_quality_metrics_session_id ON quality_metrics(session_id);
CREATE INDEX idx_quality_metrics_metric_name ON quality_metrics(metric_name);
CREATE INDEX idx_quality_metrics_passed ON quality_metrics(passed);
```
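
The `passed` column records the outcome of comparing `metric_value` with `threshold_value`. A minimal sketch of that comparison follows, assuming higher values are better and that meeting the threshold exactly counts as passing; the schema does not specify either point, and `QualityMetricInput` and `evaluateMetric` are hypothetical names.

```typescript
// Hypothetical input for one quality_metrics row before `passed` is computed.
interface QualityMetricInput {
  metric_name: string;
  metric_value: number | null;
  threshold_value: number | null;
}

// Returns the value to store in `passed`, or null when either side is missing
// (mirroring the nullable BOOLEAN column).
function evaluateMetric(metric: QualityMetricInput): boolean | null {
  if (metric.metric_value === null || metric.threshold_value === null) {
    return null;
  }
  // Assumption: higher is better and equality passes.
  return metric.metric_value >= metric.threshold_value;
}
```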

---

## 🔍 Vector Database Tables

### Document Chunks Table
**Purpose**: Store document chunks with vector embeddings

```sql
CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding VECTOR(1536),
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);
```

**Columns**:
- `id` - Unique chunk identifier
- `document_id` - Associated document
- `chunk_index` - Sequential chunk index
- `content` - Chunk text content
- `embedding` - Vector embedding (1536 dimensions)
- `metadata` - Chunk metadata (JSONB)
- `created_at` - Chunk creation timestamp

**Indexes**:
```sql
CREATE INDEX idx_document_chunks_document_id ON document_chunks(document_id);
CREATE INDEX idx_document_chunks_chunk_index ON document_chunks(chunk_index);
CREATE INDEX idx_document_chunks_embedding ON document_chunks USING ivfflat (embedding vector_cosine_ops);
```
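
The `vector_cosine_ops` operator class indexes embeddings by cosine distance, which is one minus cosine similarity. The sketch below reproduces that ranking in memory to show what a similarity search over `document_chunks` computes. It is an illustration only: `Chunk`, `cosineDistance`, and `rankChunks` are hypothetical names, the vectors in practice have 1536 dimensions, and a real query would run inside PostgreSQL rather than in application code.

```typescript
// Hypothetical subset of document_chunks columns.
interface Chunk {
  id: string;
  content: string;
  embedding: readonly number[];
}

// Cosine distance = 1 - cosine similarity. Zero-length vectors are not handled here.
function cosineDistance(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the `limit` chunks closest to the query embedding, nearest first.
function rankChunks(query: readonly number[], chunks: readonly Chunk[], limit: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, distance: cosineDistance(query, chunk.embedding) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, limit)
    .map((entry) => entry.chunk);
}
```

The `ivfflat` index approximates this ordering instead of scanning every row, so results from the database can differ slightly from an exact ranking like the one above.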

### Search Analytics Table
**Purpose**: Track vector search usage and performance

```sql
CREATE TABLE search_analytics (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  query_text TEXT NOT NULL,
  results_count INTEGER,
  search_time_ms INTEGER,
  success BOOLEAN,
  error_message TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);
```

**Columns**:
- `id` - Unique search identifier
- `user_id` - User who performed search
- `query_text` - Search query text
- `results_count` - Number of results returned
- `search_time_ms` - Search execution time
- `success` - Whether search was successful
- `error_message` - Error message if failed
- `created_at` - Search timestamp

**Indexes**:
```sql
CREATE INDEX idx_search_analytics_user_id ON search_analytics(user_id);
CREATE INDEX idx_search_analytics_created_at ON search_analytics(created_at);
CREATE INDEX idx_search_analytics_success ON search_analytics(success);
```

---

## 📈 Analytics Tables

### Performance Metrics Table
**Purpose**: Track system performance metrics

```sql
CREATE TABLE performance_metrics (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  metric_name TEXT NOT NULL,
  metric_value DECIMAL(10,4),
  metric_unit TEXT,
  tags JSONB,
  timestamp TIMESTAMP DEFAULT NOW()
);
```

**Columns**:
- `id` - Unique metric identifier
- `metric_name` - Name of the performance metric
- `metric_value` - Metric value
- `metric_unit` - Unit of measurement
- `tags` - Additional tags (JSONB)
- `timestamp` - Metric timestamp

**Indexes**:
```sql
CREATE INDEX idx_performance_metrics_name ON performance_metrics(metric_name);
CREATE INDEX idx_performance_metrics_timestamp ON performance_metrics(timestamp);
```

### Usage Analytics Table
**Purpose**: Track user usage patterns

```sql
CREATE TABLE usage_analytics (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  action_type TEXT NOT NULL,
  action_details JSONB,
  ip_address INET,
  user_agent TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);
```

**Columns**:
- `id` - Unique analytics identifier
- `user_id` - User who performed action
- `action_type` - Type of action performed
- `action_details` - Action details (JSONB)
- `ip_address` - User IP address
- `user_agent` - User agent string
- `created_at` - Action timestamp

**Indexes**:
```sql
CREATE INDEX idx_usage_analytics_user_id ON usage_analytics(user_id);
CREATE INDEX idx_usage_analytics_action_type ON usage_analytics(action_type);
CREATE INDEX idx_usage_analytics_created_at ON usage_analytics(created_at);
```

---

## 🔗 Table Relationships

### Primary Relationships
```mermaid
erDiagram
    users ||--o{ documents : "owns"
    documents ||--o{ processing_jobs : "has"
    documents ||--o{ agentic_rag_sessions : "has"
    agentic_rag_sessions ||--o{ agent_executions : "contains"
    agentic_rag_sessions ||--o{ quality_metrics : "has"
    documents ||--o{ document_chunks : "contains"
    users ||--o{ search_analytics : "performs"
    users ||--o{ usage_analytics : "generates"
```

### Foreign Key Constraints
```sql
-- Note: the inline REFERENCES clauses in the CREATE TABLE statements above already
-- create the document_id and session_id foreign keys. Use the named constraints below
-- in place of the inline clauses, not in addition to them. Only documents.user_id has
-- no inline foreign key.

-- Documents table constraints
ALTER TABLE documents ADD CONSTRAINT fk_documents_user_id
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;

-- Processing jobs table constraints
ALTER TABLE processing_jobs ADD CONSTRAINT fk_processing_jobs_document_id
  FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE;

-- Agentic RAG sessions table constraints
ALTER TABLE agentic_rag_sessions ADD CONSTRAINT fk_agentic_rag_sessions_document_id
  FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE;

-- Agent executions table constraints
ALTER TABLE agent_executions ADD CONSTRAINT fk_agent_executions_session_id
  FOREIGN KEY (session_id) REFERENCES agentic_rag_sessions(id) ON DELETE CASCADE;

-- Quality metrics table constraints
ALTER TABLE quality_metrics ADD CONSTRAINT fk_quality_metrics_session_id
  FOREIGN KEY (session_id) REFERENCES agentic_rag_sessions(id) ON DELETE CASCADE;

-- Document chunks table constraints
ALTER TABLE document_chunks ADD CONSTRAINT fk_document_chunks_document_id
  FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE;
```

---

## 🔐 Row Level Security (RLS)

### Documents Table RLS
```sql
-- Enable RLS
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Note: auth.uid() returns a UUID parsed from the JWT `sub` claim. Since users.id is
-- described as a Firebase user ID, verify that this comparison works with your auth
-- setup; for third-party JWTs, comparing against (auth.jwt() ->> 'sub') may be needed.

-- Policy: Users can only access their own documents
CREATE POLICY "Users can view own documents" ON documents
  FOR SELECT USING (auth.uid()::text = user_id);

CREATE POLICY "Users can insert own documents" ON documents
  FOR INSERT WITH CHECK (auth.uid()::text = user_id);

CREATE POLICY "Users can update own documents" ON documents
  FOR UPDATE USING (auth.uid()::text = user_id);

CREATE POLICY "Users can delete own documents" ON documents
  FOR DELETE USING (auth.uid()::text = user_id);
```

### Processing Jobs Table RLS
```sql
-- Enable RLS
ALTER TABLE processing_jobs ENABLE ROW LEVEL SECURITY;

-- Policy: Users can only access their own jobs
CREATE POLICY "Users can view own jobs" ON processing_jobs
  FOR SELECT USING (auth.uid()::text = user_id);

CREATE POLICY "Users can insert own jobs" ON processing_jobs
  FOR INSERT WITH CHECK (auth.uid()::text = user_id);

CREATE POLICY "Users can update own jobs" ON processing_jobs
  FOR UPDATE USING (auth.uid()::text = user_id);
```
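
The effect of these policies can be modelled as a predicate over the command, the caller's ID, and the row's `user_id`. The sketch below does this for `processing_jobs`, which defines no `DELETE` policy; once RLS is enabled, a command without a matching policy is denied for roles subject to RLS. `Command`, `JOB_POLICY_COMMANDS`, and `canAccessJob` are hypothetical names, and the model ignores roles that bypass RLS.

```typescript
type Command = "SELECT" | "INSERT" | "UPDATE" | "DELETE";

// Commands for which processing_jobs defines a policy in the SQL above.
const JOB_POLICY_COMMANDS: readonly Command[] = ["SELECT", "INSERT", "UPDATE"];

// Returns true when the caller may run `command` against a row owned by `rowUserId`.
// `uid` is null for unauthenticated requests.
function canAccessJob(uid: string | null, command: Command, rowUserId: string): boolean {
  // With RLS enabled, a command with no policy is denied by default.
  if (JOB_POLICY_COMMANDS.indexOf(command) === -1) {
    return false;
  }
  // Every policy uses the same ownership test: caller ID equals user_id.
  return uid !== null && uid === rowUserId;
}
```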

---

## 📊 Data Types and Constraints

### Status Enums
```sql
-- Note: the CREATE TABLE statements above declare the status columns as TEXT.
-- These enum types take effect only once the columns are converted to them.

-- Document status enum
CREATE TYPE document_status AS ENUM (
  'uploaded',
  'processing',
  'completed',
  'failed',
  'cancelled'
);

-- Job status enum
CREATE TYPE job_status AS ENUM (
  'pending',
  'running',
  'completed',
  'failed',
  'cancelled'
);

-- Session status enum
CREATE TYPE session_status AS ENUM (
  'pending',
  'processing',
  'completed',
  'failed',
  'cancelled'
);
```

### Check Constraints
```sql
-- File size constraint (maximum 104857600 bytes = 100 MiB)
ALTER TABLE documents ADD CONSTRAINT check_file_size
  CHECK (file_size > 0 AND file_size <= 104857600);

-- Processing time constraint
ALTER TABLE agentic_rag_sessions ADD CONSTRAINT check_processing_time
  CHECK (processing_time_ms >= 0);

-- Validation score constraint
ALTER TABLE quality_metrics ADD CONSTRAINT check_validation_score
  CHECK (metric_value >= 0 AND metric_value <= 1);
```
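
Mirroring these checks in application code lets invalid values be rejected before they reach the database. The sketch below is one way to do that; the function names are hypothetical. As in SQL, a `CHECK` on a nullable column is satisfied when the value is `NULL`, so `null` passes for the two nullable columns.

```typescript
// Upper bound from check_file_size: 100 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 104857600;

// documents.file_size is NOT NULL, so only numbers are accepted.
function isValidFileSize(fileSize: number): boolean {
  return fileSize > 0 && fileSize <= MAX_FILE_SIZE_BYTES;
}

// agentic_rag_sessions.processing_time_ms is nullable; NULL satisfies the CHECK.
function isValidProcessingTime(processingTimeMs: number | null): boolean {
  return processingTimeMs === null || processingTimeMs >= 0;
}

// quality_metrics.metric_value is nullable; NULL satisfies the CHECK.
function isValidMetricValue(metricValue: number | null): boolean {
  return metricValue === null || (metricValue >= 0 && metricValue <= 1);
}
```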

---

## 🔄 Migration Scripts

### Initial Schema Migration
```sql
-- Migration: 001_create_initial_schema.sql
BEGIN;

-- Create users table
CREATE TABLE users (
  id TEXT PRIMARY KEY,
  name TEXT,
  email TEXT UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Create documents table
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  original_file_name TEXT NOT NULL,
  file_path TEXT NOT NULL,
  file_size INTEGER NOT NULL,
  status TEXT NOT NULL DEFAULT 'uploaded',
  extracted_text TEXT,
  generated_summary TEXT,
  summary_pdf_path TEXT,
  analysis_data JSONB,
  error_message TEXT,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Create indexes
CREATE INDEX idx_documents_user_id ON documents(user_id);
CREATE INDEX idx_documents_status ON documents(status);
CREATE INDEX idx_documents_created_at ON documents(created_at);

-- Enable RLS
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

COMMIT;
```

### Add Vector Support Migration
```sql
-- Migration: 002_add_vector_support.sql
BEGIN;

-- Enable vector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create document chunks table
CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding VECTOR(1536),
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Create vector indexes
CREATE INDEX idx_document_chunks_document_id ON document_chunks(document_id);
CREATE INDEX idx_document_chunks_embedding ON document_chunks USING ivfflat (embedding vector_cosine_ops);

COMMIT;
```

---

## 📈 Performance Optimization

### Query Optimization
```sql
-- Optimize document queries with composite indexes
CREATE INDEX idx_documents_user_status ON documents(user_id, status);
CREATE INDEX idx_documents_user_created ON documents(user_id, created_at DESC);

-- Optimize processing job queries
CREATE INDEX idx_processing_jobs_user_status ON processing_jobs(user_id, status);
CREATE INDEX idx_processing_jobs_priority_status ON processing_jobs(priority DESC, status);

-- Optimize analytics queries
CREATE INDEX idx_usage_analytics_user_action ON usage_analytics(user_id, action_type);
CREATE INDEX idx_performance_metrics_name_time ON performance_metrics(metric_name, timestamp DESC);
```

### Partitioning Strategy
```sql
-- Partition documents table by creation date
-- Note: this requires documents to be declared with PARTITION BY RANGE (created_at).
-- The CREATE TABLE statement shown earlier is not partitioned, and on a partitioned
-- table the primary key must include created_at.
CREATE TABLE documents_2024 PARTITION OF documents
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE documents_2025 PARTITION OF documents
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
```
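
Yearly partitions follow a fixed pattern: the lower bound is inclusive and the upper bound, 1 January of the next year, is exclusive. The sketch below derives the partition name, bounds, and DDL text for a given `created_at` value. `partitionFor` and `partitionDdl` are hypothetical helpers, and reading the timestamp's year in UTC is an assumption, since the column is a `TIMESTAMP` without time zone.

```typescript
interface YearPartition {
  name: string; // e.g. documents_2024
  from: string; // inclusive lower bound
  to: string; // exclusive upper bound
}

// Work out which yearly partition a created_at value falls into (UTC year assumed).
function partitionFor(createdAt: Date): YearPartition {
  const year = createdAt.getUTCFullYear();
  return {
    name: `documents_${year}`,
    from: `${year}-01-01`,
    to: `${year + 1}-01-01`,
  };
}

// Build the CREATE TABLE ... PARTITION OF statement for that partition.
function partitionDdl(partition: YearPartition): string {
  return (
    `CREATE TABLE ${partition.name} PARTITION OF documents ` +
    `FOR VALUES FROM ('${partition.from}') TO ('${partition.to}');`
  );
}
```

A helper like this could be run ahead of each new year so that the next partition exists before any rows need it.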

---

## 🔍 Monitoring and Maintenance

### Database Health Queries
```sql
-- Check column statistics
SELECT
  schemaname,
  tablename,
  attname,
  n_distinct,
  correlation
FROM pg_stats
WHERE tablename = 'documents';

-- Check index usage
SELECT
  schemaname,
  relname,
  indexrelname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'documents';

-- Check slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 12 and earlier the columns are total_time and mean_time)
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  rows
FROM pg_stat_statements
WHERE query LIKE '%documents%'
ORDER BY mean_exec_time DESC
LIMIT 10;
```

### Maintenance Procedures
```sql
-- Vacuum and analyze tables
VACUUM ANALYZE documents;
VACUUM ANALYZE processing_jobs;
VACUUM ANALYZE agentic_rag_sessions;

-- Update statistics only (VACUUM ANALYZE above already refreshes them)
ANALYZE documents;
ANALYZE processing_jobs;
ANALYZE agentic_rag_sessions;
```

---

This document covers the database structure, relationships, and optimization strategies for the CIM Document Processor.