## What was done

- ✅ Fixed Firebase Admin initialization to use default credentials for Firebase Functions
- ✅ Updated the frontend to use the correct Firebase Functions URL (it was using the Cloud Run URL)
- ✅ Added comprehensive debugging to the authentication middleware
- ✅ Added debugging to the file upload middleware and CORS handling
- ✅ Added debug buttons to the frontend for troubleshooting authentication
- ✅ Enhanced error handling and logging throughout the stack

## Current issues

- ❌ Document upload still returns 400 Bad Request even though authentication works
- ❌ GET requests succeed (200 OK), but POST upload requests fail

## Confirmed working

- ✅ Frontend authentication (valid JWT tokens)
- ✅ Backend authentication middleware (rejects invalid tokens)
- ✅ CORS configured correctly and allowing requests

## Root cause analysis

- Authentication is NOT the issue (tokens are valid, GET requests work)
- The problem appears to be in the file upload handling or multer configuration
- The request reaches the server but fails during upload processing
- Need to identify exactly where in the upload pipeline the failure occurs

## TODO next steps

1. 🔍 Check Firebase Functions logs after the next upload attempt to see the debugging output
2. 🔍 Verify whether the request reaches the upload middleware (look for 'Upload middleware called' logs)
3. 🔍 Check whether file validation is triggered (look for '🔍 File filter called' logs)
4. 🔍 Identify the specific error in the upload pipeline (multer, file processing, etc.)
5. 🔍 Test with a smaller file or a different file type to isolate the issue
6. 🔍 Check whether the issue is Firebase Functions file size limits or a timeout
7. 🔍 Verify multer configuration and file handling in the Firebase Functions environment

## Technical details

- Frontend: https://cim-summarizer.web.app
- Backend: https://us-central1-cim-summarizer.cloudfunctions.net/api
- Authentication: Firebase Auth with JWT tokens (working correctly)
- File upload: Multer with memory storage for immediate GCS upload
- Debug buttons available in the production frontend for troubleshooting
Google Cloud Storage Integration
This document describes the Google Cloud Storage (GCS) integration implementation for the CIM Document Processor backend.
Overview
The GCS integration replaces the previous local file storage system with a cloud-only approach using Google Cloud Storage. This provides:
- Scalability: No local storage limitations
- Reliability: Google's infrastructure with 99.9%+ availability
- Security: IAM-based access control and encryption
- Cost-effectiveness: Pay only for what you use
- Global access: Files accessible from anywhere
Configuration
Environment Variables
The following environment variables are required for GCS integration:
# Google Cloud Configuration
GCLOUD_PROJECT_ID=your-project-id
GCS_BUCKET_NAME=your-bucket-name
GOOGLE_APPLICATION_CREDENTIALS=./serviceAccountKey.json
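A minimal sketch of how these variables might be read and validated at startup; the `loadGcsConfig` helper name is illustrative, not part of the service's actual API, and the commented-out client construction assumes the `@google-cloud/storage` package.

```typescript
// Illustrative helper: read the required GCS settings (normally from
// process.env) and fail fast if any are missing. Names mirror the
// environment variables above.
type Env = Record<string, string | undefined>;

interface GcsConfig {
  projectId: string;
  bucketName: string;
  keyFilename: string;
}

function loadGcsConfig(env: Env): GcsConfig {
  const projectId = env.GCLOUD_PROJECT_ID;
  const bucketName = env.GCS_BUCKET_NAME;
  const keyFilename = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!projectId || !bucketName || !keyFilename) {
    throw new Error(
      'Missing GCS configuration: check GCLOUD_PROJECT_ID, GCS_BUCKET_NAME, GOOGLE_APPLICATION_CREDENTIALS',
    );
  }
  return { projectId, bucketName, keyFilename };
}

// The client would then be constructed roughly like this:
// import { Storage } from '@google-cloud/storage';
// const { projectId, bucketName, keyFilename } = loadGcsConfig(process.env);
// const bucket = new Storage({ projectId, keyFilename }).bucket(bucketName);
```

Failing fast on missing configuration surfaces misconfiguration at deploy time rather than on the first upload.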
Service Account Setup
- Create a service account in Google Cloud Console
- Grant the following roles:
  - Storage Object Admin (for full bucket access)
  - Storage Object Viewer (for read-only access if needed)
- Download the JSON key file as serviceAccountKey.json
- Place it in the backend/ directory
Bucket Configuration
- Create a GCS bucket in your Google Cloud project
- Configure bucket settings:
- Location: Choose a region close to your users
- Storage class: Standard (for frequently accessed files)
- Access control: Uniform bucket-level access (recommended)
- Public access: Prevent public access (files are private by default)
Implementation Details
File Storage Service
The FileStorageService class provides the following operations:
Core Operations
- Upload: storeFile(file, userId) - Upload files to GCS with metadata
- Download: getFile(filePath) - Download files from GCS
- Delete: deleteFile(filePath) - Delete files from GCS
- Exists: fileExists(filePath) - Check if a file exists
- Info: getFileInfo(filePath) - Get file metadata and info
Advanced Operations
- List: listFiles(prefix, maxResults) - List files with prefix filtering
- Copy: copyFile(sourcePath, destinationPath) - Copy files within GCS
- Move: moveFile(sourcePath, destinationPath) - Move files within GCS
- Signed URLs: generateSignedUrl(filePath, expirationMinutes) - Generate temporary access URLs
- Cleanup: cleanupOldFiles(prefix, daysOld) - Remove old files
- Stats: getStorageStats(prefix) - Get storage statistics
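The cleanup operation decides staleness from each file's upload timestamp. A minimal sketch of that age check (the `isOlderThanDays` helper name is illustrative):

```typescript
// Illustrative age check for a cleanup routine: a file qualifies for
// deletion when its upload time is more than `daysOld` days in the past.
function isOlderThanDays(uploadedAt: string, daysOld: number, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - new Date(uploadedAt).getTime();
  return ageMs > daysOld * 24 * 60 * 60 * 1000;
}
```

A cleanup pass would list files under a prefix, read each file's `uploadedAt` metadata, and delete those for which this check returns true.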
Error Handling & Retry Logic
- Exponential backoff: Retries with increasing delays (1s, 2s, 4s)
- Configurable retries: Default 3 attempts per operation
- Comprehensive logging: All operations logged with context
- Graceful failures: Operations return null/false on failure instead of throwing
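The retry behaviour described above (3 attempts with 1s/2s/4s delays, returning null instead of throwing) can be sketched as a generic wrapper; `withRetry` is an illustrative name, not the service's actual API.

```typescript
// Generic retry wrapper with exponential backoff: the delay doubles on
// each attempt (baseDelayMs, 2x, 4x, ...). Returns null after
// maxAttempts failures instead of throwing, matching the
// "graceful failures" policy described above.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      console.warn(`Attempt ${attempt + 1}/${maxAttempts} failed:`, err);
      if (attempt < maxAttempts - 1) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  return null;
}
```

A GCS call would be wrapped as `await withRetry(() => bucket.file(path).download())`, with callers checking for null rather than catching exceptions.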
File Organization
Files are organized in GCS using the following structure:
bucket-name/
├── uploads/
│ ├── user-id-1/
│ │ ├── timestamp-filename1.pdf
│ │ └── timestamp-filename2.pdf
│ └── user-id-2/
│ └── timestamp-filename3.pdf
└── processed/
├── user-id-1/
│ └── processed-files/
└── user-id-2/
└── processed-files/
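A sketch of how object keys following this layout might be constructed; the `buildUploadPath` helper name is illustrative.

```typescript
// Build an object key of the form uploads/<userId>/<timestamp>-<filename>,
// matching the directory layout shown above.
function buildUploadPath(userId: string, originalName: string, now: Date = new Date()): string {
  // Strip any path components from the client-supplied name for safety.
  const safeName = originalName.replace(/.*[/\\]/, '');
  return `uploads/${userId}/${now.getTime()}-${safeName}`;
}
```

Prefixing the filename with a timestamp keeps repeated uploads of the same document from overwriting each other, and the per-user prefix is what makes prefix-based listing and cleanup possible.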
File Metadata
Each uploaded file includes metadata:
{
"originalName": "document.pdf",
"userId": "user-123",
"uploadedAt": "2024-01-15T10:30:00Z",
"size": "1048576"
}
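The metadata shape above can be captured as a type plus a small builder; the names here are illustrative, not the service's actual API.

```typescript
// Metadata attached to each uploaded object, matching the JSON shape
// above. Note that `size` is a string: GCS custom metadata values are
// stored as strings.
interface UploadMetadata {
  originalName: string;
  userId: string;
  uploadedAt: string;
  size: string;
}

function buildMetadata(
  originalName: string,
  userId: string,
  sizeBytes: number,
  now: Date = new Date(),
): UploadMetadata {
  return {
    originalName,
    userId,
    uploadedAt: now.toISOString(),
    size: String(sizeBytes),
  };
}
```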
Usage Examples
Basic File Operations
import { fileStorageService } from '../services/fileStorageService';
// Upload a file
const uploadResult = await fileStorageService.storeFile(file, userId);
if (uploadResult.success) {
console.log('File uploaded:', uploadResult.fileInfo);
}
// Download a file
const fileBuffer = await fileStorageService.getFile(gcsPath);
if (fileBuffer) {
// Process the file buffer
}
// Delete a file
const deleted = await fileStorageService.deleteFile(gcsPath);
if (deleted) {
console.log('File deleted successfully');
}
Advanced Operations
// List user's files
const userFiles = await fileStorageService.listFiles(`uploads/${userId}/`);
// Generate signed URL for temporary access
const signedUrl = await fileStorageService.generateSignedUrl(gcsPath, 60);
// Copy file to processed directory
await fileStorageService.copyFile(
`uploads/${userId}/original.pdf`,
`processed/${userId}/processed.pdf`
);
// Get storage statistics
const stats = await fileStorageService.getStorageStats(`uploads/${userId}/`);
console.log(`User has ${stats.totalFiles} files, ${stats.totalSize} bytes total`);
Testing
Running Integration Tests
# Test GCS integration
npm run test:gcs
The test script performs the following operations:
- Connection Test: Verifies GCS bucket access
- Upload Test: Uploads a test file
- Existence Check: Verifies file exists
- Metadata Retrieval: Gets file information
- Download Test: Downloads and verifies content
- Signed URL: Generates temporary access URL
- Copy/Move: Tests file operations
- Listing: Lists files in directory
- Statistics: Gets storage stats
- Cleanup: Removes test files
Manual Testing
// Test connection
const connected = await fileStorageService.testConnection();
console.log('GCS connected:', connected);
// Test with a real file
const mockFile = {
originalname: 'test.pdf',
filename: 'test.pdf',
path: '/path/to/local/file.pdf',
size: 1024,
mimetype: 'application/pdf'
};
const result = await fileStorageService.storeFile(mockFile, 'test-user');
Security Considerations
Access Control
- Service Account: Uses least-privilege service account
- Bucket Permissions: Files are private by default
- Signed URLs: Temporary access for specific files
- User Isolation: Files organized by user ID
Data Protection
- Encryption: GCS provides encryption at rest and in transit
- Metadata: avoid storing sensitive information in object metadata
- Cleanup: Automatic cleanup of old files
- Audit Logging: All operations logged for audit
Performance Optimization
Upload Optimization
- Resumable Uploads: Large files can be resumed if interrupted
- Parallel Uploads: Multiple files can be uploaded simultaneously
- Chunked Uploads: Large files uploaded in chunks
Download Optimization
- Streaming: Files can be streamed instead of loaded entirely into memory
- Caching: Consider implementing client-side caching
- CDN: Use Cloud CDN for frequently accessed files
Monitoring and Logging
Log Levels
- INFO: Successful operations
- WARN: Retry attempts and non-critical issues
- ERROR: Failed operations and critical issues
Metrics to Monitor
- Upload Success Rate: Percentage of successful uploads
- Download Latency: Time to download files
- Storage Usage: Total storage and file count
- Error Rates: Failed operations by type
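A minimal in-process sketch of tracking the success-rate metric; a production deployment would more likely export these counters to Cloud Monitoring, and the `UploadMetrics` class is illustrative.

```typescript
// Minimal in-memory counters for upload outcomes. uploadSuccessRate
// returns a fraction in [0, 1], or null before any uploads are recorded.
class UploadMetrics {
  private successes = 0;
  private failures = 0;

  recordUpload(success: boolean): void {
    if (success) this.successes++;
    else this.failures++;
  }

  uploadSuccessRate(): number | null {
    const total = this.successes + this.failures;
    return total === 0 ? null : this.successes / total;
  }
}
```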
Troubleshooting
Common Issues
Authentication Errors
- Verify the service account key file exists
- Check service account permissions
- Ensure the project ID is correct
Bucket Access Errors
- Verify the bucket exists
- Check bucket permissions
- Ensure the bucket name is correct
Upload Failures
- Check file size limits
- Verify network connectivity
- Review error logs for specific issues
Download Failures
- Verify the file exists in GCS
- Check file permissions
- Review network connectivity
Debug Commands
# Test GCS connection
npm run test:gcs
# Check environment variables
echo $GCLOUD_PROJECT_ID
echo $GCS_BUCKET_NAME
# Verify service account
gcloud auth activate-service-account --key-file=serviceAccountKey.json
Migration from Local Storage
Migration Steps
- Backup: Ensure all local files are backed up
- Upload: Upload existing files to GCS
- Update Paths: Update database records with GCS paths
- Test: Verify all operations work with GCS
- Cleanup: Remove local files after verification
Migration Script
// Example migration script
async function migrateToGCS() {
const localFiles = await getLocalFiles();
for (const file of localFiles) {
const uploadResult = await fileStorageService.storeFile(file, file.userId);
if (uploadResult.success) {
await updateDatabaseRecord(file.id, uploadResult.fileInfo);
}
}
}
Cost Optimization
Storage Classes
- Standard: For frequently accessed files
- Nearline: For files accessed less than once per month
- Coldline: For files accessed less than once per quarter
- Archive: For long-term storage
Lifecycle Management
- Automatic Cleanup: Remove old files automatically
- Storage Class Transitions: Move files to cheaper storage classes
- Compression: Compress files before upload
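The storage class transitions above can be expressed as bucket lifecycle rules. A sketch of the rule configuration; the age thresholds are assumptions, not the project's actual policy.

```typescript
// Illustrative lifecycle rules: move objects to cheaper storage classes
// as they age, then delete them. The 30/90/365-day thresholds are
// example values, not this project's configured policy.
const lifecycle = {
  rule: [
    { action: { type: 'SetStorageClass', storageClass: 'NEARLINE' }, condition: { age: 30 } },
    { action: { type: 'SetStorageClass', storageClass: 'COLDLINE' }, condition: { age: 90 } },
    { action: { type: 'Delete' }, condition: { age: 365 } },
  ],
};
// These rules could be applied via the Cloud Console, `gcloud storage
// buckets update`, or the Node.js client's bucket metadata APIs.
```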
Future Enhancements
Planned Features
- Multi-region Support: Distribute files across regions
- Versioning: File version control
- Backup: Automated backup to secondary bucket
- Analytics: Detailed usage analytics
- Webhooks: Notifications for file events
Integration Opportunities
- Cloud Functions: Process files on upload
- Cloud Run: Serverless file processing
- BigQuery: Analytics on file metadata
- Cloud Logging: Centralized logging
- Cloud Monitoring: Performance monitoring