A Retrieval-Augmented Generation (RAG) system hosted at https://mysecondbrain.info that uses Weaviate as its vector database to efficiently retrieve answers from uploaded documents in a variety of formats.
MySecondBrain.info is inspired by the concept of a "second brain" - the idea of offloading information storage to an external system while maintaining easy retrieval of that information when needed. This project aims to create a fun and effective way to store, process, and retrieve information from various document formats using modern AI techniques.
Note: This project is a work in progress, with additional features and improvements planned for the roadmap. Future updates will include enhanced document processing, more sophisticated querying capabilities, and improved UI/UX.
**Document Ingestion & Embedding Generation**
- Supported Formats: PDF, DOCX, JSON, TXT, HTML
- Dual processing pipeline (Python script or native JS parsers)
- Automated fallback mechanism for resilient document processing
- Storage of embeddings in Weaviate
**Question-Answer API**
- Query documents and get relevant answers
- Support for both semantic search and structured queries (for JSON documents)
**JSON Data RAG Extension**
- Support for structured queries on JSON data
- Aggregation operations (max, min, sum, avg)
**Tech Stack**
- Backend: Node.js, Express.js
- Database: MySQL 8 (for relational data), Weaviate (for vector storage)
- Storage: AWS S3 (for document storage)
- AI: OpenAI API (for embeddings and completions)
- Document Processing:
  - Python script for primary processing
  - Native JS parsers as backup (PDF.js, Mammoth for DOCX)
  - Factory pattern for extensible document parsing
```
mysecondbrain.info-BE/
├── config/                 # Configuration files
├── cron/                   # Cron jobs
├── databases/              # Database connections and schemas
│   ├── mysql8/             # MySQL database
│   ├── redis/              # Redis connection for caching and rate limiting
│   └── weaviate/           # Weaviate vector database
├── doc-store/              # Temporary document storage
├── docs/                   # System documentation
├── middlewares/            # Express middlewares
├── routes/                 # API routes
│   └── controllers/        # Route controllers
├── scripts/                # Python scripts for document processing
├── services/               # Business logic
│   ├── document.parser.factory.js    # Factory for document parsers
│   ├── document.processor.service.js # Document processing orchestration
│   ├── json.parser.service.js        # JSON-specific parsing
│   ├── pdf.service.js                # PDF-specific parsing
│   ├── vectorization.service.js      # Vector creation and storage
│   └── weaviate.service.js           # Vector database interactions
└── utils/                  # Utility functions
```
API specifications are available via Swagger UI at: http://localhost:3500/api-docs
The JSON collection for the API documentation is accessible at: http://localhost:3500/api-docs.json
**Auth**

- `POST /api/v1/auth/signup` - Create a new user account
- `POST /api/v1/auth/login` - Log in to an existing account
- `GET /api/v1/auth/logout` - Log out of the current session
**Documents**

- `POST /api/v1/documents/upload` - Upload a new document
- `POST /api/v1/documents/update/:documentId` - Update an existing document
- `GET /api/v1/documents/list` - List all documents for the current user
- `GET /api/v1/documents/download/:documentId` - Get a download URL for a document
- `GET /api/v1/documents/status/:documentId` - Check the processing status of a document
- `DELETE /api/v1/documents/delete/:documentId` - Delete a document
**Chats**

- `POST /api/v1/chats` - Create a new chat
- `GET /api/v1/chats/:chatId` - Get a specific chat
- `GET /api/v1/chats` - List all chats for the current user
- `DELETE /api/v1/chats/:chatId` - Delete a chat
- `POST /api/v1/chats/query` - Query documents and get an answer
- `POST /api/v1/chats/structured-query` - Perform a structured query on JSON documents
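For illustration, a hypothetical call to the query endpoint from a Node.js client might look like the sketch below; the request and response body shapes are assumptions, so consult the Swagger UI above for the actual schema.

```js
// Hypothetical query against the chat API. The endpoint and port come from
// this README; the payload and response shapes are assumptions - check the
// Swagger UI at /api-docs for the real schema.
const accessToken = process.env.API_TOKEN; // JWT obtained via /api/v1/auth/login

const response = await fetch('http://localhost:3500/api/v1/chats/query', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${accessToken}`,
    },
    body: JSON.stringify({
        query: 'What were the key findings in the quarterly report?',
    }),
});
console.log(await response.json());
```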
MySecondBrain uses a dual-pipeline approach for document processing:
- **Primary Pipeline**: Python-based processor for PDF, TXT, and JSON files
  - Handles chunking, metadata extraction, and structure preservation
  - Outputs a standardized JSON format for vectorization
- **Secondary Pipeline**: Native JavaScript parsers
  - Factory pattern implementation with `DocumentParserFactory`
  - Support for PDF, DOCX, TXT, JSON, and HTML formats
  - Automatic fallback if the Python processor fails
The system can be configured to prefer either pipeline through the `documentProcessorService.setUseNativeParser()` method.
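For example, a deployment that wants to skip the Python processor entirely could opt into the native parsers at startup; a minimal sketch, assuming the import path from the project structure above:

```js
// Minimal sketch: prefer the native JS parsing pipeline at startup.
// The import path is assumed from the directory layout shown earlier.
import documentProcessorService from './services/document.processor.service.js';

// true = use native JS parsers; false = prefer the Python processor
documentProcessorService.setUseNativeParser(true);
```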
To add support for a new document type:
- Implement a parser method in `document.parser.factory.js`
- Update the `isSupported` method to include the new file type
- Add a case in the `parseDocument` switch statement
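As an illustration, here is a sketch of those three steps for a hypothetical Markdown (`.md`) type. Only `isSupported` and `parseDocument` are named in this README, so the other method names and the return shape are assumptions:

```js
// Hypothetical extension of document.parser.factory.js for Markdown files.
// Everything beyond isSupported/parseDocument is illustrative, not the
// project's actual code.
import fs from 'fs/promises';

class DocumentParserFactory {
    // Step 2: include the new file type
    isSupported(fileType) {
        return ['pdf', 'docx', 'txt', 'json', 'html', 'md'].includes(fileType);
    }

    // Step 3: route the new type in the switch statement
    async parseDocument(filePath, fileType) {
        switch (fileType) {
            // ...existing cases for pdf, docx, txt, json, html...
            case 'md':
                return this.parseMarkdown(filePath);
            default:
                throw new Error(`Unsupported file type: ${fileType}`);
        }
    }

    // Step 1: the new parser method
    async parseMarkdown(filePath) {
        const text = await fs.readFile(filePath, 'utf-8');
        return { text, metadata: { format: 'md' } };
    }
}

export default new DocumentParserFactory();
```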
The Weaviate service has been refactored to improve maintainability and testability. The original monolithic service has been split into several specialized modules with clear responsibilities:
- `weaviate-client.js` - HTTP client for the Weaviate API
- `weaviate-schema.js` - Schema and tenant management
- `weaviate-query.js` - GraphQL query builder
- `weaviate.service.js` - Business logic for vector operations
- Improved Separation of Concerns: Each module now has a clear, focused responsibility
- Better Testability: Modules can be tested independently
- More Robust Error Handling: Consistent error handling patterns across modules
- Easier Maintenance: Smaller, more focused files are easier to understand and modify
- No Library Dependency: Direct HTTP calls instead of relying on the problematic client library
This module provides a low-level HTTP client for communicating with the Weaviate API. It handles:
- HTTP requests to the Weaviate API
- Authentication and headers
- Basic error handling
- Response formatting
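A minimal sketch of such a client, assuming Node 18's built-in `fetch` and environment variables named `WEAVIATE_URL` and `WEAVIATE_API_KEY`; the helper name and error handling are illustrative, not the module's actual exports:

```js
// Minimal sketch of a direct HTTP client for the Weaviate REST API.
// Base URL, env var names, and the helper name are assumptions.
const BASE_URL = process.env.WEAVIATE_URL || 'http://localhost:8080';

async function weaviateRequest(method, path, body) {
    const res = await fetch(`${BASE_URL}${path}`, {
        method,
        headers: {
            'Content-Type': 'application/json',
            // Attach auth only when an API key is configured
            ...(process.env.WEAVIATE_API_KEY && {
                Authorization: `Bearer ${process.env.WEAVIATE_API_KEY}`,
            }),
        },
        body: body ? JSON.stringify(body) : undefined,
    });
    if (!res.ok) {
        throw new Error(`Weaviate ${method} ${path} failed with ${res.status}`);
    }
    return res.json();
}

// Example: fetch the current schema via Weaviate's /v1/schema endpoint
const schema = await weaviateRequest('GET', '/v1/schema');
```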
This module handles schema and tenant management:
- Class definitions for Document and JsonDocument
- Class initialization and updates
- Tenant creation and validation
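As a rough illustration, a multi-tenant class definition posted to Weaviate's `/v1/schema` endpoint could look like the following. The property list is an assumption based on the class names above, and it reuses the `weaviateRequest` helper sketched earlier:

```js
// Illustrative class definition for the Document class with multi-tenancy
// enabled; the property list is an assumption, not the project's schema.
const documentClass = {
    class: 'Document',
    vectorizer: 'none', // vectors are supplied by the OpenAI embedding step
    multiTenancyConfig: { enabled: true },
    properties: [
        { name: 'content', dataType: ['text'] },
        { name: 'documentId', dataType: ['text'] },
        { name: 'pageNumber', dataType: ['int'] },
    ],
};

// Create the class (reusing the weaviateRequest helper sketched above)
await weaviateRequest('POST', '/v1/schema', documentClass);
```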
This module builds GraphQL queries for different operations:
- Similarity search queries
- Structured queries for numeric operations
- Group by queries
- Delete mutations
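For instance, the raw GraphQL such a builder might emit for a similarity search and a numeric aggregation looks roughly like this; property names such as `content` and `value` are assumptions:

```js
// Illustrative GraphQL a query builder could emit; property names and the
// embedding variable are assumptions.
const embedding = [0.12, -0.08]; // truncated; real OpenAI embeddings are 1536-dim

// Similarity search: nearVector query against the Document class
const similarityQuery = `{
  Get {
    Document(nearVector: { vector: ${JSON.stringify(embedding)} }, limit: 5) {
      content
      _additional { distance }
    }
  }
}`;

// Structured numeric aggregation (max/min/sum/avg) on a JsonDocument field
const aggregationQuery = `{
  Aggregate {
    JsonDocument {
      value { maximum minimum sum mean }
    }
  }
}`;

// Both are sent through the same /v1/graphql endpoint
const { data } = await weaviateRequest('POST', '/v1/graphql', { query: similarityQuery });
```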
This is the main business logic layer that uses the other modules:
- Vector storage and retrieval
- JSON document processing
- Similarity search
- Structured queries
- Data deletion
- Node.js (lts/hydrogen -> v18.20.7)
- MySQL 8
- Weaviate (deployed locally as a Docker container)
- AWS S3 bucket
- OpenAI API key
- Python 3.8+ (for document processing script)
Create a .env.development file with the variables specified in the example:
```bash
cp .env.development.example .env.development
# Edit the file with your actual configuration
```

For a complete list of all environment variables and their configurations, see the Environment Variables documentation.
1. Clone the repository:

   ```bash
   git clone https://github.com/aquib-J/mysecondbrain.info-BE.git
   cd mysecondbrain.info-BE
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set up the database:

   ```bash
   # Run the SQL queries in db-schema.queries.sql
   ```

4. Set up the Python script:

   ```bash
   # Make the Python script executable
   chmod +x scripts/pdf_processor.py

   # Install Python dependencies
   pip install -r scripts/requirements.txt
   ```

5. Start the development server:

   ```bash
   npm run start:dev
   ```
For a faster and more consistent setup, use Docker:
- Ensure Docker and Docker Compose are installed on your system
- Set up environment variables:

  ```bash
  cp .env.development.example .env.development
  # Edit the file with your actual configuration
  ```

- Start the application stack:

  ```bash
  docker compose up -d
  ```

- To check the application logs:

  ```bash
  docker compose logs -f api
  ```

- To stop all services:

  ```bash
  docker compose down
  ```
For detailed instructions on deployment, scaling, and maintenance using Docker, see our comprehensive Deployment Guide.
For production deployment with HTTPS support, we've integrated Certbot with Nginx to provide automatic SSL certificate generation and renewal.
- A registered domain name pointing to your server IP address
- Ports 80 and 443 accessible from the internet
- Docker and Docker Compose installed
You can customize the SSL setup using these environment variables:
- `DOMAIN`: Your domain name (default: api.mysecondbrain.info)
- `EMAIL`: Email for Let's Encrypt registration (default: aquib.jansher@gmail.com)
- `STAGING`: Set to 1 for testing (to avoid rate limits) or 0 for production
1. Initialize SSL certificates:

   ```bash
   # Optional: Set custom domain and email
   export DOMAIN=your-domain.com
   export EMAIL=your-email@example.com

   # Run the initialization script
   ./nginx/init-letsencrypt.sh
   ```

   This script will:

   - Create temporary self-signed certificates
   - Start Nginx with these certificates
   - Use Certbot to request proper Let's Encrypt certificates
   - Reload Nginx to use the new certificates
   - Create a `.env.ssl` file for future use

2. Start the application with SSL:

   ```bash
   # Using the environment file created during initialization
   docker compose --env-file .env.ssl -f docker-compose.production.yml up -d
   ```

3. Certificate renewal:

   Certificates are automatically renewed by the Certbot container every 12 hours. For manual renewal:

   ```bash
   ./nginx/renew-certs.sh
   ```
The SSL implementation consists of:
- **Nginx Container**:
  - Terminates TLS connections
  - Serves as a reverse proxy to the API
  - Handles HTTP-to-HTTPS redirection
  - Exposes paths needed for certificate validation

- **Certbot Container**:
  - Obtains and renews SSL certificates
  - Uses the webroot plugin for domain validation
  - Stores certificates in a Docker volume shared with Nginx

- **Configuration Files**:
  - `nginx/templates/nginx.conf.template`: Template for Nginx configuration with environment variable substitution
  - `nginx/init-letsencrypt.sh`: Script for initial certificate setup
  - `nginx/renew-certs.sh`: Script for manual certificate renewal
Our SSL implementation includes:
- TLS 1.2/1.3 only (older protocols disabled)
- Strong cipher suite configuration
- HTTP Strict Transport Security (HSTS)
- OCSP stapling for certificate validation
- Modern security headers (X-Frame-Options, Content-Security-Policy, etc.)
- Automatic redirection from HTTP to HTTPS
If you're setting up or managing the system, start with:
- Review the environment variables documentation to ensure proper configuration
- Set up log archiving for production deployments to ensure data retention
| Document | Description |
|---|---|
| Environment Variables | Complete list of all environment variables used by the system, their defaults, and which ones are required. |
| Log Archiving | Detailed guide on the log archiving system, including AWS S3 configuration, scheduling, and monitoring. |
This is the backend API for MySecondBrain.info, a comprehensive knowledge management and note-taking application.
- User authentication with JWT
- Redis-based email queue system
- Amazon RDS MySQL database integration
- Weaviate vector database for semantic search
- Automated SSL certificate management with Let's Encrypt
- GitHub Actions-based CI/CD pipeline
The application includes a robust, Redis-based email queue system designed to handle email sending in a non-blocking, fault-tolerant manner.
- Non-blocking operation: Email sending happens asynchronously via a queue
- Automatic retries: Failed emails are automatically retried with exponential backoff
- Dead letter queue: Persistently failed emails are stored for inspection and manual retry
- Admin API endpoints: Monitor and manage the email queue via admin endpoints
- Email service health monitoring: Status checks and diagnostics available through scripts
The email queue system consists of three primary queues:
- Main Queue (`email:queue`): New email jobs are added here for processing
- Processing Queue (`email:processing:*`): Temporary storage for emails being processed
- Dead Letter Queue (`email:deadletter`): Storage for emails that failed after multiple retry attempts
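The mechanics behind these queues can be sketched with plain Redis list operations via `ioredis`; the queue names come from this document, while the job shape, retry constants, and the `sendEmail` stub are illustrative assumptions:

```js
// Illustrative sketch of the three-queue pattern with ioredis.
// Queue names come from this document; job shape, retry policy,
// and the sendEmail stub are assumptions.
import Redis from 'ioredis';

const redis = new Redis({
    host: process.env.REDIS_HOST || 'localhost',
    port: Number(process.env.REDIS_PORT) || 6379,
});

const MAIN_QUEUE = 'email:queue';
const DEAD_LETTER = 'email:deadletter';
const MAX_ATTEMPTS = 3;

async function sendEmail(job) {
    /* SMTP delivery (e.g. via nodemailer) omitted */
}

// Producer: push a new job onto the main queue
async function enqueueEmail(to, subject, html) {
    await redis.lpush(MAIN_QUEUE, JSON.stringify({ to, subject, html, attempts: 0 }));
}

// Worker: atomically move a job into a per-worker processing queue
async function processNext(workerId) {
    const processing = `email:processing:${workerId}`;
    const raw = await redis.rpoplpush(MAIN_QUEUE, processing);
    if (!raw) return; // queue is empty

    const job = JSON.parse(raw);
    try {
        await sendEmail(job);
        await redis.lrem(processing, 1, raw); // delivered: drop from processing
    } catch (err) {
        await redis.lrem(processing, 1, raw);
        job.attempts += 1;
        if (job.attempts >= MAX_ATTEMPTS) {
            // Give up: park the job in the dead letter queue for inspection
            await redis.lpush(DEAD_LETTER, JSON.stringify(job));
        } else {
            // Exponential backoff before re-queueing
            const delayMs = 2 ** job.attempts * 1000;
            setTimeout(() => redis.lpush(MAIN_QUEUE, JSON.stringify(job)), delayMs);
        }
    }
}
```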
The following endpoints are available for queue management (admin only):
- `GET /api/v1/admin/email/queue/stats` - Get queue statistics
- `GET /api/v1/admin/email/queue/dead` - List failed emails
- `POST /api/v1/admin/email/queue/retry/:jobId` - Retry a specific failed email
- `DELETE /api/v1/admin/email/queue/dead` - Clear the dead letter queue
Testing scripts are available in the scripts directory:
- `scripts/test-email-service.js` - Send a direct test email using the email service
- `scripts/test-email-queue.js` - Test the email queue system with sample emails
- `scripts/email-queue-status.js` - Comprehensive tool for queue monitoring and management
```bash
# Test direct email sending
node scripts/test-email-service.js production

# Test the email queue system with sample emails
node scripts/test-email-queue.js production

# Check email queue status
node scripts/email-queue-status.js production status

# Get detailed debugging info about the email queue
node scripts/email-queue-status.js production debug

# Retry all failed emails
node scripts/email-queue-status.js production retry-all

# Clear the dead letter queue
node scripts/email-queue-status.js production clear
```

The application supports multiple environments through `.env.*` files:

- `.env.development` - Development environment settings
- `.env.test` - Test environment settings
- `.env.production` - Production environment settings
Key environment variables for proper operation include:
```bash
# Core settings
PORT=3000
NODE_ENV=production        # or "development"
SERVICE_NAME=mysecondbrain-api
LOG_LEVEL=info
ENABLE_CRON_JOBS=true

# Database settings (Amazon RDS)
DB_HOST=your-rds-instance.region.rds.amazonaws.com
DB_PORT=3306
DB_NAME=mysecondbrain
DB_USER=dbuser
DB_PASSWORD=dbpassword

# Redis settings
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_PASSWORD=

# Email settings
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your-email@gmail.com
SMTP_PASS=your-app-password
EMAIL_FROM=no-reply@mysecondbrain.info
SEND_EMAILS=true

# Authentication
JWT_SECRET=your-jwt-secret
JWT_EXPIRY=3600
ADMIN_PASS=admin-password-hash

# Domain configuration
DOMAIN=api.mysecondbrain.info
```
The application is deployed using Docker Compose and a GitHub Actions workflow. The deployment process includes:
- Amazon RDS connection testing
- Redis connection verification
- SSL certificate management with Let's Encrypt
- Docker container deployment and orchestration
- Node.js Express application
- Redis for caching and queue management
- Weaviate vector database
- Nginx for reverse proxy and SSL termination
- Node.js 18+ (lts/hydrogen)
- Docker and Docker Compose
- MySQL (local development) or Amazon RDS connection
- Clone the repository
- Create the appropriate `.env.*` files
- Install dependencies: `npm install`
- Start the development server: `npm run dev`
- Run unit tests: `npm test`
- Test the email service: `node scripts/test-email-service.js development`
- Test the email queue: `node scripts/test-email-queue.js development`
When adding new features to the system, please update the main README.md file and add detailed documentation here as needed.
Documentation should be written in Markdown format and follow these guidelines:
- Use clear, concise language
- Include code examples when relevant
- Provide troubleshooting tips where appropriate
- Link to related documentation when applicable
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.