Merged
17 changes: 15 additions & 2 deletions .env.example
@@ -125,11 +125,15 @@ CELERY_RESULT_BACKEND=redis://:valkey@valkey:6379/1
# API_KEY_CACHE_TTL=300

## =============================================================================
## KUZU GRAPH DATABASE CONFIGURATION
## GRAPH DATABASE CONFIGURATION (KUZU AND NEO4J)
## =============================================================================

## Graph Backend Selection
## Options: kuzu (default), neo4j
# BACKEND_TYPE=kuzu

## Kuzu API URL
KUZU_API_URL=http://kuzu:8001
KUZU_API_URL=http://kuzu-api:8001

## User Graph Limits
USER_GRAPHS_DEFAULT_LIMIT=100
@@ -170,6 +174,15 @@ KUZU_MAX_DATABASES_PER_NODE=50
# INSTANCE_ID=unknown
# CLUSTER_TIER=standard

## Neo4j Backend Configuration
NEO4J_URI=bolt://neo4j-db:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neo4jpassword # Retrieved from AWS Secrets Manager in prod/staging
NEO4J_ENTERPRISE=false # Enable multi-database support (requires Enterprise license)
# NEO4J_MAX_CONNECTION_POOL_SIZE=50
# NEO4J_CONNECTION_ACQUISITION_TIMEOUT=60
# NEO4J_MAX_CONNECTION_LIFETIME=3600

## =============================================================================
## PERFORMANCE AND SCALING
## =============================================================================
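The `.env.example` hunks above add a `BACKEND_TYPE` selector and a block of `NEO4J_*` settings. The sketch below shows one way a service might read them. It assumes the variable names and commented defaults shown above. `GraphSettings`, `load_graph_settings` and `SUPPORTED_BACKENDS` are hypothetical names, not code from this PR.

The accepted backend names include both spellings that appear in this diff: `kuzu`/`neo4j` in `.env.example`, and `neo4j_community`/`neo4j_enterprise` in the `entrypoint.sh` comment. The three pool settings map onto keyword arguments accepted by the official `neo4j` Python driver.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional

# Backend names seen in this PR (.env.example and the entrypoint.sh comment).
SUPPORTED_BACKENDS = {"kuzu", "neo4j", "neo4j_community", "neo4j_enterprise"}


def _env_bool(value: Optional[str], default: bool = False) -> bool:
    """Treat "true", "1", "yes" and "on" (any case) as True."""
    if value is None:
        return default
    return value.strip().lower() in {"true", "1", "yes", "on"}


@dataclass(frozen=True)
class GraphSettings:
    backend_type: str
    kuzu_api_url: str
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_enterprise: bool
    max_connection_pool_size: int
    connection_acquisition_timeout: float  # seconds
    max_connection_lifetime: float  # seconds

    def neo4j_driver_kwargs(self) -> Dict[str, Any]:
        """Pool options named as in the neo4j driver's GraphDatabase.driver() config."""
        return {
            "max_connection_pool_size": self.max_connection_pool_size,
            "connection_acquisition_timeout": self.connection_acquisition_timeout,
            "max_connection_lifetime": self.max_connection_lifetime,
        }


def load_graph_settings(env: Mapping[str, str] = os.environ) -> GraphSettings:
    """Read graph settings, using the values from .env.example as fallbacks."""
    backend = env.get("BACKEND_TYPE", "kuzu").strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported BACKEND_TYPE: {backend!r}")
    return GraphSettings(
        backend_type=backend,
        kuzu_api_url=env.get("KUZU_API_URL", "http://kuzu-api:8001"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://neo4j-db:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        # No fallback password: supply it via the environment or a secret store.
        neo4j_password=env.get("NEO4J_PASSWORD", ""),
        neo4j_enterprise=_env_bool(env.get("NEO4J_ENTERPRISE")),
        max_connection_pool_size=int(env.get("NEO4J_MAX_CONNECTION_POOL_SIZE", "50")),
        connection_acquisition_timeout=float(
            env.get("NEO4J_CONNECTION_ACQUISITION_TIMEOUT", "60")
        ),
        max_connection_lifetime=float(env.get("NEO4J_MAX_CONNECTION_LIFETIME", "3600")),
    )
```

With the `neo4j` package installed, the result could be passed to the driver as follows:

`GraphDatabase.driver(s.neo4j_uri, auth=(s.neo4j_username, s.neo4j_password), **s.neo4j_driver_kwargs())`

The `.env.example` comment notes that in prod and staging the password comes from AWS Secrets Manager instead.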
2 changes: 1 addition & 1 deletion .github/workflows/deploy-kuzu-infra.yml
@@ -42,7 +42,7 @@ on:
required: false
type: string
default: ""
kuzu_api_rotation_code_key:
graph_api_rotation_code_key:
description: "S3 key for Kuzu API Rotation Lambda deployment package"
required: false
type: string
6 changes: 3 additions & 3 deletions .github/workflows/deploy-kuzu-volumes.yml
@@ -49,7 +49,7 @@ on:
default: ""

# Secrets Configuration
kuzu_api_secret_arn:
graph_api_secret_arn:
description: "ARN of the Kuzu API secret for Lambda authentication"
required: false
type: string
@@ -164,9 +164,9 @@ jobs:
fi

# Add Kuzu API secret if provided
if [ -n "${{ inputs.kuzu_api_secret_arn }}" ]; then
if [ -n "${{ inputs.graph_api_secret_arn }}" ]; then
STACK_PARAMS="$STACK_PARAMS \
ParameterKey=KuzuSecretArn,ParameterValue=${{ inputs.kuzu_api_secret_arn }}"
ParameterKey=KuzuSecretArn,ParameterValue=${{ inputs.graph_api_secret_arn }}"
fi

# Deploy or update the stack
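The `deploy-kuzu-volumes.yml` step adds `KuzuSecretArn` to `STACK_PARAMS` only when the renamed `graph_api_secret_arn` input is non-empty. Below is the same guard in Python, as a sketch. It assumes parameters end up in the `ParameterKey`/`ParameterValue` list shape that CloudFormation's `CreateStack`/`UpdateStack` APIs (and `boto3`) accept. `build_stack_parameters` and the `Environment` key used in the example are hypothetical.

```python
from typing import Dict, List, Optional


def build_stack_parameters(
    base: Dict[str, str], graph_api_secret_arn: Optional[str] = None
) -> List[Dict[str, str]]:
    """Build CloudFormation parameters, adding KuzuSecretArn only when an ARN is set."""
    params = [{"ParameterKey": key, "ParameterValue": value} for key, value in base.items()]
    # Mirrors `[ -n "${{ inputs.graph_api_secret_arn }}" ]`: skip None and "".
    if graph_api_secret_arn:
        params.append(
            {"ParameterKey": "KuzuSecretArn", "ParameterValue": graph_api_secret_arn}
        )
    return params
```

For example, `build_stack_parameters({"Environment": "prod"}, "")` returns a single-entry list. This PR renames only the workflow input; the stack parameter key is still `KuzuSecretArn`, so the CloudFormation template needs no matching change.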
2 changes: 1 addition & 1 deletion .github/workflows/prod.yml
@@ -418,7 +418,7 @@ jobs:
# Lambda Configuration
lambda_code_bucket: robosystems-${{ vars.ENVIRONMENT_PROD || 'prod' }}-deployment
# Secrets Configuration
kuzu_api_secret_arn: ${{ needs.deploy-kuzu-infra.outputs.secret_arn }}
graph_api_secret_arn: ${{ needs.deploy-kuzu-infra.outputs.secret_arn }}
secrets:
ACTIONS_TOKEN: ${{ secrets.ACTIONS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
2 changes: 1 addition & 1 deletion .github/workflows/staging.yml
@@ -419,7 +419,7 @@ jobs:
# Lambda Configuration
lambda_code_bucket: robosystems-${{ vars.ENVIRONMENT_STAGING || 'staging' }}-deployment
# Secrets Configuration
kuzu_api_secret_arn: ${{ needs.deploy-kuzu-infra.outputs.secret_arn }}
graph_api_secret_arn: ${{ needs.deploy-kuzu-infra.outputs.secret_arn }}
secrets:
ACTIONS_TOKEN: ${{ secrets.ACTIONS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
6 changes: 5 additions & 1 deletion .vscode/tasks.json
@@ -269,6 +269,8 @@
"pg",
"valkey",
"kuzu",
"neo4j",
"premium",
"localstack",
"api",
"worker",
@@ -288,7 +290,9 @@
"api",
"worker",
"beat",
"kuzu",
"kuzu-api",
"neo4j-db",
"neo4j-api",
"pg-iam",
"valkey",
"grafana",
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -89,7 +89,7 @@ robosystems/
├── adapters/ # External service integrations
├── config/ # Centralized configuration
├── security/ # Security implementations
├── kuzu_api/ # Graph database API service
├── graph_api/ # Graph database API service
└── scripts/ # Utility and admin scripts
```

70 changes: 41 additions & 29 deletions README.md
@@ -2,7 +2,7 @@

RoboSystems is an enterprise-grade financial knowledge graph platform that transforms complex financial data into actionable intelligence through graph-based analytics and AI-powered insights.

- **Graph-Based Financial Intelligence**: Leverages Kuzu graph database technology to model complex financial relationships, enabling deep analysis of relationships between accounting, financial reporting, portfolio management, and public XBRL data
- **Graph-Based Financial Intelligence**: Leverages graph database technology (Kuzu or Neo4j) to model complex financial relationships, enabling deep analysis of relationships between accounting, financial reporting, portfolio management, and public XBRL data
- **GraphRAG Architecture**: Knowledge graph-based retrieval-augmented generation for LLM-powered financial analysis over enterprise financial and operating data
- **Model Context Protocol (MCP)**: Standardized server and [client](https://www.npmjs.com/package/@robosystems/mcp) for LLM integration with natural language querying
- **Multi-Source Data Integration**: Seamlessly integrates QuickBooks accounting data, SEC XBRL filings (10-K, 10-Q), and custom financial datasets into a unified knowledge graph
@@ -13,7 +13,7 @@ RoboSystems is an enterprise-grade financial knowledge graph platform that trans

RoboSystems bridges the gap between raw financial data and actionable business intelligence by creating interconnected knowledge graphs that reveal hidden relationships, patterns, and insights that traditional databases miss. It's the backbone for next-generation financial applications that need to understand not just numbers, but the relationships and context behind them.

- **Multi-Tenant Graph Databases**: Create isolated Kuzu database instances with cluster-based scaling
- **Multi-Tenant Graph Databases**: Create isolated graph database instances (Kuzu or Neo4j) with cluster-based scaling
- **AI Agent Interface**: Natural language financial analysis through Claude powered agents via Model Context Protocol (MCP)
- **Entity Graph Creation**: Curated enterprise financial data schemas for defined use cases with RoboLedger, RoboInvestor and more
- **Generic Graph Creation**: Custom schema definitions with custom node/relationship types
@@ -41,7 +41,7 @@ just start

This initializes the `.env` file and starts the complete RoboSystems stack with:

- Kuzu graph database
- Graph database (Kuzu by default, Neo4j optional)
- PostgreSQL with automatic migrations
- Valkey message broker
- All development services
@@ -111,32 +111,42 @@ just logs-follow worker # CloudWatch log search
- **MCP Integration**: Model Context Protocol for AI-powered financial analytics
- **Celery Workers** with priority queues for asynchronous processing

### Kuzu Graph Database System
### Graph Database System

**Kuzu** is a high-performance embedded graph database that powers RoboSystems' financial knowledge graph platform. This system provides multi-tenant graph databases with enterprise-grade scaling and reliability.
RoboSystems supports **pluggable graph database backends** to provide flexibility and choice for different deployment scenarios:

- **Cluster-Based Infrastructure**: Tiered instances (Standard/Enterprise/Premium) for different workload requirements
- **Multi-Tenant Isolation**: Each entity gets a dedicated database (`kg12345abc`) with complete data isolation
- **Shared Repositories**: Common databases for SEC filings, industry benchmarks, and economic indicators
- **API-First Design**: All database access through REST APIs with no direct database connections
- **Schema-Driven Operations**: All graph operations derive from curated schemas (RoboLedger, RoboInvestor, and more)
#### Supported Backends

- **Kuzu** (Default): High-performance embedded graph database, ideal for Standard tier deployments
- **Neo4j Community**: Client-server architecture for Professional/Enterprise tiers with advanced features
- **Neo4j Enterprise**: Full enterprise features including multi-database support for Premium tier

#### Kuzu API System (`/robosystems/kuzu_api/`)
#### Graph API System (`/robosystems/graph_api/`)

The **Kuzu API** is a FastAPI microservice that runs alongside Kuzu databases on instances, providing:
The **Graph API** is a FastAPI microservice that provides a unified interface regardless of backend:

- **HTTP REST Interface**: High-performance API for all graph operations (port 8001)
- **Multi-Database Management**: Handles up to 10 databases per instance (Standard tier)
- **Connection Pooling**: Efficient resource management with max 3 connections per database
- **Backend Abstraction**: Consistent API whether using Kuzu or Neo4j
- **HTTP REST Interface**: High-performance API for all graph operations (port 8001 for Kuzu, 8002 for Neo4j)
- **Multi-Database Management**: Handles multiple databases per instance (backend-dependent)
- **Connection Pooling**: Efficient resource management with backend-optimized pooling
- **Async Ingestion**: Queue-based data loading with S3 integration
- **Streaming Support**: NDJSON streaming for large query results
- **Admission Control**: CPU/memory-based backpressure to prevent overload
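The Graph API section describes one service that can front either backend. Below is a minimal sketch of how that backend selection could be structured. It assumes a `typing.Protocol` interface and a registry that maps names to factories. `GraphBackend`, `KuzuBackend`, `Neo4jBackend` and `create_backend` are hypothetical names. Query execution is deliberately left unimplemented, because the real backend code under `/robosystems/graph_api/backends/` is not part of this diff.

```python
from typing import Any, Callable, Dict, List, Optional, Protocol

_DEFAULT_URI = "bolt://neo4j-db:7687"


class GraphBackend(Protocol):
    """Hypothetical interface that Graph API routes would depend on."""

    name: str

    def execute_query(
        self, database: str, cypher: str, params: Optional[Dict[str, Any]] = None
    ) -> List[Dict[str, Any]]: ...


class KuzuBackend:
    """Embedded backend: one database directory per graph under base_path."""

    name = "kuzu"

    def __init__(self, base_path: str) -> None:
        self.base_path = base_path

    def execute_query(self, database, cypher, params=None):
        raise NotImplementedError("sketch only: would open the Kuzu database on disk")


class Neo4jBackend:
    """Client-server backend reached over Bolt."""

    name = "neo4j"

    def __init__(self, uri: str, enterprise: bool) -> None:
        self.uri = uri
        # Multiple user databases per server require Neo4j Enterprise.
        self.enterprise = enterprise

    def execute_query(self, database, cypher, params=None):
        raise NotImplementedError("sketch only: would run the query through a Bolt driver")


BackendFactory = Callable[[Dict[str, str]], GraphBackend]

_FACTORIES: Dict[str, BackendFactory] = {
    "kuzu": lambda cfg: KuzuBackend(cfg.get("base_path", "/app/data/kuzu-dbs")),
    # Plain "neo4j" defers to an enterprise flag, like NEO4J_ENTERPRISE.
    "neo4j": lambda cfg: Neo4jBackend(
        cfg.get("uri", _DEFAULT_URI),
        enterprise=cfg.get("enterprise", "false").strip().lower() == "true",
    ),
    "neo4j_community": lambda cfg: Neo4jBackend(cfg.get("uri", _DEFAULT_URI), enterprise=False),
    "neo4j_enterprise": lambda cfg: Neo4jBackend(cfg.get("uri", _DEFAULT_URI), enterprise=True),
}


def create_backend(backend_type: str, cfg: Dict[str, str]) -> GraphBackend:
    """Instantiate the backend named by BACKEND_TYPE; unknown names fail loudly."""
    try:
        factory = _FACTORIES[backend_type.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown backend type: {backend_type!r}") from None
    return factory(cfg)
```

A registry keeps route handlers unaware of which backend is active. Adding a backend then means adding one entry instead of editing call sites.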

#### Client-Factory System (`/robosystems/kuzu_api/client/`)
#### Infrastructure Design

- **Cluster-Based Infrastructure**: Tiered instances (Standard/Enterprise/Premium) for different workload requirements
- **Multi-Tenant Isolation**: Each entity gets a dedicated database (`kg12345abc`) with complete data isolation
- **Shared Repositories**: Common databases for SEC filings, industry benchmarks, and economic indicators
- **API-First Design**: All database access through REST APIs with no direct database connections
- **Schema-Driven Operations**: All graph operations derive from curated schemas (RoboLedger, RoboInvestor, and more)

#### Client-Factory System (`/robosystems/graph_api/client/`)

The client-factory layer provides intelligent routing between application code and Kuzu infrastructure:
The client-factory layer provides intelligent routing between application code and graph database infrastructure:

- **Automatic Discovery**: Finds database instances via DynamoDB registry
- **Backend-Agnostic**: Works seamlessly with both Kuzu and Neo4j backends
- **Automatic Discovery**: Finds database instances via DynamoDB registry (Kuzu) or direct connection (Neo4j)
- **Redis Caching**: Caches instance locations to reduce lookups
- **Circuit Breakers**: Prevents cascading failures with automatic recovery
- **Connection Reuse**: HTTP/2 connection pooling for efficiency
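The client-factory list mentions circuit breakers with automatic recovery. Below is a minimal, single-threaded sketch of the classic closed/open/half-open pattern, assuming one breaker per graph instance. `CircuitBreaker`, `CircuitOpenError` and the default threshold and timeout values are illustrative, not the client-factory's actual implementation. The clock is injectable so that the timeout can be tested without sleeping.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Closed -> open after repeated failures; half-open after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # the next call is let through as a trial
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

After `failure_threshold` consecutive failures the breaker fails fast, which stops requests from piling up on an unhealthy instance. Once `reset_timeout` seconds have passed, the next call acts as a trial: success closes the circuit, and failure reopens it for another full timeout.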
@@ -192,8 +202,8 @@ The client-factory layer provides intelligent routing between application code a

### Data Layer

- **Kuzu Graph Database**: Financial knowledge graph with cluster-based scaling
- **DynamoDB**: Kuzu database allocation registry, instance and volume management
- **Graph Database**: Pluggable backend (Kuzu or Neo4j) for financial knowledge graphs with cluster-based scaling
- **DynamoDB**: Database allocation registry, instance and volume management
- **PostgreSQL**: Primary relational database for identity and access management
- **Valkey**: Message broker and caching (separate DBs for queues, cache, progress tracking)
- **AWS S3**: Document storage and database synchronization
@@ -203,8 +213,8 @@ The client-factory layer provides intelligent routing between application code a
- **VPC**: AWS VPC with NAT Gateway, CloudTrail, and VPC Flow Logs
- **API**: ECS Fargate ARM64/Graviton with auto-scaling and WAF
- **Workers**: ECS Fargate ARM64/Graviton with auto-scaling
- **Kuzu Writers**: EC2 Graviton instances with DynamoDB registry and management lambdas
- **Kuzu Readers**: EC2 Graviton instances with load balancing for shared repositories
- **Graph Database Writers**: EC2 Graviton instances (Kuzu) or ECS containers (Neo4j) with DynamoDB registry and management lambdas
- **Graph Database Readers**: EC2 Graviton instances or ECS containers with load balancing for shared repositories
- **Database & Cache**: RDS PostgreSQL + ElastiCache Valkey instances
- **Observability**: Amazon Managed Prometheus + Grafana with AWS SSO
- **Self-Hosted CI/CD**: GitHub Actions runner on dedicated infrastructure
@@ -229,7 +239,7 @@ The client-factory layer provides intelligent routing between application code a
- **Multi-Agent Architecture**: Intelligent routing to specialized agents based on query context
- **Dynamic Agent Selection**: Automatic selection of the most appropriate agent for each task
- **Parallel Query Processing**: Batch processing of multiple queries simultaneously
- **Context-Aware Responses**: GraphRAG-enabled agents with native kuzu graph database integration
- **Context-Aware Responses**: GraphRAG-enabled agents with native graph database integration
- **Extensible Framework**: Support for custom agents with specific domain expertise

### Credit System
@@ -298,12 +308,13 @@ All infrastructure is managed through CloudFormation templates in `/cloudformati
- **`beat.yaml`**: Celery beat scheduler for periodic tasks and cron jobs
- **`worker-monitor.yaml`**: Lambda function for monitoring worker health and queue depths

#### Kuzu Graph Database
#### Graph Database Infrastructure

- **`kuzu-infra.yaml`**: Base infrastructure for Kuzu clusters (security groups, roles, registries)
- **`kuzu-infra.yaml`**: Base infrastructure for graph database clusters (security groups, roles, registries)
- **`kuzu-volumes.yaml`**: EBS volume management and snapshot automation
- **`kuzu-writers.yaml`**: Auto-scaling EC2 writer clusters with tiered instance types
- **`kuzu-writers.yaml`**: Auto-scaling EC2 writer clusters with tiered instance types (Kuzu backend)
- **`kuzu-shared-replicas.yaml`**: ECS Fargate read replicas for shared repositories (SEC)
- **`neo4j-*.yaml`**: Neo4j-specific infrastructure templates (when using Neo4j backend)

#### Observability

@@ -385,10 +396,11 @@ Each major system component has detailed documentation:
- **`/robosystems/models/api/README.md`**: Centralized Pydantic models for API validation
- **`/robosystems/config/README.md`**: Configuration management and environment handling

### Kuzu Graph Database System
### Graph Database System

- **`/robosystems/kuzu_api/README.md`**: Complete Kuzu API documentation
- **`/robosystems/kuzu_api/client/README.md`**: Client-factory system for intelligent routing
- **`/robosystems/graph_api/README.md`**: Complete Graph API documentation (supports Kuzu and Neo4j backends)
- **`/robosystems/graph_api/backends/README.md`**: Backend abstraction layer and implementation details
- **`/robosystems/graph_api/client/README.md`**: Client-factory system for intelligent routing

### Middleware Components

15 changes: 13 additions & 2 deletions bin/entrypoint.sh
@@ -120,7 +120,7 @@ case $DOCKER_PROFILE in
"kuzu-writer")
echo "Starting Kuzu Writer API..."
# max-databases will be loaded from tier configuration based on CLUSTER_TIER
exec uv run python -m robosystems.kuzu_api \
exec uv run python -m robosystems.graph_api \
--node-type writer \
--repository-type entity \
--port ${KUZU_PORT:-8001} \
@@ -138,13 +138,24 @@ case $DOCKER_PROFILE in
READONLY_FLAG=""
fi
# max-databases will be loaded from tier configuration based on CLUSTER_TIER
exec uv run python -m robosystems.kuzu_api \
exec uv run python -m robosystems.graph_api \
--node-type ${KUZU_NODE_TYPE} \
--repository-type shared \
--port ${KUZU_PORT:-8002} \
--base-path ${KUZU_DATABASE_PATH:-/app/data/kuzu-dbs} \
${READONLY_FLAG}
;;
"neo4j-writer")
echo "Starting Neo4j Graph API..."
# Graph API with Neo4j backend
# Backend type determined by BACKEND_TYPE env var (neo4j_community or neo4j_enterprise)
# Note: base-path is for metadata only (actual data stored in Neo4j database via Bolt)
exec uv run python -m robosystems.graph_api \
--node-type writer \
--repository-type entity \
--port ${GRAPH_API_PORT:-8002} \
--base-path /app/data/neo4j-metadata
;;
*)
echo "Unknown profile: $DOCKER_PROFILE"
exit 1
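The new `neo4j-writer` profile launches the same `python -m robosystems.graph_api` entry point with the same flags as the Kuzu profiles. It uses `${GRAPH_API_PORT:-8002}` to choose the port. Below is a hedged sketch of that command-line surface in Python:

- `parse_graph_api_args` and `neo4j_writer_argv` are hypothetical helpers. They cover only the four flags visible in this diff; the module's real parser is not shown.
- `shell_default` mirrors the `${NAME:-default}` expansion, which falls back to the default when the variable is unset or empty.

```python
import argparse
from typing import List, Mapping, Optional


def shell_default(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic ${NAME:-default}: use default when NAME is unset or empty."""
    return env.get(name) or default


def neo4j_writer_argv(env: Mapping[str, str]) -> List[str]:
    """Arguments the neo4j-writer profile passes to the Graph API module."""
    return [
        "--node-type", "writer",
        "--repository-type", "entity",
        "--port", shell_default(env, "GRAPH_API_PORT", "8002"),
        # Metadata only; graph data lives in Neo4j and is reached over Bolt.
        "--base-path", "/app/data/neo4j-metadata",
    ]


def parse_graph_api_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser covering only the flags visible in entrypoint.sh."""
    parser = argparse.ArgumentParser(prog="robosystems.graph_api")
    parser.add_argument("--node-type", required=True)        # e.g. writer
    parser.add_argument("--repository-type", required=True)  # e.g. entity, shared
    parser.add_argument("--port", type=int, required=True)
    parser.add_argument("--base-path", required=True)
    return parser.parse_args(argv)
```

For example, `parse_graph_api_args(neo4j_writer_argv({"GRAPH_API_PORT": "9100"})).port` evaluates to `9100`. An empty `GRAPH_API_PORT` still yields `8002`, matching the shell's `:-` behavior.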
2 changes: 1 addition & 1 deletion bin/tools/package-scripts.sh
@@ -68,7 +68,7 @@ psycopg2-binary==2.9.9"
package_lambda "valkey-rotation" "valkey_rotation.py" "boto3==1.34.14
redis==5.0.1"

package_lambda "kuzu-api-rotation" "kuzu_api_rotation.py" "boto3==1.34.14"
package_lambda "kuzu-api-rotation" "graph_api_rotation.py" "boto3==1.34.14"

# Package snapshot Lambda functions (temporary - will be migrated to volume manager)
package_lambda "kuzu-snapshot-creator" "kuzu_snapshot_creator.py" "boto3==1.34.14"