This guide explains how to configure and use the parallel processing capabilities of gql-ingest.
gql-ingest supports two levels of parallelization:
- Row-level parallelization - Process multiple CSV rows concurrently within a single entity
- Entity-level parallelization - Process multiple entities concurrently with dependency management
Add a config.yaml file to your configuration directory (same directory as mappings/, data/, graphql/):
# Global parallel processing settings
parallelProcessing:
  concurrency: 10 # Max concurrent requests per entity (>1 enables parallel row processing)
  entityConcurrency: 3 # Max concurrent entities processed simultaneously
  preserveRowOrder: false # Allow rows to complete out of order

# Per-entity overrides
entityConfig:
  users:
    concurrency: 1 # Sequential processing
    preserveRowOrder: true # Maintain CSV order
  products:
    concurrency: 20 # High concurrency for bulk data

# Entity dependencies (creates execution waves)
entityDependencies:
  products: ["users"] # Products depend on users
  orders: ["products", "users"] # Orders depend on both

| Option | Type | Default | Description |
|---|---|---|---|
| concurrency | number | 1 | Maximum concurrent requests per entity (>1 enables parallel row processing) |
| entityConcurrency | number | 1 | Maximum concurrent entities processed simultaneously |
| preserveRowOrder | boolean | true | Maintain CSV row order (forces concurrency=1) |
Key Insight: entityConcurrency replaces the earlier enableEntityParallelization and preserveEntityOrder boolean settings. A higher value means more entities are processed simultaneously.
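To make the interaction between these options concrete, here is a minimal Python sketch of how the effective row-level settings for one entity could be derived. It is an illustrative model, not gql-ingest's actual code: the `effective_settings` function and the key-by-key merge of per-entity overrides onto the global block are assumptions. The defaults and the rule that preserveRowOrder forces concurrency to 1 come from the options table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowSettings:
    concurrency: int
    preserve_row_order: bool


def effective_settings(global_cfg: dict, entity_cfg: Optional[dict] = None) -> RowSettings:
    """Hypothetical helper: resolve the row-level settings for one entity."""
    # Assumption: a per-entity override replaces the global value key by key.
    merged = {**global_cfg, **(entity_cfg or {})}
    preserve = merged.get("preserveRowOrder", True)  # documented default: true
    concurrency = merged.get("concurrency", 1)       # documented default: 1
    if preserve:
        # Per the options table, preserving row order forces concurrency=1.
        concurrency = 1
    return RowSettings(concurrency=max(1, concurrency), preserve_row_order=preserve)


global_cfg = {"concurrency": 10, "entityConcurrency": 3, "preserveRowOrder": False}
users = effective_settings(global_cfg, {"concurrency": 1, "preserveRowOrder": True})
products = effective_settings(global_cfg, {"concurrency": 20})
orders = effective_settings(global_cfg)  # no override: falls back to global values
```

Under these assumptions, an entity that sets preserveRowOrder: true always runs sequentially, whatever concurrency value it inherits or declares.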
Override global settings for specific entities:

entityConfig:
  entityName:
    concurrency: 5 # Override global concurrency
    preserveRowOrder: true # Override global row order setting

Define which entities must complete before others can start:

entityDependencies:
  products: ["users", "categories"] # Products waits for users AND categories
  orders: ["products"] # Orders waits for products

Entities are organized into execution waves based on dependencies:
# Configuration
entityDependencies:
  products: ["users", "categories"]
  orders: ["products", "users"]
  reviews: ["products", "users"]
# Execution waves:
# Wave 1: users, categories (no dependencies)
# Wave 2: products (depends on Wave 1)
# Wave 3: orders, reviews (depend on products from Wave 2)

Within each wave, entityConcurrency controls how many entities can be processed simultaneously:
- entityConcurrency: 1 - Entities in a wave are processed sequentially (one at a time)
- entityConcurrency: 3 - Up to 3 entities in a wave are processed concurrently
- entityConcurrency: 10 - Up to 10 entities in a wave are processed concurrently
Important: Wave boundaries are always respected. Wave 2 never starts until Wave 1 is complete.
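The wave grouping described above can be reproduced with a simple layered topological sort. The Python sketch below is an illustration of the idea, not gql-ingest's implementation; the `execution_waves` function is hypothetical, and sorting names within a wave is only there to make the output deterministic.

```python
def execution_waves(entities, dependencies):
    """Group entities into waves; each wave depends only on earlier waves."""
    remaining = set(entities)
    done = set()
    waves = []
    while remaining:
        # An entity is ready once every one of its dependencies has completed.
        ready = sorted(
            e for e in remaining
            if all(dep in done for dep in dependencies.get(e, []))
        )
        if not ready:
            # Nothing can start: a cycle, or a dependency that is not a known entity.
            raise ValueError(f"Unresolvable dependencies among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return waves


waves = execution_waves(
    ["users", "categories", "products", "orders", "reviews"],
    {
        "products": ["users", "categories"],
        "orders": ["products", "users"],
        "reviews": ["products", "users"],
    },
)
# waves == [["categories", "users"], ["products"], ["orders", "reviews"]]
```

In this model, entityConcurrency would then cap how many entities from a single inner list run at the same time, and the next list would not start until the current one has finished.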
preserveRowOrder controls processing within a single entity:

users:
  preserveRowOrder: true # Process user rows in CSV order
  # Automatically sets concurrency: 1
products:
  preserveRowOrder: false # Rows can complete out of order
  concurrency: 10 # Can process 10 rows concurrently

entityConcurrency controls how many entities can be processed simultaneously within dependency waves:
entityConcurrency: 1
# Wave 1: [users, categories] - processed one at a time
# Wave 2: [products] - single entity
# Wave 3: [orders, reviews] - processed one at a time

entityConcurrency: 3
# Wave 1: [users, categories] - both processed concurrently (2 entities)
# Wave 2: [products] - single entity
# Wave 3: [orders, reviews] - both processed concurrently (2 entities)

| Data Type | Recommended Concurrency | Reasoning |
|---|---|---|
| User accounts | 1-5 | Sensitive data, avoid rate limits |
| Product catalog | 10-50 | Bulk data, higher throughput |
| Transactional data | 5-15 | Moderate concurrency |
Based on typical GraphQL response times (~100ms):
| Concurrency | Throughput | Use Case |
|---|---|---|
| 1 (sequential) | ~10 req/sec | Sensitive data, debugging |
| 10 | ~100 req/sec | Standard processing |
| 20 | ~200 req/sec | Bulk data import |
| 50+ | ~500+ req/sec | High-volume scenarios |
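The throughput figures in this table follow from dividing the number of in-flight requests by the per-request latency. The short Python sketch below reproduces that arithmetic; the function names are hypothetical, and the result is an idealized upper bound that ignores server-side rate limits, retries, and network variance.

```python
def estimated_throughput(concurrency: int, latency_ms: float = 100.0) -> float:
    """Idealized requests per second with `concurrency` requests always in flight."""
    return concurrency * 1000.0 / latency_ms


def estimated_duration_seconds(rows: int, concurrency: int, latency_ms: float = 100.0) -> float:
    """Idealized time to send `rows` requests at the given concurrency."""
    return rows / estimated_throughput(concurrency, latency_ms)


sequential = estimated_throughput(1)              # 10.0 req/sec
bulk = estimated_throughput(20)                   # 200.0 req/sec
ten_k_rows = estimated_duration_seconds(10_000, 20)  # 50.0 seconds
```

If the measured rate falls well below the estimate, the server is probably the bottleneck, and raising concurrency further is unlikely to help.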
Higher concurrency uses more memory:
- Each concurrent request holds CSV row data
- Large CSV files with high concurrency may require more RAM
- Monitor memory usage and adjust concurrency accordingly
- Individual row failures don't stop other concurrent requests
- Failed rows are logged with context
- Success/failure metrics tracked per entity
- If an entity in Wave 1 fails, dependent entities in Wave 2+ are still attempted
- Use metrics to identify systematic failures
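The error-handling behavior listed above (one failed row does not stop the others, and successes and failures are counted per entity) can be modelled with a bounded worker pool. This Python sketch uses asyncio and is only an illustration under stated assumptions: `process_rows`, `EntityMetrics`, and `fake_send` are hypothetical names, and `fake_send` stands in for the real GraphQL request.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class EntityMetrics:
    successes: int = 0
    failures: int = 0
    errors: list = field(default_factory=list)  # (row index, error message) pairs


async def process_rows(rows, send_row, concurrency):
    """Send every row with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    metrics = EntityMetrics()

    async def worker(index, row):
        async with semaphore:
            try:
                await send_row(row)
                metrics.successes += 1
            except Exception as exc:
                # Record the failure with context and let the other rows continue.
                metrics.failures += 1
                metrics.errors.append((index, str(exc)))

    await asyncio.gather(*(worker(i, r) for i, r in enumerate(rows)))
    return metrics


async def fake_send(row):
    # Stand-in for a GraphQL mutation; the row with id 3 is rejected.
    await asyncio.sleep(0)
    if row["id"] == 3:
        raise RuntimeError("mutation rejected")


rows = [{"id": i} for i in range(1, 6)]
result = asyncio.run(process_rows(rows, fake_send, concurrency=2))
```

Catching the exception inside each worker is what keeps a single bad row from cancelling the rest of the batch.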
parallelProcessing:
  concurrency: 20 # High concurrency enables parallel row processing
  entityConcurrency: 5 # Process up to 5 entities simultaneously
  preserveRowOrder: false
entityConfig:
  users:
    concurrency: 5 # Lower for user data
  products:
    concurrency: 50 # Higher for product catalog

parallelProcessing:
  concurrency: 2 # Low concurrency; has no effect while preserveRowOrder is true
  entityConcurrency: 1 # Process entities one at a time
  preserveRowOrder: true # Forces concurrency=1 (see the options table)
entityDependencies:
  products: ["users"]
  orders: ["products"]

parallelProcessing:
  concurrency: 10 # Moderate concurrency enables parallel processing
  entityConcurrency: 2 # Process up to 2 entities simultaneously
entityConfig:
  # Sensitive data - preserve order
  users:
    concurrency: 1 # Sequential processing (concurrency=1)
    preserveRowOrder: true
  # Reference data - high throughput
  categories:
    concurrency: 20 # Parallel processing (concurrency>1)
    preserveRowOrder: false
  # Transactional data - moderate concurrency
  orders:
    concurrency: 5 # Used only if preserveRowOrder is set to false
    preserveRowOrder: true # Forces concurrency=1 (see the options table)
entityDependencies:
  products: ["users", "categories"]
  orders: ["products", "users"]

- Server rate limiting - Reduce concurrency
- Memory usage too high - Lower concurrency or process smaller batches
- Unexpected order - Check preserveRowOrder and dependency configuration
- Slow performance - Increase concurrency (if the server allows)
The tool provides detailed metrics:
- Total processed, successes, failures
- Success rate percentage
- Processing duration
- Per-entity breakdown
Use these metrics to optimize concurrency settings for your specific use case.
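As one way to act on these numbers, the Python sketch below turns per-entity counts into a success rate and an observed rows-per-second figure. The input shape (`successes`, `failures`, `duration_s`) and the `summarize` function are assumptions made for illustration; the tool's actual metrics output may be structured differently.

```python
def summarize(per_entity: dict) -> dict:
    """Derive total, success rate (%), and observed rows/sec for each entity."""
    summary = {}
    for name, m in per_entity.items():
        total = m["successes"] + m["failures"]
        summary[name] = {
            "total": total,
            # Guard against empty entities and zero-length runs.
            "success_rate_pct": round(100.0 * m["successes"] / total, 1) if total else 0.0,
            "rows_per_sec": round(total / m["duration_s"], 1) if m["duration_s"] > 0 else 0.0,
        }
    return summary


# Hypothetical figures, not measurements from a real run.
report = summarize({
    "users": {"successes": 95, "failures": 5, "duration_s": 10.0},
    "products": {"successes": 1000, "failures": 0, "duration_s": 8.0},
})
```

A falling success rate after a concurrency increase usually suggests the server is rate limiting, while a flat rows-per-second figure suggests the added concurrency is no longer buying throughput.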