Mathematically Grounded, Engineering-Strong Database Seeding Engine
Drawline Core is a production-grade TypeScript library for intelligent, deterministic test data generation across multiple database systems. It provides a unified interface for schema inference, relationship resolution, and referentially intact data seeding with strong mathematical guarantees on data consistency.
- Overview
- Features Achieved
- Technical Architecture
- Mathematical Foundations
- Usage
- Roadmap
- Development
Drawline addresses a hard problem in test engineering: generating realistic, referentially intact test data at scale across heterogeneous database systems. Traditional approaches rely on simple random generation or on expensive database lookups to maintain foreign key integrity. Drawline instead uses a deterministic generation protocol that guarantees referential integrity without any database queries during generation.
Given:
- A database schema $S$ with collections $C = \{c_1, c_2, \ldots, c_n\}$
- Relationships $R = \{r_1, r_2, \ldots, r_m\}$ defining foreign key dependencies
- A generation seed $\sigma \in \mathbb{N}$
Generate documents such that:
- All foreign key references point to existing primary keys
- The generation is fully deterministic: $G(\sigma, c, i) \rightarrow d_i$
- No database queries are required during generation
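The three properties above can be illustrated with a minimal sketch. It assumes an FNV-1a hash (over UTF-16 code units) and a modulo mapping from child index to parent index; both are stand-ins chosen for illustration, and the names `fnv1a`, `deterministicId`, and `foreignKeyFor` are hypothetical rather than part of Drawline's API.

```typescript
// Hypothetical stand-in hash: 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// G(seed, collection, index): the same inputs always yield the same identifier.
function deterministicId(seed: number, collection: string, index: number): string {
  return fnv1a(`${seed}:${collection}:${index}`).toString(16).padStart(8, "0");
}

// Assumed FK rule: child i references parent (i mod parentCount).
// The referenced ID is recomputed, never looked up in a database.
function foreignKeyFor(
  seed: number,
  parent: string,
  parentCount: number,
  childIndex: number
): string {
  return deterministicId(seed, parent, childIndex % parentCount);
}

const seed = 12345;
const userIds = new Set<string>();
for (let i = 0; i < 100; i++) userIds.add(deterministicId(seed, "users", i));

// Every one of 1000 child rows points at an ID in the generated parent set.
let dangling = 0;
for (let i = 0; i < 1000; i++) {
  if (!userIds.has(foreignKeyFor(seed, "users", 100, i))) dangling++;
}
console.log(dangling); // 0
```

Because the child's foreign key is produced by the same function that produces the parent's primary key, integrity follows from the index arithmetic alone, independent of the hash chosen.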
Drawline includes the Drawline Semantic Engine, powered by 60+ curated industry datasets. Instead of "Lorem Ipsum" or generic faker output, test databases contain high-fidelity, domain-specific values.
- 60+ Industry Domains: Finance, Healthcare, Aviation, Logistics, Law, Science, Tech, and more.
- Context-Aware Inference: The engine automatically detects field names like `pan_card` (Indian context), `flight_number` (Aviation), or `diagnosis_code` (Healthcare) and routes them to the correct semantic generator.
- Zero-Dependency Core: High-performance generation without bloated external libraries.
- Deterministic Randomness: Uses Xoshiro128 PRNG for repeatable, seed-based data generation across all 60+ datasets.
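To make the seed-based repeatability concrete, here is a minimal sketch of a seeded generator. It assumes the xoshiro128** variant and a splitmix-style seed expansion; the README only says "Xoshiro128", so the variant, the seeding scheme, and the class shape below are assumptions rather than Drawline's actual implementation.

```typescript
// Rotate a 32-bit value left by k bits.
function rotl(x: number, k: number): number {
  return ((x << k) | (x >>> (32 - k))) >>> 0;
}

class Xoshiro128 {
  private readonly s = new Uint32Array(4);

  constructor(seed: number) {
    // Assumed seeding: expand one 32-bit seed into four state words.
    // The mixer is a bijection on distinct inputs, so the state is never all zero.
    let x = seed >>> 0;
    for (let i = 0; i < 4; i++) {
      x = (x + 0x9e3779b9) >>> 0;
      let z = x;
      z = Math.imul(z ^ (z >>> 16), 0x85ebca6b);
      z = Math.imul(z ^ (z >>> 13), 0xc2b2ae35);
      this.s[i] = (z ^ (z >>> 16)) >>> 0;
    }
  }

  // One xoshiro128** step: scramble s[1], then advance the state.
  nextUint32(): number {
    const s = this.s;
    const result = Math.imul(rotl(Math.imul(s[1], 5), 7), 9) >>> 0;
    const t = s[1] << 9;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl(s[3], 11);
    return result;
  }

  // Uniform float in [0, 1).
  nextFloat(): number {
    return this.nextUint32() / 4294967296;
  }
}

const a = new Xoshiro128(12345);
const b = new Xoshiro128(12345);
const same = [0, 1, 2, 3, 4].every(() => a.nextUint32() === b.nextUint32());
console.log(same); // true
```

Two generators built from the same seed walk through identical sequences, which is what lets a dataset lookup such as "pick the k-th entry" return the same value on every run.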
Drawline provides ready-to-use schema templates for various sectors:
- Ecommerce: Multi-table setup with users, products, orders, and logistics tracking.
- OTT Streaming: Profiles, movie titles, genres, and watch history.
- Fintech: Transactions, bank details, and financial tax types.
- Logistics: Carriers, shipments, and global tracking states.
- Healthcare: Appointments, vitals, and medical specialties.
- ...and 6 more industry presets.
A single entry point for all project health checks:
- Dataset Integrity: Automatically validates all 60+ JSON semantic collections.
- Performance Benchmarking: Non-interactive suite that measures TPS (Transactions Per Second) and latency.
- Unit Testing: Full integration with Vitest for 100% core logic verification.
Drawline implements a unified adapter pattern supporting 11+ database systems:
| Adapter | Status | Key Features |
|---|---|---|
| PostgreSQL | ✅ Complete | Schema inference, FK constraints, serial types |
| MySQL | ✅ Complete | AUTO_INCREMENT, foreign keys |
| SQLite | ✅ Complete | Embedded testing, local file support |
| MongoDB | ✅ Complete | ObjectId generation, document embedding |
| CSV Export | ✅ Complete | Automated Export alongside test reports |
| ...and more | | DynamoDB, Firestore, Redis, SQL Server |
Smart field generation with score-based routing:
// Automatic Industry Routing
this.addRule('flight_status', ['flight', 'status'], 10, (r) => SemanticProvider.getFlightStatus(r));
this.addRule('diagnosis', ['diagnosis'], 10, (r) => SemanticProvider.getHealthcareDiagnosis(r));

- Automated Benchmarking: Measures TPS and memory usage on every PR.
- Artifact Upload: Generates and uploads PDF/Markdown reports and sample CSV data for every CI run.
- Version Gating: Ensures `npm publish` only occurs if all 60+ datasets are valid and benchmarks are stable.
┌─────────────────────────────────────────────────────────────────────┐
│ TestDataGeneratorService │
├─────────────────────────────────────────────────────────────────────┤
│ 1. initialize(config, collections, relationships) │
│ ├── Preload metadata from target DB │
│ ├── Build relationship map │
│ └── Initialize seeded RNG │
│ │
│ 2. buildDependencyOrder() │
│ ├── Build DAG from relationships │
│ ├── Detect and break cycles │
│ └── Return topological sort │
│ │
│ 3. generateAndPopulate() │
│ ├── For each collection in order: │
│ │ ├── ensureCollection() │
│ │ ├── generateCollectionData() │
│ │ └── insertDocuments() │
│ └── Validate referential integrity │
└─────────────────────────────────────────────────────────────────────┘
abstract class BaseAdapter {
// Connection management
abstract connect(): Promise<void>;
abstract disconnect(): Promise<void>;
// Schema operations
abstract collectionExists(name: string): Promise<boolean>;
abstract ensureCollection(name: string, fields: SchemaField[]): Promise<void>;
abstract getCollectionDetails(name: string): Promise<CollectionDetails>;
abstract getCollectionSchema(name: string): Promise<SchemaField[]>;
// Data operations
abstract insertDocuments(
collectionName: string,
documents: GeneratedDocument[]
): Promise<(string | number)[]>;
abstract clearCollection(name: string): Promise<void>;
abstract getDocumentCount(name: string): Promise<number>;
// Validation
abstract validateReference(
collectionName: string,
fieldName: string,
value: unknown
): Promise<boolean>;
}

BaseAdapter
├── PostgresAdapter
├── MySQLAdapter
├── SQLiteAdapter
├── MongoDBAdapter
├── DynamoDBAdapter
├── FirestoreAdapter
├── RedisAdapter
├── SQLServerAdapter
├── InMemoryAdapter (for testing)
├── EphemeralAdapter (for demos)
├── NullAdapter (no-op)
└── CSVExportAdapter (export)
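Problem: Given a DAG $G = (V, E)$ whose vertices are collections and whose edges are foreign key dependencies, find an ordering of $V$ in which every collection appears after the collections it references.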
Algorithm: Kahn's algorithm with in-degree counting:
TOPOLOGICAL-SORT(G):
    Compute in-degree(v) for all v ∈ V
    Queue ← { v | in-degree(v) = 0 }
    result ← []
    while Queue not empty:
        v ← Queue.pop()
        result.append(v)
        for each edge (v, w):
            in-degree(w) ← in-degree(w) - 1
            if in-degree(w) = 0:
                Queue.push(w)
    return result
Complexity: $O(|V| + |E|)$, since Kahn's algorithm visits each vertex and each edge once.
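As a rough TypeScript rendering of the pseudocode above: the `Dependency` shape and the `dependencyOrder` name are hypothetical, and edges are oriented from parent to child so that parents are seeded first.

```typescript
// "from" holds a foreign key that references "to" (hypothetical shape).
interface Dependency {
  from: string;
  to: string;
}

function dependencyOrder(collections: string[], deps: Dependency[]): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const c of collections) {
    inDegree.set(c, 0);
    dependents.set(c, []);
  }
  for (const { from, to } of deps) {
    // Edge to -> from: the referenced collection must come first.
    dependents.get(to)!.push(from);
    inDegree.set(from, inDegree.get(from)! + 1);
  }

  // Start with collections that reference nothing.
  const queue = collections.filter((c) => inDegree.get(c) === 0);
  const result: string[] = [];
  for (let head = 0; head < queue.length; head++) {
    const v = queue[head];
    result.push(v);
    for (const w of dependents.get(v)!) {
      const d = inDegree.get(w)! - 1;
      inDegree.set(w, d);
      if (d === 0) queue.push(w);
    }
  }

  // Any collection left unprocessed sits on a cycle.
  if (result.length !== collections.length) {
    throw new Error("cycle detected: break cycles before sorting");
  }
  return result;
}

const order = dependencyOrder(
  ["posts", "users", "comments"],
  [
    { from: "posts", to: "users" },
    { from: "comments", to: "posts" },
    { from: "comments", to: "users" },
  ]
);
console.log(order); // ["users", "posts", "comments"]
```

The index-based queue avoids `Array.prototype.shift`, which keeps the loop linear in the number of vertices and edges.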
Theorem: For any collections
Proof: Using the deterministic hash: $$id(c, i) = \text{hash}(\text{collection}_c \oplus i \oplus \sigma)_{\text{constrained}}$$
The FK resolution computes:
By substitution:
Theorem: Any finite directed graph can be made acyclic by removing a finite set of edges, since each cycle is broken by removing at least one of its edges.
Algorithm: Modified DFS with cycle breaking:
DETECT-CYCLE(G):
    visited ← ∅
    recursionStack ← ∅
    DFS(v):
        visited.add(v)
        recursionStack.add(v)
        for each neighbor u of v:
            if u ∉ visited:
                if DFS(u) return true
            if u ∈ recursionStack:
                return CYCLE-DETECTED(v, u)
        recursionStack.delete(v)
        return false
    for each vertex v:
        if v ∉ visited:
            if DFS(v) return true
    return false
Breaking Strategy: When cycles detected, prioritize removing weak dependencies (non-required FKs) to preserve data integrity.
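One way to combine the detection pass with the breaking strategy is sketched below. The `Edge` shape, the `required` flag, and the fallback of dropping the closing edge when every edge in a cycle is required are assumptions made for illustration; the README does not specify how Drawline chooses among several weak edges.

```typescript
interface Edge {
  from: string;
  to: string;
  required: boolean; // false marks a weak (nullable) foreign key
}

// Returns the edges of one cycle, or null if the graph is acyclic.
function findCycle(nodes: string[], edges: Edge[]): Edge[] | null {
  const out = new Map<string, Edge[]>();
  for (const n of nodes) out.set(n, []);
  for (const e of edges) out.get(e.from)!.push(e);

  const visited = new Set<string>();
  const onPath = new Set<string>(); // the recursion stack, as vertices
  const path: Edge[] = []; // the recursion stack, as edges

  const dfs = (v: string): Edge[] | null => {
    visited.add(v);
    onPath.add(v);
    for (const e of out.get(v)!) {
      if (onPath.has(e.to)) {
        // Back edge: the cycle is the path segment from e.to to v, plus e.
        const start = path.findIndex((p) => p.from === e.to);
        return [...(start === -1 ? [] : path.slice(start)), e];
      }
      if (!visited.has(e.to)) {
        path.push(e);
        const found = dfs(e.to);
        if (found) return found;
        path.pop();
      }
    }
    onPath.delete(v);
    return null;
  };

  for (const n of nodes) {
    if (!visited.has(n)) {
      const found = dfs(n);
      if (found) return found;
    }
  }
  return null;
}

// Repeatedly remove one edge per cycle, preferring weak dependencies.
function breakCycles(nodes: string[], edges: Edge[]): { kept: Edge[]; removed: Edge[] } {
  let kept = [...edges];
  const removed: Edge[] = [];
  let cycle = findCycle(nodes, kept);
  while (cycle !== null) {
    const victim = cycle.find((e) => !e.required) ?? cycle[cycle.length - 1];
    kept = kept.filter((e) => e !== victim);
    removed.push(victim);
    cycle = findCycle(nodes, kept);
  }
  return { kept, removed };
}

const nodes = ["employees", "departments"];
const { kept, removed } = breakCycles(nodes, [
  { from: "employees", to: "departments", required: true },
  { from: "departments", to: "employees", required: false }, // optional manager
]);
console.log(removed.map((e) => `${e.from}->${e.to}`)); // ["departments->employees"]
```

The loop terminates because each iteration removes one edge, and the remaining `kept` edges can then be passed to a topological sort.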
Problem: Given a field name
Algorithm: Score-based matching:
Where:
- $\text{match}(r, f) = 5$ if $|\text{tokens}(f)| = |\text{tokens}(r)|$ (perfect match)
- $\text{noise}(r, f) = 0.5 \times (|\text{tokens}(f)| - |\text{tokens}(r)|)$
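The sketch below shows how the match bonus and noise penalty might combine into a routing decision. The overall formula (priority plus match minus noise), the zero bonus for non-perfect matches, the tokenizer, and the rule that every rule token must appear in the field name are all assumptions; only the two definitions above come from this document.

```typescript
interface Rule<T> {
  name: string;
  tokens: string[]; // keywords that must all appear in the field name (assumed)
  priority: number;
  generate: () => T;
}

// Split a field name such as "flight_status" into lowercase tokens.
const tokenize = (field: string): string[] =>
  field.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Assumed scoring: priority + match - noise. Returns null if the rule does not apply.
function score<T>(rule: Rule<T>, field: string): number | null {
  const fieldTokens = tokenize(field);
  if (!rule.tokens.every((t) => fieldTokens.includes(t))) return null;
  const match = fieldTokens.length === rule.tokens.length ? 5 : 0;
  const noise = 0.5 * (fieldTokens.length - rule.tokens.length);
  return rule.priority + match - noise;
}

// Pick the applicable rule with the highest score.
function route<T>(rules: Rule<T>[], field: string): Rule<T> | null {
  let best: Rule<T> | null = null;
  let bestScore = -Infinity;
  for (const rule of rules) {
    const s = score(rule, field);
    if (s !== null && s > bestScore) {
      best = rule;
      bestScore = s;
    }
  }
  return best;
}

const rules: Rule<string>[] = [
  { name: "status", tokens: ["status"], priority: 10, generate: () => "active" },
  { name: "flight_status", tokens: ["flight", "status"], priority: 10, generate: () => "DELAYED" },
];
console.log(route(rules, "flight_status")?.name); // "flight_status"
console.log(route(rules, "order_status")?.name); // "status"
```

Under these assumptions the specific rule wins on `flight_status` (15 against 9.5) because it matches every token, while `order_status` falls back to the generic rule with a small noise penalty for the unmatched token.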
Select
For composite FKs
- Select parent row index $r = i \bmod |\text{parent}|$
- Retrieve cached parent row $P[r]$
- For each component $f_j$: $$value[f_j] = P[r][p_j]$$
This ensures all FK components reference the same parent row.
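The three steps can be written out as a short sketch. The `Row` type, the `resolveCompositeFk` name, and the field-mapping shape are hypothetical; the cached parent rows are represented here as a plain array.

```typescript
type Row = Record<string, string | number>;

// fields maps each child FK column f_j to the parent column p_j it copies.
function resolveCompositeFk(
  parentRows: Row[],
  childIndex: number,
  fields: Array<{ child: string; parent: string }>
): Row {
  if (parentRows.length === 0) throw new Error("parent collection has no rows");
  // Steps 1 and 2: pick one parent row, r = i mod |parent|.
  const parentRow = parentRows[childIndex % parentRows.length];
  // Step 3: copy every component from that same row.
  const values: Row = {};
  for (const { child, parent } of fields) values[child] = parentRow[parent];
  return values;
}

const orderLines: Row[] = [
  { order_id: 1, line_no: 1 },
  { order_id: 1, line_no: 2 },
  { order_id: 2, line_no: 1 },
];
const fk = resolveCompositeFk(orderLines, 4, [
  { child: "ref_order_id", parent: "order_id" },
  { child: "ref_line_no", parent: "line_no" },
]);
console.log(fk); // { ref_order_id: 1, ref_line_no: 2 }
```

Resolving each component independently could pair `order_id` from one row with `line_no` from another, producing a key that exists in no parent row; selecting the row once avoids that.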
For constraints like
Where
npm install @solvaratech/drawline-core

import { TestDataGeneratorService } from "@solvaratech/drawline-core/server";
import { PostgresAdapter } from "@solvaratech/drawline-core/generator/adapters/PostgresAdapter";
// 1. Configure adapter
const adapter = new PostgresAdapter({
connectionString: "postgres://user:pass@localhost:5432/mydb"
});
await adapter.connect();
// 2. Initialize service
const service = new TestDataGeneratorService(adapter);
// 3. Define schema
const collections = [
{
id: "users",
name: "users",
fields: [
{ id: "id", name: "id", type: "uuid", isPrimaryKey: true },
{ id: "email", name: "email", type: "string", required: true },
{ id: "name", name: "name", type: "string" }
]
},
{
id: "posts",
name: "posts",
fields: [
{ id: "id", name: "id", type: "uuid", isPrimaryKey: true },
{ id: "user_id", name: "user_id", type: "uuid", isForeignKey: true,
referencedCollectionId: "users" },
{ id: "title", name: "title", type: "string" }
]
}
];
const relationships = [
{
id: "posts->users",
fromCollectionId: "posts",
toCollectionId: "users",
type: "many-to-one",
fromField: "user_id",
toField: "id"
}
];
// 4. Generate configuration
const config = {
collections: [
{ collectionName: "users", count: 100 },
{ collectionName: "posts", count: 1000 }
],
seed: 12345
};
// 5. Execute generation
const result = await service.generateAndPopulate(
collections,
relationships,
config
);
console.log(`Generated ${result.totalDocumentsGenerated} documents`);

import { computeSchemaDiff, generateDDL } from "@solvaratech/drawline-core/schema";
// Compare current schema with database
const diff = computeSchemaDiff(databaseSnapshot, newSchema, "additive");
// Generate migration SQL
const statements = generateDDL(diff);
for (const stmt of statements) {
console.log(stmt.sql);
}

import { PrismaGenerator } from "@solvaratech/drawline-core/generators/orm";
const generator = new PrismaGenerator();
const output = generator.generate(collections, relationships);
console.log(output.content); // Prisma schema.prisma content

- Enhanced Validation: Post-generation integrity validation
- Data masking: Sensitive data identification and redaction
- Incremental generation: Delta seeding for existing databases
- Distribution profiles: Normal, exponential, power-law distributions
- Relationship visualization: Draw relationship graphs
- Web UI Dashboard: Visual schema editor and generator interface
- Data Templates: Reusable generation templates
- Export formats: More export adapters (Excel, JSON Lines)
- Audit logging: Generation audit trail
- CI/CD integration: GitHub Actions, GitLab CI
- GraphQL API: REST/GraphQL API for remote generation
- Multi-tenant: Isolated multi-tenant support
- Enterprise features: SSO, RBAC, audit
- Cloud dashboard: SaaS management console
- Plug-in system: Third-party generator plugins
- Node.js 18+
- TypeScript 5.9+
- pnpm or npm
npm install
npm run build

# Run all tests
npm test
# Watch mode
npm run test:watch
# UI
npm run test:ui
# CI (with coverage)
npm run test:ci

npm run type-check

npm run cli:build
npm link # Link globally
drawline init
drawline gen --schema schema.json --config config.json

// Main exports
export * from "./types/schemaDesign"; // Schema types
export * from "./types/schemaDiff"; // Diff types
export * from "./utils/schemaConverter"; // Converters
export * from "./utils/errorMessages"; // Errors
export * from "./schema"; // Schema engine
export * from "./generators/orm"; // ORM generators
// Server exports
export * from "./connections"; // Database connections
export * from "./generator"; // Generation engine

interface SchemaCollection {
id: string;
name: string;
fields: SchemaField[];
schema?: string;
dbName?: string;
position?: { x: number; y: number };
}
interface SchemaField {
id: string;
name: string;
type: FieldType;
required?: boolean;
isPrimaryKey?: boolean;
isForeignKey?: boolean;
isSerial?: boolean;
compositePrimaryKeyIndex?: number;
compositeKeyGroup?: string;
referencedCollectionId?: string;
foreignKeyTarget?: string;
rawType?: string;
arrayItemType?: string;
defaultValue?: any;
constraints?: FieldConstraints;
}
interface SchemaRelationship {
id: string;
fromCollectionId: string;
toCollectionId: string;
type: "one-to-one" | "one-to-many" | "many-to-many";
fromField?: string;
toField?: string;
fromFields?: string[];
toFields?: string[];
}
interface TestDataConfig {
collections: CollectionConfig[];
seed?: number | string;
batchSize?: number;
onProgress?: (progress: ProgressUpdate) => Promise<void>;
}

MIT License. See LICENSE file for details.
See CONTRIBUTING.md for development guidelines.
- GitHub Issues: https://github.com/solvaratech/drawline-core/issues
- Documentation: https://drawline.app/docs