@solvaratech/drawline-core

Mathematically Grounded, Engineering-Strong Database Seeding Engine

Drawline Core is a production-grade TypeScript library for intelligent, deterministic test data generation across multiple database systems. It provides a unified interface for schema inference, relationship resolution, and referentially intact data seeding with strong mathematical guarantees on data consistency.


Table of Contents

  1. Overview
  2. Features Achieved
  3. Technical Architecture
  4. Mathematical Foundations
  5. Usage
  6. Roadmap
  7. Development

Overview

Drawline addresses one of the most challenging problems in software engineering: generating realistic, referentially intact test data at scale across heterogeneous database systems. Traditional approaches rely on simple random generation or expensive database lookups to maintain foreign key integrity. Drawline uses a mathematically derived deterministic generation protocol that guarantees referential integrity without any database queries during generation.

Core Problem Statement

Given:

  • A database schema $S$ with collections $C = \{c_1, c_2, \ldots, c_n\}$
  • Relationships $R = \{r_1, r_2, \ldots, r_m\}$ defining foreign key dependencies
  • A generation seed $\sigma \in \mathbb{N}$

Generate documents $D_c = \{d_1, d_2, \ldots, d_k\}$ for each collection $c$ such that:

  1. All foreign key references point to existing primary keys
  2. The generation is fully deterministic: $G(\sigma, c, i) \rightarrow d_i$
  3. No database queries are required during generation

Features Achieved

🎨 Drawline Semantic Engine (NEW v0.2.0)

Drawline now includes the Drawline Semantic Engine, powered by 60+ curated industry datasets. Instead of "Lorem Ipsum" or generic faker data, your test databases contain high-fidelity, domain-specific values.

  • 60+ Industry Domains: Finance, Healthcare, Aviation, Logistics, Law, Science, Tech, and more.
  • Context-Aware Inference: The engine automatically detects field names like pan_card (Indian context), flight_number (Aviation), or diagnosis_code (Healthcare) and routes them to the correct semantic generator.
  • Zero-Dependency Core: High-performance generation without bloated external libraries.
  • Deterministic Randomness: Uses Xoshiro128 PRNG for repeatable, seed-based data generation across all 60+ datasets.
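As a rough illustration of seed-based repeatability, here is a minimal sketch of a 32-bit xoshiro generator. The text above names "Xoshiro128" without a variant, so the `**` scrambler, the splitmix32-style seeding, and the names `createRng` and `splitmix32` are assumptions made for this example, not the library's implementation.

```typescript
// Sketch only: xoshiro128** with splitmix32-style seeding (both assumed).

function rotl(x: number, k: number): number {
  return ((x << k) | (x >>> (32 - k))) >>> 0;
}

// Expands a single 32-bit seed into a stream of well-mixed 32-bit words.
function splitmix32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x9e3779b9) >>> 0;
    let t = a ^ (a >>> 16);
    t = Math.imul(t, 0x21f0aaad);
    t ^= t >>> 15;
    t = Math.imul(t, 0x735a2d97);
    return (t ^ (t >>> 15)) >>> 0;
  };
}

// Returns a function producing floats in [0, 1), fully determined by the seed.
function createRng(seed: number): () => number {
  const next = splitmix32(seed);
  let s0 = next();
  let s1 = next();
  let s2 = next();
  let s3 = next();
  return () => {
    const result = Math.imul(rotl(Math.imul(s1, 5), 7), 9) >>> 0;
    const t = (s1 << 9) >>> 0;
    s2 ^= s0;
    s3 ^= s1;
    s1 ^= s2;
    s0 ^= s3;
    s2 ^= t;
    s3 = rotl(s3, 11);
    return result / 4294967296;
  };
}

const rngA = createRng(12345);
const rngB = createRng(12345);
console.log(rngA() === rngB()); // true: same seed, same sequence
```

Keeping the state in four local variables rather than an array avoids per-call allocation and keeps all arithmetic in 32-bit integer operations.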

🏢 Industry Templates

Drawline provides ready-to-use schema templates for various sectors:

  • Ecommerce: Multi-table setup with users, products, orders, and logistics tracking.
  • OTT Streaming: Profiles, movie titles, genres, and watch history.
  • Fintech: Transactions, bank details, and financial tax types.
  • Logistics: Carriers, shipments, and global tracking states.
  • Healthcare: Appointments, vitals, and medical specialties.
  • ...and 6 more industry presets.

🛡️ Unified Validation CLI

A single entry point for all project health checks:

  • Dataset Integrity: Automatically validates all 60+ JSON semantic collections.
  • Performance Benchmarking: Non-interactive suite that measures TPS (Transactions Per Second) and latency.
  • Unit Testing: Full integration with Vitest for 100% core logic verification.

Multi-Database Adapter Architecture

Drawline implements a unified adapter pattern supporting 11+ database systems:

| Adapter | Status | Key Features |
|---|---|---|
| PostgreSQL | ✅ Complete | Schema inference, FK constraints, serial types |
| MySQL | ✅ Complete | AUTO_INCREMENT, foreign keys |
| SQLite | ✅ Complete | Embedded testing, local file support |
| MongoDB | ✅ Complete | ObjectId generation, document embedding |
| CSV Export | ✅ Complete | Automated export alongside test reports |
| ...and more | | DynamoDB, Firestore, Redis, SQL Server |

Field Inference Engine

Smart field generation with score-based routing:

// Automatic Industry Routing
this.addRule('flight_status', ['flight', 'status'], 10, (r) => SemanticProvider.getFlightStatus(r));
this.addRule('diagnosis', ['diagnosis'], 10, (r) => SemanticProvider.getHealthcareDiagnosis(r));

CI/CD Integration

  • Automated Benchmarking: Measures TPS and Memory usage on every PR.
  • Artifact Upload: Generates and uploads PDF/Markdown reports and sample CSV data for every CI run.
  • Version Gating: Ensures npm publish only occurs if all 60+ datasets are valid and benchmarks are stable.


Technical Architecture

Core Data Flow

┌────────────────────────────────────────────────────────┐
│                TestDataGeneratorService                │
├────────────────────────────────────────────────────────┤
│  1. initialize(config, collections, relationships)     │
│     ├── Preload metadata from target DB                │
│     ├── Build relationship map                         │
│     └── Initialize seeded RNG                          │
│                                                        │
│  2. buildDependencyOrder()                             │
│     ├── Build DAG from relationships                   │
│     ├── Detect and break cycles                        │
│     └── Return topological sort                        │
│                                                        │
│  3. generateAndPopulate()                              │
│     ├── For each collection in order:                  │
│     │   ├── ensureCollection()                         │
│     │   ├── generateCollectionData()                   │
│     │   └── insertDocuments()                          │
│     └── Validate referential integrity                 │
└────────────────────────────────────────────────────────┘

Adapter Interface

abstract class BaseAdapter {
  // Connection management
  abstract connect(): Promise<void>;
  abstract disconnect(): Promise<void>;
  
  // Schema operations
  abstract collectionExists(name: string): Promise<boolean>;
  abstract ensureCollection(name: string, fields: SchemaField[]): Promise<void>;
  abstract getCollectionDetails(name: string): Promise<CollectionDetails>;
  abstract getCollectionSchema(name: string): Promise<SchemaField[]>;
  
  // Data operations
  abstract insertDocuments(
    collectionName: string, 
    documents: GeneratedDocument[]
  ): Promise<(string | number)[]>;
  
  abstract clearCollection(name: string): Promise<void>;
  abstract getDocumentCount(name: string): Promise<number>;
  
  // Validation
  abstract validateReference(
    collectionName: string, 
    fieldName: string, 
    value: unknown
  ): Promise<boolean>;
}
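To make the contract more concrete, the following sketch implements a subset of these methods against an in-memory `Map`. It is not the library's `InMemoryAdapter`: the class name `SketchMemoryAdapter` and the simplified `SchemaField` and `GeneratedDocument` shapes are hypothetical stand-ins, since the real types are not shown in this section.

```typescript
// Hypothetical, trimmed-down types; the real ones in drawline-core may differ.
type SchemaField = { name: string; type: string; isPrimaryKey?: boolean };
type GeneratedDocument = { id: string | number; data: Record<string, unknown> };

// Illustrative in-memory store covering a subset of the BaseAdapter methods.
class SketchMemoryAdapter {
  private store = new Map<string, GeneratedDocument[]>();

  async connect(): Promise<void> {}

  async disconnect(): Promise<void> {
    this.store.clear();
  }

  async collectionExists(name: string): Promise<boolean> {
    return this.store.has(name);
  }

  async ensureCollection(name: string, _fields: SchemaField[]): Promise<void> {
    if (!this.store.has(name)) this.store.set(name, []);
  }

  async insertDocuments(
    collectionName: string,
    documents: GeneratedDocument[],
  ): Promise<(string | number)[]> {
    const rows = this.store.get(collectionName);
    if (!rows) throw new Error(`Unknown collection: ${collectionName}`);
    rows.push(...documents);
    return documents.map((d) => d.id);
  }

  async getDocumentCount(name: string): Promise<number> {
    return this.store.get(name)?.length ?? 0;
  }

  // True when some stored document has the given value in the given field.
  async validateReference(
    collectionName: string,
    fieldName: string,
    value: unknown,
  ): Promise<boolean> {
    const rows = this.store.get(collectionName) ?? [];
    return rows.some((d) => d.data[fieldName] === value);
  }
}
```

A real adapter would extend `BaseAdapter` and also implement the remaining abstract methods (`getCollectionDetails`, `getCollectionSchema`, `clearCollection`).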

Class Hierarchy

BaseAdapter
├── PostgresAdapter
├── MySQLAdapter
├── SQLiteAdapter
├── MongoDBAdapter
├── DynamoDBAdapter
├── FirestoreAdapter
├── RedisAdapter
├── SQLServerAdapter
├── InMemoryAdapter (for testing)
├── EphemeralAdapter (for demos)
├── NullAdapter (no-op)
└── CSVExportAdapter (export)

Mathematical Foundations

1. Topological Sort for Generation Ordering

Problem: Given a DAG $G = (V, E)$ where $V = C$ and edges represent dependencies, find a linear ordering $\tau: V \rightarrow [1, |V|]$ such that $\forall (u, v) \in E: \tau(u) < \tau(v)$.

Algorithm: Kahn's algorithm with in-degree counting:

TOPOLOGICAL-SORT(G):
  Compute in-degree(v) for all v ∈ V
  Queue ← { v | in-degree(v) = 0 }
  result ← []
  
  while Queue not empty:
    v ← Queue.pop()
    result.append(v)
    for each edge (v, w):
      in-degree(w) ← in-degree(w) - 1
      if in-degree(w) = 0:
        Queue.push(w)
  
  return result

Complexity: $O(|V| + |E|)$
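The pseudocode above translates fairly directly to TypeScript. This is a minimal sketch rather than the library's `buildDependencyOrder()`: the function name `topologicalSort`, the `Edge` tuple type, and the choice to throw on a leftover cycle are assumptions made for the example.

```typescript
// Edge [from, to] means "from must be generated before to".
type Edge = [from: string, to: string];

function topologicalSort(nodes: string[], edges: Edge[]): string[] {
  const inDegree = new Map<string, number>();
  const adjacent = new Map<string, string[]>();
  for (const n of nodes) {
    inDegree.set(n, 0);
    adjacent.set(n, []);
  }
  for (const [from, to] of edges) {
    adjacent.get(from)?.push(to);
    inDegree.set(to, (inDegree.get(to) ?? 0) + 1);
  }

  // Start with every node that has no prerequisites.
  const queue = nodes.filter((n) => inDegree.get(n) === 0);
  const result: string[] = [];

  // Advancing an index instead of shifting keeps the loop O(|V| + |E|).
  for (let head = 0; head < queue.length; head++) {
    const v = queue[head] as string;
    result.push(v);
    for (const w of adjacent.get(v) ?? []) {
      const remaining = (inDegree.get(w) ?? 0) - 1;
      inDegree.set(w, remaining);
      if (remaining === 0) queue.push(w);
    }
  }

  // Nodes left unprocessed can only be part of a cycle.
  if (result.length !== nodes.length) {
    throw new Error("Graph contains a cycle; break it before sorting");
  }
  return result;
}

const order = topologicalSort(
  ["posts", "users", "comments"],
  [["users", "posts"], ["users", "comments"], ["posts", "comments"]],
);
console.log(order); // ["users", "posts", "comments"]
```

Parents come out before their dependents, so each collection can be inserted only after every collection it references.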

2. Deterministic ID Generation

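Theorem: For any collections $A$ and $B$ with relationship $R: A \rightarrow B$ (documents in $A$ reference documents in $B$), let $id_B(j)$ denote the ID generated for the $j$-th document in $B$, with documents indexed from $0$. Then the foreign key $FK(i)$ assigned to the $i$-th document in $A$ satisfies:

$$\forall i \in [0, |A| - 1]: FK(i) = id_B(i \bmod |B|)$$

so every foreign key equals the ID of an existing document in $B$.

Proof: IDs are produced by the deterministic hash:

$$id(c, i) = \text{hash}(\text{collection}_c \oplus i \oplus \sigma)_{\text{constrained}}$$

The FK resolution computes:

$$parentIndex = i \bmod |parent|$$

$$FK(i) = id(parent, parentIndex)$$

By substitution:

$$FK(i) = \text{hash}(parent \oplus (i \bmod |parent|) \oplus \sigma)_{\text{constrained}} = id(parent, i \bmod |parent|)$$

Since $0 \le i \bmod |parent| < |parent|$, the right-hand side is the ID of a parent document that is actually generated.

$\square$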
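The argument can be checked with a small sketch. FNV-1a stands in for the unspecified hash, and joining the inputs into a string replaces the $\oplus$ combination; both are assumptions for illustration, as are the function names.

```typescript
// FNV-1a (32-bit) is an assumed stand-in for the library's hash function.
function fnv1a(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// id(c, i): depends only on the collection name, the index, and the seed.
function deterministicId(collection: string, index: number, seed: number): string {
  const hex = fnv1a(`${collection}:${index}:${seed}`).toString(16);
  return ("0000000" + hex).slice(-8);
}

// FK(i) = id(parent, i mod |parent|): computed without any database lookup.
function foreignKey(
  parent: string,
  parentCount: number,
  childIndex: number,
  seed: number,
): string {
  return deterministicId(parent, childIndex % parentCount, seed);
}

const seed = 12345;
const userIds = new Set(
  Array.from({ length: 100 }, (_, i) => deterministicId("users", i, seed)),
);
const allResolve = Array.from({ length: 1000 }, (_, i) =>
  foreignKey("users", 100, i, seed),
).every((fk) => userIds.has(fk));
console.log(allResolve); // true
```

Every foreign key resolves by construction. A 32-bit hash can still collide between two parent indices, so a production scheme would need a wider hash or an explicit uniqueness step for primary keys.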

3. Cycle Detection and Breaking

Theorem: Any finite directed graph can be made acyclic by removing a suitable set of edges (a feedback arc set); at least one edge must be removed from every cycle.

Algorithm: Modified DFS with cycle breaking:

DETECT-CYCLE(G):
  visited ← ∅
  recursionStack ← ∅
  
  DFS(v):
    visited.add(v)
    recursionStack.add(v)
    
    for each neighbor u of v:
      if u ∉ visited:
        if DFS(u) return true
      if u ∈ recursionStack:
        return CYCLE-DETECTED(v, u)
    
    recursionStack.delete(v)
    return false
  
  for each vertex v:
    if v ∉ visited:
      if DFS(v) return true
  
  return false

Breaking Strategy: When cycles are detected, prioritize removing weak dependencies (non-required FKs) to preserve data integrity.
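One way to combine detection with the breaking strategy is sketched below. The `Dep` shape, the `required` flag, and the rule of removing one edge per detected cycle are assumptions for illustration; the library's own heuristics may differ.

```typescript
interface Dep {
  from: string;      // collection holding the FK
  to: string;        // referenced collection
  required: boolean; // false marks a nullable ("weak") FK
}

// Returns the edges of the first cycle found, or null if the graph is acyclic.
function findCycle(nodes: string[], deps: Dep[]): Dep[] | null {
  const visited = new Set<string>();
  const onStack = new Set<string>();
  const path: Dep[] = []; // edges on the current DFS path

  const dfs = (v: string): Dep[] | null => {
    visited.add(v);
    onStack.add(v);
    for (const e of deps.filter((d) => d.from === v)) {
      path.push(e);
      if (onStack.has(e.to)) {
        // Back edge: the cycle starts at the path edge leaving e.to.
        return path.slice(path.findIndex((p) => p.from === e.to));
      }
      if (!visited.has(e.to)) {
        const found = dfs(e.to);
        if (found) return found;
      }
      path.pop();
    }
    onStack.delete(v);
    return null;
  };

  for (const n of nodes) {
    if (!visited.has(n)) {
      const found = dfs(n);
      if (found) return found;
    }
  }
  return null;
}

// Removes one edge per detected cycle, preferring non-required FKs.
function breakCycles(nodes: string[], deps: Dep[]): { kept: Dep[]; removed: Dep[] } {
  let kept = [...deps];
  const removed: Dep[] = [];
  let cycle = findCycle(nodes, kept);
  while (cycle) {
    const victim = cycle.find((e) => !e.required) ?? cycle[0];
    if (!victim) break; // unreachable: a cycle has at least one edge
    kept = kept.filter((e) => e !== victim);
    removed.push(victim);
    cycle = findCycle(nodes, kept);
  }
  return { kept, removed };
}

const cycleNodes = ["users", "teams"];
const cycleDeps: Dep[] = [
  { from: "users", to: "teams", required: false }, // nullable FK
  { from: "teams", to: "users", required: true },  // required FK
];
const broken = breakCycles(cycleNodes, cycleDeps);
console.log(broken.removed.map((e) => `${e.from}->${e.to}`)); // ["users->teams"]
```

The loop terminates because each pass removes one edge, and the result is acyclic once `findCycle` returns `null`.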

4. Field Inference Scoring

Problem: Given a field name $f$, select the best generator from a rule set $R$.

Algorithm: Score-based matching:

$$\text{score}(r, f) = r_{score} + \text{match}(r, f) - \text{noise}(r, f)$$

Where:

  • $\text{match}(r, f) = 5$ if $|tokens(f)| = |tokens(r)|$ (perfect match), and $0$ otherwise
  • $\text{noise}(r, f) = 0.5 \times (|tokens(f)| - |tokens(r)|)$

Select $r^* = \text{argmax}_r \text{score}(r, f)$
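The scoring formula can be exercised with a short sketch. The `Rule` shape, the snake_case tokenizer, and the requirement that every rule token appear in the field name before a rule is scored are assumptions for this example; only the arithmetic follows the formula above.

```typescript
interface Rule {
  name: string;
  tokens: string[];
  baseScore: number; // r_score in the formula
}

// Splits on non-alphanumeric characters; camelCase is not handled here.
const tokenize = (field: string): string[] =>
  field.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Returns null when the rule does not apply to the field at all.
function score(rule: Rule, field: string): number | null {
  const fieldTokens = tokenize(field);
  if (!rule.tokens.every((t) => fieldTokens.includes(t))) return null;
  const match = fieldTokens.length === rule.tokens.length ? 5 : 0;
  const noise = 0.5 * (fieldTokens.length - rule.tokens.length);
  return rule.baseScore + match - noise;
}

// argmax over all applicable rules.
function bestRule(rules: Rule[], field: string): Rule | null {
  let best: Rule | null = null;
  let bestScore = -Infinity;
  for (const rule of rules) {
    const s = score(rule, field);
    if (s !== null && s > bestScore) {
      best = rule;
      bestScore = s;
    }
  }
  return best;
}

const rules: Rule[] = [
  { name: "status", tokens: ["status"], baseScore: 10 },
  { name: "flight_status", tokens: ["flight", "status"], baseScore: 10 },
];

// "flight_status": status scores 10 + 0 - 0.5 = 9.5, flight_status scores 10 + 5 - 0 = 15
console.log(bestRule(rules, "flight_status")?.name); // "flight_status"
// "current_flight_status": status scores 9, flight_status scores 9.5
console.log(bestRule(rules, "current_flight_status")?.name); // "flight_status"
```

The noise term penalizes rules that leave more of the field name unexplained, so the more specific rule wins even when no rule matches perfectly.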

5. Composite FK Resolution

For composite FKs $(f_1, ..., f_k) \rightarrow (p_1, ..., p_k)$:

  1. Select parent row index $r = i \mod |parent|$
  2. Retrieve cached parent row $P[r]$
  3. For each component $f_j$: $$value[f_j] = P[r][p_j]$$

This ensures all FK components reference the same parent row.
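The three steps above can be sketched as follows. The function name `resolveCompositeFk`, the `Row` type, and the in-memory `parentRows` array standing in for the parent cache are hypothetical.

```typescript
type Row = Record<string, unknown>;

// Copies every key column from a single cached parent row, so the components
// of the composite FK cannot point at different parents.
function resolveCompositeFk(
  parentRows: Row[],
  childIndex: number,
  fromFields: string[],
  toFields: string[],
): Row {
  if (parentRows.length === 0) throw new Error("Parent collection is empty");
  if (fromFields.length !== toFields.length) {
    throw new Error("fromFields and toFields must have the same length");
  }
  // Step 1 and 2: pick the parent row index r = i mod |parent| and fetch P[r].
  const parent = parentRows[childIndex % parentRows.length] as Row;
  // Step 3: value[f_j] = P[r][p_j] for each component.
  const fk: Row = {};
  fromFields.forEach((f, j) => {
    fk[f] = parent[toFields[j] as string];
  });
  return fk;
}

const parents: Row[] = [
  { tenant_id: "t1", order_no: 1 },
  { tenant_id: "t1", order_no: 2 },
  { tenant_id: "t2", order_no: 1 },
];
const compositeFk = resolveCompositeFk(
  parents,
  4,
  ["order_tenant", "order_ref"],
  ["tenant_id", "order_no"],
);
console.log(compositeFk); // { order_tenant: "t1", order_ref: 2 } since 4 mod 3 = 1
```

Resolving each component independently could yield a pair such as `("t2", 2)` that matches no parent row, which is the failure this approach avoids.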

6. Cross-Column Constraint Satisfaction

For constraints like $A > B$ where $B$ is generated first:

$$value[A] = \max(generated, value[B] + \delta)$$

Where $\delta > 0$ is a small deterministic offset to maintain both uniqueness and constraint satisfaction.
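A minimal sketch of this rule for numeric columns follows. The helper name `satisfyGreaterThan` and the way the offset is derived from the row index are assumptions for illustration.

```typescript
// Enforces A > B by lifting A to at least B + delta; delta must be positive.
function satisfyGreaterThan(generatedA: number, valueB: number, delta: number): number {
  if (delta <= 0) throw new Error("delta must be positive");
  return Math.max(generatedA, valueB + delta);
}

// Example: end_time must come after start_time (milliseconds since epoch).
const startTime = 1700000000000;
const rowIndex = 7;
const delta = 1000 * (rowIndex + 1); // hypothetical offset derived from the row index

// Generated value violates the constraint, so it is lifted to start + delta.
console.log(satisfyGreaterThan(1699999000000, startTime, delta)); // 1700000008000
// Generated value already satisfies the constraint, so it is kept.
console.log(satisfyGreaterThan(1700000500000, startTime, delta)); // 1700000500000
```

Because the offset depends only on the row index, repeated runs with the same seed produce the same adjusted values.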


Usage

Installation

npm install @solvaratech/drawline-core

Basic Generation

import { TestDataGeneratorService } from "@solvaratech/drawline-core/server";
import { PostgresAdapter } from "@solvaratech/drawline-core/generator/adapters/PostgresAdapter";

// 1. Configure adapter
const adapter = new PostgresAdapter({
  connectionString: "postgres://user:pass@localhost:5432/mydb"
});
await adapter.connect();

// 2. Initialize service
const service = new TestDataGeneratorService(adapter);

// 3. Define schema
const collections = [
  {
    id: "users",
    name: "users",
    fields: [
      { id: "id", name: "id", type: "uuid", isPrimaryKey: true },
      { id: "email", name: "email", type: "string", required: true },
      { id: "name", name: "name", type: "string" }
    ]
  },
  {
    id: "posts",
    name: "posts",
    fields: [
      { id: "id", name: "id", type: "uuid", isPrimaryKey: true },
      { id: "user_id", name: "user_id", type: "uuid", isForeignKey: true, 
        referencedCollectionId: "users" },
      { id: "title", name: "title", type: "string" }
    ]
  }
];

const relationships = [
  {
    id: "posts->users",
    fromCollectionId: "posts",
    toCollectionId: "users",
    type: "many-to-one",
    fromField: "user_id",
    toField: "id"
  }
];

// 4. Generate configuration
const config = {
  collections: [
    { collectionName: "users", count: 100 },
    { collectionName: "posts", count: 1000 }
  ],
  seed: 12345
};

// 5. Execute generation
const result = await service.generateAndPopulate(
  collections, 
  relationships, 
  config
);

console.log(`Generated ${result.totalDocumentsGenerated} documents`);

Schema Diff and Migration

import { computeSchemaDiff, generateDDL } from "@solvaratech/drawline-core/schema";

// Compare current schema with database
const diff = computeSchemaDiff(databaseSnapshot, newSchema, "additive");

// Generate migration SQL
const statements = generateDDL(diff);

for (const stmt of statements) {
  console.log(stmt.sql);
}

ORM Code Generation

import { PrismaGenerator } from "@solvaratech/drawline-core/generators/orm";

const generator = new PrismaGenerator();
const output = generator.generate(collections, relationships);

console.log(output.content); // Prisma schema.prisma content

Roadmap

Short Term (v0.2.0 - v0.3.0)

  • Enhanced Validation: Post-generation integrity validation
  • Data masking: Sensitive data identification and redaction
  • Incremental generation: Delta seeding for existing databases
  • Distribution profiles: Normal, exponential, power-law distributions
  • Relationship visualization: Draw relationship graphs

Medium Term (v0.4.0 - v0.5.0)

  • Web UI Dashboard: Visual schema editor and generator interface
  • Data Templates: Reusable generation templates
  • Export formats: More export adapters (Excel, JSON Lines)
  • Audit logging: Generation audit trail
  • CI/CD integration: GitHub Actions, GitLab CI

Long Term (v1.0.0)

  • GraphQL API: REST/GraphQL API for remote generation
  • Multi-tenant: Isolated multi-tenant support
  • Enterprise features: SSO, RBAC, audit
  • Cloud dashboard: SaaS management console
  • Plug-in system: Third-party generator plugins

Development

Prerequisites

  • Node.js 18+
  • TypeScript 5.9+
  • pnpm or npm

Setup

npm install
npm run build

Testing

# Run all tests
npm test

# Watch mode
npm run test:watch

# UI
npm run test:ui

# CI (with coverage)
npm run test:ci

Type Checking

npm run type-check

CLI

npm run cli:build
npm link  # Link globally

drawline init
drawline gen --schema schema.json --config config.json

API Reference

Core Exports

// Main exports
export * from "./types/schemaDesign";      // Schema types
export * from "./types/schemaDiff";       // Diff types
export * from "./utils/schemaConverter";  // Converters
export * from "./utils/errorMessages"; // Errors
export * from "./schema";              // Schema engine
export * from "./generators/orm";      // ORM generators

// Server exports
export * from "./connections";         // Database connections
export * from "./generator";         // Generation engine

Key Interfaces

interface SchemaCollection {
  id: string;
  name: string;
  fields: SchemaField[];
  schema?: string;
  dbName?: string;
  position?: { x: number; y: number };
}

interface SchemaField {
  id: string;
  name: string;
  type: FieldType;
  required?: boolean;
  isPrimaryKey?: boolean;
  isForeignKey?: boolean;
  isSerial?: boolean;
  compositePrimaryKeyIndex?: number;
  compositeKeyGroup?: string;
  referencedCollectionId?: string;
  foreignKeyTarget?: string;
  rawType?: string;
  arrayItemType?: string;
  defaultValue?: any;
  constraints?: FieldConstraints;
}

interface SchemaRelationship {
  id: string;
  fromCollectionId: string;
  toCollectionId: string;
  type: "one-to-one" | "one-to-many" | "many-to-many";
  fromField?: string;
  toField?: string;
  fromFields?: string[];
  toFields?: string[];
}

interface TestDataConfig {
  collections: CollectionConfig[];
  seed?: number | string;
  batchSize?: number;
  onProgress?: (progress: ProgressUpdate) => Promise<void>;
}

License

MIT License. See LICENSE file for details.


Contributing

See CONTRIBUTING.md for development guidelines.

