Comprehensive synthetic data generation for workforce modeling, organizational planning, and HR system testing using agentic-synth.
This directory contains realistic employee behavior simulations designed for:
- HR System Testing: Test HR platforms with realistic synthetic employee data
- Workforce Planning: Model hiring needs, skill gaps, and organizational changes
- Organizational Analysis: Analyze team dynamics, culture, and performance patterns
- Process Optimization: Optimize onboarding, reviews, and development programs
- Predictive Analytics: Train ML models for turnover prediction and workforce forecasting
Employee daily behavior patterns and productivity modeling.
Includes:
- Daily work schedules (flexible, remote, hybrid)
- Productivity patterns with time-of-day variations
- Collaboration and communication patterns
- Meeting attendance and participation
- Task completion rates and patterns
- Work location preferences (WFH vs office)
Use Cases:
- Productivity analytics
- Collaboration optimization
- Meeting culture analysis
- Remote work planning
- Time tracking validation
Employee performance metrics and achievement data.
Includes:
- KPI achievement across roles
- Project deliverables tracking
- Code commits and review metrics (developers)
- Sales targets and achievements
- Quality metrics and defect rates
- Learning and development progress
- 360-degree performance reviews
Use Cases:
- Performance management systems
- OKR tracking platforms
- Sales analytics
- Engineering metrics
- Quality assurance
Team formation, leadership, and cultural patterns.
Includes:
- Team formation and evolution
- Cross-functional collaboration
- Leadership effectiveness metrics
- Mentorship relationships
- Organizational culture indicators
- Succession planning scenarios
Use Cases:
- Org design planning
- Leadership development
- Team health monitoring
- Culture measurement
- Succession planning
Strategic HR planning and forecasting data.
Includes:
- Hiring needs forecasting
- Skill gap analysis
- Turnover predictions with risk factors
- Compensation analysis and equity
- Career progression paths
- Workforce diversity metrics
Use Cases:
- Headcount planning
- Budget forecasting
- Retention strategies
- Compensation planning
- Diversity initiatives
Lifecycle events and HR processes.
Includes:
- Onboarding journeys
- Offboarding and exit analytics
- Promotions and transfers
- Performance review cycles
- Training and development events
- Team building activities
- Conflict resolution scenarios
Use Cases:
- HRIS event modeling
- Process optimization
- Employee journey mapping
- Compliance testing
- Learning management systems
SYNTHETIC DATA ONLY
- All data is 100% synthetic and generated by AI
- No real employee data is included or should be used
- Never train models on real employee data without consent
ETHICAL USE
- Use only for system testing and planning
- Never use to make actual decisions about real employees
- Maintain appropriate security for even synthetic HR data
- Be aware of and mitigate algorithmic bias
PRIVACY CONSIDERATIONS
- Treat synthetic data as if it were real (practice good habits)
- Don't mix synthetic and real data
- Follow all applicable privacy regulations (GDPR, CCPA, etc.)
- Document that data is synthetic in all systems
BIAS MITIGATION
- Simulations include diverse populations
- Avoid reinforcing stereotypes
- Include representation across all demographics
- Test for disparate impact in any derived models
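One common way to test for disparate impact is the four-fifths rule: compare each group's selection rate to the highest group's rate and flag any ratio below 0.8. The sketch below is a minimal illustration under stated assumptions, not part of agentic-synth; the function name `adverseImpactRatios` and the `group` and `promoted` fields are hypothetical.

```javascript
// Hypothetical helper, not part of agentic-synth.
// Computes each group's selection rate and its ratio to the highest rate,
// flagging ratios below 0.8 (the four-fifths rule of thumb).
function adverseImpactRatios(records, groupKey, outcomeKey) {
  const counts = new Map(); // group -> { total, selected }
  for (const record of records) {
    const group = record[groupKey];
    const entry = counts.get(group) ?? { total: 0, selected: 0 };
    entry.total += 1;
    if (record[outcomeKey]) entry.selected += 1;
    counts.set(group, entry);
  }

  const rates = [...counts].map(([group, { total, selected }]) => ({
    group,
    rate: selected / total,
  }));
  const maxRate = Math.max(...rates.map((r) => r.rate));

  return rates.map(({ group, rate }) => {
    // If nobody was selected in any group, there is no disparity to report.
    const ratio = maxRate === 0 ? 1 : rate / maxRate;
    return { group, rate, ratio, flagged: ratio < 0.8 };
  });
}

// Synthetic example: group A is promoted 5 of 10 times, group B 3 of 10.
const promotionSample = [
  ...Array.from({ length: 10 }, (_, i) => ({ group: 'A', promoted: i < 5 })),
  ...Array.from({ length: 10 }, (_, i) => ({ group: 'B', promoted: i < 3 })),
];
console.log(adverseImpactRatios(promotionSample, 'group', 'promoted'));
```

In this sample, group B's rate is 0.6 of group A's, so it is flagged. A flag is a prompt for closer review, not a legal conclusion.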
COMPLIANCE
- Ensure generated data complies with equal employment laws
- Don't generate protected class data inappropriately
- Follow pay equity and anti-discrimination guidelines
- Consult legal counsel for production use
import { generateWorkSchedules } from './workforce-behavior.js';
import { generateKPIData } from './performance-data.js';
// Generate 500 realistic work schedules
const schedules = await generateWorkSchedules();
console.log(`Generated ${schedules.data.length} schedules`);
// Generate performance KPI data
const kpis = await generateKPIData();
console.log(`Generated ${kpis.data.length} KPI records`);
import { createSynth } from '../../src/index.js';
const synth = createSynth({
provider: 'gemini',
apiKey: process.env.GEMINI_API_KEY,
cacheStrategy: 'memory'
});
// Generate multiple datasets in parallel
const [schedules, performance, reviews] = await Promise.all([
synth.generateStructured({ count: 1000, schema: scheduleSchema }),
synth.generateStructured({ count: 500, schema: performanceSchema }),
synth.generateStructured({ count: 300, schema: reviewSchema })
]);
import { createSynth } from '../../src/index.js';
const synth = createSynth({ streaming: true });
// Stream productivity data
for await (const dataPoint of synth.generateStream('timeseries', {
count: 5000,
interval: '1h',
metrics: ['productivityScore', 'focusLevel']
})) {
// Process each data point as it's generated
await processProductivityData(dataPoint);
}
// Generate data for specific scenarios
const seniorEngineerSchema = {
role: { type: 'string', default: 'senior_engineer' },
yearsExperience: { type: 'number', min: 5, max: 15 },
// ... more fields
};
const result = await synth.generateStructured({
count: 100,
schema: seniorEngineerSchema,
context: 'Generate profiles for senior engineers in a fast-growing startup'
});
# Required for AI-powered generation
GEMINI_API_KEY=your_gemini_api_key
# Optional: Alternative providers
OPENROUTER_API_KEY=your_openrouter_key
const config = {
provider: 'gemini', // or 'openrouter'
model: 'gemini-2.0-flash-exp', // or specific model
cacheStrategy: 'memory', // Enable caching for faster re-runs
cacheTTL: 3600, // Cache for 1 hour
maxRetries: 3,
timeout: 30000
};
- Statistical Accuracy: Distributions match industry benchmarks
- Temporal Patterns: Seasonal and cyclical variations
- Correlations: Realistic relationships between variables
- Edge Cases: Includes outliers and unusual patterns
- Diversity: Represents varied demographics and experiences
Each example includes:
- Schema validation for data structure
- Range validation for numeric values
- Enum validation for categorical data
- Relationship validation for correlated fields
- Statistical validation against benchmarks
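The schema, range, enum, and relationship checks above could be sketched roughly as follows. This is an illustration only: the rule format, the `validateRecord` function, and the enum values are hypothetical and are not the agentic-synth validation API. The salary and seniority constraints echo the examples elsewhere in this README. Statistical validation against benchmarks is not covered here.

```javascript
// Hypothetical rule format, loosely modeled on the schema style in this README.
const employeeRules = {
  fields: {
    role: { type: 'string', enum: ['engineer', 'senior_engineer', 'manager'] },
    yearsExperience: { type: 'number', min: 0, max: 50 },
    salary: { type: 'number', min: 1 },
  },
  // Relationship checks run only after every field passes on its own.
  relations: [
    {
      message: 'senior_engineer requires yearsExperience >= 5',
      check: (r) => r.role !== 'senior_engineer' || r.yearsExperience >= 5,
    },
  ],
};

// Returns a list of human-readable problems; an empty list means valid.
function validateRecord(record, rules) {
  const errors = [];
  for (const [field, rule] of Object.entries(rules.fields)) {
    const value = record[field];
    // Schema check: the field must exist and have the declared type.
    if (typeof value !== rule.type || Number.isNaN(value)) {
      errors.push(`${field}: expected ${rule.type}`);
      continue;
    }
    // Range check for numeric fields.
    if (rule.min !== undefined && value < rule.min) {
      errors.push(`${field}: below minimum ${rule.min}`);
    }
    if (rule.max !== undefined && value > rule.max) {
      errors.push(`${field}: above maximum ${rule.max}`);
    }
    // Enum check for categorical fields.
    if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${field}: not one of ${rule.enum.join(', ')}`);
    }
  }
  if (errors.length === 0) {
    for (const relation of rules.relations ?? []) {
      if (!relation.check(record)) errors.push(relation.message);
    }
  }
  return errors;
}

console.log(
  validateRecord({ role: 'senior_engineer', yearsExperience: 2, salary: 150000 }, employeeRules)
);
```

Returning a list of messages rather than throwing makes it easier to report every problem in a generated batch at once.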
- Generation Speed: 100-1000 records/second (cached)
- Memory Usage: ~1MB per 1000 records
- Batch Processing: Parallel generation for large datasets
- Streaming: Low memory footprint for large volumes
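As a rough planning aid, the figures above can be turned into a back-of-the-envelope estimate. The `estimateRun` function below is hypothetical and simply scales the quoted numbers linearly; actual throughput and memory use will depend on the provider, model, schema, and cache state.

```javascript
// Hypothetical estimator based only on the figures quoted in this README:
// 100-1000 records/second (cached) and about 1 MB per 1000 records.
function estimateRun(recordCount) {
  const MB_PER_1000_RECORDS = 1;
  const MIN_RECORDS_PER_SECOND = 100;
  const MAX_RECORDS_PER_SECOND = 1000;
  return {
    memoryMB: (recordCount / 1000) * MB_PER_1000_RECORDS,
    bestCaseSeconds: recordCount / MAX_RECORDS_PER_SECOND,
    worstCaseSeconds: recordCount / MIN_RECORDS_PER_SECOND,
  };
}

// 50,000 records: about 50 MB, and between 50 and 500 seconds when cached.
console.log(estimateRun(50000));
```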
- Workday, SAP SuccessFactors, Oracle HCM
- BambooHR, Namely, Rippling
- Custom HRIS implementations
- Tableau, Power BI, Looker
- People analytics tools
- Custom dashboards
- Sklearn, PyTorch, TensorFlow
- Feature engineering pipelines
- Model training and validation
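Many of the analytics and ML tools listed above can read CSV, so one possible hand-off is to serialize generated records to CSV text. The sketch below assumes records are flat objects; `toCsv` is a hypothetical helper, not an agentic-synth export function, and it follows the usual quoting convention of wrapping fields that contain commas, quotes, or line breaks in double quotes.

```javascript
// Hypothetical helper, not part of agentic-synth.
// Serializes flat records to CSV text using the given column order.
function toCsv(records, columns) {
  const escapeField = (value) => {
    const text = value === null || value === undefined ? '' : String(value);
    // Quote fields containing delimiters, quotes, or line breaks,
    // doubling any embedded quotes.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const header = columns.map(escapeField).join(',');
  const rows = records.map((record) =>
    columns.map((column) => escapeField(record[column])).join(',')
  );
  return [header, ...rows].join('\r\n');
}

// Synthetic example records.
const csvSample = [
  { name: 'Doe, Jane', role: 'senior_engineer', score: 4 },
  { name: 'Sam "Ace" Lee', role: 'manager', score: null },
];
console.log(toCsv(csvSample, ['name', 'role', 'score']));
```

Passing the column list explicitly keeps the output stable even when some generated records omit optional fields.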
// Generate small sample first
const sample = await generateKPIData();
console.log(sample.data.slice(0, 3)); // Review quality
// Enable caching for iterative development
const synth = createSynth({ cacheStrategy: 'memory' });
// Check data quality (assert comes from Node's built-in assert module)
import assert from 'node:assert';
const result = await synth.generateStructured({ count: 100, schema });
assert(result.data.every(r => r.salary > 0));
// Always document that data is synthetic
const metadata = {
synthetic: true,
generated: new Date(),
purpose: 'Testing HRIS integration',
source: 'agentic-synth'
};
// Include edge cases in generation
const context = `
Include edge cases:
- New hires (0-90 days)
- Long tenure (10+ years)
- High performers and strugglers
- Various leave scenarios
`;
Slow Generation
- Enable caching (cacheStrategy: 'memory')
- Use batch processing for large volumes
- Consider using faster models
Unrealistic Data
- Adjust context prompts for more specificity
- Review and refine schemas
- Add validation constraints
Memory Issues
- Use streaming for large datasets
- Process in batches
- Clear cache periodically
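Processing in batches can be as simple as grouping records from any iterable into fixed-size chunks, so only one chunk is held in memory at a time. The `inBatches` generator below is a hypothetical helper, not part of agentic-synth, and the `fakeRecords` source stands in for whatever produces your records.

```javascript
// Hypothetical helper, not part of agentic-synth.
// Lazily groups items from any iterable into arrays of at most batchSize.
function* inBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  let batch = [];
  for (const item of items) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = []; // Drop the reference so the previous batch can be collected.
    }
  }
  if (batch.length > 0) yield batch; // Final partial batch.
}

// Stand-in record source that produces records one at a time.
function* fakeRecords(count) {
  for (let i = 0; i < count; i += 1) {
    yield { employeeId: i, synthetic: true };
  }
}

const batchSizes = [];
for (const batch of inBatches(fakeRecords(2500), 1000)) {
  batchSizes.push(batch.length); // Replace with real per-batch processing.
}
console.log(batchSizes);
```

Because both the source and the batching are generators, the full set of 2500 records never exists in memory at once.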
- GitHub Issues: ruvector/issues
- Documentation: agentic-synth docs
Contributions welcome! Please ensure:
- Synthetic data remains realistic
- Privacy and ethics guidelines are followed
- Documentation is updated
- Tests are included
MIT License - see LICENSE file in repository root.
This software generates synthetic data for testing and planning purposes only. It should not be used to make decisions about real employees. Users are responsible for ensuring compliance with all applicable laws and regulations, including employment law, privacy law, and anti-discrimination law. The authors and maintainers assume no liability for misuse of this software.
Remember: This is synthetic data for testing. Always prioritize real employee privacy, dignity, and fairness in actual HR systems and processes.