Employee Simulation Examples

Comprehensive synthetic data generation for workforce modeling, organizational planning, and HR system testing using agentic-synth.

Overview

This directory contains realistic employee behavior simulations designed for:

  • HR System Testing: Test HR platforms with realistic synthetic employee data
  • Workforce Planning: Model hiring needs, skill gaps, and organizational changes
  • Organizational Analysis: Analyze team dynamics, culture, and performance patterns
  • Process Optimization: Optimize onboarding, reviews, and development programs
  • Predictive Analytics: Train ML models for turnover prediction and workforce forecasting

Files

1. workforce-behavior.ts

Employee daily behavior patterns and productivity modeling.

Includes:

  • Daily work schedules (flexible, remote, hybrid)
  • Productivity patterns with time-of-day variations
  • Collaboration and communication patterns
  • Meeting attendance and participation
  • Task completion rates and patterns
  • Work location preferences (WFH vs office)

Use Cases:

  • Productivity analytics
  • Collaboration optimization
  • Meeting culture analysis
  • Remote work planning
  • Time tracking validation

2. performance-data.ts

Employee performance metrics and achievement data.

Includes:

  • KPI achievement across roles
  • Project deliverables tracking
  • Code commits and review metrics (developers)
  • Sales targets and achievements
  • Quality metrics and defect rates
  • Learning and development progress
  • 360-degree performance reviews

Use Cases:

  • Performance management systems
  • OKR tracking platforms
  • Sales analytics
  • Engineering metrics
  • Quality assurance

3. organizational-dynamics.ts

Team formation, leadership, and cultural patterns.

Includes:

  • Team formation and evolution
  • Cross-functional collaboration
  • Leadership effectiveness metrics
  • Mentorship relationships
  • Organizational culture indicators
  • Succession planning scenarios

Use Cases:

  • Org design planning
  • Leadership development
  • Team health monitoring
  • Culture measurement
  • Succession planning

4. workforce-planning.ts

Strategic HR planning and forecasting data.

Includes:

  • Hiring needs forecasting
  • Skill gap analysis
  • Turnover predictions with risk factors
  • Compensation analysis and equity
  • Career progression paths
  • Workforce diversity metrics

Use Cases:

  • Headcount planning
  • Budget forecasting
  • Retention strategies
  • Compensation planning
  • Diversity initiatives

5. workplace-events.ts

Lifecycle events and HR processes.

Includes:

  • Onboarding journeys
  • Offboarding and exit analytics
  • Promotions and transfers
  • Performance review cycles
  • Training and development events
  • Team building activities
  • Conflict resolution scenarios

Use Cases:

  • HRIS event modeling
  • Process optimization
  • Employee journey mapping
  • Compliance testing
  • Learning management systems

Privacy & Ethics

Critical Guidelines

SYNTHETIC DATA ONLY

  • All data is 100% synthetic and generated by AI
  • No real employee data is included or should be used
  • Never train models on real employee data without consent

ETHICAL USE

  • Use only for system testing and planning
  • Never use to make actual decisions about real employees
  • Maintain appropriate security even for synthetic HR data
  • Be aware of and mitigate algorithmic bias

PRIVACY CONSIDERATIONS

  • Treat synthetic data as if it were real (practice good habits)
  • Don't mix synthetic and real data
  • Follow all applicable privacy regulations (GDPR, CCPA, etc.)
  • Document that data is synthetic in all systems

BIAS MITIGATION

  • Simulations include diverse populations
  • Avoid reinforcing stereotypes
  • Include representation across all demographics
  • Test for disparate impact in any derived models
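
One way to act on the last point is to compare selection rates across groups. The sketch below computes each group's selection rate relative to the highest-rate group and flags ratios under 0.8, the common "four-fifths" rule of thumb. The `OutcomeRecord` shape and all function names are hypothetical and are not part of agentic-synth; treat the threshold as a screening heuristic rather than a legal test.

```typescript
// Hypothetical record shape: one row per synthetic employee with a binary outcome
// (for example "promoted" or "flagged by a model"). Not an agentic-synth type.
interface OutcomeRecord {
  group: string;
  selected: boolean;
}

// Selection rate per group = selected count / group size.
function selectionRates(records: OutcomeRecord[]): Record<string, number> {
  const counts: Record<string, { selected: number; total: number }> = {};
  for (const r of records) {
    const c = counts[r.group] || (counts[r.group] = { selected: 0, total: 0 });
    c.total += 1;
    if (r.selected) c.selected += 1;
  }
  const rates: Record<string, number> = {};
  for (const group of Object.keys(counts)) {
    rates[group] = counts[group].selected / counts[group].total;
  }
  return rates;
}

// Ratio of each group's rate to the highest group rate (1 means parity with the top group).
function impactRatios(records: OutcomeRecord[]): Record<string, number> {
  const rates = selectionRates(records);
  const groups = Object.keys(rates);
  const highest = groups.reduce((max, g) => Math.max(max, rates[g]), 0);
  const ratios: Record<string, number> = {};
  for (const g of groups) {
    ratios[g] = highest === 0 ? 1 : rates[g] / highest;
  }
  return ratios;
}

// Groups whose ratio falls below the threshold (0.8 by default).
function flaggedGroups(records: OutcomeRecord[], threshold = 0.8): string[] {
  const ratios = impactRatios(records);
  return Object.keys(ratios).filter(g => ratios[g] < threshold);
}

// Hand-built sample: group A selected 5 of 10, group B selected 3 of 10.
function makeGroup(group: string, total: number, selected: number): OutcomeRecord[] {
  const rows: OutcomeRecord[] = [];
  for (let i = 0; i < total; i++) rows.push({ group, selected: i < selected });
  return rows;
}

const sampleOutcomes = makeGroup('A', 10, 5).concat(makeGroup('B', 10, 3));
const sampleFlagged = flaggedGroups(sampleOutcomes);
console.log(impactRatios(sampleOutcomes), sampleFlagged);
```

In this sample, group B's rate (0.3) is about 0.6 of group A's (0.5), so B is flagged. The same check can be run on the outputs of any model trained on the synthetic data.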

COMPLIANCE

  • Ensure generated data complies with equal employment laws
  • Don't generate protected class data inappropriately
  • Follow pay equity and anti-discrimination guidelines
  • Consult legal counsel for production use

Usage Examples

Basic Usage

import { generateWorkSchedules } from './workforce-behavior.js';
import { generateKPIData } from './performance-data.js';

// Generate realistic work schedules (no count is passed, so the record count comes from the function's own defaults)
const schedules = await generateWorkSchedules();
console.log(`Generated ${schedules.data.length} schedules`);

// Generate performance KPI data
const kpis = await generateKPIData();
console.log(`Generated ${kpis.data.length} KPI records`);

Batch Generation

import { createSynth } from '../../src/index.js';

const synth = createSynth({
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  cacheStrategy: 'memory'
});

// Generate multiple datasets in parallel.
// scheduleSchema, performanceSchema, and reviewSchema are assumed to be defined elsewhere.
const [schedules, performance, reviews] = await Promise.all([
  synth.generateStructured({ count: 1000, schema: scheduleSchema }),
  synth.generateStructured({ count: 500, schema: performanceSchema }),
  synth.generateStructured({ count: 300, schema: reviewSchema })
]);

Streaming Generation

import { createSynth } from '../../src/index.js';

const synth = createSynth({ streaming: true });

// Stream productivity data
for await (const dataPoint of synth.generateStream('timeseries', {
  count: 5000,
  interval: '1h',
  metrics: ['productivityScore', 'focusLevel']
})) {
  // Process each data point as it's generated (processProductivityData is your own handler)
  await processProductivityData(dataPoint);
}

Custom Scenarios

// Generate data for specific scenarios
const seniorEngineerSchema = {
  role: { type: 'string', default: 'senior_engineer' },
  yearsExperience: { type: 'number', min: 5, max: 15 },
  // ... more fields
};

const result = await synth.generateStructured({
  count: 100,
  schema: seniorEngineerSchema,
  context: 'Generate profiles for senior engineers in a fast-growing startup'
});

Configuration

Environment Variables

# Required for AI-powered generation
GEMINI_API_KEY=your_gemini_api_key

# Optional: Alternative providers
OPENROUTER_API_KEY=your_openrouter_key
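
A script that supports both providers may want to fail fast when neither key is set. The following is a minimal sketch of that check, not part of agentic-synth: `resolveProvider` and `ProviderConfig` are hypothetical names, and the preference for Gemini over OpenRouter is an assumption based on the order of the variables above.

```typescript
type Provider = 'gemini' | 'openrouter';

// Hypothetical helper type: mirrors the provider/apiKey options shown in this README.
interface ProviderConfig {
  provider: Provider;
  apiKey: string;
}

// Takes the environment as a parameter so it can be tested without touching process.env.
// Empty strings are treated the same as unset variables.
function resolveProvider(env: Record<string, string | undefined>): ProviderConfig {
  const gemini = env.GEMINI_API_KEY;
  if (gemini) return { provider: 'gemini', apiKey: gemini };

  const openrouter = env.OPENROUTER_API_KEY;
  if (openrouter) return { provider: 'openrouter', apiKey: openrouter };

  throw new Error('Set GEMINI_API_KEY or OPENROUTER_API_KEY before generating data');
}

// Usage with a stand-in environment object; in a real script pass process.env instead.
const resolved = resolveProvider({ OPENROUTER_API_KEY: 'your_openrouter_key' });
console.log(resolved.provider);
```

The returned object has the same `provider` and `apiKey` fields that the batch example passes to `createSynth`, so it can be spread into that call.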

Generation Options

const config = {
  provider: 'gemini', // or 'openrouter'
  model: 'gemini-2.0-flash-exp', // or specific model
  cacheStrategy: 'memory', // Enable caching for faster re-runs
  cacheTTL: 3600, // Cache for 1 hour
  maxRetries: 3,
  timeout: 30000
};

Data Quality

Realism Features

  • Statistical Accuracy: Distributions match industry benchmarks
  • Temporal Patterns: Seasonal and cyclical variations
  • Correlations: Realistic relationships between variables
  • Edge Cases: Includes outliers and unusual patterns
  • Diversity: Represents varied demographics and experiences

Validation

Each example includes:

  • Schema validation for data structure
  • Range validation for numeric values
  • Enum validation for categorical data
  • Relationship validation for correlated fields
  • Statistical validation against benchmarks
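
As an illustration of the range and enum checks listed above, here is a small standalone validator. It is a sketch under stated assumptions, not the validation code the examples actually ship: `FieldRule`, `validateRecord`, and the `workLocation` values are hypothetical, while the `yearsExperience` bounds are borrowed from the Custom Scenarios schema.

```typescript
// Hypothetical rule format covering two of the listed checks: numeric range and enum.
type FieldRule =
  | { kind: 'range'; min: number; max: number }
  | { kind: 'enum'; values: string[] };

type Rules = Record<string, FieldRule>;

// Returns a list of human-readable problems; an empty list means the record passed.
function validateRecord(record: Record<string, unknown>, rules: Rules): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(rules)) {
    const rule = rules[field];
    const value = record[field];
    if (rule.kind === 'range') {
      if (typeof value !== 'number' || value < rule.min || value > rule.max) {
        errors.push(`${field}: expected a number in [${rule.min}, ${rule.max}]`);
      }
    } else {
      if (typeof value !== 'string' || rule.values.indexOf(value) === -1) {
        errors.push(`${field}: expected one of ${rule.values.join(', ')}`);
      }
    }
  }
  return errors;
}

const exampleRules: Rules = {
  yearsExperience: { kind: 'range', min: 5, max: 15 },
  workLocation: { kind: 'enum', values: ['office', 'remote', 'hybrid'] },
};

const validErrors = validateRecord({ yearsExperience: 8, workLocation: 'hybrid' }, exampleRules);
const invalidErrors = validateRecord({ yearsExperience: 2, workLocation: 'moon' }, exampleRules);
console.log(validErrors.length, invalidErrors.length);
```

Returning a list of messages instead of throwing on the first failure makes it easier to see every problem in a generated record at once.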

Performance

  • Generation Speed: 100-1000 records/second (cached)
  • Memory Usage: ~1MB per 1000 records
  • Batch Processing: Parallel generation for large datasets
  • Streaming: Low memory footprint for large volumes

Integration

HR Systems

  • Workday, SAP SuccessFactors, Oracle HCM
  • BambooHR, Namely, Rippling
  • Custom HRIS implementations

Analytics Platforms

  • Tableau, Power BI, Looker
  • People analytics tools
  • Custom dashboards

Machine Learning

  • scikit-learn, PyTorch, TensorFlow
  • Feature engineering pipelines
  • Model training and validation

Best Practices

1. Start Small

// Generate small sample first
const sample = await generateKPIData();
console.log(sample.data.slice(0, 3)); // Review quality

2. Use Caching

// Enable caching for iterative development
const synth = createSynth({ cacheStrategy: 'memory' });

3. Validate Output

import assert from 'node:assert';

// Check data quality (`schema` is whichever schema you are generating against)
const result = await synth.generateStructured({ count: 100, schema });
assert(result.data.every(r => r.salary > 0));

4. Document Usage

// Always document that data is synthetic
const metadata = {
  synthetic: true,
  generated: new Date(),
  purpose: 'Testing HRIS integration',
  source: 'agentic-synth'
};

5. Test Edge Cases

// Include edge cases in generation
const context = `
  Include edge cases:
  - New hires (0-90 days)
  - Long tenure (10+ years)
  - High performers and strugglers
  - Various leave scenarios
`;
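
Asking for edge cases in the prompt does not guarantee they appear in the output, so it can help to check afterwards. The sketch below reports which edge cases are absent from a batch. The `EmployeeRecord` fields, the 1 to 5 rating scale, and the thresholds are all assumptions made for illustration; adjust them to whatever your schema produces.

```typescript
// Hypothetical fields; not a type exported by the examples.
interface EmployeeRecord {
  tenureDays: number;
  performanceRating: number; // assumed scale: 1 (low) to 5 (high)
  onLeave: boolean;
}

type EdgeCase = 'newHire' | 'longTenure' | 'highPerformer' | 'struggler' | 'onLeave';

// One predicate per edge case named in the prompt above.
const edgeCaseTests: Record<EdgeCase, (r: EmployeeRecord) => boolean> = {
  newHire: r => r.tenureDays <= 90,
  longTenure: r => r.tenureDays >= 3650, // roughly 10 years
  highPerformer: r => r.performanceRating >= 5,
  struggler: r => r.performanceRating <= 2,
  onLeave: r => r.onLeave,
};

// Returns the edge cases that no record in the batch satisfies.
function missingEdgeCases(records: EmployeeRecord[]): EdgeCase[] {
  return (Object.keys(edgeCaseTests) as EdgeCase[]).filter(
    name => !records.some(r => edgeCaseTests[name](r))
  );
}

const demoRecords: EmployeeRecord[] = [
  { tenureDays: 30, performanceRating: 3, onLeave: false },
  { tenureDays: 4000, performanceRating: 5, onLeave: false },
  { tenureDays: 700, performanceRating: 2, onLeave: false },
];

const missing = missingEdgeCases(demoRecords);
console.log(missing); // ['onLeave']
```

If the list is not empty, tightening the context prompt or regenerating a targeted batch are both reasonable next steps.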

Troubleshooting

Common Issues

Slow Generation

  • Enable caching: cacheStrategy: 'memory'
  • Use batch processing for large volumes
  • Consider using faster models

Unrealistic Data

  • Adjust context prompts for more specificity
  • Review and refine schemas
  • Add validation constraints

Memory Issues

  • Use streaming for large datasets
  • Process in batches
  • Clear cache periodically
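
For the "process in batches" suggestion, the main step is splitting a large request into fixed-size chunks and handling them one at a time. The helper below is a hypothetical sketch (`planBatches` is not an agentic-synth function). Using the figure quoted under Performance, a 1000-record batch would hold roughly 1MB in memory at a time.

```typescript
// Split a total record count into per-batch counts of at most batchSize each.
function planBatches(total: number, batchSize: number): number[] {
  if (!Number.isInteger(total) || total < 0) {
    throw new RangeError('total must be a non-negative integer');
  }
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const counts: number[] = [];
  for (let remaining = total; remaining > 0; remaining -= batchSize) {
    counts.push(Math.min(batchSize, remaining));
  }
  return counts;
}

const plan = planBatches(2500, 1000);
console.log(plan); // [1000, 1000, 500]

// Intended use, shown as comments because it needs a configured synth instance.
// writeBatch is a hypothetical sink (file, database, queue):
//
// for (const count of planBatches(2500, 1000)) {
//   const batch = await synth.generateStructured({ count, schema });
//   await writeBatch(batch.data); // release each batch before requesting the next
// }
```

Running batches sequentially trades throughput for a bounded memory footprint; the `Promise.all` pattern shown under Batch Generation is the opposite trade-off.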

Support & Contribution

Questions

Contributing

Contributions welcome! Please ensure:

  • Synthetic data remains realistic
  • Privacy and ethics guidelines are followed
  • Documentation is updated
  • Tests are included

License

MIT License - see LICENSE file in repository root.

Disclaimer

This software generates synthetic data for testing and planning purposes only. It should not be used to make decisions about real employees. Users are responsible for ensuring compliance with all applicable laws and regulations, including employment law, privacy law, and anti-discrimination law. The authors and maintainers assume no liability for misuse of this software.


Remember: This is synthetic data for testing. Always prioritize real employee privacy, dignity, and fairness in actual HR systems and processes.