DataGen

DataGen is a synthetic enterprise data generation platform. It procedurally builds realistic enterprise datasets that teams can use for labs, validation, demos, exports, discovery-tool testing, and downstream integration work.

Changelog

v0.8.1

corrected repository realism so modern collaboration-heavy enterprises no longer emit one top-level file share per user home/profile path
replaced the inflated personal-share model with a small set of realistic hidden roots such as users$ / profiles$ plus only limited owner-specific exception shares
regenerated the Duckburg DTED package from the improved source contract, reducing actual file shares from an unrealistic 22k+ shape to a believable modern footprint

v0.8.0

added first-class Active Directory site, site-link, subnet, and IP-allocation realism so generated hybrid environments now include credible topology surfaces beyond OU structure alone
hardened CMDB realism with more believable criticality spread across infrastructure, application, platform, data, collaboration, and software configuration items
improved flagship repository, collaboration, and access-group realism so generated environments rely more clearly on group-centric resource access and less on synthetic naming artifacts
refreshed the Duckburg DTED package from the updated source contract, including topology, plugin-record, CMDB, and repository realism improvements

v0.7.0

hardened identity and access realism around device accounts, shared resources, group-centric access, and OU-aware account repair semantics
improved application and repository access evidence so major enterprise apps and shared resources more clearly flow through realistic governing groups
eliminated remaining flagship naming artifacts such as duplicate sAMAccountName values, synthetic mailbox/access suffixes, and weak team/resource labels in Duckburg
broadened realism validation and regenerated the Duckburg DTED bundle from the updated source contract

v0.6.0

hardened flagship realism across organization structure, reporting lines, team naming, policy scope evidence, CMDB evidence, and Duckburg scenario composition
added richer DTED-facing export evidence, including typed policy-setting source and behavior fields plus CMDB matching and recovery metadata such as fqdn, unc_path, rto_hours, and rpo_hours
improved bridge-readiness for downstream consumers by aligning account lifecycle/state evidence and non-AD identity-store association inputs without baking DTED-specific inference into DataGen itself
regenerated the Duckburg DTED demo package with the updated realism, policy, container, plugin-record, and CMDB surfaces

v0.5.1

removed the vulnerable transitive uuid path from the website toolchain by vendoring a patched sockjs copy that uses Node's built-in crypto.randomUUID()
refreshed the website lockfile so npm audit is clean again without waiting on an upstream Docusaurus or webpack-dev-server release
verified both docusaurus build and docusaurus start still work with the patched docs dependency tree

v0.5.0

fixed large-scenario person display-name collisions so flagship datasets no longer emit unrealistic repeated identity clusters
added stronger account and device evidence, including exported account lifecycle timestamps and explicit application classification fields
improved identity store realism with cleaner AD, Entra, and Okta naming/domain surfaces
expanded policy realism to richer enterprise-scale policy families, path metadata, and identity-store scope evidence
added acquired-company scenario support for Duckburg and related flagship scenarios
tightened repository and collaboration realism so site and library metrics align with generated child content

v0.4.4

corrected the release tag lineage so the GitHub release workflow runs from the fixed flagship acceptance test revision
preserves the v0.4.3 portability and release-test fixes, but publishes them under a clean new release tag

v0.4.3

fixed the flagship realism acceptance test so release builds no longer depend on a local artifacts\duckburg-subset.scenario.json file
tightened the repo portability validator so it no longer self-matches on its own detection pattern during CI and release runs

v0.4.2

added a repo portability validator and optional pre-push hook to catch machine-specific absolute paths before they break CI or releases
updated the realism review defaults to use repo-stable scenarios instead of local artifact paths
removed remaining local path defaults from the catalog build script and related docs

v0.4.1

replaced non-cryptographic machine-account password generation with cryptographically secure randomness
added explicit read-only GitHub Actions workflow permissions so CI and release automation satisfy current security policy

v0.4.0

added first-class bundled domain packs for ITSM, SecOps, and BusinessOps, plus scenario-native pack enablement
added temporal simulation foundations with timeline events, drift history, and normalized temporal export artifacts
productized scenario authoring with archetypes, persona presets, smarter overlays, and an archetype-first wizard flow
expanded end-to-end realism for organization structure, geography, identity, groups, policies, repositories, CMDB data, applications, and infrastructure
added structured quality reporting, scored validation outputs, realism review automation, and CI quality artifacts
tightened external-organization modeling so vendor metadata is no longer treated as a business relationship by default

v0.3.0

improved end-to-end realism for people, offices, applications, repositories, and architecture objects
added curated country-specific name catalogs for the United States, United Kingdom, Canada, Australia, and New Zealand
tightened international office locality, phone, and address generation, with focused upgrades for the UK, Canada, and Mexico
made repository, collaboration, and application URLs more exportable and domain-consistent
added first-class normalized export coverage for network assets and richer office address fields
refreshed the Duckburg Industries DTED demo bundle with the newer realism and export improvements

What DataGen does

DataGen is designed to generate believable enterprise structure without hand-authoring every user, group, device, application, repository, policy, or CMDB record.

Current product capabilities include:

scenario-first world generation with archetypes, persona presets, overlays, JSON, and a terminal wizard
identity, infrastructure, repository, application, policy, access-evidence, observed-data, and CMDB generation
temporal simulation with change events and snapshot-oriented export surfaces
hard identity invariants so duplicate user principal names are blocked instead of emitted as "realistic" flaws
configurable realism through deviation profiles such as Clean, Realistic, and Aggressive
normalized export and quality validation surfaces for downstream tooling and CI
a plugin model for extending the synthetic dataset safely
bundled first-party domain packs for ITSM, SecOps, and BusinessOps using the native scenario packs shape

What DataGen is not

DataGen’s responsibility is to procedurally generate synthetic enterprise data.

That means:

DataGen plugins may extend the generated dataset or add realism overlays
DataGen plugins should not translate output into consumer-specific import contracts
bridges, adapters, and import shapers for downstream systems belong outside the DataGen plugin ecosystem

Common use cases

populate Active Directory and Entra-focused labs
create broad enterprise validation environments
generate CMDB-rich and discovery-oriented datasets
validate repository and collaboration tooling
export normalized data for downstream consumers
extend worlds with synthetic plugin-driven overlays

Getting started

Install from PowerShell Gallery

For normal module use, install the published package from PowerShell Gallery:

Install-PSResource SyntheticEnterprise.PowerShell -Repository PSGallery
Import-Module SyntheticEnterprise.PowerShell

The Gallery package includes the seeded runtime catalog at catalogs\catalogs.sqlite inside the module. You do not need to download the separate catalogs.sqlite GitHub release asset for standard generation commands.

New-SEEnterpriseWorld loads the bundled catalog automatically when you omit -CatalogRootPath:

$scenario = New-SEScenarioFromArchetype -Archetype RegionalManufacturer | Resolve-SEScenario
$world = New-SEEnterpriseWorld -Scenario $scenario -Seed 4242

Use -CatalogRootPath only when you want to override the bundled catalog with a custom catalog directory or SQLite database.

Build from source

If you do not already have a local seeded catalog database, generate it first:

.\scripts\build-catalog-artifact.ps1 -InstallToCatalogRoot

That command writes the canonical build output to artifacts\catalog\catalogs.sqlite and installs a local working copy to catalogs\catalogs.sqlite for source builds.

The separate catalogs.sqlite GitHub release asset is provided for inspection, custom catalog workflows, and direct consumers that want the SQLite file outside the module package.

Build the solution

dotnet build .\DataGen.slnx -v minimal

To enable the repo-managed pre-push hook that catches machine-specific path leaks before you publish changes:

.\scripts\enable-git-hooks.ps1

Run the tests

dotnet test .\DataGen.slnx -v minimal /p:UseSharedCompilation=false -m:1

Import the PowerShell module

$modulePath = Join-Path $PWD 'src\SyntheticEnterprise.PowerShell\bin\Debug\net8.0\SyntheticEnterprise.PowerShell.dll'
Import-Module $modulePath -Force
Get-Command -Module SyntheticEnterprise.PowerShell | Sort-Object Name

If you want a release-style module bundle with a real manifest, package it first:

.\scripts\package-module.ps1 -Version 0.8.1 -Configuration Release
Import-Module .\artifacts\module\SyntheticEnterprise.PowerShell\0.8.1\SyntheticEnterprise.PowerShell.psd1 -Force

Generate a first world

$scenario = New-SEScenarioFromArchetype -Archetype RegionalManufacturer
$scenario = Resolve-SEScenario -Scenario $scenario
$world = New-SEEnterpriseWorld -Scenario $scenario -Seed 4242
$world | Get-SEWorldSummary

Export normalized artifacts

$world | Export-SEEnterpriseWorld `
  -OutputPath .\out\first-world `
  -Format Json `
  -Profile Normalized `
  -IncludeManifest `
  -IncludeSummary `
  -Overwrite

Review realism and quality

.\scripts\invoke-realism-review.ps1 `
  -ScenarioPath .\examples\regional_manufacturer.scenario.json `
  -Seed 4242 `
  -OutputPath .\artifacts\quality\realism-review.md `
  -JsonOutputPath .\artifacts\quality\realism-review.json `
  -OutputFormat Both

That review emits a human-readable summary plus machine-readable quality validation output that can also be used in CI.

Repository guide

The most important areas of the repository are:

src/ Core libraries, contracts, exporting, PowerShell module surface, and plugin host
catalogs/ Curated runtime catalog sources and packaged SQLite data
tests/ Core, exporting, integration, and workflow coverage
sdk/ Plugin SDK documentation and examples
website/ Docusaurus-based documentation site for GitHub Pages
docs/ Additional product and architecture documentation that informs the user-facing docs
examples/ Utility and helper scripts

Documentation

The primary user-facing documentation now lives in the Docusaurus site under website/.

To work on the docs locally:

Set-Location .\website
npm install
npm run start

To verify the production build:

npm run build

The docs site includes:

getting started guides
cmdlet reference
release notes and roadmap pages
multiple end-to-end walkthroughs
SDK and plugin architecture guidance
contribution guidance
integration and export patterns

First-Party Packs

DataGen now includes bundled first-party packs under packs/first-party/.

These packs use the existing external plugin runtime and can be enabled directly from scenario JSON through the packs section. The current bundled set includes:

FirstParty.NoOp
FirstParty.ITSM
FirstParty.SecOps
FirstParty.BusinessOps

For a concrete example, see:

examples/regional_manufacturer_packs.scenario.json
docs/FirstParty_Packs_Walkthrough.md

The same scenario model also supports temporal outputs and quality reports directly on the generation result.

Walkthrough assets

Reference walkthrough scenarios and scripts used by the docs site live under:

website/static/examples/scenarios/
website/static/examples/scripts/

These are intended to be practical starting points for common workflows such as:

general enterprise lab generation
Active Directory lab generation
Entra-focused tenant generation
hybrid identity generation
repository and collaboration-heavy worlds
plugin-extended dataset generation

Contributing

Contributions are welcome across the product and the docs site.

Good contribution targets include:

catalog improvements
scenario and walkthrough coverage
cmdlet help and examples
SDK examples that respect the plugin boundary
docs site polish and usability improvements

Before pushing changes, enable the repo-managed hooks once:

.\scripts\enable-git-hooks.ps1

That pre-push hook runs .\scripts\validate-repo-portability.ps1 so local absolute paths do not slip into tracked files.

When contributing, please keep the product boundary clear:

DataGen core generates synthetic enterprise data
DataGen plugins enrich that synthetic dataset
downstream-system translation belongs in external adapters or companion integrations

Publishing notes

The docs site is configured for GitHub Pages deployment through GitHub Actions. The workflow lives at:

.github/workflows/deploy-docs-site.yml

Repository validation and module packaging are also automated through GitHub Actions:

.github/workflows/ci.yml
.github/workflows/release-module.yml

The release workflow creates both the versioned module bundle and a PowerShell Gallery .nupkg, then publishes to PSGallery by using the PSGAL repository secret.

The repository also ignores generated docs-site artifacts and local scratch inspection scripts so the publishable tree stays clean.

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
.beads		.beads
.githooks		.githooks
.github/workflows		.github/workflows
catalogs		catalogs
docs		docs
examples		examples
packs/first-party		packs/first-party
schemas		schemas
scripts		scripts
sdk		sdk
src		src
tests		tests
website		website
.gitattributes		.gitattributes
.gitignore		.gitignore
AGENTS.md		AGENTS.md
DataGen.slnx		DataGen.slnx
Directory.Build.props		Directory.Build.props
LICENSE.txt		LICENSE.txt
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

DataGen

Changelog

v0.8.1

v0.8.0

v0.7.0

v0.6.0

v0.5.1

v0.5.0

v0.4.4

v0.4.3

v0.4.2

v0.4.1

v0.4.0

v0.3.0

What DataGen does

What DataGen is not

Common use cases

Getting started

Install from PowerShell Gallery

Build from source

Build the solution

Run the tests

Import the PowerShell module

Generate a first world

Export normalized artifacts

Review realism and quality

Repository guide

Documentation

First-Party Packs

Walkthrough assets

Contributing

Publishing notes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages