Releases: tabularis-ai/be_great
be_great v0.0.13
What's Changed (v0.0.12)
Added conditions parameter for constrained sampling, enabling logical constraints during generation (e.g. conditions={"age": ">= 30", "sex": "== 'Female'"}). Constraints are enforced at the token level using a trie-based LogitsProcessor, guaranteeing that every generated row satisfies the specified conditions. This closes GitHub issue #62.
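The token-level idea behind the constrained sampling above can be sketched in a few lines. This is a minimal illustration, not be_great's actual implementation: the helper names (build_trie, allowed_next, mask_logits) and the toy vocabulary, in which token ids 0 to 9 stand for the digits, are assumptions made for the example. The trie stores every token sequence that spells an allowed value; at each step, the logits of tokens that would leave the trie are set to negative infinity so they can never be sampled.

```python
import math

def build_trie(sequences):
    """Build a nested-dict trie from the allowed token-id sequences."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

def allowed_next(trie, prefix):
    """Return the token ids that may follow `prefix` (empty set if the prefix is invalid)."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return set(node)

def mask_logits(logits, allowed):
    """Keep logits of allowed tokens; set every other logit to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

# Toy setup (assumption): token id d represents the digit d, and the
# constraint admits only the values 30, 31, and 32.
trie = build_trie([[3, 0], [3, 1], [3, 2]])
first_step = mask_logits([0.5] * 10, allowed_next(trie, []))
second_step = mask_logits([0.5] * 10, allowed_next(trie, [3]))
```

In a Hugging Face transformers pipeline, the same masking would typically live inside a LogitsProcessor subclass, which receives the tokens generated so far together with the next-token scores.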
Added a comprehensive metrics suite (be_great.metrics) for evaluating synthetic data quality across four dimensions:
- Statistical: ColumnShapes, ColumnPairTrends, BasicStatistics
- Privacy: DistanceToClosestRecord, kAnonymization, lDiversity, IdentifiabilityScore, DeltaPresence, MembershipInference
- Utility: MLEfficiency (train-on-synthetic, test-on-real)
- Discriminator: DiscriminatorMetric
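To give a feel for what a privacy metric in the list above measures, here is a minimal sketch of the distance-to-closest-record idea. It is not the be_great.metrics implementation: the function name is hypothetical, rows are assumed to be purely numeric tuples, and no feature scaling is applied. For each synthetic row it reports the Euclidean distance to the nearest real row; a distance of zero indicates that a real record was reproduced exactly.

```python
import math

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, return the Euclidean distance to its nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]

# Toy data (assumption): two numeric features per row.
real_rows = [(0.0, 0.0), (3.0, 4.0)]
synthetic_rows = [(0.0, 0.0), (3.0, 0.0)]

dcr = distance_to_closest_record(synthetic_rows, real_rows)
# The first synthetic row copies a real row; the second is 3 units from (0, 0).
```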
Revamped LoRA fine-tuning support with a new lora_config parameter for full control over LoRA hyperparameters, automatic detection of target modules across model architectures, and proper save/load of LoRA adapter weights. peft is now an optional dependency installable via pip install be_great[lora].
Added random_conditional_col parameter to fit() (enabled by default). Each training epoch selects a random column for preconditioning, which keeps the model from overfitting to any single column and produces more balanced synthetic data.
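The per-epoch column selection described above might look roughly like the following. This is a sketch under stated assumptions, not the library's code: pick_conditional_columns is a hypothetical helper, and it draws uniformly at random, so the same column can occasionally be chosen in consecutive epochs.

```python
import random

def pick_conditional_columns(columns, epochs, seed=None):
    """Draw one preconditioning column per epoch, uniformly at random."""
    rng = random.Random(seed)
    return [rng.choice(columns) for _ in range(epochs)]

# Hypothetical column names and epoch count, for illustration only.
columns = ["age", "sex", "income"]
schedule = pick_conditional_columns(columns, epochs=5, seed=0)
```

Over many epochs every column is used as the starting point about equally often, which is the balancing effect the release note describes.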
minor: Added scipy as a required dependency. Updated default model in Colab example to tabularisai/Qwen3-0.3B-distil. Added new examples for constrained sampling and random preconditioning. Improved device management with centralized _resolve_device(). Fixed typo in _partial_df_to_prompts. Cleaned up old dist artifacts from the repository.
be_great v0.0.9
What's Changed
- Added guided_sampling, a new feature for more reliable data generation using feature-by-feature guidance patterns. This addresses several sampling issues reported in GitHub issue #45.
- Added float_precision parameter to the GReaT class, which controls the decimal precision of floating-point values. Setting this parameter helps reduce token usage and improve generation quality for numerical data.
- Improved error handling with clearer, more actionable feedback when sampling fails.
- minor: Set report_to=[] as default to disable Weights & Biases logging unless explicitly enabled.
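The token savings from float_precision noted above can be illustrated with a small sketch. It assumes rows are serialized as "column is value" text before tokenization; encode_row is a hypothetical helper written for this example, not a function from be_great. Limiting the number of decimals shortens the text the model has to read and generate.

```python
def encode_row(row, float_precision=None):
    """Serialize a row as 'column is value' text, optionally fixing float decimals."""
    parts = []
    for name, value in row.items():
        if isinstance(value, float) and float_precision is not None:
            # Format floats with a fixed number of decimal places.
            value = f"{value:.{float_precision}f}"
        parts.append(f"{name} is {value}")
    return ", ".join(parts)

row = {"age": 39, "bmi": 27.123456789}
full_text = encode_row(row)
short_text = encode_row(row, float_precision=2)
```

Here short_text is seven characters shorter than full_text, and the saving grows with every float column in the table.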