Releases: tabularis-ai/be_great

be_great v0.0.13

09 Feb 12:04

What's Changed (v0.0.12)

Added conditions parameter for constrained sampling, enabling logical constraints during generation (e.g. conditions={"age": ">= 30", "sex": "== 'Female'"}). Constraints are enforced at the token level using a trie-based LogitsProcessor, guaranteeing that every generated row satisfies the specified conditions. This closes GitHub issue #62.
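As a rough illustration of the trie-based masking idea described above, here is a minimal standalone sketch. It is not the library's implementation: it assumes a toy vocabulary in which each digit 0-9 is a single token whose id equals the digit, and the helper names (build_trie, allowed_next, mask_logits) are hypothetical.

```python
import math

def build_trie(sequences):
    """Build a nested-dict trie from the allowed token sequences."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[None] = {}  # None marks the end of a complete allowed value
    return root

def allowed_next(trie, prefix):
    """Return the tokens that may follow `prefix` (empty if none may)."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return {tok for tok in node if tok is not None}

def mask_logits(logits, allowed):
    """Set the logit of every token id outside `allowed` to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

# Toy constraint "age >= 30", restricted here to two-digit ages (30..99).
age_trie = build_trie([[int(c) for c in str(v)] for v in range(30, 100)])

first_allowed = allowed_next(age_trie, [])    # digits that may start the value
second_allowed = allowed_next(age_trie, [3])  # digits that may follow a leading 3
masked = mask_logits([0.0] * 10, first_allowed)
```

Because disallowed tokens receive a logit of -inf, they get zero probability after the softmax, so only values stored in the trie can ever be generated.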

Added a comprehensive metrics suite (be_great.metrics) for evaluating synthetic data quality across four dimensions:

  • Statistical: ColumnShapes, ColumnPairTrends, BasicStatistics
  • Privacy: DistanceToClosestRecord, kAnonymization, lDiversity, IdentifiabilityScore, DeltaPresence, MembershipInference
  • Utility: MLEfficiency (train-on-synthetic, test-on-real)
  • Discriminator: DiscriminatorMetric
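To give a sense of what a privacy metric such as DistanceToClosestRecord measures, the sketch below computes, for each synthetic row, the Euclidean distance to its nearest real row. It is a standalone illustration of the general idea only: the function name is hypothetical, the rows are assumed to be purely numeric and unscaled, and the actual class interface in be_great.metrics is not shown in these notes.

```python
import math

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, return the distance to its nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]

# Toy numeric rows (for example: age, encoded sex).
real = [(30.0, 1.0), (45.0, 0.0), (52.0, 1.0)]
synthetic = [(30.0, 1.0), (48.0, 0.0)]

dcr = distance_to_closest_record(synthetic, real)
```

A distance of zero means a synthetic row exactly reproduces a real record, which is the case a privacy check of this kind is meant to flag.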

Revamped LoRA fine-tuning support with a new lora_config parameter for full control over LoRA hyperparameters, automatic detection of target modules across model architectures, and proper save/load of LoRA adapter weights. peft is now an optional dependency installable via pip install be_great[lora].

Added random_conditional_col parameter to fit() (enabled by default). A random column is selected for preconditioning in each training epoch, so that the model does not overfit to any single column and produces more balanced synthetic data.
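The per-epoch selection can be pictured with a short sketch. This is an assumption about the general mechanism, not the library's training code: the function name and the seed argument are hypothetical, and the column is drawn independently for each epoch.

```python
import random

def pick_conditional_columns(columns, n_epochs, seed=None):
    """Draw one preconditioning column per epoch, independently at random."""
    rng = random.Random(seed)
    return [rng.choice(columns) for _ in range(n_epochs)]

columns = ["age", "sex", "income", "education"]
schedule = pick_conditional_columns(columns, n_epochs=5, seed=0)

for epoch, col in enumerate(schedule):
    print(f"epoch {epoch}: precondition on {col!r}")
```

Over many epochs every column is expected to serve as the preconditioning column about equally often, rather than one fixed column for the whole run.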

minor:

  • Added scipy as a required dependency.
  • Updated the default model in the Colab example to tabularisai/Qwen3-0.3B-distil.
  • Added new examples for constrained sampling and random preconditioning.
  • Improved device management with a centralized _resolve_device().
  • Fixed a typo in _partial_df_to_prompts.
  • Cleaned up old dist artifacts from the repository.

be_great v0.0.9

14 May 14:14

What's Changed

  • Added guided_sampling, a new feature for more reliable data generation using feature-by-feature guidance patterns. This addresses several sampling issues reported in GitHub issue #45.
  • Added float_precision parameter to the GReaT class for controlling the decimal precision of floating-point values. Setting this parameter helps reduce token usage and improve generation quality for numerical data.
  • Improved error handling with clearer, more actionable feedback when sampling fails.
  • minor: Set report_to=[] as the default to disable Weights & Biases logging unless explicitly enabled.
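To see why limiting float precision shortens the text the model has to process, consider the sketch below. It assumes rows are serialized as "column is value" text and that precision is applied by fixed-decimal formatting; both the helper names and that formatting choice are assumptions for illustration, not the library's code.

```python
def format_value(value, float_precision=None):
    """Format floats with a fixed number of decimals; leave other values as-is."""
    if isinstance(value, float) and float_precision is not None:
        return f"{value:.{float_precision}f}"
    return str(value)

def encode_row(row, float_precision=None):
    """Serialize a row as comma-separated 'column is value' text."""
    return ", ".join(
        f"{col} is {format_value(val, float_precision)}" for col, val in row.items()
    )

row = {"age": 39, "income": 51234.56789012, "ratio": 0.333333333333}

full = encode_row(row)                      # every stored digit is kept
short = encode_row(row, float_precision=2)  # floats limited to two decimals
```

Here short is "age is 39, income is 51234.57, ratio is 0.33", noticeably shorter than full, which means fewer tokens per row and fewer low-value digits for the model to learn.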