This directory contains research experiments and examples using Google's Gemma models.
- VaultGemma: Privacy-focused fine-tuning with differential privacy.
- T5Gemma: Encoder-decoder variant of Gemma.
- TranslateGemma: Translation models built on Gemma 3.
VaultGemma is a privacy-focused variant of Google's Gemma model family, designed for secure fine-tuning and deployment with differential privacy guarantees. This implementation demonstrates how to fine-tune VaultGemma 1B on medical data using LoRA (Low-Rank Adaptation) and Opacus differential privacy.
- 4-bit Quantization: Memory-efficient training using BitsAndBytes (see the loading sketch after this list)
- LoRA Fine-tuning: Parameter-efficient adaptation with <2% trainable parameters
- Differential Privacy: Privacy-preserving training with configurable ε and δ budgets
- Medical Q&A: Fine-tuned on medical flashcard dataset for healthcare applications
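A minimal sketch of the quantized loading step, assuming the Hugging Face checkpoint ID `google/vaultgemma-1b` and the NF4 settings listed further below (the exact values are illustrative assumptions, not the notebook's verbatim config):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization (values are illustrative assumptions)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/vaultgemma-1b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/vaultgemma-1b")
```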
This repository contains code for both training and inference:
- Fine-tuning: How to fine-tune VaultGemma with differential privacy
- Inference: How to load and run fine-tuned VaultGemma models
| Notebook Name | Description |
|---|---|
| [VaultGemma]FineTuning_Inference_Huggingface.ipynb | Complete pipeline for fine-tuning VaultGemma 1B on medical data using LoRA adapters and differential privacy, with inference example |
- Medical Meadow Medical Flashcards dataset
- 4-bit NF4 quantization for reduced memory footprint
- LoRA adapters targeting all projection layers (see the setup sketch after this list)
- Opacus differential privacy (ε=3.0, δ=1e-5)
- Cosine learning rate schedule with warmup
- Automatic checkpointing based on loss thresholds
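A sketch of the LoRA adapter and learning-rate setup referenced above; the rank, alpha, learning rate, and step counts are illustrative assumptions, and `model` is the quantized base model from the earlier loading sketch:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import get_cosine_schedule_with_warmup

# LoRA adapters on all attention and MLP projection layers
lora_config = LoraConfig(
    r=16,              # rank (assumed)
    lora_alpha=32,     # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report <2% trainable parameters

# Cosine learning-rate schedule with warmup (step counts assumed)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=1000
)
```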
The same notebook includes inference code to:
- Load fine-tuned LoRA adapters
- Generate responses to medical questions
- Process single or batch queries
- Adjust generation parameters (temperature, top_p)
```python
from transformers import AutoModelForCausalLM, GemmaTokenizer
from peft import PeftModel

# Load the base model, tokenizer, and fine-tuned LoRA adapters
model = AutoModelForCausalLM.from_pretrained("google/vaultgemma-1b")
tokenizer = GemmaTokenizer.from_pretrained("google/vaultgemma-1b")
peft_model = PeftModel.from_pretrained(model, "path/to/adapters")

# Generate a response
question = "What are the symptoms of diabetes?"
inputs = tokenizer(question, return_tensors="pt")
outputs = peft_model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
Required packages:

```
torch
transformers
peft
opacus
datasets
bitsandbytes
kagglehub
```
This implementation provides (ε, δ)-differential privacy guarantees (a wiring sketch follows this list):
- Target ε: 3.0 (configurable)
- Target δ: 1e-5 (inverse of dataset size)
- Gradient clipping: Max norm of 1.0
- Privacy accounting: Automatic epsilon tracking via Opacus
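A minimal sketch of how these guarantees can be wired up with Opacus; `train_loader`, the epoch count, and the optimizer are assumptions carried over from the training sketches above:

```python
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,  # assumed DataLoader over the flashcard dataset
    target_epsilon=3.0,
    target_delta=1e-5,
    epochs=3,                  # assumed
    max_grad_norm=1.0,         # per-sample gradient clipping
)

# Query the privacy budget spent so far
print(f"Spent ε = {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```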
T5Gemma (aka encoder-decoder Gemma) is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into an encoder-decoder architecture.
| Notebook Name | Description |
|---|---|
| [T5Gemma]Example.ipynb | Guide to sampling and fine-tuning T5Gemma using Flax and Hugging Face |
| [T5Gemma_2]Example.ipynb | Guide to inference with T5Gemma 2 270m-270m via Hugging Face |
- Encoder-Decoder Architecture: Adapts decoder-only Gemma models to T5-style architecture.
- Scales:
- Gemma 2 scale: 2B-2B, 9B-2B, and 9B-9B.
- T5 scale: Small, Base, Large, XL, and ML.
- Frameworks: Examples provided for both Hugging Face (PyTorch) and Flax (Kauldron).
- Tasks:
- Sampling: Basic text generation examples (see the sketch after this list).
- Fine-tuning: Example of fine-tuning for machine translation (English to French) using the MTNT dataset.
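For the Hugging Face path, a sampling sketch might look like the following; the checkpoint ID `google/t5gemma-2b-2b-prefixlm-it` is an assumption (check the Hub for the released names), and a recent `transformers` version with T5Gemma support is required:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/t5gemma-2b-2b-prefixlm-it"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encoder-decoder generation: the prompt is encoded once,
# then the decoder generates the output sequence
inputs = tokenizer("Translate to French: Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```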
Required packages:

```
gemma
kauldron
etils
optax
treescope
kagglehub
transformers
datasets
```
TranslateGemma is a family of lightweight, state-of-the-art open translation models from Google, based on the Gemma 3 family of models.
TranslateGemma models are designed to handle translation tasks across 55 languages. Their relatively small size makes it possible to deploy them in environments with limited resources, such as laptops, desktops, or your own cloud infrastructure, democratizing access to state-of-the-art translation models and helping foster innovation for everyone.
| Notebook Name | Description |
|---|---|
| [TranslateGemma]Example.ipynb | Guide to inference with TranslateGemma via Hugging Face |
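A sketch of what inference could look like via Hugging Face; the checkpoint ID and prompt wording are assumptions, and the notebook listed above is the authoritative reference:

```python
from transformers import pipeline

# Checkpoint ID is assumed; if the released checkpoint is multimodal
# (Gemma 3 style), use the "image-text-to-text" pipeline instead
translator = pipeline("text-generation", model="google/translategemma-4b-it")

messages = [
    {"role": "user",
     "content": "Translate the following text from English to French: "
                "Hello, how are you?"},
]
result = translator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])
```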