Problem
The repository currently stores all reference images (`gemini_black/`, `gemini_white/`, `gemini_random/`) directly in git. As contributors add more images (e.g. `gemini_black_nb_pro/`, `gemini_white_nb_pro/` with 150-200 images each), the repo size will grow significantly: git isn't designed for large binary datasets, and every clone downloads the full history of these files.
Suggestion
Migrate image datasets to Hugging Face Datasets or Kaggle Datasets:
Option A: Hugging Face Hub (recommended)
- Create a dataset repo at huggingface.co/datasets/aloshdenny/reverse-synthid-images
- Organize by folder: `gemini_black/`, `gemini_white/`, `gemini_black_nb_pro/`, etc.
- Contributors can upload via the `huggingface_hub` CLI or the web UI (see the sketch below)
- Easy to load in Python with `from datasets import load_dataset`
- Free hosting, versioned, supports large files natively via LFS
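A minimal sketch of the upload/load workflow, assuming the repo id suggested above (`aloshdenny/reverse-synthid-images`) and that the image folders are kept at the dataset root so `datasets` can pick them up as an imagefolder-style layout:

```python
# Sketch only: repo id and folder layout are the ones proposed in this issue.
from huggingface_hub import HfApi
from datasets import load_dataset

api = HfApi()  # uses the token from `huggingface-cli login`

# Create the dataset repo once (no-op if it already exists).
api.create_repo(
    repo_id="aloshdenny/reverse-synthid-images",
    repo_type="dataset",
    exist_ok=True,
)

# Upload an existing local image folder, keeping the same layout on the Hub.
api.upload_folder(
    folder_path="gemini_black",
    path_in_repo="gemini_black",
    repo_id="aloshdenny/reverse-synthid-images",
    repo_type="dataset",
)

# Loading back in Python; with an imagefolder-style layout the images are
# exposed directly through `datasets`.
ds = load_dataset("aloshdenny/reverse-synthid-images", split="train")
print(ds[0])
```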
Option B: Kaggle Datasets
- Host at kaggle.com/datasets/aloshdenny/reverse-synthid-images
- Contributors upload via the Kaggle API (see the sketch below)
- Good visibility in the ML community
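A minimal sketch for the Kaggle route, assuming the dataset slug proposed above and that `~/.kaggle/kaggle.json` credentials are configured (importing `kaggle` authenticates automatically):

```python
# Sketch only: the dataset slug is the one proposed in this issue.
import kaggle

# Download the full image set into data/, unzipping the archive.
kaggle.api.dataset_download_files(
    "aloshdenny/reverse-synthid-images",
    path="data",
    unzip=True,
)

# Contributors would publish new image folders with the Kaggle CLI, e.g.
#   kaggle datasets version -p gemini_black_nb_pro -m "add nb_pro images"
```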
Repo changes needed
- Move existing images to the chosen platform
- Replace the image folders with a download script (e.g. `scripts/download_images.py`; a sketch follows this list)
- Add the dataset link to README
- Update the contribution guide to direct contributors to upload images there instead of opening large image PRs
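A sketch of what `scripts/download_images.py` could look like, assuming Option A and the `aloshdenny/reverse-synthid-images` repo id (a Kaggle variant would use the API call shown under Option B instead):

```python
#!/usr/bin/env python3
"""Download the reference image folders from the Hugging Face dataset repo.

Sketch for scripts/download_images.py under the Option A assumptions.
"""
from huggingface_hub import snapshot_download


def main() -> None:
    # Pulls gemini_black/, gemini_white/, gemini_random/, etc. into ./images,
    # reusing the local Hub cache on repeated runs.
    local_dir = snapshot_download(
        repo_id="aloshdenny/reverse-synthid-images",
        repo_type="dataset",
        local_dir="images",
    )
    print(f"Images downloaded to {local_dir}")


if __name__ == "__main__":
    main()
```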
Current repo size concern
The existing `gemini_black/` (101 images), `gemini_white/` (101 images), and `gemini_random/` (88 images) already add noticeable weight to the repository. With the new `nb_pro` folders requesting 150-200 images each, plus future model variants, the repo could easily exceed 1-2 GB, making clones slow and CI expensive.
Benefits
- Faster clones - code-only repo stays small
- Better for contributors - uploading images to HF/Kaggle is simpler than large git PRs
- Versioning - HF Hub tracks dataset versions properly
- Discoverability - datasets on HF/Kaggle get more visibility from the ML research community