Disclaimer: Just another AI slop project, mainly created for personal use. Manage your expectations accordingly.
Status: Active development has moved to the Android app, which connects to a remote LM Studio instance to tag photos directly from your phone. The Python scripts below remain functional and are kept for desktop/server use.
A suite of Python scripts that automate organising and tagging a local photo library using a local AI model. Images are analysed by a vision-language model that generates keyword tags and a short description, which are then written directly into standard image metadata fields.
The scripts work with any local AI server that exposes an OpenAI-compatible API (e.g. LM Studio, Ollama).
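As a rough sketch of what such a request looks like, the snippet below builds an OpenAI-style chat completion payload with a base64-encoded image. The helper names and prompt text are illustrative assumptions, not the actual internals of `lib/ai_client.py`:

```python
import base64
import requests

def build_chat_payload(image_path, model="google/gemma-3n-e4b"):
    # Encode the image and wrap it in an OpenAI-style vision request.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List keyword tags and a one-sentence description."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

def tag_image(image_path, api_url="http://127.0.0.1:1234"):
    # LM Studio and Ollama both expose this OpenAI-compatible route.
    resp = requests.post(f"{api_url}/v1/chat/completions",
                         json=build_chat_payload(image_path), timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Any server that accepts this request shape (LM Studio, Ollama, etc.) should work interchangeably.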
- AI-Powered Tagging & Captioning – generates keyword tags and a natural-language description for each image, written to XMP Subject and IPTC/XMP Caption metadata.
- Tagging Provenance – records the model name (`XMP-xmp:CreatorTool`) and tagging date (`XMP-xmp:MetadataDate`) so you always know when and how an image was tagged.
- Recursive Processing – scans nested subdirectories automatically.
- Advanced Tag Deduplication – uses embedding-based semantic similarity (DBSCAN clustering) to merge near-duplicate tags across your entire collection.
- Metadata-First – writes everything to the image file itself (XMP, IPTC), ensuring compatibility with photo managers like Adobe Lightroom, digiKam, or Aves Gallery.
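The metadata-first approach boils down to shelling out to `exiftool`. A minimal sketch of how tags and a caption could be written to the fields listed above (the helper names are hypothetical; the actual logic lives in `lib/metadata.py`):

```python
import subprocess

def build_exiftool_args(path, tags, description, append=False):
    # For list-type tags like XMP:Subject, repeated "=" assignments in a
    # single exiftool command replace the list, while "+=" appends to it.
    op = "+=" if append else "="
    args = ["exiftool", "-overwrite_original"]
    args += [f"-XMP:Subject{op}{tag}" for tag in tags]
    args += [f"-IPTC:Caption-Abstract={description}",
             f"-XMP-dc:Description={description}",
             path]
    return args

def write_metadata(path, tags, description, append=False):
    # Invokes the real exiftool binary; it must be on PATH.
    subprocess.run(build_exiftool_args(path, tags, description, append),
                   check=True)
```

Because everything lands in standard XMP/IPTC fields, any metadata-aware viewer can read the results without a sidecar database.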
- Python 3.9+
- A running local AI model server exposing an OpenAI-compatible API endpoint.
- `exiftool` installed and on your `PATH`.
- Python packages – install with:

```
pip install requests numpy scikit-learn
```

(`numpy` and `scikit-learn` are only needed for the tag deduplicator.)
```
.
├── image_tagger.py       # Main tagging script (entry point)
├── tag_deduplicator.py   # Separate deduplication script
└── lib/
    ├── __init__.py
    ├── ai_client.py      # AI API interaction & response parsing
    ├── image_utils.py    # Image format detection, base64 encoding, scanning
    └── metadata.py       # XMP/IPTC read & write helpers (exiftool)
```
Recursively scans a directory for images, sends each to a vision-language model, and writes the returned data into the image's metadata:
| Metadata field | Content |
|---|---|
| XMP Subject | AI-generated keyword tags |
| IPTC Caption-Abstract / XMP-dc Description | Short description of the image |
| XMP-xmp CreatorTool | LocalAIPhotoTagger (<model>) |
| XMP-xmp MetadataDate | UTC timestamp of the tagging run |
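To verify what was written, the same fields can be read back with `exiftool -j -G`, which prints one JSON object per file with group-prefixed keys. A small sketch (the helper names are illustrative):

```python
import json
import subprocess

# Group-prefixed field names as exiftool -G reports them.
FIELDS = ["XMP:Subject", "IPTC:Caption-Abstract",
          "XMP:CreatorTool", "XMP:MetadataDate"]

def parse_exiftool_json(raw):
    # exiftool -j prints a JSON array with one object per file;
    # missing fields simply come back as None here.
    record = json.loads(raw)[0]
    return {field: record.get(field) for field in FIELDS}

def read_tag_metadata(path):
    # Requires the real exiftool binary on PATH.
    raw = subprocess.run(
        ["exiftool", "-j", "-G", *[f"-{f}" for f in FIELDS], path],
        capture_output=True, text=True, check=True).stdout
    return parse_exiftool_json(raw)
```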
Usage:

```
# Tag all images (replace existing tags)
python image_tagger.py /path/to/photos

# Append new tags to any existing ones
python image_tagger.py /path/to/photos --append-tags

# Use a different model / server
python image_tagger.py /path/to/photos --model google/gemma-3-4b --api-url http://localhost:1234

# Verbose output
python image_tagger.py /path/to/photos -v
```

Options:
| Flag | Default | Description |
|---|---|---|
| `input_dir` | (required) | Directory containing images to tag (scanned recursively). |
| `--api-url` | `http://127.0.0.1:1234` | Base URL of the OpenAI-compatible API server. |
| `--model` | `google/gemma-3n-e4b` | Vision-language model identifier. |
| `--append-tags` | off | Append new tags to existing XMP Subject tags instead of replacing them. |
| `-v, --verbose` | off | Enable debug-level logging output. |
Over time AI-generated tags can become inconsistent (e.g. "Dslr" vs "DSLR", "Forest" vs "Woodland"). This script fixes that using embedding-based semantic similarity:
- Collects every unique XMP Subject tag across all images.
- Computes text embeddings and clusters similar tags with DBSCAN.
- Selects the best representative tag per cluster (by frequency, length, alphabetical order).
- Rewrites the XMP metadata on all affected images.
- Saves a detailed JSON report.
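The clustering step above can be sketched with scikit-learn's `DBSCAN`. Assuming the similarity threshold maps to DBSCAN's `eps` as `1 - threshold` under the cosine metric (a plausible reading, not a guarantee of the script's exact internals), and using one plausible ordering for "frequency, length, alphabetical":

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_tags(tags, embeddings, similarity_threshold=0.85,
                 min_cluster_size=2):
    # Cosine distance = 1 - cosine similarity, so a similarity threshold
    # of 0.85 becomes a distance radius (eps) of 0.15.
    labels = DBSCAN(eps=1 - similarity_threshold,
                    min_samples=min_cluster_size,
                    metric="cosine").fit_predict(np.asarray(embeddings))
    clusters = {}
    for tag, label in zip(tags, labels):
        if label != -1:  # -1 marks noise: tags with no near-duplicates
            clusters.setdefault(label, []).append(tag)
    return list(clusters.values())

def pick_representative(cluster, counts):
    # Highest frequency wins; ties broken by shorter tag, then alphabetical.
    return min(cluster, key=lambda t: (-counts.get(t, 0), len(t), t))
```

For example, "Forest" and "forest" embed almost identically and collapse into one cluster, while an unrelated tag is left untouched as noise.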
Usage:

```
# Analyse and apply changes
python tag_deduplicator.py /path/to/photos

# Preview changes without modifying files
python tag_deduplicator.py /path/to/photos --dry-run

# Adjust similarity threshold (0.0–1.0, default 0.85)
python tag_deduplicator.py /path/to/photos --similarity-threshold 0.9
```

Options:
| Flag | Default | Description |
|---|---|---|
| `input_dir` | (required) | Directory containing tagged images (scanned recursively). |
| `--embedding-url` | `http://localhost:1234` | Base URL of the embedding API server. |
| `--embedding-model` | `text-embedding-nomic-embed-text-v1.5` | Embedding model identifier. |
| `--similarity-threshold` | `0.85` | Cosine similarity threshold for grouping tags (0.0–1.0). |
| `--min-cluster-size` | `2` | Minimum tags required to form a cluster. |
| `--batch-size` | `100` | Tags per embedding API request. |
| `--dry-run` | off | Report proposed changes without modifying any files. |
| `-v, --verbose` | off | Enable debug-level logging output. |
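The `--batch-size` option exists because embedding thousands of tags in one request is slow or rejected by some servers. A minimal sketch of batched requests against an OpenAI-compatible `/v1/embeddings` endpoint (helper names are illustrative):

```python
import requests

def batches(items, size):
    # Yield consecutive chunks of at most `size` items.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_tags(tags, base_url="http://localhost:1234",
               model="text-embedding-nomic-embed-text-v1.5",
               batch_size=100):
    vectors = []
    for chunk in batches(tags, batch_size):
        resp = requests.post(f"{base_url}/v1/embeddings",
                             json={"model": model, "input": chunk},
                             timeout=60)
        resp.raise_for_status()
        vectors.extend(item["embedding"] for item in resp.json()["data"])
    return vectors
```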
A recommended setup uses LM Studio to run the local AI models.
The following models are the current defaults and serve as good starting points:
- Tagging (Vision-Language Model): google/gemma-3-4b
- Tag Deduplication (Embedding Model): nomic-ai/nomic-embed-text-v1.5-GGUF
For viewing and managing tagged photos on Android, Aves Gallery is a great open-source option. F-Stop Gallery is another alternative.
Note on F-Stop: To make newly changed tags visible it is often necessary to rename the folder containing the images and then rename it back. Simply clearing the app cache does not seem to be sufficient to force a metadata refresh.
- Test and evaluate other vision and language models for performance and quality.
- Find a better Android gallery app with robust and responsive tag support (or maybe create one).
- Improve instructions, guides, and documentation.
- Explore integrating these tools directly into a cross-platform gallery application.