A personal photo-and-music reminiscence tool: pick a song you love, and watch a reel of photos from your own archive drift past, matched to the song's era, mood, and imagery. Intended for a family audience (one household, not public distribution) and designed to draw on the well-documented phenomenon of music-evoked autobiographical memory.
This repository currently holds the music metadata ingest pipeline — the first of several components. The photo pipeline lives separately in photo-dedupe; the jukebox player itself is not yet built.
music_ingest.py builds a standalone SQLite database (music_index.db) that describes your music collection in enough detail to drive the jukebox's photo-matching engine. It reads from four sources and progressively enriches what it knows:
- Jellyfin `album.nfo` files: one per album folder, with inline track listings, genre, year, and MusicBrainz album IDs.
- Jellyfin `artist.nfo` files: one per artist, with biography and MusicBrainz artist IDs.
- LRC lyric files: plain-text lyrics alongside audio files.
- MediaMonkey `MM5.DB`: play counts, ratings, date added, and last-played timestamps. This is where personal significance comes from: a song played 100 times and rated five stars matters differently from one that's merely present in the library.
Three optional enrichment passes then generate the semantic themes that will drive photo-matching:
- Genre dictionary: a small built-in map (folk → `acoustic, reflective, earthy, intimate, storytelling`; celtic → `sea, rain, diaspora, pastoral, old, windswept`; and so on).
- MusicBrainz artist-tag API: community-contributed mood and style tags, queried once per artist, applied to every song by that artist.
- Local LLM via Ollama: themes grounded in the artist biography and song lyrics, producing concrete imagery vocabulary (`coast`, `dusk`, `kitchen`, `road`) that matches the vocabulary the photo tagger uses.
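The genre dictionary pass can be pictured as a plain lookup. The sketch below is illustrative only: `GENRE_THEMES` and `themes_for_genre` are hypothetical names, only the two entries quoted in this README are included, and case-insensitive matching is an assumption rather than documented behaviour.

```python
# Hypothetical sketch of a genre-to-themes lookup; not the code in music_ingest.py.
GENRE_THEMES = {
    "folk": ["acoustic", "reflective", "earthy", "intimate", "storytelling"],
    "celtic": ["sea", "rain", "diaspora", "pastoral", "old", "windswept"],
}


def themes_for_genre(genre):
    """Return the themes mapped to a genre, or an empty list if it is unmapped."""
    if not genre:
        return []
    # Assumption: genres are matched after trimming whitespace and lowercasing.
    return list(GENRE_THEMES.get(genre.strip().lower(), []))


print(themes_for_genre("Folk"))
print(themes_for_genre("polka"))
```

Returning an empty list for unknown genres keeps the pass harmless for songs the dictionary does not cover; the other two passes can still supply themes for them.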
Each phase is an independent, resumable subcommand. You can run them in any order, stop and restart freely, and re-run any one without affecting the others.
The music ingest is independent of the photo ingest. Writing to its own SQLite file (music_index.db) means:
- The music pipeline can run on one machine while the photo pipeline runs on another.
- The music database is a self-contained artefact — easy to back up, move, or share.
- The jukebox player joins the two databases at query time via SQLite's `ATTACH DATABASE`, so there is no schema coupling until the last possible moment.
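As a rough illustration of the query-time join, the sketch below attaches a photo database to a music database connection and matches song themes to photo tags. The column names in `song_themes`, the `photo_tags` table, and the helper names are all assumptions made for the example; the real schemas may differ.

```python
import os
import sqlite3
import tempfile


def make_db(path, ddl, insert_sql, rows):
    """Create a small throwaway database for the demonstration."""
    con = sqlite3.connect(path)
    con.execute(ddl)
    con.executemany(insert_sql, rows)
    con.commit()
    con.close()


def photos_for_song(music_db, photo_db, song_id):
    """Return photo paths whose tags overlap the song's themes."""
    con = sqlite3.connect(music_db)
    try:
        # The photo database is visible only for the life of this connection.
        con.execute("ATTACH DATABASE ? AS photos", (photo_db,))
        rows = con.execute(
            """
            SELECT DISTINCT p.path
            FROM song_themes AS t
            JOIN photos.photo_tags AS p ON p.tag = t.theme
            WHERE t.song_id = ?
            ORDER BY p.path
            """,
            (song_id,),
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        con.close()


def demo():
    """Build two tiny databases in a temp folder and join them."""
    with tempfile.TemporaryDirectory() as folder:
        music_db = os.path.join(folder, "music_index.db")
        photo_db = os.path.join(folder, "ingest_index.db")
        make_db(music_db,
                "CREATE TABLE song_themes (song_id INTEGER, theme TEXT)",
                "INSERT INTO song_themes VALUES (?, ?)",
                [(1, "coast"), (1, "dusk"), (2, "kitchen")])
        # Hypothetical photo-side table; the real photo schema is not shown here.
        make_db(photo_db,
                "CREATE TABLE photo_tags (path TEXT, tag TEXT)",
                "INSERT INTO photo_tags VALUES (?, ?)",
                [("beach.jpg", "coast"), ("sunset.jpg", "dusk"), ("stove.jpg", "kitchen")])
        return photos_for_song(music_db, photo_db, 1)


print(demo())
```

Because the attachment happens per connection, neither database file needs to know the other exists until a query actually joins them.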
pip install -r requirements.txt
# Create the database (or just run any other phase — the schema is auto-created)
python music_ingest.py init --music-db D:\music_archive\music_index.db
# Ingest Jellyfin album and artist NFOs
python music_ingest.py jellyfin --music-db D:\music_archive\music_index.db --nfo-root D:\music
# Attach any LRC lyric files to their songs
python music_ingest.py lyrics --music-db D:\music_archive\music_index.db --music-root D:\music
# Merge MediaMonkey play counts, ratings, and dates
python music_ingest.py mediamonkey --music-db D:\music_archive\music_index.db --mm-db "%APPDATA%\MediaMonkey5\MM5.DB"
# Generate themes (each phase is independent — run as many as you want)
python music_ingest.py themes-genre --music-db D:\music_archive\music_index.db
python music_ingest.py themes-musicbrainz --music-db D:\music_archive\music_index.db
python music_ingest.py themes-llm --music-db D:\music_archive\music_index.db --year-from 1971 --year-to 1991
# See what you've got
python music_ingest.py report --music-db D:\music_archive\music_index.db
requests>=2.31 (MusicBrainz API calls; optional)
ollama>=0.4 (LLM theme generation; optional)
tqdm>=4.0 (progress bars; optional)
requests, ollama, and tqdm are all optional — the script degrades gracefully if any are missing, refusing only the phases that need them. Only Python 3.10+ is required; everything else is standard library.
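One common way to get this kind of graceful degradation is to probe for each optional package at run time. The sketch below is an assumption about how that could look, not a copy of the script: `optional_import`, `PHASE_NEEDS`, `check_phase`, and `progress` are hypothetical names.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical map from subcommand to the optional package it depends on.
PHASE_NEEDS = {"themes-musicbrainz": "requests", "themes-llm": "ollama"}


def check_phase(phase):
    """Return an error message if the phase cannot run, else None."""
    needed = PHASE_NEEDS.get(phase)
    if needed and optional_import(needed) is None:
        return f"Phase '{phase}' needs the '{needed}' package: pip install {needed}"
    return None


def progress(iterable):
    """Wrap an iterable in a tqdm progress bar when tqdm is available."""
    tqdm = optional_import("tqdm")
    return tqdm.tqdm(iterable) if tqdm else iterable


print(check_phase("report"))
print(sum(progress(range(5))))
```

Phases with no entry in the map, such as `report`, run on the standard library alone.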
An NFO-saving Jellyfin music library is required for Phases 1 and 2. To enable NFO saving in Jellyfin, go to Dashboard → Libraries → (your music library) → Manage Library → turn on "Nfo" in the metadata savers list. The ingest handles UTF-8 BOMs, mangled ampersands, and stray whitespace without complaint.
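For readers curious what tolerating BOMs, bare ampersands, and stray whitespace might involve, here is a minimal sketch using only the standard library. The function name, the regular expression, and the `album`/`title`/`year` element names in the sample are assumptions for illustration; the actual parser in `music_ingest.py` may work differently.

```python
import re
import xml.etree.ElementTree as ET

# Matches an ampersand that does not start an XML entity such as &amp; or &#38;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_nfo_bytes(raw):
    """Parse NFO bytes into an ElementTree element, repairing common damage."""
    # "utf-8-sig" drops a leading byte-order mark if one is present.
    text = raw.decode("utf-8-sig").strip()
    # Escape bare ampersands so the XML parser does not reject the file.
    text = BARE_AMP.sub("&amp;", text)
    return ET.fromstring(text)


# Sample with a BOM, leading spaces, a bare ampersand, and a trailing newline.
sample = b"\xef\xbb\xbf  <album><title>Simon & Garfunkel: Live</title><year>1970</year></album>\n"
root = parse_nfo_bytes(sample)
print(root.findtext("title"), root.findtext("year"))
```

Ampersands that already form a valid entity are left alone, so a well-formed file passes through unchanged.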
A running Ollama server with a text-only LLM (llama3.1:8b, qwen2.5:7b, or similar) is required for Phase 7 only. The default host http://192.168.1.20:11434 can be overridden with --ollama-host.
The database has three primary tables and one operational log:
| Table | What it holds |
|---|---|
| `artists` | One row per artist, with biography, MusicBrainz ID, and genre. |
| `songs` | One row per song (or remix variant). Bibliographic fields, file path, lyrics, MediaMonkey play-stats, derived significance score. |
| `song_themes` | Many-to-many of songs to themes, tagged with source so you can tell genre-dictionary themes from MusicBrainz from LLM-generated. |
| `music_ingest_log` | Every run appends a row with a JSON stats blob, useful for debugging long runs. |
The significance_score on each song is a derived value: log(play_count + 1) × (rating / 5.0), with unrated songs using 0.5 as a neutral multiplier. This single number captures "how much does this song seem to have mattered to me" and will weight the jukebox's song selection later.
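The formula is small enough to sketch directly. This version assumes the natural logarithm, a rating already expressed on a 0 to 5 star scale, and `None` standing in for "unrated"; none of those details are confirmed by this README, and `significance_score` as a function name is hypothetical.

```python
import math


def significance_score(play_count, rating=None):
    """log(play_count + 1) * (rating / 5.0), with 0.5 as the unrated multiplier."""
    # Assumption: rating is in stars (0 to 5) and None means the song is unrated.
    multiplier = 0.5 if rating is None else rating / 5.0
    return math.log(play_count + 1) * multiplier


print(round(significance_score(100, 5), 3))   # played 100 times, five stars
print(round(significance_score(100), 3))      # same play count, unrated
print(significance_score(0, 5))               # never played
```

The logarithm damps very high play counts, so the gap between 1 and 10 plays counts for more than the gap between 100 and 110.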
| Subcommand | Required flags | What it does |
|---|---|---|
| `init` | `--music-db` | Create or verify the database schema. |
| `jellyfin` | `--music-db`, `--nfo-root` | Parse all `album.nfo` and `artist.nfo` files under `--nfo-root`. Match each track to an audio file in its album folder. |
| `lyrics` | `--music-db`, `--music-root` | Find all `*.lrc` files under `--music-root`. Strip timestamps and metadata tags; attach to the matching song. |
| `mediamonkey` | `--music-db`, `--mm-db` | Open `MM5.DB` read-only, merge play counts and ratings into existing songs or insert new MM-only rows. Close MediaMonkey first. |
| `themes-genre` | `--music-db` | Apply the built-in genre-to-themes dictionary to every song with a genre. |
| `themes-musicbrainz` | `--music-db` | Query MusicBrainz for every artist with an MBID; apply returned tags to every song by that artist. Rate-limited to 1 request/second. |
| `themes-llm` | `--music-db` | Generate themes via Ollama. Supports `--year-from`, `--year-to`, `--min-significance`, `--require-lyrics`, `--limit`. |
| `report` | `--music-db` | Print coverage summary, top 20 songs by significance, and theme breakdown. |
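To make the `lyrics` step more concrete, the sketch below shows one way LRC timestamps and metadata tags could be stripped. The regular expressions and the `strip_lrc` name are assumptions, and dropping empty lines is a choice made for this example rather than documented behaviour.

```python
import re

# Time tags such as [00:12.00] or [01:02]; a line may carry several.
TIMESTAMP = re.compile(r"\[\d{1,3}:\d{2}(?:[.:]\d{1,3})?\]")
# Whole-line metadata tags such as [ar:Artist] or [ti:Title].
METADATA = re.compile(r"^\[[A-Za-z#]+:.*\]$")


def strip_lrc(text):
    """Return plain lyrics with LRC time tags and metadata lines removed."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if METADATA.match(line):
            continue
        line = TIMESTAMP.sub("", line).strip()
        if line:  # Assumption: lines left empty after stripping are dropped.
            lines.append(line)
    return "\n".join(lines)


sample = """[ar:Example Artist]
[ti:Example Song]
[00:12.00]First line
[00:15.30][01:02.10]Repeated chorus line
[00:20.00]
"""
print(strip_lrc(sample))
```

Metadata lines are recognised by a leading alphabetic key, which keeps them distinct from time tags that start with digits.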
- MediaMonkey schema drift. The MM4 and MM5 schemas share the same core tables (`Songs`, `Artists`, `Albums`, `Genres`) but MM6 or later versions may change this. If the query fails, the script prints a diagnostic and you can inspect the schema with `sqlite3 MM5.DB .schema`.
- Windows path case sensitivity. The MM-to-Jellyfin path matching lowercases both sides, but if your libraries reference the same files via different mount points or drive letters, path-match will fail and fuzzy artist+title matching takes over. Check the `matched_path` vs `matched_fuzzy` numbers in the `mediamonkey` phase report.
- Artist duplicates. Jellyfin may write an artist name differently on different albums ("The Beatles" vs "Beatles"). Each distinct spelling becomes a separate `artists` row. A one-off SQL UPDATE can merge these when they matter.
- Music copyright. Designed for household/personal use only. Distribution of the resulting reels with copyrighted music would need separate licensing or substitution with Creative Commons material.
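A one-off artist merge could look roughly like the following. The column names (`artists.id`, `artists.name`, `songs.artist_id`) and the `merge_artists` helper are assumptions, since the real schema is not reproduced in this README; check it with `.schema` and back up `music_index.db` before trying anything similar.

```python
import sqlite3


def merge_artists(con, keep_name, duplicate_name):
    """Repoint songs from the duplicate artist to the kept one; return rows moved."""
    keep = con.execute("SELECT id FROM artists WHERE name = ?", (keep_name,)).fetchone()
    dup = con.execute("SELECT id FROM artists WHERE name = ?", (duplicate_name,)).fetchone()
    if keep is None or dup is None:
        return 0
    with con:  # Commit both statements together, or roll back on error.
        moved = con.execute(
            "UPDATE songs SET artist_id = ? WHERE artist_id = ?",
            (keep[0], dup[0]),
        ).rowcount
        con.execute("DELETE FROM artists WHERE id = ?", (dup[0],))
    return moved


# Demonstration on an in-memory database with an assumed, simplified schema.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE songs (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT);
    INSERT INTO artists VALUES (1, 'The Beatles'), (2, 'Beatles');
    INSERT INTO songs VALUES (1, 1, 'Something'), (2, 2, 'Blackbird');
    """
)
moved = merge_artists(con, "The Beatles", "Beatles")
remaining = con.execute("SELECT COUNT(*) FROM artists").fetchone()[0]
print(moved, remaining)
```

Any biography or MusicBrainz ID stored only on the duplicate row would be lost by this sketch, so it may be worth copying those fields across first.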
- `photo-dedupe`: ingest, dedup, AI tagging, blur culling, and similar-scene culling for the photo archive. Independent development, same household, shared design philosophy.
- `match.py` (planned): the jukebox matching engine. Uses SQLite `ATTACH DATABASE` to join `music_index.db` and `ingest_index.db` at query time.
- `player/` (planned): the jukebox itself. Likely a single-file HTML app using `sql.js` for local database access, or a Raspberry Pi embedded variant with a physical interface.
Istvandity, L. (2017). Combining music and reminiscence therapy interventions for wellbeing in elderly populations: A systematic review. Complementary Therapies in Clinical Practice, 28, 18–25. https://doi.org/10.1016/j.ctcp.2017.03.003
Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex, 19(11), 2579–2594. https://doi.org/10.1093/cercor/bhp008
Krumhansl, C. L., & Zupnick, J. A. (2013). Cascading reminiscence bumps in popular music. Psychological Science, 24(10), 2057–2068. https://doi.org/10.1177/0956797613486486
Lazar, A., Thompson, H., & Demiris, G. (2014). A systematic review of the use of technology for reminiscence therapy. Health Education & Behavior, 41(1_suppl), 51S–61S. https://doi.org/10.1177/1090198114537067
Westerhof, G. J., & Bohlmeijer, E. T. (2014). Celebrating fifty years of research and applications in reminiscence and life review: State of the art and new directions. Journal of Aging Studies, 29, 107–114. https://doi.org/10.1016/j.jaging.2014.02.003