memory-jukebox

A personal photo-and-music reminiscence tool: pick a song you love, and watch a reel of photos from your own archive drift past, matched to the song's era, mood, and imagery. Intended for a family audience (one household, not public distribution) and designed to draw on the well-documented phenomenon of music-evoked autobiographical memory.

This repository currently holds the music metadata ingest pipeline — the first of several components. The photo pipeline lives separately in photo-dedupe; the jukebox player itself is not yet built.

What this component does

music_ingest.py builds a standalone SQLite database (music_index.db) that describes your music collection in enough detail to drive the jukebox's photo-matching engine. It reads from four sources and progressively enriches what it knows:

Jellyfin album.nfo files — one per album folder, with inline track listings, genre, year, and MusicBrainz album IDs.
Jellyfin artist.nfo files — per artist, with biography and MusicBrainz artist IDs.
LRC lyric files — plain-text lyrics alongside audio files.
MediaMonkey MM5.DB — play counts, ratings, date added, and last-played timestamps. This is where personal significance comes from: a song played 100 times and rated five stars matters differently from one that's merely present in the library.
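The lyrics phase is described only as stripping timestamps and metadata tags from LRC files. The sketch below shows one way to do that. It assumes standard LRC syntax: `[mm:ss.xx]` line timestamps (possibly several per line), whole-line ID tags such as `[ar:...]` or `[offset:...]`, and optional enhanced-LRC `<mm:ss.xx>` word timestamps. The name `strip_lrc` is hypothetical and is not taken from music_ingest.py.

```python
import re

# [mm:ss], [mm:ss.xx] or [mm:ss:xx] line timestamps; a line may carry several.
_TIME_TAG = re.compile(r"\[\d{1,3}:\d{2}(?:[.:]\d{1,3})?\]")
# Enhanced-LRC word timestamps such as <00:12.34>.
_WORD_TAG = re.compile(r"<\d{1,3}:\d{2}(?:[.:]\d{1,3})?>")
# ID tags such as [ar:Artist], [ti:Title] or [offset:+200] occupying a whole line.
_ID_TAG = re.compile(r"^\[[A-Za-z#]+:[^\]]*\]$")


def strip_lrc(text: str) -> str:
    """Return plain lyrics with timestamps and ID tags removed (hypothetical helper)."""
    lines: list[str] = []
    for raw in text.lstrip("\ufeff").splitlines():
        line = raw.strip()
        if _ID_TAG.match(line):
            continue  # metadata line, not a lyric
        line = _WORD_TAG.sub("", _TIME_TAG.sub("", line)).strip()
        # Keep non-empty lines, and at most one blank line between stanzas.
        if line or (lines and lines[-1]):
            lines.append(line)
    return "\n".join(lines).strip()
```

Timestamps are dropped rather than used because the jukebox needs the words for theme generation, not for karaoke-style sync.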

Three optional enrichment passes then generate the semantic themes that will drive photo-matching:

Genre dictionary — a small built-in map (folk → acoustic, reflective, earthy, intimate, storytelling; celtic → sea, rain, diaspora, pastoral, old, windswept; and so on).
MusicBrainz artist-tag API — community-contributed mood and style tags, queried once per artist, applied to every song by that artist.
Local LLM via Ollama — themes grounded in the artist biography and song lyrics, producing concrete imagery vocabulary (coast, dusk, kitchen, road) that matches the vocabulary the photo tagger uses.

Each phase is an independent, resumable subcommand. You can run them in any order, stop and restart freely, and re-run any one without affecting the others.
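One way to get that independence is to have each theme phase touch only rows carrying its own `source` tag, and to commit per song so an interrupted run can resume where it stopped. The sketch below assumes a `song_themes(song_id, theme, source)` layout. Only the `source` column is confirmed by this README; the other column names and both helper functions are hypothetical.

```python
import sqlite3

# Hypothetical layout; only the `source` column is documented in this README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS song_themes (
    song_id INTEGER NOT NULL,
    theme   TEXT    NOT NULL,
    source  TEXT    NOT NULL,
    PRIMARY KEY (song_id, theme, source)
)"""


def replace_song_themes(conn: sqlite3.Connection, song_id: int, source: str, themes: list[str]) -> None:
    """Replace one song's themes for one source; other sources are left untouched."""
    with conn:  # one transaction per song: an interrupted run loses at most one song
        conn.execute(
            "DELETE FROM song_themes WHERE song_id = ? AND source = ?",
            (song_id, source),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO song_themes (song_id, theme, source) VALUES (?, ?, ?)",
            [(song_id, t, source) for t in themes],
        )


def songs_missing_source(conn: sqlite3.Connection, song_ids: list[int], source: str) -> list[int]:
    """Songs with no themes yet from `source`: the work left for a resumed run."""
    done = {row[0] for row in conn.execute(
        "SELECT DISTINCT song_id FROM song_themes WHERE source = ?", (source,))}
    return [s for s in song_ids if s not in done]
```

Re-running the genre pass therefore rewrites genre themes only, and the LLM themes for the same song survive.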

Why a separate database?

The music ingest is independent of the photo ingest. Writing to its own SQLite file (music_index.db) means:

The music pipeline can run on one machine while the photo pipeline runs on another.
The music database is a self-contained artefact — easy to back up, move, or share.
The jukebox player joins the two databases at query time via SQLite's ATTACH DATABASE — no schema coupling until the last possible moment.
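A minimal sketch of that query-time join with Python's `sqlite3`. The photo side is an assumption: the photo database's schema is not described here, so `photo_tags(photo_path, tag)` is a hypothetical table. `song_themes(song_id, theme)` column names are likewise assumed, and `photos_for_song` is a hypothetical helper that ranks photos by how many of a song's themes appear among their tags.

```python
import sqlite3


def photos_for_song(music_db: str, photo_db: str, song_id: int, limit: int = 20) -> list[tuple[str, int]]:
    """Rank photos by the number of distinct song themes found in their tags."""
    conn = sqlite3.connect(music_db)
    try:
        # The photo database is attached only for the duration of this query.
        conn.execute("ATTACH DATABASE ? AS photos", (photo_db,))
        return conn.execute(
            """
            SELECT p.photo_path, COUNT(DISTINCT p.tag) AS shared
            FROM song_themes AS t
            JOIN photos.photo_tags AS p ON p.tag = t.theme
            WHERE t.song_id = ?
            GROUP BY p.photo_path
            ORDER BY shared DESC, p.photo_path
            LIMIT ?
            """,
            (song_id, limit),
        ).fetchall()
    finally:
        conn.close()
```

`COUNT(DISTINCT p.tag)` keeps a theme that arrived from two sources (say, genre and LLM) from being counted twice.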

Quick start

```
pip install -r requirements.txt

# Create the database (or run any other phase; the schema is created automatically)
python music_ingest.py init --music-db D:\music_archive\music_index.db

# Ingest Jellyfin album and artist NFOs
python music_ingest.py jellyfin --music-db D:\music_archive\music_index.db --nfo-root D:\music

# Attach any LRC lyric files to their songs
python music_ingest.py lyrics --music-db D:\music_archive\music_index.db --music-root D:\music

# Merge MediaMonkey play counts, ratings, and dates
python music_ingest.py mediamonkey --music-db D:\music_archive\music_index.db --mm-db "%APPDATA%\MediaMonkey5\MM5.DB"

# Generate themes (each phase is independent; run as many as you want)
python music_ingest.py themes-genre --music-db D:\music_archive\music_index.db
python music_ingest.py themes-musicbrainz --music-db D:\music_archive\music_index.db
python music_ingest.py themes-llm --music-db D:\music_archive\music_index.db --year-from 1971 --year-to 1991

# See what you've got
python music_ingest.py report --music-db D:\music_archive\music_index.db
```

Requirements

requests>=2.31    (MusicBrainz API calls; optional)
ollama>=0.4       (LLM theme generation; optional)
tqdm>=4.0         (progress bars; optional)

requests, ollama, and tqdm are all optional — the script degrades gracefully if any are missing, refusing only the phases that need them. Only Python 3.10+ is required; everything else is standard library.
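The graceful degradation described above can be sketched as follows: try each optional import once, refuse a phase only if its package is missing, and fall back to a pass-through when tqdm is absent. The phase-to-package mapping follows the Requirements list; `PHASE_NEEDS`, `optional_import`, `check_phase` and `progress` are hypothetical names, not necessarily how music_ingest.py is organised.

```python
import importlib
import sys

# Which optional package each phase needs; phases not listed need only the stdlib.
PHASE_NEEDS = {
    "themes-musicbrainz": "requests",
    "themes-llm": "ollama",
}


def optional_import(name: str):
    """Return the module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def check_phase(phase: str) -> None:
    """Exit with a clear message if `phase` needs a package that is missing."""
    needed = PHASE_NEEDS.get(phase)
    if needed and optional_import(needed) is None:
        sys.exit(f"{phase} needs the '{needed}' package: pip install {needed}")


# tqdm only draws progress bars, so fall back to returning the iterable unchanged.
_tqdm = optional_import("tqdm")
progress = _tqdm.tqdm if _tqdm else (lambda iterable, **kwargs: iterable)
```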

An NFO-saving Jellyfin music library is required for the jellyfin subcommand (Phases 1 and 2: album and artist NFOs). To enable NFO saving in Jellyfin, go to Dashboard → Libraries → (your music library) → Manage Library and turn on "Nfo" in the metadata savers list. The ingest handles UTF-8 BOMs, mangled ampersands, and stray whitespace without complaint.

A running Ollama server with a text-only LLM (llama3.1:8b, qwen2.5:7b, or similar) is required only for themes-llm (Phase 7). The default host, http://192.168.1.20:11434, can be overridden with --ollama-host.

What's in the database

The database has three primary tables and one operational log:

Table	What it holds
`artists`	One row per artist, with biography, MusicBrainz ID, and genre.
`songs`	One row per song (or remix variant). Bibliographic fields, file path, lyrics, MediaMonkey play-stats, derived significance score.
`song_themes`	Many-to-many of songs to themes, tagged with `source` so you can tell genre-dictionary themes from MusicBrainz from LLM-generated.
`music_ingest_log`	Every run appends a row with a JSON stats blob — useful for debugging long runs.

The significance_score on each song is a derived value: log(play_count + 1) × (rating / 5.0), where rating is in stars (0 to 5). Unrated songs use 0.5 as a neutral multiplier, the same as a 2.5-star rating. This single number captures "how much does this song seem to have mattered to me" and will later weight the jukebox's song selection.

Subcommand reference

Subcommand	Required flags	What it does
`init`	`--music-db`	Create or verify the database schema.
`jellyfin`	`--music-db`, `--nfo-root`	Parse all `album.nfo` and `artist.nfo` files under `--nfo-root`. Match each track to an audio file in its album folder.
`lyrics`	`--music-db`, `--music-root`	Find all `*.lrc` files under `--music-root`. Strip timestamps and metadata tags; attach to the matching song.
`mediamonkey`	`--music-db`, `--mm-db`	Open MM5.DB read-only, merge play counts and ratings into existing songs or insert new MM-only rows. Close MediaMonkey first.
`themes-genre`	`--music-db`	Apply the built-in genre-to-themes dictionary to every song with a genre.
`themes-musicbrainz`	`--music-db`	Query MusicBrainz for every artist with an MBID; apply returned tags to every song by that artist. Rate-limited to 1 request/second.
`themes-llm`	`--music-db`	Generate themes via Ollama. Supports `--year-from`, `--year-to`, `--min-significance`, `--require-lyrics`, `--limit`.
`report`	`--music-db`	Print coverage summary, top 20 songs by significance, and theme breakdown.
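A sketch of the MusicBrainz artist-tag lookup with the 1 request/second limit. The endpoint (`/ws/2/artist/<MBID>?inc=tags&fmt=json`) and the `tags` list of `name`/`count` objects are part of the MusicBrainz web service, which also asks clients to send an identifying User-Agent. The User-Agent string, the `min_count` filter and the helper names are hypothetical.

```python
import time

MB_ARTIST_URL = "https://musicbrainz.org/ws/2/artist/{mbid}"
USER_AGENT = "memory-jukebox/0.1 ( you@example.com )"  # placeholder contact details


class RateLimiter:
    """Block so that successive calls are at least `interval` seconds apart."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval, self.clock, self.sleep = interval, clock, sleep
        self._last = None

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            self.sleep(self.interval - (now - self._last))
            now = self.clock()
        self._last = now


def tags_from_response(payload: dict, min_count: int = 1) -> list[str]:
    """Lower-case tag names with at least `min_count` votes, most-voted first."""
    tags = [t for t in payload.get("tags", []) if t.get("count", 0) >= min_count]
    tags.sort(key=lambda t: (-t.get("count", 0), t["name"]))
    return [t["name"].lower() for t in tags]


def fetch_artist_tags(mbid: str, limiter: RateLimiter) -> list[str]:
    import requests  # optional dependency

    limiter.wait()  # one request per second across all artists
    resp = requests.get(
        MB_ARTIST_URL.format(mbid=mbid),
        params={"inc": "tags", "fmt": "json"},
        headers={"User-Agent": USER_AGENT},
        timeout=30,
    )
    resp.raise_for_status()
    return tags_from_response(resp.json())
```

Because tags are fetched once per artist, a single shared `RateLimiter` keeps the whole pass within the limit, and the resulting tags can then be written to every song by that artist.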

Known limitations

MediaMonkey schema drift. MM4 and MM5 share the same core tables (Songs, Artists, Albums, Genres), but later versions (MM6 and beyond) may change them. If the query fails, the script prints a diagnostic, and you can inspect the schema with `sqlite3 MM5.DB .schema`.
Windows path case sensitivity. The MM-to-Jellyfin path matching lowercases both sides, but if your libraries reference the same files via different mount points or drive letters, path matching will fail and fuzzy artist+title matching takes over. Compare the matched_path and matched_fuzzy counts in the mediamonkey phase report.
Artist duplicates. Jellyfin may write an artist name differently on different albums ("The Beatles" vs "Beatles"). Each distinct spelling becomes a separate artists row. A one-off SQL UPDATE can merge these when they matter.
Music copyright. Designed for household/personal use only. Distribution of the resulting reels with copyrighted music would need separate licensing or substitution with Creative Commons material.
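The two-stage matching described under "Windows path case sensitivity" can be sketched as follows. A normalised path lookup comes first; if it fails, a fuzzy artist+title comparison takes over. `ntpath` gives Windows-style case folding and separator handling on any OS. Using `difflib.SequenceMatcher` with a 0.9 threshold is an assumption, not necessarily the fuzzy method music_ingest.py uses, and all function names are hypothetical.

```python
import difflib
import ntpath


def path_key(path: str) -> str:
    """Windows-style comparison key: lower-case, backslashes, '..' collapsed."""
    return ntpath.normcase(ntpath.normpath(path))


def _norm(s: str) -> str:
    return " ".join(s.lower().split())


def match_song(mm_row: dict, by_path: dict, songs: list, threshold: float = 0.9):
    """Return (song_id, method) for a MediaMonkey row, or (None, "unmatched").

    mm_row:  dict with "path", "artist", "title" (hypothetical shape).
    by_path: {path_key(jellyfin_path): song_id}.
    songs:   list of (song_id, artist, title) for the fuzzy fallback.
    """
    song_id = by_path.get(path_key(mm_row["path"]))
    if song_id is not None:
        return song_id, "matched_path"
    # Different drive letters or mount points end up here.
    wanted = _norm(f"{mm_row['artist']} - {mm_row['title']}")
    best_id, best_ratio = None, 0.0
    for sid, artist, title in songs:
        ratio = difflib.SequenceMatcher(None, wanted, _norm(f"{artist} - {title}")).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = sid, ratio
    if best_ratio >= threshold:
        return best_id, "matched_fuzzy"
    return None, "unmatched"
```

The fuzzy loop is linear in library size per unmatched row, which is acceptable for a household collection but would need an index (for example, on normalised artist) for very large libraries.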

Related components

photo-dedupe — ingest, dedup, AI tagging, blur culling, and similar-scene culling for the photo archive. Independent development, same household, shared design philosophy.
match.py (planned) — the jukebox matching engine. Uses SQLite ATTACH DATABASE to join music_index.db and ingest_index.db at query time.
player/ (planned) — the jukebox itself. Likely a single-file HTML app using sql.js for local database access, or a Raspberry Pi embedded variant with a physical interface.

References

Istvandity, L. (2017). Combining music and reminiscence therapy interventions for wellbeing in elderly populations: A systematic review. Complementary Therapies in Clinical Practice, 28, 18–25. https://doi.org/10.1016/j.ctcp.2017.03.003

Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex, 19(11), 2579–2594. https://doi.org/10.1093/cercor/bhp008

Krumhansl, C. L., & Zupnick, J. A. (2013). Cascading reminiscence bumps in popular music. Psychological Science, 24(10), 2057–2068. https://doi.org/10.1177/0956797613486486

Lazar, A., Thompson, H., & Demiris, G. (2014). A systematic review of the use of technology for reminiscence therapy. Health Education & Behavior, 41(1_suppl), 51S–61S. https://doi.org/10.1177/1090198114537067

Westerhof, G. J., & Bohlmeijer, E. T. (2014). Celebrating fifty years of research and applications in reminiscence and life review: State of the art and new directions. Journal of Aging Studies, 29, 107–114. https://doi.org/10.1016/j.jaging.2014.02.003
