Caching strategy #1169

connorjburton · 2026-02-08T12:02:13Z

connorjburton
Feb 8, 2026

Hi, as I was working on a couple of changes I noticed the caching strategy used could benefit from discussion, broken down into two sections.

What is cached

Important to note I have only looked at the TMDB data, so this may not apply universally.

The data cached is the view, not the source. By that I mean we request data from TMDB, on success we then produce and cache a computed view of that data, such as external links.

The problem with this strategy is that the cache is tightly coupled with the view of data at that point in time, if we want to add an additional external link, based on the raw TMDB data, the cache has to be invalidated.

The alternative to this is to cache the raw data from TMDB, and produce the computed view of that data at runtime. The point of the cache is two fold, to improve performance and add some low-level of resiliency to API outages. The expensive part of the TMDB data is not the computation of the view, it's the API request to TMDB, caching the view limits the flexibility of the project to change.

Some pseudo code of how this would look before/after

Before:

data = cache.get(cache_key)

if data is None:
  # request TMDB and produce view
  data = { computed view here }
  cache.set(cache_key, data)

return data

After:

raw = cache.get(cache_key)

if raw is None:
  # request TMDB
  raw = tmdb_request()
  cache.set(cache_key, raw)

return compute_view(raw)

The only upside of caching the view is that you save a little space in the cache and technically it would be faster, but the computation that is happening today is so fast it will be imperceptible, but that comes onto my next point about how data is cached.

How it's cached

I haven't look too deeply into this, however I believe the how the data is cached could be changed to improve performance and resiliency.

A couple of points to talk about:

TTL
Running a cache inside Yamtrack container

TTL
Looking at the configuration, it seems data is only cached for 24 hours, at least for TMDB data that seems very short lived given the data is seldom going to change, This should be configurable and have a longer default TTL, imo.

Running a cache inside Yamtrack container
I can see that offloading the responsibility to run Redis outside Yamtrack is simple, however that does limit the usefulness of Redis.

For example, if someone were to install Redis and point Yamtrack to that, it's defaults are not congruent to the data being cached, it would not persist to disk, it doesn't have a max memory limit, it's likely running unauth'd without SSL,, there is no eviction policy etc.

Given that it's unlikely someone would need to run an advanced Redis configuration with Yamtrack (i.e. sentinel, clustering, remote instance etc), moving Redis to run within the Yamtrack container would allow us to tightly configure the cache to have the most benefit with the nature of the data stored, some ideas

LRU eviction policy
Reasonable max memory (<1gb?)
Authenticated /w SSL (zero trust)
Persist to disk (restart does not clear cache)

The option can always be given to the user to point to an external Redis if they like, but this would greatly simplify the distribution model (by default you have sqlite and redis in the container) and improve overall caching performance over time.

Thanks for reading, happy to discuss in more detail!

66Bunz · 2026-02-08T13:38:18Z

66Bunz
Feb 8, 2026

I agree, the caching could be improved a lot.
Since Yamtrack was developed only to show web pages, I think it's missing a lot of stuff, that's why I started working on an API at #924, so in the future everything can be centralized better (I'm currently studying on my last exam, then I'll resume working on it).
So about what is cached I agree, right now data is structured to be used on views, but it should be more "agnostic".

Another thing I think is important is about having metadata in the database or having to request it every time as it is now. I opened a discussion about it some time ago (#948).
I think media metadata should be saved in the database, to reduce the number of requests to the providers APIs that, as you said, are very expensive.

After doing that I think that caching could be handled way better.

Not really an expert but I wanted to give my 2 cents.

0 replies

andrebk · 2026-02-13T18:24:27Z

andrebk
Feb 13, 2026

I also agree, a lot of good points here!

Caching raw responses instead of the view makes a lot of sense to me. I also haven't looked much at other providers than TMDB, but this feels like it almost can be implemented as a wrapper around request.get(), and be used for any provider, using the URL as the cache key.

Regarding TTL, I think that needs to be more dynamic. The metadata for a movie that came out 30 years ago can probably be cached for weeks or months, but a TV season that is currently airing is updated at least once a week so the TTL must be much lower. Maybe the TTL can be determined based on how likely the media is to change given its release dates?

I agree with @66Bunz that long term the best solution is to store the metadata in the database. I mentioned it in #948 that TMDB has endpoints that gives you a list of media that has had metadata updates, so this can be used to update only what is needed when it is needed.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Caching strategy #1169

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 2 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Caching strategy #1169

Uh oh!

Uh oh!

connorjburton Feb 8, 2026

What is cached

How it's cached

Replies: 2 comments

Uh oh!

66Bunz Feb 8, 2026

Uh oh!

andrebk Feb 13, 2026

connorjburton
Feb 8, 2026

66Bunz
Feb 8, 2026

andrebk
Feb 13, 2026