Caching strategy #1169
Replies: 2 comments
I agree, the caching could be improved a lot. Another thing I think is important is whether metadata should live in the database or be requested every time, as it is now; I opened a discussion about that a while ago (#948). Having done that, I think caching could be handled much better. Not really an expert, but I wanted to give my two cents.
I also agree, a lot of good points here! Caching raw responses instead of the view makes a lot of sense to me. I also haven't looked much at providers other than TMDB, but this feels like it almost can be implemented as a wrapper around.

Regarding TTL, I think that needs to be more dynamic. The metadata for a movie that came out 30 years ago can probably be cached for weeks or months, but a TV season that is currently airing is updated at least once a week, so its TTL must be much lower. Maybe the TTL can be determined by how likely the media is to change, given its release dates?

I agree with @66Bunz that, long term, the best solution is to store the metadata in the database. I mentioned in #948 that TMDB has endpoints that give you a list of media whose metadata has been updated, so this could be used to update only what is needed, when it is needed.
Hi, while working on a couple of changes I noticed the caching strategy used here could benefit from some discussion, broken down into two sections.
What is cached
Important to note: I have only looked at the TMDB data, so this may not apply universally.
The data cached is the view, not the source. By that I mean: we request data from TMDB, and on success we produce and cache a computed view of that data, such as external links.
The problem with this strategy is that the cache is tightly coupled to the view of the data at that point in time: if we want to add an additional external link based on the raw TMDB data, the cache has to be invalidated.
The alternative is to cache the raw data from TMDB and produce the computed view at runtime. The point of the cache is twofold: to improve performance, and to add some low level of resiliency against API outages. The expensive part is not the computation of the view, it's the API request to TMDB; caching the view only limits the project's flexibility to change.
Some pseudo-code of how this could look before and after the change:
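A minimal sketch of the two approaches (function names, the in-memory cache, and the unauthenticated TMDB URL are all placeholders, just to illustrate the shape of the change):

```python
import json
import urllib.request

cache: dict[str, dict] = {}  # stand-in for the real cache backend

def fetch_raw(media_id: str) -> dict:
    # Real code would add the API key, headers, and error handling.
    url = f"https://api.themoviedb.org/3/movie/{media_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def build_view(raw: dict) -> dict:
    # e.g. derive external links from the raw payload
    return {"imdb": f"https://www.imdb.com/title/{raw.get('imdb_id', '')}"}

# Before: the computed view is cached, so any change to build_view()
# requires invalidating every cached entry.
def get_links_before(media_id: str) -> dict:
    key = f"view:{media_id}"
    if key not in cache:
        cache[key] = build_view(fetch_raw(media_id))  # view baked into cache
    return cache[key]

# After: the raw TMDB response is cached and the view is computed at
# runtime, so the view logic can change without cache invalidation.
def get_links_after(media_id: str) -> dict:
    key = f"raw:{media_id}"
    if key not in cache:
        cache[key] = fetch_raw(media_id)
    return build_view(cache[key])  # cheap, recomputed on every call
```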
The only upside of caching the view is that you save a little space in the cache, and technically it would be faster. But the computation happening today is so fast the difference would be imperceptible, which brings me to my next point: how the data is cached.
How it's cached
I haven't looked too deeply into this, but I believe how the data is cached could be changed to improve performance and resiliency.
A couple of points to talk about:
TTL
Looking at the configuration, data seems to be cached for only 24 hours. At least for TMDB that feels very short, given the data is seldom going to change. This should be configurable and have a longer default TTL, imo.
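Assuming a Django-style cache configuration (the CACHE_TTL and REDIS_URL environment variable names here are my invention, not the project's current settings), making the TTL configurable could look like:

```python
# settings.py sketch: a configurable cache TTL with a longer default.
import os

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.environ.get("REDIS_URL", "redis://127.0.0.1:6379"),
        # Default cache lifetime in seconds: 7 days instead of 24 hours,
        # overridable per deployment via an environment variable.
        "TIMEOUT": int(os.environ.get("CACHE_TTL", 60 * 60 * 24 * 7)),
    }
}
```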
Running a cache inside Yamtrack container
I can see that offloading the responsibility of running Redis outside Yamtrack is simple, however it does limit Redis's usefulness.
For example, if someone were to install Redis and point Yamtrack at it, its defaults are not congruent with the data being cached: it would not persist to disk, it has no max-memory limit, it's likely running unauthenticated without SSL, there is no eviction policy, etc.
Given that it's unlikely someone would need an advanced Redis configuration with Yamtrack (i.e. Sentinel, clustering, a remote instance, etc.), moving Redis to run within the Yamtrack container would allow us to tightly configure the cache to best suit the nature of the data stored. Some ideas:
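For instance, a hypothetical redis.conf tuned for a bundled cache, addressing the defaults mentioned above (values are illustrative, not recommendations):

```conf
bind 127.0.0.1                 # only reachable from inside the container
maxmemory 256mb                # hard cap so the cache can't grow unbounded
maxmemory-policy allkeys-lru   # evict least-recently-used keys when full
appendonly yes                 # persist to disk across container restarts
```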
The option can always be given to the user to point to an external Redis if they like, but this would greatly simplify the distribution model (by default you have SQLite and Redis in the container) and improve overall caching performance over time.
Thanks for reading, happy to discuss in more detail!