You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While fixing #7694 (slow NumbersStrip + slow /specs), I traced a few backend hot paths and noticed several DB-access patterns that are worth tackling in follow-ups. None are blockers on their own — #7694 already neutralized the user-visible symptom via aggregate queries on /stats, a fast-path on /plots/filter, and a startup prewarm — but each is a real performance smell and would compound if the catalog keeps growing.
Filing this as a tracking issue so the cluster doesn't get lost.
Concrete leads
1. /libraries/{library_id}/images over-fetches by ~13 MB per call
api/routers/libraries.py:71-112 loads every spec's code via SpecRepository.get_all_with_code() and then filters in Python for the one requested library. The Impl.code column is multi-MB total across all rows (see model comment at core/database/models.py:128), and we never need it on this endpoint.
Fix shape: add ImplRepository.get_preview_by_library(library_id) that selects only (spec_id, library_id, language_id, preview_url, preview_html) for that one library; replace the get_all_with_code() call. Add the same stale-while-revalidate pattern the other metadata endpoints use (it currently only has a plain set_cache/get_cache, no refresh_after).
2. /plots/filter still does get_all() even on cache hit
api/routers/plots.py:436-481 calls SpecRepository.get_all() (selectinload impls + library, hundreds of specs × dozen impls) on every cold-cache request, even ones that only differ from filter:all by one tag.
Fix shape: cache the heavy intermediate (all_images + spec_lookup + impl_lookup + global_counts) once under a stable key, and have the per-filter request only do the filtering work on top. The cache key churn (many filter combinations) currently means every new combo pays the full DB roundtrip.
3. or_counts is O(groups × images × categories) in Python
_calculate_or_counts in api/routers/plots.py:193-242 re-filters the entire image set once per active filter group, then re-aggregates 12 category counts. This is fine for ~300 specs but won't scale linearly past that, and it's all done on the request thread.
Fix shape: materialize per-spec / per-impl tag bags as denormalized rows in a small auxiliary table (or a SQL view), or precompute counts in a worker after each sync and serve from the cache.
4. get_total_code_lines scans every code blob on every cold cache
core/database/repositories.py:283-290 is SUM(LENGTH(code) - LENGTH(REPLACE(code, E'\n', ''))) across all rows. Even though /stats now uses lightweight COUNTs, this single aggregate still touches the deferred code column for every impl.
Fix shape: add a generated/computed Impl.line_count column maintained on insert/update (or persist it in the sync pipeline), so /stats is SELECT SUM(line_count).
5. Foreign-key indexes
Worth verifying with \d impls in psql: PostgreSQL does not auto-create indexes on FK columns. impls.spec_id, impls.library_id, impls.language_id, and feedback.spec_id are all queried by these columns. Missing indexes would surface as seq-scans on /specs/{id} and the new count_with_impls aggregate.
If absent, add via an Alembic migration.
6. Per-instance cache, no shared layer
The in-memory TTLCache in api/cache.py is per Cloud Run instance. The startup prewarm from #7694 covers the cold-start case, but autoscale to multiple instances still means each one starts cold until it gets its first request for each cached endpoint.
Fix shape (only if Cloud Monitoring shows it matters): Memorystore/Redis for the small-volume metadata caches (stats, libraries, languages, specs_list, specs_map). Skip for /plots/filter — payload size makes the network round-trip not worth it.
7. Connection pool tuning
core/database/connection.py uses SQLAlchemy defaults (pool_size=5, max_overflow=10). Not visibly a problem today, but worth confirming with metrics under burst load — every parallel AppDataProvider request currently fires four queries simultaneously per visitor.
Out of scope
/insights/* endpoints (separate hot path, separate issue if needed)
MCP server queries — they only run for the model integration, not for site traffic
Database schema redesign — keep migrations narrowly scoped to the items above
Context
While fixing #7694 (slow NumbersStrip + slow
/specs), I traced a few backend hot paths and noticed several DB-access patterns that are worth tackling in follow-ups. None are blockers on their own — #7694 already neutralized the user-visible symptom via aggregate queries on/stats, a fast-path on/plots/filter, and a startup prewarm — but each is a real performance smell and would compound if the catalog keeps growing.Filing this as a tracking issue so the cluster doesn't get lost.
Concrete leads
1.
/libraries/{library_id}/imagesover-fetches by ~13 MB per callapi/routers/libraries.py:71-112loads every spec's code viaSpecRepository.get_all_with_code()and then filters in Python for the one requested library. TheImpl.codecolumn is multi-MB total across all rows (see model comment atcore/database/models.py:128), and we never need it on this endpoint.Fix shape: add
ImplRepository.get_preview_by_library(library_id)that selects only(spec_id, library_id, language_id, preview_url, preview_html)for that one library; replace theget_all_with_code()call. Add the same stale-while-revalidate pattern the other metadata endpoints use (it currently only has a plainset_cache/get_cache, norefresh_after).2.
/plots/filterstill doesget_all()even on cache hitapi/routers/plots.py:436-481callsSpecRepository.get_all()(selectinload impls + library, hundreds of specs × dozen impls) on every cold-cache request, even ones that only differ fromfilter:allby one tag.Fix shape: cache the heavy intermediate (
all_images+spec_lookup+impl_lookup+global_counts) once under a stable key, and have the per-filter request only do the filtering work on top. The cache key churn (many filter combinations) currently means every new combo pays the full DB roundtrip.3.
or_countsis O(groups × images × categories) in Python_calculate_or_countsinapi/routers/plots.py:193-242re-filters the entire image set once per active filter group, then re-aggregates 12 category counts. This is fine for ~300 specs but won't scale linearly past that, and it's all done on the request thread.Fix shape: materialize per-spec / per-impl tag bags as denormalized rows in a small auxiliary table (or a SQL view), or precompute counts in a worker after each sync and serve from the cache.
4.
get_total_code_linesscans every code blob on every cold cachecore/database/repositories.py:283-290isSUM(LENGTH(code) - LENGTH(REPLACE(code, E'\n', '')))across all rows. Even though/statsnow uses lightweight COUNTs, this single aggregate still touches the deferredcodecolumn for every impl.Fix shape: add a generated/computed
Impl.line_countcolumn maintained on insert/update (or persist it in the sync pipeline), so/statsisSELECT SUM(line_count).5. Foreign-key indexes
Worth verifying with
\d implsin psql: PostgreSQL does not auto-create indexes on FK columns.impls.spec_id,impls.library_id,impls.language_id, andfeedback.spec_idare all queried by these columns. Missing indexes would surface as seq-scans on/specs/{id}and the newcount_with_implsaggregate.If absent, add via an Alembic migration.
6. Per-instance cache, no shared layer
The in-memory
TTLCacheinapi/cache.pyis per Cloud Run instance. The startup prewarm from #7694 covers the cold-start case, but autoscale to multiple instances still means each one starts cold until it gets its first request for each cached endpoint.Fix shape (only if Cloud Monitoring shows it matters): Memorystore/Redis for the small-volume metadata caches (
stats,libraries,languages,specs_list,specs_map). Skip for/plots/filter— payload size makes the network round-trip not worth it.7. Connection pool tuning
core/database/connection.pyuses SQLAlchemy defaults (pool_size=5, max_overflow=10). Not visibly a problem today, but worth confirming with metrics under burst load — every parallelAppDataProviderrequest currently fires four queries simultaneously per visitor.Out of scope
/insights/*endpoints (separate hot path, separate issue if needed)Priority suggestion
/libraries/{id}/images— biggest absolute win, no risk/plots/filtercache split — bigger refactor, biggest payoff on cold catalog growthline_count— only matters if/statscold-cache stays slow after the others land