Multi-tenant /Catalogs Extension #880

Open
jonhealy1 wants to merge 41 commits into stac-utils:main from jonhealy1:multi-tenant-catalogs-extension

@jonhealy1
Collaborator

@jonhealy1 jonhealy1 commented Feb 7, 2026

Related Issue(s):

Description: Multi-Tenant Catalogs Extension

This extension introduces a recursive /catalogs endpoint to the STAC API, enabling complex, nested hierarchies beyond the standard flat Root -> Collections structure. It transforms a STAC API into a Multi-Tenant system capable of serving distinct catalog trees (e.g., Provider -> Theme -> Project) within a single instance.

Key Architectural Concepts:

  • Recursive Hierarchy: Unlike standard STAC, which flattens data into a single list of Collections, this extension allows Catalogs to contain other Sub-Catalogs to unlimited depth.
  • Poly-Hierarchy (Virtual Organization): Collections are not physically moved but logically linked. A single Collection can belong to multiple parent Catalogs simultaneously (e.g., a "Sentinel-2" collection can exist under both a "USGS" catalog and an "Optical Data" catalog).
  • Safety-First Management: The architecture strictly separates "Organization" from "Data."
    • Unlinking vs. Deleting: Deleting a Catalog via this extension is non-destructive to the actual data. It "unlinks" the child Collections.
    • Orphan Adoption: If a Collection is unlinked from its last parent, it is automatically adopted by the Root Catalog to ensure no data is ever lost or becomes undiscoverable.
  • Unified Discovery: Integrates with the Children Extension to provide a single view (/children) that lists both Sub-Catalogs and Collections, supporting optional type filtering.
  • Configurable Transactions: The extension supports an enable_transactions flag (default: False).
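
As a rough illustration of the unlinking and orphan-adoption rules above, here is a minimal in-memory sketch. It is not the extension's implementation: the `CatalogGraph` class, its method names, and the `"root"` id are all hypothetical, and the real backend stores these links in a database.

```python
from collections import defaultdict

ROOT = "root"  # hypothetical id standing in for the Root Catalog


class CatalogGraph:
    """Toy model: catalogs only hold links to collections, never the data."""

    def __init__(self):
        # collection_id -> set of parent catalog ids (poly-hierarchy)
        self.parents = defaultdict(set)

    def link(self, catalog_id, collection_id):
        self.parents[collection_id].add(catalog_id)

    def unlink(self, catalog_id, collection_id):
        parents = self.parents[collection_id]
        parents.discard(catalog_id)
        if not parents:
            # Orphan adoption: the last parent is gone, so Root takes over.
            parents.add(ROOT)

    def delete_catalog(self, catalog_id):
        # Deleting a catalog only unlinks its children; no collection is removed.
        for collection_id in list(self.parents):
            if catalog_id in self.parents[collection_id]:
                self.unlink(catalog_id, collection_id)


graph = CatalogGraph()
graph.link("usgs", "sentinel-2")
graph.link("optical-data", "sentinel-2")

graph.delete_catalog("usgs")
after_first = set(graph.parents["sentinel-2"])   # still linked under "optical-data"

graph.delete_catalog("optical-data")
after_second = set(graph.parents["sentinel-2"])  # adopted by the root catalog
```

The collection entry itself survives both deletions; only its set of parents changes.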

Full Endpoint List:

  • Registry & Root:
    • GET /catalogs - List all root catalogs.
    • POST /catalogs - Register a new root catalog.
  • Catalog Management:
    • GET /catalogs/{catalog_id} - Get catalog metadata.
    • PUT /catalogs/{catalog_id} - Update catalog metadata.
    • DELETE /catalogs/{catalog_id} - Disband a catalog (does not delete data).
  • Sub-Catalogs (Recursive):
    • GET /catalogs/{catalog_id}/catalogs - List sub-catalogs.
    • POST /catalogs/{catalog_id}/catalogs - Create or link a sub-catalog.
    • DELETE /catalogs/{catalog_id}/catalogs/{sub_catalog_id} - Unlink a sub-catalog.
  • Collections (Poly-hierarchy):
    • GET /catalogs/{catalog_id}/collections - List linked collections.
    • POST /catalogs/{catalog_id}/collections - Create or link a collection.
    • GET /catalogs/{catalog_id}/collections/{collection_id} - Get collection details.
    • DELETE /catalogs/{catalog_id}/collections/{collection_id} - Unlink a collection.
  • Items (Scoped Access):
    • GET /catalogs/{catalog_id}/collections/{collection_id}/items - Search items within catalog context.
    • GET /catalogs/{catalog_id}/collections/{collection_id}/items/{item_id} - Get single item.
  • Discovery & Capabilities:
    • GET /catalogs/{catalog_id}/children - Unified list of child Catalogs and Collections.
    • GET /catalogs/{catalog_id}/conformance - Conformance classes for this catalog tree.
    • GET /catalogs/{catalog_id}/queryables - Queryable fields for this catalog tree.
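
One way to picture how the enable_transactions flag could interact with this list is the sketch below. It assumes the flag gates every non-GET route; that assumption, the `ROUTES` table, and the `active_routes` helper are illustrative and not taken from the PR's code.

```python
# The 17 routes from the endpoint list above, as (method, path) pairs.
ROUTES = [
    ("GET", "/catalogs"),
    ("POST", "/catalogs"),
    ("GET", "/catalogs/{catalog_id}"),
    ("PUT", "/catalogs/{catalog_id}"),
    ("DELETE", "/catalogs/{catalog_id}"),
    ("GET", "/catalogs/{catalog_id}/catalogs"),
    ("POST", "/catalogs/{catalog_id}/catalogs"),
    ("DELETE", "/catalogs/{catalog_id}/catalogs/{sub_catalog_id}"),
    ("GET", "/catalogs/{catalog_id}/collections"),
    ("POST", "/catalogs/{catalog_id}/collections"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}"),
    ("DELETE", "/catalogs/{catalog_id}/collections/{collection_id}"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}/items"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}/items/{item_id}"),
    ("GET", "/catalogs/{catalog_id}/children"),
    ("GET", "/catalogs/{catalog_id}/conformance"),
    ("GET", "/catalogs/{catalog_id}/queryables"),
]


def active_routes(enable_transactions: bool = False):
    """Return the routes that would be registered for a given flag value.

    Assumption: read-only (GET) routes are always on, and write routes
    (POST, PUT, DELETE) are only registered when transactions are enabled.
    """
    return [
        (method, path)
        for method, path in ROUTES
        if enable_transactions or method == "GET"
    ]


read_only = active_routes()
full = active_routes(enable_transactions=True)
```

Under that assumption, the default configuration would expose only the GET routes, and the management routes would appear once the flag is turned on.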

Specification Reference:
Healy-Hyperspatial/multi-tenant-catalogs

PR Checklist:

  • pre-commit hooks pass locally
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable, and docs build successfully (run make docs)
  • Changes are added to the CHANGELOG.

@jonhealy1 jonhealy1 changed the title Multi tenant catalogs extension Multi-tenant catalogs extension Feb 8, 2026
@jonhealy1 jonhealy1 marked this pull request as ready for review February 8, 2026 17:08
@jonhealy1 jonhealy1 marked this pull request as draft February 8, 2026 17:18
@jonhealy1 jonhealy1 removed the request for review from vincentsarago February 8, 2026 17:19
@jonhealy1 jonhealy1 marked this pull request as ready for review February 9, 2026 06:59
@vincentsarago
Member

Thanks for the PR, @jonhealy1.

Before I start the review I have a quick question: should this extension be in core or in third_party?

@jonhealy1
Collaborator Author

Hi @vincentsarago I was wondering about that. I can move it to third party.

@jonhealy1 jonhealy1 requested a review from gadomski February 10, 2026 12:34
@gadomski gadomski dismissed their stale review February 18, 2026 14:17

Comments were addressed, but I don't have time for a re-review at the moment and don't want to let my change request block.

@bkanuka

bkanuka commented Feb 23, 2026

@jonhealy1 Can you explain the relationship between this and the multi-tenant work done in this (poorly named 😉) PR: NASA-IMPACT/veda-backend#531?

I have seen some other approaches to multi-catalog and have even hacked together something by myself. Is your goal here to formalize how multi-tenant or multi-catalog should be done? Do you have a working implementation of this extension?

I think this is great work, just trying to wrap my head around it all because I would be interested in implementing it in pgstac/stac-fastapi-pgstac if not already done.

@jonhealy1
Collaborator Author

jonhealy1 commented Feb 24, 2026

Hi @bkanuka, thanks for checking this out and for the great questions!

To give you a little behind-the-scenes context: we actually originally pitched this extension as the "Virtual Catalogs Endpoint". We really see the number one innovation here as introducing the /catalogs endpoint natively into the dynamic API layer - transforming the STAC API from a flat list of /collections into a navigable, hierarchical system. The goal was to support flexible "playlists" or semantic themes (like a SKOS registry) where a single Collection could logically exist in multiple different folders simultaneously. However, the STAC Project Steering Committee (PSC) felt "Multi-Tenant" was a more appropriate term, so we adopted their recommendation. It definitely could create some confusion in the future.

Here is the easiest way to think about the difference between the VEDA PR and this extension:

1. VEDA’s Approach: Access-Level Isolation
The VEDA PR solves multi-tenancy at the routing and filtering level.

  • It takes a standard STAC API (which relies on a single, flat list of /collections) and uses API middleware to intercept a {tenant} prefix in the URL.
  • It then silently injects filters into the database queries so that "Tenant A" only sees their specific slice of that flat Collections list.
  • Analogy: It's like having one big filing cabinet, but giving different users magic glasses so they only see their own files.
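
To make the route-and-filter idea concrete, here is a toy sketch of the general pattern. It is not VEDA's code: the `tenant` field on each collection, the sample data, and both helper functions are hypothetical.

```python
# Hypothetical flat collections list, each record tagged with its tenant.
COLLECTIONS = [
    {"id": "landsat", "tenant": "tenant-a"},
    {"id": "sentinel-2", "tenant": "tenant-b"},
    {"id": "modis", "tenant": "tenant-a"},
]


def split_tenant(path: str):
    """Split '/tenant-a/collections' into ('tenant-a', '/collections')."""
    tenant, _, rest = path.lstrip("/").partition("/")
    return tenant, "/" + rest


def list_collections(path: str):
    """Strip the tenant prefix, then filter the flat list by that tenant."""
    tenant, route = split_tenant(path)
    if route != "/collections":
        raise ValueError(f"unsupported route in this sketch: {route}")
    # The injected filter: callers never see other tenants' records.
    return [c["id"] for c in COLLECTIONS if c["tenant"] == tenant]


tenant_a_view = list_collections("/tenant-a/collections")
tenant_b_view = list_collections("/tenant-b/collections")
```

The underlying list stays flat; only the view of it changes per tenant.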

2. This Extension's Approach: Structural Namespaces
This extension solves multi-tenancy at the database and structural level.

  • Crucially, this can allow you to natively store and serve multiple distinct Catalogs on the exact same database - a capability that is new to STAC APIs.
  • Instead of keeping a flat list of collections, it introduces recursive STAC Catalogs (essentially, folders) directly into the API and database.
  • You can build a deeply nested, navigable tree (e.g., /catalogs/usgs/catalogs/landsat/collections/...).
  • Because it's a structural poly-hierarchy, a single Collection can logically exist inside multiple different Catalogs simultaneously (e.g., in a "USGS" catalog and a "Forestry" catalog) without duplicating the underlying data.
  • Built-in Data Safety: Organizing your STAC API with this extension is literally like creating a music playlist. The /catalogs transaction endpoints strictly manage links, not data. You can safely build, rearrange, or delete entire sub-catalogs, and it will only "unlink" the Collections. Actual data destruction is intentionally restricted to the normal STAC API transaction routes (e.g., DELETE /collections/{id}).
  • Analogy: It's like building actual, labeled drawers and folders inside the filing cabinet, allowing users to physically browse the organizational tree without ever risking the files themselves.

The Goal & pgstac
The goal isn't to force a single architectural choice. In fact, these two approaches aren't mutually exclusive: you could absolutely have both structural catalog folders and strict access-control filters! But our primary goal here was to formalize a standard, spec-compliant way to navigate massive STAC APIs that have outgrown the flat /collections list.

We currently have a working implementation operating as an optional feature in our stac-fastapi-elasticsearch-opensearch backend.

Given how powerful pgstac's relational model is, I think it is the absolute perfect candidate for this. Adding a catalogs table and a many-to-many relationship for collections would be incredibly powerful for the community.
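
For readers thinking in relational terms, the catalogs table plus many-to-many link could look roughly like the sketch below. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and every table and column name is an assumption; none of this is pgstac's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE catalogs (id TEXT PRIMARY KEY);
    CREATE TABLE collections (id TEXT PRIMARY KEY);

    -- Hypothetical join table: one row per (catalog, collection) link.
    CREATE TABLE catalog_collections (
        catalog_id    TEXT NOT NULL REFERENCES catalogs(id),
        collection_id TEXT NOT NULL REFERENCES collections(id),
        PRIMARY KEY (catalog_id, collection_id)
    );

    INSERT INTO catalogs VALUES ('usgs'), ('forestry');
    INSERT INTO collections VALUES ('landsat');
    INSERT INTO catalog_collections VALUES
        ('usgs', 'landsat'), ('forestry', 'landsat');
    """
)


def collections_in(catalog_id):
    """List the collection ids linked to one catalog."""
    rows = conn.execute(
        "SELECT collection_id FROM catalog_collections "
        "WHERE catalog_id = ? ORDER BY collection_id",
        (catalog_id,),
    )
    return [row[0] for row in rows]


usgs_before = collections_in("usgs")

# "Unlinking" removes link rows only; the collections table is untouched.
conn.execute("DELETE FROM catalog_collections WHERE catalog_id = ?", ("usgs",))

usgs_after = collections_in("usgs")
forestry_after = collections_in("forestry")
collection_count = conn.execute("SELECT COUNT(*) FROM collections").fetchone()[0]
```

The same collection row is reachable from both catalogs, and dropping one catalog's links leaves the other catalog's view and the collection itself intact.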

@bkanuka

bkanuka commented Feb 24, 2026

Thanks for the thorough reply. It clarified a few things. I like the flexibility of this many-to-many approach and hopefully it gets wider adoption. (Although the route-to-filter approach does have the benefit of being a bit "thinner").

I'll also take a look at the Elasticsearch/OpenSearch implementation.

Is there a technical reason this was first implemented in OpenSearch over pgstac (i.e. there's something that makes OpenSearch a better backend for this) or was it a client-driven decision?

@jonhealy1
Collaborator Author

jonhealy1 commented Feb 24, 2026

@bkanuka You are totally right that the route-to-filter approach is "thinner" - it's an elegant way to multiplex a single database into multiple isolated STAC APIs using access control.

However, the biggest limitation of that approach is that it forces a strictly flat structure, with no nested catalogs. It doesn't accomplish what we set out to achieve, namely structural, deeply nested Catalogs that allow for things like SKOS vocabularies, poly-hierarchies, and improved discoverability without siloing the data.

To answer your question: there is no technical reason it was implemented in OpenSearch first, and OS isn't necessarily a "better" backend for this. It was simply because I am the lead maintainer of the stac-fastapi-elasticsearch-opensearch project. We have had people asking for a /catalogs endpoint for years, and we are currently talking to an outside organization who is very interested in this approach for SKOS.

Other organizations run their own custom versions of this already, but it's currently a "wild west" scenario because a /catalogs route has never been officially spec'ed out and documented.

In our OpenSearch implementation, we define Catalogs as a specific type of Collection. This meant we didn't even have to add new indexes (tables) or introduce breaking changes to support the functionality.
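
A simplified picture of what "Catalogs as a specific type of Collection" can mean is sketched below with plain Python dicts instead of an OpenSearch index. The `parent_ids` field name, the sample documents, and the `children` helper are assumptions for illustration, not the backend's actual mapping or query code.

```python
# One shared "index": catalogs and collections are the same kind of document,
# told apart only by their STAC "type" field.
INDEX = [
    {"id": "usgs", "type": "Catalog", "parent_ids": []},
    {"id": "optical-data", "type": "Catalog", "parent_ids": []},
    {"id": "landsat", "type": "Catalog", "parent_ids": ["usgs"]},
    {"id": "sentinel-2", "type": "Collection", "parent_ids": ["usgs", "optical-data"]},
]


def children(catalog_id, type_filter=None):
    """Return ids of documents linked under a catalog, optionally by type."""
    return [
        doc["id"]
        for doc in INDEX
        if catalog_id in doc["parent_ids"]
        and (type_filter is None or doc["type"] == type_filter)
    ]


all_children = children("usgs")
only_catalogs = children("usgs", type_filter="Catalog")
only_collections = children("usgs", type_filter="Collection")
shared = children("optical-data")
```

Because both kinds of document live side by side, a unified children listing is a single filtered query, and no extra index is needed.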

@bkanuka

bkanuka commented Feb 24, 2026

Totally makes sense. Personally I need to balance the thinner solution that gets it done vs the more flexible, longer term, slightly "wild-west" solution but man...the wild west is so much more fun! 😁

I can't find the discussion now, but I think last I read @bitner was hesitant to add anything like this to pgstac because (a) it meant potentially breaking changes, (b) he didn't have enough demand to warrant maintaining the additional code, and (c) it was not yet a well-defined STAC extension (but of course that's the goal here).

I'll explore whether this could be implemented on top of pgstac without core changes. Using the same DB mechanism for Catalogs and Collections makes perfect sense. Plus if I can keep the code separate there'll be fewer eyes on my shitty average SQL.

@jonhealy1
Collaborator Author

I personally think this is the future of STAC APIs. Feel free to look into the extension definition itself and give any feedback or contribute. There's a lot still to do, e.g. Catalog search for one thing. Adding all of these routes is significant, but the benefits are immense. Also, the transaction routes are optional, so they could be left out in the first iteration. I could help with the pgstac implementation, but I would probably not spend much time on it until this PR gets merged. Tables would need to be modified: collections and catalogs need a list of parent_ids, for one thing. I am mostly a NoSQL guy, but I could get back into SQL if I needed to.

@jonhealy1
Collaborator Author

stac-utils/pgstac#206

5 participants