Skip to content

[Feature] Refactor openbb-oecd for full SDMX 2 API Exposure#7413

Open
deeleeramone wants to merge 49 commits intodevelopfrom
feature/refactor-oecd
Open

[Feature] Refactor openbb-oecd for full SDMX 2 API Exposure#7413
deeleeramone wants to merge 49 commits intodevelopfrom
feature/refactor-oecd

Conversation

@deeleeramone
Copy link
Copy Markdown
Contributor

This PR refactors the openbb-oecd extension completely to provide a centralized module for database query building and execution. It uses the SDMX V2 API and eliminates the defusedxml dependency by targeting JSON and CSV output formats for API calls.

The library now provides access to all OECD data series and tables via economy.available_indicators, economy.indicators, oecd_utils.presentation_table, as well as adding to the existing curated endpoints.

The module ships with a pre-built index cache of all dataflows, codelists, structures, and descriptions that the endpoints actively consume. It is a compressed pickle file, approximately 1.8 MB in size. Metadata is distilled by a script that ingests the full definition files (hundreds of MBs) and the organizes them into a Python dictionary that is then highly compressed and stored. This makes it possible to explore and search what is available without making any network calls.

The cache can be rebuilt (if desired) after installing through a command line script defined in pyproject.toml:

generate-oecd-cache

For complete details, see the module's README.md file.

The easiest way to test this out is to install in dev mode (pip install -e .) and use openbb-api to start the API. You will get a new App in the Workspace,

Screenshot 2026-03-14 100948 Screenshot 2026-03-14 114603

Coverage

All data available from https://data-explorer.oecd.org can be retrieved via obb.economy.indicators(provider='oecd', **kwargs).

The extension also exposes specialized fetchers for the most commonly used OECD datasets.

The extension creates a router path, oecd_utils, that exposes utility functions for UI integrations
and metadata lookup.

Endpoints

Economy

  • obb.economy.available_indicators — search all OECD indicator symbols
  • obb.economy.indicators — fetch data for any OECD indicator symbol
  • obb.economy.balance_of_payments — Balance of Payments
  • obb.economy.composite_leading_indicator — Composite Leading Indicators
  • obb.economy.cpi — Consumer Price Indices
  • obb.economy.country_interest_rates — Short and long-term interest rates
  • obb.economy.gdp.nominal — Nominal GDP
  • obb.economy.gdp.real — Real GDP
  • obb.economy.gdp.forecast — GDP forecasts (Economic Outlook)
  • obb.economy.house_price_index — Residential property price indices
  • obb.economy.share_price_index — Share price indices
  • obb.economy.unemployment — Unemployment rates

Utilities

  • obb.oecd_utils.list_topic_choices — topic dropdown choices for UI widgets
  • obb.oecd_utils.list_subtopic_choices — subtopic dropdown choices for a selected topic
  • obb.oecd_utils.list_dataflows — list all OECD dataflows with topic breadcrumbs
  • obb.oecd_utils.list_dataflow_choices — dropdown choices for UI widgets
  • obb.oecd_utils.list_topics — browse OECD topics and subtopics with dataflow counts
  • obb.oecd_utils.get_dataflow_parameters — dimensions and valid codes for a dataflow
  • obb.oecd_utils.list_tables — search OECD presentation tables
  • obb.oecd_utils.get_table_detail — inspect a table's dimensions and indicator hierarchy
  • obb.oecd_utils.list_table_choices — table dropdown choices for UI widgets
  • obb.oecd_utils.presentation_table_choices — progressive choices for the presentation table widget
  • obb.oecd_utils.presentation_table — retrieve formatted OECD presentation tables

"Choices" endpoints are used by OpenBB Workspace to populate widget dropdown menus.

Tests

Unit test coverage has been expanded to include the QueryBuilder and Metadata implementations, along with the integration tests for the oecd_utils endpoints now included.

@deeleeramone deeleeramone added enhancement Enhancement platform OpenBB Platform extensions Extension-related v4 PRs for v4 labels Mar 14, 2026
Comment thread openbb_platform/providers/oecd/tests/test_metadata.py Dismissed
Comment thread openbb_platform/providers/oecd/tests/test_query_builder.py Fixed
Copy link
Copy Markdown
Member

@piiq piiq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the PR itself understandably very big and and very hard to review, there are a few things that caught my attention when I was looking through the changes.

  1. The cache. The main question I have is wether the cache needs to live in version control. How often would it change? Is this a static asset that should live be served over a CDN? Introduction of a zipped pickle to the repo is flagged by my security sensors.
  2. The size and the cognitive complexity of the some files. Some files are pretty much impossible to look at to get a high level undersanding of what they are doing. Consider splitting files that are over 1000 lines into modules. Specifically the files that are over 2K lines.
  3. I see some patterns that make it hard to explain them. Example: in the router file the metadata is imported by each route and the pylint error is silenced. If it's imported by each route - why not top level import? Then when looking at the routers I notices that some routes have actual logic in them - is the router the right logic place for that logic?

The biggest blocker for me is the binary cache tbh.


I ran codex asking if the code is KISS and DRY. Here's the response:


• Findings

  1. This is not KISS overall. openbb_platform/providers/oecd/openbb_oecd/
     oecd_router.py:3 suppresses most complexity warnings up front, and the file
     is 1,820 lines. Endpoints like openbb_platform/providers/oecd/openbb_oecd/
     oecd_router.py:735, openbb_platform/providers/oecd/openbb_oecd/
     oecd_router.py:897, and openbb_platform/providers/oecd/openbb_oecd/
     oecd_router.py:1441 each mix request parsing, metadata lookup, UI-specific
     behavior, fallback heuristics, and domain rules. That is functional, but not
     simple.
  2. This is not DRY enough around dimension semantics. The same concept lists are
     copied in several places: openbb_platform/providers/oecd/openbb_oecd/
     oecd_router.py:16, openbb_platform/providers/oecd/openbb_oecd/utils/
     metadata.py:71, openbb_platform/providers/oecd/openbb_oecd/utils/
     helpers.py:122, openbb_platform/providers/oecd/openbb_oecd/utils/
     table_builder.py:254, openbb_platform/providers/oecd/openbb_oecd/utils/
     table_builder.py:452, and openbb_platform/providers/oecd/openbb_oecd/utils/
     generate_cache.py:556. Any future change to what counts as a country/
     frequency/non-indicator dimension now requires coordinated edits across
     multiple files.
  3. The router repeats the same table-selection workflow instead of extracting
     it. openbb_platform/providers/oecd/openbb_oecd/oecd_router.py:1086 and
     openbb_platform/providers/oecd/openbb_oecd/oecd_router.py:1206 both rebuild
     dimension_codes, both compress long joined codes, and both progressively pin
     dimensions with OecdParamsBuilder. That is direct duplication in a hotspot.
  4. The refactor improves DRY at the model layer, but it centralizes too much
     into large “god” utilities. The shared fetch path through openbb_platform/
     providers/oecd/openbb_oecd/utils/query_builder.py:236 is a real win, and
     models like openbb_platform/providers/oecd/openbb_oecd/models/
     consumer_price_index.py:203, openbb_platform/providers/oecd/openbb_oecd/
     models/gdp_real.py:87, and openbb_platform/providers/oecd/openbb_oecd/models/
     share_price_index.py:79 are cleaner because of it. But openbb_platform/
     providers/oecd/openbb_oecd/utils/metadata.py:1 is now 3,394 lines and owns
     taxonomy listing, structure loading, cache I/O, URL building, hierarchy/table
     synthesis, and live availability. That is DRY by consolidation, not KISS by
     design.

  Assessment
  Short version: partially DRY, not KISS.

  The branch clearly reduces duplication in the individual OECD fetchers by
  pushing common logic into OecdQueryBuilder and OecdMetadata. But the overall
  design then swings too far in the other direction: large multi-responsibility
  modules, duplicated dimension rules across files, and repeated selector-building
  code in the router. I would not describe this PR as KISS/DRY in its current
  shape.

  I did not run tests; this was a structure/code review against develop on
  feature/refactor-oecd.

Comment thread openbb_platform/providers/oecd/openbb_oecd/assets/oecd_cache.pkl.xz Outdated
@deeleeramone
Copy link
Copy Markdown
Contributor Author

3. Example: in the router file the metadata is imported by each route and the pylint error is silenced. If it's imported by each route - why not top level import?

This stems from a preference for scoping modules to their function's enclosure. Here, I can convert to serve as a Depends object.

Then when looking at the routers I notices that some routes have actual logic in them - is the router the right logic place for that logic?

All functions in the oecd_router have logic in them, they are mostly there to serve dynamic dropdown parameters for Workspace, and those items are not suitable for routing through the ProviderInterface. The only route in the router that isn't parameters-related is, presentation_table, and the logic there is for ensuring that it renders correctly in AgGrid. The same data can be returned via economy.indicators, but this does the pivoting and retains the parent-child relationships that do not translate - or can't handle - when using the Workspace AgGrid Pivot Panel.

@deeleeramone
Copy link
Copy Markdown
Contributor Author

All concerns have been addressed:

  1. Cache is now JSON, not binary.
  2. The large OecdMetadata module has been broken up into smaller components.
  3. OecdMetadata is served to the router as a Depends object.

@deeleeramone deeleeramone requested a review from piiq April 10, 2026 19:20
@piiq
Copy link
Copy Markdown
Member

piiq commented Apr 13, 2026

thanks @deeleeramone do the cassettes need to be re-recorded? the tests are failing on the most recent commit related to how metadata is served

@deeleeramone
Copy link
Copy Markdown
Contributor Author

the tests are failing on the most recent commit related to how metadata is served

@piiq, this is because the test was discovering dependency injections as parameters. Fix there is to ignore Depends objects in the test utility file.

@piiq
Copy link
Copy Markdown
Member

piiq commented Apr 14, 2026

@deeleeramone is it a must have for the cache to be in version control? can the extension populate the cache when it's used? i am not fond of having binary files as a part of the package

@deeleeramone
Copy link
Copy Markdown
Contributor Author

is it a must have for the cache to be in version control? can the extension populate the cache when it's used? i am not fond of having binary files as a part of the package

If by being in version control you mean .gitignore, I've added the path to it. Building at runtime is not realistic. Compiling at publish is the better way to go, but this requires creating a workflow, then adding the workflow as a Trusted Publisher in the PyPI account for the openbb-oecd package. Additionally, the build process would need to be added into dev_install.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement extensions Extension-related platform OpenBB Platform v4 PRs for v4

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants