[Feature] Refactor openbb-oecd for full SDMX 2 API Exposure#7413
[Feature] Refactor openbb-oecd for full SDMX 2 API Exposure#7413deeleeramone wants to merge 49 commits intodevelopfrom
openbb-oecd for full SDMX 2 API Exposure#7413Conversation
…ance/OpenBB into feature/refactor-oecd
…to feature/refactor-oecd
piiq
left a comment
There was a problem hiding this comment.
While the PR itself understandably very big and and very hard to review, there are a few things that caught my attention when I was looking through the changes.
- The cache. The main question I have is wether the cache needs to live in version control. How often would it change? Is this a static asset that should live be served over a CDN? Introduction of a zipped pickle to the repo is flagged by my security sensors.
- The size and the cognitive complexity of the some files. Some files are pretty much impossible to look at to get a high level undersanding of what they are doing. Consider splitting files that are over 1000 lines into modules. Specifically the files that are over 2K lines.
- I see some patterns that make it hard to explain them. Example: in the router file the metadata is imported by each route and the pylint error is silenced. If it's imported by each route - why not top level import? Then when looking at the routers I notices that some routes have actual logic in them - is the router the right logic place for that logic?
The biggest blocker for me is the binary cache tbh.
I ran codex asking if the code is KISS and DRY. Here's the response:
• Findings
1. This is not KISS overall. openbb_platform/providers/oecd/openbb_oecd/
oecd_router.py:3 suppresses most complexity warnings up front, and the file
is 1,820 lines. Endpoints like openbb_platform/providers/oecd/openbb_oecd/
oecd_router.py:735, openbb_platform/providers/oecd/openbb_oecd/
oecd_router.py:897, and openbb_platform/providers/oecd/openbb_oecd/
oecd_router.py:1441 each mix request parsing, metadata lookup, UI-specific
behavior, fallback heuristics, and domain rules. That is functional, but not
simple.
2. This is not DRY enough around dimension semantics. The same concept lists are
copied in several places: openbb_platform/providers/oecd/openbb_oecd/
oecd_router.py:16, openbb_platform/providers/oecd/openbb_oecd/utils/
metadata.py:71, openbb_platform/providers/oecd/openbb_oecd/utils/
helpers.py:122, openbb_platform/providers/oecd/openbb_oecd/utils/
table_builder.py:254, openbb_platform/providers/oecd/openbb_oecd/utils/
table_builder.py:452, and openbb_platform/providers/oecd/openbb_oecd/utils/
generate_cache.py:556. Any future change to what counts as a country/
frequency/non-indicator dimension now requires coordinated edits across
multiple files.
3. The router repeats the same table-selection workflow instead of extracting
it. openbb_platform/providers/oecd/openbb_oecd/oecd_router.py:1086 and
openbb_platform/providers/oecd/openbb_oecd/oecd_router.py:1206 both rebuild
dimension_codes, both compress long joined codes, and both progressively pin
dimensions with OecdParamsBuilder. That is direct duplication in a hotspot.
4. The refactor improves DRY at the model layer, but it centralizes too much
into large “god” utilities. The shared fetch path through openbb_platform/
providers/oecd/openbb_oecd/utils/query_builder.py:236 is a real win, and
models like openbb_platform/providers/oecd/openbb_oecd/models/
consumer_price_index.py:203, openbb_platform/providers/oecd/openbb_oecd/
models/gdp_real.py:87, and openbb_platform/providers/oecd/openbb_oecd/models/
share_price_index.py:79 are cleaner because of it. But openbb_platform/
providers/oecd/openbb_oecd/utils/metadata.py:1 is now 3,394 lines and owns
taxonomy listing, structure loading, cache I/O, URL building, hierarchy/table
synthesis, and live availability. That is DRY by consolidation, not KISS by
design.
Assessment
Short version: partially DRY, not KISS.
The branch clearly reduces duplication in the individual OECD fetchers by
pushing common logic into OecdQueryBuilder and OecdMetadata. But the overall
design then swings too far in the other direction: large multi-responsibility
modules, duplicated dimension rules across files, and repeated selector-building
code in the router. I would not describe this PR as KISS/DRY in its current
shape.
I did not run tests; this was a structure/code review against develop on
feature/refactor-oecd.
This stems from a preference for scoping modules to their function's enclosure. Here, I can convert to serve as a Depends object.
All functions in the |
…to pr/deeleeramone/7413
|
All concerns have been addressed:
|
|
thanks @deeleeramone do the cassettes need to be re-recorded? the tests are failing on the most recent commit related to how metadata is served |
@piiq, this is because the test was discovering dependency injections as parameters. Fix there is to ignore Depends objects in the test utility file. |
|
@deeleeramone is it a must have for the cache to be in version control? can the extension populate the cache when it's used? i am not fond of having binary files as a part of the package |
If by being in version control you mean |
This PR refactors the
openbb-oecdextension completely to provide a centralized module for database query building and execution. It uses the SDMX V2 API and eliminates thedefusedxmldependency by targeting JSON and CSV output formats for API calls.The library now provides access to all OECD data series and tables via
economy.available_indicators,economy.indicators,oecd_utils.presentation_table, as well as adding to the existing curated endpoints.The module ships with a pre-built index cache of all dataflows, codelists, structures, and descriptions that the endpoints actively consume. It is a compressed pickle file, approximately 1.8 MB in size. Metadata is distilled by a script that ingests the full definition files (hundreds of MBs) and the organizes them into a Python dictionary that is then highly compressed and stored. This makes it possible to explore and search what is available without making any network calls.
The cache can be rebuilt (if desired) after installing through a command line script defined in
pyproject.toml:For complete details, see the module's README.md file.
The easiest way to test this out is to install in dev mode (
pip install -e .) and useopenbb-apito start the API. You will get a new App in the Workspace,Coverage
All data available from https://data-explorer.oecd.org can be retrieved via
obb.economy.indicators(provider='oecd', **kwargs).The extension also exposes specialized fetchers for the most commonly used OECD datasets.
The extension creates a router path,
oecd_utils, that exposes utility functions for UI integrationsand metadata lookup.
Endpoints
Economy
obb.economy.available_indicators— search all OECD indicator symbolsobb.economy.indicators— fetch data for any OECD indicator symbolobb.economy.balance_of_payments— Balance of Paymentsobb.economy.composite_leading_indicator— Composite Leading Indicatorsobb.economy.cpi— Consumer Price Indicesobb.economy.country_interest_rates— Short and long-term interest ratesobb.economy.gdp.nominal— Nominal GDPobb.economy.gdp.real— Real GDPobb.economy.gdp.forecast— GDP forecasts (Economic Outlook)obb.economy.house_price_index— Residential property price indicesobb.economy.share_price_index— Share price indicesobb.economy.unemployment— Unemployment ratesUtilities
obb.oecd_utils.list_topic_choices— topic dropdown choices for UI widgetsobb.oecd_utils.list_subtopic_choices— subtopic dropdown choices for a selected topicobb.oecd_utils.list_dataflows— list all OECD dataflows with topic breadcrumbsobb.oecd_utils.list_dataflow_choices— dropdown choices for UI widgetsobb.oecd_utils.list_topics— browse OECD topics and subtopics with dataflow countsobb.oecd_utils.get_dataflow_parameters— dimensions and valid codes for a dataflowobb.oecd_utils.list_tables— search OECD presentation tablesobb.oecd_utils.get_table_detail— inspect a table's dimensions and indicator hierarchyobb.oecd_utils.list_table_choices— table dropdown choices for UI widgetsobb.oecd_utils.presentation_table_choices— progressive choices for the presentation table widgetobb.oecd_utils.presentation_table— retrieve formatted OECD presentation tables"Choices" endpoints are used by OpenBB Workspace to populate widget dropdown menus.
Tests
Unit test coverage has been expanded to include the QueryBuilder and Metadata implementations, along with the integration tests for the
oecd_utilsendpoints now included.