[GEN-2473] Refactor: migrate to synapseclient.models.Table API #214
Pull request overview
Refactors update_data_element_catalog.py to use the newer synapseclient.models.Table API for querying and mutating Synapse Tables, replacing deprecated legacy table query/store patterns and removing manual etag handling.
Changes:
- Switch table reads from `syn.tableQuery(...).asDataFrame()` to `Table(id=...).query(...)`.
- Switch table writes from `syn.store(Table(..., etag=...))` to `Table(...).upsert_rows(...)` and `Table(...).store_rows(...)`.
- Reorder imports so `synapseclient.models.Table` overrides the legacy `Table` imported via `utilities`.
     ["synColSize", "numCols", "colLabels"]
 ]
-vars_to_update_df = syn.store(
-    Table(table_schema, vars_to_update_df, etag=results.etag)
+Table(id=catalog_id).upsert_rows(
+    values=vars_to_update_df,
+    primary_keys=["variable"]
upsert_rows(..., primary_keys=["variable"]) requires the primary key column(s) to be present in values. Here vars_to_update_df is sliced down to only synColSize, numCols, and colLabels, so the variable column is missing and the upsert can’t match rows to update. Keep/include variable in the DataFrame passed to upsert_rows (and only drop it after the upsert, if needed).
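One way to catch this before calling `upsert_rows` is a small guard that compares the column names against the primary keys. The sketch below is only an illustration: `missing_primary_keys` is a hypothetical helper, not part of synapseclient, and it works on any iterable of column names (for example `vars_to_update_df.columns`).

```python
from typing import Iterable, List


def missing_primary_keys(columns: Iterable[str], primary_keys: Iterable[str]) -> List[str]:
    """Return the primary key names that are absent from `columns`.

    Hypothetical helper for illustration; not a synapseclient function.
    """
    present = set(columns)
    return [key for key in primary_keys if key not in present]


primary_keys = ["variable"]
update_columns = ["synColSize", "numCols", "colLabels"]

# The slice used in this diff drops "variable", so the key is reported missing.
sliced_only = missing_primary_keys(update_columns, primary_keys)

# Keeping the key column alongside the updated columns passes the check.
with_key = missing_primary_keys(primary_keys + update_columns, primary_keys)

print(sliced_only)  # ['variable']
print(with_key)     # []
```

In the script, the same check could run on `vars_to_update_df.columns` right before the upsert and raise if the returned list is non-empty.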
 # add the new cohort_dd column to the table schema
-if "%s_dd" % cohort not in results.asDataFrame().columns:
+# Check if column exists by querying
+existing_df = Table(id=catalog_id).query(
Do you know how big the catalog_id tables are typically? I worry that the table may not be available to query so soon after it's been updated (above).
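If the follow-up query does turn out to fail right after the update, one option is to wrap it in a retry with exponential backoff. This is a minimal sketch under assumptions: `query_with_retry` is a hypothetical helper, the attempt count and delays are placeholders, and I have not checked whether synapseclient already waits for the table to become queryable on its own.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def query_with_retry(
    run_query: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `run_query` until it succeeds, doubling the delay after each failure.

    Hypothetical helper; in real use, catch only the exception type that
    signals "table not ready" instead of the broad `Exception` used here.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")


# Demo with a stand-in query that fails twice before succeeding.
calls = {"count": 0}


def flaky_query() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("table not available yet")
    return "ok"


recorded_delays: List[float] = []
result = query_with_retry(flaky_query, sleep=recorded_delays.append)
print(result, recorded_delays)  # ok [1.0, 2.0]
```

In the script, `run_query` would be a lambda around the `Table(id=catalog_id).query(...)` call from this diff.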
This script lives in the table_updates module alongside other scripts and a shared Dockerfile/requirements.txt, and I don't see that the other scripts have been updated to use the new synapseclient models yet.
I would note in the README the specific synapseclient version needed to run this script, and that the usual Dockerfile/requirements.txt setup won't work for it.
Problem:
The script was using the deprecated/legacy Synapse Table API methods (syn.tableQuery(...).asDataFrame() for reads and syn.store(Table(..., etag=...)) for writes).
Solution:
Migrated to synapseclient.models.Table and reordered the imports so that it properly overrides the legacy Table reference.
This ensures compatibility with the latest synapseclient library and follows the recommended patterns for Synapse Table operations.