[DCP Ingestion] Automate the DB initialization, Dockerize Ingestion Helper, and fix lock acquisition issue#1971
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a database initialization feature for the ingestion helper, including Docker and Cloud Build configurations, a Spanner schema, and logic to apply DDL statements. The locking mechanism was also updated to handle the creation of new lock rows. Feedback provides a suggestion for better DatabaseAdminClient configuration.
dwnoble
left a comment
There was a problem hiding this comment.
Thanks Gabe!
I think this tooling may be better suited to live in the DCP repo at https://github.com/datacommonsorg/datacommons
This PR converts the import automation cloud functions to a REST service, but if we're going to do that, I think it makes sense to consolidate it with the DCP FastAPI service so we don't end up adding a separate process.
Then this action could look something like: POST https://<dcp-service>/api/admin/initialize-database
Regardless, i think we should also:
- convert this toolset from requirements.txt files to uv so we can have consistent builds with the uv.lock files
- right now the functions in main.py are tied to cloud run functions. we should migrate these to a plain python module, and the cloud run part should be minimal wrapper on top of it
- we should expose these functions via a command line tool. If we migrate to the dcp repo, this might look like:
$ datacommons admin initialize-databaseor$ datacommons admin release-ingestion-lock.
cc @vish-cs
Other comment kind of replies about moving to DCP repo. I think one day maybe but it's premature, and splitting up ingestion code also seems controversial. |
dwnoble
left a comment
There was a problem hiding this comment.
Other comment kind of replies about moving to DCP repo. I think one day maybe but it's premature, and splitting up ingestion code also seems controversial.
Converting to UV - done!
Migrating from cloud functions to a plain python module, agreed but I dont think there's an immediate need for it? Can we track this post-M1?
Command line tool, again I think this is way past M1 right? All nice to haves, but are we aligned that these are future improvements, not for MVP?
Adding the python module & CLI later SGTM.
For converting to uv, we should also migrate from requirements.txt to pyproject.toml (example: https://github.com/datacommonsorg/datacommons/blob/main/packages/datacommons-db/pyproject.toml)
but that can also happen in a follow-up!
Changed to pyproject.toml! |
This PR does 3 main things:
As a follow up we should do the following: