[DCP] Add a flag to only do DB initialization #496
gmechali merged 1 commit into datacommonsorg:master
Up to standards ✅
| Metric | Results |
|---|---|
| Complexity | 0 |
| Duplication | 0 |
Code Review
This pull request introduces a new initializeDatabaseOnly flag to the ingestion pipeline options, allowing the process to terminate after database initialization and skip data ingestion. A review comment suggests refactoring the logic out of the static main method into a testable run method to ensure the new conditional logic is covered by unit tests.
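The refactoring suggested in the review could look roughly like the sketch below. It is not the code from this PR: only the flag name initializeDatabaseOnly comes from the conversation, while the class name, the three interfaces, and the run signature are hypothetical stand-ins for whatever the ingestion pipeline actually uses. The idea is that main only parses arguments and delegates, so the early-return branch can be exercised from a unit test with fakes.

```java
import java.util.ArrayList;
import java.util.List;

public class IngestionRunnerSketch {
  /** Hypothetical stand-in for the pipeline options; only the flag name is from the PR. */
  public interface Options {
    boolean getInitializeDatabaseOnly();
  }

  /** Hypothetical seam for the database initialization step. */
  public interface DatabaseInitializer {
    void initialize();
  }

  /** Hypothetical seam for building and running the actual ingestion. */
  public interface IngestionStep {
    void ingest();
  }

  /**
   * Initializes the database, then ingests data unless the init-only flag is set.
   * Returns true if ingestion ran, false if the run stopped after initialization.
   */
  public static boolean run(Options options, DatabaseInitializer db, IngestionStep ingestion) {
    db.initialize();
    if (options.getInitializeDatabaseOnly()) {
      return false; // Skip ingestion entirely.
    }
    ingestion.ingest();
    return true;
  }

  public static void main(String[] args) {
    // Naive argument check for the sketch; a real pipeline would use its options parser.
    boolean initOnly = false;
    for (String arg : args) {
      if (arg.equals("--initializeDatabaseOnly=true")) {
        initOnly = true;
      }
    }
    final boolean flag = initOnly;
    List<String> steps = new ArrayList<>();
    boolean ingested = run(() -> flag, () -> steps.add("init"), () -> steps.add("ingest"));
    System.out.println("steps=" + steps + " ingested=" + ingested);
  }
}
```

With this shape, a test can pass a fake Options that returns true and check that the initializer was called while the ingestion step was not.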
vish-cs left a comment
If the goal is simply to initialize, would it not be better to provide a script to do that rather than running a dataflow pipeline?
Yes, but we don't want to maintain duplicated code or a duplicate schema that can fall out of sync with what the ingestion pipeline needs. We could factor this out into a script and have the ingestion pipeline call it, or we can just rely on the ingestion pipeline, since it already does the initialization.
This will give DCP customers a clean sequence of commands: run Terraform apply, then trigger the ingestion pipeline with --initializeDatabaseOnly=true. After that they are in a clean state and ready to import data.
Note that this step is somewhat optional, since they don't need the tables created at that point, but it makes for a cleaner story for DCP clients: once you run this, you're ready. They could also run the pipeline with their data right away, but if the data isn't ready to import, this gives them a clear instruction for finishing the setup.