
[DCP] Add a flag to only do DB initialization#496

Merged
gmechali merged 1 commit into datacommonsorg:master from gmechali:schemaMode on Apr 14, 2026

Conversation

@gmechali gmechali commented Apr 9, 2026

This gives DCP customers a clean setup command: run Terraform apply, then trigger the ingestion pipeline with --initializeDatabaseOnly=true. After that, the database is in a clean state and ready to import data.

Note that this step is somewhat optional, since customers don't need the tables created at that point. It does, however, give DCP clients a cleaner story: once you run this, you're ready. They could also run the pipeline with their data right away, but if the data isn't ready to import, this gives them a clear instruction for finishing the setup.

@gmechali gmechali requested a review from vish-cs April 9, 2026 20:13
@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 0 complexity · 0 duplication


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new initializeDatabaseOnly flag to the ingestion pipeline options, allowing the process to terminate after database initialization and skip data ingestion. A review comment suggests refactoring the logic out of the static main method into a testable run method to ensure the new conditional logic is covered by unit tests.
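
The refactor suggested in the review could look roughly like the sketch below. It is not the code from this PR: only the flag name initializeDatabaseOnly comes from the description, while IngestionSketch, Options, Database, Ingestion, and the hand-rolled argument parsing are hypothetical stand-ins (the real pipeline presumably reads the flag through its own options interface). The idea is that main only wires things together, and the conditional lives in a run method that a unit test can call with fakes.

```java
import java.util.Arrays;

final class IngestionSketch {

    /** Hypothetical options holder; only the flag name is taken from the PR. */
    static final class Options {
        final boolean initializeDatabaseOnly;

        Options(boolean initializeDatabaseOnly) {
            this.initializeDatabaseOnly = initializeDatabaseOnly;
        }

        /** Simplified parsing for illustration: the flag is on only when passed as "=true". */
        static Options fromArgs(String[] args) {
            boolean initOnly = Arrays.asList(args).contains("--initializeDatabaseOnly=true");
            return new Options(initOnly);
        }
    }

    /** Hypothetical seam for schema and table creation. */
    interface Database {
        void initialize();
    }

    /** Hypothetical seam for the actual data ingestion. */
    interface Ingestion {
        void run();
    }

    /**
     * Always initializes the database, then ingests unless the flag is set.
     * Returns true if ingestion ran, false if it was skipped.
     */
    static boolean run(Options options, Database database, Ingestion ingestion) {
        database.initialize();
        if (options.initializeDatabaseOnly) {
            return false;
        }
        ingestion.run();
        return true;
    }

    /** Thin entry point: parse the arguments, plug in (here: printing) collaborators, delegate. */
    public static void main(String[] args) {
        boolean ingested = run(
            Options.fromArgs(args),
            () -> System.out.println("database initialized"),
            () -> System.out.println("ingestion ran"));
        System.out.println(ingested ? "done" : "initialization only, ingestion skipped");
    }
}
```

A test can then pass recording fakes for Database and Ingestion and check that ingestion is skipped only when the flag is set, without launching a pipeline.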

vish-cs commented Apr 10, 2026

> This will allow us to provide a clean command for DCP customers to run Terraform apply, then trigger the ingestion pipeline with --initializeDatabaseOnly=true to then be in a clean state, and ready to import data.
>
> Note it is somewhat optional, since they dont need the tables created at that point, but it provides for a cleaner story to DCP clients that once you run this you're ready. They could immediately run it with their data as well, but if it's not ready to import, this provides a clean instruction to be done with the setup.

If the goal is simply to initialize, would it not be better to provide a script to do that rather than running a dataflow pipeline?

@vish-cs vish-cs left a comment

If the goal is simply to initialize, would it not be better to provide a script to do that rather than running a dataflow pipeline?

@gmechali (Contributor, Author) commented

> If the goal is simply to initialize, would it not be better to provide a script to do that rather than running a dataflow pipeline?

Yes, but we don't want to maintain duplicated code or a duplicate schema that can fall out of sync with what the ingestion pipeline needs. We could factor this out into a script and have the ingestion pipeline call it, or we could just rely on the ingestion pipeline, since it already does the initialization.
What do you think? Should we refactor now, or leave this as a future cleanup?
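
For illustration only, the "factor it out" option might be shaped like the sketch below: one class owns the schema, and both a standalone entry point and the ingestion pipeline call the same initialize method, so the two cannot drift apart. Everything here is an assumption: SchemaSketch, StatementRunner, and the two placeholder DDL strings are hypothetical, and the actual tables and database client are not described in this conversation.

```java
import java.util.List;

final class SchemaSketch {

    /** Hypothetical placeholder DDL; the real schema is not listed in this thread. */
    static final List<String> DDL = List.of(
        "CREATE TABLE IF NOT EXISTS example_nodes (id TEXT PRIMARY KEY)",
        "CREATE TABLE IF NOT EXISTS example_edges (src TEXT, dst TEXT)");

    /** Hypothetical seam: whatever executes one statement against the database. */
    interface StatementRunner {
        void execute(String statement);
    }

    /** Hands every statement to the runner in order and returns how many were handed over. */
    static int initialize(StatementRunner runner) {
        for (String statement : DDL) {
            runner.execute(statement);
        }
        return DDL.size();
    }

    /** Standalone entry point; here the runner only prints instead of touching a database. */
    public static void main(String[] args) {
        int count = initialize(statement -> System.out.println("would execute: " + statement));
        System.out.println(count + " statements processed");
    }
}
```

Under this shape, the ingestion pipeline would call SchemaSketch.initialize with a runner backed by its real database client, and a setup script would be little more than the main method above.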

@gmechali gmechali requested a review from vish-cs April 10, 2026 12:33
@gmechali gmechali merged commit 591b7df into datacommonsorg:master Apr 14, 2026
6 checks passed
@gmechali gmechali deleted the schemaMode branch April 14, 2026 17:31