Broke infra/dcp terraform script up into smaller modules#36

Open
dwnoble wants to merge 4 commits into datacommonsorg:main from dwnoble:terraform-modularize

Conversation

Contributor

@dwnoble dwnoble commented Apr 27, 2026

Refactored the infra/dcp terraform folders by flattening the directory structure and breaking down the cdc and dcp modules into a set of smaller sub-modules.

Changes

  • Flattening: Moved all sub-modules directly under infra/dcp/modules/.
  • stack: Orchestrates sub-modules based on feature toggles (modules/stack); see the sketch after this list.
  • cdc_data_ingestion_job: Ingestion Cloud Run v2 Job.
  • cdc_iam: IAM and Secret Manager config for CDC.
  • cdc_mysql: Cloud SQL MySQL instance and databases.
  • cdc_network: VPC and serverless access connectors.
  • cdc_redis: Memorystore Redis instance.
  • cdc_services: CDC Cloud Run v2 web services (modules/cdc_services).
  • cdc_storage: GCS buckets for I/O.
  • dcp_dataflow_ingestion: Dataflow and helper service for DCP.
  • dcp_ingestion_workflow: Cloud Workflows for orchestration.
  • dcp_service: DCP Cloud Run service.
  • dcp_storage: GCS bucket for DCP pipeline.
  • spanner: Shared Cloud Spanner instance and databases.
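
A minimal sketch of how the stack module wires these together (the toggle variable names enable_cdc / enable_dcp and the project_id input are illustrative, not necessarily the final interface):

# modules/stack/main.tf (sketch only)
module "cdc_services" {
  source = "../cdc_services"
  count  = var.enable_cdc ? 1 : 0

  project_id = var.project_id
  # ... remaining CDC inputs ...
}

module "dcp_service" {
  source = "../dcp_service"
  count  = var.enable_dcp ? 1 : 0

  project_id = var.project_id
  # ... remaining DCP inputs ...
}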

Tested in the datcom-website-dev project under the danmod namespace.

@dwnoble dwnoble requested a review from gmechali April 27, 2026 04:47
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the Data Commons Platform infrastructure into a modular architecture, separating the DCP and Custom Data Commons (CDC) stacks and introducing a central orchestration module. Feedback focuses on improving infrastructure stability and security, specifically recommending against the use of 'force_destroy' on data buckets, the ':latest' container tag, and authoritative IAM bindings. Additionally, the use of 'timestamp()' in environment variables was identified as a cause for unnecessary resource redeployments.

resource "google_storage_bucket" "data_bucket" {
name = local.gcs_data_bucket_name
location = var.gcs_data_bucket_location
force_destroy = true

security-high

Setting force_destroy = true on a data storage bucket is risky as it allows Terraform to delete the bucket even if it contains data. It is safer to set this to false or tie it to a deletion protection toggle to prevent accidental data loss.

  force_destroy = false
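
If a toggle is preferred instead, a minimal sketch (the variable name is an assumption, not part of this PR):

variable "enable_bucket_deletion_protection" {
  type    = bool
  default = true
}

resource "google_storage_bucket" "data_bucket" {
  name     = local.gcs_data_bucket_name
  location = var.gcs_data_bucket_location
  # Only allow deleting a non-empty bucket when protection is explicitly disabled.
  force_destroy = !var.enable_bucket_deletion_protection
}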


template {
  containers {
    image = "gcr.io/datcom-ci/datacommons-ingestion-helper:latest"

high

Using the :latest tag for container images can lead to non-deterministic deployments and makes it difficult to track which version of the code is currently running. It is recommended to use a specific version tag or an image digest for production-ready infrastructure.

      image = "gcr.io/datcom-ci/datacommons-ingestion-helper:stable"

Comment on lines +67 to 73
resource "google_cloud_run_service_iam_binding" "public_invoker" {
count = var.make_service_public ? 1 : 0
location = google_cloud_run_v2_service.dcp_service.location
service = google_cloud_run_v2_service.dcp_service.name
role = "roles/run.invoker"
members = ["allUsers"]
}

security-medium

Using google_cloud_run_service_iam_binding is authoritative for the given role and will overwrite any existing members of that role in the service's IAM policy. It is generally safer to use google_cloud_run_v2_service_iam_member (non-authoritative) to manage access. Additionally, per repository rules, public access to Cloud Run services must be controlled by the make_dc_web_service_public boolean variable.

resource "google_cloud_run_v2_service_iam_member" "public_invoker" {
  count    = var.make_dc_web_service_public ? 1 : 0
  location = google_cloud_run_v2_service.dcp_service.location
  name     = google_cloud_run_v2_service.dcp_service.name
  role     = "roles/run.invoker"
  member   = "allUsers"
}
References
  1. Public access to Cloud Run services should be controlled by a boolean variable (e.g., make_dc_web_service_public) that defaults to false to avoid unintentional public exposure.

},
{
  name  = "FORCE_RESTART"
  value = "${timestamp()}"

medium

Using timestamp() in an environment variable causes the resource to be updated on every terraform apply, even if no other changes are present. This leads to unnecessary redeployments of the Cloud Run services. Consider using a build ID or a specific version variable to trigger redeployments only when the image actually changes.
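
A sketch of that alternative, assuming a build identifier is threaded through as a variable (the name build_id is illustrative):

variable "build_id" {
  description = "Identifier of the build being deployed; a new value triggers a new revision."
  type        = string
}

# Then, in place of the timestamp()-based entry:
      {
        name  = "FORCE_RESTART"
        value = var.build_id
      },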

Contributor

@gmechali gmechali left a comment


Left a couple of comments - thanks for the modularization, this is such an improvement!

location_id             = var.redis_location_id
alternative_location_id = var.redis_alternative_location_id
redis_version           = "REDIS_6_X"
display_name            = "Data Commons Redis Instance"

Should we use the name prefix here?

@@ -0,0 +1,107 @@
locals {

Super nit: can this module be dcp_ingestion_dataflow, to keep the name parallel with dcp_ingestion_workflow?

resource "google_service_account" "dcp_ingestion_runner" {
count = var.deploy ? 1 : 0
account_id = "${local.name_prefix}dcp-ingestion-sa"
display_name = "Data Commons Platform Ingestion Runner"

nit: Add name prefix?

member = "serviceAccount:${google_service_account.dcp_ingestion_runner[0].email}"
}

resource "google_cloud_run_v2_service" "ingestion_helper" {

I'm curious what you think here about further modularization.

Should we have an ingestion module, with submodules for:

  • Dataflow
  • Ingestion helper
  • Workflow

Right now you have 2 modules:

  • Dataflow & Ingestion helper
  • Workflow
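
For illustration, the nesting I have in mind would look roughly like this (paths and inputs are just a sketch):

# modules/dcp_ingestion/main.tf (sketch only)
module "dataflow" {
  source = "./dataflow"
  # Dataflow pipeline inputs
}

module "ingestion_helper" {
  source = "./ingestion_helper"
  # ingestion helper Cloud Run service inputs
}

module "workflow" {
  source = "./workflow"
  # Cloud Workflows inputs
}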

@@ -0,0 +1,3 @@
output "ingestion_orchestrator_id" {

Should the ingestion helper URL also be an output here?
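
Something like this, assuming the outputs file lives next to the ingestion_helper resource (just a sketch):

output "ingestion_helper_url" {
  description = "URL of the ingestion helper Cloud Run service."
  # If the resource is gated behind a count toggle, index it with [0] / one().
  value       = google_cloud_run_v2_service.ingestion_helper.uri
}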

@@ -1,3 +1,18 @@
locals {

Fine to leave as is, but do you think we should leave this with the other terraforms for now, given this DCP runner won't be in the first deliverables?

Fine to leave in this PR, but we may want to move it out later to give customers only things that are ready.

@@ -0,0 +1,11 @@
locals {

Should we have separate DCP storage and CDC storage modules? This feels redundant; we could just use the same one, right?

@@ -0,0 +1,283 @@
data "google_compute_network" "default" {

I'm trying to wrap my head around what this stack module represents, and I think what I'm seeing is that the stack is the control for how we recommend setting things up, depending on whether it's a CDC or DCP deployment.

I'm trying to think through whether there's a cleaner way to put the modules together, but I don't have an answer. Would love to chat a bit more about this specific part!
