Broke infra/dcp terraform script up into smaller modules #36
dwnoble wants to merge 4 commits into datacommonsorg:main
Conversation
Code Review
This pull request refactors the Data Commons Platform infrastructure into a modular architecture, separating the DCP and Custom Data Commons (CDC) stacks and introducing a central orchestration module. Feedback focuses on improving infrastructure stability and security, specifically recommending against the use of 'force_destroy' on data buckets, the ':latest' container tag, and authoritative IAM bindings. Additionally, the use of 'timestamp()' in environment variables was identified as a cause for unnecessary resource redeployments.
resource "google_storage_bucket" "data_bucket" {
  name          = local.gcs_data_bucket_name
  location      = var.gcs_data_bucket_location
  force_destroy = true
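Per the review summary, `force_destroy = true` on a data bucket lets `terraform destroy` delete a bucket that still contains data. A hedged sketch of the safer default:

```hcl
resource "google_storage_bucket" "data_bucket" {
  name     = local.gcs_data_bucket_name
  location = var.gcs_data_bucket_location

  # Keep force_destroy off (the provider default) so Terraform refuses
  # to destroy a bucket that still contains objects.
  force_destroy = false
}
```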
template {
  containers {
    image = "gcr.io/datcom-ci/datacommons-ingestion-helper:latest"
Using the :latest tag for container images can lead to non-deterministic deployments and makes it difficult to track which version of the code is currently running. It is recommended to use a specific version tag or an image digest for production-ready infrastructure.
image = "gcr.io/datcom-ci/datacommons-ingestion-helper:stable"
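As the comment notes, an image digest is the strongest form of pinning; a hypothetical sketch (the `<digest>` value is a placeholder, not a real digest):

```hcl
containers {
  # Pinning by digest makes the deployed image contents immutable,
  # unlike a mutable tag. The sha256 below is illustrative only.
  image = "gcr.io/datcom-ci/datacommons-ingestion-helper@sha256:<digest>"
}
```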
resource "google_cloud_run_service_iam_binding" "public_invoker" {
  count    = var.make_service_public ? 1 : 0
  location = google_cloud_run_v2_service.dcp_service.location
  service  = google_cloud_run_v2_service.dcp_service.name
  role     = "roles/run.invoker"
  members  = ["allUsers"]
}
Using google_cloud_run_service_iam_binding is authoritative and will overwrite any existing IAM policy on the service. It is generally safer to use google_cloud_run_v2_service_iam_member (non-authoritative) to manage access. Additionally, per repository rules, public access to Cloud Run services must be controlled by the make_dc_web_service_public boolean variable.
resource "google_cloud_run_v2_service_iam_member" "public_invoker" {
count = var.make_dc_web_service_public ? 1 : 0
location = google_cloud_run_v2_service.dcp_service.location
name = google_cloud_run_v2_service.dcp_service.name
role = "roles/run.invoker"
member = "allUsers"
}
References
- Public access to Cloud Run services should be controlled by a boolean variable (e.g., make_dc_web_service_public) that defaults to false to avoid unintentional public exposure.
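The repository rule above implies a variable declaration along these lines (a sketch; the description text is assumed):

```hcl
variable "make_dc_web_service_public" {
  description = "If true, grants roles/run.invoker to allUsers on the web service."
  type        = bool
  default     = false # private unless explicitly opted in
}
```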
},
{
  name  = "FORCE_RESTART"
  value = "${timestamp()}"
Using timestamp() in an environment variable causes the resource to be updated on every terraform apply, even if no other changes are present. This leads to unnecessary redeployments of the Cloud Run services. Consider using a build ID or a specific version variable to trigger redeployments only when the image actually changes.
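One way to follow that suggestion, sketched with an assumed `image_build_id` variable that is bumped only when a new image is built:

```hcl
{
  name = "FORCE_RESTART"
  # Changes only on a new build, so `terraform apply` is a no-op
  # when nothing else has changed.
  value = var.image_build_id
}
```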
gmechali left a comment:
Left a couple comments - thanks for the modularization this is such an improvement!
location_id             = var.redis_location_id
alternative_location_id = var.redis_alternative_location_id
redis_version           = "REDIS_6_X"
display_name            = "Data Commons Redis Instance"
should we use the name prefix here?
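If so, a minimal sketch, assuming the module defines `local.name_prefix` as the other modules in this PR do:

```hcl
# Prefix the display name the same way resource names are prefixed.
display_name = "${local.name_prefix}Data Commons Redis Instance"
```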
@@ -0,0 +1,107 @@
locals {
super nit: can this module be dcp_ingestion_dataflow, to keep the name parallel with dcp_ingestion_workflow?
resource "google_service_account" "dcp_ingestion_runner" {
  count        = var.deploy ? 1 : 0
  account_id   = "${local.name_prefix}dcp-ingestion-sa"
  display_name = "Data Commons Platform Ingestion Runner"

  member = "serviceAccount:${google_service_account.dcp_ingestion_runner[0].email}"
}

resource "google_cloud_run_v2_service" "ingestion_helper" {
I'm curious what you think here about further modularization. Should we have an ingestion module, with submodules for:
- Dataflow
- Ingestion helper
- Workflow

Right now you have 2 modules:
- Dataflow & Ingestion helper
- Workflow
@@ -0,0 +1,3 @@
output "ingestion_orchestrator_id" {
Should the ingestion helper URL also be an output here?
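If so, a sketch of the extra output, assuming the helper service resource is named `ingestion_helper` as in the diff above:

```hcl
output "ingestion_helper_url" {
  # uri is the generated URL of the deployed Cloud Run v2 service.
  value = google_cloud_run_v2_service.ingestion_helper.uri
}
```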
@@ -1,3 +1,18 @@
locals {
Fine to leave as is, but do you think we should keep this with the other terraform configs for now, given this DCP runner won't be in the first deliverables?
Fine to leave in this PR, but we may want to move it out later so customers only get things that are ready.
@@ -0,0 +1,11 @@
locals {
Should we have separate DCP storage and CDC storage modules? This feels redundant; we could just use the same one, right?
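A single shared module could be instantiated twice; a hypothetical sketch (module path and variable names are assumed, not from this PR):

```hcl
# One reusable storage module, parameterized per stack.
module "cdc_storage" {
  source      = "./modules/storage"
  name_prefix = "${local.name_prefix}cdc-"
}

module "dcp_storage" {
  source      = "./modules/storage"
  name_prefix = "${local.name_prefix}dcp-"
}
```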
@@ -0,0 +1,283 @@
data "google_compute_network" "default" {
I'm trying to wrap my head around what this stack module represents, and I think what I'm seeing is that the stack is the control layer for how we recommend setting things up, for either the CDC or DCP configuration.
I'm trying to think through whether there's a cleaner way to put the modules together, but I don't have an answer. Would love to chat a bit more about this specific part!
Refactored the infra/dcp terraform folders by flattening the directory structure and breaking down the cdc and dcp modules into a set of smaller sub-modules.

Changes

infra/dcp/modules/:
- stack: Orchestrates sub-modules based on feature toggles (modules/stack).
- cdc_data_ingestion_job: Ingestion Cloud Run v2 Job.
- cdc_iam: IAM and Secret Manager config for CDC.
- cdc_mysql: Cloud SQL MySQL instance and databases.
- cdc_network: VPC and serverless access connectors.
- cdc_redis: Memorystore Redis instance.
- cdc_services: CDC Cloud Run v2 web services (modules/cdc_services).
- cdc_storage: GCS buckets for I/O.
- dcp_dataflow_ingestion: Dataflow and helper service for DCP.
- dcp_ingestion_workflow: Cloud Workflows for orchestration.
- dcp_service: DCP Cloud Run service.
- dcp_storage: GCS bucket for DCP pipeline.
- spanner: Shared Cloud Spanner instance and databases.

Tested in the datcom-website-dev project under the danmod namespace.