BigQuery / Google Analytics Integration

Project Reference

| Key | Value |
| --- | --- |
| New GCP Project ID | mfb-data |
| New GCP Project Number | 38721872277 |
| Old GCP Project ID | benefits-mfb (owned by Brian, Gary team) |
| GA4 Dataset | analytics_335669714 |
| GitHub Repo | MyFriendBen/data-queries |
| WIF Pool | github-actions |
| WIF Provider | github |
| SA (GitHub Actions/dbt) | github-actions-dbt@mfb-data.iam.gserviceaccount.com |
| SA (Metabase) | metabase-bigquery@mfb-data.iam.gserviceaccount.com |

Overview

The MyFriendBen analytics pipeline needs Google Analytics (GA4) data from BigQuery to populate the "Google Analytics" tab on each tenant dashboard. The dbt models and Terraform resources already exist but are disabled due to a GCP authentication blocker.

Decided Architecture

GitHub Actions (dbt + Terraform)  →  Workload Identity Federation (OIDC, no key)
Metabase on Heroku                →  Service account key (org admin exception)
  • GitHub Actions authenticates to BigQuery via Workload Identity Federation — no service account keys needed. The google-github-actions/auth@v2 action handles OIDC token exchange.
  • Metabase connects directly to BigQuery using a service account JSON key. The new GCP project (mfb-data) has no org policy blocking key creation, so this is straightforward.

What Already Exists

dbt Models (ready, untested against real data)

Three BigQuery dbt models in dbt/models/bigquery/:

| Model | Type | Purpose |
| --- | --- | --- |
| stg_ga_page_views.sql | staging view | Extracts event_params from the GA4 events_* wildcard table (page_location, ga_session_id) |
| int_ga4_page_views.sql | intermediate view | Derives page_path, page_hostname, state_code from the page_location URL |
| mart_screener_conversion_funnel.sql | mart table | Daily conversion metrics per state: sessions started/completed/converted, user metrics, conversion rates |

Source config (dbt/models/bigquery/sources.yml):

  • Source: google_analytics database, reads from events_* wildcard table
  • Requires env vars: GCP_PROJECT_ID, GCP_ANALYTICS_TABLE (the BigQuery dataset name)
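
To illustrate how those two env vars are wired in, a dbt source config of this shape would match the description above (a sketch only — the actual dbt/models/bigquery/sources.yml may differ in details such as the table alias):

```yaml
# Hypothetical sketch of dbt/models/bigquery/sources.yml.
# database = GCP project, schema = GA4 dataset, identifier = date-sharded wildcard.
version: 2

sources:
  - name: google_analytics
    database: "{{ env_var('GCP_PROJECT_ID') }}"
    schema: "{{ env_var('GCP_ANALYTICS_TABLE') }}"
    tables:
      - name: events
        identifier: "events_*"   # GA4 daily shards, queried via BigQuery wildcard
```

Models then reference it as `{{ source('google_analytics', 'events') }}`, which dbt resolves to the wildcard table in the configured project and dataset.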

Terraform Resources (conditional, disabled)

In dashboards/metabase.tf:

  • metabase_database.bigquery — BigQuery data source in Metabase (count = var.bigquery_enabled ? 1 : 0)
  • metabase_card.conversion_funnel — Card querying mart_screener_conversion_funnel (conditional)
  • Dashboard widget on global analytics dashboard (conditional)

In dashboards/variables.tf:

  • bigquery_enabled (bool, default false) — master toggle
  • bigquery_service_account_key_content (string, sensitive) — for CI/production
  • bigquery_service_account_key_path (string) — for local dev (./secrets/bigquerykey.json)
  • gcp_project_id (string) — GCP project ID

dbt Profile (dbt/profiles.yml.example)

BigQuery target already configured:

bigquery:
  type: bigquery
  method: oauth
  project: "{{ env_var('GCP_PROJECT_ID') }}"
  dataset: analytics
  threads: 4
  timeout_seconds: 300
  location: US

Uses method: oauth so dbt-bigquery calls google.auth.default(), which reads the WIF credential file via GOOGLE_APPLICATION_CREDENTIALS (set automatically by google-github-actions/auth@v2).

GitHub Actions (dbt-nightly.yml)

Runs --target postgres and conditionally --target bigquery (gated on BIGQUERY_ENABLED).

Authentication

Two systems need BigQuery access, each using a different auth method.

System 1: GitHub Actions (dbt + Terraform)

Auth method: Workload Identity Federation (no key needed)

GitHub Actions supports OIDC tokens natively. GCP's Workload Identity Federation can exchange a GitHub OIDC token for short-lived GCP credentials — no service account key file required.

Steps to implement:

  1. Create a GCP Service Account (this is allowed — only key creation is blocked):

    gcloud iam service-accounts create github-actions-dbt \
      --display-name="GitHub Actions dbt" \
      --project=mfb-data
  2. Grant BigQuery roles to the service account:

    # For dbt (read raw data + write to analytics dataset)
    gcloud projects add-iam-policy-binding mfb-data \
      --member="serviceAccount:github-actions-dbt@mfb-data.iam.gserviceaccount.com" \
      --role="roles/bigquery.dataEditor"
    
    gcloud projects add-iam-policy-binding mfb-data \
      --member="serviceAccount:github-actions-dbt@mfb-data.iam.gserviceaccount.com" \
      --role="roles/bigquery.jobUser"
  3. Create a Workload Identity Pool + Provider:

    # Create pool
    gcloud iam workload-identity-pools create "github-actions" \
      --location="global" \
      --display-name="GitHub Actions" \
      --project=mfb-data
    
    # Create OIDC provider for GitHub (--attribute-condition is required by GCP)
    gcloud iam workload-identity-pools providers create-oidc "github" \
      --location="global" \
      --workload-identity-pool="github-actions" \
      --display-name="GitHub" \
      --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
      --attribute-condition="assertion.repository == 'MyFriendBen/data-queries'" \
      --issuer-uri="https://token.actions.githubusercontent.com" \
      --project=mfb-data
  4. Enable the IAM Service Account Credentials API (required for WIF token exchange):

    gcloud services enable iamcredentials.googleapis.com --project=mfb-data
  5. Allow the GitHub repo to impersonate the service account:

    gcloud iam service-accounts add-iam-policy-binding \
      github-actions-dbt@mfb-data.iam.gserviceaccount.com \
      --role="roles/iam.workloadIdentityUser" \
      --member="principalSet://iam.googleapis.com/projects/38721872277/locations/global/workloadIdentityPools/github-actions/attribute.repository/MyFriendBen/data-queries" \
      --project=mfb-data
  6. Update GitHub Actions workflows to use google-github-actions/auth@v2:

    - uses: google-github-actions/auth@v2
      with:
        workload_identity_provider: "projects/38721872277/locations/global/workloadIdentityPools/github-actions/providers/github"
        service_account: "github-actions-dbt@mfb-data.iam.gserviceaccount.com"
    
    # This sets GOOGLE_APPLICATION_CREDENTIALS automatically
  7. Update dbt-nightly.yml to add BigQuery target:

    - name: Run dbt build (BigQuery)
      if: vars.BIGQUERY_ENABLED == 'true'
      run: dbt build --target bigquery
      env:
        GCP_PROJECT_ID: ${{ vars.GCP_PROJECT_ID }}
        GCP_ANALYTICS_TABLE: ${{ vars.GCP_ANALYTICS_TABLE }}
  8. Add GitHub Environment variables:

    • BIGQUERY_ENABLED = true
    • GCP_PROJECT_ID = your GCP project ID
    • GCP_ANALYTICS_TABLE = your GA4 BigQuery dataset name (e.g., analytics_335669714)
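
The long provider string in step 6 is just the pool/provider resource path composed from the project number; a quick shell check makes typos easy to catch before they surface as opaque auth failures:

```shell
# Compose the Workload Identity provider resource name from its parts and
# echo it for comparison against the value used in the workflow.
PROJECT_NUMBER="38721872277"
POOL_ID="github-actions"
PROVIDER_ID="github"

WIF_PROVIDER="projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${POOL_ID}/providers/${PROVIDER_ID}"
echo "${WIF_PROVIDER}"
# → projects/38721872277/locations/global/workloadIdentityPools/github-actions/providers/github
```

Note the path uses the project *number* (38721872277), not the project ID (mfb-data) — a common source of confusion when filling in the auth step.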

System 2: Metabase on Heroku (runtime BigQuery access)

Auth method: Service account key

Metabase's BigQuery driver expects a service account JSON key. The mfb-data project inherits the iam.disableServiceAccountKeyCreation org policy from the myfriendben.org organization. To create a key, a project-level exception must be set (requires roles/orgpolicy.policyAdmin at the org level). After key creation, the exception should be removed so org-level enforcement resumes.

Steps:

  1. Create the service account:

    gcloud iam service-accounts create metabase-bigquery \
      --display-name="Metabase BigQuery Reader" \
      --project=mfb-data
  2. Grant read-only BigQuery access:

    gcloud projects add-iam-policy-binding mfb-data \
      --member="serviceAccount:metabase-bigquery@mfb-data.iam.gserviceaccount.com" \
      --role="roles/bigquery.dataViewer"
    
    gcloud projects add-iam-policy-binding mfb-data \
      --member="serviceAccount:metabase-bigquery@mfb-data.iam.gserviceaccount.com" \
      --role="roles/bigquery.jobUser"
  3. Temporarily disable the org policy to allow key creation (requires roles/orgpolicy.policyAdmin on org 1001672396356):

    # Create a policy override file
    cat > /tmp/org-policy-override.yaml <<'EOF'
    name: projects/mfb-data/policies/iam.disableServiceAccountKeyCreation
    spec:
      rules:
      - enforce: false
    EOF
    
    # Apply the override
    gcloud org-policies set-policy /tmp/org-policy-override.yaml --project=mfb-data
    
    # Wait ~60 seconds for propagation, then create the key
    gcloud iam service-accounts keys create metabase-bigquery-key.json \
      --iam-account=metabase-bigquery@mfb-data.iam.gserviceaccount.com
  4. Re-enable the org policy after key creation:

    gcloud org-policies delete iam.disableServiceAccountKeyCreation --project=mfb-data

    This removes the project-level override so the org-level enforcement is inherited again.

  5. Store the key content as the BIGQUERY_SA_KEY GitHub Environment secret (Terraform passes it to Metabase via the API)

Key rotation: Service account keys should be rotated periodically. Repeat steps 3-5: temporarily disable the org policy, create a new key, update the GitHub secret, run terraform apply, delete the old key, re-enable the org policy.
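
Before storing a freshly created key as the secret, it is worth verifying the file is valid JSON and belongs to the expected service account. A hypothetical helper (assumes python3 is available; the `check_key` name and the dummy file are illustrative only):

```shell
# check_key FILE EXPECTED_EMAIL — fail unless the key's client_email matches.
check_key() {
  local got
  got=$(python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["client_email"])' "$1") || return 1
  if [ "$got" = "$2" ]; then
    echo "key OK: $got"
  else
    echo "unexpected client_email: $got" >&2
    return 1
  fi
}

# Demo against a dummy file; the real check would target metabase-bigquery-key.json.
printf '%s\n' '{"type": "service_account", "client_email": "metabase-bigquery@mfb-data.iam.gserviceaccount.com"}' > /tmp/dummy-key.json
check_key /tmp/dummy-key.json "metabase-bigquery@mfb-data.iam.gserviceaccount.com"
# → key OK: metabase-bigquery@mfb-data.iam.gserviceaccount.com
```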

Environment Variables Summary

When BigQuery is enabled, the following variables and secrets are required. All of them are set per GitHub environment (staging and production); the intended values are identical for both environments, but production has not been configured yet.

GitHub Environment Variables (non-sensitive)

| Variable | Value | Staging | Production |
| --- | --- | --- | --- |
| BIGQUERY_ENABLED | true | Set | Not set |
| GCP_PROJECT_ID | mfb-data | Set | Not set |
| GCP_ANALYTICS_TABLE | analytics_335669714 | Set | Not set |
| WIF_PROVIDER | projects/38721872277/locations/global/workloadIdentityPools/github-actions/providers/github | Set | Not set |
| WIF_SERVICE_ACCOUNT | github-actions-dbt@mfb-data.iam.gserviceaccount.com | Set | Not set |

GitHub Environment Secrets

| Secret | Value | Staging | Production |
| --- | --- | --- | --- |
| BIGQUERY_SA_KEY | Full JSON content of metabase-bigquery-key.json | Set | Not set |

Workload Identity Federation (GitHub Actions — no secrets needed)

The google-github-actions/auth@v2 action exchanges a GitHub OIDC token for short-lived GCP credentials. It reads WIF_PROVIDER and WIF_SERVICE_ACCOUNT from GitHub Environment variables and automatically sets GOOGLE_APPLICATION_CREDENTIALS for subsequent steps (dbt, Terraform).
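
In the workflow, that step would look roughly like the following (a sketch — the surrounding step names in dbt-nightly.yml may differ):

```yaml
- uses: google-github-actions/auth@v2
  with:
    workload_identity_provider: ${{ vars.WIF_PROVIDER }}
    service_account: ${{ vars.WIF_SERVICE_ACCOUNT }}
# Subsequent steps (dbt, terraform) inherit GOOGLE_APPLICATION_CREDENTIALS.
```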

Setup commands for production

Once staging is validated, run these to set up production (same values):

# Variables
gh variable set BIGQUERY_ENABLED --env production --repo MyFriendBen/data-queries --body "true"
gh variable set GCP_PROJECT_ID --env production --repo MyFriendBen/data-queries --body "mfb-data"
gh variable set GCP_ANALYTICS_TABLE --env production --repo MyFriendBen/data-queries --body "analytics_335669714"
gh variable set WIF_PROVIDER --env production --repo MyFriendBen/data-queries --body "projects/38721872277/locations/global/workloadIdentityPools/github-actions/providers/github"
gh variable set WIF_SERVICE_ACCOUNT --env production --repo MyFriendBen/data-queries --body "github-actions-dbt@mfb-data.iam.gserviceaccount.com"

# Secret (same key file used for both environments)
gh secret set BIGQUERY_SA_KEY --env production --repo MyFriendBen/data-queries < metabase-bigquery-key.json

Note: Production also needs the non-BigQuery variables (DATABASE_NAME, METABASE_URL, METABASE_ADMIN_EMAIL) and secrets (DATABASE_HOST, GLOBAL_DB_USER, GLOBAL_DB_PASS, METABASE_ADMIN_PASSWORD, tenant DB credentials) that staging already has. These should be set with production-specific values before running the production workflow.

Prerequisite: Migrate GA4 Data to New GCP Project

Status: Complete — 60 tables copied (2026-01-18 to 2026-03-18)

The GA4 BigQuery export lived in a previous organization's GCP project. We had read access to the data, but needed to move it into our own GCP project before we could set up Workload Identity, dbt, or the Metabase integration.

Projects

| Project | Name | Project ID | Owner |
| --- | --- | --- | --- |
| Old (source) | Benefits - MFB | benefits-mfb | Brian (Gary team) |
| New (target) | mfb-data | mfb-data | MFB team |

Known issue: Historical data loss in benefits-mfb

The benefits-mfb project is on the BigQuery free sandbox, which enforces a 60-day table expiration. Tables older than 60 days are automatically and permanently deleted. As of March 2026, only ~60 days of events_* tables remain (back to approx. January 2026), despite the GA4 BigQuery link being active for 1-2 years.
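
The 60-day window is easy to sanity-check with date arithmetic (GNU date assumed). Counting the current day inclusively, the oldest surviving daily shard sits 59 days back:

```shell
# Oldest surviving GA4 shard under a 60-day TTL as of 2026-03-18:
# 59 days back (inclusive counting) gives 60 daily tables in total.
OLDEST=$(date -u -d "2026-03-18 - 59 days" +events_%Y%m%d)
echo "$OLDEST"
# → events_20260118
```

This matches the copied range recorded in the migration status (2026-01-18 to 2026-03-18, 60 tables).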

Implications:

  • Historical data beyond 60 days is gone from BigQuery and cannot be recovered by upgrading to a paid plan
  • Looker Studio dashboards showing older data are pulling from the GA4 reporting API (GA4's internal storage), not BigQuery
  • The GA4 reporting API retains data for up to 14 months, but provides aggregated metrics — not the raw event-level data that BigQuery exports
  • One table expires every day, so copying the remaining data is time-sensitive

Action items:

  • Flag to Brian (Gary team) that benefits-mfb is losing data due to sandbox expiration
  • Verify mfb-data has billing enabled so the same expiration does not apply after migration
  • After cutover (Phase 3), decide as a team whether to backfill older data from the GA4 reporting API. This would provide aggregated metrics (sessions, conversions, page views by date) but not raw event-level data, and may not fit the existing dbt models without modification

Decisions (resolved)

  1. GA4 property ownership — We have admin access to the GA4 property and can manage BigQuery links.
  2. Migration strategy — Build-first, then cutover. GA4 only allows one BigQuery link at a time, so we cannot dual-link. Instead: copy historical data to mfb-data, build and validate dashboards against the copy, then switch the GA4 link and do a final catch-up copy to fill the gap. Old dashboards (owned by Brian/Gary team) continue working until the cutover.
  3. What moves — All available historical data (~60 days due to sandbox expiration), plus the GA4 export link once dashboards are ready.

Migration steps

Step 1: Create the dataset in mfb-data

bq mk --dataset --location=US mfb-data:analytics_335669714

Step 2: Copy historical data from old project to mfb-data

Use the copy script (scripts/ga4-migration/copy_ga4_tables.sh), which copies each date-sharded events_* table individually and logs completed tables to a manifest file (scripts/ga4-migration/ga4_copy_manifest.log).

# Preview what will be copied
./scripts/ga4-migration/copy_ga4_tables.sh --dry-run

# Run the copy
./scripts/ga4-migration/copy_ga4_tables.sh

The script is resumable — if interrupted, re-run it and it will skip tables already listed in the manifest. The manifest also serves as a record of what was copied, and is the resume point for the catch-up copy (Step 6).
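
The resumability hinges on a manifest-based skip. A simplified sketch of that logic (hypothetical shape with a stubbed copy step standing in for `bq cp` — see scripts/ga4-migration/copy_ga4_tables.sh for the real implementation):

```shell
# Manifest-based idempotent copy: a table is copied at most once, and a
# re-run after interruption picks up where the previous run stopped.
MANIFEST=$(mktemp)
copy_table() { echo "copying $1"; }   # stand-in for: bq cp SRC.$1 DST.$1

copy_if_needed() {
  grep -qx "$1" "$MANIFEST" && return 0          # already in manifest — skip
  copy_table "$1" && echo "$1" >> "$MANIFEST"    # record only after success
}

copy_if_needed events_20260118   # first run: copies and records
copy_if_needed events_20260118   # re-run: skipped, nothing printed
```

Recording a table only after a successful copy is what makes interruption safe: a table that failed mid-copy is retried on the next run.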

Step 3: Verify the copy

  • Compare row counts between old and new datasets for a sample of date-sharded tables
  • Validate dbt models run successfully against the copied data in mfb-data

Step 4: Build and validate dashboards

  • Set up WIF, dbt, Metabase (Phases 1-2 below)
  • Build the GA tab with conversion funnel charts
  • Validate everything works against the copied historical data

Step 5: Cutover the GA4 link

Once dashboards are validated and ready for production:

  1. Note the current date — this marks the start of the gap period
  2. In GA4 Admin > Product Links > BigQuery Links, remove the link to benefits-mfb
  3. Add a new link pointing to mfb-data with the same export settings (daily/streaming, events)
  4. New events now flow to mfb-data

Step 6: Catch-up copy

Re-run the same copy script to pick up any new tables created since the initial copy:

./scripts/ga4-migration/copy_ga4_tables.sh

The script skips tables already in the manifest, so it will only copy the gap period tables.

Step 7: Coordinate with Brian (Gary team)

  • Notify Brian that the old export has stopped and old dashboards will no longer receive new data
  • Old data in benefits-mfb remains accessible until the project owner deletes it

Permissions required

In the old project (Benefits - MFB), the user running the copy needs:

  • bigquery.tables.getData on the source dataset (BigQuery Data Viewer role)
  • bigquery.jobs.create in the old project (BigQuery Job User role)

In the new project (mfb-data):

  • bigquery.datasets.create (to create the target dataset, if not auto-created by GA4 link)
  • bigquery.tables.create + bigquery.tables.updateData (BigQuery Data Editor role)
  • bigquery.jobs.create (BigQuery Job User role)

Implementation Order

Phase 1: GCP Setup (manual — no code changes)

Steps 1, 2, and 3 can be done in parallel. Copy data ASAP — the sandbox is deleting one table per day.

  1. Verify mfb-data has billing enabled — GCP Console > Billing. Required to avoid the same 60-day sandbox expiration.
  2. Create dataset + copy historical data to mfb-data — 60 tables copied (2026-01-18 to 2026-03-18). See scripts/ga4-migration/ga4_copy_manifest.log.
  3. Set up Workload Identity Federation — Pool github-actions, provider github with attribute condition restricting to MyFriendBen/data-queries. SA github-actions-dbt bound with workloadIdentityUser.
  4. Create Metabase service account + key — SA metabase-bigquery (pre-existing with correct roles). Key created after temporarily overriding the org policy. Remember to re-enable the org policy (System 2, step 4 above).
  5. Add GitHub Environment variables and secrets — Set for staging. See Environment Variables Summary. Production pending.

Phase 2: Code Changes + Dashboard Build (against copied historical data)

  1. Update dbt-nightly.yml — add OIDC auth step + BigQuery target (conditional on BIGQUERY_ENABLED). Merged in PR #46.
  2. Test dbt BigQuery build — triggered workflow on staging, mart_screener_conversion_funnel and referrer_codes created in BigQuery.
  3. Enable bigquery_enabled = true in Terraform — set the GitHub Environment variable, ran terraform apply on staging. Created BigQuery data source, conversion funnel card, and dashboard widget in Metabase. Note: first apply fails because Metabase needs time to sync new BigQuery tables; second apply succeeds (see comment in metabase.tf).
  4. Add GA tab cards to tenant dashboards — build conversion funnel charts for the "Google Analytics" tab

Phase 3: Cutover (after new dashboards are validated)

  1. Switch the GA4 BigQuery link from benefits-mfb to mfb-data — migration step 5
  2. Run catch-up copy for the gap period — migration step 6
  3. Notify Brian (Gary team) that old export has stopped — migration step 7
  4. Team decision: backfill older historical data? — Discuss whether to pull aggregated metrics from the GA4 reporting API for the period lost to sandbox expiration (see known issue)