| Key | Value |
|---|---|
| New GCP Project ID | mfb-data |
| New GCP Project Number | 38721872277 |
| Old GCP Project ID | benefits-mfb (owned by Brian, Gary team) |
| GA4 Dataset | analytics_335669714 |
| GitHub Repo | MyFriendBen/data-queries |
| WIF Pool | github-actions |
| WIF Provider | github |
| SA (GitHub Actions/dbt) | github-actions-dbt@mfb-data.iam.gserviceaccount.com |
| SA (Metabase) | metabase-bigquery@mfb-data.iam.gserviceaccount.com |
The MyFriendBen analytics pipeline needs Google Analytics (GA4) data from BigQuery to populate the "Google Analytics" tab on each tenant dashboard. The dbt models and Terraform resources already exist but are disabled due to a GCP authentication blocker.
- GitHub Actions (dbt + Terraform) → Workload Identity Federation (OIDC, no key)
- Metabase on Heroku → Service account key (org admin exception)
- GitHub Actions authenticates to BigQuery via Workload Identity Federation — no service account keys needed. The `google-github-actions/auth@v2` action handles the OIDC token exchange.
- Metabase connects directly to BigQuery using a service account JSON key. The new GCP project (`mfb-data`) inherits an org policy that blocks key creation, so a temporary project-level exception is required (see the Metabase section below).
Three BigQuery dbt models in `dbt/models/bigquery/`:
| Model | Type | Purpose |
|---|---|---|
| `stg_ga_page_views.sql` | staging view | Extracts event_params from the GA4 `events_*` wildcard table (page_location, ga_session_id) |
| `int_ga4_page_views.sql` | intermediate view | Derives page_path, page_hostname, state_code from the page_location URL |
| `mart_screener_conversion_funnel.sql` | mart table | Daily conversion metrics per state: sessions started/completed/converted, user metrics, conversion rates |
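As a rough illustration of the parsing `int_ga4_page_views.sql` performs: the model itself uses BigQuery SQL, but the same logic can be mirrored in bash. The example URL below is hypothetical.

```shell
# Hypothetical GA4 page_location value; real values come from event_params.
url="https://co.myfriendben.org/co/results?test=true"

no_scheme="${url#*://}"            # strip the "https://" scheme
page_hostname="${no_scheme%%/*}"   # everything before the first "/"
path_and_query="/${no_scheme#*/}"  # everything from the first "/" onward
page_path="${path_and_query%%\?*}" # drop the query string

# First path segment as the state code, e.g. "co"
state_code="${page_path#/}"
state_code="${state_code%%/*}"

echo "$page_hostname $page_path $state_code"
```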
Source config (`dbt/models/bigquery/sources.yml`):

- Source: `google_analytics` database, reads from the `events_*` wildcard table
- Requires env vars: `GCP_PROJECT_ID`, `GCP_ANALYTICS_TABLE` (the BigQuery dataset name)
In `dashboards/metabase.tf`:

- `metabase_database.bigquery` — BigQuery data source in Metabase (`count = var.bigquery_enabled ? 1 : 0`)
- `metabase_card.conversion_funnel` — Card querying `mart_screener_conversion_funnel` (conditional)
- Dashboard widget on the global analytics dashboard (conditional)
In `dashboards/variables.tf`:

- `bigquery_enabled` (bool, default `false`) — master toggle
- `bigquery_service_account_key_content` (string, sensitive) — for CI/production
- `bigquery_service_account_key_path` (string) — for local dev (`./secrets/bigquerykey.json`)
- `gcp_project_id` (string) — GCP project ID
BigQuery target already configured in `profiles.yml`:

```yaml
bigquery:
  type: bigquery
  method: oauth
  project: "{{ env_var('GCP_PROJECT_ID') }}"
  dataset: analytics
  threads: 4
  timeout_seconds: 300
  location: US
```

Uses `method: oauth` so dbt-bigquery calls `google.auth.default()`, which reads the WIF credential file via `GOOGLE_APPLICATION_CREDENTIALS` (set automatically by `google-github-actions/auth@v2`).
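For local testing outside CI, where no WIF credential file exists, `method: oauth` falls back to your own Application Default Credentials. A hedged local smoke test, assuming gcloud is installed and you have access to the project:

```shell
# One-time: create local Application Default Credentials
gcloud auth application-default login

# dbt then authenticates the same way CI does; the env var values below
# match the ones used elsewhere in this doc
GCP_PROJECT_ID=mfb-data GCP_ANALYTICS_TABLE=analytics_335669714 \
  dbt debug --target bigquery
```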
The nightly workflow runs `--target postgres` and conditionally `--target bigquery` (gated on `BIGQUERY_ENABLED`).
Two systems need BigQuery access, each using a different auth method.
Auth method: Workload Identity Federation (no key needed)
GitHub Actions supports OIDC tokens natively. GCP's Workload Identity Federation can exchange a GitHub OIDC token for short-lived GCP credentials — no service account key file required.
Steps to implement:

1. Create a GCP Service Account (this is allowed — only key creation is blocked):

   ```shell
   gcloud iam service-accounts create github-actions-dbt \
     --display-name="GitHub Actions dbt" \
     --project=mfb-data
   ```

2. Grant BigQuery roles to the service account:

   ```shell
   # For dbt (read raw data + write to analytics dataset)
   gcloud projects add-iam-policy-binding mfb-data \
     --member="serviceAccount:github-actions-dbt@mfb-data.iam.gserviceaccount.com" \
     --role="roles/bigquery.dataEditor"

   gcloud projects add-iam-policy-binding mfb-data \
     --member="serviceAccount:github-actions-dbt@mfb-data.iam.gserviceaccount.com" \
     --role="roles/bigquery.jobUser"
   ```

3. Create a Workload Identity Pool + Provider:

   ```shell
   # Create pool
   gcloud iam workload-identity-pools create "github-actions" \
     --location="global" \
     --display-name="GitHub Actions" \
     --project=mfb-data

   # Create OIDC provider for GitHub (--attribute-condition is required by GCP)
   gcloud iam workload-identity-pools providers create-oidc "github" \
     --location="global" \
     --workload-identity-pool="github-actions" \
     --display-name="GitHub" \
     --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
     --attribute-condition="assertion.repository == 'MyFriendBen/data-queries'" \
     --issuer-uri="https://token.actions.githubusercontent.com" \
     --project=mfb-data
   ```

4. Enable the IAM Service Account Credentials API (required for WIF token exchange):

   ```shell
   gcloud services enable iamcredentials.googleapis.com --project=mfb-data
   ```

5. Allow the GitHub repo to impersonate the service account:

   ```shell
   gcloud iam service-accounts add-iam-policy-binding \
     github-actions-dbt@mfb-data.iam.gserviceaccount.com \
     --role="roles/iam.workloadIdentityUser" \
     --member="principalSet://iam.googleapis.com/projects/38721872277/locations/global/workloadIdentityPools/github-actions/attribute.repository/MyFriendBen/data-queries" \
     --project=mfb-data
   ```

6. Update GitHub Actions workflows to use `google-github-actions/auth@v2`:

   ```yaml
   - uses: google-github-actions/auth@v2
     with:
       workload_identity_provider: "projects/38721872277/locations/global/workloadIdentityPools/github-actions/providers/github"
       service_account: "github-actions-dbt@mfb-data.iam.gserviceaccount.com"
       # This sets GOOGLE_APPLICATION_CREDENTIALS automatically
   ```

7. Update `dbt-nightly.yml` to add the BigQuery target:

   ```yaml
   - name: Run dbt build (BigQuery)
     if: vars.BIGQUERY_ENABLED == 'true'
     run: dbt build --target bigquery
     env:
       GCP_PROJECT_ID: ${{ vars.GCP_PROJECT_ID }}
       GCP_ANALYTICS_TABLE: ${{ vars.GCP_ANALYTICS_TABLE }}
   ```

8. Add GitHub Environment variables:
   - `BIGQUERY_ENABLED=true`
   - `GCP_PROJECT_ID` = your GCP project ID
   - `GCP_ANALYTICS_TABLE` = your GA4 BigQuery dataset name (e.g., `analytics_335669714`)
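Once the steps above are done, a quick sanity check (assuming gcloud is authenticated with access to `mfb-data`) confirms the provider's full resource name, which is the value `WIF_PROVIDER` expects, and the impersonation binding:

```shell
# Full provider resource name (the WIF_PROVIDER value)
gcloud iam workload-identity-pools providers describe github \
  --location=global \
  --workload-identity-pool=github-actions \
  --project=mfb-data \
  --format="value(name)"

# Confirm the repo's principalSet holds roles/iam.workloadIdentityUser
gcloud iam service-accounts get-iam-policy \
  github-actions-dbt@mfb-data.iam.gserviceaccount.com \
  --project=mfb-data
```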
Auth method: Service account key
Metabase's BigQuery driver expects a service account JSON key. The `mfb-data` project inherits the `iam.disableServiceAccountKeyCreation` org policy from the myfriendben.org organization. To create a key, a project-level exception must be set (requires `roles/orgpolicy.policyAdmin` at the org level). After key creation, the exception should be removed so the org policy is enforced again.
Steps:

1. Create the service account:

   ```shell
   gcloud iam service-accounts create metabase-bigquery \
     --display-name="Metabase BigQuery Reader" \
     --project=mfb-data
   ```

2. Grant read-only BigQuery access:

   ```shell
   gcloud projects add-iam-policy-binding mfb-data \
     --member="serviceAccount:metabase-bigquery@mfb-data.iam.gserviceaccount.com" \
     --role="roles/bigquery.dataViewer"

   gcloud projects add-iam-policy-binding mfb-data \
     --member="serviceAccount:metabase-bigquery@mfb-data.iam.gserviceaccount.com" \
     --role="roles/bigquery.jobUser"
   ```

3. Temporarily disable the org policy to allow key creation (requires `roles/orgpolicy.policyAdmin` on org `1001672396356`):

   ```shell
   # Create a policy override file
   cat > /tmp/org-policy-override.yaml <<'EOF'
   name: projects/mfb-data/policies/iam.disableServiceAccountKeyCreation
   spec:
     rules:
       - enforce: false
   EOF

   # Apply the override
   gcloud org-policies set-policy /tmp/org-policy-override.yaml --project=mfb-data

   # Wait ~60 seconds for propagation, then create the key
   gcloud iam service-accounts keys create metabase-bigquery-key.json \
     --iam-account=metabase-bigquery@mfb-data.iam.gserviceaccount.com
   ```

4. Re-enable the org policy after key creation:

   ```shell
   gcloud org-policies delete iam.disableServiceAccountKeyCreation --project=mfb-data
   ```

   This removes the project-level override so the org-level enforcement is inherited again.

5. Store the key content as the `BIGQUERY_SA_KEY` GitHub Environment secret (Terraform passes it to Metabase via the API).
Key rotation: Service account keys should be rotated periodically. Repeat steps 3-5: temporarily disable the org policy, create a new key, update the GitHub secret, run `terraform apply`, delete the old key, re-enable the org policy.
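During rotation, listing the existing keys (a sketch, assuming gcloud access to the project) shows which KEY_ID is the old one to delete:

```shell
# List keys with creation time; the KEY_ID column is what
# `gcloud iam service-accounts keys delete` takes
gcloud iam service-accounts keys list \
  --iam-account=metabase-bigquery@mfb-data.iam.gserviceaccount.com
```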
When BigQuery is enabled, the following variables and secrets are required. All are set per GitHub Environment (staging and production), with the same values in both environments.
| Variable | Value | Staging | Production |
|---|---|---|---|
| `BIGQUERY_ENABLED` | `true` | Set | Not set |
| `GCP_PROJECT_ID` | `mfb-data` | Set | Not set |
| `GCP_ANALYTICS_TABLE` | `analytics_335669714` | Set | Not set |
| `WIF_PROVIDER` | `projects/38721872277/locations/global/workloadIdentityPools/github-actions/providers/github` | Set | Not set |
| `WIF_SERVICE_ACCOUNT` | `github-actions-dbt@mfb-data.iam.gserviceaccount.com` | Set | Not set |

| Secret | Value | Staging | Production |
|---|---|---|---|
| `BIGQUERY_SA_KEY` | Full JSON content of `metabase-bigquery-key.json` | Set | Not set |
The `google-github-actions/auth@v2` action exchanges a GitHub OIDC token for short-lived GCP credentials. It reads `WIF_PROVIDER` and `WIF_SERVICE_ACCOUNT` from GitHub Environment variables and automatically sets `GOOGLE_APPLICATION_CREDENTIALS` for subsequent steps (dbt, Terraform).
Once staging is validated, run these to set up production (same values):

```shell
# Variables
gh variable set BIGQUERY_ENABLED --env production --repo MyFriendBen/data-queries --body "true"
gh variable set GCP_PROJECT_ID --env production --repo MyFriendBen/data-queries --body "mfb-data"
gh variable set GCP_ANALYTICS_TABLE --env production --repo MyFriendBen/data-queries --body "analytics_335669714"
gh variable set WIF_PROVIDER --env production --repo MyFriendBen/data-queries --body "projects/38721872277/locations/global/workloadIdentityPools/github-actions/providers/github"
gh variable set WIF_SERVICE_ACCOUNT --env production --repo MyFriendBen/data-queries --body "github-actions-dbt@mfb-data.iam.gserviceaccount.com"

# Secret (same key file used for both environments)
gh secret set BIGQUERY_SA_KEY --env production --repo MyFriendBen/data-queries < metabase-bigquery-key.json
```

Note: Production also needs the non-BigQuery variables (`DATABASE_NAME`, `METABASE_URL`, `METABASE_ADMIN_EMAIL`) and secrets (`DATABASE_HOST`, `GLOBAL_DB_USER`, `GLOBAL_DB_PASS`, `METABASE_ADMIN_PASSWORD`, tenant DB credentials) that staging already has. These should be set with production-specific values before running the production workflow.
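To confirm everything landed, `gh` can list what is set in the environment (assuming the CLI is authenticated with access to the repo):

```shell
gh variable list --env production --repo MyFriendBen/data-queries
gh secret list --env production --repo MyFriendBen/data-queries
```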
Status: Complete — 60 tables copied (2026-01-18 to 2026-03-18)
The GA4 BigQuery export currently lives in a previous organization's GCP project. We have read access to the data, but need to move it into our own GCP project before we can set up Workload Identity, dbt, or Metabase integration.
| | Project Name | Project ID | Owner |
|---|---|---|---|
| Old (source) | Benefits - MFB | benefits-mfb | Brian (Gary team) |
| New (target) | mfb-data | mfb-data | MFB team |
The `benefits-mfb` project is on the BigQuery free sandbox, which enforces a 60-day table expiration. Tables older than 60 days are automatically and permanently deleted. As of March 2026, only ~60 days of `events_*` tables remain (back to approx. January 2026), despite the GA4 BigQuery link being active for 1-2 years.
Implications:
- Historical data beyond 60 days is gone from BigQuery and cannot be recovered by upgrading to a paid plan
- Looker Studio dashboards showing older data are pulling from the GA4 reporting API (GA4's internal storage), not BigQuery
- The GA4 reporting API retains data for up to 14 months, but provides aggregated metrics — not the raw event-level data that BigQuery exports
- One table is expiring every day, so copying the remaining data is time-sensitive
Action items:

- Flag to Brian (Gary team) that `benefits-mfb` is losing data due to sandbox expiration
- Verify `mfb-data` has billing enabled so the same expiration does not apply after migration
- After cutover (Phase 3), decide as a team whether to backfill older data from the GA4 reporting API. This would provide aggregated metrics (sessions, conversions, page views by date) but not raw event-level data, and may not fit the existing dbt models without modification
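If the team opts to backfill, the GA4 Data API's `runReport` endpoint is the likely mechanism. A hedged sketch: `PROPERTY_ID` is the GA4 property's numeric ID (not recorded in this doc; find it in GA4 Admin), the date range is illustrative, and the access token must carry an Analytics scope, which a plain `gcloud auth print-access-token` does not.

```shell
# One-time: grant your Application Default Credentials the Analytics read scope
gcloud auth application-default login \
  --scopes=https://www.googleapis.com/auth/analytics.readonly,https://www.googleapis.com/auth/cloud-platform

# Aggregated daily sessions/page views ("sessions" and "screenPageViews" are
# real GA4 Data API metric names)
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  "https://analyticsdata.googleapis.com/v1beta/properties/${PROPERTY_ID}:runReport" \
  -d '{
    "dateRanges": [{"startDate": "2025-01-01", "endDate": "2026-01-17"}],
    "dimensions": [{"name": "date"}],
    "metrics": [{"name": "sessions"}, {"name": "screenPageViews"}]
  }'
```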
- GA4 property ownership — We have admin access to the GA4 property and can manage BigQuery links.
- Migration strategy — Build-first, then cutover. GA4 only allows one BigQuery link at a time, so we cannot dual-link. Instead: copy historical data to `mfb-data`, build and validate dashboards against the copy, then switch the GA4 link and do a final catch-up copy to fill the gap. Old dashboards (owned by Brian/Gary team) continue working until the cutover.
- What moves — All available historical data (~60 days due to sandbox expiration), plus the GA4 export link once dashboards are ready.
Step 1: Create the dataset in mfb-data

```shell
bq mk --dataset --location=US mfb-data:analytics_335669714
```

Step 2: Copy historical data from old project to mfb-data
Use the copy script (`scripts/ga4-migration/copy_ga4_tables.sh`), which copies each date-sharded `events_*` table individually and logs completed tables to a manifest file (`scripts/ga4-migration/ga4_copy_manifest.log`).
```shell
# Preview what will be copied
./scripts/ga4-migration/copy_ga4_tables.sh --dry-run

# Run the copy
./scripts/ga4-migration/copy_ga4_tables.sh
```

The script is resumable — if interrupted, re-run it and it will skip tables already in the manifest. The manifest also serves as a record of what was copied and as the resume point for the catch-up copy (Step 6).
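The skip behavior can be sketched like this. This is an illustration of the pattern, not the script's actual contents; the `bq cp` line is commented out so the sketch runs without GCP access.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Manifest of completed tables; the real path is
# scripts/ga4-migration/ga4_copy_manifest.log
MANIFEST="${MANIFEST:-ga4_copy_manifest.log}"
touch "$MANIFEST"

copy_table() {
  local table="$1"   # e.g. events_20260118
  if grep -qxF "$table" "$MANIFEST"; then
    echo "skip $table (already in manifest)"
    return 0
  fi
  # The real script would run something like:
  #   bq cp -f "benefits-mfb:analytics_335669714.${table}" \
  #            "mfb-data:analytics_335669714.${table}"
  echo "copy $table"
  echo "$table" >> "$MANIFEST"
}

copy_table events_20260118
copy_table events_20260118   # second call hits the manifest and is skipped
```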
Step 3: Verify the copy
- Compare row counts between old and new datasets for a sample of date-sharded tables
- Validate dbt models run successfully against the copied data in `mfb-data`
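The row-count comparison can be scripted with `bq query` (a sketch, assuming read access to both projects; the sampled shard dates are illustrative):

```shell
# Compare row counts for a few sample shards across the old and new projects
for d in 20260118 20260215 20260318; do
  for proj in benefits-mfb mfb-data; do
    bq query --project_id="$proj" --use_legacy_sql=false --format=csv \
      "SELECT '${proj}' AS project, COUNT(*) AS row_count FROM \`${proj}.analytics_335669714.events_${d}\`"
  done
done
```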
Step 4: Build and validate dashboards
- Set up WIF, dbt, Metabase (Phases 1-2 below)
- Build the GA tab with conversion funnel charts
- Validate everything works against the copied historical data
Step 5: Cutover the GA4 link
Once dashboards are validated and ready for production:
- Note the current date — this marks the start of the gap period
- In GA4 Admin > Product Links > BigQuery Links, remove the link to `benefits-mfb`
- Add a new link pointing to `mfb-data` with the same export settings (daily/streaming, events)
- New events now flow to `mfb-data`
Step 6: Catch-up copy
Re-run the same copy script to pick up any new tables created since the initial copy:
```shell
./scripts/ga4-migration/copy_ga4_tables.sh
```

The script skips tables already in the manifest, so it will only copy the gap-period tables.
Step 7: Coordinate with Brian (Gary team)
- Notify Brian that the old export has stopped and old dashboards will no longer receive new data
- Old data in `benefits-mfb` remains accessible until the project owner deletes it
In the old project (Benefits - MFB), the user running the copy needs:

- `bigquery.tables.getData` on the source dataset (BigQuery Data Viewer role)
- `bigquery.jobs.create` in the old project (BigQuery Job User role)

In the new project (mfb-data):

- `bigquery.datasets.create` (to create the target dataset, if not auto-created by the GA4 link)
- `bigquery.tables.create` + `bigquery.tables.updateData` (BigQuery Data Editor role)
- `bigquery.jobs.create` (BigQuery Job User role)
Steps 1, 2, and 3 can be done in parallel. Copy data ASAP — the sandbox is deleting one table per day.
- Verify `mfb-data` has billing enabled — GCP Console > Billing. Required to avoid the same 60-day sandbox expiration.
- Create dataset + copy historical data to `mfb-data` — 60 tables copied (2026-01-18 to 2026-03-18). See `scripts/ga4-migration/ga4_copy_manifest.log`.
- Set up Workload Identity Federation — Pool `github-actions`, provider `github` with an attribute condition restricting to `MyFriendBen/data-queries`. SA `github-actions-dbt` bound with `workloadIdentityUser`.
- Create Metabase service account + key — SA `metabase-bigquery` (pre-existing with correct roles). Key created after temporarily overriding the org policy. Remember to re-enable the org policy (step 2 below).
- Add GitHub Environment variables and secrets — Set for staging. See the Environment Variables Summary. Production pending.
- Update `dbt-nightly.yml` — add OIDC auth step + BigQuery target (conditional on `BIGQUERY_ENABLED`). Merged in PR #46.
- Test dbt BigQuery build — triggered the workflow on staging; `mart_screener_conversion_funnel` and `referrer_codes` created in BigQuery.
- Enable `bigquery_enabled = true` in Terraform — set the GitHub Environment variable, ran `terraform apply` on staging. Created the BigQuery data source, conversion funnel card, and dashboard widget in Metabase. Note: the first apply fails because Metabase needs time to sync new BigQuery tables; the second apply succeeds (see comment in `metabase.tf`).
- Add GA tab cards to tenant dashboards — build conversion funnel charts for the "Google Analytics" tab
- Switch the GA4 BigQuery link from `benefits-mfb` to `mfb-data` — migration step 5
- Notify Brian (Gary team) that old export has stopped — migration step 7
- Team decision: backfill older historical data? — Discuss whether to pull aggregated metrics from the GA4 reporting API for the period lost to sandbox expiration (see known issue)