Context
While investigating the missing contact for WL-0359 (BDMS-537), I discovered the issue was caused by a casing mismatch in OwnerKey values between OwnersData and OwnerLink.
Description
Contact transfer joins OwnersData to OwnerLink on OwnerKey with a case-sensitive pandas join. If the casing differs (e.g., "Rio en Medio MDWCA" vs "Rio En Medio MDWCA"), the join finds no match and raises no error, LocationId/PointID come back missing, and the row is dropped by filter_to_valid_point_ids.
This silently skips contact creation and linking.
Evidence
- Join logic in transfers/contact_transfer.py: odf = odf.join(ldf.set_index("OwnerKey"), on="OwnerKey")
- filter_to_valid_point_ids removes rows with missing PointID.
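The failure can be reproduced in isolation. This is a minimal sketch, not the transfer code itself: the two frames are made-up stand-ins for OwnersData and OwnerLink, the Name column and PointID values are hypothetical, and the notna() filter only approximates what filter_to_valid_point_ids does (its real implementation is not shown here).

```python
import pandas as pd

# Hypothetical stand-ins for OwnersData (odf) and OwnerLink (ldf).
odf = pd.DataFrame(
    {"OwnerKey": ["Rio en Medio MDWCA", "Other Owner"], "Name": ["A", "B"]}
)
ldf = pd.DataFrame(
    {"OwnerKey": ["Rio En Medio MDWCA", "Other Owner"], "PointID": ["P-001", "P-002"]}
)

# Same join shape as in transfers/contact_transfer.py.
# DataFrame.join is a left join by default, so an unmatched key
# does not raise; it just leaves PointID as NaN.
joined = odf.join(ldf.set_index("OwnerKey"), on="OwnerKey")

# Assumed approximation of filter_to_valid_point_ids:
# keep only rows that have a PointID.
valid = joined[joined["PointID"].notna()]

print(joined)
print(f"{len(odf) - len(valid)} row(s) dropped without any error")
```

The first row differs from its OwnerLink counterpart only in "en" vs "En", yet it is the one that disappears.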
Expected Outcome
Contacts should still link to the correct PointID if the OwnerKey differs only by case.
Actual Outcome
Contacts are skipped entirely when OwnerKey casing differs between OwnersData and OwnerLink.
Impact
Data loss in contact migration and downstream linking (permissions, field event participants).
Proposed Fixes/Solutions
- Normalize OwnerKey casing before joining, e.g. lowercasing both OwnersData.OwnerKey and OwnerLink.OwnerKey in get_dfs
or
- Resolve issues in source .csv
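A rough sketch of what the first option could look like, under some assumptions: add_normalized_key and the OwnerKeyNorm column are hypothetical names, the sample frames are made up, and where exactly this would hook into get_dfs is not shown. The original OwnerKey column is kept on the OwnersData side so the source casing stays available for debugging.

```python
import pandas as pd


def add_normalized_key(df, column="OwnerKey", normalized="OwnerKeyNorm"):
    """Return a copy of df with a lowercased copy of `column` (hypothetical helper)."""
    out = df.copy()
    out[normalized] = out[column].str.lower()
    return out


# Made-up stand-ins for OwnersData (odf) and OwnerLink (ldf).
odf = pd.DataFrame(
    {"OwnerKey": ["Rio en Medio MDWCA", "Other Owner"], "Name": ["A", "B"]}
)
ldf = pd.DataFrame(
    {"OwnerKey": ["Rio En Medio MDWCA", "Other Owner"], "PointID": ["P-001", "P-002"]}
)

odf_n = add_normalized_key(odf)
# Drop OwnerLink's own OwnerKey so the join does not produce overlapping columns.
ldf_n = add_normalized_key(ldf).drop(columns=["OwnerKey"])

# Join on the normalized key instead of the raw one.
joined = odf_n.join(ldf_n.set_index("OwnerKeyNorm"), on="OwnerKeyNorm")
print(joined[["OwnerKey", "PointID"]])
```

Joining on a separate normalized column, rather than overwriting OwnerKey in place, is a design choice for this sketch; lowercasing in place would also work if nothing downstream needs the original casing.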
Potential Disturbances or Side Effects
Solution No. 1: Normalize OwnerKey casing before joining
- Collision risk (low but real):
  - If there are distinct owners whose keys differ only by case, they will now collapse to the same normalized key. That could incorrectly link contacts to the wrong PointID.
  - Mitigation: detect duplicates after normalization and log or error when a normalized key maps to multiple OwnerLink rows.
- Behavior change in skipped rows:
  - Previously, case-mismatched rows were dropped; after the change they’ll transfer. This is intended but will alter counts.
- Repeatability / idempotency concerns:
  - This fix may cause previously skipped contacts to be created on a rerun. That’s desirable, but if you rely on “no new data on rerun” you’ll see new inserts. The transfer stays idempotent in spirit as long as de-duplication is in place, which it already is, keyed on (name, organization).
- Performance:
  - Minor overhead from computing normalized columns and potential duplicate checks. This is negligible compared to the transfer itself.
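The collision mitigation for Solution No. 1 could be sketched roughly as follows. find_case_collisions is a hypothetical helper and the sample data is made up. It flags normalized keys that have more than one distinct original spelling; if OwnerLink is expected to hold exactly one row per owner, a plain duplicated(keep=False) check on the lowercased key would be the simpler variant.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def find_case_collisions(df, column="OwnerKey"):
    """Return rows whose key matches a differently-cased key (hypothetical helper)."""
    normalized = df[column].str.lower()
    # Count distinct original spellings behind each lowercased key.
    spellings = df[column].groupby(normalized).nunique()
    colliding = spellings[spellings > 1].index
    return df[normalized.isin(colliding)]


# Made-up OwnerLink stand-in: two keys differ only by case.
ldf = pd.DataFrame(
    {
        "OwnerKey": ["Acme Water", "ACME WATER", "Solo Owner"],
        "PointID": ["P-001", "P-002", "P-003"],
    }
)

collisions = find_case_collisions(ldf)
if not collisions.empty:
    # Log here; raising instead would make the transfer fail fast.
    logger.warning(
        "OwnerKey values differ only by case: %s",
        sorted(collisions["OwnerKey"].unique()),
    )
```

Whether to log or raise depends on whether a partial transfer is acceptable; the sketch logs so the run can continue.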
Solution No. 2: Resolve issues in source .csv
- Provenance and auditability
  - Manually editing source CSVs can blur “what the source system said” vs “what we changed.” That can make audits or re-imports harder to defend.
- Repeatability
  - If the CSV is regenerated from the source system, the fix will be overwritten on the next refresh. The bug can silently reappear.
- Change management
  - Fixing the CSV outside the code path is easy to miss for teammates and harder to track in version control (especially if the CSV is large or not checked in).
- Upstream reconciliation
  - If the source system genuinely contains inconsistent casing, “fixing” only one file means a local copy diverges from upstream. Later diffs become noisy.
- Collision risk still exists
  - If there are two real owners whose keys differ only by case, normalizing by editing the CSV can silently collapse them into one record without a warning.
- Hidden dependencies
  - Other transfers or scripts might expect the original casing for debugging or reconciliation. Changing it might break ad hoc tooling or comparisons.
- Cache behavior
  - We'd be editing a cached .csv. If the cache is considered disposable, fixes won’t persist across refreshes.
Additional Notes
This issue is purely in the transfer step and doesn’t affect runtime API usage.
Recommended Solution
Solution No. 1: normalize OwnerKey casing before joining. It is the more robust, traceable, and repeatable option.