Contact transfer drops rows when OwnerKey case doesn’t match OwnerLink #479

Description

@ksmuczynski

Context

While investigating the missing contact for WL-0359 (BDMS-537), I discovered the issue was caused by a casing mismatch between the OwnerKey values in OwnersData and OwnerLink.

Description

Contact transfer joins OwnersData to OwnerLink on OwnerKey with a case-sensitive pandas join. If the casing differs (e.g., "Rio en Medio MDWCA" vs "Rio En Medio MDWCA"), the join finds no match and raises no error; LocationId/PointID come back empty (NaN) for that row, and the row is then dropped by filter_to_valid_point_ids.

This silently skips contact creation and linking.

Evidence

  • Join logic: odf = odf.join(ldf.set_index("OwnerKey"), on="OwnerKey") in transfers/contact_transfer.py
  • filter_to_valid_point_ids removes rows with missing PointID.
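
A minimal reproduction of the behavior described above, as a sketch. The two frames are hypothetical stand-ins for OwnersData and OwnerLink (the real tables have more columns), the PointID values are made up, and the last line only approximates what filter_to_valid_point_ids does. Which table holds which casing is also an assumption.

```python
import pandas as pd

# Hypothetical stand-ins for OwnersData (odf) and OwnerLink (ldf).
odf = pd.DataFrame({"OwnerKey": ["Rio en Medio MDWCA", "Example Water Co"]})
ldf = pd.DataFrame({
    "OwnerKey": ["Rio En Medio MDWCA", "Example Water Co"],
    "PointID": ["WL-0001", "WL-0002"],  # made-up IDs for illustration
})

# Same join shape as the line quoted from transfers/contact_transfer.py.
# DataFrame.join defaults to a left join, so unmatched owners stay in the
# result with NaN in the columns that come from ldf.
joined = odf.join(ldf.set_index("OwnerKey"), on="OwnerKey")

# Approximation of filter_to_valid_point_ids: keep rows that have a PointID.
kept = joined[joined["PointID"].notna()]

print(joined)
print(kept)
```

The owner whose key differs only by case survives the join but with no PointID, so it disappears at the filter step without any error being raised.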

Expected Outcome

Contacts should still link to the correct PointID if the OwnerKey differs only by case.

Actual Outcome

Contacts are skipped entirely when OwnerKey casing differs between OwnersData and OwnerLink.

Impact

Data loss in contact migration and downstream linking (permissions, field event participants).

Proposed Fixes/Solutions

  1. Normalize OwnerKey casing before joining, e.g. lowercasing both OwnersData.OwnerKey and OwnerLink.OwnerKey in get_dfs
    or
  2. Resolve issues in source .csv
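
A rough sketch of what option 1 could look like. The function name and the owner_key_norm helper column are hypothetical, and this is not the current get_dfs code. It joins on a lowercased copy of the key so the original OwnerKey spelling is kept for debugging and reconciliation.

```python
import pandas as pd

def join_on_normalized_owner_key(odf, ldf):
    """Left-join owners to links on a lowercased OwnerKey (hypothetical helper)."""
    odf = odf.copy()
    ldf = ldf.copy()

    # Helper column used only for matching; OwnerKey itself is left untouched.
    odf["owner_key_norm"] = odf["OwnerKey"].str.lower()
    ldf["owner_key_norm"] = ldf["OwnerKey"].str.lower()

    # Drop the link table's OwnerKey so the join does not hit a column-name clash.
    link = ldf.drop(columns=["OwnerKey"]).set_index("owner_key_norm")
    return odf.join(link, on="owner_key_norm")


odf = pd.DataFrame({"OwnerKey": ["Rio en Medio MDWCA"]})
ldf = pd.DataFrame({"OwnerKey": ["Rio En Medio MDWCA"], "PointID": ["WL-0001"]})
result = join_on_normalized_owner_key(odf, ldf)
print(result)
```

Overwriting OwnerKey in place would also work and is closer to the wording above; the helper column is just one way to keep the source casing visible after the join.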

Potential Disturbances or Side Effects

Solution No. 1: Normalize OwnerKey casing before joining

  1. Collision risk (low but real):
    • If there are distinct owners whose keys differ only by case, they will now collapse to the same normalized key. That could incorrectly link contacts to the wrong PointID.
    • Mitigation: detect duplicates after normalization and log or error when a normalized key maps to multiple OwnerLink rows.
  2. Behavior change in skipped rows:
    • Previously, case-mismatched rows were dropped; after the change they’ll transfer. This is intended but will alter counts.
  3. Repeatability / idempotency concerns:
    • This fix may cause previously skipped contacts to be created on a rerun. That’s desirable, but if you rely on “no new data on rerun” you’ll see new inserts. Reruns after that first one should still be idempotent, since contacts are already de-duplicated on (name, organization).
  4. Performance:
    • Minor overhead from computing normalized columns and potential duplicate checks. This is negligible compared to the transfer itself.
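
The collision mitigation in item 1 could be sketched roughly as below. The function name is hypothetical. This version flags only keys whose original spellings differ but match once lowercased, on the assumption that a single owner may legitimately appear in several OwnerLink rows; if that assumption is wrong, a plain duplicated() check on the normalized column would match the mitigation as worded.

```python
import pandas as pd

def find_case_collisions(df, key="OwnerKey"):
    """Return rows whose keys are spelled differently but match when lowercased."""
    norm = df[key].str.lower().rename("key_norm")
    # Number of distinct original spellings behind each normalized key.
    spellings = df[key].groupby(norm).transform("nunique")
    return df[spellings > 1]


# Made-up data: "Smith"/"SMITH" collide; "Jones" repeats with identical casing.
ldf = pd.DataFrame({
    "OwnerKey": ["Smith", "SMITH", "Jones", "Jones"],
    "PointID": ["A", "B", "C", "D"],
})
collisions = find_case_collisions(ldf)
if not collisions.empty:
    print("OwnerKey values that collide after normalization:")
    print(collisions)
```

Whether a collision should be logged or should abort the transfer is left open here, as in the mitigation above.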

Solution No. 2: Resolve issues in source .csv

  1. Provenance and auditability
    • Manually editing source CSVs can blur “what the source system said” vs “what we changed.” That can make audits or re-imports harder to defend.
  2. Repeatability
    • If the CSV is regenerated from the source system, the fix will be overwritten on the next refresh. The bug can silently reappear.
  3. Change management
    • Fixing the CSV outside the code path is easy to miss for teammates and harder to track in version control (especially if the CSV is large or not checked in).
  4. Upstream reconciliation
    • If the source system genuinely contains inconsistent casing, “fixing” only one file means a local copy diverges from upstream. Later diffs become noisy.
  5. Collision risk still exists
    • If there are two real owners whose keys differ only by case, normalizing by editing the CSV can silently collapse them into one record without a warning.
  6. Hidden dependencies
    • Other transfers or scripts might expect the original casing for debugging or reconciliation. Changing it might break ad hoc tooling or comparisons.
  7. Cache behavior
    • We'd be editing a cached .csv. If the cache is considered disposable, fixes won’t persist across refreshes.

Additional Notes

This issue is purely in the transfer step and doesn’t affect runtime API usage.

Recommended Solution

Solution No. 1: normalize OwnerKey casing before joining. It is the more robust, traceable, and repeatable option.
