Problem
Even with only the GitHub integration, we have been having difficulties syncing up/down to the external service.
- Writes are coupled to the external API
- Multiple `task` and `comment` records are created
- Records easily fall out of sync
Our basic goals:
- Eventual consistency to the destination records in our database
- Eventual consistency to the external APIs
- Events to/from the external APIs are fully concurrent:
  - enqueued in order
  - dropped when more recent events come in
  - restartable when errored
Each `Event` can have a:
- `direction` - `:inbound | :outbound`
- `integration_` fields
  - `integration_external_id` - the `id` of the integration resource from the external provider
  - `integration_updated_at` - the last updated at timestamp of the integration resource from the external provider
  - `integration_record_id` - the `id` of our cached record for the resource
  - `integration_record_type` - the `type` of our cached record for the resource, as the table name
- `record_` fields
  - `record_id` - the `id` of the record for the resource connected to this integration
  - `record_type` - the `type` of the record for the resource connected to this integration, as the table name
- `canceled_by` - the `id` of the `Event` that canceled this one
- `duplicate_of` - the `id` of the `Event` that this is a duplicate of
- `ignored_for_id` - the `id` of the record that caused this event to be ignored
- `ignored_for_type` - the `type` of the record (table name) that caused this event to be ignored
- `state` - `:queued | :processing | :completed | :errored | :canceled | :ignored | :duplicate | :disabled`
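As a rough sketch, the fields above could be modeled like this. Python is used purely for illustration (the mention of changesets below suggests the real app is Elixir/Ecto); the concrete types and the `Direction`/`State` enums are assumptions, not a committed schema:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Direction(Enum):
    INBOUND = "inbound"
    OUTBOUND = "outbound"


class State(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    ERRORED = "errored"
    CANCELED = "canceled"
    IGNORED = "ignored"
    DUPLICATE = "duplicate"
    DISABLED = "disabled"


@dataclass
class Event:
    id: int
    direction: Direction
    # integration_ fields: identify the resource at the external provider
    integration_external_id: str
    integration_updated_at: datetime
    integration_record_id: Optional[int] = None
    integration_record_type: Optional[str] = None  # table name of our cached record
    # record_ fields: identify our own record connected to this integration
    record_id: Optional[int] = None
    record_type: Optional[str] = None  # table name
    # bookkeeping set during intake
    canceled_by: Optional[int] = None
    duplicate_of: Optional[int] = None
    ignored_for_id: Optional[int] = None
    ignored_for_type: Optional[str] = None
    state: State = State.QUEUED
```

New events default to `:queued` until the intake checks below promote or discard them.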
We may want our own writes to our own records, even without integrations, to also go through this process. Not sure.
When an event comes in we should:
- check if there is any event for the `integration_external_id` where:
  - the `integration_updated_at` is after our event's last updated timestamp (limit 1)
    - if yes, set state to `:ignored` and stop processing; set `ignored_for_id` to the `id` of the event in the `limit 1` query and `ignored_for_type` to this event table's name
  - the `integration_updated_at` timestamp for the relevant `record_` is equal to our event's last updated timestamp (limit 1)
    - if yes, set state to `:duplicate` and stop processing; set `duplicate_of` to the `id` of the event in the `limit 1` query
  - the `modified_at` timestamp for the relevant `record_` is after our event's last updated timestamp
    - if yes, set state to `:ignored` and stop processing; set `ignored_for_id` to the `record_id` and `ignored_for_type` to the `record_type`
- check if there are any events for the `integration_external_id` where `integration_updated_at` is before our event's last updated timestamp
  - if yes, set the state of those events to `:canceled` and set `canceled_by` to the `id` of this event
- check if there is any other `:queued` or `:processing` event for the `integration_external_id`
  - if yes, set state to `:queued`
- when `:processing`, create or update the relevant record matching `record_id` and `record_type` through the relationship on the record for `integration_record_id` and `integration_record_type`
- when `:completed`, kick off a process to look for the next `:queued` item with the oldest `integration_updated_at` timestamp
Within the logic for updating the given record, we would also need to check whether the record's updated timestamp is after the event's timestamp. If it is, we need to bubble up the changeset validation error and mark the event `:ignored` as above.
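The intake rules above can be sketched as follows. This is an in-memory illustration in Python, not the real implementation: `triage`, the `peers` list, and the `events_table` default are all hypothetical, and a production version would run these checks as database queries inside a transaction. A trimmed copy of the `Event` model is included so the sketch runs standalone.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class State(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    CANCELED = "canceled"
    IGNORED = "ignored"
    DUPLICATE = "duplicate"


@dataclass
class Event:
    id: int
    integration_external_id: str
    integration_updated_at: datetime
    record_id: Optional[int] = None
    record_type: Optional[str] = None
    canceled_by: Optional[int] = None
    duplicate_of: Optional[int] = None
    ignored_for_id: Optional[int] = None
    ignored_for_type: Optional[str] = None
    state: State = State.QUEUED


def triage(event, existing_events, record_modified_at=None, events_table="events"):
    """Apply the intake rules to `event` against other events sharing its
    integration_external_id. Mutates states in place and returns the event."""
    peers = [e for e in existing_events
             if e.integration_external_id == event.integration_external_id
             and e.id != event.id]

    # 1) A newer event already exists for this resource -> ignore this one.
    newer = next((e for e in peers
                  if e.integration_updated_at > event.integration_updated_at), None)
    if newer is not None:
        event.state = State.IGNORED
        event.ignored_for_id = newer.id
        event.ignored_for_type = events_table
        return event

    # 2) An event with the same timestamp already exists -> duplicate.
    same = next((e for e in peers
                 if e.integration_updated_at == event.integration_updated_at), None)
    if same is not None:
        event.state = State.DUPLICATE
        event.duplicate_of = same.id
        return event

    # 3) Our own record was modified after this event -> ignore.
    if record_modified_at is not None and record_modified_at > event.integration_updated_at:
        event.state = State.IGNORED
        event.ignored_for_id = event.record_id
        event.ignored_for_type = event.record_type
        return event

    # 4) Cancel every stale peer this event supersedes.
    for e in peers:
        if e.integration_updated_at < event.integration_updated_at:
            e.state = State.CANCELED
            e.canceled_by = event.id

    # 5) Queue behind any still-active peer, otherwise start processing.
    busy = any(e.state in (State.QUEUED, State.PROCESSING) for e in peers)
    event.state = State.QUEUED if busy else State.PROCESSING
    return event
```

Note the ordering of steps 4 and 5: stale peers are canceled first, so a newly superseded event no longer counts as "queued or processing" when deciding whether this event can start immediately.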