fix: ExtDType equality #1961
 impl PartialEq for ExtDType {
     fn eq(&self, other: &Self) -> bool {
-        self.id == other.id
+        self.id == other.id && self.storage_dtype == other.storage_dtype
At the very least, shouldn't we factor in the metadata? For example, vortex.timestamp always stores i64, but the timezone is stored in the extmetadata. Should two timestamp types be equal if they have different timezones?
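To make the question concrete, here is a minimal sketch of an equality that also compares metadata. The `DType` enum, the `metadata` field, and the `timestamp` helper are hypothetical stand-ins for illustration, not the actual Vortex definitions, and encoding the timezone as raw bytes is an assumption.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the storage dtype; not the real Vortex `DType`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum DType {
    I32,
    I64,
}

// Hypothetical, simplified extension dtype: id + storage dtype + opaque metadata bytes.
#[derive(Debug, Clone)]
struct ExtDType {
    id: Arc<str>,
    storage_dtype: DType,
    metadata: Option<Vec<u8>>,
}

impl PartialEq for ExtDType {
    fn eq(&self, other: &Self) -> bool {
        // Two extension dtypes are equal only if all three components match.
        self.id == other.id
            && self.storage_dtype == other.storage_dtype
            && self.metadata == other.metadata
    }
}

// Illustrative helper: a timestamp type whose metadata is the timezone name (assumed encoding).
fn timestamp(tz: &str) -> ExtDType {
    ExtDType {
        id: Arc::from("vortex.timestamp"),
        storage_dtype: DType::I64,
        metadata: Some(tz.as_bytes().to_vec()),
    }
}
```

Under this definition, `timestamp("UTC")` and `timestamp("America/New_York")` compare unequal even though both store i64. Byte-wise metadata comparison is the strictest choice; it cannot recognize two encodings that mean the same thing.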
Capturing some thoughts from Slack: There is a lot of logic that the extension type author needs to have control over, namely the semantics of how some compute operations (like CastFn) affect the type, and how to interpret the metadata of the type to ensure semantic equality. We should find a way to make it possible for external type authors to plug in these decisions. For now, it's probably sufficient to just compare
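One possible shape for such a plug-in point, sketched under assumptions: the `ExtSemantics` trait, the `Registry` type, and the case-insensitive timezone rule below are all hypothetical and only illustrate the idea of author-defined metadata equality with a byte-equality fallback. Nothing here is an existing Vortex API.

```rust
use std::collections::HashMap;

// Hypothetical hook an extension type author could implement.
trait ExtSemantics {
    // Decide whether two metadata payloads describe the same logical type.
    fn metadata_eq(&self, lhs: &[u8], rhs: &[u8]) -> bool;
}

// Illustrative rule only: treat timezone names as case-insensitive ASCII.
struct TimestampSemantics;

impl ExtSemantics for TimestampSemantics {
    fn metadata_eq(&self, lhs: &[u8], rhs: &[u8]) -> bool {
        lhs.eq_ignore_ascii_case(rhs)
    }
}

// Hypothetical registry mapping extension ids to author-supplied semantics.
struct Registry {
    by_id: HashMap<String, Box<dyn ExtSemantics>>,
}

impl Registry {
    fn new() -> Self {
        Registry { by_id: HashMap::new() }
    }

    fn register(&mut self, id: &str, semantics: Box<dyn ExtSemantics>) {
        self.by_id.insert(id.to_string(), semantics);
    }

    fn metadata_eq(&self, id: &str, lhs: &[u8], rhs: &[u8]) -> bool {
        match self.by_id.get(id) {
            // A registered author decides what "equal" means for this type.
            Some(semantics) => semantics.metadata_eq(lhs, rhs),
            // Unknown extension types fall back to plain byte equality.
            None => lhs == rhs,
        }
    }
}
```

With `TimestampSemantics` registered under `"vortex.timestamp"`, the payloads `b"utc"` and `b"UTC"` compare equal for that id, while an unregistered id still requires identical bytes.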
For opaque extension types, Arrow-cpp defines equality in terms of name and storage type. It seems to ignore metadata. Arrow-rs appears to take a strong stance against treating extension types any differently from their storage type: apache/arrow-rs#4472. I think Arrow-rs would put this information into the Schema metadata; however, I am not entirely certain of that.
PyArrow includes the storage dtype in its definition of equality. The docs actually highlight using different storage types with the same extension type as a feature, not a bug:
Trying to better "adhere" to the Arrow behavior.