Skip to content

chore: all scalars support cast#1965

Merged
danking merged 15 commits into
developfrom
dk/fix-casting
Jan 21, 2025
Merged

chore: all scalars support cast#1965
danking merged 15 commits into
developfrom
dk/fix-casting

Conversation

@danking

@danking danking commented Jan 15, 2025

Copy link
Copy Markdown
Contributor

No description provided.

@danking danking requested a review from gatesn January 15, 2025 18:59
@danking

danking commented Jan 15, 2025

Copy link
Copy Markdown
Contributor Author

@gatesn curious for your thoughts here.

I'm not blocked by this PR, but I keep getting turned around by our types and scalars. There's at least one bug: casting a null BoolScalar to Bool(Nullable) raises the error "not a bool".

Scalar::cast felt wrong to me. It seems that we should be matching on the our type and then casting to the target type (the parameter dtype). But the only case where I care about our type is extension. In every other case, all that matters is that the value is compatible.

assert_eq!(min, Some(null_i32.clone()));
assert_eq!(max, Some(null_i32));
assert_eq!(min, None);
assert_eq!(max, None);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just wrong. I'm not really sure why it worked.

Scalar::null(DType::Bool(Nullable))
);
assert_eq!(bool_arr.statistics().compute(Stat::Min), None);
assert_eq!(bool_arr.statistics().compute(Stat::Max), None);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were just wrong, min and max are never null.

.encoding()
.compute_statistics(self, stat)
.ok()?
.vortex_expect("compute_statistics must not fail")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debatable but it's really annoying to debug compute_statistics errors without this.

@danking danking marked this pull request as ready for review January 16, 2025 16:19
@danking

danking commented Jan 16, 2025

Copy link
Copy Markdown
Contributor Author

@gatesn OK, this is finally working as expected.

Comment thread vortex-scalar/src/utf8.rs Outdated
todo!()
pub(crate) fn cast(&self, dtype: &DType) -> VortexResult<Scalar> {
if !matches!(dtype, DType::Bool(..)) {
vortex_bail!("Can't cast bool to {}", dtype)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this bool?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy pasta. fixed.

Comment thread vortex-scalar/src/utf8.rs

pub fn cast(&self, _dtype: &DType) -> VortexResult<Scalar> {
todo!()
pub(crate) fn cast(&self, dtype: &DType) -> VortexResult<Scalar> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you cast a string into a number if it parses correctly?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for binary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing a string seems wrong to me because it's so much more work. Casting from string to binary (i.e. forgetting the UTF8-ness) seems correct to me. I guess Binary to any integer type should be fine, erroring if the binary is not the right length. Particularly because if we add FixedWidthBinary, I would think binary -> fixedwidthbinary(4) -> u32 should work.

Should I add the String to Binary case to this PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to add a FLUP issue with various casts we want to support (e.g., bool to int, utf8 to binary, etc)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danking danking requested a review from joseph-isaacs January 16, 2025 17:31
@danking danking mentioned this pull request Jan 16, 2025
Comment thread vortex-scalar/src/extension.rs Outdated
use crate::Scalar;

pub struct ExtScalar<'a> {
dtype: &'a DType,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe don't need dtype

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread vortex-scalar/src/lib.rs
vortex_bail!("Can't cast null scalar to non-nullable type")
pub fn cast(&self, target: &DType) -> VortexResult<Self> {
if let DType::Extension(ext_dtype) = target {
let storage_scalar = self.cast_to_non_extension(ext_dtype.storage_dtype())?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's theoretically possible right now to have a storage dtype that is an Extension type... we should potentially prohibit that

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed we prohibit that.

Comment thread vortex-scalar/src/lib.rs Outdated
if target.is_nullable() {
return Ok(Scalar::new(target.clone(), self.value.clone()));
} else {
vortex_bail!("Can't cast null scalar to non-nullable type")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include target in error msg

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread vortex-scalar/src/utf8.rs

pub fn cast(&self, _dtype: &DType) -> VortexResult<Scalar> {
todo!()
pub(crate) fn cast(&self, dtype: &DType) -> VortexResult<Scalar> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to add a FLUP issue with various casts we want to support (e.g., bool to int, utf8 to binary, etc)

@danking danking requested a review from lwwmanning January 21, 2025 11:23
@danking danking enabled auto-merge (squash) January 21, 2025 11:24
@danking danking merged commit 64b6087 into develop Jan 21, 2025
@danking danking deleted the dk/fix-casting branch January 21, 2025 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants