
feat: Add push_validity_into_children methods to StructArray#5826

Closed
amorynan wants to merge 3 commits into vortex-data:develop from amorynan:clean_push_validity_branch
Conversation

@amorynan


Addresses #3859.
Add methods to push struct-level validity into child fields:

  • push_validity_into_children(preserve_struct_validity: bool)
  • push_validity_into_children_default() - convenience method with preserve=false

The functionality propagates null information from struct level down to individual fields, with options to preserve or remove the struct-level validity.

Includes comprehensive tests covering all scenarios:

  • preserve_struct_validity = true
  • preserve_struct_validity = false (default)
  • no nulls edge case
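As a rough illustration of the behavior described above, here is a minimal std-only sketch. `ToyStruct` and its free function are hypothetical stand-ins written for this example; they are not the Vortex `StructArray` API, and they model fields as plain `Vec<Option<i32>>` columns.

```rust
/// Toy stand-in for a struct array (hypothetical; not the Vortex `StructArray`).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyStruct {
    /// `None` means no struct-level validity; `Some(v)` holds one flag per row.
    pub validity: Option<Vec<bool>>,
    /// Each field is a column of optional values, all of the same length.
    pub fields: Vec<Vec<Option<i32>>>,
}

/// Null out every field value on rows where the struct itself is null.
/// When `preserve_struct_validity` is false, the struct-level validity is dropped.
pub fn push_validity_into_children(s: &ToyStruct, preserve_struct_validity: bool) -> ToyStruct {
    let Some(validity) = &s.validity else {
        // No struct-level nulls to push down.
        return s.clone();
    };
    let fields: Vec<Vec<Option<i32>>> = s
        .fields
        .iter()
        .map(|field| {
            field
                .iter()
                .zip(validity)
                // Keep the child's own value (or null) on valid rows, force null otherwise.
                .map(|(value, &valid)| if valid { *value } else { None })
                .collect::<Vec<_>>()
        })
        .collect();
    ToyStruct {
        validity: preserve_struct_validity.then(|| validity.clone()),
        fields,
    }
}
```

With struct validity `[true, false, true]`, both modes null out row 1 in every field; they differ only in whether the returned struct still carries that validity.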

Signed-off-by: amorynan <amorywang111@gmail.com>
@connortsui20 connortsui20 added the changelog/feature A new feature label Dec 27, 2025
@codecov

codecov Bot commented Dec 27, 2025


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.96%. Comparing base (1e0f608) to head (cefd1e1).
⚠️ Report is 14 commits behind head on develop.


@connortsui20 connortsui20 left a comment

Member

I think this is a good first step, though I'm trying to figure out how we want to implement this behavior for the new operator world we are migrating to.

@gatesn mentioned this in the original issue:

> Since this issue was filed, we now have a "Mask" expression that essentially performs intersection with validity.
>
> So you could take a look at how you might take the validity from the struct array itself, and wrap up each child field in a mask expression (see builtins.rs), before constructing a new StructArray with or without the previous validity. These two are different behaviors.

This PR implements this feature correctly for the old (current) world, but we will likely want to implement the behavior described above very very soon, so soon that it might not even be worth merging this right now? @gatesn do you have any thoughts?

Comment thread vortex-array/src/arrays/struct_/array.rs Outdated


pub fn push_validity_into_children(&self, preserve_struct_validity: bool) -> VortexResult<Self> {
use crate::compute::mask;

Member

Can you move this import to the top?

Member

I also prefer to see compute::mask as the function call instead of just mask, but that is just my personal preference

Author

done!

Comment thread vortex-array/src/arrays/struct_/array.rs Outdated
Comment on lines +480 to +502
/// // Create struct with top-level nulls
/// let struct_array = StructArray::try_new(
///     ["a", "b"].into(),
///     vec![
///         buffer![1i32, 2i32, 3i32].into_array(),
///         buffer![10i32, 20i32, 30i32].into_array(),
///     ],
///     3,
///     Validity::from_iter([true, false, true]), // row 1 is null
/// ).unwrap();
///
/// // Push validity into children, preserving struct validity
/// let pushed = struct_array.push_validity_into_children(true).unwrap();
/// // pushed.fields()[0] now has nulls at position 1
/// // pushed.fields()[1] now has nulls at position 1
/// // pushed.validity still shows row 1 as null
///
/// // Push validity into children, removing struct validity
/// let pushed_no_struct = struct_array.push_validity_into_children(false).unwrap();
/// // pushed_no_struct.fields()[0] now has nulls at position 1
/// // pushed_no_struct.fields()[1] now has nulls at position 1
/// // pushed_no_struct.validity is AllValid
/// ```

Member

I think it would be more helpful to comment what the struct array looks like instead of requiring the reader to parse what the created struct array becomes after try_new. And you might as well have assertions stating that specific positions are null rather than just comments.

Author

done!

Contributor

Create two structs and compare them using assert_arrays_eq!

Comment on lines +484 to +485
/// buffer![1i32, 2i32, 3i32].into_array(),
/// buffer![10i32, 20i32, 30i32].into_array(),

Member

It may not be immediately obvious to someone reading this that the child buffers are non-nullable

self.names().clone(),
self.fields().clone(),
self.len(),
Validity::AllValid,

Member

Hmmm I wonder if we should make the validity of the top-level struct array Validity::NonNullable instead of AllValid if the user requests to not preserve the struct validity. Do you have any thoughts?
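To make the distinction in this question concrete, here is a small sketch of a four-state validity. `ToyValidity` is a hypothetical type written for this example; only the `NonNullable` and `AllValid` names come from this thread, and the other two variants are assumptions added so the enum is complete.

```rust
/// Toy validity enum (hypothetical; the `AllInvalid` and `Array` variants are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyValidity {
    /// The type does not admit nulls at all.
    NonNullable,
    /// The type admits nulls, but no row is null.
    AllValid,
    /// Every row is null.
    AllInvalid,
    /// One flag per row; `true` means valid.
    Array(Vec<bool>),
}

impl ToyValidity {
    /// Whether the owning array's type admits nulls.
    pub fn is_nullable(&self) -> bool {
        !matches!(self, ToyValidity::NonNullable)
    }

    /// Whether row `i` is valid.
    pub fn is_valid(&self, i: usize) -> bool {
        match self {
            ToyValidity::NonNullable | ToyValidity::AllValid => true,
            ToyValidity::AllInvalid => false,
            ToyValidity::Array(bits) => bits[i],
        }
    }
}
```

In this model the two candidates agree on every row (`is_valid` is always true) and differ only in `is_nullable`, which is why the choice affects the struct's type rather than its values.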

Comment on lines +529 to +540
let null_mask = struct_validity_mask.iter_bools(|iter| {
    Mask::from_iter(iter.map(|valid| !valid)) // invert: valid->invalid, invalid->valid
});

let masked_fields: Vec<ArrayRef> = self
    .fields()
    .iter()
    .map(|field| {
        // Use the mask function to apply null positions to each field.
        mask(field.as_ref(), &null_mask)
    })
    .collect::<VortexResult<Vec<_>>>()?;

Member

This inverted logic makes me think more that we should just move this to the new world now instead of merging this and then getting rid of it immediately. @gatesn Any thoughts about this?

Contributor

I think we should keep this for now

// Apply the struct validity mask to each child field.
// We want to set nulls where the struct is null (i.e., where struct_validity_mask is false).
// So we need to invert the mask: where struct is invalid, set child to invalid.
let null_mask = struct_validity_mask.iter_bools(|iter| {

Contributor

there is a not method
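For readers unfamiliar with the pattern being suggested, the sketch below shows how a mask type can expose inversion through `std::ops::Not`, so callers write `!mask` instead of rebuilding the mask element by element. `ToyMask` and `null_mask_from_validity` are hypothetical names for this example and say nothing about how the real mask type implements it.

```rust
use std::ops::Not;

/// Toy boolean mask (hypothetical stand-in for a real mask type).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyMask(pub Vec<bool>);

impl Not for &ToyMask {
    type Output = ToyMask;

    /// Flip every bit: valid becomes invalid and vice versa.
    fn not(self) -> ToyMask {
        ToyMask(self.0.iter().map(|bit| !bit).collect())
    }
}

/// Turn a validity mask (true = valid) into a null mask (true = null).
pub fn null_mask_from_validity(validity: &ToyMask) -> ToyMask {
    !validity
}
```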

@joseph-isaacs joseph-isaacs left a comment

Contributor

I wonder if we want to split this into two functions.

One would push the validity down into all children and make the struct non-nullable (push_validity_into_children()); the other would only push the validity down, possibly duplicating it (compact_validity()).

Comment on lines +464 to +466
/// * `preserve_struct_validity` - If true, the new struct array retains the original struct-level
/// validity. If false, the new struct array has `Validity::AllValid` since all null information
/// is now contained within the individual fields.

Contributor

Can we invert this and have a remove_struct_validity flag instead? The default (false) should be to keep the struct validity, since keeping it doesn't change the values in the struct, whereas removing it does.

/// // pushed_no_struct.fields()[1] now has nulls at position 1
/// // pushed_no_struct.validity is AllValid
/// ```
/// Push validity into children with default behavior (preserve_struct_validity = false).

Contributor

invert please

Contributor

can you use assert_arrays_eq! to check equality

@github-actions

github-actions Bot commented Feb 5, 2026

Contributor

This PR has been marked as stale because it has been open for 30 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days

@github-actions github-actions Bot added the stale This PR is stale and will be auto-closed soon label Feb 5, 2026
@github-actions

Contributor

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions Bot closed this Feb 12, 2026
robert3005 pushed a commit that referenced this pull request Jun 25, 2026
## Summary

`push_validity_into_children` masks each field with the struct's
top-level validity, so a row null at the struct level becomes null in
every field (`{a: 1, b: 2}, NULL` -> `{a: 1, b: 2}, {a: NULL, b:
NULL}`), mirroring Arrow's `StructArray::flatten`.
Setting `remove_struct_validity` drops the top-level validity, making the
struct non-nullable; otherwise the validity is kept. A struct with no
top-level nulls is returned unchanged.

Each field is masked via a `mask` expression (per @gatesn's note on the
issue, not the eager `compute::mask` of #5826). Open question: should
this be a `StructArray` method, or a standalone mask expression in the
new operator world?

Closes: #3859

## Benchmark

For reference (not committed), vs hand-rolling the same masking without
the fast path: with no top-level nulls the fast path is ~5-7x faster
(0.26us vs 1.2us at 4 fields, 0.65us vs 4.5us at 16); with nulls the two
are equal (~1.7us / ~6.3us), so the method adds no overhead.

## Testing

`cargo nextest run -p vortex-array` passes (drops/preserves validity,
intersecting field-level nulls, all-invalid, no-nulls fast path); `fmt
--all` + `clippy --all-targets --all-features` clean.

---

I'm Korean, so sorry if any wording reads a little awkward.

Signed-off-by: Han Damin <miniex@daminstudio.net>
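
The no-nulls fast path mentioned in the commit message above can be sketched with plain std types. This is an eager toy model, not the commit's code (which masks fields through a `mask` expression); `Column` and `mask_fields` are hypothetical names, and `Cow` is used here only to make "returned unchanged" visible as "nothing is copied".

```rust
use std::borrow::Cow;

/// A nullable i32 column (hypothetical stand-in for a struct field).
pub type Column = Vec<Option<i32>>;

/// Intersect each field's own nulls with the struct-level validity.
/// Borrows the input untouched when there are no struct-level nulls.
pub fn mask_fields<'a>(fields: &'a [Column], struct_validity: &[bool]) -> Cow<'a, [Column]> {
    // Fast path: every row is valid, so masking would change nothing.
    if struct_validity.iter().all(|&valid| valid) {
        return Cow::Borrowed(fields);
    }
    let masked: Vec<Column> = fields
        .iter()
        .map(|field| {
            field
                .iter()
                .zip(struct_validity)
                // A value survives only if the row is valid; existing nulls stay null.
                .map(|(value, &valid)| if valid { *value } else { None })
                .collect::<Column>()
        })
        .collect();
    Cow::Owned(masked)
}
```

The slow path walks every value of every field, while the fast path only scans the validity flags once, which is consistent in spirit with the gap reported in the benchmark growing with the number of fields.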