Skip to content

Conversation

@mscolnick
Copy link
Contributor

Fixes #7399
When using mo.ui.dataframe with lazy dataframes (Ibis tables, Polars LazyFrames), the lazy behavior was being lost. The dataframe would be inadvertently collected/materialized, converting lazy Ibis tables to PyArrow tables and Polars LazyFrames to eager DataFrames. This PR refactors the lazy/eager tracking logic into a reusable utility function that properly preserves the original dataframe type.

This PR adds two utilities make_lazy and collect_and_preserve_type to help with this nuanced logic and then updates usage of this throughout (and some tests).

…ervation in transformations

Fixes #7399
When using `mo.ui.dataframe` with lazy dataframes (Ibis tables, Polars LazyFrames), the lazy behavior was being lost. The dataframe would be inadvertently collected/materialized, converting lazy Ibis tables to PyArrow tables and Polars LazyFrames to eager DataFrames. This PR refactors the lazy/eager tracking logic into a reusable utility function that properly preserves the original dataframe type.

This PR adds two utilities `make_lazy` and `collect_and_preserve_type` to help with this nuanced logic and then updates usage of this throughout (and some tests).
@vercel
Copy link

vercel bot commented Dec 4, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
marimo-docs Ready Ready Preview Comment Dec 5, 2025 2:28pm

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances lazy dataframe handling in marimo by introducing two new utility functions (make_lazy and collect_and_preserve_type) that properly preserve the original dataframe type (eager vs. lazy) throughout transformations. Previously, lazy dataframes like Ibis tables and Polars LazyFrames were inadvertently being materialized when used with mo.ui.dataframe, converting them to eager types. The new utilities track the original type and provide callback functions to restore it after operations.

Key Changes

  • Added make_lazy() utility that converts any dataframe to lazy and returns an undo callback
  • Added collect_and_preserve_type() utility that collects a lazy dataframe while preserving backend information
  • Refactored dataframe transform handlers to use these utilities instead of manual collect/lazy calls
  • Added comprehensive test coverage for type preservation across all dataframe backends

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
marimo/_utils/narwhals_utils.py Adds the new make_lazy and collect_and_preserve_type utility functions with type tracking and undo callbacks
marimo/_plugins/ui/_impl/dataframes/dataframe.py Refactors to use make_lazy with undo callback instead of manual lazy/eager tracking
marimo/_plugins/ui/_impl/dataframes/transforms/apply.py Simplifies transform application by using make_lazy and its undo callback
marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py Updates shuffle and sample handlers to use collect_and_preserve_type
marimo/_plugins/ui/_impl/altair_chart.py Updates _filter_dataframe to use make_lazy for type preservation
marimo/_plugins/ui/_impl/charts/altair_transformer.py Refactors sanitize_nan_infs to use make_lazy
tests/_utils/test_narwhals_utils.py Adds comprehensive tests for the new utility functions
tests/_plugins/ui/_impl/dataframes/test_dataframe.py Adds type preservation assertions to existing tests
tests/_plugins/ui/_impl/dataframes/test_handlers.py Updates tests to use make_lazy and verify type preservation
tests/_plugins/ui/_impl/test_table.py Adds type preservation assertions to table tests
tests/_plugins/ui/_impl/test_altair_chart.py Adds test for dataframe type preservation in chart filtering
tests/_plugins/ui/_impl/charts/test_altair_transformers.py Adds test for type preservation in NaN/Inf sanitization

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.



if TYPE_CHECKING:
UndoCallback = Callable[[nw.LazyFrame[Any]], IntoFrame]
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UndoCallback type hint specifies that it takes a nw.LazyFrame[Any], but the actual undo function accepts Union[nw.LazyFrame[Any], nw.DataFrame[Any]]. The type hint should be updated to match the implementation: Callable[[Union[nw.LazyFrame[Any], nw.DataFrame[Any]]], IntoFrame]

Suggested change
UndoCallback = Callable[[nw.LazyFrame[Any]], IntoFrame]
UndoCallback = Callable[[Union[nw.LazyFrame[Any], nw.DataFrame[Any]]], IntoFrame]

Copilot uses AI. Check for mistakes.
"df",
create_dataframes(
{"A": [1, 2, 3], "B": ["a", "b", "c"]},
# exclude=["duckdb", "ibis", "lazy-polars"],
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] There's a commented-out exclude parameter that seems to have been used during development. If this exclusion is no longer needed, the comment should be removed. If it's needed for specific cases, it should be uncommented and documented why certain backends are excluded.

Suggested change
# exclude=["duckdb", "ibis", "lazy-polars"],

Copilot uses AI. Check for mistakes.
Comment on lines 214 to 215
def test_passing_as_different_df_to_undo():
"""Test that passing a different dataframe type to undo raises an error."""
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The test name test_passing_as_different_df_to_undo suggests it should test error handling when passing a different dataframe type, but the test doesn't actually verify any error behavior. The test just verifies that the undo callback returns the correct type. Consider renaming the test to better reflect what it actually tests, e.g., test_undo_preserves_lazy_or_eager_type.

Suggested change
def test_passing_as_different_df_to_undo():
"""Test that passing a different dataframe type to undo raises an error."""
def test_undo_preserves_lazy_or_eager_type():
"""Test that the undo callback returns the correct type (eager or lazy) matching the original input."""

Copilot uses AI. Check for mistakes.
Comment on lines 652 to 653
def test_ibis_dataframe_types_is_preserved(df: IntoDataFrame):
"""Test that Ibis tables remain lazy after using mo.ui.dataframe."""
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The test name test_ibis_dataframe_types_is_preserved is misleading because it tests all dataframe types (via create_dataframes), not just Ibis tables. Consider renaming to test_dataframe_types_are_preserved or test_ui_dataframe_preserves_type to better reflect what the test actually does.

Suggested change
def test_ibis_dataframe_types_is_preserved(df: IntoDataFrame):
"""Test that Ibis tables remain lazy after using mo.ui.dataframe."""
def test_dataframe_types_are_preserved(df: IntoDataFrame):
"""Test that dataframe types are preserved after using mo.ui.dataframe."""

Copilot uses AI. Check for mistakes.
),
)
def test_ibis_dataframe_types_is_preserved(df: IntoDataFrame):
"""Test that Ibis tables remain lazy after using mo.ui.dataframe."""
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring "Test that Ibis tables remain lazy after using mo.ui.dataframe" doesn't match the test implementation, which tests all dataframe types, not just Ibis. Update the docstring to something like "Test that dataframe types are preserved after using mo.ui.dataframe."

Suggested change
"""Test that Ibis tables remain lazy after using mo.ui.dataframe."""
"""Test that dataframe types are preserved after using mo.ui.dataframe."""

Copilot uses AI. Check for mistakes.
Comment on lines 252 to 254
# Return the original data using the undo callback
lazy_data, _ = make_lazy(self._data)
return self._undo(lazy_data)
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a new lazy dataframe and discarding its undo callback is inefficient. Instead, use self._transform_container._original_df which already contains the lazy version of the original dataframe. This avoids redundant conversion and ensures consistency.

Copilot uses AI. Check for mistakes.
Comment on lines 266 to 267
lazy_data, _ = make_lazy(self._data)
return self._undo(lazy_data)
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a new lazy dataframe and discarding its undo callback is inefficient. Instead, use self._transform_container._original_df which already contains the lazy version of the original dataframe. This avoids redundant conversion and ensures consistency.

Copilot uses AI. Check for mistakes.
Comment on lines 333 to 334
nw_df = nw.from_native(df, pass_through=False)
original_backend = nw_df.implementation
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter df is already typed as nw.LazyFrame[Any], so calling nw.from_native(df, pass_through=False) is redundant. The function should directly use df.implementation and df.collect() instead. This extra conversion is unnecessary and could potentially cause issues if the LazyFrame is already a narwhals object.

Suggested change
nw_df = nw.from_native(df, pass_through=False)
original_backend = nw_df.implementation
original_backend = df.implementation

Copilot uses AI. Check for mistakes.
LOGGER.warning(
"Expected a narwhals DataFrame, got %s", type(result)
)
return result.lazy()
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the input is not a narwhals DataFrame, the function logs a warning but then calls result.lazy(), which will likely fail if result is not a narwhals object. The function should either return early after the warning or handle the non-DataFrame case more gracefully by raising an exception or attempting to convert it.

Suggested change
return result.lazy()
return result

Copilot uses AI. Check for mistakes.
Light2Dark
Light2Dark previously approved these changes Dec 5, 2025
@Light2Dark Light2Dark added the enhancement New feature or request label Dec 5, 2025
@szst11
Copy link
Contributor

szst11 commented Dec 5, 2025

@mscolnick @Light2Dark
really cool, how fast you made that change. I am impressed!

@Light2Dark
Copy link
Contributor

@mscolnick @Light2Dark really cool, how fast you made that change. I am impressed!

glad to hear it! Myles is a beast :)

@mscolnick mscolnick merged commit bbea8c7 into main Dec 5, 2025
39 of 41 checks passed
@mscolnick mscolnick deleted the ms/type-preserve branch December 5, 2025 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mo.ui.dataframe changed behaviour for ibis tables

4 participants