[SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible with pyspark-connect #45915

Closed
HyukjinKwon wants to merge 1 commit into apache:master from HyukjinKwon:SPARK-47752

@HyukjinKwon

Member

What changes were proposed in this pull request?

This PR proposes to make pyspark.pandas compatible with pyspark-connect.

Why are the changes needed?

In order for pyspark-connect to work without classic PySpark packages and dependencies.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tested in #45870. Once CI is set up there, this change will be tested properly.

Was this patch authored or co-authored using generative AI tooling?

No.

@HyukjinKwon

Member Author

cc @itholic @zhengruifeng

@itholic

itholic commented Apr 7, 2024

Contributor

LGTM once CI passes

from pandas.core.base import PandasObject
from pandas.core.dtypes.inference import is_integer

from pyspark.ml.feature import Bucketizer

Contributor

Those plotting functions are actually not supported in Spark Connect, since the underlying functions are built atop classic MLlib.
We'd need to reimplement them to support Connect. cc @xinrong-meng
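
For illustration only, and not the diff in this PR: one common way to keep a module importable when a classic-only dependency may be missing is to defer the import to the call site, so the failure surfaces only when the feature is actually used. The helper name `load_optional` is hypothetical and not part of PySpark.

```python
import importlib


def load_optional(module_name: str, attr: str):
    """Import `attr` from `module_name` at call time instead of module import time.

    Hypothetical helper: raises a descriptive ImportError if the module
    (for example, a classic-only package) is not installed.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{attr} requires {module_name}, which is not available"
        ) from exc
    return getattr(module, attr)


# Usage with a standard-library module, so the sketch runs anywhere.
sqrt = load_optional("math", "sqrt")
result = sqrt(4)
```

A plotting helper could call something like `load_optional("pyspark.ml.feature", "Bucketizer")` inside the function body, assuming that import path is only valid with classic PySpark. This keeps the module importable but does not make the feature work under Connect.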

@HyukjinKwon

Member Author

Merged to master.
