[SPARK-52930][CONNECT] Use DataType.Array/Map for Array/Map Literals#51653
Closed
heyihong wants to merge 1 commit into
Closed
[SPARK-52930][CONNECT] Use DataType.Array/Map for Array/Map Literals#51653heyihong wants to merge 1 commit into
heyihong wants to merge 1 commit into
Conversation
Contributor
Author
78f2bb7 to
4f64261
Compare
Contributor
There was a problem hiding this comment.
I personally feel it might be better to introduce new messages for this purpose, so that we can minimize the code changes, and the conversion logic in the server side can be more clear
Contributor
Author
There was a problem hiding this comment.
@zhengruifeng FYI, I have simplified the implementation a bit. It is possible to minimize the code changes and make the conversion logic on the server side clearer without introducing new messages.
ac9d081 to
2ca98c4
Compare
aa2f322 to
bb44b23
Compare
zhengruifeng
approved these changes
Aug 26, 2025
Contributor
|
merged to master |
dongjoon-hyun
added a commit
to apache/spark-connect-swift
that referenced
this pull request
Oct 1, 2025
…th `4.1.0-preview2` ### What changes were proposed in this pull request? This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`. ### Why are the changes needed? There are many changes from Apache Spark 4.1.0. - apache/spark#52342 - apache/spark#52256 - apache/spark#52271 - apache/spark#52242 - apache/spark#51473 - apache/spark#51653 - apache/spark#52072 - apache/spark#51561 - apache/spark#51563 - apache/spark#51489 - apache/spark#51507 - apache/spark#51462 - apache/spark#51464 - apache/spark#51442 To use the latest bug fixes and new messages to develop for new features of `4.1.0-preview2`. ``` $ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git $ cd spark/sql/connect/common/src/main/protobuf/ $ protoc --swift_out=. spark/connect/*.proto $ protoc --grpc-swift_out=. spark/connect/*.proto // Remove empty GRPC files $ cd spark/connect $ grep 'This file contained no services' * catalog.grpc.swift:// This file contained no services. commands.grpc.swift:// This file contained no services. common.grpc.swift:// This file contained no services. example_plugins.grpc.swift:// This file contained no services. expressions.grpc.swift:// This file contained no services. ml_common.grpc.swift:// This file contained no services. ml.grpc.swift:// This file contained no services. pipelines.grpc.swift:// This file contained no services. relations.grpc.swift:// This file contained no services. types.grpc.swift:// This file contained no services. $ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift ``` ### Does this PR introduce _any_ user-facing change? Pass the CIs. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #250 from dongjoon-hyun/SPARK-53777. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR introduces a transition to use
DataType.ArrayandDataType.Mapfor array and map literals throughout the Spark Connect codebase for Array/Map Literals.While the Spark Connect server supports both new and old data type fields, in this change, the new data type fields are set only in ColumnNodeToProtoConverter for the Spark Connect Scala client. All other components (e.g., ML, Python) still use the old data type fields because literal values are used not only in requests but also in responses, making it difficult to maintain compatibility—clients using older versions may not recognize the new fields in the response. Deprecation and the transition to the new fields require a gradual migration. The key changes include:
Protocol Buffer Updates:
expressions.prototo add newdata_typefields forArrayandMapmessageselement_type,key_type, andvalue_typefields in favor of the unifieddata_typeapproachexpressions_pb2.py,expressions_pb2.pyi) to reflect these changesCore Implementation Changes:
LiteralValueProtoConverter.scalawith new internal methodtoLiteralProtoBuilderInternalthat acceptsToLiteralProtoOptionsLiteralExpressionProtoConverter.scalato support inference of array and map data typescolumnNodeSupport.scalato use the newtoLiteralProtoBuilderWithOptionsmethod withuseDeprecatedDataTypeFieldsset tofalseWhy are the changes needed?
The changes are needed to improve Spark's data type handling for array and map literals:
Nullability of Array/Map literals are now included in the DataType.Array/Map: This ensures that nullability information is properly captured and handled within the data type structure itself.
Work better with type inference by including all type information in one field: By consolidating all type information into a single field, it is easier to infer data types for complex data structures.
Does this PR introduce any user-facing change?
Yes. Previously, the nullability of arrays and map values using typedlit was not preserved (which I believe was a bug). It is now preserved. Please see the changes in ClientE2ETestSuite for details.
How was this patch tested?
build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"build/sbt "connect-client-jvm/testOnly *ClientE2ETestSuite -- -z SPARK-52930"Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.4.5