[SPARK-52448][CONNECT] Add simplified Struct Expression.Literal#51561
[SPARK-52448][CONNECT] Add simplified Struct Expression.Literal#51561heyihong wants to merge 1 commit into
Conversation
5d8e764 to
9d68335
Compare
There was a problem hiding this comment.
Thank you for this improvement. So we will contain the field name in every element, the space complexity would be O(n) compared to O(1) in the deprecated way? So what's the perf gain from this change?
There was a problem hiding this comment.
My understanding is that the space complexity in the deprecated approach is also O(n), since every element includes the field name in the struct_type field, correct?
There was a problem hiding this comment.
On second thought, it is not necessary to include the field name/data type in every element, since not setting the data_type field should be sufficient for saving space.
1db4711 to
f38223d
Compare
|
Merged to master. |
| // | ||
| // Whether data_type_struct.fields.data_type should be set depends on | ||
| // whether each field's type can be inferred from the elements field. | ||
| DataType.Struct data_type_struct = 3; |
There was a problem hiding this comment.
@heyihong sorry for the late comments, what about introducing a new message for this purpose?
for example, SimplifiedStruct or LightStruct, my feeling is that the conversion in the server side become too complex
There was a problem hiding this comment.
@zhengruifeng FYI, I simplified the implementation in this PR: #52098, but it's not necessary to introduce a new message.
…th `4.1.0-preview2` ### What changes were proposed in this pull request? This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`. ### Why are the changes needed? There are many changes from Apache Spark 4.1.0. - apache/spark#52342 - apache/spark#52256 - apache/spark#52271 - apache/spark#52242 - apache/spark#51473 - apache/spark#51653 - apache/spark#52072 - apache/spark#51561 - apache/spark#51563 - apache/spark#51489 - apache/spark#51507 - apache/spark#51462 - apache/spark#51464 - apache/spark#51442 To use the latest bug fixes and new messages to develop for new features of `4.1.0-preview2`. ``` $ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git $ cd spark/sql/connect/common/src/main/protobuf/ $ protoc --swift_out=. spark/connect/*.proto $ protoc --grpc-swift_out=. spark/connect/*.proto // Remove empty GRPC files $ cd spark/connect $ grep 'This file contained no services' * catalog.grpc.swift:// This file contained no services. commands.grpc.swift:// This file contained no services. common.grpc.swift:// This file contained no services. example_plugins.grpc.swift:// This file contained no services. expressions.grpc.swift:// This file contained no services. ml_common.grpc.swift:// This file contained no services. ml.grpc.swift:// This file contained no services. pipelines.grpc.swift:// This file contained no services. relations.grpc.swift:// This file contained no services. types.grpc.swift:// This file contained no services. $ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift ``` ### Does this PR introduce _any_ user-facing change? Pass the CIs. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #250 from dongjoon-hyun/SPARK-53777. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR adds a new
data_type_structfield to the protobuf definition for struct literals in Spark Connect, addressing the ambiguity issues with the existingstruct_typefield. The changes include:Protobuf Schema Update: Added a new
data_type_structfield of typeDataType.Structto theLiteral.Structmessage inexpressions.proto, while marking the existingstruct_typefield as deprecated.Enhanced Struct Conversion Logic: Updated
LiteralValueProtoConverter.scalato:data_type_structfield when available for more precise struct type definitionstruct_typefieldWhy are the changes needed?
The current Expression.Struct literal is somewhat overcomplicated since it duplicates most of the information its fields already have. This is bulky to send over the wire, and it can be ambiguous.
Does this PR introduce any user-facing change?
No. This PR maintains backward compatibility with existing struct literal implementations. Existing code using the deprecated
struct_typefield will continue to work without modification.How was this patch tested?
build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.2.4