[SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix#42686
[SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix#42686WweiL wants to merge 4 commits into
Conversation
|
Thank you, @WweiL . SPARK-44971 seems to be filed as a new feature (maybe by default value). You can switch to |
| timestamp=j["timestamp"], | ||
| batchId=j["batchId"], | ||
| batchDuration=j["batchDuration"], | ||
| batchDuration=j["batchDuration"] if "batchDuration" in j else None, |
There was a problem hiding this comment.
I'm wondering when Spark receive a JSON file without batchDuration? We are inside the same Spark versions, aren't we? Is this listener able to listen old Spark?
There was a problem hiding this comment.
So this is currently how this method is used: in spark connect, the way the listener works, is that
- The user's listener code is serialized and sent to spark server
- The server starts a scala listener, in which starts a python process (essentially as another connect client), that runs the user's code
- Each time a new event comes in, the event on java side is serialized to json and passed to server python process, which calls this
fromJsonmethod to convert it back to the actualStreamingQueryProgressobject
But before #42077 in 3.5, that field is not added in the jvm json method of StreamingQueryProgress. Here it excepts that to always be presented, hence we get an error
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
…971-fromJson-bugfix
|
Hi @HyukjinKwon can we merge this : ) |
|
Merged |
### What changes were proposed in this pull request? The `fromJson` method for `StreamingQueryProgress` excepts the field `batchDuration` is in the dict. That method is used internally for converting a json representation of `StreamingQueryProgress` into python object, commonly created in the Scala side `json` method of the same object. But the `batchDuration` field is not there before #42077, which is only merged to 4.0. Therefore we add a catch there to prevent this method from failing. ### Why are the changes needed? Necessary bug fix ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #42686 from WweiL/SPARK-44971-fromJson-bugfix. Lead-authored-by: Wei Liu <wei.liu@databricks.com> Co-authored-by: Wei Liu <z920631580@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
|
@HyukjinKwon Can we just merge this to 3.5 |
|
ah nvm I set the merge target to 3.5, should be fine |
|
Yup, merged to 3.5. |

What changes were proposed in this pull request?
The
fromJsonmethod forStreamingQueryProgressexcepts the fieldbatchDurationis in the dict.That method is used internally for converting a json representation of
StreamingQueryProgressinto python object, commonly created in the Scala sidejsonmethod of the same object.But the
batchDurationfield is not there before #42077, which is only merged to 4.0. Therefore we add a catch there to prevent this method from failing.Why are the changes needed?
Necessary bug fix
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit test
Was this patch authored or co-authored using generative AI tooling?
No