[SPARK-44484][SS] Add batchDuration to StreamingQueryProgress json method #42077
WweiL wants to merge 2 commits into
Perhaps reasonable, but I think this should be Spark 4.0 only. cc @HeartSaVioR FYI
Would you mind explaining your judgment that this should be 4.0 only? Is it because we have already cut the branch for 3.5, or do you think this is a sort of breaking change? Custom metrics aside, we have been adding fields in minor releases, so I'd be surprised if it's the latter.
@HeartSaVioR My judgment is based on the following: The If there is anything wrong, please correct me
OK, so it's the former, which I agree with. We should make an exception if this is somehow tied to Spark Connect.
OK ~
@LuciferYang @HeartSaVioR Thanks guys. This is not Connect-related IMO. The ongoing Connect streaming listener work still functions without this change. I think it's fine for this to be 4.0 only. I just got pinged about why they can't see that field in PySpark and realized no one ever added it to the json method...
Not sure why these tests are failing, checking.
CI failures don't seem to be related.
Thanks! Merging to master. |
…thod

### What changes were proposed in this pull request?
Add the missing field batchDuration to StreamingQueryProgress json method. Also modify tests accordingly

### Why are the changes needed?
Add a missing field

### Does this PR introduce _any_ user-facing change?
Probably yes - in their call to `query.lastProgress` or `query.recentProgress` and inside listener this new field will show up

### How was this patch tested?
Existing unit tests

Closes apache#42077 from WweiL/SPARK-44484-missing-json-field-progress.

Authored-by: Wei Liu <wei.liu@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
### What changes were proposed in this pull request?
The `fromJson` method for `StreamingQueryProgress` expects the field `batchDuration` to be in the dict. That method is used internally to convert a JSON representation of `StreamingQueryProgress` into a Python object; the JSON is commonly created by the Scala-side `json` method of the same object. But the `batchDuration` field was not there before #42077, which was only merged to 4.0. Therefore we add a catch there to prevent this method from failing.

### Why are the changes needed?
Necessary bug fix

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit test

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42686 from WweiL/SPARK-44971-fromJson-bugfix.

Lead-authored-by: Wei Liu <wei.liu@databricks.com>
Co-authored-by: Wei Liu <z920631580@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
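The failure mode described in this follow-up can be illustrated with a minimal sketch. It is not PySpark's actual `fromJson` code: `ProgressSketch` and its attributes are hypothetical stand-ins, and only the JSON keys `batchId` and `batchDuration` are taken from the real progress payload. The idea is simply to treat `batchDuration` as optional when parsing.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressSketch:
    """Hypothetical stand-in for StreamingQueryProgress; not PySpark's class."""

    batch_id: int
    # Milliseconds; None when the JSON payload carries no batchDuration key.
    batch_duration: Optional[int]

    @classmethod
    def from_json(cls, payload: str) -> "ProgressSketch":
        j = json.loads(payload)
        return cls(
            batch_id=j["batchId"],  # required: a missing key raises KeyError
            batch_duration=j.get("batchDuration"),  # optional: falls back to None
        )


# JSON as a json method without the field would emit it.
old_style = ProgressSketch.from_json('{"batchId": 7}')
# JSON as emitted once the field is included.
new_style = ProgressSketch.from_json('{"batchId": 7, "batchDuration": 1200}')
```

With a plain `j["batchDuration"]` lookup the first call would raise `KeyError`, which is the kind of failure the commit message says the added catch prevents.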
What changes were proposed in this pull request?
Add the missing field batchDuration to StreamingQueryProgress json method. Also modify tests accordingly
Why are the changes needed?
Add a missing field
Does this PR introduce any user-facing change?
Probably yes - in their call to `query.lastProgress` or `query.recentProgress` and inside listener this new field will show up
How was this patch tested?
Existing unit tests
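To make the user-facing effect concrete, here is a small sketch of what a consumer of the progress JSON would see. It assumes a simplified payload with only the `batchId` and `batchDuration` keys; `progress_json` is a hypothetical helper, not Spark's `json` method, and the real payload contains many more fields.

```python
import json


def progress_json(batch_id: int, batch_duration_ms: int) -> str:
    """Hypothetical, simplified model of a progress JSON payload."""
    return json.dumps({"batchId": batch_id, "batchDuration": batch_duration_ms})


# A consumer parsing the payload can now read the duration directly.
last_progress = json.loads(progress_json(batch_id=3, batch_duration_ms=850))
duration_ms = last_progress["batchDuration"]
```

Code that must also handle payloads produced without this field may prefer `last_progress.get("batchDuration")` over direct indexing.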