[SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener#42664
WweiL wants to merge 4 commits into
### What changes were proposed in this pull request?

Add several new test cases for streaming foreachBatch and streaming query listener events to test various scenarios.

### Why are the changes needed?

More tests are better.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test-only change.

Closes apache#42521 from WweiL/SPARK-44435-tests-foreachBatch-listener.

Authored-by: Wei Liu <wei.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 2d44848)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…thod

### What changes were proposed in this pull request?

Add the missing field batchDuration to the StreamingQueryProgress json method. Also modify tests accordingly.

### Why are the changes needed?

Add a missing field.

### Does this PR introduce _any_ user-facing change?

Probably yes: the new field will show up in users' calls to `query.lastProgress` or `query.recentProgress`, and inside listeners.

### How was this patch tested?

Existing unit tests.

Closes apache#42077 from WweiL/SPARK-44484-missing-json-field-progress.

Authored-by: Wei Liu <wei.liu@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
      ("name" -> JString(name)) ~
      ("timestamp" -> JString(timestamp)) ~
      ("batchId" -> JInt(batchId)) ~
    + ("batchDuration" -> JInt(batchDuration)) ~
Thank you for making a new PR.
However, given the previous discussion on SPARK-44484, we cannot backport a new feature like this under a different title, "Tests for foreachBatch and Listener".
If SPARK-44484 is unavoidable for SPARK-44435, I'd prefer to stop backporting because we are already in the RC2 stage, @WweiL.
@dongjoon-hyun I see. I'll close this. Thanks for the context.
Thank you for closing this, @WweiL.
… json method"
This reverts commit ba49adb.
      batchId=j["batchId"],
    - batchDuration=j["batchDuration"],
    + # before spark 4.0, batchDuration is not in the json method of jvm side StreamingQueryProgress
    + batchDuration=j["batchDuration"] if "batchDuration" in j else None,
@dongjoon-hyun Hi Dongjoon, sorry for the back and forth. On second thought, I found that the newly added tests and the test failure in #42521 (comment) actually uncovered a bug. I had assumed that batchDuration is always present in the JSON passed in, but before 4.0 it is not there.
Since we are not including the change from #42077, this check is needed. I reverted that commit and added this check.
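For readers following the thread, here is a minimal, self-contained sketch of the defensive check described above. It is not the PySpark implementation: `parse_progress` and the dictionary it returns are hypothetical stand-ins for illustration, and only the `batchId` and `batchDuration` keys are taken from the diff.

```python
import json
from typing import Any, Dict


def parse_progress(json_str: str) -> Dict[str, Any]:
    """Hypothetical helper (not a PySpark API) that reads a progress JSON string."""
    j = json.loads(json_str)
    return {
        "batchId": j["batchId"],
        # The JSON may omit batchDuration (per the thread, the JVM side does
        # not emit it before 4.0), so fall back to None instead of raising KeyError.
        "batchDuration": j["batchDuration"] if "batchDuration" in j else None,
    }


# JSON without the field, as produced by an older JVM side.
print(parse_progress('{"batchId": 3}'))
# JSON with the field present.
print(parse_progress('{"batchId": 3, "batchDuration": 120}'))
```

The conditional expression is equivalent to `j.get("batchDuration")`; the longer form is kept here only to mirror the line in the diff.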
dongjoon-hyun left a comment
It's totally fine. :)
If this is a bug fix, please file a new JIRA to proceed with it instead of using SPARK-44435 or SPARK-44484. The JIRA title should make clear that it is a bug fix. We can merge that first.
Fixed in 7be69bf

### What changes were proposed in this pull request?

Add several new test cases for streaming foreachBatch and streaming query listener events to test various scenarios.
Also merge this PR to 3.5 to fix the test error: #42077

### Why are the changes needed?

More tests are better.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test-only change.