Skip to content

[SPARK-44484][SS]Add batchDuration to StreamingQueryProgress json method#42077

Closed
WweiL wants to merge 2 commits into
apache:masterfrom
WweiL:SPARK-44484-missing-json-field-progress
Closed

[SPARK-44484][SS]Add batchDuration to StreamingQueryProgress json method#42077
WweiL wants to merge 2 commits into
apache:masterfrom
WweiL:SPARK-44484-missing-json-field-progress

Conversation

@WweiL

@WweiL WweiL commented Jul 19, 2023

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Add the missing field batchDuration to StreamingQueryProgress json method. Also modify tests accordingly

Why are the changes needed?

Add a missing field

Does this PR introduce any user-facing change?

Probably yes - in their call to query.lastProgress or query.recentProgress and inside listener this new field will show up

How was this patch tested?

Existing unit tests

@LuciferYang

Copy link
Copy Markdown
Contributor

Perhaps reasonable, but I think this should be Spark 4.0 only

cc @HeartSaVioR FYI

@HeartSaVioR

Copy link
Copy Markdown
Contributor

Would you mind explain your judgement for only 4.0? Is it because we cut the branch for 3.5 already, or do you think this is a sort of breaking change?

Despite of being custom metrics, we have been adding fields for minor releases, so I'd be surprised if the case is latter.

@LuciferYang

LuciferYang commented Jul 20, 2023

Copy link
Copy Markdown
Contributor

@HeartSaVioR My judgment is based on the following:

The Type of this pr is as a New Feature, and I did not find the Jira number in the exception list: https://lists.apache.org/thread/119vxmysrpdhtxv8hdrmb1ob8klb6ows

If there is anything wrong, please correct me

@HeartSaVioR

Copy link
Copy Markdown
Contributor

OK so that's former, which I agree. We should make an exception if this is somehow tied to Spark connect.

@LuciferYang

Copy link
Copy Markdown
Contributor

OK ~

@WweiL

WweiL commented Jul 20, 2023

Copy link
Copy Markdown
Contributor Author

@LuciferYang @HeartSaVioR Thanks guys. This is not connect related IMO. Ongoing connect streaming listener still work without this change. I think it's fine to be 4.0 only

Just got pinged why they can't see that field in pyspark and realized no one ever add this to the json method...

@WweiL

WweiL commented Jul 20, 2023

Copy link
Copy Markdown
Contributor Author

Not sure why these tests are failing, checking

@HeartSaVioR

Copy link
Copy Markdown
Contributor

CI failures don't seem to be related.
https://github.com/WweiL/oss-spark/actions/runs/5603998745/job/15209877836

@HeartSaVioR HeartSaVioR left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@HeartSaVioR

Copy link
Copy Markdown
Contributor

Thanks! Merging to master.

WweiL added a commit to WweiL/oss-spark that referenced this pull request Aug 24, 2023
…thod

### What changes were proposed in this pull request?

Add the missing field batchDuration to StreamingQueryProgress json method. Also modify tests accordingly

### Why are the changes needed?

Add a missing field

### Does this PR introduce _any_ user-facing change?

Probably yes - in their call to `query.lastProgress` or `query.recentProgress` and inside listener this new field will show up

### How was this patch tested?

Existing unit tests

Closes apache#42077 from WweiL/SPARK-44484-missing-json-field-progress.

Authored-by: Wei Liu <wei.liu@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
HyukjinKwon pushed a commit that referenced this pull request Aug 31, 2023
### What changes were proposed in this pull request?

The `fromJson` method for `StreamingQueryProgress` excepts the field `batchDuration` is in the dict.
That method is used internally for converting a json representation of `StreamingQueryProgress` into python object, commonly created in the Scala side `json` method of the same object.

But the `batchDuration` field is not there before #42077, which is only merged to 4.0. Therefore we add a catch there to prevent this method from failing.

### Why are the changes needed?

Necessary bug fix

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #42686 from WweiL/SPARK-44971-fromJson-bugfix.

Lead-authored-by: Wei Liu <wei.liu@databricks.com>
Co-authored-by: Wei Liu <z920631580@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants