Skip to content

[SPARK-29777][SparkR] SparkR::cleanClosure aggressively removes a function required by user function#26429

Closed
falaki wants to merge 5 commits into
apache:masterfrom
falaki:SPARK-29777
Closed

[SPARK-29777][SparkR] SparkR::cleanClosure aggressively removes a function required by user function#26429
falaki wants to merge 5 commits into
apache:masterfrom
falaki:SPARK-29777

Conversation

@falaki

@falaki falaki commented Nov 7, 2019

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

The implementation for walking through the user function AST and picking referenced variables and functions, had an optimization to skip a branch if it had already seen it. This runs into an interesting problem in the following example

df <- createDataFrame(data.frame(x=1))
f1 <- function(x) x + 1
f2 <- function(x) f1(x) + 2
dapplyCollect(df, function(x) { f1(x); f2(x) })

Results in error:

org.apache.spark.SparkException: R computation failed with
 Error in f1(x) : could not find function "f1"
Calls: compute -> computeFunc -> f2

Why are the changes needed?

Bug fix

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests in test_utils.R

@mengxr mengxr self-requested a review November 7, 2019 23:04
@falaki

falaki commented Nov 7, 2019

Copy link
Copy Markdown
Contributor Author

@mengxr can we cherry-pick this to 2.4.x?

@SparkQA

SparkQA commented Nov 8, 2019

Copy link
Copy Markdown

Test build #113409 has finished for PR 26429 at commit 51481fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment thread R/pkg/R/utils.R
found <- sapply(funcList, function(func) {
ifelse(identical(func, obj), TRUE, FALSE)
})
if (sum(found) > 0) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. The change would lead to dead loop. We should make sure newEnv contains the (cleaned) node.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the dead loop case when the same function with the same environments is recursively called in the closure?

@SparkQA

SparkQA commented Nov 9, 2019

Copy link
Copy Markdown

Test build #113493 has finished for PR 26429 at commit 10925bf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Copy link
Copy Markdown
Member

Retest this please.

@SparkQA

SparkQA commented Nov 10, 2019

Copy link
Copy Markdown

Test build #113519 has finished for PR 26429 at commit 10925bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@falaki

falaki commented Nov 11, 2019

Copy link
Copy Markdown
Contributor Author

@shivaram and @felixcheung do you guys have any input on this?

@falaki

falaki commented Nov 13, 2019

Copy link
Copy Markdown
Contributor Author

@HyukjinKwon would you please take a look?

@HyukjinKwon

Copy link
Copy Markdown
Member

retest this please

Comment thread R/pkg/R/utils.R Outdated

@HyukjinKwon HyukjinKwon left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. one question and one nit. Wait, I take it back. Let me take another look.

Comment thread R/pkg/R/utils.R Outdated
@SparkQA

SparkQA commented Nov 14, 2019

Copy link
Copy Markdown

Test build #113738 has finished for PR 26429 at commit 10925bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment thread R/pkg/tests/fulltests/test_utils.R Outdated
f1 <- function(x) x + 1
f2 <- function(x) f1(x) + 2
user_func <- function(x) { f1(x); f2(x) }
c_user_func_env <- environment(cleanClosure(user_func))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think naming c_user_func_env is not preferred (it was discussed here #17590 (comment)) before.
I think it should rather be cUserFuncEnv.

Google guide seems updated rapidly (https://google.github.io/styleguide/Rguide.html).. We will have to update the guide ...

@HyukjinKwon

Copy link
Copy Markdown
Member

Looks fine except two nits: #26429 (comment) and #26429 (comment)

@falaki

falaki commented Nov 14, 2019

Copy link
Copy Markdown
Contributor Author

Thanks for reviewing @HyukjinKwon I addressed your comments.

@SparkQA

SparkQA commented Nov 15, 2019

Copy link
Copy Markdown

Test build #113814 has finished for PR 26429 at commit 6ad35c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

Copy link
Copy Markdown
Member

retest this please

@SparkQA

SparkQA commented Nov 15, 2019

Copy link
Copy Markdown

Test build #113827 has finished for PR 26429 at commit 6ad35c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

HyukjinKwon commented Nov 15, 2019

Copy link
Copy Markdown
Member

The test cases should be fixed as of 65a189c

@srowen

srowen commented Nov 16, 2019

Copy link
Copy Markdown
Member

Just needs a rebase?

@felixcheung

felixcheung commented Nov 17, 2019 via email

Copy link
Copy Markdown
Member

@falaki

falaki commented Nov 18, 2019

Copy link
Copy Markdown
Contributor Author

Rebased this. Sorry for the delay.

@SparkQA

SparkQA commented Nov 18, 2019

Copy link
Copy Markdown

Test build #114031 has finished for PR 26429 at commit cc9c416.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

Copy link
Copy Markdown
Member

Let's give a try.
Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants