[SPARK-20726][SPARKR] wrapper for SQL broadcast #17965
Points to discuss:
Test build #76874 has finished for PR 17965 at commit
Test build #76875 has finished for PR 17965 at commit
      sdf <- callJMethod(object@sdf, "alias", data)
      dataFrame(sdf)
    })

nit: one empty line instead of two

    #'
    #' Return a new SparkDataFrame marked as small enough for use in broadcast joins.
    #'
    #' Equivalent to hint(x, "broadcast).

\code{hint(x, "broadcast")}

     #' sumRDD <- lapply(rdd, useBroadcast)
     #'}
    -broadcast <- function(sc, object) {
    +broadcast_ <- function(sc, object) {

please change this to broadcastRDD like other functions

right, generally this is how we have handled name conflict with an existing RDD method.
we should be removing the internal only RDD methods at some point
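
For readers following the naming discussion, here is a minimal sketch of why the internal helper has to move out of the way before the new generic is defined. It uses only base R and the `methods` package, no Spark. `MockFrame` and both function bodies are hypothetical stand-ins, not the SparkR implementation.

```r
library(methods)

# Hypothetical stand-in for the internal RDD helper; the body is illustrative only.
broadcast <- function(sc, object) {
  paste("RDD broadcast of", object)
}

# Rename the internal helper first so the public name is free for the generic.
broadcastRDD <- broadcast
rm(broadcast)

# Hypothetical stand-in for SparkDataFrame.
setClass("MockFrame", representation(name = "character"))

# The public generic now owns the name `broadcast`.
setGeneric("broadcast", function(x) { standardGeneric("broadcast") })
setMethod("broadcast", signature(x = "MockFrame"), function(x) {
  paste("broadcast hint on", x@name)
})

rdd_result <- broadcastRDD(NULL, "lookup table")
df_result <- broadcast(new("MockFrame", name = "small_df"))
```

Keeping the two-argument helper under a separate name means existing internal callers keep working while the exported one-argument generic dispatches on the data frame class.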
    #' @rdname broadcast
    #' @export
    setGeneric("broadcast", function(x) { standardGeneric("broadcast") })

this list is sorted alphabetically within this section

there is a rd for broadcast already though https://github.com/zero323/spark/blob/397ab1f7b4b4e2b9e51b697c92e3be197fed4554/R/pkg/R/generics.R#L376
we probably need to remove that one

> this list is sorted alphabetically within this section

Looks like it used to be at some point, but those days are long gone. I can reorder it right now, but this means rearranging a whole section.

ouch it is # and not #' on this line https://github.com/zero323/spark/blob/397ab1f7b4b4e2b9e51b697c92e3be197fed4554/R/pkg/R/generics.R#L376
let's leave the sorting for now. we really need to stick with one method

let's fix up the sorting when 2.2.0 is released - it would help to minimize major changes for now to make it easier to merge fixes, just in case.
Test build #76898 has finished for PR 17965 at commit
felixcheung left a comment
I think we need to add broadcast to NAMESPACE
https://github.com/apache/spark/blob/master/R/pkg/NAMESPACE
(testthat is running inside the SparkR namespace)
Test build #76912 has finished for PR 17965 at commit
felixcheung left a comment
LGTM, Jenkins passed, AppVeyor passed before
merged to master
## What changes were proposed in this pull request?

- Adds R wrapper for `o.a.s.sql.functions.broadcast`.
- Renames `broadcast` to `broadcast_`.

## How was this patch tested?

Unit tests, check `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>

Closes apache#17965 from zero323/SPARK-20726.
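
A hedged usage sketch of the wrapper described above, assuming a SparkR build that exports `broadcast` and a local Spark installation. The function name `broadcast_join_demo`, the sample data, and the column names are made up for illustration. The code is wrapped in a function so that defining it does not require Spark.

```r
# Sketch only: running it needs SparkR with broadcast() exported and a local Spark install.
broadcast_join_demo <- function() {
  SparkR::sparkR.session(master = "local[1]")
  on.exit(SparkR::sparkR.session.stop())

  # Hypothetical sample data: a larger fact table and a small lookup table.
  large <- SparkR::createDataFrame(data.frame(id = 1:100, value = rnorm(100)))
  small <- SparkR::createDataFrame(
    data.frame(key = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)
  )

  # Mark the small side as a broadcast candidate before joining.
  joined <- SparkR::join(large, SparkR::broadcast(small), large$id == small$key)

  # Print the physical plan so the chosen join strategy can be inspected.
  SparkR::explain(joined)

  SparkR::collect(joined)
}
```

Calling `broadcast_join_demo()` in a session where Spark is available returns the joined rows as a local data frame. Per the doc comment under review, `hint(x, "broadcast")` is described as equivalent to `broadcast(x)`.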