Skip to content

[SPARK-17912][SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch#15467

Closed
kiszk wants to merge 2 commits into
apache:masterfrom
kiszk:columnarrefactor
Closed

[SPARK-17912][SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch#15467
kiszk wants to merge 2 commits into
apache:masterfrom
kiszk:columnarrefactor

Conversation

@kiszk

@kiszk kiszk commented Oct 13, 2016

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

This PR refactors the code generation part to get data from ColumnarVector and ColumnarBatch by using a trait ColumnarBatchScan for ease of reuse. This is because this part will be reused by several components (e.g. parquet reader, Dataset.cache, and others) since ColumnarBatch will be first citizen.

This PR is a part of #15219. In advance, this PR makes the code generation for ColumnarVector and ColumnarBatch reuseable as a trait. In general, this is very useful for other components from the reuseability view, too.

How was this patch tested?

tested existing test suites

@kiszk kiszk changed the title [SPARK-17192][SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch [SPARK-17912][SQL] Refactor code generation to get data for ColumnVector/ColumnarBatch Oct 13, 2016
@SparkQA

SparkQA commented Oct 13, 2016

Copy link
Copy Markdown

Test build #66897 has finished for PR 15467 at commit 5d78d01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk

kiszk commented Oct 17, 2016

Copy link
Copy Markdown
Member Author

@ericl, could you please review this? cc @davies

@ericl

ericl commented Oct 17, 2016

Copy link
Copy Markdown
Contributor

Will do

On Sun, Oct 16, 2016, 11:35 PM Kazuaki Ishizaki notifications@github.com
wrote:

@ericl https://github.com/ericl, could you please review this? cc
@davies https://github.com/davies


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#15467 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAA6Smf05GrW4y2953NydrjUkz_gr4GWks5q0xcogaJpZM4KWFqi
.

@kiszk

kiszk commented Oct 18, 2016

Copy link
Copy Markdown
Member Author

@ericl thank you very much

@kiszk

kiszk commented Nov 29, 2016

Copy link
Copy Markdown
Member Author

ping @ericl

@kiszk

kiszk commented Dec 6, 2016

Copy link
Copy Markdown
Member Author

@andrewor14 would it be possible to review this code? Or, could you please create the similar PR from #13899?
Your original refactoring was great. In addition to that, I need to add some field for some cases.

@kiszk

kiszk commented Jan 8, 2017

Copy link
Copy Markdown
Member Author

@ericl, would it be possible to review this?

val scanTimeMetric = metricTerm(ctx, "scanTime")
val scanTimeTotalNs = ctx.freshName("scanTime")
ctx.addMutableState("long", scanTimeTotalNs, s"$scanTimeTotalNs = 0;")
val incReadBatches = if (!enableScanStatistics) "" else {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave this out from this refactoring?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I did. Even if it does not exists, it works for now.

val colVars = output.indices.map(i => ctx.freshName("colInstance" + i))
val columnAssigns = colVars.zipWithIndex.map { case (name, i) =>
ctx.addMutableState(columnVectorClz, name, s"$name = null;")
val index = if (columnIndexes == null) i else columnIndexes(i)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment: maybe we can not introduce columnIndexes for now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SparkQA

SparkQA commented Jan 18, 2017

Copy link
Copy Markdown

Test build #71597 has finished for PR 15467 at commit 9a53219.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private[sql] trait ColumnarBatchScan extends CodegenSupport {

val inMemoryTableScan: InMemoryTableScanExec = null

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is unused right?

@sameeragarwal sameeragarwal left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@davies

davies commented Jan 19, 2017

Copy link
Copy Markdown
Contributor

Merging this into master, thanks!

@asfgit asfgit closed this in 148a84b Jan 19, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ctor/ColumnarBatch

## What changes were proposed in this pull request?

This PR refactors the code generation part to get data from `ColumnarVector` and `ColumnarBatch` by using a trait `ColumnarBatchScan` for ease of reuse. This is because this part will be reused by several components (e.g. parquet reader, Dataset.cache, and others) since `ColumnarBatch` will be first citizen.

This PR is a part of apache#15219. In advance, this PR makes the code generation for  `ColumnarVector` and `ColumnarBatch` reuseable as a trait. In general, this is very useful for other components from the reuseability view, too.
## How was this patch tested?

tested existing test suites

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#15467 from kiszk/columnarrefactor.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…ctor/ColumnarBatch

## What changes were proposed in this pull request?

This PR refactors the code generation part to get data from `ColumnarVector` and `ColumnarBatch` by using a trait `ColumnarBatchScan` for ease of reuse. This is because this part will be reused by several components (e.g. parquet reader, Dataset.cache, and others) since `ColumnarBatch` will be first citizen.

This PR is a part of apache#15219. In advance, this PR makes the code generation for  `ColumnarVector` and `ColumnarBatch` reuseable as a trait. In general, this is very useful for other components from the reuseability view, too.
## How was this patch tested?

tested existing test suites

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#15467 from kiszk/columnarrefactor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants