
[BEAM-11712] Run TPC-DS via BeamSQL and Spark runner #14373

Closed
aromanenko-dev wants to merge 2 commits into apache:master from aromanenko-dev:BEAM-11712-TPCDS-Spark

aromanenko-dev (Contributor) commented Mar 30, 2021

  • Make all paths configurable so that the benchmark can run on systems other than Dataflow
  • Adjust build.gradle to follow the standard project conventions and make it possible to run via SparkRunner
  • Most of the code changes are Spotless and SpotBugs fixes
  • Successfully tested with Query3 and SparkRunner 2
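The first bullet can be illustrated with a small, hypothetical sketch in plain Java (no Beam dependency): paths are resolved from `--key=value` flags with a fallback default instead of being hardcoded, so the same code can point at a local directory, HDFS, or GCS. The flag name `dataDirectory`, the default paths, and the `<table>.dat` file layout are assumptions for illustration, not the benchmark's actual options.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: resolve benchmark input paths from flags instead of hardcoding them. */
final class TpcdsPaths {
  // Hypothetical default used when no --dataDirectory flag is given.
  static final String DEFAULT_DATA_DIRECTORY = "/tmp/tpcds/data";

  /** Parses "--key=value" arguments into a map; any other argument is ignored. */
  static Map<String, String> parseFlags(String[] args) {
    Map<String, String> flags = new HashMap<>();
    for (String arg : args) {
      if (arg.startsWith("--") && arg.contains("=")) {
        int eq = arg.indexOf('=');
        flags.put(arg.substring(2, eq), arg.substring(eq + 1));
      }
    }
    return flags;
  }

  /** Returns the (hypothetical) data file path for one table under the configured directory. */
  static String tablePath(Map<String, String> flags, String table) {
    String dir = flags.getOrDefault("dataDirectory", DEFAULT_DATA_DIRECTORY);
    // Drop a trailing slash so "hdfs:///x/" and "hdfs:///x" resolve to the same file.
    if (dir.endsWith("/")) {
      dir = dir.substring(0, dir.length() - 1);
    }
    return dir + "/" + table + ".dat";
  }

  public static void main(String[] args) {
    Map<String, String> flags =
        parseFlags(new String[] {"--dataDirectory=hdfs:///tpcds/1GB/", "--runner=SparkRunner"});
    // Prints hdfs:///tpcds/1GB/store_sales.dat
    System.out.println(tablePath(flags, "store_sales"));
  }
}
```

Keeping the scheme (`hdfs://`, `gs://`, local path) inside the flag value means nothing in the code assumes a particular file system, which is the point of making the paths configurable.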

R: @iemejia
CC R: @kennknowles


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.


aromanenko-dev force-pushed the BEAM-11712-TPCDS-Spark branch 5 times, most recently from 0333d5d to 2cebe19, March 31, 2021 15:17
aromanenko-dev (Contributor, Author)

retest this please

github-actions (Contributor)

The workflow run for this PR is being cancelled: it is an earlier duplicate of run 2173354.

aromanenko-dev (Contributor, Author)

Run Java PreCommit

aromanenko-dev force-pushed the BEAM-11712-TPCDS-Spark branch from 2cebe19 to 11c28ba, March 31, 2021 17:06
iemejia (Member) left a comment

LGTM. I will merge manually to fix some typos and minor IntelliJ analysis warnings.

In a subsequent PR, can you please remove the README and instead add this information to the website, probably as a new page in the same section as Nexmark, or to the wiki?

iemejia added a commit that referenced this pull request Apr 13, 2021
iemejia (Member) commented Apr 13, 2021

Merged now, thanks @aromanenko-dev

iemejia closed this Apr 13, 2021
aromanenko-dev (Contributor, Author)

> In a subsequent PR, can you please remove the README and instead add this information to the website, probably as a new page in the same section as Nexmark, or to the wiki?

Yes, the plan was to do this later. Thanks!

Review comments on the diff:

}
plugins { id 'org.apache.beam.module' }
applyJavaNature(
automaticModuleName: 'org.apache.beam.sdk.tpcds',
A contributor commented:

Do we want to publish this package as part of the Beam release?

def isDataflowRunner = ":runners:google-cloud-dataflow-java".equals(tpcdsRunnerDependency)
def runnerConfiguration = ":runners:direct-java".equals(tpcdsRunnerDependency) ? "shadow" : null

if (isDataflowRunner) {
A contributor commented:

@kennknowles I'm not sure about the purpose of this test, but should we consider adding runner_v2 as well?

A member replied:

It is a standard benchmark suite. Yes, it makes sense to run it on runner_v2 (and on any other runner configuration someone is interested in).
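The runner-selection lines quoted in the review, together with the runner_v2 question, can be modeled in plain Java as a hedged sketch. `isDataflowRunner` and `runnerConfiguration` mirror the two Groovy `def` lines from the diff; `pipelineArgs` and its `useRunnerV2` switch are hypothetical additions showing one way Dataflow Runner v2 could be requested, via Dataflow's `--experiments=use_runner_v2` pipeline option. The sketch does not reproduce how the build actually passes arguments to the benchmark.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java mirror of the build.gradle runner selection, plus a runner_v2 switch. */
final class TpcdsRunnerConfig {
  static final String DATAFLOW = ":runners:google-cloud-dataflow-java";
  static final String DIRECT = ":runners:direct-java";

  /** Mirrors: def isDataflowRunner = ":runners:google-cloud-dataflow-java".equals(tpcdsRunnerDependency) */
  static boolean isDataflowRunner(String runnerDependency) {
    return DATAFLOW.equals(runnerDependency);
  }

  /** Mirrors: def runnerConfiguration = ":runners:direct-java".equals(tpcdsRunnerDependency) ? "shadow" : null */
  static String runnerConfiguration(String runnerDependency) {
    return DIRECT.equals(runnerDependency) ? "shadow" : null;
  }

  /**
   * Copies the base pipeline arguments and, only for Dataflow, appends the runner_v2 experiment.
   * useRunnerV2 is a hypothetical knob, not an existing Gradle property.
   */
  static List<String> pipelineArgs(String runnerDependency, List<String> baseArgs, boolean useRunnerV2) {
    List<String> args = new ArrayList<>(baseArgs);
    if (isDataflowRunner(runnerDependency) && useRunnerV2) {
      args.add("--experiments=use_runner_v2");
    }
    return args;
  }

  public static void main(String[] args) {
    System.out.println(runnerConfiguration(DIRECT)); // prints shadow
    System.out.println(runnerConfiguration(DATAFLOW)); // prints null
    // prints [--runner=DataflowRunner, --experiments=use_runner_v2]
    System.out.println(pipelineArgs(DATAFLOW, List.of("--runner=DataflowRunner"), true));
  }
}
```

The experiment is only meaningful for Dataflow, so the sketch adds it behind the same `isDataflowRunner` check the build script uses; running the suite on other runner configurations needs no extra flag here.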

4 participants