Merge move join #191

Merged
andrewlawhh merged 84 commits into mc2-project:comp-integrity from andrewlawhh:merge-move-join on Apr 2, 2021

@andrewlawhh
Collaborator

No description provided.

Andrew Law and others added 30 commits October 1, 2020 18:14
* add date_add, interval sql still running into issues

* Add Interval SQL support

* uncomment out the other tests

* resolve comments

* change interval equality

Co-authored-by: Eric Feng <fengeric11@berkeley.edu>
Refactor construction of executed DAG.
Andrew Law and others added 29 commits February 8, 2021 16:15
This PR implements the scalar subquery expression, which is triggered whenever a subquery returns a scalar value. There were two main problems that needed to be solved.

First, we need to support matching the scalar subquery expression. Spark implements this by wrapping a SparkPlan within the expression, calling executeCollect on it, and constructing a literal from the result. This is problematic for us: the result is an intermediate value, so it should not be decrypted by the driver and serialized into an expression.

Therefore, the second issue to be addressed here is supporting an encrypted literal. This PR implements it by serializing the ciphertext into a base64-encoded string and wrapping a Decrypt expression on top of it. This expression is then evaluated in the enclave and returns a literal. Note that, in order to test our implementation, we also implement a Decrypt expression in Scala. However, this should never be evaluated on the driver side and serialized into a plaintext literal, because Decrypt is designated as a Nondeterministic expression and therefore always evaluates on the workers.
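
As a rough illustration of the encrypted-literal idea described above, here is a minimal Python sketch. It is not the code from this PR: the `Literal` and `Decrypt` classes, the `wrap_scalar_subquery_result` and `eval_in_enclave` helpers, and the XOR "cipher" are all hypothetical stand-ins chosen so that the example runs without Spark or an enclave. The point it tries to show is only that the expression carries a base64 string of ciphertext, and the plaintext appears only when the holder of the key evaluates `Decrypt`.

```python
import base64
from dataclasses import dataclass


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for a real cipher: XOR with a repeating key. NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))


# XOR is its own inverse, so decryption reuses the same function.
toy_decrypt = toy_encrypt


@dataclass(frozen=True)
class Literal:
    """A constant in the expression tree (hypothetical, not Spark's Literal)."""
    value: str


@dataclass(frozen=True)
class Decrypt:
    """Wraps a literal holding base64-encoded ciphertext (hypothetical)."""
    child: Literal
    # Assumption: mirrors marking the expression nondeterministic so that an
    # optimizer would not constant-fold it on the driver.
    deterministic: bool = False


def wrap_scalar_subquery_result(ciphertext: bytes) -> Decrypt:
    """Driver side: embed the still-encrypted value into the expression."""
    encoded = base64.b64encode(ciphertext).decode("ascii")
    return Decrypt(Literal(encoded))


def eval_in_enclave(expr: Decrypt, key: bytes) -> Literal:
    """Enclave side: decode, decrypt, and return a plaintext literal."""
    ciphertext = base64.b64decode(expr.child.value)
    return Literal(toy_decrypt(key, ciphertext).decode("utf-8"))


if __name__ == "__main__":
    key = b"k"
    expr = wrap_scalar_subquery_result(toy_encrypt(key, b"42"))
    print(expr.child.value)            # base64 text, not the plaintext
    print(eval_in_enclave(expr, key))  # plaintext literal, enclave only
```

In this sketch the driver only ever handles `expr`, whose child is an opaque string; a real implementation would use authenticated encryption and an actual enclave boundary rather than a shared Python process.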
* logic decoupling in TPCH.scala for easier benchmarking

* added TPCHBenchmark.scala

* Benchmark.scala rewrite

* done adding all support TPC-H query benchmarks

* changed commandline arguments that benchmark takes

* TPCHBenchmark takes in parameters

* fixed issue with spark conf

* size error handling, --help flag

* add Utils.force, break cluster mode

* comment out logistic regression benchmark

* ensureCached right before temp view created/replaced

* upgrade to 3.0.1

* upgrade to 3.0.1

* 10 scale factor

* persistData

* almost done refactor

* more cleanup

* compiles

* 9 passes

* cleanup

* collect instead of force, sf_none

* remove sf_none

* defaultParallelism

* no removing trailing/leading whitespace

* add sf_med

* hdfs works in local case

* cleanup, added new CLI argument

* added newly supported tpch queries

* function for running all supported tests
This PR is the first of two parts towards making TPC-H 16 work: the other will be implementing `is_distinct` for aggregate operations.

`BroadcastNestedLoopJoin` is Spark's "catch all" for non-equi joins. It works by first picking a side to broadcast, then iterating through every possible row combination and checking the non-equi condition against the pair.
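
The row-by-row strategy described above can be sketched in a few lines of Python. This is an illustrative inner-join-only version with hypothetical names (`broadcast_nested_loop_join`, `broadcast_is_right`); it does not reproduce Spark's operator, which also handles outer, semi, and anti joins, nor the enclave code in this PR.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple


def broadcast_nested_loop_join(
    streamed: Iterable[Row],
    broadcast: Iterable[Row],
    condition: Callable[[Row, Row], bool],
    broadcast_is_right: bool = True,
) -> Iterator[Row]:
    """Inner join that checks `condition` on every (left, right) pair."""
    # The broadcast side is materialized once; in a cluster, each worker
    # would hold a full copy of it.
    broadcast_rows = list(broadcast)
    for s in streamed:
        for b in broadcast_rows:
            # Restore the original left/right order before testing the pair.
            left, right = (s, b) if broadcast_is_right else (b, s)
            if condition(left, right):
                yield left + right


if __name__ == "__main__":
    orders = [(1, 10), (2, 30)]   # (order_id, amount)
    limits = [(20,), (40,)]       # (limit,)
    # Non-equi condition: amount < limit
    for row in broadcast_nested_loop_join(
        orders, limits, lambda l, r: l[1] < r[0]
    ):
        print(row)
```

Because every pair is examined, the cost grows with the product of the two input sizes, which is why this approach is normally reserved for conditions that cannot be expressed as an equality on join keys.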
…oject#164)

* Add in TPC-H 21

* Add condition processing in enclave code

* Code clean up

* Enable query 18

* WIP

* Local tests pass

* Apply suggestions from code review

Co-authored-by: octaviansima <34696537+octaviansima@users.noreply.github.com>

* WIP

* Address comments

* q21.sql

Co-authored-by: octaviansima <34696537+octaviansima@users.noreply.github.com>
@andrewlawhh andrewlawhh merged commit 697644b into mc2-project:comp-integrity Apr 2, 2021
6 participants