Benchmark comparing DOM load and path extractor by raganhan · Pull Request #8 · amazon-ion/ion-java-path-extraction

raganhan · 2018-10-31T20:00:33Z

See README.md changes for description of benchmarks

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

tgregg · 2018-10-31T20:11:44Z

+more details.
+
+To execute the benchmarks run: `gradle --no-daemon jmh`, requires an internet connection as it downloads the data set. 
+Results bellow, higher is better. 


bellow -> below

tgregg · 2018-10-31T20:16:29Z

+            final IonReader reader = newReader(inputStream);
+            final IonWriter writer = newBinaryWriter(binaryOut)
+        ) {
+            // all data is in the `dataset` key as a list, only write out that field to keep the extractor smaller


Can you elaborate on this? It looks like the binary and text data has different structure?

The text data is created by writing the binary data as Ion text so they are the same, see line 89.

Both are different than the original dataset. The original dataset is something like:

{ <some meta field>: <value>, <other meta field>: <value>, "dataset": [ <item1>, <item2> ] }

The bulk of the data is in the `dataset` struct field; that code only picks the contents of `dataset` and writes them out, e.g. <item1><item2>.

I'm doing this to make the search paths less verbose, but it's not really worth it if it causes confusion. Will change the benchmark to work with the data as is.

The binary and text are the same as bytesText is created by writing out bytesBinary as IonText.

I'm extracting the dataset field from the original dataset for the benchmark test data, as the bulk of the data is there; the rest is just some metadata. But thinking more on it, it's probably not worth it as it can cause confusion. It's better to have the benchmark work on the original dataset; will change that.

Ah, I see that now, thanks. I'm fine with it as-is.
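
For readers skimming this thread, the reshaping described above can be sketched with plain Java collections standing in for Ion values. This is only an illustration and not the ion-java API or the benchmark's actual code: the class name, method names and metadata field names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-collections stand-in for the reshaping discussed above; NOT the ion-java API.
final class DatasetReshapeSketch {

    // Returns only the items under the "dataset" key, dropping the metadata fields.
    static List<Object> extractDataset(Map<String, Object> original) {
        Object dataset = original.get("dataset");
        if (!(dataset instanceof List)) {
            throw new IllegalArgumentException("expected a list under the 'dataset' key");
        }
        return new ArrayList<>((List<?>) dataset);
    }

    // Builds a hypothetical document shaped like the original data set.
    static Map<String, Object> sampleOriginal() {
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("metaA", "value"); // hypothetical metadata field
        original.put("metaB", 42);      // hypothetical metadata field
        original.put("dataset", Arrays.asList("item1", "item2"));
        return original;
    }

    public static void main(String[] args) {
        // Each item becomes its own top-level value, i.e. <item1><item2>.
        for (Object item : extractDataset(sampleOriginal())) {
            System.out.println(item);
        }
    }
}
```

With the items at the top level, a search path no longer needs a leading step through the `dataset` field and its list, which is the verbosity the reply refers to.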

tgregg · 2018-10-31T20:46:13Z

+/**
+ * Benchmarks comparing the PathExtractor with fully materializing the DOM.
+ */
+public class PathExtractorBenchmark {


It would be nice to make this pluggable for different data sets as well, since the performance of the path extractor is highly dependent on the characteristics of the data. It would be good to eventually provide results for several data sets along with a description of the data and what was skipped. This doesn't block the initial release.

Agreed, opened #9 to track this.
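
One possible shape for the pluggable data sets suggested here is a small registry that pairs each data set with a description and a loader, with the data set chosen by name at run time. This is a sketch under assumptions, not what #9 specifies: `BenchmarkDatasets`, `Dataset` and the `benchmark.dataset` system property are hypothetical names, and the inline sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in the repository.
final class BenchmarkDatasets {

    // One data set: its name, what it contains, and how to obtain its bytes.
    static final class Dataset {
        final String name;
        final String description;
        final Supplier<byte[]> loader;

        Dataset(String name, String description, Supplier<byte[]> loader) {
            this.name = name;
            this.description = description;
            this.loader = loader;
        }
    }

    private final Map<String, Dataset> byName = new LinkedHashMap<>();

    // Adds a data set and returns this registry so calls can be chained.
    BenchmarkDatasets register(Dataset dataset) {
        byName.put(dataset.name, dataset);
        return this;
    }

    // Looks up a data set by name, failing loudly on unknown names.
    Dataset get(String name) {
        Dataset dataset = byName.get(name);
        if (dataset == null) {
            throw new IllegalArgumentException(
                "unknown data set '" + name + "', known: " + byName.keySet());
        }
        return dataset;
    }

    public static void main(String[] args) {
        BenchmarkDatasets datasets = new BenchmarkDatasets()
            .register(new Dataset("tiny", "two small structs, inline",
                () -> "{a:1} {a:2}".getBytes(StandardCharsets.UTF_8)));

        // The selected name could come from a system property or a benchmark parameter.
        Dataset selected = datasets.get(System.getProperty("benchmark.dataset", "tiny"));
        System.out.println(selected.name + ": " + selected.description
            + " (" + selected.loader.get().length + " bytes)");
    }
}
```

Keeping the description next to the loader makes it easier to publish results together with a note on what each data set contains and what was skipped, as the review comment asks.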

tgregg · 2018-10-31T20:46:48Z

+```
+
+Using the path extractor has equivalent performance for both text and binary when fully materializing the document and 
+can give significant performance improvements when partially materializing binary documents. This happens due to Ion's 


Benchmark comparing DOM load and path extractor

4fe97aa

raganhan requested review from tgregg and zslayton October 31, 2018 20:00

tgregg reviewed Oct 31, 2018

raganhan mentioned this pull request Oct 31, 2018

Make included benchmark pluggable for different dataset #9

Open

tgregg approved these changes Oct 31, 2018

fixing typo in Readme.md

c2cf541

raganhan merged commit 36adff2 into master Oct 31, 2018

raganhan deleted the benchmarks-6 branch October 31, 2018 22:42

raganhan merged 2 commits into master from benchmarks-6
