
persist: mzworkflows based load test #9812

Merged
ruchirK merged 2 commits into MaterializeInc:main from ruchirK:persist-cloud-load-test
Jan 18, 2022


@ruchirK
Contributor

@ruchirK ruchirK commented Dec 30, 2021

Motivation

test: add dedicated kafka upsert benchmark
This commit adds a dedicated kafka upsert benchmark using the new mzworkflows
framework but not within the feature-benchmarks framework.

This benchmark allows users to set up a Kafka topic and a corresponding upsert
source with byte keys and values, send records to it, and observe the time
it takes to ingest the specified number of records.

The benchmark allows for flexibility in the following dimensions:

  • key cardinality
  • value size
  • whether or not Materialize gets restarted after each batch of inserts
  • whether or not persistence is enabled
  • how many records are sent and how record sending is shaped (e.g. one can
    send many large batches, or send one initial large batch and then many small
    batches of records)

The benchmark currently doesn't support:

  • data formats other than bytes
  • specifying the number of partitions (currently hardcoded to 4)
  • specifying the number of materialize workers

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any user-facing behavior changes.


@ruchirK
Contributor Author

ruchirK commented Dec 30, 2021

@danhhz the second commit automates the restart testing workflow that I've been doing manually and that we want for the December milestone: seed Materialize with a view that counts the number of rows in an upsert source, then repeatedly shut down mz, place more data in the topic, and restart. One nice thing here is that it's trivial to convert this from records with almost no duplicate keys (sampling uniformly over all u64s) to one where compaction helps. Still a fairly rough cut; I just wanted to give you a heads up.

Comment thread test/persistence/kafka-sources/kafka-sources-load-test.td
@ruchirK ruchirK force-pushed the persist-cloud-load-test branch from a303f9e to 4ca7b72 Compare December 30, 2021 16:02
@ruchirK ruchirK marked this pull request as draft December 30, 2021 16:04
Comment thread test/persistence/mzworkflows.py Outdated
Comment thread test/persistence/mzworkflows.py Outdated
@ruchirK ruchirK force-pushed the persist-cloud-load-test branch 4 times, most recently from 10d77ed to 3168a59 Compare January 4, 2022 18:51
@ruchirK
Contributor Author

ruchirK commented Jan 4, 2022

This is RFAL! @benesch / @philip-stoev this is what I had in mind for a kafka upsert benchmark to replace perf-upsert / avro_upsert / avro_ingest (I know this doesn't support avro yet but it could), and to use as a long-running load test.

My biggest open question here is "should this be done with the feature-benchmark framework instead" - I'm not currently sure. My read currently from skimming is no, but I don't have a more substantive way to back that up just yet. I'll spend the rest of today getting more familiar with the feature-benchmark framework to get a better answer to this.

@ruchirK ruchirK force-pushed the persist-cloud-load-test branch from 3168a59 to d56fceb Compare January 4, 2022 20:30
@ruchirK ruchirK marked this pull request as ready for review January 4, 2022 20:46
Comment thread test/kafka-upsert-benchmark/mzworkflows.py Outdated
Comment thread test/kafka-upsert-benchmark/mzworkflows.py Outdated
Comment thread test/kafka-upsert-benchmark/mzworkflows.py Outdated

def workflow_kafka_upsert_benchmark(w: Workflow, args: List[str]):
    parser = WorkflowArgumentParser(w)
    parser.add_argument("--num-steps", type=int, default=10)
Contributor

With those defaults, the benchmark is measuring the time the SELECT took to run, as the data is fully ingested by the time the statement executes. So the numbers are in the low-millisecond range:

C> waiting for materialize to handle 'SELECT * FROM num_records_ingested'C> query result: 200000 after 0.004557367414236069

You may wish to consider much higher defaults, ones that will push the results far beyond the 1-second granularity of the measurements.

Contributor Author

I changed the approach to be totally different here, but I agree with that analysis. I had picked the defaults to be low so as not to surprise first-time users, which is not a very good reason.

query = "SELECT * FROM num_records_ingested"
ui.progress(f"waiting for materialize to handle {query!r}", "C")
error = None
start_time = time.monotonic()
Contributor

@philip-stoev philip-stoev Jan 5, 2022

Since the topic and the source are being reused across runs, Mz can start ingesting stuff while the kgen is still in progress. Obtaining start_time here will fail to take this part of the effort into account.

Contributor Author

removed the restarts bit.

Comment thread test/kafka-upsert-benchmark/setup.td Outdated
# made definite, and try do so with as little overhead as possible.
> CREATE MATERIALIZED VIEW max_offsets AS SELECT
kafka_partition,
MAX(mz_offset) as max_offset FROM load_test
Contributor

I think MAX should be considered as a "big overhead" operation, given that it will keep a list of all the mz_offsets ever seen in memory.

It may be best to simply use COUNT(*) instead.

Contributor Author

Right. I considered this in depth because count(*) is more memory efficient but doesn't offer a good way to track when we've caught up when there are duplicates (because a duplicate key won't show up in the counts twice)

Contributor Author

I came up with a solution that I like: basically, we track an approximate max with mz_offset / 1000 + 1 (aka the mz_offset rounded up to the next thousandth offset), which lets us use a max with 0.1% of the memory overhead. The math around tracking "have we caught up" gets more complicated, so I'm going to do that in a followup.
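
To make the memory argument concrete, here is a minimal Python sketch of the bucketing idea. It is not the code from this PR: it assumes offsets are dense integers and that a MAX has to retain every distinct input value, and `bucket` and `BUCKET_SIZE` are illustrative names.

```python
BUCKET_SIZE = 1000  # assumption: mirrors the "/ 1000" in the comment above


def bucket(mz_offset: int) -> int:
    """Map an offset to the index of the thousand-wide bucket that contains it."""
    return mz_offset // BUCKET_SIZE + 1


# Simulate one partition that has seen offsets 1..200000.
offsets = range(1, 200_001)

# Number of distinct values a MAX would have to retain in each formulation.
distinct_exact = len(set(offsets))
distinct_bucketed = len({bucket(o) for o in offsets})

# The largest bucket only gives an upper bound on the true max offset.
approx_max = max(bucket(o) for o in offsets) * BUCKET_SIZE
```

The bucketed form keeps roughly one value per thousand offsets, which is where the 0.1% figure comes from. The cost is that the max is only known to within one bucket, which is why the "have we caught up" check gets harder.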

Contributor

Right. I considered this in depth because count(*) is more memory efficient but doesn't offer a good way to track when we've caught up when there are duplicates (because a duplicate key won't show up in the counts twice)

The trick I used with the feature benchmark framework is to emit a unique marker record that has not been generated before. Once I see this record in the output, I know that we have caught up to that point, regardless of any duplicates and such that may have been present in the stream prior to that.

Contributor Author

sorry could you say a bit more about what the feature benchmarks are computing? (do they just CREATE MATERIALIZED SOURCE ... or do they do something else?)

Contributor

I was talking about this scenario:

https://github.com/MaterializeInc/materialize/blob/main/test/feature-benchmark/scenarios.py#L716

Basically we ingest a bunch of key=1's followed by one key=2. Then we measure the time from the CREATE MATERIALIZED SOURCE until the 2 shows up in the output of the SELECT.
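
The marker-record trick can be sketched in plain Python as follows. This is a simulation under stated assumptions, not the feature benchmark's code: `FakeUpsertSource` is a hypothetical stand-in for Materialize that applies one pending record per poll, and `wait_for_marker` is an illustrative name.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

MARKER_KEY = 2  # assumption: a key that never appears earlier in the stream


class FakeUpsertSource:
    """Hypothetical stand-in for Materialize: applies one record per poll."""

    def __init__(self, records: List[Tuple[int, str]]) -> None:
        self.pending = list(records)
        self.state: Dict[int, str] = {}

    def read_keys(self) -> Set[int]:
        if self.pending:
            key, value = self.pending.pop(0)
            self.state[key] = value  # upsert: later values overwrite earlier ones
        return set(self.state)


def wait_for_marker(read_keys: Callable[[], Set[int]], poll_interval: float = 0.0) -> float:
    """Poll until the marker key is visible and return the elapsed seconds."""
    start = time.monotonic()
    while MARKER_KEY not in read_keys():
        time.sleep(poll_interval)
    return time.monotonic() - start


# A thousand updates to key=1, followed by the unique marker record.
records = [(1, f"v{i}") for i in range(1000)] + [(MARKER_KEY, "done")]
source = FakeUpsertSource(records)
elapsed = wait_for_marker(source.read_keys)
```

Because the marker is the last record in the stream, seeing it in the output implies everything before it has been processed, no matter how many duplicate keys were collapsed along the way.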

Contributor Author

I actually came up with another approach that I think works even better.

Basically, you can use mz_source_info but keep a wholly uncompacted version around, and then you just need to:

  1. look up the current time the dataflow you are testing is materialized up to
  2. look up how much data was recorded in mz_source_info AS OF that time.

This cuts the memory requirement down by a lot because now the dataflow for the source can be a simple count(*) (according to docker stats on my laptop, a test sending 10k records per second for ~120 seconds goes from ~850 MB to ~350 MB).
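
The two lookup steps above can be illustrated with an in-memory stand-in. In the PR the lookup happens in SQL against mz_source_info; the sketch below only models the logic, and `history`, `records_ingested_as_of`, and the sample numbers are hypothetical. It also simplifies the frontier to mean "everything at or before this timestamp is complete".

```python
from bisect import bisect_right
from typing import List, Tuple


def records_ingested_as_of(history: List[Tuple[int, int]], frontier: int) -> int:
    """Return the cumulative offset recorded at or before `frontier`.

    `history` holds (timestamp, cumulative offset) pairs sorted by timestamp,
    standing in for an uncompacted view over mz_source_info.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, frontier)
    return history[idx - 1][1] if idx > 0 else 0


# Hypothetical uncompacted history of source progress.
history = [(1000, 0), (2000, 150), (3000, 420), (4000, 900)]

# Step 1: the time the dataflow under test is materialized up to (made up here).
frontier = 3500
# Step 2: how much data the source had recorded as of that time.
caught_up_to = records_ingested_as_of(history, frontier)
```

The dataflow being measured no longer has to carry offsets itself, so it can stay a plain count(*) while the history answers the "how far have we got" question.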

Comment thread test/kafka-upsert-benchmark/mzworkflows.py Outdated
),
)

if restart_mz:
Contributor

The first measurement reported is always 1s, which I believe is due to the way events are ordered:

  1. kgen
  2. start Mz
  3. do the SELECT

By the time step 3 is performed, Mz has been up and running and ingesting, which causes that measurement to be incorrect. Here is the log:
> CREATE MATERIALIZED VIEW num_records_ingested AS SELECT SUM(max_offset) FROM max_offsets;
rows match; continuing at ts 1641369900.6116853
testdrive completed successfully.
Killing kafka-upsert-benchmark_materialized_1 ... done
Creating kafka-upsert-benchmark_kgen_run ... done
Using 8 threads...
Starting kafka-upsert-benchmark_materialized_1 ... done
C> waiting for dbname=materialize host=localhost port=56467 user=materialize password=postgres to handle 'SELECT 1'success!
C> waiting for materialize to handle 'SELECT * FROM num_records_ingested'C> materialize has not ingested 200000 records, got: 149700
C> query result: 200000 after 1.0029516108334064
success!

So Mz has ingested 149700 records already by the time the SELECT is issued.

Comment thread test/kafka-upsert-benchmark/mzworkflows.py Outdated
@philip-stoev
Contributor

With respect to this vs. the feature benchmark, the feature benchmark currently does not integrate kgen and the attendant flexibility in generating data -- $ kafka-ingest is used instead. I hope to remedy this shortcoming in due course.

That said, the feature benchmark has benefited from a period of intense traumatic head-banging on my part that has eliminated some sources of incorrect measurements:

  • the granularity of the measurements is hopefully equal to the timestamp-frequency=100ms, as the framework itself does not do any sleeps;
  • a fresh source is used for each measurement, meaning no accumulation of previous data in the memory of the upsert operator
  • some care has been taken to measure the entirety of the ingestion process, all the way from the CREATE SOURCE to the SELECT returning the correct output.
  • the KafkaRecovery feature benchmark is structured so that it measures the true time it takes to recover a given static source - all the ingestion happens at the start of the benchmark run and is not interleaved with the recovery.

@benesch
Contributor

benesch commented Jan 6, 2022

With respect to this vs. the feature benchmark, the feature benchmark currently does not integrate kgen and the attendant flexibility in generating data -- $ kafka-ingest is used instead. I hope to remedy this shortcoming in due course.

That said, the feature benchmark has benefited from a period of intense traumatic head-banging on my part that has eliminated some sources of incorrect measurements:

  • the granularity of the measurements is hopefully equal to the timestamp-frequency=100ms, as the framework itself does not do any sleeps;
  • a fresh source is used for each measurement, meaning no accumulation of previous data in the memory of the upsert operator
  • some care has been taken to measure the entirety of the ingestion process, all the way from the CREATE SOURCE to the SELECT returning the correct output.
  • the KafkaRecovery feature benchmark is structured so that it measures the true time it takes to recover a given static source - all the ingestion happens at the start of the benchmark run and is not interleaved with the recovery.

I'm not entirely sure that I understand whether you're advocating for using the feature benchmark to measure this or not. :D

On a quick glance it seems like we're only using the most basic of kgen features here, so probably could use kafka-ingest instead without too much trouble?

@danhhz
Contributor

danhhz commented Jan 6, 2022

I think there's been some confusion around the context of this PR. Part of persist's dec milestone goals is the following two measurements:

  • A long-running load test to measure the worst-case steady-state overhead of persisting a source vs not. Specifically, take a source that does as little work as possible in the source pipeline (e.g. no avro decoding etc) and pipe it into a simple materialized view. Now do the same thing while persisting the source on another machine. For each of these, measure the throughput of the source, cpu and memory usage, (maybe) the delay between writing to the source and the materialized view being updated, and how these change over time.

    Ideally, this would be an envelope=append-only source to avoid even the upsert work, but at the moment the only source we support is kafka with envelope=upsert, so that's our temporary compromise for the initial version of this benchmark. We'll run both of these weekly as part of the release process and eyeball the difference and whether persist has a regression.

    This one almost by definition can't be a feature benchmark because it's not "do this thing and see how long it takes" so much as "run this indefinitely so we can examine the behavior".

  • A real-world user-scale measure of persist's fast restart speedup. We wrote in the milestone doc that this would just be something that the persist team periodically runs manually and writes down the results (perhaps in a spreadsheet, nothing fancy). At some point @benesch mentioned that it'd be totally reasonable to have this be run by the release engineer and for them to be the one to write down the results.

    One idea we had for doing this was to run exactly the current chbench release benchmark and periodically restart it. We could also do the same for a non-persisted chbench and compare. Sadly, the chbench envelope is one we don't yet support, so we decided to just do something similar but with upsert. I think that's where this PR came from: Ruchir's intuition that this idea and the above bullet could both be done by one piece of code. (Note that getting an idea of how the restart performance changes over time as data accumulates was a fun bonus, not an essential part of the original goal.)

    I'm starting to forget my timelines, but I'm reasonably sure all of these conversations happened before I found out about @philip-stoev's feature benchmarks. They almost certainly happened before the KafkaRecovery scenario was added to them. TBH, KafkaRecovery is 90% of what we wanted for this milestone goal and maybe even already better than this "periodically restart a release loadtest" idea. We've even been talking about adding the feature benchmarks to the release process, anyway. I think the only changes we'd want to make it 100% perfect is bigger data and making the schema and duplicate key distribution less arbitrary (i.e. ideally chbench). Philip has asked me a couple times to let him know any requests for the feature benchmarks, I'm curious if this would be a good fit.

I think one takeaway here is that, while at some point the interim, temporary solutions we've discussed look very similar, the ideal end goals are actually different enough that maybe we don't want to make one piece of code do both.

Perhaps the first benchmark should be something that's in spirit closer to demo/chbench (though I think it's fine to have the load generator talk directly to kafka instead of mysql). (I haven't looked closely at this PR yet, but it's possible that if we pull out some of the complexity, it's basically this?)

Then the second (maybe?) works well as a big big feature benchmark.

@philip-stoev
Contributor

This one almost by definition can't be a feature benchmark because it's not "do this thing and see how long it takes" so much as "run this indefinitely so we can examine the behavior".

Yes, true that: if you would like to measure CPU utilization, memory, and such, have Grafana, etc., the feature benchmark would not be the right vehicle for the time being.

We've even been talking about adding the feature benchmarks to the release process, anyway.

Yes, this will be happening shortly. The ducks are all aligning to make it possible.

I think the only changes we'd want to make it 100% perfect is bigger data and making the schema and duplicate key distribution less arbitrary (i.e. ideally chbench).

I will work on getting the schema and the key distribution more realistic.

@philip-stoev
Contributor

On a quick glance it seems like we're only using the most basic of kgen features here, so probably could use kafka-ingest instead without too much trouble?

I think I can have kgen do the data generation within the context of the feature benchmark.

@philip-stoev
Contributor

I'm starting to forget my timelines, but I'm reasonably sure all of these conversations happened before I found out about @philip-stoev's feature benchmarks.

Apologies for any duplication of effort, I should have kept myself more informed about the developments within the persistence team.

@ruchirK ruchirK force-pushed the persist-cloud-load-test branch from d56fceb to 2d31ee8 Compare January 12, 2022 07:37
@ruchirK ruchirK requested a review from philip-stoev January 12, 2022 07:48
@ruchirK
Contributor Author

ruchirK commented Jan 12, 2022

I've reworked this benchmark to be more in line with the "we want to observe the behavior of ingesting stuff over time" goal and to be a lot simpler. PTAL! It's basically a full rewrite of the non-boilerplate parts of mzcompose.py, which is why I squashed the changes.

Now, it takes in a records_per_second throughput argument, the target rate of insertions to Kafka, and a num_seconds argument, the duration to do the insertions for. The benchmark is careful to try to match the requested throughput to the best of its abilities (it obviously will fall over if the requested throughput is too high). At the same time, the benchmark tracks the delta between the number of records Materialize has fully ingested and the number of records that have been inserted into Kafka. There are more details in the new commit message.
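
The open-loop idea described above can be sketched with a virtual clock. This is a simplified simulation, not the code in mzcompose.py: `run_open_loop` and `ingest_per_tick` are hypothetical names, and the consumer is modelled as a fixed per-tick capacity rather than a real Materialize instance.

```python
from typing import Tuple


def run_open_loop(
    records_per_second: int,
    num_seconds: int,
    ingest_per_tick: int,
    tick: float = 1.0,
) -> Tuple[float, int, int]:
    """Simulate an open-loop load test; return (elapsed, sent, max_lag)."""
    if ingest_per_tick <= 0:
        raise ValueError("the simulated consumer must make progress")

    target_total = records_per_second * num_seconds
    sent = 0
    ingested = 0
    max_lag = 0
    elapsed = 0.0

    while ingested < target_total:
        elapsed += tick
        # Open loop: the send target depends only on elapsed time, never on
        # how far behind the consumer has fallen.
        sent = min(target_total, int(records_per_second * elapsed))
        # The consumer ingests at most its capacity, and never more than was sent.
        ingested = min(sent, ingested + ingest_per_tick)
        max_lag = max(max_lag, sent - ingested)

    return elapsed, sent, max_lag


# 100 records/s for 10 s against a consumer that handles 80 records per tick.
result = run_open_loop(records_per_second=100, num_seconds=10, ingest_per_tick=80)
```

A real driver would pace itself against time.monotonic() instead of a virtual clock, but the invariant is the same: the producer's schedule is fixed up front, so any slowness on the ingest side shows up as lag rather than as a slower test.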

Comment thread test/kafka-upsert-benchmark/mzcompose.py Outdated
@ruchirK
Contributor Author

ruchirK commented Jan 12, 2022

I ran this ~30 times just now using configurations like

./mzcompose run kafka-ingest-open-loop --num-seconds 100 --records-per-second 200000 --enable-persistence

before learning that all of my results are probably invalid because docker for mac isn't the best. I'll redo everything on a linux box, but there are some clear learnings here:

  • This benchmark is able to suss out big performance differences between having persistence enabled vs. not. Without persistence, a single worker mz instance with timestamp frequency=1s can handle up to 400k records / second (approximately 200 mbps) for about 50 seconds before totally falling over
  • In contrast, Materialize with persistence enabled and all of the same other parameters falls over at ~200k records / second (because the RAM usage spikes up much much faster. I didn't write down the exact values but I will when I re-run on linux)
  • The takeaway here is not so much the absolute performance claims (because again docker for mac), but the fact that the benchmark does its job of letting us make relative claims about the performance with vs without persistence.

@ruchirK
Contributor Author

ruchirK commented Jan 13, 2022

Ok. Running on a r5a.4xlarge with 125 GB RAM (same config as our release test boxes)

running:

./mzcompose run kafka-ingest-open-loop --num-seconds 100 --records-per-second 100000

(100k QPS for 100 seconds) on a single worker MZ instance produces:

...
C> after 96.926s sent 9301084 records, and ingested 9015031. max observed lag 7442337 records, most recent lag 286053 records
C> after 100.885s sent 9692592 records, and ingested 9409841. max observed lag 7442337 records, most recent lag 282751 records
C> after 103.830s sent 10000000 records, and ingested 9794605. max observed lag 7442337 records, most recent lag 205395 records
C> after 105.202s sent 10000000 records, and ingested 9794605. max observed lag 7442337 records, most recent lag 205395 records
C> Finished after 106.091s sent and ingested 10000000 records. max observed lag 7442337 records.

I'm eliding the other output to make this more clear. docker stats running separately shows us that this instance took at peak 2.655GiB of RAM and used at peak 127.74% CPU. (docker stats lets us capture the history of cpu/mem usage over time as well and I will open a second PR incorporating that into this load test)

In contrast, running with single worker + --enable-persistence:

...
C> after 83.936s sent 6912163 records, and ingested 4454170. max observed lag 2457993 records, most recent lag 2457993 records
C> after 98.637s sent 8393562 records, and ingested 4454170. max observed lag 3939392 records, most recent lag 3939392 records
C> after 115.272s sent 9863689 records, and ingested 4987390. max observed lag 4876299 records, most recent lag 4876299 records
C> after 120.200s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 121.538s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 122.865s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 124.212s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 125.539s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 126.867s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 128.203s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 129.531s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 130.900s sent 10000000 records, and ingested 5537784. max observed lag 4876299 records, most recent lag 4462216 records
C> after 132.239s sent 10000000 records, and ingested 6912163. max observed lag 4876299 records, most recent lag 3087837 records
C> after 133.589s sent 10000000 records, and ingested 6912163. max observed lag 4876299 records, most recent lag 3087837 records
C> after 134.923s sent 10000000 records, and ingested 6912163. max observed lag 4876299 records, most recent lag 3087837 records
C> after 136.259s sent 10000000 records, and ingested 6912163. max observed lag 4876299 records, most recent lag 3087837 records
C> after 137.590s sent 10000000 records, and ingested 6912163. max observed lag 4876299 records, most recent lag 3087837 records
C> after 138.921s sent 10000000 records, and ingested 6912163. max observed lag 4876299 records, most recent lag 3087837 records
C> after 140.255s sent 10000000 records, and ingested 6912163. max observed lag 4876299 records, most recent lag 3087837 records
C> after 141.630s sent 10000000 records, and ingested 6912163. max observed lag 4876299 records, most recent lag 3087837 records
C> after 142.973s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> after 144.300s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> after 145.639s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> after 146.989s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> after 148.318s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> after 149.655s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> after 151.001s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> after 152.334s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> after 153.683s sent 10000000 records, and ingested 8401438. max observed lag 4876299 records, most recent lag 1598562 records
C> Finished after 154.129s sent and ingested 10000000 records. max observed lag 4876299 records.

During this run MZ took at peak 16.25GiB of RAM and used 230.52% CPU. This lag corresponds to being ~48 seconds behind the input.

In contrast, single worker Materialize without persistence can handle 200kqps just fine:

Running:

./mzcompose run kafka-ingest-open-loop --num-seconds 100 --records-per-second 200000

gives us:

C> after 95.084s sent 15684111 records, and ingested 15684111. max observed lag 459605 records, most recent lag 0 records
C> after 110.409s sent 19016736 records, and ingested 18702948. max observed lag 459605 records, most recent lag 313788 records
C> after 117.025s sent 20000000 records, and ingested 19764390. max observed lag 459605 records, most recent lag 235610 records
C> Finished after 117.918s sent and ingested 20000000 records. max observed lag 459605 records.

(note that this lag corresponds to being ~2.25 seconds behind), using 4.35GiB RAM.
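
The "seconds behind" figures quoted for these runs follow from dividing the max observed lag by the target send rate. A small sketch of that arithmetic, assuming the producer actually held the target rate (the logs show this is only approximately true); `lag_seconds` is an illustrative name:

```python
def lag_seconds(max_lag_records: int, records_per_second: int) -> float:
    """Convert a lag measured in records into seconds of input at the target rate."""
    return max_lag_records / records_per_second


# Max observed lag from the persistence-enabled run at 100k records/s.
persisted = lag_seconds(4_876_299, 100_000)
# Max observed lag from the run without persistence at 200k records/s.
unpersisted = lag_seconds(459_605, 200_000)
```

This gives roughly 48.8 seconds for the persisted run and roughly 2.3 seconds for the unpersisted one, in line with the approximate figures given in the comment.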

@ruchirK
Contributor Author

ruchirK commented Jan 13, 2022

Other notes:

  • for the actual release tests, kgen should run on a different box than the mz instance, as the increased resource utilization of Materialize with persistence affects its ability to generate load in a timely manner. Will handle this in a separate PR when I plug this benchmark into the release benchmarks
  • kgen taps out at ~240k QPS on a r5a.4xlarge box (also running single worker MZ without persistence). Concretely, it took 78 seconds to produce 18.6MM messages. If we want to really saturate MZ with 4+ workers in short timescales (like 100 seconds) we'll need more power than that. I believe for now we're more interested in how materialize behaves over hours, and I think a slower ingest rate is sufficient to test interesting cases for that.

@danhhz
Contributor

danhhz commented Jan 13, 2022

This is really cool! I like how it turned out. I haven't looked at the code yet, but I'll do that today

You've mentioned docker stats in a couple places in slack, IMO let's hold off on doing anything with that for V1 of this. I'd much rather just hook it up to prometheus+grafana like the rest of the release benchmarks to start. We can always get that in async (and I'll want grafana even if we have the docker stats stuff).

@danhhz
Contributor

danhhz commented Jan 13, 2022

As for how we tune this in the release benchmark, there are sort of 2 major things we could be investigating wrt the diff between persist on and off. One is a workload they both can handle relatively easily and looking at the diff in cpu and memory usage. The other is the max sustained rate they each can handle on the same hardware. We should discuss in the sync which we want, but my inclination is to start with the former. In my experience, behavior under near failure conditions is its own set of issues and the steady state differences tend to be an easier place to start.

@ruchirK
Contributor Author

ruchirK commented Jan 13, 2022

Ack on docker stats. I'm not attached to it, it just seemed like a portable way to grab the requested cpu/memory utilization

Re: what to run in release tests -- yeah that makes sense to me. I was thinking something like 10k qps running over the order of hours?

I think we should test at max capacity manually maybe once a week or something like that and track results in a spreadsheet

@danhhz
Contributor

danhhz commented Jan 13, 2022

I very much like the docker stats idea in that it's something we can stick in the benchmark output and copy paste for easy comparison. I just wanted to be clear that even if we have docker stats, I'll still want grafana and that if we have grafana, we don't need docker stats to start :)

Re: what to run in release tests -- yeah that makes sense to me. I was thinking something like 10k qps running over the order of hours?

I think we should test at max capacity manually maybe once a week or something like that and track results in a spreadsheet

These sound good to me modulo bikeshedding the cadence of the latter. My gut says weekly will be too frequent for something manual. Let's punt on that bikeshed for now though.

Contributor

@danhhz danhhz left a comment

LGTM! you might want someone familiar with python to stamp this too. i skimmed over the pg8000 usage, for example


> CREATE SOURCE load_test
FROM KAFKA BROKER '${testdrive.kafka-addr}' TOPIC 'testdrive-load-test-${testdrive.seed}'
Contributor

should we hardcode -1 here to match the hardcoded value in send_records?

Contributor Author

No - I think it's better to keep the knowledge that the seed is hardcoded to 1 contained within mzcompose.py, so that the testdrive file doesn't have to change if its user changes their behavior.

Contributor

sorry, I don't follow. isn't this the only user of this script?

Contributor Author

Yes, that's true, but sometimes I'll use the testdrive file within a workflow to manually do what the workflow is doing. I've definitely done that in the past with the exactly-once-sinks test, and I could see doing something like this in order to measure mz with instruments on my laptop. When running in that kind of environment, it's nice to be able to change the seed so you don't have to tear down the kafka topic as you iterate on stuff.

Contributor Author


Definitely a niche use case. My second thought here is that using the seed is idiomatic testdrive usage, and it's actually a hack that we hardcode the seed in mzcompose.py; perhaps that code should instead invoke testdrive with consistent_seed=True and have a method that lets someone get the seed. I think the second thought is more convincing.

Contributor


Tangentially related: I'd like to very soon move this sort of stuff out of testdrive entirely. It should be possible to write this all in the Python workflow file. E.g.:

kafka = c.kafka_client("kafka")
kafka.create_topic(topic="foo", partitions=4)
c.sql("CREATE SOURCE...")
c.sleep(5)
c.sql("...")

Then you wouldn't need to worry about the seed at all.

Comment thread test/kafka-ingest-open-loop/mzcompose.py Outdated
        if row is None or len(row) != 1 or row[0] is None:
            return 0
        return int(row[0])
    except Exception as e:
Contributor


except Exception is bad practice because it'll catch all sorts of things besides database errors. Can you figure out what more specific exception pg8000 is raising and catch that instead?

Contributor Author


done. it was an InterfaceError.

I had to import it like this:

from pg8000 import InterfaceError # type: ignore

to make pycheck happy.
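
For illustration, a minimal sketch of the narrowed exception handling discussed in this thread. The helper name `read_count` and its `fetch_row` callable are hypothetical stand-ins for the cursor query in mzcompose.py, and the fallback class exists only so the sketch runs where pg8000 is not installed:

```python
try:
    from pg8000 import InterfaceError  # type: ignore
except ImportError:
    # Assumption: stand-in class so this sketch runs without pg8000.
    class InterfaceError(Exception):
        pass


def read_count(fetch_row):
    """Return the single integer produced by fetch_row(), or 0.

    fetch_row is a hypothetical callable standing in for the cursor query.
    """
    try:
        row = fetch_row()
    except InterfaceError:
        # Only the database-interface failure is treated as "no data yet";
        # any other exception (for example a programming bug) propagates.
        return 0
    if row is None or len(row) != 1 or row[0] is None:
        return 0
    return int(row[0])
```

With this shape, an unrelated error such as a `ValueError` raised while fetching is no longer swallowed, which is the concern raised about `except Exception`.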

Comment thread test/kafka-ingest-open-loop/mzcompose.py Outdated
Comment thread test/kafka-ingest-open-loop/mzcompose.py
conn.autocommit = True

cursor = conn.cursor()
cursor.execute("SELECT * FROM load_test_materialization_frontier")
Contributor


I think you should be able to replace most of this with Composition.sql if you change that method to return the cursor. (Or add a new method like sql_cursor that returns a ready-to-use cursor, and use it in Composition.sql and here.)

Contributor Author


done. added sql_cursor() in a separate commit.
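
A rough sketch of the refactoring suggested in this thread, not the code that was actually merged. The class name `CompositionSketch` is hypothetical, and the standard-library `sqlite3` module is swapped in for pg8000 so the example is self-contained; the real `Composition` connects to Materialize instead:

```python
import sqlite3


class CompositionSketch:
    """Hypothetical stand-in for Composition; sqlite3 replaces pg8000 here."""

    def __init__(self) -> None:
        # isolation_level=None puts sqlite3 in autocommit mode, mirroring
        # `conn.autocommit = True` in the snippet under review.
        self._conn = sqlite3.connect(":memory:", isolation_level=None)

    def sql_cursor(self) -> sqlite3.Cursor:
        """Return a ready-to-use cursor for callers that need query results."""
        return self._conn.cursor()

    def sql(self, statement: str) -> None:
        """Run a statement and discard any result, built on sql_cursor()."""
        cursor = self.sql_cursor()
        cursor.execute(statement)
        cursor.close()
```

Callers that only issue statements keep using `sql`, while code that needs rows (such as the frontier query above) asks for a cursor via `sql_cursor` and calls `fetchone()` on it.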

@ruchirK ruchirK force-pushed the persist-cloud-load-test branch from f2c648d to 1367da5 Compare January 14, 2022 20:39
@philip-stoev philip-stoev removed their request for review January 17, 2022 10:00
@ruchirK ruchirK force-pushed the persist-cloud-load-test branch from 1367da5 to 0396836 Compare January 18, 2022 16:52
This commit adds a dedicated kafka upsert benchmark using the new mzworkflows
framework but not within the feature-benchmarks framework.

This benchmark is an open loop benchmark that takes as its main arguments
- a desired QPS rate for messages to send to Kafka
- the number of seconds to write for.

It's open loop in the sense that the rate of message generation / insertion
to Kafka is decoupled as much as possible from the rate at which messages are
ingested into Materialize. We intentionally don't wait for Materialize to catch
up to the last set of messages before sending the next set of messages.

The benchmark allows for flexibility in the following other dimensions:
- key cardinality
- value size
- whether or not persistence is enabled

Given these inputs, the benchmark sends approximately `records_per_second`
messages to Kafka per second and corrects for delays caused by slow input generation
or slow querying. It also queries Materialize to track how many records Materialize
has ingested, and reports the lag relative to the number of messages already sent.

The benchmark currently doesn't support:
- data formats other than bytes
- specifying the number of partitions (currently hardcoded to 4)
- specifying the number of materialize workers
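
The open-loop pacing described in the commit message above can be sketched roughly as follows. This is an illustrative sketch, not the benchmark's code: `run_open_loop`, the `query_ingested` callable, the signature given to `send_records`, and the injectable `now`/`sleep` parameters are all assumptions; only `records_per_second` and the general behavior come from the description.

```python
import time
from typing import Callable, List, Tuple


def run_open_loop(
    records_per_second: int,
    num_seconds: int,
    send_records: Callable[[int], None],
    query_ingested: Callable[[], int],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, int]]:
    """Send at a fixed target rate without waiting for ingestion to catch up.

    Returns one (sent, lag) pair per iteration, where lag = sent - ingested.
    """
    start = now()
    total = records_per_second * num_seconds
    sent = 0
    report: List[Tuple[int, int]] = []
    while sent < total:
        elapsed = now() - start
        # The target depends on wall-clock time, not on what was sent so far,
        # so a slow previous iteration is corrected by a larger batch here.
        target = min(total, (int(elapsed) + 1) * records_per_second)
        batch = target - sent
        if batch > 0:
            send_records(batch)
            sent = target
        # Observe ingestion progress, but never block on it (open loop).
        lag = sent - query_ingested()
        report.append((sent, lag))
        # Sleep only until the next whole-second boundary, if it is still ahead.
        remaining = (start + int(elapsed) + 1) - now()
        if remaining > 0:
            sleep(remaining)
    return report
```

Passing the clock and sleep functions in makes the pacing logic testable without real delays; a real run would simply rely on the defaults.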
@ruchirK ruchirK force-pushed the persist-cloud-load-test branch from 0396836 to 9a0fbd5 Compare January 18, 2022 19:24
@ruchirK
Contributor Author

ruchirK commented Jan 18, 2022

TFTR! merging on green

@ruchirK ruchirK enabled auto-merge January 18, 2022 19:31
@ruchirK ruchirK merged commit f2367e6 into MaterializeInc:main Jan 18, 2022

4 participants