[ci] Remove hardcoded test shards#10743

Merged
areusch merged 1 commit into apache:main from driazati:generate_shard
Apr 8, 2022

Conversation


@driazati driazati commented Mar 23, 2022

This moves the sharding logic from being inlined in the Jenkinsfile into a template, so we can change just the number of shards and the test allocation in conftest.py and the Jenkinsfile will be generated to match. This also changes the test allocation from manual balancing to a pseudo-random assignment between shards. Each shard needs to know only its shard number and the total number of shards; it then hashes each test and skips it unless that hash falls within its allocated range. This breaks up related tests across shards, but has the downside that any change to the number of shards will shuffle where the tests end up (ideally this is rare once we settle on a good number of shards). Some tests are also manually allocated round-robin to different shards to ensure that long-running tests run on different shards as much as possible.

This only does this for the GPU frontend tests but eventually we could expand it to more.

cc @areusch

@driazati driazati changed the title generate shard [ci] Remove hardcoded test shards Mar 23, 2022
@driazati driazati marked this pull request as ready for review March 25, 2022 23:00
@github-actions github-actions Bot requested a review from areusch March 25, 2022 23:03
@driazati driazati marked this pull request as draft March 29, 2022 20:25
@areusch areusch left a comment (Contributor)
thanks @driazati this looks pretty good! just a couple small suggestions, could also defer if you feel strongly against them.

Comment thread jenkins/macros.j2

{% macro sharded_test_step(name, num_shards, node, ws) %}
{% for shard_index in range(1, num_shards + 1) %}
'{{ name }} {{ shard_index }} of {{ num_shards }}': {
@areusch (Contributor)

minor nit: want to 0-pad shard_index and num_shards here?

@driazati (Member Author)

meh, this is intended for humans and everything is text-align: center-ed anyways so 0-padding won't make it easier to read IMO (also we are at like 3 shards max not 10 so we can revisit later)

@areusch (Contributor)

ok sg. the main intent was that Jenkins sorts these alphabetically iiuc.

@driazati (Member Author)

ah I see, we should still be ok on that front but would probably need to pad if we shard more than n=9
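If the shard count ever grows past 9, zero-padding the labels would keep Jenkins' alphabetical ordering numeric. A hypothetical sketch of such label formatting (not part of the PR):

```python
def shard_label(name: str, shard_index: int, num_shards: int) -> str:
    """Zero-pad the shard number so lexicographic sort matches numeric order."""
    width = len(str(num_shards))  # pad to the width of the largest shard number
    return f"{name} {shard_index:0{width}d} of {num_shards}"

labels = [shard_label("frontend: GPU", i, 12) for i in range(1, 13)]
# With padding, "02" sorts before "10"; without it, "10" would sort before "2".
assert labels == sorted(labels)
```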

Comment thread jenkins/macros.j2 Outdated
@driazati driazati marked this pull request as ready for review April 6, 2022 18:40
@driazati driazati requested a review from areusch April 6, 2022 18:40
@driazati driazati marked this pull request as draft April 7, 2022 19:33
@driazati driazati marked this pull request as ready for review April 8, 2022 00:58
@driazati (Member Author)

driazati commented Apr 8, 2022

I also verified the tests run in the various shards against the tests from a recent main run; the overall set of tests run is the same, so the sharding is working correctly. Code is here https://gist.github.com/f636700cd68b5717350c107a0eaaee4e for the curious.
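A comparison like that can be sketched as: collect the test set assigned to each shard, then check the shards are disjoint and together cover the full unsharded run. The names here are assumptions, not the gist's code:

```python
import hashlib

def find_shard_index(test_name: str, num_shards: int) -> int:
    """Stable hash-based shard assignment (same scheme as the sharding itself)."""
    digest = hashlib.sha256(test_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def verify_sharding(all_tests, num_shards):
    """Return per-shard test sets, asserting they partition the full set."""
    shards = [
        {t for t in all_tests if find_shard_index(t, num_shards) == i}
        for i in range(num_shards)
    ]
    # Every test runs exactly once: shards cover the full set with no overlap.
    assert set().union(*shards) == set(all_tests)
    assert sum(len(s) for s in shards) == len(all_tests)
    return shards
```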

@areusch areusch left a comment (Contributor)

thanks @driazati , couple small comments here and there but could defer them as they're less important than reducing CI time

Comment thread conftest.py

def pytest_collection_modifyitems(config, items):
if not all(k in os.environ for k in ["CI", "TVM_NUM_SHARDS", "TVM_SHARD_INDEX"]):
# Only apportion tests if in CI and in a job that is set up for it
@areusch (Contributor)

nit: could log here if CI is present
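That gate plus the suggested logging could look like the following sketch. The helper name and message wording are assumptions, not the PR's code:

```python
import logging
import os

def should_apportion_tests(environ=os.environ) -> bool:
    """Only shard when CI and both sharding variables are present."""
    required = ["CI", "TVM_NUM_SHARDS", "TVM_SHARD_INDEX"]
    missing = [k for k in required if k not in environ]
    if not missing:
        return True
    if "CI" in environ:
        # Surface partially configured CI jobs instead of silently
        # running the full, unsharded test suite.
        logging.info("Not sharding tests: missing env vars %s", missing)
    return False
```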

Comment thread conftest.py
"tests/python/topi/python/test_topi_conv2d_winograd.py::test_conv2d_nchw",
"tests/python/relay/test_py_converter.py::test_global_recursion",
]
HARDCODED_ALLOCATIONS = {}
@areusch (Contributor)

nit: could do with dict comprehension: HARDCODED_ALLOCATIONS = {i: v for i, v in enumerate(_slowest_tests)}

@areusch areusch merged commit 0c17f07 into apache:main Apr 8, 2022
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 11, 2022
altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022