[ci] Remove hardcoded test shards #10743
areusch merged 1 commit into apache:main from driazati:generate_shard
{% macro sharded_test_step(name, num_shards, node, ws) %}
{% for shard_index in range(1, num_shards + 1) %}
'{{ name }} {{ shard_index }} of {{ num_shards }}': {
minor nit: want to 0-pad shard_index and num_shards here?
meh, this is intended for humans and everything is text-align: center-ed anyways so 0-padding won't make it easier to read IMO (also we are at like 3 shards max not 10 so we can revisit later)
ok sg. the main intent was that Jenkins sorts these alphabetically iiuc.
ah I see, we should still be ok on that front but would probably need to pad if we shard more than n=9
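To make the sorting concern concrete, here is a minimal sketch in Python. It assumes stage names are ordered by plain lexicographic string comparison, which is how the comments above understand Jenkins to behave; `stage_name` and `sorted_names` are hypothetical helpers, not part of the template.

```python
from typing import List


def stage_name(name: str, shard_index: int, num_shards: int, pad: bool) -> str:
    """Build a label shaped like '<name> <shard_index> of <num_shards>'."""
    if pad:
        # Pad the index to the width of the largest shard number.
        width = len(str(num_shards))
        return f"{name} {shard_index:0{width}d} of {num_shards}"
    return f"{name} {shard_index} of {num_shards}"


def sorted_names(num_shards: int, pad: bool) -> List[str]:
    """Return the labels in lexicographic order (assumed sort order)."""
    names = [stage_name("test", i, num_shards, pad) for i in range(1, num_shards + 1)]
    return sorted(names)


if __name__ == "__main__":
    # With 10 shards and no padding, 'test 10 of 10' sorts before 'test 2 of 10'.
    print(sorted_names(10, pad=False)[:3])
    # With padding, lexicographic order matches numeric order.
    print(sorted_names(10, pad=True)[:3])
```

With 9 or fewer shards the unpadded names already sort in numeric order, which matches the "pad if we shard more than n=9" conclusion.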
This moves the sharding logic from being inlined in the Jenkinsfile to templated, so we can change just the number of shards and the test allocation in `conftest.py` and the Jenkinsfile will work to match. This also changes the test allocation from a manual balancing before to be random between shards. Each shard needs to know only its shard number and the total number of shards, then it hashes each test and skips it unless that hash falls within its allocated tests. This breaks up related tests across shards but has the downside that any change to the number of shards will shuffle around where the tests end up (but ideally this is rare as we settle on a good number of shards to use). This only does this for the GPU frontend tests but eventually we could expand it to more.
I also compared the tests run across the various shards against the tests from a recent main run; the overall set of tests run is the same, so the sharding is working correctly. Code is here https://gist.github.com/f636700cd68b5717350c107a0eaaee4e for the curious.
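A check of this kind could look roughly like the sketch below. This is not the code from the gist; `compare_shards_to_baseline` and its inputs are hypothetical, and it assumes the test IDs for each shard and for the baseline run have already been extracted (for example from the test reports).

```python
from collections import Counter
from typing import Dict, Iterable, List


def compare_shards_to_baseline(
    shard_tests: Dict[int, Iterable[str]], baseline: Iterable[str]
) -> Dict[str, List[str]]:
    """Compare the union of tests run across shards with a baseline run."""
    # Count how many shards ran each test ID.
    counts = Counter(t for tests in shard_tests.values() for t in tests)
    baseline_set = set(baseline)
    ran = set(counts)
    return {
        # In the baseline but not run by any shard.
        "missing": sorted(baseline_set - ran),
        # Run by some shard but absent from the baseline.
        "extra": sorted(ran - baseline_set),
        # Run by more than one shard.
        "duplicated": sorted(t for t, n in counts.items() if n > 1),
    }


if __name__ == "__main__":
    shards = {0: ["a::t1", "b::t2"], 1: ["c::t3"]}
    print(compare_shards_to_baseline(shards, ["a::t1", "b::t2", "c::t3"]))
```

All three lists being empty means the shards together ran exactly the baseline set, each test once.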
def pytest_collection_modifyitems(config, items):
    if not all(k in os.environ for k in ["CI", "TVM_NUM_SHARDS", "TVM_SHARD_INDEX"]):
        # Only apportion tests if in CI and in a job that is set up for it
nit: could log here if CI is present
    "tests/python/topi/python/test_topi_conv2d_winograd.py::test_conv2d_nchw",
    "tests/python/relay/test_py_converter.py::test_global_recursion",
]
HARDCODED_ALLOCATIONS = {}
nit: could do with dict comprehension: HARDCODED_ALLOCATIONS = {i: v for i, v in enumerate(_slowest_tests)}
Co-authored-by: driazati <driazati@users.noreply.github.com>
This moves the sharding logic from being inlined in the Jenkinsfile to templated, so we can change just the number of shards and the test allocation in `conftest.py` and the Jenkinsfile will work to match. This also changes the test allocation from a manual balancing before to be random between shards. Each shard needs to know only its shard number and the total number of shards, then it hashes each test and skips it unless that hash falls within its allocated tests. This breaks up related tests across shards but has the downside that any change to the number of shards will shuffle around where the tests end up (but ideally this is rare as we settle on a good number of shards to use). Some tests are also manually allocated via round-robin to different shards to ensure that long-running tests run on different shards as much as possible. This only does this for the GPU frontend tests but eventually we could expand it to more.
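The allocation described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses SHA-256 as a stable hash (Python's built-in `hash()` for strings varies between processes unless `PYTHONHASHSEED` is fixed), treats shard indices as 0-based, and `shard_for` and `select_for_shard` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


def shard_for(
    node_id: str, num_shards: int, hardcoded: Optional[Dict[str, int]] = None
) -> int:
    """Return the 0-based shard (assumed convention) that should run node_id."""
    if hardcoded and node_id in hardcoded:
        # Manually allocated tests bypass the hash.
        return hardcoded[node_id] % num_shards
    # A stable digest so every shard computes the same answer.
    digest = hashlib.sha256(node_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def select_for_shard(
    node_ids: Iterable[str],
    shard_index: int,
    num_shards: int,
    hardcoded: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Keep only the tests that belong to shard_index; the rest are skipped."""
    return [n for n in node_ids if shard_for(n, num_shards, hardcoded) == shard_index]


if __name__ == "__main__":
    tests = [f"tests/python/example/test_mod.py::test_{i}" for i in range(10)]
    for index in range(3):
        print(index, len(select_for_shard(tests, index, 3)))
```

Because the shard depends on `num_shards` through the modulo, changing the shard count reshuffles most tests, which is the downside noted in the description. In a `pytest_collection_modifyitems` hook, the tests not selected for the current shard could be marked with `item.add_marker(pytest.mark.skip(reason=...))`.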
cc @areusch