[CI] Retry headers check and threading tests in case of failure. #2982

Merged
serban-nicusor-toptal merged 1 commit into develop from ci-retry-stages
Dec 17, 2023

@serban-nicusor-toptal
Contributor

Summary

Retry headers check and threading tests in case of failure during CI.

Tests

Side Effects

Are there any side effects that we should be aware of?
No

Release notes

Checklist

  • Copyright holder: (fill in copyright holder information)

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@serban-nicusor-toptal
Contributor Author

Hey @WardBrian, the change for the headers check is quite straightforward: just retry once.
About your earlier question:

The other thing that sometimes randomly fails is the thread tests sometimes don't build.
If this is due to some sort of runner incompatibility, will the retry pick up the same runner again?

The retry step in its simplest form seems to handle simple blocks, though it also seems possible to use it for a stage-wide failure and retry: https://community.jenkins.io/t/how-to-retry-a-jenkins-pipeline-stage-with-an-agent-condition/3667/4
TL;DR: detect agent issues and, if that's the case, find a new agent and try again.

Now, while that's possible, I don't think it's the best fit for our case. Let me explain.
We leverage Docker to ship our CI images with all the dependencies, so regardless of the agent we always run the same thing. The only case where differences can occur is when the code depends on the kernel or on low-level CPU instructions (Docker containers share the host kernel). Remember the issue we had at the beginning with Flatiron, where an Intel CPU did not support some CPU instructions? (Or was it a GPU?)

Knowing this, I think it would be simpler and more straightforward to detect which host the threading tests fail on and exclude it with an agent label condition, so the tests always run on a valid one.
Could these threading failures occur because of two concurrent runs on the same host, e.g. a develop build running at the same time as a PR build?
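
For illustration only (this is not part of the PR's diff): a minimal sketch of what excluding a host through an agent label expression could look like in a declarative Jenkinsfile. The image name and both labels (`linux`, `flaky-host`) are hypothetical placeholders, and the test command is borrowed from the checklist above rather than from the actual threading stage.

```groovy
// Sketch under stated assumptions: the label names and image are hypothetical.
pipeline {
    agent none
    stages {
        stage('Threading tests') {
            agent {
                docker {
                    // Hypothetical CI image name, not the project's real image.
                    image 'example/ci-image:latest'
                    // Jenkins label expressions support &&, || and !,
                    // so this picks any 'linux' node NOT labelled 'flaky-host'.
                    label 'linux && !flaky-host'
                }
            }
            steps {
                // Placeholder command; the real threading test invocation
                // is not shown in this conversation.
                sh './runTests.py test/unit'
            }
        }
    }
}
```

This only helps if the failing host can be identified and labelled first; it does nothing for failures caused by concurrent builds on an otherwise healthy host.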

Following that train of thought, I put it in a simple retry block for now. Please let me know what you think. Thanks!
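
As a rough sketch of the "simple retry block" idea (assumed shape, not copied from the merged commit): the Jenkins `retry` step wraps a block and reruns it if it throws. The count is the total number of attempts, so `retry(2)` corresponds to "retry once". The stage name and the use of `agent any` are assumptions for the example; `make test-headers` is the command listed in the checklist above.

```groovy
// Sketch only: stage name and agent are assumptions for illustration.
pipeline {
    agent any
    stages {
        stage('Headers check') {
            steps {
                // Up to 2 attempts in total: the first run plus one retry.
                // If the second attempt also fails, the stage fails as usual.
                retry(2) {
                    sh 'make test-headers'
                }
            }
        }
    }
}
```

A plain `retry` reruns the block on the same agent, which is why it fits transient failures better than host-specific ones.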

Member

@syclik syclik left a comment

LGTM

@serban-nicusor-toptal serban-nicusor-toptal merged commit 56cc817 into develop Dec 17, 2023
@WardBrian WardBrian deleted the ci-retry-stages branch August 5, 2024 00:15