Skip to content

[AIRFLOW-3885] Improve Travis CI cycle time#4703

Closed
drewsonne wants to merge 2 commits into
apache:masterfrom
drewsonne:feature/nose-fast-fail
Closed

[AIRFLOW-3885] Improve Travis CI cycle time#4703
drewsonne wants to merge 2 commits into
apache:masterfrom
drewsonne:feature/nose-fast-fail

Conversation

@drewsonne

@drewsonne drewsonne commented Feb 13, 2019

Copy link
Copy Markdown
Contributor

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

  • Remove the "install" action on the "pre-test" stage to avoid performing lengthy Docker pulls to perform pre-checks

  • Set nosetests to return on any test failures

    • Given the lengthy runtime of the airflow CI test suites, if any tests fail, we should fail immediately and return the failed test locally. Users can run the tests locally to get full lists of failed tests.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    This change pre-test changes

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

No changes to documented code

Code Quality

  • Passes flake8

@drewsonne drewsonne force-pushed the feature/nose-fast-fail branch 4 times, most recently from 7cc6f40 to f2aa084 Compare February 13, 2019 08:21
@BasPH

BasPH commented Feb 13, 2019

Copy link
Copy Markdown
Contributor

In my experience stopping a CI on first error leads to people fixing a single test and resubmitting, in the end resulting in more CI runs. I fully agree the CI time is beyond anything normal, but not sure this is the way to go.

@drewsonne drewsonne force-pushed the feature/nose-fast-fail branch from f2aa084 to 977af3d Compare February 13, 2019 19:32
@codecov-io

codecov-io commented Feb 13, 2019

Copy link
Copy Markdown

Codecov Report

Merging #4703 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4703      +/-   ##
==========================================
- Coverage   79.12%   79.12%   -0.01%     
==========================================
  Files         489      489              
  Lines       30688    30688              
==========================================
- Hits        24282    24281       -1     
- Misses       6406     6407       +1
Impacted Files Coverage Δ
airflow/models/taskinstance.py 93.02% <0%> (-0.17%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6afb12f...038f780. Read the comment docs.

@drewsonne

drewsonne commented Feb 14, 2019

Copy link
Copy Markdown
Contributor Author

@BasPH I would guess the amount of time taken to run the entire test suite, fail, run subsequent tests, and return fail information back to the developer would still be shorter in aggregate, even if each test run only failed on the first test case.

I'm not disagreeing with your suggestion, but I'm not seeing another way out at the moment.

That said... py27 won't be supported at the end of the year, so maybe time to stop thinking about supporting it, and then we can remove those py27 test runs ^_^.

Comment thread scripts/ci/5-run-tests.sh Outdated
@BasPH

BasPH commented Feb 14, 2019

Copy link
Copy Markdown
Contributor

+1 for Python3 only, although some Airflow users still run on Python2.7 so unfortunately I'm pretty certain it has to last until end of life...

Some other suggestions (all take quite some work):

  • Split Airflow into core & extras (all operators, hooks, etc.) so we can test separate
  • Remove Python2
  • Create pre-built test environments (unsure how Docker and Travis work together?) to avoid installing all CI dependencies every single run
  • Make sensible decision about which tests should be ran against multiple database types and which are okay to be tested only once

I hope some day there's time for all this 😛

@drewsonne

Copy link
Copy Markdown
Contributor Author

@ffinfo @BasPH I would suggest that the single test fix per run is happening anyway in the PR’s.

Are you guys open to at least trying it for a while and seeing if it makes things better?

Sent with GitHawk

@drewsonne

drewsonne commented Feb 14, 2019

Copy link
Copy Markdown
Contributor Author

@BasPH as you say, each will take a long time. And the long cycle will make them longer as well. It’s a chicken and egg problem. :-)

I struggle with getting work done on airflow due to the difficulty in setting up a dev environment. I still only half have a docker env like the one in the test env’s, so I end up relying on the test cases running in travis. Furthermore reading the logs to find which tests failed takes a non trivial amount of time, and then trying to figure out what more elements of the env setup I need to run a particular test take more time.

The main drive of the work that I’m doing is towards your first suggestion. I’m working on the plugin architecture to split the airflow code into modular sections to let us split it out. In the meantime, it would be nice not to have to wait 2 hours if my tests passed. :-)

Sent with GitHawk

@ashb

ashb commented Feb 14, 2019

Copy link
Copy Markdown
Member

I'll raise/amend my AIP-3: Dropping support for Python2 to be "Drop official support for Python2 in Airflow 2.0". 2.0 seems like a nice time to do this.

@drewsonne

Copy link
Copy Markdown
Contributor Author

@ashb Is airflow 2.0 slate before end of the year?

@ashb

ashb commented Feb 14, 2019

Copy link
Copy Markdown
Member

There is currently no planned/talked about release date for 2.0. It's possible/likely to be before end of year

@mik-laj

mik-laj commented Feb 15, 2019

Copy link
Copy Markdown
Member
  • Make sensible decision about which tests should be ran against multiple database types and which are okay to be tested only once

@BasPH It's a brillant idea. Most operators, hooks, etc do not require a database. All operators for GCP also count in this. We can divide the tests into two categories: core and external integration (Not to be confused with integration tests). Dividing will be easy because we look at the structure of directories. Some operators that can not do without a database will be markers as core tests, but it will be information that they are written incorrectly. In the future, we will be able to improve them..

I propose from the creation of the AIrflowTestCase/CoreAirflowTestCase class. CoreAIrflowTestCase will run external integration tests only when the AIRFLOW_TEST_SUBSET environment variable equals to "core".

Another idea is to use the decorator, but I think that such a class would be needed anyway. Another idea is to use a decorator, but I think that such a class would be useful anyway. I do not like the idea that every test would have to have a decorator, or run too many times. The idea that all tests have a parent in common sounds logical to me.

@drewsonne

Copy link
Copy Markdown
Contributor Author

@ffinfo @BasPH @ashb any issues with the remaining changes in this PR?

@mik-laj

mik-laj commented Feb 17, 2019

Copy link
Copy Markdown
Member

Not about this PR, but generally.
We have 2340 tests in total. These tests take 1667 seconds
We have a lot of tests that take a long time. 319 tests take over 1 second. In total, these tests take 900 seconds.
13 percent of tests of the longest running tests take almost 50 percent of all tests.
From what I noticed, one of the problems is time.sleep command in the tests.

@drewsonne

Copy link
Copy Markdown
Contributor Author

@mik-laj Did you happen to keep a list of those tests?

@mik-laj

mik-laj commented Feb 18, 2019

Copy link
Copy Markdown
Member

@drewsonne Travis has this list.
CI Job: https://travis-ci.org/apache/airflow/jobs/493174627
Start line: 5755
Someone has noticed my comment and is already improving something
#4726
I have private PR: mik-laj#1 I work there when I have free moments over it

@drewsonne drewsonne force-pushed the feature/nose-fast-fail branch 2 times, most recently from ef164cd to 856b48f Compare February 19, 2019 07:14
@Fokko

Fokko commented Feb 19, 2019

Copy link
Copy Markdown
Contributor

I would like to release Airflow 2.0 this year. And I agree with Bas that stopping the CI after the first failed tests is a bit wasteful. Since the environment is already built and up and running, lets run the full test suite.

@drewsonne

drewsonne commented Feb 21, 2019

Copy link
Copy Markdown
Contributor Author

@Fokko fair enough! I've updated the PR not to stop on failed errors and to override the install actions in travis, to we don't do the docker pulls in every build.

@mik-laj has some of the same changes from this PR in #4740 as well

@drewsonne drewsonne force-pushed the feature/nose-fast-fail branch 3 times, most recently from f7f94d4 to 005f1ca Compare February 24, 2019 08:40
@drewsonne drewsonne force-pushed the feature/nose-fast-fail branch from 005f1ca to 55aef3b Compare March 7, 2019 19:26
@drewsonne drewsonne force-pushed the feature/nose-fast-fail branch from 55aef3b to d575d60 Compare April 3, 2019 05:59

acquire_rat_jar () {
declare -r TMP_DIR=/tmp
declare -r rat_jar="${TMP_DIR}"/lib/apache-rat-${RAT_VERSION}.jar

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe double quote after TMP_DIR var should be at the end of the line.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's perhaps more conventional there but this works fine

$ echo "$HOME"
/Users/ash
$ echo "$HOME"def
/Users/ashdef

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm aware it works, but looking at the fact that there is furhter on ${RAT_VERSION} I would rather expect the whole declare -r rat_jar= to be double quoted.

@drewsonne

Copy link
Copy Markdown
Contributor Author

@ashb @ffinfo any other changes?

@stale

stale Bot commented Sep 3, 2019

Copy link
Copy Markdown

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale Bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 3, 2019
@stale stale Bot closed this Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants