Skip to content

[AIRFLOW-3862] Check types with mypy.#4685

Merged
ashb merged 1 commit into
apache:masterfrom
jmcarp:mypy
Mar 15, 2019
Merged

[AIRFLOW-3862] Check types with mypy.#4685
ashb merged 1 commit into
apache:masterfrom
jmcarp:mypy

Conversation

@jmcarp

@jmcarp jmcarp commented Feb 10, 2019

Copy link
Copy Markdown
Contributor

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Code Quality

  • Passes flake8

@codecov-io

Copy link
Copy Markdown

Codecov Report

Merging #4685 into master will increase coverage by 0.05%.
The diff coverage is 91.86%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #4685      +/-   ##
=========================================
+ Coverage   74.35%   74.4%   +0.05%     
=========================================
  Files         430     430              
  Lines       27962   27987      +25     
=========================================
+ Hits        20790   20823      +33     
+ Misses       7172    7164       -8
Impacted Files Coverage Δ
...rflow/contrib/operators/kubernetes_pod_operator.py 100% <ø> (ø) ⬆️
airflow/contrib/sensors/python_sensor.py 85% <ø> (-0.72%) ⬇️
airflow/operators/dummy_operator.py 100% <ø> (ø) ⬆️
airflow/operators/subdag_operator.py 90% <ø> (-0.33%) ⬇️
airflow/operators/python_operator.py 95.23% <ø> (-0.03%) ⬇️
airflow/operators/dagrun_operator.py 94.59% <ø> (-0.15%) ⬇️
airflow/contrib/executors/mesos_executor.py 0% <0%> (ø) ⬆️
...low/contrib/example_dags/example_winrm_operator.py 0% <0%> (ø) ⬆️
...contrib/example_dags/example_gcs_to_bq_operator.py 0% <0%> (ø) ⬆️
airflow/operators/check_operator.py 58.59% <100%> (+0.32%) ⬆️
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59d2615...de913b8. Read the comment docs.

@codecov-io

codecov-io commented Feb 10, 2019

Copy link
Copy Markdown

Codecov Report

Merging #4685 into master will increase coverage by 0.87%.
The diff coverage is 93.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4685      +/-   ##
==========================================
+ Coverage   74.45%   75.33%   +0.87%     
==========================================
  Files         450      450              
  Lines       29023    29056      +33     
==========================================
+ Hits        21609    21888     +279     
+ Misses       7414     7168     -246
Impacted Files Coverage Δ
airflow/contrib/sensors/python_sensor.py 85% <ø> (-0.72%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 98.61% <ø> (-0.06%) ⬇️
airflow/operators/dummy_operator.py 100% <ø> (ø) ⬆️
airflow/operators/subdag_operator.py 90% <ø> (-0.33%) ⬇️
airflow/operators/python_operator.py 95.8% <ø> (-0.03%) ⬇️
airflow/operators/dagrun_operator.py 94.59% <ø> (-0.15%) ⬇️
...low/contrib/example_dags/example_winrm_operator.py 0% <0%> (ø) ⬆️
...contrib/example_dags/example_gcs_to_bq_operator.py 0% <0%> (ø) ⬆️
airflow/www/views.py 76.4% <0%> (+0.27%) ⬆️
airflow/operators/check_operator.py 60.48% <100%> (+0.32%) ⬆️
... and 53 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e220ac5...6072d5f. Read the comment docs.

Comment thread airflow/bin/cli.py Outdated
Comment thread airflow/config_templates/airflow_local_settings.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to stay, otherwise the example dags with only a sub-set of extras installs gets nosiy and confusing on the CLI.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do this on every operator? :(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we need to set types explicitly for classes that we subclass elsewhere. Defining template_fields here gives it a type of Tuple[str], which is incompatible with the type Tuple[str, str] in the subclass.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we define the type on the base class as List[str] -- it doesn't need to explicitly be a tuple, just an iterable of strings. Does that help?

(It feels like for the type to be "right" since it is used in the base class we shouldn't re-define the type in the subclass)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would work, but I've been trying not to change the code more than I have to, so I didn't want to change tuples to lists in every operator. Do you think that would be a useful change?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I was thinking List was an abstract type (cos I call them arrays in my head). Maybe I mean Iterable[str] then?

Not changing code sounds like an idea

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmcarp What about this comment? If we change this type to Iterable[str] can we avoid having to change code or add comment to every operator?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to add type annotations for every operator--we have to add annotations on operators that are later subclassed. Changing the annotation to Iterable[str] doesn't change this. The issue is that a subclass defining template_fields as e.g. ("path", ) overrides its type from Sequence[str] to Tuple[str]; when the subclass of the subclass overrides the value to e.g. ("src_adls", "dest_gcs"), its new type of Tuple[str, str] is incompatible with Tuple[str]. We could also set the types in BaseOperator to be List[str] like we talked about earlier, but that would mean changing all operators to use lists, which I think we agree we shouldn't do.

Here's a simplified example, using python3-style annotations for convenience:

from typing import Iterable

class A:
    a: Iterable[str] = ("a", )

class B(A):
    a = ("a", "b")

class C(B):
    a = ("a", "b", "c")

This isn't valid, because the class variable gets typed as Tuple[str, str] in B but Tuple[str, str, str] in C:

error: Incompatible types in assignment (expression has type "Tuple[str, str, str]", base class "B" defined the type as "Tuple[str, str]")

If we annotate B.a with Iterable[str], then mypy passes.

Altogether, the annotations we have so far are the simplest solution that I can think of without a larger refactor: we only have to annotate variables in a handful of classes (BaseOperator and other operators that are later subclassed), and we don't have to change tuples to lists for all operators.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whaaaaat. That strikes me as a bug in mypy - since there is no type annotation on B.a it should keep the one from A. Oh well.

Thanks for the explanation.

Still, let's change for Iterable[Str] where we do have to have an annotation (over Sequence) - I reported/asked about this in python/mypy#6511

Comment thread airflow/contrib/operators/azure_container_instances_operator.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make it this:

    template_ext = (('.txt',),)

i.e. you've made it a tuple-of-tuple-of-strings

Comment thread airflow/models/__init__.py Outdated
Comment thread airflow/models/__init__.py Outdated
Comment thread airflow/www/views.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Any and not airflow.models.DagBag?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this should be type DagBag, but the else block below defines this variable as the class DagBag, not an instance, which mypy reasonably complains about. I'm actually not sure why (or if) this works. Maybe instead of setting dagbag to be a class, we should pass skip_dag_parsing to the DagBag constructor so that this variable is always an instance. What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I suspect that this only "works" because this flagis only set in very few cases and most of the code path isn't exercised when it is set.

How about making the else be dagbag = models.DagBag(os.devnull, include_examples=False) to make it load no DAGs?

Comment thread setup.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 - not a fan of needing typing at run time if we can avoid it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need this for python <3.5. I can extract install_requires into a variable and append this for legacy runtimes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a built in syntax for that:

'enum34~=1.1.6;python_version<"3.4"',

Can we not use the if TYPE_CHECKING: in the code base to not even import the module at run time though?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated to install conditional on python version.

I don't see why we couldn't check a flag before importing, but what's the benefit? The typing module is trivially fast to import, it's in the standard library for versions of python that aren't soon to be EOL, and for older versions, the wheel is ~25kb.

@techalchemy techalchemy Feb 15, 2019

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, random stranger here -- I found this issue because I was trying to typecheck my own airflow implementation and found that airflow itself doesn't have published stubs, so that caused some issues. Just wanted to chime in and say that type checking actually can be pretty slow, which is why python 3.7 even introduced a __future__ import to perform lazy typechecking imports. See PEP 563 for reference.

What's more, you can't put typechecking imports behind a conditional unless you actually have typing installed in the first place to check whether the typechecker is running. The conventional approach is to just use 'typing;python_version<"3.5"' as your constraint.

The alternative approach is to do this (I actually do both personally but in theory you can pick one):

def is_type_checking():
    try:
        from typing import TYPE_CHECKING
    except ImportError:
        return False
    return TYPE_CHECKING

MYPY_RUNNING = is_type_checking()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @techalchemy. This patch installs typing conditional on python version like you described. Conditionally importing expensive modules for type checking makes sense, but as far as I can tell, the only imports I'm doing for type checking here are from the typing module: Sequence, Optional, etc. Those aren't expensive to import, and we have to import typing anyway to figure out if we're type checking, like you pointed out--so I'm not sure we need conditional imports yet. What do you think?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the function-based approach with the safety net of handling the ImportError is to provide some flexibility if it's desired -- typechecking is very slow, and by always performing the requisite imports, you basically ensure that annotations are going to be evaluated at runtime no matter what. This approach allows you a few options:

  1. You can set this up as a configuration parameter and toggle it
  2. You can make typing an extra that has a builtin failsafe if it's not installed
  3. Even if you do only what I pasted above, you ensure that annotations won't be evaluated at runtime if you put all of your typing imports inside an if MYPY_RUNNING: block.

The imports themselves aren't slow, it's the consequence of performing them that causes a performance hit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow--are you saying that importing values from typing affects performance at runtime? What are the consequences of importing those values at runtime, when we're not type checking?

@ashb ashb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing at CI would be ace.

The template_fields/exts don't seem to be typed universally. Do we need to apply it to every sub-class? Type hints aren't inherited from parent attributes?

@jmcarp jmcarp force-pushed the mypy branch 10 times, most recently from 5e376ae to afaa218 Compare February 14, 2019 04:00
@jmcarp

jmcarp commented Feb 16, 2019

Copy link
Copy Markdown
Contributor Author

@ashb: thanks for reviewing! I think I addressed your comments--sorry if I missed anything. Have time to take another look?

@mik-laj

mik-laj commented Feb 16, 2019

Copy link
Copy Markdown
Member

Why is this process taking over 4 minutes? i because of this line?
Docker-compose -f scripts/ci/docker-compose.yml pull --quiet --parallel
maybe it can be speeded up?

@jmcarp

jmcarp commented Feb 16, 2019

Copy link
Copy Markdown
Contributor Author

Thanks @mik-laj. Looks like travis was using the same install steps for all pre-test stages. I updated those stages to just install their requirements and skip docker-compose.

@jmcarp

jmcarp commented Feb 20, 2019

Copy link
Copy Markdown
Contributor Author

Conflicts resolved @ashb.

@jmcarp

jmcarp commented Feb 24, 2019

Copy link
Copy Markdown
Contributor Author

Conflicts resolved again 😅

@jmcarp

jmcarp commented Feb 28, 2019

Copy link
Copy Markdown
Contributor Author

Conflicts resolved. PTAL when you have time @ashb; cc @XD-DENG @feng-tao @Fokko. I'd like to start adding type annotations to new PRs going forward, but want to get this merged first.

@XD-DENG XD-DENG left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Comment thread airflow/contrib/operators/azure_container_instances_operator.py Outdated
@jmcarp jmcarp force-pushed the mypy branch 2 times, most recently from fcb5752 to 46cac4b Compare March 2, 2019 02:35
@ashb

ashb commented Mar 4, 2019

Copy link
Copy Markdown
Member

@feng-tao

feng-tao commented Mar 6, 2019

Copy link
Copy Markdown
Member

side question, I thought mypy doesn't work with python 2.7? Or we won't do mypy when it is in py2.7?

@jmcarp

jmcarp commented Mar 7, 2019

Copy link
Copy Markdown
Contributor Author

@ashb: switched from Sequence[str] to Iterable[str] like you suggested.
@feng-tao: python 2.7 doesn't have type annotations, so we have to define them using comments: https://mypy.readthedocs.io/en/latest/python2.html. If we drop python 2.x support in the future, we can update to use type annotations.

Comment thread airflow/models/__init__.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, since the property is the "preferred" way could we name the method _get_previous_ti please? (otherwise this might show up in generated API docs and cause confusion as which one is the "recommended" one to call)

Or do you think it's okay to have both?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, revised to make the new methods private and put docstrings in the public properties.

@jmcarp jmcarp force-pushed the mypy branch 3 times, most recently from a14f72f to 7e03bba Compare March 9, 2019 03:47
@jmcarp

jmcarp commented Mar 9, 2019

Copy link
Copy Markdown
Contributor Author

Conflicts resolved.

@jmcarp

jmcarp commented Mar 11, 2019

Copy link
Copy Markdown
Contributor Author

Conflicts resolved, helper methods made private. Ready for another look when you have time @ashb.

@jmcarp

jmcarp commented Mar 15, 2019

Copy link
Copy Markdown
Contributor Author

Ping @ashb. Let me know if this needs more revisions.

@ashb

ashb commented Mar 15, 2019

Copy link
Copy Markdown
Member

Nice work @jmcarp - sorry it took so long, and thanks for sticking with it!

@ashb ashb merged commit 2c255e9 into apache:master Mar 15, 2019
ashb pushed a commit to ashb/airflow that referenced this pull request Mar 27, 2019
andriisoldatenko pushed a commit to andriisoldatenko/airflow that referenced this pull request Jul 26, 2019
wmorris75 pushed a commit to modmed-external/incubator-airflow that referenced this pull request Jul 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants