[AIRFLOW-3862] Check types with mypy.#4685
Codecov Report
@@            Coverage Diff            @@
##           master   #4685     +/-   ##
=========================================
+ Coverage   74.35%   74.4%   +0.05%
=========================================
  Files         430     430
  Lines       27962   27987     +25
=========================================
+ Hits        20790   20823     +33
+ Misses       7172    7164      -8
Continue to review full report at Codecov.
|
Codecov Report
@@            Coverage Diff             @@
##           master   #4685      +/-   ##
==========================================
+ Coverage   74.45%   75.33%   +0.87%
==========================================
  Files         450      450
  Lines       29023    29056      +33
==========================================
+ Hits        21609    21888     +279
+ Misses       7414     7168     -246
Continue to review full report at Codecov.
|
This needs to stay; otherwise, running the example DAGs with only a subset of extras installed gets noisy and confusing on the CLI.
Do we need to do this on every operator? :(
Seems like we need to set types explicitly for classes that we subclass elsewhere. Defining template_fields here gives it a type of Tuple[str], which is incompatible with the type Tuple[str, str] in the subclass.
What if we define the type on the base class as List[str] -- it doesn't need to explicitly be a tuple, just an iterable of strings. Does that help?
(It feels like for the type to be "right" since it is used in the base class we shouldn't re-define the type in the subclass)
I think that would work, but I've been trying not to change the code more than I have to, so I didn't want to change tuples to lists in every operator. Do you think that would be a useful change?
Oh I was thinking List was an abstract type (cos I call them arrays in my head). Maybe I mean Iterable[str] then?
Not changing code sounds like an idea
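For reference, a small sketch of why a tuple satisfies Iterable[str] (or Sequence[str]) but would not satisfy List[str]. The function names below are made up for illustration and are not part of Airflow:

```python
from typing import Iterable, List, Sequence


def render_iterable(fields: Iterable[str]) -> List[str]:
    """Accepts any iterable of strings: tuple, list, generator, and so on."""
    return [f.upper() for f in fields]


def render_sequence(fields: Sequence[str]) -> str:
    """Sequence additionally promises len() and indexing."""
    return fields[0] if len(fields) else ""


# Operators can keep declaring tuples; both abstract types accept them.
template_fields = ("path", "bucket")
render_iterable(template_fields)
render_sequence(template_fields)

# A parameter annotated as List[str] would reject this tuple under mypy,
# which is why List[str] would mean rewriting every operator to use lists.
```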
@jmcarp What about this comment? If we change this type to Iterable[str], can we avoid having to change code or add a comment to every operator?
We don't have to add type annotations for every operator--we have to add annotations on operators that are later subclassed. Changing the annotation to Iterable[str] doesn't change this. The issue is that a subclass defining template_fields as e.g. ("path", ) overrides its type from Sequence[str] to Tuple[str]; when the subclass of the subclass overrides the value to e.g. ("src_adls", "dest_gcs"), its new type of Tuple[str, str] is incompatible with Tuple[str]. We could also set the types in BaseOperator to be List[str] like we talked about earlier, but that would mean changing all operators to use lists, which I think we agree we shouldn't do.
Here's a simplified example, using python3-style annotations for convenience:
from typing import Iterable

class A:
    a: Iterable[str] = ("a", )

class B(A):
    a = ("a", "b")

class C(B):
    a = ("a", "b", "c")

This isn't valid, because the class variable gets typed as Tuple[str, str] in B but Tuple[str, str, str] in C:
error: Incompatible types in assignment (expression has type "Tuple[str, str, str]", base class "B" defined the type as "Tuple[str, str]")
If we annotate B.a with Iterable[str], then mypy passes.
Altogether, the annotations we have so far are the simplest solution that I can think of without a larger refactor: we only have to annotate variables in a handful of classes (BaseOperator and other operators that are later subclassed), and we don't have to change tuples to lists for all operators.
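As a sketch of the fix described above (same toy classes, not actual Airflow operators), annotating the intermediate class keeps the declared type wide enough for further subclasses:

```python
from typing import Iterable


class A:
    a: Iterable[str] = ("a",)


class B(A):
    # Explicit annotation: the declared type stays Iterable[str] instead of
    # being narrowed to the inferred Tuple[str, str].
    a: Iterable[str] = ("a", "b")


class C(B):
    # A three-element tuple is still an Iterable[str], so this override is
    # compatible with the type declared on B.
    a = ("a", "b", "c")
```

The values stay tuples at runtime; only the declared type changes.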
Whaaaaat. That strikes me as a bug in mypy - since there is no type annotation on B.a it should keep the one from A. Oh well.
Thanks for the explanation.
Still, let's change to Iterable[str] where we do have to have an annotation (over Sequence) - I reported/asked about this in python/mypy#6511
Does this make it this:
template_ext = (('.txt',),)
i.e. you've made it a tuple-of-tuple-of-strings
Why Any and not airflow.models.DagBag?
I agree that this should be type DagBag, but the else block below defines this variable as the class DagBag, not an instance, which mypy reasonably complains about. I'm actually not sure why (or if) this works. Maybe instead of setting dagbag to be a class, we should pass skip_dag_parsing to the DagBag constructor so that this variable is always an instance. What do you think?
Oh I suspect that this only "works" because this flag is only set in very few cases and most of the code path isn't exercised when it is set.
How about making the else be dagbag = models.DagBag(os.devnull, include_examples=False) to make it load no DAGs?
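A rough sketch of that idea, always binding an instance so the variable has a single type. FakeDagBag is a hypothetical stand-in rather than the real models.DagBag, and the load_dags flag is made up for illustration:

```python
import os


class FakeDagBag:
    """Hypothetical stand-in for models.DagBag, for illustration only."""

    def __init__(self, dag_folder: str, include_examples: bool = True) -> None:
        self.dag_folder = dag_folder
        self.include_examples = include_examples


def build_dagbag(load_dags: bool, dag_folder: str = "dags") -> FakeDagBag:
    if load_dags:
        return FakeDagBag(dag_folder)
    # Point at os.devnull, as suggested above, but still return an instance
    # so the result is a FakeDagBag on every code path (never the class).
    return FakeDagBag(os.devnull, include_examples=False)


dagbag = build_dagbag(load_dags=False)
```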
-1 - not a fan of needing typing at run time if we can avoid it.
We only need this for python <3.5. I can extract install_requires into a variable and append this for legacy runtimes.
There's a built in syntax for that:
Line 294 in 69e9e97
Can we not use the if TYPE_CHECKING: in the code base to not even import the module at run time though?
Thanks, updated to install conditional on python version.
I don't see why we couldn't check a flag before importing, but what's the benefit? The typing module is trivially fast to import, it's in the standard library for versions of python that aren't soon to be EOL, and for older versions, the wheel is ~25kb.
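For illustration, the two ways of installing typing only on old interpreters that came up in this thread. This is a sketch and not the actual contents of setup.py; INSTALL_REQUIRES and legacy_requirements are names invented here:

```python
import sys

# Option 1: a PEP 508 environment marker. The installer evaluates the marker
# on the target interpreter, so one requirement string covers every version.
INSTALL_REQUIRES = [
    'typing;python_version<"3.5"',
]


# Option 2: decide in Python while setup.py runs. This is evaluated on the
# machine that builds the package, which may differ from where it is installed.
def legacy_requirements(version_info=sys.version_info):
    return ["typing"] if version_info < (3, 5) else []
```

Either list would then be passed to setup(install_requires=...).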
hey, random stranger here -- I found this issue because I was trying to typecheck my own airflow implementation and found that airflow itself doesn't have published stubs, so that caused some issues. Just wanted to chime in and say that type checking actually can be pretty slow, which is why python 3.7 even introduced a __future__ import to perform lazy typechecking imports. See PEP 563 for reference.
What's more, you can't put typechecking imports behind a conditional unless you actually have typing installed in the first place to check whether the typechecker is running. The conventional approach is to just use 'typing;python_version<"3.5"' as your constraint.
The alternative approach is to do this (I actually do both personally but in theory you can pick one):
def is_type_checking():
    try:
        from typing import TYPE_CHECKING
    except ImportError:
        return False
    return TYPE_CHECKING

MYPY_RUNNING = is_type_checking()
Thanks @techalchemy. This patch installs typing conditional on python version like you described. Conditionally importing expensive modules for type checking makes sense, but as far as I can tell, the only imports I'm doing for type checking here are from the typing module: Sequence, Optional, etc. Those aren't expensive to import, and we have to import typing anyway to figure out if we're type checking, like you pointed out--so I'm not sure we need conditional imports yet. What do you think?
The point of the function-based approach with the safety net of handling the ImportError is to provide some flexibility if it's desired -- typechecking is very slow, and by always performing the requisite imports, you basically ensure that annotations are going to be evaluated at runtime no matter what. This approach allows you a few options:
- You can set this up as a configuration parameter and toggle it
- You can make typing an extra that has a builtin failsafe if it's not installed
- Even if you do only what I pasted above, you ensure that annotations won't be evaluated at runtime if you put all of your typing imports inside an if MYPY_RUNNING: block.
The imports themselves aren't slow, it's the consequence of performing them that causes a performance hit.
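For concreteness, a minimal sketch of the guarded-import pattern being discussed, using the standard typing.TYPE_CHECKING constant (False at runtime, treated as True by type checkers). The Decimal import and the double function are arbitrary examples, not code from this PR:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by the type checker only; this import never runs at runtime.
    from decimal import Decimal


def double(value: "Decimal") -> "Decimal":
    # The annotations are strings, so they are stored as-is and the name
    # Decimal does not have to be defined when this module is imported.
    return value * 2
```

Because the annotations are strings, nothing here needs Decimal to exist at runtime; the caller supplies the value.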
I don't follow--are you saying that importing values from typing affects performance at runtime? What are the consequences of importing those values at runtime, when we're not type checking?
ashb
left a comment
Typing at CI would be ace.
The template_fields/exts don't seem to be typed universally. Do we need to apply it to every sub-class? Type hints aren't inherited from parent attributes?
5e376ae to afaa218
|
@ashb: thanks for reviewing! I think I addressed your comments--sorry if I missed anything. Have time to take another look? |
|
Why is this process taking over 4 minutes? Is it because of this line? |
|
Thanks @mik-laj. Looks like travis was using the same install steps for all pre-test stages. I updated those stages to just install their requirements and skip docker-compose. |
|
Conflicts resolved @ashb. |
|
Conflicts resolved again 😅 |
fcb5752 to 46cac4b
|
https://github.com/apache/airflow/pull/4685/files#r262035094 still to do/respond to. |
|
side question, I thought mypy doesn't work with python 2.7? Or we won't do mypy when it is in py2.7? |
|
@ashb: switched from |
Actually, since the property is the "preferred" way, could we name the method _get_previous_ti please? (Otherwise this might show up in generated API docs and cause confusion as to which one is the "recommended" one to call.)
Or do you think it's okay to have both?
I agree, revised to make the new methods private and put docstrings in the public properties.
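Roughly, the shape is a documented public property delegating to a private helper. This is a simplified sketch with a hypothetical TaskInstanceSketch class; the previous_ti property name and the constructor are assumptions for illustration, not the real TaskInstance code:

```python
from typing import Optional


class TaskInstanceSketch:
    """Hypothetical, simplified stand-in; not Airflow's TaskInstance."""

    def __init__(
        self, name: str, previous: "Optional[TaskInstanceSketch]" = None
    ) -> None:
        self.name = name
        self._previous = previous

    def _get_previous_ti(self) -> "Optional[TaskInstanceSketch]":
        # Private helper: the leading underscore marks it as internal.
        return self._previous

    @property
    def previous_ti(self) -> "Optional[TaskInstanceSketch]":
        """The task instance that ran before this one, if any."""
        return self._get_previous_ti()
```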
a14f72f to 7e03bba
|
Conflicts resolved. |
|
Conflicts resolved, helper methods made private. Ready for another look when you have time @ashb. |
|
Ping @ashb. Let me know if this needs more revisions. |
|
Nice work @jmcarp - sorry it took so long, and thanks for sticking with it! |