Functions should be able to have what essentially amount to hidden args (dependencies) that are not passed in as args by the user. They should be specified upon the construction of the function. That way functions can act like libraries and we can have proper abstractions.
Options
Dag dependencies
We allow resources to inject dependencies as a function-specific dags map that gets added to the function's index.
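    from dataclasses import dataclass, field
    from typing import Dict, Optional

    # Ref is the existing reference type (its definition is not shown here).

    @dataclass(frozen=True)
    class Resource:
        uri: str
        adapter: Optional[str] = None

    @dataclass(frozen=True)
    class Executable:
        resource: Ref  # => Resource
        data: Optional[Ref] = None  # -> Datum
        dags: Dict[str, Ref] = field(default_factory=dict)  # => Dag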
This is the "import numpy as np" version of this problem: dependencies are named and loaded the way a library is imported.
Pros
- Users use dml.load in their functions to access the hidden args the same way they would outside of functions. This unified interface makes the library easier to use.
- These dag dependencies become like libraries. We expect you to have a dag called "X" with "y" and "z" nodes, etc.
Cons
- The exact dependencies (the specific dag versions) are more complicated to track.
- This would restrict hidden dependencies to completed dags.
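As a rough illustration of this option, the sketch below models the function's index as a plain dictionary. Everything in it is hypothetical: ToyDag, ToyExecutable, make_loader, and the example:// URI are stand-ins invented for this note, refs are replaced by plain objects, and the rule that function-specific dags shadow global ones is an assumption rather than a decision recorded here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ToyDag:
    """Hypothetical stand-in for a completed dag: a node-name -> value map."""
    nodes: Dict[str, Any]


@dataclass(frozen=True)
class ToyExecutable:
    """Toy version of Executable; refs are replaced by plain objects."""
    uri: str
    dags: Dict[str, ToyDag] = field(default_factory=dict)


def make_loader(global_index: Dict[str, ToyDag], exe: ToyExecutable) -> Callable[[str], ToyDag]:
    """Build a load() that sees the function-specific dags added to the index."""
    # Assumption: function-specific entries shadow global ones on a name clash.
    index = {**global_index, **exe.dags}

    def load(name: str) -> ToyDag:
        return index[name]

    return load


global_index = {"shared": ToyDag({"y": 1})}
exe = ToyExecutable(uri="example://fn", dags={"X": ToyDag({"y": 10, "z": 20})})
load = make_loader(global_index, exe)


def fn_body(a: int) -> int:
    # Inside the function, the hidden dependency is loaded by name, like an import.
    x = load("X")
    return a + x.nodes["y"] + x.nodes["z"]


result = fn_body(1)
```

The function body never receives "X" as an argument. It relies on the convention that a dag called "X" with "y" and "z" nodes exists in its index, which is what makes the dependency behave like a library.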
Partials
Something like functools.partial in Python, but for dags. This option seems suboptimal: every bound dependency becomes another positional arg, so the arg list grows quickly and becomes hard to manage.
Pros
- Minimal changes to the current setup.
- Facilitates any dml object as a hidden dependency.
Cons
- The explosion of args (in the argv node) gets very complicated to work with.
- Would require a resource wrapper or something like that -- more types.
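The argv problem can be seen with plain functools.partial. This is only an analogy in ordinary Python, not a proposed implementation: the call dispatcher, the fn body, and the "-ref" strings are made up for illustration.

```python
from functools import partial


def fn(argv):
    # argv is one flat list: hidden dependencies first, then the user's args.
    # The function has to know how many dependencies were bound, and in what order.
    model, scaler, lookup, *user_args = argv
    return {"deps": [model, scaler, lookup], "user_args": user_args}


def call(*argv):
    """Hypothetical dispatcher that packs every positional arg into one argv list."""
    return fn(list(argv))


# Bind three hidden dependencies up front, functools.partial style.
bound = partial(call, "model-ref", "scaler-ref", "lookup-ref")

# The user passes only their own args; the bound ones are prepended.
out = bound(1, 2)
```

Each extra dependency shifts the position of every user arg, so the unpacking line in fn has to change whenever a dependency is added or removed.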
kwargs
Functions would not only take args, but also kwargs. Similar to partials, but with named args.
    @dataclass(frozen=True)
    class Resource:
        uri: str
        data: Optional[Ref] = None  # -> Datum
        adapter: Optional[str] = None
        kwargs: Dict[str, Ref] = field(default_factory=dict)
Each fndag would then have a Dag.kwargs property populated accordingly.
Pros
- Very interpretable.
- Hashing is natural.
- Alleviates the argv explosion problem.
- Facilitates any dml object as a hidden dependency.
Cons
- Working with the kwargs object adds new syntax for the user and deviates from the typical usage.
- Specifying functions as kwargs is totally legit from a functional programming standpoint, but it's somewhat unnatural (at least for the majority of users).
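To make the "hashing is natural" point concrete, here is a minimal sketch under stated assumptions. ToyResource mirrors the Resource fields above but uses plain strings in place of refs, and resource_id (sorted-key JSON fed to SHA-256) is one possible scheme chosen for illustration, not the library's actual hashing.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyResource:
    """Toy Resource: refs are replaced by plain string ids."""
    uri: str
    data: Optional[str] = None
    adapter: Optional[str] = None
    kwargs: Dict[str, str] = field(default_factory=dict)  # name -> ref id


def resource_id(res: ToyResource) -> str:
    """Hypothetical content id: serialize with sorted keys, then hash."""
    payload = json.dumps(
        {"uri": res.uri, "data": res.data, "adapter": res.adapter, "kwargs": res.kwargs},
        sort_keys=True,  # makes the id independent of kwargs insertion order
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = ToyResource("example://fn", kwargs={"model": "ref-1", "scaler": "ref-2"})
b = ToyResource("example://fn", kwargs={"scaler": "ref-2", "model": "ref-1"})
c = ToyResource("example://fn", kwargs={"model": "ref-9", "scaler": "ref-2"})

same_deps_same_id = resource_id(a) == resource_id(b)
new_dep_new_id = resource_id(a) != resource_id(c)
```

Because the dependencies are named, the id does not depend on the order in which they were supplied, and swapping one dependency yields a different id.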