function dependencies #46

@amniskin

Functions should be able to take hidden args (dependencies) that the user does not pass in at call time. These would be specified when the function is constructed. That way functions can act like libraries and we can have proper abstractions.

Options

Dag dependencies

We allow resources to inject dependencies as a function-specific dags map that gets added to the function's index.

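from dataclasses import dataclass, field
from typing import Dict, Optional

# Ref is assumed to be the project's existing reference type (not defined here).

@dataclass(frozen=True)
class Resource:
    uri: str
    adapter: Optional[str] = None

@dataclass(frozen=True)
class Executable:
    resource: Ref  # => Resource
    data: Optional[Ref] = None  # -> Datum
    dags: Dict[str, Ref] = field(default_factory=dict)  # => Dag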

This is the "import numpy as np" version of this problem: the function refers to its dependencies by name, much as a module refers to its imports.
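To make the import analogy concrete, here is a minimal sketch in plain Python. It does not use the real library: Ref is modeled as a plain string, and build_index and load_dep are hypothetical helpers standing in for whatever lookup dml.load would perform against the function's index. Letting function-specific entries shadow existing ones is also an assumption, not something decided in this issue.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumption: a Ref is modeled as a plain string id for illustration only.
Ref = str


@dataclass(frozen=True)
class Executable:
    resource: Ref
    data: Optional[Ref] = None
    dags: Dict[str, Ref] = field(default_factory=dict)


def build_index(base_index: Dict[str, Ref], fn: Executable) -> Dict[str, Ref]:
    """Hypothetical: add the function's own dags to the index it sees."""
    merged = dict(base_index)
    merged.update(fn.dags)  # assumption: function-specific names win on conflict
    return merged


def load_dep(index: Dict[str, Ref], name: str) -> Ref:
    """Hypothetical stand-in for a by-name dag lookup inside the function."""
    if name not in index:
        raise KeyError(f"no dag named {name!r} is available to this function")
    return index[name]


# The dependency is attached when the function is constructed...
fn = Executable(resource="resource/abc", dags={"X": "dag/x-v1"})
index = build_index({"shared": "dag/shared-v3"}, fn)

# ...and looked up by name from inside the function body.
print(load_dep(index, "X"))
```

The caller never passes "X" as an argument; it only appears in the function's definition and in the lookup.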

Pros

  1. Users use dml.load in their functions to access the hidden args the same way they would outside of functions. This unified interface makes the library easier to use.
  2. These dag dependencies become like libraries: a function can state that it expects a dag called "X" with "y" and "z" nodes, etc.

Cons

  1. The exact dependencies (the specific dag versions) are more complicated to track.
  2. This would restrict hidden dependencies to completed dags.

Partials

Something like functools.partial in Python, but for dags. This solution seems suboptimal because every bound dependency becomes another positional arg, so you quickly end up with an explosion of args that is annoying to deal with.
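The arg explosion can be illustrated with functools.partial itself. This is only an analogy in plain Python, not the dag-level mechanism; the train function and the ref strings are made up for the example.

```python
from functools import partial


def train(model_cfg, dataset, preprocess, metric, learning_rate):
    """Toy function: the first four parameters are dependencies, not user inputs."""
    return {
        "deps": (model_cfg, dataset, preprocess, metric),
        "learning_rate": learning_rate,
    }


# Bind the hidden dependencies positionally, in the style of functools.partial.
bound = partial(train, "cfg-ref", "data-ref", "prep-ref", "metric-ref")

# The user only supplies the visible argument...
result = bound(0.01)

# ...but the full argument vector still carries every dependency,
# and each one is identified only by its position.
argv = bound.args + (0.01,)
print(len(argv))  # 5
print(result["learning_rate"])  # 0.01
```

With four dependencies the caller-visible signature stays small, but anything that inspects the argument vector has to know which position holds which dependency.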

Pros

  1. Minimal changes to the current setup.
  2. Facilitates any dml object as a hidden dependency.

Cons

  1. The explosion of args (in the argv node) gets very complicated to work with.
  2. Would require a resource wrapper or something like that -- more types.

kwargs

Functions would take not only args, but also kwargs. Similar to partials, but with named args.

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass(frozen=True)
class Resource:
    uri: str
    data: Optional[Ref] = None  # -> Datum
    adapter: Optional[str] = None
    kwargs: Dict[str, Ref] = field(default_factory=dict)  # name -> dependency

Each fndag would then have a Dag.kwargs property populated from the resource's kwargs.
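A rough sketch of how that could look, in plain Python rather than the real library. Ref is modeled as a string, and FnDag and start_fndag are hypothetical names used only to show the named dependencies travelling separately from the positional args.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: a Ref is modeled as a plain string id for illustration only.
Ref = str


@dataclass(frozen=True)
class Resource:
    uri: str
    data: Optional[Ref] = None
    adapter: Optional[str] = None
    kwargs: Dict[str, Ref] = field(default_factory=dict)


@dataclass
class FnDag:
    """Hypothetical stand-in for the dag created for one function call."""
    argv: List[Ref]
    kwargs: Dict[str, Ref]


def start_fndag(resource: Resource, *args: Ref) -> FnDag:
    # argv holds only what the caller passed; named dependencies are copied
    # from the resource so the function can read them by name.
    return FnDag(argv=list(args), kwargs=dict(resource.kwargs))


res = Resource(
    uri="example://fn",  # made-up URI
    kwargs={"tokenizer": "ref/tok", "scorer": "ref/score"},
)
dag = start_fndag(res, "ref/input")
print(dag.argv)  # ['ref/input']
print(dag.kwargs["tokenizer"])  # ref/tok
```

Compared with partials, argv stays the size of the user's call, and each dependency has a name instead of a position.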

Pros

  1. Very interpretable.
  2. Hashing is natural.
  3. Alleviates the argv explosion problem.
  4. Facilitates any dml object as a hidden dependency.
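On the hashing point, one way to see why named dependencies hash cleanly: serialize the resource canonically with sorted keys and hash the result. This is an illustration only and is not how the library computes ids; resource_id and the ref strings are hypothetical.

```python
import hashlib
import json
from typing import Dict, Optional


def resource_id(uri: str, kwargs: Dict[str, str], adapter: Optional[str] = None) -> str:
    """Hypothetical content hash over a resource and its named dependencies."""
    payload = json.dumps(
        {"uri": uri, "adapter": adapter, "kwargs": kwargs},
        sort_keys=True,  # key order must not affect the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = resource_id("example://fn", {"scorer": "ref/s", "tokenizer": "ref/t"})
b = resource_id("example://fn", {"tokenizer": "ref/t", "scorer": "ref/s"})
c = resource_id("example://fn", {"tokenizer": "ref/t2", "scorer": "ref/s"})

print(a == b)  # True: same dependencies, different insertion order
print(a == c)  # False: one dependency changed
```

Changing any dependency changes the id, so two functions that differ only in a hidden dependency are never confused.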

Cons

  1. Working with the kwargs object adds new syntax for the user and deviates from the typical usage.
  2. Specifying functions as kwargs is totally legit from a functional programming standpoint, but it's somewhat unnatural (at least for the majority of users).
