-
Notifications
You must be signed in to change notification settings - Fork 17.3k
Use Task Output as Extra for Dataset Trigger and DAG Run Conf #38432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
5a242a7
c841ba5
fcae73f
29b7fde
0d0c5e8
63bedaa
847a824
b562990
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2425,6 +2425,7 @@ def _run_raw_task( | |
| self.test_mode = test_mode | ||
| self.refresh_from_task(self.task, pool_override=pool) | ||
| self.refresh_from_db(session=session) | ||
| result = None | ||
|
|
||
| self.job_id = job_id | ||
| self.hostname = get_hostname() | ||
|
|
@@ -2455,7 +2456,7 @@ def _run_raw_task( | |
|
|
||
| try: | ||
| if not mark_success: | ||
| self._execute_task_with_callbacks(context, test_mode, session=session) | ||
| result = self._execute_task_with_callbacks(context, test_mode, session=session) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Questions -
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Regarding 1+2: Yes, if no result is generated, the content will be just Regarding your question 1: The publish of the event actually is happening after this line of code. The change was attemping to capture the result to put is as extra a few lines of code later from here. |
||
| if not test_mode: | ||
| self.refresh_from_db(lock_for_update=True, session=session) | ||
| self.state = TaskInstanceState.SUCCESS | ||
|
|
@@ -2543,7 +2544,7 @@ def _run_raw_task( | |
| session.add(Log(self.state, self)) | ||
| session.merge(self).task = self.task | ||
| if self.state == TaskInstanceState.SUCCESS: | ||
| self._register_dataset_changes(session=session) | ||
| self._register_dataset_changes(result, session=session) | ||
|
|
||
| session.commit() | ||
| if self.state == TaskInstanceState.SUCCESS: | ||
|
|
@@ -2553,21 +2554,30 @@ def _run_raw_task( | |
|
|
||
| return None | ||
|
|
||
| def _register_dataset_changes(self, *, session: Session) -> None: | ||
| def _register_dataset_changes(self, result: Any, *, session: Session) -> None: | ||
| if TYPE_CHECKING: | ||
| assert self.task | ||
|
|
||
| for obj in self.task.outlets or []: | ||
| self.log.debug("outlet obj %s", obj) | ||
| # Lineage can have other types of objects besides datasets | ||
| if isinstance(obj, Dataset): | ||
| if obj.extra: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we merge the event static information with dynamic information?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No we should not :-( |
||
| extra = obj.extra | ||
|
Comment on lines
+2565
to
+2566
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not think this is right. It’s made quite clear in previous issues that
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, then it seems I mis-understood your comments in the previous PR7discussions about this. Thanks for clearly documenting the differences between |
||
| elif obj.extra_from_return: | ||
| extra = result if isinstance(result, dict) else {str(self.task_id): result} | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Automatically putting the value under a key is too magical to me. I would prefer this either just forward the value (the extra field is capable of storing any JSON-able values, after all), or skip the value entirely if it has an unexpected type. The task ID is also not a particularly obvious key either. |
||
| else: | ||
| extra = None | ||
| dataset_manager.register_dataset_change( | ||
| task_instance=self, | ||
| dataset=obj, | ||
| extra=extra, | ||
| session=session, | ||
| ) | ||
|
|
||
| def _execute_task_with_callbacks(self, context: Context, test_mode: bool = False, *, session: Session): | ||
| def _execute_task_with_callbacks( | ||
| self, context: Context, test_mode: bool = False, *, session: Session | ||
| ) -> Any: | ||
| """Prepare Task for Execution.""" | ||
| from airflow.models.renderedtifields import RenderedTaskInstanceFields | ||
|
|
||
|
|
@@ -2658,6 +2668,8 @@ def signal_handler(signum, frame): | |
| Stats.incr("operator_successes", tags={**self.stats_tags, "task_type": self.task.task_type}) | ||
| Stats.incr("ti_successes", tags=self.stats_tags) | ||
|
|
||
| return result | ||
|
|
||
| def _execute_task(self, context, task_orig): | ||
| """ | ||
| Execute Task (optionally with a Timeout) and push Xcom results. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| Dataset triggers now provide extra as DAG Run conf. | ||
|
|
||
| With data aware scheduling you can automate triggers between DAGs. passing context information was so far possible via | ||
| the name of the Dataset or by passing ``extra`` information to a downstream DAG. Retrieving the ``extra`` context | ||
| was only possible via the ``triggering_dataset_events`` context object and iterating over the details. With the new | ||
| version of Airflow the ``extra`` is merged into a dictionary and passed to the DAG directly as DAG Run conf so that | ||
| DAG params can directly be used. | ||
|
|
||
| See :ref:`Dataset Event Extra pushed as DAG Run Config <datasets:event-extra>` for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we put this in a separate PR to discuss. It’s not entirely obvious how extras should be merged from different events that trigger the run.