@uplsh580
Contributor

@uplsh580 uplsh580 commented Jan 25, 2026

Related Issues

Issue: #61033

Summary

This PR ensures that Airflow CLI commands are executed with the server process context by default.
This is achieved by setting the _AIRFLOW_PROCESS_CONTEXT environment variable to "server" within the core CLI decorators (action_cli and suppress_logs_and_warning).

Rationale

Previously, commands like airflow dag-processor -B <bundle_name> failed with an AirflowNotFoundException when initializing bundles that require database connections (e.g., GitDagBundle).

The failure occurred because:

  1. The -B flag triggers bundle validation (validate_dag_bundle_arg in _create_dag_processor_job_runner) during the command initialization phase.
    @enable_memray_trace(component=MemrayTraceComponents.dag_processor)
    @cli_utils.action_cli
    @providers_configuration_loaded
    def dag_processor(args):
        """Start Airflow Dag Processor Job."""
        job_runner = _create_dag_processor_job_runner(args)

    def _create_dag_processor_job_runner(args: Any) -> DagProcessorJobRunner:
        """Create DagFileProcessorProcess instance."""
        if args.bundle_name:
            cli_utils.validate_dag_bundle_arg(args.bundle_name)
        return DagProcessorJobRunner(
            job=Job(),
            processor=DagFileProcessorManager(
                max_runs=args.num_runs,
                bundle_names_to_parse=args.bundle_name,
            ),
        )

    def validate_dag_bundle_arg(bundle_names: list[str]) -> None:
        """Make sure only known bundles are passed as arguments."""
        known_bundles = {b.name for b in DagBundlesManager().get_all_dag_bundles()}
        unknown_bundles: set[str] = set(bundle_names) - known_bundles
        if unknown_bundles:
            raise SystemExit(f"Bundles not found: {', '.join(unknown_bundles)}")
  2. At this early stage, the server context was not yet set, causing the Task SDK to skip MetastoreBackend and fail to resolve connections from the metadata database.

By moving the context setting to the decorator level, we ensure that all CLI management commands have the necessary privileges to access database-backed secrets before any validation or execution logic runs.
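
To illustrate the ordering argument, here is a minimal sketch, not the actual change in airflow/utils/cli.py. A decorator sets the variable before the wrapped command body runs, so any validation inside the body already sees the server context. Only `_AIRFLOW_PROCESS_CONTEXT` and the value "server" come from this PR; `with_server_context`, `resolve_backends`, and `fake_dag_processor` are hypothetical stand-ins, and the backend selection is a deliberate simplification of what the Task SDK does.

```python
import functools
import os

CONTEXT_VAR = "_AIRFLOW_PROCESS_CONTEXT"  # variable name taken from the PR description


def resolve_backends() -> list[str]:
    """Hypothetical stand-in for secrets backend selection.

    Assumption: the metastore backend is only consulted in the server context.
    """
    backends = ["environment"]
    if os.environ.get(CONTEXT_VAR) == "server":
        backends.append("metastore")
    return backends


def with_server_context(func):
    """Hypothetical decorator: set the server context before func runs."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        os.environ[CONTEXT_VAR] = "server"
        return func(*args, **kwargs)

    return wrapper


@with_server_context
def fake_dag_processor() -> list[str]:
    # Stands in for bundle validation that needs a database-backed connection.
    return resolve_backends()
```

In this sketch the variable stays set after the command returns, which is the simplest form of the idea and also its main drawback.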

Key Changes

  • airflow/utils/cli.py:
    • Set _AIRFLOW_PROCESS_CONTEXT = "server" in action_cli decorator when check_db is True.
    • Set _AIRFLOW_PROCESS_CONTEXT = "server" in suppress_logs_and_warning decorator.
  • airflow/cli/commands/connection_command.py: Removed redundant manual environment settings in connections_get and connections_test.

Test plan

  1. Configure a DAG bundle that requires a connection (e.g., GitDagBundle).
  2. Run airflow dag-processor -B <bundle_name>.
  3. Verify the command starts successfully without AirflowNotFoundException.
  4. Run existing connection commands (e.g., airflow connections get) to ensure no regressions.

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
  • Gemini 3 flash

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@uplsh580
Contributor Author

uplsh580 commented Jan 25, 2026

Validation

Issue case - airflow dag-processor -B bundle1

image

airflow dag-processor

image

airflow connections get

image

@uplsh580
Contributor Author

Could you please verify if this approach aligns with the project's long-term architectural goals regarding component isolation and database access?

I believe this change is consistent with our future direction for the following reasons:

  1. Administrative Identity: CLI tools are inherently administrative and designed for system management. Even as we move towards greater component isolation, these management commands require a 'Server' identity to resolve credentials and initialize system components effectively.
  2. Architectural Consistency: This PR utilizes the existing _AIRFLOW_PROCESS_CONTEXT abstraction defined for Airflow 3. Centralizing this setting within the core CLI decorators ensures the correct context is established at the initial entry point, preventing early initialization failures during bundle validation.
  3. Security Isolation: Importantly, this does not compromise the isolation of user code. The actual task execution and DAG parsing remain strictly restricted, as the Task SDK's Supervisor and the DAG processor explicitly override the context to client for child processes to prevent inheriting server privileges.
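
As a rough illustration of the third point, the sketch below shows a parent process in the server context launching a child with the context forced to "client". This is not the Supervisor or DAG processor code, which may apply the override differently (for example after forking); `run_child_as_client` is a hypothetical helper, and only the variable name and the two values come from this discussion.

```python
import os
import subprocess
import sys

CONTEXT_VAR = "_AIRFLOW_PROCESS_CONTEXT"  # variable name taken from the PR description


def run_child_as_client(code: str) -> str:
    """Hypothetical helper: run Python code in a child forced to client context."""
    env = os.environ.copy()
    # Override whatever the parent has, so server privileges are not inherited.
    env[CONTEXT_VAR] = "client"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The parent's own environment is left untouched because the override is applied to a copy.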

Your feedback on whether this centralization is the preferred way to handle CLI context would be greatly appreciated.

@uplsh580 uplsh580 force-pushed the dagprocessor/bundle-name-flag branch from 7a9160e to f9184ba Compare January 25, 2026 12:10
@uplsh580 uplsh580 marked this pull request as ready for review January 25, 2026 13:25
@potiuk
Member

potiuk commented Jan 26, 2026

I think we are going to get rid of _AIRFLOW_PROCESS_CONTEXT altogether - correct @amoghrajesh @bugraoz93 ? - so rather than moving it around, I would wait for completing the isolation process, where if needed we pass the necessary context to "shared" library from the calling process via some parameter during the initialization.

Some of the CLI methods are shared between airflow-ctl and the regular airflow CLI commands, and this will be the main distinction that determines whether DB calls or CLI calls are made. Then it will not be necessary to have an environment variable to pass the context. This is, as I see it, a bit of a hack to make it possible to use the same code in two places. The current shared library context is taking another, more long-term approach. Similarly, the initialization will be more of a deliberate process, where just "importing" airflow is not going to do much: there will be deliberate initialization code that depends on which component is running the initialization, and passing the right "context" via this initialization to the shared code seems to be a better approach.

Maybe we can merge it as a temporary measure, but I don't see it as even a mid-term sustainable approach.
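
For readers following along, the parameter-passing alternative described in this comment could look roughly like the sketch below. Everything in it is hypothetical (`ProcessContext`, `SharedConfig`, `initialize`); it is not an existing Airflow API, only an illustration of passing the context explicitly at initialization instead of reading an environment variable.

```python
from dataclasses import dataclass
from enum import Enum


class ProcessContext(Enum):
    """Hypothetical enum of calling components."""

    SERVER = "server"
    CLIENT = "client"


@dataclass(frozen=True)
class SharedConfig:
    """Hypothetical configuration handed to shared code by the caller."""

    context: ProcessContext

    @property
    def allow_db_access(self) -> bool:
        # Assumption: only server-side components may reach the metadata DB.
        return self.context is ProcessContext.SERVER


def initialize(context: ProcessContext) -> SharedConfig:
    """Hypothetical explicit initialization, called once by each component."""
    return SharedConfig(context=context)
```

Here the calling component decides the context, so no global state needs to be set or restored.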

@bugraoz93
Contributor

bugraoz93 commented Jan 26, 2026

Thanks, @potiuk! You summarised it pretty clearly. We will have deprecations on the CLI and will start this pretty soon. In the CLI, there will be isolation work on the remote commands defined in AIP-81. Deprecation and migration are on the way. The Airflow CLI will mainly be used for triggers and actions that are always server-side, rather than for anything we can safely provide from the API.

The voting will start later today or tomorrow. There is an upcoming dev call, and I wanted to align the voting with it so we can go over everything together on the call and see if there are any concerns. Most of this was already agreed on in AIP-81, but we now have a more concrete plan for how to achieve it in AIP-94, which will be voted on by the entire community.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=382175838

I would stay on the safe side and not put much effort into something that we eventually won't maintain or need. I really like any improvement, but some can create more maintenance effort than the effort spent solving the actual problem, and that problem won't exist after 3-5 minor releases, depending on how fast we can migrate. I believe we have a good foundation to work on, across CI/CD and more 🙌

For the Task SDK side, @amoghrajesh may have a clearer answer.

@amoghrajesh
Contributor

Thanks @potiuk!

Yeah, we want to stop using that env variable entirely in the long term, but for the dag processor and such, that also depends on AIP-92 (entirely decoupling the dag processor and triggerer from core, with API-first communication for those too). So this would be a good temporary solution until then.

@uranusjr
Member

The logic looks right to me, but I don’t like how a decorator magically modifies global state (an environment variable) that leaks out after it exits. If we’re to do this, the environment variable needs to be restored when the context ends.

@uplsh580 uplsh580 force-pushed the dagprocessor/bundle-name-flag branch from ae74302 to d3455f9 Compare January 27, 2026 10:38
@uplsh580
Contributor Author

Validation

Issue case - airflow dag-processor -B bundle1

image

airflow dag-processor

Screenshot 2026-01-27 7:54:00 PM

airflow connections get GITHUB__SAMPLE

image

@uplsh580
Contributor Author

@uranusjr Thanks for the feedback. I've updated the logic to ensure the environment variable is restored when the context ends, avoiding any global state leakage. I appreciate the suggestion!
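
A minimal sketch of the restore-on-exit pattern, assuming a context manager wrapped by the decorator; the updated code in this PR may be structured differently. `process_context` and `server_cli` are hypothetical names; only `_AIRFLOW_PROCESS_CONTEXT` and "server" come from the PR.

```python
import contextlib
import functools
import os

CONTEXT_VAR = "_AIRFLOW_PROCESS_CONTEXT"  # variable name taken from the PR description


@contextlib.contextmanager
def process_context(value: str):
    """Hypothetical helper: set the variable, then restore the previous state."""
    previous = os.environ.get(CONTEXT_VAR)
    os.environ[CONTEXT_VAR] = value
    try:
        yield
    finally:
        # Restore even if the wrapped command raises or calls sys.exit().
        if previous is None:
            os.environ.pop(CONTEXT_VAR, None)
        else:
            os.environ[CONTEXT_VAR] = previous


def server_cli(func):
    """Hypothetical decorator: run func with the server context, then clean up."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with process_context("server"):
            return func(*args, **kwargs)

    return wrapper
```

The `try`/`finally` covers the case where the variable was unset beforehand as well as the case where it held another value.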

@uplsh580 uplsh580 force-pushed the dagprocessor/bundle-name-flag branch from d3455f9 to fbbbc2a Compare January 27, 2026 11:37