
[ML] Add bypass for graph validation #3013

Open

edsavage wants to merge 4 commits into elastic:main from edsavage:feature/model-validation-kill-switch

Conversation


@edsavage edsavage commented Mar 26, 2026

Summary

  • Adds a --skipModelValidation command-line flag to pytorch_inference to bypass TorchScript model graph validation
  • When the flag is passed, the allowlist check is skipped and a warning is logged
  • This can be wired to an Elasticsearch cluster setting (e.g. xpack.ml.model_graph_validation.enabled) so that operators can disable validation without infrastructure access, covering all deployment types including serverless
  • Default behaviour (validation enabled) is unchanged
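Illustratively, the flag check amounts to scanning the process arguments for the bypass option. This is a hypothetical sketch (the function name is invented, and the real binary parses options through its own command-line machinery rather than a raw argv scan); it shows only the intended default-off semantics of --skipModelValidation:

```cpp
#include <cstring>

// Hypothetical sketch: scan the argument vector for the bypass flag.
// The actual pytorch_inference uses its own option parser; this only
// illustrates the intended semantics.
bool skipModelValidationRequested(int argc, const char* const* argv) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--skipModelValidation") == 0) {
            return true; // bypass requested: caller should log a warning
        }
    }
    return false; // default: graph validation stays enabled
}
```

Because the flag defaults to absent, existing launch configurations are unaffected unless an operator opts in.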

Test plan

  • Built and ran CModelGraphValidatorTest suite locally — all tests pass
  • Integration test: --skipModelValidation bypasses validation for a malicious model (PASS)
  • Integration test: without the flag, validation runs normally (PASS)
  • Integration test: benign model passes validation as before (PASS)
  • CI passes
  • ES-side (follow-up, separate PR): add a cluster setting that passes --skipModelValidation to the native process

Provides an emergency escape hatch to bypass TorchScript model graph
validation without requiring a code change or rebuild. When
ML_SKIP_MODEL_VALIDATION is set (to any value), the pytorch_inference
process skips the graph validator and logs a warning.

Elasticsearch can set this environment variable for the native
process via its ML settings, allowing operators to unblock model
deployments immediately if the validator incorrectly rejects a
legitimate model.

Made-with: Cursor
prodsecmachine commented Mar 26, 2026

Snyk checks have passed. No issues have been found so far.

Scan Engine             Critical  High  Medium  Low  Total
Open Source Security    0         0     0       0    0 issues
Licenses                0         0     0       0    0 issues


Extends the evil model integration test to verify that:
- ML_SKIP_MODEL_VALIDATION=true bypasses graph validation (with
  warning logged)
- ML_SKIP_MODEL_VALIDATION=false still validates (only exact "true"
  activates the bypass)

Made-with: Cursor
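A minimal sketch of the exact-match check this test exercises; illustrative only (the helper name envBypassEnabled is invented, and this env-var approach was later replaced by the --skipModelValidation CLI flag):

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical sketch of the (since replaced) environment-variable check:
// only the exact value "true" activates the bypass, so e.g.
// ML_SKIP_MODEL_VALIDATION=false still validates.
bool envBypassEnabled() {
    const char* value = std::getenv("ML_SKIP_MODEL_VALIDATION");
    return value != nullptr && std::strcmp(value, "true") == 0;
}
```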
@edsavage edsavage requested review from Copilot and valeriy42 and removed request for Copilot March 26, 2026 03:47
@edsavage edsavage changed the title [ML] Add ML_SKIP_MODEL_VALIDATION kill switch for graph validation [ML] Add ML_SKIP_MODEL_VALIDATION bypass for graph validation Mar 26, 2026
@edsavage edsavage requested a review from Copilot March 26, 2026 21:17

Copilot AI left a comment


Pull request overview

Adds an environment-variable “kill switch” to bypass TorchScript model graph validation in pytorch_inference, plus a Python integration script intended to exercise validator behavior (including the bypass).

Changes:

  • Add ML_SKIP_MODEL_VALIDATION=true env-var check to skip verifySafeModel() and emit a warning.
  • Add a standalone Python script that generates known-malicious TorchScript models and runs pytorch_inference to confirm rejection/bypass behavior.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

File: bin/pytorch_inference/Main.cc
    Adds the ML_SKIP_MODEL_VALIDATION env-var bypass around verifySafeModel() with warning logging.
File: test/test_pytorch_inference_evil_models.py
    Adds a standalone integration script to generate “evil” models and validate expected pytorch_inference behavior (including bypass).


        generate_model(spec["class"], model_path)
        print(f" Model generated: {model_path.name} ({model_path.stat().st_size} bytes)")
    except Exception as e:
        print(f" SKIP: could not generate model: {e}")

Copilot AI Mar 26, 2026


If TorchScript scripting fails for a model (e.g., due to Torch version differences), this test currently prints SKIP and continues, which can result in an overall PASS without having exercised the validator at all. For a security regression test, it would be safer to treat model-generation failures as a test failure (or at least fail when the expected-rejected models can’t be generated).

Suggested change:
-        print(f" SKIP: could not generate model: {e}")
+        print(f" FAIL: could not generate model: {e}")
+        all_passed = False

Comment on lines +216 to +219
    raise FileNotFoundError(
        "Could not find pytorch_inference binary. "
        "Build from the feature/harden_pytorch_inference branch, or pass --binary."
    )

Copilot AI Mar 26, 2026


This script’s requirements/error message still references building from the "feature/harden_pytorch_inference" branch. That’s likely to become stale/confusing once this change is on main; consider updating the wording to refer to a built pytorch_inference binary (or a minimum version) rather than a specific branch name.

Comment on lines +24 to +25
Requires: torch, a built pytorch_inference binary with graph validation
(feature/harden_pytorch_inference branch or later).

Copilot AI Mar 26, 2026


The docstring says this requires a binary built from the "feature/harden_pytorch_inference" branch. Since this file is being added to the mainline repo, consider updating this to a stable requirement (e.g., “a pytorch_inference binary built from this repo at/after ”) to avoid confusion for future readers.

Suggested change:
-Requires: torch, a built pytorch_inference binary with graph validation
-(feature/harden_pytorch_inference branch or later).
+Requires: torch, and a built pytorch_inference binary from this repository
+with graph validation enabled (i.e., including the
+CModelGraphValidator checks).


@valeriy42 valeriy42 left a comment


I see the reason for wanting an escape hatch, but setting an environment variable is not a practical solution. You need a cluster setting and a --skipValidation flag on the pytorch_inference process.

Adds a command-line flag to bypass TorchScript model graph validation.
When --skipModelValidation is passed to pytorch_inference, the
allowlist check is skipped and a warning is logged.

This can be wired to an Elasticsearch cluster setting
(e.g. xpack.ml.model_graph_validation.enabled) so that operators
can disable validation without infrastructure access, covering all
deployment types including serverless.

Made-with: Cursor
@edsavage edsavage force-pushed the feature/model-validation-kill-switch branch from 242ddfd to e49d6c2 Compare March 29, 2026 21:32
@edsavage
Contributor Author

Updated per Valeriy's review — replaced the ML_SKIP_MODEL_VALIDATION environment variable with a --skipModelValidation CLI flag on the pytorch_inference process.

This is the better approach because:

  • Elasticsearch already passes CLI args to native processes (--numThreadsPerAllocation, --validElasticLicenseKeyConfirmed, etc.)
  • It can be wired to a dynamic cluster setting (changeable at runtime without restart)
  • Works for all deployment types (self-managed, Cloud, serverless) once the ES-side setting is added
  • No infrastructure access needed — operators can toggle it via the ES API

The ES-side change (adding a cluster setting like xpack.ml.model_graph_validation.enabled that passes the flag) would be a separate PR in the elasticsearch repo.

edsavage added a commit to elastic/elasticsearch that referenced this pull request Mar 29, 2026
Adds a dynamic node-scope setting to control TorchScript model graph
validation. When set to false, the pytorch_inference process is
launched with --skipModelValidation, bypassing the operation
allowlist/forbidden list check.

This provides an operator-accessible escape hatch for all deployment
types (self-managed, Cloud, serverless) via the cluster settings API,
without requiring infrastructure access or a rebuild.

The setting is dynamic — changes take effect on the next model
deployment without restarting the node.

Companion to elastic/ml-cpp#3013 which adds the --skipModelValidation
CLI flag to the pytorch_inference binary.

Made-with: Cursor
@edsavage edsavage changed the title [ML] Add ML_SKIP_MODEL_VALIDATION bypass for graph validation [ML] Add bypass for graph validation Mar 29, 2026