Conversation
Provides an emergency escape hatch to bypass TorchScript model graph validation without requiring a code change or rebuild. When ML_SKIP_MODEL_VALIDATION is set (to any value), the pytorch_inference process skips the graph validator and logs a warning. Elasticsearch can set this environment variable for the native process via its ML settings, allowing operators to unblock model deployments immediately if the validator incorrectly rejects a legitimate model. Made-with: Cursor
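The semantics described in this commit message can be sketched in Python (the real check lives in C++ in `Main.cc`; all names here are illustrative, and note that per this commit the variable being set to *any* value activates the bypass):

```python
import os

def should_skip_validation(env=os.environ):
    # Per the commit message: the variable being set to ANY value
    # activates the bypass.
    return "ML_SKIP_MODEL_VALIDATION" in env

def load_model(graph_is_safe, env):
    # Illustrative stand-in for the C++ load path around verifySafeModel().
    if should_skip_validation(env):
        print("WARNING: ML_SKIP_MODEL_VALIDATION is set; skipping graph validation")
        return "loaded"
    return "loaded" if graph_is_safe else "rejected"

print(load_model(False, {}))                                 # rejected
print(load_model(False, {"ML_SKIP_MODEL_VALIDATION": "1"}))  # loaded
```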
✅ Snyk checks have passed. No issues have been found so far.
Extends the evil model integration test to verify that:
- `ML_SKIP_MODEL_VALIDATION=true` bypasses graph validation (with warning logged)
- `ML_SKIP_MODEL_VALIDATION=false` still validates (only the exact string "true" activates the bypass)

Made-with: Cursor
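The exact-match semantics the extended test asserts can be sketched as (illustrative Python helper, not the actual C++ check):

```python
def bypass_active(value):
    # Only the exact string "true" activates the bypass; "false",
    # "TRUE", "1", or any other value leaves validation on.
    return value == "true"

print(bypass_active("true"))   # True
print(bypass_active("false"))  # False
print(bypass_active("TRUE"))   # False
```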
Pull request overview
Adds an environment-variable “kill switch” to bypass TorchScript model graph validation in pytorch_inference, plus a Python integration script intended to exercise validator behavior (including the bypass).
Changes:
- Add an `ML_SKIP_MODEL_VALIDATION=true` env-var check to skip `verifySafeModel()` and emit a warning.
- Add a standalone Python script that generates known-malicious TorchScript models and runs `pytorch_inference` to confirm rejection/bypass behavior.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `bin/pytorch_inference/Main.cc` | Adds the `ML_SKIP_MODEL_VALIDATION` env-var bypass around `verifySafeModel()` with warning logging. |
| `test/test_pytorch_inference_evil_models.py` | Adds a standalone integration script to generate “evil” models and validate expected `pytorch_inference` behavior (including bypass). |
```python
try:
    generate_model(spec["class"], model_path)
    print(f"  Model generated: {model_path.name} ({model_path.stat().st_size} bytes)")
except Exception as e:
    print(f"  SKIP: could not generate model: {e}")
```
If TorchScript scripting fails for a model (e.g., due to Torch version differences), this test currently prints SKIP and continues, which can result in an overall PASS without having exercised the validator at all. For a security regression test, it would be safer to treat model-generation failures as a test failure (or at least fail when the expected-rejected models can’t be generated).
```diff
-            print(f"  SKIP: could not generate model: {e}")
+            print(f"  FAIL: could not generate model: {e}")
+            all_passed = False
```
```python
raise FileNotFoundError(
    "Could not find pytorch_inference binary. "
    "Build from the feature/harden_pytorch_inference branch, or pass --binary."
)
```
This script’s requirements/error message still references building from the "feature/harden_pytorch_inference" branch. That’s likely to become stale/confusing once this change is on main; consider updating the wording to refer to a built pytorch_inference binary (or a minimum version) rather than a specific branch name.
```
Requires: torch, a built pytorch_inference binary with graph validation
(feature/harden_pytorch_inference branch or later).
```
The docstring says this requires a binary built from the "feature/harden_pytorch_inference" branch. Since this file is being added to the mainline repo, consider updating this to a stable requirement (e.g., “a pytorch_inference binary built from this repo at/after ”) to avoid confusion for future readers.
```diff
-Requires: torch, a built pytorch_inference binary with graph validation
-(feature/harden_pytorch_inference branch or later).
+Requires: torch, and a built pytorch_inference binary from this repository
+with graph validation enabled (i.e., including the CModelGraphValidator checks).
```
valeriy42 left a comment
I see the reason for wanting an escape hatch, but setting an environment variable is not a practical solution. You need a cluster setting and a `--skipValidation` flag on the `pytorch_inference` process.
Adds a command-line flag to bypass TorchScript model graph validation. When --skipModelValidation is passed to pytorch_inference, the allowlist check is skipped and a warning is logged. This can be wired to an Elasticsearch cluster setting (e.g. xpack.ml.model_graph_validation.enabled) so that operators can disable validation without infrastructure access, covering all deployment types including serverless. Made-with: Cursor
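A minimal sketch of how a caller might assemble the process invocation with the new flag (the `--skipModelValidation` flag name is from this PR; the other arguments are placeholders, not the binary's real CLI):

```python
def build_command(binary, model_path, skip_validation=False):
    # "--skipModelValidation" is the flag added in this PR; the rest of
    # the argument list is a placeholder for the real pytorch_inference args.
    cmd = [binary, str(model_path)]
    if skip_validation:
        cmd.append("--skipModelValidation")
    return cmd

print(build_command("pytorch_inference", "model.pt", skip_validation=True))
```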
Force-pushed from 242ddfd to e49d6c2
Updated per Valeriy's review: replaced the `ML_SKIP_MODEL_VALIDATION` environment variable with a `--skipModelValidation` command-line flag on `pytorch_inference`. This is the better approach because the flag can be driven by an Elasticsearch cluster setting, giving operators a switch that works across all deployment types, including serverless, without infrastructure access.

The ES-side change (adding a cluster setting like `xpack.ml.model_graph_validation.enabled`) is in the companion Elasticsearch PR.
Adds a dynamic node-scope setting to control TorchScript model graph validation. When set to false, the pytorch_inference process is launched with --skipModelValidation, bypassing the operation allowlist/forbidden list check. This provides an operator-accessible escape hatch for all deployment types (self-managed, Cloud, serverless) via the cluster settings API, without requiring infrastructure access or a rebuild. The setting is dynamic — changes take effect on the next model deployment without restarting the node. Companion to elastic/ml-cpp#3013 which adds the --skipModelValidation CLI flag to the pytorch_inference binary. Made-with: Cursor
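The launch-time wiring this commit describes could look roughly like the following Python sketch (the real logic would live in Java on the Elasticsearch side; the setting name is from the commit message, everything else is illustrative):

```python
def native_process_args(cluster_settings):
    # Dynamic node-scope setting; validation defaults to enabled.
    enabled = cluster_settings.get("xpack.ml.model_graph_validation.enabled", True)
    args = ["pytorch_inference"]
    if not enabled:
        # Pass the bypass flag added in elastic/ml-cpp#3013.
        args.append("--skipModelValidation")
    return args

print(native_process_args({}))
print(native_process_args({"xpack.ml.model_graph_validation.enabled": False}))
```

Because the setting is dynamic, re-reading it when the next model deployment launches the process is enough; no node restart is needed.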
Summary
- Adds a `--skipModelValidation` command-line flag to `pytorch_inference` to bypass TorchScript model graph validation
- The flag can be wired to an Elasticsearch cluster setting (e.g. `xpack.ml.model_graph_validation.enabled`) so that operators can disable validation without infrastructure access, covering all deployment types including serverless

Test plan
- Ran the `CModelGraphValidatorTest` suite locally; all tests pass
- Verified that `--skipModelValidation` bypasses validation for a malicious model (PASS)
- The companion Elasticsearch change passes `--skipModelValidation` to the native process