Feature: Plugin for podman support#1024

Open
WillButAgain wants to merge 1 commit into Project-MONAI:develop from WillButAgain:feat/add-podman-support

@WillButAgain WillButAgain commented Mar 10, 2026

Summary:

  1. Add Podman container runtime plugin as an alternative to Docker for GPU-accelerated container execution, using CDI (Container Device Interface) for NVIDIA GPU passthrough
  2. Use Podman's native libpod API (/libpod/containers/create) for container creation, since the Docker compat API does not support CDI device identifiers — it treats DeviceMapping.PathOnHost as a literal file path rather than a CDI device like nvidia.com/gpu=all
  3. Docker.DotNet is still used for all other operations (image pull, container start/inspect/kill/logs) via the Docker compat API, which works correctly for those endpoints
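
As a rough illustration of point 2, the sketch below builds a JSON body for `POST /libpod/containers/create` that passes a CDI device name in the `devices` list. It is not the code from this PR: the `LibpodCreateRequest` class is hypothetical, and the field names (`image`, `command`, `devices[].path`) reflect one reading of Podman's SpecGenerator schema, so they should be checked against the libpod API reference for the Podman version in use.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the PR.
public static class LibpodCreateRequest
{
    // Builds a minimal create-container body. The CDI device name
    // (for example "nvidia.com/gpu=all") is passed in the device "path" field,
    // which is an assumption about how libpod accepts CDI identifiers.
    public static string Build(string image, IReadOnlyList<string> command, string cdiDevice)
    {
        var spec = new Dictionary<string, object>
        {
            ["image"] = image,
            ["command"] = command,
            ["devices"] = new[]
            {
                new Dictionary<string, string> { ["path"] = cdiDevice },
            },
        };

        return JsonSerializer.Serialize(spec);
    }
}
```

The resulting string would be posted to the libpod endpoint over the Podman socket, while the Docker compat client continues to handle start, inspect, and logs.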

Summary by CodeRabbit

  • New Features

    • Added Podman container task plugin enabling workflow tasks to execute in Podman containers.
    • Added container lifecycle management with status monitoring and timeout handling.
    • Added artifact management supporting input, output, and intermediate volume mounting.
    • Added container image management with pulling and existence verification.
  • Tests

    • Added comprehensive unit tests for Podman plugin and container monitoring.

coderabbitai bot commented Mar 10, 2026

Walkthrough

This pull request introduces Podman plugin support for the Workflow Manager, enabling task execution via Podman containers. It includes a complete plugin implementation with container creation, lifecycle monitoring, artifact management, and comprehensive unit tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Constants: src/Common/Miscellaneous/ValidationConstants.cs, src/TaskManager/TaskManager/PluginStrings.cs, src/TaskManager/Plug-ins/Podman/Keys.cs, src/TaskManager/Plug-ins/Podman/Strings.cs | Added Podman configuration keys, validation constants, plugin identifier, and Docker state/MIME type strings; extended NotificationValues and ValidTaskTypes to include PodmanTaskType. |
| Solution & Project Setup: src/Monai.Deploy.WorkflowManager.sln, src/TaskManager/Plug-ins/Podman/Monai.Deploy.WorkflowManager.TaskManager.Podman.csproj, src/TaskManager/TaskManager/Monai.Deploy.WorkflowManager.TaskManager.csproj, src/TaskManager/TaskManager/packages.lock.json | Added Podman plugin project to solution with Debug/Release configurations and nested project hierarchy; added project reference and package dependencies (Docker.DotNet). |
| Core Podman Plugin: src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs, src/TaskManager/Plug-ins/Podman/IPodmanClientFactory.cs, src/TaskManager/Plug-ins/Podman/IPodmanContainerCreator.cs | Implemented main PodmanPlugin class with task execution, status polling, and container lifecycle management; added factories for Docker client creation and container specification via Podman libpod API with Unix socket support. |
| Container Monitoring & Volumes: src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs, src/TaskManager/Plug-ins/Podman/ContainerVolumeMount.cs | Added container status monitor with polling, artifact upload, and callback messaging; introduced immutable volume mount container with source, host, and container path mappings. |
| Exception & Assembly: src/TaskManager/Plug-ins/Podman/ContainerMonitorException.cs, src/TaskManager/Plug-ins/Podman/SetPermissionException.cs, src/TaskManager/Plug-ins/Podman/AssemblyInfo.cs | Added custom exception types for container monitoring and permission errors; configured assembly visibility to expose internal members to test assembly. |
| Logging: src/TaskManager/Plug-ins/Podman/Logging/Log.cs | Implemented structured logging façade with 25+ LoggerMessage methods covering initialization, container lifecycle, artifact operations, monitoring, and error events. |
| DI Integration: src/TaskManager/TaskManager/Extensions/TaskManagerExtensions.cs | Registered Podman components (IPodmanClientFactory, IPodmanContainerCreator, IContainerStatusMonitor) in dependency injection container; replaced generic container status monitor with Docker-specific binding. |
| Unit Tests: tests/UnitTests/TaskManager.Podman.Tests/PodmanPluginTest.cs, tests/UnitTests/TaskManager.Podman.Tests/ContainerStatusMonitorTest.cs, tests/UnitTests/TaskManager.Podman.Tests/Monai.Deploy.WorkflowManager.TaskManager.Podman.Tests.csproj, tests/UnitTests/TaskManager.Podman.Tests/Usings.cs | Added comprehensive test suite with 15+ test cases validating plugin initialization, container execution, status polling, timeout handling, artifact uploads, and callback messaging; configured test project with xUnit and Moq dependencies. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Workflow Client
    participant Plugin as PodmanPlugin
    participant Factory as IPodmanClientFactory
    participant Creator as IPodmanContainerCreator
    participant Docker as Docker API
    participant Storage as Storage Service

    Client->>Plugin: ExecuteTask(TaskDispatchEvent)
    Plugin->>Storage: Download input artifacts
    Storage-->>Plugin: Artifacts ready
    Plugin->>Factory: CreateClient(podmanEndpoint)
    Factory->>Docker: Create DockerClient
    Docker-->>Factory: Client instance
    Plugin->>Creator: CreateContainerAsync(spec)
    Creator->>Docker: POST /libpod/containers/create
    Docker-->>Creator: ContainerId
    Plugin->>Docker: StartContainer(containerId)
    Docker-->>Plugin: Container started
    Plugin-->>Client: ExecutionStatus.Accepted

sequenceDiagram
    participant Monitor as ContainerStatusMonitor
    participant Docker as Docker API
    participant Storage as Storage Service
    participant Broker as Message Broker
    participant FileSystem as FileSystem

    Monitor->>Docker: Poll InspectContainer()
    loop Until container exits or timeout
        Docker-->>Monitor: Container state
        Monitor->>Monitor: Check if completed
    end
    Monitor->>FileSystem: Read output artifacts
    FileSystem-->>Monitor: File list
    Monitor->>Storage: Upload each artifact
    Storage-->>Monitor: Upload confirmed
    Monitor->>Broker: Publish TaskCallbackEvent
    Broker-->>Monitor: Published

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A podman hops into the fold,
Containers dancing, stories told,
Artifacts swiftly mount and soar,
New workflows bloom forevermore! 🐳✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 1.28% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description lacks adherence to the repository's template structure. It is missing the required sections: Status checkbox, Types of changes checkboxes, and other standard template elements, making it difficult to assess completeness against the template. | Reformat the description to follow the provided template with all required sections (Status, Types of changes, Documentation comments, etc.) and checkboxes properly filled out. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Feature: Plugin for podman support' clearly and concisely describes the main change: adding a new Podman plugin. It is specific, relevant, and accurately reflects the primary objective of the changeset. |


Signed-off-by: will tepe <will.tepe@cchmc.org>
@WillButAgain WillButAgain force-pushed the feat/add-podman-support branch from 7273860 to f7c40bc Compare March 10, 2026 13:39
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🧹 Nitpick comments (7)
src/TaskManager/Plug-ins/Podman/IPodmanContainerCreator.cs (1)

68-68: Consider documenting or making the Podman API version configurable.

The API version v4.0.0 is hardcoded. Consider adding a comment noting the minimum supported Podman version, or making this configurable for environments with different Podman versions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/IPodmanContainerCreator.cs` at line 68, The
hardcoded Podman API version in the POST URL inside the container creation call
should be made configurable or documented; update the code in the method that
calls httpClient.PostAsync (in IPodmanContainerCreator implementation where
"/v4.0.0/libpod/containers/create" is used) to pull the API prefix from a
configurable setting (e.g., constructor-injected podmanApiVersion or options)
and fall back to a sensible default, or add an inline comment stating the
minimum supported Podman version if you choose not to make it configurable;
ensure the identifier (podmanApiVersion / PodmanOptions) is used to build the
request path consistently across methods that call the Podman HTTP API.
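
One way to read the suggestion above is sketched below, assuming a small options type with a default version. `PodmanApiOptions` and `PodmanApiPaths` are hypothetical names for illustration and do not appear in the PR; only the `/v4.0.0/libpod/containers/create` path comes from the review.

```csharp
using System;

// Hypothetical options type; the property name and default value are assumptions.
public sealed class PodmanApiOptions
{
    // Version segment used in libpod request paths, for example "v4.0.0".
    public string ApiVersion { get; init; } = "v4.0.0";
}

public static class PodmanApiPaths
{
    private const string DefaultVersion = "v4.0.0";

    // Builds "/{version}/libpod/containers/create" from the configured version,
    // falling back to the default when the setting is missing or blank.
    public static string ContainersCreate(PodmanApiOptions options)
    {
        ArgumentNullException.ThrowIfNull(options);

        var version = string.IsNullOrWhiteSpace(options.ApiVersion)
            ? DefaultVersion
            : options.ApiVersion.Trim().Trim('/');

        // Accept both "4.0.0" and "v4.0.0" in configuration.
        if (!version.StartsWith('v'))
        {
            version = "v" + version;
        }

        return $"/{version}/libpod/containers/create";
    }
}
```

Keeping the path construction in one place means any other libpod call can reuse the same version setting.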
src/TaskManager/Plug-ins/Podman/Keys.cs (1)

21-24: Update documentation comments to reference Podman instead of Docker.

Several XML comments reference "Docker" but this is the Podman plugin:

  • Line 22: "Docker server" → "Podman server"
  • Line 37: "docker run" → "podman run"
  • Line 82: "Docker workflow" → "Podman workflow"
📝 Proposed documentation fixes
         /// <summary>
-        /// Key for the endpoint where the Docker server is running.
+        /// Key for the endpoint where the Podman server is running.
         /// </summary>
         public static readonly string BaseUrl = "server_url";

         /// <summary>
-        /// Key for specifying the user to the container. Same as -u argument for docker run.
+        /// Key for specifying the user to the container. Same as -u argument for podman run.
         /// </summary>
         public static readonly string User = "user";

         /// <summary>
-        /// Required arguments to run the Docker workflow.
+        /// Required arguments to run the Podman workflow.
         /// </summary>

Also applies to: 36-39, 82-83

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/Keys.cs` around lines 21 - 24, The XML
documentation in the Podman plugin contains Docker-specific wording; update the
comments in the Keys class (e.g., the BaseUrl field and other XML comments
around lines referencing "docker run" and "Docker workflow") to say "Podman"
instead—replace "Docker server" with "Podman server", "docker run" with "podman
run", and "Docker workflow" with "Podman workflow" so the comments accurately
describe the Podman plugin and its keys.
src/TaskManager/Plug-ins/Podman/Logging/Log.cs (1)

41-48: Consider renaming Docker* methods to Podman* for consistency.

Methods like DockerInputMapped, DockerOutputMapped, DockerEnvironmentVariableAdded, and DockerIntermediateVolumeMapped use "Docker" prefix in a Podman plugin. While functionally correct, this may cause confusion during maintenance.

Also applies to: 65-66

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/Logging/Log.cs` around lines 41 - 48, Rename
the logging methods to use the Podman prefix instead of Docker to match the
plugin context: change DockerInputMapped to PodmanInputMapped,
DockerOutputMapped to PodmanOutputMapped, DockerEnvironmentVariableAdded to
PodmanEnvironmentVariableAdded, and DockerIntermediateVolumeMapped to
PodmanIntermediateVolumeMapped (keep the LoggerMessage attributes and
EventId/Message text unless message wording must change). Update all call sites
and any unit tests or references to these symbols to the new method names so
compilation succeeds.
src/TaskManager/Plug-ins/Podman/ContainerVolumeMount.cs (1)

17-17: Unused import: Ardalis.GuardClauses is not used.

The code uses ArgumentNullException.ThrowIfNull and ThrowIfNullOrWhiteSpace instead of Ardalis guard clauses.

🧹 Proposed fix
-using Ardalis.GuardClauses;
-
 namespace Monai.Deploy.WorkflowManager.TaskManager.Podman
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/ContainerVolumeMount.cs` at line 17, Remove
the unused Ardalis.GuardClauses using directive from ContainerVolumeMount.cs:
delete the line "using Ardalis.GuardClauses;" and ensure no other code in the
ContainerVolumeMount class or related methods (e.g., constructors or validation
helpers) reference Ardalis guard methods; rely on the existing
ArgumentNullException.ThrowIfNull/ThrowIfNullOrWhiteSpace calls as currently
implemented.
tests/UnitTests/TaskManager.Podman.Tests/PodmanPluginTest.cs (2)

195-240: This test doesn't observe the background failure it describes.

Start throws from a fire-and-forget task, so the exception is off the awaited code path and these assertions still pass even if Start is never invoked. Add a Verify or TaskCompletionSource so the monitor invocation is observed deterministically.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/UnitTests/TaskManager.Podman.Tests/PodmanPluginTest.cs` around lines
195 - 240, The test
ExerciseTask_WhenFailedToMonitorContainer_ExpectTaskToBeAccepted does not
observe the background exception because ContainerStatusMonitor.Start is invoked
fire-and-forget; update the test to deterministically observe the monitor call
by either (a) replacing the current
_containerStatusMonitor.Setup(...).ThrowsAsync(...) with a setup that signals a
TaskCompletionSource when Start is invoked and then throws inside a Task
returned from Start so the test can await the TCS and assert the invocation, or
(b) add a Verify on _containerStatusMonitor.Verify(m => m.Start(...),
Times.Once()) after ExecuteTask to ensure Start was called; reference the mocked
member _containerStatusMonitor and the Start method and coordinate with a
TaskCompletionSource in the test to ensure the background failure is observed
before asserting result.
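
The TaskCompletionSource idea can be sketched without a mocking library, as below. `IMonitor`, `ThrowingMonitor`, and `FireAndForgetDemo` are hypothetical stand-ins with simplified signatures; the real `IContainerStatusMonitor.Start` takes more parameters, and the real test would use the Moq setup the review mentions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for the monitor interface.
public interface IMonitor
{
    Task Start(string containerId, CancellationToken cancellationToken);
}

// Test double that records the invocation before returning a faulted task.
public sealed class ThrowingMonitor : IMonitor
{
    private readonly TaskCompletionSource<string> _started =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    // Completes with the container ID once Start has been called.
    public Task<string> Started => _started.Task;

    public Task Start(string containerId, CancellationToken cancellationToken)
    {
        _started.TrySetResult(containerId);
        return Task.FromException(new InvalidOperationException("monitor failed"));
    }
}

public static class FireAndForgetDemo
{
    // Mimics a plugin that starts the monitor in the background and returns immediately.
    public static string Execute(IMonitor monitor, string containerId)
    {
        _ = Task.Run(() => monitor.Start(containerId, CancellationToken.None));
        return "Accepted";
    }

    // Waits for the Start signal, or gives up after the timeout.
    public static async Task<bool> MonitorWasStartedAsync(ThrowingMonitor monitor, TimeSpan timeout)
    {
        var finished = await Task.WhenAny(monitor.Started, Task.Delay(timeout)).ConfigureAwait(false);
        return finished == monitor.Started;
    }
}
```

A test built this way awaits `MonitorWasStartedAsync` before asserting on the returned status, so it fails if the background call never happens.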

376-495: Add a non-zero exited/dead regression here.

The GetStatus suite covers paused, OOM-killed, and successful exit, but not Status=exited|dead with a non-zero exit code. That missing case is exactly what lets the current implementation report failed containers as Succeeded.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/UnitTests/TaskManager.Podman.Tests/PodmanPluginTest.cs` around lines
376 - 495, Add a new unit test in the PodmanPlugin GetStatus suite that
simulates a container with State.Status set to exited (or dead) and a non-zero
ExitCode to cover the regression; create a test (e.g.,
GetStatus_WhenContainerIsExitedWithNonZeroExitCode_ExpectFailureStatus) that
sets up _podmanClient.Containers.InspectContainerAsync to return
ContainerInspectResponse with State.Status = Strings.DockerStatusExited (or
"dead") and ExitCode = 100, instantiate PodmanPlugin and call
GetStatus("identity", ...), then assert result.Status ==
TaskExecutionStatus.Failed, result.FailureReason ==
FailureReason.ExternalServiceError (or the same code path used for non-zero
exits in PodmanPlugin), result.Errors contains "Exit code=100", and verify
InspectContainerAsync was called for "identity".
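
The mapping this regression test would pin down can be sketched as follows. `TaskOutcome`, `ContainerState`, and `StatusMapper` are hypothetical simplifications of the plugin's types; the status strings `exited` and `dead` and the `Exit code=` message come from the review text, and the rest is an assumption.

```csharp
using System;

public enum TaskOutcome { Running, Succeeded, Failed }

// Hypothetical, simplified view of the inspected container state.
public sealed record ContainerState(string Status, long ExitCode, bool OomKilled);

public static class StatusMapper
{
    // Reports success only for a terminal container with exit code 0.
    public static (TaskOutcome Outcome, string Error) Map(ContainerState state)
    {
        var exited = string.Equals(state.Status, "exited", StringComparison.OrdinalIgnoreCase);
        var dead = string.Equals(state.Status, "dead", StringComparison.OrdinalIgnoreCase);

        if (!exited && !dead)
        {
            return (TaskOutcome.Running, string.Empty);
        }

        // Terminal but unhealthy: OOM-killed, dead, or a non-zero exit code.
        if (state.OomKilled || dead || state.ExitCode != 0)
        {
            return (TaskOutcome.Failed, $"Exit code={state.ExitCode}. Status={state.Status}.");
        }

        return (TaskOutcome.Succeeded, string.Empty);
    }
}
```

With this shape, a container that exits with code 100 maps to `Failed` and carries the exit code in its error text.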
tests/UnitTests/TaskManager.Podman.Tests/ContainerStatusMonitorTest.cs (1)

99-125: Add a transient-failure regression for the completion path.

These tests only exercise the no-artifact and all-success cases. A case where PutObjectAsync or Publish fails after the container reaches exited would catch duplicate uploads/callbacks and callbacks sent with missing artifacts.

Also applies to: 127-153

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/UnitTests/TaskManager.Podman.Tests/ContainerStatusMonitorTest.cs`
around lines 99 - 125, Add a new unit test covering a transient-failure
regression for the completion path: simulate ContainerStatusMonitor.Start
receiving an exited container (use the same ContainerInspectResponse setup) but
make IStorageService.PutObjectAsync and/or
IMessageBrokerPublisherService.Publish throw on the first call and succeed on a
retry, then assert that artifacts are uploaded exactly once and the final
callback is published exactly once; reference the existing test
Start_WhenCalledWithoutAnyArtifacts_ExpectToSendCallbackEvent for structure and
the methods PutObjectAsync and Publish to locate the mocks and the Start method
invocation on ContainerStatusMonitor for where to introduce the
failure-and-retry behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs`:
- Around line 91-107: The polling loop currently calls UploadOutputArtifacts and
SendCallbackMessage inside the try so any exception re-enters the retry loop and
can cause duplicate side effects; change the logic in the method that uses
dockerClient.Containers.InspectContainerAsync and IsContainerCompleted so that
when IsContainerCompleted(response.State) is true you set a local flag (e.g.,
completed = true) and break out of the polling loop (do not call
UploadOutputArtifacts or SendCallbackMessage there), then after the loop
(outside the retry/inspect try/catch/finally) invoke
UploadOutputArtifacts(intermediateVolumeMount, outputVolumeMounts,
cancellationToken) and SendCallbackMessage(taskDispatchEvent, containerId) once,
each wrapped with their own appropriate try/catch/error logging so failures do
not cause the poll loop to re-run and produce duplicates.
- Around line 175-177: The catch block in ContainerStatusMonitor that calls
_logger.ErrorUploadingFile(file, ex) is swallowing upload failures so Start can
still publish callbacks with missing artifacts; after logging, rethrow or
propagate the exception (e.g., throw; or throw a new exception that wraps ex) so
the failure prevents Start from publishing the callback, or alternatively set
and check a failure flag that prevents PublishCallback from running — update the
catch that contains _logger.ErrorUploadingFile(file, ex) to either rethrow the
exception or mark the upload as failed and ensure Start checks that flag before
publishing.
- Around line 164-165: The object key assembly is unsafe: using
file.Replace(artifactsPath, ...) and _fileSystem.Path.Combine produces wrong
results (prefix collisions, rooted paths dropping the prefix, and OS-specific
backslashes). Change the logic that sets objectName (the code referencing file,
artifactsPath and destination.RelativeRootPath and calling
_fileSystem.Path.Combine) to first ensure you only strip a leading artifactsPath
prefix (e.g., check file.StartsWith(artifactsPath) and remove exactly that
prefix) then normalize the remaining segment by trimming any leading slashes,
and finally join destination.RelativeRootPath and the normalized segment using
explicit forward-slash concatenation (not Path.Combine) so the resulting object
key always uses '/' separators; apply the same fix to the Docker variant where
the same pattern appears.

In `@src/TaskManager/Plug-ins/Podman/Keys.cs`:
- Around line 62-64: Fix the typo in the XML comment above the
EnvironmentVariableKeyPrefix field in Keys.cs: change "envrionment" to
"environment" so the summary reads "Prefix for environment variables." and
ensure the XML comment remains well-formed for the public static readonly string
EnvironmentVariableKeyPrefix.

In `@src/TaskManager/Plug-ins/Podman/Logging/Log.cs`:
- Around line 41-48: Fix the typos in the LoggerMessage Message strings: update
DockerInputMapped and DockerOutputMapped messages to use "host={hostPath}"
instead of "host=={hostPath}", change DockerEnvironmentVariableAdded message to
"Environment variable added {key}={value}." instead of "variabled", and find any
other logger messages around the same area (the two messages referenced at the
end of the comment) that use "retreiving" and correct them to "retrieving";
ensure each LoggerMessage attribute's Message text is corrected for these
spelling and punctuation issues.
- Around line 116-117: The ErrorSettingDirectoryPermission logger method should
accept an Exception so stack traces are captured: change the partial method
signature from ErrorSettingDirectoryPermission(this ILogger logger, string path,
string user) to include an Exception parameter (e.g.,
ErrorSettingDirectoryPermission(this ILogger logger, Exception exception, string
path, string user)) and update any call sites to pass the caught exception; keep
the LoggerMessage attribute as-is (the exception is provided as a method
parameter and will be logged by the generated source) and mirror the parameter
ordering used by other methods like ErrorGeneratingContainerSpecification and
ErrorDeployingContainer for consistency.
- Around line 65-66: The LoggerMessage attribute on the
DockerIntermediateVolumeMapped partial method has a typo in EventId (10014) that
breaks the project's EventId sequence; update the attribute to use EventId =
1014 on the DockerIntermediateVolumeMapped method in Log.cs and ensure it
matches the surrounding 1000-1031 range (and confirm no other log entry already
uses 1014 to avoid duplicates).

In `@src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs`:
- Around line 366-372: The loop in PodmanPlugin (iterating
Event.TaskPluginArguments and using Keys.EnvironmentVariableKeyPrefix) currently
logs raw env var values via _logger.DockerEnvironmentVariableAdded(envVarKey,
Event.TaskPluginArguments[key]) which can leak secrets; change the logging to
only record the env var name or a redacted placeholder (e.g. "***REDACTED***")
when calling _logger.DockerEnvironmentVariableAdded, while still assigning the
real value into envvars[envVarKey]; if needed, update or add an overload on
_logger.DockerEnvironmentVariableAdded to accept a single parameter (key) to
avoid passing the secret value.
- Around line 245-270: The current logic in PodmanPlugin.cs uses
ContainerStatusMonitor.IsContainerCompleted(response.State) to mark a task as
Succeeded regardless of exit code; update the branch in the method that builds
ExecutionStatus so it only returns Status = TaskExecutionStatus.Succeeded when
the container is terminal AND response.State.ExitCode == 0; otherwise treat
terminal non-zero exit codes (or response.State.Dead/response.State.OOMKilled)
as failures—use FailureReason.ExternalServiceError for OOMKilled/Dead and
FailureReason.Unknown (or a more specific reason if available) for non-zero exit
codes, and populate Errors with the exit code and state (reference
response.State, ContainerStatusMonitor.IsContainerCompleted, and the
ExecutionStatus construction sites).
- Around line 452-454: The code incorrectly casts the result of
storageService.GetObjectAsync(input.Bucket, obj.FilePath, cancellationToken) to
MemoryStream and calls WriteTo, which can throw a NullReferenceException when the returned stream is not a MemoryStream;
instead assign the returned Stream to a variable (e.g., stream = await
storageService.GetObjectAsync(... ) ?? throw new
InvalidOperationException(...)), remove the MemoryStream cast, and replace
stream.WriteTo(fileStream) with await stream.CopyToAsync(fileStream,
cancellationToken) so any Stream implementation is supported; keep using the
using declarations for stream and fileStream.
- Around line 193-197: The monitor is being fire-and-forget via Task.Run calling
IContainerStatusMonitor.Start(containerId, ...) with the caller's
cancellationToken, so the monitor can be cancelled mid-upload and its exceptions
go unobserved; change the call in PodmanPlugin.cs to start the monitor with a
dedicated token (e.g., new CancellationTokenSource or CancellationToken.None)
instead of the caller's cancellationToken so artifact uploads/callbacks can
finish, capture the returned Task (from Task.Run or Start if it returns a Task)
and observe exceptions by either awaiting it where appropriate or attaching a
ContinueWith/try/catch to log faults (so exceptions from monitor.Start are not
lost), and ensure any intended cooperative cancellation from the caller is
translated into a graceful stop request rather than directly canceling the
upload (use a linked CTS if you need to forward cancellation requests while
allowing finalization).

---

Nitpick comments:
In `@src/TaskManager/Plug-ins/Podman/ContainerVolumeMount.cs`:
- Line 17: Remove the unused Ardalis.GuardClauses using directive from
ContainerVolumeMount.cs: delete the line "using Ardalis.GuardClauses;" and
ensure no other code in the ContainerVolumeMount class or related methods (e.g.,
constructors or validation helpers) reference Ardalis guard methods; rely on the
existing ArgumentNullException.ThrowIfNull/ThrowIfNullOrWhiteSpace calls as
currently implemented.

In `@src/TaskManager/Plug-ins/Podman/IPodmanContainerCreator.cs`:
- Line 68: The hardcoded Podman API version in the POST URL inside the container
creation call should be made configurable or documented; update the code in the
method that calls httpClient.PostAsync (in IPodmanContainerCreator
implementation where "/v4.0.0/libpod/containers/create" is used) to pull the API
prefix from a configurable setting (e.g., constructor-injected podmanApiVersion
or options) and fall back to a sensible default, or add an inline comment
stating the minimum supported Podman version if you choose not to make it
configurable; ensure the identifier (podmanApiVersion / PodmanOptions) is used
to build the request path consistently across methods that call the Podman HTTP
API.

In `@src/TaskManager/Plug-ins/Podman/Keys.cs`:
- Around line 21-24: The XML documentation in the Podman plugin contains
Docker-specific wording; update the comments in the Keys class (e.g., the
BaseUrl field and other XML comments around lines referencing "docker run" and
"Docker workflow") to say "Podman" instead—replace "Docker server" with "Podman
server", "docker run" with "podman run", and "Docker workflow" with "Podman
workflow" so the comments accurately describe the Podman plugin and its keys.

In `@src/TaskManager/Plug-ins/Podman/Logging/Log.cs`:
- Around line 41-48: Rename the logging methods to use the Podman prefix instead
of Docker to match the plugin context: change DockerInputMapped to
PodmanInputMapped, DockerOutputMapped to PodmanOutputMapped,
DockerEnvironmentVariableAdded to PodmanEnvironmentVariableAdded, and
DockerIntermediateVolumeMapped to PodmanIntermediateVolumeMapped (keep the
LoggerMessage attributes and EventId/Message text unless message wording must
change). Update all call sites and any unit tests or references to these symbols
to the new method names so compilation succeeds.

In `@tests/UnitTests/TaskManager.Podman.Tests/ContainerStatusMonitorTest.cs`:
- Around line 99-125: Add a new unit test covering a transient-failure
regression for the completion path: simulate ContainerStatusMonitor.Start
receiving an exited container (use the same ContainerInspectResponse setup) but
make IStorageService.PutObjectAsync and/or
IMessageBrokerPublisherService.Publish throw on the first call and succeed on a
retry, then assert that artifacts are uploaded exactly once and the final
callback is published exactly once; reference the existing test
Start_WhenCalledWithoutAnyArtifacts_ExpectToSendCallbackEvent for structure and
the methods PutObjectAsync and Publish to locate the mocks and the Start method
invocation on ContainerStatusMonitor for where to introduce the
failure-and-retry behavior.

In `@tests/UnitTests/TaskManager.Podman.Tests/PodmanPluginTest.cs`:
- Around line 195-240: The test
ExerciseTask_WhenFailedToMonitorContainer_ExpectTaskToBeAccepted does not
observe the background exception because ContainerStatusMonitor.Start is invoked
fire-and-forget; update the test to deterministically observe the monitor call
by either (a) replacing the current
_containerStatusMonitor.Setup(...).ThrowsAsync(...) with a setup that signals a
TaskCompletionSource when Start is invoked and then throws inside a Task
returned from Start so the test can await the TCS and assert the invocation, or
(b) add a Verify on _containerStatusMonitor.Verify(m => m.Start(...),
Times.Once()) after ExecuteTask to ensure Start was called; reference the mocked
member _containerStatusMonitor and the Start method and coordinate with a
TaskCompletionSource in the test to ensure the background failure is observed
before asserting result.
- Around line 376-495: Add a new unit test in the PodmanPlugin GetStatus suite
that simulates a container with State.Status set to exited (or dead) and a
non-zero ExitCode to cover the regression; create a test (e.g.,
GetStatus_WhenContainerIsExitedWithNonZeroExitCode_ExpectFailureStatus) that
sets up _podmanClient.Containers.InspectContainerAsync to return
ContainerInspectResponse with State.Status = Strings.DockerStatusExited (or
"dead") and ExitCode = 100, instantiate PodmanPlugin and call
GetStatus("identity", ...), then assert result.Status ==
TaskExecutionStatus.Failed, result.FailureReason ==
FailureReason.ExternalServiceError (or the same code path used for non-zero
exits in PodmanPlugin), result.Errors contains "Exit code=100", and verify
InspectContainerAsync was called for "identity".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d05a00ca-9536-4f3d-baa6-1c8453407771

📥 Commits

Reviewing files that changed from the base of the PR and between 9e10e95 and 7273860.

📒 Files selected for processing (22)
  • src/Common/Miscellaneous/ValidationConstants.cs
  • src/Monai.Deploy.WorkflowManager.sln
  • src/TaskManager/Plug-ins/Podman/AssemblyInfo.cs
  • src/TaskManager/Plug-ins/Podman/ContainerMonitorException.cs
  • src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs
  • src/TaskManager/Plug-ins/Podman/ContainerVolumeMount.cs
  • src/TaskManager/Plug-ins/Podman/IPodmanClientFactory.cs
  • src/TaskManager/Plug-ins/Podman/IPodmanContainerCreator.cs
  • src/TaskManager/Plug-ins/Podman/Keys.cs
  • src/TaskManager/Plug-ins/Podman/Logging/Log.cs
  • src/TaskManager/Plug-ins/Podman/Monai.Deploy.WorkflowManager.TaskManager.Podman.csproj
  • src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs
  • src/TaskManager/Plug-ins/Podman/SetPermissionException.cs
  • src/TaskManager/Plug-ins/Podman/Strings.cs
  • src/TaskManager/TaskManager/Extensions/TaskManagerExtensions.cs
  • src/TaskManager/TaskManager/Monai.Deploy.WorkflowManager.TaskManager.csproj
  • src/TaskManager/TaskManager/PluginStrings.cs
  • src/TaskManager/TaskManager/packages.lock.json
  • tests/UnitTests/TaskManager.Podman.Tests/ContainerStatusMonitorTest.cs
  • tests/UnitTests/TaskManager.Podman.Tests/Monai.Deploy.WorkflowManager.TaskManager.Podman.Tests.csproj
  • tests/UnitTests/TaskManager.Podman.Tests/PodmanPluginTest.cs
  • tests/UnitTests/TaskManager.Podman.Tests/Usings.cs

Comment on lines +91 to +107
        var response = await dockerClient.Containers.InspectContainerAsync(containerId, cancellationToken).ConfigureAwait(false);

        if (IsContainerCompleted(response.State))
        {
            await UploadOutputArtifacts(intermediateVolumeMount, outputVolumeMounts, cancellationToken).ConfigureAwait(false);
            await SendCallbackMessage(taskDispatchEvent, containerId).ConfigureAwait(false);
            return;
        }
    }
    catch (Exception ex)
    {
        _logger.ErrorMonitoringContainerStatus(containerId, ex);
    }
    finally
    {
        await Task.Delay(pollingPeriod, cancellationToken).ConfigureAwait(false);
    }

⚠️ Potential issue | 🔴 Critical

Separate polling from completion side effects.

Once the container is complete, any exception from UploadOutputArtifacts or SendCallbackMessage falls into the outer retry loop and the next poll re-enters the completion path. That can re-upload artifacts and publish duplicate callbacks until the timeout elapses.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs` around lines 91 -
107, The polling loop currently calls UploadOutputArtifacts and
SendCallbackMessage inside the try so any exception re-enters the retry loop and
can cause duplicate side effects; change the logic in the method that uses
dockerClient.Containers.InspectContainerAsync and IsContainerCompleted so that
when IsContainerCompleted(response.State) is true you set a local flag (e.g.,
completed = true) and break out of the polling loop (do not call
UploadOutputArtifacts or SendCallbackMessage there), then after the loop
(outside the retry/inspect try/catch/finally) invoke
UploadOutputArtifacts(intermediateVolumeMount, outputVolumeMounts,
cancellationToken) and SendCallbackMessage(taskDispatchEvent, containerId) once,
each wrapped with their own appropriate try/catch/error logging so failures do
not cause the poll loop to re-run and produce duplicates.
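
The separation described above can be sketched as follows. This is a minimal illustration, not the plugin's code: `PollThenFinalize`, `isCompleted`, `upload`, and `sendCallback` are hypothetical stand-ins for the inspect call and the two completion side effects.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollThenFinalize
{
    // Hypothetical stand-ins: isCompleted wraps the inspect call,
    // upload and sendCallback wrap the completion side effects.
    public static async Task<bool> RunAsync(
        Func<CancellationToken, Task<bool>> isCompleted,
        Func<CancellationToken, Task> upload,
        Func<Task> sendCallback,
        TimeSpan pollingPeriod,
        TimeSpan timeout,
        CancellationToken cancellationToken)
    {
        var deadline = DateTime.UtcNow + timeout;
        var completed = false;

        // Phase 1: poll only. Failures here are safe to retry.
        while (!completed && DateTime.UtcNow < deadline)
        {
            try
            {
                completed = await isCompleted(cancellationToken).ConfigureAwait(false);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                Console.Error.WriteLine($"Polling failed, will retry: {ex.Message}");
            }

            if (!completed)
            {
                await Task.Delay(pollingPeriod, cancellationToken).ConfigureAwait(false);
            }
        }

        if (!completed)
        {
            return false; // timed out without reaching a terminal state
        }

        // Phase 2: side effects run once, outside the retry loop.
        await upload(cancellationToken).ConfigureAwait(false);
        await sendCallback().ConfigureAwait(false);
        return true;
    }
}
```

In this sketch a failure in `upload` or `sendCallback` propagates to the caller instead of re-entering the poll loop; a caller that prefers logging over propagation could wrap each call in its own try/catch.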

Comment on lines +164 to +165
var objectName = file.Replace(artifactsPath, string.Empty).TrimStart('/');
objectName = _fileSystem.Path.Combine(destination.RelativeRootPath, objectName);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the file
find . -type f -name "ContainerStatusMonitor.cs"

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 206


🏁 Script executed:

# Read the relevant section with more context
cat -n "src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs" | sed -n '150,180p'

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 1768


🏁 Script executed:

# Search for how objectName is used after these lines
rg -A 10 -B 5 "objectName" "src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs" | head -60

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 1306


🏁 Script executed:

# Check what artifactsPath is and how it's defined
rg "artifactsPath" "src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs" -B 2 -A 2

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 1236


🏁 Script executed:

# Check what type of storage service is being used
rg -B 5 -A 5 "PutObjectAsync" "src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs"

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 725


🏁 Script executed:

# Check the IStorageService interface to understand object storage expectations
find . -type f -name "*.cs" | xargs rg -l "interface IStorageService"

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 71


🏁 Script executed:

# Look at the destination object to understand RelativeRootPath
rg "RelativeRootPath" --type cs -B 2 -A 2 | head -40

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 3643


🏁 Script executed:

# Check if there's any documentation or tests about objectName formatting
rg "objectName" --type cs -B 2 -A 2 | grep -E "(objectName|storage|S3|blob)" | head -30

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 6206


🏁 Script executed:

# Verify the prefix-safe concern by checking edge cases
python3 << 'EOF'
# Test the current problematic approach
artifactsPath1 = "/home/user/artifacts"
file1 = "/home/user/artifacts2/file.txt"
result1 = file1.replace(artifactsPath1, "").lstrip('/')
print(f"Case 1 - Path that starts with same prefix:")
print(f"  artifactsPath: {artifactsPath1}")
print(f"  file: {file1}")
print(f"  Result: {result1}")
print(f"  Expected: 2/file.txt (WRONG!)")
print()

# Correct approach with GetRelativePath-like logic
import os
file2 = "/home/user/artifacts/subdir/file.txt"
artifactsPath2 = "/home/user/artifacts"
result2 = os.path.relpath(file2, artifactsPath2)
print(f"Case 2 - Proper relative path:")
print(f"  artifactsPath: {artifactsPath2}")
print(f"  file: {file2}")
print(f"  Result: {result2}")
EOF

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 401


🏁 Script executed:

# Let's verify Path.Combine behavior on Windows with leading slashes
python3 << 'EOF'
# Simulate Path.Combine behavior on Windows
# On Windows, Path.Combine drops the second path if it starts with /
import ntpath
import posixpath

# Simulate Windows Path.Combine behavior
relativeRootPath = "minio/workflow"
objectName = "/artifacts/file.txt"  # Leading slash from TrimStart('/')

# Windows behavior - rooted paths reset the path
windows_result = ntpath.join(relativeRootPath, objectName)
print(f"Windows Path.Combine simulation:")
print(f"  RelativeRootPath: {relativeRootPath}")
print(f"  objectName: {objectName}")
print(f"  Result: {windows_result}")
print(f"  Expected: minio/workflow/artifacts/file.txt")
print()

# Also test with backslashes in objectName
objectName2 = "artifacts\\file.txt"
windows_result2 = ntpath.join(relativeRootPath, objectName2)
print(f"With backslashes:")
print(f"  Result: {windows_result2}")
EOF

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 327


Use forward slash string concatenation for object storage keys instead of Path.Combine.

The current implementation has three critical issues:

  1. Prefix safety: file.Replace(artifactsPath, ...) removes every occurrence of the string rather than only a leading directory prefix, so a sibling path such as /home/user/artifacts2/file.txt is reduced to 2/file.txt instead of being rejected.

  2. Leading slash behavior on Windows: Path.Combine treats rooted paths (those starting with /) as absolute, dropping the RelativeRootPath entirely. For example, Path.Combine("minio/workflow", "/artifacts/file.txt") returns /artifacts/file.txt instead of the intended minio/workflow/artifacts/file.txt.

  3. OS-specific separators in object keys: On Windows, Path.Combine produces backslashes, but object storage systems (MinIO/S3) expect forward slashes only.

This issue exists in both the Podman and Docker variants.

🛠️ Suggested fix
-                    var objectName = file.Replace(artifactsPath, string.Empty).TrimStart('/');
-                    objectName = _fileSystem.Path.Combine(destination.RelativeRootPath, objectName);
+                    var relativePath = _fileSystem.Path.GetRelativePath(artifactsPath, file)
+                        .Replace('\\', '/')
+                        .TrimStart('/');
+                    var objectName = string.IsNullOrWhiteSpace(destination.RelativeRootPath)
+                        ? relativePath
+                        : $"{destination.RelativeRootPath.TrimEnd('/')}/{relativePath}";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs` around lines 164 -
165, The object key assembly is unsafe: using file.Replace(artifactsPath, ...)
and _fileSystem.Path.Combine produces wrong results (prefix collisions, rooted
paths dropping the prefix, and OS-specific backslashes). Change the logic that
sets objectName (the code referencing file, artifactsPath and
destination.RelativeRootPath and calling _fileSystem.Path.Combine) to first
ensure you only strip a leading artifactsPath prefix (e.g., check
file.StartsWith(artifactsPath) and remove exactly that prefix) then normalize
the remaining segment by trimming any leading slashes, and finally join
destination.RelativeRootPath and the normalized segment using explicit
forward-slash concatenation (not Path.Combine) so the resulting object key
always uses '/' separators; apply the same fix to the Docker variant where the
same pattern appears.
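
A self-contained sketch of the key-building logic the prompt describes, under stated assumptions: `ObjectKeys.Build` is a hypothetical helper, its parameters mirror `artifactsPath`, `file`, and `destination.RelativeRootPath`, and backslashes are treated as separators on every platform.

```csharp
using System;

public static class ObjectKeys
{
    // Hypothetical helper: builds a storage object key from a local file path.
    public static string Build(string artifactsRoot, string filePath, string relativeRootPath)
    {
        // Normalize separators first so the prefix check is consistent.
        var root = artifactsRoot.Replace('\\', '/').TrimEnd('/') + "/";
        var file = filePath.Replace('\\', '/');

        // Strip the root only when it is a true directory prefix.
        if (!file.StartsWith(root, StringComparison.Ordinal))
        {
            throw new ArgumentException(
                $"'{filePath}' is not under '{artifactsRoot}'.", nameof(filePath));
        }

        var relative = file.Substring(root.Length).TrimStart('/');
        var prefix = (relativeRootPath ?? string.Empty).Replace('\\', '/').Trim('/');

        // Join with an explicit forward slash rather than Path.Combine.
        return prefix.Length == 0 ? relative : $"{prefix}/{relative}";
    }
}
```

Appending the trailing slash to the root before the `StartsWith` check is what keeps `/home/user/artifacts2/file.txt` from matching `/home/user/artifacts`; the sketch rejects such a path rather than guessing a key for it.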

Comment on lines +175 to +177
catch (Exception ex)
{
_logger.ErrorUploadingFile(file, ex);

⚠️ Potential issue | 🔴 Critical

Don't swallow failed artifact uploads.

Logging and continuing here means Start can still publish the callback even though one or more outputs never reached storage. That advances the workflow with incomplete artifacts.

🛠️ Suggested fix
                 catch (Exception ex)
                 {
                     _logger.ErrorUploadingFile(file, ex);
+                    throw new ContainerMonitorException($"Failed to upload artifact '{file}'.", ex);
                 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-catch (Exception ex)
-{
-_logger.ErrorUploadingFile(file, ex);
+catch (Exception ex)
+{
+_logger.ErrorUploadingFile(file, ex);
+throw new ContainerMonitorException($"Failed to upload artifact '{file}'.", ex);
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs` around lines 175 -
177, The catch block in ContainerStatusMonitor that calls
_logger.ErrorUploadingFile(file, ex) is swallowing upload failures so Start can
still publish callbacks with missing artifacts; after logging, rethrow or
propagate the exception (e.g., throw; or throw a new exception that wraps ex) so
the failure prevents Start from publishing the callback, or alternatively set
and check a failure flag that prevents PublishCallback from running — update the
catch that contains _logger.ErrorUploadingFile(file, ex) to either rethrow the
exception or mark the upload as failed and ensure Start checks that flag before
publishing.
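
The alternative mentioned in the prompt (record failures, then refuse to publish) might look like the sketch below. `ArtifactUploads` and `uploadOne` are hypothetical; the real code would call the storage service and the plugin's logger.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ArtifactUploads
{
    // uploadOne is a hypothetical stand-in for the storage call.
    public static async Task UploadAllAsync(
        IEnumerable<string> files,
        Func<string, Task> uploadOne)
    {
        var failures = new List<Exception>();

        foreach (var file in files)
        {
            try
            {
                await uploadOne(file).ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                // Log, remember the failure, and keep trying the remaining files.
                Console.Error.WriteLine($"Error uploading '{file}': {ex.Message}");
                failures.Add(new InvalidOperationException(
                    $"Failed to upload artifact '{file}'.", ex));
            }
        }

        // Surface every failure so the caller does not publish a callback.
        if (failures.Count > 0)
        {
            throw new AggregateException("One or more artifact uploads failed.", failures);
        }
    }
}
```

Compared with rethrowing on the first error, this variant still attempts every file, which may be preferable when partial outputs are useful for diagnosis.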

Comment on lines +62 to +64
/// Prefix for envrionment variables.
/// </summary>
public static readonly string EnvironmentVariableKeyPrefix = "env_";

⚠️ Potential issue | 🟡 Minor

Typo: "envrionment" should be "environment".

🔧 Proposed fix
         /// <summary>
-        /// Prefix for envrionment variables.
+        /// Prefix for environment variables.
         /// </summary>
         public static readonly string EnvironmentVariableKeyPrefix = "env_";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/Keys.cs` around lines 62 - 64, Fix the typo
in the XML comment above the EnvironmentVariableKeyPrefix field in Keys.cs:
change "envrionment" to "environment" so the summary reads "Prefix for
environment variables." and ensure the XML comment remains well-formed for the
public static readonly string EnvironmentVariableKeyPrefix.

Comment on lines +41 to +48
[LoggerMessage(EventId = 1006, Level = LogLevel.Information, Message = "Input volume mapping host=={hostPath}, container={containerPath}.")]
public static partial void DockerInputMapped(this ILogger logger, string hostPath, string containerPath);

[LoggerMessage(EventId = 1007, Level = LogLevel.Information, Message = "Output volume mapping host=={hostPath}, container={containerPath}.")]
public static partial void DockerOutputMapped(this ILogger logger, string hostPath, string containerPath);

[LoggerMessage(EventId = 1008, Level = LogLevel.Information, Message = "Environment variabled added {key}={value}.")]
public static partial void DockerEnvironmentVariableAdded(this ILogger logger, string key, string value);

⚠️ Potential issue | 🟡 Minor

Fix typos in log messages.

Several log messages contain typos:

  • Line 41, 44, 65: host== should be host=
  • Line 47: variabled should be variable
  • Line 50: retreiving should be retrieving
🔧 Proposed fixes
-        [LoggerMessage(EventId = 1006, Level = LogLevel.Information, Message = "Input volume mapping host=={hostPath}, container={containerPath}.")]
+        [LoggerMessage(EventId = 1006, Level = LogLevel.Information, Message = "Input volume mapping host={hostPath}, container={containerPath}.")]

-        [LoggerMessage(EventId = 1007, Level = LogLevel.Information, Message = "Output volume mapping host=={hostPath}, container={containerPath}.")]
+        [LoggerMessage(EventId = 1007, Level = LogLevel.Information, Message = "Output volume mapping host={hostPath}, container={containerPath}.")]

-        [LoggerMessage(EventId = 1008, Level = LogLevel.Information, Message = "Environment variabled added {key}={value}.")]
+        [LoggerMessage(EventId = 1008, Level = LogLevel.Information, Message = "Environment variable added {key}={value}.")]

-        [LoggerMessage(EventId = 1009, Level = LogLevel.Error, Message = "Error retreiving status from container {identity}.")]
+        [LoggerMessage(EventId = 1009, Level = LogLevel.Error, Message = "Error retrieving status from container {identity}.")]

Also applies to: 50-51

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/Logging/Log.cs` around lines 41 - 48, Fix the
typos in the LoggerMessage Message strings: update DockerInputMapped and
DockerOutputMapped messages to use "host={hostPath}" instead of
"host=={hostPath}", change DockerEnvironmentVariableAdded message to
"Environment variable added {key}={value}." instead of "variabled", and find any
other logger messages around the same area (the two messages referenced at the
end of the comment) that use "retreiving" and correct them to "retrieving";
ensure each LoggerMessage attribute's Message text is corrected for these
spelling and punctuation issues.

Comment on lines +116 to +117
[LoggerMessage(EventId = 1031, Level = LogLevel.Error, Message = "Error setting directory {path} with permission {user}.")]
public static partial void ErrorSettingDirectoryPermission(this ILogger logger, string path, string user);

⚠️ Potential issue | 🟡 Minor

ErrorSettingDirectoryPermission lacks Exception parameter.

Unlike other error logging methods (e.g., ErrorGeneratingContainerSpecification, ErrorDeployingContainer), this method doesn't accept an Exception parameter, preventing stack trace capture.

🔧 Proposed fix
-        [LoggerMessage(EventId = 1031, Level = LogLevel.Error, Message = "Error setting directory {path} with permission {user}.")]
-        public static partial void ErrorSettingDirectoryPermission(this ILogger logger, string path, string user);
+        [LoggerMessage(EventId = 1031, Level = LogLevel.Error, Message = "Error setting directory {path} with permission {user}.")]
+        public static partial void ErrorSettingDirectoryPermission(this ILogger logger, string path, string user, Exception ex);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/Logging/Log.cs` around lines 116 - 117, The
ErrorSettingDirectoryPermission logger method should accept an Exception so
stack traces are captured: change the partial method signature from
ErrorSettingDirectoryPermission(this ILogger logger, string path, string user)
to include an Exception parameter (e.g., ErrorSettingDirectoryPermission(this
ILogger logger, Exception exception, string path, string user)) and update any
call sites to pass the caught exception; keep the LoggerMessage attribute as-is
(the exception is provided as a method parameter and will be logged by the
generated source) and mirror the parameter ordering used by other methods like
ErrorGeneratingContainerSpecification and ErrorDeployingContainer for
consistency.

Comment on lines +193 to +197
var monitor = _scope.ServiceProvider.GetService<IContainerStatusMonitor>() ?? throw new ServiceNotFoundException(nameof(IContainerStatusMonitor));
_ = Task.Run(async () =>
{
await monitor.Start(Event, _containerTimeout, containerId, intermediateVolumeMount, outputVolumeMounts, cancellationToken);
});

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

wc -l src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 138


🏁 Script executed:

sed -n '185,205p' src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 1171


🏁 Script executed:

sed -n '520,545p' src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 838


🏁 Script executed:

# Get broader context around both sections to understand the scope and lifecycle
sed -n '150,210p' src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 3296


🏁 Script executed:

# Check class structure and Dispose method
grep -n "class PodmanPlugin\|void Dispose\|IDisposable" src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs | head -20

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 197


🏁 Script executed:

# Look for _scope declaration and usage
rg "_scope" src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 960


🏁 Script executed:

# Check the method signature and how ExecuteTask is called/awaited
sed -n '85,120p' src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 1986


🏁 Script executed:

# Check where ExecuteTask is called from to understand cancellation token source
grep -n "ExecuteTask\|override.*async" src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs | head -10

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 436


🏁 Script executed:

# Check if there's a GetStatus method that might indicate how monitor completion is tracked
sed -n '225,280p' src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 3105


🏁 Script executed:

# Check the base class to understand plugin lifecycle
find . -name "TaskPluginBase.cs" -type f

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 126


🏁 Script executed:

# Check TaskPluginBase to understand the plugin lifecycle
sed -n '1,100p' src/TaskManager/API/TaskPluginBase.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 2469


🏁 Script executed:

# Check if there's more context about plugin disposal and cancellation
grep -n "Dispose\|CancellationToken\|class TaskPluginBase" src/TaskManager/API/TaskPluginBase.cs | head -20

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 835


🏁 Script executed:

# Check the IContainerStatusMonitor interface to understand what the monitor does
find . -name "*ContainerStatusMonitor*" -type f

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 352


🏁 Script executed:

# Check the IContainerStatusMonitor interface
grep -n "interface IContainerStatusMonitor\|async Task Start" src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs | head -10

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 171


🏁 Script executed:

# Check ContainerStatusMonitor.Start method and its dependencies
sed -n '1,80p' src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 4016


🏁 Script executed:

# Check if the monitor uses any services from _scope that would be disposed
rg "ServiceProvider\|_scope" src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 69


🏁 Script executed:

# Check the full Start method to understand what the monitor does
sed -n '69,150p' src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 4909


🏁 Script executed:

# Check how the monitor is registered in DI and if it's scoped
rg "IContainerStatusMonitor\|AddScoped\|AddSingleton" src/TaskManager/Plug-ins/Podman/ -A 2 -B 2

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 69


🏁 Script executed:

# Check when/how PodmanPlugin is disposed
rg "PodmanPlugin\|\.Dispose()" src/TaskManager/ --type cs | grep -i "dispose\|podman" | head -20

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 69


🏁 Script executed:

# Check the SendCallbackMessage method to understand what the monitor does
sed -n '100,180p' src/TaskManager/Plug-ins/Podman/ContainerStatusMonitor.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 4646


🏁 Script executed:

# Check how PodmanPlugin instances are created and managed
grep -n "new PodmanPlugin\|ExecuteTask\|GetStatus" src/TaskManager/TaskDispatcher.cs | head -20

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 153


🏁 Script executed:

# Check if there's any awaiting of the background task or cleanup
rg "Task\.Run\|Task\.WhenAll\|await.*monitor" src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 69


🏁 Script executed:

# Find where the DI is configured for ContainerStatusMonitor
find . -name "*.cs" -type f -exec grep -l "IContainerStatusMonitor\|AddContainerStatusMonitor" {} \; | head -10

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 644


🏁 Script executed:

# Check the startup/configuration files
grep -r "IContainerStatusMonitor\|ContainerStatusMonitor" src/TaskManager/Plug-ins/Podman/ --include="*.cs" | grep -i "add\|register\|factory"

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 69


🏁 Script executed:

# Check if there's a Startup or configuration file
find . -name "Startup.cs" -o -name "*ServiceConfiguration*" -o -name "*ServiceExtension*" | head -10

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 204


🏁 Script executed:

# Check TaskManagerExtensions to see DI registration
cat src/TaskManager/TaskManager/Extensions/TaskManagerExtensions.cs

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 4770


🏁 Script executed:

# Check how the plugin is used - find where ExecuteTask is called
find . -name "*.cs" -type f -exec grep -l "\.ExecuteTask\|ITaskPlugin" {} \; | grep -v test | head -10

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 260


🏁 Script executed:

# Check TaskManager.cs to see how ExecuteTask is called
sed -n '1,100p' src/TaskManager/TaskManager/TaskManager.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 5781


🏁 Script executed:

# Search for where ExecuteTask is called and when the plugin is disposed
rg "ExecuteTask\|\.Dispose\(\)" src/TaskManager/TaskManager/TaskManager.cs -B 3 -A 3

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 69


🏁 Script executed:

# Search for where ExecuteTask is called
grep -n "ExecuteTask" src/TaskManager/TaskManager/TaskManager.cs

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 212


🏁 Script executed:

# Find all mentions of plugin disposal and ExecuteTask
grep -n "plugin\|ExecuteTask\|Dispose" src/TaskManager/TaskManager/TaskManager.cs | head -30

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 2162


🏁 Script executed:

# Get more context around line 492 to see the full flow
sed -n '470,525p' src/TaskManager/TaskManager/TaskManager.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 3198


🏁 Script executed:

# Check if there's any exception handling or finally blocks that dispose the plugin
sed -n '420,530p' src/TaskManager/TaskManager/TaskManager.cs | cat -n

Repository: Project-MONAI/monai-deploy-workflow-manager

Length of output: 6136


Fire-and-forget monitor task can be cancelled before completing artifact uploads or callbacks.

This code launches monitor.Start() in a background Task.Run without awaiting, passing the caller's cancellationToken. If the token is cancelled before the monitor completes, in-flight artifact uploads and callback delivery can be interrupted. Additionally, the surrounding catch block only catches scheduling failures; exceptions from monitor.Start() are never observed.

Also applies to: 530-537

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs` around lines 193 - 197, The
monitor is being fire-and-forget via Task.Run calling
IContainerStatusMonitor.Start(containerId, ...) with the caller's
cancellationToken, so the monitor can be cancelled mid-upload and its exceptions
go unobserved; change the call in PodmanPlugin.cs to start the monitor with a
dedicated token (e.g., new CancellationTokenSource or CancellationToken.None)
instead of the caller's cancellationToken so artifact uploads/callbacks can
finish, capture the returned Task (from Task.Run or Start if it returns a Task)
and observe exceptions by either awaiting it where appropriate or attaching a
ContinueWith/try/catch to log faults (so exceptions from monitor.Start are not
lost), and ensure any intended cooperative cancellation from the caller is
translated into a graceful stop request rather than directly canceling the
upload (use a linked CTS if you need to forward cancellation requests while
allowing finalization).
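
One way to read the advice above is sketched here, assuming hypothetical names throughout: `BackgroundMonitor.Launch` is illustrative, `startMonitor` stands in for `monitor.Start(...)`, and `onFault` stands in for an error-level log call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundMonitor
{
    public static Task Launch(
        Func<CancellationToken, Task> startMonitor,
        Action<Exception> onFault,
        TimeSpan hardTimeout)
    {
        // Dedicated token: the caller's token no longer cancels finalization;
        // the monitor is bounded by its own timeout instead.
        var cts = new CancellationTokenSource(hardTimeout);

        return Task.Run(async () =>
        {
            try
            {
                await startMonitor(cts.Token).ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                // Observe the failure instead of losing it with the discarded task.
                onFault(ex);
            }
            finally
            {
                cts.Dispose();
            }
        });
    }
}
```

The returned task can still be discarded by the caller, but faults are now reported through `onFault` either way. Forwarding a caller's cancellation as a graceful stop request would need a linked source and is not shown.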

Comment on lines +245 to +270
if (ContainerStatusMonitor.IsContainerCompleted(response.State))
{
return new ExecutionStatus
{
Status = TaskExecutionStatus.Succeeded,
FailureReason = FailureReason.None,
Stats = stats
};
}
else if (response.State.OOMKilled || response.State.Dead)
{
return new ExecutionStatus
{
Status = TaskExecutionStatus.Failed,
FailureReason = FailureReason.ExternalServiceError,
Errors = $"Exit code={response.State.ExitCode}",
Stats = stats
};
}
else
{
return new ExecutionStatus
{
Status = TaskExecutionStatus.Failed,
FailureReason = FailureReason.Unknown,
Errors = $"Exit code={response.State.ExitCode}. Status={response.State.Status}.",

⚠️ Potential issue | 🔴 Critical

Don't equate "completed" with "succeeded".

This branch only checks that the container reached a terminal state. Status=exited with a non-zero exit code, or Status=dead, currently lands in Succeeded as long as FinishedAt is populated.

🛠️ Suggested fix
-                if (ContainerStatusMonitor.IsContainerCompleted(response.State))
+                if (string.Equals(response.State.Status, Strings.DockerStatusExited, StringComparison.OrdinalIgnoreCase) &&
+                    response.State.ExitCode == 0)
                 {
                     return new ExecutionStatus
                     {
                         Status = TaskExecutionStatus.Succeeded,
                         FailureReason = FailureReason.None,
                         Stats = stats
                     };
                 }
-                else if (response.State.OOMKilled || response.State.Dead)
+                else if (response.State.OOMKilled ||
+                         response.State.Dead ||
+                         string.Equals(response.State.Status, Strings.DockerStatusDead, StringComparison.OrdinalIgnoreCase))
                 {
                     return new ExecutionStatus
                     {
                         Status = TaskExecutionStatus.Failed,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-if (ContainerStatusMonitor.IsContainerCompleted(response.State))
-{
-return new ExecutionStatus
-{
-Status = TaskExecutionStatus.Succeeded,
-FailureReason = FailureReason.None,
-Stats = stats
-};
-}
-else if (response.State.OOMKilled || response.State.Dead)
-{
-return new ExecutionStatus
-{
-Status = TaskExecutionStatus.Failed,
-FailureReason = FailureReason.ExternalServiceError,
-Errors = $"Exit code={response.State.ExitCode}",
-Stats = stats
-};
-}
-else
-{
-return new ExecutionStatus
-{
-Status = TaskExecutionStatus.Failed,
-FailureReason = FailureReason.Unknown,
-Errors = $"Exit code={response.State.ExitCode}. Status={response.State.Status}.",
+if (string.Equals(response.State.Status, Strings.DockerStatusExited, StringComparison.OrdinalIgnoreCase) &&
+response.State.ExitCode == 0)
+{
+return new ExecutionStatus
+{
+Status = TaskExecutionStatus.Succeeded,
+FailureReason = FailureReason.None,
+Stats = stats
+};
+}
+else if (response.State.OOMKilled ||
+response.State.Dead ||
+string.Equals(response.State.Status, Strings.DockerStatusDead, StringComparison.OrdinalIgnoreCase))
+{
+return new ExecutionStatus
+{
+Status = TaskExecutionStatus.Failed,
+FailureReason = FailureReason.ExternalServiceError,
+Errors = $"Exit code={response.State.ExitCode}",
+Stats = stats
+};
+}
+else
+{
+return new ExecutionStatus
+{
+Status = TaskExecutionStatus.Failed,
+FailureReason = FailureReason.Unknown,
+Errors = $"Exit code={response.State.ExitCode}. Status={response.State.Status}.",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs` around lines 245 - 270, The
current logic in PodmanPlugin.cs uses
ContainerStatusMonitor.IsContainerCompleted(response.State) to mark a task as
Succeeded regardless of exit code; update the branch in the method that builds
ExecutionStatus so it only returns Status = TaskExecutionStatus.Succeeded when
the container is terminal AND response.State.ExitCode == 0; otherwise treat
terminal non-zero exit codes (or response.State.Dead/response.State.OOMKilled)
as failures—use FailureReason.ExternalServiceError for OOMKilled/Dead and
FailureReason.Unknown (or a more specific reason if available) for non-zero exit
codes, and populate Errors with the exit code and state (reference
response.State, ContainerStatusMonitor.IsContainerCompleted, and the
ExecutionStatus construction sites).
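
The decision table implied by the fix can be isolated as a pure function, which makes it easy to unit test. This is a sketch under assumptions: `Outcome` and `ContainerOutcome.Classify` are hypothetical, and the literal status strings "exited" and "dead" are assumed to match `Strings.DockerStatusExited` and `Strings.DockerStatusDead`.

```csharp
using System;

public enum Outcome { Succeeded, FailedExternal, FailedUnknown }

public static class ContainerOutcome
{
    // Parameters mirror the fields the review reads from response.State.
    public static Outcome Classify(string status, long exitCode, bool oomKilled, bool dead)
    {
        // Success requires a clean exit, not merely a terminal state.
        if (string.Equals(status, "exited", StringComparison.OrdinalIgnoreCase) && exitCode == 0)
        {
            return Outcome.Succeeded;
        }

        // Runtime-level failures map to the external-service error reason.
        if (oomKilled || dead || string.Equals(status, "dead", StringComparison.OrdinalIgnoreCase))
        {
            return Outcome.FailedExternal;
        }

        // Everything else, including non-zero exits, is an unclassified failure.
        return Outcome.FailedUnknown;
    }
}
```

For example, `Classify("exited", 100, false, false)` yields `Outcome.FailedUnknown` here, whereas the original branch would have reported success for the same state.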

Comment on lines +366 to +372
foreach (var key in Event.TaskPluginArguments.Keys)
{
if (key.StartsWith(Keys.EnvironmentVariableKeyPrefix, false, CultureInfo.InvariantCulture))
{
var envVarKey = key.Replace(Keys.EnvironmentVariableKeyPrefix, string.Empty);
envvars[envVarKey] = Event.TaskPluginArguments[key];
_logger.DockerEnvironmentVariableAdded(envVarKey, Event.TaskPluginArguments[key]);

⚠️ Potential issue | 🟠 Major

Redact environment variable values in logs.

Task plug-in env vars can carry credentials or tokens. Logging the raw value here leaks secrets into application logs; keep only the key name or a redacted placeholder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs` around lines 366 - 372, The
loop in PodmanPlugin (iterating Event.TaskPluginArguments and using
Keys.EnvironmentVariableKeyPrefix) currently logs raw env var values via
_logger.DockerEnvironmentVariableAdded(envVarKey,
Event.TaskPluginArguments[key]) which can leak secrets; change the logging to
only record the env var name or a redacted placeholder (e.g. "***REDACTED***")
when calling _logger.DockerEnvironmentVariableAdded, while still assigning the
real value into envvars[envVarKey]; if needed, update or add an overload on
_logger.DockerEnvironmentVariableAdded to accept a single parameter (key) to
avoid passing the secret value.
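
One way the redaction could look, as a sketch rather than the plug-in's actual code. EnvVarSketch, its Collect method, the "env_" prefix in the usage note, and the Action-based log callback are all hypothetical; the callback stands in for _logger.DockerEnvironmentVariableAdded. The real value still goes into the returned dictionary, and only the name plus a placeholder reaches the log.

```csharp
using System;
using System.Collections.Generic;

public static class EnvVarSketch
{
    // Placeholder written to the log instead of the real value.
    public const string Redacted = "***REDACTED***";

    // Collects prefixed arguments as environment variables.
    // 'log' is a hypothetical stand-in for the plug-in's logger call.
    public static Dictionary<string, string> Collect(
        IReadOnlyDictionary<string, string> arguments,
        string prefix,
        Action<string, string> log)
    {
        var envvars = new Dictionary<string, string>(StringComparer.Ordinal);

        foreach (var pair in arguments)
        {
            if (!pair.Key.StartsWith(prefix, StringComparison.Ordinal))
            {
                continue;
            }

            // Strip only the leading prefix; string.Replace would also remove
            // later occurrences of the prefix inside the variable name.
            var name = pair.Key.Substring(prefix.Length);

            envvars[name] = pair.Value; // the container still receives the real value
            log(name, Redacted);        // the log only ever sees the placeholder
        }

        return envvars;
    }
}
```

Called with a hypothetical "env_" prefix and an argument env_API_TOKEN, the returned dictionary maps API_TOKEN to the real token while the log callback receives ("API_TOKEN", "***REDACTED***").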

Comment on lines +452 to +454
using var stream = await storageService.GetObjectAsync(input.Bucket, obj.FilePath, cancellationToken).ConfigureAwait(false) as MemoryStream;
using var fileStream = new FileStream(filePath, FileMode.CreateNew, FileAccess.Write);
stream!.WriteTo(fileStream);

⚠️ Potential issue | 🟠 Major

Remove the MemoryStream downcast and use CopyToAsync instead.

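GetObjectAsync returns Task<Stream>, not Task<MemoryStream>. The as MemoryStream cast yields null whenever the storage implementation returns any other Stream type, and the null-forgiving ! only silences the compiler, so stream!.WriteTo(fileStream) then throws a NullReferenceException. WriteTo() is also specific to MemoryStream and is not available on the base Stream class.

Use the null-coalescing operator to ensure the result is non-null, and switch to CopyToAsync(), which works on all Stream implementations. Note that the suggested fix also changes FileMode.CreateNew to FileMode.Create, so an existing file at the target path is overwritten instead of causing an IOException: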

🛠️ Suggested fix
-                    using var stream = await storageService.GetObjectAsync(input.Bucket, obj.FilePath, cancellationToken).ConfigureAwait(false) as MemoryStream;
-                    using var fileStream = new FileStream(filePath, FileMode.CreateNew, FileAccess.Write);
-                    stream!.WriteTo(fileStream);
+                    using var stream = await storageService.GetObjectAsync(input.Bucket, obj.FilePath, cancellationToken).ConfigureAwait(false)
+                        ?? throw new InvalidOperationException($"Storage returned no content for '{obj.FilePath}'.");
+                    using var fileStream = new FileStream(filePath, FileMode.Create, FileAccess.Write);
+                    await stream.CopyToAsync(fileStream, cancellationToken).ConfigureAwait(false);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/TaskManager/Plug-ins/Podman/PodmanPlugin.cs` around lines 452 - 454, The
code incorrectly casts the result of storageService.GetObjectAsync(input.Bucket,
obj.FilePath, cancellationToken) to MemoryStream and calls WriteTo, which can
throw a NullReferenceException for non-MemoryStream types; instead assign the
returned Stream to a variable
(e.g., stream = await storageService.GetObjectAsync(... ) ?? throw new
InvalidOperationException(...)), remove the MemoryStream cast, and replace
stream.WriteTo(fileStream) with await stream.CopyToAsync(fileStream,
cancellationToken) so any Stream implementation is supported; keep using the
using declarations for stream and fileStream.
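
A self-contained sketch of the same pattern outside the plug-in. StreamCopySketch, SaveAsync, and the fetch delegate are hypothetical; the delegate stands in for storageService.GetObjectAsync so the example runs without the storage library. It relies only on Stream.CopyToAsync(Stream, CancellationToken) and the FileStream(string, FileMode, FileAccess) constructor.

```csharp
#nullable enable
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class StreamCopySketch
{
    // 'fetch' is a hypothetical stand-in for the storage service call.
    public static async Task SaveAsync(
        Func<CancellationToken, Task<Stream?>> fetch,
        string filePath,
        CancellationToken cancellationToken = default)
    {
        // No downcast: any Stream implementation is accepted, and a null result
        // fails fast with a descriptive exception.
        using var source = await fetch(cancellationToken).ConfigureAwait(false)
            ?? throw new InvalidOperationException($"Storage returned no content for '{filePath}'.");

        // FileMode.Create overwrites an existing file; FileMode.CreateNew would throw instead.
        using var target = new FileStream(filePath, FileMode.Create, FileAccess.Write);

        // CopyToAsync is defined on the base Stream class.
        await source.CopyToAsync(target, cancellationToken).ConfigureAwait(false);
    }
}
```

Passing a delegate that returns a BufferedStream, or any other non-MemoryStream type, exercises the case that the original as MemoryStream cast would have turned into null.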
