Add guidance for capturing ETW traces in Kubernetes pods (#2344)

Copilot · brianrob · web-flow · commit 098a616cc5b4 · 2025-12-03T09:19:33.000-08:00
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: brianrob <6210322+brianrob@users.noreply.github.com>
Co-authored-by: Brian Robbins <brianrob@microsoft.com>
diff --git a/src/PerfView/SupportFiles/UsersGuide.htm b/src/PerfView/SupportFiles/UsersGuide.htm
@@ -6335,6 +6335,131 @@ <h5> Known issues (in Windows Version 1803 or earlier) </h5>
         put them.
     </p>
 
+    <!--  *************** -->
+    <h5><a id="ProcessIsolationContainers">Capturing ETW Traces with Process-Isolation Windows Containers (Kubernetes)</a></h5>
+    <p>
+        When running Windows containers in Kubernetes using process-isolation mode (the default mode, as opposed to Hyper-V isolation), 
+        the containers share the host's kernel. While this enables ETW tracing from the host, it requires a specific 
+        workflow to capture and analyze traces for processes running inside these containers.
+    </p>
+    <p>
+        <strong>Note:</strong> If you are running containers in Hyper-V isolation mode, these instructions are not required. 
+        In Hyper-V mode, each container has its own kernel, so you can capture traces directly inside the container 
+        using the normal PerfView workflow.
+    </p>
+    <p>
+        <strong>Important Limitation:</strong> In process-isolation mode, kernel ETW sessions cannot be started from 
+        <em>inside</em> the container. Since PerfView almost always captures a kernel session, all trace collection 
+        must be initiated from the host node.
+    </p>
+
+    <h6>Step 1: Capture a Trace on the Host Node</h6>
+    <p>
+        Start the trace collection on the Kubernetes host node (not inside the pod). Use the <strong>/EnableEventsInContainers</strong>
+        option so that user-mode events from processes inside containers flow to the ETW session on the host. Example capture command:
+    </p>
+    <ul>
+        <li>PerfView collect /EnableEventsInContainers MyContainerTrace.etl</li>
+    </ul>
+    <p>
+        <strong>What /EnableEventsInContainers does:</strong> By default, an ETW session on the host only receives 
+        user-mode events from processes running directly on the host. The /EnableEventsInContainers option enables 
+        the ETW session to also receive user-mode events (such as .NET CLR events, custom EventSource events, etc.) 
+        from processes running inside process-isolation containers.
+    </p>
+    <p>
+        <strong>What happens if you don't use /EnableEventsInContainers:</strong> You will still capture all kernel 
+        events (CPU sampling, context switches, etc.) for container processes, and you will still receive user-mode 
+        events from processes running directly on the host node (outside of containers). However, you will miss 
+        user-mode events like .NET garbage collection events, JIT events, exception events, and any custom 
+        EventSource events from processes inside containers.
+    </p>
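+    <p>
+        As a minimal sketch of how the capture command is assembled, the following Python helper builds the
+        argument list with or without /EnableEventsInContainers. The function name and its parameters are
+        hypothetical and not part of PerfView; only the options already shown in this section are used.
+    </p>

```python
def build_collect_command(trace_name, events_in_containers=True, max_collect_sec=None):
    """Return the PerfView argument list for a capture started on the host node.

    Hypothetical helper: it only assembles the command line, it does not run it.
    """
    args = ["PerfView", "collect"]
    if events_in_containers:
        # Without this option, user-mode events from container processes are missed.
        args.append("/EnableEventsInContainers")
    if max_collect_sec is not None:
        args.append(f"/MaxCollectSec:{max_collect_sec}")
    args.append(trace_name)
    return args


if __name__ == "__main__":
    # Prints: PerfView collect /EnableEventsInContainers MyContainerTrace.etl
    print(" ".join(build_collect_command("MyContainerTrace.etl")))
```

+    <p>
+        Passing <code>events_in_containers=False</code> yields a command that still captures kernel events for
+        container processes but not their user-mode events, as described above.
+    </p>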
+
+    <h6>Step 2a: Analyze While Container is Running (Optional)</h6>
+    <p>
+        If the container(s) containing the process(es) of interest are still running when you stop the trace, you
+        can open and analyze the trace directly on the host node. PerfView can find the binaries it needs both on
+        the host and, through the container's file system view, inside the running containers.
+        <strong>Note:</strong> This only works for as long as the container is running.
+    </p>
+    <p>
+        This is the simplest analysis path because no additional steps are required: just open the trace in PerfView
+        on the host node.
+    </p>
+
+    <h6>Step 2b: Prepare Trace for Offline Analysis (Optional)</h6>
+    <p>
+        If you need to analyze the trace after the container has been shut down, or if you want to copy the trace 
+        to another machine for analysis, you need to prepare the trace while the container is still accessible. 
+        This is done using the merge command with the <strong>/ImageIDsOnly</strong> option.
+    </p>
+    <p>
+        First, copy the trace file into the container:
+    </p>
+    <ul>
+        <li>kubectl cp MyContainerTrace.etl.zip my-namespace/my-pod:/app/MyContainerTrace.etl.zip</li>
+    </ul>
+    <p>
+        Then, inside the container, run the merge command to inject the necessary image identification data:
+    </p>
+    <ul>
+        <li>PerfViewCollect merge /ImageIDsOnly MyContainerTrace.etl.zip</li>
+    </ul>
+    <p>
+        <strong>Note:</strong> PerfViewCollect needs to be built from source at 
+        <a href="https://github.com/microsoft/perfview">https://github.com/microsoft/perfview</a>. 
+        It is not currently shipped as a binary. See the "Windows Nanoserver and PerfViewCollect" 
+        section above for build instructions.
+    </p>
+    <p>
+        <strong>What /ImageIDsOnly does:</strong> When you run merge with /ImageIDsOnly, PerfView reads through 
+        the trace and for each DLL that was loaded by processes in the trace, it looks up the DLL's PDB signature 
+        and injects that information into the trace. This unique identifier is what allows PerfView to later 
+        download the correct PDB symbols from a symbol server. Without this information, PerfView cannot resolve 
+        method names for code in those DLLs.
+    </p>
+    <p>
+        <strong>What happens if you don't run merge with /ImageIDsOnly:</strong> If you skip this step and later 
+        try to analyze the trace on another machine after the container is gone, PerfView will be unable to find 
+        the symbol files for DLLs that were loaded inside the container. Your stack traces will show the module 
+        name with a question mark (for example: <code>MyAssembly!?</code> instead of <code>MyAssembly!MyClass.MyMethod</code>). 
+        Jitted .NET code will still resolve correctly, but nothing else from binaries inside the container will have symbols.
+    </p>
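+    <p>
+        One rough way to check whether a trace is affected is to look for frames of the form
+        <code>Module!?</code>. The sketch below assumes you have stack frame names available as plain
+        <code>module!method</code> strings (for example, copied out of a stack view); the helper name is
+        hypothetical and not part of PerfView.
+    </p>

```python
def unresolved_modules(frames):
    """Return the sorted module names that have at least one 'Module!?' frame.

    Hypothetical helper: 'frames' is any iterable of 'module!method' strings.
    """
    modules = set()
    for frame in frames:
        module, sep, method = frame.partition("!")
        # A frame whose method part is exactly '?' has no symbol information.
        if sep and method == "?":
            modules.add(module)
    return sorted(modules)


if __name__ == "__main__":
    sample = ["MyAssembly!?", "MyAssembly!MyClass.MyMethod", "Other!?"]
    print(unresolved_modules(sample))
```

+    <p>
+        If modules that were loaded inside the container show up in the result, the trace was probably not
+        prepared with /ImageIDsOnly before the container was shut down.
+    </p>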
+    <p>
+        <strong>Why run merge inside the container:</strong> When run from the host, the merge component cannot
+        look inside containers. Running merge inside the container ensures it can access the DLLs that
+        were loaded by the container's processes. If you run merge on the host or on a different machine, those
+        container-specific DLLs will not be accessible.
+    </p>
+
+    <h6>Step 3: Copy and Analyze (After Using /ImageIDsOnly)</h6>
+    <p>
+        After running merge with /ImageIDsOnly, copy the trace out of the container:
+    </p>
+    <ul>
+        <li>kubectl cp my-namespace/my-pod:/app/MyContainerTrace.etl.zip ./MyContainerTrace.etl.zip</li>
+    </ul>
+    <p>
+        You can now open this trace on any machine with PerfView installed. With the image identification 
+        information embedded in the trace, PerfView can download symbols from symbol servers as needed.
+    </p>
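+    <p>
+        The round trip in Steps 2b and 3 can be sketched as follows. This Python helper only assembles the
+        three command lines; it does not run them. The function name, the <code>/app</code> default, and the
+        assumption that the merge command is run from the directory holding the trace inside the container
+        are illustrative, not part of PerfView or kubectl.
+    </p>

```python
def build_round_trip(namespace, pod, trace_zip, container_dir="/app"):
    """Return the commands for copy-in, in-container merge, and copy-out.

    Hypothetical helper: the merge entry is meant to be run inside the
    container, from the directory that holds the copied trace.
    """
    remote = f"{namespace}/{pod}:{container_dir}/{trace_zip}"
    return {
        "copy_in": ["kubectl", "cp", trace_zip, remote],
        "merge_in_container": ["PerfViewCollect", "merge", "/ImageIDsOnly", trace_zip],
        "copy_out": ["kubectl", "cp", remote, f"./{trace_zip}"],
    }


if __name__ == "__main__":
    steps = build_round_trip("my-namespace", "my-pod", "MyContainerTrace.etl.zip")
    for name, args in steps.items():
        print(f"{name}: {' '.join(args)}")
```

+    <p>
+        Keeping the merge step separate from the two copy steps reflects the requirement above: the copies run
+        where kubectl is available, while merge must run inside the container.
+    </p>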
+
+    <h6>Summary of Commands</h6>
+    <p>
+        Here is the complete workflow. The /MaxCollectSec:30 option shown here stops collection automatically after 30 seconds:
+    </p>
+    <ul>
+        <li><strong>On the host:</strong> PerfView collect /EnableEventsInContainers /MaxCollectSec:30 MyContainerTrace.etl</li>
+        <li><strong>Copy to container:</strong> kubectl cp MyContainerTrace.etl.zip my-namespace/my-pod:/app/</li>
+        <li><strong>In the container:</strong> PerfViewCollect merge /ImageIDsOnly MyContainerTrace.etl.zip</li>
+        <li><strong>Copy from container:</strong> kubectl cp my-namespace/my-pod:/app/MyContainerTrace.etl.zip ./</li>
+        <li><strong>Analyze anywhere:</strong> PerfView MyContainerTrace.etl.zip</li>
+    </ul>
+    <p>
+        <strong>Note:</strong> If you analyze the trace on the host while the container is still running, you 
+        can skip the copy and merge steps entirely.
+    </p>
+
     <!--  ************************ -->
     <hr />
     <h4>