Commit 098a616

Copilot and brianrob authored
Add guidance for capturing ETW traces in Kubernetes pods (#2344)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: brianrob <6210322+brianrob@users.noreply.github.com> Co-authored-by: Brian Robbins <brianrob@microsoft.com>
1 parent 699501e commit 098a616

1 file changed: +125 -0 lines changed
src/PerfView/SupportFiles/UsersGuide.htm

Lines changed: 125 additions & 0 deletions
@@ -6335,6 +6335,131 @@ <h5> Known issues (in Windows Version 1803 or earlier) </h5>
put them.
</p>

<!-- *************** -->
<h5><a id="ProcessIsolationContainers">Capturing ETW Traces with Process-Isolation Windows Containers (Kubernetes)</a></h5>
<p>
When running Windows containers in Kubernetes using process-isolation mode (the default mode, as opposed to Hyper-V isolation),
the containers share the host's kernel. This makes it possible to trace container processes with ETW from the host, but it requires a specific
workflow to capture and analyze those traces.
</p>
<p>
<strong>Note:</strong> If you are running containers in Hyper-V isolation mode, these instructions are not required.
In Hyper-V mode, each container has its own kernel, so you can capture traces directly inside the container
using the normal PerfView workflow.
</p>
<p>
<strong>Important Limitation:</strong> In process-isolation mode, kernel ETW sessions cannot be started from
<em>inside</em> the container. Because PerfView almost always captures a kernel session, all trace collection
must be initiated from the host node.
</p>

<h6>Step 1: Capture a Trace on the Host Node</h6>
<p>
Start the trace collection on the Kubernetes host node (not inside the pod). Use the <strong>/EnableEventsInContainers</strong>
option to ensure that user-mode events from processes inside containers flow to the ETW session on the host. Example capture command:
</p>
<ul>
<li>PerfView collect /EnableEventsInContainers MyContainerTrace.etl</li>
</ul>
<p>
<strong>What /EnableEventsInContainers does:</strong> By default, an ETW session on the host only receives
user-mode events from processes running directly on the host. The /EnableEventsInContainers option enables
the ETW session to also receive user-mode events (such as .NET CLR events and custom EventSource events)
from processes running inside process-isolation containers.
</p>
<p>
<strong>What happens if you don't use /EnableEventsInContainers:</strong> You will still capture all kernel
events (CPU sampling, context switches, etc.) for container processes, and you will still receive user-mode
events from processes running directly on the host node (outside of containers). However, you will miss
user-mode events from processes inside containers, such as .NET garbage collection events, JIT events,
exception events, and any custom EventSource events.
</p>
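<p>
For scripted collection, the capture command above can be assembled programmatically. The following is a minimal sketch, not part of PerfView: the helper names <code>build_collect_command</code> and <code>run_collect</code> are hypothetical, and it assumes PerfView is on the PATH of the Windows host node and that the script runs elevated. It uses only the qualifiers named in this section.
</p>

```python
import subprocess


def build_collect_command(trace_file, max_collect_sec=None, events_in_containers=True):
    """Return the argument list for a host-side PerfView collection.

    Hypothetical helper: only the qualifiers mentioned in this guide are used.
    """
    args = ["PerfView", "collect"]
    if events_in_containers:
        # Lets user-mode events from process-isolation containers reach the host session.
        args.append("/EnableEventsInContainers")
    if max_collect_sec is not None:
        # Stops collection automatically after the given number of seconds.
        args.append(f"/MaxCollectSec:{max_collect_sec}")
    args.append(trace_file)
    return args


def run_collect(trace_file, max_collect_sec=None):
    """Run the collection on the host node (assumes PerfView is on PATH)."""
    return subprocess.run(build_collect_command(trace_file, max_collect_sec), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try anywhere.
    print(" ".join(build_collect_command("MyContainerTrace.etl", max_collect_sec=30)))
```

<p>
Keeping the command construction separate from execution makes it easy to log or review the exact command line before starting a trace on a production node.
</p>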

<h6>Step 2a: Analyze While Container is Running (Optional)</h6>
<p>
If the container(s) containing the process(es) of interest are still running when you stop the trace, you
can open and analyze the trace directly on the host node. PerfView will be able to find the binaries that it
needs both on the host and inside the running containers through the container's file system view.
<strong>Note:</strong> This only works for as long as the container is running.
</p>
<p>
This is the simplest analysis path because no additional steps are required: just open the trace in PerfView
on the host node.
</p>

<h6>Step 2b: Prepare Trace for Offline Analysis (Optional)</h6>
<p>
If you need to analyze the trace after the container has been shut down, or if you want to copy the trace
to another machine for analysis, you need to prepare the trace while the container is still accessible.
This is done using the merge command with the <strong>/ImageIDsOnly</strong> option.
</p>
<p>
First, copy the trace file into the container:
</p>
<ul>
<li>kubectl cp MyContainerTrace.etl.zip my-namespace/my-pod:/app/MyContainerTrace.etl.zip</li>
</ul>
<p>
Then, inside the container, run the merge command to inject the necessary image identification data:
</p>
<ul>
<li>PerfViewCollect merge /ImageIDsOnly MyContainerTrace.etl.zip</li>
</ul>
<p>
<strong>Note:</strong> PerfViewCollect needs to be built from source at
<a href="https://github.com/microsoft/perfview">https://github.com/microsoft/perfview</a>.
It is not currently shipped as a binary. See the "Windows Nanoserver and PerfViewCollect"
section above for build instructions.
</p>
<p>
<strong>What /ImageIDsOnly does:</strong> When you run merge with /ImageIDsOnly, PerfView reads through
the trace and, for each DLL that was loaded by processes in the trace, looks up the DLL's PDB signature
and injects that information into the trace. This unique identifier is what allows PerfView to later
download the correct PDB symbols from a symbol server. Without this information, PerfView cannot resolve
method names for code in those DLLs.
</p>
<p>
<strong>What happens if you don't run merge with /ImageIDsOnly:</strong> If you skip this step and later
try to analyze the trace on another machine after the container is gone, PerfView will be unable to find
the symbol files for DLLs that were loaded inside the container. Your stack traces will show the module
name with a question mark (for example: <code>MyAssembly!?</code> instead of <code>MyAssembly!MyClass.MyMethod</code>).
Jitted .NET code will still resolve correctly, but nothing else from binaries inside the container will have symbols.
</p>
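<p>
One quick way to tell whether a trace is missing image identification data is to look for frames of the form <code>Module!?</code>. The sketch below is illustrative only: it assumes you have frame names available as plain text strings (for example, copied out of a stack view), and the function name <code>unresolved_modules</code> is hypothetical rather than anything PerfView provides.
</p>

```python
import re

# A frame is unresolved when everything after the "!" is a single "?".
_UNRESOLVED_FRAME = re.compile(r"^(?P<module>[^!\s]+)!\?$")


def unresolved_modules(frames):
    """Count unresolved frames per module.

    `frames` is any iterable of frame-name strings such as
    "MyAssembly!?" or "MyAssembly!MyClass.MyMethod".
    """
    counts = {}
    for frame in frames:
        match = _UNRESOLVED_FRAME.match(frame.strip())
        if match:
            module = match.group("module")
            counts[module] = counts.get(module, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        "MyAssembly!?",
        "MyAssembly!MyClass.MyMethod",
        "ntdll!?",
        "MyAssembly!?",
    ]
    for module, count in sorted(unresolved_modules(sample).items()):
        print(f"{module}: {count} unresolved frame(s)")
```

<p>
If the modules reported are ones that were loaded inside the container, the trace most likely needs the merge with /ImageIDsOnly step described above.
</p>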
<p>
<strong>Why run merge inside the container:</strong> When run from the host, the merge component cannot look inside
containers. Running merge inside the container ensures it can access the DLLs that
were loaded by the container's processes. If you run merge on the host or on a different machine, those
container-specific DLLs will not be accessible.
</p>

<h6>Step 3: Copy and Analyze (After Using /ImageIDsOnly)</h6>
<p>
After running merge with /ImageIDsOnly, copy the trace out of the container:
</p>
<ul>
<li>kubectl cp my-namespace/my-pod:/app/MyContainerTrace.etl.zip ./MyContainerTrace.etl.zip</li>
</ul>
<p>
You can now open this trace on any machine with PerfView installed. With the image identification
information embedded in the trace, PerfView can download symbols from symbol servers as needed.
</p>
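<p>
The copy-in, merge, and copy-out steps can be scripted from a machine that has kubectl access to the cluster. The following is a sketch under several assumptions that are not stated in this guide: it uses <code>kubectl exec</code> as one way to run the merge inside the container, it assumes PerfViewCollect is on the container's PATH, and it assumes the container path <code>/app</code> used by kubectl cp corresponds to <code>C:\app</code> inside the Windows container. A Windows-style path is passed to PerfViewCollect because an argument beginning with <code>/</code> would look like a qualifier. The function name <code>offline_prep_commands</code> is hypothetical.
</p>

```python
import subprocess


def offline_prep_commands(namespace, pod, trace_zip,
                          container_dir="/app", container_win_dir="C:\\app"):
    """Return the three commands for Step 2b and Step 3, in order.

    Assumptions (not from the guide): `kubectl exec` is used to reach the
    container, PerfViewCollect is on PATH there, and `container_dir` maps
    to `container_win_dir` inside the Windows container.
    """
    remote = f"{namespace}/{pod}:{container_dir}/{trace_zip}"
    return [
        # Copy the trace into the container.
        ["kubectl", "cp", trace_zip, remote],
        # Inject image identification data while the container's DLLs are reachable.
        ["kubectl", "exec", "-n", namespace, pod, "--",
         "PerfViewCollect", "merge", "/ImageIDsOnly",
         f"{container_win_dir}\\{trace_zip}"],
        # Copy the prepared trace back out.
        ["kubectl", "cp", remote, f"./{trace_zip}"],
    ]


def run_offline_prep(namespace, pod, trace_zip):
    """Run each step, stopping at the first failure."""
    for command in offline_prep_commands(namespace, pod, trace_zip):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the planned commands rather than executing them.
    for command in offline_prep_commands("my-namespace", "my-pod", "MyContainerTrace.etl.zip"):
        print(" ".join(command))
```

<p>
Stopping at the first failure matters here: copying the trace back out after a failed merge would produce a file that looks prepared but still lacks the image identification data.
</p>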

<h6>Summary of Commands</h6>
<p>
Here is the complete workflow. The optional /MaxCollectSec:30 qualifier in the first command stops collection automatically after 30 seconds:
</p>
<ul>
<li><strong>On the host:</strong> PerfView collect /EnableEventsInContainers /MaxCollectSec:30 MyContainerTrace.etl</li>
<li><strong>Copy to container:</strong> kubectl cp MyContainerTrace.etl.zip my-namespace/my-pod:/app/</li>
<li><strong>In the container:</strong> PerfViewCollect merge /ImageIDsOnly MyContainerTrace.etl.zip</li>
<li><strong>Copy from container:</strong> kubectl cp my-namespace/my-pod:/app/MyContainerTrace.etl.zip ./</li>
<li><strong>Analyze anywhere:</strong> PerfView MyContainerTrace.etl.zip</li>
</ul>
<p>
<strong>Note:</strong> If you analyze the trace on the host while the container is still running, you
can skip the copy and merge steps entirely.
</p>

<!-- ************************ -->
<hr />
<h4>
