wpa-mcp

mcp
Security Audit: Warn
Health: Warn
  • License — Apache-2.0
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code: Pass
  • Code scan — Scanned 5 files during light audit, no dangerous patterns found
Permissions: Pass
  • Permissions — No dangerous permissions requested
Purpose
This is an MCP server that analyzes Windows ETW (.etl) trace files. It exposes 54 diagnostic tools for inspecting CPU stacks, file I/O, memory allocation, and network events, allowing AI clients like Claude Desktop or Cursor to debug performance issues in plain language.

Security Assessment
Overall Risk: Low. The code scan found no dangerous patterns, hardcoded secrets, or risky permission requests. The server analyzes local Windows trace files, so it inherently accesses system-level data such as process trees, registry events, and driver stacks. However, it is strictly a read-only diagnostic tool; it does not modify system behavior. The primary risk factor is the installation method: the "one-liner" scripts fetch and execute remote code from GitHub. While this is standard for many open-source tools, it is always safer to clone the repository, audit the installation script, and build the DLL locally rather than piping curl/iex output directly into a shell.
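One way to harden the one-liner flow is to download the script, audit it once, and pin its digest so any later change fails loudly before execution. A minimal Python sketch — the script bytes and digest here are illustrative, not the real installer:

```python
import hashlib

def verify_script(script_bytes: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded installer matches the pinned digest."""
    return hashlib.sha256(script_bytes).hexdigest() == expected_sha256

# Example: pin a script you have already audited locally (contents invented).
audited = b'Write-Host "install wpa-mcp"\n'
pinned = hashlib.sha256(audited).hexdigest()

assert verify_script(audited, pinned)             # unmodified script passes
assert not verify_script(audited + b"x", pinned)  # any upstream change fails
```

The same idea applies to the uninstall script; the only maintenance cost is re-pinning after you audit a new release.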

Quality Assessment
The project is actively maintained, with its most recent push happening today. It uses the permissive Apache-2.0 license and includes detailed documentation featuring real-world case studies. The main drawback is its low community visibility—it currently has only 5 GitHub stars, which means the codebase has not been extensively peer-reviewed by the broader developer community. It is currently labeled as a Proof of Concept (PoC).

Verdict
Use with caution — the code is clean and actively maintained, but given the extremely low community adoption and early PoC status, it is highly recommended to review the source code and installation scripts yourself before integrating it into your workflow.
SUMMARY

MCP server analyzing Windows ETW (.etl) traces — 54 tools spanning CPU/wait stacks, image-load gaps, file/disk/mmap I/O, VirtualAlloc, network, registry, ALPC, DPC/ISR, CLR (GC/JIT/alloc/exception/contention), NT heap, and any user-mode ETW provider. Domain-neutral.

README.md

English | 简体中文

wpa-mcp

A C# MCP server that exposes Windows ETW (.etl) trace analyzers — CPU, wait, image-load, file / disk / mmap I/O — over any MCP-compatible client (Claude Code, Claude Desktop, Codex, Cursor). Domain-neutral: works on any Windows trace; commonly used to debug app startup, slow forks, AV-induced stalls, and disk-bound regressions.

Status — PoC. 54 tools live. Windows-only (TraceEvent kernel parsers are not portable). Apache-2.0.

See it in action: a real investigation — process creation 50× slower than baseline, root-caused via wpa-mcp's tools to multiple EDR stacks colliding on PsSetCreateProcessNotifyRoutineEx. Reproduced independently by two different LLM agents on the same trace.


Quickstart

wpa-mcp Quickstart demo — load a trace, find slow processes, drill into a fork burst

Once installed (one-liner below), ask the agent in plain language and it picks the matching tools:

> Load this trace: C:\path\to\trace.etl
(load_trace; the first call takes 30 s – 3 min while the .etlx index is built;
 subsequent calls are instant. The response includes a Capabilities map so you
 know upfront which keywords are present in the trace.)

> Which processes have the highest wait ratio?
(list_processes orderBy=wait_ratio — trace-resident processes auto-filtered out)

> For parent PID <X>, what was each fork's kernel-side gap?
(process_create_timing — one call gives kernel-window distribution across all
 children of one parent)

> Top wait stacks for PID <X> between <t0> and <t1>, with 20-bucket histogram
(wait_top_stacks — shows the Filter Manager / driver chain blocking the thread)

> Drill into "<frame!?>": who calls it?
(wait_caller_callee — caller / callee neighbours of the focus frame)

The same pattern works for CPU (cpu_top_functions / cpu_caller_callee), file / disk / mmap I/O, image loads, etc. Each "top" view has a matching "caller-callee" drill-down that takes a focus frame.

For an end-to-end walkthrough — symptoms, tool chain, evidence, root cause, recommendations — see docs/CASE_STUDIES.md.


Install

One-liner (no clone, no build)

PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.ps1) }"

Git Bash on Windows:

curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.sh | bash

Both routes do the same thing: download the latest GitHub Release zip (pre-built DLL), cache under %LOCALAPPDATA%\wpa-mcp\releases\<tag>\, and run the bundled setup.ps1. Auto-detects every MCP client on the machine (Claude Code / Codex / Claude Desktop) and registers wpa-mcp against each. .NET 8 runtime is auto-installed user-scope if missing. Subsequent runs are instant (cache hit).

Forward extra flags through the one-liner:

# PowerShell — pin tag, force a single client, set custom symbol path
iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.ps1) } -Tag v0.2.0 -InstallArgs @('-Client','claude-desktop','-SymbolPath','SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols')"
# Bash — flags after `bash -s --` go to install.ps1
curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.sh | bash -s -- -Tag v0.2.0

Uninstall (one-liner, symmetric)

Web-invokable; it edits the same client configs in reverse. No download or cache is touched.

iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/uninstall.ps1) }"
curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/uninstall.sh | bash

This removes the wpa-mcp entry from every detected MCP client. The cached release zip and symbol cache stay (delete %LOCALAPPDATA%\wpa-mcp\ and %LocalAppData%\WprMcp\Symbols\ to remove those).

Requirements

  • Windows 10 / 11 (TraceEvent kernel APIs are Windows-only)
  • .NET 8 — auto-installed user-scope by the installer if missing (uses Microsoft's official dotnet-install.ps1; no admin needed). Pass -SkipDotNetInstall to opt out.
  • For symbol resolution: _NT_SYMBOL_PATH set, or use the symbol tools at runtime (see Configuration → Symbols).
Install from a clone (developers)
# PowerShell
git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
.\scripts\setup.ps1

# Git Bash
git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
./scripts/setup.sh

Builds (Release) and registers wpa-mcp with every detected MCP client. Idempotent — re-run to update.

Common flags:

.\scripts\setup.ps1 -Client claude-desktop                    # force a specific client
.\scripts\setup.ps1 -SymbolPath "SRV*C:\Symbols*https://..." # custom _NT_SYMBOL_PATH
.\scripts\setup.ps1 -SkipBuild                                # use existing DLL

Uninstall from clone (also -CleanBuild to wipe bin/ obj/):

.\scripts\uninstall.ps1
.\scripts\uninstall.ps1 -CleanBuild
./scripts/uninstall.sh
./scripts/uninstall.sh -CleanBuild
Install manually (custom JSON / non-standard MCP client)

Build:

git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
dotnet build -c Release
# DLL: src\WprMcp\bin\Release\net8.0\WprMcp.dll

Smoke-check:

dotnet src\WprMcp\bin\Release\net8.0\WprMcp.dll --version    # prints "WprMcp 0.1.0-poc"
dotnet test                                                   # runs the xUnit suite (needs fixtures, see CONTRIBUTING.md)

Then register with your MCP client. The DLL path must be absolute.

Claude Code — per-project (<project>/.mcp.json) or global (~/.claude.json):

{
  "mcpServers": {
    "wpa-mcp": {
      "command": "dotnet",
      "args": ["C:/Users/me/Dev/wpa-mcp/src/WprMcp/bin/Release/net8.0/WprMcp.dll"],
      "env": {
        "_NT_SYMBOL_PATH": "SRV*C:\\Symbols*https://msdl.microsoft.com/download/symbols",
        "WPRMCP_CACHE_SIZE": "2"
      }
    }
  }
}

Or via the CLI helper:

claude mcp add wpa-mcp --scope user -- dotnet C:/Users/me/Dev/wpa-mcp/src/WprMcp/bin/Release/net8.0/WprMcp.dll

(Add -e _NT_SYMBOL_PATH=... for env vars.)

Claude Desktop — %APPDATA%\Claude\claude_desktop_config.json, same shape as above.

Codex / Cursor / other MCP-compatible clients — the server speaks stdio MCP; any client that accepts a command + args config works. Use the same JSON snippet.

Verify — after restart, the client exposes the tools as mcp__wpa-mcp__load_trace, etc. First call to load_trace on a fresh .etl takes 30 s – 3 min while the .etlx index is built (logged to stderr).


Tools

54 tools across 15 groups. All built on the same Microsoft.Diagnostics.Tracing.TraceEvent library PerfView uses, so the underlying analysis quality is identical — what changes is the surface (stdio MCP + JSON instead of a Windows GUI) and the addition of composite tools that package multi-step PerfView workflows into one call.

What wpa-mcp adds vs PerfView

  • Agent-driven, not UI-driven. PerfView is a Windows GUI you click through; wpa-mcp is a stdio MCP server you talk to in plain language. Same data, no UI fatigue, easy to compose into a CI / regression script.
  • Composite tools. diagnose_slow_startup, process_create_timing, image_load_top_gaps package multi-step PerfView workflows into one call.
  • Capabilities-aware. Every tool's "won't return data" state maps to a single keyword bit in load_trace's Capabilities map — no more "why is this view empty" detective work in PerfView.
  • Per-trace symbol recommendations. load_trace inspects modules in the trace and recommends which symbol servers to add. PerfView leaves symbol setup to the user.

Pattern

Always call load_trace first. It opens the .etl, builds (or reuses) the .etlx index, and returns a Capabilities map — a per-keyword presence check (HasCpuSamples, HasCSwitch, HasFileIo, HasDiskIo, HasImageLoad, HasHardFaults, HasStackWalks, HasVirtualAlloc, HasNetIo, HasRegistry, HasReadyThread, HasInterrupt, HasAlpc, HasThreadEvents, HasClrGc, HasClrJit, HasClrAlloc, HasClrException, HasClrContention, HasNtHeap). Every other tool's behaviour depends on those keywords.
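As a sketch of how a client can use the Capabilities map, the gating logic amounts to a keyword lookup per tool before calling it. The tool-to-keyword mapping below is an illustrative subset, not the server's actual table:

```python
# Hypothetical Capabilities map, shaped like the one load_trace returns.
capabilities = {
    "HasCpuSamples": True,
    "HasCSwitch": True,
    "HasFileIo": False,   # e.g. a CPU.light capture omits the FileIO keyword
    "HasDiskIo": True,
}

# Which keyword each tool depends on (illustrative subset of the 54 tools).
TOOL_REQUIRES = {
    "cpu_top_functions": "HasCpuSamples",
    "wait_top_stacks": "HasCSwitch",
    "file_io_top_stacks": "HasFileIo",
    "disk_io_top_stacks": "HasDiskIo",
}

def runnable_tools(caps: dict) -> list:
    """Tools whose required keyword is present in the trace."""
    return [tool for tool, key in TOOL_REQUIRES.items() if caps.get(key)]

assert "file_io_top_stacks" not in runnable_tools(capabilities)
assert "disk_io_top_stacks" in runnable_tools(capabilities)
```

This is exactly the "why is this view empty" check the Capabilities map is meant to replace: an absent keyword means the tool cannot return data, not that the data was zero.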

Most groups follow the same three-tool shape: a summary (top-N flat rows), a stacks view (top-N call stacks weighted by the metric), and a caller-callee drill-down (given a focus frame, returns its caller / callee neighbours weighted by the same metric — same shape as PerfView's "Callers" / "Callees" tabs).
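The caller-callee drill-down can be sketched as a weighted neighbour aggregation over stacks. This is an illustrative reimplementation of the idea, not the server's code; the frames and weights are invented:

```python
from collections import Counter

# Stacks as root-to-leaf frame lists, each with a weight (samples, bytes, µs…).
stacks = [
    (["main", "ReadConfig", "ntdll!NtReadFile"], 40),
    (["main", "LoadPlugin", "ntdll!NtReadFile"], 25),
    (["main", "ReadConfig", "ParseJson"], 10),
]

def caller_callee(stacks, focus):
    """For a focus frame, sum weight per caller (frame above it in the stack)
    and per callee (frame below it), across every stack it appears in."""
    callers, callees = Counter(), Counter()
    for frames, weight in stacks:
        for i, frame in enumerate(frames):
            if frame != focus:
                continue
            if i > 0:
                callers[frames[i - 1]] += weight
            if i + 1 < len(frames):
                callees[frames[i + 1]] += weight
    return callers, callees

callers, callees = caller_callee(stacks, "ReadConfig")
assert callers == Counter({"main": 50})
assert callees == Counter({"ntdll!NtReadFile": 40, "ParseJson": 10})
```

Only the metric changes between groups (CPU samples, blocked µs, I/O bytes, event counts); the neighbour aggregation is the same.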

In the tables below, "PerfView equivalent" is the matching view in PerfView's GUI; entries tagged [Composite] combine multiple PerfView views into one call, [Manual filter] use raw events that PerfView's Events view exposes but doesn't pre-aggregate, and [Programmatic] replace a GUI dialog with structured JSON. The other ~45 tools are 1:1 mappings of PerfView views.

Meta

Tool What it does PerfView equivalent
load_trace Opens / caches a .etl. Returns trace metadata, the Capabilities keyword presence map, and per-trace symbol-server recommendations. First call 30 s – 3 min while .etlx builds; subsequent are instant. Open a trace file (no Capabilities equivalent)
list_processes Lists processes (sortable by cpu / wall / wait_ratio). WaitRatio = WallUs / CpuUs surfaces "high wall, low CPU" processes (blocked on minifilter / IPC / etc.). PID 0 (Idle) and PID 4 (System) hidden by default. Processes view
process_create_timing Per-fork timing for a parent PID. FirstImageLoadOffsetUs = the kernel-side window between ProcessStart and the first DLL load — exactly where AV / EDR process-create callbacks burn time invisibly. Median / p95 / max aggregates across all children. [Composite] — Processes + Events + Excel; see docs/CASE_STUDIES.md
thread_lifetime Per-PID chronological thread lifecycle: every ThreadStart / ThreadStop with StartTimeUs, EndTimeUs, LifetimeUs, and PeakConcurrentThreads. Catches thread-pool thrash and fork-bomb patterns. TraceResidentStart/End flags threads bounded by trace capture rather than real spawn / exit. [Manual filter] — Events view, filter on Thread/Start + Thread/Stop, pair by hand
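The WaitRatio sort that list_processes performs can be illustrated in a few lines; the process names and numbers are hypothetical:

```python
processes = [
    {"Name": "app.exe",   "CpuUs": 120_000,   "WallUs": 4_800_000},
    {"Name": "build.exe", "CpuUs": 3_900_000, "WallUs": 4_000_000},
]

for p in processes:
    # High wall time with low CPU means the process spent its life blocked,
    # e.g. on a minifilter callback or an IPC round-trip.
    p["WaitRatio"] = p["WallUs"] / p["CpuUs"]

worst = max(processes, key=lambda p: p["WaitRatio"])
assert worst["Name"] == "app.exe"
assert worst["WaitRatio"] == 40.0
```

A ratio near 1 means CPU-bound; a ratio like 40 is the "high wall, low CPU" signature that wait_analysis then explains.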

CPU stacks

Tool What it does PerfView equivalent
cpu_top_functions Top-N hot functions by exclusive CPU samples in a window / for a PID. Optional excludeEtwSelfOverhead folds EtwpLogKernelEvent etc. into a single [ETW Overhead] bucket. CPU Stacks → ByName
cpu_top_functions_batch Same as above for multiple PIDs in a single trace load. Each PID gets an independent CallTree (its inclusive-% column normalises to that PID's samples). [Composite] — batch variant, saves N round-trips through CPU Stacks → ByName
cpu_caller_callee Drill into a focus frame: callers (frames calling INTO it) and callees (frames it calls OUT to), each ranked by inclusive CPU samples. Recursion-safe. CPU Stacks → Callers / Callees tabs

Wait / blocked time (CSwitch-derived)

Requires the CSwitch kernel keyword (default WPR CPU profiles include it).

Tool What it does PerfView equivalent
wait_analysis Per-thread blocked time + dominant wait reasons. The canonical answer to "why was this slow?" when CPU is low. Reasons like WrFilterContext (blocked in a Filter Manager minifilter callback) directly identify the kernel state. Thread Time → blocked-time per thread
wait_top_stacks Top-N call stacks ranked by blocked μs, built from the resume-point stack walk on each ThreadCSwitch event. Answers "where in the code is the wait happening" (vs wait_analysis which answers "which thread / which reason"). Thread Time / Wait Time → BlockedTime metric (ThreadTimeStackComputer)
wait_caller_callee Drill into a focus frame; metric is blocked μs. Thread Time → Callers / Callees tabs

Image / DLL load

Tool What it does PerfView equivalent
image_load_timing Per-process chronological list of every ImageLoad event with offset from ProcessStart. Spot late-loading DLLs or per-load minifilter / sig-scan delays between loads. [Manual filter] — Events view, filter on ImageLoad, compute offsets by hand
image_load_top_gaps Top-N largest gaps between consecutive image loads. Pairs with the chronological view; same data, ranked by gap. Response also carries FirstLoadOffsetUs (kernel-side fork tax before any DLL loads). [Manual filter] — same ImageLoad filter as above, sort by inter-event delta
image_load_top_stacks Top-N call stacks ranked by ImageLoad event count. Distinguishes eager loads (LoadLibraryEx in a main initialiser) from lazy / cascading loads (CoCreateInstance, AmsiOpenSession, EDR-injected providers). Image Load Stacks
image_load_caller_callee Drill into a focus frame; metric is image-load count. Image Load Stacks → Callers / Callees tabs
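The gap ranking behind image_load_top_gaps reduces to a sort over inter-load deltas. A sketch with invented offsets:

```python
# ImageLoad offsets from ProcessStart, in µs (hypothetical values).
load_offsets = [
    ("ntdll.dll", 0),
    ("kernel32.dll", 900),
    ("plugin.dll", 250_900),   # 250 ms stall before this load
    ("helper.dll", 252_000),
]

def top_gaps(loads, n=2):
    """Rank the gaps between consecutive loads, largest first.
    Each gap is attributed to the load that ended it."""
    gaps = [
        (loads[i][0], loads[i][1] - loads[i - 1][1])
        for i in range(1, len(loads))
    ]
    return sorted(gaps, key=lambda g: g[1], reverse=True)[:n]

assert top_gaps(load_offsets)[0] == ("plugin.dll", 250_000)
```

A large gap in front of a load is where per-load minifilter or signature-scan delay hides; the chronological view (image_load_timing) shows the same data unranked.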

File / disk / mmap I/O

The three layers cover different parts of the I/O stack — diff them to localise where time actually goes.

Tool What it does PerfView equivalent
file_io_top_files Top-N files by total read + write bytes. File I/O view → ByFile
file_io_top_stacks Top-N stacks by file-IO bytes. Captures all syscalls including cache-served reads — diff with disk_io_top_stacks to find cache hits. Requires the FileIO keyword (default CPU.light omits it). File I/O Stacks
file_io_caller_callee Drill on a focus frame; metric is file-IO bytes. File I/O Stacks → Callers / Callees tabs
disk_io_top_stacks Top-N stacks by physical disk-IO bytes — only events that hit physical media (no cache). Requires the DiskIO keyword. Disk I/O Stacks
disk_io_caller_callee Drill on a focus frame; metric is physical disk bytes. Disk I/O Stacks → Callers / Callees tabs
hard_fault_by_file Top-N files by hard page-in bytes. Most hard faults are mmap'd files being touched for the first time (DLLs, data files, network-share content); some also come from paged-out heap/stack pages and the page file. Identifies which file caused the page-in load. Requires the HardFaults keyword (NOT in default WPR profiles — see docs/WPR_PROFILE.md). Memory Hard Fault → ByFile
hard_fault_top_stacks Top-N stacks by hard-fault page-in bytes. Distinguishes eager loader-driven page-in from lazy / scanner-induced page-in. Memory Hard Fault Stacks
hard_fault_caller_callee Drill on a focus frame; metric is page-in bytes. Memory Hard Fault Stacks → Callers / Callees tabs

Virtual memory

Tool What it does PerfView equivalent
virtual_alloc_top_stacks Top-N stacks by VirtualMemAlloc + VirtualMemFree bytes. Distinct from physical residence (hard_fault_*) — answers "who's reserving 4 GB of address space" / "who's leaking VirtualAllocs". Each row carries both Bytes and OpCount. Requires the VirtualAlloc kernel keyword (NOT in default WPR CPU profiles). VirtualAlloc Stacks
virtual_alloc_caller_callee Drill on a focus frame; metric is virtual-memory bytes. VirtualAlloc Stacks → Callers / Callees tabs
heap_alloc_top_stacks Top-N stacks by NT-heap allocation bytes (RtlAllocateHeap / HeapAlloc / malloc / new — anything that lands in the user-mode heap). Native-leak finder. Distinct from VirtualAlloc: VirtualAlloc reserves page-granular address space, the heap allocator sub-allocates from it. Splits AllocBytes / ReallocBytes. Free events carry no size on the wire and are not counted. Requires the Heap provider enabled per-process (default WPR profiles do NOT enable it; use PerfView's /HeapTrace flag or a custom .wprp <Heap> element). HeapAllocStacks
heap_alloc_caller_callee Drill on a focus frame; metric is NT-heap bytes. HeapAllocStacks → Callers / Callees tabs

Network I/O

Tool What it does PerfView equivalent
net_top_stacks Top-N stacks by network bytes — TCP + UDP, IPv4 + IPv6 send/recv merged. Splits TcpBytes / UdpBytes in the response. Pairs well with wait_analysis for "high wall, low CPU" cases where the wait is on a network round-trip. Connect / Accept / Disconnect events have no byte metric — use find_marker for those. Requires the NetworkTrace keyword (NOT in default CPU profiles). TCP/IP Stacks + UDP/IP Stacks (merged)
net_caller_callee Drill on a focus frame; metric is network bytes. TCP/IP Stacks → Callers / Callees tabs
net_connections Per-connection lifecycle list — Connect/Accept paired with Disconnect/Reconnect by connid to give "connection X opened at T1, closed at T2, lasted T2−T1". Useful for "connect-to-disconnect latency outliers" / "is RPC slow because of connection setup". IPv4 + IPv6 merged with an IsIPv6 flag. Connections still open at trace end have TraceResidentEnd=true. [Manual filter] — Events view, pair TcpIp/Connect with TcpIp/Disconnect by connid by hand
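The connid pairing behind net_connections can be sketched as a single pass with an open-connection map. Event shapes here are simplified placeholders, not the real ETW payloads:

```python
events = [
    {"op": "Connect",    "connid": 7, "ts_us": 1_000},
    {"op": "Connect",    "connid": 9, "ts_us": 2_000},
    {"op": "Disconnect", "connid": 7, "ts_us": 51_000},
]

def pair_connections(events):
    """Pair Connect with Disconnect by connid; connections still open at the
    end of the trace are flagged TraceResidentEnd instead of given a duration."""
    open_at, rows = {}, []
    for e in events:
        if e["op"] == "Connect":
            open_at[e["connid"]] = e["ts_us"]
        elif e["op"] == "Disconnect" and e["connid"] in open_at:
            start = open_at.pop(e["connid"])
            rows.append({"connid": e["connid"],
                         "DurationUs": e["ts_us"] - start,
                         "TraceResidentEnd": False})
    for connid in open_at:  # never saw a Disconnect inside the trace window
        rows.append({"connid": connid, "DurationUs": None,
                     "TraceResidentEnd": True})
    return rows

rows = pair_connections(events)
assert rows[0] == {"connid": 7, "DurationUs": 50_000, "TraceResidentEnd": False}
assert rows[1]["TraceResidentEnd"] is True
```

This is the same by-hand pairing the Events view requires in PerfView, done once in code.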

Registry

Tool What it does PerfView equivalent
registry_top_stacks Top-N stacks by registry-operation count (Query / Open / Create / SetValue / EnumerateKey / etc.). Useful for "who's pounding the registry on every hot-path call". Metric is op count (no natural byte cost for registry). Requires the Registry keyword (NOT in default CPU profiles). Registry Stacks
registry_caller_callee Drill on a focus frame; metric is registry op count. Registry Stacks → Callers / Callees tabs

ReadyThread (causality)

Tool What it does PerfView equivalent
ready_thread_top_stacks Top-N readier stacks (the code that did the SetEvent / lock release / IOCP completion that woke a blocked thread). Pair with wait_analysis: that one says "thread X blocked on Y for Z μs" — this one closes the loop with "and here's who finally unblocked it". Filter awakenedPid to focus on "who readied threads in this PID". Requires CSwitch / ReadyThread keywords (in default kernel profiles). ReadyThread Stacks
ready_thread_caller_callee Drill on a focus frame; metric is ready-event count. ReadyThread Stacks → Callers / Callees tabs

Interrupts (DPC / ISR)

Tool What it does PerfView equivalent
interrupt_top_stacks Top-N stacks by kernel interrupt time (DPC + ISR microseconds). Surfaces hot driver routines burning CPU at high IRQL — frequent offenders are consumer-grade GPU drivers, network drivers under load, AV mini-filter callbacks. On a healthy system this should show <5% of trace CPU. Splits DpcUs / IsrUs. Requires Interrupt + DPC keywords (default CPU profiles enable both). DPC/ISR Stacks
interrupt_caller_callee Drill on a focus frame; metric is interrupt μs. DPC/ISR Stacks → Callers / Callees tabs

ALPC (cross-process IPC)

Tool What it does PerfView equivalent
alpc_top_stacks Top-N stacks by ALPC message count (Send + Receive). ALPC is the kernel IPC primitive used by RPC, COM, AppContainer broker calls, lsass, the SCM, and most of the Windows service surface — useful for "is this slow because of an LPC round-trip" / "which call chain is doing all the cross-process IPC". Requires the ALPC keyword (NOT in default CPU profiles). ALPC Stacks
alpc_caller_callee Drill on a focus frame; metric is ALPC message count. ALPC Stacks → Callers / Callees tabs

CLR (.NET runtime)

Requires the Microsoft-Windows-DotNETRuntime ETW provider in the capture profile (WPR .wprp files need an explicit <EventCollectorId> for it).

Tool What it does PerfView equivalent
clr_gc_analysis Per-GC list with wall duration AND stop-the-world pause time. GCStart → GCStop brackets the wall interval; GCSuspendEEStart → GCRestartEEStop is the actual mutator pause (matters for background / concurrent GC, where the wall covers far more than the pause). Reports per-row Generation / Reason / PauseUs plus aggregate TotalGcCount / Gen0Count / Gen1Count / Gen2Count / TotalPauseUs. GCStats
clr_jit_analysis Top-N methods by JIT compilation duration. Matches MethodJittingStarted → MethodLoadVerbose on (PID, MethodID). R2R / NGen / pre-jitted methods don't fire JittingStarted, so they're invisible — which is correct for "what's the JIT cost in this trace". JIT Stats
clr_alloc_top_stacks Top-N stacks by managed-heap allocation bytes, driven by GCAllocationTick events (one per ~100 KB allocated per (heap, generation, type) — sampled, low-overhead, on every CLR ≥ 4.0). Response includes TopTypes (top type names by total bytes). The canonical "who's allocating all the strings on the request hot path" tool. Requires the GC keyword. GC Heap Alloc Stacks
clr_alloc_caller_callee Drill on a focus frame; metric is allocation bytes. GC Heap Alloc Stacks → Callers / Callees tabs
clr_exception_top_stacks Top-N stacks by .NET exception throw count (ExceptionStart events). Useful for "is this code path throwing 1000 exceptions per second" / "where is FormatException being swallowed in a retry loop". Response includes TopTypes (top exception type names by count). Requires the Exception keyword. Exceptions Stacks
clr_exception_caller_callee Drill on a focus frame; metric is exception count. Exceptions Stacks → Callers / Callees tabs
clr_contention_top_stacks Top-N stacks by managed-monitor blocked μs — lock / Monitor.Enter waits. Matches ContentionStart → ContentionStop by ThreadID. Filters to ContentionFlags.Managed (native lock contention from the same provider is excluded). The canonical lock-hotspot tool for managed code. Requires the Contention keyword. Monitor Contention Stacks
clr_contention_caller_callee Drill on a focus frame; metric is blocked μs. Monitor Contention Stacks → Callers / Callees tabs
clr_gc_heap_stats Managed-heap snapshot timeline — one row per GCHeapStats event (CLR fires it at the end of each GC) with TotalHeapBytes, Gen0/1/2/LOH/POH sizes, PinnedObjectCount, GcHandleCount. Use to answer "is the heap leaking" / "are pinned objects climbing" without orchestrating multiple calls. Pairs with clr_gc_analysis. GCStats per-GC snapshot table
clr_finalizer_analysis Top types finalized + finalizer-thread pause batches. Aggregates GCFinalizeObject events by TypeName for the TopTypes table and pairs GCFinalizersStart → GCFinalizersStop for the per-batch list (each carries the count of finalizers run). Useful for "why are GCs slow" (finalizer queue can hold up the next GC) and "what's allocating finalizable objects". [Composite] — GCStats fields + Events view filtering combined into one call
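The wall-vs-pause distinction clr_gc_analysis reports reduces to two subtractions over the four bracketing events. A sketch with hypothetical timestamps:

```python
# One GC cycle's bracketing events, in µs (invented values).
gc = {
    "GCStartUs": 10_000, "GCStopUs": 90_000,                # wall interval
    "SuspendEEStartUs": 10_500, "RestartEEStopUs": 14_500,  # mutator pause
}

wall_us = gc["GCStopUs"] - gc["GCStartUs"]
pause_us = gc["RestartEEStopUs"] - gc["SuspendEEStartUs"]

# A background GC can run 80 ms of wall time while actually stopping the
# application for only 4 ms — reporting only the wall figure overstates impact.
assert wall_us == 80_000
assert pause_us == 4_000
```

TotalPauseUs in the aggregate is the sum of pause_us across GCs, which is the number that matters for latency-sensitive workloads.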

Markers / generic ETW events

Tool What it does PerfView equivalent
find_marker Search all ETW events whose name or task contains a substring. Default mode count_by_event returns a histogram (avoids token blow-up); also count_by_process and rows (full event detail). Useful for surfacing first-party Defender / EDR provider telemetry — e.g., the Microsoft-Antimalware-AMFilter provider's AMFilter_FileScan rows directly show what the scanner is doing. Events view
generic_event_top_stacks Top-N stacks by event count for any user-mode ETW provider — AspNetCore, Kestrel, EFCore, Antimalware-AMFilter, Sense (Defender for Endpoint), Microsoft-Windows-DxgKrnl (GPU), Microsoft-Windows-Kernel-Power (CPU frequency / C-state), or any custom EventSource. Use find_marker first to identify which providers are in the trace, then plug the exact ProviderName here. Optional eventNameSubstring narrows to a specific event class. Stack quality depends on whether stack-walks were enabled for the provider in the .wprp. Any Stacks (single-provider)
generic_event_caller_callee Drill on a focus frame; metric is event count. Any Stacks → Callers / Callees tabs

Composite diagnostics

Tool What it does PerfView equivalent
diagnose_slow_startup Picks slowest-by-wait-ratio processes (or matches nameSubstring), then runs wait_analysis + image_load_timing + cpu_top_functions for each in the startup window — one call instead of orchestrating four. [Composite] — wraps four PerfView views in one call

Symbols

Tool What it does PerfView equivalent
set_symbol_path Sets _NT_SYMBOL_PATH for the running server (replaces or appends). File → Set Symbol Path…
add_symbol_server Appends a symbol server URL with optional local cache (defaults to %LocalAppData%\WprMcp\Symbols). File → Set Symbol Path… (single entry)
diagnose_symbols Reports per-module symbol status for a loaded trace and suggests fixes (which servers to add) for unresolved modules. [Programmatic] — replaces Modules tab + Set Symbol Path dialog with structured JSON + auto-recommendations

Configuration

Trace cache

LRU, default capacity 2 traces. Override with WPRMCP_CACHE_SIZE=N. First load builds .etlx (slow); cached calls are instant. Capabilities and TraceLog are both cached per (path, mtime) — re-loading the same .etl is free.
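The (path, mtime) LRU described above can be sketched as follows — an illustrative Python reimplementation, not the server's C# code:

```python
from collections import OrderedDict

class TraceCache:
    """LRU keyed by (path, mtime): rewriting the .etl changes the mtime and
    naturally invalidates the cached entry."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get_or_load(self, path, mtime, loader):
        key = (path, mtime)
        if key in self._cache:
            self._cache.move_to_end(key)      # mark most-recently used
            return self._cache[key]
        value = loader(path)                  # slow path: build the .etlx index
        self._cache[key] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)   # evict least-recently used
        return value

cache = TraceCache(capacity=2)
loads = []
loader = lambda p: loads.append(p) or p.upper()
cache.get_or_load("a.etl", 1, loader)
cache.get_or_load("a.etl", 1, loader)   # cache hit: loader not called again
cache.get_or_load("b.etl", 1, loader)
cache.get_or_load("c.etl", 1, loader)   # capacity 2: evicts a.etl
cache.get_or_load("a.etl", 1, loader)   # must rebuild
assert loads == ["a.etl", "b.etl", "c.etl", "a.etl"]
```

With the default WPRMCP_CACHE_SIZE=2, alternating between three traces forces a rebuild on every load; raise the capacity when comparing several traces side by side.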

Capturing your own traces

See docs/WPR_PROFILE.md for a recommended .wprp that captures CPU + CSwitch + FileIO + DiskIO + HardFaults + Loader stacks. Quick canonical capture:

wpr.exe -start tests\WprMcp.Tests\fixtures\MmapCapture.wprp -filemode
# … reproduce the slow case …
wpr.exe -stop C:\path\to\my_capture.etl

Symbols

If cpu_top_functions shows module!? everywhere and Stats.ResolutionRate < 0.8, your symbols are not working. This is the single biggest source of "garbage output".

Where to set the path

_NT_SYMBOL_PATH accepts semicolon-separated entries: SRV*<cache>*<url> for symbol servers, bare folder paths for local PDBs, mix and match. Three setup paths (any one suffices — they all set the same env var):

  1. Pre-launch env var (cleanest, survives restarts):
    [Environment]::SetEnvironmentVariable("_NT_SYMBOL_PATH",
        "SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols", "User")
    
  2. Per-MCP-server env block in the config JSON (see manual install above). Easiest to share between teammates.
  3. Runtime via tool calls — ask the agent: "set the symbol path to SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols, then run diagnose_symbols on this trace."

Symbol cache defaults to %LocalAppData%\WprMcp\Symbols (separate from PerfView's C:\Symbols to avoid PDB-lock contention). Per-trace recommendations come back inside load_trace's SymbolStatus.Recommendations field, telling you which servers to add for the modules actually present in this trace.

Beyond Microsoft modules

The auto-recommendation in load_trace only knows the public servers it has patterns for (Microsoft, Chromium). For your own DLLs, third-party SDKs, or internal builds, append entries explicitly — common shapes:

What you have Entry to append
Internal team symbol server SRV*C:\Symbols*https://internal-symsrv.example.com/symbols
Team shared drop on a UNC share SRV*C:\Symbols*\\fileserver\symbols
Local dev build output (your own PDBs) C:\src\myapp\out\Default (bare folder, no SRV*)

Order matters — entries are tried left-to-right, first signature match wins. Put the local dev folder first when iterating on a build so your fresh PDB beats the public one.
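Left-to-right, first-match resolution can be sketched like this. The paths and PDB contents are invented, and real resolution matches on the PDB's GUID + age signature, simplified here to a dictionary lookup:

```python
def resolve_pdb(entries, available):
    """First entry (left to right) that holds a matching PDB wins; later
    entries are never consulted for that module."""
    for entry in entries:
        if entry in available:
            return entry, available[entry]
    return None, None

# Hypothetical _NT_SYMBOL_PATH split on ';': local build first, server second.
sym_path = [
    r"C:\src\myapp\out\Default",
    r"SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols",
]
have = {
    r"C:\src\myapp\out\Default": "myapp.pdb (today's build)",
    r"SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols": "myapp.pdb (stale)",
}

source, pdb = resolve_pdb(sym_path, have)
assert source == r"C:\src\myapp\out\Default"   # local copy beats the public one
```

Reversing the two entries would silently serve the stale public PDB, which is exactly the failure mode the ordering rule prevents.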

Build prerequisites for your own DLLs

A symbol server doesn't help if the build never produced a PDB, or if PDB and deployed DLL are from different builds.

  • .NET / C#: <DebugType>portable</DebugType> + <DebugSymbols>true</DebugSymbols>. Check that Release configurations don't disable PDB output.
  • C++ (MSVC): /Zi + /DEBUG:FULL, even in Release. Keep PDB next to DLL.
  • PDB and DLL must share the same signature (GUID + age) — re-link → new signature → old PDB no longer resolves.

Verifying it worked

> load_trace C:\my\trace.etl
> diagnose_symbols C:\my\trace.etl
> cpu_top_functions C:\my\trace.etl

diagnose_symbols lists per-module status with hints for unresolved ones; cpu_top_functions's Stats.ResolutionRate should be ≥ 0.8 for actionable output. After changing the symbol path mid-session, unload_trace + load_trace to force re-resolution — LookupWarmSymbols is cached per loaded trace.
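The ResolutionRate check reduces to counting frames that got past module!?. A sketch with invented frames:

```python
frames = [
    "ntdll!NtReadFile",
    "myapp!?",                 # unresolved: module known, symbol missing
    "kernel32!CreateFileW",
    "myapp!?",
    "myapp!Main",
]

def resolution_rate(frames):
    """Fraction of frames resolved past module!? — the Stats.ResolutionRate idea."""
    resolved = [f for f in frames if not f.endswith("!?")]
    return len(resolved) / len(frames)

rate = resolution_rate(frames)
assert rate == 0.6     # 3 of 5 resolved
assert rate < 0.8      # below the actionable threshold: fix symbols first
```

When the rate sits below 0.8, fixing the symbol path is more valuable than any further analysis of the half-resolved stacks.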

For full recipes (UNC paths, private vendors, Chromium-family browsers, cache management, troubleshooting), see docs/SYMBOL_RECIPES.md (Chinese). Architecture overview and contribution invariants live in docs/ARCHITECTURE.md and CONTRIBUTING.md.
