From 5886d7772751758292176ffdaa4e383ff086c173 Mon Sep 17 00:00:00 2001 From: "alejandro.gonzalez" Date: Mon, 25 May 2026 12:53:46 +0200 Subject: [PATCH] docs(appsec): add AppSec developer reference docs and module AGENTS.md files Adds docs/appsec/ with six reference documents covering blocking patterns, WAF API, IG event wiring, file content reading, RASP CSI, and multipart framework invariants. Adds lean AGENTS.md files in netty-4.1, tomcat-appsec, akka-http, jetty-appsec, vertx-web, undertow, and the appsec module pointing to the extended docs. --- AGENTS.md | 1 + dd-java-agent/appsec/AGENTS.md | 58 ++++ .../instrumentation/akka/akka-http/AGENTS.md | 27 ++ .../jetty/jetty-appsec/AGENTS.md | 24 ++ .../instrumentation/netty/netty-4.1/AGENTS.md | 57 ++++ .../tomcat/tomcat-appsec/AGENTS.md | 48 +++ .../instrumentation/undertow/AGENTS.md | 37 +++ .../instrumentation/vertx/vertx-web/AGENTS.md | 31 ++ docs/appsec/blocking-patterns.md | 239 +++++++++++++++ docs/appsec/file-content.md | 131 ++++++++ docs/appsec/ig-events.md | 184 +++++++++++ docs/appsec/multipart-frameworks.md | 168 ++++++++++ docs/appsec/rasp-file-io.md | 74 +++++ docs/appsec/waf-api.md | 286 ++++++++++++++++++ 14 files changed, 1365 insertions(+) create mode 100644 dd-java-agent/appsec/AGENTS.md create mode 100644 dd-java-agent/instrumentation/akka/akka-http/AGENTS.md create mode 100644 dd-java-agent/instrumentation/jetty/jetty-appsec/AGENTS.md create mode 100644 dd-java-agent/instrumentation/netty/netty-4.1/AGENTS.md create mode 100644 dd-java-agent/instrumentation/tomcat/tomcat-appsec/AGENTS.md create mode 100644 dd-java-agent/instrumentation/undertow/AGENTS.md create mode 100644 dd-java-agent/instrumentation/vertx/vertx-web/AGENTS.md create mode 100644 docs/appsec/blocking-patterns.md create mode 100644 docs/appsec/file-content.md create mode 100644 docs/appsec/ig-events.md create mode 100644 docs/appsec/multipart-frameworks.md create mode 100644 docs/appsec/rasp-file-io.md create mode 100644 docs/appsec/waf-api.md diff --git a/AGENTS.md b/AGENTS.md index e09dff0d473..4cbdb6d953f 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -39,6 +39,7 @@ docs/ Developer documentation (see below) | Working with Gradle | [docs/how_to_work_with_gradle.md](docs/how_to_work_with_gradle.md) | | Bootstrap/premain constraints | [docs/bootstrap_design_guidelines.md](docs/bootstrap_design_guidelines.md) | | CI/CD workflows | [.github/workflows/README.md](.github/workflows/README.md) | +| AppSec: blocking, WAF API, IG events | [docs/appsec/](docs/appsec/) | **When working on a topic above, read the linked file first** — they are the source of truth maintained by humans. diff --git a/dd-java-agent/appsec/AGENTS.md b/dd-java-agent/appsec/AGENTS.md new file mode 100644 index 00000000000..3a9dc970f95 --- /dev/null +++ b/dd-java-agent/appsec/AGENTS.md @@ -0,0 +1,58 @@ +# AppSec Module + +Extended reference: [docs/appsec/](../../docs/appsec/) + +## Adding a new WAF address: 4-file checklist + +Every new WAF address requires changes in exactly 4 files. Missing any one causes a silent no-op: + +1. `Events.java` -- declare the event type with a unique sequential ID +2. `InstrumentationGateway.java` -- register the callback type in `getCallback()` +3. `KnownAddresses.java` -- declare the `Address` constant and add to `fromString()` switch +4. `GatewayBridge.java` -- subscriber field, `init()` registration, `reset()` null, handler method, `DATA_DEPENDENCIES` entry + +See [docs/appsec/ig-events.md](../../docs/appsec/ig-events.md) for the full handler pattern including the retry-on-expiry loop. + +## `KnownAddressesSpecificationForkedTest` instance count + +Every new `Address` declared in `KnownAddresses.java` must be accompanied by incrementing the +expected count in `KnownAddressesSpecificationForkedTest`: + +```groovy +Address.instanceCount() == 47 // increment by 1 per new Address +``` + +The test fails if this is not updated. + +## WAF API quick reference + +- `server.response.status` must be `Integer`, not `String` (changed in libddwaf v1.28.0) +- gRPC messages must use `runEphemeral()`, not `run()` -- persistent data caches only the first message +- Processor outputs (`_dd.appsec.s.req.body`, fingerprints) come from `ResultWithData.attributes` -- never pass them as input addresses +- `server.request.headers.no_cookies` must not include cookies -- cookies go in `server.request.cookies` + +Full type mapping and limits: [docs/appsec/waf-api.md](../../docs/appsec/waf-api.md) + +## Callback guard ordering in advice + +Fetch ALL callbacks before any early-return guard. A callback fetched after a conditional return +is silently skipped when only that callback is registered: + +```java +// Correct +BiFunction<...> bodyCallback = cbp.getCallback(EVENTS.requestBodyProcessed()); +BiFunction<...> filenamesCallback = cbp.getCallback(EVENTS.requestFilesFilenames()); +BiFunction<...> contentCallback = cbp.getCallback(EVENTS.requestFilesContent()); +if (bodyCallback == null && filenamesCallback == null && contentCallback == null) return; +``` + +## Blocking model per framework + +| Framework | Call `effectivelyBlocked()`? | Notes | +|---|---|---| +| Netty | NEVER | `BlockingResponseHandler` does it internally | +| Tomcat / Jersey / Jetty | YES, after `t = new BlockingException(...)` | Inside `if (brf != null)` | +| Undertow | YES, but `tryCommitBlockingResponse` is not idempotent | Guard `t == null` before each call | +| Vert.x | NO | Handler-based model | + +Full details: [docs/appsec/blocking-patterns.md](../../docs/appsec/blocking-patterns.md) diff --git a/dd-java-agent/instrumentation/akka/akka-http/AGENTS.md b/dd-java-agent/instrumentation/akka/akka-http/AGENTS.md new file mode 100644 index 00000000000..ae954cd8b62 --- /dev/null +++ b/dd-java-agent/instrumentation/akka/akka-http/AGENTS.md @@ -0,0 +1,27 @@ +# Akka HTTP AppSec Instrumentation + +Extended reference: [docs/appsec/ig-events.md](../../../../docs/appsec/ig-events.md), +[docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md) + +## Two multipart routes must both be instrumented + +Akka HTTP has two independent entry points for multipart form data. Any new WAF address that +captures multipart body data must instrument both: + +| Route | Entry point | Notes | +|---|---|---| +| Route 1 | `handleMultipartStrictFormData(Multipart$FormData$Strict)` | Has `reqCtx` as local variable; iterates via `getStrictParts()` | +| Route 2 | `handleStrictFormData(StrictForm)` | No `reqCtx` in scope; must obtain via `activeSpan()` | + +If only Route 1 is instrumented, multipart requests processed via `formFieldMultiMap` (Route 2) +silently miss the WAF event. + +## Do not extract filenames callback dispatch into `UnmarshallerHelpers` + +Extracting the `requestFilesFilenames` callback dispatch into a shared helper method in +`UnmarshallerHelpers` is known to cause problems. Keep dispatch inline in each advice class. + +## No `effectivelyBlocked()` in Akka advice + +Akka HTTP uses the Netty-style blocking model. Do not call `effectivelyBlocked()` in advice. +See [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md#netty-never-effectivelyblocked). diff --git a/dd-java-agent/instrumentation/jetty/jetty-appsec/AGENTS.md b/dd-java-agent/instrumentation/jetty/jetty-appsec/AGENTS.md new file mode 100644 index 00000000000..2bf77895d43 --- /dev/null +++ b/dd-java-agent/instrumentation/jetty/jetty-appsec/AGENTS.md @@ -0,0 +1,24 @@ +# Jetty AppSec Instrumentation + +Extended reference: [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md), +[docs/appsec/multipart-frameworks.md](../../../../docs/appsec/multipart-frameworks.md) + +## Jetty 8.x: no `getSubmittedFileName()`, manual parsing required + +`Part.getSubmittedFileName()` (Servlet 3.1) is not available in Jetty 8.x (Servlet 3.0). Filenames +must be extracted by manually parsing the `Content-Disposition` header from each `Part`. + +Exit advice on `getParts()` is the correct instrumentation point for Jetty 8.x. There is no +`parseParts()` equivalent to intercept. + +## Jetty 9.4/10: muzzle discriminator via `_dispatcherType` + +The field `_dispatcherType: Ljavax/servlet/DispatcherType;` distinguishes Jetty 9.4/10.x +(javax namespace) from Jetty 11+ (jakarta namespace). Use this field as the muzzle reference +rather than trying to match on API version strings alone. + +## Blocking: Jetty uses Servlet container model + +Jetty follows the Servlet blocking pattern. `effectivelyBlocked()` must be called explicitly +after creating `BlockingException`, inside `if (brf != null)`. See +[docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md#servlet-containers-tomcat-jersey-resteasy). diff --git a/dd-java-agent/instrumentation/netty/netty-4.1/AGENTS.md b/dd-java-agent/instrumentation/netty/netty-4.1/AGENTS.md new file mode 100644 index 00000000000..0aa0ac5a14a --- /dev/null +++ b/dd-java-agent/instrumentation/netty/netty-4.1/AGENTS.md @@ -0,0 +1,57 @@ +# Netty 4.1 AppSec Instrumentation + +Extended reference: [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md) + +## NEVER call `effectivelyBlocked()` in Netty advice + +`effectivelyBlocked()` must not appear in any Netty advice class or helper. `BlockingResponseHandler` calls it internally when the blocking response is committed. Calling it in advice causes a double-invocation and closes the span prematurely. + +This applies even to the urlencoded body-processed path in `HttpPostRequestDecoderInstrumentation` -- the span is already closed synchronously by `tryCommitBlockingResponse()` before advice can reach the mark. + +Correct pattern: +```java +// Netty advice — blocking path +if (brf != null) { + brf.tryCommitBlockingResponse(ctx.getTraceSegment(), rba); + thr = new BlockingException("..."); + // do NOT call ctx.getTraceSegment().effectivelyBlocked() +} +``` + +## `thr` creation must be inside `if (brf != null)` + +If `thr = new BlockingException(...)` is placed outside the `brf != null` guard, it is created even when no blocking action exists. The exception is then thrown unconditionally in `@Advice.OnMethodExit(suppress = Throwable.class, onThrowable = Throwable.class)`, producing spurious 500 responses. + +## FileUpload content: two-branch read + +`HttpData` stores uploads either in memory or on disk depending on size. Always branch on `isInMemory()`: + +```java +if (fileUpload.isInMemory()) { + ByteBuf buf = fileUpload.getByteBuf(); + // read from buf +} else { + File f = fileUpload.getFile(); + try (InputStream is = new FileInputStream(f)) { + // read from stream + } +} +``` + +Never call `getFile()` on an in-memory upload -- the file does not exist on disk and the call throws. + +## `@RequiresRequestContext` + `Config.get()` causes muzzle failure + +Do not declare `static final` fields initialized from `Config.get()` in any `@RequiresRequestContext`-annotated advice inner class. Muzzle validates those classes against the instrumented library's classpath (Netty), where `Config` is absent, causing `MuzzleValidationException` in CI. + +Move such constants to the helper class declared in `helperClassNames()`: + +```java +// Wrong -- in @RequiresRequestContext advice inner class +private static final int MAX_FILES = Config.get().getAppSecMaxFileContentCount(); + +// Correct -- in NettyMultipartHelper or similar helper +public class NettyMultipartHelper { + static final int MAX_FILES_TO_INSPECT = Config.get().getAppSecMaxFileContentCount(); +} +``` diff --git a/dd-java-agent/instrumentation/tomcat/tomcat-appsec/AGENTS.md b/dd-java-agent/instrumentation/tomcat/tomcat-appsec/AGENTS.md new file mode 100644 index 00000000000..390630acaa9 --- /dev/null +++ b/dd-java-agent/instrumentation/tomcat/tomcat-appsec/AGENTS.md @@ -0,0 +1,48 @@ +# Tomcat AppSec Instrumentation + +Extended reference: [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md), +[docs/appsec/file-content.md](../../../../docs/appsec/file-content.md), +[docs/appsec/multipart-frameworks.md](../../../../docs/appsec/multipart-frameworks.md) + +## MUST call `effectivelyBlocked()` in Tomcat blocking path + +Unlike Netty, Tomcat advice must call `effectivelyBlocked()` explicitly. The call must come AFTER +`t = new BlockingException(...)` and both must be inside `if (brf != null)`: + +```java +if (brf != null) { + brf.tryCommitBlockingResponse(ctx.getTraceSegment(), rba); + t = new BlockingException("..."); // 1. create first + ctx.getTraceSegment().effectivelyBlocked(); // 2. mark after +} +``` + +If `effectivelyBlocked()` throws (span already finished), the exception object must already exist +so the advice can still return a non-null value and let the container propagate it. + +## `Part` access via reflection: GlassFish restriction does NOT apply here + +Tomcat's `ParameterCollector` uses reflection to avoid bytecode references to `javax` vs `jakarta` +Part types. This is safe for Tomcat because the agent classloader is not subject to Java 9+ module +access restrictions for Tomcat's classloader. + +For GlassFish, direct cast via `javax.servlet.http.Part` interface is required instead. +See [docs/appsec/multipart-frameworks.md](../../../../docs/appsec/multipart-frameworks.md#glassfishpayara-no-reflection-via-parametercollector). + +## `inspectContent` flag must be evaluated before iterating parts + +Check whether the `requestFilesContent` callback is registered BEFORE iterating parts, not inside +the loop. This avoids calling `getInputStream()` on every file when no rule uses `files_content`: + +```java +boolean inspectContent = cbp.getCallback(EVENTS.requestFilesContent()) != null; +ParameterCollector collector = new ParameterCollector.ParameterCollectorImpl(inspectContent); +for (Object part : parts) { + collector.addPart(part); +} +``` + +## `helperClassNames()` must list all nested inner classes + +ByteBuddy does not auto-discover nested classes. Every `$Inner` class used at runtime must be +explicitly listed in `helperClassNames()` or a `NoClassDefFoundError` will occur silently. diff --git a/dd-java-agent/instrumentation/undertow/AGENTS.md b/dd-java-agent/instrumentation/undertow/AGENTS.md new file mode 100644 index 00000000000..907359cfd47 --- /dev/null +++ b/dd-java-agent/instrumentation/undertow/AGENTS.md @@ -0,0 +1,37 @@ +# Undertow AppSec Instrumentation + +Extended reference: [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md), +[docs/appsec/ig-events.md](../../../../docs/appsec/ig-events.md) + +## `tryCommitBlockingResponse` is not idempotent in Undertow + +Unlike Tomcat/Jersey, Undertow's `tryCommitBlockingResponse` commits the response immediately. +Calling it twice sends a duplicate response and can throw an `IllegalStateException`. + +When an advice has two blocking paths (body + filenames), the `t == null` guard must be checked +BEFORE each `tryCommitBlockingResponse` call, not after: + +```java +// Wrong -- commits response even if already blocked +if (bodyCallback != null && !map.isEmpty()) { + Flow flow = bodyCallback.apply(ctx, map); + t = tryBlock(flow, ctx); +} +if (filenamesCallback != null && !filenames.isEmpty()) { + Flow flow = filenamesCallback.apply(ctx, filenames); + t = tryBlock(flow, ctx); // tryBlock calls tryCommitBlockingResponse again +} + +// Correct -- guard prevents second commit +if (bodyCallback != null && !map.isEmpty()) { + Flow flow = bodyCallback.apply(ctx, map); + t = tryBlock(flow, ctx); +} +if (t == null && filenamesCallback != null && !filenames.isEmpty()) { + Flow flow = filenamesCallback.apply(ctx, filenames); + t = tryBlock(flow, ctx); +} +``` + +See [docs/appsec/ig-events.md](../../../docs/appsec/ig-events.md#undertow-trycommitblockingresponse-is-not-idempotent) +for the full explanation of why this differs from other Servlet containers. diff --git a/dd-java-agent/instrumentation/vertx/vertx-web/AGENTS.md b/dd-java-agent/instrumentation/vertx/vertx-web/AGENTS.md new file mode 100644 index 00000000000..bda2fc899c3 --- /dev/null +++ b/dd-java-agent/instrumentation/vertx/vertx-web/AGENTS.md @@ -0,0 +1,31 @@ +# Vert.x Web AppSec Instrumentation + +Extended reference: [docs/appsec/ig-events.md](../../../../docs/appsec/ig-events.md), +[docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md) + +## Advice classes are inlined, not injected + +`RoutingContextImplInstrumentation` has no `helperClassNames()` override -- it uses the empty +default. Advice classes (`RoutingContextJsonAdvice`, `RoutingContextFilenamesAdvice`, etc.) are +inlined by ByteBuddy, not injected as helpers. Only runtime-instantiated handlers go in +`helperClassNames()` (e.g. `WafPublishingBodyHandler`). + +New `@RequiresRequestContext` classes for `RoutingContextImplInstrumentation` must NOT be added +to `helperClassNames()`. + +## `fileUploads()` requires prior `BodyHandler` execution + +`RoutingContext.fileUploads()` returns an empty set unless a `BodyHandler` has previously parsed +the body. `setExpectMultipart(true)` + `endHandler` populates `formAttributes()` but NOT file +uploads. + +## Use distinct `CallDepthThreadLocalMap` keys per advice class + +Each advice class in `RoutingContextImplInstrumentation` must use a distinct key in +`CallDepthThreadLocalMap` to avoid re-entrancy interference. Using the same key across the +filenames advice and the JSON advice causes one to suppress the other. + +## No `effectivelyBlocked()` in Vert.x advice + +Do not call `effectivelyBlocked()` in Vert.x advice. See +[docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md#vertx). diff --git a/docs/appsec/blocking-patterns.md b/docs/appsec/blocking-patterns.md new file mode 100644 index 00000000000..916c1cf6ab9 --- /dev/null +++ b/docs/appsec/blocking-patterns.md @@ -0,0 +1,239 @@ +# AppSec Blocking Patterns + +How to correctly implement request blocking in advice classes. The pattern differs by server — getting it wrong causes silent failures where the block action is lost. + +## Netty 4.1 + +### `effectivelyBlocked()` — never call it in Netty advice + +`BlockingResponseHandler` calls `effectivelyBlocked()` internally as part of the blocking pipeline. When advice calls `tryCommitBlockingResponse()`, that triggers `BlockingResponseHandler`, which calls `effectivelyBlocked()` and finishes the span (`span.finish()`). A second call from advice on an already-finished span throws an exception. + +The advice has `@Advice.OnMethodExit(suppress = Throwable.class)`. With `suppress = Throwable.class`, the exception from `effectivelyBlocked()` is silently swallowed — the advice does not visibly fail. But if code follows `effectivelyBlocked()` (typically `thr = new BlockingException(...)`), that code **never executes** because the exception interrupted the flow. Result: the request is not aborted. + +**Correct pattern in Netty:** +```java +BlockResponseFunction brf = reqCtx.getBlockResponseFunction(); +if (brf != null) { + brf.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); + thr = new BlockingException("Blocked request"); + // DO NOT call effectivelyBlocked() here — BlockingResponseHandler already did it +} +``` + +### Exception: urlencoded body-processed path + +The body-processed (urlencoded) path in `HttpPostRequestDecoderInstrumentation` must **not** call `effectivelyBlocked()` even in the pattern above. In this path, `tryCommitBlockingResponse()` closes the span synchronously (in the Netty event loop). Calling `effectivelyBlocked()` afterwards raises `AssertionError: Interaction with TraceSegment after root span has already finished`, which is swallowed by `suppress = Throwable.class` and registered as an instrumentation error. + +The multipart paths (filenames, content) in the same advice **do** call `effectivelyBlocked()` within `if (brf != null)`. The asymmetry is intentional. + +| Path in `HttpPostRequestDecoderInstrumentation` | `effectivelyBlocked()` | +|---|---| +| body-processed (urlencoded) | no | +| filenames (multipart) | yes, inside `if (brf != null)` | +| content (multipart) | yes, inside `if (brf != null)` | + +### `thr` must be assigned inside `if (brf != null)` + +`thr = new BlockingException(...)` must be inside the `if (brf != null)` block. If assigned outside, it is set even when no blocking response function is available, causing the decoder to abort without having sent a blocking response. + +### Netty FileUpload content reading + +When reading bytes from `HttpData`/`FileUpload` for `files_content`, there are exactly two correct paths: + +```java +if (fileUpload.isInMemory()) { + ByteBuf buf = fileUpload.getByteBuf(); + int length = (int) Math.min((long) MAX_CONTENT_BYTES, (long) buf.readableBytes()); + byte[] bytes = new byte[length]; + buf.getBytes(buf.readerIndex(), bytes); // absolute read — does NOT advance readerIndex + contentStr = MultipartContentDecoder.decodeBytes(bytes, length, fileUpload.getContentType()); +} else { + byte[] bytes = new byte[MAX_CONTENT_BYTES]; + try (FileInputStream fis = new FileInputStream(fileUpload.getFile())) { + int read = fis.read(bytes); + contentStr = MultipartContentDecoder.decodeBytes(bytes, read < 0 ? 0 : read, fileUpload.getContentType()); + } +} +``` + +**Why not the alternatives:** +- `getInputStream()` — not declared in the `HttpData` interface; only available on `AbstractDiskHttpData` (internal class). Compilation fails on `FileUpload` type. +- `getByteBuf()` on disk-backed uploads — `AbstractDiskHttpData.getByteBuf()` loads the entire file into RAM. Catastrophic for large uploads. +- `get()` — same as above, loads everything. +- `getChunk(int)` — maintains internal state (`fileChannel`, `chunkPosition`). Side effects can interfere with Netty's upload lifecycle. + +--- + +## Tomcat / Servlet containers (Tomcat, GlassFish, Payara) + +### `effectivelyBlocked()` — always call it, but after assigning `t` + +Unlike Netty, the blocking response function in Tomcat does **not** call `effectivelyBlocked()` internally. It must be called explicitly in the advice. + +**Mandatory ordering:** assign `t = new BlockingException(...)` **before** calling `effectivelyBlocked()`. + +The advice has `@Advice.OnMethodExit(suppress = Throwable.class, onThrowable = Throwable.class)`. If `effectivelyBlocked()` throws and `suppress = Throwable.class` swallows the exception, and `effectivelyBlocked()` was called before assigning `t`, then `t` is never assigned and the request is not aborted. + +**Correct pattern — all three blocking paths:** +```java +BlockResponseFunction brf = reqCtx.getBlockResponseFunction(); +if (brf != null) { + brf.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); + t = new BlockingException("Blocked request (...)"); // 1. assign t + reqCtx.getTraceSegment().effectivelyBlocked(); // 2. then effectivelyBlocked +} +``` + +The three blocking paths in a typical Servlet advice (body-processed, filenames, content) must all follow this pattern consistently. + +### `t` must be assigned inside `if (brf != null)` + +Same rule as Netty: `t = new BlockingException(...)` must be inside `if (brf != null)`. Assigning it outside aborts the request even when no blocking response could be sent. + +### Filenames → content: sequential ordering + +When an advice handles both `requestFilesFilenames` and `requestFilesContent`: +- Fire filenames first. +- Fire content **only if** filenames did not block (`if (t == null)`). + +Filenames and content represent the same file upload. If the request is already blocked by filename, sending content to the WAF adds no value. + +```java +// filenames path +if (filenamesCallback != null && !filenames.isEmpty()) { + Flow flow = filenamesCallback.apply(reqCtx, filenames); + if (action instanceof Flow.Action.RequestBlockingAction) { + if (brf != null) { + brf.tryCommitBlockingResponse(...); + t = new BlockingException("Blocked (filenames)"); + reqCtx.getTraceSegment().effectivelyBlocked(); + } + } +} + +// content path — only if filenames did not block +if (t == null && contentCallback != null) { + List contents = collector.getContents(); + if (!contents.isEmpty()) { + Flow flow = contentCallback.apply(reqCtx, contents); + // ... same blocking pattern + } +} +``` + +--- + +## Body + filenames: independent callbacks — both must always fire + +When an advice handles both `requestBodyProcessed` and `requestFilesFilenames` (or `requestFilesContent`) for the same request body, both callbacks must fire even if the first one triggers blocking. These are independent WAF addresses — a WAF config with only filenames rules will have `bodyCallback == null`. Silencing one because the other blocked deprives the WAF of information. + +**Wrong — throw on body blocks before filenames fires:** +```java +executeCallback(reqCtx, bodyCallback, conv); // throws BlockingException if blocked +filenamesCallback.apply(reqCtx, filenames); // never reached +``` + +**Correct — `pendingBlock` pattern:** +```java +BlockingException pendingBlock = null; + +if (bodyCallback != null) { + Flow flow = bodyCallback.apply(reqCtx, conv); + if (action instanceof Flow.Action.RequestBlockingAction) { + BlockResponseFunction brf = reqCtx.getBlockResponseFunction(); + if (brf != null) { + boolean success = brf.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); + if (success) { + pendingBlock = new BlockingException("Blocked (body)"); + } + } + } +} + +if (filenamesCallback != null && filenames != null && !filenames.isEmpty()) { + Flow flow = filenamesCallback.apply(reqCtx, filenames); + if (pendingBlock == null && action instanceof Flow.Action.RequestBlockingAction) { + // apply blocking only if body did not block first + ... + pendingBlock = new BlockingException("Blocked (filenames)"); + } +} + +if (pendingBlock != null) throw pendingBlock; +``` + +The blocking logic (`tryCommitBlockingResponse`, `new BlockingException`) is specific to each server's `BlockResponseFunction`. It can be extracted to a helper **within the same module** but must **never be shared across different server modules**. + +--- + +## Undertow + +`UndertowBlockResponseFunction.tryCommitBlockingResponse` is **not idempotent**. Each call: +- Overwrites `exchange.putAttachment(TRACE_SEGMENT, segment)` +- Overwrites `exchange.putAttachment(REQUEST_BLOCKING_DATA, rab)` +- Dispatches to `UndertowBlockingHandler` + +If two blocking paths exist (body + filenames), the second call to `tryCommitBlockingResponse` generates a second dispatch on the IO thread. + +**The guard `t == null` must be placed before the call to `tryCommitBlockingResponse`**, not after: + +```java +// Correct — guard before +if (brf != null && t == null) { + boolean success = brf.tryCommitBlockingResponse(...); + if (success) { + t = new BlockingException("..."); + } +} + +// Wrong — guard after: tryCommitBlockingResponse already ran +if (brf != null) { + boolean success = brf.tryCommitBlockingResponse(...); + if (success && t == null) { // too late + t = new BlockingException("..."); + } +} +``` + +--- + +## GlassFish / Payara (module `tomcat-appsec-7.0`) + +`TomcatServerInstrumentation` is not active on Payara because `PECoyoteResponse` is not `org.apache.catalina.connector.Response`. As a result, `BlockResponseFunction` is never registered and `reqCtx.getBlockResponseFunction()` returns `null`. + +**Fallback:** obtain `HttpServletRequest` and `HttpServletResponse` from the `Multipart.request` field via `@Advice.FieldValue`, typed as `org.apache.catalina.Request` (the Catalina interface). See `GlassFishBlockingHelper` for the full implementation. + +**Why the interface, not the concrete class:** `connector.Request.getResponse()` has different JVM descriptors in Tomcat vs. GlassFish. `org.apache.catalina.Request.getResponse()` has the same descriptor in both — safe for muzzle. + +`GlassFishBlockingHelper.tryBlock()` uses a two-phase try/catch to handle the case where `effectivelyBlocked()` throws because the span was already finished after a successful commit: + +```java +try { + // commit the blocking response (brf or fallback) +} catch (Exception ignored) { + return false; // commit failed — cannot block this request +} +try { + reqCtx.getTraceSegment().effectivelyBlocked(); +} catch (Exception ignored) { + // span already finished — response was sent, blocking succeeded +} +return true; +``` + +**`Collections.emptyList()` instead of `BlockingException`** in the GlassFish advice return: propagating `BlockingException` through the Grizzly/Payara pipeline causes an uncontrolled shutdown. Returning an empty list is equivalent — the controller receives zero parts to process. + +--- + +## Vert.x + +No AppSec advice in Vert.x calls `effectivelyBlocked()`. The blocking pattern is: + +```java +blockResponseFunction.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); +if (throwable == null) { + throwable = new BlockingException("Blocked request (for ...)"); +} +``` + +Verify with grep before adding `effectivelyBlocked()` to any Vert.x advice — there are currently no calls to it in `dd-java-agent/instrumentation/vertx/`. diff --git a/docs/appsec/file-content.md b/docs/appsec/file-content.md new file mode 100644 index 00000000000..cf4706a3ba8 --- /dev/null +++ b/docs/appsec/file-content.md @@ -0,0 +1,131 @@ +# AppSec File Content — Reading and Encoding + +How to correctly read multipart file upload content for the `server.request.body.files_content` WAF address. + +## Canonical helper: `FileItemContentReader` + +The reference implementation for commons-fileupload. Key constants: + +```java +public static final int MAX_CONTENT_BYTES = 4096; // per-file limit +public static final int MAX_FILES_TO_INSPECT = 25; // per-request limit +``` + +Both are `public static final` so that tests can reference them rather than hardcoding the values. + +```java +public static List readContents(List fileItems) { + List result = new ArrayList<>(); + for (FileItem fileItem : fileItems) { + if (result.size() >= MAX_FILES_TO_INSPECT) break; + if (fileItem.isFormField()) continue; // form fields have no file content + result.add(readContent(fileItem)); + } + return result; +} + +public static String readContent(FileItem fileItem) { + try (InputStream is = fileItem.getInputStream()) { // try-with-resources required + byte[] buf = new byte[MAX_CONTENT_BYTES]; + int total = 0, n; + while (total < MAX_CONTENT_BYTES && (n = is.read(buf, total, MAX_CONTENT_BYTES - total)) != -1) + total += n; + return MultipartContentDecoder.decodeBytes(buf, total, fileItem.getContentType()); + } catch (IOException ignored) { + return ""; // IO failure — report empty rather than crash the advice + } +} +``` + +--- + +## Form field vs. file upload: `isFormField()`, not filename null + +**Use `isFormField()` to skip non-file parts**, not `fileItem.getName() != null`. Files without a filename (e.g., `curl` upload without explicit `-F "file=@..."`) still have content and must be inspected. Form fields are text input, never file uploads. + +The filename null check is correct for `files_filenames` (no name → nothing to report) but **incorrect** for `files_content`. + +--- + +## `getInputStream()` must be closed with try-with-resources + +`FileItem.getInputStream()` may return a stream backed by a temp file on disk (managed by `FileCleaningTracker`). Without try-with-resources, the file descriptor stays open until GC, which can exhaust the process's available file descriptors on requests with many uploads. + +--- + +## Charset decoding: `MultipartContentDecoder` + +All file content for `files_content` must be decoded using `MultipartContentDecoder` in `internal-api`, not with a hardcoded charset. The decoder applies the following fallback chain: + +1. Charset declared in the part's `Content-Type` header (e.g., `text/xml; charset=UTF-8`) +2. `Charset.defaultCharset()` if no charset is declared +3. ISO-8859-1 if the declared charset produces decoding errors + +```java +// Correct +return MultipartContentDecoder.decodeBytes(buf, total, fileItem.getContentType()); + +// Wrong — hardcoded charset loses non-ASCII characters +return new String(buf, 0, total, StandardCharsets.ISO_8859_1); +``` + +**Why `new String(bytes, charset)` does not work for the fallback chain:** `new String(bytes, charset)` uses `CodingErrorAction.REPLACE` internally — it never throws, silently substituting `U+FFFD` for invalid bytes. The catch block for the ISO-8859-1 fallback would be dead code. `MultipartContentDecoder` uses `CharsetDecoder` with `CodingErrorAction.REPLACE` as well (after considering the truncation-at-boundary case), so the decoder applies the correct charset while being robust to partial multi-byte sequences at the `MAX_CONTENT_BYTES` boundary. + +--- + +## Integration-specific notes + +### Tomcat / GlassFish (`ParameterCollector`) + +The `Part` interface is accessed via reflection to avoid bytecode references to types that differ between javax and jakarta. The `CachedMethods` inner class caches `getSubmittedFileName()`, `getInputStream()`, and `getContentType()` using a volatile immutable holder — see [multipart-frameworks.md](multipart-frameworks.md#reflection-cache-pattern) for the pattern. + +The `inspectContent` flag must be evaluated **before** iterating over parts: + +```java +boolean inspectContent = cbp.getCallback(EVENTS.requestFilesContent()) != null; +ParameterCollector collector = new ParameterCollector.ParameterCollectorImpl(inspectContent); +for (Object part : parts) { + collector.addPart(part); +} +``` + +Without the flag, `getInputStream()` is called for every file even when no rule uses `files_content`, adding unnecessary I/O overhead. + +### Netty 4.1 (`NettyMultipartHelper`) + +Netty stores uploads either in memory (`ByteBuf`) or on disk (`File`). See [blocking-patterns.md](blocking-patterns.md#netty-fileupload-content-reading) for the two-branch read pattern using `isInMemory()`. + +### RESTEasy + +`InputPart.getBody(InputStream.class, null)` may return a `FileInputStream` for uploads above the in-memory threshold. Use try-with-resources: + +```java +try (InputStream is = inputPart.getBody(InputStream.class, null)) { + if (is == null) return ""; + return MultipartContentDecoder.readInputStream(is, MAX_CONTENT_BYTES, contentType); +} +``` + +`MultipartContentDecoder.readInputStream()` does **not** close the caller's stream — the caller is responsible. + +--- + +## Ordering: filenames before content, content only if not blocked + +When an advice fires both `requestFilesFilenames` and `requestFilesContent`: +- Fire filenames first. +- Fire content only if filenames did not produce a blocking action. + +See [blocking-patterns.md](blocking-patterns.md#filenames--content-sequential-ordering) for the full pattern. + +--- + +## Required tests for new implementations + +- Truncation: content larger than `MAX_CONTENT_BYTES` → result is exactly `MAX_CONTENT_BYTES` bytes +- `IOException` from `getInputStream()` → returns `""` +- Form fields: skipped in `readContents()` +- Files with null or empty filename: included (not filtered by name) +- More than `MAX_FILES_TO_INSPECT` files: only the first `MAX_FILES_TO_INSPECT` inspected +- Empty file list → empty content list +- `contentCallback.apply()` not called when content list is empty diff --git a/docs/appsec/ig-events.md b/docs/appsec/ig-events.md new file mode 100644 index 00000000000..57c32e38d74 --- /dev/null +++ b/docs/appsec/ig-events.md @@ -0,0 +1,184 @@ +# AppSec Instrumentation Gateway Events + +How to add a new WAF address exposed through the Instrumentation Gateway (IG), and the invariants that apply to all IG event callbacks in advice classes. + +## Adding a new IG event — 4 required changes + +### 1. `Events.java` — declare the event type + +```java +static final int REQUEST_FILES_CONTENT_ID = 31; // unique, increment from last used + +@SuppressWarnings("rawtypes") +private static final EventType REQUEST_FILES_CONTENT = + new ET<>("request.body.files.content", REQUEST_FILES_CONTENT_ID); + +@SuppressWarnings("unchecked") +public EventType, Flow>> requestFilesContent() { + return (EventType, Flow>>) REQUEST_FILES_CONTENT; +} +``` + +The ID must be unique and sequential. Check the last used ID before assigning. + +### 2. `InstrumentationGateway.java` — register the callback type + +Add the new event ID to the appropriate case in `getCallback()` for the matching `BiFunction` signature. + +### 3. `KnownAddresses.java` — declare the WAF address + +```java +Address> REQUEST_FILES_CONTENT = + new Address<>("server.request.body.files_content"); +``` + +Also add it to the `fromString()` switch. + +### 4. `GatewayBridge.java` — wire the callback + +Four changes needed: +1. Subscriber field: `private volatile DataSubscriberInfo requestFilesContentSubInfo;` +2. Registration in `init()` (conditional on `additionalIGEvents`): + ```java + if (additionalIGEvents.contains(EVENTS.requestFilesContent())) { + subscriptionService.registerCallback(EVENTS.requestFilesContent(), this::onRequestFilesContent); + } + ``` +3. Reset in `reset()`: `requestFilesContentSubInfo = null;` +4. Handler method (see pattern below) +5. Entry in `DATA_DEPENDENCIES`: + ```java + DATA_DEPENDENCIES.put(KnownAddresses.REQUEST_FILES_CONTENT, l(EVENTS.requestFilesContent())); + ``` + +### Handler pattern in `GatewayBridge` + +All handlers follow the same retry-on-expiry pattern: + +```java +private Flow onRequestFilesContent(RequestContext ctx_, List filesContent) { + AppSecRequestContext ctx = ctx_.getData(RequestContextSlot.APPSEC); + if (ctx == null || filesContent == null || filesContent.isEmpty()) { + return NoopFlow.INSTANCE; + } + while (true) { + DataSubscriberInfo subInfo = requestFilesContentSubInfo; + if (subInfo == null) { + subInfo = producerService.getDataSubscribers(KnownAddresses.REQUEST_FILES_CONTENT); + requestFilesContentSubInfo = subInfo; + } + if (subInfo == null || subInfo.isEmpty()) { + return NoopFlow.INSTANCE; + } + DataBundle bundle = + new SingletonDataBundle<>(KnownAddresses.REQUEST_FILES_CONTENT, filesContent); + try { + GatewayContext gwCtx = new GatewayContext(false); // transient=false for request data; RASP uses true + return producerService.publishDataEvent(subInfo, ctx, bundle, gwCtx); + } catch (ExpiredSubscriberInfoException e) { + requestFilesContentSubInfo = null; + } + } +} +``` + +--- + +## `KnownAddressesSpecificationForkedTest` — update instance count + +Every new `Address` declared in `KnownAddresses.java` must be accompanied by incrementing the expected count in: + +```groovy +void 'number of known addresses is expected number'() { + expect: + Address.instanceCount() == 47 // increment by 1 for each new Address +} +``` + +The test will fail if the count is not updated. + +--- + +## Guard ordering in advice — obtain all callbacks before early return + +When an advice handles multiple IG callbacks, all callbacks must be fetched **before** any early-return guard. If a callback is fetched after a guard, it is silently lost when only that callback is registered. + +```java +// Wrong — contentCb is never fetched if bodyCallback is null +BiFunction<...> bodyCallback = cbp.getCallback(EVENTS.requestBodyProcessed()); +if (bodyCallback == null) return; // early return skips contentCb entirely +BiFunction<...> contentCb = cbp.getCallback(EVENTS.requestFilesContent()); + +// Correct — fetch all first, guard is AND of all +BiFunction<...> bodyCallback = cbp.getCallback(EVENTS.requestBodyProcessed()); +BiFunction<...> contentCb = cbp.getCallback(EVENTS.requestFilesContent()); +if (bodyCallback == null && contentCb == null) return; +``` + +**Subtle variant:** a callback fetched inside a conditional block after the loop is equally invisible when only that callback is registered: + +```java +// Also wrong — filenamesCb is never fetched when filenames list happens to be empty +// and bodyCallback already returned early +for (Part part : parts) { collector.addPart(part); } +if (!filenames.isEmpty()) { + BiFunction<...> filenamesCb = cbp.getCallback(EVENTS.requestFilesFilenames()); // too late + ... +} +``` + +The correct pattern: fetch all callbacks at the top of the method, then reference them by variable in conditional blocks. + +--- + +## `@RequiresRequestContext` + `Config.get()` — muzzle failure + +Do **not** place `static final` constants initialized from `Config.get()` in advice inner classes annotated with `@RequiresRequestContext`. + +Muzzle treats `@RequiresRequestContext`-annotated classes as user classes and tries to validate `Config` against the instrumented library's classpath (e.g., Netty, commons-fileupload) — where `Config` does not exist. This causes a `MuzzleValidationException` in CI even though the code compiles. + +```java +// Wrong — Config.get() in @RequiresRequestContext advice inner class +@RequiresRequestContext(RequestContextSlot.APPSEC) +static class ParseBodyAdvice { + private static final int MAX_FILES = Config.get().getAppSecMaxFileContentCount(); // breaks muzzle +} + +// Correct — constant in helper class declared in helperClassNames() +public class NettyMultipartHelper { + static final int MAX_FILES_TO_INSPECT = Config.get().getAppSecMaxFileContentCount(); +} +``` + +--- + +## Undertow: `tryCommitBlockingResponse` is not idempotent + +See [blocking-patterns.md](blocking-patterns.md#undertow) for the full explanation. Summary: when an advice has two blocking paths (body + filenames), the guard `t == null` must appear **before** the call to `tryCommitBlockingResponse`, not after. + +--- + +## Akka HTTP: two multipart routes + +Any new WAF address that captures multipart body data must instrument **both** routes in Akka HTTP: + +| Route | Entry point | Notes | +|---|---|---| +| Route 1 | `handleMultipartStrictFormData(Multipart$FormData$Strict)` | Has `reqCtx` as local variable; iterates via Java API `getStrictParts()` | +| Route 2 | `handleStrictFormData(StrictForm)` | No `reqCtx` in scope; must obtain via `activeSpan()` | + +If only one route is instrumented, multipart requests processed via `formFieldMultiMap` (Route 2) silently miss the WAF event. + +Do **not** extract the filenames callback dispatch into a separate helper method in `UnmarshallerHelpers`. This is known to cause problems. + +--- + +## Vert.x: advice classes vs. helper classes + +`RoutingContextImplInstrumentation` has no `helperClassNames()` override — it uses the empty default. Advice classes (`RoutingContextJsonAdvice`, `RoutingContextFilenamesAdvice`, etc.) are inlined by ByteBuddy, not injected. Only runtime-instantiated handlers go in `helperClassNames()` (e.g. `WafPublishingBodyHandler`). + +New `@RequiresRequestContext` classes for `RoutingContextImplInstrumentation` must **not** be added to `helperClassNames()`. + +`RoutingContext.fileUploads()` returns an empty set unless `BodyHandler` has previously parsed the body. `setExpectMultipart(true)` + `endHandler` populates `formAttributes()` but **not** file uploads. + +Use a distinct `CallDepthThreadLocalMap` key per advice class to avoid interference between the filenames advice and the existing JSON advice. diff --git a/docs/appsec/multipart-frameworks.md b/docs/appsec/multipart-frameworks.md new file mode 100644 index 00000000000..c9c7a811a33 --- /dev/null +++ b/docs/appsec/multipart-frameworks.md @@ -0,0 +1,168 @@ +# Multipart AppSec — Framework-Specific Patterns + +Framework-specific invariants for implementing `server.request.body.filenames` and `server.request.body.files_content` in Jersey, RESTEasy, Jetty, and GlassFish/Payara. + +## Jersey: dual javax / jakarta modules + +Jersey 2.x uses `javax.ws.rs.*`; Jersey 3.x uses `jakarta.ws.rs.*`. The types are bytecode-incompatible. Any helper that imports JAX-RS types must be duplicated: + +- `jersey-appsec-2.0` → package `jersey2`, imports `javax.*` +- `jersey-appsec-3.0` → package `jersey3`, imports `jakarta.*` + +There is no mechanism to enforce synchronization between the two modules. The convention is a `// keep in sync with jersey3` comment in the 2.0 module. When adding logic to either module's `MultiPartHelper`, replicate the change in the other. + +## Reflection cache pattern for version-variant return types + +When a method's return type differs between library versions (e.g., `getHeaders()` returns `javax.ws.rs.core.MultivaluedMap` in RESTEasy 3.x and `jakarta.ws.rs.core.MultivaluedMap` in RESTEasy 6.x), a direct bytecode reference fails muzzle for the other version. + +**Canonical solution:** cache the `Method` as a `static final` field, resolved once in a `static {}` block: + +```java +private static final Method GET_HEADERS; + +static { + Method m = null; + try { + m = InputPart.class.getMethod("getHeaders"); + } catch (NoSuchMethodException ignored) { + } + GET_HEADERS = m; +} +``` + +**Why `static final` and not `volatile`:** class initialization is atomic by JVM spec — the `static {}` block runs exactly once when the classloader loads the helper class. No race condition. `volatile` would be needed only for lazy initialization in an instance or a non-static field. + +**Runtime cost:** zero per request. The reflection lookup happens once at class load time in the app classloader, where the library is present. + +**Null guard required:** always check `if (GET_HEADERS == null) return ...` before invoking. The static block may set the field to `null` if the class is not present in the classpath (incompatible version). + +This pattern applies to any method whose return type or parameters differ between major library versions. + +## Per-part try/catch in for-loops + +`suppress = Throwable.class` on the advice protects the **advice boundary**, not the interior of loops inside helpers. If an exception is thrown while processing part N, it short-circuits the remaining parts — the WAF receives only partial data with no visible error. + +```java +for (FormDataBodyPart part : parts) { + try { + FormDataContentDisposition cd = part.getFormDataContentDisposition(); + collectBodyPart(part, cd, map, filenames); + } catch (Exception ignored) { + // malformed part — continue processing remaining parts + } +} +``` + +This applies to any helper that iterates over parts, items, or framework elements where each element can fail independently. + +## RESTEasy: case-insensitive `Content-Disposition` lookup + +`MultivaluedMapImpl` (used internally by RESTEasy) extends `HashMap` — `get(key)` is case-sensitive. RFC 2045 defines MIME headers as case-insensitive. If a client sends `content-disposition` in lowercase, `headers.get("Content-Disposition")` returns `null` and all filenames for that part are lost. + +```java +String cd = null; +for (Map.Entry> entry : headers.entrySet()) { + if ("content-disposition".equalsIgnoreCase(entry.getKey())) { + List values = entry.getValue(); + if (values != null && !values.isEmpty()) { + cd = values.get(0); + } + break; + } +} +``` + +This also applies to `Content-Type` lookups in any framework map that does not guarantee case-insensitive access. + +## RESTEasy: `Content-Disposition` parser edge cases + +`MultipartHelper.filenameFromContentDisposition()` handles these non-obvious cases: + +| Case | Behaviour | +|---|---| +| Semicolon inside quoted filename: `filename="shell;evil.php"` | Extracts `shell;evil.php` (does not split on `;`) | +| MIME linear whitespace after `;`: `; \tfilename=` | Skips SP and HT before the parameter name | +| Optional whitespace around `=`: `filename = "f.php"` | Skips SP/HT before and after `=` | +| `filename*` (RFC 5987) must not match: `filename*=UTF-8''f.php` | Lookahead index `j` distinguishes `filename` from `filename*` | +| Escape inside quoted string: `filename="file\"name.php"` | Unescaped: returns `file"name.php` | +| Empty value: `filename=""` or `filename=` | Returns `null` — do not add to the WAF list | + +Jersey does not need a custom parser — it uses `FormDataContentDisposition.getFileName()` from the Jersey public API. + +## Servlet containers: blocking ordering in `tryBlock()` + +When implementing a `tryBlock()` helper for Servlet-based frameworks (Jersey, RESTEasy, Tomcat), the ordering inside the helper must be: + +```java +brf.tryCommitBlockingResponse(ctx.getTraceSegment(), rba); +BlockingException be = new BlockingException(message); // 1. create exception +ctx.getTraceSegment().effectivelyBlocked(); // 2. mark blocked +return be; +``` + +`new BlockingException` before `effectivelyBlocked()`: if `effectivelyBlocked()` throws (possible if the span is already finished) and the advice has `suppress = Throwable.class`, the exception object must already exist. Creating it beforehand guarantees the caller receives a non-null value regardless of what happens next. + +See [blocking-patterns.md](blocking-patterns.md) for Netty's different behaviour. + +## RESTEasy: try-with-resources in `readContent()` + +`InputPart.getBody(InputStream.class, null)` may return a `FileInputStream` when the part exceeds RESTEasy's in-memory threshold (configurable via `resteasy.multipart.input.max.child.bytes`, default ~10 MB). Without try-with-resources, the file descriptor leaks until GC. + +```java +try (InputStream is = inputPart.getBody(InputStream.class, null)) { + if (is == null) return ""; + return MultipartContentDecoder.readInputStream(is, MAX_CONTENT_BYTES, contentType); +} catch (IOException ignored) { + return ""; +} +``` + +## Three-callback ordering: body + filenames + content + +When all three callbacks coexist in a single advice: + +1. `requestBodyProcessed` (form fields) — always fires when map is non-empty. +2. `requestFilesFilenames` — always fires even if body blocked (`pendingBlock` pattern). +3. `requestFilesContent` — fires **only if** `t == null` (sequential gate). + +```java +// Guard — obtain all three upfront +BiFunction<...> bodyCallback = cbp.getCallback(EVENTS.requestBodyProcessed()); +BiFunction<...> filenamesCallback = cbp.getCallback(EVENTS.requestFilesFilenames()); +BiFunction<...> contentCallback = cbp.getCallback(EVENTS.requestFilesContent()); +if (bodyCallback == null && filenamesCallback == null && contentCallback == null) return; + +// body — fires always +if (bodyCallback != null && !map.isEmpty()) { bodyCallback.apply(...); tryBlock(...); } + +// filenames — fires always (independent WAF address) +if (filenamesCallback != null && !filenames.isEmpty()) { filenamesCallback.apply(...); tryBlock(...); } + +// content — fires only if not already blocked +if (t == null && contentCallback != null) { + List contents = readContents(...); + if (!contents.isEmpty()) { contentCallback.apply(...); tryBlock(...); } +} +``` + +Body and filenames are independent WAF addresses — a WAF config with only filenames rules will have `bodyCallback == null`. Silencing filenames because body already blocked deprives the WAF of information. + +## Jetty 8.x: manual `Content-Disposition` parsing + +`getSubmittedFileName()` (Servlet 3.1) is not available in Jetty 8.x (Servlet 3.0). Filenames must be extracted by manually parsing the `Content-Disposition` header from each `Part`. + +Exit advice on `getParts()` is the correct instrumentation point for Jetty 8.x — there is no `parseParts()` equivalent to intercept. + +## Jetty 9.4 / 10: muzzle discriminator + +The field `_dispatcherType: Ljavax/servlet/DispatcherType;` distinguishes Jetty 9.4/10.x (javax namespace) from Jetty 11+ (jakarta namespace). Use it as the muzzle reference rather than trying to match on API version alone. + +## GlassFish / Payara: no reflection via `ParameterCollector` + +The Java 11+ module system blocks reflective access from unnamed modules (agent helpers) to named modules (GlassFish's `PartItem`). `Method.invoke()` and `setAccessible(true)` fail silently — `IllegalAccessException` is swallowed by `catch (Exception)` in the helper, and data never reaches the WAF. + +**Correct approach:** the advice inlines access to `PartItem` via a direct cast to `javax.servlet.http.Part` (the standard Servlet interface). Method dispatch goes through the interface — no reflection needed. + +Only helpers that reference exclusively bootstrap or Servlet API types can be injected via `helperClassNames()` without module access issues. `GlassFishBlockingHelper` is a correct example: it only uses `HttpServletRequest`, `HttpServletResponse` (Servlet API), and `BlockingActionHelper` (bootstrap). + +`compileOnly javax.servlet:javax.servlet-api:3.1.0` is required — `Part.getSubmittedFileName()` is Servlet 3.1 (GlassFish 4+); the default Tomcat 7.x dependency only includes Servlet 3.0. diff --git a/docs/appsec/rasp-file-io.md b/docs/appsec/rasp-file-io.md new file mode 100644 index 00000000000..b5d7ca82ffb --- /dev/null +++ b/docs/appsec/rasp-file-io.md @@ -0,0 +1,74 @@ +# RASP File I/O — CSI Pattern + +How to implement RASP call sites for `java.io` / `java.nio.file` operations. + +## Call site structure + +```java +@CallSite( + spi = {RaspCallSites.class}, // required — registers as RASP call site + helpers = FileIORaspHelper.class) // required — injects helper into target classloader +public class FilesCallSite { + + @CallSite.Before("fully.qualified.MethodSignature(java.nio.file.Path, ...)") + public static void beforeXxx(@CallSite.Argument(N) @Nullable final Path path) { + if (path != null) { + FileIORaspHelper.INSTANCE.beforeFileLoaded(path.toString()); + } + } +} +``` + +Both `spi` and `helpers` are mandatory: +- Without `spi = {RaspCallSites.class}`: the call site is not activated when RASP is enabled. +- Without `helpers = FileIORaspHelper.class`: `NoClassDefFoundError` at runtime when the helper is accessed. + +## `@CallSite.Before` descriptor rules + +- Always use fully-qualified type names: `java.nio.file.Path`, not `Path`. +- Varargs are declared as arrays: `java.nio.file.CopyOption[]`, not `CopyOption...`. ByteBuddy always sees the array form in bytecode. + +## Operation → event mapping + +| `Files.*` operation | `beforeFileLoaded` (LFI) | `beforeFileWritten` | +|---|---|---| +| `copy(InputStream, Path, ...)` | no | yes (target) | +| `copy(Path, Path, ...)` | yes (source) | yes (target) — dual event | +| `copy(Path, OutputStream)` | yes (source) | no (OutputStream ≠ filesystem path) | +| `move(Path, Path, ...)` | no | yes (target) | +| `read*`, `newInputStream`, `lines*` | yes | no | +| `write*`, `newOutputStream`, `newBufferedWriter` | no | yes | + +`Files.copy(Path, Path)` is the only dual-event case in this module. + +## `FileIORaspHelper` contract + +- `INSTANCE` is `public static` (not final) — replaceable in tests with a mock. +- `invokeRaspCallback` has three early returns: `!isAppSecRaspEnabled()`, `callback == null`, `span == null / ctx == null`. +- `BlockingException` is always re-thrown (separate catch from the general `Throwable` catch). +- Any other `Throwable` is suppressed with `LOGGER.debug` — RASP must never crash the application. + +## Testing + +```groovy +void 'test RASP Files.copy path to path'() { + setup: + final helper = Mock(FileIORaspHelper) + FileIORaspHelper.INSTANCE = helper + final source = newFile('src.txt').toPath() // newFile() creates file on disk — required for reads + final target = temporaryFolder.resolve('dst.txt') // resolve() → path without creating — required for writes + + when: + TestFilesSuite.copyPathToPath(source, target) + + then: + 1 * helper.beforeFileLoaded(source.toString()) + 1 * helper.beforeFileWritten(target.toString()) +} +``` + +Rules: +- Read operations → `newFile(name).toPath()` (file must exist on disk). +- Write operations → `temporaryFolder.resolve(name)` (empty path; the operation creates it). +- Always add a negative assertion `0 * helper.beforeFileWritten(_)` when the operation should not fire a write event. +- Spock automatically restores the `INSTANCE` mock after each test — no manual cleanup needed. diff --git a/docs/appsec/waf-api.md b/docs/appsec/waf-api.md new file mode 100644 index 00000000000..b3b04b06031 --- /dev/null +++ b/docs/appsec/waf-api.md @@ -0,0 +1,286 @@ +# WAF API Reference — libddwaf-java + +Reference for the `libddwaf-java` binding used by the AppSec module. Based on libddwaf-java v1.30.0 / libddwaf v1.30.0. + +## Lifecycle + +``` +Waf.initialize(false) // loads native .so (singleton, irreversible) + ↓ +WafBuilder builder = new WafBuilder(config) +builder.addOrUpdateConfig("path", rulesMap) // → WafDiagnostics + ↓ +WafHandle handle = builder.buildWafHandleInstance() +builder.close() + ↓ +WafContext ctx = new WafContext(handle) // one per HTTP request + ↓ +Waf.ResultWithData r = ctx.run(persistentData, limits, metrics) +ctx.runEphemeral(ephemeralData, limits, metrics) + ↓ +ctx.close() // at end of request +handle.close() // when ruleset is reloaded +``` + +**Lifecycle rules:** +- `WafContext` is `Closeable` — use try-with-resources or close in `finally` +- `WafHandle` is thread-safe for reads (ReadWriteLock); write (close) is exclusive +- `WafBuilder.close()` may be called after `buildWafHandleInstance()` without affecting the handle +- Existing contexts continue working after `handle.close()` (reference counting) +- `Waf.deinitialize()` releases JNI — **not thread-safe**; crashes if threads are using the WAF + +--- + +## Java types accepted as address values + +The `Map` passed to `ctx.run()` maps address names to values: + +| Java type | WAF type (`ddwaf_object`) | Notes | +|---|---|---| +| `null` | `DDWAF_OBJ_NULL` | Serialized as explicit null | +| `String` | `DDWAF_OBJ_STRING` | UTF-16 → UTF-8. Truncated to `maxStringSize` UTF-16 chars | +| `CharSequence` (any) | `DDWAF_OBJ_STRING` | Same as String | +| `CharBuffer` (heap or direct) | `DDWAF_OBJ_STRING` | Zero-copy for direct `ByteOrder.nativeOrder()` | +| `Integer`, `Long`, `Short`, `Byte`, `AtomicInteger`, `BigInteger` | `DDWAF_OBJ_SIGNED` (int64_t) | Any non-float Number | +| `Double`, `Float`, `BigDecimal` | `DDWAF_OBJ_FLOAT` (double IEEE 754) | | +| `Boolean` | `DDWAF_OBJ_BOOL` | | +| `Map` | `DDWAF_OBJ_MAP` | Keys converted to String via `toString()`. Recursive | +| `Collection` (List, Set, etc.) | `DDWAF_OBJ_ARRAY` | Recursive | +| Native arrays (`String[]`, `int[]`, etc.) | `DDWAF_OBJ_ARRAY` | Recursive | +| Any `Iterable` | `DDWAF_OBJ_ARRAY` | Recursive | +| Any other object | `DDWAF_OBJ_NULL` | **Silent null — no exception thrown** | + +**Edge cases:** +- Empty strings `""`: valid, serialized as `DDWAF_OBJ_STRING` of length 0 +- `null` keys in a Map: converted to `""` (empty string) +- Incomplete UTF-16 surrogates: replaced with `U+FFFD` (no exception) +- Unknown objects (`new Object()`): silently become `DDWAF_OBJ_NULL` +- `ConcurrentModificationException` during iteration: propagates as Java exception + +--- + +## Limits (`Waf.Limits`) + +```java +Waf.Limits limits = new Waf.Limits( + int maxDepth, // max nesting depth + int maxElements, // max total nodes (maps + arrays, cumulative) + int maxStringSize, // max UTF-16 chars per string (keys AND values) + long generalBudgetInUs, // total microseconds (serialization + execution) + long runBudgetInUs // microseconds for ddwaf_run() (0 = no separate limit) +); +``` + +**Values used in dd-trace-java** (see `AppSecSystem` / `WafInitialization`): +- `maxDepth = 20` +- `maxElements = 150` or `255` (version-dependent) +- `maxStringSize = 4096` +- `generalBudgetInUs`: configurable via `DD_APPSEC_WAF_TIMEOUT` (default: 10 000 µs) + +**Truncation behavior** (tracked in `WafMetrics`): + +| Limit exceeded | Effect | Metric | +|---|---|---| +| `maxDepth` | Object replaced with empty map | `truncatedObjectTooDeepCount` | +| `maxElements` | Map/array replaced with empty map | `truncatedListMapTooLargeCount` | +| `maxStringSize` | String truncated | `truncatedStringTooLongCount` | +| `generalBudgetInUs` | `TimeoutWafException` | — | + +`maxElements` is a **global counter per `run()` call**, decremented for each node (maps, arrays, and their elements). A map with 2 keys costs 3 (the map itself + 2 elements). + +`maxStringSize` is in **UTF-16 chars, not bytes**. An emoji (e.g. `👍`) is 2 `char` units; if `maxStringSize=1` the emoji is truncated to empty. + +--- + +## Persistent vs. ephemeral data + +`WafContext` accepts two data slots per `run()`: + +| | Persistent | Ephemeral | +|---|---|---| +| Storage | Cached in context across calls | Only for the current call | +| Exclusion caching | Yes | No (always re-evaluated) | +| Typical use | HTTP headers, IP, URI, body | gRPC messages, streaming payloads | +| Released | On `ctx.close()` | At end of `ctx.run()` | + +Persistent data **accumulates** — if run 1 passes `server.request.headers` and run 2 passes `server.request.body`, the context has both in run 2. Rules that depend on both addresses are only evaluated once all their data is available. + +gRPC messages must be passed as **ephemeral** data (`runEphemeral()`). Passing them as persistent means only the first message is evaluated; subsequent messages show no new data. + +--- + +## WAF addresses — full catalogue + +Declared in `KnownAddresses.java`. + +### HTTP request + +| Address | Expected type | Notes | +|---|---|---| +| `server.request.method` | `String` | "GET", "POST", etc. | +| `server.request.uri.raw` | `String` | Full URI | +| `server.request.uri` | `Map` | Generated by `uri_parse` preprocessor. Keys: scheme, userinfo, host, port, path, query, fragment | +| `server.request.headers.no_cookies` | `Map>` | Headers without cookies. Keys lowercase | +| `server.request.cookies` | `Map` | | +| `server.request.query` | `Map>` | | +| `server.request.path_params` | `Map` | | +| `server.request.body` | `Map` or `String` | Parsed or raw body | +| `server.request.body.raw` | `String` | Unparsed body (input for processors) | +| `server.request.transport` | `String` | "http", "https", "grpc" | +| `server.request.jwt` | `Map` | Decoded JWT (generated by `jwt_decode` processor) | + +### HTTP response + +| Address | Expected type | Notes | +|---|---|---| +| `server.response.status` | `Integer` (unsigned) | Numeric HTTP code — **not** a String | +| `server.response.body` | `String` | | +| `server.response.headers.no_cookies` | `Map>` | | + +### Client and user + +| Address | Expected type | +|---|---| +| `http.client_ip` | `String` | +| `usr.id` | `String` | +| `usr.session_id` | `String` | + +### gRPC + +| Address | Expected type | Persistence | +|---|---|---| +| `grpc.server.request.message` | `Map` | **Ephemeral** | +| `grpc.server.request.metadata` | `Map>` | Persistent | + +### RASP + +| Address | Expected type | When | +|---|---|---| +| `server.db.statement` | `String` | Before executing SQL | +| `server.db.system` | `String` | "mysql", "postgresql", etc. | +| `server.io.fs.file` | `String` | File path being accessed | +| `server.io.net.url` | `String` | Outbound request URL | +| `server.sys.shell.cmd` | `String` | Shell command | + +### Business logic + +| Address | Expected type | +|---|---| +| `server.business_logic.users.login.success` | `Map` | +| `server.business_logic.users.login.failure` | `Map` | + +### Processor activator + +| Address | Value | +|---|---| +| `waf.context.processor` | `Map{"fingerprint": true}` — activates fingerprint generation | + +### Processor outputs (returned in `ResultWithData.attributes`, not passed as input) + +- `_dd.appsec.s.req.body` — body schema (gzip+base64) +- `_dd.appsec.fp.http.endpoint` — endpoint fingerprint +- `_dd.appsec.fp.http.header` — header fingerprint +- `_dd.appsec.fp.http.network` — network fingerprint +- `_dd.appsec.fp.session` — session fingerprint + +**Never pass these as input addresses** — they are outputs from `ResultWithData.attributes`. + +`server.request.headers.no_cookies` is **without cookies**. Passing all headers (including cookies) to this address is a bug; cookies go in `server.request.cookies`. + +`server.response.status` must be an `Integer`, not a String `"200"`. Changed from String to Integer in v1.28.0. + +--- + +## Result: `Waf.ResultWithData` + +```java +result.result // Waf.Result.OK or Waf.Result.MATCH +result.data // String JSON with events (null if OK) +result.actions // Map>: actions to execute +result.attributes // Map: WAF-derived data (schemas, fingerprints) +result.keep // boolean: whether to keep the trace +result.duration // long: nanoseconds of native execution +result.events // boolean: whether events were generated +``` + +**Known actions:** +- `block_request`: `{status_code: int, type: "json"|"html"|"auto", grpc_status_code: int}` +- `redirect_request`: `{status_code: int, location: String}` +- `generate_stack`: generate stack trace +- `generate_schema`: activate schema extraction + +From v1.29.0, each blocking action includes a `block_id` correlating with the event in `data`. + +--- + +## `WafDiagnostics` — result of `addOrUpdateConfig()` + +```java +WafDiagnostics d = builder.addOrUpdateConfig("path", map); + +d.getNumConfigOK() // int +d.getNumConfigError() // int +d.getAllErrors() // Map> +d.getRulesetVersion() // String +d.isWellStructured() // boolean +``` + +Available sections: `rules`, `customRules`, `rulesData`, `rulesOverride`, `exclusions`, `exclusionData`, `actions`, `processors`, `scanners`. + +--- + +## `WafMetrics` — execution metrics + +```java +WafMetrics m = new WafMetrics(); +ctx.run(data, limits, m); + +m.getTotalRunTimeNs() // total: serialization + native WAF +m.getTotalDdwafRunTimeNs() // native WAF only +m.getTruncatedStringTooLongCount() +m.getTruncatedListMapTooLargeCount() +m.getTruncatedObjectTooDeepCount() +``` + +Serialization time = `totalRunTimeNs - totalDdwafRunTimeNs`. + +--- + +## Obfuscation — `WafConfig` + +```java +WafConfig config = new WafConfig(); +config.obfuscatorKeyRegex = "(?i)pass|pwd|secret|api.key|token|..."; +config.obfuscatorValueRegex = "..."; +// To disable obfuscation: +config.obfuscatorKeyRegex = ""; +``` + +Obfuscated data appears as `` in `result.data`. + +--- + +## Schema extraction + +Activated by the `extract_schema` processor in the ruleset. Results are returned in `ResultWithData.attributes` under keys like `_dd.appsec.s.req.body`, already gzip-compressed and base64-encoded. Write them to the span as-is — do not decompress. + +**Type codes in schema:** + +| Code | Type | +|---|---| +| 1 | Null | +| 2 | Boolean | +| 4 | Integer | +| 8 | String | +| 16 | Float/Double | + +--- + +## Version history + +| Version | Changes | +|---|---| +| v1.30.0 | No breaking Java API changes | +| v1.29.0 | `block_id` in block/redirect actions; `processor_overrides` are now incremental | +| v1.28.0 | New address `server.request.uri` (from `uri_parse`); `server.response.status` changed from String to Integer | +| v1.28.1 | Float support in status codes for ruleset compatibility |