diff --git a/AGENTS.md b/AGENTS.md index e09dff0d473..4cbdb6d953f 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -39,6 +39,7 @@ docs/ Developer documentation (see below) | Working with Gradle | [docs/how_to_work_with_gradle.md](docs/how_to_work_with_gradle.md) | | Bootstrap/premain constraints | [docs/bootstrap_design_guidelines.md](docs/bootstrap_design_guidelines.md) | | CI/CD workflows | [.github/workflows/README.md](.github/workflows/README.md) | +| AppSec: blocking, WAF API, IG events | [docs/appsec/](docs/appsec/) | **When working on a topic above, read the linked file first** — they are the source of truth maintained by humans. diff --git a/dd-java-agent/appsec/AGENTS.md b/dd-java-agent/appsec/AGENTS.md new file mode 100644 index 00000000000..17cec86a68e --- /dev/null +++ b/dd-java-agent/appsec/AGENTS.md @@ -0,0 +1,65 @@ +# AppSec Module + +Extended reference: [docs/appsec/](../../docs/appsec/) + +## Adding a new WAF address: 4-file checklist + +Every new WAF address requires changes in exactly 4 files. Missing any one causes a silent no-op: + +1. `Events.java` -- declare the event type with a unique sequential ID +2. `InstrumentationGateway.java` -- register the callback type in `getCallback()` +3. `KnownAddresses.java` -- declare the `Address` constant and add to `fromString()` switch +4. `GatewayBridge.java` -- subscriber field, `init()` registration, `reset()` null, handler method, `DATA_DEPENDENCIES` entry + +`GatewayBridgeIGRegistrationSpecification` verifies that every `DATA_DEPENDENCIES` entry registers +its events in `init()`. If you add a `DATA_DEPENDENCIES` entry but forget the `registerCallback()` +call, that test fails with an explanation. + +`KnownAddressesSpecificationForkedTest` verifies that every address is in `fromString()` and that +the total count is up to date. + +See [docs/appsec/ig-events.md](../../docs/appsec/ig-events.md) for the full handler pattern including the retry-on-expiry loop. + +## `KnownAddressesSpecificationForkedTest` instance count + +Every new `Address` declared in `KnownAddresses.java` must be accompanied by incrementing the +expected count in `KnownAddressesSpecificationForkedTest`: + +```groovy +Address.instanceCount() == 47 // increment by 1 per new Address +``` + +The test fails if this is not updated. + +## WAF API quick reference + +- `server.response.status` must be `Integer`, not `String` (changed in libddwaf v1.28.0) +- gRPC messages must use `runEphemeral()`, not `run()` -- persistent data caches only the first message +- Processor outputs (`_dd.appsec.s.req.body`, fingerprints) come from `ResultWithData.attributes` -- never pass them as input addresses +- `server.request.headers.no_cookies` must not include cookies -- cookies go in `server.request.cookies` + +Full type mapping and limits: [docs/appsec/waf-api.md](../../docs/appsec/waf-api.md) + +## Callback guard ordering in advice + +Fetch ALL callbacks before any early-return guard. A callback fetched after a conditional return +is silently skipped when only that callback is registered: + +```java +// Correct +BiFunction<...> bodyCallback = cbp.getCallback(EVENTS.requestBodyProcessed()); +BiFunction<...> filenamesCallback = cbp.getCallback(EVENTS.requestFilesFilenames()); +BiFunction<...> contentCallback = cbp.getCallback(EVENTS.requestFilesContent()); +if (bodyCallback == null && filenamesCallback == null && contentCallback == null) return; +``` + +## Blocking model per framework + +| Framework | Call `effectivelyBlocked()`? | Notes | +|---|---|---| +| Netty | NEVER | `BlockingResponseHandler` does it internally | +| Tomcat / Jersey / Jetty | YES, after `t = new BlockingException(...)` | Inside `if (brf != null)` | +| Undertow | YES, but `tryCommitBlockingResponse` is not idempotent | Guard `t == null` before each call | +| Vert.x | NO | Handler-based model | + +Full details: [docs/appsec/blocking-patterns.md](../../docs/appsec/blocking-patterns.md) diff --git a/dd-java-agent/appsec/src/test/groovy/com/datadog/appsec/gateway/GatewayBridgeIGRegistrationSpecification.groovy b/dd-java-agent/appsec/src/test/groovy/com/datadog/appsec/gateway/GatewayBridgeIGRegistrationSpecification.groovy index 617240790cf..d05c16493e9 100644 --- a/dd-java-agent/appsec/src/test/groovy/com/datadog/appsec/gateway/GatewayBridgeIGRegistrationSpecification.groovy +++ b/dd-java-agent/appsec/src/test/groovy/com/datadog/appsec/gateway/GatewayBridgeIGRegistrationSpecification.groovy @@ -2,6 +2,7 @@ package com.datadog.appsec.gateway import com.datadog.appsec.event.EventDispatcher import com.datadog.appsec.event.data.KnownAddresses +import datadog.trace.api.gateway.EventType import datadog.trace.api.gateway.Events import datadog.trace.api.gateway.SubscriptionService import datadog.trace.test.util.DDSpecification @@ -45,4 +46,41 @@ class GatewayBridgeIGRegistrationSpecification extends DDSpecification { then: 1 * ig.registerCallback(Events.get().requestFilesContent(), _) } + + // Coherence test: every entry in IGAppSecEventDependencies.DATA_DEPENDENCIES must have its + // events registered in GatewayBridge.init() when the corresponding address is subscribed. + // If you add a new DATA_DEPENDENCIES entry without the matching registerCallback() call in + // init(), this test fails with a message showing which event was missing. + void 'DATA_DEPENDENCIES: events for address #addressKey are registered when address is subscribed'() { + given: + List> registeredEvents = [] + SubscriptionService mockSvc = Mock() + EventDispatcher mockDispatcher = Mock() + _ * mockDispatcher.allSubscribedDataAddresses() >> [address] + mockSvc.registerCallback(_, _) >> { EventType et, Object cb -> registeredEvents << et } + + GatewayBridge testBridge = new GatewayBridge(mockSvc, mockDispatcher, null, null, []) + testBridge.init() + + expect: + events.every { event -> + assert registeredEvents.contains(event), + "Event '${event.type}' not registered for address '${address.key}'. " + + "Add registerCallback(EVENTS.${event.type}, ...) to GatewayBridge.init() " + + "inside an additionalIGEvents.contains() guard." + true + } + + where: + [address, events] << dataDependenciesEntries() + addressKey = address.key + } + + private static List dataDependenciesEntries() { + def innerClass = GatewayBridge.declaredClasses.find { it.simpleName == 'IGAppSecEventDependencies' } + def field = innerClass.getDeclaredField('DATA_DEPENDENCIES') + field.accessible = true + Map map = (Map) field.get(null) + return map.collect { addr, evts -> [addr, evts as List] } + } } diff --git a/dd-java-agent/instrumentation/akka/akka-http/AGENTS.md b/dd-java-agent/instrumentation/akka/akka-http/AGENTS.md new file mode 100644 index 00000000000..1de9ff00341 --- /dev/null +++ b/dd-java-agent/instrumentation/akka/akka-http/AGENTS.md @@ -0,0 +1,30 @@ +# Akka HTTP AppSec Instrumentation + +Extended reference: [docs/appsec/ig-events.md](../../../../docs/appsec/ig-events.md), +[docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md) + +## Two multipart routes must both be instrumented + +Akka HTTP has two independent entry points for multipart form data. Any new WAF address that +captures multipart body data must instrument both: + +| Route | Entry point | Notes | +|---|---|---| +| Route 1 | `handleMultipartStrictFormData(Multipart$FormData$Strict)` | Has `reqCtx` as local variable; iterates via `getStrictParts()` | +| Route 2 | `handleStrictFormData(StrictForm)` | No `reqCtx` in scope; must obtain via `activeSpan()` | + +If only Route 1 is instrumented, multipart requests processed via `formFieldMultiMap` (Route 2) +silently miss the WAF event. + +## Do not extract filenames callback dispatch into `UnmarshallerHelpers` + +Keep the `requestFilesFilenames` callback dispatch inline in each advice class. The two routes +obtain `reqCtx` differently and handle the `pendingBlock` state separately. When the dispatch was +previously extracted into a shared helper, the filenames callback was silently skipped on Route 2 +when the body had already triggered blocking, because the helper did not propagate the pending +`BlockingException` correctly across the two call sites. + +## No `effectivelyBlocked()` in Akka advice + +Akka HTTP uses the Netty-style blocking model. Do not call `effectivelyBlocked()` in advice. +See [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md#netty-never-effectivelyblocked). diff --git a/dd-java-agent/instrumentation/jetty/jetty-appsec/AGENTS.md b/dd-java-agent/instrumentation/jetty/jetty-appsec/AGENTS.md new file mode 100644 index 00000000000..8c3264a4078 --- /dev/null +++ b/dd-java-agent/instrumentation/jetty/jetty-appsec/AGENTS.md @@ -0,0 +1,3 @@ +# Jetty AppSec Instrumentation + +Extended reference: [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md) diff --git a/dd-java-agent/instrumentation/jetty/jetty-appsec/jetty-appsec-9.3/src/main/java/datadog/trace/instrumentation/jetty93/RequestExtractContentParametersInstrumentation.java b/dd-java-agent/instrumentation/jetty/jetty-appsec/jetty-appsec-9.3/src/main/java/datadog/trace/instrumentation/jetty93/RequestExtractContentParametersInstrumentation.java index 3e1e2bf6d5c..f81a6f4bd6e 100644 --- a/dd-java-agent/instrumentation/jetty/jetty-appsec/jetty-appsec-9.3/src/main/java/datadog/trace/instrumentation/jetty93/RequestExtractContentParametersInstrumentation.java +++ b/dd-java-agent/instrumentation/jetty/jetty-appsec/jetty-appsec-9.3/src/main/java/datadog/trace/instrumentation/jetty93/RequestExtractContentParametersInstrumentation.java @@ -89,6 +89,8 @@ static void after( if (action instanceof Flow.Action.RequestBlockingAction) { Flow.Action.RequestBlockingAction rba = (Flow.Action.RequestBlockingAction) action; BlockResponseFunction blockResponseFunction = reqCtx.getBlockResponseFunction(); + // Unlike Netty (which calls effectivelyBlocked() internally in BlockingResponseHandler), + // Jetty advice must call it explicitly inside if (brf != null). if (blockResponseFunction != null) { blockResponseFunction.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); if (t == null) { diff --git a/dd-java-agent/instrumentation/netty/netty-4.1/AGENTS.md b/dd-java-agent/instrumentation/netty/netty-4.1/AGENTS.md new file mode 100644 index 00000000000..0aa0ac5a14a --- /dev/null +++ b/dd-java-agent/instrumentation/netty/netty-4.1/AGENTS.md @@ -0,0 +1,57 @@ +# Netty 4.1 AppSec Instrumentation + +Extended reference: [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md) + +## NEVER call `effectivelyBlocked()` in Netty advice + +`effectivelyBlocked()` must not appear in any Netty advice class or helper. `BlockingResponseHandler` calls it internally when the blocking response is committed. Calling it in advice causes a double-invocation and closes the span prematurely. + +This applies even to the urlencoded body-processed path in `HttpPostRequestDecoderInstrumentation` -- the span is already closed synchronously by `tryCommitBlockingResponse()` before advice can reach the mark. + +Correct pattern: +```java +// Netty advice — blocking path +if (brf != null) { + brf.tryCommitBlockingResponse(ctx.getTraceSegment(), rba); + thr = new BlockingException("..."); + // do NOT call ctx.getTraceSegment().effectivelyBlocked() +} +``` + +## `thr` creation must be inside `if (brf != null)` + +If `thr = new BlockingException(...)` is placed outside the `brf != null` guard, it is created even when no blocking action exists. The exception is then thrown unconditionally in `@Advice.OnMethodExit(suppress = Throwable.class, onThrowable = Throwable.class)`, producing spurious 500 responses. + +## FileUpload content: two-branch read + +`HttpData` stores uploads either in memory or on disk depending on size. Always branch on `isInMemory()`: + +```java +if (fileUpload.isInMemory()) { + ByteBuf buf = fileUpload.getByteBuf(); + // read from buf +} else { + File f = fileUpload.getFile(); + try (InputStream is = new FileInputStream(f)) { + // read from stream + } +} +``` + +Never call `getFile()` on an in-memory upload -- the file does not exist on disk and the call throws. + +## `@RequiresRequestContext` + `Config.get()` causes muzzle failure + +Do not declare `static final` fields initialized from `Config.get()` in any `@RequiresRequestContext`-annotated advice inner class. Muzzle validates those classes against the instrumented library's classpath (Netty), where `Config` is absent, causing `MuzzleValidationException` in CI. + +Move such constants to the helper class declared in `helperClassNames()`: + +```java +// Wrong -- in @RequiresRequestContext advice inner class +private static final int MAX_FILES = Config.get().getAppSecMaxFileContentCount(); + +// Correct -- in NettyMultipartHelper or similar helper +public class NettyMultipartHelper { + static final int MAX_FILES_TO_INSPECT = Config.get().getAppSecMaxFileContentCount(); +} +``` diff --git a/dd-java-agent/instrumentation/tomcat/tomcat-appsec/AGENTS.md b/dd-java-agent/instrumentation/tomcat/tomcat-appsec/AGENTS.md new file mode 100644 index 00000000000..390630acaa9 --- /dev/null +++ b/dd-java-agent/instrumentation/tomcat/tomcat-appsec/AGENTS.md @@ -0,0 +1,48 @@ +# Tomcat AppSec Instrumentation + +Extended reference: [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md), +[docs/appsec/file-content.md](../../../../docs/appsec/file-content.md), +[docs/appsec/multipart-frameworks.md](../../../../docs/appsec/multipart-frameworks.md) + +## MUST call `effectivelyBlocked()` in Tomcat blocking path + +Unlike Netty, Tomcat advice must call `effectivelyBlocked()` explicitly. The call must come AFTER +`t = new BlockingException(...)` and both must be inside `if (brf != null)`: + +```java +if (brf != null) { + brf.tryCommitBlockingResponse(ctx.getTraceSegment(), rba); + t = new BlockingException("..."); // 1. create first + ctx.getTraceSegment().effectivelyBlocked(); // 2. mark after +} +``` + +If `effectivelyBlocked()` throws (span already finished), the exception object must already exist +so the advice can still return a non-null value and let the container propagate it. + +## `Part` access via reflection: GlassFish restriction does NOT apply here + +Tomcat's `ParameterCollector` uses reflection to avoid bytecode references to `javax` vs `jakarta` +Part types. This is safe for Tomcat because the agent classloader is not subject to Java 9+ module +access restrictions for Tomcat's classloader. + +For GlassFish, direct cast via `javax.servlet.http.Part` interface is required instead. +See [docs/appsec/multipart-frameworks.md](../../../../docs/appsec/multipart-frameworks.md#glassfishpayara-no-reflection-via-parametercollector). + +## `inspectContent` flag must be evaluated before iterating parts + +Check whether the `requestFilesContent` callback is registered BEFORE iterating parts, not inside +the loop. This avoids calling `getInputStream()` on every file when no rule uses `files_content`: + +```java +boolean inspectContent = cbp.getCallback(EVENTS.requestFilesContent()) != null; +ParameterCollector collector = new ParameterCollector.ParameterCollectorImpl(inspectContent); +for (Object part : parts) { + collector.addPart(part); +} +``` + +## `helperClassNames()` must list all nested inner classes + +ByteBuddy does not auto-discover nested classes. Every `$Inner` class used at runtime must be +explicitly listed in `helperClassNames()` or a `NoClassDefFoundError` will occur silently. diff --git a/dd-java-agent/instrumentation/undertow/AGENTS.md b/dd-java-agent/instrumentation/undertow/AGENTS.md new file mode 100644 index 00000000000..907359cfd47 --- /dev/null +++ b/dd-java-agent/instrumentation/undertow/AGENTS.md @@ -0,0 +1,37 @@ +# Undertow AppSec Instrumentation + +Extended reference: [docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md), +[docs/appsec/ig-events.md](../../../../docs/appsec/ig-events.md) + +## `tryCommitBlockingResponse` is not idempotent in Undertow + +Unlike Tomcat/Jersey, Undertow's `tryCommitBlockingResponse` commits the response immediately. +Calling it twice sends a duplicate response and can throw an `IllegalStateException`. + +When an advice has two blocking paths (body + filenames), the `t == null` guard must be checked +BEFORE each `tryCommitBlockingResponse` call, not after: + +```java +// Wrong -- commits response even if already blocked +if (bodyCallback != null && !map.isEmpty()) { + Flow flow = bodyCallback.apply(ctx, map); + t = tryBlock(flow, ctx); +} +if (filenamesCallback != null && !filenames.isEmpty()) { + Flow flow = filenamesCallback.apply(ctx, filenames); + t = tryBlock(flow, ctx); // tryBlock calls tryCommitBlockingResponse again +} + +// Correct -- guard prevents second commit +if (bodyCallback != null && !map.isEmpty()) { + Flow flow = bodyCallback.apply(ctx, map); + t = tryBlock(flow, ctx); +} +if (t == null && filenamesCallback != null && !filenames.isEmpty()) { + Flow flow = filenamesCallback.apply(ctx, filenames); + t = tryBlock(flow, ctx); +} +``` + +See [docs/appsec/ig-events.md](../../../docs/appsec/ig-events.md#undertow-trycommitblockingresponse-is-not-idempotent) +for the full explanation of why this differs from other Servlet containers. diff --git a/dd-java-agent/instrumentation/undertow/undertow-2.0/src/main/java/datadog/trace/instrumentation/undertow/MultiPartUploadHandlerInstrumentation.java b/dd-java-agent/instrumentation/undertow/undertow-2.0/src/main/java/datadog/trace/instrumentation/undertow/MultiPartUploadHandlerInstrumentation.java index 52bd0b73897..019f7a7dc47 100644 --- a/dd-java-agent/instrumentation/undertow/undertow-2.0/src/main/java/datadog/trace/instrumentation/undertow/MultiPartUploadHandlerInstrumentation.java +++ b/dd-java-agent/instrumentation/undertow/undertow-2.0/src/main/java/datadog/trace/instrumentation/undertow/MultiPartUploadHandlerInstrumentation.java @@ -130,6 +130,8 @@ static void after( Flow.Action.RequestBlockingAction rba = (Flow.Action.RequestBlockingAction) filenamesAction; BlockResponseFunction brf = reqCtx.getBlockResponseFunction(); + // tryCommitBlockingResponse is not idempotent in Undertow: guard t == null before + // calling it to avoid committing a second block response when body already blocked. if (brf != null && t == null) { boolean success = brf.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); if (success) { diff --git a/dd-java-agent/instrumentation/vertx/vertx-web/AGENTS.md b/dd-java-agent/instrumentation/vertx/vertx-web/AGENTS.md new file mode 100644 index 00000000000..36a71f551e5 --- /dev/null +++ b/dd-java-agent/instrumentation/vertx/vertx-web/AGENTS.md @@ -0,0 +1,4 @@ +# Vert.x Web AppSec Instrumentation + +Extended reference: [docs/appsec/ig-events.md](../../../../docs/appsec/ig-events.md), +[docs/appsec/blocking-patterns.md](../../../../docs/appsec/blocking-patterns.md) diff --git a/docs/appsec/blocking-patterns.md b/docs/appsec/blocking-patterns.md new file mode 100644 index 00000000000..ff2a61507df --- /dev/null +++ b/docs/appsec/blocking-patterns.md @@ -0,0 +1,225 @@ +# AppSec Blocking Patterns + +How to correctly implement request blocking in advice classes. The pattern differs by server — getting it wrong causes silent failures where the block action is lost. + +## Netty 4.1 + +### `effectivelyBlocked()` — never call it in Netty advice + +`BlockingResponseHandler` calls `effectivelyBlocked()` internally as part of the blocking pipeline. When advice calls `tryCommitBlockingResponse()`, that triggers `BlockingResponseHandler`, which calls `effectivelyBlocked()` and finishes the span (`span.finish()`). A second call from advice on an already-finished span throws an exception. + +The advice has `@Advice.OnMethodExit(suppress = Throwable.class)`. With `suppress = Throwable.class`, the exception from `effectivelyBlocked()` is silently swallowed — the advice does not visibly fail. But if code follows `effectivelyBlocked()` (typically `thr = new BlockingException(...)`), that code **never executes** because the exception interrupted the flow. Result: the request is not aborted. + +**Correct pattern in Netty:** +```java +BlockResponseFunction brf = reqCtx.getBlockResponseFunction(); +if (brf != null) { + brf.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); + thr = new BlockingException("Blocked request"); + // DO NOT call effectivelyBlocked() here — BlockingResponseHandler already did it +} +``` + +### Exception: urlencoded body-processed path + +The body-processed (urlencoded) path in `HttpPostRequestDecoderInstrumentation` must **not** call `effectivelyBlocked()` even in the pattern above. In this path, `tryCommitBlockingResponse()` closes the span synchronously (in the Netty event loop). Calling `effectivelyBlocked()` afterwards raises `AssertionError: Interaction with TraceSegment after root span has already finished`, which is swallowed by `suppress = Throwable.class` and registered as an instrumentation error. + +The multipart paths (filenames, content) in the same advice **do** call `effectivelyBlocked()` within `if (brf != null)`. The asymmetry is intentional. + +| Path in `HttpPostRequestDecoderInstrumentation` | `effectivelyBlocked()` | +|---|---| +| body-processed (urlencoded) | no | +| filenames (multipart) | yes, inside `if (brf != null)` | +| content (multipart) | yes, inside `if (brf != null)` | + +### `thr` must be assigned inside `if (brf != null)` + +`thr = new BlockingException(...)` must be inside the `if (brf != null)` block. If assigned outside, it is set even when no blocking response function is available, causing the decoder to abort without having sent a blocking response. + +### Netty FileUpload content reading + +When reading bytes from `HttpData`/`FileUpload` for `files_content`, there are exactly two correct paths: + +```java +if (fileUpload.isInMemory()) { + ByteBuf buf = fileUpload.getByteBuf(); + int length = (int) Math.min((long) MAX_CONTENT_BYTES, (long) buf.readableBytes()); + byte[] bytes = new byte[length]; + buf.getBytes(buf.readerIndex(), bytes); // absolute read — does NOT advance readerIndex + contentStr = MultipartContentDecoder.decodeBytes(bytes, length, fileUpload.getContentType()); +} else { + byte[] bytes = new byte[MAX_CONTENT_BYTES]; + try (FileInputStream fis = new FileInputStream(fileUpload.getFile())) { + int read = fis.read(bytes); + contentStr = MultipartContentDecoder.decodeBytes(bytes, read < 0 ? 0 : read, fileUpload.getContentType()); + } +} +``` + +**Why not the alternatives:** +- `getInputStream()` — not declared in the `HttpData` interface; only available on `AbstractDiskHttpData` (internal class). Compilation fails on `FileUpload` type. +- `getByteBuf()` on disk-backed uploads — `AbstractDiskHttpData.getByteBuf()` loads the entire file into RAM. Catastrophic for large uploads. +- `get()` — same as above, loads everything. +- `getChunk(int)` — maintains internal state (`fileChannel`, `chunkPosition`). Side effects can interfere with Netty's upload lifecycle. + +--- + +## Tomcat / Servlet containers (Tomcat, GlassFish, Payara) + +### `effectivelyBlocked()` — always call it, but after assigning `t` + +Unlike Netty, the blocking response function in Tomcat does **not** call `effectivelyBlocked()` internally. It must be called explicitly in the advice. + +**Mandatory ordering:** assign `t = new BlockingException(...)` **before** calling `effectivelyBlocked()`. + +The advice has `@Advice.OnMethodExit(suppress = Throwable.class, onThrowable = Throwable.class)`. If `effectivelyBlocked()` throws and `suppress = Throwable.class` swallows the exception, and `effectivelyBlocked()` was called before assigning `t`, then `t` is never assigned and the request is not aborted. + +**Correct pattern — all three blocking paths:** +```java +BlockResponseFunction brf = reqCtx.getBlockResponseFunction(); +if (brf != null) { + brf.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); + t = new BlockingException("Blocked request (...)"); // 1. assign t + reqCtx.getTraceSegment().effectivelyBlocked(); // 2. then effectivelyBlocked +} +``` + +The three blocking paths in a typical Servlet advice (body-processed, filenames, content) must all follow this pattern consistently. + +### `t` must be assigned inside `if (brf != null)` + +Same rule as Netty: `t = new BlockingException(...)` must be inside `if (brf != null)`. Assigning it outside aborts the request even when no blocking response could be sent. + +### Filenames → content: sequential ordering + +When an advice handles both `requestFilesFilenames` and `requestFilesContent`: +- Fire filenames first. +- Fire content **only if** filenames did not block (`if (t == null)`). + +Filenames and content represent the same file upload. If the request is already blocked by filename, sending content to the WAF adds no value. + +```java +// filenames path +if (filenamesCallback != null && !filenames.isEmpty()) { + Flow flow = filenamesCallback.apply(reqCtx, filenames); + if (action instanceof Flow.Action.RequestBlockingAction) { + if (brf != null) { + brf.tryCommitBlockingResponse(...); + t = new BlockingException("Blocked (filenames)"); + reqCtx.getTraceSegment().effectivelyBlocked(); + } + } +} + +// content path — only if filenames did not block +if (t == null && contentCallback != null) { + List contents = collector.getContents(); + if (!contents.isEmpty()) { + Flow flow = contentCallback.apply(reqCtx, contents); + // ... same blocking pattern + } +} +``` + +--- + +## Body + filenames: independent callbacks — both must always fire + +When an advice handles both `requestBodyProcessed` and `requestFilesFilenames` (or `requestFilesContent`) for the same request body, both callbacks must fire even if the first one triggers blocking. These are independent WAF addresses — a WAF config with only filenames rules will have `bodyCallback == null`. Silencing one because the other blocked deprives the WAF of information. + +**Wrong — throw on body blocks before filenames fires:** +```java +executeCallback(reqCtx, bodyCallback, conv); // throws BlockingException if blocked +filenamesCallback.apply(reqCtx, filenames); // never reached +``` + +**Correct — `pendingBlock` pattern:** +```java +BlockingException pendingBlock = null; + +if (bodyCallback != null) { + Flow flow = bodyCallback.apply(reqCtx, conv); + if (action instanceof Flow.Action.RequestBlockingAction) { + BlockResponseFunction brf = reqCtx.getBlockResponseFunction(); + if (brf != null) { + boolean success = brf.tryCommitBlockingResponse(reqCtx.getTraceSegment(), rba); + if (success) { + pendingBlock = new BlockingException("Blocked (body)"); + } + } + } +} + +if (filenamesCallback != null && filenames != null && !filenames.isEmpty()) { + Flow flow = filenamesCallback.apply(reqCtx, filenames); + if (pendingBlock == null && action instanceof Flow.Action.RequestBlockingAction) { + // apply blocking only if body did not block first + ... + pendingBlock = new BlockingException("Blocked (filenames)"); + } +} + +if (pendingBlock != null) throw pendingBlock; +``` + +The blocking logic (`tryCommitBlockingResponse`, `new BlockingException`) is specific to each server's `BlockResponseFunction`. It can be extracted to a helper **within the same module** but must **never be shared across different server modules**. + +--- + +## Undertow + +`UndertowBlockResponseFunction.tryCommitBlockingResponse` is **not idempotent**. Each call: +- Overwrites `exchange.putAttachment(TRACE_SEGMENT, segment)` +- Overwrites `exchange.putAttachment(REQUEST_BLOCKING_DATA, rab)` +- Dispatches to `UndertowBlockingHandler` + +If two blocking paths exist (body + filenames), the second call to `tryCommitBlockingResponse` generates a second dispatch on the IO thread. + +**The guard `t == null` must be placed before the call to `tryCommitBlockingResponse`**, not after: + +```java +// Correct — guard before +if (brf != null && t == null) { + boolean success = brf.tryCommitBlockingResponse(...); + if (success) { + t = new BlockingException("..."); + } +} + +// Wrong — guard after: tryCommitBlockingResponse already ran +if (brf != null) { + boolean success = brf.tryCommitBlockingResponse(...); + if (success && t == null) { // too late + t = new BlockingException("..."); + } +} +``` + +--- + +## GlassFish / Payara (module `tomcat-appsec-7.0`) + +`TomcatServerInstrumentation` is not active on Payara because `PECoyoteResponse` is not `org.apache.catalina.connector.Response`. As a result, `BlockResponseFunction` is never registered and `reqCtx.getBlockResponseFunction()` returns `null`. + +**Fallback:** obtain `HttpServletRequest` and `HttpServletResponse` from the `Multipart.request` field via `@Advice.FieldValue`, typed as `org.apache.catalina.Request` (the Catalina interface). See `GlassFishBlockingHelper` for the full implementation. + +**Why the interface, not the concrete class:** `connector.Request.getResponse()` has different JVM descriptors in Tomcat vs. GlassFish. `org.apache.catalina.Request.getResponse()` has the same descriptor in both — safe for muzzle. + +`GlassFishBlockingHelper.tryBlock()` uses a two-phase try/catch to handle the case where `effectivelyBlocked()` throws because the span was already finished after a successful commit: + +```java +try { + // commit the blocking response (brf or fallback) +} catch (Exception ignored) { + return false; // commit failed — cannot block this request +} +try { + reqCtx.getTraceSegment().effectivelyBlocked(); +} catch (Exception ignored) { + // span already finished — response was sent, blocking succeeded +} +return true; +``` + +**`Collections.emptyList()` instead of `BlockingException`** in the GlassFish advice return: propagating `BlockingException` through the Grizzly/Payara pipeline causes an uncontrolled shutdown. Returning an empty list is equivalent — the controller receives zero parts to process. + diff --git a/docs/appsec/file-content.md b/docs/appsec/file-content.md new file mode 100644 index 00000000000..cf4706a3ba8 --- /dev/null +++ b/docs/appsec/file-content.md @@ -0,0 +1,131 @@ +# AppSec File Content — Reading and Encoding + +How to correctly read multipart file upload content for the `server.request.body.files_content` WAF address. + +## Canonical helper: `FileItemContentReader` + +The reference implementation for commons-fileupload. Key constants: + +```java +public static final int MAX_CONTENT_BYTES = 4096; // per-file limit +public static final int MAX_FILES_TO_INSPECT = 25; // per-request limit +``` + +Both are `public static final` so that tests can reference them rather than hardcoding the values. + +```java +public static List readContents(List fileItems) { + List result = new ArrayList<>(); + for (FileItem fileItem : fileItems) { + if (result.size() >= MAX_FILES_TO_INSPECT) break; + if (fileItem.isFormField()) continue; // form fields have no file content + result.add(readContent(fileItem)); + } + return result; +} + +public static String readContent(FileItem fileItem) { + try (InputStream is = fileItem.getInputStream()) { // try-with-resources required + byte[] buf = new byte[MAX_CONTENT_BYTES]; + int total = 0, n; + while (total < MAX_CONTENT_BYTES && (n = is.read(buf, total, MAX_CONTENT_BYTES - total)) != -1) + total += n; + return MultipartContentDecoder.decodeBytes(buf, total, fileItem.getContentType()); + } catch (IOException ignored) { + return ""; // IO failure — report empty rather than crash the advice + } +} +``` + +--- + +## Form field vs. file upload: `isFormField()`, not filename null + +**Use `isFormField()` to skip non-file parts**, not `fileItem.getName() != null`. Files without a filename (e.g., `curl` upload without explicit `-F "file=@..."`) still have content and must be inspected. Form fields are text input, never file uploads. + +The filename null check is correct for `files_filenames` (no name → nothing to report) but **incorrect** for `files_content`. + +--- + +## `getInputStream()` must be closed with try-with-resources + +`FileItem.getInputStream()` may return a stream backed by a temp file on disk (managed by `FileCleaningTracker`). Without try-with-resources, the file descriptor stays open until GC, which can exhaust the process's available file descriptors on requests with many uploads. + +--- + +## Charset decoding: `MultipartContentDecoder` + +All file content for `files_content` must be decoded using `MultipartContentDecoder` in `internal-api`, not with a hardcoded charset. The decoder applies the following fallback chain: + +1. Charset declared in the part's `Content-Type` header (e.g., `text/xml; charset=UTF-8`) +2. `Charset.defaultCharset()` if no charset is declared +3. ISO-8859-1 if the declared charset produces decoding errors + +```java +// Correct +return MultipartContentDecoder.decodeBytes(buf, total, fileItem.getContentType()); + +// Wrong — hardcoded charset loses non-ASCII characters +return new String(buf, 0, total, StandardCharsets.ISO_8859_1); +``` + +**Why `new String(bytes, charset)` does not work for the fallback chain:** `new String(bytes, charset)` uses `CodingErrorAction.REPLACE` internally — it never throws, silently substituting `U+FFFD` for invalid bytes. The catch block for the ISO-8859-1 fallback would be dead code. `MultipartContentDecoder` uses `CharsetDecoder` with `CodingErrorAction.REPLACE` as well (after considering the truncation-at-boundary case), so the decoder applies the correct charset while being robust to partial multi-byte sequences at the `MAX_CONTENT_BYTES` boundary. + +--- + +## Integration-specific notes + +### Tomcat / GlassFish (`ParameterCollector`) + +The `Part` interface is accessed via reflection to avoid bytecode references to types that differ between javax and jakarta. The `CachedMethods` inner class caches `getSubmittedFileName()`, `getInputStream()`, and `getContentType()` using a volatile immutable holder — see [multipart-frameworks.md](multipart-frameworks.md#reflection-cache-pattern) for the pattern. + +The `inspectContent` flag must be evaluated **before** iterating over parts: + +```java +boolean inspectContent = cbp.getCallback(EVENTS.requestFilesContent()) != null; +ParameterCollector collector = new ParameterCollector.ParameterCollectorImpl(inspectContent); +for (Object part : parts) { + collector.addPart(part); +} +``` + +Without the flag, `getInputStream()` is called for every file even when no rule uses `files_content`, adding unnecessary I/O overhead. + +### Netty 4.1 (`NettyMultipartHelper`) + +Netty stores uploads either in memory (`ByteBuf`) or on disk (`File`). See [blocking-patterns.md](blocking-patterns.md#netty-fileupload-content-reading) for the two-branch read pattern using `isInMemory()`. + +### RESTEasy + +`InputPart.getBody(InputStream.class, null)` may return a `FileInputStream` for uploads above the in-memory threshold. Use try-with-resources: + +```java +try (InputStream is = inputPart.getBody(InputStream.class, null)) { + if (is == null) return ""; + return MultipartContentDecoder.readInputStream(is, MAX_CONTENT_BYTES, contentType); +} +``` + +`MultipartContentDecoder.readInputStream()` does **not** close the caller's stream — the caller is responsible. + +--- + +## Ordering: filenames before content, content only if not blocked + +When an advice fires both `requestFilesFilenames` and `requestFilesContent`: +- Fire filenames first. +- Fire content only if filenames did not produce a blocking action. + +See [blocking-patterns.md](blocking-patterns.md#filenames--content-sequential-ordering) for the full pattern. + +--- + +## Required tests for new implementations + +- Truncation: content larger than `MAX_CONTENT_BYTES` → result is exactly `MAX_CONTENT_BYTES` bytes +- `IOException` from `getInputStream()` → returns `""` +- Form fields: skipped in `readContents()` +- Files with null or empty filename: included (not filtered by name) +- More than `MAX_FILES_TO_INSPECT` files: only the first `MAX_FILES_TO_INSPECT` inspected +- Empty file list → empty content list +- `contentCallback.apply()` not called when content list is empty diff --git a/docs/appsec/ig-events.md b/docs/appsec/ig-events.md new file mode 100644 index 00000000000..3c77a2ae6d7 --- /dev/null +++ b/docs/appsec/ig-events.md @@ -0,0 +1,84 @@ +# AppSec Instrumentation Gateway Events + +How to add a new WAF address exposed through the Instrumentation Gateway (IG), and the invariants that apply to all IG event callbacks in advice classes. + +## Adding a new IG event — 4 required changes + +### 1. `Events.java` — declare the event type + +Add the event type, unique integer ID, and accessor method in +`dd-java-agent/src/main/java/datadog/trace/api/gateway/Events.java`. +The ID must be unique and sequential — check the last used ID before assigning. + +### 2. `InstrumentationGateway.java` — register the callback type + +Add the new event ID to the appropriate case in `getCallback()` for the matching `BiFunction` signature. + +### 3. `KnownAddresses.java` — declare the WAF address + +Declare the `Address` constant and add it to the `fromString()` switch in +`dd-java-agent/src/main/java/datadog/trace/api/gateway/KnownAddresses.java`. + +### 4. `GatewayBridge.java` — wire the callback + +Four changes needed: +1. Subscriber field: `private volatile DataSubscriberInfo requestFilesContentSubInfo;` +2. Registration in `init()` (conditional on `additionalIGEvents`): + ```java + if (additionalIGEvents.contains(EVENTS.requestFilesContent())) { + subscriptionService.registerCallback(EVENTS.requestFilesContent(), this::onRequestFilesContent); + } + ``` +3. Reset in `reset()`: `requestFilesContentSubInfo = null;` +4. Handler method (see pattern below) +5. Entry in `DATA_DEPENDENCIES`: + ```java + DATA_DEPENDENCIES.put(KnownAddresses.REQUEST_FILES_CONTENT, l(EVENTS.requestFilesContent())); + ``` + +### Handler pattern in `GatewayBridge` + +All handlers follow the same retry-on-expiry pattern. See existing handlers in `GatewayBridge.java` +for the full template (`onRequestBodyProcessed`, `onRequestFilesFilenames`, etc.). + +--- + +## Guard ordering in advice — obtain all callbacks before early return + +When an advice handles multiple IG callbacks, all callbacks must be fetched **before** any early-return guard. If a callback is fetched after a guard, it is silently lost when only that callback is registered. + +```java +// Wrong — contentCb is never fetched if bodyCallback is null +BiFunction<...> bodyCallback = cbp.getCallback(EVENTS.requestBodyProcessed()); +if (bodyCallback == null) return; // early return skips contentCb entirely +BiFunction<...> contentCb = cbp.getCallback(EVENTS.requestFilesContent()); + +// Correct — fetch all first, guard is AND of all +BiFunction<...> bodyCallback = cbp.getCallback(EVENTS.requestBodyProcessed()); +BiFunction<...> contentCb = cbp.getCallback(EVENTS.requestFilesContent()); +if (bodyCallback == null && contentCb == null) return; +``` + +**Subtle variant:** a callback fetched inside a conditional block after the loop is equally invisible when only that callback is registered: + +```java +// Also wrong — filenamesCb is never fetched when filenames list happens to be empty +// and bodyCallback already returned early +for (Part part : parts) { collector.addPart(part); } +if (!filenames.isEmpty()) { + BiFunction<...> filenamesCb = cbp.getCallback(EVENTS.requestFilesFilenames()); // too late + ... +} +``` + +The correct pattern: fetch all callbacks at the top of the method, then reference them by variable in conditional blocks. + +--- + +## `@RequiresRequestContext` + `Config.get()` — muzzle failure + +Do **not** place `static final` constants initialized from `Config.get()` in advice inner classes annotated with `@RequiresRequestContext`. + +Muzzle treats `@RequiresRequestContext`-annotated classes as user classes and validates all referenced types against the instrumented library's classpath (e.g., Netty, commons-fileupload), where `Config` does not exist. This causes a `MuzzleValidationException` in CI even though the code compiles. + +Put any `Config.get()` constants in a helper class declared in `helperClassNames()` instead. diff --git a/docs/appsec/multipart-frameworks.md b/docs/appsec/multipart-frameworks.md new file mode 100644 index 00000000000..108a2981cbf --- /dev/null +++ b/docs/appsec/multipart-frameworks.md @@ -0,0 +1,93 @@ +# Multipart AppSec — Cross-Cutting Patterns + +Invariants for implementing `server.request.body.filenames` and `server.request.body.files_content` across multipart frameworks. + +## Reflection cache pattern for version-variant return types + +When a method's return type differs between library versions (e.g., `getHeaders()` returns `javax.ws.rs.core.MultivaluedMap` in RESTEasy 3.x and `jakarta.ws.rs.core.MultivaluedMap` in RESTEasy 6.x), a direct bytecode reference fails muzzle for the other version. + +**Canonical solution:** cache the `Method` as a `static final` field, resolved once in a `static {}` block: + +```java +private static final Method GET_HEADERS; + +static { + Method m = null; + try { + m = InputPart.class.getMethod("getHeaders"); + } catch (NoSuchMethodException ignored) { + } + GET_HEADERS = m; +} +``` + +**Why `static final` and not `volatile`:** class initialization is atomic by JVM spec — the `static {}` block runs exactly once when the classloader loads the helper class. No race condition. `volatile` would be needed only for lazy initialization in an instance or a non-static field. + +**Runtime cost:** zero per request. The reflection lookup happens once at class load time in the app classloader, where the library is present. + +**Null guard required:** always check `if (GET_HEADERS == null) return ...` before invoking. The static block may set the field to `null` if the class is not present in the classpath (incompatible version). + +This pattern applies to any method whose return type or parameters differ between major library versions. + +## Per-part try/catch in for-loops + +`suppress = Throwable.class` on the advice protects the **advice boundary**, not the interior of loops inside helpers. If an exception is thrown while processing part N, it short-circuits the remaining parts — the WAF receives only partial data with no visible error. + +```java +for (FormDataBodyPart part : parts) { + try { + FormDataContentDisposition cd = part.getFormDataContentDisposition(); + collectBodyPart(part, cd, map, filenames); + } catch (Exception ignored) { + // malformed part — continue processing remaining parts + } +} +``` + +This applies to any helper that iterates over parts, items, or framework elements where each element can fail independently. + +## Servlet containers: blocking ordering in `tryBlock()` + +When implementing a `tryBlock()` helper for Servlet-based frameworks (Jersey, RESTEasy, Tomcat), the ordering inside the helper must be: + +```java +brf.tryCommitBlockingResponse(ctx.getTraceSegment(), rba); +BlockingException be = new BlockingException(message); // 1. create exception +ctx.getTraceSegment().effectivelyBlocked(); // 2. mark blocked +return be; +``` + +`new BlockingException` before `effectivelyBlocked()`: if `effectivelyBlocked()` throws (possible if the span is already finished) and the advice has `suppress = Throwable.class`, the exception object must already exist. Creating it beforehand guarantees the caller receives a non-null value regardless of what happens next. + +See [blocking-patterns.md](blocking-patterns.md) for Netty's different behaviour. + +## Three-callback ordering: body + filenames + content + +When all three callbacks coexist in a single advice: + +1. `requestBodyProcessed` (form fields) — always fires when map is non-empty. +2. `requestFilesFilenames` — always fires even if body blocked (`pendingBlock` pattern). +3. `requestFilesContent` — fires **only if** `t == null` (sequential gate). + +```java +// Guard — obtain all three upfront +BiFunction<...> bodyCallback = cbp.getCallback(EVENTS.requestBodyProcessed()); +BiFunction<...> filenamesCallback = cbp.getCallback(EVENTS.requestFilesFilenames()); +BiFunction<...> contentCallback = cbp.getCallback(EVENTS.requestFilesContent()); +if (bodyCallback == null && filenamesCallback == null && contentCallback == null) return; + +// body — fires always +if (bodyCallback != null && !map.isEmpty()) { bodyCallback.apply(...); tryBlock(...); } + +// filenames — fires always (independent WAF address) +if (filenamesCallback != null && !filenames.isEmpty()) { filenamesCallback.apply(...); tryBlock(...); } + +// content — fires only if not already blocked +if (t == null && contentCallback != null) { + List contents = readContents(...); + if (!contents.isEmpty()) { contentCallback.apply(...); tryBlock(...); } +} +``` + +Body and filenames are independent WAF addresses — a WAF config with only filenames rules will have `bodyCallback == null`. Silencing filenames because body already blocked deprives the WAF of information. + diff --git a/docs/appsec/rasp-file-io.md b/docs/appsec/rasp-file-io.md new file mode 100644 index 00000000000..b5d7ca82ffb --- /dev/null +++ b/docs/appsec/rasp-file-io.md @@ -0,0 +1,74 @@ +# RASP File I/O — CSI Pattern + +How to implement RASP call sites for `java.io` / `java.nio.file` operations. + +## Call site structure + +```java +@CallSite( + spi = {RaspCallSites.class}, // required — registers as RASP call site + helpers = FileIORaspHelper.class) // required — injects helper into target classloader +public class FilesCallSite { + + @CallSite.Before("fully.qualified.MethodSignature(java.nio.file.Path, ...)") + public static void beforeXxx(@CallSite.Argument(N) @Nullable final Path path) { + if (path != null) { + FileIORaspHelper.INSTANCE.beforeFileLoaded(path.toString()); + } + } +} +``` + +Both `spi` and `helpers` are mandatory: +- Without `spi = {RaspCallSites.class}`: the call site is not activated when RASP is enabled. +- Without `helpers = FileIORaspHelper.class`: `NoClassDefFoundError` at runtime when the helper is accessed. + +## `@CallSite.Before` descriptor rules + +- Always use fully-qualified type names: `java.nio.file.Path`, not `Path`. +- Varargs are declared as arrays: `java.nio.file.CopyOption[]`, not `CopyOption...`. ByteBuddy always sees the array form in bytecode. + +## Operation → event mapping + +| `Files.*` operation | `beforeFileLoaded` (LFI) | `beforeFileWritten` | +|---|---|---| +| `copy(InputStream, Path, ...)` | no | yes (target) | +| `copy(Path, Path, ...)` | yes (source) | yes (target) — dual event | +| `copy(Path, OutputStream)` | yes (source) | no (OutputStream ≠ filesystem path) | +| `move(Path, Path, ...)` | no | yes (target) | +| `read*`, `newInputStream`, `lines*` | yes | no | +| `write*`, `newOutputStream`, `newBufferedWriter` | no | yes | + +`Files.copy(Path, Path)` is the only dual-event case in this module. + +## `FileIORaspHelper` contract + +- `INSTANCE` is `public static` (not final) — replaceable in tests with a mock. +- `invokeRaspCallback` has three early returns: `!isAppSecRaspEnabled()`, `callback == null`, `span == null / ctx == null`. +- `BlockingException` is always re-thrown (separate catch from the general `Throwable` catch). +- Any other `Throwable` is suppressed with `LOGGER.debug` — RASP must never crash the application. + +## Testing + +```groovy +void 'test RASP Files.copy path to path'() { + setup: + final helper = Mock(FileIORaspHelper) + FileIORaspHelper.INSTANCE = helper + final source = newFile('src.txt').toPath() // newFile() creates file on disk — required for reads + final target = temporaryFolder.resolve('dst.txt') // resolve() → path without creating — required for writes + + when: + TestFilesSuite.copyPathToPath(source, target) + + then: + 1 * helper.beforeFileLoaded(source.toString()) + 1 * helper.beforeFileWritten(target.toString()) +} +``` + +Rules: +- Read operations → `newFile(name).toPath()` (file must exist on disk). +- Write operations → `temporaryFolder.resolve(name)` (empty path; the operation creates it). +- Always add a negative assertion `0 * helper.beforeFileWritten(_)` when the operation should not fire a write event. +- Spock automatically restores the `INSTANCE` mock after each test — no manual cleanup needed. diff --git a/docs/appsec/waf-api.md b/docs/appsec/waf-api.md new file mode 100644 index 00000000000..a908eeedb80 --- /dev/null +++ b/docs/appsec/waf-api.md @@ -0,0 +1,282 @@ +# WAF API Reference — libddwaf-java + +Reference for the [`libddwaf-java`](https://github.com/DataDog/libddwaf-java) binding used by the AppSec module. Based on libddwaf-java v1.30.0 / libddwaf v1.30.0. + +## Lifecycle + +``` +Waf.initialize(false) // loads native .so (singleton, irreversible) + ↓ +WafBuilder builder = new WafBuilder(config) +builder.addOrUpdateConfig("path", rulesMap) // → WafDiagnostics + ↓ +WafHandle handle = builder.buildWafHandleInstance() +builder.close() + ↓ +WafContext ctx = new WafContext(handle) // one per HTTP request + ↓ +Waf.ResultWithData r = ctx.run(persistentData, limits, metrics) +ctx.runEphemeral(ephemeralData, limits, metrics) + ↓ +ctx.close() // at end of request +handle.close() // when ruleset is reloaded +``` + +**Lifecycle rules:** +- `WafContext` is `Closeable` — use try-with-resources or close in `finally` +- `WafHandle` is thread-safe for reads (ReadWriteLock); write (close) is exclusive +- `WafBuilder.close()` may be called after `buildWafHandleInstance()` without affecting the handle +- Existing contexts continue working after `handle.close()` (reference counting) +- `Waf.deinitialize()` releases JNI — **not thread-safe**; crashes if threads are using the WAF + +--- + +## Java types accepted as address values + +The `Map` passed to `ctx.run()` maps address names to values: + +| Java type | WAF type (`ddwaf_object`) | Notes | +|---|---|---| +| `null` | `DDWAF_OBJ_NULL` | Serialized as explicit null | +| `String` | `DDWAF_OBJ_STRING` | UTF-16 → UTF-8. Truncated to `maxStringSize` UTF-16 chars | +| `CharSequence` (any) | `DDWAF_OBJ_STRING` | Same as String | +| `CharBuffer` (heap or direct) | `DDWAF_OBJ_STRING` | Zero-copy for direct `ByteOrder.nativeOrder()` | +| `Integer`, `Long`, `Short`, `Byte`, `AtomicInteger`, `BigInteger` | `DDWAF_OBJ_SIGNED` (int64_t) | Any non-float Number | +| `Double`, `Float`, `BigDecimal` | `DDWAF_OBJ_FLOAT` (double IEEE 754) | | +| `Boolean` | `DDWAF_OBJ_BOOL` | | +| `Map` | `DDWAF_OBJ_MAP` | Keys converted to String via `toString()`. Recursive | +| `Collection` (List, Set, etc.) | `DDWAF_OBJ_ARRAY` | Recursive | +| Native arrays (`String[]`, `int[]`, etc.) | `DDWAF_OBJ_ARRAY` | Recursive | +| Any `Iterable` | `DDWAF_OBJ_ARRAY` | Recursive | +| Any other object | `DDWAF_OBJ_NULL` | **Silent null — no exception thrown** | + +**Edge cases:** +- Empty strings `""`: valid, serialized as `DDWAF_OBJ_STRING` of length 0 +- `null` keys in a Map: converted to `""` (empty string) +- Incomplete UTF-16 surrogates: replaced with `U+FFFD` (no exception) +- Unknown objects (`new Object()`): silently become `DDWAF_OBJ_NULL` +- `ConcurrentModificationException` during iteration: propagates as Java exception + +--- + +## Limits (`Waf.Limits`) + +```java +Waf.Limits limits = new Waf.Limits( + int maxDepth, // max nesting depth + int maxElements, // max total nodes (maps + arrays, cumulative) + int maxStringSize, // max UTF-16 chars per string (keys AND values) + long generalBudgetInUs, // total microseconds (serialization + execution) + long runBudgetInUs // microseconds for ddwaf_run() (0 = no separate limit) +); +``` + +**Values used in dd-trace-java:** see `WafInitialization.java` and `AppSecSystem.java` for current values. + +**Truncation behavior** (tracked in `WafMetrics`): + +| Limit exceeded | Effect | Metric | +|---|---|---| +| `maxDepth` | Object replaced with empty map | `truncatedObjectTooDeepCount` | +| `maxElements` | Map/array replaced with empty map | `truncatedListMapTooLargeCount` | +| `maxStringSize` | String truncated | `truncatedStringTooLongCount` | +| `generalBudgetInUs` | `TimeoutWafException` | — | + +`maxElements` is a **global counter per `run()` call**, decremented for each node (maps, arrays, and their elements). A map with 2 keys costs 3 (the map itself + 2 elements). + +`maxStringSize` is in **UTF-16 chars, not bytes**. An emoji (e.g. `👍`) is 2 `char` units; if `maxStringSize=1` the emoji is truncated to empty. + +--- + +## Persistent vs. ephemeral data + +`WafContext` accepts two data slots per `run()`: + +| | Persistent | Ephemeral | +|---|---|---| +| Storage | Cached in context across calls | Only for the current call | +| Exclusion caching | Yes | No (always re-evaluated) | +| Typical use | HTTP headers, IP, URI, body | gRPC messages, streaming payloads | +| Released | On `ctx.close()` | At end of `ctx.run()` | + +Persistent data **accumulates** — if run 1 passes `server.request.headers` and run 2 passes `server.request.body`, the context has both in run 2. Rules that depend on both addresses are only evaluated once all their data is available. + +gRPC messages must be passed as **ephemeral** data (`runEphemeral()`). Passing them as persistent means only the first message is evaluated; subsequent messages show no new data. + +--- + +## WAF addresses — full catalogue + +Declared in `KnownAddresses.java`. + +### HTTP request + +| Address | Expected type | Notes | +|---|---|---| +| `server.request.method` | `String` | "GET", "POST", etc. | +| `server.request.uri.raw` | `String` | Full URI | +| `server.request.uri` | `Map` | Generated by `uri_parse` preprocessor. Keys: scheme, userinfo, host, port, path, query, fragment | +| `server.request.headers.no_cookies` | `Map>` | Headers without cookies. Keys lowercase | +| `server.request.cookies` | `Map` | | +| `server.request.query` | `Map>` | | +| `server.request.path_params` | `Map` | | +| `server.request.body` | `Map` or `String` | Parsed or raw body | +| `server.request.body.raw` | `String` | Unparsed body (input for processors) | +| `server.request.transport` | `String` | "http", "https", "grpc" | +| `server.request.jwt` | `Map` | Decoded JWT (generated by `jwt_decode` processor) | + +### HTTP response + +| Address | Expected type | Notes | +|---|---|---| +| `server.response.status` | `Integer` (unsigned) | Numeric HTTP code — **not** a String | +| `server.response.body` | `String` | | +| `server.response.headers.no_cookies` | `Map>` | | + +### Client and user + +| Address | Expected type | +|---|---| +| `http.client_ip` | `String` | +| `usr.id` | `String` | +| `usr.session_id` | `String` | + +### gRPC + +| Address | Expected type | Persistence | +|---|---|---| +| `grpc.server.request.message` | `Map` | **Ephemeral** | +| `grpc.server.request.metadata` | `Map>` | Persistent | + +### RASP + +| Address | Expected type | When | +|---|---|---| +| `server.db.statement` | `String` | Before executing SQL | +| `server.db.system` | `String` | "mysql", "postgresql", etc. | +| `server.io.fs.file` | `String` | File path being accessed | +| `server.io.net.url` | `String` | Outbound request URL | +| `server.sys.shell.cmd` | `String` | Shell command | + +### Business logic + +| Address | Expected type | +|---|---| +| `server.business_logic.users.login.success` | `Map` | +| `server.business_logic.users.login.failure` | `Map` | + +### Processor activator + +| Address | Value | +|---|---| +| `waf.context.processor` | `Map{"fingerprint": true}` — activates fingerprint generation | + +### Processor outputs (returned in `ResultWithData.attributes`, not passed as input) + +- `_dd.appsec.s.req.body` — body schema (gzip+base64) +- `_dd.appsec.fp.http.endpoint` — endpoint fingerprint +- `_dd.appsec.fp.http.header` — header fingerprint +- `_dd.appsec.fp.http.network` — network fingerprint +- `_dd.appsec.fp.session` — session fingerprint + +**Never pass these as input addresses** — they are outputs from `ResultWithData.attributes`. + +`server.request.headers.no_cookies` is **without cookies**. Passing all headers (including cookies) to this address is a bug; cookies go in `server.request.cookies`. + +`server.response.status` must be an `Integer`, not a String `"200"`. Changed from String to Integer in v1.28.0. + +--- + +## Result: `Waf.ResultWithData` + +```java +result.result // Waf.Result.OK or Waf.Result.MATCH +result.data // String JSON with events (null if OK) +result.actions // Map>: actions to execute +result.attributes // Map: WAF-derived data (schemas, fingerprints) +result.keep // boolean: whether to keep the trace +result.duration // long: nanoseconds of native execution +result.events // boolean: whether events were generated +``` + +**Known actions:** +- `block_request`: `{status_code: int, type: "json"|"html"|"auto", grpc_status_code: int}` +- `redirect_request`: `{status_code: int, location: String}` +- `generate_stack`: generate stack trace +- `generate_schema`: activate schema extraction + +From v1.29.0, each blocking action includes a `block_id` correlating with the event in `data`. + +--- + +## `WafDiagnostics` — result of `addOrUpdateConfig()` + +```java +WafDiagnostics d = builder.addOrUpdateConfig("path", map); + +d.getNumConfigOK() // int +d.getNumConfigError() // int +d.getAllErrors() // Map> +d.getRulesetVersion() // String +d.isWellStructured() // boolean +``` + +Available sections: `rules`, `customRules`, `rulesData`, `rulesOverride`, `exclusions`, `exclusionData`, `actions`, `processors`, `scanners`. + +--- + +## `WafMetrics` — execution metrics + +```java +WafMetrics m = new WafMetrics(); +ctx.run(data, limits, m); + +m.getTotalRunTimeNs() // total: serialization + native WAF +m.getTotalDdwafRunTimeNs() // native WAF only +m.getTruncatedStringTooLongCount() +m.getTruncatedListMapTooLargeCount() +m.getTruncatedObjectTooDeepCount() +``` + +Serialization time = `totalRunTimeNs - totalDdwafRunTimeNs`. + +--- + +## Obfuscation — `WafConfig` + +```java +WafConfig config = new WafConfig(); +config.obfuscatorKeyRegex = "(?i)pass|pwd|secret|api.key|token|..."; +config.obfuscatorValueRegex = "..."; +// To disable obfuscation: +config.obfuscatorKeyRegex = ""; +``` + +Obfuscated data appears as `` in `result.data`. + +--- + +## Schema extraction + +Activated by the `extract_schema` processor in the ruleset. Results are returned in `ResultWithData.attributes` under keys like `_dd.appsec.s.req.body`, already gzip-compressed and base64-encoded. Write them to the span as-is — do not decompress. + +**Type codes in schema:** + +| Code | Type | +|---|---| +| 1 | Null | +| 2 | Boolean | +| 4 | Integer | +| 8 | String | +| 16 | Float/Double | + +--- + +## Version history + +| Version | Changes | +|---|---| +| v1.30.0 | No breaking Java API changes | +| v1.29.0 | `block_id` in block/redirect actions; `processor_overrides` are now incremental | +| v1.28.0 | New address `server.request.uri` (from `uri_parse`); `server.response.status` changed from String to Integer | +| v1.28.1 | Float support in status codes for ruleset compatibility |