
New serverless pattern - lambda-durable-execution-java-cdk #3074

Open
NithinChandranR-AWS wants to merge 5 commits into aws-samples:main from NithinChandranR-AWS:NithinChandranR-AWS-feature-lambda-durable-execution-java-cdk

@NithinChandranR-AWS (Contributor)

New Serverless Pattern: Lambda Durable Execution with Java SDK

Description

Deploys a Lambda durable function written in Java that orchestrates a multi-step order processing workflow with automatic checkpointing and failure recovery using the Durable Execution SDK for Java (v1.0.1, GA April 2026).

Architecture

```
Invoke → Lambda Durable Function (Java 17, Docker)
  Step 1: Validate Order ✓ checkpoint
  Step 2: Reserve Inventory ✓ checkpoint
  Step 3: Process Payment ✓ checkpoint
  Wait:   Warehouse Processing (zero compute) ✓ checkpoint
  Step 4: Confirm Shipment ✓ checkpoint
```
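The Wait stage is what keeps the workflow cheap while the warehouse works. The sketch below is a hypothetical toy model, not the SDK's ctx.wait(): WaitSketch, ToyWaitContext, Suspended, and the in-memory HashMap are invented for illustration. It assumes a wait is checkpointed as a resume time, the invocation ends instead of sleeping, and a later invocation past that time continues without waiting again.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model of a checkpointed wait (NOT the SDK's ctx.wait()).
 * The first invocation records a resume time and suspends; a later invocation
 * past that time continues without waiting again.
 */
public class WaitSketch {

    /** Thrown to model "end this invocation now, reinvoke later"; no thread sleeps. */
    static final class Suspended extends RuntimeException {
        final Instant resumeAt;

        Suspended(Instant resumeAt) {
            super("suspended until " + resumeAt);
            this.resumeAt = resumeAt;
        }
    }

    /** Stand-in context: a checkpoint map plus the current time of this invocation. */
    static final class ToyWaitContext {
        private final Map<String, Instant> checkpoints;
        private final Instant now;

        ToyWaitContext(Map<String, Instant> checkpoints, Instant now) {
            this.checkpoints = checkpoints;
            this.now = now;
        }

        void waitFor(String name, Duration duration) {
            // Only the first invocation computes the deadline; replays reuse the stored one.
            Instant resumeAt = checkpoints.computeIfAbsent(name, k -> now.plus(duration));
            if (now.isBefore(resumeAt)) {
                throw new Suspended(resumeAt); // give control back; nothing runs while waiting
            }
            // Deadline has passed: the wait is complete and execution continues.
        }
    }

    /** Tail of the order workflow: wait for the warehouse, then confirm shipment. */
    static String run(ToyWaitContext ctx) {
        ctx.waitFor("warehouse-processing", Duration.ofHours(1));
        return "confirm-shipment";
    }

    public static void main(String[] args) {
        Map<String, Instant> store = new HashMap<>();
        Instant t0 = Instant.parse("2026-01-01T00:00:00Z"); // arbitrary start time
        try {
            run(new ToyWaitContext(store, t0));
        } catch (Suspended s) {
            System.out.println("first invocation: " + s.getMessage());
        }
        String next = run(new ToyWaitContext(store, t0.plus(Duration.ofHours(2))));
        System.out.println("re-invoked after deadline: " + next);
    }
}
```

The design point is that the suspension is modeled as ending the invocation rather than blocking a thread, which is why the wait can cost no compute.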

Key Features

  • First Java-based durable execution pattern in this repo
  • DurableHandler<Map, Map> base class with DurableContext
  • ctx.step() for checkpointed operations, ctx.wait() for zero-cost suspension
  • Docker-based Java 17 Lambda with Maven build
  • CDK TypeScript infrastructure with DurableConfig escape hatch
  • If interrupted, replays from beginning but skips completed steps
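To make the replay bullet concrete, here is a minimal in-memory sketch of checkpoint-and-replay. It is a toy under stated assumptions, not the Durable Execution SDK: ReplaySketch, ToyDurableContext, its step(name, body) method, and the HashMap standing in for the checkpoint store are all hypothetical. The first run is interrupted after two steps; the second run starts from the top, but the completed steps return their stored results instead of executing again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Toy model of checkpoint-and-replay. NOT the Durable Execution SDK API:
 * ToyDurableContext, step(), and the in-memory checkpoint map are hypothetical.
 */
public class ReplaySketch {

    /** Minimal stand-in for a durable context backed by an in-memory checkpoint log. */
    static final class ToyDurableContext {
        private final Map<String, Object> checkpoints; // shared across "invocations"
        final List<String> executed = new ArrayList<>(); // step bodies that actually ran

        ToyDurableContext(Map<String, Object> checkpoints) {
            this.checkpoints = checkpoints;
        }

        @SuppressWarnings("unchecked")
        <T> T step(String name, Supplier<T> body) {
            if (checkpoints.containsKey(name)) {
                return (T) checkpoints.get(name); // replay: return stored result, skip the body
            }
            T result = body.get();
            executed.add(name);
            checkpoints.put(name, result); // checkpoint before moving on
            return result;
        }
    }

    /** Runs the workflow; throws before step index `failAt` to simulate an interruption (-1 = never). */
    static String runWorkflow(ToyDurableContext ctx, int failAt) {
        String[] steps = {"validate-order", "reserve-inventory", "process-payment", "confirm-shipment"};
        String last = null;
        for (int i = 0; i < steps.length; i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated interruption before " + steps[i]);
            }
            String name = steps[i];
            last = ctx.step(name, () -> name + ":done");
        }
        return last;
    }

    public static void main(String[] args) {
        Map<String, Object> store = new HashMap<>();

        ToyDurableContext first = new ToyDurableContext(store);
        try {
            runWorkflow(first, 2);
        } catch (IllegalStateException e) {
            System.out.println("first invocation interrupted: " + e.getMessage());
        }
        System.out.println("first run executed " + first.executed);

        // Replay from the beginning against the same checkpoint store.
        ToyDurableContext replay = new ToyDurableContext(store);
        String result = runWorkflow(replay, -1);
        System.out.println("replay executed " + replay.executed + ", result " + result);
    }
}
```

In this sketch the second run executes only process-payment and confirm-shipment. A real durable runtime persists checkpoints outside the process and also handles retries, which the toy ignores.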

Framework / Language

  • AWS CDK (TypeScript)
  • Lambda: Java 17 (Docker image)

Deployment & Testing

  • Deployed and tested successfully on AWS
  • All 5 workflow steps complete with checkpointing
  • Durable execution status visible in Lambda console

Files

| File | Purpose |
|---|---|
| lib/lambda-durable-execution-java-stack.ts | CDK stack |
| src/main/java/com/example/OrderProcessor.java | Java handler |
| src/pom.xml | Maven config with durable SDK dependency |
| src/Dockerfile | Java 17 Lambda container |
| example-pattern.json | Serverless Land metadata |

… Java SDK pattern

Deploy a resilient multi-step order processing workflow using the AWS
Lambda Durable Execution SDK for Java (v1.0.1) with automatic
checkpointing and failure recovery.

Key features:
- DurableHandler<Map, Map> base class with DurableContext
- 5-step workflow: validate, reserve, pay, wait, ship
- ctx.step() for checkpointed operations
- ctx.wait() for zero-compute-cost suspension
- Docker-based Java 17 Lambda with Maven build
- CDK TypeScript infrastructure with DurableConfig escape hatch

Replace inline durable execution policy (wildcard resources) with the
AWS managed policy for least-privilege IAM, matching the approach
recommended in PR aws-samples#3053 review feedback.
@NithinChandranR-AWS (Contributor, Author)

Hi @biswanathmukherjee 👋 Friendly nudge — this pattern is ready for review. Deployed and tested end-to-end on a live AWS account. Would appreciate a look when you have time. Thank you!

@NithinChandranR-AWS (Contributor, Author)

Hi @biswanathmukherjee 👋 This is the first Java-based durable execution pattern — a completely different SDK surface (DurableHandler<I,O>, DurableContext) from the Node.js version. Enterprise Java customers need this dedicated reference. Deployed and tested.

@NithinChandranR-AWS (Contributor, Author)

Hi @bfreiberg 👋 — friendly nudge on this pattern. It's been deployed and tested end-to-end on a live AWS account. Happy to address any feedback. Thank you!


```java
@Override
public Map<String, Object> handleRequest(Map<String, Object> input, DurableContext ctx) {
    String orderId = (String) input.getOrDefault("orderId", UUID.randomUUID().toString());
```
Contributor


Non-deterministic UUID outside any durable operation.

UUID.randomUUID() produces a different value on every replay. On a replay after a mid-workflow interruption, the rebuilt orderId won't match what completed steps were keyed against, and any new branch that reads orderId from this line will see a different value than the original invocation. AWS docs flag UUID generation specifically as code that must be wrapped in a step.
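A toy replay model can illustrate the reviewer's point. This is a hypothetical sketch, not the SDK: OrderIdReplaySketch, ToyCheckpoints, and its step(name, body) method only mimic the idea of returning a recorded value on replay, and calling the same method twice against one checkpoint store stands in for an original invocation followed by a replay. It contrasts the original shape (UUID generated in plain handler code) with the shape the author adopted in the reply (use a caller-supplied orderId, otherwise generate one inside a checkpointed step).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

/**
 * Hypothetical toy replay model (NOT the SDK): shows why a value generated outside a
 * checkpointed step changes on replay while one generated inside a step does not.
 */
public class OrderIdReplaySketch {

    /** In-memory checkpoint log shared by every "invocation" of the same execution. */
    static final class ToyCheckpoints {
        private final Map<String, Object> stored = new HashMap<>();

        @SuppressWarnings("unchecked")
        <T> T step(String name, Supplier<T> body) {
            // First run: execute the body and record the result. Replay: return the record.
            return (T) stored.computeIfAbsent(name, k -> body.get());
        }
    }

    /** Original shape: the UUID is generated in plain handler code, outside any step. */
    static String orderIdOutsideStep(Map<String, Object> input) {
        return (String) input.getOrDefault("orderId", UUID.randomUUID().toString());
    }

    /** Fixed shape: use the caller's orderId if present, otherwise checkpoint a generated one. */
    static String orderIdInsideStep(Map<String, Object> input, ToyCheckpoints cp) {
        Object provided = input.get("orderId");
        if (provided != null) {
            return (String) provided; // already deterministic, no step needed
        }
        return cp.step("generate-order-id", () -> UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        ToyCheckpoints cp = new ToyCheckpoints();
        Map<String, Object> input = new HashMap<>(); // no orderId supplied

        // Each pair simulates the original invocation followed by a replay.
        String outsideFirst = orderIdOutsideStep(input);
        String outsideReplay = orderIdOutsideStep(input);
        String insideFirst = orderIdInsideStep(input, cp);
        String insideReplay = orderIdInsideStep(input, cp);

        System.out.println("outside step stable across replay: " + outsideFirst.equals(outsideReplay));
        System.out.println("inside step stable across replay:  " + insideFirst.equals(insideReplay));
    }
}
```

The first comparison prints false (barring a UUID collision) and the second prints true.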

Contributor Author


Fixed — wrapped in ctx.step("generate-order-id", String.class, stepCtx -> UUID.randomUUID().toString()) so the value is checkpointed and deterministic on replay. If orderId is provided in input, it's used directly without a step.


```java
// Step 1: Validate order
String validation = ctx.step("validate-order", String.class, stepCtx -> {
    System.out.println("Validating order " + orderId);
```
Contributor


The Java SDK's StepContext exposes getLogger() which returns a DurableLogger enriched with execution-context metadata (SDK Reference → Step → StepContext; Logging). System.out.println works (stdout reaches CloudWatch) but loses retry-attempt counters, replay flags, and the correlation fields the SDK adds.
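As a rough illustration of what the reviewer means by metadata, the hypothetical sketch below prefixes each message with an execution ID, step name, attempt number, and replay flag. StepLoggerSketch, StepLogContext, ToyStepLogger, the field names, and the log format are all assumptions made for this example; they are not the DurableLogger API or its actual output.

```java
/**
 * Hypothetical illustration only: NOT the SDK's DurableLogger or its output format.
 * It shows the kind of context a step-scoped logger can attach that System.out.println cannot.
 */
public class StepLoggerSketch {

    /** Context a step-scoped logger could carry; field names are assumptions. */
    record StepLogContext(String executionId, String stepName, int attempt, boolean replaying) {}

    static final class ToyStepLogger {
        private final StepLogContext ctx;

        ToyStepLogger(StepLogContext ctx) {
            this.ctx = ctx;
        }

        /** Formats a log line with correlation and retry metadata prepended. */
        String format(String message) {
            return String.format("[execution=%s step=%s attempt=%d replay=%b] %s",
                    ctx.executionId(), ctx.stepName(), ctx.attempt(), ctx.replaying(), message);
        }

        void info(String message) {
            System.out.println(format(message));
        }
    }

    public static void main(String[] args) {
        ToyStepLogger log = new ToyStepLogger(
                new StepLogContext("exec-1", "validate-order", 2, false));
        // Plain println: nothing says which execution, step, or attempt produced this line.
        System.out.println("Validating order o-42");
        // Enriched: the same message now carries correlation and retry context.
        log.info("Validating order o-42");
    }
}
```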

Contributor Author


Fixed — replaced all System.out.println with stepCtx.getLogger().info() to get the DurableLogger metadata (retry-attempt counters, replay flags, correlation fields).

@NithinChandranR-AWS (Contributor, Author)

Thank you @parikhudit — both excellent catches! Fixed in commit 1c353ae0:

  1. Non-deterministic UUID — wrapped UUID.randomUUID() in ctx.step("generate-order-id", ...) so the value is checkpointed and stable across replays. If orderId is provided in input, it's used directly without a step.

  2. Structured logging — replaced all System.out.println with stepCtx.getLogger().info() to get DurableLogger metadata (retry-attempt counters, replay flags, correlation fields).

Redeployed and tested — durable execution completes successfully with both fixes. Pushing shortly.

```bash
npm install
```
5. Deploy the stack:
Contributor


Add a one-line note for first-time CDK users:

For example: "If this is the first time you deploy a CDK stack in this account/region, run `cdk bootstrap` before `cdk deploy`."

Contributor Author


Added:

> **Note:** If this is the first time you deploy a CDK stack in this account/region, run `cdk bootstrap` before `cdk deploy`.

- Wrap UUID.randomUUID() in ctx.step() to ensure stable orderId
  across replays after mid-workflow interruption
- Replace System.out.println with stepCtx.getLogger().info() for
  DurableLogger with retry-attempt counters, replay flags, and
  correlation metadata

Addresses review feedback from parikhudit.
@NithinChandranR-AWS (Contributor, Author)

Also added the cdk bootstrap note for first-time CDK users in the same commit. Thank you @parikhudit!

```diff
@@ -0,0 +1,5 @@
+FROM maven:3.9-amazoncorretto-17 AS build
```
Contributor


Unused Dockerfile, purpose unclear.

The CDK stack uses lambda.Code.fromAsset() and the README's Deployment step 3 runs mvn clean package -q directly on the host, so this Dockerfile is never invoked anywhere in the documented deploy flow.

  • If it's intended as a build-helper convenience (so contributors who don't have Maven installed locally can produce the JAR with docker build src/), please:

    • Make the JAR extractable by adding a final stage such as `FROM scratch` with `COPY --from=build /app/target/*.jar /`, so the artifact can be pulled out cleanly with `docker create` + `docker cp` (without that, the only way to retrieve the JAR is to dig into a stopped build container, which is awkward).
    • Mention it in the README so readers know the path exists, e.g.:
      Alternative build (no Maven on host required, verify before using):

```bash
docker build -t durable-builder src/
docker create --name x durable-builder
docker cp x:/app/target/. src/target/
docker rm x
```
  • If it's not actually used by anyone, please remove src/Dockerfile so it doesn't confuse readers: leaving an unreferenced Dockerfile suggests Docker is part of the deploy story when it isn't.

Either way, dropping the unused file or documenting it as a build helper is fine; what's important is that the README and the file agree on the intent.

Contributor Author


You're right, it's a leftover from an earlier Docker-based approach. Removed it — the deploy flow is just mvn package on host + Code.fromAsset() pointing at the JAR. Cleaner this way.

The CDK stack uses Code.fromAsset() with the pre-built JAR and the
README instructs mvn clean package on the host. The Dockerfile was
never part of the deploy flow.

```typescript
iam.ManagedPolicy.fromAwsManagedPolicyName(
  "service-role/AWSLambdaBasicDurableExecutionRolePolicy"
)
);
```
Contributor


Native durableConfig property + L2 Alias is cleaner than the L1 escape hatch.

The official CDK example (Deploy Lambda durable functions with IaC → AWS CDK) uses the native durableConfig property on lambda.Function and a regular lambda.Alias against fn.currentVersion:

```typescript
const fn = new lambda.Function(this, 'DurableOrderProcessorFn', {
  // ...other function props
  durableConfig: {
    executionTimeout: cdk.Duration.hours(1),
    retentionPeriod: cdk.Duration.days(7),
  },
});
const alias = new lambda.Alias(this, 'ProdAlias', {
  aliasName: 'prod',
  version: fn.currentVersion,
});
```

Two possible issues with the current escape-hatch approach:

  • Mixing L2 Function with L1 CfnVersion/CfnAlias means fn.currentVersion won't reflect the published version, which could be an issue for any future code that reads it.
  • The comment "to avoid CDK version property validation" suggests the targeted CDK version may have lacked the native property. If so, please add a comment with the minimum CDK version where the native property becomes available so a future contributor can clean this up; if the native property already exists in aws-cdk-lib@2.180.0 or latest, please switch to it.

Contributor Author


Great call. I upgraded to aws-cdk-lib@2.257.0, which has the native durableConfig property, and switched to that plus an L2 Alias against fn.currentVersion. Much cleaner, with no more escape hatches. I originally used the escape hatch because 2.180.0 didn't have the native property yet.

Switched from L1 escape hatch (CfnFunction.addOverride + CfnVersion +
CfnAlias) to native durableConfig property on lambda.Function with
L2 lambda.Alias against fn.currentVersion. Requires aws-cdk-lib@2.257.0.

Cleaner, type-safe, and consistent with official CDK docs.