
feat: per-alarm action and severity overrides #1880

Open
sai-ray wants to merge 15 commits into main from sai/granular-alarm-targeting


sai-ray commented May 20, 2026

Fixes #1599

This PR adds alarmOverrides on ConstructHub so customers can change what fires for a specific alarm at deploy time.

Today every alarm is locked into one of three severity buckets and each bucket has one action. Customers can't reclassify a single alarm or attach a custom action to one. The existing AlarmSeverities interface only covers two named alarms.

Example usage

This is the actual stack used to verify the feature against a real AWS account; every override flavor is exercised below.

```typescript
import * as cdk from 'aws-cdk-lib/core';
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions';
import * as sns from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';
import { AlarmSeverity, ConstructHub, Isolation } from 'construct-hub';

export class TempStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const highTopic = new sns.Topic(this, 'HighTopic');
    const lowTopic = new sns.Topic(this, 'LowTopic');
    const customTopic = new sns.Topic(this, 'CustomTopic');

    new ConstructHub(this, 'ConstructHub', {
      sensitiveTaskIsolation: Isolation.UNLIMITED_INTERNET_ACCESS,
      alarmActions: {
        highSeverityAction: new SnsAction(highTopic),
        normalSeverityAction: new SnsAction(lowTopic),
      },
      alarmOverrides: {
        // (1) HIGH → LOW: downgrade
        'Sources/NpmJs/Canary/NotRunningOrFailing': { severity: AlarmSeverity.LOW },
        // (2) LOW → HIGH: upgrade (should appear on the high-sev dashboard)
        'PackageStats/Failures': { severity: AlarmSeverity.HIGH },
        // (3) actions only: bypasses the bucket action
        'Ingestion/DLQNotEmpty': { actions: [new SnsAction(customTopic)] },
        // (4) severity + actions
        'Sources/NpmJs/Canary/StaleCanaryPackage': {
          severity: AlarmSeverity.HIGH,
          actions: [new SnsAction(customTopic)],
        },
        // (5) actions: [] falls back to the bucket action (does NOT mute)
        'VersionTracker/NotRunning': { actions: [] },
      },
    });
  }
}
```

Summary of the aws cloudwatch describe-alarms output after deploy:

| Alarm | Default severity | Override | Wired to |
|---|---|---|---|
| Sources/NpmJs/Canary/NotRunningOrFailing | HIGH | severity: LOW | LowTopic |
| PackageStats/Failures | LOW | severity: HIGH | HighTopic |
| Ingestion/DLQNotEmpty | HIGH | actions: [custom] | CustomTopic |
| Sources/NpmJs/Canary/StaleCanaryPackage | MEDIUM | severity: HIGH, actions: [custom] | CustomTopic |
| VersionTracker/NotRunning | LOW | actions: [] | LowTopic (fallback) |
| Ingestion/Failure (control) | HIGH | (none) | HighTopic |
| VersionTracker/Failures (control) | LOW | (none) | LowTopic |

The two control rows (no override) keep their default bucket wiring, which shows that the overrides, not some fixed rule, are what change the wiring.

Raw aws cloudwatch describe-alarms output
--- Sources/NpmJs/Canary/NotRunningOrFailing ---
  AlarmName: AlarmOverridesSanity/ConstructHub/Sources/NpmJs/Canary/NotRunningOrFailing
  StateValue: OK
  AlarmAction: arn:aws:sns:us-east-1:XXXXXXXXXXXX:AlarmOverridesSanity-LowTopic4E7342F5-2LhqEbe3Nq27  (LowTopic)

--- PackageStats/Failures ---
  AlarmName: AlarmOverridesSanity/ConstructHub/PackageStats/Failures
  StateValue: INSUFFICIENT_DATA
  AlarmAction: arn:aws:sns:us-east-1:XXXXXXXXXXXX:AlarmOverridesSanity-HighTopic335D797C-GajKrapOI2K8  (HighTopic)

--- Ingestion/DLQNotEmpty ---
  AlarmName: AlarmOverridesSanity/ConstructHub/Ingestion/DLQNotEmpty
  StateValue: OK
  AlarmAction: arn:aws:sns:us-east-1:XXXXXXXXXXXX:AlarmOverridesSanity-CustomTopicE1837878-3SNvSZVTUsuD  (CustomTopic)

--- Sources/NpmJs/Canary/StaleCanaryPackage ---
  AlarmName: AlarmOverridesSanity/ConstructHub/Sources/NpmJs/Canary/StaleCanaryPackage
  StateValue: OK
  AlarmAction: arn:aws:sns:us-east-1:XXXXXXXXXXXX:AlarmOverridesSanity-CustomTopicE1837878-3SNvSZVTUsuD  (CustomTopic)

--- VersionTracker/NotRunning ---
  AlarmName: AlarmOverridesSanity/ConstructHub/VersionTracker/NotRunning
  StateValue: OK
  AlarmAction: arn:aws:sns:us-east-1:XXXXXXXXXXXX:AlarmOverridesSanity-LowTopic4E7342F5-2LhqEbe3Nq27  (LowTopic)

--- Ingestion/Failure (control) ---
  AlarmName: AlarmOverridesSanity/ConstructHub/Ingestion/Failure
  StateValue: OK
  AlarmAction: arn:aws:sns:us-east-1:XXXXXXXXXXXX:AlarmOverridesSanity-HighTopic335D797C-GajKrapOI2K8  (HighTopic)

--- VersionTracker/Failures (control) ---
  AlarmName: AlarmOverridesSanity/ConstructHub/VersionTracker/Failures
  StateValue: OK
  AlarmAction: arn:aws:sns:us-east-1:XXXXXXXXXXXX:AlarmOverridesSanity-LowTopic4E7342F5-2LhqEbe3Nq27  (LowTopic)

Each AlarmOverride entry accepts two optional fields:

  • severity: wire this alarm to a different bucket's action (AlarmSeverity.HIGH / MEDIUM / LOW).
  • actions: supply custom actions that bypass the buckets entirely.
  • Either, both, or neither may be set.
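For reference, a minimal sketch of the override shape those fields describe. It is self-contained, so `AlarmSeverity` and `IAlarmAction` are simplified stand-ins (the real types come from construct-hub and `aws-cdk-lib/aws-cloudwatch`), the string enum values are assumed, and `describeOverride` and the `Example/Unchanged` key are hypothetical, not part of the PR.

```typescript
// Stand-in for construct-hub's AlarmSeverity enum (string values assumed).
enum AlarmSeverity {
  HIGH = "HIGH",
  MEDIUM = "MEDIUM",
  LOW = "LOW",
}

// Stand-in for aws-cdk-lib's IAlarmAction; here it only carries a label.
interface IAlarmAction {
  readonly label: string;
}

// Sketch of the override shape: both fields are optional.
interface AlarmOverride {
  /** Route the alarm to a different bucket's action. */
  readonly severity?: AlarmSeverity;
  /** Custom actions that bypass the buckets entirely. */
  readonly actions?: IAlarmAction[];
}

// Hypothetical helper: summarize which fields an override sets.
function describeOverride(o: AlarmOverride): string {
  const parts: string[] = [];
  if (o.severity !== undefined) {
    parts.push(`severity: ${o.severity}`);
  }
  if (o.actions !== undefined) {
    parts.push(`actions: [${o.actions.map((a) => a.label).join(", ")}]`);
  }
  return parts.length > 0 ? parts.join(", ") : "(neither)";
}

const overrides: Record<string, AlarmOverride> = {
  "PackageStats/Failures": { severity: AlarmSeverity.HIGH },
  "Ingestion/DLQNotEmpty": { actions: [{ label: "custom" }] },
  "Sources/NpmJs/Canary/StaleCanaryPackage": {
    severity: AlarmSeverity.HIGH,
    actions: [{ label: "custom" }],
  },
  "Example/Unchanged": {}, // hypothetical key: neither field set
};
```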

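Keys are the alarm's CloudWatch display name relative to the ConstructHub construct, i.e. the same string a customer sees in tickets and the CloudWatch console. For example, the alarm named AlarmOverridesSanity/ConstructHub/PackageStats/Failures is targeted with the key PackageStats/Failures.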
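A minimal sketch of that key derivation, assuming the alarmName is a plain string (not an unresolved CDK token) of the form `<ConstructHub path>/<key>`, as in the raw output. `overrideKeyFor` is a hypothetical name, not the PR's actual helper.

```typescript
/**
 * Hypothetical helper: derives the override key for an alarm by stripping the
 * ConstructHub construct path (plus "/") from its alarmName. Returns undefined
 * when the alarm is not named under that path.
 */
function overrideKeyFor(alarmName: string, constructHubPath: string): string | undefined {
  const prefix = `${constructHubPath}/`;
  if (!alarmName.startsWith(prefix)) {
    return undefined;
  }
  return alarmName.slice(prefix.length);
}

// Names taken from the describe-alarms output in this PR.
const hubPath = "AlarmOverridesSanity/ConstructHub";
const overrideKey = overrideKeyFor(`${hubPath}/Sources/NpmJs/Canary/NotRunningOrFailing`, hubPath);
console.log(overrideKey); // Sources/NpmJs/Canary/NotRunningOrFailing
```

Including the trailing slash in the prefix keeps a sibling construct whose path merely starts with the same characters (say, `ConstructHubX`) from matching.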

Note

The existing AlarmActions interface calls the lowest-tier slot normalSeverity (legacy). Everywhere else in the codebase (the AlarmSeverity enum, the add*SeverityAlarm methods, and the new severity field) uses LOW. Renaming AlarmActions.normalSeverity → lowSeverity can be a separate PR.
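Concretely, the mismatch amounts to a one-to-one mapping from the enum to slot names. A sketch, assuming string-valued enum members; `highSeverityAction` and `normalSeverityAction` appear in the example stack, while `mediumSeverityAction` is an assumed name for the middle slot and `SLOT_FOR_SEVERITY` is hypothetical.

```typescript
// Stand-in for construct-hub's AlarmSeverity enum (string values assumed).
enum AlarmSeverity {
  HIGH = "HIGH",
  MEDIUM = "MEDIUM",
  LOW = "LOW",
}

// Action slots on an AlarmActions-like object. mediumSeverityAction is an
// assumed name; the other two appear in the example stack.
type ActionSlot = "highSeverityAction" | "mediumSeverityAction" | "normalSeverityAction";

// LOW is the only member whose slot name does not match the enum.
const SLOT_FOR_SEVERITY: Readonly<Record<AlarmSeverity, ActionSlot>> = {
  [AlarmSeverity.HIGH]: "highSeverityAction",
  [AlarmSeverity.MEDIUM]: "mediumSeverityAction",
  [AlarmSeverity.LOW]: "normalSeverityAction", // legacy name for the lowest tier
};
```

If the rename lands, only the LOW entry of such a table would change.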

Warning

Three previously anonymous alarms now have explicit alarmNames

The lookup mechanism keys on the alarm's CloudWatch display name. Three alarms in the codebase predate the alarmName: ${scope.node.path}/... convention and had no explicit name set:

  • MonitoredCertificate/ACMAlarm (45-day cert expiry)
  • MonitoredCertificate/EndpointAlarm (45-day cert expiry)
  • Monitoring/WebCanary/Errors (web canary error rate)

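This PR adds explicit names to all three. Because changing AlarmName requires replacement, CloudFormation will replace the existing alarms on the next deploy (it creates each new alarm and deletes the old one during cleanup). The replacement is acceptable because (a) the alarms carry no state worth migrating, since a replaced alarm simply starts in INSUFFICIENT_DATA and re-evaluates its metric, (b) any monitoring gap is bounded by the deploy duration, and (c) none of the three are time-sensitive (cert expiry is a 45-day window; the web canary fires only on sustained errors).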

Implementation

  • New AlarmOverride interface in src/api.ts.
  • Monitoring reads (alarm.node.defaultChild as CfnAlarm | CfnCompositeAlarm).alarmName, strips the ConstructHub prefix, and looks up the override. If one is found, Monitoring wires the alarm's actions itself; otherwise it falls through to the existing per-bucket logic. Each add*SeverityAlarm method gains a single early-return line; the existing logic is untouched.
  • alarmOverrides lives entirely on ConstructHub props; no per-source plumbing is needed.
  • Unknown override keys, and any registered alarm without an explicit alarmName, are surfaced as synth-time validation errors via node.addValidation. Validation only runs when alarmOverrides is non-empty, so existing customers and downstream forks are unaffected.
  • An actions: [] override falls back to the bucket action (rather than silently muting the alarm).
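The lookup-and-fallback behavior in this list can be sketched as a pure function. Everything here is a simplified stand-in: `IAlarmAction` carries only a label, bucket actions are keyed by severity, and `resolveOverrideActions` is a hypothetical name. The sketch models action wiring only; per comment (2) in the example stack, a severity override also affects dashboard placement, which is not modeled here.

```typescript
enum AlarmSeverity {
  HIGH = "HIGH",
  MEDIUM = "MEDIUM",
  LOW = "LOW",
}

// Stand-in for aws-cdk-lib's IAlarmAction: a label is enough for the sketch.
interface IAlarmAction {
  readonly label: string;
}

interface AlarmOverride {
  readonly severity?: AlarmSeverity;
  readonly actions?: IAlarmAction[];
}

// Bucket action per severity (a bucket may have no action configured).
type BucketActions = Partial<Record<AlarmSeverity, IAlarmAction>>;

/**
 * Returns the actions to attach to an alarm, or undefined when no override
 * matches, in which case the caller falls through to the per-bucket logic.
 */
function resolveOverrideActions(
  key: string,
  overrides: Readonly<Record<string, AlarmOverride>>,
  buckets: BucketActions,
  defaultSeverity: AlarmSeverity,
): IAlarmAction[] | undefined {
  // Own-property check so keys like "constructor" never match by accident.
  const override = Object.prototype.hasOwnProperty.call(overrides, key) ? overrides[key] : undefined;
  if (override === undefined) {
    return undefined;
  }
  // Non-empty custom actions bypass the buckets entirely.
  if (override.actions !== undefined && override.actions.length > 0) {
    return [...override.actions];
  }
  // No actions, or actions: [], falls back to a bucket action instead of muting.
  // If that bucket has no action configured, nothing is attached.
  const bucket = buckets[override.severity ?? defaultSeverity];
  return bucket !== undefined ? [bucket] : [];
}

// Usage, mirroring part of the example stack.
const high: IAlarmAction = { label: "HighTopic" };
const low: IAlarmAction = { label: "LowTopic" };
const custom: IAlarmAction = { label: "CustomTopic" };
const buckets: BucketActions = { [AlarmSeverity.HIGH]: high, [AlarmSeverity.LOW]: low };
const overrides: Record<string, AlarmOverride> = {
  "Sources/NpmJs/Canary/NotRunningOrFailing": { severity: AlarmSeverity.LOW },
  "Ingestion/DLQNotEmpty": { actions: [custom] },
  "VersionTracker/NotRunning": { actions: [] },
};
```

Returning `undefined` rather than an empty array for "no override" is what lets each add*SeverityAlarm keep its existing logic behind a single early return.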

Coverage

Every alarm registered through IMonitoring is overridable. Three alarms that previously had no explicit alarmName (the two cert-expiry alarms and the web canary alarm) now have one, so they're covered too.
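The coverage guarantee rests on the two synth-time checks listed under Implementation. They can be sketched as a function that returns error strings, which is the shape a constructs `node.addValidation({ validate: () => string[] })` callback expects. The `RegisteredAlarm` type, the function name, and the message texts are hypothetical.

```typescript
// Hypothetical record of an alarm registered through IMonitoring: its
// construct path, and its override key when it has an explicit alarmName.
interface RegisteredAlarm {
  readonly path: string;
  readonly key?: string;
}

// Returns one message per problem; an empty array means the config is valid.
function validateAlarmOverrides(
  overrideKeys: readonly string[],
  alarms: readonly RegisteredAlarm[],
): string[] {
  // Opt-in: with no overrides, nothing is checked (existing users unaffected).
  if (overrideKeys.length === 0) {
    return [];
  }
  const errors: string[] = [];
  const knownKeys = new Set<string>();
  for (const alarm of alarms) {
    if (alarm.key === undefined) {
      errors.push(`${alarm.path}: alarm has no explicit alarmName, so it cannot be targeted by alarmOverrides`);
    } else {
      knownKeys.add(alarm.key);
    }
  }
  for (const key of overrideKeys) {
    if (!knownKeys.has(key)) {
      errors.push(`ConstructHubProps.alarmOverrides: no alarm named "${key}"`);
    }
  }
  return errors;
}
```

Collecting messages instead of throwing on the first problem means a misconfigured stack reports every bad key and every unnamed alarm in a single synth run.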

Tests

Seven new tests in monitoring.test.ts: severity-only override; actions-only override; both; empty actions: [] falls back to the bucket action; unknown override key (synth-time error); default path with an explicit alarmName; alarm registered without an alarmName when overrides are set (synth-time error).


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

sai-ray and others added 10 commits May 20, 2026 16:10
Allow customers to override action wiring for specific alarms via the
new `alarmOverrides` prop on `ConstructHub`. Each entry, keyed by the
alarm's construct path, can set `severity` (route the alarm to a
different bucket's action) and/or `actions` (supply custom actions that
bypass the buckets entirely).
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Customer-facing change: override keys are now the alarm's CloudWatch display
name (e.g. 'Sources/NpmJs/Canary/NotRunningOrFailing') — the same string
visible in tickets and the CloudWatch console. Severity is now a plain string
('HIGH' | 'MEDIUM' | 'LOW') instead of an enum import.

Drops the AlarmPath enum; lookup reads `(alarm.node.defaultChild as CfnAlarm |
CfnCompositeAlarm).alarmName` and strips the ConstructHub prefix. Unknown
override keys are surfaced as synth-time validation errors.

Also wires up the dev-app to exercise all three override flavors so changes
can be verified end-to-end with `yarn dev:synth`.
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
The dev-app is the golden snapshot, so test scaffolding shouldn't ship in it.
Manual verification was done locally, no need to commit the topics and
alarmOverrides example into the canonical dev deployment.
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
These three alarms were the only ones in the construct tree without an
explicit `alarmName`, which meant CloudFormation generated opaque hash names
(e.g. WebAppExpirationMonitorACMAlarm12ABC34D-X7K9P2QRSTUV) and they couldn't
be targeted via `alarmOverrides`.

Setting an explicit name gives them readable ticket titles and makes them
override-able like the rest.
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
- AlarmOverride.severity uses the AlarmSeverity enum (consistent with the
  rest of the API; drops the string-literal union and severityFromString
  helper)
- align MonitoringProps and ConstructHubProps jsdoc on "CloudWatch display
  name" terminology
- throw at synth time when a registered alarm has no explicit alarmName,
  so a missing name doesn't silently make it un-overridable
- drop the `this.node.scope!` non-null assertion in favor of a safe fallback
- reword AlarmOverride.actions doc to handle the multi-action case
- add tests for default-path (alarm with name, no override) and missing-name
  (synth fails)
sai-ray force-pushed the sai/granular-alarm-targeting branch from 1e21b78 to 514a44e on May 21, 2026 23:54
github-actions[bot] and others added 4 commits May 21, 2026 23:59
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
- guard `actions: []` so an empty array falls back to the bucket action
  instead of silently muting the alarm
- collapse the redundant cast on `alarm.node.defaultChild` and document the
  literal-alarmName-only requirement
- only run the missing-alarmName validation when `alarmOverrides` is
  non-empty, so subclasses with anonymous alarms aren't rejected
- point the unmatched-key error at `ConstructHubProps.alarmOverrides`
- drop the extra trailing newline
- add a test for the `actions: []` fallback
- ConstructHubProps.alarmOverrides: severity is `AlarmSeverity.HIGH/MEDIUM/LOW`,
  not the string-literal union from an earlier draft
- AlarmOverride.actions: document that an empty array falls back to the
  bucket action rather than silently muting the alarm
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
sai-ray marked this pull request as ready for review on May 22, 2026 03:58

Development

Successfully merging this pull request may close these issues.

Allow granular targeting of Alarms
