From c212defe419ec499c0fdc81250b21b4fcee69198 Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov
Date: Mon, 25 May 2026 14:09:03 +0200
Subject: [PATCH 1/6] Add design spec for software delivery agent skills documentation

Co-Authored-By: Claude Sonnet 4.6 (1M context)
---
 ...5-software-delivery-agent-skills-design.md | 80 +++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-25-software-delivery-agent-skills-design.md

diff --git a/docs/superpowers/specs/2026-05-25-software-delivery-agent-skills-design.md b/docs/superpowers/specs/2026-05-25-software-delivery-agent-skills-design.md
new file mode 100644
index 00000000000..b45944ce525
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-25-software-delivery-agent-skills-design.md
@@ -0,0 +1,80 @@
+# Design: Document triage-flaky-test and unblock-pr Agent Skills
+
+**Date:** 2026-05-25
+**Jira:** SDCT-405
+**Epic:** SDCT-240
+
+## Summary
+
+Add an "Agent skills" section to `content/en/getting_started/software_delivery_mcp_tools/_index.md` documenting the `triage-flaky-test` and `unblock-pr` skills. Structure mirrors `llm_observability/mcp_server.md#agent-skills`.
+
+## Skills Being Documented
+
+### triage-flaky-test
+
+Source: `dd-source/domains/mcp_services/libs/go/mcp/tools/skills/datadog/triage-flaky-test/SKILL.md`
+
+- Investigates a specific flaky test
+- Pulls 30-day failure history, top error messages, blast radius across pipelines
+- Surfaces AI-generated fix (CodeGen) if `attempt_to_fix_id` is present; otherwise proposes agent-native fix from flaky category + stack trace
+- Produces a structured triage brief: test name, category, failure rate, duration lost, codeowners, blast radius, recommendation (fix / quarantine / escalate)
+- Quarantine actions require explicit user approval before calling `update_datadog_flaky_test_states` (reversible)
+- Required toolsets: `core`, `software-delivery`
+
+### unblock-pr
+
+Source: `dd-source/domains/mcp_services/libs/go/mcp/tools/skills/datadog/unblock-pr/SKILL.md`
+
+- Investigates a failing PR CI pipeline
+- Runs blame guard per failing job: checks if job was already failing on default branch or other branches
+- Classifies each job failure as **flaky**, **infra**, or **regression**
+- Produces a triage brief with per-job classification, evidence, confidence, and recommended action
+- For flaky: chains into `triage-flaky-test`; for infra: offers CI retry via `gh run rerun`; for regression: prompts user to investigate their changes
+- Required toolsets: `core`, `software-delivery`
+
+## Page Change
+
+**File:** `content/en/getting_started/software_delivery_mcp_tools/_index.md`
+
+**Placement:** New `## Agent skills` section added after `## Setup`, before `## Further reading`.
+
+### Section structure
+
+```
+## Agent skills
+
+[intro paragraph — skills available automatically via MCP; npx for slash commands]
+
+### Install
+
+[zero-install path first, then npx optional]
+
+### Available skills
+
+[summary table: Skill | Invoke with | What it does]
+
+### Triage flaky test
+
+[description, approval gate callout, usage examples]
+
+### Unblock PR
+
+[description, chains into triage-flaky-test, usage examples]
+```
+
+### Skill set name
+
+Placeholder `dd-software-delivery` — to be confirmed when skills are added to `datadog-labs/agent-skills`.
+
+### Key content decisions
+
+- **Lead with MCP-native (zero-install)**: Skills are loaded automatically by the MCP server when the prompt matches. The `npx` install is optional and enables explicit slash command invocation.
+- **Approval gate callout for quarantine**: `triage-flaky-test` can call `update_datadog_flaky_test_states`; must note that quarantine requires explicit user approval and is reversible.
+- **unblock-pr chains into triage-flaky-test**: Document this relationship so users understand flaky failures trigger a deeper investigation automatically.
+
+## What Is Not Changing
+
+- No new pages (Option A — single page addition)
+- No changes to the Setup section or existing tools reference
+- No guide page (can be added later if content grows)
+- Translated content is managed externally — English only

From ea0f90c010fe3115cdb8fc1a6f924cdc757e993e Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov
Date: Mon, 25 May 2026 14:12:56 +0200
Subject: [PATCH 2/6] Revert "Add design spec for software delivery agent skills documentation"

This reverts commit c212defe419ec499c0fdc81250b21b4fcee69198.
---
 ...5-software-delivery-agent-skills-design.md | 80 -------------------
 1 file changed, 80 deletions(-)
 delete mode 100644 docs/superpowers/specs/2026-05-25-software-delivery-agent-skills-design.md

diff --git a/docs/superpowers/specs/2026-05-25-software-delivery-agent-skills-design.md b/docs/superpowers/specs/2026-05-25-software-delivery-agent-skills-design.md
deleted file mode 100644
index b45944ce525..00000000000
--- a/docs/superpowers/specs/2026-05-25-software-delivery-agent-skills-design.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# Design: Document triage-flaky-test and unblock-pr Agent Skills
-
-**Date:** 2026-05-25
-**Jira:** SDCT-405
-**Epic:** SDCT-240
-
-## Summary
-
-Add an "Agent skills" section to `content/en/getting_started/software_delivery_mcp_tools/_index.md` documenting the `triage-flaky-test` and `unblock-pr` skills. Structure mirrors `llm_observability/mcp_server.md#agent-skills`.
-
-## Skills Being Documented
-
-### triage-flaky-test
-
-Source: `dd-source/domains/mcp_services/libs/go/mcp/tools/skills/datadog/triage-flaky-test/SKILL.md`
-
-- Investigates a specific flaky test
-- Pulls 30-day failure history, top error messages, blast radius across pipelines
-- Surfaces AI-generated fix (CodeGen) if `attempt_to_fix_id` is present; otherwise proposes agent-native fix from flaky category + stack trace
-- Produces a structured triage brief: test name, category, failure rate, duration lost, codeowners, blast radius, recommendation (fix / quarantine / escalate)
-- Quarantine actions require explicit user approval before calling `update_datadog_flaky_test_states` (reversible)
-- Required toolsets: `core`, `software-delivery`
-
-### unblock-pr
-
-Source: `dd-source/domains/mcp_services/libs/go/mcp/tools/skills/datadog/unblock-pr/SKILL.md`
-
-- Investigates a failing PR CI pipeline
-- Runs blame guard per failing job: checks if job was already failing on default branch or other branches
-- Classifies each job failure as **flaky**, **infra**, or **regression**
-- Produces a triage brief with per-job classification, evidence, confidence, and recommended action
-- For flaky: chains into `triage-flaky-test`; for infra: offers CI retry via `gh run rerun`; for regression: prompts user to investigate their changes
-- Required toolsets: `core`, `software-delivery`
-
-## Page Change
-
-**File:** `content/en/getting_started/software_delivery_mcp_tools/_index.md`
-
-**Placement:** New `## Agent skills` section added after `## Setup`, before `## Further reading`.
-
-### Section structure
-
-```
-## Agent skills
-
-[intro paragraph — skills available automatically via MCP; npx for slash commands]
-
-### Install
-
-[zero-install path first, then npx optional]
-
-### Available skills
-
-[summary table: Skill | Invoke with | What it does]
-
-### Triage flaky test
-
-[description, approval gate callout, usage examples]
-
-### Unblock PR
-
-[description, chains into triage-flaky-test, usage examples]
-```
-
-### Skill set name
-
-Placeholder `dd-software-delivery` — to be confirmed when skills are added to `datadog-labs/agent-skills`.
-
-### Key content decisions
-
-- **Lead with MCP-native (zero-install)**: Skills are loaded automatically by the MCP server when the prompt matches. The `npx` install is optional and enables explicit slash command invocation.
-- **Approval gate callout for quarantine**: `triage-flaky-test` can call `update_datadog_flaky_test_states`; must note that quarantine requires explicit user approval and is reversible.
-- **unblock-pr chains into triage-flaky-test**: Document this relationship so users understand flaky failures trigger a deeper investigation automatically.
-
-## What Is Not Changing
-
-- No new pages (Option A — single page addition)
-- No changes to the Setup section or existing tools reference
-- No guide page (can be added later if content grows)
-- Translated content is managed externally — English only

From 50b2eae968711215f61cda98843dc39193560377 Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov
Date: Mon, 25 May 2026 14:26:25 +0200
Subject: [PATCH 3/6] [SDCT-405] Document triage-flaky-test and unblock-pr agent skills

Co-Authored-By: Claude Sonnet 4.6 (1M context)
---
 .../software_delivery_mcp_tools/_index.md | 46 +++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/content/en/getting_started/software_delivery_mcp_tools/_index.md b/content/en/getting_started/software_delivery_mcp_tools/_index.md
index 2207cd2147b..1a5f262ecab 100644
--- a/content/en/getting_started/software_delivery_mcp_tools/_index.md
+++ b/content/en/getting_started/software_delivery_mcp_tools/_index.md
@@ -97,6 +97,51 @@ https://mcp.{{< region-param key="dd_site" >}}/api/unstable/mcp-server/mcp?tools
 
 For full setup instructions including client configuration for Cursor, Claude Code, VS Code, and other AI clients, see [Set Up the Datadog MCP Server][1].
 
+## Agent skills
+
+Agent skills are prebuilt instruction sets for AI coding agents that automate common Software Delivery workflows. The `dd-software-delivery` skill set is available in the [Datadog agent-skills][6] repository. It provides two skills for triaging flaky tests and unblocking failing PR pipelines against your live CI and Test Optimization data.
+
+Skills are loaded automatically by the MCP server when your prompt matches their purpose — for example, "Why is TestMyFunc flaky?" loads `/triage-flaky-test` automatically. You can also invoke them explicitly with a slash command after installing them locally.
+
+### Install
+
+The skills are available automatically when the `software-delivery` MCP toolset is connected — no installation required. To also invoke them explicitly with a slash command, install them locally:
+
+```shell
+npx skills add datadog-labs/agent-skills --skill dd-software-delivery --full-depth -y
+```
+
+Restart Claude Code after installing for the slash commands to appear.
+
+### Available skills
+
+| Skill | Invoke with | What it does |
+|-------|-------------|-------------|
+| Triage flaky test | `/triage-flaky-test` | Get history, failure pattern, and AI category for a specific flaky test, then recommend fix, quarantine, or escalate |
+| Unblock PR | `/unblock-pr` | Attribute each CI failure on a PR as flaky, infra, or regression and propose a targeted action |
+
+### Triage flaky test
+
+`/triage-flaky-test` investigates a specific flaky test. It pulls 30-day failure history, extracts the top error messages and stack traces, and checks how many pipelines the test has impacted. If a CodeGen AI fix exists for the test, the skill surfaces it directly. Otherwise, it proposes a targeted fix based on the flaky category and stack trace. It produces a structured triage brief with a recommendation to fix, quarantine, or escalate to the owning team.
+
+If the skill recommends quarantine, it presents the proposed action and requires your explicit approval before calling `update_datadog_flaky_test_states`. All state changes are reversible.
+
+```
+/triage-flaky-test TestMyFunc
+/triage-flaky-test com.example.MyTest github.com/org/repo
+```
+
+### Unblock PR
+
+`/unblock-pr` investigates a failing PR CI pipeline. For each failing job, it checks whether the failure was already present on the default branch or on other branches — a blame guard that classifies the failure as **flaky**, **infra**, or **regression**. It produces a triage brief with per-job classification, evidence, and a recommended action.
+
+For flaky failures, the skill chains into `triage-flaky-test` for a deeper investigation. For infra failures, it offers to retry the failed jobs. For regressions, it prompts you to investigate your code changes.
+
+```
+/unblock-pr
+/unblock-pr my-feature-branch
+```
+
 ## Further reading
 
 {{< partial name="whats-next/whats-next.html" >}}
@@ -106,3 +151,4 @@ For full setup instructions including client configuration for Cursor, Claude Co
 [3]: /continuous_integration/
 [4]: /tests/
 [5]: /getting_started/site/
+[6]: https://github.com/datadog-labs/agent-skills

From ba4b1458491c7e2c476baec91c8c7b51636882fd Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov
Date: Mon, 25 May 2026 16:30:02 +0200
Subject: [PATCH 4/6] Update agent skills docs: retry tool, unblock-pr PR health, prompt fix

Co-Authored-By: Claude Sonnet 4.6 (1M context)
---
 .../software_delivery_mcp_tools/_index.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/content/en/getting_started/software_delivery_mcp_tools/_index.md b/content/en/getting_started/software_delivery_mcp_tools/_index.md
index 1a5f262ecab..47d121a4410 100644
--- a/content/en/getting_started/software_delivery_mcp_tools/_index.md
+++ b/content/en/getting_started/software_delivery_mcp_tools/_index.md
@@ -72,6 +72,9 @@ The `software-delivery` toolset includes the following tools:
 `aggregate_dora_deployments`
 : Aggregate DORA metrics—deployment frequency, change lead time, change failure rate, and recovery time—as scalar values or timeseries. For a complete DORA summary, call this tool four times in parallel, once per metric.
 
+`retry_datadog_ci_job`
+: Queue a retry for a failed GitHub Actions CI job. Requires `CiVisibilityWrite` permission and explicit user approval. Server-side safety rails cap retries at two per job over seven days. GitHub Actions only — for other CI providers, use the provider's UI to rerun.
+
 ## Example prompts
 
 After you are connected, try prompts like:
@@ -101,7 +104,7 @@ For full setup instructions including client configuration for Cursor, Claude Co
 
 Agent skills are prebuilt instruction sets for AI coding agents that automate common Software Delivery workflows. The `dd-software-delivery` skill set is available in the [Datadog agent-skills][6] repository. It provides two skills for triaging flaky tests and unblocking failing PR pipelines against your live CI and Test Optimization data.
 
-Skills are loaded automatically by the MCP server when your prompt matches their purpose — for example, "Why is TestMyFunc flaky?" loads `/triage-flaky-test` automatically. You can also invoke them explicitly with a slash command after installing them locally.
+Skills are loaded automatically by the MCP server when your prompt matches their purpose — for example, "TestMyFunc keeps failing in CI — investigate it" loads `/triage-flaky-test` automatically. You can also invoke them explicitly with a slash command after installing them locally.
 
 ### Install
 
@@ -118,7 +121,7 @@ Restart Claude Code after installing for the slash commands to appear.
 | Skill | Invoke with | What it does |
 |-------|-------------|-------------|
 | Triage flaky test | `/triage-flaky-test` | Get history, failure pattern, and AI category for a specific flaky test, then recommend fix, quarantine, or escalate |
-| Unblock PR | `/unblock-pr` | Attribute each CI failure on a PR as flaky, infra, or regression and propose a targeted action |
+| Unblock PR | `/unblock-pr` | Attribute each CI failure on a PR as flaky, infra, or regression, surface code coverage and quality violations, and propose a targeted action |
 
 ### Triage flaky test
 
@@ -133,9 +136,9 @@ If the skill recommends quarantine, it presents the proposed action and requires
 
 ### Unblock PR
 
-`/unblock-pr` investigates a failing PR CI pipeline. For each failing job, it checks whether the failure was already present on the default branch or on other branches — a blame guard that classifies the failure as **flaky**, **infra**, or **regression**. It produces a triage brief with per-job classification, evidence, and a recommended action.
+`/unblock-pr` investigates a failing PR CI pipeline. For each failing job, it checks whether the failure was already present on the default branch or on other branches — a blame guard that classifies the failure as **flaky**, **infra**, or **regression**. In parallel, it fetches the branch's code coverage and any code quality or security violations from PR insights. It produces a triage brief with per-job classification, evidence, a recommended action, and a PR Health section summarizing coverage and violations.
 
-For flaky failures, the skill chains into `triage-flaky-test` for a deeper investigation. For infra failures, it offers to retry the failed jobs. For regressions, it prompts you to investigate your code changes.
+For flaky failures, the skill chains into `triage-flaky-test` for a deeper investigation. For infra failures on GitHub Actions, it retries the failed jobs using `retry_datadog_ci_job`; for other CI providers, it provides a link to the provider's UI. For regressions, it prompts you to investigate your code changes.
 
 ```
 /unblock-pr

From 1be704c88b08d37da76e50c0d209bfd87c994c3e Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov
Date: Mon, 25 May 2026 17:58:29 +0200
Subject: [PATCH 5/6] Address review feedback: wording, use cases, retry approval accuracy

Co-Authored-By: Claude Sonnet 4.6 (1M context)
---
 .../getting_started/software_delivery_mcp_tools/_index.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/content/en/getting_started/software_delivery_mcp_tools/_index.md b/content/en/getting_started/software_delivery_mcp_tools/_index.md
index 47d121a4410..398bd452549 100644
--- a/content/en/getting_started/software_delivery_mcp_tools/_index.md
+++ b/content/en/getting_started/software_delivery_mcp_tools/_index.md
@@ -31,6 +31,8 @@ The Software Delivery MCP tools unlock AI-assisted workflows for:
 - **Reviewing code coverage**: Get coverage summaries for branches or commits, including patch coverage and breakdowns by service or code owner.
 - **Measuring DORA metrics**: Query deployment frequency, change lead time, change failure rate, and recovery time by service or team.
 - **Checking test optimization settings**: See which Test Optimization features are active for a service, including Test Impact Analysis, Early Flake Detection, and Auto Test Retries.
+- **Retrying failed CI jobs**: Queue a retry for a failed GitHub Actions job without leaving the agent session.
+- **Checking PR health**: Get a combined view of CI failures, code coverage, and quality or security violations for a pull request.
 
 ## Available tools
 
@@ -73,7 +75,7 @@ The `software-delivery` toolset includes the following tools:
 : Aggregate DORA metrics—deployment frequency, change lead time, change failure rate, and recovery time—as scalar values or timeseries. For a complete DORA summary, call this tool four times in parallel, once per metric.
 
 `retry_datadog_ci_job`
-: Queue a retry for a failed GitHub Actions CI job. Requires `CiVisibilityWrite` permission and explicit user approval. Server-side safety rails cap retries at two per job over seven days. GitHub Actions only — for other CI providers, use the provider's UI to rerun.
+: Queue a retry for a failed GitHub Actions CI job. A write operation that modifies CI state, requiring `CiVisibilityWrite` permission. Server-side limits cap retries at two per job over seven days. GitHub Actions only — for other CI providers, use the provider's UI to rerun.
 
 ## Example prompts
 
@@ -102,7 +104,7 @@ For full setup instructions including client configuration for Cursor, Claude Co
 
 ## Agent skills
 
-Agent skills are prebuilt instruction sets for AI coding agents that automate common Software Delivery workflows. The `dd-software-delivery` skill set is available in the [Datadog agent-skills][6] repository. It provides two skills for triaging flaky tests and unblocking failing PR pipelines against your live CI and Test Optimization data.
+Agent skills are prebuilt instruction sets for AI coding agents that automate common Software Delivery workflows. The `dd-software-delivery` skill set is available in the [Datadog agent-skills][6] repository. It provides two skills for triaging flaky tests and unblocking failing PR pipelines using your live CI and Test Optimization data.
 
 Skills are loaded automatically by the MCP server when your prompt matches their purpose — for example, "TestMyFunc keeps failing in CI — investigate it" loads `/triage-flaky-test` automatically. You can also invoke them explicitly with a slash command after installing them locally.
 
From 3e00c60fee3062efa0cb8df9842ec0c1cbac977a Mon Sep 17 00:00:00 2001
From: Victor Kuznetsov
Date: Tue, 26 May 2026 10:47:49 +0200
Subject: [PATCH 6/6] Fix unblock-pr table entry to include security violations

Co-Authored-By: Claude Sonnet 4.6 (1M context)
---
 .../en/getting_started/software_delivery_mcp_tools/_index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/getting_started/software_delivery_mcp_tools/_index.md b/content/en/getting_started/software_delivery_mcp_tools/_index.md
index 398bd452549..51a3d58940b 100644
--- a/content/en/getting_started/software_delivery_mcp_tools/_index.md
+++ b/content/en/getting_started/software_delivery_mcp_tools/_index.md
@@ -123,7 +123,7 @@ Restart Claude Code after installing for the slash commands to appear.
 | Skill | Invoke with | What it does |
 |-------|-------------|-------------|
 | Triage flaky test | `/triage-flaky-test` | Get history, failure pattern, and AI category for a specific flaky test, then recommend fix, quarantine, or escalate |
-| Unblock PR | `/unblock-pr` | Attribute each CI failure on a PR as flaky, infra, or regression, surface code coverage and quality violations, and propose a targeted action |
+| Unblock PR | `/unblock-pr` | Attribute each CI failure on a PR as flaky, infra, or regression, surface code coverage and quality or security violations, and propose a targeted action |
 
 ### Triage flaky test