Buildkite Self-Healing Pipeline Example
This example demonstrates a self-healing pipeline built with Buildkite dynamic pipelines and Claude Code. When a PR build fails, adding a label triggers an AI agent that automatically diagnoses the failure and submits a fix.
How it works
- You add a buildkite-fix label to a GitHub PR that has a failing build
- GitHub sends a webhook to Buildkite, which starts a build
- The first step evaluates the webhook payload; if it’s not a label event, the build exits early
- A TypeScript handler reads the payload, finds the failed build via the Buildkite REST API, and checks that the failure matches the PR’s head commit
- If there’s a matching failure, the handler uses the Buildkite SDK to dynamically generate a new pipeline step and uploads it with buildkite-agent pipeline upload
- That step launches Claude Code in a Docker container with access to the repo, the failed build logs (via the Buildkite MCP server), and GitHub (via the gh CLI)
- Claude reads the logs, diagnoses the issue, creates a fix on a new branch, opens a PR, and verifies it passes CI
The handler pattern
The core of the handler is short — read the webhook payload from build metadata, evaluate whether to act, and generate a step with the Buildkite SDK:
import { execSync } from "node:child_process";
import { Pipeline } from "@buildkite/buildkite-sdk";
// 1. Read the webhook payload that Buildkite stored as build metadata
const payload = JSON.parse(
  execSync("buildkite-agent meta-data get buildkite:webhook").toString(),
);
// 2. Evaluate the condition: right event, right label?
//    (optional chaining guards against events that carry no label object)
if (payload.action !== "labeled" || payload.label?.name !== process.env.TRIGGER_ON_LABEL) {
  process.exit(0);
}
// 3. Generate a step with the Buildkite SDK and pipe it into `pipeline upload`
const pipeline = new Pipeline();
pipeline.addStep({ label: ":robot_face: Fix the build", command: "scripts/claude.sh" });
execSync("buildkite-agent pipeline upload", { input: pipeline.toYAML() });
The real handler also calls the Buildkite API between steps 2 and 3 to confirm there’s an actual failing build on the PR’s head commit — see scripts/handler.ts.
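The check between steps 2 and 3 might look roughly like the sketch below. It is not the code in scripts/handler.ts: the helper names `failedBuildsUrl` and `findFailedBuildForCommit` and the `BuildSummary` type are hypothetical, and only a few of the fields the Buildkite REST API returns for a build are modeled. It assumes the head commit comes from `pull_request.head.sha` in the GitHub payload and that the builds list endpoint is filtered with its `commit` and `state` query parameters.

```typescript
// Hypothetical subset of the fields Buildkite returns for each build.
interface BuildSummary {
  number: number;
  state: string; // e.g. "passed", "failed", "running"
  commit: string; // full commit SHA the build ran against
  web_url: string;
}

// Build the list-builds URL for one pipeline, filtered to failed builds
// on a single commit. The commit filter expects a full SHA.
function failedBuildsUrl(orgSlug: string, pipelineSlug: string, headSha: string): string {
  const base = "https://api.buildkite.com/v2";
  const path = `/organizations/${encodeURIComponent(orgSlug)}/pipelines/${encodeURIComponent(pipelineSlug)}/builds`;
  return `${base}${path}?commit=${encodeURIComponent(headSha)}&state=failed`;
}

// Re-check the response locally and return the most recent failed build
// for the head commit, or undefined when there is nothing to fix.
function findFailedBuildForCommit(builds: BuildSummary[], headSha: string): BuildSummary | undefined {
  return builds
    .filter((b) => b.state === "failed" && b.commit === headSha)
    .sort((a, b) => b.number - a.number)[0];
}
```

A handler would fetch the URL with a bearer token (the BUILDKITE_API_TOKEN secret), pass the parsed JSON array to `findFailedBuildForCommit`, and exit early when the result is undefined.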
The key Buildkite features at play:
- buildkite-agent pipeline upload: adding steps to a running build based on runtime conditions
- buildkite-agent meta-data: reading webhook payloads stored as build metadata
- @buildkite/buildkite-sdk: programmatically generating pipeline YAML in TypeScript
- Buildkite webhooks: triggering builds from external events
- Buildkite Hosted Models: proxying LLM requests through Buildkite’s model provider endpoint
What’s interesting about this?
This pipeline doesn’t have a fixed set of steps. Whether anything happens at all depends on the webhook payload and the state of the builds at that moment. That’s the core idea behind dynamic pipelines — your pipeline logic runs at build time and decides what to do based on real conditions, not static YAML.
The self-healing use case takes this further: the pipeline not only decides whether to act, it decides what to do by handing the problem to an AI agent. This is one pattern for building agentic CI/CD workflows on Buildkite.
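To make the "decide at build time" idea concrete, here is a minimal sketch that separates the decision from the side effects. It deliberately skips the Buildkite SDK and returns a plain object instead; the `planPipeline` function and its types are hypothetical and do not appear in this repo. The step label and command are the ones used in the handler excerpt above.

```typescript
// Hypothetical shapes: only the fields this decision needs.
interface LabelEvent {
  action?: string;
  label?: { name?: string };
}
interface CommandStep {
  label: string;
  command: string;
}
interface PipelineDefinition {
  steps: CommandStep[];
}

// Pure decision function: given the webhook event, the trigger label, and
// whether a failed build was found, return the steps to add (possibly none).
function planPipeline(
  event: LabelEvent,
  triggerLabel: string,
  hasFailedBuild: boolean,
): PipelineDefinition {
  const labelMatches = event.action === "labeled" && event.label?.name === triggerLabel;
  if (!labelMatches || !hasFailedBuild) {
    return { steps: [] }; // nothing to do for this build
  }
  return {
    steps: [{ label: ":robot_face: Fix the build", command: "scripts/claude.sh" }],
  };
}
```

Because the function has no side effects, it can be unit tested without an agent. A caller would skip the upload when `steps` is empty and otherwise serialize the result with `JSON.stringify` and pipe it to buildkite-agent pipeline upload, which accepts JSON as well as YAML.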
Setup
To run this yourself, you’ll need:
- A Buildkite account
- A GitHub repository with a Buildkite pipeline configured to receive webhooks
- A Buildkite API token with read access to builds
- An Anthropic API key (or use Buildkite Hosted Models to proxy requests)
- Docker installed on your Buildkite agent (the Claude Code toolchain runs in a container)
Then:
- Fork this repo
- Create a Buildkite pipeline pointing to your fork with webhook support enabled
- Configure a GitHub webhook to send pull_request events with the labeled action to Buildkite
- Set up the required secrets in your Buildkite pipeline: GITHUB_TOKEN and BUILDKITE_API_TOKEN
- Add the buildkite-fix label to a PR with a failing build and watch it go
Known limitations
- The handler assumes the Buildkite org slug and pipeline slug match the GitHub org and repo name. This won’t always be the case — you may need to configure these separately.
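One possible way around this limitation is to accept explicit overrides and fall back to the GitHub names only when none are given. The sketch below is an assumption, not something the handler currently does: `resolveTarget` and the `SlugOverrides` type are hypothetical, and where the overrides come from (for example environment variables you define yourself) is left to the caller. It relies on the `repository.full_name` field of the GitHub webhook payload, which has the form "owner/repo".

```typescript
// Hypothetical override container; populate it from wherever you keep config.
interface SlugOverrides {
  orgSlug?: string;
  pipelineSlug?: string;
}
interface BuildkiteTarget {
  orgSlug: string;
  pipelineSlug: string;
}

// Prefer explicit slugs; otherwise fall back to the GitHub owner and repo
// name, which is the assumption the current handler makes.
function resolveTarget(repoFullName: string, overrides: SlugOverrides = {}): BuildkiteTarget {
  const [owner, repo] = repoFullName.split("/");
  if (!owner || !repo) {
    throw new Error(`Expected "owner/repo", got "${repoFullName}"`);
  }
  return {
    orgSlug: overrides.orgSlug || owner,
    pipelineSlug: overrides.pipelineSlug || repo,
  };
}
```

For example, `resolveTarget("acme/web-app", { pipelineSlug: "web-app-ci" })` keeps "acme" as the org slug but points at a pipeline whose slug differs from the repo name.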
Credits
Originally built by Grant Colegate and Christian Nunciato as a demo for AWS re:Invent.
License
See LICENSE (MIT)



