---
name: "Self-Healing Pipeline"
description: "Uses dynamic pipelines and Claude Code to automatically diagnose and fix broken PR builds."
author: "buildkite"
repo: "self-healing-pipeline-example"
stars: 1
demo: "https://buildkite.com/buildkite/self-healing-pipeline-example/builds/latest?branch=main"
---

# Buildkite Self-Healing Pipeline Example


<!-- docs:start -->

This example demonstrates a **self-healing pipeline** built with Buildkite [dynamic pipelines](https://buildkite.com/docs/pipelines/configure/dynamic-pipelines) and [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview). When a PR build fails, adding a label triggers an AI agent that automatically diagnoses the failure and submits a fix.



## How it works

1. You add a `buildkite-fix` label to a GitHub PR that has a failing build
2. GitHub sends a webhook to Buildkite, which starts a build
3. The first step evaluates the webhook payload — if it's not a label event, the build exits early
4. A TypeScript handler reads the payload, finds the failed build via the Buildkite REST API, and checks that the failure matches the PR's head commit
5. If there's a matching failure, the handler uses the [Buildkite SDK](https://github.com/buildkite/buildkite-sdk) to **dynamically generate a new pipeline step** and uploads it with `buildkite-agent pipeline upload`
6. That step launches Claude Code in a Docker container with access to the repo, the failed build logs (via the [Buildkite MCP server](https://github.com/buildkite/buildkite-mcp-server)), and GitHub (via `gh` CLI)
7. Claude reads the logs, diagnoses the issue, creates a fix on a new branch, opens a PR, and verifies it passes CI

### The handler pattern

The core of the handler is short — read the webhook payload from build metadata, evaluate whether to act, and generate a step with the Buildkite SDK:

```typescript
import { execSync } from "node:child_process";
import { Pipeline } from "@buildkite/buildkite-sdk";

// 1. Read the webhook payload that Buildkite stored as build metadata
const payload = JSON.parse(
  execSync("buildkite-agent meta-data get buildkite:webhook").toString(),
);

// 2. Evaluate the condition: right event, right label?
//    Optional chaining guards against payloads that carry no `label` object.
const isTrigger =
  payload.action === "labeled" &&
  payload.label?.name === process.env.TRIGGER_ON_LABEL;
if (!isTrigger) {
  process.exit(0);
}

// 3. Generate a step with the Buildkite SDK and pipe it into `pipeline upload`
const pipeline = new Pipeline();
pipeline.addStep({ label: ":robot_face: Fix the build", command: "scripts/claude.sh" });
execSync("buildkite-agent pipeline upload", { input: pipeline.toYAML() });
```

The real handler also calls the Buildkite API between steps 2 and 3 to confirm there's an actual failing build on the PR's head commit — see [`scripts/handler.ts`](https://github.com/buildkite/self-healing-pipeline-example/blob/HEAD/scripts/handler.ts).
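
As a rough illustration of that check, here is a minimal sketch. It is not the code in `scripts/handler.ts`: the `Build` interface and the function names `failedBuildsRequest` and `pickFailedBuild` are hypothetical, and it assumes the REST API's list-builds endpoint filtered by `commit` and `state`.

```typescript
// Minimal subset of a build object from the Buildkite REST API (assumed shape).
interface Build {
  number: number;
  state: string;
  commit: string;
  web_url: string;
}

// Describe the list-builds request for one pipeline, limited to failed builds
// of a single commit. `org` and `pipeline` are Buildkite slugs.
function failedBuildsRequest(
  org: string,
  pipeline: string,
  commit: string,
  token: string,
): { url: string; headers: { Authorization: string } } {
  const base =
    `https://api.buildkite.com/v2/organizations/${encodeURIComponent(org)}` +
    `/pipelines/${encodeURIComponent(pipeline)}/builds`;
  return {
    url: `${base}?commit=${encodeURIComponent(commit)}&state=failed`,
    headers: { Authorization: `Bearer ${token}` },
  };
}

// Pick the most recent failed build whose commit is exactly the PR's head SHA.
// Returns undefined when there is nothing to fix.
function pickFailedBuild(builds: Build[], headSha: string): Build | undefined {
  return builds
    .filter((b) => b.state === "failed" && b.commit === headSha)
    .sort((a, b) => b.number - a.number)[0];
}
```

In a handler, the head SHA would typically come from `payload.pull_request.head.sha`, and the request descriptor would be passed to `fetch(url, { headers })`. If `pickFailedBuild` returns `undefined`, the handler can exit without uploading any steps.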

The key Buildkite features at play:

- **`buildkite-agent pipeline upload`** — adding steps to a running build based on runtime conditions
- **`buildkite-agent meta-data`** — reading webhook payloads stored as build metadata
- **`@buildkite/buildkite-sdk`** — programmatically generating pipeline YAML in TypeScript
- **Buildkite webhooks** — triggering builds from external events
- **Buildkite Hosted Models** — proxying LLM requests through Buildkite's model provider endpoint

## What's interesting about this?

This pipeline doesn't have a fixed set of steps. Whether anything happens at all depends on the webhook payload and the state of the builds at that moment. That's the core idea behind dynamic pipelines — your pipeline logic runs at build time and decides what to do based on real conditions, not static YAML.

The self-healing use case takes this further: the pipeline not only decides *whether* to act, it decides *what* to do by handing the problem to an AI agent. This is one pattern for building agentic CI/CD workflows on Buildkite.
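
One way to picture the runtime decision described above is as a pure function from "what happened" to "which steps to upload". The sketch below is illustrative only: `planSteps`, `LabelEvent`, and `Step` are hypothetical names, and the single step mirrors the one shown in the handler pattern.

```typescript
// Subset of the GitHub pull_request webhook payload that the decision needs.
interface LabelEvent {
  action: string;
  label?: { name: string };
}

// Simplified command step (hypothetical type, not the SDK's own).
interface Step {
  label: string;
  command: string;
}

// Decide at build time which steps, if any, should be added to the build.
// An empty array means "do nothing": the build finishes without extra steps.
function planSteps(
  event: LabelEvent,
  triggerLabel: string,
  hasFailedBuild: boolean,
): Step[] {
  const labeled =
    event.action === "labeled" && event.label?.name === triggerLabel;
  if (!labeled || !hasFailedBuild) {
    return [];
  }
  return [{ label: ":robot_face: Fix the build", command: "scripts/claude.sh" }];
}
```

Keeping the decision separate from the upload makes it easy to unit test: only a non-empty result needs to reach `buildkite-agent pipeline upload`.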

## Setup

To run this yourself, you'll need:

- A [Buildkite account](https://buildkite.com/signup)
- A GitHub repository with a Buildkite pipeline configured to receive webhooks
- A [Buildkite API token](https://buildkite.com/docs/apis/rest-api#authentication) with read access to builds
- An [Anthropic API key](https://console.anthropic.com/settings/keys) (or use [Buildkite Hosted Models](https://buildkite.com/docs/pipelines/hosted-models) to proxy requests)
- Docker installed on your Buildkite agent (the Claude Code toolchain runs in a container)

With those in place:

1. Fork this repo
2. Create a Buildkite pipeline pointing to your fork with webhook support enabled
3. Configure a [GitHub webhook](https://buildkite.com/docs/integrations/github#setting-up-github-webhooks) to send `pull_request` events with the `labeled` action to Buildkite
4. Set up the required secrets in your Buildkite pipeline: `GITHUB_TOKEN` and `BUILDKITE_API_TOKEN`
5. Add the `buildkite-fix` label to a PR with a failing build and watch it go
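
If a build fails in a confusing way after setup, a small preflight check can rule out missing configuration. The following is a sketch under assumptions rather than part of this repo: `missingSettings` is a hypothetical helper, and the list of names is taken from the setup steps and the handler snippet (`GITHUB_TOKEN`, `BUILDKITE_API_TOKEN`, `TRIGGER_ON_LABEL`).

```typescript
// Names the handler is assumed to read from its environment.
const REQUIRED_SETTINGS = [
  "GITHUB_TOKEN",
  "BUILDKITE_API_TOKEN",
  "TRIGGER_ON_LABEL",
] as const;

// Return the names of required settings that are unset or blank.
function missingSettings(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SETTINGS.filter((name) => !env[name]?.trim());
}
```

Calling `missingSettings(process.env)` at the top of a handler and exiting with a clear message when the result is non-empty would surface a misconfigured secret before any API calls are made.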

<!-- docs:end -->

## Known limitations

- The handler assumes the Buildkite org slug and pipeline slug match the GitHub org and repo name. This won't always be the case — you may need to configure these separately.
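
One possible workaround, sketched here as an assumption and not implemented in this repo, is to let explicit settings override the names derived from GitHub. The variable names `TARGET_ORG_SLUG` and `TARGET_PIPELINE_SLUG` and the function `resolveSlugs` are hypothetical.

```typescript
interface Slugs {
  org: string;
  pipeline: string;
}

// Resolve the Buildkite org and pipeline slugs for a GitHub repository.
// `repoFullName` is the "owner/repo" string (for example `repository.full_name`
// in a GitHub webhook payload). Overrides win when they are set and non-empty.
function resolveSlugs(
  repoFullName: string,
  env: Record<string, string | undefined>,
): Slugs {
  const [owner = "", repo = ""] = repoFullName.split("/");
  return {
    org: env["TARGET_ORG_SLUG"] || owner,
    pipeline: env["TARGET_PIPELINE_SLUG"] || repo,
  };
}
```

With no overrides set this reproduces the current assumption, so it could be adopted without changing behavior for setups where the names already match.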

## Credits

Originally built by [Grant Colegate](https://github.com/grantc) and [Christian Nunciato](https://github.com/cnunciato) as a demo for AWS re:Invent.

## License

See [LICENSE](https://github.com/buildkite/self-healing-pipeline-example/blob/HEAD/LICENSE) (MIT)