Buildkite Self-Healing Pipeline Example

This example demonstrates a self-healing pipeline built with Buildkite dynamic pipelines and Claude Code. When a PR build fails, adding a label triggers an AI agent that automatically diagnoses the failure and submits a fix.

How it works

1. You add a buildkite-fix label to a GitHub PR that has a failing build
2. GitHub sends a webhook to Buildkite, which starts a build
3. The first step evaluates the webhook payload; if it’s not a label event, the build exits early
4. A TypeScript handler reads the payload, finds the failed build via the Buildkite REST API, and checks that the failure matches the PR’s head commit
5. If there’s a matching failure, the handler uses the Buildkite SDK to dynamically generate a new pipeline step and uploads it with buildkite-agent pipeline upload
6. That step launches Claude Code in a Docker container with access to the repo, the failed build logs (via the Buildkite MCP server), and GitHub (via gh CLI)
7. Claude reads the logs, diagnoses the issue, creates a fix on a new branch, opens a PR, and verifies it passes CI
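
The early-exit check on the webhook payload can be sketched as a small pure function. This is a minimal sketch, not the code in this repo: the helper name shouldAttemptFix is hypothetical, and the interface covers only the two fields of GitHub's pull_request webhook payload that the check needs.

```typescript
// Minimal slice of GitHub's pull_request webhook payload (only the fields used here).
interface LabelEvent {
  action?: string;
  label?: { name?: string };
}

// Hypothetical helper: true only for a "labeled" event that carries the trigger label.
function shouldAttemptFix(payload: LabelEvent, triggerLabel: string): boolean {
  return payload.action === "labeled" && payload.label?.name === triggerLabel;
}
```

Keeping the decision separate from process.exit makes it easy to unit test without a running build.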

The handler pattern

The core of the handler is short — read the webhook payload from build metadata, evaluate whether to act, and generate a step with the Buildkite SDK:

import { execSync } from "node:child_process";
import { Pipeline } from "@buildkite/buildkite-sdk";

// 1. Read the webhook payload that Buildkite stored as build metadata
const payload = JSON.parse(
  execSync("buildkite-agent meta-data get buildkite:webhook").toString(),
);

// 2. Evaluate the condition: right event, right label? (`label` is absent on non-label events)
if (payload.action !== "labeled" || payload.label?.name !== process.env.TRIGGER_ON_LABEL) {
  process.exit(0);
}

// 3. Generate a step with the Buildkite SDK and pipe it into `pipeline upload`
const pipeline = new Pipeline();
pipeline.addStep({ label: ":robot_face: Fix the build", command: "scripts/claude.sh" });
execSync("buildkite-agent pipeline upload", { input: pipeline.toYAML() });

The real handler also calls the Buildkite API between steps 2 and 3 to confirm there’s an actual failing build on the PR’s head commit — see scripts/handler.ts.
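
As a rough illustration of that check, the sketch below builds a request URL for the Buildkite REST API's list-builds endpoint, filtered by commit and state, and then picks the failed build for the PR's head commit. It is a sketch under assumptions, not the contents of scripts/handler.ts: the function names failedBuildsUrl and findFailedBuild are hypothetical, and the Build interface lists only a few of the fields the API returns.

```typescript
// Subset of the fields on a build object returned by the Buildkite REST API.
interface Build {
  number: number;
  state: string;
  commit: string;
  web_url: string;
}

// Hypothetical helper: list-builds URL for one pipeline, limited to failed builds of one commit.
// The `commit` filter expects the full SHA.
function failedBuildsUrl(org: string, pipeline: string, commit: string): string {
  const base = "https://api.buildkite.com/v2";
  return (
    `${base}/organizations/${encodeURIComponent(org)}` +
    `/pipelines/${encodeURIComponent(pipeline)}/builds` +
    `?commit=${encodeURIComponent(commit)}&state=failed`
  );
}

// Hypothetical helper: re-check the response locally rather than trusting the query filters alone.
function findFailedBuild(builds: Build[], headSha: string): Build | undefined {
  return builds.find((b) => b.state === "failed" && b.commit === headSha);
}
```

A handler would request that URL with an Authorization: Bearer header carrying BUILDKITE_API_TOKEN, pass the parsed JSON array to findFailedBuild along with the payload's pull_request.head.sha, and exit without uploading a step when the result is undefined.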

The key Buildkite features at play:

buildkite-agent pipeline upload — adding steps to a running build based on runtime conditions
buildkite-agent meta-data — reading webhook payloads stored as build metadata
@buildkite/buildkite-sdk — programmatically generating pipeline YAML in TypeScript
Buildkite webhooks — triggering builds from external events
Buildkite Hosted Models — proxying LLM requests through Buildkite’s model provider endpoint

What’s interesting about this?

This pipeline doesn’t have a fixed set of steps. Whether anything happens at all depends on the webhook payload and the state of the builds at that moment. That’s the core idea behind dynamic pipelines — your pipeline logic runs at build time and decides what to do based on real conditions, not static YAML.

The self-healing use case takes this further: the pipeline not only decides whether to act, it decides what to do by handing the problem to an AI agent. This is one pattern for building agentic CI/CD workflows on Buildkite.

Setup

To run this yourself, you’ll need:

A Buildkite account
A GitHub repository with a Buildkite pipeline configured to receive webhooks
A Buildkite API token with read access to builds
An Anthropic API key (or use Buildkite Hosted Models to proxy requests)
Docker installed on your Buildkite agent (the Claude Code toolchain runs in a container)

1. Fork this repo
2. Create a Buildkite pipeline pointing to your fork with webhook support enabled
3. Configure a GitHub webhook to send pull_request events with the labeled action to Buildkite
4. Set up the required secrets in your Buildkite pipeline: GITHUB_TOKEN and BUILDKITE_API_TOKEN
5. Add the buildkite-fix label to a PR with a failing build and watch it go

Known limitations

The handler assumes the Buildkite org slug and pipeline slug match the GitHub org and repo name. This won’t always be the case — you may need to configure these separately.
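
One way to configure the slugs separately is to let environment variables override the names taken from GitHub. The sketch below is only an illustration: resolveSlugs and the variables FIX_BUILDKITE_ORG_SLUG and FIX_BUILDKITE_PIPELINE_SLUG are hypothetical names that this repo does not define.

```typescript
interface Repo {
  owner: string; // GitHub org or user, e.g. repository.owner.login in the webhook payload
  name: string;  // GitHub repository name
}

interface Slugs {
  org: string;
  pipeline: string;
}

// Hypothetical helper: prefer explicit overrides, fall back to the GitHub names.
// The environment is passed in so the function stays pure and testable.
function resolveSlugs(repo: Repo, env: Record<string, string | undefined>): Slugs {
  return {
    org: env["FIX_BUILDKITE_ORG_SLUG"] || repo.owner,
    pipeline: env["FIX_BUILDKITE_PIPELINE_SLUG"] || repo.name,
  };
}
```

In a handler this could be called as resolveSlugs({ owner, name }, process.env), so existing setups keep working and only mismatched ones need the extra variables.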

Credits

Originally built by Grant Colegate and Christian Nunciato as a demo for AWS re:Invent.

License

See LICENSE (MIT)
