Diagnose infrastructure-related job failures through the REST API

Buildkite's REST API now exposes more of the job and runner context that AI agents, CLI tools, and MCP clients need to tell infrastructure failures apart from code failures.

The REST job object now includes signal and signal_reason. When signal_reason is present, it explains why the Buildkite Agent terminated the job, such as agent_stop or process_run_error. Automated debugging loops can use it as a clearer indication that the runner or process failed, rather than the code under test.
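The sketch below shows how a debugging loop might read the two new fields from a job object it has already fetched. It uses only the field names described above. The helper names and the trimmed sample payloads are hypothetical. It treats any present signal_reason as "the agent terminated this job," which is how the field is described here.

```python
from typing import Any, Dict, Optional, Tuple


def termination_info(job: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (signal, signal_reason) from a REST job object.

    Either field may be missing or null, so .get() is used for both.
    """
    return job.get("signal"), job.get("signal_reason")


def terminated_by_agent(job: Dict[str, Any]) -> bool:
    """True when the agent reported why it terminated the job.

    A present signal_reason (for example "agent_stop" or
    "process_run_error") points at the runner or process rather
    than at the code under test.
    """
    _, reason = termination_info(job)
    return bool(reason)


# Hypothetical, trimmed job payloads for illustration only.
stopped_job = {"id": "job-1", "signal_reason": "agent_stop"}
code_job = {"id": "job-2", "signal_reason": None}

print(terminated_by_agent(stopped_job))  # True
print(terminated_by_agent(code_job))     # False
```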

REST agent objects now include runner environment and lifecycle fields: os_id, arch, queue, connected_at, disconnected_at, lost_at, and stopped_at. These fields are also available through the agent embedded in a job response, so tools investigating a failed job can stay on the same API path and still answer questions like "did this runner disappear just before the job failed?"
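The question "did this runner disappear just before the job failed?" can be answered by comparing the embedded agent's lifecycle timestamps with the time the job ended. This is a minimal sketch that makes three assumptions:

- The job object carries a finished_at timestamp. That field name is not part of this announcement.
- Timestamps are ISO 8601 strings ending in "Z".
- A five-minute window counts as "around the time of failure." This window is a hypothetical choice.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as "2024-05-01T10:00:00.000Z"."""
    if not value:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def runner_events_near_failure(
    job: Dict[str, Any], window: timedelta = timedelta(minutes=5)
) -> List[str]:
    """List agent lifecycle fields that fall within `window` of the job finishing."""
    agent = job.get("agent") or {}
    finished = parse_ts(job.get("finished_at"))  # assumed job field name
    if finished is None:
        return []
    hits = []
    for field in ("lost_at", "stopped_at", "disconnected_at"):
        ts = parse_ts(agent.get(field))
        # Symmetric check: the event may be just before or just after the job ended.
        if ts is not None and abs(finished - ts) <= window:
            hits.append(field)
    return hits


# Hypothetical job whose runner was marked lost 30 seconds before it finished.
job = {
    "finished_at": "2024-05-01T10:02:00.000Z",
    "agent": {
        "connected_at": "2024-05-01T09:00:00.000Z",
        "lost_at": "2024-05-01T10:01:30.000Z",
        "stopped_at": None,
        "disconnected_at": None,
    },
}
print(runner_events_near_failure(job))  # ['lost_at']
```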

Together, these additions make it easier for the Buildkite CLI, MCP clients, and other agentic CI workflows to diagnose failed jobs without falling back to the Buildkite UI. An AI agent can see whether runner infrastructure terminated a job and whether the runner was lost or stopped around the time of failure. It can then avoid blindly retrying builds that are unlikely to pass without infrastructure intervention.
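Here is one way an agentic workflow might put this together. It fetches a build through the REST builds endpoint and sorts its failed jobs before deciding whether a retry is worthwhile. The triage labels and policy are hypothetical. The script also assumes that failed jobs have state "failed" and that the build response lists its jobs, each with an embedded agent, under "jobs". Only fetch_build touches the network.

```python
import json
import urllib.request
from typing import Any, Dict

API = "https://api.buildkite.com/v2"


def fetch_build(org: str, pipeline: str, number: int, token: str) -> Dict[str, Any]:
    """GET a single build; its "jobs" array carries each job and its embedded agent."""
    url = f"{API}/organizations/{org}/pipelines/{pipeline}/builds/{number}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def triage(job: Dict[str, Any]) -> str:
    """Hypothetical policy: keep infra-looking failures out of blind retries."""
    agent = job.get("agent") or {}
    if job.get("signal_reason"):
        return "infrastructure"  # the agent reported why it terminated the job
    if agent.get("lost_at") or agent.get("stopped_at"):
        return "check-runner"  # runner went away at some point; inspect the timing
    return "code"  # no infrastructure signal; debug the code under test


def triage_build(build: Dict[str, Any]) -> Dict[str, str]:
    """Map each failed job's id to a triage label (assumes state == "failed")."""
    return {
        job["id"]: triage(job)
        for job in build.get("jobs", [])
        if job.get("state") == "failed"
    }


# Offline example with a hypothetical, trimmed build payload.
build = {
    "jobs": [
        {"id": "a", "state": "failed", "signal_reason": "process_run_error", "agent": {}},
        {"id": "b", "state": "failed", "agent": {"lost_at": "2024-05-01T10:01:30.000Z"}},
        {"id": "c", "state": "failed", "agent": {}},
        {"id": "d", "state": "passed"},
    ]
}
print(triage_build(build))  # {'a': 'infrastructure', 'b': 'check-runner', 'c': 'code'}
```

Only jobs labelled "code" are good candidates for the usual edit-and-rerun loop. The other two labels suggest escalating to whoever owns the runners instead.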

The new fields are additive and read-only, with no migrations or GraphQL schema changes.

Sarah

Start turning complexity into an advantage

Platform

Hosting options

Resources

Company

Solutions

Legal

Support