Diagnose infrastructure-related job failures through the REST API
Buildkite's REST API now exposes more of the job and runner context that agents, CLI tools, and MCP clients need to tell infrastructure failures apart from code failures.
The REST job object now includes signal and signal_reason. When signal_reason is present, it explains why the Buildkite Agent terminated the job, with values such as agent_stop or process_run_error. That gives automated debugging loops a clearer indication that the runner or process failed, rather than the code under test.
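As a rough illustration, a debugging loop could branch on signal_reason along these lines. This is a minimal sketch, not an official client: the function name and return labels are hypothetical, and the set of reasons holds only the two examples mentioned above, so the real list of values may be longer.

```python
# Example values taken from the text above; the full set of
# signal_reason values is not listed here, so treat this as partial.
INFRA_SIGNAL_REASONS = {"agent_stop", "process_run_error"}


def classify_failure(job: dict) -> str:
    """Classify a job dict (as parsed from a REST job response).

    Returns one of three hypothetical labels:
      "infrastructure" - signal_reason is one of the known examples
      "terminated"     - a signal or an unrecognised reason is present
      "unclassified"   - neither field is set, so look at the code or logs
    """
    reason = job.get("signal_reason")
    if reason in INFRA_SIGNAL_REASONS:
        return "infrastructure"
    if reason or job.get("signal"):
        return "terminated"
    return "unclassified"
```

A tool might call classify_failure on each failed job in a build and only inspect test output for the jobs that come back "unclassified".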
REST agent objects now include runner environment and lifecycle fields: os_id, arch, queue, connected_at, disconnected_at, lost_at, and stopped_at. These fields are also available through the agent embedded in a job response, so tools investigating a failed job can stay on the same API path and still answer questions like "did this runner disappear just before the job failed?"
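The "did this runner disappear just before the job failed?" question could be answered with something like the sketch below. Several details are assumptions: that the embedded agent sits under an "agent" key, that the job carries a finished_at timestamp, that timestamps are ISO 8601 strings, and that a two-minute window is a sensible default. The helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp (assumed format); None stays None."""
    if not value:
        return None
    # Normalise a trailing "Z" so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def runner_vanished_near_failure(
    job: dict, window: timedelta = timedelta(minutes=2)
) -> bool:
    """True if the job's agent was lost, stopped, or disconnected
    within `window` of the job finishing."""
    agent = job.get("agent") or {}  # assumed key for the embedded agent
    finished = parse_ts(job.get("finished_at"))  # assumed job field
    if finished is None:
        return False
    for field in ("lost_at", "stopped_at", "disconnected_at"):
        ts = parse_ts(agent.get(field))
        if ts is not None and abs(finished - ts) <= window:
            return True
    return False
```

Because the check reads only the job response, a tool can run it without a second request to the agents endpoint.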
Together, these additions make it easier for the Buildkite CLI, MCP clients, and other agentic CI workflows to diagnose failed jobs without falling back to the Buildkite UI. An automated tool can check whether a job was terminated by runner infrastructure and whether the runner was lost or stopped around the time of failure, then avoid blindly retrying builds that are unlikely to pass without infrastructure intervention.
These are additive read-only fields, with no migrations or GraphQL schema changes.
Sarah