The Buildkite Agent

The Buildkite agent is a small, reliable and cross-platform build runner that makes it easy to run automated builds on your own infrastructure. Its main responsibilities are polling buildkite.com for work, running build jobs, reporting back the status code and output log of the job, and uploading the job's artifacts.

This page contains reference information for Buildkite organization administrators. It covers agent installation and configuration details and how agents communicate with Buildkite. If you're working with a team that already uses Buildkite and you want to write code that agents will run, read Pipelines. If you're setting up a Buildkite organization and you don't already have agents running, read Getting started.

You (or your organization) need one or more running agents to run builds, but once you've installed the agent and got it running on your own infrastructure, you don't need to interact with it directly. Whether you're starting builds automatically with every commit, or running them manually by clicking a button, Buildkite handles everything from telling the agent what version control references to use, where to get the changes from, and what code to run; as well as reporting the outcome back to Buildkite.com.

How it works

The agent works by polling Buildkite's agent API over HTTPS. There is no need to forward ports or provide incoming firewall access, and the agents can be run across any number of machines and networks.

Shows the hybrid architecture combining a SaaS platform with your infrastructure

The agent starts by registering itself with Buildkite, and once registered it's placed into your organization's agents pool. The agent periodically polls Buildkite looking for new work, waiting to accept an available job.

After accepting a build job the agent will execute the command, streaming back the build script's output and then posting the final exit status.

Whilst the job is running you can use the buildkite-agent meta-data command to set and get build-wide meta-data, and buildkite-agent artifact for fetching and retrieving binary build-wide artifacts. These two commands allow you to have completely isolated build jobs (similar to a 12 factor web application) but have access to shared state and data storage across any number of machines and networks.

Installation

You can install the agent on a wide variety of platforms, see the installation instructions for a full list and for information on how to get started.

Usage

$ buildkite-agent --help
Usage:

  buildkite-agent [command] [arguments...]

Available commands are:

  start      Starts a Buildkite agent
  annotate   Annotate the build page within the Buildkite UI with text from within a Buildkite job
  artifact   Upload/download artifacts from Buildkite jobs
  env        Process environment subcommands
  lock       Process lock subcommands
  meta-data  Get/set data from Buildkite jobs
  pipeline   Make changes to the pipeline of the currently running build
  bootstrap  Run a Buildkite job locally
  step       Retrieve and update the attributes of steps
  help       Shows a list of commands or help for one command

Use "buildkite-agent [command] --help" for more information about a command.

To start an agent you'll need your organization's agent token from the Agents page of your Buildkite dashboard. You pass the token to the agent using an environment variable or command line flag, and it will register itself with Buildkite and wait to accept jobs.

Configuration

The agent has a standard configuration file format on all systems to set meta-data, priority, etc. See the configuration documentation for more details.

Experimental features

We frequently introduce new experimental features to the agent. Use the --experiment flag to opt-in to them and test them out:

buildkite-agent start --experiment experiment1 --experiment experiment2

Or you can set them in your agent configuration file:

experiment="experiment1,experiment2"

If an experiment doesn't exist, no error will be raised.

Please note that there is every chance we will remove or change these experiments, so using them should be at your own risk and without the expectation that they will work in future!

Normalised upload paths

Artifacts uploaded by buildkite-agent artifact upload will be uploaded using URI/Unix-style paths, even on Windows. This makes sure that artifacts uploaded from Windows agents are stored in a URI-compatible URL.

Experimental feature

To use it, set experiment="normalised-upload-paths" in your agent configuration.

Artifact names displayed in Buildkite's web UI, as well as in the API, are changed by this.

For example, when using this experimental feature buildkite-agent artifact upload coverage\report.xml uploads to s3://example/coverage/report.xml instead of to s3://example/coverage\report.xml.

Resolve commit after checkout

After repository checkout, resolve BUILDKITE_COMMIT to a commit hash. This makes BUILDKITE_COMMIT useful for builds triggered against non-commit-hash refs such as HEAD.

Experimental feature

To use it, set experiment="resolve-commit-after-checkout" in your agent configuration.

Kubernetes exec

Modifies start and bootstrap to execute in separate Kubernetes containers in the same pod.

The Buildkite Agent Stack for Kubernetes uses this experiment to run Buildkite steps as Kubernetes jobs. It is not expected to work outside of that context. See the README for more details.

Experimental feature

To use Kubernetes exec, set experiment="kubernetes-exec" in your agent configuration.

Job API

Exposes a local API to introspect and mutate the state of a running job through environment variables. This lets you write scripts, hooks, and plugins in languages other than Bash, using them to interact with the agent.

The API uses a Unix Domain Socket, whose path is exposed to running jobs with the BUILDKITE_AGENT_JOB_API_SOCKET environment variable. Calls are authenticated using the Bearer HTTP Authorization scheme made available through a token in the BUILDKITE_AGENT_JOB_API_TOKEN environment variable.

The API provides the following endpoints:

  • GET /api/current-job/v0/env - Returns a JSON object of all environment variables for the current job.
  • PATCH /api/current-job/v0/env - Accepts a JSON object of environment variables to set for the current job.
  • DELETE /api/current-job/v0/env - Accepts a JSON array of environment variable names to unset for the current job.

See the agent repo for the full API request and response definitions.

The job API is unavailable on agents running versions of Windows before build 17063, as this was when Windows added Unix Domain Socket support. If you enable this experiment on an unsupported Windows agent, the agent outputs a warning and the API is unavailable.

Experimental feature

To use the job API, set experiment="job-api" in your agent configuration.

Agent API

Like the Job API experiment, this exposes a (separate) local API for interacting with the agent process. The Agent API offers these endpoints:

  • GET /api/leader/v0/ping - Returns a JSON object with the current time (useful for testing the agent is alive).
  • GET /api/leader/v0/lock?key=<key> - Returns a JSON object containing the current state of a lock.
  • PATCH /api/leader/v0/lock?key=<key> - Accepts a JSON object with old and new states for a lock. The lock is then updated atomically, and a JSON object describing whether the operation proceeded is returned.

The API is exposed using a Unix Domain Socket. Unlike the job-api, the path to the socket is not available through a environment variable—rather, there is a single (configurable) path on the system.

Experimental feature

To use the agent API, set experiment="agent-api" in your agent configuration.

Customizing with hooks

The agent's behavior can be customized using hooks, which are shell scripts that exist on your build machines or in each pipeline's code repository. Hooks can be used to set up secrets as well as overriding default behavior. See the hooks documentation for full details.

Signal handling

When a build job is canceled the agent will send the build job process a SIGTERM signal to allow it to gracefully exit.

If the process does not exit within the 10s grace period it will be forcefully terminated with a SIGKILL signal. If you require a longer grace period, it can be customized using the cancel-grace-period agent configuration option.

The agent also accepts the following two signals directly:

  • SIGTERM - Instructs the agent to gracefully disconnect, after completing any job that it may be running.
  • SIGQUIT - Instructs the agent to forcefully disconnect, cancelling any job that it may be running.

Exit codes

The agent reports its activity to Buildkite using exit codes. The most common exit codes and their descriptions can be found in the table below.

0 The job exited with a status of 0 (success)
1 The job exited with a status of 1 (most common error status)
128 + signal number The job was terminated by a signal (see note below)
255 The agent was gracefully terminated
-1 Buildkite lost contact with the agent or it stopped reporting to us

Jobs terminated by signals

When a job is terminated by a signal, the exit code will be set to 128 + the signal number. For more information about how shells manage commands terminated by signals, see the Wiki page on Exit Signals.

Exit codes for common signals:

Exit code Signal Name Description
130 2 SIGINT Terminal interrupt signal
137 9 SIGKILL Kill (cannot be caught or ignored)
139 11 SIGSEGV Segmentation fault; Invalid memory reference
141 13 SIGPIPE Write on a pipe with no one to read it
143 15 SIGTERM Termination signal (graceful)

Troubleshooting

One issue you sometimes need to troubleshoot is when Buildkite loses contact with an agent, resulting in a -1 exit code. After registering with the Buildkite API, an agent regularly sends heartbeat updates to indicate that it is operational. If the Buildkite API does not receive any heartbeat requests from an agent for 10 consecutive minutes, that agent is marked as lost and will not be assigned any further jobs.

Various factors can cause an agent to fail to send heartbeat updates. Common reasons include networking issues and resource constraints, such as CPU, memory, or I/O limitations on the infrastructure hosting the agent.

In such cases, it's essential to check the agent logs and examine metrics related to networking, CPU, memory, and I/O to help identify the cause of the failed heartbeat updates.

If the agents run on the Elastic CI Stack for AWS with spot instances, the abrupt termination of spot instances can also result in marking agents as lost. To investigate this issue, you can use the log collector script script to gather all relevant logs and metrics from the Elastic CI Stack for AWS.