The Buildkite Agent

The Buildkite agent is a small, reliable and cross-platform build runner that makes it easy to run automated builds on your own infrastructure. Its main responsibilities are polling buildkite.com for work, running build jobs, reporting back the status code and output log of the job, and uploading the job's artifacts.

This page contains reference information for Buildkite organization administrators. It covers agent installation and configuration details and how agents communicate with Buildkite. If you're working with a team that already uses Buildkite and you want to write code that agents will run, read Pipelines. If you're setting up a Buildkite organization and you don't already have agents running, read Getting started.

You (or your organization) need one or more running agents to run builds, but once you've installed the agent and got it running on your own infrastructure, you don't need to interact with it directly. Whether you're starting builds automatically with every commit or running them manually by clicking a button, Buildkite tells the agent which version control references to use, where to get the changes from, and what code to run, and the agent reports the outcome back to Buildkite.

How it works

The agent works by polling Buildkite's agent API over HTTPS. There is no need to forward ports or provide incoming firewall access, and the agents can be run across any number of machines and networks.

[Diagram: the hybrid architecture combining a SaaS platform with your infrastructure]

The agent starts by registering itself with Buildkite, and once registered it's placed into your organization's agent pool. The agent periodically polls Buildkite looking for new work, waiting to accept an available job.

After accepting a build job, the agent executes the command, streaming back the build script's output and then posting the final exit status.

While the job is running, you can use the buildkite-agent meta-data command to set and get build-wide meta-data, and the buildkite-agent artifact command to upload and download build-wide binary artifacts. These two commands let you keep build jobs completely isolated (similar to a twelve-factor web application) while still sharing state and data storage across any number of machines and networks.
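As a rough sketch of how two jobs in the same build might share state, the functions below wrap the meta-data and artifact subcommands. The function names, the release-version key, and the dist/*.tar.gz pattern are assumptions made up for illustration and are not part of the agent.

```shell
# Hypothetical helpers for two separate jobs in the same build.
# "release-version" and "dist/*.tar.gz" are made-up names for illustration.

# Job 1: record a value and upload build output for later jobs.
publish_build_outputs() {
  version="$1"
  buildkite-agent meta-data set "release-version" "$version"
  buildkite-agent artifact upload "dist/*.tar.gz"
}

# Job 2 (possibly on another machine): read the value and fetch the files
# into the current directory.
fetch_build_outputs() {
  version="$(buildkite-agent meta-data get "release-version")"
  buildkite-agent artifact download "dist/*.tar.gz" .
  echo "Fetched artifacts for version ${version}"
}
```

The first job would call publish_build_outputs "1.2.3", and a later job would call fetch_build_outputs, without either job needing direct access to the other's machine.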

Job routing

By default, Buildkite runs each job on the first available agent that matches the job's agent tags, ordered by how recently that agent completed a job. This takes advantage of warm caches to help jobs run as fast as possible. You can alter this behavior by changing the priority of some or all of your agents.
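The ordering described above can be illustrated with a small sort. This is only a sketch of the idea, not Buildkite's dispatcher: it assumes the candidate agents have already been filtered by tags, that a higher priority number is preferred, and that each input line uses a made-up format of name, priority, and the time the agent last finished a job (as a Unix timestamp).

```shell
# Illustrative only: reads lines of
#   "<name> <priority> <last_job_finished_epoch>"
# and prints the name of the agent that would be picked.
# Higher priority wins; ties go to the agent that finished a job most recently.
pick_agent() {
  sort -t ' ' -k2,2nr -k3,3nr | head -n 1 | cut -d ' ' -f 1
}
```

With three agents of equal priority, the one with the latest timestamp is selected. Raising the priority of a single agent moves it to the front regardless of when it last ran a job.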

Installation

You can install the agent on a wide variety of platforms. See the installation instructions for a full list and for information on how to get started.

Usage

$ buildkite-agent --help
Usage:

  buildkite-agent [command] [arguments...]

Available commands are:

  start      Starts a Buildkite agent
  annotate   Annotate the build page within the Buildkite UI with text from within a Buildkite job
  artifact   Upload/download artifacts from Buildkite jobs
  env        Process environment subcommands
  lock       Process lock subcommands
  meta-data  Get/set data from Buildkite jobs
  pipeline   Make changes to the pipeline of the currently running build
  bootstrap  Run a Buildkite job locally
  step       Retrieve and update the attributes of steps
  stop       Stop the agent
  redactor   Redact sensitive information from logs
  tool       Utility commands, intended for users and operators of the agent to run directly on their machines, and not as part of a Buildkite job
  secret     Interact with Pipelines Secrets
  help       Shows a list of commands or help for one command

Use "buildkite-agent [command] --help" for more information about a command.

To start an agent, you'll need your organization's agent token from the Agents page of your Buildkite dashboard. You pass the token to the agent using an environment variable or command-line flag, and it will register itself with Buildkite and wait to accept jobs.
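A minimal sketch of the environment variable approach follows. The start_agent wrapper is a hypothetical helper, and the token value must come from your own Agents page.

```shell
# Hypothetical wrapper: refuses to start unless a token has been provided
# through the BUILDKITE_AGENT_TOKEN environment variable.
start_agent() {
  if [ -z "${BUILDKITE_AGENT_TOKEN:-}" ]; then
    echo "BUILDKITE_AGENT_TOKEN is not set" >&2
    return 1
  fi
  # Make sure the variable is visible to the agent process.
  export BUILDKITE_AGENT_TOKEN
  buildkite-agent start
}

# The command-line flag alternative would look like:
#   buildkite-agent start --token "<your agent token>"
```

The environment variable form keeps the token out of the process argument list, which can matter on shared hosts.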

Configuration

The agent has a standard configuration file format on all systems to set meta-data, priority, etc. See the configuration documentation for more details.

Experimental features

We frequently introduce new experimental features to the agent. Use the --experiment flag to opt in to them and test them out:

buildkite-agent start --experiment experiment1 --experiment experiment2

Or you can set them in your agent configuration file:

experiment="experiment1,experiment2"

If an experiment doesn't exist, no error will be raised.

Please note that these experiments may change or be removed at any time, so use them at your own risk and without the expectation that they will work in future releases.

Normalized upload paths

Artifacts uploaded by buildkite-agent artifact upload will be uploaded using URI/Unix-style paths, even on Windows. This makes sure that artifacts uploaded from Windows agents are stored at URI-compatible paths.

Experimental feature

To use it, set experiment="normalised-upload-paths" in your agent configuration.

This experiment changes the artifact names displayed in Buildkite's web UI and returned by the API.

For example, with this experimental feature enabled, buildkite-agent artifact upload coverage\report.xml uploads to s3://example/coverage/report.xml instead of s3://example/coverage\report.xml.
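The effect on a path can be illustrated with a one-line conversion. This is a sketch of the observable result only, not the agent's implementation, and normalise_upload_path is a hypothetical name.

```shell
# Illustrative only: convert Windows-style separators to URI/Unix-style ones.
normalise_upload_path() {
  printf '%s\n' "$1" | tr '\\' '/'
}

# Example:
#   normalise_upload_path 'coverage\report.xml'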

Resolve commit after checkout

After repository checkout, the agent resolves BUILDKITE_COMMIT to a commit hash. This makes BUILDKITE_COMMIT useful for builds triggered against non-commit-hash refs such as HEAD.

Experimental feature

To use it, set experiment="resolve-commit-after-checkout" in your agent configuration.

Agent API

Like the Job API, this experiment exposes a separate local API for interacting with the agent process. The Agent API offers these endpoints:

GET /api/leader/v0/ping - Returns a JSON object with the current time (useful for testing the agent is alive).
GET /api/leader/v0/lock?key=<key> - Returns a JSON object containing the current state of a lock.
PATCH /api/leader/v0/lock?key=<key> - Accepts a JSON object with old and new states for a lock. The lock is then updated atomically, and a JSON object describing whether the operation proceeded is returned.

The API is exposed using a Unix Domain Socket. Unlike the Job API, the path to the socket is not available through an environment variable. Instead, there is a single (configurable) path on the system.

Experimental feature

To use the agent API, set experiment="agent-api" in your agent configuration.
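The GET endpoints listed above can be called with curl over the socket. This is a sketch under assumptions: AGENT_API_SOCKET is a hypothetical variable that you would set to the socket path used on your system, agent_api_get is a made-up helper name, and the host name in the URL is a placeholder because the connection goes through the socket.

```shell
# Hypothetical helper for the Agent API GET endpoints.
# AGENT_API_SOCKET is an assumption: set it to your system's socket path.
# $1 is the endpoint and query string, for example "ping" or "lock?key=deploy".
agent_api_get() {
  curl --silent --fail \
    --unix-socket "$AGENT_API_SOCKET" \
    "http://agent/api/leader/v0/$1"
}

# Examples (with the agent-api experiment enabled):
#   agent_api_get "ping"
#   agent_api_get "lock?key=deploy"
```

The PATCH endpoint is left out because the exact shape of its JSON body is not described here.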

Promoted experiments

The following features started as experiments before being promoted to fully supported features.

Flock file locks

Changes the file lock implementation from github.com/nightlyone/lockfile to github.com/gofrs/flock to address an issue where file locks are never released by agents that don't shut down cleanly. The new file locks are implemented at the kernel level, and are aware of when their parent process dies.

Promoted in v3.48.0. It's the default behavior, so there's no configuration required to use it. Because the old and new lock systems do not interact, we strongly recommend not running different versions of the agent on the same host.

ANSI timestamps

Outputs inline ANSI timestamps for each line of log output, enabling timestamps you can toggle in the Buildkite dashboard.

Promoted in v3.48.0. It's the default behavior, so there's no configuration required to use it. If you want to turn it off, pass the --no-ansi-timestamps flag.

Git mirrors

Git mirrors is no longer an experimental feature. Promoted in v3.47.0. You can use Git mirrors by setting the --git-mirrors-path flag. See the Git mirrors documentation to learn more about how Git mirrors work with agents running in your own self-hosted infrastructure.

Redacted variables

The Buildkite agent can redact strings that match the value of environment variables whose names match common patterns for passwords and other secure information before the build log is uploaded to Buildkite.

Promoted in v3.31.0.

See redacted-vars for more information.

Job API

Exposes a local API to introspect and mutate the state of a running job through environment variables. This lets you write scripts, hooks, and plugins in languages other than Bash, using them to interact with the agent.

Promoted in v3.64.0.

The API uses a Unix Domain Socket, whose path is exposed to running jobs with the BUILDKITE_AGENT_JOB_API_SOCKET environment variable. Calls are authenticated using the Bearer HTTP authorization scheme, with the token provided in the BUILDKITE_AGENT_JOB_API_TOKEN environment variable.

The API provides the following endpoints:

GET /api/current-job/v0/env - Returns a JSON object of all environment variables for the current job.
PATCH /api/current-job/v0/env - Accepts a JSON object of environment variables to set for the current job.
DELETE /api/current-job/v0/env - Accepts a JSON array of environment variable names to unset for the current job.

See the agent repo for the full API request and response definitions.
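As a minimal sketch, the GET endpoint can be called from a hook or job script with curl, using the two environment variables described above. The job_api_get_env function name is made up, and the host name in the URL is a placeholder because the request travels over the socket.

```shell
# Hypothetical helper: fetch the current job's environment as JSON.
# Relies on BUILDKITE_AGENT_JOB_API_SOCKET and BUILDKITE_AGENT_JOB_API_TOKEN,
# which the agent sets for running jobs.
job_api_get_env() {
  curl --silent --fail \
    --unix-socket "$BUILDKITE_AGENT_JOB_API_SOCKET" \
    --header "Authorization: Bearer $BUILDKITE_AGENT_JOB_API_TOKEN" \
    "http://job/api/current-job/v0/env"
}
```

The PATCH and DELETE calls follow the same pattern with a JSON body. Check the request definitions in the agent repo for the exact body shape before relying on them.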

The Job API is unavailable on agents running versions of Windows before build 17063, which is when Windows added Unix Domain Socket support. On an unsupported Windows agent, the agent outputs a warning and the API is unavailable.

Customizing with hooks

The agent's behavior can be customized using hooks, which are shell scripts that exist on your build machines or in each pipeline's code repository. Hooks can be used to set up secrets as well as to override default behavior. See the hooks documentation for full details.

Signal handling

When a build job is canceled, the agent sends the build job process a SIGTERM signal to allow it to exit gracefully.

If the process does not exit within the 10-second grace period, it is forcefully terminated with a SIGKILL signal. If you require a longer grace period, customize it using the cancel-grace-period agent configuration option.
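A build script can use the grace period to clean up after itself. The sketch below is one possible pattern, not something the agent requires: run_cancellable and its scratch directory argument are hypothetical, and sleep 300 stands in for the real build command.

```shell
# Hypothetical job script fragment: clean up when the job is canceled.
# $1 is a scratch directory that should not outlive the job.
run_cancellable() {
  scratch_dir="$1"
  # On SIGTERM: stop the build command, remove scratch files,
  # and exit with 128 + 15.
  trap 'kill "$child" 2>/dev/null; rm -rf "$scratch_dir"; exit 143' TERM
  # Stand-in for the real build command. Running it in the background and
  # waiting on it lets the shell run the trap as soon as the signal arrives.
  sleep 300 &
  child=$!
  wait "$child"
}
```

Cleanup that takes longer than the grace period is cut short by SIGKILL, so keep the handler brief or raise cancel-grace-period.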

The agent also accepts the following two signals directly:

SIGTERM - Instructs the agent to gracefully disconnect, after completing any job that it may be running.
SIGQUIT - Instructs the agent to forcefully disconnect, canceling any job that it may be running.

Exit codes

The agent reports its activity to Buildkite using exit codes. The most common exit codes and their descriptions can be found in the table below.

Exit code	Description
0	The job exited with a status of 0 (success)
1	The job exited with a status of 1 (most common error status)
94	The checkout timed out waiting for a Git mirrors lock
128 + signal number	The job was terminated by a signal (see note below)
255	The agent was gracefully terminated
-1	Buildkite lost contact with the agent or it stopped reporting to us

Jobs terminated by signals

When a job is terminated by a signal, the exit code will be set to 128 + the signal number. For more information about how shells manage commands terminated by signals, see the Wiki page on Exit Signals.

Exit codes for common signals:

Exit code	Signal	Name	Description
130	2	SIGINT	Terminal interrupt signal
137	9	SIGKILL	Kill (cannot be caught or ignored)
139	11	SIGSEGV	Segmentation fault (invalid memory reference)
141	13	SIGPIPE	Write on a pipe with no one to read it
143	15	SIGTERM	Termination signal (graceful)
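The two tables above can be folded into a small helper for reading exit codes in scripts or log tooling. This is an illustrative sketch: describe_exit_code is a hypothetical name, it expects an integer argument, and it covers only the codes listed on this page.

```shell
# Illustrative helper: explain an exit code using the tables above.
# $1 must be an integer.
describe_exit_code() {
  code="$1"
  case "$code" in
    0)   echo "success" ;;
    -1)  echo "Buildkite lost contact with the agent" ;;
    94)  echo "checkout timed out waiting for a Git mirrors lock" ;;
    255) echo "agent was gracefully terminated" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -lt 255 ]; then
        # 128 + signal number, so subtract 128 to recover the signal.
        echo "terminated by signal $((code - 128))"
      else
        echo "job exited with status $code"
      fi
      ;;
  esac
}
```

For example, describe_exit_code 143 reports signal 15 (SIGTERM) and describe_exit_code 137 reports signal 9 (SIGKILL).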

Troubleshooting

A common issue to troubleshoot is Buildkite losing contact with an agent, which results in a -1 exit code. After registering with the Buildkite API, an agent regularly sends heartbeat updates to indicate that it is operational. If the Buildkite API does not receive any heartbeat requests from an agent for 3 consecutive minutes, that agent is marked as lost within the next 60 seconds, and will not be assigned any further jobs.
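The timing above works out to a window of 180 to 240 seconds of silence. The function below only restates that arithmetic for use in your own monitoring. It is not how Buildkite implements the check, and heartbeat_state is a hypothetical name.

```shell
# Illustrative arithmetic only, based on the timings described above.
# $1: seconds since the agent's last heartbeat was received.
heartbeat_state() {
  silence="$1"
  if [ "$silence" -lt 180 ]; then
    echo "connected"
  elif [ "$silence" -lt 240 ]; then
    # 3 minutes have passed; the agent is marked lost within the next 60s.
    echo "may be marked lost"
  else
    echo "lost"
  fi
}
```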

Various factors can cause an agent to fail to send heartbeat updates. Common reasons include networking issues and resource constraints, such as CPU, memory, or I/O limitations on the infrastructure hosting the agent.

In such cases, it's essential to check the agent logs and examine metrics related to networking, CPU, memory, and I/O to help identify the cause of the failed heartbeat updates.

If the agents run on the Elastic CI Stack for AWS with spot instances, the abrupt termination of spot instances can also result in agents being marked as lost. To investigate this issue, you can use the log collector script to gather all relevant logs and metrics from the Elastic CI Stack for AWS.

Timeouts

A job may time out if it exceeds the maximum allowed command step timeout. Depending on the cancel-grace-period set on the agent, the job may not complete gracefully, resulting in an unexpected exit code (-1).