The Buildkite Agent
The Buildkite agent is a small, reliable and cross-platform build runner that makes it easy to run automated builds on your own infrastructure. Its main responsibilities are polling buildkite.com for work, running build jobs, reporting back the status code and output log of the job, and uploading the job's artifacts.
This page contains reference information for Buildkite organization administrators. It covers agent installation and configuration details and how agents communicate with Buildkite. If you're working with a team that already uses Buildkite and you want to write code that agents will run, read Pipelines. If you're setting up a Buildkite organization and you don't already have agents running, read Getting started.
You (or your organization) need one or more running agents to run builds, but once you've installed the agent and got it running on your own infrastructure, you don't need to interact with it directly. Whether you start builds automatically with every commit or manually by clicking a button, Buildkite handles everything, from telling the agent which version control references to use, where to get the changes from, and what code to run, to reporting the outcome back to Buildkite.com.
How it works
The agent works by polling Buildkite's agent API over HTTPS. There is no need to forward ports or provide incoming firewall access, and the agents can be run across any number of machines and networks.
The agent starts by registering itself with Buildkite, and once registered it's placed into your organization's agents pool. The agent periodically polls Buildkite looking for new work, waiting to accept an available job.
After accepting a build job, the agent executes the command, streams back the build script's output, and then posts the final exit status.
Whilst the job is running you can use the buildkite-agent meta-data command to set and get build-wide meta-data, and the buildkite-agent artifact command to upload and download binary build-wide artifacts. These two commands allow you to have completely isolated build jobs (similar to a 12 factor web application) that still have access to shared state and data storage across any number of machines and networks.
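As a minimal sketch of how two otherwise isolated jobs might share state with these commands: one step records a value and uploads a file, and a later step (possibly on a different machine) reads both back. The function names, the release-version key, the 1.2.3 value, and the dist/app.tar.gz path are hypothetical and only for illustration.

```shell
# Step 1: record a build-wide value and upload a file as an artifact.
# "release-version" and "dist/app.tar.gz" are hypothetical names.
publish_outputs() {
  buildkite-agent meta-data set "release-version" "1.2.3"
  buildkite-agent artifact upload "dist/app.tar.gz"
}

# Step 2: read the value back and download the artifact into the
# current directory. This can run on any agent in the same build.
consume_outputs() {
  version="$(buildkite-agent meta-data get "release-version")"
  buildkite-agent artifact download "dist/app.tar.gz" .
  echo "Deploying version ${version}"
}
```

Each function would be called from the command of its own pipeline step. Neither step needs to know which machine the other ran on.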
Installation
You can install the agent on a wide variety of platforms. See the installation instructions for a full list and for information on how to get started.
Usage
$ buildkite-agent --help
Usage:
  buildkite-agent [command] [arguments...]
Available commands are:
  start      Starts a Buildkite agent
  annotate   Annotate the build page within the Buildkite UI with text from within a Buildkite job
  artifact   Upload/download artifacts from Buildkite jobs
  env        Process environment subcommands
  lock       Process lock subcommands
  meta-data  Get/set data from Buildkite jobs
  pipeline   Make changes to the pipeline of the currently running build
  bootstrap  Run a Buildkite job locally
  step       Retrieve and update the attributes of steps
  redactor   Redact sensitive information from logs
  tool       Utility commands, intended for users and operators of the agent to run directly on their machines, and not as part of a Buildkite job
  secret     Interact with Pipelines Secrets
  help       Shows a list of commands or help for one command
Use "buildkite-agent [command] --help" for more information about a command.
To start an agent you'll need your organization's agent token from the Agents page of your Buildkite dashboard. You pass the token to the agent using an environment variable or command line flag, and it will register itself with Buildkite and wait to accept jobs.
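The two ways of passing the token could look like the sketch below. It assumes the BUILDKITE_AGENT_TOKEN environment variable and the --token flag. The wrapper function names are hypothetical, and the token itself is passed in as an argument rather than shown.

```shell
# Option 1: pass the token through the environment.
# The variable is set only for this one invocation.
start_with_env_token() {
  BUILDKITE_AGENT_TOKEN="$1" buildkite-agent start
}

# Option 2: pass the token as a command line flag.
start_with_flag_token() {
  buildkite-agent start --token "$1"
}
```

In practice you would pick one of the two. The environment variable form keeps the token out of the process's argument list, which other users on the same host can often see.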
Configuration
The agent has a standard configuration file format on all systems to set meta-data, priority, etc. See the configuration documentation for more details.
Experimental features
We frequently introduce new experimental features to the agent. Use the --experiment flag to opt in to them and test them out:
buildkite-agent start --experiment experiment1 --experiment experiment2
Or you can set them in your agent configuration file:
experiment="experiment1,experiment2"
If an experiment doesn't exist, no error will be raised.
Please note that there is every chance we will remove or change these experiments, so using them should be at your own risk and without the expectation that they will work in future!
Normalised upload paths
Artifacts uploaded by buildkite-agent artifact upload will be uploaded using URI/Unix-style paths, even on Windows. This makes sure that artifacts uploaded from Windows agents are stored in a URI-compatible URL.
To use it, set experiment="normalised-upload-paths" in your agent configuration.
Artifact names displayed in Buildkite's web UI, as well as in the API, are changed by this.
For example, when using this experimental feature, buildkite-agent artifact upload coverage\report.xml uploads to s3://example/coverage/report.xml instead of to s3://example/coverage\report.xml.
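To illustrate only the path transformation described above (not the agent's actual implementation), here is a small sketch that rewrites a Windows-style relative path into its URI/Unix-style form. The function name normalise_upload_path is hypothetical.

```shell
# Replace every backslash with a forward slash.
# printf '%s' is used so backslashes in the argument are not
# interpreted as escape sequences before tr sees them.
normalise_upload_path() {
  printf '%s\n' "$1" | tr '\\' '/'
}
```

For example, normalise_upload_path 'coverage\report.xml' yields coverage/report.xml, which matches the artifact name shown in the example above.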
Resolve commit after checkout
After repository checkout, resolve BUILDKITE_COMMIT to a commit hash. This makes BUILDKITE_COMMIT useful for builds triggered against non-commit-hash refs such as HEAD.
To use it, set experiment="resolve-commit-after-checkout" in your agent configuration.
Agent API
Like the Job API experiment, this exposes a (separate) local API for interacting with the agent process. The Agent API offers these endpoints:
- GET /api/leader/v0/ping: Returns a JSON object with the current time (useful for testing that the agent is alive).
- GET /api/leader/v0/lock?key=<key>: Returns a JSON object containing the current state of a lock.
- PATCH /api/leader/v0/lock?key=<key>: Accepts a JSON object with old and new states for a lock. The lock is then updated atomically, and a JSON object describing whether the operation proceeded is returned.
The API is exposed using a Unix Domain Socket. Unlike the job-api, the path to the socket is not available through an environment variable. Instead, there is a single (configurable) path on the system.
To use the agent API, set experiment="agent-api" in your agent configuration.
Promoted experiments
The following features started as experiments before being promoted to fully supported features.
Flock file locks
Changes the file lock implementation from github.com/nightlyone/lockfile to github.com/gofrs/flock to address an issue where file locks are never released by agents that don't shut down cleanly. The new file locks are implemented at the kernel level, and are aware of when their parent process dies.
Promoted in v3.48.0. It's the default behavior, so there's no configuration required to use it. Because the old and new lock systems do not interact, we strongly recommend not running different versions of the agent on the same host.
ANSI timestamps
Outputs inline ANSI timestamps for each line of log output, enabling timestamps you can toggle in the Buildkite dashboard.
Promoted in v3.48.0. It's the default behavior, so there's no configuration required to use it. If you want to turn it off, pass the --no-ansi-timestamps flag.
Git mirrors
Maintains a single bare Git mirror for each repository on a host, shared amongst multiple agents and pipelines. Checkouts reference the Git mirror using git clone --reference, as do submodules.
Promoted in v3.47.0. You can use it by setting the --git-mirrors-path flag.
See the Git mirrors options in the agent configuration documentation for more information.
Redacted variables
The Buildkite agent can redact strings that match the value of environment variables whose names match common patterns for passwords and other secure information before the build log is uploaded to Buildkite.
Promoted in v3.31.0.
See redacted-vars for more information.
Job API
Exposes a local API to introspect and mutate the state of a running job through environment variables. This lets you write scripts, hooks, and plugins in languages other than Bash, using them to interact with the agent.
Promoted in v3.64.0.
The API uses a Unix Domain Socket, whose path is exposed to running jobs with the BUILDKITE_AGENT_JOB_API_SOCKET environment variable. Calls are authenticated using the Bearer HTTP Authorization scheme, with a token made available in the BUILDKITE_AGENT_JOB_API_TOKEN environment variable.
The API provides the following endpoints:
- GET /api/current-job/v0/env: Returns a JSON object of all environment variables for the current job.
- PATCH /api/current-job/v0/env: Accepts a JSON object of environment variables to set for the current job.
- DELETE /api/current-job/v0/env: Accepts a JSON array of environment variable names to unset for the current job.
See the agent repo for the full API request and response definitions.
The job API is unavailable on agents running versions of Windows before build 17063, as this was when Windows added Unix Domain Socket support. If you enable this experiment on an unsupported Windows agent, the agent outputs a warning and the API is unavailable.
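As a rough sketch of calling the Job API from inside a running job, the function below uses curl to send the GET request to the env endpoint over the Unix Domain Socket, with the Bearer token in the Authorization header. The function name get_job_env is hypothetical, and the host part of the URL ("job") is an arbitrary placeholder, since the connection goes through the socket. Request bodies for PATCH and DELETE are not shown; see the agent repo for their exact shape.

```shell
# Fetch the current job's environment variables as JSON.
# Both environment variables are provided by the agent to running jobs.
get_job_env() {
  curl --silent --fail \
    --unix-socket "$BUILDKITE_AGENT_JOB_API_SOCKET" \
    --header "Authorization: Bearer $BUILDKITE_AGENT_JOB_API_TOKEN" \
    "http://job/api/current-job/v0/env"
}
```

A hook or plugin could call get_job_env and pipe the result to a JSON tool of its choice. The same socket and header apply to the other endpoints.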
Customizing with hooks
The agent's behavior can be customized using hooks, which are shell scripts that exist on your build machines or in each pipeline's code repository. Hooks can be used to set up secrets as well as to override default behavior. See the hooks documentation for full details.
Signal handling
When a build job is canceled, the agent sends the build job process a SIGTERM signal to allow it to exit gracefully.
If the process does not exit within the 10-second grace period, it is forcefully terminated with a SIGKILL signal. If you require a longer grace period, you can customize it using the cancel-grace-period agent configuration option.
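A build script can use the grace period to clean up after itself. The sketch below shows one way this might look: it traps SIGTERM, stops its background work, removes a temporary directory, and exits. The names on_term, run_build, work_dir, and child_pid are hypothetical, and sleep 300 stands in for the real long-running task.

```shell
# Runs when SIGTERM arrives, for example when the job is canceled.
on_term() {
  echo "SIGTERM received, cleaning up"
  # Stop the background task if it is still running.
  kill "$child_pid" 2>/dev/null || true
  rm -rf "$work_dir"
  exit 143  # 128 + 15 (SIGTERM)
}

run_build() {
  work_dir="$(mktemp -d)"
  trap on_term TERM
  # Run the long task in the background and wait on it, so the shell
  # can run the trap as soon as the signal arrives rather than after
  # the task finishes.
  sleep 300 &
  child_pid=$!
  wait "$child_pid"
}
```

The cleanup needs to finish within the grace period. Otherwise the process is killed with SIGKILL, which cannot be trapped.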
The agent also accepts the following two signals directly:
- SIGTERM: Instructs the agent to gracefully disconnect, after completing any job that it may be running.
- SIGQUIT: Instructs the agent to forcefully disconnect, canceling any job that it may be running.
Exit codes
The agent reports its activity to Buildkite using exit codes. The most common exit codes and their descriptions can be found in the table below.
Exit code | Description |
---|---|
0 | The job exited with a status of 0 (success) |
1 | The job exited with a status of 1 (most common error status) |
94 | The checkout timed out waiting for a Git mirrors lock |
128 + signal number | The job was terminated by a signal (see note below) |
255 | The agent was gracefully terminated |
-1 | Buildkite lost contact with the agent or it stopped reporting to us |
When a job is terminated by a signal, the exit code will be set to 128 + the signal number. For more information about how shells manage commands terminated by signals, see the Wiki page on Exit Signals.
Exit codes for common signals:
Exit code | Signal | Name | Description |
---|---|---|---|
130 | 2 | SIGINT | Terminal interrupt signal |
137 | 9 | SIGKILL | Kill (cannot be caught or ignored) |
139 | 11 | SIGSEGV | Segmentation fault; Invalid memory reference |
141 | 13 | SIGPIPE | Write on a pipe with no one to read it |
143 | 15 | SIGTERM | Termination signal (graceful) |
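The two tables above can be turned into a small helper for reading job results. This is only an illustrative sketch: describe_exit_code is a hypothetical function, and its messages paraphrase the table descriptions.

```shell
# Print a short description for a job exit code, following the
# tables above. Codes between 129 and 254 are treated as
# 128 + signal number.
describe_exit_code() {
  code="$1"
  case "$code" in
    0)   echo "success" ;;
    -1)  echo "agent lost" ;;
    94)  echo "checkout timed out waiting for a Git mirrors lock" ;;
    255) echo "agent gracefully terminated" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -lt 255 ]; then
        echo "terminated by signal $((code - 128))"
      else
        echo "job failed with status $code"
      fi
      ;;
  esac
}
```

For example, describe_exit_code 137 reports signal 9 (SIGKILL), and describe_exit_code 143 reports signal 15 (SIGTERM).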
Troubleshooting
One issue you sometimes need to troubleshoot is when Buildkite loses contact with an agent, resulting in a -1 exit code. After registering with the Buildkite API, an agent regularly sends heartbeat updates to indicate that it is operational. If the Buildkite API does not receive any heartbeat requests from an agent for 3 consecutive minutes, that agent is marked as lost within the next 60 seconds, and will not be assigned any further jobs.
Various factors can cause an agent to fail to send heartbeat updates. Common reasons include networking issues and resource constraints, such as CPU, memory, or I/O limitations on the infrastructure hosting the agent.
In such cases, it's essential to check the agent logs and examine metrics related to networking, CPU, memory, and I/O to help identify the cause of the failed heartbeat updates.
If the agents run on the Elastic CI Stack for AWS with spot instances, the abrupt termination of spot instances can also result in agents being marked as lost. To investigate this issue, you can use the log collector script to gather all relevant logs and metrics from the Elastic CI Stack for AWS.