Agent v3 to v4 upgrade guide
Version 4 of the Buildkite Agent is in beta, and this upgrade guide should not be considered final.
How to test v4 in beta
You can test Buildkite Agent v4 in a number of ways, depending on how you installed or use Buildkite Agent.
Hosted agents
Buildkite does not presently offer a way to use Buildkite Agent v4 in Hosted Agents in advance of the stable release.
Self-hosted installations
With agent-stack-k8s
In the values.yaml used to deploy the agent-stack-k8s Helm chart, set the config.image option to a beta-tagged agent image, for example:
config:
  image: ghcr.io/buildkite/agent:beta
If you are using a custom image derived from the agent image, you will need to build a new custom image based on the beta.
With Elastic CI Stack for AWS
When configuring Elastic CI Stack for AWS, set the BuildkiteAgentRelease parameter to beta.
Note that because the exact beta release of the agent is baked into each release of the stack, you must update to the latest release of Elastic CI Stack to access a more recent beta.
Using install.sh
Set the environment variable BETA=true when executing the install script, for example:
$ curl https://raw.githubusercontent.com/buildkite/agent/refs/heads/main/install.sh | BETA=true bash
Ubuntu and Debian
In the APT source list file for Buildkite Agent (usually /etc/apt/sources.list.d/buildkite-agent.list), change stable to unstable, for example:
-deb [signed-by=/usr/share/keyrings/buildkite-agent-archive-keyring.gpg] https://apt.buildkite.com/buildkite-agent stable main
+deb [signed-by=/usr/share/keyrings/buildkite-agent-archive-keyring.gpg] https://apt.buildkite.com/buildkite-agent unstable main
Then run sudo apt update followed by sudo apt install buildkite-agent.
Red Hat / CentOS / Amazon Linux
When following the self-hosted install guide, replace /stable/ with /unstable/ in the command for adding the Yum repository. For example, to install the x86_64 variant:
$ sudo sh -c 'echo -e "[buildkite-agent]\nname = Buildkite Pty Ltd\nbaseurl = https://yum.buildkite.com/buildkite-agent/unstable/x86_64/\nenabled=1\ngpgcheck=0\nrepo_gpgcheck=0\npriority=1" > /etc/yum.repos.d/buildkite-agent.repo'
FreeBSD
v4 is not yet available in the pkg system. Manually download and install a FreeBSD binary from the latest v4 release.
macOS
Homebrew can be used to install or upgrade to a version 4 release with the @4 suffix, for example:
$ brew install buildkite/buildkite/buildkite-agent@4
(The latest release of version 3 is available using @3.)
Windows
Within an Administrator PowerShell session, add the environment variable buildkiteAgentBeta=true before running the installation script. For example:
PS> $env:buildkiteAgentToken = "<your_token>"
PS> $env:buildkiteAgentBeta = "true"
PS> Set-ExecutionPolicy Bypass -Scope Process -Force
PS> iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/buildkite/agent/main/install.ps1'))
From source
Using the Go compiler, you can build and install Buildkite Agent v4 from source. For example:
$ go install github.com/buildkite/agent/v4@latest
This typically installs the v4 agent at ~/go/bin/agent.
Validating the install
You can check which version of Buildkite Agent is installed with the --version flag:
$ buildkite-agent --version
buildkite-agent version 4.0.0-beta.3+12492.5228a73b6906effe729cfe48cfd900f3291b3c3a
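If you want a script to confirm that a host is running a v4 agent, one option is to extract the major version from that output. The following is a minimal sketch; the function name agent_major_version is hypothetical, and it assumes the output keeps the "buildkite-agent version X.Y.Z..." shape shown above.

```shell
# agent_major_version LINE
# Prints the major version number found in a line such as
# "buildkite-agent version 4.0.0-beta.3+12492.5228a73b".
# Prints nothing if the line does not have the expected shape.
agent_major_version() {
  printf '%s\n' "$1" |
    sed -n 's/^buildkite-agent version \([0-9][0-9]*\)\..*/\1/p'
}

# Example with a captured line (no agent binary is needed to run this):
agent_major_version 'buildkite-agent version 4.0.0-beta.3+12492.5228a73b'
```

On a real host you would pass in the live output instead, for example agent_major_version "$(buildkite-agent --version)".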
Breaking changes in v4
Read the following breaking changes carefully to determine if your agent setup, pipelines, or plugins need updating for version 4.
Changes to job handling
- The deprecated Docker integration has been removed. (Note that the docker and docker-compose plugins are neither deprecated nor removed!) The integration has been deprecated since 2017, and used the environment variables BUILDKITE_DOCKER and BUILDKITE_DOCKER_COMPOSE_CONTAINER. All current Docker usage for jobs known to us uses the docker and docker-compose plugins. If you were using the deprecated Docker integration, switch to using one of the plugins.
- On Windows, the exit status of a cancelled job is now 1, where it used to be 0. A consequence of this is that cancelled jobs will appear “failed”, making them consistent with jobs run on other platforms.
- The flags and config options --cancel-grace-period and --signal-grace-period-seconds (environment variables BUILDKITE_CANCEL_GRACE_PERIOD and BUILDKITE_SIGNAL_GRACE_PERIOD_SECONDS) have been replaced with --cancel-signal-timeout and --cancel-cleanup-timeout (BUILDKITE_CANCEL_SIGNAL_TIMEOUT and BUILDKITE_CANCEL_CLEANUP_TIMEOUT), and the timeouts have been increased slightly (10s signal timeout and 5s cleanup timeout). Both flags are now non-negative durations. The “total” maximum grace period following job cancellation is now the sum of the two timeouts. The “negative signal grace period” handling no longer exists.
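As a rough illustration of the new cancellation timeouts, the sketch below sets both environment variables to the values quoted above and computes the resulting total grace period. The helper total_grace_seconds is hypothetical, and writing the values as 10s and 5s assumes the agent accepts durations in that form.

```shell
# Values taken from the text above; the "10s"/"5s" syntax is an assumption.
export BUILDKITE_CANCEL_SIGNAL_TIMEOUT=10s
export BUILDKITE_CANCEL_CLEANUP_TIMEOUT=5s

# total_grace_seconds SIGNAL_SECONDS CLEANUP_SECONDS
# Hypothetical helper: the maximum grace period after cancellation is the
# sum of the signal timeout and the cleanup timeout.
total_grace_seconds() {
  echo $(( $1 + $2 ))
}

total_grace_seconds 10 5
```

With these values a cancelled job has at most 15 seconds in total before the agent gives up on it.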
Changes to job logs
- Timestamp options for logs have been simplified. ANSI timestamp codes are now always emitted into the job log stream. Flags to disable ANSI timestamps (--no-ansi-timestamps, BUILDKITE_NO_ANSI_TIMESTAMPS) and enable plaintext timestamps (--timestamp-lines, BUILDKITE_TIMESTAMP_LINES) have been removed, as has the separate uploading of "header times".
Changes to checkout
- After repository checkout, BUILDKITE_COMMIT is resolved to a commit hash, which is beneficial when the initial value is a refspec such as HEAD.
- The inbuilt SSH key-scan and known-hosts file updater in the default checkout process has been replaced with the OpenSSH option StrictHostKeyChecking=accept-new. As such, the default checkout process now requires OpenSSH version 7.6 or later, unless --no-ssh-keyscan / BUILDKITE_NO_SSH_KEYSCAN is enabled. (Note that OpenSSH 7.6 was released in 2017.)
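To check whether a host meets the OpenSSH 7.6 requirement before upgrading, something like the following sketch may help. The function name openssh_supports_accept_new is hypothetical, and it assumes the common "OpenSSH_X.Y..." banner format printed by ssh -V. Other banner shapes (for example, builds that insert extra words after "OpenSSH_") are reported as unknown rather than guessed at.

```shell
# openssh_supports_accept_new BANNER
# Returns 0 if BANNER reports OpenSSH 7.6 or later, 1 if it reports an older
# version, and 2 if the banner is not in the expected "OpenSSH_X.Y" form.
openssh_supports_accept_new() {
  banner=$1
  # Pull "MAJOR MINOR" out of e.g. "OpenSSH_8.9p1 Ubuntu-3, OpenSSL 3.0.2".
  version=$(printf '%s\n' "$banner" |
    sed -n 's/^OpenSSH_\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
  [ -n "$version" ] || return 2
  major=${version% *}
  minor=${version#* }
  if [ "$major" -gt 7 ]; then
    return 0
  fi
  [ "$major" -eq 7 ] && [ "$minor" -ge 6 ]
}

if openssh_supports_accept_new 'OpenSSH_8.9p1 Ubuntu-3ubuntu0.6, OpenSSL 3.0.2'; then
  echo "accept-new is available"
fi
```

On a real host, pass in the live banner with openssh_supports_accept_new "$(ssh -V 2>&1)", since ssh -V writes to standard error.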
Changes to agent parallelism
- The --spawn-with-priority flag (environment variable BUILDKITE_AGENT_SPAWN_WITH_PRIORITY) is no longer a boolean (true/false). It now takes a string value (one of static, ascending, or descending):
  - --spawn-with-priority=static is equivalent to the v3 --spawn-with-priority=false, and will spawn all agent workers with the same priority.
  - --spawn-with-priority=ascending is equivalent to the v3 --spawn-with-priority=true, and will spawn agent workers with an increasing sequence of priority values (1, 2, 3, …).
  - --spawn-with-priority=descending is equivalent to the v3 --spawn-with-priority=true --experiment=descending-spawn-priority, and will spawn agent workers with a decreasing sequence of priority values (-1, -2, -3, …). Combined with varying --spawn, this option can be useful for spreading jobs across machines with different hardware capabilities. When the number of jobs is low, jobs will tend to be evenly distributed across all machines, and when the number of jobs is high, more jobs will be assigned to agents running on the more powerful machines (those with higher --spawn).
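The mapping from the v3 boolean to the v4 string value can be sketched as a small helper for migration scripts. The function name v4_spawn_priority and its argument convention are hypothetical. The mapping itself follows the equivalences listed above.

```shell
# v4_spawn_priority V3_VALUE [DESCENDING_EXPERIMENT_ENABLED]
# V3_VALUE is the old boolean (true/false). The optional second argument is
# "true" if the v3 descending-spawn-priority experiment was enabled.
v4_spawn_priority() {
  v3_value=$1
  descending_experiment=${2:-false}
  if [ "$v3_value" != "true" ]; then
    echo static
  elif [ "$descending_experiment" = "true" ]; then
    echo descending
  else
    echo ascending
  fi
}

v4_spawn_priority true        # prints "ascending"
v4_spawn_priority true true   # prints "descending"
v4_spawn_priority false       # prints "static"
```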
Changes to observability
- Both OpenTracing and direct connection to Dogstatsd are no longer supported. Various Datadog-related tracing workarounds were also cleaned up. Use OpenTelemetry instead.
  - Enable OpenTelemetry tracing with the --opentelemetry-tracing flag or BUILDKITE_OPENTELEMETRY_TRACING environment variable.
  - --tracing-service-name (BUILDKITE_TRACING_SERVICE_NAME) has been renamed to --telemetry-service-name (BUILDKITE_TELEMETRY_SERVICE_NAME).
  - --tracing-backend (BUILDKITE_TRACING_BACKEND) has been removed. (Only OpenTelemetry is supported.)
  - --tracing-propagate-traceparent (BUILDKITE_TRACING_PROPAGATE_TRACEPARENT) has been removed, and its function (accept a trace parent from the Buildkite backend) is now always enabled.
  - OpenTelemetry OTLP endpoint and protocol can be configured with the standard OTEL_EXPORTER_OTLP_* environment variables.
- Some OpenTelemetry metrics have changed. jobs.success and jobs.failed counters have been replaced with a single jobs.finished metric. Job failure or success can be inferred from the exit_status tag applied to the metric.
- Some Prometheus metrics have changed. buildkite_agent_jobs_started_total and buildkite_agent_jobs_ended_total now have priority and queue labels, replacing buildkite_agent_jobs_started_with_labels_total and buildkite_agent_jobs_ended_with_labels_total.
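A v4 tracing setup might look like the following environment, shown here only as a sketch. The service name and the collector address are placeholders, and treating BUILDKITE_OPENTELEMETRY_TRACING as a true/false value is an assumption. OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL are two of the standard OpenTelemetry exporter variables.

```shell
# Enable OpenTelemetry tracing in the agent (boolean value is an assumption).
export BUILDKITE_OPENTELEMETRY_TRACING=true

# Replaces the v3 BUILDKITE_TRACING_SERVICE_NAME; the name is a placeholder.
export BUILDKITE_TELEMETRY_SERVICE_NAME=example-buildkite-agent

# Standard OpenTelemetry exporter settings; the endpoint is a placeholder for
# a collector listening on the conventional OTLP gRPC port.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
```

When migrating, also drop any v3 settings for BUILDKITE_TRACING_BACKEND and BUILDKITE_TRACING_PROPAGATE_TRACEPARENT, since both have been removed.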
Changes to pipeline uploads
- By default, secrets detected in a pipeline upload will now cause the pipeline upload to fail immediately. Secrets in pipeline uploads can be allowed again by passing the --allow-secrets flag (environment variable BUILDKITE_AGENT_PIPELINE_UPLOAD_ALLOW_SECRETS). The --reject-secrets flag (environment variable BUILDKITE_AGENT_PIPELINE_UPLOAD_REJECT_SECRETS) has been removed.
Changes to artifacts
- Windows path separators (\) are now translated into forward slashes (/) for storage (for example in S3). We do not anticipate this breaking any agent-side behaviour. (The inverse translation back to Windows path separators (\) is already applied on artifact download.) But this change may break workflows that assume particular storage paths.
- The artifact upload command --follow-symlinks flag (BUILDKITE_AGENT_ARTIFACT_SYMLINKS environment variable) has been removed. Use --glob-resolve-follow-symlinks (BUILDKITE_AGENT_ARTIFACT_GLOB_RESOLVE_FOLLOW_SYMLINKS) instead, which is equivalent.
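If a workflow looks artifacts up by storage path, it may need to apply the same separator translation when it computes the expected path. A minimal sketch follows. The function name normalise_artifact_path is hypothetical, and this only illustrates the backslash-to-slash translation described above, not the agent's actual implementation.

```shell
# normalise_artifact_path PATH
# Prints PATH with every Windows path separator (\) replaced by a forward
# slash (/), mirroring how v4 stores artifact paths uploaded from Windows.
normalise_artifact_path() {
  printf '%s\n' "$1" | tr '\\' '/'
}

normalise_artifact_path 'dist\bin\app.exe'   # prints "dist/bin/app.exe"
```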
Changes to plugins and hooks
- The names (but not values) of various agent environment variables are now written to BUILDKITE_ENV_FILE, so that they can be automatically propagated to child environments, for example with the docker and docker-compose plugins. This may break hooks (for example a pre-bootstrap hook) that assume all lines in the file will have an equal sign (=) and a variable value.
- Deprecated env vars generated for plugin configuration have been removed. When running a plugin, environment variables are generated reflecting the plugin configuration. The deprecated form of these variables eliminated any consecutive underscores (for example VAR__NAME was mangled into VAR_NAME), making it harder to predict the env var corresponding to a particular plugin config key.
  The new form of these generated variables, where consecutive underscores are preserved, has existed for some time now. Plugins should not depend on the deprecated variables, but this change may break plugins we are not currently aware of if they still use the deprecated variables.
- Post-checkout, post-command, and pre-exit hooks now run in "reverse" order (relative to pre-checkout and pre-command hooks). See below for an example.
  This change is aimed at making “setup/cleanup” pairing of hooks easier, particularly when using multiple instances of the same plugin. It may break some combinations of hooks or plugins that we are not currently aware of. You can opt out of the new ordering with the new legacy-post-hook-order experiment.
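A hook that reads the env file can be made tolerant of name-only lines by treating the equal sign as optional. The sketch below shows one way to do that. The function name classify_env_file and the sample variable names are hypothetical, and it makes no assumption about how values are quoted in the file.

```shell
# classify_env_file FILE
# Prints "assigned NAME" for lines of the form NAME=value and
# "name-only NAME" for lines that carry only a variable name.
# Blank lines are skipped.
classify_env_file() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      '')  ;;
      *=*) printf 'assigned %s\n' "${line%%=*}" ;;
      *)   printf 'name-only %s\n' "$line" ;;
    esac
  done < "$1"
}

# Demonstration with a throwaway file rather than the real env file.
sample=$(mktemp)
printf 'EXAMPLE_WITH_VALUE=hello\nEXAMPLE_NAME_ONLY\n' > "$sample"
classify_env_file "$sample"
rm -f "$sample"
```

In a real hook, the argument would be "$BUILDKITE_ENV_FILE".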
New hook ordering example
Suppose a step specifies two plugins A and B (in that order), and there are also agent and repository hooks present. Under version 3, hooks for each of these hook types (pre-checkout, post-checkout, pre-command, post-command, pre-exit) execute in the same order as one another: agent, repository, plugin A, plugin B. In full:
- agent pre-checkout
- (pre-checkout is not possible for repository hooks)
- plugin A pre-checkout
- plugin B pre-checkout
- (checkout)
- agent post-checkout
- repository post-checkout
- plugin A post-checkout
- plugin B post-checkout
- agent pre-command
- repository pre-command
- plugin A pre-command
- plugin B pre-command
- (command)
- agent post-command
- repository post-command
- plugin A post-command
- plugin B post-command
- agent pre-exit
- repository pre-exit
- plugin A pre-exit
- plugin B pre-exit
In version 4, the default execution order is reversed for post-checkout, post-command, and pre-exit hooks. The pre-checkout and pre-command hooks keep their v3 order:
- agent pre-checkout
- (pre-checkout is not possible for repository hooks)
- plugin A pre-checkout
- plugin B pre-checkout
- (checkout)
- plugin B post-checkout
- plugin A post-checkout
- repository post-checkout
- agent post-checkout
- agent pre-command
- repository pre-command
- plugin A pre-command
- plugin B pre-command
- (command)
- plugin B post-command
- plugin A post-command
- repository post-command
- agent post-command
- plugin B pre-exit
- plugin A pre-exit
- repository pre-exit
- agent pre-exit
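The two orderings above can be summarised as a small function: for the three reversed hook types in version 4, the list of hook sources is walked backwards, and otherwise it is left alone. This is only an illustration of the ordering rule. The function name hook_order and the source labels are hypothetical.

```shell
# hook_order VERSION HOOK SOURCE...
# Prints the sources in the order their HOOK would run under the given
# agent major VERSION (3 or 4).
hook_order() {
  version=$1
  hook=$2
  shift 2
  case "$version:$hook" in
    4:post-checkout|4:post-command|4:pre-exit)
      # Reversed: the last source listed runs first.
      reversed=
      for src in "$@"; do
        reversed="$src${reversed:+ $reversed}"
      done
      echo "$reversed"
      ;;
    *)
      # v3, or a pre-* hook in v4: the listed order is kept.
      echo "$*"
      ;;
  esac
}

hook_order 4 pre-command agent repository plugin-A plugin-B
hook_order 4 post-command agent repository plugin-A plugin-B
```

The first call prints the sources unchanged. The second prints plugin-B plugin-A repository agent, matching the post-command block in the version 4 list.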
Changes to job metadata
- A trailing newline has been added to the output from buildkite-agent meta-data get. This may break code that assumes there is no trailing whitespace after fetching a metadata value with this command.
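Whether the extra newline matters depends on how the output is consumed. The sketch below uses a stand-in function (fake_meta_data_get, hypothetical, with a made-up value) in place of the real command. It shows that shell command substitution strips the trailing newline, while byte-level consumers such as wc -c still see it.

```shell
# Stand-in for `buildkite-agent meta-data get KEY` under v4: the value
# followed by a trailing newline. The value itself is made up.
fake_meta_data_get() {
  printf 'v1.2.3\n'
}

# Command substitution removes trailing newlines, so comparisons like this
# behave the same as before.
value=$(fake_meta_data_get)
[ "$value" = "v1.2.3" ] && echo "substitution: unchanged"

# Anything that counts or hashes the raw bytes now sees one extra byte.
bytes=$(fake_meta_data_get | wc -c | tr -d ' ')
echo "raw bytes: $bytes"
```

Scripts that pipe the output straight into a file, a checksum, or another program are the ones most likely to need a change.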
Changes to experiments
- A new experiment, legacy-post-hook-order, can be used to revert to the v3 hook ordering (see “Changes to plugins and hooks” above).
- allow-artifact-path-traversal has been removed. The insecure behaviour it enabled is no longer supported.
- normalised-upload-paths is now default behaviour and has been removed (see "Changes to artifacts" above).
- override-zero-exit-on-cancel is now default behaviour and has been removed (see "Changes to job handling" above).
- resolve-commit-after-checkout is now default behaviour and has been removed (see “Changes to checkout” above).
- propagate-agent-config-vars is now default behaviour and has been removed (see “Changes to plugins and hooks” above).
- descending-spawn-priority has been removed, with equivalent functionality now available using the --spawn-with-priority flag (see “Changes to agent parallelism” above).
Changes to flags, environment variables, and agent configuration
The underlying CLI flag processing package has been upgraded to a new version. No problems are expected as a result.
These CLI flags, environment variables, and agent configuration options have been removed:
- cancel-grace-period (BUILDKITE_CANCEL_GRACE_PERIOD) has been removed (see “Changes to job handling” above).
- signal-grace-period-seconds (BUILDKITE_SIGNAL_GRACE_PERIOD_SECONDS) has been removed (see “Changes to job handling” above).
- no-ansi-timestamps (BUILDKITE_NO_ANSI_TIMESTAMPS) has been removed (see “Changes to job logs” above).
- timestamp-lines (BUILDKITE_TIMESTAMP_LINES) has been removed (see “Changes to job logs” above).
- trace-context-encoding (BUILDKITE_TRACE_CONTEXT_ENCODING) - it applied to OpenTracing support, which was also removed in this version. There is no replacement flag, because there is no longer a trace context encoding to configure.
- tracing-service-name (BUILDKITE_TRACING_SERVICE_NAME) has been renamed to telemetry-service-name (BUILDKITE_TELEMETRY_SERVICE_NAME) (see "Changes to observability" above).
- tracing-backend (BUILDKITE_TRACING_BACKEND) has been removed (see "Changes to observability" above).
- tracing-propagate-traceparent (BUILDKITE_TRACING_PROPAGATE_TRACEPARENT) has been removed as it is now always enabled (see "Changes to observability" above).
- kubernetes-log-collection-grace-period (BUILDKITE_KUBERNETES_LOG_COLLECTION_GRACE_PERIOD) has been removed. It was only briefly used with agent-stack-k8s before the functionality was removed. There is no replacement flag, and it should not be used.
- no-automatic-ssh-fingerprint-verification (BUILDKITE_NO_AUTOMATIC_SSH_FINGERPRINT_VERIFICATION) - use no-ssh-keyscan (BUILDKITE_NO_SSH_KEYSCAN) instead, which is equivalent.
- meta-data (BUILDKITE_AGENT_META_DATA) - use tags (BUILDKITE_AGENT_TAGS) instead, which is equivalent.
- meta-data-ec2 (BUILDKITE_AGENT_META_DATA_EC2) - use tags-from-ec2-meta-data (BUILDKITE_AGENT_TAGS_FROM_EC2_META_DATA) instead, which is equivalent.
- meta-data-ec2-tags (BUILDKITE_AGENT_META_DATA_EC2_TAGS) - use tags-from-ec2-tags (BUILDKITE_AGENT_TAGS_FROM_EC2_TAGS) instead, which is equivalent.
- meta-data-gcp (BUILDKITE_AGENT_META_DATA_GCP) - use tags-from-gcp-meta-data (BUILDKITE_AGENT_TAGS_FROM_GCP_META_DATA) instead, which is equivalent.
- tags-from-ec2 (BUILDKITE_AGENT_TAGS_FROM_EC2) - use tags-from-ec2-meta-data (BUILDKITE_AGENT_TAGS_FROM_EC2_META_DATA) instead, which is equivalent.
- tags-from-gcp (BUILDKITE_AGENT_TAGS_FROM_GCP) - use tags-from-gcp-meta-data (BUILDKITE_AGENT_TAGS_FROM_GCP_META_DATA) instead, which is equivalent.
- disconnect-after-job-timeout (BUILDKITE_AGENT_DISCONNECT_AFTER_JOB_TIMEOUT) - use disconnect-after-idle-timeout instead, which is equivalent.
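Before upgrading, it can be useful to scan an existing agent configuration file for the options listed above. The following sketch does that. The function name find_removed_options is hypothetical, and it assumes a configuration file made of option=value lines. It does not inspect environment variables or command-line flags.

```shell
# Options removed or renamed in v4, as listed above.
removed_v4_options='cancel-grace-period signal-grace-period-seconds
no-ansi-timestamps timestamp-lines trace-context-encoding
tracing-service-name tracing-backend tracing-propagate-traceparent
kubernetes-log-collection-grace-period
no-automatic-ssh-fingerprint-verification meta-data meta-data-ec2
meta-data-ec2-tags meta-data-gcp tags-from-ec2 tags-from-gcp
disconnect-after-job-timeout'

# find_removed_options CONFIG_FILE
# Prints each removed option that CONFIG_FILE still sets, one per line,
# in the order of the list above.
find_removed_options() {
  config_file=$1
  for option in $removed_v4_options; do
    # Match "option =" at the start of a line, allowing optional spaces.
    # The "=" requirement stops meta-data from matching meta-data-ec2.
    if grep -Eq "^[[:space:]]*${option}[[:space:]]*=" "$config_file"; then
      echo "$option"
    fi
  done
}
```

For example, find_removed_options /etc/buildkite-agent/buildkite-agent.cfg would list anything that still needs migrating, where the path shown is only a typical location and may differ on your system.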