Agent v3 to v4 upgrade guide

Pre-release software

Version 4 of the Buildkite Agent is in beta, and this upgrade guide should not be considered final.

How to test v4 in beta

You can test Buildkite Agent v4 a number of ways, depending on how you installed or use Buildkite Agent.

Hosted agents

Buildkite does not presently offer a way to use Buildkite Agent v4 in Hosted Agents in advance of the stable release.

Self-hosted installations

With agent-stack-k8s

In the values.yaml used to deploy the agent-stack-k8s Helm chart, set the config.image option to a beta-tagged agent image, for example:

config:
  image: ghcr.io/buildkite/agent:beta

If you are using a custom image derived from the agent image, you will need to build a new custom image based on the beta.

With Elastic CI Stack for AWS

When configuring Elastic CI Stack for AWS, set the BuildkiteAgentRelease parameter to beta.

Note that because the exact beta release of the agent is baked into each release of the stack, you must update to the latest release of Elastic CI Stack to access a more recent beta.

Using install.sh

Set the environment variable BETA=true when executing the install script, for example:

$ curl https://raw.githubusercontent.com/buildkite/agent/refs/heads/main/install.sh | BETA=true bash

Ubuntu and Debian

In the APT source list file for Buildkite Agent (usually /etc/apt/sources.list.d/buildkite-agent.list), change stable to unstable, for example:

-deb [signed-by=/usr/share/keyrings/buildkite-agent-archive-keyring.gpg] https://apt.buildkite.com/buildkite-agent stable main"
+deb [signed-by=/usr/share/keyrings/buildkite-agent-archive-keyring.gpg] https://apt.buildkite.com/buildkite-agent unstable main"

Then proceed to sudo apt update and sudo apt install buildkite-agent.

Red Hat / CentOS / Amazon Linux

When following the self-hosted install guide, replace /stable/ with /unstable/ in the command for adding the Yum repository. For example, to install the x86_64 variant:

$ sudo sh -c 'echo -e "[buildkite-agent]\nname = Buildkite Pty Ltd\nbaseurl = https://yum.buildkite.com/buildkite-agent/unstable/x86_64/\nenabled=1\ngpgcheck=0\nrepo_gpgcheck=0\npriority=1" > /etc/yum.repos.d/buildkite-agent.repo'

FreeBSD

v4 is not yet available in the pkg system. Manually download and install a FreeBSD binary from the latest v4 release.

macOS

Homebrew can be used to install or upgrade to a version 4 release with the @4 suffix, for example:

$ brew install buildkite/buildkite/buildkite-agent@4

(The latest release of version 3 is available using @3.)

Windows

Within an Administrator PowerShell session, add the environment variable buildkiteAgentBeta=true before running the installation script. For example:

PS> $env:buildkiteAgentToken = "<your_token>"
PS> $env:buildkiteAgentBeta = true
PS> Set-ExecutionPolicy Bypass -Scope Process -Force
iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/buildkite/agent/main/install.ps1'))

From source

Using the Go compiler, you can build and install Buildkite Agent v4 from source. For example:

$ go install github.com/buildkite/agent/v4@latest

This typically installs the v4 agent at ~/go/bin/agent.

Validating the install

You can check which version of Buildkite Agent is installed with the --version flag:

$ buildkite-agent --version
buildkite-agent version 4.0.0-beta.3+12492.5228a73b6906effe729cfe48cfd900f3291b3c3a

Breaking changes in v4

Read the following breaking changes carefully to determine if your agent setup, pipelines, or plugins need updating for version 4.

Changes to job handling

  • The deprecated Docker integration has been removed. (Note that the docker and docker-compose plugins are very much not deprecated nor removed!) The deprecated Docker integration has been considered deprecated since 2017, and used the environment variables BUILDKITE_DOCKER and BUILDKITE_DOCKER_COMPOSE_CONTAINER. All current Docker usage for jobs known to us uses the docker and docker-compose plugins. If you were using the deprecated Docker integration, switch to using one of the plugins.
  • On Windows, the exit status of a cancelled job is now 1, where it used to be 0. A consequence of this is that cancelled jobs will appear “failed”, making them consistent with jobs run on other platforms.
  • The flags and config options --cancel-grace-period and --signal-grace-period-seconds (environment variables BUILDKITE_CANCEL_GRACE_PERIOD and BUILDKITE_SIGNAL_GRACE_PERIOD_SECONDS) have been replaced with --cancel-signal-timeout and --cancel-cleanup-timeout (BUILDKITE_CANCEL_SIGNAL_TIMEOUT and BUILDKITE_CANCEL_CLEANUP_TIMEOUT), and the timeouts have been increased slightly (10s signal timeout and 5s cleanup timeout). Both flags are now non-negative durations. The “total” maximum grace period following job cancellation is now the sum of the two timeouts. The “negative signal grace period” handling no longer exists.

Changes to job logs

  • Timestamp options for logs have been simplified. ANSI timestamp codes are now always emitted into the job log stream. Flags to disable ANSI timestamps (--no-ansi-timestamps, BUILDKITE_NO_ANSI_TIMESTAMPS) and enable plaintext timestamps (--timestamp-lines, BUILDKITE_TIMESTAMP_LINES) have been removed, as has the separate uploading of "header times".

Changes to checkout

  • After repository checkout, BUILDKITE_COMMIT is resolved to a commit hash, which is beneficial when the initial value is a refspec such as HEAD.
  • The inbuilt SSH key-scan and known-hosts file updater in the default checkout process has been replaced with the OpenSSH option StrictHostKeyChecking=accept-new. As such, the default checkout process now requires OpenSSH version 7.6 or later, unless --no-ssh-keyscan / BUILDKITE_NO_SSH_KEYSCAN is enabled. (Note that OpenSSH 7.6 was released in 2017.)

Changes to agent parallelism

  • The --spawn-with-priority flag (environment variable BUILDKITE_AGENT_SPAWN_WITH_PRIORITY) is no longer a boolean (true/false). It now takes a string value (one of static, ascending, or descending):
    • --spawn-with-priority=static is equivalent to the v3 --spawn-with-priority=false, and will spawn all agent workers with the same priority.
    • --spawn-with-priority=ascending is equivalent to the v3 --spawn-with-priority=true, and will spawn agent workers with an increasing sequence of priority values (1, 2, 3, …)
    • --spawn-with-priority=descending is equivalent to the v3 --spawn-with-priority=true --experiment=descending-spawn-priority, and will spawn agent workers with a decreasing sequence of priority values (-1, -2, -3, …). Combined with varying --spawn, this option can be useful for spreading jobs across machines with different hardware capabilities. When the number of jobs is low, jobs will tend to be evenly distributed across all machines, and when the number of jobs is high, more jobs will be assigned to agents running on the more powerful machines (those with higher --spawn).

Changes to observability

  • Both OpenTracing and direct connection to Dogstatsd are no longer supported. Various Datadog-related tracing workarounds were also cleaned up. Use OpenTelemetry instead.
    • Enable OpenTelemetry tracing with the --opentelemetry-tracing flag or BUILDKITE_OPENTELEMETRY_TRACING environment variable.
    • --tracing-service-name (BUILDKITE_TRACING_SERVICE_NAME) has been renamed to --telemetry-service-name (BUILDKITE_TELEMETRY_SERVICE_NAME).
    • --tracing-backend (BUILDKITE_TRACING_BACKEND) has been removed. (Only OpenTelemetry is supported.)
    • --tracing-propagate-traceparent (BUILDKITE_TRACING_PROPAGATE_TRACEPARENT) has been removed, and its function (accept a trace parent from the Buildkite backend) is now always enabled.
    • OpenTelemetry OTLP endpoint and protocol can be configured with the standard OTEL_EXPORTER_OTLP_* environment variables.
  • Some OpenTelemetry metrics have changed. jobs.success and jobs.failed counters have been replaced with a single jobs.finished metric. Job failure or success can be inferred from the exit_status tag applied to the metric.
  • Some Prometheus metrics have changed. buildkite_agent_jobs_started_total and buildkite_agent_jobs_ended_total now have priority and queue labels, replacing buildkite_agent_jobs_started_with_labels_total and buildkite_agent_jobs_ended_with_labels_total.

Changes to pipeline uploads

  • By default, secrets detected in a pipeline upload will now cause the pipeline upload to fail immediately. Secrets in pipeline uploads can be allowed again by passing the --allow-secrets flag (environment variable BUILDKITE_AGENT_PIPELINE_UPLOAD_ALLOW_SECRETS). The --reject-secrets flag (environment variable BUILDKITE_AGENT_PIPELINE_UPLOAD_REJECT_SECRETS) has been removed.

Changes to artifacts

  • Windows path separators (\) are now translated into forward slashes / for storage (for example in S3). We do not anticipate this breaking any agent-side behaviour. (The inverse translation back to Windows path separators (\) is already applied on artifact download.) But this change may break workflows that assume particular storage paths.
  • The artifact upload command --follow-symlinks flag (BUILDKITE_AGENT_ARTIFACT_SYMLINKS environment variable) has been removed. Use --glob-resolve-follow-symlinks (BUILDKITE_AGENT_ARTIFACT_GLOB_RESOLVE_FOLLOW_SYMLINKS) instead, which is equivalent.

Changes to plugins and hooks

  • The names (but not values) of various agent environment variables are now written to BUILDKITE_ENV_FILE, so that they can be automatically propagated to child environments, for example with the docker and docker-compose plugins. This may break hooks (for example a pre-bootstrap hook) that assume all lines in the file will have an equal sign = and a variable value.
  • Deprecated env vars generated for plugin configuration have been removed. When running a plugin, environment variables are generated reflecting the plugin configuration. The deprecated form of these variables eliminated any consecutive underscores (for example VAR__NAME was mangled into VAR_NAME), making it harder to predict the env var corresponding to a particular plugin config key.

The new form of these generated variables, where consecutive underscores are preserved, has existed for some time now. Plugins should not depend on the deprecated variables, but this change may break plugins we are not currently aware of if they still use the deprecated variables. - Post-checkout, post-command, pre-exit hooks now run in "reverse" order (relative to pre-checkout, pre-command hooks). See below for an example.

This change is aimed at making “setup/cleanup” pairing of hooks easier, particularly when using multiple instances of the same plugin. It may break some combinations of hooks or plugins that we are not currently aware of. The new ordering can be opted-out using the new legacy-post-hook-order experiment.

New hook ordering example

Suppose a step specifies two plugins A and B (in that order), and there are also agent and repository hooks present. Under version 3, hooks for each of these hook types (pre-checkout, post-checkout, pre-command, post-command, pre-exit) execute in the same order as one another: agent, repository, plugin A, plugin B. In full:

  1. agent pre-checkout
  2. (pre-checkout is not possible for repository hooks)
  3. plugin A pre-checkout
  4. plugin B pre-checkout
  5. (checkout)
  6. agent post-checkout
  7. repository post-checkout
  8. plugin A post-checkout
  9. plugin B post-checkout
  10. agent pre-command
  11. repository pre-command
  12. plugin A pre-command
  13. plugin B pre-command
  14. (command)
  15. agent post-command
  16. repository post-command
  17. plugin A post-command
  18. plugin B post-command
  19. agent pre-exit
  20. repository pre-exit
  21. plugin A pre-exit
  22. plugin B pre-exit

In version 4, the default execution order is reversed for post-checkout, post-command, and pre-exit hooks (key differences in bold):

  1. agent pre-checkout
  2. (pre-checkout is not possible for repository hooks)
  3. plugin A pre-checkout
  4. plugin B pre-checkout
  5. (checkout)
  6. plugin B post-checkout
  7. plugin A post-checkout
  8. repository post-checkout
  9. agent post-checkout
  10. agent pre-command
  11. repository pre-command
  12. plugin A pre-command
  13. plugin B pre-command
  14. (command)
  15. plugin B post-command
  16. plugin A post-command
  17. repository post-command
  18. agent post-command
  19. plugin B pre-exit
  20. plugin A pre-exit
  21. repository pre-exit
  22. agent pre-exit

Changes to job metadata

  • A trailing newline has been added to the output from buildkite-agent meta-data get. This may break code that assumes there is no trailing whitespace after fetching a metadata value with this command.

Changes to experiments

  • A new experiment, legacy-post-hook-order, can be used to revert to the v3 hook ordering (see “Changes to plugins and hooks” above).
  • allow-artifact-path-traversal has been removed. The insecure behaviour it enabled is no longer supported.
  • normalised-upload-paths is now default behaviour and has been removed (see "Changes to artifacts" above).
  • override-zero-exit-on-cancel is now default behaviour and has been removed (see "Changes to job handling" above).
  • resolve-commit-after-checkout is now default behaviour and has been removed(see “Changes to checkout” above).
  • propagate-agent-config-vars is now default behaviour and has been removed (see “Changes to plugins and hooks” above).
  • descending-spawn-priority has been removed, with equivalent functionality now available using the --spawn-with-priority flag (see “Changes to agent parallelism” above).

Changes to flags, environment variables, and agent configuration

The underlying CLI flag processing package has been upgraded to a new version. No problems are expected as a result.

These CLI flags, environment variables, and agent configuration options have been removed:

  • cancel-grace-period (BUILDKITE_CANCEL_GRACE_PERIOD ) has been removed (see “Changes to job handling” above).
  • signal-grace-period-seconds (BUILDKITE_SIGNAL_GRACE_PERIOD_SECONDS ) has been removed (see “Changes to job handling” above).
  • no-ansi-timestamps (BUILDKITE_NO_ANSI_TIMESTAMPS ) has been removed (see “Changes to job logs” above).
  • timestamp-lines (BUILDKITE_TIMESTAMP_LINES) has been removed (see “Changes to job logs” above).
  • trace-context-encoding (BUILDKITE_TRACE_CONTEXT_ENCODING) - it applied to OpenTracing support, which was also removed in this version. There is no replacement flag, because there is no longer a trace context encoding to configure.
  • tracing-service-name (BUILDKITE_TRACING_SERVICE_NAME) has been renamed to telemetry-service-name (BUILDKITE_TELEMETRY_SERVICE_NAME) (see "Changes to observability" above).
  • tracing-backend (BUILDKITE_TRACING_BACKEND) has been removed (see "Changes to observability" above).
  • tracing-propagate-traceparent (BUILDKITE_TRACING_PROPAGATE_TRACEPARENT) has been removed as it is now always enabled (see "Changes to observability" above).
  • kubernetes-log-collection-grace-period (BUILDKITE_KUBERNETES_LOG_COLLECTION_GRACE_PERIOD) has been removed. It was only briefly used with agent-stack-k8s before the functionality was removed. There is no replacement flag, it should not be used.
  • no-automatic-ssh-fingerprint-verification (BUILDKITE_NO_AUTOMATIC_SSH_FINGERPRINT_VERIFICATION) - use no-ssh-keyscan (BUILDKITE_NO_SSH_KEYSCAN) instead, which is equivalent.
  • meta-data (BUILDKITE_AGENT_META_DATA) - use tags (BUILDKITE_AGENT_TAGS) instead, which is equivalent.
  • meta-data-ec2 (BUILDKITE_AGENT_META_DATA_EC2) - use tags-from-ec2-meta-data (BUILDKITE_AGENT_TAGS_FROM_EC2_META_DATA) instead, which is equivalent.
  • meta-data-ec2-tags (BUILDKITE_AGENT_META_DATA_EC2_TAGS) - use tags-from-ec2-tags (BUILDKITE_AGENT_TAGS_FROM_EC2_TAGS) instead, which is equivalent.
  • meta-data-gcp (BUILDKITE_AGENT_META_DATA_GCP) - use tags-from-gcp-meta-data (BUILDKITE_AGENT_TAGS_FROM_GCP_META_DATA) instead, which is equivalent.
  • tags-from-ec2 (BUILDKITE_AGENT_TAGS_FROM_EC2) - use tags-from-ec2-meta-data (BUILDKITE_AGENT_TAGS_FROM_EC2_META_DATA) instead, which is equivalent.
  • tags-from-gcp (BUILDKITE_AGENT_TAGS_FROM_GCP) - use tags-from-gcp-meta-data (BUILDKITE_AGENT_TAGS_FROM_GCP_META_DATA) instead, which is equivalent.
  • disconnect-after-job-timeout (BUILDKITE_AGENT_DISCONNECT_AFTER_JOB_TIMEOUT) - use disconnect-after-idle-timeout instead, which is equivalent.