Cache volumes

Cache volumes (also known as volumes) are external volumes attached to Buildkite hosted agent instances, and are scoped to specific Buildkite clusters. These volumes are attached on a best-effort basis depending on their locality, expiration, and current usage, and therefore should not be relied upon as durable data storage.

Volumes are useful when jobs in your pipeline builds on Buildkite hosted agents rely on build dependencies, on Docker images (which can be stored in container cache volumes), or on Git mirrors (which can be stored in Git mirror volumes). Keeping build dependencies, Docker images, and Git mirrors in volumes can greatly reduce the overall duration of your pipeline builds.

By default, volumes:

  • Are disabled, although you can enable them by providing a list of paths containing files and data to temporarily store in these volumes at the pipeline- or step-level.
  • Are scoped to a pipeline and are shared between all steps in the pipeline.

Volumes act as regular disks, and have the following properties on Linux:

  • They use NVMe storage, delivering high performance.
  • They are formatted as a regular Linux filesystem (for example, ext4), so they support any Linux use case.

Volumes on macOS work a little differently: they use sparse bundle disk images rather than the bind mount volumes used on Linux. However, macOS volumes are managed in the same way as Linux volumes.

Volume retention

Volumes are retained for up to 14 days from their last use. Note that 14 days is not a guaranteed retention period: volumes may be removed before this period ends.

Design your workflows to handle volume misses, as volumes are designed for temporary data storage.
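As a minimal sketch of a step that tolerates a volume miss, the example below always runs the dependency install before building, rather than assuming the cached directory is already populated. The yarn install command is an illustrative choice (assuming a Yarn-based project, as in the example later on this page); on a volume hit it has little to do, and on a miss it restores the dependencies from scratch.

```yaml
# pipeline.yml
steps:
  - label: "build"
    cache: "node_modules"
    command:
      # Runs on every build, so the step still succeeds when the
      # volume is empty, expired, or not attached.
      - "yarn install"
      - "yarn run build"
```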

Volume configuration

Volume paths can be defined in your pipeline.yml file using the cache key, either at the root level of your pipeline YAML or as an attribute on a step. Defining paths for the cache key at either level implicitly creates a volume for the pipeline.

When volume paths are defined, the volume is mounted under /cache/bkcache in the agent instance. The agent links sub-directories of the volume into the paths specified in the configuration. For example, defining cache: "node_modules" in your pipeline.yml file will link ./node_modules to /cache/bkcache/node_modules in your agent instance.
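One way to observe this linking is a throwaway step that lists the mount point and the linked path. This is only a sketch, assuming a Linux hosted agent; the step label is hypothetical, and the exact listing output depends on the state of the volume.

```yaml
# pipeline.yml
steps:
  - label: "inspect volume"
    cache: "node_modules"
    command:
      # The volume itself, as mounted in the agent instance.
      - "ls -la /cache/bkcache"
      # The path in the working directory that is linked into the volume.
      - "ls -la node_modules"
```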

Volumes can be created by specifying a name for the volume, which allows you to use multiple volumes in a single pipeline. Note that it is not possible to share a volume across multiple pipelines.
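As a sketch of using multiple named volumes in a single pipeline, the configuration below gives each step its own volume. The volume names are illustrative only.

```yaml
# pipeline.yml
steps:
  - command: "yarn run build"
    cache:
      paths:
        - "node_modules"
      name: "node-modules-volume"

  - command: "rspec"
    cache:
      paths:
        - "vendor/bundle"
      name: "bundle-volume"
```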

When requesting a volume, you can specify a size. The volume provided will have a minimum available storage equal to the specified size. In the case of a volume hit (most of the time), the actual volume size is: last used volume size + the specified size.
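As a worked illustration of this sizing rule, the sketch below requests 50 gigabytes. The 30-gigabyte figure in the comments is a hypothetical value chosen only to make the arithmetic concrete.

```yaml
# pipeline.yml
cache:
  paths:
    - "node_modules"
  size: "50g"

# First request (no previous version): at least 50 gigabytes available.
# Later request on a volume hit, supposing the last used volume size
# was 30 gigabytes (hypothetical):
#   actual volume size = 30 gigabytes + 50 gigabytes = 80 gigabytes
```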

Defining a top-level volume configuration (using the cache key at the root level of your pipeline YAML) sets the default volume for all steps in the pipeline. Any volume defined within a step will be merged with the top-level volume configuration, with step-level volume size taking precedence when the same volume name is specified at both levels. Paths from both levels will be available when using the same volume name.

Example

pipeline.yml
cache:
  paths:
    - "node_modules"
  size: "100g"

steps:
  - command: "yarn run build"
    cache: ".build"

  - command: "yarn run test"
    cache:
      - ".build"

  - command: "rspec"
    cache:
      paths:
        - "vendor/bundle"
      size: "20g"
      name: "bundle-volume"
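The merge behavior for a volume named at both levels can be sketched as follows. The name deps-volume is hypothetical, and the comments describe the outcome expected from the rules stated above rather than output captured from a real build.

```yaml
# pipeline.yml
cache:
  paths:
    - "node_modules"
  name: "deps-volume"
  size: "100g"

steps:
  - command: "yarn run build"
    cache:
      paths:
        - ".build"
      name: "deps-volume"
      size: "50g"
    # Same volume name at both levels, so for this step:
    #   paths: node_modules and .build (paths from both levels)
    #   size:  50g (the step-level size takes precedence)
```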

Required attributes

paths A list of paths to cache in the volume. Paths are relative to the working directory of the step.
Absolute paths can also be provided, and are resolved from the root of the instance.
Example:
- ".volume"
- "/tmp/volume"
If you only need to define a single path for your volume and no optional attributes, you can omit the paths attribute and specify the path directly as the value of the cache key.
Example:
cache: ".volume"

On macOS hosted agents, the instance is a full macOS snapshot, including the standard file system structure. Volume paths cannot be specified on reserved paths, such as /tmp and /private. However, sub-paths such as /tmp/volume are acceptable.
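A short sketch of this restriction on macOS hosted agents follows. The commented-out entries show paths that would not be accepted; how the step is routed to a macOS queue is omitted here.

```yaml
# pipeline.yml
steps:
  - command: "yarn run build"
    cache:
      paths:
        - "/tmp/volume"   # acceptable: a sub-path of a reserved path
        # - "/tmp"        # not acceptable: reserved path
        # - "/private"    # not acceptable: reserved path
```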

Optional attributes

name A name for the volume. This allows you to use multiple volumes in a single pipeline. If no name is specified, the value of this attribute defaults to the pipeline slug.
Example: "node-modules-volume"
size The size of the volume. The default size is 20 gigabytes, which is also the minimum volume size that can be requested.
Sizes are specified as Ng, where N is the size in gigabytes.
Example: "20g"

Lifecycle

At any point in time, multiple versions of a volume may be used by different jobs.

The first request creates the first version of the volume, which is used as the parent of subsequent forks until a new parent version is committed. A fork in this context is a readable and writable snapshot of the volume at a moment in time.

When requesting a volume, a fork of the previous volume version is attached to the agent instance. This is the case for every request except the first, which starts empty because no previous volume version exists to fork.

Each job gets its own private copy of the volume, as it existed at the time of the last committed volume version.

Version commits follow a "last write" model: whenever a job terminates successfully (that is, exits with exit code 0), the volumes attached to that job have a new parent committed, which is the final flushed volume of the exiting agent instance.

Whenever a job fails, the volume versions attached to the agent instance are abandoned.
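The lifecycle above can be pictured with two steps that run concurrently against the same volume. The comments are a reading of the rules in this section, not captured behavior; in particular, the note about which commit becomes the parent is an inference from the "last write" model.

```yaml
# pipeline.yml
steps:
  # Both jobs receive their own fork of the last committed version.
  - label: "lint"
    command: "yarn run lint"
    cache: "node_modules"

  - label: "test"
    command: "yarn run test"
    cache: "node_modules"

# If "lint" exits 0, its fork is committed as the new parent.
# If "test" also exits 0 later, its fork becomes the new parent
# instead (last write); changes from the two forks are presumably
# not merged.
# If either job exits non-zero, its fork is abandoned and the
# current parent is left unchanged.
```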

Non-deterministic nature

Volumes, by their very nature, provide only non-deterministic access to their data. When a command in a Buildkite pipeline retrieves data or an image that may be stored in a volume (for example, a docker pull of a previously built Docker image held in the container cache volume), the command may instead retrieve it from a different source. That source could be the remote Docker builder's local storage or file system, which could be very fast, or Docker Hub, which could be very slow by comparison due to bandwidth limitations.

This behavior results from a volume's data availability, which depends on the following factors:

  • How often the volume is used.
  • How often the data on the volume is changed.

If a volume is used more frequently by pipelines, and its data (for example, Docker images) remains relatively static, then the volume hit rate (that is, the availability of the volume and its data to commands in your Buildkite pipeline, such as docker pull) is likely to be higher, and the required data is more likely to be sourced from the volume.

If, however, the volume is used less frequently and its data is relatively dynamic, then the volume hit rate is likely to be lower, meaning that the data is more likely to be retrieved from other sources and external repositories.
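In practice, this means a step is written the same way whether or not the volume is hit. In the sketch below, the image name node:20 is only an illustrative choice; the docker pull may be served from the container cache volume, from the builder's local storage, or from Docker Hub, and only the time it takes differs.

```yaml
# pipeline.yml
steps:
  - label: "pull and run"
    command:
      # Served from the volume on a hit, or from another source on a miss.
      - "docker pull node:20"
      - "docker run --rm node:20 node --version"
```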

Container cache volumes

Container cache volumes are a type of volume used to cache Docker images between builds.

This feature is only available to Linux hosted agents.

The container cache volumes feature can be enabled on the cluster's Cached Storage > Settings page. Once enabled, container cache volumes are used for all Buildkite hosted agent jobs in that cluster. A separate volume is created for each pipeline the first time that pipeline is built.

(Image: Hosted agents container cache setting displayed in the Buildkite UI)

A container cache volume's name is based on your pipeline's slug followed by a slash, then "container-cache". For example, pipeline-slug/container-cache.

You can view all of your current cluster's volumes through its Cached Storage > Volumes page.

Git mirror volumes

Git mirror volumes are a specialized type of volume designed to accelerate Git operations by caching the Git repository between builds. This is useful for large repositories that are slow to clone.

The Git mirror volumes feature can be enabled on the cluster's Cached Storage > Settings page. Once enabled, Git mirror volumes are used for all Buildkite hosted agent jobs in that cluster. A separate volume is created for each repository the first time any pipeline whose source is that repository is built.

(Image: Hosted agents Git mirror setting displayed in the Buildkite UI)

A Git mirror volume's name is based on your cloud-based Git service's account and repository name, and begins with "buildkite-git-mirror-". For example, buildkite-git-mirror-my-account-my-repository.

You can view all of your current cluster's volumes through its Cached Storage > Volumes page.