---
title: "Setting up a self-hosted Bazel remote cache on AWS"
date: "2025-04-21"
author: "Praneet Loke"
description: "A guide to deploying a self-hosted Bazel remote cache on AWS, with examples in Terraform and Pulumi."
tags: "Bazel, Best practices"
readingTime: "9 minute read"
---

# Setting up a self-hosted Bazel remote cache on AWS


<p>Bazel builds can be sped up significantly by caching build outputs. A remote cache reduces the running time of your builds on CI runners by letting them reuse outputs that were already built elsewhere. This post introduces you to the options available for deploying a self-hosted Bazel remote cache.</p><h2>How Bazel remote caching works</h2><p>Bazel’s remote caching is surprisingly simple. Based on the HTTP/1.1 protocol, it&apos;s described in the official <a href="https://bazel.build/remote/caching#http-caching"><u>docs</u></a> as follows:</p><blockquote>Binary data (BLOB) is uploaded via PUT requests and downloaded via GET requests. Action result metadata is stored under the path `/ac/` and output files are stored under the path `/cas/`.</blockquote><p>The remote cache stores two types of data:</p><ul><li><p>The <em>action cache</em>, which is a map of action hashes to action result metadata</p></li><li><p>A <em>content-addressable store</em> (CAS) of output files</p></li></ul><p>Successful remote cache hits will appear in the status line, which might look like this:</p><pre><code>INFO: Elapsed time: 1.727s, Critical Path: 1.19s
INFO: 25 processes: 2 remote cache hit, 23 internal.</code></pre><p>Here, out of 25 processes, 2 were remote cache hits and the remaining 23 were internal to Bazel, so nothing had to be rebuilt. </p><p>We highly recommend reading the <a href="https://bazel.build/remote/caching"><u>overview</u></a> section in the official docs to learn more about remote caching in general. For guidance on how best to identify cache hits and misses, see the <a href="https://bazel.build/remote/cache-local#cache-hits">remote-cache debugging docs</a>.</p><h2>What is remote caching?</h2><p>Remote caching is a feature of the <a href="https://github.com/bazelbuild/remote-apis?tab=readme-ov-file#remote-apis"><u>Remote APIs</u></a>, a collection of APIs that enable large-scale distributed execution and caching for source code and other inputs. The APIs are categorized as Remote Execution, Remote Asset, and Remote Logstream.</p><p>Several <a href="https://github.com/bazelbuild/remote-apis?tab=readme-ov-file#clients"><u>clients</u></a>, including Bazel, speak the remote execution APIs (REAPI), and each is free to talk to any server that supports them, whether for distributed execution, caching, or both.</p><h2>Self-hosting a remote Bazel cache with <code>bazel-remote</code></h2><p>In this post, we’ll focus on <a href="https://github.com/buchgr/bazel-remote"><code>bazel-remote</code></a>, since it’s a cache-only service. In other words, <code>bazel-remote</code> only implements the caching APIs from the remote execution collection of APIs. Note that <code>bazel-remote</code> is not the only open-source service that supports remote caching for Bazel. Have a look at the other <a href="https://github.com/bazelbuild/remote-apis?tab=readme-ov-file#servers"><u>open-source remote execution servers</u></a> in the official remote APIs GitHub repository.</p><p><code>bazel-remote</code> is a simple Go-based service. 
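</p><p>Since <code>bazel-remote</code> implements Bazel&apos;s plain-HTTP cache protocol, you can exercise a running instance directly. As a quick sketch (the cache host below is a placeholder), the CAS key for a blob is simply the SHA-256 digest of its contents:</p>

```shell
# The CAS stores blobs keyed by the SHA-256 of their contents,
# so the URL for a blob is /cas/<digest>.
printf 'hello' > /tmp/blob.bin
digest="$(sha256sum /tmp/blob.bin | cut -d' ' -f1)"

# Hypothetical upload and download (cache.internal is a placeholder host):
echo "PUT http://cache.internal:8080/cas/${digest}"
echo "GET http://cache.internal:8080/cas/${digest}"
```

<p>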
It can be run as a container or in its raw binary form on a VM, with a local disk as its primary method of persistence. In addition to writing to a local disk, it also supports writing (asynchronously) to Amazon S3, Google Cloud Storage, Azure Storage, or even another <code>bazel-remote</code> instance as a proxy backend.</p><p>Here’s what a simple example of deploying <code>bazel-remote</code> would look like:</p><div><em>Figure: How the Bazel CLI and the bazel-remote service write to and read from different storage backends.</em></div><h3>Deploy the <code>bazel-remote</code> service with Terraform or Pulumi</h3><p>To make it easy to deploy <code>bazel-remote</code> on AWS, we&apos;ve created two infrastructure-as-code examples, one using <a href="https://developer.hashicorp.com/terraform">Terraform</a> and the other <a href="https://www.pulumi.com/docs/iac/">Pulumi</a>. The repository containing both examples is available on GitHub at <a href="https://github.com/buildkite/bazel-remote-cache-aws-iac-example">https://github.com/buildkite/bazel-remote-cache-aws-iac-example</a>.</p><p>Both versions set up the same infrastructure components:</p><ul><li><p>An ECS cluster using the AWS Fargate capacity provider</p></li><li><p>A <code>bazel-remote</code> ECS service that utilizes:</p><ul><li><p>AWS Firelens with a FluentBit sidecar container definition</p></li><li><p>A load-balancer target group attachment</p></li><li><p>A security group for access to <code>bazel-remote</code></p></li><li><p>An EBS volume <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ebs-volumes.html"><u>managed by ECS</u></a></p></li></ul></li></ul><p>We chose ECS on Fargate as the compute platform on AWS because of its simplicity and how easily it scales services. 
Moreover, it’s possible to use an <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/efs-volumes.html"><u>EFS volume</u></a> instead of an EBS volume just as easily—although we recommend starting with the EBS volume. Services are always updated using a blue/green strategy, which is useful when updating <code>bazel-remote</code> to a newer version without causing interruptions.</p><p>Before you can run either of the IaC code samples to set up <code>bazel-remote</code> on ECS, make sure you have the right AWS credentials with permissions to the following resources:</p><ul><li><p>IAM roles, policies and the ability to pass a role to ECS</p></li><li><p>CloudWatch log groups and log streams</p></li><li><p>ECS clusters, services, and task definitions</p></li><li><p>EC2, including VPCs, target groups, load balancers, and security groups</p></li></ul><p>To deploy with Terraform, make sure you&apos;ve applied your AWS credentials in the usual way (e.g., with environment variables), then change to the <code>terraform</code> folder in the repository and run:</p><pre><code>terraform init
terraform plan
terraform apply</code></pre><p>By default, this will write to a local Terraform state file. If you&apos;d prefer to use HCP Terraform (formerly Terraform Cloud), follow the <a href="https://developer.hashicorp.com/terraform/tutorials/cloud-get-started"><u>tutorials</u></a> to set up a cloud workspace so you can store your state in HCP Terraform instead of in a local file.</p><p>Alternatively, to deploy with Pulumi, once again make sure you&apos;ve applied your AWS credentials properly, change to the <code>pulumi</code> folder in the repository, and run:</p><pre><code>pulumi stack init dev
pulumi preview
pulumi up</code></pre><p>Unlike Terraform, Pulumi writes its state to Pulumi Cloud by default. If you&apos;d prefer to write to a local file, use <a href="https://www.pulumi.com/docs/iac/concepts/state-and-backends/#local-filesystem"><code>pulumi login --local</code></a> first, then follow the same steps.</p><p>When the deployment completes, both the Terraform and Pulumi versions will expose the DNS name of the load balancer as <em>outputs:</em></p><pre data-language="bash"><code>terraform output dns_name
bazel-remote-1818996606.us-west-2.elb.amazonaws.com

pulumi stack output dnsName
web-lb-07ad29a-136960774.us-west-2.elb.amazonaws.com</code></pre><p>To use the newly created service as a remote cache, format the DNS name as <code>http://${dns_name}</code> and use it in your <code>bazel</code> command-line client with the <code>--remote_cache</code> flag:</p><pre><code>bazel build //... --remote_cache &quot;http://$(terraform output -raw dns_name)&quot;

bazel build //... --remote_cache &quot;http://$(pulumi stack output dnsName)&quot;</code></pre><p>And that&apos;s it! You&apos;re now up and running with a scalable Bazel remote cache service on AWS.</p><p>The next section will walk you through further <a href="https://github.com/buchgr/bazel-remote?tab=readme-ov-file#usage">configuration options</a> you may want to consider as next steps.</p><h3>Configuring <code>bazel-remote</code></h3><p>Many of these <a href="https://github.com/buchgr/bazel-remote?tab=readme-ov-file#command-line-flags">configuration</a> recommendations were compiled from questions and experiences reported by users running <code>bazel-remote</code> at scale. See <a href="https://github.com/buchgr/bazel-remote/issues/786#issuecomment-2418009617"><u>bazel-remote#786</u></a>, for example.</p><h4>Authentication</h4><p><code>bazel-remote</code> supports Basic authentication using an <code>.htpasswd</code> file. You should ensure you&apos;re using HTTPS in this case, since the username and password would otherwise be transmitted in clear text.</p><p>If you plan to deploy the service on an internal corporate network that isn&apos;t reachable over the internet, you may be able to skip authentication.</p><h4>Compression</h4><p>Compression requires no extra configuration, but it&apos;s worth a separate section to explain how it behaves. </p><p><code>bazel-remote</code> uses <a href="https://en.wikipedia.org/wiki/Zstd">Zstd compression</a> by default, but it can be disabled by setting the <code>--storage_mode</code> flag to <code>uncompressed</code>. 
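</p><p>On the Bazel client side, these settings typically end up in a <code>.bazelrc</code>. Here&apos;s a sketch with a placeholder cache URL (the individual flags are explained below):</p>

```
# .bazelrc — cache.internal is a placeholder for your own cache endpoint
build --remote_cache=http://cache.internal:8080
build --remote_cache_compression
build --experimental_remote_cache_compression_threshold=100
```

<p>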
Running <code>bazel-remote</code> with compression enabled increases the effective capacity of the remote cache, and you can also tell your Bazel client to upload already-compressed blobs for even better network performance.</p><p>To have the Bazel client compress blobs it exchanges with the remote cache, pass the flag <code>--remote_cache_compression</code> (or <code>--experimental_remote_cache_compression</code> if you&apos;re using Bazel v6). It&apos;s also recommended that you set a threshold below which compression isn&apos;t triggered, so that small artifacts aren&apos;t needlessly compressed, which can increase memory usage on your Bazel build runners. To set this threshold, pass <code>--experimental_remote_cache_compression_threshold=100</code> to your Bazel client; starting in Bazel v7, <code>100</code> is the default value, and it appears to be a <a href="https://github.com/bazelbuild/bazel/issues/18997#issuecomment-2329371387"><u>sweet spot for compression effectiveness</u></a>. You may tune this value later based on your needs and observations.</p><h4>Max disk cache size</h4><p>The <code>--max_size</code> flag sets a ceiling (in GiB) for the cache regardless of the size of the disk. When the ceiling is reached, <code>bazel-remote</code> evicts items using an LRU eviction policy. 
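</p><p>For example, a hypothetical launch capping the cache at 100 GiB (the paths are illustrative, and the flag names reflect recent <code>bazel-remote</code> releases):</p>

```shell
bazel-remote --dir /data/bazel-cache --max_size 100 --http_address 0.0.0.0:8080
```

<p>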
Be sure to set this to a reasonable value based on your disk size.</p><h4>Networking</h4><p>Consider the location of the remote cache, and provision the instance as close as possible to, or even in the same VPC as, your Bazel build runners.</p><h3>Other performance considerations</h3><p>Start with a simple cache setup, observe how effective it is, and only then optimize; the right optimizations depend on the problems you actually run into.</p><p>While there isn’t any official recommendation for how to run <code>bazel-remote</code> effectively in different scenarios, we’ve collected some guidance based on user feedback from the repository that might be useful for your deployment.</p><h4>Load balancing</h4><p>It might be tempting to load-balance multiple <code>bazel-remote</code> instances, each with its own disk cache. Do not do this unless you can ensure that cache requests from the Bazel client are routed to the same <code>bazel-remote</code> instance <em>for every request in the course of a single build</em>. You might be able to use a network file system instead of a local disk, for example, JuiceFS, Amazon EFS, or similar services.</p><p>Many load balancers offer session affinity (client stickiness), which allows requests to be routed to a specific backend server. However, that feature requires clients that persist cookies (e.g., web browsers) and send them back to the server with every subsequent request. 
Since the Bazel client doesn&apos;t persist cookies, the feature is ineffective with Bazel.</p><h4>Partitioned caching</h4><p>Instead of using a single <code>bazel-remote</code> cache for <em>every</em> type of build your team runs, consider running one instance per platform you build for. Alternatively, identify the builds that produce large artifacts, give those a dedicated instance, and isolate the rest on separate cache instances.</p><h4>Tiered caching</h4><p>Whereas partitioned caching gives you dedicated instances for certain types of builds, tiered caching means having smaller <code>bazel-remote</code> instances that use a central, larger <code>bazel-remote</code> instance as an HTTP proxy backend. Remember that <code>bazel-remote</code> can fall back to object storage or even another <code>bazel-remote</code> instance as a proxy backend. (The words “larger” and “smaller” here refer to disk size, not compute size.)</p><h3>More self-hosting options: cloud object storage</h3><p>While there may be other ways to self-host a remote cache that satisfies Bazel’s HTTP caching protocol, using an object storage service such as Amazon S3, Google Cloud Storage, or Azure Blob Storage might be even simpler.</p><p>Specifically, <a href="https://bazel.build/remote/caching#cloud-storage"><u>using a Google Cloud Storage bucket</u></a> might be the simplest option, since the Bazel client supports specifying a Google credentials file for authentication. 
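</p><p>A sketch of what that looks like from the Bazel client (the bucket name and key-file path are placeholders):</p>

```shell
bazel build //... \
  --remote_cache=https://storage.googleapis.com/my-bazel-cache-bucket \
  --google_credentials=/path/to/service-account-key.json
```

<p>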
This means you don&apos;t need another service in front of object storage to handle authentication (i.e., some sort of reverse proxy), as would be the case for Amazon S3 and Azure Blob Storage.</p><p>If you are interested in using Amazon S3 as a Bazel remote cache, <a href="https://github.com/cnunciato/bazel-remote-cache-pulumi-aws">check out this example</a>, which uses Amazon S3 and CloudFront, and also supports HTTP Basic authentication.</p><h2>Next steps</h2><p>Using a remote cache with Bazel can dramatically reduce your build times—especially with large and complex projects that produce a lot of infrequently changing build artifacts.</p><p>If you&apos;re using Bazel in production, chances are you&apos;ll need a delivery platform that can help you manage the scale and complexity that Bazel projects so often require. As a flexible and massively scalable platform, Buildkite is an especially good fit for Bazel—in fact, <a href="https://buildkite.com/resources/webinars/how-bazel-built-its-ci-system-on-buildkite/">the Bazel team itself even uses Buildkite to ship Bazel</a>! A few suggestions to keep the learning going:</p><ul><li><p>Learn <a href="https://buildkite.com/resources/webinars/how-bazel-built-its-ci-system-on-buildkite/">how the Bazel team built its CI system on top of Buildkite</a></p></li><li><p>See how to <a href="https://buildkite.com/resources/blog/fully-dynamic-pipelines-with-bazel-and-buildkite/">use Bazel with Buildkite dynamic pipelines</a> to fine-tune your workflows using Bazel&apos;s knowledge of the dependency graph</p></li><li><p>Brush up on your <a href="https://buildkite.com/resources/blog/a-guide-to-bazel-query/"><code>bazel query</code> skills</a></p></li><li><p>Learn more about <a href="https://buildkite.com/docs/pipelines/tutorials/bazel">Bazel and Buildkite</a> in the docs</p></li></ul>