Git checkout optimization

This page covers best practices for optimizing Git workflows in Buildkite Pipelines through the use of sparse checkout and Git mirrors.

Sparse checkout

Sparse checkout is a Git feature that allows you to check out only a subset of paths from a repository into your working directory, while the local repository still retains the full commit history. After you specify the required paths, Git populates only those files locally. This speeds up operations and reduces disk usage on very large, monorepo-style projects, without changing the repository itself or requiring server-side setup.
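As a minimal sketch of the plain Git steps behind a cone-mode sparse checkout, the following Python helper only builds the command lines and does not run them. The function name, the repository URL, and the default ref are hypothetical and used for illustration. This is not how the Sparse Checkout Buildkite plugin is implemented.

```python
from typing import List, Sequence


def sparse_checkout_commands(
    repo_url: str, dest: str, paths: Sequence[str], ref: str = "main"
) -> List[List[str]]:
    """Build the Git commands for a cone-mode sparse checkout.

    Returns argument lists only; nothing is executed here.
    """
    if not paths:
        raise ValueError("at least one path is required")
    return [
        # Clone the repository without populating the working tree yet.
        ["git", "clone", "--no-checkout", repo_url, dest],
        # Enable sparse checkout in cone (directory-based) mode.
        ["git", "-C", dest, "sparse-checkout", "init", "--cone"],
        # Restrict the working tree to the requested directories.
        ["git", "-C", dest, "sparse-checkout", "set", *paths],
        # Populate only the selected paths.
        ["git", "-C", dest, "checkout", ref],
    ]


if __name__ == "__main__":
    # Hypothetical repository URL and destination, for illustration only.
    for cmd in sparse_checkout_commands(
        "git@example.com:acme/monorepo.git", "monorepo", [".buildkite"]
    ):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a machine with a recent Git installed.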

To natively implement sparse checkout in Buildkite Pipelines, you can use the Sparse Checkout Buildkite plugin. The plugin speeds up pipeline upload by checking out only .buildkite or other specified paths. It supports cone and non-cone patterns, optional aggressive cleanup, skipping ssh-keyscan, and a verbose mode for debugging.

Git mirrors

Git mirrors are one of the most effective ways to speed up Git checkouts in Buildkite Pipelines. Instead of fetching the entire repository from your remote Git server every time, agents maintain a single local bare mirror of each repository on the host machine.

When a build runs, the Buildkite agent performs a fast local clone from the mirror by using the --reference flag of git clone, which significantly reduces checkout times, especially for large repositories or those with extensive histories. Submodules also benefit from this optimization by referencing the mirror during their checkout process.
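The mirror-then-reference pattern described above can be sketched with plain Git commands. The helper below only builds the command lines. The function name and the paths are hypothetical, and the exact sequence the Buildkite agent runs may differ.

```python
from typing import List


def mirror_checkout_commands(
    repo_url: str, mirror_dir: str, dest: str, mirror_exists: bool
) -> List[List[str]]:
    """Build Git commands that clone a working copy via a local bare mirror.

    Returns argument lists only; nothing is executed here.
    """
    commands: List[List[str]] = []
    if mirror_exists:
        # Refresh the existing mirror so it has the latest refs and objects.
        commands.append(["git", "-C", mirror_dir, "fetch", "--prune", "origin"])
    else:
        # First use on this host: create the bare mirror once.
        commands.append(["git", "clone", "--mirror", repo_url, mirror_dir])
    # Clone the working copy, borrowing objects from the mirror instead of
    # downloading them again from the remote server.
    commands.append(["git", "clone", "--reference", mirror_dir, repo_url, dest])
    return commands


if __name__ == "__main__":
    # Hypothetical URL and paths, for illustration only.
    for cmd in mirror_checkout_commands(
        "git@example.com:acme/monorepo.git",
        "/var/lib/git-mirrors/monorepo.git",
        "/tmp/build/monorepo",
        mirror_exists=False,
    ):
        print(" ".join(cmd))
```

A clone made with --reference depends on the mirror's object store, so the mirror must stay in place for as long as the working copy is in use.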

Comparing sparse checkout and Git mirrors

While both approaches help optimize your Git workflow, they solve different problems and work in fundamentally different ways. Understanding when to use each can make a real difference in your build performance.

Sparse checkout:

  • Is client-side only, so it requires no extra infrastructure or separate repository.
  • Downloads the full repository history but populates the working tree with only the selected paths, that is, the files and folders you actually need.
  • Is useful for monorepos where different teams touch different directories - for example, when the frontend developers don't need backend code cluttering their workspace (and vice versa).

Git mirrors:

  • Are separate copies of your repository (typically created with --mirror or --bare) that track another repository and act as a local cache.
  • Are useful in CI/CD environments with frequent builds to avoid repeatedly hitting your Git server.
  • Can mirror everything (all refs and history) or be combined with filtering if you build specialized mirrors.
  • Require some upfront setup and ongoing maintenance, but result in faster checkout times.

When to use which

  • Use sparse checkout when you're optimizing developer workstation performance - for example, when developers work in a large repository but only on a few directories. It speeds up local checkouts and improves IDE performance without changing server infrastructure.
  • Use a Git mirror when you're optimizing distribution, reliability, or centralization for automation and scaling - for example, when you need a replicated source of truth for CI, faster clones for many agents, network isolation, or migration between hosts.

In addition to sparse checkout and Git mirrors, you can also optimize checkouts with the Git Shallow Clone Buildkite Plugin, which sets the --depth flag for the git clone and git fetch commands.
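To illustrate what the --depth flag does at the Git level, here is a small sketch that builds shallow clone and shallow fetch command lines. The function names and the default depth of 1 are assumptions for illustration and do not reflect the plugin's internals or defaults.

```python
from typing import List


def shallow_clone_command(repo_url: str, dest: str, depth: int = 1) -> List[str]:
    """Build a git clone command that truncates history to `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return ["git", "clone", "--depth", str(depth), repo_url, dest]


def shallow_fetch_command(remote: str, ref: str, depth: int = 1) -> List[str]:
    """Build a git fetch command that limits fetched history to `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return ["git", "fetch", "--depth", str(depth), remote, ref]


if __name__ == "__main__":
    # Hypothetical URL, destination, and ref, for illustration only.
    print(" ".join(shallow_clone_command("git@example.com:acme/app.git", "app")))
    print(" ".join(shallow_fetch_command("origin", "main", depth=50)))
```

A shallow clone trades history for speed, so steps that need older commits (for example, changelog generation or git describe) may require a larger depth.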

Understanding checkout defaults across platforms

The default checkout behavior in Buildkite Pipelines prioritizes completeness and flexibility. As a result, if you're migrating to Buildkite Pipelines from another CI/CD platform, especially if you're using LFS, you might notice differences in checkout speed or behavior.

For a GitHub Actions-based comparison of how Buildkite's checkout defaults differ from those of other platforms (including LFS handling, shallow clones, and customization options), see Understanding the difference in default checkout behaviors.

How to monitor Git operations

Understanding where time is spent during Git checkout helps you identify bottlenecks and measure the impact of optimizations. The following approaches can help you gain visibility into Git performance across your builds.

OpenTelemetry tracing

The Buildkite agent emits OpenTelemetry trace spans for checkout behavior when tracing is enabled. Two spans are relevant to Git operations:

  • checkout: Covers the entire checkout phase, including pre-checkout and post-checkout hooks.
  • repo-checkout: A child span of checkout that isolates the Git checkout itself, excluding hook execution time.

By comparing these two spans, you can determine whether slowdowns originate from Git operations or from custom hook logic. If you are also using the OpenTelemetry Tracing Notification Service, you can propagate traces from the Buildkite control plane through to the agent spans for an end-to-end view of build performance.
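The comparison of the two spans can be sketched as a small calculation. The span record shape used here (a mapping with name, start, and end in seconds) is a hypothetical export format chosen for illustration; adapt the field names to whatever your tracing backend returns.

```python
from typing import Dict, Iterable, Mapping


def checkout_breakdown(spans: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Split checkout time into the Git portion and everything else.

    Expects one span named "checkout" and one named "repo-checkout",
    each with numeric "start" and "end" values in seconds.
    """
    durations: Dict[str, float] = {}
    for span in spans:
        name = span["name"]
        if name in ("checkout", "repo-checkout"):
            durations[str(name)] = float(span["end"]) - float(span["start"])
    missing = {"checkout", "repo-checkout"} - durations.keys()
    if missing:
        raise ValueError(f"missing spans: {sorted(missing)}")
    total = durations["checkout"]
    git_time = durations["repo-checkout"]
    return {
        "total_seconds": total,
        "git_seconds": git_time,
        # Time inside "checkout" but outside "repo-checkout": mostly hooks.
        "hooks_and_overhead_seconds": total - git_time,
    }


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    example = [
        {"name": "checkout", "start": 0.0, "end": 42.0},
        {"name": "repo-checkout", "start": 2.0, "end": 37.0},
    ]
    print(checkout_breakdown(example))
```

If hooks_and_overhead_seconds dominates, the custom hook logic is the better optimization target; if git_seconds dominates, look at mirrors, sparse checkout, or shallow clones.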

Checkout hooks

You can use a checkout hook on your agents to add custom timing or instrumentation around the Git checkout phase. For example, a pre-checkout hook could record a start timestamp and a post-checkout hook could calculate the elapsed time and send it to your monitoring system. This approach works with any observability platform and does not require OpenTelemetry.
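A minimal sketch of the timing logic follows, assuming your pre-checkout and post-checkout hooks call out to a small Python helper. The function names and the marker file are hypothetical, and sending the result to a monitoring system is left out because it depends on your platform.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def record_start(marker: Path, now: Optional[float] = None) -> float:
    """Called from a pre-checkout hook: persist the start timestamp."""
    started = time.time() if now is None else now
    marker.write_text(repr(started))
    return started


def elapsed_since_start(marker: Path, now: Optional[float] = None) -> float:
    """Called from a post-checkout hook: return elapsed seconds and clean up."""
    started = float(marker.read_text())
    finished = time.time() if now is None else now
    marker.unlink()  # Remove the marker so stale values are never reused.
    return finished - started


if __name__ == "__main__":
    # Demonstration with a temporary marker file.
    with tempfile.TemporaryDirectory() as tmp:
        marker_path = Path(tmp) / "checkout-start"
        record_start(marker_path)
        time.sleep(0.1)  # Stand-in for the Git checkout itself.
        print(f"checkout took {elapsed_since_start(marker_path):.2f}s")
```

Writing the timestamp to a file keeps the two hooks independent, since they run as separate processes and cannot share in-memory state.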

Git caching proxies

A local or network-level Git caching proxy sits between your agents and the upstream Git server, caching repository data and serving repeated clones or fetches from a local cache. Because all Git traffic flows through the proxy, it provides a natural instrumentation point for collecting metrics such as cache hit rates, clone durations, and bandwidth usage.

Two open-source options that support Git caching with built-in observability are:

  • Cachew: A protocol-aware caching proxy that maintains compressed snapshots of repositories for faster restores. It supports OpenTelemetry metrics and Prometheus integration.
  • content-cache: A content-addressable caching proxy that supports Git smart HTTP protocol with pack-level caching. It exports OpenTelemetry metrics and provides Prometheus endpoints for monitoring cache effectiveness.