Monorepo CI Best Practices


Here at Buildkite, we serve some of the most highly-scaled and sophisticated dev teams on the planet. Our customers use every conceivable combination of tooling and methodology to improve productivity and developer experience, but one thing that’s ubiquitous is the use of monorepos.

Using a monorepo, or “monolithic repository”, means storing all of your code for multiple projects in a single repository, as opposed to a ‘multi-repo’ approach, where each project or service has its own separate repository. Both strategies have their merits, and there’s no ‘right or wrong’ strategy to use; however, it’s also important to consider the areas in which monorepos can introduce friction, especially when designing, implementing, and maintaining CI/CD.

The benefits of monorepos

As projects and teams grow in size, maintaining a ‘multi-repo’ approach can be the source of unwelcome complexity that slows your team down. Projects can become siloed, with teams lacking visibility outside of the services they’re working on, and maintaining consistency around coding standards, security policies, tooling, and dev practices can be challenging. A monorepo unifies many (or all) of your repos into a single project and, whilst this won’t automatically solve the problems mentioned above, this alternative workflow brings a number of benefits:

  • Simplified dependency management - In a properly designed monorepo, there will be only one version of every module. This makes it easier to manage dependencies and ensure that there aren't conflicts between versions.
  • Easier code sharing - In a monorepo, sharing code between projects is straightforward. You don't need to create separate libraries or packages unless you want to. This can encourage code reuse and reduce duplication.
  • Improved visibility and collaboration - A monorepo represents a single source of truth for your organization's codebase, and developers have visibility into all parts of the codebase, beyond the services that they’re working on. This can foster a better understanding of the broader system and enable more effective collaboration across teams.
  • Easier to release - CI/CD setups can be simplified when all code is in one repository, leading to potentially faster build and deployment processes. Rather than each repo/service having its own separate CI/CD pipelines, unified pipelines can be used for all builds in the monorepo (we’ll touch on this in more detail in a moment).
  • Standardization - With a monorepo, you can have consistent build tools, linters, testing frameworks, and configurations across all projects. This makes it easier to onboard new developers and ensures (well, should ensure) uniformity in development practices.

The challenges with monorepos

Obviously, the act of shoving all your smaller repos into one large repository is, in itself, not going to suddenly solve all your problems. The benefits outlined above are going to come from leadership and ownership of the repository, and from processes being put in place to ensure that best practices are being followed. Challenges can arise with even the best-managed monorepos, so it’s important to maintain focus on areas that could become problematic if left unattended.

  • Managing the repo as it grows - As the monorepo grows, traditional CI/CD tools may struggle to efficiently handle the sheer amount of code and the number of projects. Those using legacy CI/CD tools (...Jenkins 😟) will likely experience scaling issues, as the tooling struggles to keep up with the size and complexity of your project. Specialized tools and configurations will likely be needed to manage repos of this size and complexity. Similarly, you may find that your SCM experiences scaling issues as the monorepo grows, resulting in performance issues with cloning, pulling, and pushing changes.
  • Keeping the main branch clean - Monorepos often rely heavily on CI/CD processes to automate testing, building, and deploying of individual projects. If the main branch isn't clean, it can disrupt these automated workflows, halting deployments or breaking builds.
  • Merge conflicts are harder to manage - Given the volume of changes and the number of contributors, there's a higher likelihood of merge conflicts. Resolving these conflicts might require coordination between teams that wouldn't typically interact in a multi-repo setup. Using a merge queue can really help here (spoiler alert: we talk about merge queues in a little more detail below).
  • Bad coding practices can lead to tangled, fragile code - In a monorepo, because everything is unified into a single place, there's a potential for higher interdependencies between projects or modules. Bad coding practices can lead to tightly coupled code, where changes in one module inadvertently affect another. This can cause unintended side effects and make the codebase fragile.
  • Noisy build notifications - With multiple teams working on different projects within the same repository, the volume of commits can be substantial. If every commit or merge triggers a build (as is common with CI), the number of build notifications can become overwhelming.

Best practices in managing monorepos over time

So we’ve established that monorepos can be awesome and solve a lot of scaling issues but, at the same time, also introduce their own fair share of growing pains. Fortunately, this is a well-trodden path at this point, and there are a number of tried-and-tested approaches for you to consider, to help make your monorepo experience a positive one!

Use selective builds

"Selective builds" refer to the process of building only the parts of the codebase that have been changed or are affected by a change, rather than building the entire monorepo. Given that a monorepo contains multiple projects, components, or services within a single repository, selective builds ensure efficiency and speed in the development and integration process. Along with building selectively, tests can also be run selectively. For instance, if a library used by several projects is updated, only the tests relevant to that library and its dependents would be executed.

Your CI/CD tool might have some form of selective build capability support, allowing you to define which actions apply to changes observed at certain paths.

  • Some tools, such as GitHub Actions, support path filtering as part of the workflow specification. Workflow authors can define a number of workflows, specific to services within the monorepo, and configure them to execute based on which files/paths in the commit have changed.
  • Buildkite allows users to dynamically generate pipeline steps based on changes, by calculating the diff of the current commit to see which paths and files have changed, and then programmatically determining which build/test steps need to execute. Buildkite provides a plugin that can help users access this capability without having to write their own dynamic pipeline gear.
    The ability to generate builds dynamically is a powerful feature that unlocks a ton of capabilities beyond what is possible with predefined workflows. With a dynamic pipeline, you can create your own tooling that, when executed at build runtime, generates all the necessary jobs specific to that particular build, rather than just running through a series of preconfigured jobs that may not be applicable. If you’d like to learn more about dynamic pipelines, check out this blog post—it provides more info on why, where, and how you can use them in your own projects.
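
To make the idea a little more concrete, here's a minimal sketch of a dynamic pipeline generator in Python. It isn't Buildkite's plugin or any official tooling: the directory layout, step labels, and `make` commands are all hypothetical, and the list of changed files is hard-coded where a real build would compute it from a diff (for example with `git diff --name-only`).

```python
import json

# Hypothetical mapping from path prefixes in the monorepo to the step that
# should run when anything under that prefix changes.
STEPS_BY_PATH = {
    "services/api/": {"label": "Test API", "command": "make -C services/api test"},
    "services/web/": {"label": "Test web", "command": "make -C services/web test"},
    "libs/auth/": {"label": "Test auth library", "command": "make -C libs/auth test"},
}


def steps_for_changes(changed_files, steps_by_path):
    """Return only the steps whose path prefix matches a changed file."""
    steps = []
    for prefix, step in steps_by_path.items():
        if any(path.startswith(prefix) for path in changed_files):
            steps.append(step)
    return steps


def render_pipeline(changed_files, steps_by_path=STEPS_BY_PATH):
    """Render the selected steps as a JSON pipeline definition."""
    pipeline = {"steps": steps_for_changes(changed_files, steps_by_path)}
    return json.dumps(pipeline, indent=2)


# Hard-coded for illustration; a real script would derive this from the diff.
changed = ["services/api/handlers.py", "README.md"]
print(render_pipeline(changed))
```

In this example only the "Test API" step is emitted, because nothing under `services/web/` or `libs/auth/` changed. A script like this would typically run as the first step of a build, with its output handed to the pipeline upload command.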

Modern build tools like Bazel, Gradle, and Buck are designed to understand the dependency graph of the code, and determine what needs to be rebuilt based on changes. Such tools cache previous build results and utilize them to skip rebuilding parts of the codebase that haven’t changed.
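
As a rough illustration of what "understanding the dependency graph" means, the sketch below walks a tiny, hand-written graph to find everything affected by a change. It's a toy, not how Bazel, Gradle, or Buck are actually implemented: the project names and the `DEPENDS_ON` map are hypothetical, and caching is ignored entirely.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: each project lists the projects it depends on.
DEPENDS_ON = {
    "libs/auth": [],
    "libs/ui": [],
    "services/api": ["libs/auth"],
    "services/web": ["libs/auth", "libs/ui"],
    "services/billing": ["services/api"],
}


def affected_projects(changed, depends_on):
    """Return the changed projects plus everything that depends on them."""
    # Invert the map so we can walk from a project to its dependents.
    dependents = defaultdict(set)
    for project, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(project)

    # Breadth-first walk over the reverse edges, starting from the changes.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)


print(affected_projects(["libs/ui"], DEPENDS_ON))
print(affected_projects(["libs/auth"], DEPENDS_ON))
```

A change to `libs/ui` only touches `services/web`, whereas a change to `libs/auth` ripples out to the API, the web service, and (transitively) billing. Everything else can be skipped.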

In their UnblockConf talk, Uber explains how they’re leveraging Buildkite with Bazel to provide significant build performance gains at scale, whilst providing excellent observability.

Have a strategy for dependency management

Dependency management in a monorepo is crucial due to the intertwined nature of multiple projects, libraries, or services that reside within the same repository. Ensuring the dependencies are managed properly will ensure that builds perform optimally, function correctly, and reduce friction for developers working on the project.

When merging a bunch of repositories into a monorepo, determining how to manage dependencies across projects is going to be an important part of the process, and a source of future technical debt if done incorrectly. Consider organizing the monorepo in a way that reflects logical divisions, such as services, libraries, shared components, etc. Also, rather than allowing individual projects to specify versions of shared libraries, centralize version definitions—this ensures consistency across the monorepo.
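
One way to keep centralized versions honest is a small check that runs in CI. The sketch below assumes you already have some way of loading the central version list and each project's declared dependencies; here both are hard-coded dictionaries, and every package name and version is made up for illustration.

```python
# Hypothetical central list of approved versions for shared libraries.
CENTRAL_VERSIONS = {
    "acme-logging": "1.4.0",
    "acme-http": "3.2.1",
}

# Hypothetical per-project declarations, as they might be parsed from
# each project's manifest file.
PROJECT_REQUIREMENTS = {
    "services/api": {"acme-logging": "1.4.0", "acme-http": "3.2.1"},
    "services/web": {"acme-logging": "1.3.0"},
    "libs/auth": {"acme-crypto": "0.9.0"},
}


def find_version_drift(central, projects):
    """List every dependency that disagrees with the central version list."""
    problems = []
    for project in sorted(projects):
        for package, version in sorted(projects[project].items()):
            expected = central.get(package)
            if expected is None:
                problems.append(f"{project}: {package} is not in the central version list")
            elif version != expected:
                problems.append(f"{project}: {package}=={version}, expected {expected}")
    return problems


for problem in find_version_drift(CENTRAL_VERSIONS, PROJECT_REQUIREMENTS):
    print(problem)
```

A CI step could fail the build whenever the returned list is non-empty, which nudges teams to update the central definition rather than pin their own version.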

There are a number of tools that you can integrate with your CI/CD pipelines to check for dependency issues, such as outdated libraries or version conflicts. The best tool for the job will vary depending on your application stack, but consider looking into Lerna (JS/TypeScript), Dependabot (GitHub, several languages), and Renovate (several languages).

Managing access and merge-approvals

Managing access control with a monorepo is essential, due to the unique challenges and requirements presented by housing multiple projects or components in a single repository.

By moving your various services into a single repository, you’re encouraging collaboration, visibility, and cohesion between teams and engineers. This is in no small part due to the fact that everyone can now see all the code in the monorepo as opposed to, previously, when it was scattered across multiple smaller projects, and access was granted at the repository level.

If there are parts of your project that you aren’t comfortable with the entire team getting access to, then consider breaking these out into a separate repository and limiting access that way. If this isn’t viable, then it’s possible that a monorepo might not be the best fit for your organization.

If the concern is less around people viewing what’s in the repo, and more that someone might push a change to something they shouldn’t, then both GitHub and GitLab offer a ‘codeowners’ capability. By setting up a CODEOWNERS file, a user (or users) can be assigned ownership of a path in the repository and have the ability to approve or disapprove any changes before a merge occurs.
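
To show roughly how path ownership resolves, here's a simplified model in Python. Real CODEOWNERS files use gitignore-style patterns, which this sketch does not implement; it only handles directory prefixes, and it follows the "last matching rule wins" behavior that GitHub documents. The team handles and paths are hypothetical.

```python
# Hypothetical ownership rules, in file order. An empty prefix matches
# every path and acts as the fallback owner.
RULES = [
    ("", ["@acme/platform"]),
    ("services/api/", ["@acme/api-team"]),
    ("services/api/billing/", ["@acme/payments", "@acme/api-team"]),
]


def owners_for(path, rules):
    """Return the owners from the last rule whose prefix matches the path."""
    owners = []
    for prefix, rule_owners in rules:
        if path.startswith(prefix):
            owners = rule_owners
    return owners


def owners_for_change(changed_files, rules):
    """Collect every owner responsible for at least one changed file."""
    found = set()
    for path in changed_files:
        found.update(owners_for(path, rules))
    return sorted(found)


print(owners_for("services/api/billing/invoice.py", RULES))
print(owners_for_change(["README.md", "services/api/routes.py"], RULES))
```

The more specific billing rule overrides the general API rule because it appears later, which is why rule ordering matters when you write the real file.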

Use Git features to enhance repo performance

As the size of your repository grows, the amount of bandwidth and system resources required to effectively utilize the project will increase, and the time it takes to clone can become substantial, especially when employing modern CI/CD strategies such as parallelism. This is, to a large degree, an unavoidable side-effect of using a monorepo, and a common concern of those that have opted to employ one. That said, there are strategies that you can employ to reduce friction.

Git itself offers a couple of features that can help users reduce the amount of data they have to wrangle with each clone:

  • Shallow clones are used to clone a repository without getting the entire history and, instead, only retrieving the last ‘n’ commits. This results in faster clones, with reduced disk and bandwidth consumption, but it's not all smooth sailing. By design, you’ll have a limited commit history, which can be restrictive when performing certain kinds of git operations, such as blames, bisects, or deep dives into history. You may also run into issues with dependencies between commits, where the missing history can potentially lead to build or test failures.
  • Sparse checkouts allow you to check out only a subset of the files in a repository, making it possible to work with just the parts of a monorepo that you need. Less data will be copied, the operation will be faster because there are fewer files to be processed, and the workspace will be simplified for the user. However, a sparse checkout can complicate your git workflow, particularly for those less familiar with the feature or project. Also, by only checking out a subset of the repo, you risk the necessary context or history being missing, leading to CI failures.
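
For reference, here's a small Python helper that assembles the git commands for both approaches. It only builds and prints the command lines rather than running them; the repository URL, destination directory, and paths are placeholders, and `git clone --sparse` assumes a reasonably recent version of git.

```python
import shlex


def shallow_clone_cmd(url, dest, depth=1):
    """Command for cloning only the most recent `depth` commits."""
    return ["git", "clone", f"--depth={depth}", url, dest]


def sparse_clone_cmds(url, dest, paths):
    """Commands for cloning, then limiting the working tree to `paths`."""
    return [
        ["git", "clone", "--sparse", url, dest],
        ["git", "-C", dest, "sparse-checkout", "set", *paths],
    ]


# Placeholder values for illustration only.
URL = "https://example.com/acme/monorepo.git"

print(shlex.join(shallow_clone_cmd(URL, "monorepo", depth=50)))
for cmd in sparse_clone_cmds(URL, "monorepo", ["services/api", "libs/auth"]):
    print(shlex.join(cmd))
```

Each list can be passed straight to `subprocess.run` if you'd rather execute the commands from a script than copy them into a shell.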

This blog goes into detail about how to implement both strategies, and is recommended reading!

Both sparse checkouts and shallow clones are powerful features that can make working with monorepos more manageable. However, they come with their own complexities and might not be suitable for all workflows or all team members. It's essential to educate and train the development team when using these features and understand the specific needs of your projects and workflows. When implemented judiciously, they can significantly improve the developer experience with large monorepos.

Use trunk-based development

‘Trunk-based development’ (TBD) is a strategy where all developers work in a single branch, often called the 'trunk' or 'main' branch, and use short-lived feature branches (or no branches at all). This approach can be particularly advantageous when working within a monorepo, as it discourages the use of longer-lived branches that can otherwise cause CI problems down the line. With frequent builds and short-lived branches, the codebase doesn't diverge significantly, leading to fewer merge conflicts. Check out the section on merge queues, below, for more info.

Trunk-based development encourages developers to work against a near-latest version of the codebase, ensuring consistency. In a monorepo, where multiple projects or components may have dependencies on each other, this helps ensure that all parts of the system are compatible. Furthermore, TBD naturally aligns with CI/CD practices. Every merge into the main branch can trigger automated builds and tests, ensuring that the code in the monorepo is always in a deployable state.

Trunk-based development offers several advantages when used correctly, but it’s not without challenges. It will require team members to embrace this development practice, which could be a significant culture shift for some organizations. A robust CI/CD pipeline will also be required, so that frequent merges to the main branch don’t disrupt the stability of the codebase.

Finally, TBD often uses feature flags to hide work-in-progress features. This allows for the merging of code into the main branch, even if it’s not fully ready for wider consumption yet. However, heavy reliance on feature flags can introduce its own set of complexities, such as managing old flags and ensuring the codebase doesn’t become cluttered with conditional statements. Well-defined processes around feature flag lifecycle management can help keep this under control.
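
As one possible shape for that lifecycle management, the sketch below gives every flag an owner and a removal date, so that stale flags can be reported automatically. This is an illustrative in-memory registry, not any particular feature flag product, and the flag names, owners, and dates are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Flag:
    owner: str        # team responsible for removing the flag
    enabled: bool     # whether the work-in-progress code path is live
    remove_by: date   # date by which the flag should be deleted


# Hypothetical flags for illustration.
FLAGS = {
    "new-checkout-flow": Flag("payments-team", False, date(2030, 1, 31)),
    "legacy-search-fallback": Flag("search-team", True, date(2020, 6, 30)),
}


def is_enabled(name, flags=FLAGS):
    """Unknown flags are treated as off, so deleting a flag is always safe."""
    flag = flags.get(name)
    return flag.enabled if flag else False


def stale_flags(today, flags=FLAGS):
    """Names of flags that have outlived their removal date."""
    return sorted(name for name, flag in flags.items() if flag.remove_by < today)


print(is_enabled("new-checkout-flow"))
print(stale_flags(date(2025, 1, 1)))
```

Running `stale_flags` in CI and reporting (or failing) on a non-empty result is one lightweight way to stop old flags from quietly piling up.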

Queue and verify pull requests

If you’re using a monorepo, there’s a pretty good chance that it’s a fairly large and complex repository. If your repository is fairly large and complex, then there’s a similarly good chance that there are quite a few people working out of it. Maybe hundreds of people, or perhaps even thousands! The team might not have started that large, but it is now, and thousands of people are potentially cutting thousands of PRs every single day. How can this many PRs possibly be merged each day, without all sorts of problematic conflicts occurring? Answer: by utilizing a merge queue!

A merge queue is a system or tool that automates and serializes the process of merging pull requests into the main branch. In a busy monorepo, where many teams or developers might be trying to merge changes simultaneously, a merge queue can offer several benefits.

  • Ensuring a Green Build - When multiple developers are trying to merge their changes, the state of the main branch can be volatile. By the time a CI/CD pipeline finishes testing a given change, the main branch might have moved forward with other merges, rendering the previous tests outdated. A merge queue tests potential merges against the future state of the main branch, assuming all preceding merges in the queue are successful. This ensures that if a change passes tests in the queue, it will remain green when it's its turn to be merged.
  • Reducing Merge Conflicts - When many developers try to merge simultaneously, they can end up with a higher frequency of merge conflicts. By serializing the process, a merge queue reduces the potential for these conflicts.
  • Efficient Use of CI/CD Resources - Without a merge queue, multiple developers might push their changes around the same time, causing the CI/CD system to test each one against the main branch independently. If the main branch advances during this time (due to other merges), some of those test runs might become invalid. With a merge queue, each change is tested with the assumption of a particular main branch state, leading to more valid test runs and better utilization of CI/CD resources.
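
The "test against the future state of main" idea can be sketched in a few lines of Python. This is a deliberately simplified, serial model and not how GitHub, GitLab, or Aviator implement their queues (real queues usually test several candidates in parallel). The change names and the `passes_ci` rule are hypothetical stand-ins for real pull requests and a real test suite.

```python
def process_queue(main, queue, passes_ci):
    """Merge queued changes in order.

    Each change is tested against main plus every change accepted before
    it, so a passing result still holds at the moment the change merges.
    """
    merged = list(main)
    rejected = []
    for change in queue:
        candidate = merged + [change]
        if passes_ci(candidate):
            merged = candidate
        else:
            rejected.append(change)
    return merged, rejected


def passes_ci(changes):
    """Stand-in test suite: these two changes are incompatible together."""
    return not {"rename-helper", "call-old-helper"} <= set(changes)


main = ["base"]
queue = ["rename-helper", "add-docs", "call-old-helper"]

# Tested in isolation against main, every change looks green...
print([passes_ci(main + [change]) for change in queue])

# ...but the queue catches the combination before it reaches main.
print(process_queue(main, queue, passes_ci))
```

Here `call-old-helper` passes on its own but is rejected by the queue, because by the time it reaches the front `rename-helper` has already been merged. Without the queue, both would have merged and left main broken.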

Depending on your SCM, you might have a merge queue-like feature available to you (GitHub, GitLab). If not, then never fear, because there are several third-party solutions available, such as Aviator.

Their Unblock talk examines some monorepo merge strategies offered by their MergeQueue product. They talk about when they should be used, and how they can help you remove productivity blockers from your complex monorepo (highly recommended viewing)!

Conclusion

That was a lot of words. The goal here was to provide some clarity on what a monorepo is, the positives and negatives of employing such a strategy, and some pointers on how you can help maximize the benefits of using a monorepo whilst, hopefully, minimizing the drawbacks. The information above should be largely applicable, regardless of what SCM and CI/CD tooling you’re using, but I’d be doing myself a disservice if I wrapped this article up without briefly touching on why Buildkite is the CI/CD tool to help wrangle your monorepo.

  • Dynamic pipelines - Builds can be dynamically generated at runtime, or even as your build is running, rather than being defined prior to the build. This is an excellent approach to building a monorepo, as it allows you to dynamically generate the necessary build jobs based on changes in the current commit (remember the ‘selective builds’ section above?). Hasura wrote an excellent blog post on the benefits of using dynamic pipeline capabilities with their monorepo, that I highly recommend you check out.
  • Unlimited build agents - Buildkite doesn’t put any constraints on how you scale your build infrastructure, regardless of what plan you’re on. Some of our customers scale up to ~100,000 build agents at peak hour to handle load, and that’s absolutely fine!
  • No limits on build or job concurrency - Buildkite customers are able to achieve huge performance increases by scaling the number of builds and concurrent jobs that they’re able to execute. This means you can eliminate queued build wait times, by simply spinning up the necessary infrastructure to process jobs immediately. Or slash your massive monorepo build times from hours to minutes by parallelizing that huge test suite across 500 agents concurrently.
  • Dev friendly - Buildkite is designed to be extended and customized to your requirements, and isn’t highly opinionated about how you approach problems. Regardless of the shape and size of your monorepo, or what dev practices your organization employs, Buildkite can be used effectively.
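
As a quick illustration of the parallelism point above, here's a minimal sketch of splitting a test suite across parallel jobs. It reads the `BUILDKITE_PARALLEL_JOB` and `BUILDKITE_PARALLEL_JOB_COUNT` environment variables that Buildkite sets on parallel jobs, falling back to a single shard when they're absent. The test file names are made up, and the simple round-robin split ignores how long each test takes.

```python
import os


def shard(tests, index, count):
    """Return the slice of tests that job `index` of `count` should run."""
    if not 0 <= index < count:
        raise ValueError("index must be between 0 and count - 1")
    # Sort first so every job sees the same order, then deal round-robin.
    return sorted(tests)[index::count]


# Hypothetical test files for illustration.
TESTS = [f"tests/test_{n:03d}.py" for n in range(10)]

index = int(os.environ.get("BUILDKITE_PARALLEL_JOB", "0"))
count = int(os.environ.get("BUILDKITE_PARALLEL_JOB_COUNT", "1"))
print(shard(TESTS, index, count))
```

Every test lands in exactly one shard, so the suite is fully covered however many jobs you fan out to. Splitting by historical test duration would balance the shards better, at the cost of a little more bookkeeping.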

To try Buildkite out for yourself, and join the likes of Shopify, Uber, Slack, and Elastic (to name but a few), sign up for a free trial account today!