Exploring Continuous Integration for Monorepos

Architectural trends in software development are shaped by advances in technology, evolving industry practices, and influential success stories. Virtualization, containers, and cloud computing heralded a technological revolution, offering unprecedented scalability, flexibility, and efficiency. These innovations laid the foundation for a new era in software architecture, notably the shift towards microservices and Service-Oriented Architecture (SOA).

This technological shift was further propelled by the success stories of pioneering industry leaders who adopted these technologies early. They used them to achieve rapid growth, increased agility, and greater scalability, serving as powerful testaments to the effectiveness of microservices and SOA. As a result, these architectural styles gained immense popularity, emerging as the new benchmarks for modern software systems.

This trend, while revolutionary, brought about a re-evaluation of existing practices, including the use of monorepos. With the spotlight on microservices and the power of cloud computing, containerization, and virtualization, the perception that these newer, decentralized approaches would supersede traditional monorepo strategies began to emerge. However, as the industry, developer tools, and continuous integration (CI) capabilities progressed, it became evident that microservices and SOA hadn’t replaced monorepos. It wasn’t a case of obsolescence, but rather an expansion of choices.

Today, there seems to be a more reasonable approach—an understanding that the decision to adopt a monorepo is not in opposition to any advancements in technology, but is a strategic choice that aligns with specific project needs. Monorepos offer:

Unified version control.
Cohesive management of cross-project dependencies.
Streamlined workflows.

And they continue to be a viable and effective approach, particularly in complex, large-scale development environments.

In this blog post, we'll explore CI in the context of monorepos, in comparison with CI for multi-repo setups inspired by the microservices trend. We’ll cover some real-world examples from leading tech companies, examine the benefits and challenges of monorepos, and provide strategic insights for planning CI for monorepos. Finally, we’ll highlight how Buildkite’s features and capabilities are ideally suited to meet the demands of modern software development, bridging the latest technological advancements with the strategic advantages of monorepos.

Monorepo vs. multi-repos for continuous integration

As the software development landscape evolved with these technological advancements, so did the tools and strategies for managing codebases, particularly in Continuous Integration (CI). The debate between monorepos and multi-repos has become central to this discussion, each presenting distinct advantages and challenges.

Monorepos: A unified approach

A monorepo holds the code for multiple projects, libraries, and components, sometimes for an entire company, in a single repository. This approach centralizes the management of a diverse codebase.

CI advantages: With a monorepo, CI processes can be streamlined. You have a single source of truth for your entire codebase, making it easier to track changes, manage dependencies, and implement consistent testing across all components.
Unified tooling and processes: Standardizing tools and processes becomes more straightforward, reducing the complexity of maintaining multiple CI configurations.
Cross-project visibility: Changes in one part of the code can be immediately tested against the rest of the codebase, enhancing the quality and coherence of the software.

Multi-repos: Decentralized and isolated

In a multi-repo setup, projects or components are stored in their own separate repositories. This allows for more focused and isolated management of each code segment. Each segment usually has a single responsibility, such as handling payments or sending notifications.

CI flexibility: Each repository has its own dedicated, isolated CI pipeline. This means faster builds, with simpler configuration and customization, tailored to the specific needs of each project.
Reduced complexity per repo: Smaller, more focused repositories can be easier to manage, especially for teams responsible for a single project or service.
Clearer responsibility boundaries and greater autonomy for engineering teams: In some scenarios, multi-repos can help the engineering organization scale more easily. Teams can form around specific repositories with clearer responsibilities and fewer interdependencies.

Choosing the right approach

Choosing between a monorepo and multi-repos depends on various factors, such as:

Organization size
Interdependency of projects
Preferred workflows

Some organizations opt for a hybrid approach, maintaining a monorepo for closely related projects while using separate repositories for more independent components.

There is no one-size-fits-all solution. The monorepo or multi-repos decision is not about which is universally better, but which is more suitable for a given project's requirements and team structure. Both approaches have their merits and can coexist within the broader landscape of modern software development.

Examples of continuous integration for monorepos

As we continue exploring the world of Continuous Integration (CI) for monorepos, it's evident that monorepos are not a remnant of the past. They remain a considered, and often superior, choice for many organizations. The fact that leading tech companies and industry giants successfully run and maintain monorepos validates the approach's benefits. In this section, we'll delve into how some of these companies implement CI for their monorepos, offering real-world insights into the practicalities, strategies, and outcomes of their choice.

Google's monolithic codebase

Google is often cited as the poster child for monorepos. They have billions of lines of code stored in a single codebase and have remarkably managed to continue scaling their development processes.

CI practices: Google's homegrown build tool, known internally as Blaze and open-sourced as Bazel, facilitates efficient builds by rebuilding only the changed code and the code that depends on it.
Benefits: A unified codebase helps Google maintain code health, foster code sharing, and ensure consistent code quality across projects.
Challenges: Given the sheer size of its codebase, Google has had to make significant investments in tooling and infrastructure to keep build and test times reasonable.
Learn more: Why Google Stores Billions of Lines of Code in a Single Repository.

Facebook's single repository

Facebook is known for having a vast array of products and services and a monorepo for its primary codebase.

CI practices: Facebook employs a build tool called Buck. It's optimized for incremental builds and ensures that only modified code and the code that depends on it are rebuilt, making CI faster and more efficient.
Benefits: Their consolidated approach reduces repetitive code and facilitates the reuse of code components, especially for mobile apps. It also supports atomic changes across different parts of the ecosystem.
Challenges: Facebook dedicates significant time and effort to the continuous iteration of its developer tooling and ecosystem to handle the challenges of scale and ensure speedy feedback loops for its engineers.
Learn more: Scaling Mercurial at Facebook.

Uber's shift to a monorepo

In stark contrast to the other examples, Uber started with multi-repos but transitioned to a monorepo as they scaled.

CI practices: The primary reasons for Uber's move were to simplify dependency management and streamline CI processes. Their CI pipeline was modified to be more efficient, building only what changed and running tests in parallel to significantly reduce feedback time.
Benefits: This migration resulted in easier code sharing, centralized dependency management, and more consistent code quality checks.
Challenges: The transition wasn't without hiccups. Uber invested substantial time in developer education, tooling enhancements, and ensuring the new setup ran smoothly.
Learn more: Building Fast, Reliable, and Scalable CI at Uber with Buildkite.

A common refrain in the tech community has been the skepticism around the scalability of monorepos. Critics have argued that monorepos are not fit for large-scale operations. However, the argument loses weight when we observe some of the world's largest software creators thriving with monorepo structures. While challenges abound, especially in implementing CI for such expansive codebases, the strategic advantages offered by cohesive versioning, unified testing, and streamlined workflows are hard to ignore. The real-world examples of these tech giants not only attest to the viability of monorepos at scale, but also serve as a testament to their potential benefits when adequately managed.

3 Benefits of continuous integration for monorepos

The debate around monorepos versus multi-repos isn't just about scale or company size; it's about the benefits each approach offers. When it comes to continuous integration (CI) in particular, monorepos have definite benefits that streamline and enhance the development lifecycle.

1. Standardization:

Unified standards: With a monorepo, CI processes like security checks, linting for code hygiene, and other code quality checks can be defined and enforced uniformly across the entire codebase.
Centralized management: Tool configurations, CI/CD pipelines, and conventions can be centralized, making them more accessible. This reduces inconsistencies and encourages engineers to follow coding and deployment standards.

2. Greater visibility:

No more "code silos:" One of the pitfalls of multi-repos is the potential for repositories to become isolated "code silos," where code remains sequestered and unknown to other teams. This can lead to a lack of understanding of the broader system.
Holistic view: A monorepo setup, when combined with an integrated CI system, offers everyone a clear view of how the entire system comes together. It promotes a culture of shared knowledge and collective responsibility.
Enhanced code reuse: By having better visibility into other teams' code, developers can readily identify existing components or libraries, reducing redundancy. This not only prevents "reinventing the wheel" but also promotes a culture of code sharing and reuse.

3. Easier refactoring:

Fluid code movement: One of the standout advantages of a monorepo is the ability to refactor and move code between projects effortlessly. Testing refactored code is also easier. With the codebase in one location, changes that span multiple projects can be made in a single commit.
Simplified dependency management: Refactoring often means updating dependencies. In a monorepo, the unified versioning and clear dependency trees make updates fairly straightforward, reducing version conflicts.

In the context of CI, monorepos can bring significant benefits that can transform development workflows. They balance standardization, visibility, and flexibility: a potent combination for modern software development.

3 Challenges for continuous integration with monorepos

While monorepos, combined with Continuous Integration (CI), offer many benefits, they are not without their challenges. It's vital for organizations to be aware of potential pitfalls and considerations.

1. Security considerations:

Overexposure concerns: With the entire codebase in one repository, there's a risk of overexposing sensitive parts if proper access controls aren't in place. It's crucial to have granular permissions with the principle of least privilege to prevent unauthorized access.
Supply chain attacks: Centralized code can be attractive for malicious actors, potentially elevating the risk of supply chain attacks. If an attacker compromises a single component, the effects could be more widespread.

2. Dependency management:

Central dependency linking: Monorepos typically maintain centrally linked dependencies. While this can streamline versioning, it also means that breaking changes can have cascading effects throughout the codebase.
Breaking changes: Introducing changes to shared libraries can inadvertently impact multiple teams and projects. Understanding and managing these impacts becomes a delicate task. This challenge is explored in the blog The Problem with Shared Libraries and Monorepos.

3. Managing project size:

Repo size: As the codebase grows, the sheer size of the monorepo can become a challenge. This is partly why monorepos are sometimes loosely described as monolithic, although a monorepo and a monolithic application are distinct concepts. Size can lead to longer clone times, increased storage needs, and slower searches.
Build queues: With many teams pushing to a single repository, CI pipelines can become congested if not properly designed and managed. Long build queues can delay feedback for developers, slowing down the development velocity and potentially becoming blockers.

Monorepos, like any architectural decision, come with trade-offs. Understanding these nuances and implementing strategies and tools to address them is key to leveraging the benefits while minimizing the challenges.

Planning continuous integration for a monorepo

If you're adopting, transitioning to, or maintaining a monorepo, thinking beyond traditional CI strategies is critical. The early planning stage is essential to ensuring the CI process aligns with the dynamics of a monorepo. Let's look at how to break down and approach this planning, starting with designing a build strategy.

Design a build strategy

A robust build strategy is critical in a monorepo environment. Unlike multi-repo setups, where each repository has its own build process, a monorepo requires a more considered approach.

Selective building and testing: A core component of CI in a monorepo is the ability to selectively build and test only the parts of the codebase affected by the latest changes. This approach, known as selective or differential building and testing, minimizes unnecessary resource usage and speeds up the CI process.
Code segmentation: Organize the codebase into modules or components that can be built independently. This modularization can further streamline the build process by enabling teams to focus on their specific areas of responsibility without the overhead of building the entire codebase, or running into daily merge conflicts.
Build caching: Effective caching can significantly speed up the CI process. Minimize redundant cloning and building, and reduce unnecessary network bandwidth, by caching a recent copy of the repository on the build machine, by caching previously built artifacts and reusing them when the source code hasn’t changed, or both.
Parallelization: To maximize efficiency, break the CI pipeline down into parallelizable tasks wherever possible. Running build and test processes in parallel can considerably shorten the feedback loop for developers.
Efficient resource management: Monorepo CI processes can demand a lot of compute resources. Utilizing dynamic scaling and efficient resource allocation ensures that CI infrastructure is responsive to demand, without being wastefully over-provisioned.
Toolchain and infrastructure: Choose CI toolchains and infrastructure that scale with your monorepo. To support the complexity and size of your codebase, the tooling should be extremely flexible. Evaluate whether your current tools adequately handle the load, or whether you should consider alternatives that are better suited for monorepo management.

A build strategy for a monorepo should focus on efficiency and scalability. The goal is to minimize build times and resource usage, while ensuring every commit is integrated as swiftly and safely as possible, without compromising the codebase's integrity.
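
To make selective building more concrete, here is a minimal sketch of how a CI job might work out which projects a change affects. It assumes a hypothetical directory layout and a hand-written dependency map (`DEPENDENCIES`); build tools such as Bazel and Buck derive this graph from build files rather than from a hard-coded dictionary.

```python
from collections import defaultdict, deque

# Hypothetical layout: each project lives in its own directory and lists
# the projects it depends on. These names are illustrative only.
DEPENDENCIES = {
    "libs/auth": [],
    "libs/payments": ["libs/auth"],
    "services/checkout": ["libs/payments"],
    "services/notifications": ["libs/auth"],
    "services/search": [],
}


def owning_project(path, projects):
    """Return the project whose directory contains `path`, or None."""
    for project in projects:
        if path == project or path.startswith(project + "/"):
            return project
    return None


def affected_projects(changed_files, dependencies):
    """Return the changed projects plus everything that depends on them."""
    # Invert the graph: project -> projects that depend on it.
    dependents = defaultdict(set)
    for project, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(project)

    # Seed the search with projects that directly contain a changed file.
    affected = set()
    queue = deque()
    for path in changed_files:
        project = owning_project(path, dependencies)
        if project and project not in affected:
            affected.add(project)
            queue.append(project)

    # Walk the reverse edges breadth-first to collect transitive dependents.
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)


if __name__ == "__main__":
    print(affected_projects(["libs/payments/api.py"], DEPENDENCIES))
```

A change under `libs/payments` selects that library and `services/checkout`, while `services/search` is skipped entirely. Files that belong to no project select nothing, so a real setup would need a policy for repository-wide files such as shared configuration.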

Create a dependency management plan

Effective dependency management is critical to planning continuous integration (CI) in a monorepo. Given the interconnected nature of a monorepo, introducing changes—particularly breaking changes—to shared libraries or components requires careful planning. Here's how to approach this:

Understand the impacts of breaking changes: Recognize that in a monorepo, a change in one part of the code can have unforeseen ripple effects, affecting multiple teams and projects. There is no one-size-fits-all strategy for introducing these changes. Each decision should be weighed up for its potential impact across the codebase.
Avoid symbolic linking: While sym-linking (symbolic linking) might seem a straightforward solution to managing dependencies, it has significant downsides. It can lead to complications in version control, build tooling, and consistency across development environments.
Implement a deprecation process: When introducing breaking changes, a structured deprecation process can mitigate disruptions. This might involve maintaining old and new versions of the library in parallel for a transitional period, providing clear migration paths, and giving sufficient notice to consumers within the monorepo.
Design libraries thoughtfully: When developing shared libraries or components, design them to minimize the likelihood of future breaking changes. This can include introducing new classes or methods rather than altering existing ones and designing APIs that support multiple interface definitions.
Version control of shared libraries: Consider versioning shared libraries separately within the monorepo. This allows for more controlled updates and clearer tracking of changes, even though all code resides in the same repository.
Communication and collaboration: Effective dependency management in a monorepo environment requires constant communication and collaboration among teams and clear documentation. Ensure a clear process for announcing and discussing changes to shared components.
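
As an illustration of the deprecation process and of adding new functions rather than altering existing ones, the sketch below keeps an old function working as a thin shim over its replacement while warning callers. The function names, the `deprecated` decorator, and the removal wording are all hypothetical; only the standard-library `warnings` and `functools` modules are real.

```python
import functools
import warnings


def deprecated(replacement, remove_after):
    """Mark a function as deprecated while keeping it working."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead "
                f"(scheduled for removal after {remove_after}).",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the shim
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def format_amount_v2(cents, currency="USD"):
    """New API: takes integer cents and an explicit currency."""
    return f"{cents / 100:.2f} {currency}"


@deprecated(replacement="format_amount_v2", remove_after="the transition period")
def format_amount(dollars):
    """Old API: kept as a thin shim over the new one during the transition."""
    return format_amount_v2(round(dollars * 100))
```

Existing callers of `format_amount` keep working and see a `DeprecationWarning` that names the migration path, which CI can surface or, later in the transition, treat as an error.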

Create a build scalability plan

As a monorepo grows in size and complexity, creating a plan to ensure the build system remains scalable is essential to maintaining developer productivity. Here are some strategies to consider:

Distributed builds: Distributed build systems mean jobs can run across multiple agents, runners, or machines. This approach reduces build times and handles increased load as the monorepo and its list of contributors grow.
Load balancing: Incorporate load balancing mechanisms to distribute build tasks evenly. This prevents any single server or node from becoming a bottleneck, ensuring a more efficient build process.
Resource optimization: Actively monitor and optimize resource usage. Analyze build patterns to identify peak times, then predict and even pre-allocate resources accordingly. Consider implementing auto-scaling solutions that dynamically adjust resources based on current demand.
Cache effectively: Cache commonly reused build artifacts to avoid redundant builds. Ensure the cache invalidation strategy is optimized to balance build correctness and speed.
Incremental builds: Enhance your build system to support incremental builds, where only the parts of the codebase that have changed are rebuilt. This requires a robust dependency tracking system within the monorepo to identify which components are affected by a given change.
Prioritize critical builds: Implement a system that prioritizes critical builds or tests. This is particularly important when dealing with a high volume of commits and concurrent build processes.
Fault tolerance and recovery: Design a build system that is fault-tolerant. In case of failures, the system should recover quickly, resuming builds from the last known good state rather than starting over.
Continuous monitoring and feedback: Establish continuous monitoring of the build process. Gather metrics and feedback to identify bottlenecks or inefficiencies, and use this data to continually refine and improve the build process.

Creating a build scalability plan for a monorepo involves anticipating growth and scaling challenges and addressing them proactively, with effective strategies and tools. This plan is critical to ensuring that CI processes remain efficient and effective.
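
One common way to balance cache correctness and speed is to derive the cache key from everything that can change the build output, so the cache invalidates itself when any input changes. The sketch below is a simplified illustration under that assumption; the `cache_key` function and its parameters are hypothetical, and real build tools track inputs at a much finer granularity.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(project_dir, toolchain_version, dependency_keys=()):
    """Derive a cache key from the inputs that can change the build output."""
    digest = hashlib.sha256()
    digest.update(toolchain_version.encode() + b"\0")

    # Keys of upstream projects: if a dependency changes, so does this key.
    for key in sorted(dependency_keys):
        digest.update(key.encode() + b"\0")

    # Hash relative paths and file contents in a stable order.
    root = Path(project_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode() + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "main.py"
        source.write_text("print('hello')\n")
        before = cache_key(tmp, "toolchain-1")
        source.write_text("print('changed')\n")
        after = cache_key(tmp, "toolchain-1")
        print("cache hit" if before == after else "cache miss, rebuild")
```

An unchanged project with unchanged dependencies produces the same key and can reuse a stored artifact. Passing the keys of upstream projects as `dependency_keys` means a change in a shared library also invalidates the projects built on top of it.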

Templatize your build

In a monorepo environment, standardizing and streamlining the build process across various teams and projects is crucial. One way to achieve this is by templating CI pipelines. Pipeline templates are reusable, standardized structures for pipelines. They promote consistency across the entire development process.

Using pipeline templates: Templates allow teams to quickly set up new projects with pre-defined sets of build steps and configurations. This reduces the time and effort required to configure builds from scratch, and ensures that all projects adhere to organizational standards and best practices.
Integrating cross-build checks: Templates can include checks that should run on every build, such as the security checks, linting, and other code quality checks described earlier. Embedding them in the template means they are enforced uniformly across every project in the monorepo.
Including standard dependencies: Templates can specify standard dependencies and build tools to ensure that every project within the monorepo uses the same versions and configurations. This consistency simplifies dependency management and reduces the likelihood of conflicts or compatibility issues.
Creating and managing templates: When creating pipeline templates, involve key stakeholders from different teams to ensure that the templates meet the diverse needs of the organization. Regularly review and update templates to align with evolving best practices, changing needs, and tool updates.
Documentation and training: Provide clear documentation and training on how to use templates, along with other internal processes or standards. This ensures teams can effectively adopt them into their processes, contributing to a smoother onboarding process for new developers.
Support in CI systems: Many modern CI systems support the use of templates. Buildkite’s pipeline templates facilitate the creation and editing of flexible, reusable pipeline definitions that can be tailored to suit different needs, while maintaining a core set of standards. Buildkite’s dynamic pipelines and plugins can also be used in pipeline templates to abstract unnecessary complexity and provide uniformity across projects.

Templatizing builds in a monorepo setup fosters consistency, efficiency, and quality. It allows teams to focus on the unique aspects of their projects without worrying about the foundational build setup.
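
The idea behind a pipeline template can be sketched in a few lines: a shared list of standard steps is rendered for each project, with room for project-specific extras. This is an illustration of the concept, not the API of Buildkite's pipeline templates feature. The `STANDARD_STEPS` list and the `make` targets are assumptions, and the step shape (`label`, `key`, `command`, `depends_on`) loosely follows Buildkite's command step attributes.

```python
import json

# Steps every project is expected to run, in order. The commands are
# placeholders for whatever the organization standardizes on.
STANDARD_STEPS = [
    ("lint", "make lint"),
    ("security-scan", "make security-scan"),
    ("test", "make test"),
]


def render_pipeline(project, extra_steps=()):
    """Render the standard template for one project, plus optional extras."""
    steps = []
    previous_key = None
    for name, command in [*STANDARD_STEPS, *extra_steps]:
        key = f"{project.replace('/', '-')}-{name}"
        step = {
            "label": f"{project}: {name}",
            "key": key,
            "command": f"cd {project} && {command}",
        }
        if previous_key:
            step["depends_on"] = previous_key  # run the steps in sequence
        steps.append(step)
        previous_key = key
    return {"steps": steps}


if __name__ == "__main__":
    print(json.dumps(render_pipeline("services/checkout"), indent=2))
```

Because every project is rendered from the same list, changing a standard check in one place updates it everywhere, while `extra_steps` leaves teams room for project-specific work.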

Monitor your builds

Monitoring is crucial to the effective management of CI across a monorepo. Given the scale and complexity of monorepos, keeping a close eye on key metrics can assist in maintaining efficiency, identifying bottlenecks, and continuously improving CI performance. Here are some key metrics to monitor:

Build times: Monitoring build times is essential for identifying performance issues. Long or increasing build times can be a symptom of several issues, such as inefficient build scripts, suboptimal resource allocation, or overly complex code dependencies. Tracking build times helps pinpoint areas for optimization and helps prevent CI performance from creeping out of control.
Build failures: Keep track of build failures to understand their root causes. Analyzing failure patterns can reveal underlying problems in the codebase, such as flaky tests, unstable dependencies, or issues with the build environment. Quick identification and resolution of these failures are vital to maintaining a high velocity in software development.
Number of pending jobs: Monitoring the queue of pending build jobs is important, especially in large teams. A growing number of pending jobs can indicate that the CI system has become a bottleneck and needs scaling or optimization. It could also signal that commit and merge practices may need to evolve to smooth out the workload.
Resource utilization: Keep an eye on resources used by CI jobs, such as CPU and memory usage, to ensure CI infrastructure is adequately provisioned and can handle the load. It can also identify areas for improvement, like jobs that consume more resources than necessary.
Feedback loop duration: The time between code commit and build feedback is critical. Prolonged feedback loops slow down development and reduce developer productivity. Monitoring this metric helps ensure developers receive timely feedback on their changes.
Automated alerts and dashboards: Implement automated alerts that notify the team about critical issues like prolonged build times or increased failure rates. Use dashboards to provide a real-time overview of the CI process, making tracking and analyzing performance metrics easier.

Regularly monitoring and analyzing these metrics is essential for maintaining efficient and effective CI processes. It enables teams to make data-driven decisions and prioritize their CI optimization efforts.
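
As a rough sketch of how a few of these metrics could be computed from build history, the example below summarizes run time, feedback loop duration, and failure rate. The `BuildRecord` fields are hypothetical stand-ins for whatever your CI system exports, and the nearest-rank percentile is one simple choice among several.

```python
import math
from dataclasses import dataclass


@dataclass
class BuildRecord:
    queued_seconds: float  # time spent waiting for an agent
    run_seconds: float     # time spent executing
    passed: bool


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(builds):
    """Summarize build time, feedback loop duration, and failure rate."""
    run_times = [b.run_seconds for b in builds]
    feedback = [b.queued_seconds + b.run_seconds for b in builds]
    failures = sum(1 for b in builds if not b.passed)
    return {
        "median_run_seconds": percentile(run_times, 50),
        "p95_run_seconds": percentile(run_times, 95),
        "p95_feedback_seconds": percentile(feedback, 95),
        "failure_rate": failures / len(builds),
    }


if __name__ == "__main__":
    history = [
        BuildRecord(10, 300, True),
        BuildRecord(20, 320, True),
        BuildRecord(200, 900, False),
        BuildRecord(15, 310, True),
    ]
    print(summarize(history))
```

Tracking a high percentile alongside the median helps surface the slow outliers that developers actually feel, and including queue time in the feedback figure shows when waiting for an agent, rather than the build itself, is the bottleneck.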

Creating CI for monorepos with Buildkite

Buildkite has emerged as a preferred choice for continuous integration (CI) among many of the world's leading software companies. Buildkite's architecture is flexible enough to adapt and scale as your setup evolves.

The ability to have unlimited concurrency and parallelization is ideal for handling the extensive demands of a monorepo. To run build agents in your own environment at scale, you can use our auto-scaling AWS Elastic CI Stack, our Agent Stack for Kubernetes, or bring your own solution!

Buildkite gives you complete control over your build environment. Decide what machine or container images are used, what dependencies are pre-installed on those images, what tools or scripts are made available to jobs, and where and how they run. Pre-seed your monorepo Git repository, dependencies, and tools onto images to improve performance and reduce bandwidth utilization.

Dynamic pipelines allow you to generate builds and steps at runtime, while a build is running, rather than having to define them before a build starts. This lets you generate the necessary build jobs based on file changes in the current commit, and our monorepo plugin makes this even easier! Create your own plugins for full customization and control over builds and steps, abstracting any complexity from your users and giving them only the minimum necessary configuration to build on your platform.
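
As a minimal sketch of the dynamic pipeline idea, the script below turns a list of changed files into pipeline steps and prints them as JSON. The path-to-command mapping and the `make` targets are hypothetical, and this is not how the monorepo plugin is implemented. In a real build, the file list would come from the commit's diff, and the output could be piped to `buildkite-agent pipeline upload`, which accepts pipeline definitions in YAML or JSON.

```python
import json

# Hypothetical mapping from path prefixes to the command that tests them.
PATH_TO_COMMAND = {
    "services/checkout/": "make -C services/checkout test",
    "services/search/": "make -C services/search test",
    "libs/": "make test-all",  # shared code: fall back to the full suite
}


def steps_for_changes(changed_files):
    """Build one pipeline step per path prefix touched by the change."""
    steps = []
    for prefix, command in PATH_TO_COMMAND.items():
        if any(path.startswith(prefix) for path in changed_files):
            steps.append({"label": f"test {prefix.rstrip('/')}", "command": command})
    if not steps:
        # Nothing mapped: emit a cheap no-op step so the build still reports.
        steps.append({"label": "no affected projects", "command": "echo 'nothing to build'"})
    return {"steps": steps}


if __name__ == "__main__":
    # In CI the list would come from the commit's diff; hard-coded here.
    print(json.dumps(steps_for_changes(["services/search/index.py"]), indent=2))
```

Because the steps are computed while the build runs, a commit that only touches one service produces a pipeline with a single test step instead of the full suite.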

Buildkite’s branch filtering and limiting control which branches trigger a pipeline, while agent hooks offer more granular control over which events and metadata trigger builds, or which steps are generated, based on logic you define and evaluated at runtime!

Combining these and many other features makes Buildkite the best choice for working with monorepos and other complex development workflows. Experience firsthand how Buildkite can transform your monorepo's CI process. Get started today and join the ranks of the world's best software companies in optimizing your CI practices.
