
Monorepo vs. polyrepo: How to choose

Just like the old Vim vs. Emacs and tabs vs. spaces debates, monorepo vs. polyrepo is a question where disagreements abound and battle lines are drawn. I’m slightly fatigued by watching people climb the various hills they’re prepared to die on. Strong opinions are often loosely held, so choosing the right tool for the job seems a better approach. It’s true that monorepos are regaining ground and growing in popularity, but I’ll still fly a little flag for polyrepos.

If you’re asking “monorepo or polyrepos?”, the answer, as with most questions relating to software, is “it all depends.” Your organizational and team structure, the nature of the products you’re building, and the technical challenges you’re facing should all guide your decision. If you’ve googled this topic, you’ve seen that the lists of pros and cons are overwhelming. Re-architecting a system is a huge investment, so it’s important to weigh up your options carefully before you decide.

In this blog, we’ll compare the choices and help you decide what’s right for you.

Monorepos vs. polyrepos

A monorepo is a source code repository that contains multiple projects, along with all the libraries and dependencies the projects use, in one place. This means the majority of developers in an organization are likely working in this codebase, no matter which team or slice of the product they’re working on. There’s a centralized build environment, and often logic that will decide which part of the repository to build based on code that has changed.
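The logic that decides which part of a monorepo to build can start out very simple. The Python sketch below is a minimal illustration under stated assumptions, not how any particular CI tool works: the projects/ and libs/ layout and the project names are hypothetical, and the rule that any change under libs/ rebuilds everything is a deliberately conservative choice.

```python
from pathlib import PurePosixPath

# Hypothetical layout: each directory under "projects/" is one buildable project, and
# anything under "libs/" is shared code that every project depends on.
PROJECTS = {"payments", "notifications", "search"}


def projects_to_build(changed_files: list[str]) -> set[str]:
    """Return the projects affected by a list of changed file paths."""
    affected: set[str] = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] == "libs":
            # Shared code could affect anything, so conservatively rebuild every project.
            return set(PROJECTS)
        if len(parts) >= 2 and parts[0] == "projects" and parts[1] in PROJECTS:
            affected.add(parts[1])
        # Anything else (docs, READMEs) triggers no project build in this sketch.
    return affected


if __name__ == "__main__":
    # In CI, the list might come from `git diff --name-only origin/main...HEAD`.
    print(sorted(projects_to_build(["projects/search/index.py", "README.md"])))
    print(sorted(projects_to_build(["libs/auth/token.py"])))
```

Real setups usually replace the "rebuild everything" rule with a dependency graph, so that a change to one library only rebuilds the projects that import it.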

Polyrepos (often associated with microservices) are smaller repositories or components that typically have a single (or siloed) responsibility and work with other repositories across a larger polyrepo ecosystem. Each repository has its own build environment, CI pipeline, and deployment process.

In most cases, new products originate as a single repository that evolves over time, along with your understanding and your growing customer base (and I’m sorry if that’s not the case for you). As you add engineering capacity to your organization, your team structure grows more complex, and so does the way teams communicate, collaborate, and ship code. The codebase grows, you have more tests, and your build pipelines get slower. You might now be considering splitting your monorepo into polyrepos. For a moment, set aside which hill you’re about to stake a flag on and meditate on these things:

  • 📈 Scale: Have your engineering team, the codebase, and the customers depending on it really reached the kind of scale a monorepo can no longer handle? Realistically, few companies ever struggle with scale on the level of Uber, Google, or Facebook.
  • ⚒️ Tooling: Boring tech is good tech. Reaching for complex systems and tooling to solve organization-shaped problems is an anti-pattern, and RDD (resumé-driven development) is not a good reason to re-architect a system that can be optimized in less intensive and expensive ways.
  • 😣 Pain: You need to be feeling a huge amount of pain, in a way that swallowing some additional compute painkillers can’t solve. There are reasons to lean in one direction or the other, but you need to understand the current limitations of your system well enough that you’re not prematurely optimizing for something that might never happen.

Premature scaling results in architectural decisions that have long-lived consequences, and that are not simple to reverse. So explore all options before biting off more than the business has an appetite for.

Monorepos: pros and cons

Many software companies historically began with a monorepo, for good reason. Their product domain was still being uncovered, direction was changing rapidly, and new features and functionality needed to get to market swiftly. The engineering team was often small and nimble. A monorepo makes sense in these conditions.

Still, there are significant downsides to face. Let’s take a look at the pros and cons of monorepos.

Pros of monorepos

With a single codebase containing multiple projects with well-defined boundaries and relationships, there’s very little overhead to creating new projects. There’s no dependency or library duplication, no dependency conflicts, and CI/CD is already set up, so everything should “just work.”

Developers can confidently contribute to other teams’ projects because there’s a consistent approach to building and testing features. Because monorepos offer full visibility into every change made in the codebase, git blame can be used for some serious Git archeology, providing context on how the code evolved and insight into why certain decisions were made. Since all the code lives in a single location, it’s simpler to understand and navigate, which makes debugging easier.

A single repository with a centralized build environment makes it easier to enforce and monitor security and compliance through automated credential scanning, binary signing, and supply chain management. The shared stewardship can encourage collaboration from across the engineering organization in different ways, breaking down silos.

Cons of monorepos

Monorepos can most certainly scale to handle almost anything—though they require ongoing effort to manage. The good news is that if you’re successful enough to feel the pain, you’ve likely got the engineering capacity to take on the challenges!

CI is where the struggle is real. Cloning a Git monorepo can be glacially slow, and build times can stretch frustratingly beyond an hour (sometimes to multiple hours, or even overnight 😱). There are strategies to reduce monorepo build times, but they will likely require a dedicated team to keep the platform humming and developers productive.

The engineering team should share a vision for the codebase, with:

  • Automation that enforces consistently formatted code and measures cyclomatic complexity.
  • A commitment to documentation, code reviews, and mentoring to encourage code conventions.
  • Strict branching protocols (such as trunk-based development), and a focus on hygiene with appropriate alerts and notifications to encourage speedy resolutions for issues on main.
  • A considered approach to dependency management—which can be difficult to maintain across multiple teams.
  • Easy-to-manage security and governance processes and policies that minimize insider threats to the source code.
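A complexity check like the one in the first point does not have to be elaborate. The sketch below approximates McCabe’s cyclomatic complexity for Python functions using the standard-library ast module, counting one extra path per if, loop, conditional expression, exception handler, comprehension filter, and extra boolean operand. It is an approximation under those assumptions (dedicated linters count more constructs), and check_source with its default limit of 10 is an illustrative choice, not an existing tool’s API.

```python
import ast

# Constructs that each add one independent path through a function. This approximates
# McCabe's cyclomatic complexity; dedicated tools count a few more cases.
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Return 1 + the number of decision points found inside `func`.

    ast.walk also visits nested functions, so their branches count toward the outer one.
    """
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one path per operand after the first.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # Each `if` filter inside a comprehension is also a branch.
            complexity += len(node.ifs)
    return complexity


def check_source(source: str, limit: int = 10) -> list[str]:
    """Return one message per function whose complexity exceeds `limit`."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            if score > limit:
                problems.append(f"{node.name} (line {node.lineno}): complexity {score} > {limit}")
    return problems


if __name__ == "__main__":
    sample = "def f(x):\n    if x and x > 1:\n        return 1\n    return 0\n"
    print(check_source(sample, limit=2))
```

A CI step could then fail the build whenever check_source returns a non-empty list for a changed file, which keeps the rule automatic rather than a matter of reviewer vigilance.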

Good vibes and a collaborative, psychologically safe engineering culture make success a lot easier to attain. However, human problems can make working in a monorepo challenging. How you form teams and manage work in a monorepo across an entire engineering organization can be far more difficult than any of the technical challenges you’re likely to face.

Polyrepos: pros and cons

If you’re facing scaling challenges and technical limitations, or pre-empting organizational growth, you may consider adopting a polyrepo or microservice architecture. While some of these things are certainly made easier with smaller, focused repositories, premature optimization and increased complexity pose some large challenges.

Pros of polyrepos

Polyrepos (or microservices) are small code repositories that usually have a single responsibility or contain one component of the application or user flow. For example, a customer payments service, a notification service, or an application’s search functionality. Their functionality generally interfaces with other services within a larger polyrepo ecosystem.

A polyrepo is commonly owned by a single team and can represent an application or project boundary. Because codebases are smaller and focused, they can mean simpler dependency management, faster builds, and greater autonomy for the team.

Isolated repositories make granular identity and access management possible, restricting each repository to a smaller subset of users. A monorepo’s contributors, by contrast, are generally the entire engineering team.

Teams collaborate across projects and polyrepos using shared libraries and well-defined APIs, leading to looser coupling and greater extensibility.

For applications that rely on sometimes-flaky third-party integrations and systems, an event-driven microservice architecture can be useful for fault tolerance and reproducibility.

Cons of polyrepos

Without considerable investment in creating a platform focused on developer experience, deeply understanding the ecosystem can be difficult. Debugging and tracing issues through code that is spread across numerous repositories can be incredibly time-consuming and confusing.

For developers, writing tests in polyrepos can also be difficult, requiring lots of mocking and complex contract testing. User acceptance testing (UAT) of features in staging environments can be done by spinning up the relevant polyrepos in Docker, but this adds complexity to the testing process and maintenance overhead for those responsible for developer tooling.
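To make contract testing more concrete, here is a deliberately simplified consumer-side contract check in Python. The payments response, its fields, and the PAYMENT_CONTRACT name are hypothetical; dedicated tools such as Pact also handle versioning, provider verification, and sharing contracts between repositories, which this sketch does not attempt.

```python
# Hypothetical contract: the fields, and their types, that a notifications service relies on
# in the payments service's response for a single payment.
PAYMENT_CONTRACT = {"id": str, "amount_cents": int, "currency": str, "status": str}


def contract_violations(response: dict, contract: dict[str, type]) -> list[str]:
    """List every way `response` breaks `contract`; an empty list means compatible.

    Extra fields are allowed, because the consumer only depends on the ones it names.
    """
    violations = []
    for field, expected in contract.items():
        if field not in response:
            violations.append(f"missing field: {field}")
        elif type(response[field]) is not expected:
            # An exact type check, so that True is not accepted where an int is expected.
            actual = type(response[field]).__name__
            violations.append(f"{field}: expected {expected.__name__}, got {actual}")
    return violations


if __name__ == "__main__":
    # A sample response the provider's own test suite might produce.
    sample = {"id": "pay_123", "amount_cents": 4200, "currency": "USD",
              "status": "settled", "created_at": "2024-01-01T00:00:00Z"}
    print(contract_violations(sample, PAYMENT_CONTRACT))
```

Running a check like this in the provider’s pipeline means a change that renames or retypes a field fails in that repository’s CI, instead of surfacing later in a shared staging environment.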

Significant code duplication is common in polyrepos, and this increases toil for everyone—more code means more maintenance, increased security risks, and cognitive overhead. Spare a thought for the developer who needs to update a library with a critical security vulnerability in 12 different services, with a dozen separate build pipelines.
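Even the first step, finding out which services are affected, requires an inventory that spans repositories. A minimal sketch, assuming a hypothetical mapping from service names to the version of one shared library each pins, and plain dotted numeric versions without pre-release tags:

```python
# Hypothetical inventory: the version of one shared library pinned in each service's repository.
# In practice, a scheduled job might collect this from every repository's lockfile.
PINNED_VERSIONS = {
    "payments": "2.4.1",
    "notifications": "2.3.9",
    "search": "2.5.0",
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '2.4.1' into a comparable tuple (2, 4, 1).

    Assumes purely numeric components with the same count everywhere (no '-rc1' suffixes).
    """
    return tuple(int(part) for part in version.split("."))


def services_needing_upgrade(pinned: dict[str, str], first_fixed: str) -> list[str]:
    """Return, sorted by name, the services pinned below the first release containing the fix."""
    fixed = parse_version(first_fixed)
    return sorted(name for name, version in pinned.items() if parse_version(version) < fixed)


if __name__ == "__main__":
    print(services_needing_upgrade(PINNED_VERSIONS, "2.4.2"))
```

Comparing tuples rather than strings matters here: as strings, "2.10.0" would sort before "2.9.9". Each name in the result is still a separate change and pipeline run in a polyrepo setup.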

A downside of the increased autonomy for teams is that it can result in:

  • 😳 Services written in various versions of Scala, Elixir, Clojure, Go, and more obscure languages—each with its own tooling, testing methods, and syntax!
  • 😫 Inconsistent tooling creates considerable mental overhead, and engineers spend time familiarizing themselves with different toolsets every time they need to do a small thing.
  • 😞 Less crosswise movement for developers between teams and projects.
  • 😑 Services that stagnate due to a lack of skills or willingness to maintain them in the future.

For teams managing CI/CD and infrastructure across an organization, polyrepos add considerable overhead. With ever-increasing security compliance and reporting requirements, strict governance is essential, and it can be more difficult to manage across a number of disparate repositories. It’s certainly not impossible with third-party services for package and dependency management, vulnerability scanning, and infrastructure as code, but that means adding and managing additional vendors.

Choosing between a monorepo vs. polyrepo

The good news is there are plenty of pros on both sides, and all of the cons can be solved. For monorepos: selective builds, smart caching, and sparse checkouts can speed up CI/CD. Polyrepos can be great if you’re dedicated to building a developer platform that provides observability across the entire ecosystem. You can ensure consistency by creating templates and using infrastructure as code to let developers lean on automation to create new repositories and build pipelines.
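Smart caching usually comes down to deriving a cache key from a build’s inputs and skipping the build when that key has been seen before. The Python sketch below assumes a project’s inputs are exactly the files in its directory, and uses an in-memory set as a stand-in for whatever shared artifact cache a real CI system would use; build_if_needed and its arguments are hypothetical.

```python
import hashlib
from collections.abc import Callable
from pathlib import Path


def cache_key(project_dir: Path) -> str:
    """Hash the relative path and contents of every file under `project_dir` into one key."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the filesystem lists files in.
    for path in sorted(p for p in project_dir.rglob("*") if p.is_file()):
        # Hashing the relative path means a rename changes the key, not just an edit.
        digest.update(path.relative_to(project_dir).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def build_if_needed(project_dir: Path, cache: set[str], build: Callable[[Path], object]) -> bool:
    """Run `build` unless an earlier build had identical inputs; return True if it ran.

    `cache` is an in-memory stand-in for a shared artifact cache keyed by input hash.
    """
    key = cache_key(project_dir)
    if key in cache:
        return False  # cache hit: reuse the earlier artifacts instead of rebuilding
    build(project_dir)
    cache.add(key)
    return True
```

Because the key covers both file paths and contents, an unchanged project produces the same key on every run, while any edit or rename produces a new one. Real build tools also fold in toolchain versions and dependency inputs, which this sketch leaves out.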

Everything is possible, and there are tradeoffs on both sides. The key is to be honest about what problems and pains you need to solve and what’s truly best for your team and the business. Remember, it’s more difficult to solve people problems than technical problems!

When to use a monorepo

In the vast majority of cases, a well-architected monorepo can meet your scaling needs as long as you have the appetite (and the people) to maintain a centralized CI/CD system with smart caching and dynamic pipeline logic. This enables fast, stable builds, so your developers can focus on delivering new features without the overhead of also maintaining their builds and deployments.

Monorepos also allow engineers to influence design decisions and collaborate across projects. A single code repository lends itself to stronger conventions, and is simpler to maintain and keep up to date.

If you’re worried about whether a monorepo can scale to the level you need, consider that Google stores billions of lines of code in a single repository used by 95% of its developers. Google uses trunk-based development, with release branches to ship changes. And if you’re considering abandoning your monorepo in favor of microservices, note that Uber migrated from polyrepos to a monorepo to simplify its dependency management and streamline its CI process, so the move may not be necessary.

When to use a polyrepo

Polyrepos are smaller and easier to build and deploy individually, but be prepared to invest in developer tooling to assist engineers working across the ecosystem. Perhaps you need modular, fault-tolerant services with circuit breakers for unreliable third-party integrations, which can be toggled on and off during outages so that other parts of the system remain usable. Or maybe you have stricter security requirements that limit source code access to fewer people. In these cases, polyrepos will likely suit your needs.
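A circuit breaker is a general pattern rather than a particular library. The sketch below is one minimal, single-threaded Python version, assuming a dependency counts as unhealthy after max_failures consecutive errors: while the circuit is open, calls fail fast with CircuitOpenError instead of waiting on the outage, and after reset_after seconds a single trial call decides whether to close it again. All names and defaults here are illustrative.

```python
import time
from collections.abc import Callable
from typing import Any, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Fail fast for `reset_after` seconds once a dependency fails `max_failures` times in a row."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed: calls go through

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unavailable; failing fast")
            # Cool-down is over ("half-open"): let one trial call through, and make a
            # single further failure enough to reopen the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Wrapping a call looks like breaker.call(fetch_rates, "USD"), where fetch_rates stands in for any hypothetical third-party client function. The injectable clock exists so the timing logic can be tested without sleeping.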

Typically, people reach for microservices as they’re preparing for hyper-growth in headcount. They can provide teams the freedom and flexibility to choose their own adventures, or your DevEx team can create a developer platform with tooling and processes to enforce consistency across the organization. If you understand your domain and product well enough, and decide that maintaining a monorepo will incur too much overhead, you may choose to break things up into logical services to solve your specific problems.
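One way a platform team can enforce consistency is to generate each new repository’s pipeline from a shared template. The sketch below uses Python’s string.Template to render a Buildkite-style pipeline.yml; the service name, test command, and registry.example.com are placeholders, and the steps themselves are an assumed starting point rather than a recommended pipeline.

```python
from string import Template

# Hypothetical platform-team template: every new service repository starts with the same
# test and image-build steps, so pipelines stay consistent across the polyrepo ecosystem.
# "$$" is string.Template's escape for a literal "$", so $BUILDKITE_COMMIT survives as-is.
PIPELINE_TEMPLATE = Template("""\
steps:
  - label: "Test $service"
    command: "$test_command"
  - wait
  - label: "Build $service image"
    command: "docker build -t $registry/$service:$$BUILDKITE_COMMIT ."
""")


def render_pipeline(service: str, test_command: str = "make test",
                    registry: str = "registry.example.com") -> str:
    """Fill in the shared template for one service (substitute raises KeyError on a gap)."""
    return PIPELINE_TEMPLATE.substitute(
        service=service, test_command=test_command, registry=registry
    )


if __name__ == "__main__":
    print(render_pipeline("search"))
```

Using substitute rather than safe_substitute means a template that gains a new placeholder fails loudly instead of producing a half-filled pipeline file.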

Monorepo or polyrepo: Scale your builds no matter your approach

Both architectural approaches can be successful, as long as you're honest about why you’re considering each and are realistic about the tradeoffs.

Microservices get a bad rap. They’re not the answer in many cases, but sometimes they are, and developers are drawn to their potential and complexity like moths to a flame. Monorepos are conceptually simpler and, if architected well with optimized delivery pipelines, free up teams to do what they love: write code and deliver features that bring value to their users.

Whatever you choose, you’ll need a CI system that is flexible enough to help you manage your builds at scale and adapt to your needs. With Buildkite Pipelines, you can dynamically generate your CI/CD pipelines at runtime to build only what you decide, scale your build agents to meet demand, and lean on tools to deliver software effectively and efficiently.
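As an illustration of generating a pipeline at runtime, here is a small Python script that emits one test step per changed top-level directory as JSON. The directory names and make test commands are hypothetical, and mapping a file to a project by its first path component is a simplifying assumption; the output uses Buildkite’s steps list with label and command keys.

```python
import json
import sys

# Hypothetical mapping from top-level monorepo directories to the command that tests each one.
PROJECT_COMMANDS = {
    "payments": "cd payments && make test",
    "search": "cd search && make test",
}


def generate_pipeline(changed_files: list[str]) -> dict:
    """Return a pipeline with one test step per project that has at least one changed file."""
    # Simplifying assumption: a file belongs to the project named by its first path component.
    touched = {path.split("/", 1)[0] for path in changed_files}
    steps = [
        {"label": f"Test {project}", "command": command}
        for project, command in sorted(PROJECT_COMMANDS.items())
        if project in touched
    ]
    if not steps:
        # Emit a trivial step so the build log shows why nothing else ran.
        steps = [{"label": "No project changes", "command": "echo 'Nothing to test'"}]
    return {"steps": steps}


if __name__ == "__main__":
    # Changed file paths arrive as command-line arguments; the pipeline is written to stdout.
    print(json.dumps(generate_pipeline(sys.argv[1:]), indent=2))
```

In a pipeline step, it could be invoked as python generate_pipeline.py $(git diff --name-only origin/main...HEAD) | buildkite-agent pipeline upload, assuming the script is saved under that hypothetical file name and origin/main has been fetched. Buildkite accepts pipeline definitions in JSON as well as YAML, so the output can be uploaded directly.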

If you’d like to learn more about scaling CI/CD for monorepos, check out the following resources:

Buildkite Pipelines is a CI/CD tool designed for developer happiness. Easily follow and decipher logs, get observability into key build metrics, and tune for enterprise-grade speed, scale, and security. Every new signup gets a free 30-day trial to test out the key features. See Buildkite Pipelines to learn more.