There are two main strategies for hosting and managing code through Git: monorepo and multi-repo. Both approaches have their pros and cons.
Either approach works for any codebase in any language, whether the project contains a handful of libraries or thousands of them. Whether the team has a few members or hundreds, and whether the code is private or open source, you can go with monorepo or multi-repo based on various factors.
What are the benefits and drawbacks of each approach? When should we use one or the other? Let’s find out!
What Are Repos?
A repo (short for repository) is a storage location for all the files of a project and the changes made to them, enabling developers to “version control” the project’s assets throughout its development.
What Is a Monorepo?
The monorepo approach uses a single repository to host all the code for the multiple libraries or services composing a company’s projects. At its most extreme, the whole codebase from a company — spanning various projects and coded in different languages — is hosted in a single repository.
Benefits of Monorepo
Hosting the whole codebase on a single repository provides the following benefits.
Lowers Barriers of Entry
When new staff members start working for a company, they need to download the code and install the required tools to begin working on their tasks. If the project is scattered across many repositories, each with its own installation instructions and required tooling, the initial setup will be complex. More often than not, the documentation will be incomplete, and these new team members will have to reach out to colleagues for help.
A monorepo simplifies matters. Since there is a single location containing all code and documentation, you can streamline the initial setup.
Centrally Located Code Management
Having a single repository gives visibility of all the code to all developers. It simplifies code management since we can use a single issue tracker to watch all issues throughout the application’s life cycle.
For instance, these characteristics are valuable when an issue spans two (or more) child libraries with the bug existing on the dependent library. With multiple repositories, it may be challenging to find the piece of code where the problem happens.
On top of this, we would need to figure out which repository to use to create the issue and then invite and cross-tag members of other teams to help resolve the problem.
With a monorepo, though, both locating code problems and collaborating to troubleshoot become simpler to achieve.
Painless Application-Wide Refactorings
When performing an application-wide refactoring of the code, multiple libraries will be affected. If you’re hosting them in multiple repositories, managing all the different pull requests and keeping them synchronized with each other can prove to be a challenge.
A monorepo makes it easy to perform all modifications to all code for all libraries and submit it under a single pull request.
More Difficult To Break Adjacent Functionality
With the monorepo, we can set up all tests for all libraries to run whenever any single library is modified. As a result, a change in one library is less likely to have adverse effects on other libraries.
Teams Share Development Culture
While not impossible, it becomes harder for different teams to develop their own distinct subcultures under a monorepo approach. Since they’ll share the same repository, they will most likely share the same programming and management methodologies and use the same development tools.
Issues With the Monorepo Approach
Using a single repository for all our code has several drawbacks.
Slower Development Cycles
When the code for a library contains breaking changes that make the tests for dependent libraries fail, the code for those dependent libraries must also be fixed before the changes can be merged.
If these libraries belong to other teams, who are busy working on some other task and are not able (or willing) to adapt their code to the breaking changes and get the tests to pass, the development of the new feature may stall.
What’s more, the project may well start advancing only at the speed of the slowest team in the company. This outcome could frustrate the members of the fastest teams, creating conditions for them to want to leave the company.
In addition, a change to one library requires running the tests for all other libraries too. The more tests to run, the more time it takes to run them, slowing down how fast we can iterate on our code.
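To make the ripple effect of a breaking change concrete, here is a minimal sketch that lists which libraries could see failing tests when one library changes. The dependency map and the library names are hypothetical and invented for illustration; a real monorepo would derive this information from its own package manifests.

```python
from collections import deque

# Hypothetical dependency map: each library lists the libraries it depends on.
DEPENDS_ON = {
    "utils": [],
    "auth": ["utils"],
    "api": ["auth", "utils"],
    "dashboard": ["api"],
    "docs": [],
}

def libraries_to_retest(changed, depends_on):
    """Return the changed library plus every library that depends on it,
    directly or transitively, in alphabetical order."""
    # Invert the map: library -> libraries that depend on it.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    # Breadth-first walk over the inverted map.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)

print(libraries_to_retest("utils", DEPENDS_ON))  # ['api', 'auth', 'dashboard', 'utils']
print(libraries_to_retest("docs", DEPENDS_ON))   # ['docs']
```

In this made-up graph, a breaking change to utils touches three other libraries, each of which may belong to a different team that has to fix its code before the merge can happen.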
Requires Download of Entire Codebase
When the monorepo contains all the code for a company, it can be huge, containing gigabytes of data. To contribute to any library hosted within, anyone would need to download the whole repository.
Dealing with a vast codebase implies a poor use of space on our hard drives and slower interactions with it. For instance, everyday actions such as executing git status or searching in the codebase with a regex may take many seconds or even minutes longer than they would with multiple repos.
Unmodified Libraries May Be Newly Versioned
When we tag the monorepo, all code within is assigned the new tag. If this action triggers a new release, then all libraries hosted in the repository will be newly released with the version number from the tag, even though many of those libraries may not have had any change.
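The effect can be modeled in a few lines. This is only a sketch of the behavior described above, not of any particular release tool; the library names and the version number are made up.

```python
def release_from_tag(tag, libraries, changed):
    """Model a monorepo release: every library gets the tag's version,
    whether or not its code changed since the previous release."""
    return {
        name: {"version": tag, "changed": name in changed}
        for name in libraries
    }

libraries = ["utils", "auth", "api"]  # hypothetical library names
release = release_from_tag("2.4.0", libraries, changed={"api"})

for name, info in release.items():
    note = "has changes" if info["changed"] else "no changes, still re-released"
    print(f"{name} {info['version']} ({note})")
```

Only api changed here, yet utils and auth are also published as 2.4.0.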
Forking Is More Difficult
Open source projects must make it as easy as possible for contributors to become involved. With multiple repositories, contributors can head directly to the specific repository for the project they want to contribute to. With a monorepo hosting various projects, though, contributors must first navigate their way into the right project and will need to understand how their contribution may affect all other projects.
What Is Multi-Repo?
The multi-repo approach uses several repositories to host the multiple libraries or services of a project developed by a company. At its most extreme, it’ll host every minimal set of reusable code or standalone functionality (such as a microservice) under its own repository.
Benefits of Multi-Repo
Hosting every library independently of all others provides a plethora of benefits.
Independent Library Versioning
When tagging a repository, its whole codebase is assigned the “new” tag. Since only the code for a specific library is on the repository, the library can be tagged and versioned independently of all other libraries hosted elsewhere.
Having an independent version for every library helps define the dependency tree for the application, allowing us to configure what version of each library to use.
Independent Service Releases
Since the repository only contains the code for some service and nothing else, it can have its own deployment cycle, independently of any progress made on the applications accessing it.
The service can use a fast release cycle such as continuous delivery (where new code is deployed after it passes all the tests). Some libraries accessing the service may use a slower release cycle, such as those that only produce a new release once a week.
Helps Define Access Control Across the Organization
Only the team members involved with developing a library need to be added to the corresponding repository and download its code. As a result, there’s an implicit access control strategy for each layer in the application. Those involved with the library will be granted editing rights, and everyone else may get no access to the repository. Or they may be given reading but not editing rights.
Allows Teams To Work Autonomously
Team members can design the library’s architecture and implement its code working in isolation from all other teams. They can make decisions based on what the library does in the general context without being affected by the specific requirements from some external team or application.
Issues With the Multi-Repo Approach
Using multiple repositories can give rise to several issues.
Libraries Must Constantly Be Resynced
When a new version of a library containing breaking changes is released, libraries depending on this library will need to be adapted to start using the latest version. If the release cycle of the library is faster than that of its dependent libraries, they could quickly become out of sync with each other.
Teams will need to constantly catch up to use the latest releases from other teams. Given that different teams have different priorities, this may sometimes prove arduous to achieve.
Consequently, a team not able to catch up may end up sticking to the outdated version of the depended-upon library. This outcome will have implications on the application (in terms of security, speed, and other considerations), and the gap in development across libraries may only get wider.
May Fragment Teams
When different teams don’t need to interact, they may work in their own silos. In the long term, this could result in teams developing their own subcultures within the company, such as employing different methodologies of programming or management or utilizing different sets of development tools.
If a team member eventually needs to work in a different team, they may suffer a bit of culture shock and need to learn a new way of doing their job.
Monorepo vs Multi-Repo: Primary Differences
Both approaches ultimately deal with the same objective: managing the codebase. Hence, they must both solve the same challenges, including release management, fostering collaboration among team members, handling issues, running tests, and others.
Their main difference concerns when team members are forced to make decisions: upfront for monorepo, or down the line for multi-repo.
Let’s analyze this idea in more detail.
Because all libraries are versioned independently in the multi-repo, a team releasing a library with breaking changes can do it safely by assigning a new major version number to the latest release. Other groups can have their dependent libraries stick to the old version and switch to the new one once their code has been adapted.
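As a rough illustration, assuming the libraries follow MAJOR.MINOR.PATCH version numbers, the sketch below shows a breaking release receiving a new major version while a dependent library keeps resolving to the old one. The helper names are hypothetical, and real package managers implement far richer version constraints than this.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def bump_major(version):
    """Return the next major version, used here for a breaking release."""
    major, _, _ = parse(version)
    return f"{major + 1}.0.0"

def latest_compatible(pinned_major, released):
    """Pick the highest released version that keeps the given major number."""
    candidates = [v for v in released if parse(v)[0] == pinned_major]
    return max(candidates, key=parse) if candidates else None

released = ["1.2.0", "1.3.1"]
released.append(bump_major("1.3.1"))   # the breaking release becomes 2.0.0

print(latest_compatible(1, released))  # 1.3.1 (a team still pinned to major 1)
print(latest_compatible(2, released))  # 2.0.0 (a team that has already adapted)
```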
This approach leaves the decision of when to adapt all other libraries to each responsible team, who can do it at any time. If they do it too late and new library versions are released, closing the gap across libraries will become increasingly difficult.
Consequently, while one team can iterate fast and often on their code, other teams may prove unable to catch up, ultimately producing libraries that diverge.
On the other hand, in a monorepo environment, we cannot release a new version of one library that breaks some other library since their tests will fail. In this case, the first team must communicate with the second team to incorporate the changes.
This approach forces teams to adapt all libraries together whenever a change for a single library must happen. All teams are forced to talk to each other and reach a solution together.
As a result, the first team will not be able to iterate as fast as they wish to, but the code across different libraries will at no point start diverging.
In summary, the multi-repo approach can help create a culture of “move fast and break things” among teams, where nimble independent teams can produce their output at their own speed. In contrast, the monorepo approach favors a culture of awareness and care, where teams should not be left behind to deal with a problem all by themselves.
Hybrid Poly-As-Mono Approach
If we can’t decide whether to use the multi-repo or the monorepo approach, there is also an in-between approach: to use multiple repositories and employ some tool to keep them synchronized, making the setup resemble a monorepo but with more flexibility.
A meta-repository contains the information on which repositories make up a project. Cloning this repository via the meta tool will then recursively clone all the required repositories, making it easier for new team members to start working on their projects immediately.
To clone a meta-repository and all its defined multiple repos, we must execute the following:
meta git clone [meta repo url]
Meta will execute a git clone for each repository and place it in a subfolder.
From then on, executing the meta exec command will execute the command on each subfolder. For instance, executing git checkout master on each repository is done like this:
meta exec "git checkout master"
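The idea behind running one command in every subfolder can be sketched in a few lines. This is not how meta is implemented; it is an illustrative helper (exec_in_each is a hypothetical name) that assumes each direct subfolder of a root directory is one of the cloned repositories. The demo runs a harmless command in throwaway folders instead of touching real repositories.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def exec_in_each(root, command):
    """Run `command` (a list of arguments) inside every direct subfolder of `root`.

    Returns a dict mapping folder name to (exit code, captured stdout).
    """
    results = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            completed = subprocess.run(
                command, cwd=folder, capture_output=True, text=True
            )
            results[folder.name] = (completed.returncode, completed.stdout.strip())
    return results

# Demo: two empty stand-in folders, and a command that prints its working folder.
with tempfile.TemporaryDirectory() as root:
    for name in ("repo-a", "repo-b"):
        (Path(root) / name).mkdir()
    show_cwd = [sys.executable, "-c", "import os; print(os.path.basename(os.getcwd()))"]
    demo = exec_in_each(root, show_cwd)

print(demo)
```

With real clones in place, the equivalent of the checkout above would be a call such as exec_in_each("path/to/project", ["git", "checkout", "master"]).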
Hybrid Mono-As-Poly Approach
Another approach is managing the code via a monorepo for development, but copying each library’s code into its independent repository for deployment.
This strategy is prevalent within the PHP ecosystem because Packagist (the main Composer repository) requires a public repository URL to publish a package, and it’s not possible to indicate that the package is located within a subdirectory of the repository.
Given the Packagist limitation, PHP projects can still use a monorepo for development, but they must use the multi-repo approach for deployment.
To achieve this conversion, we can execute a script with git subtree split, or use one of the available tools that perform the same logic.
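A script of this kind could start by building one git subtree split command per library. The sketch below only prints the commands rather than running them. The directory layout and the split/ branch naming are assumptions made for illustration, and pushing each resulting branch to its library’s own repository would be a separate step.

```python
import shlex

# Hypothetical layout: library name -> its subdirectory inside the monorepo.
LIBRARIES = {
    "logger": "packages/logger",
    "mailer": "packages/mailer",
}

def split_commands(libraries):
    """Build one `git subtree split` command per library.

    Each command extracts the history of one subdirectory into its own
    branch (named split/<library> here, an arbitrary choice).
    """
    commands = []
    for name, prefix in libraries.items():
        branch = f"split/{name}"
        commands.append(
            ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch]
        )
    return commands

for command in split_commands(LIBRARIES):
    print(shlex.join(command))
```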
Who’s Using Monorepo vs Multi-Repo
Several big tech companies favor the monorepo approach, while others have decided to use the multi-repo method.
On the hybrid poly-as-mono side, Android updates multiple repositories, which are managed like a monorepo.
On the hybrid mono-as-poly side, Symfony keeps the code for all of its components in a monorepo and splits it into independent repositories for deployment.
Examples of Monorepo and Multi-Repo
The WordPress account on GitHub hosts examples of both the monorepo and multi-repo approaches.
For instance, the packages in the WordPress/gutenberg monorepo are managed through Lerna to help publish them in the npm repository.
Monorepo vs Multi-Repo: How to Choose?
As with many development problems, there is no predefined answer on which approach you should use. Different companies and projects will benefit from one strategy or the other based on their unique conditions, such as:
- How big is the codebase? Does it contain gigabytes of data?
- How many people will work on the codebase? Is it around 10, 100, or 1,000?
- How many packages will there be? Is it around 10, 100, or 1,000?
- How many packages does the team need to work on at a given time?
- How tightly coupled are the packages?
- Are different programming languages involved? Do they require a particular software installed or special hardware to run?
- How many deployment tools are required, and how complex are they to set up?
- What is the culture in the company? Are teams encouraged to collaborate?
- What tools and technologies do the teams know how to use?
There are two main strategies for hosting and managing code: monorepo and multi-repo. The monorepo approach entails storing the code for different libraries or projects (and even all code from a company) in a single repository. The multi-repo approach divides the code into units, such as libraries or services, and keeps their code hosted in independent repositories.
Which approach to use depends on a multitude of conditions. Both strategies have several advantages and disadvantages, and we’ve just covered all of them in detail in this article.
Do you have any questions left about monorepos or multi-repos? Let us know in the comments section!