Generic Monorepo Challenges
Migrating all of your code into a monorepo leads to one obvious and glaring problem: the majority of the tooling released over the past decade has been designed with the assumption that one “git repository” == “one suite of tests” == “a discrete set of built artifacts”. Based on that assumption, in a polyrepo world, any time a change is pushed to a git repository, all builds and tests for that repository are run in continuous integration (CI).
However, in a monorepo, all of an organization’s services and libraries live within the same repository. Based on the above assumption, every commit to the monorepo would require testing every single test and building every single build.
Therein lies a core challenge of operating a monorepo: it no longer makes sense to fully test and build the entire repository on each commit.
Instead, we must rely on tooling like Bazel to organize all services and libraries into an explicit “build dependency graph” that we can leverage to build and test exactly what is required on each commit.
Even with tooling like Bazel to build a dependency graph of Rules (“building blocks”) within a repository, there are inevitably a significant number of Rules/libraries/services that all other libraries and services depend on within the monorepo. So simply cloning down the repository and running the tests or building the outputs of a single project could still require an hour to build.
Getting around this problem in order to bring build and test times to a reasonable level requires that builds be hermetic so that build outputs can be aggressively cached. In the context of a monorepo, “hermetic” means that a build can produce deterministic outputs regardless of which system or time of the day the build is run.
The goal of a deterministic, hermetic build output is that the output of each build can be cached and leveraged for all builds across the company they are produced in CI or a developer’s machine.
The reality is that multiple challenges stand in the way of a truly reproducible, hermetic build. For one, a hermetic build cannot use any part of the host system’s configuration because even just a small change to a system library or OS version across build hosts could result in a complete drop off in cache hit rate. Additionally, every required dependency must either be SHA pinned or have their source code included within the monorepo.
There’s no end to the number of small external influences that can play into the purity of a monorepo’s hermeticity. However, reaching “hermetic enough” to where the majority of builds produce the same outputs can dramatically improve the cache hit rate across build machines and significantly cut build times.
After building a story around caching, we adapted our existing tooling to fetch and update caches between builds to keep build times lower.
Even with a cache, the tooling required to leverage a Bazel monorepo on tooling that was designed with the “git repository” == “one collection of builds/tests” mindset can be frustrating especially as we attempt to present test results for many projects or libraries all within the space designed for single projects.