By: Brie Bunge and Sharmila Jesupaul
At Airbnb, we’ve recently adopted Bazel, Google’s open source build tool, as our universal build system across backend, web, and iOS platforms. This post will cover our experience adopting Bazel for Airbnb’s large-scale (over 11 million lines of code) web monorepo. We’ll share how we prepared the codebase, the principles that guided the migration, and the process of migrating selected CI jobs. Our goal is to share information that would have been valuable to us when we embarked on this journey and to contribute to the growing conversation around Bazel for web development.
Historically, we wrote bespoke build scripts and caching logic for various continuous integration (CI) jobs that proved challenging to maintain and consistently hit scaling limits as the repo grew. For example, our linter, ESLint, and TypeScript’s type checking didn’t support multi-threaded concurrency out-of-the-box. We extended our unit testing tool, Jest, to be the runner for these tools because it had an API to leverage multiple workers.
It was not sustainable to continually create workarounds to overcome the inefficiencies of our tooling that didn’t support concurrency, and we were incurring a long-term maintenance cost. To tackle these challenges and to best support our growing codebase, we found that Bazel’s sophistication, parallelism, caching, and performance met our needs.
Additionally, Bazel is language agnostic. This facilitated consolidation onto a single, universal build system across Airbnb and allowed us to share common infrastructure and expertise. Now, an engineer who works on our backend monorepo can switch to the web monorepo and know how to build and test things.
After we started the migration in 2021, there was no publicized business precedent for integrating Bazel with internet at scale outdoors of Google. Open supply tooling didn’t work out-of-the-box, and leveraging distant construct execution (RBE) launched further challenges. Our internet codebase is giant and comprises many free information, which led to efficiency points when transmitting them to the distant atmosphere. Moreover, we established migration ideas that included bettering or sustaining general efficiency and decreasing the impression on builders contributing to the monorepo in the course of the transition. We successfully achieved each of those objectives. Learn on for extra particulars.
We did some work up front to make the repository Bazel-ready: specifically, cycle breaking and automated BUILD.bazel file generation.
Cycle Breaking
Our monorepo is laid out with projects under a top-level frontend/ directory. To start, we wanted to add BUILD.bazel files to each of the ~1000 top-level frontend directories. However, doing so created cycles in the dependency graph. This isn’t allowed in Bazel because there must be a DAG of build targets. Breaking these often felt like battling a hydra, as removing one cycle spawned more in its place. To accelerate the process, we modeled the problem as finding the minimum feedback arc set (MFAS)¹ to identify the minimal set of edges to remove, leaving a DAG. This set presented the least disruption and level of effort, and surfaced pathological edges.
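The post doesn’t show the cycle-detection tooling itself, but the core idea can be sketched. The snippet below is a simplified approximation (not Airbnb’s actual implementation): collecting the “back edges” found during a depth-first traversal yields a set of edges whose removal is guaranteed to leave a DAG, though unlike a true MFAS solution it is not necessarily minimal.

```typescript
// Approximate a feedback arc set by collecting back edges during DFS.
// Removing every back edge always leaves a DAG (back edges are exactly
// the edges that close a cycle in the traversal), but a true MFAS solver
// would search for the smallest such set.
type Graph = Map<string, string[]>;

function feedbackEdges(graph: Graph): Array<[string, string]> {
  const WHITE = 0, GRAY = 1, BLACK = 2;
  const color = new Map<string, number>();
  const backEdges: Array<[string, string]> = [];

  function visit(node: string): void {
    color.set(node, GRAY); // node is on the current DFS path
    for (const dep of graph.get(node) ?? []) {
      const c = color.get(dep) ?? WHITE;
      if (c === GRAY) backEdges.push([node, dep]); // edge to an ancestor: closes a cycle
      else if (c === WHITE) visit(dep);
    }
    color.set(node, BLACK); // fully explored
  }

  for (const node of graph.keys()) {
    if ((color.get(node) ?? WHITE) === WHITE) visit(node);
  }
  return backEdges;
}

// Example: a -> b -> c -> a is a cycle; removing c -> a breaks it.
const g: Graph = new Map([
  ["a", ["b"]],
  ["b", ["c"]],
  ["c", ["a"]],
]);
console.log(feedbackEdges(g)); // [["c", "a"]]
```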
Automated BUILD.bazel Generation
We automatically generate BUILD.bazel files for the following reasons:
- Most contents are knowable from statically analyzable import / require statements.
- Automation allowed us to quickly iterate on BUILD.bazel changes as we refined our rule definitions.
- It would take time for the migration to complete, and we didn’t want to ask users to keep these files up-to-date when they weren’t yet gaining value from them.
- Manually keeping these files up-to-date would constitute an additional Bazel tax, regressing the developer experience.
We have a CLI tool called sync-configs that generates dependency-based configurations in the monorepo (e.g., tsconfig.json, project configuration, and now BUILD.bazel). It uses jest-haste-map and watchman with a custom version of the dependencyExtractor to determine the file-level dependency graph, and part of Gazelle to emit BUILD.bazel files. This CLI tool is similar to Gazelle but also generates additional web-specific configuration files such as the tsconfig.json files used in TypeScript compilation.
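sync-configs itself is internal, but the emission step it performs can be sketched: once the file-level dependency graph is known, writing a BUILD.bazel file is largely string templating. The rule name and attributes below are illustrative assumptions, not Airbnb’s actual rule definitions.

```typescript
// Illustrative sketch: given a project's sources and its statically
// extracted dependencies, emit a minimal BUILD.bazel target. A real
// generator would also handle data files, test targets, visibility, etc.
function emitBuildFile(target: string, srcs: string[], deps: string[]): string {
  const list = (items: string[]) =>
    items.map((i) => `        "${i}",`).join("\n");
  return [
    `ts_project(`,
    `    name = "${target}",`,
    `    srcs = [`,
    list(srcs),
    `    ],`,
    `    deps = [`,
    list(deps),
    `    ],`,
    `)`,
  ].join("\n");
}

console.log(emitBuildFile("listing", ["index.ts"], ["//frontend/core"]));
```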
With the preparation work complete, we proceeded to migrate CI jobs to Bazel. This was a massive endeavor, so we divided the work into incremental milestones. We audited our CI jobs and chose to migrate the ones that would benefit the most: type checking, linting, and unit testing². To reduce the burden on our developers, we assigned the central Web Platform team the responsibility for porting CI jobs to Bazel. We proceeded one job at a time to deliver incremental value to developers sooner, gain confidence in our approach, focus our efforts, and build momentum. With each job, we ensured that the developer experience was high-quality, that performance improved, that CI failures were reproducible locally, and that the tooling Bazel replaced was fully deprecated and removed.
We started with the TypeScript (TS) CI job. We first tried the open source ts_project rule³. However, it didn’t work well with RBE due to the sheer number of inputs, so we wrote a custom rule to reduce the number and size of the inputs.
The biggest source of inputs came from node_modules. Prior to this, the files for each npm package were being uploaded individually. Since Bazel works well with Java, we packaged up a full tar and a TS-specific tar (containing only the *.ts files and package.json) for each npm package, along the lines of Java JAR files (essentially zips).
Another source of inputs came through transitive dependencies. Transitive node_modules and d.ts files in the sandbox were being included because technically they can be needed for downstream project compilations. For example, suppose project foo depends on bar, and types from bar are exposed in foo’s emit. As a result, project baz, which depends on foo, would also need bar’s outputs in the sandbox. For long chains of dependencies, this can bloat the inputs significantly with files that aren’t actually needed. TypeScript has a --listFiles flag that tells us which files are part of the compilation. We can package up this limited set of files along with the emitted d.ts files into an output tsc.tar.gz file⁴. With this, targets need only include direct dependencies, rather than all transitive dependencies⁵.
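A minimal sketch of the pruning idea, under stated assumptions: tsc’s --listFiles output enumerates every file the compilation actually read, so filtering out the compiler’s own bundled lib.*.d.ts files (which ship with TypeScript and need not be re-packaged) leaves the precise set of files worth bundling into the tsc.tar.gz output. The paths and filter below are hypothetical.

```typescript
// Filter tsc --listFiles output down to the files worth packaging.
// The compiler's default lib files (node_modules/typescript/lib/lib.*.d.ts)
// are available wherever tsc runs, so they are excluded from the archive.
function filesToPackage(listFilesOutput: string): string[] {
  return listFilesOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .filter((line) => !/\/typescript\/lib\/lib\..*\.d\.ts$/.test(line));
}

const sample = [
  "/repo/node_modules/typescript/lib/lib.es2020.d.ts",
  "/repo/frontend/foo/index.ts",
  "/repo/bazel-out/frontend/bar/index.d.ts",
].join("\n");
console.log(filesToPackage(sample));
// ["/repo/frontend/foo/index.ts", "/repo/bazel-out/frontend/bar/index.d.ts"]
```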
This custom rule unblocked switching to Bazel for TypeScript, as the job was now well under our CI runtime budget.
We migrated the ESLint job next. Bazel works best with actions that are independent and have a narrow set of inputs. Some of our lint rules (e.g., special internal rules, import/export, import/extensions) inspected files outside of the linted file. We restricted our lint rules to those that could operate in isolation as a way of reducing input size and having only to lint directly affected files. This meant moving or deleting lint rules (e.g., ones that were made redundant by TypeScript). As a result, we reduced CI times by over 70%.
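One way to picture the restriction step (the rule names come from the post and the ESLint import plugin, but the partitioning code itself is a hypothetical sketch, not Airbnb’s tooling): keep only rules that examine a single file, and drop the cross-file ones from the Bazel lint configuration.

```typescript
// Hypothetical partition of an ESLint rules map: cross-file rules need to
// see more than the linted file, so they can't run as narrow Bazel actions.
const CROSS_FILE_RULES = new Set([
  "import/export",
  "import/extensions",
  "import/no-cycle",
]);

function isolatedRules(allRules: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(allRules).filter(([name]) => !CROSS_FILE_RULES.has(name)),
  );
}

const rules = {
  "no-unused-vars": "error",
  "import/export": "error",
  eqeqeq: "warn",
};
console.log(isolatedRules(rules)); // { "no-unused-vars": "error", eqeqeq: "warn" }
```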
Our next challenge was enabling Jest. This presented unique challenges, as we needed to bring along a much larger set of first- and third-party dependencies, and there were more Bazel-specific failures to fix.
Worker and Docker Cache
We tarred up dependencies to reduce input size, but extraction was still slow. To address this, we introduced caching. One layer of cache is on the remote worker and another is on the worker’s Docker container, baked into the image at build time. The Docker layer exists to avoid losing our cache when remote workers are auto-scaled. We run a cron job once per week to update the Docker image with the latest set of cached dependencies, striking a balance between keeping them fresh and avoiding image thrashing. For more details, check out this Bazel Community Day talk.
This added caching provided us with a ~25% speedup of our Jest unit testing CI job overall and reduced the time to extract our dependencies from 1–3 minutes to 3–7 seconds per target. This implementation required us to enable the NodeJS preserve-symlinks option and patch some of our tools that followed symlinks to their real paths. We extended this caching strategy to our Babel transformation cache, another source of poor performance.
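The worker and Docker layers are infrastructure, but the keying idea behind them can be shown in a few lines. This is a minimal sketch under stated assumptions (the cache path and in-memory index are invented): a tarball is extracted at most once per content hash, and later requests for the same hash reuse the already-extracted directory.

```typescript
// Content-addressed extraction cache sketch: the cache key is the SHA-256
// of the tarball bytes, so identical dependency tars extract exactly once.
import { createHash } from "node:crypto";

const extracted = new Map<string, string>(); // content hash -> extraction dir

function extractionDir(tarBytes: Buffer, extract: (dir: string) => void): string {
  const key = createHash("sha256").update(tarBytes).digest("hex");
  let dir = extracted.get(key);
  if (dir === undefined) {
    dir = `/cache/deps/${key}`; // hypothetical cache root
    extract(dir); // slow path: runs only on a cache miss
    extracted.set(key, dir);
  }
  return dir;
}

let extractions = 0;
const tar = Buffer.from("fake-tar-contents");
const first = extractionDir(tar, () => extractions++);
const second = extractionDir(tar, () => extractions++); // cache hit
console.log(first === second, extractions); // true 1
```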
Implicit Dependencies
Next, we needed to fix Bazel-specific test failures. Most of these were due to missing files. For any inputs not statically analyzable (e.g., a file referenced as a string without an import, a babel plugin string referenced in .babelrc), we added support for a Bazel keep comment (e.g., // bazelKeep: path/to/file) which acts as if the file were imported. The advantages of this approach are:
1. It is colocated with the code that uses the dependency,
2. BUILD.bazel files don’t have to be manually edited to add/move # keep comments,
3. There is no effect on runtime.
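The // bazelKeep comment syntax comes from the post; how the generator consumes it is not shown, so the scanner below is an assumed sketch. It pulls dependency paths out of keep comments so they can be merged into the extracted dependency graph alongside real imports.

```typescript
// Hypothetical scanner for "// bazelKeep: path/to/file" comments. Because
// the marker lives in a comment, it has no runtime effect, and it travels
// with the code that needs the dependency.
function bazelKeepDeps(source: string): string[] {
  const deps: string[] = [];
  const pattern = /\/\/\s*bazelKeep:\s*(\S+)/g;
  for (const match of source.matchAll(pattern)) {
    deps.push(match[1]);
  }
  return deps;
}

const file = [
  "// bazelKeep: app/plugins/babel-transform",
  "const config = loadConfig('.babelrc'); // plugin referenced only by string",
].join("\n");
console.log(bazelKeepDeps(file)); // ["app/plugins/babel-transform"]
```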
A small number of tests were unsuitable for Bazel because they required a broad view of the repository or a dynamic and implicit set of dependencies. We moved these tests out of our unit testing job to separate CI checks.
Preventing Backsliding
With over 20,000 test files and hundreds of people actively working in the same repository, we needed to pursue test fixes in a way that would not be undone as product development progressed.
Our CI has three types of build queues:
1. “Required”, which blocks changes,
2. “Optional”, which is non-blocking,
3. “Hidden”, which is non-blocking and not shown on PRs.
As we fixed tests, we moved them from “hidden” to “required” via a rule attribute. To ensure a single source of truth, tests run in “required” under Bazel were not run under the Jest setup being replaced.
# frontend/app/script/__tests__/BUILD.bazel
jest_test(
    name = "jest_test",
    is_required = True,  # makes this target a required check on pull requests
    deps = [
        ":source_library",
    ],
)
Example jest_test rule. This indicates that this target will run on the “required” build queue.
We wrote a script comparing before and after Bazel to determine migration-readiness, using the metrics of test runtime, code coverage stats, and failure rate. Fortunately, the bulk of tests could be enabled without additional changes, so we enabled these in batches. We divided and conquered the remaining burndown list of failures with the central team, Web Platform, fixing and updating tests in Bazel to avoid placing this burden on our developers. After a grace period, we fully disabled and deleted the non-Bazel Jest infrastructure and removed the is_required param.
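The comparison script isn’t shown in the post, but its decision can be sketched from the three metrics it names. The thresholds below are invented for illustration: a test is ready to move to the “required” queue when its Bazel runtime, coverage, and failure rate are no worse than under the legacy Jest setup.

```typescript
// Sketch of a migration-readiness check over the post's three metrics.
// The 10% runtime slack is an assumed tolerance, not Airbnb's actual policy.
interface TestMetrics {
  runtimeMs: number;
  coveragePct: number;
  failureRate: number; // fraction of recent runs that failed
}

function migrationReady(before: TestMetrics, after: TestMetrics): boolean {
  return (
    after.runtimeMs <= before.runtimeMs * 1.1 && // allow modest runtime slack
    after.coveragePct >= before.coveragePct &&   // coverage must not regress
    after.failureRate <= before.failureRate      // no new flakiness
  );
}

const legacy = { runtimeMs: 4000, coveragePct: 87.5, failureRate: 0.01 };
const bazel = { runtimeMs: 3200, coveragePct: 87.5, failureRate: 0.0 };
console.log(migrationReady(legacy, bazel)); // true
```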