Using a Cache to Improve Bazel Build Times
Bazel is a tool that automates building and testing software. For instance, you can use it to build executables from a large monorepo.
One notable feature of Bazel is the ability to use a cache. A cache speeds up the build process and reduces build times, especially for large projects with many dependencies. Moreover, a Bazel cache stores the build artifacts from previous builds, which means you don’t have to rebuild files that have already been built since they’re available in the cache.
A Bazel cache can either be local or remote (shared). A local cache is stored on the same computer where Bazel is running. It improves the performance of Bazel by avoiding the need to rebuild artifacts that haven’t changed since the last build on that machine.
A remote Bazel cache is stored on a separate machine and is typically accessed over a network. This approach improves Bazel’s performance by allowing multiple computers to share the same build artifacts, which means you don’t have to rebuild them on each machine. This is important when several machines, such as developer workstations and CI runners, build the same application, or when the cache is too large to fit on a single computer.
In this article, you’ll learn how to set up a local and a remote cache to improve Bazel build performance. You’ll also understand the pros and cons of caching information.
Why You Need Bazel Caching
As mentioned, a cache speeds up the build process and reduces build times. Bazel stores build artifacts in the cache keyed by the inputs that produced them, so the same artifacts are reused whenever those inputs are unchanged, and they are rebuilt when the sources, dependencies, or relevant parts of the environment change. This helps prevent the inconsistencies that stale outputs would otherwise cause.
For example, if you change a small part of your project, Bazel uses the cached results from previous builds and quickly rebuilds only the sections of your project affected by the change.
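The idea that unchanged inputs can reuse a cached result can be illustrated with a small shell sketch. This is not Bazel’s actual key computation; it only hashes a hypothetical command line together with the content of its input files to show that the key changes when an input changes. The function name action_key and the sample file are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: derive a cache key from a command string plus input file contents.
action_key() {
  local cmd="$1"; shift
  # Hash the command and the bytes of every input file, in the given order.
  { printf '%s\n' "$cmd"; cat "$@"; } | sha256sum | cut -d' ' -f1
}

workdir="$(mktemp -d)"
printf 'int main() { return 0; }\n' > "$workdir/main.c"

key_before="$(action_key "cc -c main.c" "$workdir/main.c")"
key_same="$(action_key "cc -c main.c" "$workdir/main.c")"

# Editing the input changes the key, so a cached result would no longer match.
printf '// edited\n' >> "$workdir/main.c"
key_after="$(action_key "cc -c main.c" "$workdir/main.c")"

[ "$key_before" = "$key_same" ] && echo "unchanged input: same key (cache hit)"
[ "$key_before" != "$key_after" ] && echo "changed input: new key (cache miss)"
```

Only the step whose inputs changed gets a new key; every other step would still find its earlier result.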
Bazel caching is essential to using Bazel efficiently because it significantly improves the performance and reliability of Bazel builds. However, how much it helps depends on cache granularity, that is, the size of the units of data stored in the cache.
In the next section, you’ll learn about the optimal cache granularity you need, which depends on the workload you’re running.
Bazel Cache Granularity
Cache granularity refers to the level of detail at which the cache stores build outputs. It determines how Bazel breaks down the build process into individual steps and how it stores the results of each step in the cache.
Granularity can be thought of at three levels:
- File granularity (fine-grained): At this level, the cache stores the outputs of individual source files. If a source file is modified, only the outputs of that file will be invalidated in the cache, and Bazel will rebuild it.
- Directory granularity (medium-grained): At the directory level, the cache stores the outputs of all files in a directory. If any file in a directory changes, the cached outputs of all files in that directory are invalidated, and all files will be rebuilt.
- Package granularity (coarser-grained): At this level, the cache stores the outputs of all files in a package. If any file in a package is modified, the outputs of all files in that package are invalidated in the cache, and consequently, all the files in the package will be rebuilt.
Bazel itself caches at a fine granularity by default: it stores individual action results (such as the result of compiling a single file) and reuses them instead of repeating the action. Coarser granularity comes from how the build is structured, for example a single action that processes every file in a directory or package. That reduces the number of entries stored in the cache, but any change to one input invalidates the whole result, so it tends to lower cache hit rates rather than raise them.
Note that Bazel’s command-line reference doesn’t list a --cache_granularity flag, so a .bazelrc line such as build --cache_granularity=medium will be rejected as an unrecognized option. To change the effective granularity, split or merge targets in your BUILD files instead.
Comparing Local vs. Shared (Remote) Caching with Bazel
As discussed earlier, a Bazel build cache is stored either locally or remotely. Each of these approaches has its advantages and disadvantages, which are reviewed below.
Local Caching
The major advantages of using local caching include the following:
- Faster access to build artifacts: Your artifacts are stored on your local computer. This means that they can be accessed more quickly and do not need to be transferred over the network.
- Increased reliability: There’s no need for an internet connection to access build artifacts.
- Better security: Because build artifacts are stored only on the local machine, they aren’t exposed over the network.
However, local caching has the following limitations:
- Limited storage space: A local cache takes up disk space on the local computer. This is problematic if the computer has limited storage capacity, especially if you’re building large projects.
- Sharing limitations: With local caching, build artifacts are only stored locally. You can’t share them with other team members or across different computers. This is a significant drawback when multiple team members work on the same Bazel-built project.
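One way to get a persistent local cache is Bazel’s --disk_cache flag, which points Bazel at a directory on the local machine. The sketch below appends that flag to a workspace’s .bazelrc. The WORKSPACE_DIR and CACHE_DIR defaults are assumptions for illustration (the workspace defaults to a throwaway temporary directory so the script is safe to try), and the helper name enable_disk_cache is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable Bazel's on-disk cache for a workspace by appending to its .bazelrc.
# WORKSPACE_DIR and CACHE_DIR are hypothetical defaults; adjust them for your setup.
WORKSPACE_DIR="${WORKSPACE_DIR:-$(mktemp -d)}"
CACHE_DIR="${CACHE_DIR:-$HOME/.cache/bazel-disk-cache}"

enable_disk_cache() {
  local rc="$1/.bazelrc" line="build --disk_cache=$2"
  # Append the flag only if it is not already present, so reruns are harmless.
  touch "$rc"
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

enable_disk_cache "$WORKSPACE_DIR" "$CACHE_DIR"
cat "$WORKSPACE_DIR/.bazelrc"
```

Because the cache directory lives outside the workspace, several checkouts on the same machine can point at it and reuse each other’s results.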
Shared (Remote) Caching
The major advantages of using shared caching include the following:
- Scalable storage space: Build artifacts are stored on a remote server, so storage can be scaled to match the team’s demand.
- Increased efficiency: Build artifacts are shareable across different team members, improving collaboration and reducing build times.
However, there are some potential drawbacks to using remote caching, including the following:
- Slower access to build artifacts: With shared caching, build artifacts are stored on a remote server and are accessed more slowly than when stored locally.
- Requires additional security measures: Because build artifacts are accessible over the network, they’re more exposed to security threats.
- Requires a stable network connection: Without one, Bazel can’t reach the remote cache.
Overall, the decision to use local or shared caching with Bazel will depend on your project’s specific needs and requirements.
How to Set Up a Shared Build Cache Using Bazel
Now that you know why you need Bazel caches and what the difference between shared and local caching is, let’s discuss different approaches to setting up a Bazel shared cache and storing build artifacts and outputs in a centralized location.
Remote Cache With a Google Cloud Storage Bucket
One option to set up a remote cache is to use a Google Cloud Storage bucket, a storage location in the cloud. Using a GCP bucket as a remote cache can improve the performance of your builds and make it easier to share artifacts among team members.
To use a GCP bucket as a remote cache for Bazel, you’ll need to follow these steps:
- Create a GCP bucket to use as the remote cache. You can do this with the Google Cloud console, the gsutil command-line tool, or the gcloud storage buckets create command.
- Set up the credentials for accessing the GCP bucket. You should set up a service account and grant it permission to read and write objects in the bucket.
- Configure Bazel to use the GCP bucket as the remote cache. Bazel talks to Cloud Storage over HTTPS, so the cache URL uses the https://storage.googleapis.com endpoint rather than a gs:// URL. Add the following lines to your .bazelrc file:
build --remote_cache=https://storage.googleapis.com/[BUCKET_NAME]
build --google_credentials=[PATH_TO_CREDENTIALS_FILE]
Alternatively, you can use the --remote_cache flag in your Bazel commands to specify the URL of your GCP bucket as the remote cache location.
For example, to build a Bazel target using a GCP bucket as the remote cache, use the following command:
bazel build --remote_cache=https://storage.googleapis.com/my-gcp-bucket/path/to/cache my_target
This command uses the GCP bucket my-gcp-bucket at the specified path as the remote cache for the build. Bazel stores the build artifacts in the cache and retrieves them from the cache as needed.
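The three steps above can be sketched as a short script. Every name in it (project, bucket, service account, key file) is a placeholder, and the role roles/storage.objectAdmin is one reasonable choice rather than a requirement. By default the script only prints the gcloud commands; set DRY_RUN=0 to run them. The final lines print .bazelrc entries that use the https://storage.googleapis.com endpoint form shown in Bazel’s remote caching documentation.

```shell
#!/usr/bin/env bash
# Sketch of the three steps above. All names below are placeholders (assumptions).
PROJECT="${PROJECT:-my-project}"
BUCKET="${BUCKET:-my-gcp-bucket}"
SA_NAME="${SA_NAME:-bazel-cache}"
KEY_FILE="${KEY_FILE:-$HOME/bazel-cache-key.json}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

SA_EMAIL="$SA_NAME@$PROJECT.iam.gserviceaccount.com"

run() {
  # Print the command in dry-run mode; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# 1. Create the bucket.
run gcloud storage buckets create "gs://$BUCKET" --project="$PROJECT"
# 2. Create a service account, let it read and write objects, and export a key.
run gcloud iam service-accounts create "$SA_NAME" --project="$PROJECT"
run gcloud storage buckets add-iam-policy-binding "gs://$BUCKET" \
  --member="serviceAccount:$SA_EMAIL" --role="roles/storage.objectAdmin"
run gcloud iam service-accounts keys create "$KEY_FILE" --iam-account="$SA_EMAIL"
# 3. Print the .bazelrc lines that point Bazel at the bucket.
printf 'build --remote_cache=https://storage.googleapis.com/%s\n' "$BUCKET"
printf 'build --google_credentials=%s\n' "$KEY_FILE"
```

Keeping the commands behind a dry-run switch makes it easy to review exactly what would be created before touching a real project.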
Remote Cache Using Amazon S3
Like a GCP bucket, an Amazon Simple Storage Service (S3) bucket works as a remote cache for Bazel. To use an S3 bucket, follow these steps:
- Create an S3 bucket to use as the remote cache. To do so, you can use the Amazon S3 console, the Amazon Web Services (AWS) CLI, or the Amazon S3 API.
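- Set up the credentials for accessing the Amazon S3 bucket. Create an AWS Identity and Access Management (IAM) user and grant it the necessary permissions to access the bucket.
- Configure Bazel to reach the S3 bucket. Bazel’s HTTP cache client doesn’t sign requests with AWS credentials, so it generally can’t write to a private bucket directly. A common setup is to run a caching proxy that uses the bucket as its storage backend (for example, bazel-remote) and give the proxy the IAM credentials. Then point Bazel at the proxy by adding the following line to your .bazelrc file, where [PROXY_HOST] and [PORT] are placeholders for wherever the proxy listens:
build --remote_cache=http://[PROXY_HOST]:[PORT]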
Use the Bazel build and test commands as usual, and Bazel will automatically cache the build outputs or retrieve build artifacts from the S3 bucket.
Remote Cache Using Remote Build Execution
Bazel supports using remote build execution (RBE) to build and test applications remotely. This is useful when building and testing on multiple machines or when using a remote cache to speed up build times. To use RBE with a remote cache, you need to set up a remote execution instance and configure Bazel to use it. You can do this by adding the following lines to your .bazelrc file:
build --remote_executor=<host>:<port>
build --remote_cache=<host>:<port>
Replace <host> and <port> with the hostname and port of your remote execution instance, respectively. Then use the build and test commands as usual, and Bazel will automatically use the remote cache and remote execution instance for your builds.
Using RBE and a remote cache can lengthen a clean build because build outputs need to be transmitted over the network to the remote cache. However, it significantly speeds up later builds, since the remote cache supplies the results of earlier builds instead of recreating them.
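If you don’t want every build to go through the remote endpoint, one option is to keep these settings behind a named configuration and opt in per command. The sketch below writes such a fragment to a file. The host rbe.example.com, port 443, and the config name remote are all placeholders, and the grpcs:// scheme assumes a TLS-enabled endpoint.

```shell
#!/usr/bin/env bash
# Sketch: keep remote execution settings behind a named config.
# RBE_HOST and RBE_PORT are placeholders for your remote execution endpoint.
RBE_HOST="${RBE_HOST:-rbe.example.com}"
RBE_PORT="${RBE_PORT:-443}"
RC_FILE="${RC_FILE:-$(mktemp)}"

cat > "$RC_FILE" <<EOF
# Applied only when a command is run with --config=remote
build:remote --remote_executor=grpcs://$RBE_HOST:$RBE_PORT
build:remote --remote_cache=grpcs://$RBE_HOST:$RBE_PORT
EOF

cat "$RC_FILE"
```

After copying the fragment into your .bazelrc, a command such as bazel build --config=remote [TARGETS] uses the remote endpoint, while a plain bazel build stays local.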
Clearing the Cache
To clear Bazel’s local build state, you can use the bazel clean command with the --expunge flag. This removes the entire output base for the workspace, including build and test outputs and downloaded external dependencies. It doesn’t delete entries in a remote cache or in a separate disk cache directory. Here is an example:
bazel clean --expunge
Using --expunge permanently deletes those files. You cannot recover them, although Bazel recreates them on the next build. Use this command only if you want to delete everything Bazel has stored locally for the workspace.
Alternatively, you can use the bazel clean command without the --expunge flag to remove only the build and test outputs. This command leaves the rest of the output base, such as downloaded external dependencies, in place. You can then regenerate the outputs by running the Bazel build or test commands again. Here’s an example of how to use the clean command without the --expunge flag:
bazel clean
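Because bazel clean doesn’t touch a directory passed to --disk_cache, that directory has to be trimmed separately if it grows too large. The sketch below is one simple approach, not a Bazel feature: a hypothetical helper, prune_disk_cache, deletes files that haven’t been modified for more than a given number of days. Modification time is only a rough proxy for how recently an entry was useful, and the demo relies on GNU touch -d to create an old file.

```shell
#!/usr/bin/env bash
# Sketch: delete disk-cache files not modified in the last N days.
prune_disk_cache() {
  local cache_dir="$1" max_age_days="$2"
  [ -d "$cache_dir" ] || { echo "no such directory: $cache_dir" >&2; return 1; }
  # -mtime +N matches files whose age in whole days is greater than N.
  find "$cache_dir" -type f -mtime +"$max_age_days" -delete
}

# Demo on a throwaway directory with one old and one fresh file.
demo_dir="$(mktemp -d)"
touch -d '30 days ago' "$demo_dir/old_entry"   # GNU touch syntax
touch "$demo_dir/new_entry"
prune_disk_cache "$demo_dir" 14
ls "$demo_dir"
```

Deleting entries this way is safe in the sense that Bazel simply rebuilds anything it no longer finds in the cache.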
Disabling the Cache
To disable the Bazel cache for a single invocation, override the cache flags on the command line. Passing an empty value to --remote_cache or --disk_cache turns off the remote or disk cache, respectively, and --noremote_accept_cached keeps the remote cache configured but tells Bazel not to reuse results from it. Bazel’s command-line reference doesn’t list --noinmemory_cache or --noremote_cache flags.
Following is an example of how to turn off both caches when running the bazel build command:
bazel build --remote_cache= --disk_cache= [TARGETS]
And here is an example of how to use the --noremote_accept_cached flag when running the bazel test command:
bazel test --noremote_accept_cached [TARGETS]
Disabling the Bazel cache can significantly increase your build and test times because Bazel must rebuild outputs that it could otherwise have fetched from the cache.
Testing Without the Cache
It’s possible to use Bazel without cached results; however, this may result in slower build times. For tests, the --nocache_test_results flag makes Bazel rerun tests even when a cached result exists. For builds, Bazel has no --nocache flag; use --noremote_accept_cached to ignore remote cache hits, or run bazel clean first to discard local outputs. Here are examples:
bazel build --noremote_accept_cached //path/to/package:target
bazel test --nocache_test_results //path/to/package:target
Remember that skipping the cache causes Bazel to redo work it could otherwise have reused. Use the cache unless there is a specific reason not to, such as the following:
- A cache contains outdated or invalid data, and you want to force Bazel to rebuild all dependencies from scratch.
- A rebuild of all necessary components is required based on changes to build configurations or dependencies that must be fully reflected in the build.
- For debugging reasons, you need the full output of the build process.
Conclusion
Using a remote cache for Bazel significantly improves your builds’ performance and reliability, and it makes storing and accessing build artifacts from multiple machines and locations easy.
In this guide, you learned how Bazel caches builds. After reading, you should be able to set up a local and a remote cache to improve Bazel build performance. For more information, see Bazel’s command-line reference.