Ucacher: Speeding up GitHub Actions via syscall instrumentation

Ignacio del Valle Alles

TL;DR: Ucacher automates caching and skipping in GitHub Actions using syscall instrumentation, eliminating manual configurations and errors. It tracks exact file dependencies, skips redundant steps, and restores outputs for faster, more efficient workflows. Tested on the React repo, Ucacher delivered a 2x speedup.

Traditional CI/CD workflows, particularly in GitHub Actions, rely on manually configured caching and skipping techniques to reduce redundancy and improve efficiency. While effective in basic scenarios, these methods often fall short in complex workflows, requiring developers to manually manage cache keys, paths, and conditional logic, a process prone to errors and inefficiencies.

In this post, we’ll explore the current standard for implementing caching and skipping in GitHub Actions, and then introduce Ucacher, a tool we’ve recently developed to automate these tasks at the command level, removing the need for manual setup and delivering greater precision.

How it’s done today: the manual approach

This section describes how skipping and caching are typically implemented in GitHub Actions. Feel free to skip it ;) if you are already familiar with these techniques.

Skipping and caching are two techniques used in CI/CD pipelines to avoid redundant operations and reduce execution time, which translates into faster iteration loops for devs and reduced cloud infrastructure costs.

Caching

Caching involves storing files or directories in a persistent storage location so that they can be retrieved and reused in future executions of the workflows.

actions/cache is the official GitHub action for caching files. It works by caching a set of directories or files specified by the path parameter, and identifying their state by a unique key.

When a workflow runs, it attempts to restore the cache using this key or some fallback keys, and if a match is found, the files are downloaded and restored. If no cache is found, a new cache is created at the end of the workflow. In other words, it allows you to store immutable sets of files and retrieve them later by their key.

This helps reduce time spent on repetitive tasks like downloading dependencies or recompiling code. It’s a useful pattern when you know how to generate the key from your build context.

For example, in step 2 of the workflow below, the cache is used to restore the contents of ./node_modules. The cache key includes the runner’s OS and the hash of package-lock.json. If package-lock.json changes, the key updates, and the cache is invalidated as expected, ensuring ./node_modules matches the new dependencies.

Notice also that, if an exact match is not found, we can still fall back to a restore key that could provide a partial restoration of the folder and still avoid some dependency downloads.

steps:
  - name: Step 1 - Checkout code
    uses: actions/checkout@v3
    
  - name: Step 2 - Restore dependencies
    uses: actions/cache@v3
    with:
      path: ./node_modules # Directory to cache
      key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
      restore-keys: ${{ runner.os }}-node-
  
  - name: Step 3 - Install dependencies
    run: npm install
  
  - name: Step 4 - Build project
    run: npm run build

Restoring ./node_modules from the cache will significantly reduce the runtime of npm install, since all the dependencies will be found locally.

Skipping

Skipping ensures that unnecessary steps are not executed, based on certain conditions. This reduces runtime and resource consumption. In GitHub Actions, conditional job runs allow you to control whether a job or step is executed based on specific conditions using the if keyword. This enables dynamic workflows, optimizing execution by skipping unnecessary jobs or steps based on context information.

Combining caching and skipping in GitHub Actions is typically done by:

  1. First caching
  2. Then skipping if a cache hit was found

Here is how the previous example can be improved to completely skip Step 3a when there is a cache hit on Step 2a:
steps:
  - name: Step 1a - Checkout code
    uses: actions/checkout@v3

  - name: Step 2a - Restore dependencies
    id: cache-dependencies
    uses: actions/cache@v3
    with:
      path: ./node_modules # Directory to cache
      key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
  
  - name: Step 3a - Conditionally install dependencies
    if: steps.cache-dependencies.outputs.cache-hit != 'true'
    run: npm install
  
  - name: Step 4a - Build project
    run: npm run build

Now we can completely skip the execution of Step 3a, further reducing the runtime of the build. Notice that in this case no restore key is specified, since an exact match is required for Step 4a to find the proper folder contents.

Note

Skipping can also be configured at the workflow level using the paths and paths-ignore clauses to filter triggering conditions. However, we will focus on conditional skipping, as it can be combined with caching to achieve the same outcomes and offer additional flexibility.
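
For completeness, this is what path-based trigger filtering looks like; this snippet prevents the workflow from running at all for documentation-only changes (note that GitHub Actions does not allow combining paths and paths-ignore for the same event, and the ignored paths here are just illustrative):

on:
  push:
    paths-ignore:
      - '**/*.md'
      - 'docs/**'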

Beyond caching dependencies

The previous section demonstrated the standard approach of combining caching and skipping, specifically for caching dependencies.

While dependency caching is widely useful, dependency resolution may only account for a small portion of the total runtime in a CI job. This raises an important question: can we extend this pattern to skip other, more time-consuming steps?

Let’s revisit the earlier example and focus on the npm run build step, which takes a significant amount of time to complete. If possible, we’d like to skip this step when the conditions allow.

To achieve this, the first step is to identify the outputs generated by the build step and the inputs it relies on. In this case, the command produces a ./build folder as output, based on the dependencies (package-lock.json) and the source files located in the ./src directory.

This insight leads to a new workflow definition that:

  • First, it tries to fetch the ./build output folder based on a key that considers all the input files
  • Then, if a cache is found and the output folder is restored, it skips all successive steps
  • Otherwise, it behaves like the previous iteration of the example
steps:
  - name: Step 1b - Checkout code
    uses: actions/checkout@v3

  - name: Step 2b - Restore build
    id: cache-build
    uses: actions/cache@v3
    with:
      path: ./build
      key: ${{ runner.os }}-node-build-${{ hashFiles('**/package-lock.json') }}-${{ hashFiles('src/**') }}
  
  - name: Step 3b - Conditionally restore dependencies
    id: cache-dependencies
    if: steps.cache-build.outputs.cache-hit != 'true'
    uses: actions/cache@v3
    with:
      path: ./node_modules # Directory to cache
      key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
  
  - name: Step 4b - Conditionally install dependencies
    if: steps.cache-build.outputs.cache-hit != 'true' && 
        steps.cache-dependencies.outputs.cache-hit != 'true'
    run: npm install
  
  - name: Step 5b - Build project
    if: steps.cache-build.outputs.cache-hit != 'true'
    run: npm run build

As you can see here, we’ve changed the path parameter to ./build, since that is the folder that the command generates, and augmented the key to also include the hashes of all files under src, since any of them might potentially be used by the build command.

This example now presents the advantage that the build step is also skipped when the workflow runs a second time over the same package-lock.json and ./src folder contents, for example when you change your README.md file or when you roll back a commit. However, notice how the workflow readability has degraded.

The problem: Broad cache keys

Now, let’s explore a more advanced scenario where the job uses a matrix strategy to divide test execution across multiple parallel runners:

build:
  name: yarn build and lint
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard:
      - 1/5
      - 2/5
      - 3/5
      - 4/5
      - 5/5
  steps:
  - name: Step 1c - Checkout code
    uses: actions/checkout@v3

  - name: Step 2c - Restore dependencies
    id: cache-dependencies
    uses: actions/cache@v3
    with:
      path: ./node_modules # Directory to cache
      key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
  
  - name: Step 3c - Conditionally install dependencies
    if: steps.cache-dependencies.outputs.cache-hit != 'true'
    run: npm install
  
  - name: Step 4c - Run tests
    run: npm run test --shard=${{ matrix.shard }} --coverage

This is how the previous job definition works: at runtime, it creates 5 different runners (one per matrix element), and each runner runs a job with a different matrix.shard value.

Suppose also that npm run test --shard=i/n works the following way:

  • It takes all the files in test/ and deterministically splits them into n groups
  • Then, it runs the tests in the i-th group (see the sketch below)
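
As a toy illustration of such a deterministic split (real test runners like Jest implement their own sharding strategy; this assumes a simple round-robin scheme over a stable file ordering), shard i of n could be computed like this:

# Hypothetical round-robin sharding: sort for a stable order, then keep
# every file whose position falls in group i of n.
i=3; n=5
find test -name '*.test.js' | sort | awk -v i="$i" -v n="$n" 'NR % n == i % n'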

Now, suppose these commands take a significant amount of time to run, and we are interested in skipping them.

As before, we’d need to identify the input and output files the npm run test --shard=${{ matrix.shard }} --coverage commands operate on.

While we know each runner will use a different subset of files in src, we don’t know exactly which ones they are, so we have to consider the whole folder for keying purposes. This is what the new caching step would look like:

- name: Step 2d - Restore tests
  id: cache-build
  uses: actions/cache@v3
  with:
    path: ./coverage
    key: ${{ runner.os }}-node-build-${{ hashFiles('**/package-lock.json') }}-${{ hashFiles('src/**') }}-${{ matrix.shard }}

This setup is effective for caching reruns but has a significant limitation: any change in the src/ folder invalidates the cache for all shards. Even a minor modification affecting only one shard will cause all shards to re-run, reducing the overall efficiency of caching.

This lack of precision results in wasted resources and longer execution times. Ideally, only the steps impacted by changes should run, while unaffected shards skip execution entirely.

To solve this problem we’ve built Ucacher.

Ucacher

Ucacher is a CLI tool that understands the inputs and outputs of each command in your build, infers whether a command is impacted by the source file changes, and skips its execution accordingly.

It improves the user experience offered by actions/cache in the following ways:

  • No parametrization is required
    • No human errors involved
    • No prior knowledge of the command to cache is required
  • Finer-grained caching: at the command level rather than the step level
  • Greater precision: if the modified files triggering the build don’t affect the command, it will be skipped
  • Automatic output caching: it restores the files written by the command in case of a cache hit (skip)

This is how the previous example could be cached with Ucacher:

build:
  name: yarn build and lint
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard:
      - 1/5
      - 2/5
      - 3/5
      - 4/5
      - 5/5
  steps:
  - name: Step 1e - Checkout code
    uses: actions/checkout@v3
  
  - name: Step 2e - Install ucacher
    uses: earthly/setup-ucacher@main
  
  - name: Step 3e - Conditionally install dependencies
    run: ucacher npm install
  
  - name: Step 4e - Conditionally run tests
    run: ucacher npm run test --shard=${{ matrix.shard }} --coverage

For example, suppose you now push a change to a test file belonging to the 3rd shard. Ucacher automatically detects that this file was not used in any previous runs for shards 1, 2, 4 and 5, skips them all, and only runs the tests for the 3rd shard.

How it works

Linux syscalls

In Linux, system calls (syscalls) are the primary interface between user-space applications and the Linux kernel. They allow applications to request services or resources from the kernel, which manages access to hardware and enforces security, process isolation, and memory management. Syscalls provide applications with controlled access to the system’s low-level functions without compromising security or stability.

Commands executed in your build are examples of user-space applications. To interact with the filesystem —such as reading from or writing to a file— these commands rely on syscalls. The kernel processes these requests, performing the necessary operations on behalf of the application.
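
You can observe this yourself with strace, which (like Ucacher) is built on ptrace. The following command, shown here purely for illustration, logs the file-related syscalls made by npm install and its child processes:

# -f follows child processes; -e trace=%file selects file-related syscalls.
# strace writes its trace to stderr.
strace -f -e trace=%file npm install 2> syscalls.log
# syscalls.log will contain lines of the form:
#   openat(AT_FDCWD, "package.json", O_RDONLY|O_CLOEXEC) = 3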

ptrace

ptrace(2) is a special syscall that allows a process (typically a debugger) to observe and control the execution of another process. It provides powerful functionality for debugging and instrumentation, such as inspecting and modifying memory, registers, and system calls of a target process.

The idea

Ucacher launches the command and uses ptrace to monitor the file-related syscalls triggered from its process tree.

When a file is opened, Ucacher halts the process execution and registers the file access. Depending on the file access mode, it tags the file as an input or an output file. For input files, it computes their hash to capture the initial state.

After the execution completes successfully, it uploads output files to persistent storage (e.g., GitHub Actions cache) along with some build metadata, in particular the hashes of all input files at the time they were read.

On subsequent runs, Ucacher checks for matching initial conditions (input files content, arguments, environment variables, system architecture, etc.). If they match, it skips execution and restores the cached output files instead.
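
Conceptually, each cache entry pairs a command’s initial conditions with its outputs. A hypothetical entry (Ucacher’s actual metadata format is internal and may differ) could look like this:

# Hypothetical cache entry, for illustration only
command: npm run test --shard=3/5 --coverage
os: linux-x86_64
env_hash: 9f2c...          # hash of the relevant environment variables
inputs:                    # files read, with the content hashes observed
  - path: package-lock.json
    sha256: 4b0a...
  - path: test/foo.test.js
    sha256: c1d2...
outputs:                   # files written, stored alongside this metadata
  - coverage/lcov.info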

The insight

While it’s impossible to know in advance which files an arbitrary command will access during execution, Ucacher is based on the observation that, given a past execution, we can check whether its initial state still holds in the current running environment and, if so, conclude that a new execution would produce the same results.

As a consequence, the approach to identifying matching initial conditions differs from the traditional one-hop cache lookup. Ucacher employs a two-phase lookup process instead:

  1. First, retrieve all cached entries associated with the specific combination of: command, environment, operating system, and branch.
  2. Then, for each potential candidate, verify whether the hashes of the input files from that execution match the current filesystem state. As soon as one candidate matches, we have a cache hit and can stop looking (see the sketch below).
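
Here is a minimal sketch of the second phase, assuming the candidate’s recorded input hashes were saved as "sha256  path" lines (the checksum format understood by sha256sum):

# Verify one candidate's recorded input hashes against the current filesystem
if sha256sum --check --quiet inputs.txt; then
  echo "cache hit: inputs unchanged, restore outputs and skip the command"
else
  echo "cache miss: some input changed, run the command for real"
fi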

Preliminary results

To evaluate the real-world effectiveness of Ucacher, we tested it on the well-known facebook/react repository. With its complexity and already highly optimized workflows, it provided an excellent benchmark for assessing Ucacher’s capabilities.

In particular, we identified the yarn build step of the “yarn build and lint” job as a good candidate since it:

  • takes a significant amount of time to complete.
  • uses a matrix strategy to shard the load across 38 jobs/runners.
  • doesn’t open files from bundles belonging to another shard.

This is how a single one of those 38 jobs performs without Ucacher involved. As you can see, yarn build takes most of the time:

[Image: job details]

So we introduced Ucacher as:

[Image: ucacher diff]

to completely skip that step when the same conditions are met, as shown here:

[Image: ucacher job]

Performance

We’ve dedicated significant effort to optimizing Ucacher’s performance. While this could be a great topic for a future blog post, one key takeaway is this: yes, ptrace can be highly efficient, especially when combined with seccomp filters [1], which wake the tracer only for the specific syscalls it cares about instead of stopping on every one.

We measured Ucacher’s impact on the “yarn build and lint” job by comparing:

  1. Without Ucacher: The default workflow without caching or skipping enhancements.
  2. With Ucacher: The optimized workflow using Ucacher.

Ucacher reduced the total CI time for all shards from 57m to 31m. So overall, almost 2x faster. Let’s see how:

Cache Miss (First Run)

On a cache miss, Ucacher adds a small overhead due to dependency tracking and caching:

  • Baseline runtime: ~1m30s per shard.
  • Ucacher runtime: ~1m40s per shard (~10% overhead).

This overhead is caused by hashing and uploading files, and we’re actively optimizing it.

Cache Hit (Subsequent Runs)

On a cache hit, Ucacher delivers substantial time savings:

  • Baseline runtime: ~1m30s per shard, regardless of file changes.
  • Ucacher runtime:
    • Unaffected shards: ~40s (skipped builds).
    • Affected shard: ~1m40s (rebuild).

Precision

To test Ucacher’s precision, we made isolated changes to specific files:

  • Change in source file: A line in packages/react-noop-renderer/src/ReactNoopServer.js was modified.
    • Without Ucacher: All 38 jobs rerun the build unnecessarily, resulting in 57m of billable runtime.
    • With Ucacher: Only 4 of the jobs rerun the build; the rest skipped it, resulting in 31m of billable runtime.
  • Change outside build scope: A README.md file was modified.
    • Without Ucacher: All 38 jobs rerun the build unnecessarily.
    • With Ucacher: None of them rerun the build, resulting in 28m of billable runtime.

Screenshots

Details on the runtime taken for both cases when ReactNoopServer.js is changed:

Without Ucacher

[Image: base jobs]

With Ucacher

[Image: ucacher jobs]

Caching introduces some overhead, and Ucacher is no exception. On a cache miss, performance may be slightly slower than the baseline due to the additional steps of identifying, uploading, and indexing artifacts. As a result, it’s important to apply caching thoughtfully, targeting scenarios where the benefits of cache hits outweigh the occasional penalties of misses. Ucacher has already proven to be highly effective at reducing overall computation costs, especially in:

  • Workflows with task sharding, such as compilation and testing, where workloads are distributed across multiple shards.
  • Monorepos, where changes often have a limited scope, allowing for precise caching and skipping.

Summary

In summary, Ucacher simplifies and optimizes CI/CD workflows by making caching and skipping smarter and more precise:

  • Higher precision: Tracks exact file dependencies, skipping commands only when truly unaffected.
  • Automation: Eliminates manual cache keys and reduces configuration errors.
  • Efficiency: Minimizes redundant steps, saving time and reducing compute costs.
  • Handles complexity: Excels in large repositories, matrix builds, and multi-stage pipelines.
  • Seamless integration: Works out of the box with tools like GitHub Actions.

We’re eager to share Ucacher with the developer community and would love to hear your feedback. Whether it’s suggestions, feature ideas, or insights from your own use cases, your input is essential to helping us improve and evolve it.

Give it a try and let us know how it works for you!

Visit setup-ucacher on GitHub to get started!


  1. Filter and Modify System Calls with seccomp and ptrace

Ignacio del Valle Alles
Nacho is an experienced software engineer with a passion for bringing value to fellow developers. Besides being an Earthly advocate and a key member of our core team, he is also the founder and CTO of BatchX, a platform that helps research teams run their bioinformatics analyses at scale and algorithm creators monetize their tools. You can find him at [GitHub](https://github.com/idelvall) and as @Nacho in our [Slack community](https://earthly.dev/slack).
