We love to talk about the value of observability in accelerating feedback loops by enabling teams to understand what changes they need to make to software. But a barrier that often holds teams back from completing the feedback loop is how long it takes to actually get feedback on code under development, or to push code into production. If it feels like you’re operating the Mars Rover manually, with a minutes-long delay between issuing a command and seeing the result, it doesn’t matter how quickly you can analyze and get results within the Honeycomb product.
The importance of short build times
At Honeycomb, we believe in ensuring an awesome developer experience for our own engineering teams; it’s the only way we can compete with companies that have far more engineers than we do. We’ve historically tried to keep our build times under 15 minutes, taking percussive action whenever they crept over that threshold. In fact, we have an internal Service Level Objective (SLO) on our build data that alerts us if more than 5% of builds take longer than 15 minutes!
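For context, a Honeycomb SLO is driven by a derived column that marks each event as passing or failing the SLI. A minimal sketch of what an SLI like ours might look like, assuming build-level events carry a duration_ms field (the field name is illustrative):

```
LT($duration_ms, 900000)
```

Each build event either satisfies that 15-minute (900,000 ms) check or it doesn’t, and the SLO alerts us when the failure rate threatens the 5% budget.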
But it turns out that 15 minutes is still not a great experience; we found devs would task-switch rather than wait for builds to complete, losing valuable state and context in their heads. With build times under six to seven minutes, developers stick around to watch their builds and are ready to rapidly correct CI failures. The state never gets paged out of someone’s head as they turn to Slack or pull the next ticket while waiting for the build.
Switching CI providers for speed and reliability
We’ve taken major action before to protect those build time goals: between June 2018 and May 2019, the median time it took to build Honeycomb doubled from seven minutes to nearly 14 minutes. That prompted us to switch from TravisCI, which offered limited concurrency, to CircleCI, which offered better concurrency.
Since then, we’ve leveraged CircleCI’s concurrency and caching features to keep our tests running as quickly as possible while our codebase and developer count have grown. We went from 10-20 pull requests landing on the main branch per week to 100+ consistently every week. The total number of lines of code in our codebase approximately doubled over the past five years, increasing the amount of work for both compilation and unit testing.
Build times climbed again
By 2023, things started to get considerably creaky. Over 25% of commits were taking longer than 12 minutes to pass CI, even worse than the threshold that caused us to abandon TravisCI. We tried increasing the number of parallel test runners, but saw very little marginal benefit (thank you, Amdahl’s Law) since each test runner still had to do the same setup work. Radical action was needed to bring our test times back to our previous norm.
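Amdahl’s Law makes that ceiling concrete: if only a fraction p of a runner’s wall time is parallelizable test execution and the rest is fixed setup, the maximum speedup from n runners is bounded. The numbers below are purely illustrative, not measurements from our pipeline.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
% Example: with p = 0.7 (30% fixed setup per runner), even n -> \infty caps
% the speedup at 1 / 0.3, roughly 3.3x, so doubling runners barely helps.
```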
I hit my own personal breaking point with build times after an outage, while investigating a strange stack corruption issue that we thought was in the Go runtime, but turned out to be a bug in the x86_64 assembly of a library we depend upon. It was maddening to wait 15 minutes for every build to complete, deploy it, test whether the bug reproduced, and then repeat the whole cycle to bisect the problem.
As Field CTO, I try not to get in the path of time-sensitive feature delivery, but this smelled like a problem that was both important and had no specific urgency to solve—precisely my jam. Attacking this problem became a key focus of my Q4.
Let’s fix these build times!
The first step was to study traces from our CircleCI buildevents orb to understand where all of the time was going. The tests were actually among the faster parts of the build; the binary build and artifact creation/upload steps were the slowest. Why?
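If you haven’t used buildevents before: it wraps each command in a job so that every step and command becomes a span in a Honeycomb trace. A rough sketch of the underlying CLI, which the orb wraps for you; the ID variables and span name here are illustrative:

```bash
# Wrap a command so it is reported as a span within the current build step.
# BUILD_ID and STEP_SPAN_ID are generated earlier in the job; "go-build" is
# just an illustrative span name.
buildevents cmd "$BUILD_ID" "$STEP_SPAN_ID" go-build -- go build ./...
```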
Build steps
To our surprise, the main issue holding back our performance was transferring files between build steps. Nearly 3.5 GB of data was being passed between intermediate steps and the final jobs that uploaded the built artifacts. That data was being gzipped, with no option to use a stronger compressor like zstd, so meaningful size reductions purely through compression were not possible.
Instead of trying to compress the data, I started thinking about whether we could avoid moving all that data around entirely. The primary artifacts we need to ship to our Kubernetes cluster are Go binaries and bundled-and-minified JavaScript built with Webpack. Those get packaged into images pushed to ECR, plus archival unstripped copies of the binaries uploaded to S3. But once we had moved all of our services to containers and Kubernetes, there was no longer any need to both upload stripped binaries in tarballs to S3 and bake those same binaries into the ECR images. What if we could just build everything in one single step and avoid shipping state around?
Docker Bake
The first thing we considered was an omnibus build step: run the Go build, then the JavaScript build, then bring everything together into build artifacts. While this minimized data transfer, it was horrifically inefficient to run the steps serially, waiting for each one to finish before starting the next. My research then took me toward Docker Bake, a relatively new (introduced in 2019 and popularized around 2022) method of chaining together multi-step Docker image builds.
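A heavily simplified sketch of what a Bake file for this shape of pipeline might look like; the target names, Dockerfiles, and tags are illustrative, not our actual configuration:

```hcl
# docker-bake.hcl (illustrative)
group "default" {
  targets = ["app-image"]
}

# Compile the Go binaries in one target...
target "go-build" {
  dockerfile = "build/go.Dockerfile"
}

# ...build the webpack bundle in another, in parallel...
target "js-build" {
  dockerfile = "build/js.Dockerfile"
}

# ...and assemble the final image, which only waits on the outputs it consumes.
target "app-image" {
  dockerfile = "build/app.Dockerfile"
  contexts = {
    "go-artifacts" = "target:go-build"
    "js-artifacts" = "target:js-build"
  }
  tags = ["<account>.dkr.ecr.us-east-1.amazonaws.com/example-app:ci"]
}
```

A single `docker buildx bake` on one machine then lets BuildKit schedule all of these targets concurrently.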
Parallel builds
By allowing steps to progress independently and block only when they needed files produced by a parallel build step that had not yet finished, we were able to effectively interleave multiple parallel builds on a single virtual machine. You’d think that CircleCI’s built-in support for Docker Layer Caching (DLC) would be helpful here as well, but in our experience it suffered from the same problem as passing state through the CircleCI workspace: compressing, transferring, and decompressing indiscriminate amounts of cached data via DLC took longer than either recreating that state from scratch or pulling in base layers built periodically and cached explicitly as tarballs.
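The blocking behavior falls out of how the final image’s Dockerfile consumes the named contexts from the Bake sketch above: each COPY --from only has to wait for the specific target whose files it needs. Again a sketch, with illustrative paths:

```dockerfile
# build/app.Dockerfile (illustrative)
FROM alpine:3.19

# Each COPY --from references a named context wired up in docker-bake.hcl and
# blocks only until that particular target has produced its output files.
COPY --from=go-artifacts /out/ /srv/bin/
COPY --from=js-artifacts /out/assets/ /srv/assets/
```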
Removing bottlenecks
Migrating the steps from separate CircleCI jobs running on the Docker executor into a single Docker Bake on a CircleCI machine executor let us establish what the floor of our build time ought to be; we then carefully adjusted parallelism so that the tests (which still run in normal CircleCI parallel runners, using CircleCI’s parallel test sharding helper code) take approximately the same amount of time as the Bake build. We spent a lot of time staring at graphs generated from buildevents data to understand the typical latency distribution of each step and ensure that no single job would be the bottleneck. For instance, we discovered that steps which previously had little effect on total build time, like initializing our test database for Go integration tests, now needed to be optimized themselves.
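On the test side, the sharding helper we rely on is CircleCI’s timing-based splitter; roughly, each runner does something like the following (the exact invocation in our config differs):

```bash
# Ask CircleCI to divide Go packages across parallel runners using historical
# timing data, then run only this runner's share of the packages.
PACKAGE_NAMES=$(go list ./... | circleci tests split --split-by=timings)
go test $PACKAGE_NAMES
```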
Caching network fetches
We were also able to dive into individual traces to find specific steps that contributed to tail latency and squash those down through more aggressive caching. For instance, we found that the steps needed to update the system dependencies for Puppeteer usually ran fast but had enormous variance, so it was better to pay a flat tax of a few seconds to cache the apt lists and deb packages from the day’s first run instead of re-downloading them every time.
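One way to express that kind of caching with BuildKit is a cache mount over the apt directories, so the package lists and downloaded .debs persist across builds on the same machine. This is one possible expression of the idea rather than our exact setup, and the package list is illustrative:

```dockerfile
# Illustrative: keep apt lists and downloaded .debs in BuildKit cache mounts so
# Puppeteer's system dependencies aren't re-fetched from the network each time.
FROM ubuntu:22.04

# The stock image deletes downloaded packages after install; disable that so
# the cache mount actually retains them.
RUN rm -f /etc/apt/apt.conf.d/docker-clean

RUN --mount=type=cache,target=/var/lib/apt,sharing=locked \
    --mount=type=cache,target=/var/cache/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends libnss3 libatk-bridge2.0-0
```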
Results
One of the really cool things that happened during our investigation was stitching buildevents traces at the CircleCI job and step level together with the actual Docker BuildKit command tracing data. It turns out that BuildKit has had integrated OpenTelemetry support since its inception, so by setting the TRACEPARENT environment variable to match the buildevents trace and span IDs, we were able to directly see each individual command execution, each ECR upload step, and the time spent blocked waiting for other steps to complete.
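Roughly, that wiring amounts to constructing a W3C traceparent from the buildevents trace and span IDs before invoking the build; the variable names here are illustrative, and the IDs have to be valid hex trace/span IDs:

```bash
# W3C trace context format: version-traceid-parentid-flags
# (32 hex chars for the trace ID, 16 for the parent span ID).
export TRACEPARENT="00-${BUILDEVENTS_TRACE_ID}-${STEP_SPAN_ID}-01"

# BuildKit picks up the propagated context, so its per-command spans land in
# the same Honeycomb trace as the surrounding buildevents spans.
docker buildx bake --progress=plain
```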
Even though there were a few small bumps in the process of rolling out these changes (notably, an occasion when we failed to set -euxo pipefail on a build step, which silently ignored a failure to generate the list of tests to run, so some tests were skipped until we noticed), the payoff of having build times consistently below seven minutes was well worth the effort.
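The failure mode is worth spelling out: without pipefail, a pipeline’s exit status is that of its last command, so a failing test-listing script feeding a splitter can silently produce an empty list. A sketch with illustrative script names:

```bash
#!/usr/bin/env bash
# -e: exit on error, -u: error on unset variables, -x: echo each command,
# -o pipefail: a pipeline fails if any command in it fails, not just the last.
set -euxo pipefail

# Without pipefail, a crash in list-tests.sh would be masked by the successful
# exit of `circleci tests split`, and the runner would quietly run zero tests.
./scripts/list-tests.sh | circleci tests split --split-by=timings > /tmp/tests-to-run.txt
```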
What’s next for us? We’re curious about using tools like Depot or Namespace to do DLC without having to pay the cost of hydrating/dehydrating the cache.
Ever faster!