Part 1: Is your observability a cost or an investment?
In 2018, I dashed off a punchy little blog post in which I observed that teams with good observability seemed to spend roughly 20-30% of their infra bill to get it. I also noted this was based on absolutely no data, only my own experiences and a bunch of anecdotes, heavily weighted towards startups and the mid-market tech sector.
This post should have ridden off into the sunset years ago. To my horror, I have seen it referenced more in the past year than in all preceding years combined. I’ve stumbled over it in funding announcements, analyst briefings, startup pitches, podcasts, editorials… and even serious news articles written by actual journalists.
I’ll be reading some piece of tech news or whatnot, and it’s talking about “the soaring cost of observability, which experts say can be as high as 20-30% of your infra bill” and oh shit, that’s a link to my post?
But I pulled that number out of my ass! And I said I was pulling it out of my ass!
Also: that was almost eight years ago. The world was very different then! ZIRP was in full bloom, infrastructures were comparatively simpler (and thus cheaper), and a lot of people were pursuing a “best of breed” tooling strategy where they tried to pick the best tracing tool, best metrics tool, best APM, best RUM, etc., even if they were all from different vendors. All of which drove up costs.
So how would I update my rule of thumb today, in 2025? I would lower my estimate by a little, but complicate my answer by a lot.
In 2025, how much are people paying in observability costs?
After seeing what lots and lots of people pay for observability, I'd say an estimate of around 15-25% of your infra bill is straight down the middle. This will buy you quality observability, but you typically have to make some tough choices and sacrifices. People won't get everything they want.
My rule of thumb does not scale up linearly to higher cost ranges. If your infra bill is $100m/year, you shouldn’t be paying $20m/year in observability costs—but if your infra bill is $100k/year (and your product is live and you have real, paying customers) you’re probably paying at least $20k/year. Sounds about right.
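To make the shape of that rule of thumb concrete, here is a minimal back-of-envelope sketch. The 15-25% band comes from above; the breakpoint and the lower percentages for very large infra bills are illustrative assumptions of mine, not numbers from any survey.

```python
def observability_budget_range(annual_infra_spend: float) -> tuple[float, float]:
    """Back-of-envelope observability budget range, in dollars per year.

    Illustrative only: the 15-25% band is the rule of thumb above; the
    taper for very large infra bills is an assumed adjustment to reflect
    that the ratio does not scale up linearly forever.
    """
    low_pct, high_pct = 0.15, 0.25
    if annual_infra_spend > 10_000_000:  # assumed breakpoint, not a real threshold
        low_pct, high_pct = 0.08, 0.15
    return annual_infra_spend * low_pct, annual_infra_spend * high_pct

# A live product with a $100k/year infra bill lands around $15k-25k/year;
# a $100m/year infra bill should land well below a naive 20% ($20m/year).
print(observability_budget_range(100_000))      # (15000.0, 25000.0)
print(observability_budget_range(100_000_000))  # (8000000.0, 15000000.0)
```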
People who claim that they pay less than 15%, in my experience, usually fail to include the cost of engineers and engineering time in their estimates. Or they have a very simple architecture, a system that does not change very often, and/or customer experience is not a priority or a differentiator for them. (If you claim to pay <10% of your infra bill on observability and think you have high fidelity observability, I have many questions and would love to speak with you. Seriously, would you drop me an email?)
Worth noting: I am still pegged unduly to midmarket companies, because enterprises are very tight-lipped about how much they spend on infra (and/or they genuinely don’t know). It’s actually really hard to find someone at e.g. an international bank who can tell you these things, and even harder to find someone both knowledgeable and willing.
Analysts tell me that a 10% number should be achievable for a large enterprise with discipline, agency, and executive buy-in. But I’m still searching for the evidence. (As Corey Quinn points out, you typically spend at least 7% of your infra bill on internal network transfers. It beggars belief to me that good, high quality observability could cost anything in the neighborhood of network transfer costs.)
Thank you, Gartner, for publishing some real numbers
Gartner put out a webinar on observability costs earlier this year that allows us to attach some numbers to what has been a fairly vibes-based conversation. They don’t tell us how much people are paying as a percentage of their infra bill, but they do give us some high-level data points:
- 36% of Gartner clients spend over $1M per year on observability, with 4% spending over $10M
- Over 50% of observability spend goes to logs alone
- Many enterprises are using 10-20+ observability tools simultaneously
They also give the example of one particular Gartner customer, who spent $50k/year on observability in 2009 and is now spending over $14m/year (as of 2024). If you’re wondering whether those exponential cost increases have leveled off over the past few years, the answer is no; this includes 40% year-over-year growth for the past five years.
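For a sense of just how fast that is, here is the implied growth rate, computed from those two data points alone (treating 2009-2024 as a 15-year span and using $14m as the endpoint, both approximations):

```python
# Implied average annual growth for the Gartner customer example:
# $50k/year in 2009 -> $14m/year in 2024 (roughly 15 years).
start, end, years = 50_000, 14_000_000, 15
cagr = (end / start) ** (1 / years) - 1
print(f"Implied growth: {cagr:.0%} per year")                 # ~46% per year

# And 40% year-over-year growth sustained for five years compounds to:
print(f"Five-year multiplier at 40% YoY: {1.40 ** 5:.1f}x")   # ~5.4x
```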
No wonder VPs and CTOs everywhere are pushing to contain or cap spend at all costs. But before we start blindly slashing costs, let’s take a look at what it is we’re trying to buy.
Investments pay off, costs do not
The #1 pro tip I can give anyone who wants to dramatically lower their observability spend is this: care less about the customer experience. Precision tooling for complex systems is not cheap. Is it worth paying for? I don’t know. Is observability a cost for your business, or is it an investment?
The difference is that investments pay off; costs do not. A good investment will not only pay for itself, it will deliver compounding returns over years to come. It doesn’t make sense to penny-pinch an investment that you expect to return 3x or 5x over the next two years. Instead, you should invest up until you near the point of diminishing returns. Costs that you don’t expect a return on, however, should be strictly controlled and minimized.
How can software instrumentation turn into an investment capable of generating returns? Tons of ways, which we can bucket into two groups: the external, customer-facing set of use cases, and the internal ones, linked to developer experience and productivity.
The external component is typically the one that is easier to calculate, estimate, and quantify, so let’s start there.
From a product perspective, when should you invest in observability?
It always starts with your business model. How do you make money? What are your core differentiators? What are customers sensitive to?
Does your business rely on ensuring every package gets delivered, every payment transaction succeeds, and that you can swiftly and accurately react and respond to customer complaints? Do high latency and a laggy UX translate directly into lost revenue? We've all seen things like Amazon's finding that every 100ms of latency cost them 1% in sales, or Google's finding that a half-second delay cost them 20% of their traffic, or that one in five shopping carts is abandoned due to slow load times.
If you can draw a straight line from performance improvements to revenue, then good observability should be an investment for you; every penny you invest should pay for itself, many times over.
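As a crude sketch of what that straight line can look like, here is a calculation using the oft-cited 100ms-costs-1%-of-sales figure mentioned above. The revenue number and the latency improvement are hypothetical, and the assumption that the relationship is linear and works in reverse is a simplification.

```python
def revenue_recovered(annual_revenue: float,
                      latency_improvement_ms: float,
                      sales_loss_per_100ms: float = 0.01) -> float:
    """Rough revenue impact of a latency improvement.

    Assumes the "every 100ms of latency costs ~1% of sales" figure holds,
    is linear, and works symmetrically in reverse; all inputs here are
    hypothetical and for illustration only.
    """
    return annual_revenue * sales_loss_per_100ms * (latency_improvement_ms / 100)

# Hypothetical: a $50m/year storefront shaves 200ms off checkout latency.
print(revenue_recovered(50_000_000, 200))  # ~$1,000,000/year recovered
```

If a number like that comfortably exceeds what the instrumentation and tooling cost to obtain it, the investment framing holds.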
If you’re approaching this from the perspective of a team or group of teams, start by examining the portfolio of services that you own. Are they forward-facing, under active development, subject to frequent change, latency-sensitive, or directly tied to revenue-generating activities?
If you’re working at a startup or midmarket company, you should make this decision based on your business model. If you’re working at a large, profitable enterprise, you’re going to want to define tiers of service with different levels of observability.
The one exception here is when it comes to internal tooling—CI/CD pipelines, deploy tooling, that kind of thing. This is so deeply, inextricably linked to developer productivity that lack of visibility will cause you to slow down and suffer in compounding ways. If you care about speed and agility, you need precise, granular observability around these core feedback loops. Which brings us to the next point.
From an engineering perspective, when should you invest in observability?
The internal component is related to how swiftly engineering teams can ship, iterate, and understand the changes that they’ve made in production. This one is harder to quantify, identify, or even see.
I sometimes think of this as the dark matter of software engineering: the time we can't see ourselves wasting because we can't see it. All we can see is the endless toil and the obstacles in our way.
Observability isn’t just about tracking errors and latency. It’s the grease, the lubrication, the sense-making of software delivery. When you frontload good observability, it makes everything else faster and easier. When sense-making itself is difficult, fragile, unintuitive, or requires a high bar of expertise or additional skill sets, it becomes a barrier to entry and a drag on development.
When should you invest in observability from an engineering perspective? I might be biased here, but I think the answer is “always, in theory.” What engineering org doesn’t want to ship faster and understand their software? In reality, it may not make sense to invest in better tooling if there’s no organizational alignment or commitment from management to follow through.
Why exactly are costs soaring? Why now?
The three big reasons Gartner cites under “Reasons for Increasing Costs” are:
- Organic growth
  - Reality: costs increase linearly with business growth
  - Desire: disassociate rising application/infra costs from observability costs
- Telemetry complexity
  - Orders-of-magnitude increases in the quantity of data
  - New telemetry types: logs, metrics, traces, frontend, RUM, etc.
  - Overwhelming noise, which requires an analytics platform to make sense of
- Increased expectations
  - Growing dependency
  - Rapid adoption across organizations
I have a bone to pick with the first one. They claim that costs increase linearly with business growth, but the problem is that observability costs are increasing at an exponential rate, detached from business growth. I think cost increases pegged to the rate of business growth would make everyone very happy.
The second one seems like a bit of a truism. Costs are rising because we’re collecting more data and more types of data? Ok, but why?
The meta reason that all of this is happening is that our systems are soaring in complexity and dynamism. I gave a keynote way back in 2017 where I talked about the Cambrian explosion in complexity that our systems were undergoing, and things have only accelerated since then.
Back in 2009, when our friend from the Gartner example was spending $50k/year on observability, they probably had a monolithic application, a web tier, and a primary database, and the tools they paid for told them whether it was up or down, what the latency was, and gave them a bunch of low-level system metrics like CPU, memory, and disk. Which brings us to reason number three.
The not-so-benevolent reasons why costs may be exploding
Those are my good-faith arguments for why observability bills have reached astronomical heights in the past few years, but it's easy to come up with some not-so-benevolent reasons why vendors may be ratcheting up the costs. Here are a few of them:
- They may be passing on the high costs of their own technical cost drivers. We haven’t talked about technical cost drivers at all in this article (that will be coming next week, in part two!), but any vendor that was built using the multiple pillars (“observability 1.0”) model is paying to store your data many times, in many different formats.
If you’re using 10 different tools from an observability platform, this means they’re storing your data (at least) 10x for every request that enters your service. They pay for that 10x, and so do you. They also have to staff up 10 different development, product, design teams, market 10 different products, etc., and then pass all those cost multipliers along to you.
- Because they can. If companies will pay it, vendors will charge it. If a company was built on the assumption that it can charge particular margins, its entire business model comes to depend on it. Your investors expect it, your operating costs expect it. I am not a business cat, but my understanding is that this can be an extremely tough thing to adapt (I think this is what they call the “Innovator’s Dilemma”).
- Data has gravity, even in an OTel-ified world. Changing vendors is a pain in the ass. It’s a lot of labor that could otherwise go to value-generating activities, and you have to update your links, dashboards and workflows, train everyone in the new system, field complaints… It’s just no one’s idea of a good time. Vendors know this, alas.
I also recently heard an intriguing hypothesis from Sam Dwyer, of Nine.com.au. He said,
“I think it’s impossible for vendors to provide the meaningful value people are looking for in their observability tools, because you can’t get that depth of introspection into your systems without manual instrumentation—all you can get is breadth.
But people don’t want to hear that, so from the very first sales conversation, vendors promise their customers that they can just “drop in our magic library/agent/whatever, and everything magically works!” Then they get stuck in a product development cycle that is entirely dependent on auto-instrumentation output, because trying to build any other features breaks the initial contract. Even if it would result in a better outcome for the customer and a better product in the long term!
So customers keep saying ‘we need more value’ and vendors keep frantically trying to provide more value without having to go back to the customer and saying, “To do that, you need to add some manual instrumentation” so they keep building more and more features on top of their platform which provide more and more ‘surface’ observability without the depth that customers actually need and want.”
I don’t know exactly how prevalent this situation is, or how to quantify it, but it certainly plays right into my existing beliefs and biases, so I had to include it. 😉
It’s good that people are learning to rely on observability
Let's talk about those "increased expectations" for a moment. Gartner reports that companies are seeing rapid adoption of observability tooling across the organization and are growing increasingly dependent on it. Two client quotes:
“Without our observability tool, we realized that we were blind.”
“Once implemented, it spread like wildfire.”
This is a good thing. It’s an overdue reckoning with what has long been the case: that our systems are far too complex and dynamic to understand by reasoning about the system model we hold in our heads.
Without realtime, interactive, exploratory observability at every level of the system, we are flying blind. Observability is not just an operational concern, it’s a fundamental part of the way we build software, serve users, and make sense of the world around us.
To some extent, the rapid spread of tooling and instrumentation is us playing catchup with reality. It reminds me of the early days of moving to the cloud, when it was the Wild West and we didn’t yet have best practices and rules of thumb and accounting tools. We’ll get there.
Who owns your observability bill?
I mentioned earlier the difference between costs and investments. If your observability tools budget is owned by IT or rolls up to the CIO, it’s going to get managed like a cost center. Not for any nefarious reasons, just because that is their skill set. This is what they do.
If you want your observability to be a differentiator for customer experience or engineering excellence, it needs to be managed like an investment. This means bringing it in under the development umbrella (or, interestingly, the product or digital umbrellas).
There are very few blanket recommendations I will make, due to the sprawling complexity of the topic, but this is one of them. Move your observability budget under engineering, or some other org that knows how to manage tools as investments, not cost centers.
I’ve seen so many engineering orgs kick off well-intentioned, seemingly well-resourced transformation agendas, only to see them founder due to poor sense-making abilities and lossy feedback loops.
It really is that important.
The observability cost crisis is a rare window of opportunity
I believe that the work of controlling and managing costs can go hand in hand with making your observability better.
This doesn’t have to be a choice between spending more money and getting better outcomes vs spending less money and getting worse outcomes. In fact, it shouldn’t be.
The more useless data you collect, the harder and slower it gets to analyze that data. The more widely you scatter your critical data across multiple disconnected tools, pillars, and data formats, the worse your developer experience gets. The more fragmented your developer experience, the more work it takes to reconcile a unified view of the world and make good decisions.
These top-down mandates for cost control can actually be an enormous opportunity in disguise. Under normal operating conditions, it can be hard to summon enough organizational will to support the amount of work it takes to transform internal tools. Every team is already so busy, with features to ship and deadlines to hit, that allocating developer cycles to internal tooling takes last priority, and is the first to get bumped.
The cost crisis changes this. This is a rare opportunity to rethink the way we instrument and the tools we use, and make decisions that lay the groundwork for the next generation of software delivery. A window like this comes along only once in a while and does not stay open for long. We should not waste it.
Easy to say, harder to do. ☺️
Next week, we will publish the second half of this piece, which will be a practical guide to the cost drivers for both observability data models—the multiple pillars model and consolidated storage model (also called “observability 2.0”)—and levers for controlling those costs.