Observability: Are You Measuring What Actually Matters?

Old observability metrics like uptime and MTTR aren't enough anymore. Teams must connect technical signals to business outcomes, especially as AI raises the stakes.

By: Colin Burke

June 15, 2026

Observability has always been important, and like any core capability in your business, its value needs to be understood.

For years, the value of observability was predictable: uptime, error rates, MTTR, and often tool consolidation. That was enough to show progress. These are foundational, table-stakes metrics. They still matter, but they aren’t enough.

The gap is becoming harder to ignore

The systems being operated now are more complex, more distributed, and increasingly shaped by AI. In this world, the question is not just whether a service is up, or whether an incident was resolved quickly. It is whether the system is behaving as intended, whether the customer experience delivers value, and whether the business can point to a return worth defending.

We hosted a roundtable at LDX3 in London, and one of the themes that struck me was that “slow is the new down.” In many of the examples, outages were less common than degraded experiences. Participants in healthcare and EV charging described situations where systems were technically available but slow enough to create material user harm, frustration, or business risk.
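To make the gap between "up" and "good" concrete, here is a minimal sketch in Python. The `Request` record, the 1,000 ms threshold, and the sample data are all assumptions for illustration, not figures from the roundtable; the point is only that the same traffic can score perfectly on availability and poorly on experience.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int         # HTTP status code returned to the user
    duration_ms: float  # end-to-end latency the user experienced


def availability(requests: list[Request]) -> float:
    """Fraction of requests that did not fail with a server error."""
    ok = sum(1 for r in requests if r.status < 500)
    return ok / len(requests)


def good_experience(requests: list[Request], slow_ms: float) -> float:
    """Fraction of requests that succeeded AND finished within slow_ms."""
    good = sum(1 for r in requests if r.status < 500 and r.duration_ms <= slow_ms)
    return good / len(requests)


# Hypothetical sample: every request succeeds, but three of ten are very slow.
durations = (120, 140, 95, 3100, 110, 4200, 130, 105, 5000, 125)
sample = [Request(status=200, duration_ms=d) for d in durations]

uptime_view = availability(sample)
experience_view = good_experience(sample, slow_ms=1000)  # threshold is an assumption
print(f"availability: {uptime_view:.0%}, good experience: {experience_view:.0%}")
```

On this sample the uptime view reports 100% while only 70% of requests were fast enough to count as a good experience. Where the threshold sits is a product decision, not an engineering default.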

For that reason, the scorecard has changed and table stakes won’t cut it. It’s easy to fall into the trap of assuming that the familiar metrics are the complete set. Uptime, MTTR, and engineering productivity are useful because they are quantifiable, understood, and can be benchmarked. They prove competence. But what they don’t automatically show is consequence. They don’t explain what the business got out of that competence. They imply it, but don’t show it concretely, and they don’t show why the next increment of investment should matter.

Why is that distinction important now? The old answers are often enough for practitioners, and maybe for a CTO, but they are much less persuasive for the wider set of stakeholders who now have a say in observability. Product wants to know whether the customer experience improved. Security wants to know whether the new system behavior is visible and governable. Finance wants to know not just what the platform cost, but what it earns or protects. The same telemetry data may support all of those conversations, but only if the value story and its measurement are built deliberately.

Where organizations get stuck

Organizations have always struggled to measure the value of observability, not because they didn’t know the metrics, but because they struggled to quantify them and establish a baseline. Add a gap in knowing which measures matter most, and there is a problem worth solving.

In practice, that usually shows up in a few predictable ways. Teams measure what is easy to count, or lean on anecdotes, rather than measuring what is meaningful to the business. They report operational improvements without connecting them to commercial outcomes. They can describe what happened during an incident, but not what the resulting improvement was worth 12 months later. When they get asked what observability bought them, they often end up with a handful of numbers, vibes, and anecdotes.

There is also a more specific issue hiding inside a lot of those conversations. MTTR is often treated as a hero metric, but in reality it is a lagging measure of a deeper problem. Fast resolution matters, but in many organizations, the real bottleneck is understanding what’s wrong in the first place. Rapid identification of the right issue is what changes the shape of the work.

Now, add in the world of AI. Agentic systems are not only complex in the usual distributed systems sense. They are also nondeterministic, contextual, and sometimes working on behalf of users. That means the old monitoring assumptions start to fray. A service can be available and still behave badly or provide a poor customer experience. A model can be technically healthy and commercially harmful.
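One way to check whether identification is the real bottleneck is to split MTTR into its phases. The sketch below assumes each incident record carries three timestamps (`detected`, `identified`, `resolved`); those field names and the sample incidents are hypothetical, and many incident trackers will need some mapping to produce them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    detected: datetime    # alert fired or report received
    identified: datetime  # team understood what was actually wrong
    resolved: datetime    # fix confirmed in production


def mean(deltas: list[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def breakdown(incidents: list[Incident]) -> dict[str, timedelta]:
    """Split mean time to resolve into identification and repair phases."""
    return {
        "time_to_identify": mean([i.identified - i.detected for i in incidents]),
        "time_to_fix": mean([i.resolved - i.identified for i in incidents]),
        "mttr": mean([i.resolved - i.detected for i in incidents]),
    }


# Hypothetical incidents: most of the clock is spent working out what is wrong.
t0 = datetime(2026, 1, 1, 9, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=50), t0 + timedelta(minutes=60)),
    Incident(t0, t0 + timedelta(minutes=70), t0 + timedelta(minutes=90)),
]

result = breakdown(incidents)
identify_share = result["time_to_identify"] / result["mttr"]
print(f"MTTR {result['mttr']}, share spent identifying: {identify_share:.0%}")
```

If the identification share dominates, as it does in this made-up data, a lower MTTR target on its own says little about where to invest.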

A broader expression of value

When we talk about the value story, we need to expand the framing. At Honeycomb, we think about it across five connected areas:

savings
stability
speed
satisfaction
product success

The first four are where many teams begin, because they live close to the operational side. The fifth is usually the most difficult, because it forces teams to connect observability to product, customer, and business outcomes directly.

Savings is generally seen through the lens of tool consolidation. Stability is about reducing downtime and operational fragility. Speed is about getting engineering time back, enabling more focus on delivering product value. Satisfaction looks outward to customers, partners, and downstream trust. And product success asks the most difficult question of all: what did the product do better because of this, and what was that worth? It shifts the conversation to the customer experience.

AI raises that bar

The current wave of AI investment makes observability more important than ever. To invest in gen AI effectively, organizations need to judge it on value creation, adoption, business impact, cost to scale, and governance. Observable AI has to be a part of the control plane. Think about it: if an AI system is expensive to operate, difficult to reason about, unpredictable in production, or impossible to tie back to user value, then availability metrics alone don’t justify the investment. Observability needs to help answer whether the system is behaving, what it’s actually doing in production, whether it’s trusted, and whether it’s worth the spend. That’s a much higher bar than simply proving the lights are still on.

That has also shifted the stakeholders. Where there used to be one buyer, there are now several (CTO, CPO, CISO, CFO), and they are looking at things through different lenses. Product wants evidence of user and feature outcomes. Finance wants the cost and revenue story. Leaders who can support all four stakeholders and tell a complete value story win.

A great example here is Fin.ai. Their AI customer service agent handles millions of conversations daily. The team needed more than service health dashboards to understand what was actually happening. They built a formal SLI for Fin in Honeycomb, tracking time to first token, model performance, routing decisions, and conversation-level behavior. That gave their teams a way to measure the actual customer experience, not just whether the underlying services were alive. They observed their product. It wasn’t just about helping engineers debug faster. It was about helping them evaluate whether an AI agent was doing its job well enough to deliver the customer experience they expected. As Kesha at Fin always says, “How do you say observability without saying observability?”

Another theme that came out of LDX3 was how much security is now a part of observability. In the AI world, attacks move faster and supply chain risks are higher. It’s less about forensic analysis after the fact, and more about early detection, mitigation, and real-time signal analysis, all while bringing security stakeholders right into the mix.

So, what does good look like?

As usual, it starts with a baseline. Build toward a target and link it directly to outcomes that matter outside the platform team. The strongest value stories are updated over time, shaped with customer and stakeholder input, and grounded in numbers that teams can defend in a real budget discussion.

That means we must ask better questions, such as:

What did an hour of downtime cost last year, and what does it cost now?
How much engineering capacity was recovered, and what did we ship with that time?
How did customer experience change, and can that be shown rather than claimed?
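The first two questions reduce to simple arithmetic once a baseline exists. The sketch below is one possible way to frame it; the `YearSnapshot` fields, the 46 working weeks, and every number in it are hypothetical placeholders to be replaced with figures your finance and engineering leads will stand behind.

```python
from dataclasses import dataclass


@dataclass
class YearSnapshot:
    downtime_hours: float           # customer-facing downtime over the year
    revenue_per_hour: float         # revenue at risk per hour of downtime
    toil_hours_per_eng_week: float  # firefighting hours per engineer per week
    engineers: int


def downtime_cost(s: YearSnapshot) -> float:
    """Revenue exposed to downtime over the year."""
    return s.downtime_hours * s.revenue_per_hour


def toil_hours(s: YearSnapshot, working_weeks: int = 46) -> float:
    """Engineering hours spent on reactive work over the year."""
    return s.toil_hours_per_eng_week * s.engineers * working_weeks


# Hypothetical baseline and current year; all values are illustrative only.
last_year = YearSnapshot(12.0, 20_000.0, 6.0, 40)
this_year = YearSnapshot(5.0, 25_000.0, 4.0, 40)

cost_avoided = downtime_cost(last_year) - downtime_cost(this_year)
hours_recovered = toil_hours(last_year) - toil_hours(this_year)
print(f"downtime cost avoided: {cost_avoided:,.0f}")
print(f"engineering hours recovered: {hours_recovered:,.0f}")
```

Note that an hour of downtime is priced higher in the second year of this example, which is why the first question asks for both numbers rather than a single rate. The third question, about customer experience, needs product data rather than arithmetic and is not covered here.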

It means building clearer expressions of value from the data that teams already have. It also means asking the questions you haven’t asked before. Observability is no longer about knowing whether systems are running. It’s about understanding what changed because teams could see more clearly, move more confidently, and connect technical signals to outcomes the business actually cares about. If that story cannot be told, there is a good chance the wrong things are being measured.
