Incident Review: Working as Designed, But Still Failing
A few weeks ago, we had a couple of incidents that ended up impacting query performance and alerting via triggers and SLOs. These incidents were...
Authors' Cut—Debugging with the Core Analysis Loop, and What to Build vs Buy
In this blog, we’ll dive into debugging with the Core Analysis Loop, the functional requirements for a backend datastore that make this possible, and whether...
We Learn Systems by Changing Them
In the social world, there is no outside: we participate in the systems we study. I’ve noticed this in code: when I come to an...
Why Intuitive Debugging Has Stopped Working for You
It’s harder to understand and operate production systems in 2021 than it was in 2001. Why is that? Shouldn’t we have gotten better at this...
Incident Report: The Missing Trigger Notification Emails
On November 18, between 00:50 and 00:56 UTC, an update was deployed that improved Honeycomb’s business intelligence (BI) telemetry available from our production operations environment....
Incident Resolution: Do You Remember, the Twenty Fires of September?
From September to early October, Honeycomb declared five public incidents. Internally, the whole month was part of a broader operational burden, where over 20 different...
Game Launches Should Be Exciting for Your Players, Not for Your LiveOps Team
This blog was co-authored by Amy Davis. The moment of launching something new at a game studio (titles, experiences, features, subscriptions) is a blockbuster moment...
Easily Debug Your AWS Lambda Functions With Honeycomb
With the Honeycomb extension for AWS Lambda, you no longer need to make your Lambda functions Honeycomb-aware. Today, AWS announced the general availability of AWS...
Community wins for the o11ydays!
The observability community helped us pay it forward for the o11ydays! Read about the contest winners and Honeybees' favorite responses....
Handle Unruly Outliers with Log Scale Heatmaps
We often say that Honeycomb helps you find a needle in your haystack. But how exactly is that done? This post walks you through when...
Incident Report: Investigating an Incident That's Already Resolved
Summary: On the 23rd of April, we discovered that an incident had occurred approximately one week earlier. On April 16, for approximately 1.5 hours we...
Using Honeycomb to Investigate a Redis Connection Leak
This is a guest post by Alex Vondrak, Senior Platform Engineer at true[X]. This is the story of how I used Honeycomb to troubleshoot redis/redis-rb#924...
Take huge leaps with Honeycomb for Incident Response
As engineering teams shift from delivering services on monolithic architectures to microservices and even serverless environments, developers are no longer just responsible for creating and...
OpenTelemetry: New Honeycomb Exporters
We’re really big fans of OpenTelemetry at Honeycomb. As we’ve blogged about before, OpenTelemetry is the next phase of the OpenTracing and OpenCensus projects. Instead...
The Future of Software is a Sociotechnical Problem
"Sociotechnical" I learned this word from Liz Fong-Jones recently, and it immediately entered my daily lexicon. You know exactly what it means as soon as...