Incident Review: What Comes Up Must First Go Down
On July 25th, 2023, we experienced a total Honeycomb outage. It impacted all user-facing components from 1:40 p.m. UTC to 2:48 p.m. UTC, during which...
Calculating Sampling’s Impact on SLOs and More
What do mall food courts and Honeycomb have in common? We both love sampling! Not only do we recommend it to many of our customers,...
Honeycomb + Tracetest: Observability-Driven Development
Our friends at Tracetest recently released an integration with Honeycomb that allows you to build end-to-end and integration tests, powered by your existing distributed traces....
Observability and the DORA metrics
The Accelerate State of DevOps Report highlights four key metrics (known as the DORA metrics, for DevOps Research & Assessment) that distinguish high-performing software organizations:...
Infinite Retention with OpenTelemetry and Honeycomb
Honeycomb is massively powerful at delivering detailed answers from the last several weeks of system telemetry within seconds. It keeps you in the flow state...
Reducing Mean Time to Diagnosis: How Salary Finance Uses Honeycomb to Ask the Right Questions
Salary Finance is a UK-based financial well-being employee benefit program. Over the last seven years, the company grew from a startup to a scaleup, earning...
Trace Propagation and Public API Endpoints in .NET: Part 1 (Disable All)
One of the issues with the W3C trace context is that it doesn’t define any standards for how far a trace is to propagate. If...
Anything But Tech Debt
Engineers often feel they aren’t allowed enough time to address tech debt. Product partners wonder why engineers spend so much time working on it—or at...
Automatic Instrumentation for OpenTelemetry Go
The OpenTelemetry Go project now supports automatic instrumentation via eBPF! This is a big milestone for the project and makes it significantly easier to generate...
The Evolution of Sampling in Honeycomb: Introducing Refinery 2.0
It's rare to have too much telemetry—it's not often that someone says "I wish I didn't have all this information!" However, telemetry is data, and...
Incident Management Steps and Best Practices
Incident management is the way an organization reacts to any kind of outage (security, broken code, severe weather, or anything that’s disruptive to customer service)....
Evolving by Involving
In this post, we’re going to lay out the guiding principle that unifies the diverse world of CS as we see it—and show how we...
On Becoming a VP of Engineering, Part 2: Doing the Job
Charity once said an off-hand sentence that became a mantra for my transition into the VP of Engineering role: “Directors run the company.” This was...
Observing Core Web Vitals with OpenTelemetry
Each CWV measures a specific part of the end user experience. CWV scores can help identify gaps in web page performance. Additionally, Google uses CWV...
On Becoming a VP of Engineering, Part 1: The Path to VP
In February of 2020, I was promoted from Director of Engineering to Honeycomb’s first VP of Engineering. Although Charity wrote an extremely generous public announcement,...