Blog

Category: Incident Response

Operations Incident Response

Restructuring How We Think About Alerts

Back in "Alerts Are Fundamentally Messy," I made the point that the events we monitor are often fuzzy and uncertain. To make a distinction between...

Teams & Collaboration Incident Response

Against Incident Severities and in Favor of Incident Types

About a year ago, Honeycomb kicked off an internal experiment to structure how we do incident response. We looked at the usual severity-based approach (usually...

Teams & Collaboration Incident Response

Syncing PagerDuty Schedules to Slack Groups

We’ve posted before about how engineers on call at Honeycomb aren’t expected to do project work, and that whenever they’re not dealing with interruptions, they’re...

Software Engineering Incident Response Dogfooding

Making Room for Some Lint

It’s one of my strongly held beliefs that errors are constructed, not discovered. However we frame an incident’s causes, contributing factors, and context ends up...

Incident Response

Negotiating Priorities Around Incident Investigations

There are countless challenges around incident investigations and reports. Aside from sensitive situations revolving around blame and corrections, tricky problems come up when having discussions...

Service Level Objectives Incident Response

Alerts Are Fundamentally Messy

Good alerting hygiene consists of a few components: chasing down alert conditions, reflecting on incidents, and thinking of what makes a signal good or bad....

Incident Response

Incident Review: What Comes Up Must First Go Down

On July 25th, 2023, we experienced a total Honeycomb outage. It impacted all user-facing components from 1:40 p.m. UTC to 2:48 p.m. UTC, during which...

Incident Response

Incident Management Steps and Best Practices

Incident management is the way an organization reacts to any kind of outage (security, broken code, severe weather, or anything that’s disruptive to customer service)....

Incident Response

There Are No Repeat Incidents

People seem to struggle with the idea that there are no repeat incidents. It is very easy and natural to see two distinct outages, with...

Incident Response

Should Every Incident Get a Retro?

At a recent training session, Jeli spent a great deal of time covering incident retrospectives and what makes an incident worthy of studying. My colleague...

Incident Response

How We Manage Incident Response at Honeycomb

When I joined Honeycomb two years ago, we were entering a phase of growth where we could no longer expect to have the time to...

Incident Response

Counting Forest Fires: Incident Response Metrics

There are limits to what individuals or teams on the ground can do, and while counting fires or their acreage can be useful to know...

Incident Response Debugging

Solving a Murder Mystery

Bugs can remain dormant in a system for a long time, until they suddenly manifest themselves in weird and unexpected ways. The deeper in the...

Software Engineering Operations Incident Response Debugging

Incident Report: The Missing Trigger Notification Emails

On November 18, between 00:50 and 00:56 UTC, an update was deployed that improved Honeycomb’s business intelligence (BI) telemetry available from our production operations environment....

Operations Incident Response Dogfooding Debugging

Incident Report: Investigating an Incident That's Already Resolved

Summary: On the 23rd of April, we discovered that an incident had occurred approximately one week earlier. On April 16, for approximately 1.5 hours we...

