Blog

Posts by Fred Hebert

Fred Hebert

Staff Site Reliability Engineer

Fred is a Staff Site Reliability Engineer (SRE) who has worked as a software engineer for over a decade and ended up with a healthy dislike of computers and clumsy automation. He’s a published technical author who loves distributed systems and systems engineering, and who has a strong interest in resilience engineering and human factors.

Debugging

Incident Review: Working as Designed, But Still Failing

A few weeks ago, we had a couple of incidents that ended up impacting query performance and alerting via triggers and SLOs. These incidents were...

Service Level Objectives Culture

On Counting Alerts

A while ago, I wrote about how we track on-call health, and I heard from various people about how “expecting to be woken up” can...

Best Practices

Tracking On-Call Health

If you have an on-call rotation, you want it to be a healthy one. But this is sort of hard to measure because it has...

Best Practices

OnCallogy Sessions

Being on call is challenging. It’s signing up to be operating complex services in a totally interruptible manner, at all hours of the day or...

Software Engineering

On the Brittleness of Dashboards

Dashboards are one of the most basic and popular tools software engineers use to operate their systems. In this post, I'll make the argument that...

Software Engineering

How We Define SRE Work

At the time of writing this post, I have officially been at Honeycomb for one year as a site reliability engineer (SRE). I had shared...

Software Engineering Debugging

Incident Resolution: Do You Remember, the Twenty Fires of September?

From September to early October, Honeycomb declared five public incidents. Internally, the whole month was part of a broader operational burden, where over 20 different...

Service Level Objectives Dogfooding Databases

Data Availability Isn’t Observability

But it’s better than nothing... Most of the industry is racing to adopt better observability practices, and they’re discovering lots of power in being able...

Software Engineering

Lessons Learned From the Migration to Confluent Kafka

Over the last few months, Honeycomb’s platform team migrated to a new iteration of our ingest pipeline for customer events. Our migration to this newer...

Teams & Collaboration Observability Featured

On Not Being a Cog in the Machine

This is my first week here as the first dedicated SRE for Honeycomb, and in a welcoming gesture, I was asked if I wanted to...


