Posts by Lex Neva

LLMs

AIOps: Prove It!

I’ve read a steadily increasing stream of articles about using AI in SRE, and I have yet to find one that inspires my trust. Each...

Software Engineering, Dogfooding, Debugging

Always. Enable. Keepalives.

As part of our recent failure testing project, we ran into an interesting failure mode involving the OpenTelemetry SDK for Go. In this post, we’ll...

Software Engineering, Dogfooding

Destroy on Friday: The Big Day 🧨 A Chaos Engineering Experiment - Part 2

In my last blog post, I explained why we decided to destroy one third of our infrastructure in production just to see what would happen....

Software Engineering, Dogfooding

Deploy on Friday? How About Destroy on Friday! A Chaos Engineering Experiment - Part 1

We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in...

Incident Response

Should Every Incident Get a Retro?

At a recent training session, Jeli spent a great deal of time covering incident retrospectives and what makes an incident worthy of studying. My colleague...

Software Engineering, Culture

The Incident Retrospective Ground Rules

I joined Honeycomb as a Staff Site Reliability Engineer (SRE) midway through September, and it’s been a wild ride so far. One thing I was...
