Software Engineering
Dogfooding
Debugging
Always. Enable. Keepalives.
As part of our recent failure testing project, we ran into an interesting failure mode involving the OpenTelemetry SDK for Go. In this post, we’ll...
Software Engineering
Dogfooding
Destroy on Friday: The Big Day 🧨 A Chaos Engineering Experiment - Part 2
In my last blog post, I explained why we decided to destroy one third of our infrastructure in production just to see what would happen....
Software Engineering
Dogfooding
Deploy on Friday? How About Destroy on Friday! A Chaos Engineering Experiment - Part 1
We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in...
Incident Response
Should Every Incident Get a Retro?
At a recent training session, Jeli spent a great deal of time covering incident retrospectives and what makes an incident worthy of studying. My colleague...
Software Engineering
Culture
The Incident Retrospective Ground Rules
I joined Honeycomb as a Staff Site Reliability Engineer (SRE) midway through September, and it’s been a wild ride so far. One thing I was...