Incident Report: The Missing Trigger Notification Emails
On November 18, between 00:50 and 00:56 UTC, an update was deployed that improved Honeycomb’s business intelligence (BI) telemetry available from our production operations environment....
Incident Resolution: Do You Remember, the Twenty Fires of September?
From September to early October, Honeycomb declared five public incidents. Internally, the whole month was part of a broader operational burden, where over 20 different...
Game Launches Should Be Exciting for Your Players, Not for Your LiveOps Team
This blog was co-authored by Amy Davis. The moment of launching something new at a game studio (titles, experiences, features, subscriptions) is a blockbuster moment...
Lessons Learned From the Migration to Confluent Kafka
Over the last few months, Honeycomb’s platform team migrated to a new iteration of our ingest pipeline for customer events. Our migration to this newer...
The Future of Developer Careers
While JavaScript frameworks come and go, a change has been brewing over the last several years that will permanently change what it means to be...
Incident Review: Meta-Review, August 2020
Every once in a while, teams or systems hit an inflection point where enough things change at once and the pattern of incidents shifts. We...
Bees Working Together: How ecobee’s Engineers Adopted Honeycomb
At ecobee, adopting Honeycomb started as a grassroots effort. Engineers signed up for the free tier and quickly started sharing insights with teammates. When it...
Observability: 80% Practicing in the Next 2 Years
Observability is more than tooling. Of course, having the right tools in place so you can ask arbitrary questions about your environment, without having to...
Bring Test Engineering into your DevOps practice
What do a test engineer and a DevOps or SRE team member have in common? The reality is that different teams need to proactively understand...
Getting At The Good Stuff: How To Sample Traces in Honeycomb
(This is the first post by our new head of Customer Success, Irving.) Sampling is a must for applications at scale; it’s a technique for...
How To Make Your Customers Happy, with Eaze
"Success is a catastrophe that you have to survive." -- CJ Silverio A couple of weeks ago I had the great pleasure of hosting CJ...
From "Secondary Storage" To Just "Storage": A Tale of Lambdas, LZ4, and Garbage Collection
When we introduced Secondary Storage two years ago, it was a deliberate compromise between economy and performance. Compared to Honeycomb’s primary NVMe storage attached to...
Incident Report: Running Dry on Memory Without Noticing
On November 6, 2019, we intermittently rejected 1-3% of customer telemetry data at ingest for four periods of 20 minutes each. The trigger of the...
OpenTelemetry vs OpenTracing | Understanding OpenCensus and Related Terms
There’s been a fair bit of buzz lately about OpenTelemetry, which is the next major version of the OpenTracing and OpenCensus projects. The leadership of...
Working Toward Service Level Objectives (SLOs), Part 1
In theory, Honeycomb is always up. Our servers run without hiccups, our user interface loads rapidly and is highly responsive, and our query engine is...