Nick Travaglini | Aug 15, 2024
Alerts are a perennial topic, and a CoPE will need to engage with them. The problem space is bounded by two types of alerts. Reactive alerts (in Honeycomb, we call these Triggers) fire after some event, such as crossing a predetermined boundary. Proactive alerts (Burn Alerts, based on Honeycomb's SLO feature) give notice before a threshold is crossed; in the case of SLOs, that means before failing to meet the stated objective.
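To make the reactive/proactive distinction concrete, here is a simplified, hypothetical model; it is not how Honeycomb implements Triggers or Burn Alerts. It assumes an SLO expressed as a success-rate target over a window with a known expected event count, and a burn alert that fires when the current failure rate would exhaust the error budget within a chosen lookahead. The names `SloState`, `trigger_fires`, `hours_until_budget_exhausted`, and `burn_alert_fires` are illustrative only.

```python
from dataclasses import dataclass
import math


def trigger_fires(observed: float, threshold: float) -> bool:
    """Reactive alert: fires only once the observed value has already crossed the threshold."""
    return observed > threshold


@dataclass
class SloState:
    """Hypothetical snapshot of an SLO window (not Honeycomb's data model)."""
    target: float           # fraction of events that must succeed, e.g. 0.999
    expected_events: int    # events expected over the whole SLO window (assumed known)
    failed_events: int      # failures observed so far in the window


def hours_until_budget_exhausted(slo: SloState, failures_per_hour: float) -> float:
    """Project when the error budget runs out if the current burn rate continues."""
    budget = (1 - slo.target) * slo.expected_events  # total failures the SLO tolerates
    remaining = budget - slo.failed_events
    if remaining <= 0:
        return 0.0        # budget already spent
    if failures_per_hour <= 0:
        return math.inf   # not burning, so never exhausted
    return remaining / failures_per_hour


def burn_alert_fires(slo: SloState, failures_per_hour: float, lookahead_hours: float) -> bool:
    """Proactive alert: fires when budget exhaustion is projected within the lookahead window."""
    return hours_until_budget_exhausted(slo, failures_per_hour) <= lookahead_hours


slo = SloState(target=0.999, expected_events=1_000_000, failed_events=400)
print(trigger_fires(observed=0.05, threshold=0.01))
print(round(hours_until_budget_exhausted(slo, failures_per_hour=100), 2))
print(burn_alert_fires(slo, failures_per_hour=100, lookahead_hours=4))
print(burn_alert_fires(slo, failures_per_hour=100, lookahead_hours=8))
```

With a 99.9% target over 1,000,000 expected events, the budget is 1,000 failures. After 400 failures at 100 failures per hour, about 6 hours remain, so a 4-hour lookahead stays quiet while an 8-hour one fires, well before the objective is actually missed. The trigger, by contrast, only reports a boundary that has already been crossed.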
Nick Travaglini | Aug 08, 2024
The previous post laid out the basic idea of instrumentation and how OpenTelemetry’s auto-instrumentation can get teams started. However, you can’t rely only on auto-instrumentation. This post will discuss the limitations in more detail and how a CoPE can help teams overcome them.
Martin Thwaites | Aug 07, 2024
The Collector is the focal point for telemetry inside your cluster. Instead of having your containerized applications send telemetry directly to your OpenTelemetry-capable backend (the place that allows you to ask questions of your telemetry), you send that data to an internal location first, which then forwards it on.
Max Aguirre | Aug 05, 2024
“How is my app performing?” is one of the most common, yet hardest questions to answer. There are myriad ways to measure this, like error rate, average response time, and so on. Enter the Application Performance Index (aka Apdex), a single metric that attempts to answer, “Are my application’s users happy?”
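As a rough illustration, the sketch below assumes the standard Apdex definition: for a target threshold T, requests at or under T are "satisfied," those above T but at or under 4T are "tolerating," and anything slower is "frustrated," giving a score of (satisfied + tolerating / 2) / total. The latencies and the 500 ms threshold are made-up example values, and this version ignores errors (some tools count failed requests as frustrated).

```python
def apdex(response_times_ms: list[float], threshold_ms: float) -> float:
    """Apdex score for threshold T: (satisfied + tolerating / 2) / total samples."""
    if not response_times_ms:
        raise ValueError("Apdex is undefined for zero samples")
    satisfied = sum(1 for t in response_times_ms if t <= threshold_ms)
    tolerating = sum(1 for t in response_times_ms if threshold_ms < t <= 4 * threshold_ms)
    # Anything slower than 4T counts as "frustrated" and contributes 0 to the score.
    return (satisfied + tolerating / 2) / len(response_times_ms)


samples = [100, 200, 300, 900, 2500]  # hypothetical request latencies in ms
print(apdex(samples, threshold_ms=500))
```

Here three requests are satisfied, one (900 ms) is tolerating, and one (2500 ms) is frustrated, so the score is (3 + 0.5) / 5 = 0.7. The score always falls between 0 (everyone frustrated) and 1 (everyone satisfied).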
Fred Hebert | Jul 29, 2024
It’s one of my strongly held beliefs that errors are constructed, not discovered. How we frame an incident’s causes, contributing factors, and context ends up influencing the shape of the corrective items (if any) that get created. I’ll cover these ideas using our June 3rd incident, in which a database migration caused a large outage by locking up a shared database and making it run out of connections.
Nick Travaglini | Jul 25, 2024
The CoPE is made to affect, meaning change, how things work. The disruption it produces is a feature, not a bug. That disruption pushes things away from a locally optimal, comfortable state that generates diminishing returns, and sets them on a course of exploration toward new terrain that may prove more beneficial, and for longer.
Lex Neva | Jul 23, 2024
In my last blog post, I explained why we decided to destroy one third of our infrastructure in production just to see what would happen. This is part two, where I go over the big day. How did our chaos engineering experiment go? Find out below!
Ruthie Irvin | Jul 18, 2024
Software changes so rapidly that developing on its cutting edge cannot fall to a single person. Code comments, PR conversations, Slack, RFCs, and other investigatory documents do a wonderful job of disseminating information about projects asynchronously, but no amount of async communication replaces the magic of two brains bouncing ideas off each other.
Lex Neva | Jul 16, 2024
We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in our production environment using AWS’s Fault Injection Service. You might be wondering why the heck we did something so drastic. In this post, we’ll go over why we did it and how we made sure that it wouldn’t impact our service.
Rox Williams | Jul 15, 2024
As it moved from a monolithic system to a cloud-native microservices environment, Ritchie Bros. sought to modernize its observability infrastructure to support the transition and fuel future growth.
Nick Travaglini | Jul 10, 2024
Getting the right people working in the CoPE is crucial to success because these change agents must limber up the organization and promote the flexibility necessary to perform resilience.
Liz Fong-Jones | Jul 09, 2024
Two years ago, we shared our experiences with adopting AWS Graviton3 and our enthusiasm for the future of AWS Graviton and Arm. Once again, we're privileged to share our experiences as a launch customer of the Amazon EC2 R8g instances powered by AWS Graviton4, the newest generation of AWS Graviton processors.