You may notice that we don’t talk about “monitoring” much, and that’s because we don’t really think of monitoring as what we do, even though it kind of is.
Traditional monitoring relies heavily on predicting how a system may fail and checking for those failures. Traditional graphing involves generating big grids of dashboards that sit on your desktop or your wall and give you a sense of the health of your system.
That’s not what we do.
Honeycomb is what you do when your monitoring ends.
You still need some simple end-to-end checks for your KPIs, and monitoring for key code paths, but don’t try to “monitor everything” because it’s noisy and impossible. One of two things will happen:
- Your e2e checks will tell you that one of your KPIs is not within its acceptable range. (So you jump into Honeycomb and start asking questions.)
- A user tells you something is off, but your KPIs are all within their acceptable ranges. (So you jump into Honeycomb and start asking questions.)
You can start at the edge and drill down, or start at the end and trace back; either way, within a half dozen clicks you can usually identify the source of the anomaly.
Honeycomb is a debugger for live production systems. Honeycomb is to your systems what your IDE is to your code. For sufficiently complex systems, you should probably spend roughly equal time in both.
Honeycomb lets you ask questions of your live systems, swiftly and interactively. Often you will spot an outlier or correlation that you weren’t expecting – something that never happens when you’re doing intuition-based debugging instead of data-driven debugging. Validate your guesses swiftly and move on.
This will make you a better engineer. :)
Systems are getting more complex every day, outstripping our ability to predict failure conditions or alert on suspicious symptoms. Stop trying to debug via intuition or what you can remember about past outages. Just keep it simple:
- Alert on KPIs
- Instrument your systems
- Ask lots of questions
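
To make those three steps a little more concrete, here is a minimal sketch in plain Python. It does not use Honeycomb's API or SDKs. The event fields (`endpoint`, `customer_id`, `duration_ms`, `status`), the 5% error-rate threshold, and the helper names are all hypothetical, chosen only for illustration. It mimics the second scenario above: the KPI looks fine, but breaking the events down by one field shows where the slowness is.

```python
import json
import time
from collections import defaultdict
from statistics import median


def make_event(endpoint, customer_id, duration_ms, status):
    """Instrument: build one structured event per request (hypothetical fields)."""
    return {
        "timestamp": time.time(),
        "endpoint": endpoint,
        "customer_id": customer_id,
        "duration_ms": duration_ms,
        "status": status,
    }


def kpi_breached(events, max_error_rate=0.05):
    """Alert on a KPI: True when the share of 5xx responses exceeds the threshold."""
    if not events:
        return False
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events) > max_error_rate


def breakdown(events, field, value_field="duration_ms"):
    """Ask a question: median of value_field for each distinct value of field."""
    groups = defaultdict(list)
    for e in events:
        groups[e[field]].append(e[value_field])
    return {key: median(values) for key, values in groups.items()}


# Made-up sample data: no errors at all, but one customer is much slower.
events = [
    make_event("/search", "acme", 40, 200),
    make_event("/search", "acme", 45, 200),
    make_event("/search", "globex", 900, 200),
    make_event("/search", "globex", 1100, 200),
    make_event("/checkout", "acme", 60, 200),
]

print(kpi_breached(events))
print(json.dumps(breakdown(events, "customer_id")))
```

With this sample data the error-rate check stays quiet, yet the breakdown by `customer_id` makes the slow customer stand out right away. That is the kind of question you would keep asking, field by field, until the anomaly is pinned down.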