Since 2013, Monitorama has been a community-driven conference, bringing together open source development and operations engineers to focus on pushing the boundaries of monitoring software and practices. It’s chock full of thought-provoking content in the conference talks. The casual atmosphere also makes the hallway track a great way to network with fellow engineers and vendors alike to pick up on new developments in the monitoring space.
This year’s Monitorama marks the first (in-person!) event since 2019, following a pandemic-induced hiatus. We jotted down a few of our takeaways to share in case you missed it.
1. Monitorama is keen on observability, but struggling to differentiate it from monitoring
For both of us authors, this was our first Monitorama event, meaning that we didn’t have a prior frame of reference. But here’s what we saw. In 2018, Monitorama billed itself as an “Open Source Monitoring Conference & Hackathon.” By 2019, it expanded that charter to an “Inclusive Event for Monitoring and Observability Practitioners.” Then Covid happened and everything paused.
Some of the talks this year had been slated for the 2020 event and some new talks didn’t make it to this event, so it’s a bit hard to connect the dots. But with the return of Monitorama, the community is ostensibly into observability, with about half the content focused on it in some shape or form. Yet the subject matter in those talks demonstrates that there is still a long way to go when it comes to differentiating observability from monitoring.
Many of the “observability” talks still centered on reactive responses to alerts and how to search for answers in traditional logging and monitoring tools (albeit newer and more powerful generations of those tools). Observability was often used as a synonym for monitoring, rather than as a distinctly different method of analysis for the same types of system and application telemetry data.
This take is obviously colored by our Honeycomb bias. We’re so hyper-focused on sharper ways to proactively mine telemetry data for better insights that we can sometimes forget how groundbreaking the practice of observability really is. We frequently heard from attendees in the Honeycomb booth that they thought our approaches were years ahead of the curve, which is both delightful and disheartening. The future is here. Come join us! The doors are open. This future is for everyone!
Perhaps the biggest necessary shift, and part of the reason the future seems so unevenly distributed, has to do with areas of responsibility. Our sense from the crowd is that developers are still primarily seen as people uninterested in reliability goals. Monitorama attendees skew toward Ops and SRE teams. Many attendees seemed more focused on understanding infrastructure and systems than application behavior. That divide still exists in daily practice, although nearly everyone we spoke to acknowledged that those divisions can’t be the way of the future. Observability takes an application-first approach to understanding system performance, and that take is only (very slowly) starting to bubble up in the conference content.
2. OpenTelemetry awareness and adoption are growing
In his talk, OTel Me How to DIY Observability, Steve Flanders asked the room to raise their hands if they knew about OpenTelemetry, and about 90% of the audience did. He followed up by asking who is using it in production, and only about 10% kept their hands up. That gels with what we’ve heard at other conferences. To us, that signals that although it’s still early days for OTel, the momentum is well underway. We see a surging interest in OpenTelemetry from our own customers and we saw that at Monitorama as well.
Almost one year ago, we announced our intent to be all-in on OpenTelemetry. It was reaffirming to hear from conference-goers that the message was received. OpenTelemetry is the future, and folks told us they see Honeycomb as one of the leaders helping usher that future into existence. We connected with engineering teams and other vendors who were on board with that strategy and looking for ways to do more together. Moving and managing telemetry data was also a subtopic in several other conference talks, and OpenTelemetry kept popping up as a viable solution. We’re happy to see the OpenTelemetry momentum resonating at Monitorama.
3. Distributed tracing is still misunderstood
In a particularly off-kilter talk, one speaker proclaimed that “distributed tracing is dead!” The premise of this talk was that tracing can be difficult to implement. We can definitely relate to that! Historically, it hasn’t been easy and that’s been a barrier to realizing its benefits for some teams. So the point of view in this talk was to eschew distributed tracing in favor of better logging.
The core advice in this talk was to enrich your logs with additional context to make them more useful when debugging. To get there, you should start by structuring your log data. Then, you should add custom attributes that include business logic in each step so that you can better understand the relevance and meaning of each log line, and its relationship to other units of work in your system.
We actually couldn’t agree with that advice more. At Honeycomb, we call that creating arbitrarily-wide structured events. And if you simply add trace_id, parent_id, span_id, and duration fields to that log line, you then get a distributed trace that lets you visualize the relationship between distinct events. Traces are just a series of interconnected logs. Unfortunately, this talk perpetuated misconceptions about distributed tracing, its current state of usability, and how useful it can be. Our takeaway was that instead of doing away with tracing, our industry should double down on making traces even easier to generate and interpret.
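To make that concrete, here is a minimal Python sketch of the idea, not a real tracing library. It emits one wide structured event per unit of work and attaches the four trace fields to each. The helper names (span, render), the in-memory EVENTS list, the example field names, and the use of milliseconds for the duration field (duration_ms) are all assumptions made for illustration.

```python
import json
import time
import uuid
from contextlib import contextmanager

EVENTS = []  # hypothetical stand-in for a log sink; a real service would ship these lines

@contextmanager
def span(name, trace_id, parent_id=None, **fields):
    """Emit one wide structured event per unit of work, with trace fields attached."""
    event = {
        "name": name,
        "trace_id": trace_id,             # shared by every event in the request
        "parent_id": parent_id,           # None marks the root of the trace
        "span_id": uuid.uuid4().hex[:16],
        **fields,                         # arbitrary business context
    }
    start = time.perf_counter()
    try:
        yield event  # callers can keep adding context while the work runs
    finally:
        event["duration_ms"] = (time.perf_counter() - start) * 1000
        EVENTS.append(event)
        print(json.dumps(event))  # one structured log line per event

def render(events, parent_id=None, depth=0):
    """Rebuild the trace tree from flat log lines using parent_id and span_id."""
    lines = []
    for e in events:
        if e["parent_id"] == parent_id:
            lines.append("  " * depth + e["name"])
            lines.extend(render(events, e["span_id"], depth + 1))
    return lines

trace_id = uuid.uuid4().hex
with span("checkout", trace_id, user_plan="pro") as root:
    with span("charge_card", trace_id, parent_id=root["span_id"], amount_cents=1999):
        pass
    with span("send_receipt", trace_id, parent_id=root["span_id"]) as s:
        s["email_provider"] = "example"

print("\n".join(render(EVENTS)))
```

The log lines themselves stay flat and searchable. The parent/child structure is recovered afterwards purely from the added fields, which is the sense in which a trace is a series of interconnected logs.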
It just goes to show how much further we have to go in demonstrating why structured wide events are the fundamental building block of observability. It also shows how misconceptions about logs, traces, and metrics as separate data types in separate tools are holding back the monitoring industry from adopting observability practices.
4. SLOs are for developers too!
In her talk, The Little SLI That Could, Sophia Russell walked through a brilliant story around achieving organizational reliability goals. In trying to establish a culture around Service Level Objectives (SLOs), her organization took an “ops first” approach that ultimately failed to garner the adoption and ownership needed to make it successful.
They took that approach back to the drawing board and built out a simple framework in user code (based in Ruby and Java) that made it trivial for software developers to write the Service Level Indicators (SLIs) that, in turn, helped them understand and own their SLOs. As a result, they now have a strong SLO culture across the organization.
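We don’t have the framework from the talk to show here (it was built in Ruby and Java), so the following Python sketch is purely illustrative of what an SLI expressed in a few lines of developer-owned code might look like. The Request type, the 300 ms threshold, and the helper names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    status: int
    duration_ms: float

def is_good(req: Request) -> bool:
    """SLI (assumed definition): a request is good if it succeeded and was fast enough."""
    return req.status < 500 and req.duration_ms < 300

def compliance(requests) -> float:
    """Fraction of requests that met the SLI; assumes a non-empty list."""
    return sum(1 for r in requests if is_good(r)) / len(requests)

def budget_remaining(requests, target: float) -> float:
    """Share of the error budget left: 1.0 is untouched, 0.0 is spent, below 0 is overspent.

    Assumes target < 1.0 and a non-empty list of requests.
    """
    allowed_bad = (1 - target) * len(requests)
    actual_bad = sum(1 for r in requests if not is_good(r))
    return 1 - actual_bad / allowed_bad

# 98 good requests, one server error, one slow request
sample = [Request(200, 120.0)] * 98 + [Request(503, 80.0), Request(200, 950.0)]
print(compliance(sample))
print(budget_remaining(sample, target=0.95))
```

The SLI is the single is_good predicate that a developer writes next to the code it describes. The SLO is just a target applied to it.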
What was profound about this talk wasn’t that she’d found a way to sneak SLOs into her developers’ brains. Rather, she understood that if you meet developers where they are, it’s a lot easier to incentivize them to have reliability goals. Her talk emphasized this point a lot, with loads of tips on how to get developers on board with SLOs, and how to help them track progress in their observability tool.
Sometimes, a lot of focus is placed on avoiding writing code to achieve observability. But the reality is that software engineers aren’t afraid to write a few lines of code so long as that code is easy to understand, valuable, and the task isn’t onerous. Sophia’s talk demonstrated this so well that we found it inspiring.
At Honeycomb, we believe Service Level Objectives provide a much better framework for actionable alerts, one that works well alongside observability practices. We also believe that reliability can’t be achieved unless developers and operators are working in tandem to proactively improve both application and system performance. We’re thrilled to see more strides being made that better enable developer adoption of SLOs.
5. Sustainability and carbon impact reporting are about to go mainstream
One of the more visionary talks at Monitorama 2022 came from (perhaps unsurprisingly) Adrian Cockcroft, who is now retired and consulting on environmental sustainability projects. In his talk, Monitoring Carbon, he unpacked the complex math used to generate current measures around the carbon footprint of cloud computing. There are myriad challenges and concepts to consider when thinking about energy consumption and the carbon generated by the tech sector via the equipment we use.
New regulations in the European Union (and soon, likely, the US) are making the reporting and optimization of carbon emissions from computing infrastructure an emerging mainstream concern. Adrian argued that measuring carbon will become as ubiquitous a signal in monitoring as other concerns like throughput, latency, utilization, capacity, and cost. His talk covered the current state of generating measurements, the tremendous complexity behind simple measurement concepts, and what the coming years of simplifying that complexity might look like. We’re eager to see where this coming industry trend will go and we may already have some ideas about what that means in terms of observing environmental impact on a per-request level.
Conclusion
Hopefully, this gives you a high-level feel for what’s happening at Monitorama. It continues to be one of the premier events to learn about evolving monitoring practices. The chance to meet and network with developers and engineers pushing the boundaries of software and practices is unparalleled. And there’s something absolutely magical about Portland and the Pacific Northwest in the summer. We look forward to next year’s event and to seeing how the uptake of observability concepts continues to percolate.
By the way—if you’re interested in meeting us, the next conference we’ll be at is AWS Summit NYC. That’s tomorrow! Come see us at our booth. We’d love to meet you.
This article was written in collaboration with Phillip Carter and George Miranda.