Comprehensive observability starts with good instrumentation. OpenTelemetry, aka “OTel,” sets a unified standard, enabling you to instrument your applications once and then send that data to any observability backend of your choice. OpenTelemetry’s standard for generating and ingesting telemetry data is on track to become as ubiquitous as today’s container orchestration standards. Because of this, development teams are increasingly adding OpenTelemetry to their applications.
If you’ve done your homework on OpenTelemetry, you’ve most likely thought, “An open standard for every type of telemetry data? One that makes observability possible without locking us into one vendor? Let’s try it!” So, what happens next? How do you get started, and what should you watch out for? That’s the million-dollar question, and one you are not alone in asking.
At Honeycomb, we’re all-in on OpenTelemetry. It’s the best way to ingest high-cardinality, high-dimensionality data for fast analysis, enabling users to gain observability into complex, distributed systems. Although there is plenty of information about OpenTelemetry on the web, most of it presents getting started as a tactical, step-by-step process. A bigger-picture view of how to introduce it in a company, or how to pitch it to DevOps and Site Reliability Engineers, is more elusive. And we all know the “what are the business outcomes?” question comes up whenever you seek organizational buy-in for new tools and methodologies.
This is exactly why we sat down for a roundtable (available on demand) with engineers from Campspot, Upgrade, and Jimdo: Take Control of Your Observability Data: An Engineering Roundtable on Lessons Learned with OpenTelemetry. We wanted to hear first-hand adoption and implementation stories to paint a clear picture of how to get started with OpenTelemetry. So without further ado, let’s dive into three key takeaways.
Getting started with OpenTelemetry at Jimdo: go small before going big
Camal Cakar is a Senior Site Reliability Engineer at Jimdo, which offers small businesses a web authoring platform, analytics, and other tools to build and strengthen their web and ecommerce presence. He explained that as Jimdo grew, its systems expanded significantly and became more complex, a scenario many of us find ourselves in. Soon, its teams and architecture were distributed around the world, and Jimdo realized that the practice of observability could deliver the telemetry data needed to understand what was happening in these large, complex systems.
To get started, Cakar’s team hosted a hackathon to home in on a single observability vendor. The hackathon revealed the need for distributed tracing, which monitors and observes requests as they flow through distributed services or microservices-based applications. Distributed tracing provides visibility into each request and all its subsequent parts, which lets you understand whether your systems are behaving properly in production. And that’s how OpenTelemetry entered the picture. What intrigued Jimdo was that OpenTelemetry’s common set of APIs, SDKs, and wire protocol creates a single, well-supported integration surface for end-to-end distributed tracing. Cakar’s team also liked that OpenTelemetry is open source and an open standard. To see if OpenTelemetry was the right solution, Jimdo decided to try it with one team on a real service, rather than an example repository, and instrumented that service.
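To make the tracing model concrete, here is a minimal sketch using the OpenTelemetry Python SDK. It is not Jimdo’s actual instrumentation; the service and function names are invented, and spans print to the console so the example runs without any backend:

```python
# pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire the SDK to an exporter; swap ConsoleSpanExporter for an OTLP
# exporter to send spans to a real backend.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # name is illustrative

def charge_card(amount_cents: int) -> None:
    # Child span: one "subsequent part" of the request.
    with tracer.start_as_current_span("charge_card") as span:
        span.set_attribute("payment.amount_cents", amount_cents)

def handle_request() -> None:
    # Root span: the request as a whole; children nest under it automatically.
    with tracer.start_as_current_span("handle_request"):
        charge_card(1299)

handle_request()
provider.shutdown()  # flush buffered spans before the process exits
```

The nested `with` blocks are the whole trick: each child span records one part of the request, and the SDK links them into a single trace for you.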
This “start small” approach is ingrained in Jimdo’s engineering philosophy. “The team saw that OpenTelemetry could work for them as a single source of truth for observability, especially tracing. The word spread, and soon Jimdo was able to test different observability vendors and compare interfaces for ingesting data,” explained Camal.
Fintech Upgrade gets started with OpenTelemetry to withdraw from slow tracing yields
Upgrade is a fintech company based in the US that offers financial products to help consumers better understand and manage their finances. As Director of Software Development at Upgrade, Pierre Lacerte leads three teams working on platform initiatives. The story he shared about getting started with OpenTelemetry at Upgrade comes from years of hands-on experience with the project.
When he joined Upgrade, tracing was already in play (sensing a common theme here?). The team used Istio for their service mesh and emitted tracing data to Jaeger. In addition, their Java apps carried instrumentation that pushed trace data to Jaeger directly. Unfortunately, querying Jaeger was slow, and the granularity was not as rich as it could have been. This frustrated engineers who wanted to quickly debug issues in Upgrade’s system.
To evaluate other solutions, Pierre and his team ran a prototype in production. “We set up the OpenTelemetry Collector between all the trace data and Jaeger and connected it to different APM vendors. We also looked through the Collector libraries and origins,” Pierre said. The result was a side-by-side comparison of the vendors that Upgrade could use to discover which APM tool would work best for overall observability if, for example, there were a production incident.
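The roundtable doesn’t show Upgrade’s actual configuration, but a minimal Collector config for this kind of side-by-side trial might look like the sketch below. The endpoints and the API key header are placeholders; it assumes apps send OTLP to the Collector, which then fans the same traces out to Jaeger (which ingests OTLP natively) and to a trial vendor:

```yaml
# Hypothetical Collector config: one OTLP input, two trace outputs.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp/jaeger:            # keep the existing Jaeger backend in the loop
    endpoint: jaeger-collector:4317
    tls:
      insecure: true
  otlp/vendor:            # placeholder second backend under evaluation
    endpoint: api.vendor.example:443
    headers:
      x-api-key: ${env:VENDOR_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger, otlp/vendor]
```

Because the fan-out lives entirely in the Collector, adding or dropping a vendor is a config change rather than a code change, which is what makes this kind of production prototype cheap to run.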
Campspot starts off with OpenTelemetry to untangle its tent strings from lock-in
Kristin Smith, DevOps Engineer at Campspot, shared that her start with OpenTelemetry came from a slightly different angle: a bad experience with vendor lock-in led Campspot, an online marketplace for camping reservations, to OpenTelemetry. Before investigating OpenTelemetry, Campspot used a legacy APM solution chosen because it offered a one-click install. That was attractive at first, but Kristin and her team soon realized they needed to untangle themselves from it because its instrumentation couldn’t persist across APM, monitoring, and observability tools.
“By contrast, OpenTelemetry fit the bill. I wish we could say we came from a pure ideological, ‘Yes, open source,’ place, but really it was kind of based on the breakup with our last APM provider,” Kristin admitted.
No matter the check-in time, OpenTelemetry opens the door to comprehensive observability
These three stories and their lessons learned demonstrate that there is more than one way to get started with OpenTelemetry. For example, because OpenTelemetry has SDKs for 11 languages, you can use it across many different codebases. It also provides automatic instrumentation (either as an agent or as packages), which builds a “base” of telemetry data and delivers some quick wins for very little effort. You can then layer on manual instrumentation that captures context from your running application for even deeper observability, as sketched below.
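Here is one hedged illustration of that layering, using the OpenTelemetry Python distro and its Flask instrumentation; the route and attribute names are invented for the example. The agent creates a span for every incoming request, and the handler only enriches it:

```python
# pip install flask opentelemetry-distro opentelemetry-instrumentation-flask
# Run under the agent so every request gets a span automatically:
#   opentelemetry-instrument --service_name checkout python app.py
from flask import Flask
from opentelemetry import trace

app = Flask(__name__)

@app.route("/checkout/<cart_id>")
def checkout(cart_id: str):
    # Fetch the request span the auto-instrumentation already started,
    # then attach business context to it by hand.
    span = trace.get_current_span()
    span.set_attribute("app.cart_id", cart_id)  # attribute name is illustrative
    return "ok"

if __name__ == "__main__":
    app.run(port=8080)
```

Attributes like `app.cart_id` are exactly the kind of high-cardinality context that automatic instrumentation alone can’t know about.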
Then there’s the OpenTelemetry Collector, an application that receives, processes, and exports telemetry to various destinations. It sits between your software services and your observability tools and can be scaled to fit your traffic patterns. Since the OpenTelemetry Collector runs as a service, you can run it on a virtual machine (VM), alongside your workload as a separate process, or in a container. It’s helpful for pulling in different sources of data, pre-processing that data in various ways, and exporting it to different destinations depending on your needs. The OpenTelemetry Collector is your Swiss Army knife for telemetry data.
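As an illustration of that Swiss Army knife role, the sketch below accepts OTLP pushed from services, pulls metrics from an existing Prometheus scrape target (the prometheus receiver ships in the Collector’s contrib distribution), tags spans with an environment attribute, and batches everything before export. All endpoints are placeholders:

```yaml
receivers:
  otlp:                   # telemetry pushed from instrumented services
    protocols:
      grpc:
      http:
  prometheus:             # metrics pulled from an existing scrape target
    config:
      scrape_configs:
        - job_name: app
          static_configs:
            - targets: ["app.example:9090"]

processors:
  attributes:
    actions:
      - key: deployment.environment
        value: production
        action: upsert    # stamp every span with its environment
  batch:                  # buffer telemetry into batches before export

exporters:
  otlp:
    endpoint: backend.example:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [otlp]
```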
Ready for some sweet observability under your OTel pillow?
I’d be remiss if I didn’t tell you that all three of the panelists use OpenTelemetry and Honeycomb for observability, and they have learned some best practices along the way. Be sure to check out the discussion on demand and look for our next blog, where you’ll get some practical advice for implementing OpenTelemetry that includes those best practices—and other tips.
If you want to give Honeycomb a try, sign up for free to get started.