Getting Started with OpenTelemetry
No one ever said observability would be easy, right? But OpenTelemetry is a significant step towards “easier” observability because it’s an open source standard that makes it simple to collect and transport valuable high-cardinality data to any backend tool. With extensive language support and an option to auto-instrument, teams wanting to more fully embrace observability can jump right in.
What is OpenTelemetry?
OpenTelemetry is an open source project that is the result of a merger between two earlier open source efforts, OpenTracing (part of the Cloud Native Computing Foundation, or CNCF) and OpenCensus (a community project from Google Open Source).
OpenTracing created an agnostic API that would work with any observability solution, and OpenCensus offered libraries for specific languages that developers could use to set up the data transfer. In May 2019, the two efforts were brought into a new CNCF open source venture: OpenTelemetry, also known as OTel.
Think of OTel as a vendor-agnostic instrumentation tool organizations can use to extract high-cardinality telemetry data. OTel has three main pieces:
- An extensive library of language-specific APIs and SDKs that can be used to create and send telemetry data
- The OTel Collector, which can take information from the OTel SDKs (and other data sources) and then export it to backends including Jaeger, Prometheus, and Kafka
- The OpenTelemetry Protocol (OTLP), which is a vendor-agnostic way to send metric, log, and trace data
OTel effectively streamlines the process of collecting application data for analysis, freeing up development and operations resources to problem solve rather than continually re-instrument.
It’s important to understand that OpenTelemetry, by itself, is not an observability solution: it’s a way to collect and transport telemetry data to be used with an observability solution. To get the most out of this data, teams must use OTel to connect to an observability platform that can handle and analyze high-cardinality data.
And although OTel brings all the benefits of vendor-agnostic standards, teams may face a steep learning curve when getting started.
What is telemetry?
Telemetry is the automated recording and transmission of health and performance data from a system. Telemetry data is generally broken down into three types: traces, metrics, and logs.
Traces
In the world of observability, traces are what enable teams to ask questions they’ve never thought of before, or what many call the “unknown unknowns.” A trace reflects a complete unit of work, and it’s made up of spans, which represent each step taken. Each span carries important data: a name, the timeframe involved (start time and duration), a unique span identifier, and the trace ID that ties it to the rest of the request, along with attributes identifying which part of the service it represents.
Teams use distributed tracing to follow and dissect how requests move through a distributed system. When they implement distributed tracing correctly, they will have access to high-cardinality or high-dimensionality data, which provides users with the granular context needed to identify the source of complex issues as user requests flow through their systems.
Metrics
Metrics are the result of teams deciding a priori where to look at system performance by asking very specific questions, such as “What is the latency when an end user adds something to a shopping cart?” Metrics are of limited use, because they require teams to know in advance where something will go wrong. Application failures are rarely predictable, meaning metrics aren’t very helpful in finding and diagnosing a problem.
Also, the more complex and distributed a system is, the less likely metrics will be useful for incident resolution; a predefined set of metrics is going to miss issues happening with a small subset of users, because teams can’t ask questions they didn’t know to ask.
In sharp contrast with traces, metrics allow teams to ask about “known unknowns,” i.e. things they may have seen before.
Logs
Logs are the place for teams to look at all the things, if they have endless time and patience. Logs are tough to wade through, and using them for debugging can feel like looking for the proverbial needle in a haystack. They’re huge, hard and expensive to store, and rarely do two teams keep logs the same way, meaning the data and formats aren’t standardized. That unstructured nature of most logs is a significant barrier to efficient use. Logs do have a legitimate use case, particularly around compliance. But when it comes to debugging and observability in complex systems, they are too detailed.
OpenTelemetry architecture and components
Automatic Instrumentation
In OTel, there are two types of instrumentation: automatic, which is more straightforward but produces less rich results, and manual, which is the preferred choice for teams wanting to squeeze every last drop out of their high-cardinality data. It is also possible to double down and instrument both automatically and manually: auto-instrumentation is a quick way to start looking at the data, which can be useful while working on richer, more complex manual instrumentation.
Whether it’s automatic or manual, OTel brings the “one and done” to observability: instrument once and send data to one, or several, vendors seamlessly.
Depending on the choice of programming language, teams may be able to utilize OTel support agents that can leverage app libraries and any dependencies, creating automatic instrumentation by installing the OTel APIs and SDKs. OTel has several agents including options for Java, .NET, PHP, Python, and Ruby. A number of instrumentation libraries are also available, including a “metapackage” for Node.js that streamlines instrumentation.
Teams using Kubernetes have the option to use OTel’s Kubernetes (K8s) Operator to manage the OpenTelemetry Collector. The K8s Operator will allow for auto-instrumentation of any OTel workloads.
And finally, OTel auto-instrumentation is going to look familiar to teams that have used an application performance monitoring (APM) tool in the past.
Manual instrumentation with SDKs
Engineering teams can choose to manually instrument OTel. Manual instrumentation is definitely a tradeoff for most organizations: it requires substantially more time and resources than going the automatic route, but it will also enable richer data and the ability to do very deep dives into system data. If the goal is to surface all the data and marry it to the business context, manual instrumentation is the best choice.
To manually instrument an application to work with OTel, teams can choose an SDK (currently, 11 languages are supported) and use the standard OTel API, which makes it seamless to switch SDKs later. The SDK makes it easy to process and export the data to the OTel Collector or other endpoints as needed.
OpenTelemetry Collector
The OpenTelemetry Collector is like a Swiss Army Knife: it has everything needed to collect data from different sources, process that data in different ways, and export it all to as many locations as required. That’s all a long way of saying the OTel Collector is endlessly configurable.
The OTel Collector is made up of three parts: receivers, processors, and exporters. The receivers take in data from a variety of sources and formats, like OTLP, Jaeger, and Zipkin. There are too many receivers to list. The processors deal with the data—aggregating, sampling, filtering, and processing logic—and can be chained together to tackle more complex datasets. Exporters send the data to telemetry backends.
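The receiver/processor/exporter pipeline above is wired together in the Collector’s YAML configuration. The fragment below is illustrative only: the endpoints and the single `batch` processor are example choices, not recommendations.

```yaml
# Illustrative OpenTelemetry Collector configuration.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # accept OTLP data over gRPC

processors:
  batch: {}                      # batch telemetry before export

exporters:
  otlp:
    endpoint: backend.example.com:4317   # hypothetical backend address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Each signal type (traces, metrics, logs) gets its own pipeline, and the same receiver or exporter can be reused across pipelines.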
What are the benefits of OpenTelemetry?
OpenTelemetry offers a uniform and flexible solution for organizations looking to get the most out of high-cardinality data, and it offers a number of tangible benefits, including:
- Standards: A “standard” API means developers can more easily build traces, logs, and metrics into their applications.
- Vendor-agnostic: Forget vendor lock-in. OTel works with any telemetry backend, so teams can change things up as necessary.
- Quick start: OTel’s out-of-the-box auto-instrumentation options for Java, Ruby, .NET, and many more mean it’s possible to start mining data quickly.
- Bespoke: OpenTelemetry lets teams decide what data and metrics matter for them, so it’s possible to work with high-cardinality data in any way that makes sense.
- A supportive community: As the second most popular CNCF project (Kubernetes is the first), OpenTelemetry benefits from a massive and highly active community of contributors and users.
- Futureproof: Because OTel eliminates vendor lock-in and is completely customizable, it can grow and change along with engineering teams.
- Transparency: Every organization wants to get the most insight possible with their observability solution, and OTel’s comprehensiveness and flexibility offers just that.
Challenges of OpenTelemetry
As groundbreaking as OpenTelemetry is, we’d be remiss if we didn’t suggest some patience is required to get started. Instrumentation, especially manual instrumentation, can be daunting, but the payoff in data richness is worth it. There is also the potential issue of having too much data and not knowing what to do with it, or not having an observability solution that can handle it.
Additionally, the OTel Collector will require teams to spend time and resources on maintenance and updates. All that being said, OpenTelemetry is constantly evolving, with the community of contributors regularly adding new language support and documentation.
OpenTelemetry best practices
Here are four ways to get the most out of OTel.
- Put in the effort: Although manual instrumentation can be more of a challenge to set up, the end result is worth it. Manual instrumentation allows you to set your OTel configuration to deliver the data you want and need for observability into your unique applications.
- The OTel Collector is your friend: It not only works with existing data and manages secrets, but it allows teams access to data from other sources.
- Sample the data: OTel is going to bring all. the. data. That will likely be way too much, so cut through the noise by sampling.
- Get active: Join the OTel community and get involved. The CNCF, as well as observability solutions that integrate with OpenTelemetry, host regular workshops and events suited for both brand new users and experts.
Honeycomb’s commitment to OpenTelemetry
Honeycomb supports and contributes to OpenTelemetry. Teams that get the most value from Honeycomb instrument their code as a standard practice: it’s a way of telling their future selves what the code is doing, making it simple to explore in Honeycomb later.
If you have already instrumented your code, send your telemetry to Honeycomb directly or configure an OTel collector to do so. Alternatively, Honeycomb provides OTel SDK distributions for several languages that help you start sending data directly to Honeycomb.
“We have hundreds of teams today using OpenTelemetry and Honeycomb. We’re able to bring a different mentality in the way we are able to run and manage our production systems. Everyone wants to join the movement.”
— Rich Anakor, Chief Solutions Architect, Vanguard
Conclusion
OpenTelemetry is the way the observability world is headed. For teams instrumenting for the first time, or just instrumenting new code, starting with OpenTelemetry simply makes sense. Teams already instrumented another way should consider their migration path to OTel, even if it is going to take some time. The standard is still evolving and some work remains, but when paired with the right observability solution, it will give you unparalleled insight into your systems and applications.