OpenTelemetry Metrics Explained: A Guide for Engineers


OpenTelemetry (often abbreviated as OTel) is the gold standard observability framework, allowing users to collect, process, and export telemetry data from their systems. OpenTelemetry’s framework is organized into distinct signals, each offering a different aspect of observability. Among these signals, OpenTelemetry metrics are crucial in helping engineers understand their systems. In this blog, we’ll explore what OpenTelemetry metrics are, how they work, and how to use them effectively to ensure your systems and applications run smoothly.

What are OpenTelemetry metrics?

OpenTelemetry metrics are quantitative measurements that provide insights into your systems’ performance and behavior. Metrics capture numerical data over time, such as CPU usage, memory consumption, request rates, or error counts. They allow you to track trends, detect anomalies, and gain visibility into your systems’ health.

Metrics provide snapshots of data that you can turn into actionable insights. Unlike logs, which offer detailed, event-specific information, metrics provide a higher-level overview of system performance. These aggregated data points are valuable for detecting system issues and driving data-driven decisions.




Key components of OpenTelemetry metrics

OpenTelemetry metrics are designed to provide a standardized, flexible method for measuring and analyzing system performance. The OpenTelemetry client architecture defines each signal and the minimum components it must include. The following are the building blocks that make up OpenTelemetry metrics:

  1. Instruments: Instruments are the primary tools for recording metric data. 
  2. Measurement: Measurements represent a single recorded value from an instrument. 
  3. Aggregations: Aggregations define how raw measurements are combined to produce meaningful statistics, such as:
    1. Sum: The total of all recorded values.
    2. Count: The number of recorded measurements.
    3. LastValue: The most recent recorded value.
    4. Histogram: The distribution of recorded values, including the min, max, and bucketed counts.
  4. Resource: Contextual information about the environment where the metrics are collected, such as host, application/service, or other system details.
  5. API: The Metrics API provides an interface for instruments and measurements. It defines components like a meter used to create and manage instruments, such as an asynchronous counter. 
  6. SDK: The Metrics SDK processes and exports metric data:
    1. Processors: Handle the logic for metric data aggregation. 
    2. Exporters: Send processed metrics to external backends, like Prometheus or Honeycomb.
  7. Semantic Conventions: Semantic Conventions ensure consistent naming for metrics and their attributes across applications and systems (example: http.server.duration for HTTP request duration).
  8. Context propagation: Context propagation correlates metrics with related traces and operations. 

These key components provide a standardized and scalable framework for capturing and analyzing metrics. 
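To make these aggregation types concrete, here’s a minimal pure-Python sketch (an illustration, not the SDK’s actual implementation) of how Sum, Count, LastValue, and a bucketed Histogram condense the same set of raw measurements:

```python
# Sketch: how Sum, Count, LastValue, and Histogram aggregations
# condense raw measurements (not the real SDK implementation).
measurements = [120, 45, 300, 45, 80]  # e.g., request latencies in ms

agg_sum = sum(measurements)    # Sum: total of all recorded values
agg_count = len(measurements)  # Count: number of recorded measurements
agg_last = measurements[-1]    # LastValue: most recent recorded value

# Histogram: min, max, and a count per bucket
boundaries = [50, 100, 250]  # buckets: <=50, <=100, <=250, >250
bucket_counts = [0] * (len(boundaries) + 1)
for m in measurements:
    for i, bound in enumerate(boundaries):
        if m <= bound:
            bucket_counts[i] += 1
            break
    else:
        bucket_counts[-1] += 1  # overflow bucket

print(agg_sum, agg_count, agg_last)  # 590 5 80
print(min(measurements), max(measurements), bucket_counts)  # 45 300 [2, 1, 1, 1]
```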

Types of OpenTelemetry instruments

OpenTelemetry supports several metric instruments, such as counters, gauges, and histograms. Instruments come in synchronous and asynchronous implementations and cover various use cases for your metric needs. Here is a detailed overview of those instruments.

Counters

Counters are instruments that measure monotonically increasing values. They’re used to count occurrences of events, such as the number of requests served.

# Create A Counter
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider

metrics.set_meter_provider(MeterProvider())
meter = metrics.get_meter("example-meter")
request_counter = meter.create_counter("api_requests", description="Counts API requests")

# Add Count
request_counter.add(1, attributes={"endpoint": "/home"})

Use cases

  • Counting events, like the volume of data processed.
  • Tracking cumulative totals, like the total number of API requests serviced by a service.
  • Measuring successes or failures, like the number of successful transactions or failed requests.

UpDownCounters

UpDownCounters are similar to counters but allow for both increments and decrements to the measured value. They’re useful for tracking values that can increase and decrease over time, such as active users or queue sizes. 

# Create An UpDownCounter
up_down_counter = meter.create_up_down_counter("active_users", description="Tracks active users")

# Record
up_down_counter.add(1, attributes={"region": "us-east-1"})
up_down_counter.add(-1, attributes={"region": "us-east-1"})

Use cases: 

  • Tracking the number of active users in a system or service.
  • Monitoring the current number of active tasks in a job queue.
  • Measuring concurrent connections or running processes.

Gauges

Gauges record the current value of a measurement at the time of observation. This value can go up or down. They’re used for non-additive values, where summing across instances would not make sense. For example, temperature or memory usage on a single host isn’t meaningful as a sum.

# Create A Gauge (callbacks are registered at creation time)
from opentelemetry.metrics import CallbackOptions, Observation

def observe_callback(options: CallbackOptions):
    # get_memory_usage() is a placeholder for your own measurement function
    yield Observation(get_memory_usage(), {"region": "us-east-1"})

gauge = meter.create_observable_gauge(
    "memory_usage",
    callbacks=[observe_callback],
    description="Current memory usage",
)

Use cases

  • Tracking the current state, like current CPU utilization, and number of active sessions or connections.
  • Monitoring rates or speeds, like download speeds, transaction rates, and requests per second hitting a web server.

Histograms

Histograms compress raw measurements into statistical summaries. They’re used to describe data distributions, giving you insight into the shape, range, and spread of your data.

# Create A Histogram
latency_histogram = meter.create_histogram("response_latency", description="Response latency in ms")

# Record
latency_histogram.record(250, attributes={"endpoint": "/login"})

Use cases

  • Measuring latency across percentiles or ranges (e.g., 0–50 ms, 50–100 ms).
  • Counting the number of requests in various size categories.
  • Analyzing how much memory is being used in predefined ranges.
  • Understanding the distribution of bandwidth usage across devices or time increments.
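To illustrate the kind of analysis histograms enable, here’s a small pure-Python sketch (an illustration, not an SDK API) of estimating percentiles from bucket counts, which is roughly what an observability backend does with exported histogram data:

```python
# Sketch: estimating a percentile from histogram bucket counts.
boundaries = [50, 100, 250, 500]    # bucket upper bounds in ms; last bucket is >500
bucket_counts = [10, 70, 16, 3, 1]  # 100 recorded latencies in total

def estimate_percentile(p, boundaries, bucket_counts):
    """Return the upper bound of the bucket containing the p-th percentile."""
    target = p / 100 * sum(bucket_counts)
    cumulative = 0
    for bound, count in zip(boundaries, bucket_counts):
        cumulative += count
        if cumulative >= target:
            return bound
    return float("inf")  # the percentile falls in the overflow bucket

print(estimate_percentile(50, boundaries, bucket_counts))  # 100
print(estimate_percentile(95, boundaries, bucket_counts))  # 250
```

Note that the estimate is only as precise as the bucket boundaries, which is why choosing meaningful boundaries matters (see the guidance below on when not to use a histogram).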

In summary

  • Use counters for tracking cumulative totals (e.g., total requests).
  • Use UpDownCounters for tracking values that can increase and decrease (e.g., number of active users).
  • Use gauges for measuring real-time values (e.g., current CPU usage).
  • Use histograms for understanding distributions (e.g., request latencies).

When choosing metrics

Metrics work best for low-cardinality data or aggregated trends analysis. Choosing the right metric type depends on the data you want to capture. 

  • Don’t use counters for distributions or instantaneous values. 
  • Don’t use gauges for capturing values summed or distributed over time. 
  • Don’t use histograms when you can use a counter or average. 
  • If you can’t find meaningful bucket boundaries for the metric you are measuring, don’t default to using a histogram. 
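A quick way to reason about cardinality: each unique combination of attribute values produces its own time series, so the series count is the product of the distinct values per attribute. The numbers below are hypothetical:

```python
# Each unique attribute-value combination produces its own time series,
# so series count is the product of distinct values per attribute.
endpoints = 20     # e.g., distinct API routes
status_codes = 5   # e.g., grouped status classes
regions = 4

low_cardinality_series = endpoints * status_codes * regions
print(low_cardinality_series)  # 400 series: manageable

# Adding a per-user attribute multiplies the count by the user population:
user_ids = 100_000
high_cardinality_series = low_cardinality_series * user_ids
print(high_cardinality_series)  # 40000000 series: reach for logs or traces instead
```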

Metrics are only one part of observability: we recommend combining them with traces and logs.

How OpenTelemetry metrics work with Honeycomb

Honeycomb integrates seamlessly with OpenTelemetry, allowing you to export metrics for observability. Follow these steps to get started using OpenTelemetry metrics with Honeycomb:

  1. Install the OpenTelemetry SDK: Use the OTel SDK for the language runtime of your system.
  2. Configure the Collector: Install and configure the OpenTelemetry Collector to act as a pipeline for your telemetry data and add Honeycomb’s exporter to the Collector’s configuration.
  3. Instrument your code: Use the OpenTelemetry API to create and record metrics in your system.
  4. Visualize in Honeycomb: Once metrics are sent, use Honeycomb’s observability platform to visualize trends, detect anomalies, and learn more about your system.

For detailed instructions on sending data to Honeycomb, refer to Honeycomb’s OpenTelemetry documentation.
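As a sketch of step 2, a minimal Collector configuration pointing at Honeycomb’s OTLP endpoint might look like the following. The endpoint and header names follow Honeycomb’s documented OTLP pattern, but check Honeycomb’s documentation for your account: the dataset header requirement for metrics varies.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    endpoint: "api.honeycomb.io:443"
    headers:
      "x-honeycomb-team": "YOUR_API_KEY"
      # For metrics, a dataset header may also be required:
      "x-honeycomb-dataset": "YOUR_METRICS_DATASET"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```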

OpenTelemetry metrics best practices 

To get the most value out of your OpenTelemetry metrics, follow these best practices:

  • Structure your metrics: Adopt a dot-separated format like service.component.metric to indicate hierarchy (e.g., auth.api.request_duration), and include useful resource metadata to improve searchability.
  • Leverage aggregation: Aggregate at the source or Collector level to reduce the volume of raw metrics data.
  • Limit high-cardinality metrics: These are harder to search for and can overwhelm storage systems. We recommend using logs instead. 
  • Integrate with logs and traces: Combine OpenTelemetry metrics with logs and traces for more observability. 

For more tips, see our series on OpenTelemetry best practices.

Conclusion

OpenTelemetry metrics provide a mechanism for monitoring and understanding your applications and infrastructure. By leveraging instruments like counters, gauges, and histograms, you can gain insights into system performance and resolve issues proactively. 

When integrated with Honeycomb, OpenTelemetry metrics become even more valuable, enabling teams to visualize, analyze, and act on their telemetry data effectively. You can jumpstart your observability journey today by exploring Honeycomb’s OpenTelemetry resources and setting up metrics for your systems.

Rox Williams


Senior Content Marketing Manager

The First of Her Name, Mother of Pugs, Breaker of Leashes, Khaleesi of the Goth Office, Rox is the Senior Content Marketing Manager at Honeycomb. She comes to us with a background in DevOps writing. In her spare time, you can find her at a vegan restaurant with her partner, or adopting too many senior pugs.
