As our software complexity increases, so does our telemetry—and as our telemetry increases, it needs more and more tweaking en route to its final destination. You’ve likely needed to change an attribute, parse a log body, or touch up a metric before it landed in your backend of choice.
At Honeycomb, we think the OpenTelemetry Collector is the perfect tool to handle data transformation in flight. The Collector can receive data, process it, and then export it wherever it needs to go. If you’re unfamiliar with the Collector, you can quickly review its architecture.
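As a rough sketch of that flow, here’s a minimal (hypothetical) Collector configuration that wires a receiver, the Transform processor, and an exporter into a traces pipeline. The component choices and the endpoint are placeholders, not recommendations:

receivers:
  otlp:
    protocols:
      grpc:

processors:
  transform:
    # OTTL statements go here (see the scenarios below)

exporters:
  otlp:
    endpoint: api.example.com:4317  # placeholder backend endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform]
      exporters: [otlp]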
To transform data in the Collector, there is no better solution than the Transform processor. This processor leverages the OpenTelemetry Transformation Language (OTTL) to express telemetry transformations. In this blog, we’ll share some real-life situations that we resolved with the power of OTTL.
The scenarios, and how to solve them
Set an attribute if HTTP status code exists and does not equal 200
In this situation, the user wanted to set an attribute named otel.http.status_code to ERROR if the attribute http.response.status_code existed and did not equal 200. This is a pretty simple situation, but expressing it in a data pipeline isn’t always possible. Luckily, OTTL allows you to specify conditions that need to match before executing the transformation. In this scenario, the solution is:
transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements:
        - set(attributes["otel.http.status_code"], "ERROR") where attributes["http.response.status_code"] != nil && attributes["http.response.status_code"] != 200
Replace all . with _
Here, the user wanted to replace all the . characters in their attribute keys with _. While it sounds simple, modifying key names in a data pipeline normally requires you to know the name of the key you want to change so you can set a new attribute, with the new name, using the old value. Doing this renaming dynamically is harder, but OTTL provides a simple solution:
transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements:
        - replace_all_patterns(attributes, "key", "\\.", "_")
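A note on the arguments: the second parameter selects whether the pattern applies to the map’s keys or its values, so passing "value" instead of "key" would rewrite the attribute values and leave the key names alone.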
Formatting OTTL
Sometimes, OTTL statements can get long and would greatly benefit from some formatting. Since OTTL statements are strings, you can take advantage of YAML’s | string format:
transform:
  error_mode: ignore
  trace_statements:
    - context: resource
      statements:
        - |
          keep_keys(attributes,
            [
              "http.method",
              "http.route",
              "http.url"
            ]
          )
Combine attribute values
In this scenario, the user wanted to set an attribute using the value from another attribute, but only when a different attribute matches a regex pattern. There are a lot of requirements here, but OTTL lets you handle them all in one statement:
transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements:
        - set(attributes["index"], Concat(["audit", attributes["k8s.namespace.name"]], "-")) where IsMatch(attributes["k8s.container.name"], "audit.*")
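To make that concrete with a made-up value: for a span where k8s.container.name is audit-logger and k8s.namespace.name is payments, this sets index to audit-payments. Spans whose container name doesn’t match the audit.* pattern are left untouched.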
Dropping metrics
OTTL conditions are useful in other processors as well, such as the Filter processor, where they determine when data should be dropped. In this scenario, the user wanted to drop an entire metric based on its name and whether the metric contained any datapoint with the attribute rule_result set to pass. Here’s how we solved it:
filter:
  error_mode: ignore
  metrics:
    metric:
      - 'name == "specific-name" and HasAttrOnDatapoint("rule_result", "pass")'
Remove a span’s parent span ID
In this scenario, the user had the unusual goal of removing the parent span ID from spans that came from a specific instrumentation. We don’t fully understand why this was necessary, since it’s a pretty risky transformation (you’d likely end up with multiple root spans in the same trace ID), but the Transform processor is all about freedom:
transform:
  trace_statements:
    - context: span
      statements:
        - set(parent_span_id, SpanID(0x0000000000000000)) where instrumentation_scope.name == "my-instrumentation-scope"
Parse JSON body into attributes
This is one of the most common scenarios users ask about. They have a log body that is a JSON string and they want to move the fields into the log attributes. The solution is:
transform:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")
        - flatten(cache)
        - merge_maps(attributes, cache, "upsert")
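To illustrate with a hypothetical payload: a log whose body is the JSON string {"level": "info", "user": {"id": 42}} would come out with the attributes level set to info and user.id set to 42, since flatten collapses the nested map into dot-delimited keys.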
Reuse a condition
OpenTelemetry data is structured, and there is a lot of value you can get from that structure. For example, every span has a span kind, which describes the relationship between the span, its parents, and its children in a trace. In this scenario, the user wanted to rename an attribute when the span kind was SERVER. The Transform processor has a conditions option that lets you define conditions that apply to all statements in a group, which saves you from duplicating the same condition across multiple statements. Here’s the solution:
transform:
  error_mode: ignore
  trace_statements:
    - context: span
      # only run the statements for the span if the span passes this condition
      conditions:
        - kind == SPAN_KIND_SERVER
      statements:
        - set(attributes["http.route.server"], attributes["http.route"])
        - delete_key(attributes, "http.route")
Set a resource attribute using a datapoint attribute
The user had Kubernetes IP information on the datapoint attributes that they wanted to move to the resource attributes so that the k8sattributes processor would work correctly. The Transform processor is able to do this (although there are some caveats that have only been solved for logs) using a simple set command:
transform:
  metric_statements:
    - context: datapoint
      statements:
        - set(resource.attributes["k8s.pod.ip"], attributes["k8s_pod_ip"])
Use resource attributes in a log condition
In this scenario, the user wanted to drop a log if the body contained “info” anywhere in the string and if the log was from a specific Kubernetes namespace and app. They already associated the Kubernetes data to their log via the k8sattributes processor, which meant they could use the Filter processor to drop the data. The solution was:
filter:
  error_mode: ignore
  logs:
    log_record:
      - IsMatch(body, ".*info.*") and resource.attributes["namespace"] == "my-system" and resource.attributes["app"] == "my-app"
Drop specific datapoints
Here, the user wanted to drop datapoints from a metric named sample if the datapoint had an attribute named test whose value did not equal fail. The Filter processor allows you to access the metric name in the same way the log statement in the previous scenario could access the resource attributes:
filter:
  error_mode: ignore
  metrics:
    datapoint:
      - metric.name == "sample" and attributes["test"] != "fail"
Index a map and slice
The user needed to access a value within a JSON map they had parsed. OTTL’s grammar allows you to index a map or slice, assuming the underlying datatype actually is a map or a slice. In this case, the solution was:
transform:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")
        - set(attributes["test"], "pass") where cache["body"] != nil and cache["body"]["keywords"] != nil and cache["body"]["keywords"][0] == "success"
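The nil guards are what keep this condition safe: indexing a key that doesn’t exist returns nil, so the checks let the condition fall through cleanly on logs that don’t carry the expected structure.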
Convert all resource attributes to attributes
The user wanted to move all their resource attributes to their log attributes. Unlike the resource-setting scenario above, there are no caveats when moving resource attributes “down” onto individual log records. The solution was:
transform:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        - merge_maps(attributes, resource.attributes, "upsert")
    - context: resource
      statements:
        - delete_matching_keys(attributes, ".*")
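Order matters in this config: the log-context statements run first, so every log record gets its copy of the resource attributes before the resource-context statement wipes them out.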
Create a time_bucket attribute for slow and fast spans
The user wanted to use a span’s duration in another pipeline stage. A span’s fields include its start and end timestamps in nanoseconds, but the goal was to classify each span’s speed as fast, mid, or slow.
transform:
  trace_statements:
    - context: span
      statements:
        - set(attributes["tp_duration_ns"], end_time_unix_nano - start_time_unix_nano)
        - set(attributes["time_bucket"], "fast") where attributes["tp_duration_ns"] < 10000000000
        - set(attributes["time_bucket"], "mid") where attributes["tp_duration_ns"] >= 10000000000 and attributes["tp_duration_ns"] < 120000000000
        - set(attributes["time_bucket"], "slow") where attributes["tp_duration_ns"] >= 120000000000
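Since span timestamps are in nanoseconds, those thresholds work out to under 10 seconds for fast, 10 to 120 seconds for mid, and over 120 seconds for slow; adjust the cutoffs to whatever makes sense for your workloads.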
OpenTelemetry + Honeycomb = ♥️
If you’d like to read more about the power of OpenTelemetry combined with Honeycomb, we have three great resources for you:
Read our whitepaper: The Director’s Guide to Observability: Leveraging OpenTelemetry in Complex Systems
Download our guide: Honeycomb & OpenTelemetry for In-Depth Observability
Aaand another whitepaper: How OpenTelemetry and Semantic Telemetry Will Reshape Observability