The OTTL Cookbook: Common Solutions to Data Transformation Problems


As our software complexity increases, so does our telemetry—and as our telemetry increases, it needs more and more tweaking en route to its final destination. You’ve likely needed to change an attribute, parse a log body, or touch up a metric before it landed in your backend of choice.

At Honeycomb, we think the OpenTelemetry Collector is the perfect tool to handle data transformation in flight. The Collector can receive data, process it, and then export it wherever it needs to go. If you’re unfamiliar with the Collector, you can quickly review its architecture.

To transform data in the Collector, there is no better solution than the Transform processor. This processor leverages the OpenTelemetry Transformation Language (OTTL) to express telemetry transformations. In this blog, we’ll share some real-life situations that we resolved with the power of OTTL.
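
If you haven’t wired up the transform processor before, here’s a minimal sketch of where it sits in a Collector configuration. The OTLP receiver, OTLP exporter, and endpoint below are placeholders; substitute whatever components your pipeline actually uses:

receivers:
  otlp:
    protocols:
      grpc:

processors:
  transform:
    # OTTL statements from the recipes below go here

exporters:
  otlp:
    endpoint: backend.example.com:4317 # placeholder endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform]
      exporters: [otlp]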

The scenarios, and how to solve them

Set an attribute if HTTP status code exists and does not equal 200

In this situation, the user wanted to set an attribute named otel.http.status_code to ERROR if the attribute http.response.status_code existed and did not equal 200. This is a pretty simple situation, but expressing it in a data pipeline isn’t always possible. Luckily, OTTL allows you to specify conditions that need to match before executing the transformation. In this scenario, the solution is:

transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements:
        - set(attributes["otel.http.status_code"], "ERROR") where attributes["http.response.status_code"] != nil && attributes["http.response.status_code"] != 200

Replace all . with _

Here, the user wanted to replace all the . characters in their attribute keys with _. While it sounds simple, modifying key names in a data pipeline normally requires you to know the name of the key you want to change so you can set a new attribute, with the new name, using the old value. Doing this renaming dynamically is harder, but OTTL provides a simple solution:

transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements:
        - replace_all_patterns(attributes, "key", "\\.", "_")

Formatting OTTL

Sometimes, OTTL statements can get long and would greatly benefit from some formatting. Since OTTL statements are strings, you can take advantage of YAML’s | string format:

transform:
  error_mode: ignore
  trace_statements:
    - context: resource
      statements:
        - |
          keep_keys(attributes,
            [
              "http.method",
              "http.route",
              "http.url"
            ]
          )

Combine attribute values

In this scenario, the user wanted to set an attribute using the value from another attribute, but only when a different attribute matches a regex pattern. There are a lot of requirements here, but OTTL lets you handle them all in one statement:

transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements:
        - set(attributes["index"], Concat(["audit", attributes["k8s.namespace.name"]], "-")) where IsMatch(attributes["k8s.container.name"], "audit.*")

Dropping metrics

OTTL conditions are useful in other processors as well, such as the Filter processor, where they determine when data should be dropped. In this scenario, the user wanted to drop an entire metric based on its name and whether the metric contained any datapoint with an attribute named rule_result set to pass. Here’s how we solved it:

filter:
  error_mode: ignore
  metrics:
    metric:
      - 'name == "specific-name" and HasAttrOnDatapoint("rule_result", "pass")'

Remove a span’s parent span id

In this scenario, the user had the unusual goal of removing the parent span ID from spans produced by a specific instrumentation. I don’t understand why this was necessary, since it’s a pretty risky transformation (you’d likely end up with multiple root spans in the same trace ID), but the transform processor is all about freedom:

transform:
  trace_statements:
    - context: span
      statements:
        - set(parent_span_id, SpanID(0x0000000000000000)) where instrumentation_scope.name == "my-instrumentation-scope"

Parse JSON body into attributes

This is one of the most common scenarios users ask about. They have a log body that is a JSON string and they want to move the fields into the log attributes. The solution is:

transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")
          - flatten(cache)
          - merge_maps(attributes, cache, "upsert")

Reuse a condition

OpenTelemetry data is structured, and there is a lot of value you can get from that structure. For example, spans all have a Span Kind, which describes the relationship between the span, its parents, and its children in a trace. In this scenario, the user wanted to rename an attribute when the span kind was SERVER. The Transform processor has a conditions option that lets you define conditions that apply to all statements in a group, which saves you from duplicating the same condition across multiple statements. Here’s the solution:

transform:
  error_mode: ignore
  trace_statements:
    - context: span
      # only run the statements for the span if the span passes this condition
      conditions:
        - kind == SPAN_KIND_SERVER    
      statements:
        - set(attributes["http.route.server"], attributes["http.route"])
        - delete_key(attributes, "http.route")

Set a resource attribute using a datapoint attribute

The user had Kubernetes IP information on the datapoint attributes that they wanted to move to the resource attributes so that the k8sattributes processor would work correctly. The Transform processor can do this (although there are some caveats, which have only been solved for logs) with a simple set statement:

transform:
    metric_statements:
    - context: datapoint
      statements:
      - set(resource.attributes["k8s.pod.ip"], attributes["k8s_pod_ip"])
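
For the k8sattributes processor to take advantage of the new resource attribute, the transform processor has to run before it in the pipeline. A hypothetical metrics pipeline ordering (the OTLP receiver and exporter are placeholders) could look like:

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [transform, k8sattributes]
      exporters: [otlp]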

Use resource attributes in a log condition

In this scenario, the user wanted to drop a log if the body contained “info” anywhere in the string and if the log came from a specific Kubernetes namespace and app. They had already associated the Kubernetes data with their logs via the k8sattributes processor, which meant they could use the Filter processor to drop the data. The solution was:

filter:
  error_mode: ignore
  logs:
    log_record:
      - IsMatch(body, ".*info.*") and resource.attributes["namespace"] == "my-system" and resource.attributes["app"] == "my-app"

Drop specific datapoints

Here, the user wanted to drop datapoints from a metric named sample if the datapoint had an attribute named test that does not equal fail. In the datapoint context, the Filter processor lets you access the metric name in the same way the log condition in the previous scenario could access resource attributes:

filter:
  error_mode: ignore
  metrics:
    datapoint:
      - metric.name == "sample" and attributes["test"] != "fail"

Index a map and slice

The user needed to access a value within a JSON map they had parsed. OTTL’s grammar allows you to index a map or slice, assuming the underlying datatype actually is a map or a slice. In this case, the solution was:

transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")
          - set(attributes["test"], "pass") where cache["body"] != nil and cache["body"]["keywords"] != nil and cache["body"]["keywords"][0] == "success"

Convert all resource attributes to attributes

The user wanted to move all their resource attributes to their log attributes. Unlike the resource-setting scenario before, there are no caveats when moving resource attributes “down” into a log record. The solution was:

transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(attributes, resource.attributes, "upsert")
      - context: resource
        statements:
          - delete_matching_keys(attributes, ".*")

Create a time_bucket attribute for slow and fast spans

The user wanted to use the duration of spans in another pipeline stage. OpenTelemetry spans include start and end timestamps in nanoseconds, but the user wanted to bucket that duration as fast, mid, or slow.

transform:
    trace_statements:
      - context: span
        statements:
          - set(attributes["tp_duration_ns"], end_time_unix_nano - start_time_unix_nano)
          - set(attributes["time_bucket"], "fast") where attributes["tp_duration_ns"] < 10000000000
          - set(attributes["time_bucket"], "mid") where attributes["tp_duration_ns"] >= 10000000000 and attributes["tp_duration_ns"] < 120000000000
          - set(attributes["time_bucket"], "slow") where attributes["tp_duration_ns"] >= 120000000000

OpenTelemetry + Honeycomb = ♥️

If you’d like to read more about the power of OpenTelemetry combined with Honeycomb, we have three great resources for you: 

Read our whitepaper: The Director’s Guide to Observability: Leveraging OpenTelemetry in Complex Systems

Download our guide: Honeycomb & OpenTelemetry for In-Depth Observability

Aaand another whitepaper: How OpenTelemetry and Semantic Telemetry Will Reshape Observability

Tyler Helmuth

Staff Software Engineer

Tyler is a software engineer with a passion for observability and helping users start their observability journey. He is an active contributor to OpenTelemetry, where he strives to make observability easy for all to achieve.
