Iterating on an OpenTelemetry Collector Deployment in Kubernetes


When you want to direct your observability data in a uniform fashion, run an OpenTelemetry collector. If you have a Kubernetes cluster handy, that’s a useful place to run it. Helm is a quick way to get it running in Kubernetes; it encapsulates all the YAML object definitions you need. OpenTelemetry publishes a Helm chart for the collector.

When you install the OpenTelemetry collector with Helm, you’ll give it some configuration: override the defaults and add what you specifically need. To get your configuration right, play around in a test Kubernetes cluster until the collector works the way you want.

This post details how to start with a default installation of the OpenTelemetry Collector and iterate on the configuration until it works the way you want it to.

Prerequisites: a Kubernetes cluster to experiment in, kubectl configured to talk to it, and Helm installed.
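A quick way to confirm those are in place: each of these commands should succeed, and kubectl get nodes should show the cluster you intend to experiment in.

helm version
kubectl version
kubectl get nodes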

Do this once: get the Helm chart working

Get access to the official Helm chart (as instructed in the README):

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

You need a name for the installation. I’m using “collectron.” It needs to be a lowercase RFC 1123 subdomain, so stick with lowercase letters, numbers, and dashes. If the name doesn’t include “opentelemetry-collector,” the chart will append that for you.

Now, try installing for the first time:

helm install collectron open-telemetry/opentelemetry-collector

If you see an error like this: 

Error: execution error at (opentelemetry-collector/templates/NOTES.txt:14:3): [ERROR] 'mode' must be set. See https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/UPGRADING.md for instructions.

That’s a good sign. We need to give it some configuration.

Create a file to contain the chart configuration. I’m going to call it `values.yaml`.

In it, put `mode: deployment` (if you’re aiming to spin up a collector for front-end traces) or `mode: daemonset` (if the collector is going to process traces from other applications running in Kubernetes).

values.yaml:

mode: deployment

Having tried once, and having created a values.yaml for configuration, commence iteration.

Iterate on the Helm chart

Change values.yaml and save the file.

Update the installation:

helm upgrade collectron open-telemetry/opentelemetry-collector --values values.yaml

Check that exactly one collector pod is running:

kubectl get pods

Tail its log: 

kubectl get pods -o name | grep collectron | sed 's#pod/##' | xargs kubectl logs -f

Send a test span. (If you’re not sure how to do that, there’s an example further down in this post.)

More explanation

Each time you change values.yaml, update the installation like this:

helm upgrade collectron open-telemetry/opentelemetry-collector --values values.yaml

Successful output looks like:

Release "collectron" has been upgraded. Happy Helming!
NAME: collectron
LAST DEPLOYED: Fri Jul  8 13:16:07 2022
NAMESPACE: default
STATUS: deployed
REVISION: 19
TEST SUITE: None
NOTES:

Check that the collector is running

We expect Kubernetes to run a pod with a name that starts with the installation name, collectron. The Helm chart appends “opentelemetry-collector” if your name doesn’t already contain this.

Check what’s running with:

kubectl get pods

I see this line:

NAME                                                  READY   STATUS    RESTARTS   AGE
collectron-opentelemetry-collector-766b88bbf8-gr482   1/1     Running   0          2m18s

See the pod

Check that there is exactly one of them. Check the last column (AGE) to see whether this one started up after your last helm upgrade.
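If you’d rather watch the old pod terminate and the new one start up, kubectl can follow changes as they happen:

kubectl get pods -w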

Troubleshooting: My pod didn’t restart after the upgrade.

If your upgrade did not modify the collector’s pod spec, then it didn’t need to restart the pod. For instance, changing the service type (setting `service.type` to `LoadBalancer` in values.yaml) doesn’t need a pod restart.

For everything else: check the output of `helm upgrade`. Maybe there is an error message.
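If you want to force a fresh pod anyway, a rollout restart does it. This assumes mode: deployment and the default naming, so adjust the deployment name to match your installation:

kubectl rollout restart deployment collectron-opentelemetry-collector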

See the pod status

Check that the status is “Running.”

Troubleshooting: My pod stays in Pending status forever.

Try:

kubectl describe pod <pod name>

This prints a lot more information, including why the pod is still pending. In my case, the output included:

Warning FailedScheduling 16s (x105 over 105m) default-scheduler 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory

All my nodes were full, so I added another one. Poof! Pod started!
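If you want to see how full your nodes are before (or instead of) adding one, check what’s already allocated. The second command needs the metrics-server addon installed in your cluster:

kubectl describe nodes | grep -A 8 "Allocated resources"
kubectl top nodes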

Troubleshooting: My pod status is CrashLoopBackOff.

Something’s going wrong. Use kubectl logs to find out what.
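If the container has already restarted, the logs of the previous attempt usually hold the actual error:

kubectl logs <pod name> --previous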

Check on open ports

By default, a lot of ports are open on the collector container. If extra ports are open, that can confuse health checks and stop a load balancer from seeing your collector pods. You can check on this with:

kubectl describe pod <pod name>

Here’s a one-liner that will list the open ports with their names:

kubectl get pods -o name | grep opentelemetry-collector | sed 's#pod/##' | xargs kubectl get pod -o jsonpath='{range .spec.containers[*].ports[*]}{.containerPort}{"\t"}{.name}{"\n"}{end}'
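If you want to poke at one of those ports directly, port-forward it to your machine. For example, the chart’s default collector config (at the time of writing) includes a health_check extension on port 13133; verify that against your own rendered config before relying on it. Run the curl in a second terminal while the port-forward is up:

kubectl port-forward <pod name> 13133:13133
curl http://localhost:13133/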

Look at the collector’s logs

The full name of the pod lets you request its logs. Copy that from the output of kubectl get pods and then pass it to kubectl logs (your pod name will be different):

kubectl logs collectron-opentelemetry-collector-766b88bbf8-gr482

Here’s a one-liner that you can reuse even after the full name of the pod changes:

kubectl get pods -o name | grep opentelemetry-collector | sed 's#pod/##' | xargs kubectl logs

Hurray, logs! Now we have a feedback loop. 

If the startup logs look OK, try sending a test span.
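One low-tech way to send one, if you don’t have an instrumented app handy: port-forward the collector’s OTLP/HTTP port and post a tiny trace with curl. This is a sketch that assumes the default otlp receiver is still enabled on port 4318 and the default deployment name; the trace ID, span ID, and timestamps are made-up example values.

kubectl port-forward deployment/collectron-opentelemetry-collector 4318:4318

Then, in another terminal:

curl -i http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{
    "resourceSpans": [{
      "resource": {
        "attributes": [{ "key": "service.name", "value": { "stringValue": "curl-test" } }]
      },
      "scopeSpans": [{
        "spans": [{
          "traceId": "71699b6fe85982c7c8995ea3d9c95df2",
          "spanId": "3c191d03fa8be065",
          "name": "test-span",
          "kind": 1,
          "startTimeUnixNano": "1657300000000000000",
          "endTimeUnixNano": "1657300001000000000"
        }]
      }]
    }]
  }'

A 200 response means the collector accepted the span; it should then show up wherever your traces pipeline exports to.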

Is the collector doing what you want? If not, change values.yaml and repeat.

What to change next?

All your options are listed in the chart repository’s values.yaml.
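You can also dump those defaults straight to your terminal instead of browsing the repository:

helm show values open-telemetry/opentelemetry-collector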

You want a `config:` section in values.yaml. The Helm chart will take what you put here, combine it with its defaults, and produce the collector’s configuration file. You’ll definitely want to define some pipelines and exporters. The chart’s README has some examples.
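Here’s a sketch of the shape, assuming you’re exporting traces over OTLP; the endpoint, header name, and API key are placeholders for whatever your backend actually requires:

mode: deployment
config:
  exporters:
    otlp:
      endpoint: "otlp.your-backend.example:443"
      headers:
        "x-your-backend-api-key": "YOUR_API_KEY"
  service:
    pipelines:
      traces:
        exporters: [otlp]

Because the chart merges this with its defaults, the default receivers stay in place, and listing only otlp here should replace the default exporter list for the traces pipeline.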

Note that when you turn off receivers you don’t need, you’ll also want to close the ports on the container. For instance, at the top level of values.yaml:

ports:
    jaeger-compact:
        enabled: false
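After the next helm upgrade, rerun the port-listing one-liner from earlier to confirm that jaeger-compact no longer shows up on the container.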

For examples of collectors that process metrics, there are some docs over at Lightstep.

For a full example, here’s the configuration I use to send traces from my client-side apps to Honeycomb. 

Get started today

If you’re interested in what Honeycomb has to offer, create a free account today. You get state-of-the-art observability on up to 20 million events (spans) per month—a very useful free tier! 

If you want to tell me about your particular experience or if you need more help with this tutorial, sign up for office hours. I’ll happily spend some 1:1 time to hear about your experience and walk you through the OpenTelemetry collector.


Jessica Kerr

Manager, Developer Relations

Jess is a symmathecist, in the medium of code. She sees development teams as learning systems made of people and running software. If we make that software teach us what’s happening, it’s a better teammate. And if this process makes us into systems thinkers, we can be better persons in the world.
