Ask Miss O11y: Long-Running Requests

By: Liz Fong-Jones | January 20th, 2022

Ask Miss O11y

2 Min. Read

Dear Miss O11y,

How do I think about instrumenting and setting service-level objectives (SLOs) on streaming RPC workloads with long-lived connections? We won’t necessarily have a “success” metric per stream to make a percentage with and a very successful stream could last for days before finishing. What about if the workload contains many individual messages that can independently succeed or fail? How can I adopt observability and SLOs to continuous workloads?

– Stuck on Streams (Nick Herring and Jake Demarest-Mays)

Dear Stuck,

You need not fear a long-lived streaming workload. A few simple tricks can transform a request that may not ever terminate for hours or days into something you can get regular health and status updates on. We in fact have one of those continuous processing services—Beagle, our Service Level Objective stream processor—which we’ve instrumented in this fashion.

Beagle’s instrumentation consists of one root span per stream started per minute (“tick”) with a span duration of 60 seconds. Any outbound requests that start during that window are parented to the tick root span. And when the tick’s duration ends, it also rolls up the number of successful vs. failed writes in total, the amount by which it’s delayed vs. the latest Kafka offset, and so forth into attributes on its root span. One tick is generated per stream per minute, so we’re always able to follow the behavior of each stream of data across multiple minutes by aggregating by the stream ID.

In that fashion, we are able to get periodic minute-long, metric-like data from the root span, along with data from functional traces that don’t sprawl across hours or days accumulating tens of thousands of spans, as well as lower-level data that can be used to feed our own SLOs (it’s Beagles all the way down!).

So my advice to you is to set a maximum duration before you roll over and start a new span, have a connection/stream ID to tie the data together, hybridize your root span into being a metrics rollup, and then parent any work that is initiated during that time window to the currently active tick span. You can then set SLOs on the number of succeeding/failing connections in each minute, or even down to the per-sub-request level success, without having to wait for the stream to finish to report its status.

Hope that helps!

Don’t forget to share!

Liz Fong-Jones

Field CTO

Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with over two decades of experience. She is currently the Field CTO at Honeycomb, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights.

Martin Thwaites | May 09, 2023

Ask Miss O11y: To Metric or to Trace?

Dear Miss O11y, I remember reading quite interesting opinions from you about usage of metrics and traces in an application. Did you elaborate on those points in a blog post somewhere, so I can read your arguments to forge some advice for myself? I must admit that I was quite puzzled by your stance regarding the (un)usefulness of metrics compared to traces in apps in some contexts (debugging).

Ask Miss O11y Metrics Tracing

Ask Miss O11y: Is There a Beginner’s Guide to Adding Observability to Your Applications?

Martin Thwaites | Mar 22, 2023

Ask Miss O11y: Is There a Beginner’s Guide On How to Add Observability to Your Applications?

Dear Miss O11y, I want to make my microservices more observable. Currently, I only have logs. I’ll add metrics soon, but I’m not really sure if there is a set path you follow. Is a guide of some sort, or best practice, like you have to have x kinds of metrics? I just want to know what all possibilities are out there. I am very new to this space.

Ask Miss O11y Observability

Jessica Kerr | Mar 09, 2023

Ask Miss O11y: Error: missing ‘x-honeycomb-dataset’ header

Your API Key (in the x-honeycomb-team header) tells Honeycomb where to put your data. It specifies a team and an environment. Then, Honeycomb figures out which dataset to put each event in, based on the service.name field in the event. Except…

Ask Miss O11y

All-in-one Observability

Why Honeycomb

Looking for something?

Our mission

Ask Miss O11y: Long-Running Requests

Liz Fong-Jones

Related posts

Ask Miss O11y: To Metric or to Trace?

Ask Miss O11y: Is There a Beginner’s Guide On How to Add Observability to Your Applications?

Ask Miss O11y: Error: missing ‘x-honeycomb-dataset’ header

Ready to get started?