Observability Instrumentation Dogfooding

How Honeycomb Uses Honeycomb, Part 6: Instrumenting a Production Service

By Ben Hartshorne | Last modified on January 11, 2019

This post continues our dogfooding series from How Honeycomb Uses Honeycomb, Part 5: The Correlations Are Not What They Seem.

In a recent blog post, I talked about what sorts of things should go into an event. That was a lovely and generic list. This post grounds it in our reality - I’m here to share with you how we configure libhoney-go and talk about the instrumentation we add in our own production services. The service I’m using as an example here is our API for receiving events from the wild. It’s an HTTP server, and for the most part, each incoming request sends a single event out to our dogfooding cluster.

The godoc pages for libhoney have more details on anything specific to libhoney that I gloss over in this post (eg builder vs. event, dynamic fields, etc.).

Initialization

There are a few pieces of information we want to add to every event we send. They fall in to two broad categories

information about the host / environment running the service
application-agnostic measurements that are always useful

Here they are!

// Add these fields to the top-level `libhoney` to automatically include
// Add these f // in all emitted events
libhoney.AddField("server_hostname", hostname)
libhoney.AddField("build_id", BuildID)
libhoney.AddField("env", os.Getenv("ENV"))
libhoney.AddField("role", os.Getenv("ROLE"))
libhoney.AddField("instance_type", os.Getenv("INSTANCE_TYPE"))
libhoney.AddField("availability_zone", os.Getenv("AZ"))
// Attaching a function as a dynamic field means it'll be
// evaluated on each event.Send()
libhoney.AddDynamicField("num_goroutines",
    func() interface{} { return runtime.NumGoroutine() })
getAlloc := func() interface{} {
    var mem runtime.MemStats
    runtime.ReadMemStats(&mem)
    return mem.Alloc
}
libhoney.AddDynamicField("memory_inuse", getAlloc)

startTime := time.Now()
libhoney.AddDynamicField("process_uptime_sec", func() interface{} {
    return time.Now().Sub(startTime) / time.Second
})

Wrapping a request

As is the Go way, our service is a web server because it implements the ServeHTTP method. Here’s the beginning of that function definition. The things to note:

we create a libhoney builder that will hold all the fields instrumented for this request (more about why a builder at the end)

ww  :=:=  apiapi..NewResponseWrapperNewResponseWrapper((writerwriter))
 builderbuilder  :=:=  libhoneylibhoney..NewBuilderNewBuilder()()
 hnyWrapperhnyWrapper  :=:=  &&libhoneyWrapperlibhoneyWrapper{{builderbuilder::  builderbuilder}}

we put a builder in the context and use that to give all request handlers the ability to add fields

ctx := context.WithValue(request.Context(), CtxKeyLibhoney, hnyWrapper)
r := request.WithContext(ctx)

we dispatch the event to Honeycomb in a defer so that it always gets sent when the function finishes

defer func() {
    ...
    evt.Send()
}()

within that defer, we do dynamic sampling based on three fields to drop sections of the most frequent traffic while making sure to send representative samples of all our important requests (we want to see errors and infrequent customers as much as understanding the volume of more frequent successful traffic). We use the dynsampler package for sampling logic

if a.DynamicSampler != nil {
    key := fmt.Sprintf("%s,%s,%v,%v", r.Method, r.URL.Path,
        hnyWrapper.datasetID, w.Status())
    evt.AddField("dynsample_key", key)
    rate := a.DynamicSampler.GetSampleRate(key)
    evt.AddField("dynsample_rate", rate)
    evt.SampleRate = uint(rate)
}

finally we set up a timer for the whole request and then go on to do the work of actually handling everything

defer trackTimer(hnyWrapper, "request", time.Now())
// go on and actually handle the request

Here it is all together: https://gist.github.com/maplebed/d12e7f3f773f506250a6129a07d365a7

Helper functions

There are a few things we do to streamline our use of libhoney. The library in its native form is fine for general use, but a common pattern in Go is to wrap 3rd party integrations with an interface that only exposes the functionality you use. Despite being the authors of the library, we do this for a few reasons. It makes it easier to manage builders and events, easier to mock it for tests, and serves as a convenient carrier for a few fields we also want to send to other instrumentation targets (eg Sentry for exceptions).

For most of the code, the only action necessary is adding a field to an event. We wrote the libhoneyWrapper above to abstract out that use, as well as keep a few extra fields locally that we reuse for other targets (eg sending stack traces to Sentry):

type eventWrapper interface {
    AddField(string, interface{})
}

type libhoneyWrapper struct {
    requestID string
    datasetID int
    teamID    int
    builder   *libhoney.Builder
}

Getting the wrapper out of the context is straight forward, but repeated everywhere. It gets its own function to make sure that even without the wrapper in the context, code doesn’t crash:

func GetLibhoney(ctx context.Context) *libhoneyWrapper {
    if hny, ok := ctx.Value(CtxKeyLibhoney).(*libhoneyWrapper); ok {
        return hny
    }
    // returning placeholder builder to avoid panic
    return &libhoneyWrapper{builder: libhoney.NewBuilder()}
}

The other common task we have is wrapping some code in a timer. To centralize the math and the pattern for field names, we put that in its own function:

func trackTimer(evt eventWrapper, label string, startTime time.Time) {
    durMs := float64(time.Since(startTime)) / float64(time.Millisecond)
    evt.AddField(label+"_dur_ms", durMs)
}

Use throughout the code

Within any function down the tree handling the request, the pattern to follow to add additional information to the event is simple - grab the wrapper out of the request context and add another piece of data. (We use the defer again here, allowing us to define code up front to be executed when the enclosing function returns.)

func (a *App) getMarkers(r *http.Request, w http.ResponseWriter) error {
    hnyWrapper := GetLibhoney(r.Context())
    defer trackTimer(hnyWrapper, "get_markers", time.Now())
    // go on and actually get the markers, adding more fields as needed
    id := a.getIDFromMarker(r)
    hnyWrapper.AddField("marker_id", id)
    // etc.

Catching errors

Whenever we come across the standard foo, err := ... pattern with 3rd party libraries, the err itself as well as a human description of the error get added to the event. We do this to make clear where the error is coming from and to make it easier to create a Break Down in the Honeycomb query builder. There are two ways that the error message itself can make breakdowns and analysis difficult:

it is too vague (eg permission denied) and says nothing about the context
it is too specific (eg invalid character '}' at position 34) and would create more groupings than desired

gzReader, err := gzip.NewReader(&buf)
if err != nil {
    evt.AddField("error_desc", "failed creating gzip reader")
    evt.AddField("error", err.Error())
    return nil, err
}

Why a builder?

A question comes up, given what we’ve talked about so far. Why did we put a builder in the context instead of an event? It gives us a little more flexibility, but practically, for our API server, there’s one reason why we made this choice. When someone POSTs to /1/event, that single HTTP request is sending in one event, and our API server generates one event to our dogfooding cluster. However, when they POST to /1/batch, that batch represents the transmission of many events to our API server, and our API server in turn creates many events to send in to our dogfooding cluster.

By using a builder, all the attributes set while parsing the batch envelope such as user_agent, write key, etc.) are carried through to the events sent representing the contents of the batch. Builders are an excellent tool to use when there are a number of common attributes you want to set that relate to the generation of multiple events downstream.

Conclusion

There are many ways to use libhoney. How it will work best for you will be influenced by your team, the frameworks you use, and so on. This is the way we do it now, though we’re still learning! I’d love to hear how you use it—please write in and tell us your story. And if you don’t yet, sign up and give Honeycomb a try!

This is another installation in a series of dogfooding posts showing how and why we think Honeycomb is the future of systems observability. Stay tuned for more!

OpenTelemetry Observability

How Honeycomb Uses Honeycomb, Part 6: Instrumenting a Production Service

Initialization

Wrapping a request

Helper functions

Use throughout the code

Catching errors

Why a builder?

Conclusion

Related Posts

Real User Monitoring With a Splash of OpenTelemetry

Transforming to an Engineering Culture of Curiosity With a Modern Observability 2.0 Solution

Honeycomb + Google Gemini

Ready to get started?