Dynamic Sampling by Example

[Figure: query count and sampling chance plotted]


Putting it all together: head and tail per-key target rate sampling

If you want head sampling to automatically instrument everything downstream, make sure you pass the head sampling decision and the corresponding rate from parent to child span (e.g. via an HTTP header), forcing sampling even when dynamic sampling in the child's own context would not have chosen to instrument the request. A sketch of the propagating side follows the handler below.

var headCounts, tailCounts map[interface{}]int
var headSampleRates, tailSampleRates map[interface{}]float64

// Boilerplate main() and goroutine init to overwrite the maps and roll them over
// every interval goes here, along with checkHeadSampleRate(), checkTailSampleRate(),
// floatFromHexBytes(), etc. from above.

func handler(resp http.ResponseWriter, req *http.Request) {
	// Reuse the upstream sampling ID if one was propagated, so every tier makes
	// the same random choice; otherwise roll our own.
	r, err := floatFromHexBytes(req.Header.Get("Sampling-ID"))
	if err != nil {
		r = rand.Float64()
	}

	headSampleRate := -1.0
	// Check if we have a valid upstream sample rate (at least 1); if so, honor it.
	if upstreamSampleRate, err := floatFromHexBytes(req.Header.Get("Upstream-Sample-Rate")); err == nil && upstreamSampleRate >= 1.0 {
		headSampleRate = upstreamSampleRate
	} else {
		headSampleRate = checkHeadSampleRate(req, headSampleRates, headCounts)
		if headSampleRate > 0 && r < 1.0/headSampleRate {
			// We'll sample this when recording the event below; propagate the decision downstream though.
		} else {
			// Clear out headSampleRate, as this event didn't qualify for head sampling.
			headSampleRate = -1.0
		}
	}

	start := time.Now()
	i, err := callAnotherService(r, headSampleRate)
	resp.Write(i)

	if headSampleRate > 0 {
		RecordEvent(req, headSampleRate, start, err)
	} else {
		// Same as for head sampling, except here we make a tail sampling decision
		// that we can't propagate downstream.
		tailSampleRate := checkTailSampleRate(resp, start, err, tailSampleRates, tailCounts)
		if tailSampleRate > 0 && r < 1.0/tailSampleRate {
			RecordEvent(req, tailSampleRate, start, err)
		}
	}
}
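
The callAnotherService() used above is where the propagation happens. Here's a minimal sketch of what it might look like, assuming a floatToHexBytes() helper that is the hex-encoding counterpart of floatFromHexBytes(), and a placeholder downstream URL:

// Forward the sampling ID and, when head sampling chose this request, the
// sample rate, so the downstream handler can honor the same decision.
func callAnotherService(r float64, headSampleRate float64) ([]byte, error) {
	req, err := http.NewRequest("GET", "http://downstream.internal/work", nil)
	if err != nil {
		return nil, err
	}
	// Always pass the sampling ID so every tier rolls the same dice.
	req.Header.Set("Sampling-ID", floatToHexBytes(r))
	// Only pass a rate if head sampling chose this request; that forces the
	// downstream handler to record its span at the same rate.
	if headSampleRate > 0 {
		req.Header.Set("Upstream-Sample-Rate", floatToHexBytes(headSampleRate))
	}
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	return ioutil.ReadAll(res.Body)
}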

That was complicated, but it's extremely powerful for capturing all the context we need to effectively debug our modern, high-throughput systems. There are even more interesting ways to combine head- and tail-based trace sampling, such as temporarily increasing the probability of head sampling for a request's head sampling key when a tail heuristic sees an error in the response.
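
One hypothetical way to wire that up: a small helper that cuts a key's target rate in the shared headSampleRates map whenever the tail path sees a 5xx, making that key more likely to be head sampled until the maps roll over at the next interval (real code would need the same locking the rollover goroutine uses):

func boostHeadSamplingOnError(key interface{}, status int) {
	if status < 500 {
		return
	}
	if rate, ok := headSampleRates[key]; ok && rate > 2 {
		// Halving the target rate doubles the chance of head sampling this key
		// until the maps are overwritten at the next interval.
		headSampleRates[key] = rate / 2
	} else {
		// Keep everything for this key for now.
		headSampleRates[key] = 1
	}
}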

And, of course, collector-side buffered sampling allows deferring sampling decisions until after an entire trace has been buffered, bringing the advantages of head sampling to properties known at the tail.
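
As a simplified illustration of the idea (not how Honeycomb's feature is implemented), a collector could buffer spans by trace ID and only apply a tail rule such as "always keep traces containing an error" once the whole trace has arrived; RecordSampledSpan() here is a hypothetical emit function:

type Span struct {
	TraceID string
	IsRoot  bool
	Error   bool
}

var buffered = map[string][]Span{}

func collect(s Span) {
	buffered[s.TraceID] = append(buffered[s.TraceID], s)
	// In practice a timer would also flush traces whose root span never arrives.
	if s.IsRoot {
		decide(s.TraceID)
	}
}

func decide(traceID string) {
	spans := buffered[traceID]
	delete(buffered, traceID)
	keep := false
	for _, s := range spans {
		if s.Error {
			// Tail heuristic: always keep traces containing an error.
			keep = true
		}
	}
	if keep {
		for _, s := range spans {
			RecordSampledSpan(s, 1) // kept deterministically
		}
	} else if rand.Float64() < 0.01 {
		for _, s := range spans {
			RecordSampledSpan(s, 100) // kept at a baseline 1-in-100
		}
	}
}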

Conclusion

Hopefully this practical, iterative set of code examples has inspired you to get started with dynamic sampling in your own code. And if you're interested in overcoming the limitation of per-process sampling decisions and making tail-based sampling decisions based on buffered execution traces, Honeycomb has an upcoming buffered sampling feature. Email solutions@honeycomb.io to request early access.

For more information, read the Honeycomb documentation on sampling, or look at our sample code in Go or JavaScript, and Travis-CI’s Ruby port! Our friends at Cribl have also written a post on dynamic sampling of log data, with no new code needed! Write to me at lizf@honeycomb.io if you have comments or questions!

Liz Fong-Jones

Field CTO

Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with over two decades of experience. She is currently the Field CTO at Honeycomb, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights.
