Putting it all together: head and tail per-key target rate sampling
If you want head sampling to automatically instrument everything downstream, make sure you pass the head sampling decision and the corresponding rate from parent to child span (e.g. via an HTTP header). This forces sampling even when dynamic sampling in the downstream service's context would not have chosen to instrument the request.
var headCounts, tailCounts map[interface{}]int
var headSampleRates, tailSampleRates map[interface{}]float64

// Boilerplate main() and goroutine init to overwrite the maps and roll them
// over every interval goes here; checkSampleRate() etc. from above as well.

func handler(resp http.ResponseWriter, req *http.Request) {
	headSampleRate := -1.0
	r, err := floatFromHexBytes(req.Header.Get("Sampling-ID"))
	if err != nil {
		r = rand.Float64()
	}
	// Check whether we have a valid upstream sample rate (>= 1); if so, use it.
	if upstreamSampleRate, err := floatFromHexBytes(req.Header.Get("Upstream-Sample-Rate")); err == nil && upstreamSampleRate >= 1.0 {
		headSampleRate = upstreamSampleRate
	} else {
		headSampleRate = checkHeadSampleRate(req, headSampleRates, headCounts)
		if headSampleRate > 0 && r < 1.0/headSampleRate {
			// We'll sample this when recording the event below; propagate the
			// decision downstream as well.
		} else {
			// Clear headSampleRate, as this event didn't qualify for sampling.
			headSampleRate = -1.0
		}
	}
	start := time.Now()
	i, err := callAnotherService(r, headSampleRate)
	resp.Write(i)
	if headSampleRate > 0 {
		RecordEvent(req, headSampleRate, start, err)
	} else {
		// Same as for head sampling, except here we make a tail sampling
		// decision we can't propagate downstream.
		tailSampleRate := checkTailSampleRate(resp, start, err, tailSampleRates, tailCounts)
		if tailSampleRate > 0 && r < 1.0/tailSampleRate {
			RecordEvent(req, tailSampleRate, start, err)
		}
	}
}
That was complicated, but it's extremely powerful for capturing all the context we need to effectively debug our modern, high-throughput systems. There are even more interesting ways to combine head- and tail-based trace sampling, such as temporarily increasing the head sampling probability for a request's head sampling key when a tail heuristic sees an error in the response.
And, of course, collector-side buffered sampling allows deferring sampling decisions until after an entire trace has been buffered, bringing the advantages of head sampling to properties known at the tail.
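The shape of that collector-side idea can be sketched in a few lines: buffer spans per trace ID, then decide once the trace is complete. This toy version (the span and collector types are stand-ins, not Honeycomb's implementation) keeps any trace containing an error:

```go
package main

import "fmt"

// span is a minimal stand-in for a buffered trace event.
type span struct {
	TraceID string
	Err     bool
}

// collector buffers spans per trace ID and defers the sampling decision
// until the whole trace has arrived.
type collector struct {
	buffers map[string][]span
}

func newCollector() *collector {
	return &collector{buffers: make(map[string][]span)}
}

func (c *collector) Receive(s span) {
	c.buffers[s.TraceID] = append(c.buffers[s.TraceID], s)
}

// Flush makes a tail decision over the complete trace: keep the whole trace
// if any span errored. A real collector would also apply per-key target
// rates to the error-free traces rather than dropping them outright, and
// would flush on a timeout rather than on demand.
func (c *collector) Flush(traceID string) []span {
	spans := c.buffers[traceID]
	delete(c.buffers, traceID)
	for _, s := range spans {
		if s.Err {
			return spans // keep every span of an errored trace
		}
	}
	return nil
}

func main() {
	c := newCollector()
	c.Receive(span{TraceID: "t1", Err: false})
	c.Receive(span{TraceID: "t1", Err: true})
	c.Receive(span{TraceID: "t2", Err: false})
	fmt.Println(len(c.Flush("t1")), len(c.Flush("t2"))) // 2 0
}
```

The key property is that the decision sees every span in the trace at once, so a tail signal in one span (the error) can force sampling of sibling spans that looked unremarkable on their own.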
Conclusion
Hopefully this practical, iterative set of code examples has inspired you to get started with dynamic sampling in your own code. And if you're interested in overcoming the limitation of per-process sampling decisions and making tail-based sampling decisions over buffered execution traces, Honeycomb has an upcoming buffered sampling feature. Email solutions@honeycomb.io to request early access.
For more information, read the Honeycomb documentation on sampling, or look at our sample code in Go or JavaScript, and Travis-CI’s Ruby port! Our friends at Cribl have also written a post on dynamic sampling of log data, with no new code needed! Write to me at lizf@honeycomb.io if you have comments or questions!