Recording the sample rate
What if we need to change the flagged value at some point in the future? The instrumentation collector wouldn't know exactly when the value changed. Thus, it's better to explicitly pass the current sampleRate when sending a sampled event, indicating that the event statistically represents sampleRate similar events.
// Note: sampleRate can be specific to this service and doesn't have to be universal!
var sampleRate = flag.Int("sampleRate", 1000, "Service's sample rate")

func handler(resp http.ResponseWriter, req *http.Request) {
	start := time.Now()
	i, err := callAnotherService()
	resp.Write(i)

	r := rand.Float64()
	// Convert the integer flag to float64 so the comparison happens in floating point.
	if r < 1.0/float64(*sampleRate) {
		RecordEvent(req, *sampleRate, start, err)
	}
}
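RecordEvent itself is left abstract throughout these examples. As a hypothetical sketch only (the Event struct, its field names, and the sendToCollector transport are assumptions, not part of the original; the later target-rate example would pass a float64 rate instead of an int), recording the sample rate on the event itself might look like this:

// Hypothetical sketch: RecordEvent is never defined in these examples.
// The key point is that the sample rate in effect is stored on the event itself.
type Event struct {
	Method     string
	Path       string
	DurationMs float64
	Error      string
	SampleRate int // how many similar events this sampled event represents
}

func RecordEvent(req *http.Request, sampleRate int, start time.Time, err error) {
	e := Event{
		Method:     req.Method,
		Path:       req.URL.Path,
		DurationMs: float64(time.Since(start)) / float64(time.Millisecond),
		SampleRate: sampleRate,
	}
	if err != nil {
		e.Error = err.Error()
	}
	sendToCollector(e) // hypothetical transport: a queue, an HTTP POST, etc.
}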
This way, we keep track of the sampling rate that was in effect when each sampled event was recorded, which gives us the data to compute accurate aggregates even if the sampling rate changes over time. For example, to estimate the total number of events matching a filter such as err != nil, we'd weight each sampled event matching err != nil by its own sampleRate and sum those weights. And to estimate the sum of durationMs, we'd weight each sampled event's durationMs by multiplying it by its sampleRate before adding the weighted figures up.
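To make that arithmetic concrete, here's a minimal sketch of reconstructing population-level totals from a batch of sampled events; the SampledEvent type and its field names are assumptions for illustration, not part of the original examples.

// SampledEvent is a hypothetical shape for events as stored by the collector.
type SampledEvent struct {
	SampleRate float64 // sampling rate in effect when the event was recorded
	DurationMs float64
	HasError   bool
}

// estimateTotals weights each sampled event by its own sample rate, so the
// estimates stay accurate even if the rate changed between events.
func estimateTotals(events []SampledEvent) (errorCount, totalDurationMs float64) {
	for _, e := range events {
		if e.HasError {
			// Each sampled error stands in for roughly SampleRate similar errors.
			errorCount += e.SampleRate
		}
		// Weight the duration before summing.
		totalDurationMs += e.DurationMs * e.SampleRate
	}
	return errorCount, totalDurationMs
}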
There's more to consider, though, about how sampling rates and tracing work together; that's the subject of the next section.
Consistent sampling
We also need to consider how sampling interacts with tracing. Instead of independently generating a sampling decision inside of each handler, we should use a centrally generated “sampling/tracing ID” propagated to all downstream handlers. Why? This lets us make consistent sampling decisions between different manifestations of the same end user’s request. It would be unfortunate to discover that we have sampled an error far downstream for which the upstream context is missing because it was dropped. Consistent sampling guarantees that if a 1:100 sampling occurs, a 1:99, 1:98, etc. sampling preceding or following it also preserves the execution context. And half of the events chosen by a 1:100 sampling will be present under a 1:200 sampling.
var sampleRate = flag.Int("sampleRate", 1000, "Service's sample rate")

func handler(resp http.ResponseWriter, req *http.Request) {
	// Use an upstream-generated random sampling ID if it exists;
	// otherwise we're a root span, so generate (and pass down) a random ID.
	r, err := floatFromHexBytes(req.Header.Get("Sampling-ID"))
	if err != nil {
		r = rand.Float64()
	}

	start := time.Now()
	// Propagate the Sampling-ID when creating a child span.
	i, err := callAnotherService(r)
	resp.Write(i)

	if r < 1.0/float64(*sampleRate) {
		RecordEvent(req, *sampleRate, start, err)
	}
}
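The floatFromHexBytes helper isn't shown in the text. A minimal sketch, assuming the Sampling-ID header carries a 64-bit value encoded as a hex string (the encoding is an assumption), might map it to a float between 0 and 1 like this:

import (
	"strconv"
	"strings"
)

// floatFromHexBytes parses a hex-encoded 64-bit sampling ID and scales it
// into the unit interval, so every service in the request path derives the
// same r from the same propagated ID. (Sketch only; the real encoding and
// header format are assumptions.)
func floatFromHexBytes(hexID string) (float64, error) {
	v, err := strconv.ParseUint(strings.TrimSpace(hexID), 16, 64)
	if err != nil {
		return 0, err
	}
	return float64(v) / (1 << 64), nil
}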
Now we have support for adjusting the sample rate without recompiling (for example, by changing the flag at deploy or restart time). But why adjust the rate manually at all? In the next section, we'll discuss Target Rate Sampling.
Target Rate Sampling
We don't need to manually adjust the sampling rate flag for each of our services as traffic swells and sags; instead, we can automate this by tracking the rate of incoming requests we're receiving!
var targetEventsPerSec = flag.Int("targetEventsPerSec", 5, "The target number of requests per second to sample from this service.")

// Note: sampleRate can be a float! It doesn't have to be an integer.
var sampleRate float64 = 1.0

// Track requests from the previous minute to decide the sampling rate for the next minute.
var requestsInPastMinute *int

func main() {
	// Initialize counters.
	rc := 0
	requestsInPastMinute = &rc

	go func() {
		for {
			time.Sleep(time.Minute)
			newSampleRate := float64(*requestsInPastMinute) / float64(60 * *targetEventsPerSec)
			if newSampleRate < 1 {
				sampleRate = 1.0
			} else {
				sampleRate = newSampleRate
			}
			newRequestCounter := 0
			// Production code would do something less race-y, but this is readable.
			requestsInPastMinute = &newRequestCounter
		}
	}()

	http.HandleFunc("/", handler)
	[...]
}
func handler(resp http.ResponseWriter, req *http.Request) {
	// Use an upstream-generated sampling ID if it exists; otherwise generate one.
	r, err := floatFromHexBytes(req.Header.Get("Sampling-ID"))
	if err != nil {
		r = rand.Float64()
	}

	start := time.Now()
	*requestsInPastMinute++
	i, err := callAnotherService(r)
	resp.Write(i)

	if r < 1.0/sampleRate {
		RecordEvent(req, sampleRate, start, err)
	}
}
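To make the adjustment concrete: if this service handled 30,000 requests in the past minute and targetEventsPerSec is left at its default of 5 (that is, 300 events per minute), the background goroutine would set sampleRate to 30,000 / 300 = 100, so roughly one in every hundred requests gets recorded over the following minute.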
The previous code gives us a predictable retention window (or bill, if another service collects the events for us). However, it has one significant drawback, which we'll address in the next section on per-key rates.