The W3C trace context specification is an amazing standard and a massive leap in the standardization of telemetry correlation, particularly now that microservices are the de facto choice for new systems (that’s a debate for another day).
One of the issues with W3C trace context is that it doesn’t define how far a trace should propagate. If a third party accidentally sends trace headers from their service, you’ll use their trace IDs and baggage data. This can have unwanted effects on your telemetry backend, such as traces showing missing root spans, or multiple API calls being grouped into a single trace at the top level. This makes understanding and debugging trace data hard. Worse, though, the baggage from the third party could contain PII, which would mean you’re processing PII without realizing it.
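To make the trace ID problem concrete, here is a minimal sketch that uses only System.Diagnostics. The traceparent value is invented for illustration, and setting it as the parent by hand is only an approximation of what happens when inbound context is trusted.

```csharp
using System;
using System.Diagnostics;

// Hypothetical traceparent header from a third party (values invented for illustration).
const string inboundTraceParent = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";

// Approximates trusting inbound context: the header becomes the parent of our activity.
using var activity = new Activity("HandleRequest");
activity.SetParentId(inboundTraceParent);
activity.Start();

// The activity now belongs to the caller's trace. Its parent span lives in a system
// you can't see, which is how a backend can end up with a missing root span.
Console.WriteLine($"Trace ID:       {activity.TraceId}");
Console.WriteLine($"Parent span ID: {activity.ParentSpanId}");
```

Both printed IDs come from the caller’s header rather than from your own service.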
The baggage issue
Imagine that you have a public API, which is called by your clients. This API also calls out to a third party for exchange rate information.
You’re really careful internally that you don’t set Personally Identifiable Information (PII) in baggage, as you know that it will be sent to the Exchange Rate service of the third party.
Baggage that arrives on an inbound request is propagated in the same way, though, so anything your clients send is passed on to the Exchange Rate service too. It’s not your data, and those incoming baggage headers aren’t useful to you, so they should be ignored.
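The sketch below shows how little it takes for a caller’s baggage to enter your process. It uses the runtime’s default propagator, with a plain dictionary standing in for the request headers; the header values and the `customerEmail` key are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// A dictionary stands in for the inbound HTTP headers. The values are invented.
var inboundHeaders = new Dictionary<string, string>
{
    ["traceparent"] = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    ["baggage"] = "customerEmail=jane%40example.com",
};

// The propagator the runtime would use if nothing were overridden.
var propagator = DistributedContextPropagator.CreateDefaultPropagator();

// The propagator asks for each header by name through this callback.
PropagatorGetterCallback getter = (object? carrier, string fieldName,
    out string? fieldValue, out IEnumerable<string>? fieldValues) =>
{
    fieldValues = null;
    fieldValue = carrier is Dictionary<string, string> headers
        && headers.TryGetValue(fieldName, out var found) ? found : null;
};

var baggage = propagator.ExtractBaggage(inboundHeaders, getter);

// Anything listed here came from the caller, not from your own code.
foreach (var (key, value) in baggage ?? Array.Empty<KeyValuePair<string, string?>>())
{
    Console.WriteLine($"{key} = {value}");
}
```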
Trace propagation in .NET
W3C trace context in .NET is propagated in two ways. The first is built into the .NET Runtime using a class called DistributedContextPropagator. The second is part of OpenTelemetry using the TextMapPropagator class.
To change how inbound and outbound propagation behave in an ASP.NET Core site, we must override both of these classes.
To override the DistributedContextPropagator, you need to remove the one added by the ASP.NET Core HostBuilder.
    using System.Diagnostics;
    using Microsoft.Extensions.DependencyInjection.Extensions;

    var builder = WebApplication.CreateBuilder(args);

    // RemoveAll is used because Remove compares descriptors by reference,
    // so a newly constructed ServiceDescriptor never matches the host's one.
    builder.Services.RemoveAll<DistributedContextPropagator>();
    builder.Services.AddSingleton<DistributedContextPropagator, CustomContextPropagator>();

    // .. other service registrations
To override the OpenTelemetry propagators, you need to register them with the SetDefaultTextMapPropagator method.
    using OpenTelemetry;
    using OpenTelemetry.Context.Propagation;

    Sdk.SetDefaultTextMapPropagator(new CompositeTextMapPropagator(
        new List<TextMapPropagator>() {
            new CustomPropagator()
        }));
If you need to inject additional objects into your custom propagator, OpenTelemetry has a new method that’s run as the OpenTelemetry TracerProvider is created. It’s called ConfigureOpenTelemetryTracerProvider, which takes the TracerProviderBuilder and also the built ServiceProvider.
    builder.Services.AddSingleton<CustomPropagator>();

    builder.Services.ConfigureOpenTelemetryTracerProvider((sp, tp) =>
    {
        Sdk.SetDefaultTextMapPropagator(new CompositeTextMapPropagator(
            new List<TextMapPropagator>() {
                sp.GetRequiredService<CustomPropagator>()
            }));
    });
Ignore all incoming trace data
The easiest way around the propagation issue is to ignore all incoming trace headers. This is fine if your service only has public endpoints. If you need something a little more granular, Part 2 includes more details on how you can do this with conditional logic.
First, we create a derived class from DistributedContextPropagator:
    internal class DisableAllContextPropagator : DistributedContextPropagator
    {
        public override IReadOnlyCollection<string> Fields { get; } =
            new ReadOnlyCollection<string>(new[] { "traceparent" });

        public override IEnumerable<KeyValuePair<string, string?>>? ExtractBaggage(
            object? carrier, PropagatorGetterCallback? getter)
        {
            throw new NotImplementedException();
        }

        public override void ExtractTraceIdAndState(
            object? carrier, PropagatorGetterCallback? getter,
            out string? traceId, out string? traceState)
        {
            throw new NotImplementedException();
        }

        public override void Inject(
            Activity? activity, object? carrier, PropagatorSetterCallback? setter)
        {
            throw new NotImplementedException();
        }
    }
Here, we have three methods that we’re interested in. The first two (ExtractBaggage and ExtractTraceIdAndState) are about extracting the inbound trace context, whereas the last one (Inject) is about pushing our current trace context onto our downstream services.
We still want downstream trace propagation to work, as it’s important for our internal distributed tracing to produce a correlated trace waterfall. For that, we’ll bring in the default propagator and delegate to it. CreateDefaultPropagator is a static method on DistributedContextPropagator that creates the propagator that would have been used if we hadn’t overridden it. Right now (.NET 7), it returns a LegacyPropagator.
    internal class DisableAllContextPropagator : DistributedContextPropagator
    {
        private readonly DistributedContextPropagator _legacy = CreateDefaultPropagator();

        // other code

        public override void Inject(
            Activity? activity, object? carrier, PropagatorSetterCallback? setter)
        {
            _legacy.Inject(activity, carrier, setter);
        }
    }
For our other two methods, we want to return defaults, as we don’t want to take in any inbound context data.
    internal class DisableAllContextPropagator : DistributedContextPropagator
    {
        // other code

        public override IEnumerable<KeyValuePair<string, string?>>? ExtractBaggage(
            object? carrier, PropagatorGetterCallback? getter)
        {
            return Enumerable.Empty<KeyValuePair<string, string?>>();
        }

        public override void ExtractTraceIdAndState(
            object? carrier, PropagatorGetterCallback? getter,
            out string? traceId, out string? traceState)
        {
            traceId = null;
            traceState = null;
        }

        // other code
    }
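Assembled from the fragments above, the whole class might look like the following sketch. The `PropagatorDemo` harness and its dictionary carriers are hypothetical additions so the behaviour can be checked outside ASP.NET Core; they aren’t part of the setup described here, and the header values are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Linq;

internal class DisableAllContextPropagator : DistributedContextPropagator
{
    // Outbound work is delegated to whatever the runtime would have used.
    private readonly DistributedContextPropagator _legacy = CreateDefaultPropagator();

    public override IReadOnlyCollection<string> Fields { get; } =
        new ReadOnlyCollection<string>(new[] { "traceparent" });

    // Inbound: never report any baggage.
    public override IEnumerable<KeyValuePair<string, string?>>? ExtractBaggage(
        object? carrier, PropagatorGetterCallback? getter) =>
        Enumerable.Empty<KeyValuePair<string, string?>>();

    // Inbound: never report a parent trace.
    public override void ExtractTraceIdAndState(
        object? carrier, PropagatorGetterCallback? getter,
        out string? traceId, out string? traceState)
    {
        traceId = null;
        traceState = null;
    }

    // Outbound: unchanged from the default behaviour.
    public override void Inject(
        Activity? activity, object? carrier, PropagatorSetterCallback? setter) =>
        _legacy.Inject(activity, carrier, setter);
}

// Hypothetical harness: dictionaries stand in for request and response headers.
internal static class PropagatorDemo
{
    public static Dictionary<string, string> Run()
    {
        var propagator = new DisableAllContextPropagator();

        var inbound = new Dictionary<string, string>
        {
            ["traceparent"] = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
            ["baggage"] = "customerEmail=jane%40example.com",
        };
        PropagatorGetterCallback getter = (object? carrier, string name,
            out string? value, out IEnumerable<string>? values) =>
        {
            values = null;
            value = carrier is Dictionary<string, string> headers
                && headers.TryGetValue(name, out var found) ? found : null;
        };

        // Inbound headers are present, but the propagator ignores them.
        propagator.ExtractTraceIdAndState(inbound, getter, out var traceId, out _);
        var baggageCount = propagator.ExtractBaggage(inbound, getter)!.Count();
        Console.WriteLine($"Inbound trace ID used: {traceId is not null}");
        Console.WriteLine($"Inbound baggage items: {baggageCount}");

        // Outbound headers are still written for our own activity.
        var outbound = new Dictionary<string, string>();
        using var activity = new Activity("CallExchangeRates")
            .SetIdFormat(ActivityIdFormat.W3C)
            .Start();
        propagator.Inject(activity, outbound,
            (carrier, name, value) => ((Dictionary<string, string>)carrier!)[name] = value);
        Console.WriteLine($"Outbound traceparent: {outbound["traceparent"]}");
        return outbound;
    }
}
```

Calling `PropagatorDemo.Run()` from a console project should show the inbound values being dropped while a fresh traceparent is still written for the downstream call.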
We then need to do the same for the OpenTelemetry propagators. In OpenTelemetry, however, there are two separate propagators. One is for the trace context (TraceContextPropagator), and the other is for the baggage (BaggagePropagator). The code is pretty similar, and the logic is the same. The class these are derived from is the TextMapPropagator, which has only two methods we’re interested in.
    internal class DisableAllTracePropagator : TraceContextPropagator
    {
        public override PropagationContext Extract<T>(
            PropagationContext currentContext, T carrier,
            Func<T, string, IEnumerable<string>> getter)
        {
            throw new NotImplementedException();
        }

        public override void Inject<T>(
            PropagationContext context, T carrier,
            Action<T, string, string> setter)
        {
            throw new NotImplementedException();
        }
    }
As in the DistributedContextPropagator, we want to return defaults from the Extract<T> method, and delegate the Inject<T> method to what would have been the existing Propagator.
    public override PropagationContext Extract<T>(
        PropagationContext currentContext, T carrier,
        Func<T, string, IEnumerable<string>> getter)
    {
        return new PropagationContext(new ActivityContext(), new Baggage());
    }

    public override void Inject<T>(
        PropagationContext context, T carrier,
        Action<T, string, string> setter)
    {
        base.Inject(context, carrier, setter);
    }
Repeat the same code for the BaggagePropagator.
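For completeness, the baggage counterpart might look like the sketch below. It mirrors the trace propagator shown above and assumes the OpenTelemetry package is referenced and that BaggagePropagator can be derived from in the version you’re using; the class name DisableAllBaggagePropagator matches the one used in the registration code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Context.Propagation;

// Sketch of the baggage counterpart to DisableAllTracePropagator.
internal class DisableAllBaggagePropagator : BaggagePropagator
{
    // Inbound: discard the caller's baggage by returning an empty context.
    public override PropagationContext Extract<T>(
        PropagationContext context, T carrier,
        Func<T, string, IEnumerable<string>> getter)
    {
        return new PropagationContext(new ActivityContext(), new Baggage());
    }

    // Outbound: let the stock BaggagePropagator write our own baggage as usual.
    public override void Inject<T>(
        PropagationContext context, T carrier,
        Action<T, string, string> setter)
    {
        base.Inject(context, carrier, setter);
    }
}
```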
Once we have all the classes, we need to register them. I do this with an extension to the IServiceCollection, as it encapsulates the setup nicely and gives it context without having to use comments and sections.
    public static IServiceCollection DisableInboundTracePropagation(this IServiceCollection services)
    {
        // RemoveAll (Microsoft.Extensions.DependencyInjection.Extensions) drops the
        // host's registration; Remove with a new descriptor would not match it.
        services.RemoveAll<DistributedContextPropagator>();
        services.AddSingleton<DistributedContextPropagator, DisableAllContextPropagator>();

        services.ConfigureOpenTelemetryTracerProvider((sp, tp) =>
        {
            Sdk.SetDefaultTextMapPropagator(new CompositeTextMapPropagator(
                new List<TextMapPropagator>() {
                    new DisableAllTracePropagator(),
                    new DisableAllBaggagePropagator()
                }));
        });

        return services;
    }
Conclusion
Trace propagation is the true superpower of debugging distributed systems in production. As the movies say, “With great power comes great responsibility.” You need to consider carefully whether you trust your consumers not to provide those headers, whether you’re going to strip them before they make it to your application, or whether you want to be a little more clever.
In the next post, I’ll cover a more advanced approach to deciding when to trust inbound trace context based on criteria from the request, like allowing it for specific endpoints. In the meantime, want more .NET content? Here’s a piece on OpenTelemetry performance degradation in .NET.