Currently, the only service Jaeger shows on trace.wikimedia.org is OTLPResourceNoServiceName.
This is because the OTLP exporter extension packaged with our version of Envoy doesn't export a service name at all. Since service.name is required by the OpenTelemetry spec, a placeholder value is filled in instead. (Despite a lot of digging, I haven't yet found where this happens.)
This omission was fixed by Envoy pull request #22472, which makes the service name configurable on each tracer stanza, matching how Envoy's other tracer implementations work.
That PR was merged in August 2022 and first shipped in v1.24.0; no earlier release includes it.
We currently run v1.23.12, so we don't have this fix. Upgrading Envoy across all our services is a so-called "heavy lift": at the very least, it requires redeploying every service we run.
Fortunately, the OTel Collector can rewrite telemetry data in various ways. Many thanks to @Clement_Goubert for the original suggestion to run it as our collector everywhere.
Proposal
Use the OpenTelemetry Collector's transformprocessor to set the service.name resource attribute according to the following steps:
- On k8s:
  - If a valid service.name is already set, keep it and skip the remaining steps.
    - Ensures forward compatibility.
  - Otherwise, if the span's upstream_cluster.name is set to anything other than local_service, use that as the service name.
    - Allows easy overriding in existing infrastructure (see below re: mesh).
  - Otherwise, if the span has a node_id, use the part of its value before the first period as the service name. On k8s, node_id includes the pod name; for example, mw-debug.eqiad.pinkunicorn-5bbd65ff7c-ws289 yields mw-debug.
    - Provides a sensible default without redeploying anything, since Envoy already sets node_id automatically.
  - Otherwise, use our own "unknown" value.
- On bare metal:
  - If a valid service.name is already set, keep it and skip the remaining steps.
    - Ensures forward compatibility.
  - Otherwise, if the span's upstream_cluster.name is set to a value that does not match ^local_(port|path)_.*, use that as the service name.
    - Ensures forward compatibility.
  - Otherwise, add an optional Hiera setting to the role for the service name: use it if present, and our own "unknown" value if not.
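The decision steps above can be sketched as plain functions. This is a Python model of the logic for review purposes, not the transformprocessor configuration itself, which would express the same steps as OTTL statements. Several details are assumptions rather than part of the proposal: spans and resources are modelled as flat string dictionaries, the "unknown" placeholder is literally `unknown`, the Hiera value is passed in as a hypothetical `hiera_service_name` argument, and a service.name counts as valid when it is non-empty and doesn't start with OpenTelemetry's default `unknown_service` prefix.

```python
import re
from typing import Mapping, Optional

# Hypothetical placeholder: the proposal only says "our own 'unknown' value".
UNKNOWN = "unknown"

# Bare-metal clusters that point at the local service, per the proposal.
BARE_METAL_LOCAL_CLUSTER = re.compile(r"^local_(port|path)_.*")


def is_valid_service_name(name: Optional[str]) -> bool:
    # Assumption: missing/empty values and the OpenTelemetry SDK default
    # ("unknown_service", optionally followed by ":<suffix>") are not valid.
    return bool(name) and not name.startswith("unknown_service")


def resolve_k8s(resource: Mapping[str, str], span: Mapping[str, str]) -> str:
    """Pick service.name for a span emitted by Envoy on k8s."""
    current = resource.get("service.name")
    if is_valid_service_name(current):
        return current  # already set: keep it (forward compatibility)
    cluster = span.get("upstream_cluster.name")
    if cluster and cluster != "local_service":
        return cluster  # explicit override, e.g. via the mesh module
    node_id = span.get("node_id")
    if node_id:
        # Part before the first period, e.g. "mw-debug" from
        # "mw-debug.eqiad.pinkunicorn-5bbd65ff7c-ws289".
        prefix = node_id.split(".", 1)[0]
        if prefix:
            return prefix
    return UNKNOWN


def resolve_bare_metal(
    resource: Mapping[str, str],
    span: Mapping[str, str],
    hiera_service_name: Optional[str] = None,
) -> str:
    """Pick service.name for a span emitted by Envoy on bare metal.

    hiera_service_name stands in for the proposed (hypothetical) optional
    Hiera setting on the role.
    """
    current = resource.get("service.name")
    if is_valid_service_name(current):
        return current
    cluster = span.get("upstream_cluster.name")
    if cluster and not BARE_METAL_LOCAL_CLUSTER.match(cluster):
        return cluster
    return hiera_service_name or UNKNOWN


node = "mw-debug.eqiad.pinkunicorn-5bbd65ff7c-ws289"
print(resolve_k8s({}, {"upstream_cluster.name": "local_service", "node_id": node}))
# mw-debug
print(resolve_bare_metal({}, {"upstream_cluster.name": "local_port_80"}, "example-service"))
# example-service
```

Because the "already set" check comes first in both functions, a service.name that a future Envoy sets itself always wins over the derived values.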
Additionally, for k8s, I propose a new minor version of the mesh module that:
- allows specifying a service name for tracing as part of its configuration which, if set, overrides the local_service cluster name
- defaults that name to {{ .Release.Namespace }} if it is not set
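Here is a minimal sketch of the defaulting rule proposed for the mesh module, written in Python for illustration rather than as the chart's own template. The values path `mesh.tracing.service_name` and the function name are hypothetical. The sketch only shows that, with the default in place, the local cluster always carries either the configured name or the release namespace, so the upstream_cluster.name rule above would pick it up instead of falling through to local_service.

```python
from typing import Any, Mapping


def local_cluster_name(values: Mapping[str, Any], release_namespace: str) -> str:
    """Return the name the mesh module would give its local cluster.

    Hypothetical values layout: mesh.tracing.service_name. If it is set, it
    replaces "local_service"; otherwise it defaults to the release namespace,
    mirroring {{ .Release.Namespace }} in the chart.
    """
    mesh = values.get("mesh") or {}
    tracing = mesh.get("tracing") or {}
    return tracing.get("service_name") or release_namespace


print(local_cluster_name({}, "mw-debug"))
# mw-debug
print(local_cluster_name({"mesh": {"tracing": {"service_name": "mediawiki"}}}, "mw-debug"))
# mediawiki
```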
Alternatives considered
Other OTel processors
In this case, since we need to rewrite service.name, which the spec defines as a resource attribute, we can use neither the attributesprocessor nor the spanprocessor, since neither can modify resource attributes. And since the data we want to write into service.name comes from span attributes rather than from existing resource attributes, we can't use the resourceprocessor either.
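To make the constraint concrete, here is a simplified Python model of the OTLP trace layout (the real messages also nest spans inside ScopeSpans). service.name lives on the resource shared by every span in a ResourceSpans group, while upstream_cluster.name and node_id live on individual spans, so the rewrite has to read in one scope and write in another. The class and function names are hypothetical, and the copy-if-unset, first-match-wins behaviour is an assumption chosen for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class ResourceSpans:
    # Simplified: one resource shared by all spans in this group.
    resource_attributes: Dict[str, str] = field(default_factory=dict)
    spans: List[Span] = field(default_factory=list)


def lift_span_attribute(rs: ResourceSpans, span_key: str, resource_key: str) -> None:
    """Copy a span attribute into a resource attribute if the latter is unset.

    Reading span attributes and writing resource attributes is the
    cross-scope step that the resource/attributes/span processors can't do.
    """
    if rs.resource_attributes.get(resource_key):
        return  # keep an existing value
    for span in rs.spans:
        value = span.attributes.get(span_key)
        if value:
            rs.resource_attributes[resource_key] = value
            return  # first span carrying the attribute wins


rs = ResourceSpans(spans=[Span("a"), Span("b", {"upstream_cluster.name": "eventgate"})])
lift_span_attribute(rs, "upstream_cluster.name", "service.name")
print(rs.resource_attributes)
# {'service.name': 'eventgate'}
```

Since the resource is shared, whatever value is written applies to every span in that group, not only the span it was read from.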
Upgrade Envoy
Too much work and too much risk for this quarter.
However, the approach described above allows a graceful migration to setting the service name directly in Envoy once we do upgrade: any valid service.name that Envoy sets will be passed through unchanged.

