useful operation names in traces
Open, Needs Triage, Public

Description

Right now we have Envoy's "ingress" as the name for all operations. Not amazing.

I've experimented with mashing up parts of the HTTP method and easily-extractable parts of the URI (path & query) and the results look pretty good on a random many-service trace I've found:

[Screenshot: image.png]

Event Timeline

CDanis renamed this task from useful automatic operation names to useful operation names in traces. (Jun 12 2024, 5:40 PM)

Change #1042350 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] otelcol: Auto-generate useful operation names

https://gerrit.wikimedia.org/r/1042350

Example trace as processed in codfw production:
https://trace.wikimedia.org/trace/06aabdeeb578a2663034270cf6d4accf

@fgiunchedi please let me know what you think. I have mixed feelings about a few parts of it myself.

Change #1042350 merged by jenkins-bot:

[operations/deployment-charts@master] otelcol: Auto-generate useful operation names

https://gerrit.wikimedia.org/r/1042350

Having thought about this a while, I think we should make some changes.

  1. Omit all query parameters. (It would be okay to indicate presence vs absence of any in the operation name -- no suffix vs ?... for instance.)
  2. For external-looking domains (roughly, ones not beginning with http://localhost, but I'll take a deeper look), collapse all the paths into one of a few options, whichever is the best fit:
    • /w/${FOOBAR}.php
    • /wiki/...
    • /...
    • /
  3. Then, like before, combine METHOD and the modified URL to make a name. Some examples:

This has a number of benefits:

  • Limits the cardinality of the Operation field, and (hopefully) makes operation names correlate with distinct/interesting statistical classes -- as advised in both the Span spec and the HTTP Spans Semconv spec.
  • Limits the length of operation names, which is good because Jaeger's UI insists on showing the trace root's entire operation name, regardless of your screen size.
  • Adds some clicks between loading a trace and displaying PII. (Like logs, traces are NDA-access-protected and retention-time-limited. They are not guaranteed to be PII-free, and in fact are known to contain PII, although we do make a modest effort to scrub high-value secrets at ingestion time.)

I think it's fine to leave the original URL in http.url untouched, and to apply this new minification only when building the abbreviated operation names.
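
For illustration only, here is a minimal Python sketch of the three steps proposed above. It is not the otelcol configuration from the merged change. The function names, the exact-match "localhost" test for internal hosts, and the choice to keep the script name for /w/${FOOBAR}.php paths are all assumptions made for this sketch, and the real rules may well differ.

```python
import re
from urllib.parse import urlsplit

# Assumption: "/w/${FOOBAR}.php" means "keep the script name", e.g. /w/index.php.
_PHP_ENTRY_POINT = re.compile(r"^/w/[^/]+\.php$")


def looks_internal(url: str) -> bool:
    """Hypothetical stand-in for the 'external-looking domain' check.

    The proposal only says "roughly, ones not beginning with http://localhost",
    so this sketch treats a hostname of exactly "localhost" as internal.
    """
    return urlsplit(url).hostname == "localhost"


def collapse_path(path: str) -> str:
    """Step 2: collapse a path into one of a few low-cardinality buckets."""
    if path in ("", "/"):
        return "/"
    if _PHP_ENTRY_POINT.match(path):
        return path  # /w/index.php, /w/api.php, ...
    if path.startswith("/wiki/"):
        return "/wiki/..."
    return "/..."


def operation_name(method: str, url: str) -> str:
    """Steps 1-3: drop the query, collapse external paths, prepend the method."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if not looks_internal(url):
        path = collapse_path(path)
    # Step 1: omit query parameters, but indicate whether any were present.
    suffix = "?..." if parts.query else ""
    return f"{method.upper()} {parts.scheme}://{parts.netloc}{path}{suffix}"
```

With these assumptions, operation_name("GET", "https://en.wikipedia.org/w/index.php?title=Foo") yields "GET https://en.wikipedia.org/w/index.php?...", while a localhost URL keeps its full path and loses only its query string. The input URL itself is never modified, so it can still be stored unchanged in http.url.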

I think this is reasonable. Jaeger still leaves span attributes searchable, so we'll not really lose context.