Proof of Concept: OpenTelemetry in MediaWiki
Closed, Resolved · Public · Spike

Description

3-day spike to explore ways to wire the OpenTelemetry library into MediaWiki code, and then use OpenTelemetry to generate/forward the tracestate and traceparent headers.
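For context on what generating/forwarding these headers involves, here is a minimal sketch based on the W3C Trace Context header layout (`version-traceid-parentid-flags`). It is written in Python only for brevity; a MediaWiki implementation would be PHP and would normally delegate this to the OpenTelemetry SDK. The helper names `parse_traceparent` and `outgoing_trace_headers` are hypothetical, and marking a freshly started trace as sampled (`01`) is an assumption of this sketch.

```python
import re
import secrets

# W3C Trace Context "traceparent": version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(value):
    """Return (trace_id, parent_id, flags), or None if the header is unusable."""
    match = TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero trace/parent ids are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def outgoing_trace_headers(incoming):
    """Build tracing headers for a downstream request (hypothetical helper)."""
    span_id = secrets.token_hex(8)  # new 16-hex-char id for the current hop
    parsed = parse_traceparent(incoming.get("traceparent", ""))
    if parsed is None:
        # No valid incoming context: start a new trace and drop any tracestate.
        # Flag "01" (sampled) is an assumption made for this sketch.
        return {"traceparent": f"00-{secrets.token_hex(16)}-{span_id}-01"}
    trace_id, _parent_id, flags = parsed
    # Keep the trace id and flags, replace the parent id with our own span id.
    headers = {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
    if "tracestate" in incoming:
        # Vendor-specific state is forwarded unchanged.
        headers["tracestate"] = incoming["tracestate"]
    return headers
```

The key point is that forwarding keeps the trace id stable across services while each hop contributes its own parent id.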

Possible options:

  • Module within MediaWiki Core
  • MediaWiki Extension
  • Do not use OpenTelemetry; expect the edge to inject the headers and have MW forward them (implemented in T320559)

Definition of done: working PoC with recommended implementation, effort estimation, and security risk assessment.

Event Timeline

Aklapper renamed this task from Proof of Cocept: OpenTelemetry in MediaWiki to Proof of Concept: OpenTelemetry in MediaWiki. · Aug 14 2023, 2:58 PM
pmiazga edited projects, added Spike; removed Epic.
Restricted Application changed the subtype of this task from "Task" to "Spike". · Aug 21 2023, 1:53 PM

For quick hacking I'm going to use the OpenTelemetry monorepo - https://github.com/open-telemetry/opentelemetry-php as the one from mszabo (https://github.com/mszabo-wikia/opentelemetry-php) seems a bit outdated.

pmiazga changed the task status from Open to In Progress. · Aug 22 2023, 3:49 PM

Sorry for the silence on this ticket. Recently I focused more on T344926, which caused us a lot of trouble due to layering issues (RESTBagOStuff and MultiHttpClient are part of the general libs, but Telemetry is MediaWiki-specific).

I got OpenTelemetry running locally and was able to send reports to a local collector. The effort to use this library is manageable. After a brief conversation with Joanna from SRE, we would like to use OpenTelemetry not only to pass tracing headers, but also to record spans across MediaWiki execution. If possible, we would like to add spans for things like PoolCounter locks, database calls, HTTP requests, and maybe long-running parts of MW execution. We will start with a small set of instrumented components and extend it over time.

OpenTelemetry provides automatic instrumentation (they distribute the opentelemetry.so library), but it seems a bit too invasive, and it would require us to add a PHP library. Also, automatic instrumentation with PHP requires at least PHP 8.0 and the OpenTelemetry PHP extension. The extension allows registering observer functions (as PHP code) against classes and methods, and executes those functions before and after the observed method runs. It doesn't generate traces automatically; we would still have to create those manually. For automatic traces there is a set of libraries (https://packagist.org/search/?query=open-telemetry&tags=instrumentation), but that is yet another set of code we would have to load and execute within MediaWiki core. This is not the way for us.

We should use manual instrumentation, where we only load the OpenTelemetry SDK and manually instrument the things we need. I'm still checking the performance of this solution. The good part is that OpenTelemetry provides a built-in way to sample the data it reports, but overall it pulls in plenty of dependencies and its overhead is pretty big.
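As a rough illustration of how sampling can limit reporting overhead, here is a sketch of a deterministic ratio-based decision keyed on the trace id. It is similar in spirit to ratio-based samplers but is not the SDK's actual algorithm; the function name `should_sample` and the choice of the low 64 bits are assumptions of this sketch (Python is used only for brevity).

```python
def should_sample(trace_id_hex, ratio):
    """Decide whether to report a trace, keeping roughly `ratio` of all traces.

    The decision depends only on the trace id, so every service that sees the
    same trace id and uses the same ratio reaches the same decision.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Interpret the last 16 hex characters (low 64 bits) as an integer.
    low64 = int(trace_id_hex[-16:], 16)
    # Sample when the value falls below the ratio's share of the 64-bit range.
    return low64 < int(ratio * 2**64)
```

With `ratio=0.0` nothing is reported and with `ratio=1.0` everything is, which makes the setting easy to reason about in configuration.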

Another thing I discovered is that there is an ongoing effort to provide an Application Tracing standard, PSR-22 (https://github.com/php-fig/fig-standards/pull/1301), but it's in DRAFT state. It started in May 2023 and there has not been much activity since. We should keep an eye on this PSR. Maybe we could even help push it forward, so that MediaWiki could depend on PSR-22 instead of taking a hard dependency on the OpenTelemetry-PHP library.

Problems/blockers I noticed during the experiment:

  • I cannot run the full OpenTelemetry stack locally (the examples from the OpenTelemetry repo), as some Docker images are not supported on Apple Silicon chips. This caused me a bit of trouble, and I had to use a regular x86 machine to be able to use all the tools provided by OpenTelemetry. For example, otel/opentelemetry-collector-contrib and zipkin are not available. For active development we will most likely need some services on BetaCluster.
  • The PHP package for OpenTelemetry is still in beta, and our composer configuration requires stable stability. We can overcome this by installing a specific version of the library, but that way we would lock ourselves to it. Also, for some reason we decided to use only stable packages, and because opentelemetry is still in the beta phase, migrating between versions may require more work than expected.

List of options and my stance on them

Providing a MediaWiki OpenTelemetry extension is most likely not an option, because we would like to instrument the internals of MediaWiki core. Most likely it has to be bundled with MediaWiki Core, as I don't see an easy way to instrument core internals from an extension. It could be possible with automatic instrumentation, but as I highlighted earlier, that's not an option.
I'm not fully sure yet what the best way is to run MediaWiki without OpenTelemetry on third-party installations. I'm leaning towards having an interface and two implementations/two services (an OpenTelemetryAdapter and a NullTelemetryAdapter), and returning one of them based on the config. But using OpenTelemetry as is, together with a Null Collector/Exporter, is a possibility too.
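The interface-plus-two-implementations idea could look roughly like the sketch below. It is Python rather than PHP; the method names and the `TelemetryEnabled` config key are hypothetical, and the `OpenTelemetryAdapter` here only records events in memory as a stand-in for calls into the real SDK.

```python
from abc import ABC, abstractmethod


class TelemetryAdapter(ABC):
    """What core code would depend on (hypothetical interface)."""

    @abstractmethod
    def start_span(self, name):
        ...

    @abstractmethod
    def end_span(self, name):
        ...


class NullTelemetryAdapter(TelemetryAdapter):
    """No-op implementation for installations without OpenTelemetry."""

    def start_span(self, name):
        pass

    def end_span(self, name):
        pass


class OpenTelemetryAdapter(TelemetryAdapter):
    """Stand-in: records events in memory instead of calling the SDK."""

    def __init__(self):
        self.events = []

    def start_span(self, name):
        self.events.append(("start", name))

    def end_span(self, name):
        self.events.append(("end", name))


def create_telemetry_adapter(config):
    """Service factory; 'TelemetryEnabled' is a made-up config key."""
    if config.get("TelemetryEnabled", False):
        return OpenTelemetryAdapter()
    return NullTelemetryAdapter()
```

Callers always receive a `TelemetryAdapter`, so instrumented code paths stay identical whether or not telemetry is enabled.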

Also, we cannot go with the third option, i.e. just letting envoy pass us the tracing headers and then blindly forwarding them. Without the OpenTelemetry SDK we won't be able to provide spans for internals or use any of its perks.

Change 958470 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/core@master] DNM: OpenTelemetry Proof-of-concept

https://gerrit.wikimedia.org/r/958470