* Affected components: MediaWiki core and extensions.
* Engineer for initial implementation: @colewhite (WMF SRE Foundations)
* Code stewards: TBD.
### Motivation
The metrics interface in MediaWiki is outsourced and tightly integrated with a StatsD-specific library. This arrangement produces StatsD metrics well enough and has served us for quite a while, but the current free-for-all approach has some limitations:
1. There is no room to leverage other metrics backends or protocols.
1. There is no clear way to impose order or standards on the metrics currently generated.
1. There is little to no documentation describing the metrics MW+Extensions already generate and what they are intended to show.
The Observability team is pushing to deprecate Graphite/StatsD in favor of Prometheus. [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/481110 | Prior attempts ]] to use the existing model with the available tools have proven difficult and are regarded as unsustainable.
##### Requirements
Possibly incomplete.
1. A metrics interface that is sustainable and abstracts the backend implementation.
1. It maintains the current behavior of emitting metrics after the response has been sent.
1. It introduces no dependencies of its own.
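To make the three requirements concrete, here is a minimal sketch of how they might fit together. It is written in Python for brevity (MediaWiki itself is PHP) and uses only the standard library. Every name in it (`MetricsBuffer`, `Emitter`, `increment`, `flush`) is hypothetical and is not taken from MediaWiki or from the demonstration implementation. The idea is that callers record samples against a backend-agnostic buffer, and a pluggable emitter ships the batch only when `flush()` is called, which would happen after the response is sent.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A sample is (metric name, value, labels).
Sample = Tuple[str, float, Dict[str, str]]
# An emitter is any callable that ships a batch of samples to some backend.
Emitter = Callable[[List[Sample]], None]


class MetricsBuffer:
    """Hypothetical backend-agnostic interface: buffer now, emit on flush."""

    def __init__(self, emitter: Emitter) -> None:
        self._emitter = emitter
        self._pending: List[Sample] = []

    def increment(self, name: str, labels: Optional[Dict[str, str]] = None,
                  value: float = 1.0) -> None:
        # Only record the sample; nothing leaves the process here.
        self._pending.append((name, value, dict(labels or {})))

    def flush(self) -> None:
        # Intended to run after the response is sent.
        batch, self._pending = self._pending, []
        if batch:
            self._emitter(batch)


# Usage: a list stands in for a real backend.
sent: List[Sample] = []
metrics = MetricsBuffer(sent.extend)
metrics.increment("edits_total", {"wiki": "testwiki"})
metrics.flush()
```

Swapping the backend then means swapping the emitter callable, while calling code stays unchanged.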
-------
### Exploration
The initial proposal was to insert statsd-exporter between MW and StatsD and leverage mapping rules to generate appropriate Prometheus metrics. It was noted that the mapping rules would be difficult to maintain and that this approach introduced a circular dependency on a sidecar service managed by Puppet.
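To illustrate the maintenance burden, the sketch below imitates in Python what a glob-style mapping rule does: it matches a dotted StatsD name component by component and turns each `*` into a Prometheus label. This is a simplified stand-in, not statsd-exporter's actual configuration format or matching code, and the metric name, rule, and `apply_rules` helper are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (glob pattern, Prometheus metric name, one label name per "*" in the pattern)
Rule = Tuple[str, str, List[str]]


def apply_rules(statsd_name: str,
                rules: List[Rule]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (prometheus_name, labels) for the first matching rule, else None."""
    parts = statsd_name.split(".")
    for pattern, prom_name, label_names in rules:
        pattern_parts = pattern.split(".")
        if len(pattern_parts) != len(parts):
            continue
        captured: List[str] = []
        for want, got in zip(pattern_parts, parts):
            if want == "*":
                captured.append(got)   # wildcard component becomes a label value
            elif want != got:
                break                  # literal component mismatch
        else:
            return prom_name, dict(zip(label_names, captured))
    return None


# Hypothetical rule and metric names, for illustration only.
rules: List[Rule] = [
    ("MediaWiki.api.*.executeTiming", "mediawiki_api_execute_timing", ["module"]),
]
matched = apply_rules("MediaWiki.api.query.executeTiming", rules)
unmatched = apply_rules("MediaWiki.api.query.extra.executeTiming", rules)
```

Here `matched` carries the label `module="query"`, while `unmatched` is `None` because the name has one more component than the rule expects. Every new or renamed metric in a free-form namespace needs a rule like this kept in sync by hand.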
After that, we considered what it would take to have MW (and extensions) maintain their own statsd-exporter configuration and coalesce them on deploy. This seemed a fragile, error-prone option and had the drawback of putting an unnecessary burden on developers for a single backend solution.
After that, we explored what adopting an existing Prometheus-specific library would entail. Current options depend on Redis or disk, or make heavy use of APC. None of these options seemed great in terms of reliability, resource utilization, or the current state of library development.
After that, we went back to statsd-exporter and found support for DogStatsD, a StatsD extension that adds key:value tags to metrics and uses the same UDP transport mechanism. This option has no need for a cross-request persistent backend. This is the approach the current implementation demonstrates, but it has a few drawbacks:
1. This solution requires a sidecar for translation to Prometheus.
1. In order for statsd-exporter to automatically generate meaningful Prometheus metrics, the StatsD metrics namespace has to be tightly controlled.
1. An extra deploy step of restarting the sidecar is strongly recommended.
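For readers unfamiliar with DogStatsD, the following Python sketch shows the general shape of a tagged datagram (`name:value|type|#key:value,...`) and how it could be sent over UDP. It is not the demonstration implementation. The helper names, the metric name, and the tags are hypothetical, and the host and port are assumptions (9125 is used here on the assumption that the sidecar listens on statsd-exporter's default UDP port).

```python
import socket
from typing import Dict


def format_dogstatsd(name: str, value: float, metric_type: str,
                     tags: Dict[str, str]) -> str:
    """Build one DogStatsD line; tags are sorted for stable output."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        tag_text = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        line += f"|#{tag_text}"
    return line


def send_udp(line: str, host: str = "127.0.0.1", port: int = 9125) -> None:
    """Fire-and-forget send; host and port are assumptions, not known settings."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Hypothetical counter ("c") with two tags.
line = format_dogstatsd("mediawiki.edits", 1, "c",
                        {"wiki": "testwiki", "result": "ok"})
```

Calling `send_udp(line)` would hand the datagram to the sidecar. Because the tags travel with each sample, no state has to persist between requests.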
NOTE: The demonstration implementation does not handle the requirement to document the metrics being generated. Discussion on how to address this point in a sustainable way is welcome and requested.