With https://gerrit.wikimedia.org/r/q/1204f0de in MediaWiki 1.42/wmf.17 we have started to push what has become quite a big metric (~2M series!): mediawiki_resourceloader_build_seconds_bucket. This is due to an explosion in cardinality from combining extension + buckets + per-host labels, e.g.
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="+Inf", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="0.005", name="user_options", site="codfw"} 1477903
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="0.01", name="user_options", site="codfw"} 1478057
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="0.025", name="user_options", site="codfw"} 1478060
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="0.05", name="user_options", site="codfw"} 1478062
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="0.1", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="0.25", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="0.5", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="1", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="10", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="2.5", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="30", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="5", name="user_options", site="codfw"} 1478083
mediawiki_resourceloader_build_seconds_bucket{cluster="api_appserver", instance="mw2261:9112", job="statsd_exporter", le="60", name="user_options", site="codfw"}
This has resulted in ~30k samples/s of additional load on prometheus/ops.
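As a rough sanity check on those figures, here is a small back-of-the-envelope sketch in Python. It only uses the numbers quoted above (14 `le` buckets, ~2M series, ~30k samples/s); the host and name counts in the last example are made up purely to illustrate the multiplication, and the implied scrape interval is an inference that assumes every series is scraped at the same interval.

```python
# Back-of-the-envelope cardinality check for the histogram above.
# The bucket list is copied from the sample output; the totals are the
# approximate figures quoted in this task, not measured values.
BUCKETS = ["+Inf", "0.005", "0.01", "0.025", "0.05", "0.1", "0.25",
           "0.5", "1", "2.5", "5", "10", "30", "60"]

TOTAL_SERIES = 2_000_000       # "~2M" (approximate)
EXTRA_SAMPLES_PER_S = 30_000   # "~30k samples/s" (approximate)


def series_count(hosts: int, names: int, buckets: int) -> int:
    """Series produced when every host reports every name in every bucket."""
    return hosts * names * buckets


buckets_per_histogram = len(BUCKETS)

# How many distinct (instance, name) pairs would yield ~2M bucket series.
host_name_pairs = TOTAL_SERIES // buckets_per_histogram

# samples/s = series / scrape interval, so the interval implied by the
# quoted load (assuming one uniform scrape interval) is:
implied_scrape_interval_s = TOTAL_SERIES / EXTRA_SAMPLES_PER_S

# HYPOTHETICAL host/name counts, chosen only to show how quickly the
# product grows; they are not the real production numbers.
example_total = series_count(hosts=300, names=500, buckets=buckets_per_histogram)

print(buckets_per_histogram, host_name_pairs,
      round(implied_scrape_interval_s), example_total)
```

The point of the sketch is that each label multiplies the others: with 14 buckets per histogram, every additional (instance, name) pair costs 14 series, so dropping or aggregating any one of the three dimensions shrinks the total by that factor.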
I'm not sure right off the bat how to address the issue. Going forward, though, we should definitely pay extra attention when dealing with histograms in MW, since they make it easy for cardinality to explode. What do you think, @colewhite @herron @DAlangi_WMF? (cc @Krinkle, since I saw you followed up on the change above)
