statsd-exporter in k8s is not configured to use its mapping configuration
Closed, Resolved (Public)

Description

Example taken from mw-api-int namespace:

    {
        "Id": "201c3ef01ea0f4ef132fb3eaa03e67513d31bc3b07d875d8b67fb2bb5682bf73",
        "Created": "2024-06-18T10:32:16.275077554Z",
        "Path": "/usr/bin/prometheus-statsd-exporter",
        "Args": [
            "--statsd.listen-udp=0.0.0.0:9125",
            "--statsd.listen-tcp=0.0.0.0:9125"
        ],
<snip>

It appears to be missing --mapping-config=/etc/monitoring/prometheus-statsd.conf.
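The check described above can be sketched in a few lines of Python. This is only an illustration: `has_mapping_config` is a hypothetical helper, the sample input is trimmed from the inspect output quoted in this task, and the flag is matched by its `mapping-config` suffix on the assumption that the exact spelling (with or without a prefix) may differ between exporter versions.

```python
import json

# Assumption: the flag name ends in "mapping-config". Matching on the suffix
# accepts both a bare and a prefixed spelling of the flag.
MAPPING_FLAG_SUFFIX = "mapping-config"


def has_mapping_config(container: dict) -> bool:
    """Return True if any container argument sets a mapping config file."""
    for arg in container.get("Args", []):
        # Split "--flag=value" and keep only the flag name.
        name = arg.split("=", 1)[0]
        if name.lstrip("-").endswith(MAPPING_FLAG_SUFFIX):
            return True
    return False


# Trimmed copy of the inspect output quoted in the description.
sample = json.loads("""
{
  "Path": "/usr/bin/prometheus-statsd-exporter",
  "Args": [
    "--statsd.listen-udp=0.0.0.0:9125",
    "--statsd.listen-tcp=0.0.0.0:9125"
  ]
}
""")

print(has_mapping_config(sample))  # False
```

Feeding it an entry whose `Args` include the mapping flag should return `True`, which makes it usable as a quick post-deploy check.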

As a result, timing metrics are exported as summaries (quantiles) rather than the histograms we expected.
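To tell the two shapes apart in a scrape, one rough approach is to look at the labels: summary samples carry a `quantile` label, while histogram samples use a `_bucket` suffix with an `le` label. The sketch below assumes the Prometheus text exposition format; `classify` is a hypothetical helper, and the metric name and sample values are made up for illustration.

```python
import re

# Matches "name{labels} value" or "name value" in the text exposition format.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+\S+'
)


def has_label(labels: str, key: str) -> bool:
    """Rough test for a label key; not a full label parser."""
    return re.search(r'(?:^|,)\s*' + re.escape(key) + r'="', labels) is not None


def classify(exposition: str) -> dict:
    """Map each metric family to 'histogram' or 'summary' where detectable."""
    kinds = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match:
            continue
        name = match.group("name")
        labels = match.group("labels") or ""
        if name.endswith("_bucket") and has_label(labels, "le"):
            kinds[name[: -len("_bucket")]] = "histogram"
        elif has_label(labels, "quantile"):
            kinds[name] = "summary"
    return kinds


# Made-up samples; "example_request_time" is a hypothetical metric.
summary_text = """
example_request_time{quantile="0.5"} 0.012
example_request_time{quantile="0.99"} 0.250
example_request_time_sum 3.4
example_request_time_count 120
"""
histogram_text = """
example_request_time_bucket{le="0.1"} 80
example_request_time_bucket{le="+Inf"} 120
example_request_time_sum 3.4
example_request_time_count 120
"""

print(classify(summary_text))    # {'example_request_time': 'summary'}
print(classify(histogram_text))  # {'example_request_time': 'histogram'}
```

Reading the `# TYPE` comment lines would be more direct when they are present; the label-based check is used here only because it also works on partial output.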

Event Timeline

Change #1051429 had a related patch set uploaded (by RLazarus; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] statsd: re-add default args

https://gerrit.wikimedia.org/r/1051429

Change #1051429 merged by jenkins-bot:

[operations/deployment-charts@master] statsd: re-add default args

https://gerrit.wikimedia.org/r/1051429

Mentioned in SAL (#wikimedia-operations) [2024-07-02T21:51:04Z] <rzl@deploy1002> Started scap sync-world: T369080

Mentioned in SAL (#wikimedia-operations) [2024-07-02T21:54:42Z] <rzl@deploy1002> Finished scap: T369080 (duration: 04m 13s)

Disregard the above scap, I got too carried away with "never run helmfile across all mw deployments, use scap instead" but obviously that rule doesn't apply here. :)

I followed up with a helmfile run in each of services/mw-* so this should now be everywhere.

Thank you @RLazarus!

@dcausse, I see some metrics now at mediawiki_cirrus_search_request_time_bucket. Anything amiss?

Everything looks good, thanks everyone for the quick fix!

colewhite assigned this task to Joe.