Scap prometheus migration: Reduce the cardinality of scap timers/statsd metrics
Closed, Resolved (Public)

Description

We use "timers" whose names contain variables in scap. For example, for scap deploy we show: Finished deploy_foo/deploy (duration: 00m 29s)

These timers, optionally, send timing data to statsd.

This means Graphite (and, in the future, Prometheus) ends up with a lot of very long metric names like:

  • running_helm___kubeconfig__etc_kubernetes_mw_api_ext_deploy_codfw_config_rollback_canary_in__srv_deployment_charts_helmfile_d_services_mw_api_ext
  • deploy__gerrit_gerrit_30691f2_
  • scap_prep_1_43_0_wmf_9

We should normalize or eliminate timers containing variables to reduce cardinality in statsd and, hopefully, make our metrics more useful.
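
One way to picture the normalization is the minimal sketch below. It is not the change that was merged in scap; the function name `normalize_timer_name` and the prefix table are hypothetical, chosen only to cover the three example names listed above. The idea is to collapse every name that starts with a known variable-bearing prefix into a single fixed name.

```python
# Hypothetical prefix table: each entry maps the start of a
# high-cardinality timer name to one fixed, low-cardinality name.
# These prefixes are taken from the examples in this task, not from scap.
_PREFIX_TO_NAME = (
    ("running_helm_", "running_helm"),
    ("deploy__", "deploy"),
    ("scap_prep_", "scap_prep"),
)


def normalize_timer_name(name: str) -> str:
    """Return a fixed metric name for timers whose names embed variables.

    Names that match no known prefix are returned unchanged.
    """
    for prefix, fixed_name in _PREFIX_TO_NAME:
        if name.startswith(prefix):
            return fixed_name
    return name


if __name__ == "__main__":
    for raw in (
        "deploy__gerrit_gerrit_30691f2_",
        "scap_prep_1_43_0_wmf_9",
        "sync_world",
    ):
        print(raw, "->", normalize_timer_name(raw))
```

With a table like this, the number of distinct series is bounded by the number of rules rather than by the number of deployed revisions, versions, or helm invocations.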

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Reduce the cardinality of scap timers/statsd metrics (phase 2) | repos/releng/scap!572 | dancy | master-I35419fd6888b89c5f9bfc68482665b24dd476f60 | master
Reduce the cardinality of scap timers/statsd metrics (phase 1) | repos/releng/scap!569 | dancy | master-I56b7ebbc58ed1460f658a2049586d57d69944a52 | master
Reduce the cardinality of scap timers/statsd metrics | repos/releng/scap!562 | dancy | master-I81ae5d2fdf6a1a9c8b959c976f52d4b16022ead4 | master

Event Timeline

@lmata @colewhite Does the existing statsd listener accept tagged datapoints like so?

metric_name;tag1=value1;tag2=value2

If it does, that would be useful to us as we work on transitioning to Prometheus style metrics (which support labels in a similar way to Graphite tags).
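
For illustration, a tagged datapoint name in the format shown above could be built as in the sketch below. The helper `format_tagged_metric` and the example tag names are hypothetical; this only shows the string layout and says nothing about whether the statsd listener accepts it.

```python
def format_tagged_metric(name: str, tags: dict[str, str]) -> str:
    """Build a Graphite-style tagged series name: name;tag1=value1;tag2=value2.

    Tags are emitted in the order they appear in the mapping.
    """
    parts = [name]
    for key, value in tags.items():
        parts.append(f"{key}={value}")
    return ";".join(parts)


if __name__ == "__main__":
    # Hypothetical example: the variable part of a timer moves into a tag
    # instead of being embedded in the metric name.
    print(format_tagged_metric("scap.deploy", {"repo": "gerrit_gerrit"}))
```

The same key/value pairs would map naturally onto Prometheus labels, which is why the format is of interest for the migration.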

We use statsite as the metrics entry point. From a quick look at the docs and bug tracker, I would guess it doesn't accept tagged datapoints.

thcipriani merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/569

Reduce the cardinality of scap timers/statsd metrics (phase 1)

Mentioned in SAL (#wikimedia-operations) [2024-11-15T19:51:31Z] <dancy@deploy2002> Started scap sync-world: Testing T377883

Mentioned in SAL (#wikimedia-operations) [2024-11-15T19:54:37Z] <dancy@deploy2002> Finished scap sync-world: Testing T377883 (duration: 03m 06s)

Thanks for this information. I've kept changes simple in that case.

Is it possible to delete the data series with the following name prefixes from graphite?
scap.deploy__
scap.running_
scap.scap_prep_

Done!

dancy claimed this task.
dancy triaged this task as Low priority.