
add disk space usage to grafana dashboard for gitlab-runners
Closed, Resolved, Public

Description

To check on effects of measures such as https://gerrit.wikimedia.org/r/c/operations/puppet/+/881007 for T327060 it would be useful to have graphs showing disk space usage on gitlab-runners.

We have an existing dashboard for gitlab-runner details at:

https://grafana.wikimedia.org/d/H6fikj0nk/gitlab-runner-detail?orgId=1&refresh=30s&from=1674065872891&to=1674152272891

However, it does not yet have a panel for disk space usage.

The generic host dashboard also lacks it, although it does show related metrics such as disk utilization and disk saturation.

Let's add a panel to the gitlab-runner-specific dashboard that simply shows "space left".

It's surprisingly hard for me to do that; I would appreciate some pointers from others with experience creating dashboards in grafana-rw.
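As a rough illustration of what "space left" could mean: the Prometheus node_exporter reports `node_filesystem_avail_bytes` (space available to non-root users) and `node_filesystem_size_bytes` per mountpoint, so a panel could plot `100 * node_filesystem_avail_bytes / node_filesystem_size_bytes`. The sketch below only reproduces that arithmetic in Python on made-up values so the formula can be sanity-checked; the class and function names are hypothetical and not part of any existing dashboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilesystemSample:
    """Hypothetical holder for one pair of node_exporter filesystem samples."""

    instance: str
    mountpoint: str
    avail_bytes: float  # value of node_filesystem_avail_bytes
    size_bytes: float  # value of node_filesystem_size_bytes


def space_left_percent(sample: FilesystemSample) -> float:
    """Free space as a percentage of filesystem size (100 * avail / size)."""
    if sample.size_bytes <= 0:
        raise ValueError(f"invalid size for {sample.mountpoint}: {sample.size_bytes}")
    return 100.0 * sample.avail_bytes / sample.size_bytes


def space_left_gib(sample: FilesystemSample) -> float:
    """Free space in GiB, a unit a "space left" panel might display."""
    return sample.avail_bytes / 2**30


if __name__ == "__main__":
    # Made-up values for illustration only.
    docker = FilesystemSample("runner-example:9100", "/var/lib/docker", 25 * 2**30, 100 * 2**30)
    print(f"{docker.mountpoint}: {space_left_percent(docker):.1f}% / {space_left_gib(docker):.1f} GiB left")
```

Showing both a percentage and an absolute GiB value is a design choice: the percentage compares runners with different disk sizes, while the absolute value tells you whether the next large image pull will still fit.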

Event Timeline

I like the idea of having disk usage in the gitlab-runner dashboard too, although it will only cover Trusted Runners, as Shared Runners in WMCS are currently not scraped by Prometheus.

> It's surprisingly hard for me to do that; I would appreciate some pointers from others with experience creating dashboards in grafana-rw.

We can pair on Monday if you like. I often use Grafana's copy-and-paste function or just copy the raw JSON from an existing dashboard: either click the panel title -> More -> Copy, then on the other dashboard click the + symbol in the top right -> Paste panel from clipboard; or click the panel title -> Inspect -> Panel JSON. But we can go through it together.
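The "copy the raw JSON" route can also be scripted. Below is a minimal sketch, assuming both dashboards' JSON models have been exported (for example via Dashboard settings -> JSON Model) and that panels sit in the top-level `panels` list with `id` and `gridPos` fields, as in Grafana's dashboard JSON model. The function name and sample dashboards are hypothetical, panels nested inside collapsed rows are not handled, and the result still has to be saved back into Grafana by hand.

```python
import copy
import json
from typing import Any


def copy_panel(source: dict[str, Any], target: dict[str, Any], panel_id: int) -> dict[str, Any]:
    """Copy the top-level panel `panel_id` from `source` into `target`.

    Returns the inserted panel. Panels nested inside collapsed rows are not
    searched in this sketch.
    """
    matches = [p for p in source.get("panels", []) if p.get("id") == panel_id]
    if not matches:
        raise KeyError(f"panel {panel_id} not found in source dashboard")
    panel = copy.deepcopy(matches[0])  # leave the source dashboard untouched
    target_panels = target.setdefault("panels", [])
    # Panel ids must be unique within a dashboard, so take the next free one.
    panel["id"] = max((p.get("id", 0) for p in target_panels), default=0) + 1
    # Place the copied panel below everything already on the target dashboard.
    bottom = max(
        (p.get("gridPos", {}).get("y", 0) + p.get("gridPos", {}).get("h", 0) for p in target_panels),
        default=0,
    )
    grid = panel.setdefault("gridPos", {"h": 8, "w": 12})
    grid["x"] = 0
    grid["y"] = bottom
    target_panels.append(panel)
    return panel


if __name__ == "__main__":
    # Tiny made-up dashboards standing in for exported JSON models.
    src = {"panels": [{"id": 26, "title": "Disk space", "type": "timeseries",
                       "gridPos": {"h": 8, "w": 12, "x": 6, "y": 3}}]}
    dst = {"panels": [{"id": 1, "title": "CPU", "type": "timeseries",
                       "gridPos": {"h": 8, "w": 24, "x": 0, "y": 0}}]}
    copy_panel(src, dst, 26)
    print(json.dumps(dst, indent=2))
```

Re-numbering the id and moving the panel to the bottom mirrors what a manual paste has to avoid: duplicate panel ids and panels overlapping existing ones.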

> I like the idea of having disk usage in the gitlab-runner dashboard too.
> We can pair on Monday if you like.

Sounds good to me :) Thank you!

I just found grafana-cloud.wikimedia.org, so we can also try to add dashboards for Shared Runners. There seem to be some metrics available, for example: https://grafana-cloud.wikimedia.org/d/000000590/instance-details?orgId=1&var-project=gitlab-runners&var-job=node&var-node=runner-1021&from=now-24h&to=now

LSobanski triaged this task as Medium priority.Jan 30 2023, 4:41 PM
LSobanski moved this task from Incoming to Work in Progress on the collaboration-services board.

@Dzahn and I created multiple dashboards for runner disk space usage:

Overview of all Trusted Runners: https://grafana.wikimedia.org/d/Chb-gC07k/gitlab-ci-overview?orgId=1&from=now-7d&to=now&viewPanel=10
Details for a single runner: https://grafana.wikimedia.org/d/H6fikj0nk/gitlab-runner-detail?orgId=1&refresh=30s&viewPanel=26

And we started a dashboard in Grafana Cloud: https://grafana-cloud.wikimedia.org/d/FrErwP0Vk/gitlab-runner-overview?orgId=1&from=now-7d&to=now

It seems the node exporter on cloud hosts is missing metrics for /var/lib/docker. I'll open a follow-up task to get this metric into Prometheus and onto the dashboard as well.
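One way to confirm which mountpoints a host's node exporter actually reports is to inspect its raw metrics output. A minimal sketch, assuming the Prometheus text exposition format and the node_exporter metric name `node_filesystem_avail_bytes`; the function names and sample text are made up for illustration, and the parser only handles simple label values (no escaped quotes or `}` inside values).

```python
import re

# One sample line of node_filesystem_avail_bytes in the Prometheus text
# exposition format, capturing the value of its mountpoint label.
_AVAIL_SAMPLE = re.compile(
    r'^node_filesystem_avail_bytes\{[^}]*\bmountpoint="(?P<mountpoint>[^"]*)"[^}]*\}\s+\S+'
)


def exported_mountpoints(metrics_text: str) -> set[str]:
    """Return the mountpoints that have a node_filesystem_avail_bytes sample."""
    mounts: set[str] = set()
    for line in metrics_text.splitlines():
        match = _AVAIL_SAMPLE.match(line)
        if match:
            mounts.add(match.group("mountpoint"))
    return mounts


def missing_mountpoints(metrics_text: str, wanted: set[str]) -> set[str]:
    """Return the wanted mountpoints that the exporter does not report."""
    return wanted - exported_mountpoints(metrics_text)


if __name__ == "__main__":
    # Made-up exporter output in which only "/" is present.
    sample = (
        "# TYPE node_filesystem_avail_bytes gauge\n"
        'node_filesystem_avail_bytes{device="/dev/vda1",fstype="ext4",mountpoint="/"} 1.2e+10\n'
    )
    print(missing_mountpoints(sample, {"/", "/var/lib/docker"}))  # {'/var/lib/docker'}
```

The input text would typically come from the exporter's `/metrics` endpoint on the host in question; comment lines starting with `#` are skipped because they do not match the sample pattern.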

The follow-up task was declined almost immediately, and we are now blocked by that (see T328972#8591290).

Jelto claimed this task.

All gitlab-runners dashboards have disk usage for / and /var/lib/docker now.

Trusted Runners overview: https://grafana.wikimedia.org/d/Chb-gC07k/gitlab-ci-overview?orgId=1&from=now-7d&to=now&viewPanel=10
Runner detail: https://grafana.wikimedia.org/d/H6fikj0nk/gitlab-runner-detail?orgId=1&refresh=30s&viewPanel=26
Grafana Cloud: https://grafana-cloud.wikimedia.org/d/FrErwP0Vk/gitlab-runner-overview?orgId=1&from=now-7d&to=now
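A disk usage panel for / and /var/lib/docker could be driven by a PromQL query restricted to those two mountpoints. The sketch below generates such a query; it is not the query the dashboards above actually use. It assumes the node_exporter metrics `node_filesystem_avail_bytes` and `node_filesystem_size_bytes` and a Grafana template variable named `$instance` (hypothetical). PromQL regex matchers are fully anchored, so `/` matches only the root filesystem rather than every path.

```python
import re

DEFAULT_MOUNTPOINTS = ("/", "/var/lib/docker")

# Characters with special meaning in a PromQL regex or string literal; this
# sketch rejects them instead of trying to escape them.
_UNSAFE = re.compile(r'[\\.^$|?*+()\[\]{}"]')


def disk_used_percent_query(
    mountpoints: tuple[str, ...] = DEFAULT_MOUNTPOINTS,
    instance_var: str = "$instance",
) -> str:
    """Build a PromQL expression for used disk space (%) on the given mountpoints.

    Used space is 100 * (1 - avail / size), so blocks reserved for root
    count as used.
    """
    for mount in mountpoints:
        if _UNSAFE.search(mount):
            raise ValueError(f"mountpoint needs escaping, not supported here: {mount!r}")
    mount_regex = "|".join(mountpoints)
    selector = f'instance=~"{instance_var}", mountpoint=~"{mount_regex}"'
    return (
        f"100 * (1 - node_filesystem_avail_bytes{{{selector}}}"
        f" / node_filesystem_size_bytes{{{selector}}})"
    )


if __name__ == "__main__":
    print(disk_used_percent_query())
```

Both metrics carry the same label set per filesystem, so the division matches series one-to-one without needing an explicit `on(...)` clause.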

Some dashboards could use a bit more tweaking of their other graphs (especially the Grafana Cloud one), but I'm closing this task because disk space usage is now available on the dashboards.