
Create a database CPU saturation dashboard for codfw
Open, Low, Public

Description

https://grafana.wikimedia.org/d/XyoE_N_Wz/wikidata-database-cpu-saturation?orgId=1 shows the values for eqiad but there is no equivalent for codfw. This was called out during the Sep 2020 DC failover.

Event Timeline

Marostegui triaged this task as Medium priority. Sep 2 2020, 2:35 PM
Marostegui moved this task from Triage to Backlog on the DBA board.
Marostegui added a subscriber: Marostegui.

To give some context on why we have a specific CPU dashboard for s8 (wikidatawiki): a few months ago wikidatawiki was having serious performance issues when we compressed InnoDB (and we had fewer servers), so we had to tweak LB weights, mostly based on CPU usage.
Migrating to 10.4 brought a massive improvement in handling InnoDB compression, and we've also added more servers to that section, so we are out of the woods in that sense.
But it is definitely useful to have a codfw CPU dashboard for s8 as well, at least to mirror what we have in eqiad.

I am not sure we should keep this open. We are no longer having CPU usage problems in s8 (most hosts are running 10.4 and we already have lots of hosts).
If we do want to keep the codfw CPU dashboard, we'd also need to make sure it is populated automatically. The eqiad one is maintained by hand: we have to add new hosts to that dashboard and remove old ones ourselves.
With the latest server movements in s8, I am pretty sure the eqiad dashboard no longer reflects reality.
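
For illustration, automatic population could look roughly like the sketch below: a script that rebuilds the dashboard definition from a host list instead of editing panels by hand. This is only a sketch under assumptions, not how the eqiad dashboard is built today. The host names, the `build_cpu_dashboard` helper, the panel type, and the use of the node_exporter metric `node_cpu_seconds_total` are all hypothetical choices for the example; the real host list would have to come from whatever inventory source is authoritative for s8.

```python
import json


def build_cpu_dashboard(site, section, hosts):
    """Return a Grafana-style dashboard dict with one CPU panel per host.

    Hypothetical helper: the dict layout follows the general shape of a
    Grafana dashboard JSON model (title, panels, targets), nothing more.
    """
    panels = []
    for index, host in enumerate(sorted(hosts)):
        panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": f"{host} CPU usage",
            "targets": [{
                # Assumed metric and label names; adjust to the real setup.
                # Expression: percentage of non-idle CPU time over 5 minutes.
                "expr": (
                    '100 - avg(rate(node_cpu_seconds_total'
                    f'{{instance=~"{host}:.*",mode="idle"}}[5m])) * 100'
                ),
            }],
        })
    return {
        "title": f"{section} database CPU saturation ({site})",
        "panels": panels,
    }


# Placeholder host names; in practice these would be read from an inventory.
codfw_s8_hosts = ["dbhost-b", "dbhost-a"]
dashboard = build_cpu_dashboard("codfw", "s8", codfw_s8_hosts)
print(json.dumps(dashboard, indent=2))
```

Regenerating the definition whenever the host list changes would keep the dashboard in sync with server movements, which is the part that currently depends on someone remembering to edit it.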

LSobanski lowered the priority of this task from Medium to Low. Thu, Apr 8, 12:26 PM

Dropping priority for now. I'll poke around the dashboards, see if I have any ideas, and close the task if I don't.