I wrote a script (P64213) that compares the pooled replicas of each section on a given metric. I picked mysql_global_status_innodb_data_read for now, but adapting it to any other metric is easy.
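To illustrate the kind of comparison involved, here is a minimal sketch in Python. It is not the code in P64213. It assumes you already have a list of samples per replica, for example per-interval rates of mysql_global_status_innodb_data_read (a cumulative status counter, so rates rather than raw values are what you'd compare). Each replica's mean is divided by the median of all replicas' means, so 1.0 means "typical for this section". The function name `compare_replicas` and the hostnames are hypothetical.

```python
from statistics import mean, median


def compare_replicas(samples):
    """Return each replica's mean divided by the median of all replicas' means.

    samples: dict mapping a (hypothetical) replica hostname to a list of
    metric values, e.g. per-interval rates of a counter.
    Replicas with no samples are skipped.
    """
    means = {host: mean(values) for host, values in samples.items() if values}
    section_median = median(means.values())
    if section_median == 0:
        # A ratio against a zero median is undefined.
        raise ValueError("section median is zero; ratios are undefined")
    return {host: m / section_median for host, m in means.items()}


if __name__ == "__main__":
    # Hypothetical hostnames and numbers, for illustration only.
    example = {
        "db-a": [100.0, 120.0, 110.0],
        "db-b": [105.0, 95.0, 100.0],
        "db-c": [10.0, 12.0, 8.0],
    }
    for host, ratio in sorted(compare_replicas(example).items()):
        print(f"{host}: {ratio:.2f}x section median")
    # db-a: 1.10x section median
    # db-b: 1.00x section median
    # db-c: 0.10x section median
```

Using the median rather than the mean as the reference keeps a single very busy or nearly idle replica from shifting the baseline for the others.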
Here is the result:
[Graphs of mysql_global_status_innodb_data_read for the pooled replicas of each section, s1 through s8, in both codfw and eqiad]
There are some very large spikes (due to maintenance), so I had to throw away any data point that is far from the median of that replica's values. That's why some graphs look cut off or incomplete.
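A minimal sketch of that kind of median-based filtering, assuming non-negative values (such as counter rates) and a ratio threshold: points more than `factor` times above or below the replica's own median are dropped. The function name `drop_spikes` and the default `factor=3.0` are hypothetical choices, not necessarily what the script uses.

```python
from statistics import median


def drop_spikes(values, factor=3.0):
    """Keep only points within [median / factor, median * factor].

    values: non-negative samples for a single replica.
    factor: hypothetical threshold; larger values keep more of the data.
    """
    if not values:
        return []
    m = median(values)
    low, high = m / factor, m * factor
    # Points outside the band (e.g. maintenance spikes) are discarded,
    # which leaves gaps in the resulting graph.
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    # Hypothetical samples with one maintenance spike.
    print(drop_spikes([100.0, 110.0, 90.0, 5000.0, 105.0]))
    # [100.0, 110.0, 90.0, 105.0]
```

Because the filter works per replica against that replica's own median, a replica that is consistently under-utilized is not filtered away; only its outlying points are.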
In some cases the under-utilization is intentional (vslow, dump, etc.), but in others it probably isn't.
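One way to separate the two cases is to flag replicas whose load is well below the section median while skipping hosts known to have a special role. The sketch below assumes each replica's load is already expressed relative to the section median (1.0 means "typical") and that the set of vslow/dump hosts is supplied by hand. The function name `flag_underutilized`, the `0.5` threshold, and the hostnames are all hypothetical.

```python
def flag_underutilized(ratios, intentional, threshold=0.5):
    """Return replicas below `threshold` x the section median, excluding
    hosts whose low load is expected (e.g. vslow or dump replicas).

    ratios: dict host -> load relative to the section median.
    intentional: set of hosts deliberately given little or no traffic.
    """
    return sorted(
        host
        for host, ratio in ratios.items()
        if ratio < threshold and host not in intentional
    )


if __name__ == "__main__":
    # Hypothetical ratios; db-d stands in for a vslow/dump replica.
    ratios = {"db-a": 1.1, "db-b": 1.0, "db-c": 0.1, "db-d": 0.2}
    print(flag_underutilized(ratios, intentional={"db-d"}))
    # ['db-c']
```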
This should help us distribute the load better and make T360930: Section-wide circuit breaking more effective.