db1028 paged for high replication lag today
```
01:54 <kaldari> !log foreachwiki extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php
02:21 -icinga-wm:#wikimedia-operations- PROBLEM - MariaDB Slave Lag: s7 on db1028 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.01 seconds
03:38 <godog> kaldari: is it possible populateLocalAndGlobalIds.php is spamming centralauth ? we got a slave lag page
03:38 <godog> also traffic increased on s7 globally https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?var-dc=eqiad%20prometheus%2Fops&var-group=All&var-shard=s7&var-role=All&from=1481233111803&to=1481254711803
03:38 <kaldari> godog: Maybe, I'll kill it for now....
03:39 <godog> kaldari: ok thanks! I'll keep an eye on it and see if that was the cause
03:40 <godog> only db1028 suffered though, the other slaves didn't have a problem with it
03:41 <kaldari> godog: killed it
03:42 <godog> yeah written rows are dropping
03:44 <kaldari> godog: that's not surprising, all the script does is write a lot of rows, but it's supposed to waitForSlaves after each batch of 1000.
03:45 <kaldari> it did about 4 million writes this afternoon before I killed it
03:47 <godog> kaldari: ah, db1028 is weighted at 0 in mw I wonder if that's related
03:53 <godog> I wonder if db1028 will get a chance of catching up on the lag now
03:53 <godog> kaldari: anyways thanks for killing it, so far my best lead is db1028 being at weight 0 and waitforslave not waiting for it
03:54 <kaldari> looks like the lag is leveling off at least (rather than continuing to climb)
03:55 <kaldari> and now actually went down a little bit
03:56 <kaldari> godog: thanks for pinging me, looks like we might be on the way back to normal now.
03:57 <kaldari> I was looking at the slave lag when I first started running the script, but hadn't checked it since
03:58 <godog> kaldari: no worries, yeah it took about 30m to page I think
03:58 <kaldari> hmm, now it went back up again
03:59 <kaldari> godog: https://tendril.wikimedia.org/chart?hosts=db1028&vars=seconds_behind_master&mode=value
04:01 <godog> indeed, I'm comparing it with e.g. db1062 in grafana
04:01 <godog> the "write query stats" panel, which it dropped for db1062 but not db1028
04:04 <godog> looks like it might be still flushing to disk, writing a lot
04:04 <kaldari> yeah, maybe it still has a lot to catch up on
04:05 -icinga-wm:#wikimedia-operations- RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
04:06 <godog> there we go
04:07 -icinga-wm:#wikimedia-operations- RECOVERY - MariaDB Slave Lag: s7 on db1028 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
04:08 <godog> I'll wait a bit to make sure db1028 is ok
```
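kaldari's explanation above is the standard MediaWiki batching idiom: perform a bounded batch of writes, then block on replication before starting the next batch. A minimal sketch of that pattern, assuming the MediaWiki 1.28-era API (the table name and row source are hypothetical placeholders, not the actual populateLocalAndGlobalIds.php source):

```php
<?php
// Sketch of the batch-then-wait idiom described in the log; the table
// name and $rowsToInsert are illustrative, not from the real script.
$dbw = wfGetDB( DB_MASTER );
$batchSize = 1000;
$count = 0;
foreach ( $rowsToInsert as $row ) {
	$dbw->insert( 'localuser', $row, __METHOD__ );
	if ( ++$count % $batchSize === 0 ) {
		// Blocks until tracked replicas catch up to the master's current
		// position. The open question in this incident is whether a
		// replica pooled at weight 0 (db1028) is part of that set.
		wfWaitForSlaves();
	}
}
```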
So it looks like lag increased only on db1028, which is the only non-master in s7 pooled at weight 0. The working theory from the log is that the script's waitForSlaves throttling only considers replicas carrying read traffic, so a zero-weight replica like db1028 was never waited on and fell steadily behind.
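If that theory holds, the relevant knob is the per-section load weight in MediaWiki's database configuration. A hypothetical excerpt for illustration only (the real settings live in db-eqiad.php in operations/mediawiki-config; all host names except db1028 are made up):

```php
// Hypothetical s7 load map; by WMF convention the first entry is the
// master and carries weight 0. The suspicion is that a zero-weight
// replica is also excluded from the set wfWaitForSlaves() waits on.
'sectionLoads' => [
	's7' => [
		'db10xx' => 0,   // master (hypothetical host name)
		'db1028' => 0,   // depooled replica: no read traffic, and
		                 // possibly never waited on, hence the lag
		'db10yy' => 100, // pooled replicas absorb reads and gate
		'db10zz' => 100, // the script's batching
	],
],
```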