db1134 paged for high replication lag:
`22:07:54 <+icinga-wm> PROBLEM - MariaDB Replica Lag: s1 #page on db1134 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1468.87 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica`
We depooled it: `!log rzl@cumin1001 dbctl commit (dc=all): 'depool db1134', diff saved to https://phabricator.wikimedia.org/P14310 and previous config saved to /var/cache/conftool/dbconfig/20210211-031048-rzl.json`
As of this writing, still digging into why, although the graphs suggest whatever happened happened at 2245: https://grafana-rw.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1134&var-port=9104&from=1613006340000&to=1613013540000
And @colewhite notes `Feb 11 02:42:44 db1134 mysqld[3122]: 210211 2:42:44 [ERROR] mysqld got signal 7`, and that the syslog indicates memory corruption.
#DBA, I'm leaving it depooled overnight for you to investigate during your working hours. <3
Logs:
```
Feb 11 02:42:44 db1134 kernel: [1694159.917831] {1}Hardware error detected on CPU12
Feb 11 02:42:44 db1134 kernel: [1694159.917842] {1}event severity: recoverable
Feb 11 02:42:44 db1134 kernel: [1694159.917843] {1} Error 0, type: recoverable
Feb 11 02:42:44 db1134 kernel: [1694159.917844] {1} fru_text: B3
Feb 11 02:42:44 db1134 kernel: [1694159.917844] {1} section_type: memory error
Feb 11 02:42:44 db1134 kernel: [1694159.917845] {1} error_status: 0x0000000000000400
Feb 11 02:42:44 db1134 kernel: [1694159.917846] {1} physical_address: 0x0000006901926840
Feb 11 02:42:44 db1134 kernel: [1694159.917848] {1} node: 2 card: 2 module: 0 rank: 1 bank: 0 row: 675 column: 0
Feb 11 02:42:44 db1134 kernel: [1694159.917850] {1} DIMM location: not present. DMI handle: 0x0000
Feb 11 02:42:44 db1134 kernel: [1694159.919460] Memory failure: 0x6901926: Killing mysqld:3122 due to hardware memory corruption
Feb 11 02:42:44 db1134 kernel: [1694159.928058] Memory failure: 0x6901926: recovery action for dirty LRU page: Recovered
Feb 11 02:42:44 db1134 mysqld[3122]: 210211 2:42:44 [ERROR] mysqld got signal 7 ;
```
```
racadm getsel
Record: 11
Date/Time: 02/11/2021 01:38:37
Source: system
Severity: Critical
Description: Correctable memory error rate exceeded for DIMM_B3.
-------------------------------------------------------------------------------
Record: 12
Date/Time: 02/11/2021 01:38:44
Source: system
Severity: Critical
Description: Correctable memory error logging disabled for a memory device at location DIMM_B3.
-------------------------------------------------------------------------------
Record: 13
Date/Time: 02/11/2021 02:42:43
Source: system
Severity: Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_B3.
```