
phabricator_conduit.conduit_methodcalllog failed replicating on dbstore1002; m3 probably needs a reload on that server
Closed, Resolved, Public


PROBLEM - MariaDB Slave SQL: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Delete_rows_v1 event on table phabricator_conduit.conduit_methodcalllog: Cant find record in conduit_methodcalllog, Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the events master log db1048-bin.001351, end_log_pos 640518090
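Error 1032 (HA_ERR_KEY_NOT_FOUND) works as follows. With row-based replication, the replica applies each row change from the master by looking up the affected row. A Delete_rows event for a row the replica does not have cannot be applied, so the SQL thread stops. In other words, the replica's copy of the table no longer matches the master's. The toy Python model below is only a sketch of that behaviour. It assumes a table keyed by primary key, and the class and exception names are hypothetical. They do not reflect MariaDB internals.

```python
from dataclasses import dataclass, field


class KeyNotFound(Exception):
    """Hypothetical stand-in for MariaDB error 1032 (HA_ERR_KEY_NOT_FOUND)."""

    errno = 1032


@dataclass
class ReplicaTable:
    """Toy replica copy of a table, keyed by primary key (illustrative only)."""

    name: str
    rows: dict = field(default_factory=dict)
    sql_running: bool = True

    def apply_delete_rows(self, pk):
        # A row-based delete must find the exact row it refers to. If the
        # replica has drifted and the row is missing, the event cannot be
        # applied and the SQL thread stops.
        if pk not in self.rows:
            self.sql_running = False
            raise KeyNotFound(f"Can't find record in '{self.name}' (pk={pk})")
        del self.rows[pk]


# Master and replica have diverged: the replica never had row 2.
replica = ReplicaTable("conduit_methodcalllog", rows={1: "a", 3: "c"})
replica.apply_delete_rows(1)  # fine: the row exists on the replica
try:
    replica.apply_delete_rows(2)  # the master deletes a row the replica lacks
except KeyNotFound as exc:
    print(exc.errno, exc, "Slave_SQL_Running:", "Yes" if replica.sql_running else "No")
```

This failure mode explains why skipping the single event is usually not enough. The table has already diverged, which is why reimporting it from a healthy replica is the safer fix.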

Event Timeline

Given that m3 isn't big (100G), I can either reimport that table or the whole tablespace from db2012.

We should stop the slaves in sync and import at least the tables in phabricator_conduit.

That also works. We can stop db1048's replication for a few seconds, let the slaves reach the same position, stop them, and then start db1048 again.
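The "stop in sync" procedure can be sketched as a toy model with four steps:

1. Pause the upstream (db1048 here).
2. Let each replica apply everything already in the binlog until it reaches the same position.
3. Stop it there.
4. Resume.

Once db2012 and dbstore1002 are stopped at identical coordinates, a table copied from one to the other can neither skip nor replay an event. This sketch makes one loud assumption: while the upstream is paused, the binlog the replicas read does not grow. The class and function names are hypothetical. In MariaDB itself, the waiting step would typically use `MASTER_POS_WAIT()` followed by `STOP SLAVE`.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: applies (key, value) binlog events in order and tracks its position."""

    name: str
    position: int = 0
    data: dict = field(default_factory=dict)
    running: bool = True

    def apply_until(self, binlog, target):
        # Apply events one at a time until the replica reaches `target`.
        while self.running and self.position < target:
            key, value = binlog[self.position]
            self.data[key] = value
            self.position += 1


def stop_in_sync(binlog, replicas):
    """With the upstream paused, let every replica catch up, then stop it at the same position."""
    target = len(binlog)  # assumption: the binlog does not grow while paused
    for replica in replicas:
        replica.apply_until(binlog, target)  # comparable to MASTER_POS_WAIT()
        replica.running = False              # comparable to STOP SLAVE
    return target


binlog = [(1, "a"), (2, "b"), (3, "c"), (2, "b2")]
db2012 = Replica("db2012")
dbstore1002 = Replica("dbstore1002")
db2012.apply_until(binlog, 2)       # the replicas lag by different amounts
dbstore1002.apply_until(binlog, 3)
dbstore1002.data.pop(1)             # simulate the drifted table (a missing row)

pos = stop_in_sync(binlog, [db2012, dbstore1002])
# Both replicas are stopped at the same position, so the copy is consistent.
dbstore1002.data = dict(db2012.data)

# Restart the replicas; the upstream resumes and new events are replayed.
binlog.append((4, "d"))
for replica in (db2012, dbstore1002):
    replica.running = True
    replica.apply_until(binlog, len(binlog))
print(pos, db2012.data == dbstore1002.data)
```

The key design point is that the copy happens only while both replicas sit at the same position. A copy taken from a replica that is ahead of or behind the target would make the target either re-apply or miss events after restart.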

Mentioned in SAL (#wikimedia-operations) [2016-11-23T10:24:25Z] <marostegui> Stopping replication on the following m3 hosts for maintenance - db1048, dbstore1002 (m3 instance), db2012 - T151384

Change 323137 had a related patch set uploaded (by Marostegui):
site.pp: m3 has the wrong db master entry

Change 323137 merged by Marostegui:
site.pp: m3 has the wrong db master entry

The table phabricator_conduit.conduit_methodcalllog has been reimported from db2012 to dbstore1002 and replication is working again.
On dbstore1002, I have kept the old table under a new name, just in case:

MariaDB DBSTORE localhost phabricator_conduit > show tables like '%BKUP%';
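+----------------------------------------+
| Tables_in_phabricator_conduit (%BKUP%) |
+----------------------------------------+
| BKUP_conduit_methodcalllog             |
+----------------------------------------+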
1 row in set (0.00 sec)

Let's resolve this and reopen it if something else happens. dbstore1002 is not a primary server for Phabricator at all.

Sounds good. I will leave the table there and add a note in my calendar to drop it in a couple of weeks.