phabricator_conduit.conduit_methodcalllog failed replicating on dbstore1002, probably m3 needs a reload on that server
Closed, Resolved (Public)

Assigned To

Authored By

	jcrespo
	Nov 22 2016, 7:54 PM

Description

PROBLEM - MariaDB Slave SQL: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Delete_rows_v1 event on table phabricator_conduit.conduit_methodcalllog: Cant find record in conduit_methodcalllog, Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the events master log db1048-bin.001351, end_log_pos 640518090

Details

	Subject	Repo	Branch	Lines +/-
	site.pp: m3 has the wrong db master entry	operations/puppet	production	+2 -2


Event Timeline

jcrespo created this task. Nov 22 2016, 7:54 PM

Restricted Application added a subscriber: Aklapper. Nov 22 2016, 7:54 PM

Paladox subscribed. Nov 22 2016, 9:02 PM

Given that m3 isn't big (100G), I can reimport either that table or the whole tablespace from db2012.

We should stop the slaves at the same position and import at least the tables in phabricator_conduit.

That also works. We can stop db1048's replication for a few seconds, let the slaves reach the same position, stop them, and then start db1048 again.
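The "let the slaves reach the same position" step can be checked mechanically. The following is only a minimal sketch, assuming each slave's `SHOW SLAVE STATUS` row has already been fetched into a dict keyed by host name. The helper names (`executed_position`, `replicas_in_sync`, `lagging_replicas`) are hypothetical, and the example positions simply reuse the binlog file and `end_log_pos` from the alert for illustration. It compares the `Relay_Master_Log_File` and `Exec_Master_Log_Pos` columns, which describe how far the SQL thread has executed in the master's binlog.

```python
from typing import Dict, Mapping, Tuple

# (master binlog file, position within that file)
Position = Tuple[str, int]


def executed_position(status: Mapping[str, object]) -> Position:
    """Return the master binlog coordinates the SQL thread has executed up to.

    `status` is assumed to be one SHOW SLAVE STATUS row as a dict.
    """
    return (str(status["Relay_Master_Log_File"]), int(status["Exec_Master_Log_Pos"]))


def replicas_in_sync(statuses: Mapping[str, Mapping[str, object]]) -> bool:
    """True when at least one slave is given and all report the same position."""
    positions = {executed_position(s) for s in statuses.values()}
    return len(positions) == 1


def lagging_replicas(statuses: Mapping[str, Mapping[str, object]]) -> Dict[str, Position]:
    """Slaves whose executed position is behind the furthest one.

    Assumes binlog file names share a prefix and zero-padded numbering,
    so that plain tuple comparison orders them correctly.
    """
    positions = {host: executed_position(s) for host, s in statuses.items()}
    if not positions:
        return {}
    furthest = max(positions.values())
    return {host: pos for host, pos in positions.items() if pos < furthest}


# Illustrative values only; real rows would come from each slave.
example = {
    "dbstore1002": {"Relay_Master_Log_File": "db1048-bin.001351",
                    "Exec_Master_Log_Pos": 640518090},
    "db2012": {"Relay_Master_Log_File": "db1048-bin.001351",
               "Exec_Master_Log_Pos": 640518090},
}
if replicas_in_sync(example):
    print("slaves are at the same position; safe to stop them")
else:
    print("still catching up:", lagging_replicas(example))
```

Once the check passes, the slaves can be stopped and db1048's replication started again, as described above.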

Marostegui claimed this task. Nov 23 2016, 10:18 AM

Mentioned in SAL (#wikimedia-operations) [2016-11-23T10:24:25Z] <marostegui> Stopping replication on the following m3 hosts for maintenance - db1048, dbstore1002 (m3 instance), db2012 - T151384

Change 323137 had a related patch set uploaded (by Marostegui):
site.pp: m3 has the wrong db master entry

https://gerrit.wikimedia.org/r/323137

gerritbot added a project: Patch-For-Review. Nov 23 2016, 10:49 AM

Change 323137 merged by Marostegui:
site.pp: m3 has the wrong db master entry

https://gerrit.wikimedia.org/r/323137

The table phabricator_conduit.conduit_methodcalllog has been reimported from db2012 to dbstore1002 and replication is working again.
On dbstore1002 I have left the old table renamed, just in case:

MariaDB DBSTORE localhost phabricator_conduit > show tables like '%BKUP%';
+----------------------------------------+
| Tables_in_phabricator_conduit (%BKUP%) |
+----------------------------------------+
| BKUP_conduit_methodcalllog             |
+----------------------------------------+
1 row in set (0.00 sec)

Let's resolve, reopen if something else happens. dbstore1002 is not at all a primary server for phabricator.

Sounds good. I will leave the table there and add a note in my calendar to drop it in a couple of weeks.
Thanks.
