
replication broken on db1124:3318 on wikidata.pagelinks
Closed, Resolved · Public

Description

db1124 reported broken replication on 2018-11-14 at 09:42.
Replication of wikidatawiki.pagelinks (s8) stopped because an event could not be executed due to a missing row.

We looked up the event that failed to execute: it was a DELETE ROW event, but the row did not exist on the sanitarium host.
We created the row manually and restarted replication.

Event Timeline

Banyek created this task. · Nov 14 2018, 5:56 PM
Restricted Application added a subscriber: Aklapper. · Nov 14 2018, 5:56 PM
Banyek closed this task as Resolved. · Nov 14 2018, 5:56 PM
Marostegui added a comment. · Edited · Dec 7 2018, 6:27 AM

This broke again yesterday:

Dec 06 18:10:48 db1124 mysqld[3110]: 2018-12-06 18:10:48 140545623860992 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table wikidatawiki.pagelinks; Can't find record in 'pagelinks', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master l
Dec 06 18:10:48 db1124 mysqld[3110]: 2018-12-06 18:10:48 140545623860992 [Warning] Slave: Can't find record in 'pagelinks' Error_code: 1032
Dec 06 18:10:48 db1124 mysqld[3110]: 2018-12-06 18:10:48 140545623860992 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'db1087-bin.003545' position 943920625
Dec 06 18:10:48 db1124 mysqld[3110]: 2018-12-06 18:10:48 140545623860992 [Note] Slave SQL thread exiting, replication stopped in log 'db1087-bin.003545' at position 943920625
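
For triage, the useful fields in these log lines (event type, table, error code, and the binlog coordinates where the SQL thread stopped) can be pulled out with a short script. The following is a minimal Python sketch, not a tool used in this task; the `summarize` helper and both regular expressions are assumptions written against the two message formats shown above.

```python
import re

# Matches the row-event failure message (format assumed from the log above).
EVENT_RE = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[\w.]+); "
    r".*?Error_code: (?P<code>\d+)"
)
# Matches the note written when the SQL thread exits.
STOP_RE = re.compile(
    r"replication stopped in log '(?P<log>[^']+)' at position (?P<pos>\d+)"
)


def summarize(lines):
    """Collect event type, table, error code and stop coordinates from log lines."""
    summary = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            summary.update(
                event=m["event"], table=m["table"], error_code=int(m["code"])
            )
        m = STOP_RE.search(line)
        if m:
            summary.update(log_file=m["log"], log_pos=int(m["pos"]))
    return summary


# Shortened copies of two of the log lines above (syslog prefix dropped).
SAMPLE = [
    "[ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table "
    "wikidatawiki.pagelinks; Can't find record in 'pagelinks', "
    "Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND",
    "[Note] Slave SQL thread exiting, replication stopped in log "
    "'db1087-bin.003545' at position 943920625",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the two sample lines this reports a Delete_rows_v1 event on wikidatawiki.pagelinks, error code 1032, and the stop position 943920625 in db1087-bin.003545.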

I fixed it with:

INSERT INTO `pagelinks` VALUES (7232144,0,'Q639669',0);
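
Why re-inserting the row unblocks replication: the replica has to find the exact row named in the delete event, so once the row exists again the event applies and removes it. The sketch below illustrates this with an in-memory SQLite table as a stand-in for the replica. It is a simulation under stated assumptions, not MariaDB behaviour reproduced exactly: the `apply_delete_event` helper is hypothetical, and every column name other than `pl_from` is assumed rather than taken from this task.

```python
import sqlite3


def apply_delete_event(conn, row):
    """Apply a row delete strictly: raise if the row is absent,
    loosely mimicking error 1032 (HA_ERR_KEY_NOT_FOUND)."""
    cur = conn.execute(
        "DELETE FROM pagelinks WHERE pl_from = ? AND pl_namespace = ? "
        "AND pl_title = ? AND pl_from_namespace = ?",
        row,
    )
    if cur.rowcount == 0:
        raise LookupError("Can't find record in 'pagelinks'")


# Stand-in for the sanitarium host; column names are assumptions.
replica = sqlite3.connect(":memory:")
replica.execute(
    "CREATE TABLE pagelinks ("
    "pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT, "
    "pl_from_namespace INTEGER, "
    "PRIMARY KEY (pl_from, pl_namespace, pl_title))"
)

# The row from the INSERT above; it is missing on the replica.
row = (7232144, 0, "Q639669", 0)

try:
    apply_delete_event(replica, row)
    first_attempt = "applied"
except LookupError:
    first_attempt = "failed"
    # Manual fix: recreate the row, then let the delete event apply.
    replica.execute("INSERT INTO pagelinks VALUES (?, ?, ?, ?)", row)
    apply_delete_event(replica, row)

remaining = replica.execute("SELECT COUNT(*) FROM pagelinks").fetchone()[0]
```

The first attempt fails because the row is absent; after the manual insert the delete goes through and the table ends up without the row, which matches the state on the master after its own delete.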
Marostegui renamed this task from replication broken on db1124 to replication broken on db1124:3318 on wikidata.pagelinks. · Dec 7 2018, 7:39 AM

This has happened again today with pagelinks. I had to add a few rows and then delete multiple rows that were causing duplicate entry errors for another pl_from value.
I waited until db1124:3318 had fully caught up with the master before leaving.
We need to check pagelinks again between the master and the sanitarium host.
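
One way to picture that check: dump the same primary-key range from both hosts and compare the row sets in both directions. Rows present only on the master would later break a delete event (error 1032), and rows present only on the sanitarium host would later cause duplicate entry errors. This is a rough Python sketch, assuming the table is expected to be identical on both hosts; `diff_rows` is a hypothetical helper and the sample rows are made up for illustration, not taken from either host.

```python
def diff_rows(master_rows, replica_rows):
    """Return (missing_on_replica, extra_on_replica) as sorted lists."""
    master, replica = set(master_rows), set(replica_rows)
    return sorted(master - replica), sorted(replica - master)


# Made-up rows standing in for one chunk of pagelinks from each host.
master_chunk = [(1, 0, "Q1", 0), (2, 0, "Q2", 0)]
replica_chunk = [(2, 0, "Q2", 0), (3, 0, "Q3", 0)]

missing, extra = diff_rows(master_chunk, replica_chunk)

if __name__ == "__main__":
    print("missing on replica:", missing)
    print("extra on replica:", extra)
```

On a table this size the comparison would need to run in chunks and against a consistent replication position on both hosts; the sketch leaves both of those out.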

Another breakage happened today:

05:28 <+icinga-wm> PROBLEM - MariaDB Slave SQL: s8 on db1124 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Delete_rows_v1 event on table wikidatawiki.pagelinks: Cant find record in pagelinks, Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the events master log db1087-bin.003648, end_log_pos 920552713
05:38 <+icinga-wm> PROBLEM - MariaDB Slave Lag: s8 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 750.52 seconds

Maybe we should reimport the table in January?

Yeah, check this https://phabricator.wikimedia.org/T212574#4842526