
Wrong page title in labs database replica enwiki page table
Closed, Resolved, Public

Description

A page and its talk page have the wrong title in the page table.

SELECT page_id, page_namespace, page_title FROM enwiki_p.page where page_id IN (50274778,1272531,976991,50274777) ORDER by page_namespace;

Page ids 976991 and 1272531 should have page title Orchard_Road instead of Orchard,_Singapore.

There was a page move on April 22, 2016.

Evidently the labs replica page table does not have the unique index that production has:
CREATE UNIQUE INDEX /*i*/name_title ON /*_*/page (page_namespace,page_title);

I don't think that this will be fixed by T126946 because the page table was reimported before April 1, 2016.
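Because the replica lacks the unique index, duplicate (page_namespace, page_title) pairs like the ones reported here can be searched for directly. The following is only a minimal sketch: it uses an in-memory SQLite table as a stand-in for the replica's page table, loaded with the four rows from this report, and the helper name find_duplicate_titles is hypothetical. The same GROUP BY / HAVING query should carry over to MariaDB, but that is an assumption and has not been run against enwiki_p.

```python
import sqlite3

# In-memory stand-in for the replica's page table. Like the replica, it has
# NO unique index on (page_namespace, page_title).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    "page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT)"
)

# The four rows as they appeared on the replica in this report.
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [
        (976991, 0, "Orchard,_Singapore"),
        (50274777, 0, "Orchard,_Singapore"),
        (1272531, 1, "Orchard,_Singapore"),
        (50274778, 1, "Orchard,_Singapore"),
    ],
)

# Any (namespace, title) pair held by more than one page_id would violate
# the production name_title unique index.
DUPLICATE_TITLES_SQL = """
SELECT page_namespace, page_title, COUNT(*) AS copies
FROM page
GROUP BY page_namespace, page_title
HAVING COUNT(*) > 1
ORDER BY page_namespace, page_title
"""


def find_duplicate_titles(connection):
    """Return (namespace, title, copies) for every duplicated title."""
    return connection.execute(DUPLICATE_TITLES_SQL).fetchall()


duplicates = find_duplicate_titles(conn)
for namespace, title, copies in duplicates:
    print(namespace, title, copies)
```

On the full table this query scans every row, so on a real replica it would probably need to be restricted by namespace or run in batches.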

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. May 31 2016, 4:07 PM
MariaDB [enwiki_p]> SELECT page_id, page_namespace, page_title FROM enwiki_p.page where page_id IN (50274778,1272531,976991,50274777) ORDER by page_namespace;
+----------+----------------+--------------------+
| page_id  | page_namespace | page_title         |
+----------+----------------+--------------------+
|   976991 |              0 | Orchard,_Singapore |
| 50274777 |              0 | Orchard,_Singapore |
|  1272531 |              1 | Orchard,_Singapore |
| 50274778 |              1 | Orchard,_Singapore |
+----------+----------------+--------------------+
4 rows in set (0.01 sec)

Yowza.

jcrespo added a subscriber: jcrespo. Jun 6 2016, 8:31 AM

After seeing many cases like this, I can conclude that replication to labs breaks whenever there is a page move, an archival or an undeletion. It is not yet clear to me why, but given that it only happens on labs, it is probably due either to running unsafe statements that break under filtering, or to the filtering itself.

In the first case, that could be solved by switching to row-based replication (which is planned, but will not happen immediately because it involves production and is not easy). In the second case, it may be even more difficult, or logically impossible: if data has to be sanitized because it contains private data, and at the same time queries are executed against that sanitized data, the replica will drift sooner or later.

I think the most immediate way to solve this would be to finish the resync, hope that the final state is 99% consistent, and then set up a process that monitors and fixes the differences until the root cause is analyzed and fixed (if that is possible).

Of course, I can fix individual reports, although we should set up a more convenient way than one ticket per row with problems.
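The monitoring process mentioned above could start from a plain row-by-row comparison keyed on page_id. This is only a sketch of the idea, not the tooling that was actually used: the function name diff_page_rows is hypothetical, the rows are assumed to have already been fetched from both servers into dictionaries, and the sample data is the four rows from this report (the replica as first reported, production as it should look after the page move).

```python
def diff_page_rows(production, replica):
    """Compare two {page_id: (namespace, title)} mappings.

    Returns {page_id: (production_row, replica_row)} for every page_id whose
    rows differ. A row missing on one side is reported as None on that side.
    """
    drift = {}
    for page_id in sorted(set(production) | set(replica)):
        expected = production.get(page_id)
        actual = replica.get(page_id)
        if expected != actual:
            drift[page_id] = (expected, actual)
    return drift


# Expected state after the page move, per this report.
production = {
    976991: (0, "Orchard_Road"),
    50274777: (0, "Orchard,_Singapore"),
    1272531: (1, "Orchard_Road"),
    50274778: (1, "Orchard,_Singapore"),
}

# State of the labs replica when the problem was reported.
replica = {
    976991: (0, "Orchard,_Singapore"),
    50274777: (0, "Orchard,_Singapore"),
    1272531: (1, "Orchard,_Singapore"),
    50274778: (1, "Orchard,_Singapore"),
}

drift = diff_page_rows(production, replica)
for page_id, (expected, actual) in drift.items():
    print(page_id, "expected", expected, "found", actual)
```

A real process would have to compare in chunks of page_id ranges and tolerate replication lag, since rows that changed in the last few seconds will legitimately differ.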

jcrespo moved this task from Triage to Backlog on the DBA board. Jun 6 2016, 9:21 AM
chasemp triaged this task as High priority. Jun 21 2016, 1:39 PM
jcrespo closed this task as Resolved. Mar 3 2017, 10:16 AM
jcrespo claimed this task.

Fixed:

labsdb1001[enwiki]> SELECT page_id, page_namespace, page_title FROM page where page_id IN (50274778,1272531,976991,50274777) ORDER by page_namespace;
+----------+----------------+--------------------+
| page_id  | page_namespace | page_title         |
+----------+----------------+--------------------+
|   976991 |              0 | Orchard_Road       |
| 50274777 |              0 | Orchard,_Singapore |
|  1272531 |              1 | Orchard_Road       |
| 50274778 |              1 | Orchard,_Singapore |
+----------+----------------+--------------------+
4 rows in set (0.01 sec)

Of course this was already right on the new, reimported servers:

root@labsdb1009[enwiki]> SELECT page_id, page_namespace, page_title FROM page where page_id IN (50274778,1272531,976991,50274777) ORDER by page_namespace;
+----------+----------------+--------------------+
| page_id  | page_namespace | page_title         |
+----------+----------------+--------------------+
|   976991 |              0 | Orchard_Road       |
| 50274777 |              0 | Orchard,_Singapore |
|  1272531 |              1 | Orchard_Road       |
| 50274778 |              1 | Orchard,_Singapore |
+----------+----------------+--------------------+
4 rows in set (0.00 sec)