
Labs database replica drift
Closed, Resolved · Public

Description

https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database/Replica_drift

2017-10-17: Replica drift was a recurring problem for the Wiki Replicas prior to the introduction of row-based replication (RBR) between the sanitarium server(s) and their upstream sources. The RBR replication used to populate the *.{analytics,web}.db.svc.eqiad.wmflabs servers will not allow arbitrary differences in data to be synchronized. If there is a replication failure, all replication from the master server halts, which in turn raises an alert so that the problem is noticed and corrected.
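
For context, "row-based replication" refers to the binary log format used by the replication stream; a minimal illustrative snippet of the relevant MariaDB setting follows (illustrative only, not the actual production configuration):

-- RBR means the binary log records row images rather than SQL statements.
-- Checking and (hypothetically) setting the format on a MariaDB server:
SHOW VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';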

Please report database drift/inconsistencies as a comment on this task.

Please mention:

  • the SQL query, preferably in a format that is easy to compare between production and labs (count(*), md5sum, ...); see the sketch after this list
  • the actual result
  • the expected result (e.g. based on the live wiki, or a rough indication of why the query is wrong)
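
A report in that count(*) style might look like the following sketch (the table and filter below are only illustrative, not taken from a real report):

-- run the identical aggregate on the Labs replica and on production (e.g. via Quarry or a known-good figure)
SELECT COUNT(*) FROM page WHERE page_namespace = 0;
-- then paste both numbers: the actual result (labs) and the expected result (production)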

Related tasks:
T74413: Different results with queries in labs versus production
T106470: Tool Labs enwiki_p replicated database missing rows
T115207: Lots of rows are missing from enwiki_p.`revision`
T115517: Data missing from June 11/12 on s3.labsdb
T118095: Missing rows in revision table of enwiki.labsdb (data integrity issue)
T126946: disk failure on labsdb1002
T129432: Lost database changes on s2 for 3 hours on labs replicas
T133469: Discrepancy between labsdb replicas of arwiki_p.user_groups
T133715: Missing data on labs replica database
T134203: enwiki_p replica on s1 is corrupted
T136618: Wrong page title in labs database replica enwiki page table
T138927: nlwiki Labs replica with problems
T153058: LabsDB infrastructure pending work

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden.

May I ask you to please copy the query here to check the drift? We keep the older subtasks only because they predate this one, but following a single format will help speed up the resolution of all issues (it would be almost impossible to handle one ticket per affected row). Thank you in advance; I am just trying to respond to all issues quickly!

MariaDB [enwiki_p]> select page_title from page join pagelinks on pl_from = page_id where page_namespace=0 and pl_namespace = 0 and pl_title="Will_Johnson";
+----------------------------------+
| page_title                       |
+----------------------------------+
| Denver_Broncos                   |
| List_of_current_AFC_team_rosters |
| Will_Johnson_(disambiguation)    |
| William_Johnson_(rugby_player)   |
| Will_Johnson_(American_football) |
| Johnson,_Will                    |
| 2016_Denver_Broncos_season       |
+----------------------------------+
7 rows in set (0.01 sec)

The comparable query to the enwiki API yields only 4 incoming links.

https://en.wikipedia.org/w/api.php?action=query&list=backlinks&bltitle=Will_Johnson&blnamespace=0&format=jsonfm

@russblau Thanks for the report; it is 5 as we speak, but it is indeed wrong. This, and hopefully all drift issues, should be fixed by the reimports onto the new labsdb servers (T147052), which I hope will be available soon.

https://quarry.wmflabs.org/query/5979

As of 30 November 2016, this is showing 45 rows that the query says are 'orphaned', but when checked on English Wikipedia they certainly aren't.

I was asked to mention this here, although I'm unsure if it's actually a 'replication' issue.

https://quarry.wmflabs.org/query/14916 reports 35 entries in the transcode table, all for files that were uploaded to Commons in 2013, and then renamed while still being transcoded. This created 'orphaned' entries in the transcode table.

AFAICT, the bug that caused this was fixed long ago, as the examples here are all from 2013. Before the recent cleanup of old failed transcodes, there were many other entries here, all related to files that had been deleted while being transcoded. These were easily fixed a couple of months ago by simply undeleting and re-deleting the file. Those entries were also all from years ago... I suspect they were relics of bugs that have since been fixed, left behind in the database.

There seems to be no way from the 'admin' end to fix these entries, as they are for redirects, not videos.... they could probably usefully simply be deleted from the DB, and then we can watch to see if it recurs.

The particular search is relevant, however, as on both recent occasions when the video scaler system was reset (last month, and earlier this month) all 'running' transcodes showed up in it... they were shown as 'completed' on the file pages, but there were reports (such as T154186) that supposedly successfully transcoded videos would not play. Requeueing the videos, so that they were transcoded again, seemed to fix the issue.

Resetting a transcode, while it is running, also results in it ending up in this search.

This also seems to be a bug, but a somewhat useful one.

@Revent: As the production database servers return the same result, this has nothing to do with labs replication.

@Revent then I misunderstood you; we should file a proper, standalone bug.

I have found six deleted eswiki articles which still appear in the page table in Labs, located using this trial-and-error query after one of them was detected in my Labs tool:

MariaDB [eswiki_p]> select page_title from page where page_is_new = 1 and page_touched like '2015061%' and page_namespace = 0 and page_is_redirect = 0 and page_touched = page_links_updated;
+---------------------------------------------------------------------+
| page_title                                                          |
+---------------------------------------------------------------------+
| Cristian_Manea                                                      |
| Morgawr                                                             |
| XFL_Airabonita                                                      |
| Lego_Battles:_Ninjago                                               |
| FIFA_16.                                                            |
| Yessica_Puín_nació_en_1999/07/16_En_la_Ciudad_de_Bogotá-Colombia    |
+---------------------------------------------------------------------+
6 rows in set (9.70 sec)

Fixed:

root@labsdb1003[eswiki_p]> select page_title from page where page_is_new = 1 and page_touched like '2015061%' and page_namespace = 0 and page_is_redirect = 0 and page_touched = page_links_updated;
Empty set (9.90 sec)

root@labsdb1001[eswiki_p]> select page_title from page where page_is_new = 1 and page_touched like '2015061%' and page_namespace = 0 and page_is_redirect = 0 and page_touched = page_links_updated;
Empty set (10.43 sec)

There seem to be several pages on English Wikipedia which have been deleted but still appear on the Labs and Analytics Store replicas in the page table. For example:

MariaDB [enwiki_p]> select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever';
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+
| page_id  | page_namespace | page_title    | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model |
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+
| 49995280 |              0 | BatissForever |                   |            0 |                0 |           0 | 0.638177166992 | 20160331025325 | 20160330185410     |   712718594 |      212 | wikitext           |
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+

"several" is too vague- I have fixed the one given:

root@neodymium:~$ ./sql.py -h labsdb1001.eqiad.wmnet enwiki -e "select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever'" --no-dry-run

Results for labsdb1001.eqiad.wmnet:3306/enwiki:
0 rows in set (0.00 sec)
root@neodymium:~$ ./sql.py -h labsdb1003.eqiad.wmnet enwiki -e "select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever'" --no-dry-run

Results for labsdb1003.eqiad.wmnet:3306/enwiki:
0 rows in set (0.00 sec)

Here are a few more such deleted pages that I happen to know of:

Labs:

MariaDB [enwiki_p]> select count(*) from page where page_namespace in (10,11) and ( page_title like 'Cite_doi%' OR page_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 1215
1 row in set (0.04 sec)

MariaDB [enwiki_p]> select count(*) from redirect where rd_namespace in (10,11) and ( rd_title like 'Cite_doi%' OR rd_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 149
1 row in set (0.00 sec)

MariaDB [enwiki_p]> select count(*) from pagelinks where pl_namespace in (10,11) and ( pl_title like 'Cite_doi%' OR pl_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 7509
1 row in set (0.96 sec)

Production:

mysql:wikiadmin@db1083 [enwiki]> select count(*) from page where page_namespace in (10,11) and ( page_title like 'Cite_doi%' OR page_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 15
1 row in set (0.01 sec)

mysql:wikiadmin@db1083 [enwiki]> select count(*) from redirect where rd_namespace in (10,11) and ( rd_title like 'Cite_doi%' OR rd_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2
1 row in set (0.00 sec)

mysql:wikiadmin@db1083 [enwiki]> select count(*) from pagelinks where pl_namespace in (10,11) and ( pl_title like 'Cite_doi%' OR pl_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2205
1 row in set (0.03 sec)

To clarify, the only reliable solution is to start using the new servers already (labsdb-web.eqiad.wmnet and labsdb-analytics.eqiad.wmnet). See:

root@labsdb1009[enwiki_p]> select count(*) from page where page_namespace in (10,11) and ( page_title like 'Cite_doi%' OR page_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 15
1 row in set (0.01 sec)

root@labsdb1009[enwiki_p]> select count(*) from redirect where rd_namespace in (10,11) and ( rd_title like 'Cite_doi%' OR rd_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2
1 row in set (0.00 sec)

root@labsdb1009[enwiki_p]> select count(*) from pagelinks where pl_namespace in (10,11) and ( pl_title like 'Cite_doi%' OR pl_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2205
1 row in set (0.04 sec)

The fixes I will be doing to reconcile the existing labsdb1001 and 1003 will only be poor hacks that will break again shortly.

Is there a task tracking the updating of the DNS entries such as enwiki.labsdb to point to the new servers?

No, we are not ready because not all wikis are available yet, but you can follow the meta ticket at T153058.

We will finish the reimports first, then announce it as an "opt-in" beta, and later change the DNS defaults.

Labs now:

root@labsdb1001[enwiki_p]> select count(*) from page where page_namespace in (10,11) and ( page_title like 'Cite_doi%' OR page_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 15
1 row in set (0.75 sec)

root@labsdb1001[enwiki_p]> select count(*) from redirect where rd_namespace in (10,11) and ( rd_title like 'Cite_doi%' OR rd_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2
1 row in set (0.30 sec)

root@labsdb1001[enwiki_p]> select count(*) from pagelinks where pl_namespace in (10,11) and ( pl_title like 'Cite_doi%' OR pl_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2205
1 row in set (4.30 sec)

Thank you for the reports; keep reporting important differences as comments here. Just remember the goal is to get rid of the unsafe current labs hosts so we do not have to keep doing this :-/.

@jcrespo: I don't have any way to identify all the extra pages on Tool Labs. All I can say is that there are 1032 extra there:

Tool Labs:

MariaDB [enwiki_p]> select count(*) from page;
+----------+
| count(*) |
+----------+
| 42266366 |
+----------+

Production:

mysql:wikiadmin@db1083 [enwiki]> select count(*) from page;
+----------+
| count(*) |
+----------+
| 42265334 |
+----------+

Is there not any way to sync them? This is causing problems for us, but we can work around it if we have to.
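
One way to narrow down where the extra rows live, short of listing every page, is to bucket the counts by page_id range and compare the buckets between Labs and production; this is only a sketch of the idea, not an agreed procedure:

-- per-million-id buckets; any bucket whose count differs between Labs and production
-- contains the extra (or missing) rows and can be drilled into further
SELECT page_id DIV 1000000 AS bucket, COUNT(*) AS pages
FROM page
GROUP BY bucket
ORDER BY bucket;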

Maybe what we can try is to stop sanitarium2 and sanitarium at the same replication position and import the page table (it is around 7.3G).

@Marostegui - you can do that if you want, but a) it will cause more complaints about missing pages and the lag, and b) it will break again in only a few days (all tables were reimported for enwiki in a long process that took 4 months, and by the end of it the data had already drifted again).

If Kaldari wants a reliable replica, he should start using labsdb-web.eqiad.wmnet or labsdb-analytics.eqiad.wmnet now. The current labsdb hosts have no fix because MediaWiki does not support labs correctly, and we have to work around that with the new replication method.

@jcrespo: Is there (or will there be) a way to access and create user databases on the new servers (as per https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#User_databases)?

@russblau No, the new servers are in read-only mode. We are unsure whether and how we can support that (because of T156869), so for now all new servers are fully read-only. You can create databases on toolsdb as usual.
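
For reference, creating a user database on toolsdb follows the usual credentialUser__dbname convention; a minimal sketch, assuming the tools.labsdb service name and a hypothetical credential user u1234:

$ mysql --defaults-file=$HOME/replica.my.cnf -h tools.labsdb
MariaDB [(none)]> CREATE DATABASE u1234__example;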

Cross-post from T166091: Labs database corruption:

SELECT pl_namespace, pl_title
FROM page
JOIN pagelinks ON pl_from = page_id
WHERE page_namespace=0 AND page_title="Ajinkya_Rahane"
AND pl_title LIKE "Ankit%";
+--------------+--------------+
| pl_namespace | pl_title     |
+--------------+--------------+
|            0 | Ankit_Sharma |
+--------------+--------------+

From a production database server:

mysql:wikiadmin@db1083 [enwiki]> SELECT pl_namespace, pl_title
    -> FROM page
    -> JOIN pagelinks ON pl_from = page_id
    -> WHERE page_namespace=0 AND page_title="Ajinkya_Rahane"
    -> AND pl_title LIKE "Ankit%";
+--------------+--------------------------+
| pl_namespace | pl_title                 |
+--------------+--------------------------+
|            0 | Ankit_Sharma_(cricketer) |
+--------------+--------------------------+
1 row in set (0.00 sec)

Copied from T167554: Missing plwikisource_p.revision and plwikisource_p.page data on labs replica database by @Ankry:

As this query shows (https://quarry.wmflabs.org/query/19383), revisions from 1147609 to 1147764 are missing from the plwikisource_p.revision table in the labs database.

They do not seem to have been deleted, so some Quarry queries report incomplete revision lists (or incorrect revision counts) for specific pages, e.g.:
https://quarry.wmflabs.org/query/19385
vs.
https://pl.wikisource.org/w/index.php?title=Strona:M._Arcta_s%C5%82ownik_ilustrowany_j%C4%99zyka_polskiego_-_Tom_1.djvu/732&action=history

Can it be fixed?

LabsDB
$ sql plwikisource_p
MariaDB [plwikisource_p]> SELECT rev_id FROM revision WHERE rev_id = 1147693;
Empty set (0.00 sec)

MariaDB [plwikisource_p]> SELECT MAX(rev_id) FROM revision WHERE rev_id < 1147693;
+-------------+
| MAX(rev_id) |
+-------------+
|     1147608 |
+-------------+
1 row in set (0.00 sec)

MariaDB [plwikisource_p]> SELECT MIN(rev_id) FROM revision WHERE rev_id > 1147693;
+-------------+
| MIN(rev_id) |
+-------------+
|     1147765 |
+-------------+
1 row in set (0.00 sec)
prod
$ sql plwikisource
(wikiadmin@db1077) [plwikisource]> SELECT rev_id FROM revision WHERE rev_id = 1147693;
+---------+
| rev_id  |
+---------+
| 1147693 |
+---------+
1 row in set (0.00 sec)

(wikiadmin@db1077) [plwikisource]> SELECT MAX(rev_id) FROM revision WHERE rev_id < 1147693;
+-------------+
| MAX(rev_id) |
+-------------+
|     1147692 |
+-------------+
1 row in set (0.00 sec)

(wikiadmin@db1077) [plwikisource]> SELECT MIN(rev_id) FROM revision WHERE rev_id > 1147693;
+-------------+
| MIN(rev_id) |
+-------------+
|     1147694 |
+-------------+
1 row in set (0.00 sec)

The revisions missing from plwikisource_p were created between 20160428113148 and 20160428122541. Page table entries created in the same period, e.g.
https://pl.wikisource.org/w/index.php?title=Specjalna:Wk%C5%82ad/AkBot&dir=prev&offset=20160428110138&limit=130&target=AkBot
are also missing. (I checked a few at random from the above list; none is found in labs.)
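
A count over the reported range makes the gap easy to compare in the count(*) format requested above (a sketch using the rev_id bounds quoted in the report):

SELECT COUNT(*) FROM revision WHERE rev_id BETWEEN 1147609 AND 1147764;
-- on the Labs replica this returns 0; on production it returns however many
-- revisions were actually created in that range, so the two are directly comparable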

bd808 triaged this task as Medium priority. Jun 13 2017, 7:25 PM
bd808 added a project: Epic.

Cross-post from T166091: Labs database corruption:

SELECT pl_namespace, pl_title
FROM page
JOIN pagelinks ON pl_from = page_id
WHERE page_namespace=0 AND page_title="Ajinkya_Rahane"
AND pl_title LIKE "Ankit%";
+--------------+--------------+
| pl_namespace | pl_title     |
+--------------+--------------+
|            0 | Ankit_Sharma |
+--------------+--------------+

This is still happening on the default and busted replicas:

mzmcbride@tools-bastion-03:~$ sql enwiki_p;
MariaDB [enwiki_p]> SELECT pl_namespace, pl_title FROM page JOIN pagelinks ON pl_from = page_id WHERE page_namespace=0 AND page_title="Ajinkya_Rahane" AND pl_title LIKE "Ankit%";
+--------------+--------------+
| pl_namespace | pl_title     |
+--------------+--------------+
|            0 | Ankit_Sharma |
+--------------+--------------+
1 row in set (0.01 sec)

But when connecting to labsdb-web.eqiad.wmnet, there's better data:

mzmcbride@tools-bastion-03:~$ mysql -hlabsdb-web.eqiad.wmnet enwiki_p;
MariaDB [enwiki_p]> SELECT pl_namespace, pl_title FROM page JOIN pagelinks ON pl_from = page_id WHERE page_namespace=0 AND page_title="Ajinkya_Rahane" AND pl_title LIKE "Ankit%";
+--------------+--------------------------+
| pl_namespace | pl_title                 |
+--------------+--------------------------+
|            0 | Ankit_Sharma_(cricketer) |
+--------------+--------------------------+
1 row in set (0.00 sec)

I updated this task's description a bit.

From arwiki: File:الله عز وجل.png. Deleted in Nov 2015.

$ sql arwiki_p -e 'SELECT 1 AS "Exists" FROM arwiki_p.image WHERE img_name="الله_عز_وجل.png";'
Exists
1
$ sql enwiki_p -e 'SELECT 1 AS "Exists" FROM arwiki_p.image WHERE img_name="الله_عز_وجل.png";'
$ # No results

SELECT COUNT(rev_id) FROM enwiki_p.revision WHERE rev_page = 48357647

returns 4 when it should be 0, as the page was deleted on May 4, 2017. I fixed this by just restoring it and deleting it again, but thought I'd still let you know. Apparently there are a number of such Draft pages that were deleted around May 4 that are still in the revision table.
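
A general way to look for more such leftovers is to search for revision rows whose page no longer exists; this is only a sketch (it is slow on enwiki and can also catch rows that are merely in flight):

-- revisions pointing at a page_id that is absent from the page table
SELECT rev_page, COUNT(*) AS orphaned_revs
FROM revision
LEFT JOIN page ON page_id = rev_page
WHERE page_id IS NULL
GROUP BY rev_page
LIMIT 20;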

From arwiki: File:الله عز وجل.png. Deleted in Nov 2015.

$ sql arwiki_p -e 'SELECT 1 AS "Exists" FROM arwiki_p.image WHERE img_name="الله_عز_وجل.png";'
Exists
1
$ sql enwiki_p -e 'SELECT 1 AS "Exists" FROM arwiki_p.image WHERE img_name="الله_عز_وجل.png";'
$ # No results

Using the new wiki replicas:

$ mysql --defaults-file=$HOME/replica.my.cnf -h wikireplica-web.eqiad.wmnet
(u3518@wikireplica-web.eqiad.wmnet) [(none)]> use arwiki_p;
Database changed

(u3518@wikireplica-web.eqiad.wmnet) [arwiki_p]> SELECT 1 AS "Exists" FROM arwiki_p.image WHERE img_name="الله_عز_وجل.png";
Empty set (0.00 sec)

(u3518@wikireplica-web.eqiad.wmnet) [arwiki_p]> use enwiki_p;
Database changed

(u3518@wikireplica-web.eqiad.wmnet) [enwiki_p]> SELECT 1 AS "Exists" FROM arwiki_p.image WHERE img_name="الله_عز_وجل.png";
Empty set (0.00 sec)

The new replica servers are up and running and fully populated with data for all of the wikis. The last thing that is really blocking their announcement to the Toolforge and Cloud VPS communities is T174860: Define naming scheme for connecting to new wiki replica cluster. wikireplica-web.eqiad.wmnet can be used for now as shown in my test queries, but that service name will not be maintained long term.

I have found an apparent replication issue on one of the new servers:

tools.dplbot@tools-bastion-03:~$ mysql -h enwiki.analytics.db.svc.eqiad.wmflabs enwiki_p
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 4455943
Server version: 10.1.28-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [enwiki_p]> select page_title, cl_to from page, categorylinks where page_id = cl_from and cl_from=335652;
+------------+--------------------------------+
| page_title | cl_to                          |
+------------+--------------------------------+
| CU         | All_disambiguation_pages       |
| CU         | Wikipedia_disambiguation_pages |
+------------+--------------------------------+
2 rows in set (0.01 sec)

By comparison, the live wiki page [[CU]] at https://en.wikipedia.org/wiki/CU shows links to three categories: "All_disambiguation_pages", "All_article_disambiguation_pages", and "Disambiguation_pages".

root@labsdb1009[enwiki]> select page_title, cl_to from page, categorylinks where page_id = cl_from and cl_from=335652;
+------------+--------------------------------+
| page_title | cl_to                          |
+------------+--------------------------------+
| CU         | All_disambiguation_pages       |
| CU         | Wikipedia_disambiguation_pages |
+------------+--------------------------------+
2 rows in set (0.00 sec)

(db1052 is the enwiki production master)

root@db1052[enwiki]> select page_title, cl_to from page, categorylinks where page_id = cl_from and cl_from=335652;
+------------+--------------------------------+
| page_title | cl_to                          |
+------------+--------------------------------+
| CU         | All_disambiguation_pages       |
| CU         | Wikipedia_disambiguation_pages |
+------------+--------------------------------+
2 rows in set (0.01 sec)

Either there is a cache issue (reload your browser to clear its cache), or there is a production problem, but it is not replica drift.

On a core/production server it has the same result:

root@PRODUCTION s1 slave[enwiki]>  select page_title, cl_to from page, categorylinks where page_id = cl_from and cl_from=335652;
+------------+--------------------------------+
| page_title | cl_to                          |
+------------+--------------------------------+
| CU         | All_disambiguation_pages       |
| CU         | Wikipedia_disambiguation_pages |
+------------+--------------------------------+
2 rows in set (0.00 sec)

And the same on the enwiki master:

root@PRODUCTION s1 master[enwiki]> select page_title, cl_to from page, categorylinks where page_id = cl_from and cl_from=335652;
+------------+--------------------------------+
| page_title | cl_to                          |
+------------+--------------------------------+
| CU         | All_disambiguation_pages       |
| CU         | Wikipedia_disambiguation_pages |
+------------+--------------------------------+
2 rows in set (0.00 sec)

So the labs replicas contain the same data as the core servers.

User error :-) please disregard.

We should update https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database/Replica_drift and close this ticket, as it is unlikely that there will be drift again (due to ROW + GTID), unless something very very bad happens. Drift on the old servers will not be corrected, as it was not possible to do the fixes in place.

bd808 assigned this task to jcrespo.

The main description here and https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database/Replica_drift have been updated to reflect our confidence that RBR replication and other changes made for the new Wiki Replica cluster will prevent drift from occurring (or at least from going unnoticed until reported by users).

I am closing this as resolved rather than declined because there was a large amount of work done by @jcrespo and @Marostegui to address the underlying problems rather than just continuing to make point-in-time fixes as new problems were reported. On behalf of all of the users of the Wiki Replica databases I would like to thank them yet again for the effort that they put into this project.