Labs database replica drift
Open, Needs Triage · Public

Restricted Application added subscribers: Zppix, Aklapper. · Jun 29 2016, 5:22 PM
Anomie added a subscriber: Anomie. · Jun 29 2016, 6:23 PM

May I ask you to please paste the query here so the drift can be checked? We keep the older subtasks separate only because they predate this one; following a single format will help speed up the resolution of all issues (it would be almost impossible to handle one ticket per drifted row). Thank you in advance; I am just trying to respond quickly to all issues!

russblau added a subscriber: russblau. (Edited) · Nov 5 2016, 2:11 PM
MariaDB [enwiki_p]> select page_title from page join pagelinks on pl_from = page_id where page_namespace=0 and pl_namespace = 0 and pl_title="Will_Johnson";
+----------------------------------+
| page_title                       |
+----------------------------------+
| Denver_Broncos                   |
| List_of_current_AFC_team_rosters |
| Will_Johnson_(disambiguation)    |
| William_Johnson_(rugby_player)   |
| Will_Johnson_(American_football) |
| Johnson,_Will                    |
| 2016_Denver_Broncos_season       |
+----------------------------------+
7 rows in set (0.01 sec)

The comparable query to the enwiki API yields only 4 incoming links.

https://en.wikipedia.org/w/api.php?action=query&list=backlinks&bltitle=Will_Johnson&blnamespace=0&format=jsonfm

@russblau Thanks for the report; it is 5 as we speak, but it is indeed wrong. This, and hopefully all drift issues, will be fixed by the imports onto the new labsdb servers (T147052), which I hope will be available soon.

jcrespo moved this task from Triage to Meta/Epic on the DBA board. · Nov 10 2016, 1:16 PM

https://quarry.wmflabs.org/query/5979

As of 30 November 2016, this query shows 45 rows that it reports as 'orphaned', but which, when checked on English Wikipedia, certainly aren't.
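
The SQL itself is not pasted above; for readers without access to the Quarry link, a minimal sketch of an "orphaned article" query of this general shape (an assumption about what query 5979 checks, not a copy of it):

-- sketch only; assumes "orphaned" means a mainspace, non-redirect page
-- with no incoming links from other pages
SELECT page_title
FROM page
LEFT JOIN pagelinks
       ON pl_namespace = page_namespace
      AND pl_title = page_title
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND pl_from IS NULL;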

Revent added a subscriber: Revent. · Jan 25 2017, 1:54 AM

I was asked to mention this here, although I'm unsure if it's actually a 'replication' issue.

https://quarry.wmflabs.org/query/14916 reports 35 entries in the transcode table, all for files that were uploaded to Commons in 2013, and then renamed while still being transcoded. This created 'orphaned' entries in the transcode table.
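
For reference, a hedged sketch of the kind of check that query could be doing (the actual SQL sits behind the Quarry link; the join below assumes the TimedMediaHandler transcode table's transcode_image_name column matches page_title in the File namespace):

-- sketch, not the actual Quarry 14916 SQL: transcode rows whose File: page
-- is now a redirect, i.e. the file was renamed mid-transcode and the row
-- was left pointing at the old name
SELECT t.transcode_image_name, t.transcode_key
FROM transcode t
JOIN page p
  ON p.page_namespace = 6
 AND p.page_title = t.transcode_image_name
WHERE p.page_is_redirect = 1;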

AFAICT, the bug that caused this was fixed long ago, as the examples here are all from 2013. Before the recent cleanup of old failed transcodes, there were many other entries here, all related to files that had been deleted while being transcoded. Those were easily fixed a couple of months ago by simply undeleting and redeleting the file, and they too were all from years back; I suspect these rows are relics of bugs that have since been fixed.

There seems to be no way to fix these entries from the 'admin' end, as they are for redirects, not videos. They could probably just be deleted from the DB, and then we can watch to see if the problem recurs.

The particular search is still relevant, however: both times recently that the video scaler system was reset (last month, and earlier this month), all 'running' transcodes ended up in it. They were shown as 'completed' on the file pages, but there were reports (such as T154186) that supposedly successfully transcoded videos would not play. Requeueing the videos, so that they were transcoded again, seemed to fix the issue.

Resetting a transcode while it is running also results in it ending up in this search.

This also seems to be a bug, but a somewhat useful one.

@Revent: As the production database servers return the same result, this has nothing to do with labs replication.

@Revent Then I misunderstood you; we should file a proper, standalone bug.

-jem- added a subscriber: -jem-. · Mon, May 15, 1:27 PM

I have found six deleted eswiki articles that still appear in the page table on Labs, located with this trial-and-error query after one of them turned up in my Labs tool:

MariaDB [eswiki_p]> select page_title from page where page_is_new = 1 and page_touched like '2015061%' and page_namespace = 0 and page_is_redirect = 0 and page_touched = page_links_updated;
+---------------------------------------------------------------------+
| page_title                                                          |
+---------------------------------------------------------------------+
| Cristian_Manea                                                      |
| Morgawr                                                             |
| XFL_Airabonita                                                      |
| Lego_Battles:_Ninjago                                               |
| FIFA_16.                                                            |
| Yessica_Puín_nació_en_1999/07/16_En_la_Ciudad_de_Bogotá-Colombia    |
+---------------------------------------------------------------------+
6 rows in set (9.70 sec)

Fixed:

root@labsdb1003[eswiki_p]> select page_title from page where page_is_new = 1 and page_touched like '2015061%' and page_namespace = 0 and page_is_redirect = 0 and page_touched = page_links_updated;
Empty set (9.90 sec)

root@labsdb1001[eswiki_p]> select page_title from page where page_is_new = 1 and page_touched like '2015061%' and page_namespace = 0 and page_is_redirect = 0 and page_touched = page_links_updated;
Empty set (10.43 sec)

There seem to be several pages on English Wikipedia which have been deleted but still appear on the Labs and Analytics Store replicas in the page table. For example:

MariaDB [enwiki_p]> select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever';
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+
| page_id  | page_namespace | page_title    | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model |
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+
| 49995280 |              0 | BatissForever |                   |            0 |                0 |           0 | 0.638177166992 | 20160331025325 | 20160330185410     |   712718594 |      212 | wikitext           |
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+

"several" is too vague- I have fixed the one given:

root@neodymium:~$ ./sql.py -h labsdb1001.eqiad.wmnet enwiki -e "select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever'" --no-dry-run

Results for labsdb1001.eqiad.wmnet:3306/enwiki:
0 rows in set (0.00 sec)
root@neodymium:~$ ./sql.py -h labsdb1003.eqiad.wmnet enwiki -e "select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever'" --no-dry-run

Results for labsdb1003.eqiad.wmnet:3306/enwiki:
0 rows in set (0.00 sec)
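
The fix itself is not shown in the paste above; a hypothetical reconstruction of the statement that would remove the drifted row (page_id taken from the earlier output) would look something like:

-- hypothetical; the actual statement was run via sql.py with --no-dry-run
DELETE FROM page
WHERE page_id = 49995280
  AND page_title = 'BatissForever';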

Here are a few more such deleted pages that I happen to know of:

Labs:

MariaDB [enwiki_p]> select count(*) from page where page_namespace in (10,11) and ( page_title like 'Cite_doi%' OR page_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 1215
1 row in set (0.04 sec)

MariaDB [enwiki_p]> select count(*) from redirect where rd_namespace in (10,11) and ( rd_title like 'Cite_doi%' OR rd_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 149
1 row in set (0.00 sec)

MariaDB [enwiki_p]> select count(*) from pagelinks where pl_namespace in (10,11) and ( pl_title like 'Cite_doi%' OR pl_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 7509
1 row in set (0.96 sec)

Production:

mysql:wikiadmin@db1083 [enwiki]> select count(*) from page where page_namespace in (10,11) and ( page_title like 'Cite_doi%' OR page_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 15
1 row in set (0.01 sec)

mysql:wikiadmin@db1083 [enwiki]> select count(*) from redirect where rd_namespace in (10,11) and ( rd_title like 'Cite_doi%' OR rd_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2
1 row in set (0.00 sec)

mysql:wikiadmin@db1083 [enwiki]> select count(*) from pagelinks where pl_namespace in (10,11) and ( pl_title like 'Cite_doi%' OR pl_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2205
1 row in set (0.03 sec)
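
To enumerate the drifted rows rather than just count them, one option (a sketch, not necessarily the procedure used here) is to run the same SELECT on both hosts and diff the sorted output; page_id is stable across replicas, so it makes a good sort key:

-- run on both the labs replica and production, then diff the results
SELECT page_id, page_namespace, page_title
FROM page
WHERE page_namespace IN (10,11)
  AND (page_title LIKE 'Cite_doi%' OR page_title LIKE 'Cite_pmid%')
ORDER BY page_id;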

To clarify, the only reliable solution is to start using the new servers already (labsdb-web.eqiad.wmnet and labsdb-analytics.eqiad.wmnet). See:

root@labsdb1009[enwiki_p]> select count(*) from page where page_namespace in (10,11) and ( page_title like 'Cite_doi%' OR page_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 15
1 row in set (0.01 sec)

root@labsdb1009[enwiki_p]> select count(*) from redirect where rd_namespace in (10,11) and ( rd_title like 'Cite_doi%' OR rd_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2
1 row in set (0.00 sec)

root@labsdb1009[enwiki_p]> select count(*) from pagelinks where pl_namespace in (10,11) and ( pl_title like 'Cite_doi%' OR pl_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2205
1 row in set (0.04 sec)

The fixes I will be doing to reconcile the existing labsdb1001 and labsdb1003 are only poor hacks that will break again shortly.

Is there a task tracking the update of the DNS entries such as enwiki.labsdb to point to the new servers?

No, we are not ready yet because not all wikis are available, but you can follow the meta ticket: T153058

We will finish the reimports first, then announce it as an opt-in beta, and later change the DNS defaults.

Labs now:

root@labsdb1001[enwiki_p]> select count(*) from page where page_namespace in (10,11) and ( page_title like 'Cite_doi%' OR page_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 15
1 row in set (0.75 sec)

root@labsdb1001[enwiki_p]> select count(*) from redirect where rd_namespace in (10,11) and ( rd_title like 'Cite_doi%' OR rd_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2
1 row in set (0.30 sec)

root@labsdb1001[enwiki_p]> select count(*) from pagelinks where pl_namespace in (10,11) and ( pl_title like 'Cite_doi%' OR pl_title like 'Cite_pmid%' )\G
*************************** 1. row ***************************
count(*): 2205
1 row in set (4.30 sec)

Thank you for the reports; keep reporting important differences as comments here. Just remember that the goal is to get rid of the unsafe current labs hosts so we do not have to keep doing this :-/

@jcrespo: I don't have any way to identify all the extra pages on Tool Labs. All I can say is that there are 1032 extra rows there:

Tool Labs:

MariaDB [enwiki_p]> select count(*) from page;
+----------+
| count(*) |
+----------+
| 42266366 |
+----------+

Production:

mysql:wikiadmin@db1083 [enwiki]> select count(*) from page;
+----------+
| count(*) |
+----------+
| 42265334 |
+----------+
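
Without a row-level list to compare, one way to localize the 1032 extras is to compare bucketed counts between the two hosts and then drill into the buckets that disagree; a sketch, not an official DBA procedure:

-- run on both hosts; buckets whose counts differ contain the drifted rows
SELECT page_id DIV 1000000 AS bucket, COUNT(*) AS pages
FROM page
GROUP BY bucket
ORDER BY bucket;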

Is there no way to sync them? This is causing problems for us, but we can work around it if we have to.

Gorthian removed a subscriber: Gorthian. · Wed, May 24, 7:44 PM
Tbayer added a subscriber: Tbayer. · Wed, May 24, 9:49 PM