Page MenuHomePhabricator

Table field templatelink.tl_from in database does not always match page.page_namespace
Open, Needs TriagePublic

Description

In my understanding of the database tables, the pair tl_from + tl_from_namespace shall always match the pair page_id + page_namespace, but it does not!

See https://quarry.wmflabs.org/query/34007

in dewiki_p there are (currently) 122 links with different pairs, (in enwiki_p the quarry take too long), in frwiki_p there are 80 such pairs, in eswiki_p is none, in itwiki_p there are 25 – did not check more

In dewiki I changed the page Thea_Schmidt-Keune and purged (with ?action=purge) Wolfgang_Häntsch, but this did not help (both can be seen on page 2 of the quarry output). In both pages, the template "Personendaten" has tl_from_namespace = 2 (which should be 0) and page_namespace = 0 (which is correct).

Effects which can be seen by the user: At the very end of this list, you see the following two entries:
https://de.wikipedia.org/w/index.php?title=Spezial:Linkliste/Vorlage:INSEE-ISO_3166-2&namespace=2&limit=500

Saint-Julien-sur-Reyssouze (Vorlageneinbindung) ‎ (← Links | bearbeiten)
Saint-Martin-de-Bavel (Vorlageneinbindung) ‎ (← Links | bearbeiten)

Both pages are in namespace 0, but are presented to the user in namespace 2

Effects for bots and scripts: Queries need to join the table page to retrieve the correct namespace, so queries get more complicated and the database server has more work to do.

Event Timeline

Today (23 days later) there are 128 hits in dewiki_p (6 additional wrong pairs), 78 in frwiki_p (2 fewer), still none in eswiki_p, still 25 in itwiki_p.

The main difference in dewiki_p is https://de.wikipedia.org/wiki/Huntley_%28Illinois%29 which causes the additional hits. This article was created in user namespace as Benutzer:Versionen/Huntley, Illinois than the english versions got imported, then the article was moved to article namespace and finally it was reverted to the last german version. So I would guess that the problem is with reverting to a version which lived in a different namespace.

bd808 added a subscriber: bd808.

This does not seem to be specific to the Wiki Replicas. It looks like this is some data corruption/drift in the production database tables:

wikiadmin@10.64.16.33(dewiki)> SELECT page_id, page_title, page_namespace, tl_title, tl_from, tl_from_namespace FROM templatelinks JOIN page ON tl_from = page_id WHERE tl_from_namespace = 2 and tl_title = 'Boolandnot' and tl_from = 10427306;
+----------+---------------------+----------------+------------+----------+-------------------+
| page_id  | page_title          | page_namespace | tl_title   | tl_from  | tl_from_namespace |
+----------+---------------------+----------------+------------+----------+-------------------+
| 10427306 | Aishwarya_von_Nepal |              0 | Boolandnot | 10427306 |                 2 |
+----------+---------------------+----------------+------------+----------+-------------------+
1 row in set (0.00 sec)

I'm adding the DBA tag here in the hope that one of the DBAs can provide an opinion on how or if this can be corrected.

Marostegui added a subscriber: Marostegui.

This query result seems to be consistent across all the servers that run dewiki:

root@cumin1001:/home/marostegui# ./section s5 | while read host port; do echo "$host:$port"; mysql.py -h $host:$port dewiki -e "SELECT page_id, page_title, page_namespace, tl_title, tl_from, tl_from_namespace FROM templatelinks JOIN page ON tl_from = page_id WHERE tl_from_namespace = 2 and tl_title = 'Boolandnot' and tl_from = 10427306;"; done
labsdb1012.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
labsdb1011.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
labsdb1010.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
labsdb1009.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
dbstore1003.eqiad.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2113.codfw.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2111.codfw.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2099.codfw.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2094.codfw.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2089.codfw.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2084.codfw.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2075.codfw.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2066.codfw.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2059.codfw.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2052.codfw.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db2038.codfw.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1130.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1124.eqiad.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1113.eqiad.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1110.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1102.eqiad.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1100.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1097.eqiad.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1096.eqiad.wmnet:3315
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1082.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2
db1070.eqiad.wmnet:3306
page_id	page_title	page_namespace	tl_title	tl_from	tl_from_namespace
10427306	Aishwarya_von_Nepal	0	Boolandnot	10427306	2

So the data is always the same, whether it is the expected data is a different thing, but at least the data seems to be consistent.
Unfortunately, this needs to be solved by someone with a lot more MW knowledge than the DBAs, and most likely using a maintenance script I would say. I am tagging Wikimedia-Rdbms to see if some core developer can troubleshoot this further.

It is even worse!

SELECT page_id, page_title, page_namespace, pl_title, pl_from, pl_from_namespace FROM pagelinks, page WHERE page_id = pl_from AND pl_from_namespace != page_namespace;

This query returns 208 records, for 15 pages which according to table pagelinks are in namespace 2 (4, 10 or 101), but they are actually in namespace 0.

Just one example:
https://de.wikipedia.org/w/index.php?title=Spezial%3ALinkliste&target=Albens&namespace=2
There you see "Bahnstrecke Aix-les-Bains–Annemasse" which should not show up, when I restrict the view to namespace 2.

Another with namespace 4:
https://de.wikipedia.org/w/index.php?title=Spezial%3ALinkliste&target=Sexpuppe&namespace=4
"Verrückt nach Fixi" is in namespace 0, not in 4!

Next query, with table imagelinks

SELECT page_id, page_title, page_namespace, il_to, il_from, il_from_namespace FROM imagelinks, page WHERE page_id = il_from AND il_from_namespace != page_namespace;
68 records, 9 articles affected.

One example for images:
https://de.wikipedia.org/w/index.php?title=Spezial%3ALinkliste&target=Datei%3ACoat_of_Arms_of_Hungary.svg&namespace=2
"Staatswappen der europäischen Länder" is in namespace 0 and not in namespace 2

I did not look at every article (checked about 10-15), but those I watched have one thing in common: They got moved from one namespace to another.
It does not happen for all articles which got moved, in deWP many articles start up in namespace 2 and when they are ready for the world, they are moved to namespace 0. It happens just for "some". So I expect some timing problem, or a queue which holds some jobs and the queue entry gets replaced by some other action for that article. Maybe a database error which is caught, but not handled fine, or a transaction where a rollback is done when a commit is expected. Something strange.