
Missing rows from categorylinks on production servers (dewiki)
Open, Needs Triage · Public

Description

At least the dewiki database is corrupted on Labs: some expected rows in categorylinks, imagelinks and so on are missing. This problem affects the production wiki, too. The API of de.wikipedia.org returns the same wrong results, so I think database replication is broken there as well.

Example:
http://de.wikipedia.org/w/index.php?title=All_inclusive&oldid=138210488 has four categories (three in source, one included from a template)

http://de.wikipedia.org/w/api.php?action=query&pageids=1106087&prop=categories does not show any category:

"query": {
    "pages": {
        "1106087": {
            "pageid": 1106087,
            "ns": 0,
            "title": "All inclusive"
        }
    }
}

On toollabs:

$ mysql -hs5.labsdb -vvve "select * from categorylinks where cl_from=1106087" dewiki_p
--------------
select * from categorylinks where cl_from=1106087
--------------
Empty set (0.00 sec)

Bye

My bot searches for articles without categories on dewiki, which is how I detected this error. There are currently about fifty articles with missing rows in the dewiki categorylinks table, and the number is still rising. A null edit on the wiki fixes the error for a single article.

Event Timeline

Maniphest changed the visibility from "Public (No Login Required)" to "Custom Policy". · Jan 28 2015, 12:02 AM
Maniphest changed the edit policy from "All Users" to "Custom Policy".
Restricted Application added a subscriber: Aklapper. · Jan 28 2015, 12:02 AM
Merl created this task. · Jan 28 2015, 12:02 AM
Merl triaged this task as High priority.
Merl updated the task description.
Merl changed Security from None to Software security bug.
Merl edited subscribers, added: Merl; removed: Aklapper.
Merl added a subscriber: Springle. · Jan 28 2015, 12:05 AM
Springle added a comment. · Edited · Jan 28 2015, 2:03 AM

I can confirm this applies to all dewiki production boxes.

To be precise (might be just a terminology thing), this is unlikely to relate to database replication because the s5 master is also affected. More likely some bug, or failing job, or failing transaction is not running reliably and/or not being retried if necessary.

Time for some digging.

@Merl, please post some other examples?

Merl added a comment. · Jan 28 2015, 8:28 AM

Because of this error I had to stop my bot last night, so I have no updated list right now. But I can post some old examples:
http://de.wikipedia.org/w/index.php?title=Heinrich_VII._von_Kranlucken&oldid=138229063
http://de.wikipedia.org/w/index.php?title=Caparezza&oldid=138234404
http://de.wikipedia.org/w/index.php?title=Katalogabbruch&oldid=138234568

My bot finds these articles using database queries on Labs. Because of past replication errors on toollabs/dewiki-toolserver, it also does an additional API check before editing an article.
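As an illustration of the kind of API cross-check described above, here is a minimal sketch in Python. It is not the bot's actual code: the function name `has_categories` is hypothetical, and it assumes the default JSON layout of `action=query&prop=categories`, where pages are keyed by page ID and a page without category rows simply has no `categories` key (as in the response quoted in the task description).

```python
import json


def has_categories(api_response: dict, page_id: int) -> bool:
    """Return True if the API response lists at least one category for page_id.

    Hypothetical helper; assumes pages are keyed by the page ID as a string.
    """
    pages = api_response.get("query", {}).get("pages", {})
    page = pages.get(str(page_id), {})
    # A missing or empty "categories" list both count as "no categories".
    return bool(page.get("categories"))


# Response shape copied from the task description: no "categories" key at all.
sample = json.loads(
    '{"query": {"pages": {"1106087": '
    '{"pageid": 1106087, "ns": 0, "title": "All inclusive"}}}}'
)

if __name__ == "__main__":
    print(has_categories(sample, 1106087))
```

A bot could run a check like this on each candidate from the replica query and skip pages where the API disagrees with the replica.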

Afaics labsdbs are in sync with production masters for all these examples. Can't replicate what isn't there.

I'm coming up empty on a cause so far, but am also hugely jetlagged today :-( Need core dev input. @aaron, @Anomie, what should we be seeing?

https://de.wikipedia.org/w/api.php?action=query&pageids=1106087&prop=categories|links|templates|images shows a more interesting story: when the page was parsed for the links update it apparently was affected by whatever bug caused T87645, as evidenced by the fact that the links table output shows links to a title "Kategorie:Beherbergung" in namespace 0.

Is this still occurring, or did it only occur for pages edited/reparsed/etc between around 08:25 UTC and 10:20 UTC on 27 January 2015? If the latter, the best solution would probably be to null-edit them or feed them to api.php action=purge with forcelinkupdate=1.
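For reference, a purge request of the kind suggested above could be assembled roughly like this. This is only a sketch: `build_purge_body` is a hypothetical helper, it only builds the form-encoded body and sends nothing, and it assumes the request is then POSTed to the wiki's `api.php`.

```python
from urllib.parse import urlencode


def build_purge_body(page_ids):
    """Build a form-encoded body for action=purge with forcelinkupdate=1.

    Hypothetical helper: it only assembles parameters; sending the POST
    request to api.php is left to the caller.
    """
    params = {
        "action": "purge",
        # Multiple page IDs are separated by "|" in the action API.
        "pageids": "|".join(str(p) for p in page_ids),
        "forcelinkupdate": 1,
        "format": "json",
    }
    return urlencode(params)


if __name__ == "__main__":
    # Page ID taken from the example in the task description.
    print(build_purge_body([1106087]))
```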

Why is this a private task?

Krenair edited projects, added DBA; removed Wikimedia-Labs-General. · Sep 13 2015, 7:08 PM
Reedy changed the visibility from "Custom Policy" to "Public (No Login Required)".
Reedy changed the edit policy from "Custom Policy" to "All Users".
Restricted Application changed the visibility from "Public (No Login Required)" to "Custom Policy". · Nov 30 2015, 11:09 AM
Restricted Application changed the edit policy from "All Users" to "Custom Policy".
Restricted Application added a project: Security.
Reedy changed the visibility from "Custom Policy" to "Public (No Login Required)".
Reedy changed the edit policy from "Custom Policy" to "All Users".
Restricted Application changed the visibility from "Public (No Login Required)" to "Custom Policy". · Nov 30 2015, 11:09 AM
Restricted Application changed the edit policy from "All Users" to "Custom Policy".
Restricted Application added a project: Security.
Reedy changed the visibility from "Custom Policy" to "Public (No Login Required)".
Reedy changed the edit policy from "Custom Policy" to "All Users".
Reedy changed Security from Software security bug to None.
Reedy added a subscriber: Reedy.

Why is this a private task?

No idea. It's not anymore.

Stupid phab. Change 2 policies, remove security AND an extra combo box

Reedy lowered the priority of this task from High to Low. · Nov 30 2015, 11:16 AM
jcrespo added a subscriber: Peachey88.

@Peachey88 This is not a replication issue, please do not merge the unrelated task.

Low because this is believed to be a temporary glitch, not an ongoing one, and most of the affected pages will already have been fixed automatically (I cannot reproduce the examples now).

jcrespo renamed this task from Database servers having missing rows (on labs and api production servers) to Missing rows from categorylinks on production servers (dewiki). · Nov 30 2015, 11:24 AM
jcrespo moved this task from Triage to Blocked external/Not db team on the DBA board.
jcrespo edited projects, added MediaWiki-General; removed Cloud-Services.

Is something being done to fix this? This is a very annoying bug, and it seems to affect a few files every day. It goes away if you purge the file with forcelinkupdate using the API, or if you edit the page. See for example https://en.wikipedia.org/w/api.php?action=query&pageids=50604305&prop=categories|links|templates|images&format=xml:

<api batchcomplete=""><query><pages><page _idx="50604305" pageid="50604305" ns="6" title="File:Athlone Institute of Technology Logo Circa 2015.png"/></pages></query></api>

No files, no templates, no categories, nothing. But if you check https://en.wikipedia.org/wiki/File:Athlone_Institute_of_Technology_Logo_Circa_2015.png, you can see that the file has two templates, "Non-free use rationale 2" and "Non-free logo" (which in turn call a few other templates), four categories (added by the templates), two images (added by the second template) and one external link.
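A quick way to spot such an "empty" page in the XML output is to check whether the `<page>` element has any child elements at all. The sketch below is an assumption-laden illustration: `page_has_link_data` is a hypothetical name, and it relies on the XML format nesting each requested prop (categories, links, templates, images) as child elements of `<page>`.

```python
import xml.etree.ElementTree as ET


def page_has_link_data(xml_text: str) -> bool:
    """Return True if the first <page> element has any child elements.

    Hypothetical helper; assumes prop results are nested under <page>.
    """
    root = ET.fromstring(xml_text)
    page = root.find("./query/pages/page")
    if page is None:
        raise ValueError("no <page> element in API response")
    # len() of an Element is its number of direct children.
    return len(page) > 0


# Response quoted in the comment above: a self-closing <page/> with no children.
sample_xml = (
    '<api batchcomplete=""><query><pages>'
    '<page _idx="50604305" pageid="50604305" ns="6" '
    'title="File:Athlone Institute of Technology Logo Circa 2015.png"/>'
    "</pages></query></api>"
)

if __name__ == "__main__":
    print(page_has_link_data(sample_xml))
```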

jcrespo raised the priority of this task from Low to Normal. · May 22 2016, 1:32 PM

This seems to be affecting enwikisource, too: T135801

jcrespo raised the priority of this task from Normal to Needs Triage. · May 22 2016, 1:35 PM
jcrespo removed projects: MediaWiki-General, DBA.

This is still either a MediaWiki parsing bug or a job queue execution failure; it still needs investigation.

This seems to be affecting enwikisource, too: T135801

Not just enwikisource, that just happens to be the example I used there. That particular issue was originally noticed on bnwikisource. (Diagnosing issues in Bengali is a real treat.)

As I mentioned on T135801, I'm not sure what users are expected to do about these missing rows. A null-edit script that iterates through every page is looking more and more attractive, yet again.
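If someone did go down the route of a script that walks every page, the page IDs would have to be sent in batches. A rough sketch of the batching step, with assumptions stated: `batched_page_ids` is a hypothetical helper, and the default batch size of 50 is an assumed per-request limit for ordinary API users, not something confirmed in this task.

```python
from itertools import islice


def batched_page_ids(page_ids, batch_size=50):
    """Yield "|"-joined groups of page IDs, at most batch_size per group.

    Hypothetical helper; batch_size=50 is an assumed API limit.
    """
    iterator = iter(page_ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield "|".join(str(page_id) for page_id in batch)


if __name__ == "__main__":
    # Each yielded string could be used as the pageids value of one
    # action=purge request with forcelinkupdate=1.
    for group in batched_page_ids(range(1, 6), batch_size=2):
        print(group)
```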

In T135801, @MZMcBride may have mixed up two different bugs.

Bug 1: Some rows are missing from both Labs and production. Example: https://en.wikipedia.org/wiki/File:Athlone_Institute_of_Technology_Logo_Circa_2015.png / https://en.wikipedia.org/w/api.php?action=query&pageids=50604305&prop=categories|links|templates|images&format=xml where both databases are missing information about templates, categories and other things. I assume that the API query shows data from the production database.

Bug 2: Some rows are wrong only in the Labs database but not in the production database. Example: https://en.wikipedia.org/wiki/File:Arfakrim12_-_from_Commons.jpg (39394928), which according to the "File usage" section of the file information page is only used in "Arfa Karim" (which I assume means that production thinks that the file is only used on that page). Labs reports it as used on three pages: "Arfa Karim" (4135978), "Portal:Technology in Pakistan" (50170821) and "Portal:Technology in Pakistan/Selected picture/1" (50264092), see https://quarry.wmflabs.org/query/9952. As you can see at https://en.wikipedia.org/w/index.php?title=Portal:Technology_in_Pakistan/Selected_picture/1&action=history, the image was removed from the portal namespace a month ago, so it seems that production is right and that Labs is wrong.

This is only about bug 1. Bug 2 should be reported somewhere else if it hasn't been reported already. Bug 2 has also been discussed at "mw:Topic:T2r79et87v4753wo" and at "w:en:Project talk:Database reports#Wikipedia:Database reports/Orphaned talk pages false positives".

This is only about bug 1. Bug 2 should be reported somewhere else if it hasn't been reported already. Bug 2 has also been discussed at "mw:Topic:T2r79et87v4753wo" and at "w:en:Project talk:Database reports#Wikipedia:Database reports/Orphaned talk pages false positives".

I agree that there are two issues, but they are interconnected. It's difficult to distinguish "missing rows on Labs" from two possible causes: rows don't exist in production ("bug 1") or replication failed for some reason ("bug 2"). The end result for Labs users is the same: missing rows on Labs.

T135801 ended up being about what you call bug 1. We have other filed tasks about either bug 1 or bug 2.

Rough guesses, bug 1 (rows missing both in production and in Labs):

Bug 2 (missing rows only on Labs):

What you call bug number 2 is off-topic on this ticket; mentioning it here will only obscure bug number 1 and confuse people who could be working on fixing it.

Bug #2's fix, as I mentioned to you (T135801#2312881), is currently in progress, and as there are many tickets open about it, it is being tracked on T126946, but will take weeks to take effect.

That page was renamed at '11:01, 22 April 2016', and Labs knows nothing about the rename. What is worse is that the pages had already been reimported, which put us back to square one.

I've made a new reimport and now it works: https://tools.wmflabs.org/sigma/usersearch.py?name=MatthewHoobin&page=I_Hate_Everything_%28YouTube_channel%29&server=enwiki&max= This was #2

The other things are mostly bug #1, and are more examples of this very same bug: job issues with category membership or link changes *on production*.

I agree that there are two issues, but they are interconnected. It's difficult to distinguish "missing rows on Labs" from two possible causes: rows don't exist in production ("bug 1") or replication failed for some reason ("bug 2"). The end result for Labs users is the same: missing rows on Labs.

You can try to access the same information via the action API. Most of the time there's a direct correspondence between database rows that are available in Labs and some API query, for example missing categorylinks can be checked by using prop=categories. If the API result returns the expected rows it's "bug 2", while if it too is missing the rows it's "bug 1".
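The rule of thumb above can be written down as a small decision function. This is just a sketch of that reasoning: `classify_missing_rows` and its arguments are hypothetical, and it assumes you already have the set of values you expect, the values returned by the API, and the values found in the Labs replica.

```python
def classify_missing_rows(expected, api_rows, replica_rows):
    """Classify a discrepancy as "bug 1", "bug 2" or "ok".

    Hypothetical helper mirroring the comment above:
    - rows missing from the API result as well  -> "bug 1" (production)
    - rows present in the API, missing on Labs  -> "bug 2" (replication)
    """
    expected = set(expected)
    if expected - set(api_rows):
        return "bug 1"
    if expected - set(replica_rows):
        return "bug 2"
    return "ok"


if __name__ == "__main__":
    # "Kategorie:Beherbergung" appears earlier in this task; the second
    # category name is a made-up placeholder.
    expected = {"Kategorie:Beherbergung", "Kategorie:Beispiel"}
    print(classify_missing_rows(expected, set(), set()))
    print(classify_missing_rows(expected, expected, set()))
```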