Links tables are sometimes not being populated
Open, Medium, Public

Description

At 17:05, 31 October 2015, the API request

https://en.wikipedia.org/w/api.php?action=query&format=xml&redirects=1&meta=userinfo&rvprop=content&prop=templates|revisions&uiprop=hasmsg&titles=File:Tenth_Avenue_Kid_poster.jpg&tllimit=500

returned the following:

<?xml version="1.0"?><api batchcomplete=""><query><normalized><n from="File:Tenth_Avenue_Kid_poster.jpg" to="File:Tenth Avenue Kid poster.jpg" /></normalized><pages><page _idx="48424222" pageid="48424222" ns="6" title="File:Tenth Avenue Kid poster.jpg"><revisions><rev contentformat="text/x-wiki" contentmodel="wikitext" xml:space="preserve">== Summary ==
{{Non-free use rationale poster
| Media             = film
| Article           = Tenth Avenue Kid
| Use               = Infobox
&lt;!-- ADDITIONAL INFORMATION --&gt;
| Name              = Tenth Avenue Kid 
| Distributor       = Republic Pictures
| Publisher         = 
| Type              = 
| Website           =
| Owner             = 
| Commentary        = 
&lt;!--OVERRIDE FIELDS --&gt;
| Description       = 
| Source            = 
| Portion           = 
| Low_resolution    = 
| Purpose           = &lt;!-- Must be specified if Use is not Infobox / Header / Section / Artist --&gt;
| Replaceability    = 
| other_information = 
}}

== Licensing ==
{{Non-free poster|image has rationale=yes}}</rev></revisions></page></pages><userinfo id="5866303" name="ImageTaggingBot" messages="" /></query></api>

Note the complete absence of a "<templates>" section, despite the fact that the page directly transcludes two templates.
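A client can flag such responses automatically. The following is a minimal sketch, not ImageTaggingBot's actual code: the function name and the sample XML are made up for illustration, and treating any "{{" in the wikitext as evidence of a transclusion is a crude assumption (parser functions or nowiki'd braces would also match).

```python
import xml.etree.ElementTree as ET


def pages_missing_templates(api_xml: str) -> list:
    """Return titles of pages whose wikitext contains '{{' although the
    response carries no <templates> element for that page."""
    root = ET.fromstring(api_xml)
    suspicious = []
    for page in root.iter("page"):
        rev = page.find("./revisions/rev")
        wikitext = (rev.text or "") if rev is not None else ""
        has_templates = page.find("templates") is not None
        # Heuristic only: '{{' is assumed to mean "transcludes something".
        if "{{" in wikitext and not has_templates:
            suspicious.append(page.get("title"))
    return suspicious


# Hypothetical, heavily shortened response in the shape shown above.
SAMPLE = (
    '<api batchcomplete=""><query><pages>'
    '<page pageid="1" ns="6" title="File:Example.jpg">'
    '<revisions><rev xml:space="preserve">{{Non-free poster}}</rev></revisions>'
    '</page></pages></query></api>'
)

print(pages_missing_templates(SAMPLE))
```

A bot could log the request URL and response headers whenever this list is non-empty, which is the kind of evidence asked for later in this thread.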

Event Timeline

Carnildo raised the priority of this task from to Needs Triage.
Carnildo updated the task description. (Show Details)
Carnildo added a project: MediaWiki-Action-API.
Carnildo subscribed.

Like I said, it's intermittent. ImageTaggingBot appears to encounter this once every thousand requests or so.

Can you capture the HTTP headers next time it happens? It may be a screwed-up proxy or appserver.

Server response headers from https://en.wikipedia.org/w/api.php?action=query&format=xml&titles=File:Kerbisher2.gif&tllimit=500&meta=userinfo&rvprop=content&prop=templates|revisions&redirects=1&uiprop=hasmsg

Cache-Control: private, must-revalidate, max-age=0
Connection: close
Date: Sun, 01 Nov 2015 19:05:14 GMT
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
Age: 0
Server: nginx/1.9.4
Content-Type: text/xml; charset=utf-8
Client-Date: Sun, 01 Nov 2015 19:05:14 GMT
Client-Peer: 198.35.26.96:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: /C=BE/O=GlobalSign nv-sa/CN=GlobalSign Organization Validation CA - SHA256 - G2
Client-SSL-Cert-Subject: /C=US/ST=California/L=San Francisco/O=Wikimedia Foundation, Inc./CN=*.wikipedia.org
Client-SSL-Cipher: ECDHE-ECDSA-AES128-GCM-SHA256
Client-SSL-Socket-Class: IO::Socket::SSL
Client-Transfer-Encoding: chunked
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Analytics: WMF-Last-Access=01-Nov-2015;https=1
X-Cache: cp1053 miss (0), cp4017 miss (0), cp4017 frontend miss (0)
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Powered-By: HHVM/3.6.5
X-Varnish: 823044711, 2788295235, 841757361

And a repeat an hour later, same page:

Cache-Control: private, must-revalidate, max-age=0
Connection: close
Date: Sun, 01 Nov 2015 20:06:01 GMT
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
Age: 0
Server: nginx/1.9.4
Content-Type: text/xml; charset=utf-8
Client-Date: Sun, 01 Nov 2015 20:06:01 GMT
Client-Peer: 198.35.26.96:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: /C=BE/O=GlobalSign nv-sa/CN=GlobalSign Organization Validation CA - SHA256 - G2
Client-SSL-Cert-Subject: /C=US/ST=California/L=San Francisco/O=Wikimedia Foundation, Inc./CN=*.wikipedia.org
Client-SSL-Cipher: ECDHE-ECDSA-AES128-GCM-SHA256
Client-SSL-Socket-Class: IO::Socket::SSL
Client-Transfer-Encoding: chunked
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Analytics: WMF-Last-Access=01-Nov-2015;https=1
X-Cache: cp1068 miss (0), cp4018 miss (0), cp4017 frontend miss (0)
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Powered-By: HHVM/3.6.5
X-Varnish: 1128580727, 2878596088, 845302430

The API is correctly reporting the fact that there are no templatelinks recorded for the title:

db1052 [enwiki]> select * from templatelinks where tl_from=48433757;
Empty set (0.00 sec)

db1052 [enwiki]> select * from pagelinks where pl_from=48433757;
Empty set (0.01 sec)

db1052 [enwiki]> select * from categorylinks where cl_from=48433757;
Empty set (0.01 sec)

db1052 [enwiki]> select * from imagelinks where il_from=48433757;
Empty set (0.00 sec)

db1052 [enwiki]> select * from externallinks where el_from=48433757;
Empty set (0.00 sec)

db1052 [enwiki]> select * from iwlinks where iwl_from=48433757;
Empty set (0.00 sec)

The question then becomes "why aren't there templatelinks recorded for this title"? I know @aaron has been making changes related to pushing links updates into jobs, so CCing him in case he has any insight.

Anomie renamed this task from API intermittently fails to return template transclusions when requested to Links tables are sometimes not being populated. Nov 2 2015, 2:50 PM
Anomie edited projects, added MediaWiki-Page-editing; removed MediaWiki-Action-API.
Anomie set Security to None.
matmarex added a project: MediaWiki-libs-Rdbms.

There are more examples of problematic pages on the merged task T117679, the oldest being https://en.wikipedia.org/wiki/File:Agraharam.jpg from 26 October.

[02:11] <AaronSchulz> it's on the todo list, lots of stuff to get too. I didn't see any "article not found" errors in kibana for titles in T117332, so it's not just simple slave lag making the revision not be found for those new files.
[02:13] <AaronSchulz> null edits fix it though, which just enqueues another jobs...so the problem must be with the original job not working somehow
[02:15] <AaronSchulz> there is some low hanging slave-lag fix "fruit" I can fix though...not sure it would help but it should be done
[02:15] <AaronSchulz> (e.g. switching to READ_LATEST if the expected rev is not there)
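The fallback mentioned in the last IRC line can be pictured roughly as follows. This is only an illustrative sketch of the general pattern (read from a replica first, then fall back to the primary if the expected revision is missing). It is not MediaWiki code, and `load_revision`, `read_replica` and `read_primary` are hypothetical names.

```python
from typing import Callable, Optional


def load_revision(
    rev_id: int,
    read_replica: Callable[[int], Optional[str]],
    read_primary: Callable[[int], Optional[str]],
) -> Optional[str]:
    """Fetch a revision from a replica, falling back to the primary
    database when the replica has not caught up yet."""
    text = read_replica(rev_id)
    if text is None:
        # The expected revision is not on the replica (possibly lag):
        # retry against the primary, i.e. a "read latest" style lookup.
        text = read_primary(rev_id)
    return text


# Toy stand-ins for the two databases: the replica lags behind.
replica = {1: "old revision"}
primary = {1: "old revision", 2: "new revision"}

print(load_revision(2, replica.get, primary.get))
```

As the log itself notes, it was not clear that replica lag was the cause here, so this pattern would address only one possible source of the missing rows.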

Change 253077 had a related patch set uploaded (by Aaron Schulz):
Race condition fixes for refreshLinks jobs

https://gerrit.wikimedia.org/r/253077

Change 253077 merged by jenkins-bot:
Race condition fixes for refreshLinks jobs

https://gerrit.wikimedia.org/r/253077

Let's call this "resolved". If the issue still occurs on wikis running 1.27.0-wmf.8 (you can check that using Special:Version), feel free to reopen.

Is it possible to painlessly backport the fix to wmf.7? Per https://www.mediawiki.org/wiki/MediaWiki_1.27/Roadmap, all wikis are stuck on wmf.7 for the next two weeks.

It seems to me that the code fix will only partially solve the problem, but I am unsure whether this bug should be reopened or a new one created for that.

After the bug in the code is fixed, we will still have a bunch of pages with missing or incorrect link table entries spread across all wikis. I doubt the problem will disappear on its own over time, as it is database related, not cache related. Null edits on all pages hit by this bug should update the link tables and fix this. However:

  • I see no point in doing it before a fixed version is deployed to the servers (until then there is no guarantee that a null edit really fixes the problem)
  • It is a big pain to null-edit all pages on all wikis (for enwiki it may take years; even on en/fr.wikisource it would take many months)
  • I have no idea how to locate the affected pages, or at least the pages suspected of being affected.

Is anybody able to estimate when the bug started to appear (the affected pages would be those edited within a specific period of time, and not later)? Or suggest a database query whose results would contain all affected pages (but not every page on a wiki)?

Things like this aren't always easy to test in development. People make a "best effort" towards fixes, and then they need testing in production.

The bug was reported at the end of October. You can probably knock a couple of weeks off that for pages to null-edit. I suspect it wouldn't take that long using a bot working 24/7 at a reasonable "edit rate".

Note that this is not even deployed yet…

Change 256269 had a related patch set uploaded (by Reedy):
Race condition fixes for refreshLinks jobs

https://gerrit.wikimedia.org/r/256269

Change 256269 abandoned by Jforrester:
Race condition fixes for refreshLinks jobs

Reason:
Let's use the train rather than pull it forward a couple of days.

https://gerrit.wikimedia.org/r/256269

Hi, if I understand correctly, this means any affected pages that were tagged for speedy deletion (e.g. on enwiki) are not showing up in the speedy categories, and hence may not get deleted. This is a rather serious issue for attack pages and copyright violations. I was talking to some folks on IRC, perhaps we could go by edit summary to automate null edits on pages that were tagged for speedy by our known new page patrollers and bots? Is there an easier technical solution on WMF side for this?

IMO you are right. We have exactly this problem.

My tests performed 1-2 weeks ago on plwikisource showed that at least 1-2% of edited pages were affected by this bug. But the ratio may be different for a high-traffic wiki like enwiki.

Current estimates of potentially affected pages (from Quarry):

select count(distinct rev_page) from enwiki_p.revision where rev_timestamp < '20151209015000' and rev_timestamp > '20151001000000';

count(distinct rev_page)
2877079

The number is likely an overestimate, as the lower-bound timestamp is arbitrary (@Reedy said above: "The bug was reported at the end of October") and any page already edited after '20151209015000' should have correct link table entries.

Assuming a single bot operating at a rather high rate of 50 edits/min, fixing them would take 2877079/(50*60*24) ≈ 40 days (of continuous bot work).
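The estimate is easy to recompute for other edit rates. The snippet below is just the arithmetic from the comment above written out. The helper name is made up, and the page count and the 50 edits/min rate are the figures quoted in this thread.

```python
def days_of_null_edits(pages: int, edits_per_minute: float) -> float:
    """Days of continuous bot work needed to null-edit `pages` pages."""
    edits_per_day = edits_per_minute * 60 * 24
    return pages / edits_per_day


# Figures from the thread: 2877079 candidate pages, 50 edits per minute.
days = days_of_null_edits(2877079, 50)
print(f"{days:.2f} days")
```

At 50 edits/min this works out to just under 40 days, and halving the rate doubles the duration.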

Let's call this "resolved". If the issue still occurs on wikis running 1.27.0-wmf.8 (you can check that using Special:Version), feel free to reopen.

This (or similar) problem still happens. The 6 pages listed in https://pl.wikisource.org/wiki/Specjalna:Nieskategoryzowane_strony :

https://pl.wikisource.org/wiki/Strona:Gustaw_Daniłowski_-_Lili.djvu/57
https://pl.wikisource.org/wiki/Strona:Gustaw_Daniłowski_-_Lili.djvu/58
https://pl.wikisource.org/wiki/Strona:Gustaw_Daniłowski_-_Lili.djvu/59
https://pl.wikisource.org/wiki/Strona:Gustaw_Daniłowski_-_Lili.djvu/60
https://pl.wikisource.org/wiki/Strona:Gustaw_Daniłowski_-_Lili.djvu/61
https://pl.wikisource.org/wiki/Strona:Gustaw_Daniłowski_-_Lili.djvu/62

were created yesterday and are not properly linked to the category:

https://pl.wikisource.org/w/index.php?title=Kategoria:Nieskorygowana&pagefrom=Gustaw+Dani%C5%82owski+-+Lili.djvu%2F50%0AGustaw+Dani%C5%82owski+-+Lili.djvu%2F50#mw-pages

I can't check the database at the moment as Quarry seems to be stuck. Any hints?

See also https://gerrit.wikimedia.org/r/#/c/258445/ which will be in the next release. That fixed a bug where new pages would sometimes have bad links until the link job was retried (around 1 hour later).

wmf9 is out everywhere since yesterday.

aaron removed aaron as the assignee of this task. Dec 22 2015, 6:36 AM

For those who saw this issue, could you re-check whether you still experience it? There have been quite a lot of changes around jobs and related topics since this task was first opened, so it may have been fixed.

ImageTaggingBot's most recent encounter with this issue is from June 6, for https://en.wikipedia.org/wiki/File:Tennis_A_Team_1926.jpg. Since the bot has been averaging only a few reports of this a month, any fixes since then won't be obvious yet.

This happened again with https://en.wikipedia.org/wiki/User:TimSGearhart/sandbox. A speedy deletion tag was added on April 16 but the page was never put in the category. I have since made a null edit and that fixed it.

Maybe we could run a query to look for Template:Db (and variants) transclusions where the page is not in Category:Candidates for speedy deletion?

Was the page being tracked in "what links here" of the template before you did the null edit? It would be strange to have the templatelinks addition and not the categorylinks one for the same edit. If it was not in templatelinks either, then that query won't give any useful result.

Oh good point. No it probably was not in "what links here", then. I should have checked.

Can you think of a way to find these uncategorized pages?

There's no way other than reparsing all the pages again (running refreshLinks.php), or retrieving all page content and searching it for that template (either from an XML dump or via API queries; the latter may take a lot of time).
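The second option (search page content for the template, then compare against the recorded categories) could look roughly like this. It is a sketch under several assumptions: the input records are hypothetical and would have to come from a dump or the API, the regular expression only approximates "Template:Db and variants", and a single category name stands in for whatever categories the templates really add.

```python
import re

# Assumed pattern for {{db}}, {{db-g11}}, {{Db|reason}} and similar.
DB_TEMPLATE = re.compile(r"\{\{\s*db(?:-[^|}\s]+)?\s*(?:\||\}\})", re.IGNORECASE)
SPEEDY_CATEGORY = "Candidates for speedy deletion"


def find_untracked(pages):
    """Return titles whose wikitext carries a db-style tag although the
    speedy deletion category is not among the recorded categories.

    `pages` is an iterable of (title, wikitext, recorded_categories).
    """
    return [
        title
        for title, wikitext, categories in pages
        if DB_TEMPLATE.search(wikitext) and SPEEDY_CATEGORY not in categories
    ]


# Hypothetical records; real data would come from a dump or API queries.
pages = [
    ("User:A/sandbox", "{{db-g11}}\nSome text", set()),
    ("User:B/sandbox", "{{Db|reason}}", {"Candidates for speedy deletion"}),
    ("Some article", "No tags here", set()),
]

print(find_untracked(pages))
```

Because this works from page text and not from templatelinks, it would still catch pages where both link tables missed the edit, which is the case discussed above.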