Categories tracking pages with wikidata links are not updated when items on Wikidata are modified
Open, Needs Triage, Public

Description

Let a page on a client wiki add a tracking category if the linked item is missing a P1472 statement.
Then, when a P1472 statement is added to the item, the page should be re-rendered, but it is not: it still shows the category.
When purging the page, it is re-rendered, and no longer shows the category; however, the category page still lists the page.

Expected behavior: after the P1472 statement is added to the wikidata item, the page on the client wiki should be re-rendered, and should no longer be listed on the category page.

Hypothesis: The RefreshLinksJob scheduled by PageUpdater ends up using a cached ParserOutput instead of re-parsing the page. Purging updates the parser cache, but doesn't update the categorylinks table. A null edit is needed to trigger that.

Event Timeline

New Picture (2).png (894×919 px, 99 KB)

In the screenshot above you can see, at the top, the Creator:Andrey_Voronikhin creator page without c:Category:Creator templates with Wikidata link: item missing linkback, and, at the bottom, the content of c:Category:Creator templates with Wikidata link: item missing linkback with Creator:Andrey_Voronikhin in it.

This doesn't only happen on Commons. It also happens on eswiki. The categories aren't refreshed until a null edit.

When I've seen this happening, I've just assumed it's due to large job queues, and that the update will happen eventually. Does changing the page on Wikidata not add the refresh tasks to the job queue?

@aaron Any chance you could have a look at it? @daniel had a quick look and ran into problems understanding the logic in the job queue code.

A wikidata change triggers links updates as follows:

  • ChangeHandler::handleChange calls WikiPageUpdater::scheduleRefreshLinks
  • WikiPageUpdater::scheduleRefreshLinks schedules a RefreshLinksJob
  • RefreshLinksJob::runForTitle() then...
    • re-parses the page (hopefully - the interaction with the parser cache is somewhat complex. But since the page itself is getting re-rendered, this part seems to work)
    • calls WikitextContent::getSecondaryDataUpdates, which returns a LinksUpdate
    • calls LinksUpdate::doUpdate, which updates the database, including the categorylinks table

My suspicion is that the problem is with the category page itself being cached in the CDN. Perhaps the HTMLCacheUpdateJob triggered by PageUpdater should have the recursive flag set.

If the CDN cache is not the problem, then the issue may be with RefreshLinksJob using an old, cached ParserOutput - but then I don't see how the ParserOutput gets updated at all.

@aaron My confusion is understanding the subtleties of the timestamp handling in RefreshLinksJob, and the interaction between HTMLCacheUpdateJob and RefreshLinksJob, and the ParserCache. Can you check that WikiPageUpdater is setting the correct parameters?

Some observations from the trenches (in case they help). So in case of Category:Creator templates with Wikidata link: item missing linkback:

  • The pages in the category will not disappear for a very long time (until the next edit?) even if the condition that triggered addition to the category was fixed on Wikidata
  • Purging a creator page from the category removes the category from the page, but does not remove the page from the category
  • A null edit of the page is the ultimate fix. I always use a bot to "touch" each page in such a category before working on it.

Two more random thoughts:

  • There might be two separate issues here: (1) a page pulling data from Wikidata is not updated after an edit on Wikidata and (2) a page purge removes the category from the page but does not trigger removal of the page from the category
  • Related to issue 1: is there a way to look up a list of pages that should be updated (let's say on Commons) when some specific item is changed? There must be a way through an SQL query, but perhaps there is an easier way. That way we could verify that a page like Creator:Achille Laugé is on the list to be updated when Q2823073 is modified.

@Jarekt You can check which pages depend on which entity using Special:EntityUsage on the client wiki, e.g. https://commons.wikimedia.org/wiki/Special:EntityUsage/Q23.

As to problem (1) - you said in the task description "When P373 property is added to the item on Wikidata, the page itself no longer shows the category". That would mean problem (1) does not exist - or am I misunderstanding something? Problem (2) apparently "somehow" exists, but it's not clear how, exactly.

I made a mistake in the description: P373 should be changed to P1472. Problem (1) exists until a purge or some other event with a similar result. Problem (2) exists after the purge (or other event) and before a null edit.

You can do the following experiment to observe the issue. Pick any page in Category:Creator templates with Wikidata link: item missing linkback. It can be in one of 3 states:

  1. item on Wikidata does not have a matching P1472 (and the page should be there)
  2. item on Wikidata has a matching P1472 and the creator page shows "Category:Creator templates with Wikidata link: item missing linkback" in its category list (problem #1)
  3. item on Wikidata has a matching P1472 and the creator page does not show "Category:Creator templates with Wikidata link: item missing linkback" in its category list (problem #2)

You can get from state 1 to state 2 by adding P1472 to the Wikidata item (pressing the Up arrow with the Wikidata logo is the easiest way), and from state 2 to state 3 by purging the creator page. Touching all the creator pages before any action ensures that afterwards all the pages will be in state 1.

Oh! So, the problem description should be:


Let a Wikipedia add a tracking category if the linked item is missing a P1472 statement.

Then, when a P1472 statement is added to the item, the page should be re-rendered, but it is not: it still shows the category.

When purging the page, it is re-rendered, and no longer shows the category; however, the category page still lists the page.


If this is correct, the behavior is consistent with the RefreshLinksJob using a cached ParserOutput object, instead of re-parsing the page, and we need to find out why.

The effect is that a) the page isn't updated and b) the categorylinks table is updated, but based on the old information, so the offending category remains in place. Purging the page will only fix the visible page, not the links, because action=purge does not trigger a RefreshLinksJob. That requires a null edit.
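To make the hypothesis concrete, here is a minimal toy model in Python. It is only a sketch of the suspected behavior, not MediaWiki code: the class `ToyClientPage`, its methods, and the `use_cached_output` flag are all hypothetical names invented for illustration. It assumes that a purge refreshes only the parser cache, that a null edit refreshes both the parser cache and categorylinks, and that the refresh-links job reuses whatever is in the parser cache.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Name of the tracking category discussed in this task.
TRACKING = "Creator templates with Wikidata link: item missing linkback"


@dataclass
class ToyClientPage:
    """Hypothetical model of one client-wiki page and its cached state."""
    item_has_p1472: bool = False
    parser_cache: Optional[Set[str]] = None           # categories from the last parse
    categorylinks: Set[str] = field(default_factory=set)

    def parse(self) -> Set[str]:
        # A fresh parse adds the tracking category only if P1472 is missing.
        return set() if self.item_has_p1472 else {TRACKING}

    def purge(self) -> None:
        # Assumption: purge re-parses but does not touch categorylinks.
        self.parser_cache = self.parse()

    def null_edit(self) -> None:
        # Assumption: a null edit re-parses and updates categorylinks.
        self.parser_cache = self.parse()
        self.categorylinks = set(self.parser_cache)

    def refresh_links_job(self, use_cached_output: bool = True) -> None:
        # Suspected bug: the job reuses the cached output instead of re-parsing.
        if use_cached_output and self.parser_cache is not None:
            output = self.parser_cache
        else:
            output = self.parse()
            self.parser_cache = output
        self.categorylinks = set(output)

    def snapshot(self):
        # (page shows the category, category page lists the page)
        shown = TRACKING in (self.parser_cache or set())
        return shown, TRACKING in self.categorylinks


def run_scenario(use_cached_output: bool = True):
    page = ToyClientPage()
    page.null_edit()                       # state 1: P1472 missing
    states = [page.snapshot()]
    page.item_has_p1472 = True             # P1472 added on Wikidata
    page.refresh_links_job(use_cached_output)
    states.append(page.snapshot())         # state 2 if the cache is reused
    page.purge()
    states.append(page.snapshot())         # state 3 if the cache was reused
    page.null_edit()
    states.append(page.snapshot())         # fully fixed
    return states
```

With `use_cached_output=True` the model walks through states 1, 2 and 3 from the experiment above before the null edit clears both views; with `use_cached_output=False` the job alone is enough to clear them.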

daniel updated the task description.

A good way to test this theory would be to find a page affected by this and see if its page_links_updated timestamp is greater than the timestamp of the Wikidata edit. The timestamp in page_links_updated should be the time that the page finished being parsed (I wonder why it's not the start of the parse?). So if the job is using an old cached ParserOutput object, then the timestamp should be from before the Wikidata edit (or perhaps the same time, if there is some sort of race condition).
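The comparison itself could be sketched as follows. This assumes the page_links_updated value is available in MediaWiki's 14-digit `YYYYMMDDHHMMSS` form and the Wikidata edit time in the ISO 8601 form used in API output; the function names and the sample values in the usage note are hypothetical and are not taken from a real affected page.

```python
from datetime import datetime, timezone


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a 14-digit MediaWiki timestamp such as '20200601120000' (UTC)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def parse_iso_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2020-06-01T12:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def consistent_with_stale_output(page_links_updated: str, wikidata_edit: str) -> bool:
    """Return True if the links timestamp is not later than the Wikidata edit.

    Under the theory above, that is what a links update based on an old
    cached parse would look like.
    """
    return parse_mw_timestamp(page_links_updated) <= parse_iso_timestamp(wikidata_edit)
```

For example, `consistent_with_stale_output("20200601115900", "2020-06-01T12:00:00Z")` returns `True`, while a links timestamp one minute after the edit returns `False`, which would point away from the stale-cache theory for that page.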

Change 602053 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/core@master] Remove unwanted parse step

https://gerrit.wikimedia.org/r/602053

I believe that https://gerrit.wikimedia.org/r/602053 should fix this (and a couple of related issues), but it needs review. Anyone? :)

Change 602053 merged by jenkins-bot:
[mediawiki/core@master] Remove unwanted parse step

https://gerrit.wikimedia.org/r/602053