
Investigate why Wikidata changes don't trigger LinksUpdate job [4 hours]
Closed, Resolved; Public

Description

Context:
Wikibase Client

Problem:
Wikidata changes don't seem to trigger LinksUpdate job. With T280627 we have had a similar problem in the context of sitelinks.

Examples:

  • "In the infobox film of eswiki we use Wikidata properties for automatic categorization by director, year, genres,... and I realized that changing the values of Wikidata changes the rendering of the list of categories in the HTML output, but that the pages don't display in the category unless the page is null edited."
  • "I connected this article to its Wikidata entity, but even if the rendering of the page changed to reflect that the IMDb ID of the template parameter is the same as the one of the Wikidata entity it's still displayed in the category that tracks the mismatch of identifiers."

Acceptance criteria:

  • we figured out what is going on and came up with a plan of attack

Original:
In the infobox film of eswiki we use Wikidata properties for automatic categorization by director, year, genres,... and I realized that changing the values of Wikidata changes the rendering of the list of categories in the HTML output, but that the pages don't display in the category unless the page is null edited.

I don't really know if it isn't working or if it is just that the LinksUpdate job is very lagged, but I think it's better reporting just in case.

Event Timeline

@Lydia_Pintscher, I just tested it and it's still happening in production right now. I connected this article to its Wikidata entity, but even if the rendering of the page changed to reflect that the IMDb ID of the template parameter is the same as the one of the Wikidata entity it's still displayed in the category that tracks the mismatch of identifiers.

There's no change linked to the task, so I assume that there's nothing pending to land in production and that when you added the TestMe project you were requesting confirmation of this problem still happening in production.

@Agabi10 Yes precisely. Thank you so much for checking and confirming!

Manuel renamed this task from Wikidata changes don't seem to trigger LinksUpdate job to Investigate why Wikidata changes don't trigger LinksUpdate job. Jul 20 2021, 9:58 AM
Manuel removed a project: TestMe.
Manuel updated the task description.

T280627 and some of the fixes made there are related.
We should check the reproduction (the updates etc. could take up to 6 hours).
We should be able to look in Kafka for the job from the edit mentioned in T278924#7221244.

Addshore renamed this task from Investigate why Wikidata changes don't trigger LinksUpdate job to Investigate why Wikidata changes don't trigger LinksUpdate job [4 hours]. Jul 21 2021, 10:45 AM

So I see jobs being triggered. I'm not sure if they succeed or not:

ladsgroup@stat1005:~$ kafkacat -b kafka-main2001.codfw.wmnet -p 0 -t 'codfw.mediawiki.job.refreshLinks' -o -100000 | grep -i BadTitle | head
{"$schema":"/mediawiki/job/1.0.0","meta":{"uri":"https://placeholder.invalid/wiki/Special:Badtitle","request_id":"2d5fb4d6e4ececb5855550bf","id":"8d296dae-359a-4dd8-b2ed-380ca70e74f0","dt":"2021-08-10T08:49:00Z","domain":"zh.wiktionary.org","stream":"mediawiki.job.refreshLinks"},"database":"zhwiktionary","type":"refreshLinks","sha1":"d460e8e316f7b8f9172a13c2309306ced597fa7d","params":{"rootJobSignature":"title-batch:408f0dc002e682e27bcd6c18b9ec39b1440886e8","causeAction":"update","causeAgent":"uid:609373","namespace":0,"title":"Schulter","requestId":"[redacted]"},"mediawiki_signature":"52997c36528f7a886adb958bba6bbd295baf5836"}

One odd thing I found: for a job triggered on zhwiktionary, we set the uid of the Wikidata user as the cause agent, which can lead to confusion (but that's for later).
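For reading these events without eyeballing raw JSON, here is a minimal Python sketch that pulls out the fields relevant to this investigation. It assumes one JSON event per line, as in the kafkacat output above. The helper name `summarize_job` and the trimmed `SAMPLE_EVENT` are illustrative only and not part of any MediaWiki or Kafka tooling.

```python
import json

# Sample event copied from the kafkacat output above, trimmed to the fields used here.
SAMPLE_EVENT = (
    '{"meta":{"dt":"2021-08-10T08:49:00Z","domain":"zh.wiktionary.org",'
    '"stream":"mediawiki.job.refreshLinks"},"database":"zhwiktionary",'
    '"type":"refreshLinks","params":{"rootJobSignature":'
    '"title-batch:408f0dc002e682e27bcd6c18b9ec39b1440886e8",'
    '"causeAction":"update","causeAgent":"uid:609373","namespace":0,'
    '"title":"Schulter"}}'
)


def summarize_job(line):
    """Return the fields of one refreshLinks job event (hypothetical helper)."""
    event = json.loads(line)
    meta = event.get("meta", {})
    params = event.get("params", {})
    return {
        "database": event.get("database"),       # wiki the job runs on
        "enqueued": meta.get("dt"),               # event timestamp (UTC)
        "title": params.get("title"),
        "namespace": params.get("namespace"),
        "cause_action": params.get("causeAction"),
        "cause_agent": params.get("causeAgent"),  # uid recorded as the cause
        "root_job": params.get("rootJobSignature"),
    }


if __name__ == "__main__":
    for key, value in summarize_job(SAMPLE_EVENT).items():
        print(f"{key}: {value}")
```

Piping the kafkacat output through something like this makes it easier to group jobs by `root_job` and to spot cases where `cause_agent` does not belong to the wiki named in `database`.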

I'll dig a bit more.

Another thing: the six-hour maximum does not hold. I have found jobs whose root jobs go back maybe five days.

I cannot check if the job succeeded but I can see the links got updated:

MariaDB [zhwiktionary_p]> select * from page where page_title = 'Schulter';
+---------+----------------+------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id | page_namespace | page_title | page_restrictions | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---------+----------------+------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|  784623 |              0 | Schulter   |                   |                0 |           0 | 0.750508417857 | 20210810095552 | 20210810100224     |     6181565 |      906 | wikitext           | NULL      |
+---------+----------------+------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
1 row in set (1.21 sec)

while no actual edit has happened in this randomly chosen page (https://zh.wiktionary.org/w/index.php?title=Schulter&action=history) so I don't think a null edit happened on it either.
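To put a number on the delay, the timestamps above can be compared directly. The sketch below assumes the database columns use MediaWiki's 14-digit UTC format (YYYYMMDDHHMMSS) and that the Kafka event's `dt` is the enqueue time. It also assumes the event shown earlier is the job that produced this `page_links_updated` value, which was not verified. The function names are illustrative.

```python
from datetime import datetime, timezone

MW_FORMAT = "%Y%m%d%H%M%S"            # e.g. 20210810100224 (page table columns)
EVENT_FORMAT = "%Y-%m-%dT%H:%M:%SZ"   # e.g. 2021-08-10T08:49:00Z (job event dt)


def parse_mw_timestamp(value):
    """Parse a 14-digit MediaWiki timestamp as UTC."""
    return datetime.strptime(value, MW_FORMAT).replace(tzinfo=timezone.utc)


def parse_event_dt(value):
    """Parse the ISO-style UTC timestamp found in the job event."""
    return datetime.strptime(value, EVENT_FORMAT).replace(tzinfo=timezone.utc)


def links_update_lag(enqueued, links_updated):
    """Time between the job event and the recorded links update."""
    return parse_mw_timestamp(links_updated) - parse_event_dt(enqueued)


if __name__ == "__main__":
    # Values taken from the kafkacat event and the page row above.
    lag = links_update_lag("2021-08-10T08:49:00Z", "20210810100224")
    print(f"lag between job event and page_links_updated: {lag}")

    touched = parse_mw_timestamp("20210810095552")
    updated = parse_mw_timestamp("20210810100224")
    print(f"page_links_updated is {updated - touched} after page_touched")
```

For this one page the gap is on the order of an hour, which says nothing about the worst case when the queue is full.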

My conclusion is that it works; it's just slow. Especially when a template gets edited and the queue fills up, it might take several days. My suggested course of action would be either to do nothing or to increase the concurrency of refreshLinks jobs (plus giving jobrunners more resources, but that's not easy).

Sounds plausible to me. Moving to Verification for decision on further course of action.

Will flag this with Lydia