
EventStreams sending same data over and over (page links change)
Open, High · Public · BUG REPORT

Description

Concerning this API endpoint:
https://stream.wikimedia.org/?doc#/streams/get_v2_stream_mediawiki_page_links_change

An example JSON record, as sent by the stream, is given below. Note that the diff at https://arz.wikipedia.org/w/index.php?diff=5641431 does not match the links in the JSON. The stream sends this JSON record repeatedly, on roughly 10 days per month and 1 to 7 times per day. The diff number and article title change, but the set of links in the JSON stays the same.
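One way to quantify the repetition is to fingerprint the `added_links` array of each event and group events by that fingerprint. The following is a minimal sketch, assuming the events have already been parsed into Python dicts with the fields shown in the example record (`database`, `page_id`, `rev_id`, `added_links`). The function names are hypothetical and not part of any EventStreams client.

```python
import hashlib
import json
from collections import defaultdict


def link_fingerprint(event):
    """Return a stable hash of the event's added_links, ignoring order and duplicates."""
    links = {(l["link"], l["external"]) for l in event.get("added_links", [])}
    canonical = json.dumps(sorted(links), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_by_fingerprint(events):
    """Map each link-set fingerprint to the (database, page_id, rev_id) keys that carried it."""
    groups = defaultdict(list)
    for event in events:
        key = (event["database"], event["page_id"], event["rev_id"])
        groups[link_fingerprint(event)].append(key)
    return groups


def repeated_link_sets(events):
    """Keep only fingerprints that appear on more than one distinct page/revision."""
    return {
        fp: keys
        for fp, keys in group_by_fingerprint(events).items()
        if len(set(keys)) > 1
    }
```

Feeding a day of captured events to `repeated_link_sets` would list every link set that was reported for more than one page or revision, which makes it possible to attach concrete counts to the report.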

Event Timeline

Restricted Application added a subscriber: Aklapper.
{"$schema":"/mediawiki/page/links-change/1.0.0","meta":{"uri":"https://arz.wikipedia.org/wiki/%D8%B1%D9%88%D8%AF%D9%8A%D9%88%D9%85","request_id":"bc403e26b8b72080c369aa66","id":"26739413-d570-4363-af37-af690a94f501","dt":"2021-09-01T23:30:50Z","domain":"arz.wikipedia.org","stream":"mediawiki.page-links-change","topic":"codfw.mediawiki.page-links-change","partition":0,"offset":203083041},"database":"arzwiki","page_id":1389768,"page_title":"روديوم","page_namespace":0,"page_is_redirect":false,"rev_id":5641431,"performer":{"user_text":"InternetArchiveBot","user_groups":["bot","*","user","autoconfirmed"],"user_is_bot":true,"user_id":142851,"user_registration_dt":"2020-12-18T16:05:11Z","user_edit_count":20253},"added_links":[{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_BNF","external":false},{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_GND","external":false},{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_LCCN","external":false},{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_LNB","external":false},{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_NDL","external":false},{"link":"/
wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:CS1_maint:_uses_authors_parameter","external":false},{"link":"/wiki/International_Standard_Book_Number","external":false},{"link":"/wiki/National_Library_of_Latvia","external":false},{"link":"/wiki/Oxford_University_Press","external":false},{"link":"/wiki/%25D9%2585%25D9%2583%25D8%25AA%25D8%25A8%25D8%25A9_%25D8%25A7%25D9%2584%25D9%258A%25D8%25A7%25D8%25A8%25D8%25A7%25D9%2586_%25D8%25A7%25D9%2584%25D9%2588%25D8%25B7%25D9%2586%25D9%258A%25D9%2587","external":false},{"link":"/wiki/%25D9%2585%25D9%2583%25D8%25AA%25D8%25A8%25D8%25A9_%25D9%2581%25D8%25B1%25D9%2586%25D8%25B3%25D8%25A7_%25D8%25A7%25D9%2584%25D9%2588%25D8%25B7%25D9%2586%25D9%258A%25D9%2587","external":false},{"link":"/wiki/%25D9%2585%25D9%2584%25D9%2581_%25D8%25A7%25D8%25B3%25D8%25AA%25D9%2586%25D8%25A7%25D8%25AF%25D9%2589_%25D9%2585%25D8%25AA%25D9%2583%25D8%25A7%25D9%2585%25D9%2584","external":false},{"link":"/wiki/%25D9%2586%25D9%2585%25D8%25B1%25D8%25A9_%25D8%25AA%25D8%25AD%25D9%2583%25D9%2585_%25D9%2585%25D9%2583%25D8%25AA%25D8%25A8%25D8%25A9_%25D8%25A7%25D9%2584%25D9%2583%25D9%2588%25D9%2586%25D8%25AC%25D8%25B1%25D8%25B3","external":false},{"link":"/wiki/Hamish_Hamilton_Ltd","external":false},{"link":"/wiki/%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581_%25D8%25A7%25D9%2584%25D8%25BA%25D8%25B1%25D8%25B6_%25D8%25A7%25D9%2584%25D8%25B1%25D9%2582%25D9%2585%25D9%2589","external":false},{"link":"/wiki/%25D9%2585%25D8%25B3%25D8%25A7%25D8%25B9%25D8%25AF%25D8%25A9:CS1_errors","external":false},{"link":"https://www.wikidata.org/wiki/Q1087","external":true},{"link":"https://commons.wikimedia.org/wiki/Category:Rhodium","external":true},{"link":"https://www.quora.com/topic/Rhodium-1","external":true},{"link":"https://www.google.com/search%3Fkgmid%3D/m/025scm0","external":true},{"link":"https://catalogue.bnf.fr/ark:/12148/cb12218903f","external":true},{"link":"https://academic.microsoft.com/v2/detail/521398313","external":true},{"link":"https://academic.microsoft.com
/v2/detail/2910290644","external":true},{"link":"https://id.loc.gov/authorities/sh85113755","external":true},{"link":"https://kopkatalogs.lv/F/%3Ffunc%3Ddirect%26local_base%3Dlnc10%26doc_number%3D000307942","external":true},{"link":"https://d-nb.info/gnd/4178038-3","external":true},{"link":"https://archive.org/details/naturesbuildingb0000emsl","external":true},{"link":"https://archive.org/details/elementsvisualex0000gray","external":true},{"link":"https://archive.org/details/periodictableits0000scer","external":true},{"link":"//doi.org/10.1351%252Fgoldbook","external":true},{"link":"//doi.org/10.1351%252Fgoldbook","external":true},{"link":"https://data.bnf.fr/ark:/12148/cb12218903f","external":true},{"link":"https://id.loc.gov/authorities/subjects/sh85113755","external":true},{"link":"https://kopkatalogs.lv/F%3Ffunc%3Ddirect%26local_base%3Dlnc10%26doc_number%3D000307942%26P_CON_LNG%3DENG","external":true},{"link":"https://id.ndl.go.jp/auth/ndlna/00569786","external":true}]}

No idea. Feel free to adjust for the right audience; I wasn't sure.

Hmmm very strange! @Pchelolo? Sounds like something is strange with the MW hook.

odimitrijevic moved this task from Incoming to Event Platform on the Analytics board.

Some observations off the top of my head:

  • If a link update (more specifically, a RefreshLinksJob) fails, it will be re-scheduled. That would cause the event to be re-sent (but for the same page and revision ID).
  • When a page gets re-parsed because a template was updated (e.g. by adding links to it), that triggers an event whose link updates have nothing to do with the current revision ID of that page. Attributing the links update to the edit identified by rev_id is very often wrong.
  • Adding links to a template will cause the same links to show up as added to all pages that use that template.
  • The links go to external "authority files" (BNF, GND, LCCN, LNB are all classification systems used by libraries). Templates for external identifiers are often fed from Wikidata, so an edit on Wikidata would cause the page to be re-parsed and a links-update event to be fired. However, these identifiers are usually specific to a single page, so seeing the same update for multiple pages is surprising. Is it really the *exact* same update, or does it just look similar because it's the same set of external identifiers, most of which stay the same, with just one of them updated?
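To tell these cases apart for two captured events, one could compare their page/revision keys and their link sets directly. This is a rough sketch, assuming the events are parsed dicts with the fields from the record above. The labels (`retry`, `identical-links`, `partial-overlap`, `disjoint`) are hypothetical names chosen for illustration and only suggest, not prove, which of the causes above applies.

```python
def link_set(event):
    """Collect added_links as a set of (link, external) pairs."""
    return {(l["link"], l["external"]) for l in event.get("added_links", [])}


def classify_pair(a, b):
    """Compare two links-change events.

    Returns (label, jaccard, differing_links), where differing_links holds
    the pairs present in exactly one of the two events.
    """
    same_revision = all(a[k] == b[k] for k in ("database", "page_id", "rev_id"))
    links_a, links_b = link_set(a), link_set(b)
    union = links_a | links_b
    jaccard = len(links_a & links_b) / len(union) if union else 1.0

    if links_a == links_b:
        # Same page and revision hints at a re-sent job; otherwise the same
        # links were reported for a different page or revision.
        label = "retry" if same_revision else "identical-links"
    elif jaccard > 0:
        label = "partial-overlap"
    else:
        label = "disjoint"
    return label, jaccard, links_a ^ links_b
```

A `partial-overlap` result with a small `differing_links` set would match the "looks the same, but one identifier changed" scenario, while `identical-links` across different pages would be the surprising case worth digging into.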