https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/501444/ seems to have changed the behaviour of the links update job to use the parser cache when a page has an entry.
As a result, when a Wikibase repo (Wikidata) has an edit that it dispatches to a client to update its pages, ContentAlterParserOutput will sometimes not run. That hook is currently what adds the wikibase_item page prop.
After a little more digging, it looks like perhaps the behaviour actually changed at the end of 2018 in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/465157/
At least in this patch, it looks as though refreshlinks requests the parser output from RevisionRenderer with generate-html set to the result of shouldCheckParserCache, which I believe would be false for a client page having a links update triggered by Wikidata dispatching.
This does not happen to updated pages that have no parser cache entry.
All pages can be fixed with a null edit.
I also assume when a parser cache entry expires the new entry will update the page props correctly (TODO verify / confirm?)
**Possible solutions**
- Change where this (and potentially other) updates happen.
- Allow the wikibase update process to ignore the parser cache in the job
- Something else?
**Bug report**
Title: page_props missing links for some Commons category <-> Wikidata sitelinks
Some Commons categories are linked to Wikidata, but are missing from the page_props table in the database. Examples:
https://commons.wikimedia.org/wiki/Category:Broadway_East,_Baltimore
https://commons.wikimedia.org/wiki/Category:Buddhist_temples_in_Lamphun_Province
https://commons.wikimedia.org/wiki/Category:Buddhist_temples_in_Ubon_Ratchathani_Province
https://commons.wikimedia.org/wiki/Category:Civil_law_notaries
https://commons.wikimedia.org/wiki/Category:Climate_change_conferences
https://commons.wikimedia.org/wiki/Category:Former_components_of_the_Dow_Jones_Industrial_Average
https://commons.wikimedia.org/wiki/Category:Dukes_of_the_Archipelago
https://commons.wikimedia.org/wiki/Category:Eastern_Catholic_orders_and_societies
https://commons.wikimedia.org/wiki/Category:English_people_of_Turkish_descent
Lucas confirmed this on Twitter https://twitter.com/LucasWerkmeistr/status/1175747434208727040 and there's discussion at https://www.wikidata.org/wiki/User_talk:Jheald#Quarry_oddity . It's not clear why this is happening - the examples I've given are from categories I've found when looking through commons links from enwp (in alphabetical order, hence B-E).
**Steps to reproduce by Adam:**
So this can still be reproduced on testwiki and testwikidata using the suggested reproduction path in T233520#5805154
- Create an item on https://test.wikidata.org/ with a single simple string statement - https://test.wikidata.org/w/index.php?title=Q214652&oldid=538230
- Create a page on https://test.wikipedia.org/ that includes some data from a statement on the Wikidata item `{{#property:P664|from=Q214652}}`
- The wikipedia page will be parsed and rendered with the data from wikidata
- Add a sitelink from the item to the page https://test.wikidata.org/w/index.php?title=Q214652&diff=538234&oldid=538230
- The page_props will not be updated for the page (you could check this via the API or on some of the dbs directly)
- Alter the statement that is being included on the page by editing wikidata
- Keep refreshing the page on Wikipedia and the data included on the page will be updated
- page_props will still not be updated
- Perform an edit on the page
- page_props are finally updated.
Thus I'm not convinced that the change in https://gerrit.wikimedia.org/r/c/574868 fixes the whole problem and we probably need to take another dive into this.
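
For the "check this via the API" part of the steps above, here is a minimal sketch of how the page prop could be checked, assuming the standard Action API `prop=pageprops` query with `formatversion=2`. The helper names (`pageprops_url`, `extract_prop`, `fetch_prop`) and the User-Agent string are made up for illustration and are not part of MediaWiki or Wikibase.

```python
import json
import urllib.parse
import urllib.request


def pageprops_url(api_base, title, prop="wikibase_item"):
    """Build an Action API URL asking for a single page prop of one page."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": prop,
        "titles": title,
        "format": "json",
        "formatversion": "2",  # pages come back as a list, not a dict keyed by id
    }
    return api_base + "?" + urllib.parse.urlencode(params)


def extract_prop(response, prop="wikibase_item"):
    """Return the prop value from a decoded API response, or None if absent."""
    for page in response.get("query", {}).get("pages", []):
        value = page.get("pageprops", {}).get(prop)
        if value is not None:
            return value
    return None


def fetch_prop(api_base, title, prop="wikibase_item"):
    """Query the live API (network access needed) and return the prop or None."""
    request = urllib.request.Request(
        pageprops_url(api_base, title, prop),
        # Hypothetical User-Agent; use something that identifies your own tool.
        headers={"User-Agent": "pageprops-check-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_prop(json.load(resp), prop)


# Example call (not run here, needs network):
# fetch_prop("https://commons.wikimedia.org/w/api.php", "Category:Civil_law_notaries")
```

A result of `None` for a page that does have a sitelink would match the symptom described in this task. Running the check before and after the final edit in the steps should show the prop appearing.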