
Tracking implicit (extensions) usages of Wikidata
Open, Needs Triage, Public


Because the mobile version of search shows the item's description alongside the page title, we should always track the description as used by the page. Even if the page doesn't access the description explicitly with Lua, the mobile version does. @Doc_James raised the issue because there have been cases of vandalism in descriptions, and they are hard to track.

Show changes to Wikidata descriptions associated with pages in recentchanges (RC injection), as we already do for other Wikidata changes that affect a page. We would also like this kind of "usage" to be visible on ActionInfo, like other Wikidata usages, though this is far less critical than the RC injection.

The current concept of "usage" is that the data has a (potential) impact on the page's ParserOutput. As currently implemented, this is not true for the mobile version's "usage" of descriptions.
Similarly, we presently assume that when some information "used" on a page changes, the page needs to be re-rendered, in addition to the change being shown in recentchanges and the page being purged from the CDN. For information pulled in by skin-layer code rather than during parsing, re-rendering is not necessary, but RC injection and a CDN purge are still needed.
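The distinction above can be sketched as a small decision function. This is a minimal Python illustration, not Wikibase code (which is PHP): `UsageKind`, `ChangeActions` and `actions_for` are hypothetical names, and the sketch assumes each usage can be tagged as either parser-level or skin-level.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import AbstractSet


class UsageKind(Enum):
    """Hypothetical split of how a page depends on an entity aspect."""

    PARSER = auto()  # affects the ParserOutput, e.g. read via Lua
    SKIN = auto()    # pulled in at view time by skin/extension code, e.g. MobileFrontend


@dataclass(frozen=True)
class ChangeActions:
    """Follow-up work on the client wiki when a used aspect changes."""

    reparse: bool
    inject_rc: bool
    purge_cdn: bool


def actions_for(kinds: AbstractSet[UsageKind]) -> ChangeActions:
    """Combine the actions required by all kinds of usage a page has of one aspect."""
    if not kinds:
        # No usage at all: the change does not concern this page.
        return ChangeActions(reparse=False, inject_rc=False, purge_cdn=False)
    return ChangeActions(
        # Only data that went through the parser requires re-rendering.
        reparse=UsageKind.PARSER in kinds,
        # Any visible usage warrants an RC entry and a CDN purge.
        inject_rc=True,
        purge_cdn=True,
    )


# A description shown only by the mobile skin: no re-parse, but RC injection and purge.
print(actions_for({UsageKind.SKIN}))
# ChangeActions(reparse=False, inject_rc=True, purge_cdn=True)
```

Under this assumption, a page that uses the same aspect both in content and via the skin still gets a single re-parse, since the actions are merged per page.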

There are two ways to tackle this:

  1. Using a materialized usage in the wbc_entity_usage table, and putting the description (or other derived information) into the page_props table. This would be done via a hook triggered while (or after) parsing the page content. MobileFrontend would then take the description from page_props. We already do something similar with displaytitle and page_image. This would trigger a re-render of the page even when only the page properties need to be updated. To avoid this, we'd need to be able to distinguish between proper in-content usage and usage only for page_props, which would probably require a schema change to wbc_entity_usage.
  2. Using the notion of a "virtual" usage, with a hook in AffectedPagesFinder that extensions can use to indicate that they consider the description of a given item to be "affecting" some page, even though that relationship is not in the database. Allowing extensions to specify whether they want to trigger (or avoid) a re-parse would still require some refactoring, but no schema change. However, virtual usages would not work seamlessly: additional hooks would be needed for proper integration with ActionInfo and Special:EntityUsage. Maintaining subscriptions would not be an issue, though, as long as the virtual usage concerns an aspect of the page's "connected" item, since the wiki is subscribed to those items anyway.
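Approach 2 could look roughly like the following Python sketch, which merges materialized usages with extension-supplied virtual usages when computing affected pages. It is not the Wikibase API: `AffectedPagesFinderSketch`, `register_virtual_usage_handler`, `mobile_description_handler`, the sample data, and the aspect string `"D.en"` for an English description are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical hook signature: (entity ID, aspect) -> titles of pages the
# extension considers affected, although no usage row exists for them.
VirtualUsageHandler = Callable[[str, str], Iterable[str]]


@dataclass
class AffectedPagesFinderSketch:
    """Toy stand-in for AffectedPagesFinder with a virtual-usage hook."""

    # Materialized usages, standing in for rows of wbc_entity_usage:
    # (entity ID, aspect) -> page titles.
    tracked: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    handlers: List[VirtualUsageHandler] = field(default_factory=list)

    def register_virtual_usage_handler(self, handler: VirtualUsageHandler) -> None:
        self.handlers.append(handler)

    def affected_pages(self, entity_id: str, aspect: str) -> Set[str]:
        # Start with what the usage table records...
        pages = set(self.tracked.get((entity_id, aspect), set()))
        # ...then add whatever the registered extensions claim.
        for handler in self.handlers:
            pages.update(handler(entity_id, aspect))
        return pages


# Hypothetical sitelinks of this client wiki: item ID -> connected page.
SITELINKS = {"Q42": "Douglas_Adams"}
CONTENT_LANGUAGE = "en"


def mobile_description_handler(entity_id: str, aspect: str) -> List[str]:
    """Claim the connected page when its item's description changes."""
    # Only the connected item is considered, so the wiki is already
    # subscribed to the entity and no extra subscription bookkeeping is needed.
    if aspect == f"D.{CONTENT_LANGUAGE}" and entity_id in SITELINKS:
        return [SITELINKS[entity_id]]
    return []


finder = AffectedPagesFinderSketch(tracked={("Q42", "L.en"): {"Hitchhiker"}})
finder.register_virtual_usage_handler(mobile_description_handler)
print(finder.affected_pages("Q42", "D.en"))  # {'Douglas_Adams'}
```

The sketch leaves out the refactoring needed to let a handler opt in or out of a re-parse; one option would be for handlers to return a flag alongside each page title.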

This is potentially relevant for the following extensions:

  • WikidataPageBanner (statement)
  • In Other Projects Sidebar (sitelinks)
  • MobileFrontend (descriptions)
  • PageImages (statement)
  • GeoData (statement)

Event Timeline

Thanks. Looking forwards to this being fixed.

matej_suchanek renamed this task from "Tracking implict (extensions) usages of WIkidata" to "Tracking implicit (extensions) usages of Wikidata" on Sep 5 2017, 5:29 PM.

Other "implicit usages" include, for instance, the WPBBannerProperty setting of Wikidata-Page-Banner, $wgWikimediaBadgesCommonsCategoryProperty of WikimediaBadges, and possibly T35704.

I've just added three possible implementations for this above.

Currently I favor the second approach (allow extensions to register new usages via a hook which will be saved into wbc_entity_usage). This is (mostly) in line with the current infrastructure and should be the most flexible way to handle this.

I might miss something, but I have a fundamental question: How is "showing up in a search result" a "usage"? We track usages to be able to notify the pages that contain these usages when something changed. How do you notify a specific search result page? Are the search results cached? Is this about purging such caches?

If this is not about any cache but literally about "showing up in a search result", then any tracking is unnecessary. Any description might show up in a search result at any time, depending on what users are searching for. In other words: you must consider all descriptions being used by a search feature. Tracking them individually is pointless.

> you must consider all descriptions being used by a search feature. Tracking them individually is pointless.

> I might miss something, but I have a fundamental question: How is "showing up in a search result" a "usage"?

It is a usage for recentchanges. For example, if someone intentionally vandalizes a description (or a bot unintentionally adds a wrong one), we would like the change to appear in recentchanges on the client wiki that uses this description, so that users in that community can patrol it.

daniel added a subscriber: daniel.

I updated the task description with more detailed information. I dropped the "introduce API for registering usages" option. That seems really scary; I'd want a much clearer idea of how it would work before we consider it a viable option.