WeightedTags can be fed by different producers through a dedicated stream.
Some producers might be MediaWiki itself. For such producers we could offer the ability to refresh tags (this corresponds to the "oldDocument" semantic in the Saneitizer). This would ensure that the tags of a page are recomputed once in a while, even when no source event has triggered a change to them.
A concrete example is the PageAssessments extension and T378868 (Allow searching articles by WikiProject), which made use of the search weighted tags.
The initial approach was to send a weighted tag update on every LinksUpdate, even if there were no changes in the underlying source data of the weighted tags. This had the advantage of populating the search index, but the main drawback is that we now emit events that lead to NOOPs.
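To illustrate the alternative (only emitting actual changes), here is a minimal sketch, assuming the producer can compare the tags it previously computed for a page with the ones it computes now. The function name `tags_update_event`, the `Tags` alias and the event dictionary shape are hypothetical and do not reflect the real weighted_tags stream schema.

```python
from typing import Dict, Optional

# Hypothetical representation of a page's tags under one prefix: tag name -> weight.
Tags = Dict[str, int]


def tags_update_event(page_id: int, prefix: str, previous: Tags, current: Tags) -> Optional[dict]:
    """Return an update event only when the tag set actually changed.

    The event shape is a hypothetical illustration, not the real
    weighted_tags stream schema.
    """
    if previous == current:
        # Nothing changed: emitting an event here would only produce a NOOP downstream.
        return None
    if not current:
        # Every tag under this prefix disappeared: ask for the prefix to be cleared.
        return {"page_id": page_id, "prefix": prefix, "clear": True}
    return {
        "page_id": page_id,
        "prefix": prefix,
        # Sorted by tag name so the payload is deterministic.
        "set": [{"tag": t, "score": current[t]} for t in sorted(current)],
    }
```

For example, `tags_update_event(1, "p", {"A": 1}, {"A": 1})` returns `None`, so an unchanged page produces no event at all.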
If we exposed a hook triggered when the Saneitizer finds an "oldDocument", we could optimize such producers by letting them send tag updates only when the data actually changes, while still allowing them to fix the search index at the rate the Saneitizer runs. This would hopefully lead to a meaningful decrease in the number of events.
AC:
- A hook is exposed to let weighted_tags producers compute the set of weighted_tags of a particular page
- The Saneitizer would call this hook when fixing and/or refreshing an old document
- The SUP consumer is updated to understand the new response of the Saneitizer when used from its API endpoints (we need to figure out where the weighted_tags fit in this response)
- The PageAssessments extension is updated to take advantage of this new hook
- The PageAssessments extension is updated to only emit actual changes on LinksUpdate: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageAssessments/+/1088592
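The proposed flow can be sketched roughly as follows. This is a language-agnostic illustration in Python, not MediaWiki code: `WeightedTagsHook`, `TagProducer`, `saneitize` and the shape of the returned fix are all hypothetical stand-ins for the hook and the Saneitizer response that this task still has to define.

```python
from typing import Callable, Dict, List

# Hypothetical: tag name -> weight, grouped by tag prefix.
Tags = Dict[str, int]
# Hypothetical hook signature: given a page id, return {prefix: tags}.
TagProducer = Callable[[int], Dict[str, Tags]]


class WeightedTagsHook:
    """Hypothetical registry standing in for a MediaWiki hook."""

    def __init__(self) -> None:
        self._producers: List[TagProducer] = []

    def register(self, producer: TagProducer) -> None:
        self._producers.append(producer)

    def compute(self, page_id: int) -> Dict[str, Tags]:
        # Merge every producer's contribution, keyed by tag prefix.
        result: Dict[str, Tags] = {}
        for producer in self._producers:
            for prefix, tags in producer(page_id).items():
                result.setdefault(prefix, {}).update(tags)
        return result


def saneitize(page_id: int, is_old_document: bool, hook: WeightedTagsHook) -> dict:
    """Sketch of a Saneitizer check that also refreshes the weighted tags of old documents."""
    fix: dict = {"page_id": page_id, "old_document": is_old_document}
    if is_old_document:
        # Producers recompute their tags from their own data, even without a source event.
        fix["weighted_tags"] = hook.compute(page_id)
    return fix
```

A producer such as PageAssessments would register once, for example `hook.register(lambda page_id: {"example.project": {"Physics": 100}})` (hypothetical prefix and data); `saneitize(42, True, hook)` then returns `{'page_id': 42, 'old_document': True, 'weighted_tags': {'example.project': {'Physics': 100}}}`, while pages that are not old documents carry no `weighted_tags` entry.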