Review
There are four LinksUpdate hooks, all of which pass a LinksUpdate object, which is a kitchen sink object with a lot of public properties and methods. I reviewed the extension handlers of these hooks. The aim is to make it easier for extensions to do common things, and to narrow the interfaces so as to make it easier to improve LinksUpdate.
Some extensions are only using LinksUpdate hooks as a convenient place to do an unrelated cache purge. GlobalUserPage, PageTriage, UploadWizard (and fork MediaUploader), PhpTagsWiki, SharedHelpPages and MarkImages only need the title and possibly the current revision. They could use the RevisionDataUpdates hook instead, available since 1.35. That hook is not called on delete, so some of them would have to also hook PageDeletionDataUpdates.
Some extensions are adding a link-like table, usually with their own implementation of incremental updates, specifically Babel, GeoData, GlobalUsage and PageAssessments. They all have some quirk which makes them different to core link tables, but they could probably still benefit from the link table abstraction that I'm developing.
Some extensions use the incremental data from LinksUpdate as input to their own data updates, specifically CirrusSearch, Disambiguator, EventBus and Echo. They could receive a narrower, read-only view into the generated data. They all use LinksUpdateComplete except for Echo, which uses LinksUpdateAfterInsert.
LinksUpdateAfterInsert is only used by the Echo extension. It needs a list of added links, but it could apparently use getAddedLinks() in LinksUpdateComplete instead, like CirrusSearch and EventBus.
LinksUpdateConstructed is only used by SemanticMediaWiki. It updates secondary data based on the ParserOutput. It could use the RevisionDataUpdates hook instead, except that it uses the recursive flag, which is not passed down to the hook despite being an argument to DerivedPageDataUpdater::getSecondaryDataUpdates(). It could apparently also use LinksUpdate or LinksUpdateComplete. It is unclear to me why it is using this unusual hook.
PageImages uses the LinksUpdate hook to modify the page properties before save. It loads the revision text from the database, parses the lead section, extracts extension data from the resulting ParserOutput, and then saves data derived from this operation back to $linksUpdate->mProperties as if it came from the main page parse. This seems inelegant. I think this could use a late parser hook instead, like ParserAfterTidy. It could discover the section of an image by looking for the image tags in the HTML and matching them up by name with previously stored candidates. In Parsoid this would be a DOM pass.
Translate is similarly using the LinksUpdate hook for last-minute modification of the data extracted from ParserOutput. It wipes $linksUpdate->mCategories depending on the page title. I think this could also use a late parser hook. There is ParserOutput::setCategories().
Proposal
- Deprecate LinksUpdateAfterInsert, instead use LinksUpdateComplete
- Deprecate LinksUpdateConstructed, instead use LinksUpdate, LinksUpdateComplete or RevisionDataUpdates
- For unrelated secondary data updates, use RevisionDataUpdates and PageDeletionDataUpdates
- Migrate PageImages and Translate to use a late parser hook instead.
- Make all LinksUpdate properties be private or protected.
- Wrap up incremental table changes derived from LinksUpdate into a narrower object and provide it to a new hook, for CirrusSearch etc.