The way the field containing the number of incoming links is populated today has multiple caveats that make it hard to port as-is to a new search update pipeline.
The main issue is that CirrusSearch uses its own index to extract this number:
- a page X is edited to add a link to page Y and remove a link to page Z (known from the MW LinksUpdateComplete hook)
- two jobs are scheduled with a delay: one to re-compute the number of incoming_links to page Y and another one for page Z
- the job to update page X is assumed to run before the jobs for pages Y & Z
- the elasticsearch index is assumed to be refreshed before the jobs for Y & Z run their count(outgoing_links:Y) and count(outgoing_links:Z) queries against elasticsearch
- delaying via changeprop is done by re-submitting a kafka message
- everything assumes that updating page X went well and that the index was refreshed within the given delay
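
To illustrate why these ordering assumptions are fragile, here is a minimal toy model of the steps above. It does not use CirrusSearch or elasticsearch code: `ToyIndex` and its methods are hypothetical names, and the only behaviour modelled is that writes become visible to queries after a refresh.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for an index where writes only
    become visible to queries after a refresh."""

    def __init__(self):
        self._visible = {}  # page -> set of outgoing links, seen by queries
        self._pending = {}  # writes accepted but not yet refreshed

    def write(self, page, outgoing_links):
        self._pending[page] = set(outgoing_links)

    def refresh(self):
        self._visible.update(self._pending)
        self._pending.clear()

    def count_incoming(self, target):
        # Counts visible pages whose outgoing links contain the target.
        return sum(1 for links in self._visible.values() if target in links)


index = ToyIndex()
index.write("X", {"Z"})  # initial state: X links to Z
index.refresh()

# X is edited: a link to Y is added and the link to Z is removed.
index.write("X", {"Y"})

# The delayed jobs for Y and Z fire before the index was refreshed.
stale_y = index.count_incoming("Y")
stale_z = index.count_incoming("Z")

# The same queries after the refresh return the expected counts.
index.refresh()
fresh_y = index.count_incoming("Y")
fresh_z = index.count_incoming("Z")
```

In this sketch the counts computed before the refresh are wrong for both Y and Z, and nothing schedules a later correction.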
Replicating this technique in the new search update pipeline does not seem wise.
Given that the number of incoming_links is mainly a relevance signal, knowing the value in real-time does not seem to be a strong requirement.
We could investigate whether this field can be updated from a batch job, similar to how we update the popularity_score field. This would let us better evaluate how the new search update pipeline should approach this field.
- investigate possible ways to compute and refresh the number of incoming links from a batch job
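
As a starting point for that investigation, a batch computation could look roughly like the sketch below. It assumes the job has access to a full list of (source page, target page) link pairs, for example from a dump of the links table; the function name and the input shape are hypothetical and not an existing API.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_incoming_links(link_pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Return the number of distinct pages linking to each target page.

    Assumption: link_pairs yields (source_page, target_page) tuples and
    may contain duplicates, which are counted once.
    """
    distinct_pairs = set(link_pairs)
    return dict(Counter(target for _source, target in distinct_pairs))


# Hypothetical input: X and W both link to Y (X twice), W also links to Z.
links = [("X", "Y"), ("W", "Y"), ("X", "Y"), ("W", "Z")]
incoming = count_incoming_links(links)
```

Because the counts are derived from a consistent snapshot of the links, this approach does not depend on job ordering or on index refresh timing, at the cost of the value being only as fresh as the last batch run.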