Rewrite fixLinkRecommendationData.php to be able to process more than 10K articles per topic
Open, Needs Triage, Public

Description

Currently, the fixLinkRecommendationData.php script can process at most 10K articles per topic. This is because the standard pagination offered by Elastica (docs) is limited to 10K results per query. Since fixLinkRecommendationData.php is topic-oriented, the limit translates to 10K articles per topic.

Elastica also offers search_after, which can iterate through the whole result set (it is explicitly recommended as the alternative for those who need more than 10K results). CirrusSearch exposes this as the SearchAfter iterator, but the abstraction in MediaWiki core is unaware of this difference. Since AddLink has a hard dependency on CirrusSearch anyway, using SearchAfter directly shouldn't be a problem. That would allow us to process any number of articles CirrusSearch returns, overcoming the 10K limitation.
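To make the difference concrete, here is a minimal, language-agnostic sketch in Python that simulates both pagination styles against an in-memory index. It is not the CirrusSearch SearchAfter API or the Elastica client: `FakeIndex`, `iterate_from_size`, `iterate_search_after`, and the `(page_id, title)` document shape are all hypothetical names made up for illustration. The only assumption carried over from the real backend is that offset-based paging rejects any request whose offset plus size exceeds 10K, while search_after resumes from the sort key of the last document seen.

```python
from bisect import bisect_right
from typing import Iterator, List, Optional, Tuple

MAX_RESULT_WINDOW = 10_000  # mirrors the 10K cap described above

Doc = Tuple[int, str]  # hypothetical document shape: (page_id, title)


class FakeIndex:
    """In-memory stand-in for a search index, sorted by page_id (hypothetical)."""

    def __init__(self, docs: List[Doc]) -> None:
        self._docs = sorted(docs)
        self._ids = [page_id for page_id, _ in self._docs]

    def search_from_size(self, offset: int, size: int) -> List[Doc]:
        # Offset-based paging: refuses to look past the result window.
        if offset + size > MAX_RESULT_WINDOW:
            raise ValueError("offset + size exceeds the result window")
        return self._docs[offset:offset + size]

    def search_after(self, after: Optional[int], size: int) -> List[Doc]:
        # Cursor-based paging: return the next `size` docs whose sort key
        # is strictly greater than `after` (None means start from the top).
        start = 0 if after is None else bisect_right(self._ids, after)
        return self._docs[start:start + size]


def iterate_from_size(index: FakeIndex, batch_size: int) -> Iterator[Doc]:
    """Yield documents via offset paging; stops at the 10K window."""
    offset = 0
    while offset + batch_size <= MAX_RESULT_WINDOW:
        batch = index.search_from_size(offset, batch_size)
        if not batch:
            return
        yield from batch
        offset += batch_size


def iterate_search_after(index: FakeIndex, batch_size: int) -> Iterator[Doc]:
    """Yield every document by resuming after the last sort key seen."""
    after: Optional[int] = None
    while True:
        batch = index.search_after(after, batch_size)
        if not batch:
            return
        yield from batch
        after = batch[-1][0]  # page_id of the last document in the batch


if __name__ == "__main__":
    index = FakeIndex([(i, f"Article {i}") for i in range(1, 25_001)])
    print(sum(1 for _ in iterate_from_size(index, 500)))
    print(sum(1 for _ in iterate_search_after(index, 500)))
```

In this simulation the offset-based loop stops once it reaches the window, whereas the cursor-based loop walks the whole index. Note that a real search_after query needs a deterministic sort with a unique tie-breaker (the sketch uses the page ID for that), otherwise documents can be skipped or repeated between batches.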

While at it, we might also want to simplify fixLinkRecommendationData.php so that it is no longer topic-oriented. Iterating through all hasrecommendation:link results would likely result in cleaner code that is easier to read. It would also make it easier to debug cases where the script leaves dangling records behind.