
Add Link: Fix production discrepancies between the link recommendation table and the search index
Open, High · Public


As per T261407#7088136, a significant number of task creation events did not get processed (hopefully a one-time problem). When a task is stored in the DB, that disqualifies it from being generated again. The assumption is that there is an EventGate event somewhere in the pipeline. If we emitted another event, there might be a few hours between the first event arriving (and thus the task becoming available) and the second event arriving; if the task gets done or invalidated during that time, the second event would bring the index and DB out of sync.

That means all articles for which the EventGate event got lost are permanently disqualified from becoming tasks. (Or semi-permanently - an edit to the article will clear the DB record.) On wikis where the number of valid task candidates is not that large, this can become a problem. There should be some way to fix such pages, either automatically in refreshLinkRecommendations.php or manually in some maintenance script. (fixLinkRecommendationData.php would do it, but it's barred from running in production.)

Event Timeline

So basically we need to find the DB entries which do not match the search index, and either add them to the search index or delete them from the DB. The latter is already implemented in fixLinkRecommendationData.php but disabled in production (since some level of discrepancy between the DB and index is always expected, as DB writes are immediate and index writes take effect in a few hours, so trying to "fix" those would actually introduce a permanent discrepancy for those pages). The former is a lot faster (which might or might not be important, depending on how often this happens).
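To make the comparison concrete, here is a minimal Python sketch of the "find DB entries which do not match the search index" step. It is not the logic of fixLinkRecommendationData.php or refreshLinkRecommendations.php. `DbRecord`, `find_stale_db_records` and the 24-hour `grace_period` are hypothetical names and values chosen for illustration. The grace period is an assumption meant to skip records whose index write may simply not have landed yet, since index writes take effect in a few hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DbRecord:
    """Hypothetical stand-in for one row of the link recommendation table."""
    page_id: int
    stored_at: datetime


def find_stale_db_records(db_records, indexed_page_ids, now,
                          grace_period=timedelta(hours=24)):
    """Return DB records that are missing from the search index and are
    older than grace_period (an assumed cutoff, not a value from the task).

    Younger records are skipped because their index update may still be
    in flight; touching them could create the very discrepancy we want
    to avoid.
    """
    indexed = set(indexed_page_ids)
    return [
        record for record in db_records
        if record.page_id not in indexed
        and now - record.stored_at > grace_period
    ]


# Usage example with made-up data.
now = datetime(2021, 6, 7, 12, 0)
records = [
    DbRecord(1, now - timedelta(days=3)),   # old and indexed: in sync
    DbRecord(2, now - timedelta(days=3)),   # old, not indexed: event likely lost
    DbRecord(3, now - timedelta(hours=1)),  # recent, not indexed: probably in flight
]
stale = find_stale_db_records(records, indexed_page_ids={1}, now=now)
print([record.page_id for record in stale])  # prints [2]
```

The records returned this way are the candidates for either fix: re-adding them to the search index or deleting them from the DB.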

kostajh triaged this task as Medium priority. May 19 2021, 8:09 AM
MMiller_WMF raised the priority of this task from Medium to Needs Triage. Mon, Jun 7, 5:27 AM
MMiller_WMF triaged this task as Medium priority.
MMiller_WMF raised the priority of this task from Medium to High. Mon, Jun 7, 5:11 PM
kostajh added a subscriber: kostajh.

@Tgr tentatively assigning to you.