Page MenuHomePhabricator

Consider removing link recommendation from search index in PageSaveComplete hook instead of SearchDataForIndex
Closed, DeclinedPublic

Description

Problem:

  • I opened a link recommendation task from test.wikipedia.org and in an incognito browser made an edit to that same article.
  • Then I refreshed the article in my first browser tab. I’d expect to see “No suggestions found” but instead I’m presented with the AddLink plugin UX.

Reason:

  • SearchDataForIndex is triggered via job queue processing. If that lags, then there is a delay in removing the link recommendation row from gelr_recommendations as well as removing the weighted tag from the search index.

Should we use PageSaveComplete hook instead?

Event Timeline

kostajh created this task.

Change 715235 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] AddLinkSubmissionHandler: Use getByLinkTarget and add logging

https://gerrit.wikimedia.org/r/715235

kostajh renamed this task from Fix invalidation logic in PostSave hook / AddLinkSubmissionHandler::run() code to Consider removing link recommendation from search index in PageSaveComplete hook instead of SearchDataForIndex.Aug 30 2021, 10:09 PM
kostajh lowered the priority of this task from High to Medium.
kostajh removed a project: Patch-For-Review.
kostajh updated the task description. (Show Details)
kostajh moved this task from Code Review to Incoming on the Growth-Team (Current Sprint) board.

We could potentially make use of a cache with a TTL of something like 6 hours (long enough to ensure that the job queue had a chance to run and that the search index could have been updated from the onSearchDataForIndex hook). We would add page IDs to the cache when an article is edited and it has link recommendations associated with it; then 1) in BeforePageDisplayHook or in the recommendation provider, we would ensure that we don't include page titles from that cached list, which would prevent the Add Link UX from loading in the scenario described in the description of this phab task and 2) SearchTaskSuggester::filter() could exclude items from the cached list, to prevent those articles from appearing in the scenario described in T290315: Add Link: Ensure user does not see invalid tasks in task queue

Discovery-Search team, we currently have some logic in HomepageHooks::onSearchDataForIndex where we do:

$fields[WeightedTagsHooks::FIELD_NAME][] = LinkRecommendationTaskTypeHandler::WEIGHTED_TAG_PREFIX
				. '/' . CirrusIndexField::MULTILIST_DELETE_GROUPING;
			try {
				$this->linkRecommendationHelper->deleteLinkRecommendation(
					$page->getTitle()->toPageIdentity(), false );
			} catch ( DBReadOnlyError $e ) {
				// Leaving a dangling DB row behind doesn't cause any problems so just ignore this.
			}

The idea is that we remove a cached link recommendation entry from the local database and also tell ElasticSearch to update the article's hasrecommendation field in the search index. We want this to happen as soon as possible after any type of edit to an article containing link recommendations, since those articles should not be presented to the user (the link recommendations might be invalidated by the edit made to the article).

As seen in the task description, there can be a considerable lag after an edit happens to the article and before the search index is updated, and before the cached item is dropped from the database.

So, we were wondering if it would be problematic to issue deletes to CirrusSearch from the PageSaveComplete hook? Example code we use elsewhere:

$cirrusSearch = ( $this->cirrusSearchFactory )();
			$cirrusSearch->resetWeightedTags( $pageIdentity,
				LinkRecommendationTaskTypeHandler::WEIGHTED_TAG_PREFIX );

Would that result in faster updates to the search index or would there still be some lag as those updates are processed?

kostajh claimed this task.

Overall I don't think it would make a difference which hook is used; processing the updates for weighted tags https://wikitech.wikimedia.org/wiki/Search/WeightedTags is not going to be realtime. I will look at other options in T290315: Add Link: Ensure user does not see invalid tasks in task queue