Add a link engineering: clean up old entries in growthexperiments_link_recommendations
Closed, ResolvedPublic

Description

growthexperiments_link_recommendations caches the responses from the mwaddlink service. Responses for non-current revisions of pages are no longer useful and should be cleaned up to save storage space. That can be done via a DeferredUpdate when the page is edited. See also T268803#6861562.
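
A minimal sketch of the idea, in Python rather than the extension's PHP, and not the code from the actual patch: the cached row is dropped once the page has a newer revision, and the deletion is queued so it runs after the save completes. The class, method names, and in-memory dict are all hypothetical stand-ins for the real table and for MediaWiki's deferred-update mechanism.

```python
from collections import deque


class RecommendationCache:
    """Toy stand-in for the growthexperiments_link_recommendations table."""

    def __init__(self):
        # page_id -> (revision_id, recommendation payload)
        self.rows = {}
        # Callables to run after the "request" has finished (hypothetical
        # analogue of a deferred update queue).
        self.deferred = deque()

    def store(self, page_id, revision_id, recommendation):
        self.rows[page_id] = (revision_id, recommendation)

    def on_page_saved(self, page_id, new_revision_id):
        # Queue the cleanup instead of running it inline, so the page save
        # itself is not slowed down.
        self.deferred.append(
            lambda: self._delete_if_stale(page_id, new_revision_id)
        )

    def _delete_if_stale(self, page_id, current_revision_id):
        row = self.rows.get(page_id)
        # A row is stale when it was computed for a non-current revision.
        if row is not None and row[0] != current_revision_id:
            del self.rows[page_id]

    def run_deferred(self):
        while self.deferred:
            self.deferred.popleft()()


cache = RecommendationCache()
cache.store(page_id=1, revision_id=100, recommendation=["some link"])
cache.on_page_saved(page_id=1, new_revision_id=101)
cache.run_deferred()
```

After `run_deferred()` the row for page 1 is gone, because it was stored for revision 100 and the page is now at revision 101.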

Event Timeline

IIRC we were going to do this by listening for page edits and invalidating the cache and search index that way, or is the proposed cronjob handling a different concern?

We do that for the index but not for the cache. That might be a better option, I didn't want to impact page save performance but I guess in a deferred update it should be fine.

Yeah a deferred update seems like it would be fine.

kostajh renamed this task from Add a link engineering: cronjob to clean up old entries in growthexperiments_link_recommendations to Add a link engineering: clean up old entries in growthexperiments_link_recommendations.Mar 22 2021, 10:48 AM
kostajh updated the task description.
kostajh moved this task from Backlog to Post-release backlog on the Add-Link board.

Change 674394 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Delete link recommendation from DB when it becomes invalid

https://gerrit.wikimedia.org/r/674394

Tgr edited projects, added Growth-Team (Current Sprint); removed Growth-Team.
Tgr moved this task from Incoming to Code Review on the Growth-Team (Current Sprint) board.

To QA, you'd have to make an edit, then verify in the link recommendation DB table that the relevant row has been removed.

Change 674394 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Delete link recommendation from DB when it becomes invalid

https://gerrit.wikimedia.org/r/674394

@kostajh, @Tgr
(1) answering a single 'yes' or 'no' will remove an article entirely from the growthexperiments_link_recommendations table even though there are many suggested links on the article that were not reviewed.
(2) reverting add link edits will not reinstate the article as a suggested links article

As far as I remember (1) and (2) are the expected behavior - just double-checking.

(3) In betalabs, a suggested link article is removed from the growthexperiments_link_recommendations table but is still present in the SE module. When a user clicks on such an article, the warning is displayed: "Suggestions are no longer available on this article". Is it specific to the betalabs env?

It might be, the mechanism for updating the search index (which the results in the homepage module are based on) is completely different between beta and production. It's not a known issue though.

Thanks, I think it's specific to betalabs env.

Actually, that's only true for adding entries to the search index, not removing them. Still, any non-MediaWiki infrastructure (including the search index) is unreliable in beta, so it might be beta-specific. You should be able to verify on testwiki.

Regarding (1) and (2): yes, that's right.

The warning "Suggestions are no longer available on this article" is quite frequent on testwiki wmf.5. Unfortunately, I don't have access rights to production db to verify the issue the same way as I did for betalabs.

Got the Console error "Link suggestion not found for "Murray Gell-Mann"" on cswiki wmf.5.

It should be fixed after T282873: Add Link: Fix production discrepancies between the link recommendation table and the search index is done.

The issue that made that task necessary did not affect testwiki though, so maybe there is still some unknown bug with (de)indexing.
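
To make the kind of discrepancy discussed here concrete, a rough sketch (assuming the page IDs present in the table and in the search index can each be listed; the function and key names are hypothetical, not part of GrowthExperiments): pages that are indexed but have no row are the ones that would show the "Suggestions are no longer available on this article" warning.

```python
def find_discrepancies(table_page_ids, indexed_page_ids):
    """Compare page IDs in the recommendation table with those in the index."""
    table = set(table_page_ids)
    index = set(indexed_page_ids)
    return {
        # Offered to users as a task, but no recommendation row exists.
        "indexed_without_row": sorted(index - table),
        # Recommendation stored, but never surfaced through search.
        "row_without_index": sorted(table - index),
    }


report = find_discrepancies(table_page_ids=[1, 2, 3], indexed_page_ids=[2, 3, 4])
```

With these inputs, page 4 is indexed without a row and page 1 has a row without being indexed.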

Should we reopen this then? Make a new task?

I think this is working (or at least we have no evidence to the contrary). We have two somewhat similar issues: