Newcomer tasks: Use search to revalidate cached tasks
Closed, Resolved · Public

Description

We use a search query to generate a set of tasks, cache it for up to a week, and revalidate it when the user next asks for it. Currently, revalidation means checking whether the templates that were used to define those tasks (such as Template:Copyedit for copyediting tasks) are still present. In the future, as the search queries get more complex, we might need to revalidate many more things (whether the page is protected, whether it is in a bad category such as articles nominated for deletion, whether it has recommendations, etc.). Each of these requires complex backend logic. It would be more sustainable to revalidate by re-running the same search query and checking whether the tasks still match it. This is doable if we create a search keyword that restricts the search to the pages in that task set.
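The idea can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the `revalidate` function, the `make_fake_search` helper, and the `hastemplate:` / `pageid:` query syntax (pipe-separated IDs) are assumptions made for the example, and the search backend is replaced by an in-memory fake.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical stand-in for a search backend: takes a query string and
# returns the IDs of the pages that match it.
SearchFn = Callable[[str], Set[int]]


def revalidate(cached_page_ids: Iterable[int], task_query: str,
               search: SearchFn) -> List[int]:
    """Return the cached page IDs that still match the task query.

    The original cached order is preserved.
    """
    cached = list(cached_page_ids)
    if not cached:
        return []
    # Assumed keyword syntax: restrict the search to the cached pages only.
    restriction = "pageid:" + "|".join(str(page_id) for page_id in cached)
    still_matching = search(f"{task_query} {restriction}")
    return [page_id for page_id in cached if page_id in still_matching]


def make_fake_search(index: Dict[int, Set[str]]) -> SearchFn:
    """Build an in-memory fake search over {page_id: set of template names}."""
    def search(query: str) -> Set[int]:
        template = None
        allowed = None
        for term in query.split():
            key, _, value = term.partition(":")
            if key == "hastemplate":
                template = value
            elif key == "pageid":
                allowed = {int(v) for v in value.split("|")}
        return {
            page_id for page_id, templates in index.items()
            if (template is None or template in templates)
            and (allowed is None or page_id in allowed)
        }
    return search


# Page 2 lost its template after the task set was cached.
index = {1: {"Copyedit"}, 2: set(), 3: {"Copyedit"}, 4: {"Copyedit"}}
fake_search = make_fake_search(index)
valid = revalidate([3, 1, 2], "hastemplate:Copyedit", fake_search)
```

Because the same query defines and revalidates the task set, any condition added to the query later is revalidated without extra backend logic.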

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 646896 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/CirrusSearch@master] Add PageIdFeature

https://gerrit.wikimedia.org/r/646896

Change 645788 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Replace TemplateFilter with TaskSuggester::filter

https://gerrit.wikimedia.org/r/645788

Change 646896 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add PageIdFeature

https://gerrit.wikimedia.org/r/646896

Change 655376 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Make TaskSuggester::suggest() options easier to expand

https://gerrit.wikimedia.org/r/655376

Change 655377 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Skip topics when revalidating

https://gerrit.wikimedia.org/r/655377

Change 655376 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Make TaskSuggester::suggest() options easier to expand

https://gerrit.wikimedia.org/r/655376

Change 645788 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Replace TemplateFilter with TaskSuggester::filter

https://gerrit.wikimedia.org/r/645788

Change 655377 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Skip topics when revalidating

https://gerrit.wikimedia.org/r/655377

I tested this on test.wikipedia.org by:

  • Going to Special:Homepage and limiting my filters to something that returned a manageable number of tasks (I chose Architecture + References, which yielded 13 tasks).
  • Picking one of the tasks and removing its {{Citation Needed}} templates.
  • Going back to the homepage, where the server-side load of the page shows 13 tasks (since we fetch from the cache without any filtering), while the subsequent call to the API returns 12 tasks (filtering applied).

Performance appears to be fine with this solution. The only downside is that there is a rare possibility that the first task (and only the first one) might have had its maintenance templates removed after the user's task set was cached. I think we can live with this.

Currently we set TTL_UNCACHEABLE but we could also set the TTL based on whether revalidation resulted in removing any tasks, so the server-side rendered results would only be incorrect for the first time.
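A rough sketch of that TTL choice, under stated assumptions: the constants below are stand-ins rather than values taken from the caching library, `ttl_after_revalidation` is a hypothetical helper, and revalidation is assumed to only ever remove tasks.

```python
# Stand-in constants for illustration; the real values are defined by the
# caching layer and are assumptions here.
TTL_UNCACHEABLE = -1
TTL_WEEK = 7 * 24 * 60 * 60


def ttl_after_revalidation(cached_ids, revalidated_ids,
                           ttl_if_changed=TTL_WEEK):
    """Pick a TTL for the revalidated task set.

    If revalidation removed at least one task, return a real TTL so the
    cleaned-up set gets written back to the cache. Otherwise return
    TTL_UNCACHEABLE so the existing cache entry is left alone.
    """
    removed_any = len(revalidated_ids) < len(cached_ids)
    return ttl_if_changed if removed_any else TTL_UNCACHEABLE
```

Which TTL to use when writing back (a full week, or only the remaining lifetime of the original entry) is left open here, hence the parameter.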

👍 sounds good to me

Etonkovidova subscribed.

I have the same observations as in @kostajh's comment regarding performance and the overall user experience.

For now, the potential task change after JS has loaded is behavior that exists regardless of this task, due to protection filtering happening outside of search. Once that's fixed and there is no need to provide a buffer (i.e. 250 results when we are really just looking for 200), we could change SearchTaskSuggester to stop making queries as soon as it has hit the limit. A search with a limit of 1 would then usually take a single query, at which point IMO we could revisit this.
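The early-exit behavior could look roughly like this. It is a sketch only: `collect_tasks`, `run_query`, and the query names are hypothetical and do not reflect how SearchTaskSuggester is actually structured.

```python
from typing import Callable, Iterable, List, Tuple


def collect_tasks(queries: Iterable[str],
                  run_query: Callable[[str, int], List[int]],
                  limit: int) -> Tuple[List[int], int]:
    """Run queries in order and stop as soon as `limit` tasks are collected.

    Returns the collected page IDs and the number of queries actually made.
    """
    tasks: List[int] = []
    seen = set()
    queries_made = 0
    for query in queries:
        if len(tasks) >= limit:
            break  # limit reached: skip the remaining queries entirely
        queries_made += 1
        # Ask only for as many results as are still needed (no buffer).
        for page_id in run_query(query, limit - len(tasks)):
            if page_id not in seen:
                seen.add(page_id)
                tasks.append(page_id)
                if len(tasks) >= limit:
                    break
    return tasks, queries_made


# Hypothetical per-query results, keyed by query name.
results = {"copyedit": [1, 2], "references": [3], "links": [4]}


def fake_run_query(query: str, size: int) -> List[int]:
    return results[query][:size]


one_task, one_task_queries = collect_tasks(results, fake_run_query, limit=1)
three_tasks, three_task_queries = collect_tasks(results, fake_run_query, limit=3)
```

With a limit of 1, the loop stops after the first query that returns a result, which is the single-query case described above.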