
Newcomer tasks: task suggestions fail because of search queries exceeding length limits
Closed, Resolved · Public

Description

CirrusSearch limits the length of search queries (currently to 300 on Wikimedia wikis) to avoid expensive searches, which are typically made by bots and are pointless or abusive. Task suggestion searches take the form hastemplate:<list of templates defining a task type> morelikethis:<list of 5 pages defining a topic, combined for up to 20 topics selected by the user>, so they can easily exceed that limit, and the user then gets an error.
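To see why the combined query overflows, here is a minimal Python sketch that assembles a search string in the shape described above and compares its length with the 300 limit. The exact quoting, the `|` separator, the helper names `build_query` and `exceeds_limit`, and all template and page titles are illustrative assumptions, not the GrowthExperiments implementation.

```python
# Hypothetical sketch: the limit value comes from the task description,
# everything else (syntax details, names, titles) is assumed for illustration.
MAX_QUERY_LENGTH = 300


def build_query(templates, pages):
    """Build a hastemplate/morelikethis search string (syntax assumed)."""
    return 'hastemplate:"{}" morelikethis:"{}"'.format(
        "|".join(templates), "|".join(pages)
    )


def exceeds_limit(query, limit=MAX_QUERY_LENGTH):
    """True if the query string is longer than the search length limit."""
    return len(query) > limit


# Hypothetical task type defined by two maintenance templates.
templates = ["Copy_edit", "Cleanup"]
# Hypothetical selection: 20 topics, each defined by 5 placeholder seed pages.
topics = {
    f"topic{t}": [f"Seed article {t}-{p}" for p in range(5)] for t in range(20)
}

# One topic alone fits comfortably under the limit...
single = build_query(templates, topics["topic0"])
# ...but merging the seed pages of all 20 topics into one query does not.
all_pages = [page for pages in topics.values() for page in pages]
combined = build_query(templates, all_pages)

print(len(single), exceeds_limit(single))
print(len(combined), exceeds_limit(combined))
```

Even with short placeholder titles, the combined query is several times longer than the limit, while a single topic's query is not.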

Possible fixes:

  • increase the search limit for newcomer task queries. We would first have to make sure these searches are not particularly expensive; since morelikethis works by picking the top 50 representative words from each page and substituting them into the search query, that might well not be the case. This would also leave the remote (API-based) task suggester broken, since we can't reliably identify newcomer task queries there.
  • split into multiple queries. Currently we make up to 5 queries (one per task type); with a separate query for each topic as well, it would be up to 100 (5 task types × 20 topics). That could take seconds, but it is at least worth profiling.
  • prevent the user from picking too many topics in some way.
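The second option can be sketched as follows: instead of one query per task type, build one query per (task type, topic) pair, so each query carries only 5 seed pages. This is a hypothetical Python illustration; the query syntax, the helper names `build_query` and `split_queries`, and the placeholder titles are assumptions, and the actual patch may be structured differently.

```python
from itertools import product

# Limit quoted in the task description; everything else is assumed.
MAX_QUERY_LENGTH = 300


def build_query(templates, pages):
    """Build a hastemplate/morelikethis search string (syntax assumed)."""
    return 'hastemplate:"{}" morelikethis:"{}"'.format(
        "|".join(templates), "|".join(pages)
    )


def split_queries(task_types, topics):
    """Return one (task type, topic, query) triple per combination."""
    queries = []
    for (task_type, templates), (topic, pages) in product(
        task_types.items(), topics.items()
    ):
        query = build_query(templates, pages)
        # A single topic should fit; fail loudly if even that is too long.
        if len(query) > MAX_QUERY_LENGTH:
            raise ValueError(
                f"query for {task_type}/{topic} is still too long ({len(query)})"
            )
        queries.append((task_type, topic, query))
    return queries


# Hypothetical configuration: 5 task types and 20 selected topics.
task_types = {f"type{i}": [f"Template{i}"] for i in range(5)}
topics = {
    f"topic{t}": [f"Seed article {t}-{p}" for p in range(5)] for t in range(20)
}

queries = split_queries(task_types, topics)
print(len(queries))  # 5 task types x 20 topics
```

Each individual query stays well under the limit, but the number of searches grows multiplicatively with the number of selected topics, which is where the latency concern comes from.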

Details

Related Gerrit Patches:
mediawiki/extensions/GrowthExperiments : wmf/1.35.0-wmf.14 · Newcomer tasks: Make a separate search query for every topic
mediawiki/extensions/GrowthExperiments : master · Newcomer tasks: Make a separate search query for every topic

Event Timeline

Tgr created this task. Jan 12 2020, 11:24 PM
Restricted Application added subscribers: Liuxinyu970226, Aklapper. Jan 12 2020, 11:24 PM

Change 563810 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: Make a separate search query for every topic

https://gerrit.wikimedia.org/r/563810

Tgr claimed this task. Jan 13 2020, 9:59 AM
Tgr moved this task from Incoming to Code Review on the Growth-Team (Current Sprint) board.

Change 563810 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: Make a separate search query for every topic

https://gerrit.wikimedia.org/r/563810

For QA (if this is something to be QA-d, I'm not sure): before this patch, selecting many topics probably caused an error. Now it should work (but maybe cause some slow-down).

When done, this should be moved off the sprint board and 1.1 board, but I'd like to keep the task open as I'd like to find a better solution for this in the long term.

Change 564162 had a related patch set uploaded (by Catrope; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.14] Newcomer tasks: Make a separate search query for every topic

https://gerrit.wikimedia.org/r/564162

Change 564162 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.14] Newcomer tasks: Make a separate search query for every topic

https://gerrit.wikimedia.org/r/564162

Tgr added a comment. Thu, Jan 23, 3:10 AM

> When done, this should be moved off the sprint board and 1.1 board, but I'd like to keep the task open as I'd like to find a better solution for this in the long term.

Doing that requires solving T243478: Newcomer tasks: fetch ElasticSearch data for search results and T242476: Newcomer tasks: when selecting multiple topics, one topic should not dominate over the others, both of which are currently also worked around via per-topic queries.

Etonkovidova closed this task as Resolved. Thu, Jan 23, 5:11 PM
Etonkovidova added a subscriber: Etonkovidova.

Checked in betalabs and in production: selecting all topics and all difficulty levels (as well as less extreme but still extensive combinations) does not fail. However, there is a difference in the article count between the topic overlay and the task card counter; often the right count appears only in the task card, not in the topic overlay. That issue will be addressed in separate tickets.

Tgr added a comment. Fri, Jan 24, 10:05 PM

This is fixed right now, but will become an issue again when we undo the separate-search-query-for-every-topic hack.