Page MenuHomePhabricator

M1 mentee overview: Add a link to all/reverted/question-asking contributions or block log
Closed, ResolvedPublic

Description

Figma mockups for the mentor dashboard specify that the numbers displayed in the mentee overview module should be links to the list of said events.

Here is a list of numbers offered, as well as their link destination:

ColumnLink target
Registration dateNone
QuestionsList of questions (via Special:Contributions)
Total editsSpecial:Contributions for the user
Reverted editsSpecial:Contributions, filtered to reverted edits
BlocksSpecial:Log/block, filtered for the given user

Event Timeline

Change 734250 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] Mentee overview: Add link to Special:Contributions

https://gerrit.wikimedia.org/r/734250

Change 734250 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Mentee overview: Add link to Special:Contributions

https://gerrit.wikimedia.org/r/734250

Change 735750 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/core@master] SpecialContributions: Let tagfilter support multiple tags

https://gerrit.wikimedia.org/r/735750

Change 735751 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] Mentee overview: Add link to Special:Contributions from questions

https://gerrit.wikimedia.org/r/735751

Change 735750 merged by jenkins-bot:

[mediawiki/core@master] SpecialContributions: Let tagfilter support multiple tags

https://gerrit.wikimedia.org/r/735750

Change 735751 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Mentee overview: Add link to Special:Contributions from questions

https://gerrit.wikimedia.org/r/735751

Krinkle added subscribers: Tgr, Krinkle.

@Tgr @Urbanecm_WMF It looks like this change might be introducing a new way for end-users to cause large conditional SQL queries. Two questions:

  1. I notice the change mentions that the internal code was already multi-tag aware, but I suspect it was not yet exposed to the API or via a special page before, is is true that this thus exposes user-controlled multi-tag revision queries for the first time?
  1. It looks like this might be lacking an appropiate limit to the number of tags that can be passed via the form control. If true, we should probably restrict that in some way, e.g. we could limit it to 10 to start pessimistically and grow based on demand/usecases, or 500 to start pessimistically if we can confirm that such a query when done by hand today performs reasonably well on large wikis with many tags and then react as-needed to reduce it if we find new timeouts or other db errors. If it turns out this was already exposed elsewhere, then it might be worth checking if that has limits in place and either use that as inspiration and/or restrict those as well while at it.
  1. I notice the change mentions that the internal code was already multi-tag aware, but I suspect it was not yet exposed to the API or via a special page before, is is true that this thus exposes user-controlled multi-tag revision queries for the first time?

I think that's true, apart from recentchanges / watchlists which do include the revision table but are of course quite different queries.

Maybe we should flag it as a risky patch for the next train, although given that this is new functionality and not exposed to many users in an obvious way (it will be used in links from a feature currently only available on some medium-sized wikis to a small set of power users), problems caused by it are unlikely to coincide with the train schedule.

  1. It looks like this might be lacking an appropiate limit to the number of tags that can be passed via the form control. If true, we should probably restrict that in some way, e.g. we could limit it to 10 to start pessimistically and grow based on demand/usecases, or 500 to start pessimistically if we can confirm that such a query when done by hand today performs reasonably well on large wikis with many tags and then react as-needed to reduce it if we find new timeouts or other db errors. If it turns out this was already exposed elsewhere, then it might be worth checking if that has limits in place and either use that as inspiration and/or restrict those as well while at it.

Good point. I think the main use case here would be listing tags which are subclasses of a larger class (e.g. there are three types of revert tags so when filtering for reverts you'd want to search for each) so a small limit should be fine. Special:RecentChanges doesn't seem to have any limits, but filtering is a much more central feature there.

Wrt hand-testing, I did not see much point in it, because the filtering interface has about ten options, any combination of those options would result in a significantly different query, plus it might matter how frequent tags are in a given user's contribution history, so catching potential edge cases manually seems hopeless. The one test I did for the base case (P17915) didn't seem crazy.

Also, conceptually, multiple tags doesn't seem like a problematic change. In theory, filtering by a single tag could be arbitrarily slow since an unbounded number of revision rows need to be read to produce 50 tagged revisions, and we are already using the actor for indexing so we can't rely on change tag related indexes much. Looking for multiple change tags doesn't seem to affect that significantly; it will result in multiple rows per revision if the revision has multiple relevant tags, but that seems like a comparatively minor performance effect.

Looking for multiple change tags doesn't seem to affect that significantly; it will result in multiple rows per revision if the revision has multiple relevant tags, but that seems like a comparatively minor performance effect.

...although now that I typed that out, I wonder if the extra DISTINCT could make the LIMIT ineffective somehow, since the query planner can't easily tell anymore how many rows it needs to read? For comparison, here's an EXPLAIN without a DISTINCT (P17959) and with a single tag (P17958). The latter two are pretty much identical; the ones with and without DISTINCT differ in that the DISTINCT one uses a temporary table (which is bad, not sure if it's avoidable with a GROUP BY type query though).

Maybe we should flag it as a risky patch for the next train

Done in T293953#7545421

I tested a couple tag combinations on huwiki, the slowest (~100K user contribs, 2 tags with a combined ~1M occurrence, ~25M revisions on the wiki in total) took a few seconds. The tags which are exposed via mentor dashboard links were super fast (unsurprisingly, not many revisions tagged with mentorship-related tags yet).

The links were added, closing the task.