
Investigation of QS sampling based on time frame capabilities
Closed, Resolved · Public · Spike

Description

This task is a follow-up to T291500. We need to investigate and document the possible options for sampling users based on the number of edits they made within a time frame.

What we know
While it looks like QuickSurveys (QS) can sample based on logged-in status and edit count, it is less clear whether it can sample based on the number of edits made within a time frame.

Open Questions

  • Can we sample based on the number of edits made within a time frame?

Yes, but it would need to be implemented:

  • New filtering criteria on QuickSurveys
    • PHP backend
      • We would need to add the new configuration criteria and their parsing in SurveyFactory
      • We would calculate the new edit count only if an enabled survey requires it
      • We would expose the value in a config variable for the frontend to read and use to bucket users (for example with Manual:Hooks/ResourceLoaderGetConfigVars)
    • JS frontend
      • We would need to read this new edit count and add code so that the filter is applied when selecting a survey for the user
  • The edit counting:
    • We could, for example, follow the approach taken by core in ActiveUsersPager and query the recent changes table to count the edits of the user viewing the page
    • Another approach would be to query the revision table, depending on how busy the recent changes table is
    • Depending on the time frame, the queries may be very slow, so we would need to add caching, likely using MediaWikiServices::getInstance()->getMainWANObjectCache() (docs) with a sensible TTL, so that the computation is not repeated on each of the user's page views
    • We should also be careful to query only indexed columns, since those tables are very big. We can check this in documentation pages such as Manual:Recentchanges_table#Indexes.
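
The JS frontend part of the filter described above could look roughly like the sketch below. It is an illustration under assumptions, not existing QuickSurveys code: the `recentEdits` criterion, its `minEdits` and `maxEdits` fields, and the function names are all hypothetical, and `recentEditCount` stands for the value the PHP backend would expose through a config variable.

```javascript
// Hypothetical survey definitions. The `recentEdits` audience criterion is
// an assumption for illustration; it is not an existing QuickSurveys option.
const surveys = [
	{ name: 'active-editors', audience: { recentEdits: { minEdits: 5 } } },
	{ name: 'lapsed-editors', audience: { recentEdits: { maxEdits: 0 } } },
	{ name: 'everyone', audience: {} }
];

/**
 * Check whether the user's edit count within the time frame satisfies the
 * (hypothetical) recentEdits criterion of a survey audience.
 *
 * @param {Object} audience Audience definition of one survey
 * @param {number|undefined} recentEditCount Count computed by the backend,
 *  or undefined if the backend skipped the computation
 * @return {boolean}
 */
function matchesRecentEdits( audience, recentEditCount ) {
	const criterion = audience.recentEdits;
	if ( !criterion ) {
		// The survey does not use this filter, so it never excludes the user.
		return true;
	}
	if ( typeof recentEditCount !== 'number' ) {
		// No count available: be conservative and do not show the survey.
		return false;
	}
	const { minEdits = 0, maxEdits = Infinity } = criterion;
	return recentEditCount >= minEdits && recentEditCount <= maxEdits;
}

/**
 * Return the names of the surveys the user is eligible for.
 *
 * @param {Object[]} allSurveys
 * @param {number|undefined} recentEditCount
 * @return {string[]}
 */
function eligibleSurveys( allSurveys, recentEditCount ) {
	return allSurveys
		.filter( ( survey ) => matchesRecentEdits( survey.audience, recentEditCount ) )
		.map( ( survey ) => survey.name );
}
```

With these definitions, a user with 7 edits in the time frame would be eligible for `active-editors` and `everyone`, while a user with no recent edits would be eligible for `lapsed-editors` and `everyone`. Treating a missing count as "not eligible" is a design choice of this sketch, so that a survey with the new criterion is never shown when the backend did not compute the value.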
  • If we can, what kind of effort would it be?

It is hard to say, but as a rough estimate I'd say 8 points for the new QuickSurveys criteria feature and 8 to 13 points for the edit counting.

  • What other ways do we have to find out whether a user is active? Is there other data we could use as a proxy to filter inactive editors out of a survey?

There are other options that could serve as a proxy, such as the date of the last edit. Combined with the total number of edits (which is readily available), this could be a useful indicator of whether a user is active.

This information is available in PHP using MediaWiki\User\UserEditTracker::getLatestEditTimestamp.

We would still need to do the "New filtering criteria on QuickSurveys" part, but that is only about half the work.
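
As a rough sketch of how such a proxy could be evaluated on the frontend, the function below combines the latest edit timestamp with the total edit count. Everything here is illustrative: the function names and the thresholds (30 days, 5 edits) are hypothetical, and it assumes the timestamp reaches the frontend in MediaWiki's 14-digit `YYYYMMDDHHMMSS` (UTC) format, or as `false` when the user has no edits.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Parse a 14-digit MediaWiki-style timestamp (YYYYMMDDHHMMSS, UTC)
 * into milliseconds since the Unix epoch. Returns NaN if malformed.
 *
 * @param {string} ts
 * @return {number}
 */
function parseMwTimestamp( ts ) {
	const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec( ts );
	if ( !match ) {
		return NaN;
	}
	const [ , year, month, day, hour, minute, second ] = match.map( Number );
	// Date.UTC expects a zero-based month.
	return Date.UTC( year, month - 1, day, hour, minute, second );
}

/**
 * Hypothetical activity proxy: the user counts as active if they have
 * enough edits overall and their latest edit is recent enough.
 *
 * @param {string|false} latestEditTimestamp Latest edit, or false if none
 * @param {number} totalEdits Total edit count of the user
 * @param {number} now Current time in ms since the Unix epoch
 * @param {Object} [options] Thresholds (assumed values, not from the task)
 * @return {boolean}
 */
function isLikelyActive( latestEditTimestamp, totalEdits, now, options = {} ) {
	const { maxDaysSinceLastEdit = 30, minTotalEdits = 5 } = options;
	if ( !latestEditTimestamp ) {
		return false;
	}
	const lastEdit = parseMwTimestamp( latestEditTimestamp );
	if ( Number.isNaN( lastEdit ) ) {
		return false;
	}
	const daysSinceLastEdit = ( now - lastEdit ) / MS_PER_DAY;
	return totalEdits >= minTotalEdits && daysSinceLastEdit <= maxDaysSinceLastEdit;
}
```

Note that this proxy cannot tell a user with one recent edit and many old ones apart from a user who edits every day, which is the trade-off for avoiding the edit counting queries.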

Event Timeline

@Madalina I added a couple of questions, let me know if they look right to you.

I've added all my related findings to the description and answered the questions.

Let me know if there are any questions.

This looks very clear to me. I'm giving it a +2. I'm curious if we want to open more tasks from it.

Looks like we got what we were looking for, so I'll mark this as resolved. We decided not to address the time frame for the survey MVP, but this is good documentation for further iteration.