Page MenuHomePhabricator

Explore using mediawiki_history to count temp accounts
Closed, ResolvedPublicSpike

Description

Problem Statement:

It is possible that there will be an impact on topline numbers for active editors (and potentially new and returning active editors), regardless of the size of the pilot wikis chosen for rolling out the Temp accounts initiative.
The business data steward, @OSefu-WMF will work with his team to identify a reasonable approach for handling monthly and quarterly reporting of Movement Metrics.

Solution:

In this task we would like to explore the possibility of calculating the number of temp users from mediawiki_history that are active editors. Temporary editors can be identified using the user names format ~YYYY-nnnnn-nnn. eg. ~2023-27459-041; see reference.
We can then minus this number from the total active editor numbers each month to get the actual permanent (registered) editors. And repeat the same for new and returning active editors.
PS: This is a temporary solution until DPE updates our pipelines and adds the flag (see T356701)

Definition of Done:

  • Run a query to identify the temp accounts that are active editors
  • Performance evaluation: query shouldn't take more than 5 mins to run each month
  • Getting the count of temp editors does not require a complex query
  • This calculation is computationally efficient
  • Does not add an overhead to the monthly metric repo, i.e. it can be integrated into monthly_report notebook as a temporary solution and then easily disintegrated from the repo once we have the [[ URL | user_is_temp ]] flag provided by Data Platform Engineering

Next Steps:

  • we will open a new task to implement this temporary solution, until we are able to work on T371651

Details

Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
Update queries to filter out temporary accountsrepos/movement-insights/movement-metrics!25hghanitemporary_account_initiativemain
Customize query in GitLab

Event Timeline

Mayakp.wiki changed the subtype of this task from "Task" to "Spike".Sep 9 2024, 4:55 PM

changing to spike, as this is an exploration of the solution/temporary workaround, and not the actual implementation.

Just a point of clarification: the format of a temp user account will not necessarily take the form of ~YYYY-nnnnn-nnn. Depending on the conditions, the number of digits will change (source) and so a regexp to identify temp accounts would not be the best. The recommended solution is to use a string match on users names that start with ~2. As an example: there are currently temp accounts with a 4 digit number after the year prefix (see screenshot).

Additional note, the user table in mariadb contains a flag for temp accounts as indicated in the screenshot, so this table can be used to QA our workaround method to ensure all temp accounts are accounted for in any filtering.

image.png (319×1 px, 38 KB)

OSefu-WMF moved this task from Incoming to Doing on the Movement-Insights board.

LIst of pilot wiki's has been published by the temp accounts team

On October 29, we will deploy Temporary Accounts to:

  • Czech Wikiversity
  • Igbo Wikipedia
  • Italian Wikiquote
  • Swahili Wikipedia
  • Serbo-Croatian Wikipedia

On November 5, we will deploy to:

  • Persian Wiktionary
  • Japanese Wikibooks
  • Cantonese Wikipedia
  • Danish Wikipedia
  • Serbian Wikipedia
  • Romanian Wikipedia
  • Norwegian Bokmål Wikipedia

For the temporary solution, we will add a NOT LIKE '~2%' filter to our monthly metrics queries rather than modifying the underlying tables, which will be addressed later with a permanent solution. The following adjustments will be made: mobile_edits.sql will include the NOT LIKE '~2%' filter under the WHERE clause; active_editors.sql will have the same filter added in the first CTE; and new_editor_retention.sql will include the filter in the WHERE clause. We will defer updates to regional active editors until after the permanent solution is implemented, as these tables do not currently support filtering by user name.

nshahquinn-wmf subscribed.

I believe this is done.