Problem Statement:
Rolling out the Temp accounts initiative may affect the topline numbers for active editors (and potentially new and returning active editors), regardless of the size of the pilot wikis chosen.
The business data steward, @OSefu-WMF, will work with his team to identify a reasonable approach for handling monthly and quarterly reporting of Movement Metrics.
Solution:
In this task we would like to explore calculating, from mediawiki_history, the number of temp users that are active editors. Temporary editors can be identified by their user name format, ~YYYY-nnnnn-nnn, e.g. ~2023-27459-041; see reference.
We can then subtract this number from the total active editor count each month to get the number of permanent (registered) active editors, and repeat the same for new and returning active editors.
PS: This is a temporary solution until DPE updates our pipelines and adds the flag (see T356701).
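The identification and subtraction steps described above could look roughly like the sketch below. It is only an illustration on in-memory sample data, not the actual mediawiki_history query. The regular expression assumes temp names always match the ~YYYY-nnnnn-nnn format quoted above exactly (it may need loosening if real names vary in length), and the threshold of 5 edits per month for an "active editor" is an assumption. The helper names (`is_temp_user`, `active_editor_counts`) and the sample data are hypothetical.

```python
import re
from collections import Counter

# Assumption: temp account names follow the ~YYYY-nnnnn-nnn format quoted above.
TEMP_USER_RE = re.compile(r"~\d{4}-\d{5}-\d{3}")

# Assumption: an "active editor" has at least this many edits in a month.
ACTIVE_EDIT_THRESHOLD = 5


def is_temp_user(user_name):
    """Return True if the whole user name matches the temp account format."""
    return TEMP_USER_RE.fullmatch(user_name) is not None


def active_editor_counts(edits, threshold=ACTIVE_EDIT_THRESHOLD):
    """Count total, temp and permanent active editors per month.

    edits: iterable of (month, user_name) tuples, one tuple per edit.
    """
    edits_per_user = Counter(edits)  # (month, user_name) -> number of edits
    result = {}
    for (month, user_name), n_edits in edits_per_user.items():
        if n_edits < threshold:
            continue  # not an active editor in this month
        row = result.setdefault(month, {"total": 0, "temp": 0, "permanent": 0})
        row["total"] += 1
        if is_temp_user(user_name):
            row["temp"] += 1
    # Permanent (registered) active editors = total minus temp.
    for row in result.values():
        row["permanent"] = row["total"] - row["temp"]
    return result


# Hypothetical sample: one active temp user, one active registered user,
# and one temp user below the activity threshold.
sample_edits = (
    [("2024-08", "~2023-27459-041")] * 5
    + [("2024-08", "ExampleUser")] * 7
    + [("2024-08", "~2024-00001-002")] * 2
)
counts = active_editor_counts(sample_edits)
```

The same subtraction would apply to the new and returning active editor counts, given a way to split users into those groups.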
Definition of Done:
- Run a query to identify the temp accounts that are active editors
- Performance evaluation: the query shouldn't take more than 5 minutes to run each month
- Getting the count of temp editors does not require a complex query
- This calculation is computationally efficient
- Does not add overhead to the monthly metric repo, i.e. it can be integrated into the monthly_report notebook as a temporary solution and then easily removed from the repo once we have the [[ URL | user_is_temp ]] flag provided by Data Platform Engineering
Next Steps:
- We will open a new task to implement this temporary solution until we are able to work on T371651
