
Clarify analytics and metrics definitions around anonymous and temporary editors
Closed, Resolved, Public

Description

Currently, we count "anonymous editors" by counting the number of distinct IP addresses in a given period. We will need to update and document the definitions around "anonymous editors" in line with the technical changes made for IP Masking.

With the roll-out of IP Masking MVP (T324492), the way we identify and count anonymous or unregistered editors will change. We define "unregistered users" as users who are not logged in (see the most recent edit to the Meta "Unregistered user" page as of 15 March, 2023). As part of the IP Masking MVP, the following is planned for identifying temporary editors (from Office Wiki (internal)):

User::isRegistered() will return true for all registered accounts, including temporary accounts.
User::isAnon() will return false for such temporary accounts.
User::isTemp() will return true exclusively for temporary accounts.
User::isNamed() will return true exclusively for registered accounts that are not temporary accounts.
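
As a rough illustration of how these four predicates relate, the sketch below models them in Python. The AccountKind enum, the UserModel class and the truth_table helper are hypothetical names invented for this example and are not MediaWiki code. The values shown for anonymous (IP) users, and for is_anon() on named accounts, are assumptions inferred from the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum


class AccountKind(Enum):
    """Hypothetical three-way split of editors after IP Masking."""
    ANONYMOUS = "anonymous"  # not logged in, identified by IP address
    TEMPORARY = "temporary"  # temporary account
    NAMED = "named"          # registered account that is not temporary


@dataclass(frozen=True)
class UserModel:
    """Toy stand-in for the planned User predicates (not the real class)."""
    kind: AccountKind

    def is_registered(self) -> bool:
        # True for all registered accounts, including temporary accounts.
        return self.kind in (AccountKind.TEMPORARY, AccountKind.NAMED)

    def is_anon(self) -> bool:
        # False for temporary accounts. Assumption: it is the complement
        # of is_registered(), so it is also False for named accounts.
        return not self.is_registered()

    def is_temp(self) -> bool:
        # True exclusively for temporary accounts.
        return self.kind is AccountKind.TEMPORARY

    def is_named(self) -> bool:
        # True exclusively for registered, non-temporary accounts.
        return self.kind is AccountKind.NAMED


def truth_table() -> dict:
    """Map each kind to (is_registered, is_anon, is_temp, is_named)."""
    table = {}
    for kind in AccountKind:
        user = UserModel(kind)
        table[kind.value] = (
            user.is_registered(),
            user.is_anon(),
            user.is_temp(),
            user.is_named(),
        )
    return table
```

Under these assumptions, a temporary account is the only kind for which "registered" and "named" disagree, which is why the meaning of "registered" needs clarifying.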

Given this change, we will also need to clarify or redefine what it means to be a "registered" account.

Event Timeline

Adding Data-Engineering since we will work with them on the technical details.

Hi! I want to clarify that the following are code functions and not database fields:

  • User::isRegistered() will return true for all registered accounts, including temporary accounts.
  • User::isAnon() will return false for such temporary accounts.
  • User::isTemp() will return true exclusively for temporary accounts.
  • User::isNamed() will return true exclusively for registered accounts that are not temporary accounts.

Under the currently proposed plan, temp accounts will have user IDs just like registered accounts. The best way to identify temp accounts will be to run a regex on the username column and look for usernames beginning with an asterisk (*).
There are ongoing conversations between Data Engineering and the DBAs about the possibility of adding an explicit flag to the user table to identify temp accounts.
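
A minimal sketch of the username check described above, assuming the asterisk prefix from the currently proposed plan. The pattern name, the function name and the sample usernames are hypothetical; an explicit flag on the user table, if one is added, would make this check unnecessary.

```python
import re

# Assumption: under the proposed plan, temp account usernames begin
# with a literal asterisk. The asterisk is escaped because it is a
# regex metacharacter.
TEMP_USERNAME_PATTERN = re.compile(r"^\*")


def is_temp_username(user_name: str) -> bool:
    """Return True if the username looks like a temp account name."""
    return TEMP_USERNAME_PATTERN.match(user_name) is not None


# Hypothetical usernames, for illustration only.
SAMPLE_USERNAMES = ["*Unregistered 1234", "ExampleUser", "Star*InMiddle"]
TEMP_ONLY = [name for name in SAMPLE_USERNAMES if is_temp_username(name)]
```

Only names with a leading asterisk match; an asterisk elsewhere in the name does not.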

kzimmerman moved this task from Triage to Upcoming Quarter on the Product-Analytics board.
Mayakp.wiki added a subscriber: Milimetric.

We will begin working on this in Q4 FY22-23 with @jwang and @Milimetric.

Updates from the meeting with @kzimmerman, @Iflorez and @jwang:

  • Where do we track or report anonymous editors? What could be impacted?
    • Form 990
    • Community Insights annual reporting?
    • Wiki Comparison
    • Equity Dashboard
  • What will break?
    • Any code that relies on user_is_anonymous identifying purely anonymous users
    • Any of the above reports, or any queries we have, that use event_user_id = 0 to calculate anonymous users
  • What should we be prepared for?
    • We do get questions about anonymous editors sometimes, and it would be helpful to have those numbers on hand
    • Monitor as IP Masking rolls out and differentiate between these two groups (anonymous and temporary editors)
    • Given past experience, build in buffers to deal with consequences and broken jobs/queries!
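
To make the "what will break" point concrete, here is a small sketch assuming the plan in which temp accounts receive their own user IDs and asterisk-prefixed usernames. The rows are made up, and event_user_text is an assumed name for the username column; only event_user_id appears in the notes above.

```python
# Hypothetical edit events, for illustration only.
EVENTS = [
    {"event_user_id": 0, "event_user_text": "192.0.2.10"},
    {"event_user_id": 0, "event_user_text": "192.0.2.10"},
    {"event_user_id": 0, "event_user_text": "198.51.100.7"},
    {"event_user_id": 501, "event_user_text": "*2023-1"},
    {"event_user_id": 42, "event_user_text": "ExampleUser"},
]


def legacy_anonymous_editors(events):
    """Distinct editors matched by the legacy event_user_id = 0 filter."""
    return {e["event_user_text"] for e in events if e["event_user_id"] == 0}


def editor_group(event):
    """Classify one event as anonymous, temporary or named."""
    if event["event_user_id"] == 0:
        return "anonymous"
    # Assumption: temp account usernames start with an asterisk.
    if event["event_user_text"].startswith("*"):
        return "temporary"
    return "named"


def distinct_editors_by_group(events):
    """Count distinct usernames per editor group."""
    groups = {}
    for e in events:
        groups.setdefault(editor_group(e), set()).add(e["event_user_text"])
    return {group: len(names) for group, names in groups.items()}
```

In this sketch the legacy filter never sees the temp editor, so an "anonymous editors" figure built on it would silently drop as IP Masking rolls out, unless temporary editors are counted as a separate group.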

@Milimetric, can we please get a list of all the tables where the proposed changes will be applied?

Next:

  • Discuss in the Product-Analytics sharing meeting
  • Alternate proposal: in tables like mediawiki_history, can we have a separate column called unregistered_user_id to store the temporary user IDs? This would avoid breaking queries that use event_user_id = 0 to identify anonymous users, since temporary and anonymous users would both have event_user_id = 0. What would the implications be?

After my discussion with Product-Analytics as well as Research and Decision Science, @Milimetric, @nshahquinn-wmf and I started discussing how the different types of users (registered, temporary, and anonymous/IP) are defined in the upstream MediaWiki tables versus the downstream data tables, now that the new "temp" user is being introduced.
This is being discussed with a larger audience at https://www.mediawiki.org/wiki/Talk:User_account_types#What_does_%22anonymous%22_mean and in T337103. Once that is resolved, we can clarify the definitions around anonymous and temporary editors.

nshahquinn-wmf renamed this task from Clarify definitions around anonymous and temporary editors to Clarify analytics and metrics definitions around anonymous and temporary editors. Jun 16 2023, 2:33 AM

This was actually done a while ago.