
Document what new editing metrics IP Masking could enable us to report on
Closed, Resolved, Public

Description

The IP masking project will create a temporary account for anonymous users on their first edit. EditAttemptStep and other schemas will be updated to support this change in T332437.

This task involves identifying and documenting what new editing metrics we will be able to track for these temporary users and any potential limitations.

Metrics to explore:
This is just a starting list of metrics to explore. It will be updated as new editing metrics are identified during the work on this task.

  • Editor retention
  • Number of unique temporary users in addition to unique registered users.
  • Proportion of temporary users that perform an editing action.

Event Timeline

MNeisler triaged this task as Medium priority.
MNeisler updated the task description.
MNeisler added a project: Product-Analytics.
MNeisler moved this task from Triage to Current Quarter on the Product-Analytics board.
MNeisler updated the task description.

User Types Definitions and Terminology

There will be three user types once IP masking is rolled out. There have been ongoing discussions about how to define and name each of these user types. See the discussion on the MediaWiki user account type talk page and the Slack thread.

Based on the current proposal, we should use the following terminology when referencing these three user types:

  • Registered User (indicated as user_id != 0 and user_is_temp = false in editing schemas).
  • Unregistered user
    • Temporary User (user_id != 0 and user_is_temp = true).
    • IP User (user_id == 0).
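
The field rules above can be sketched as a small helper. This is only an illustration of the proposed definitions: the function names and the returned labels are hypothetical, and it assumes each event row exposes user_id and user_is_temp as described in this comment.

```python
def classify_user_type(user_id: int, user_is_temp: bool) -> str:
    """Map the (user_id, user_is_temp) pair to one of the three user types.

    The labels returned here are illustrative, not official field values.
    """
    if user_id == 0:
        # IP users have no account row, so user_id is 0.
        return "ip"
    # Any non-zero user_id is an account: temporary or registered.
    return "temporary" if user_is_temp else "registered"


def is_unregistered(user_id: int, user_is_temp: bool) -> bool:
    """Temporary users and IP users are both 'unregistered' under the proposal."""
    return classify_user_type(user_id, user_is_temp) != "registered"
```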

We should avoid the term "anonymous" moving forward to avoid confusion.

In T332437, a user_is_temp field was added so that we can identify edit events from all three of the user types above in editing-related schemas (i.e. editattemptstep, visualeditorfeatureuse, talk_page_edit).

Background Info

Temp accounts will be linked to an auto-generated username that is based on a cookie in the user’s browser. The cookie will last 12 months by default. When the cookie is about to expire, the user will receive a notice to log in or create an account to get credit for future edits and access other features. Instead of edits being attributed to an IP address, multiple IP addresses can be linked to a temporary account.

Other features: Similar to IP users, temp users will have no access to preferences, emails or user groups. Temp users can receive notifications but will have no access to notification-related preferences.

Reference: IP Masking Hackathon Session Presentation

New Editing Metrics

For future impact and engagement analyses

  • Number of distinct temporary users/Proportion of temporary users. For example, we can look at the proportion of temp users that successfully attempt and complete an edit that is not reverted or the proportion of temp users that add a comment/new topic to a talk page.
  • Temp user editor retention (max retention window 12 months). The proportion of temporary users that publish at least 1 edit and return to publish another edit X days later. Note: We will only be able to track temp user retention for up to 12 months after the first edit, which is when the temp account is created.
  • User edit count of temp users. What percent of temp users completed 1-5 edits, 5-10 edits, etc?
    • Note: A temporary account is not created until after a user saves their edit. This means that all temporary account users will have by definition completed at least 1 edit. Also, a temp user account only lasts for 12 months so we would only be able to track their edit count within that timeframe.
    • Recommend not applying our current Junior and Senior Contributor definitions to these users, as we can only track their edit count for the duration their temp account exists (1 year), whereas for registered users we can track edits cumulatively since the user's registration date.
  • Distinct temp users in AB Test. We should be able to track distinct temp users that performed an action in the AB test without requiring the use of the anonymous_user_token field. This field will still need to be used to track distinct IP users bucketed into the test.
  • Percent of temp users blocked after making an edit. Per details described on the project page, there will be a workflow to block temp users.
  • Notifications sent to and read by temp users. Notifications sent to temp users should be logged in EchoInteraction and the echo_notification database. However, those schemas do not currently have a user_name or user_is_temp field to quickly distinguish these users, so we would need to identify the user_ids associated with temp user accounts from the user table and then join to these databases.
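
As a rough sketch of how the retention and edit count metrics above could be computed: the code below assumes we already have a list of (user_name, edit_timestamp) pairs for temp users only. The function names are hypothetical, "returned" is assumed to mean at least one further edit X or more days after the first edit and within the 12-month window, and the bucket boundaries (1-5, 6-10, 11+) are illustrative and deliberately non-overlapping.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Temp accounts last 12 months by default, so retention cannot be tracked beyond this.
MAX_WINDOW_DAYS = 365


def temp_user_retention(edits, return_after_days):
    """Share of temp users with another edit >= return_after_days after their first.

    edits: list of (user_name, edit_timestamp) pairs for temp users only.
    """
    if not 1 <= return_after_days <= MAX_WINDOW_DAYS:
        raise ValueError("return_after_days must be between 1 and 365")
    by_user = defaultdict(list)
    for user, ts in edits:
        by_user[user].append(ts)
    if not by_user:
        return 0.0
    retained = 0
    for timestamps in by_user.values():
        first = min(timestamps)
        earliest = first + timedelta(days=return_after_days)
        latest = first + timedelta(days=MAX_WINDOW_DAYS)
        if any(earliest <= ts <= latest for ts in timestamps):
            retained += 1
    return retained / len(by_user)


def edit_count_buckets(edits):
    """Share of temp users per edit count bucket (every temp user has >= 1 edit)."""
    counts = Counter(user for user, _ in edits)
    buckets = Counter()
    for n in counts.values():
        if n <= 5:
            buckets["1-5"] += 1
        elif n <= 10:
            buckets["6-10"] += 1
        else:
            buckets["11+"] += 1
    total = len(counts)
    if total == 0:
        return {}
    return {label: buckets[label] / total for label in ("1-5", "6-10", "11+")}
```

Because the edit that triggers account creation may not be attributed to the temp account (see the open questions), the "first edit" here is simply the earliest edit recorded under the temp username.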

For tracking impacts of IP masking rollout
As IP masking is rolled out, it would be useful to monitor any significant changes in the following editing metrics:

  • Changes in the number of edit attempts and edits published by registered and IP users. Does the IP masking project lead to more IP users creating an account/registering?
  • Changes in block rates by user type. How do the block rates of temp users compare to the block rates of IP and registered users?
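
A minimal sketch of the block rate comparison, assuming we can assemble one row per distinct editor with their user type and whether they were blocked after editing. The function name and the row shape are hypothetical; how the rows are actually built from the block logs is not covered here.

```python
from collections import Counter


def block_rates(editors):
    """Return the share of blocked editors per user type.

    editors: iterable of (user_type, was_blocked) pairs, one per distinct editor.
    """
    totals = Counter()
    blocked = Counter()
    for user_type, was_blocked in editors:
        totals[user_type] += 1
        if was_blocked:
            blocked[user_type] += 1
    # Only user types that appear in the input are reported.
    return {user_type: blocked[user_type] / totals[user_type] for user_type in totals}
```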

Open Questions:

  • Will the user_editcount field in EditAttemptStep be updated with a count of edits completed by the temp user account?
  • Will an IP user's first edit that caused them to be issued a temp account be linked to their account?

Temp user editor retention
User edit count of temp users. What percent of temp users completed 1-5 edits, 5-10 edits, etc?
Distinct temp users in AB Test

@MNeisler, the investigation you summarized in T332842#8881103 is wonderful. I'm particularly keen for us to be able to start reporting on metrics like the ones I've pulled out above.

Now, to the two questions you posed...

  1. "Will the user_editcount field in EditAttemptStep be updated with a count of edits completed by the temp user account?"
  2. "Will an IP user's first edit that caused them to be issued a temp account be linked to their account?"

Who do you think is best equipped to comment on the above? My instinct is that the AHT will be able to address "2." although I'm not sure about "1."...

I've provided some updates to the open questions below:

"Will the user_editcount field in EditAttemptStep be updated with a count of edits completed by the temp user account?"

The user_editcount field in EditAttemptStep retrieves the edit count from the user_editcount field of the user table. I asked on the #talk-to-ip-masking Slack channel (see thread) whether there are plans to aggregate temp user edit counts within this field.

Based on responses there so far, it looks like there are no current plans to do that. If we think it would be valuable for us to track that info, we would need to make a request through Phab and assign it to the Data products team to be completed as part of the bigger IP masking changes.

"Will an IP user's first edit that caused them to be issued a temp account be linked to their account?"

A temp user account is not created until after the IP user completes their edit. Based on current documentation, I believe that the initial edit would be tagged as being completed by an IP user and only their subsequent edits would be attributed to their auto-generated temp username. Recommend reaching out to AHT to confirm.

@MNeisler all that you described in T332842#9029294 sounds great. Per what we talked about offline today, no further action needed.