Post-deployment Vector 2022 metrics analysis on English Wikipedia
Closed, Resolved (Public)

Authored By
ovasileva
Jan 19 2023, 6:53 PM

Description

Background

We would like to monitor the metrics for the Vector 2022 skin post deployment on enwiki.

Acceptance criteria

Week of Jan 18
Observational analysis on the following before and after the deployment:

  • [DONE] Pageviews T327440#8542723
  • [DONE] Edits (originally "Edit rates"; edit rate is not a well-defined metric, so we instead measure the number of edits, which is a core product metric). T327440#8542723
  • [DONE] Account creation T327440#8542723
  • [DONE] Opt-out rates T327440#8544894
  • [DONE] Number of users (split by all, all editors, active editors) using each available skin T327953
  • [DONE] Analyze number of users using each available skin on English Wikipedia who have recently edited T328088
  • [DONE] Usage of fixed width toggle T327690
  • [Under discussion] Usage of "Enable limited width mode" preference (A snapshot of how many users total have enabled limited width mode vs disabled) T327688

If possible:

  • [Explored] Logins

Note: We don't have a schema that logs each login event. The user table only logs the time when the user logged in and changed settings, not the time when the user visited in logged-in mode. The event.CentralAuth schema only logs auto-logins into sister projects. It does not log readers who always stay on their local wikis, which is the majority of user scenarios.

  • Content translation
  • [DONE] Percentage of daily accounts created who turn off Vector 2022 (opt-outs from new accounts)

Note: The opt-out events are logged in the mediawiki_skin_diff schema, which only stores a hashed user_id. We cannot join it with other schemas to identify whether the user is a newcomer. We suggest measuring the number of newcomers who created an account after deployment using each available skin on English Wikipedia. T328609
As for the daily trend, the event.prefupdate schema logs the skin preference a user selected along with the user_id, but does not log the initial skin preference. It can provide daily trends of how many newcomers updated their skin preference and which skin they opted for, assuming they are opting out of vector2022 to other skins. T328609#8651077

  • Percentage of daily account creations who have 0 edits after 24 hours

For first report
General

  • Pageviews
  • Edits
  • Account creation
  • Opt-out
  • Content translation
  • Number of users (split by all, all editors, active editors) using each available skin
  • Logins
  • Percentage of daily accounts created who turn off Vector 2022 (opt-outs from new accounts)
  • Percentage of daily account creations who have 0 edits after 24 hours
  • Usage of fixed width toggle
  • Usage of "Enable limited width mode" preference

Feature-specific

  • [DONE] Search usage T328600
  • [DONE] ToC usage T329234
  • [DONE] Scrolling to top of the page T329235
  • Sidebar collapsing
  • Tools usage

Event Timeline

jwang renamed this task from Post-deployment Vector 2022 metrics analysis to Post-deployment Vector 2022 metrics analysis on English Wikipedia.Jan 20 2023, 5:15 AM
jwang updated the task description.

Here are the recent trends of high level metrics.

  • Pageviews on English Wikipedia

The number of pageviews on 1/18 and 1/19 is close to the level before the holidays. Data is extracted from the pageviews_hourly schema.
Year over year comparison for January 2023 will be available after pageviews_daily schema has 2023-01 data.

image.png (808×1 px, 102 KB)

  • Edits on English Wikipedia

The number of edits shows no significant change on 1/18 and 1/19. Data is extracted from the mediawiki_revision_create schema.
Year over year comparison for January 2023 will be available after edits_hourly schema has 2023-01 data.

image.png (802×1 px, 96 KB)

  • Account creation on English Wikipedia

Number of new accounts increased significantly on 1/18 and 1/19. It rose from 4.01k on 1/17 to 7.5k on 1/19. Data is extracted from event.serversideaccountcreation schema.

image.png (786×1 px, 73 KB)

I have put the above graphs in one dashboard for regular monitoring. Link: English Wikipedia Vector 2022 Desktop Deployment dashboard

Are you planning to measure the percentage of Vector 2022 users who opt out of the limited width mode? (using Preferences, not the toggle)

Thank you and good catch @Jonesey95! Yes, that is definitely one of the things we want to track. Adding it to the description now.

  • Opt-out rates on English Wikipedia

The opt-out rate is defined as the number of users who opted out of Vector 2022 divided by the number of logged-in editors who made at least one content edit in the past year. The number of opt-out users is extracted from the mediawiki_skin_diff schema. The number of editors in the past year is extracted from the wmf.mediawiki_history schema.

The opt-out rate by this definition is an overestimate: a user who changed skins might not have been an editor in the past year. However, we don't have a good way to identify editors who opted out, because the two schemas lack a common join key. The mediawiki_skin_diff schema only logs the hashed user_id of the user who opted out.

Note:

  1. Number of optout users increased after the deployment, 9558 on 2023-01-18, 11549 on 2023-01-19.
  2. Daily opt-out rate is 1.5220% on 2023-01-18, and 1.8391% on 2023-01-19.
  3. Cumulative opt-out rate since deployment is 4.11% by the end of 2023-01-20.
Number of editors with at least 1 content edit on enwiki (2022-01-01 ~ 2022-12-31) | Number of opt-out users since deployment | Cumulative opt-out rate by 2023-01-20
627987 | 25810 | 4.11%
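
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, assuming the rate is simply the opt-out user count divided by the 627987 editors reported above. The function and variable names are hypothetical and are not taken from the actual analysis notebooks.

```python
# Denominator from the table above: editors with at least 1 content edit
# on enwiki between 2022-01-01 and 2022-12-31.
EDITORS_PAST_YEAR = 627_987


def opt_out_rate(optout_users: int, editors: int = EDITORS_PAST_YEAR) -> float:
    """Return the opt-out rate as a percentage of past-year editors."""
    if editors <= 0:
        raise ValueError("editors must be positive")
    return 100.0 * optout_users / editors


# Daily opt-out user counts reported in this comment.
daily_optouts = {"2023-01-18": 9558, "2023-01-19": 11549}

for day, users in daily_optouts.items():
    print(f"{day}: {opt_out_rate(users):.4f}%")

# Cumulative opt-out users since deployment, as reported above.
print(f"cumulative by 2023-01-20: {opt_out_rate(25810):.2f}%")
```

Because the numerator can include users who are not in the denominator, the result is an upper bound on the share of past-year editors who opted out, not an exact figure.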

Daily trend:

image.png (1×1 px, 132 KB)

Event date | Optout users | Optout rate
2023-01-01 | 40 | 0.0064%
2023-01-02 | 44 | 0.0070%
2023-01-03 | 51 | 0.0081%
2023-01-04 | 40 | 0.0064%
2023-01-05 | 46 | 0.0073%
2023-01-06 | 37 | 0.0059%
2023-01-07 | 52 | 0.0083%
2023-01-08 | 26 | 0.0041%
2023-01-09 | 49 | 0.0078%
2023-01-10 | 52 | 0.0083%
2023-01-11 | 307 | 0.0489%
2023-01-12 | 284 | 0.0452%
2023-01-13 | 262 | 0.0417%
2023-01-14 | 389 | 0.0619%
2023-01-15 | 217 | 0.0346%
2023-01-16 | 225 | 0.0358%
2023-01-17 | 199 | 0.0317%
2023-01-18 | 9558 | 1.5220%
2023-01-19 | 11549 | 1.8391%
jwang updated the task description.

Is it possible to make these metrics publicly viewable? The link above goes to a Wikimedia Developer login page.

I would find it helpful to see, using the same user_id hash function so that you can get the breakdown:

  • # of accounts with 1 edit in the past year / with 1+/5+/100+ edits/month in the last 3 months
  • # of these accounts which have made an edit, logged in, or changed any preference since the launch (or: is there a better way to confirm if they have been online/reading this week?)
  • # of new accounts created since the launch
  • # of accounts in each of the above categories which have opted out

(for any major change w/ an opt-out, not just a skin.)

An argument could be made for a separate cache cluster dedicated to "opt-out for ongoing rollout" to enable gathering opt-out data for readers, to complement the above list.

Accounts with just 1 edit will probably be just . Even experienced Wikipedians (e.g. a cross-wiki user from another language), who would know perfectly well how to edit their preferences and change the skin, would probably not bother changing it if they are just passing by to make a single edit, even if they didn't like the skin. The cost of editing the preferences is actually larger than that of putting up with the skin for their brief interaction. It's only after some editing that it would pay off. Maybe worth checking for people that edited for at least N days.

PS: are bots being excluded from these metrics? While they make a huge number of edits, they are unlikely to care about the skin preference.

@Platonides thank you for your comments. The metric includes bots. The skin preference results are not skewed by including bots. Here is the comparison of metrics including bots and excluding bots: T328088#8573021.

Does the edits graph in T327440#8542723 include bots? Bots may not be a large proportion of users but they do contribute a large proportion of edits.

@Sj - apologies for the late reply. We did some additional digging into users who have recently edited.

In T327440#8567647, @Sj wrote:

I would find it helpful to see, using the same user_id hash function so that you can get the breakdown:

  • # of accounts with 1 edit in the past year / with 1+/5+/100+ edits/month in the last 3 months

We looked into this here: T327953: Analyze number of users using each available skin on English Wikipedia. We haven't yet looked at 100+ edits since that's a very small percentage of users in terms of overall opt-out trends, but have marked it as something to check in the future. We're also looking into this veteran editor category across a number of the feature-level metrics (more detail in the subtasks for this ticket).

  • # of these accounts which have made an edit, logged in, or changed any preference since the launch (or: is there a better way to confirm if they have been online/reading this week?)

This is also a bit tricky - right now, we're proxying by edits made, although we've also noted that we have a lot of people with accounts using the new skin without making any edits. For example, see T328600#8625531, where we're seeing the biggest increases in search amongst account holders with no edits. In general, for the future, we're curious to learn more about this group - did they create accounts in order to edit? Are they looking for other reader-focused functionality that could theoretically be linked to an account in the future?
I really like your idea about looking into the data for everyone who has logged in during this time - we'll try to follow up on this.

  • # of new accounts created since the launch

Here's a quick snapshot of account creation right now

Screenshot 2023-03-03 at 10.40.39 AM.png (1×2 px, 188 KB)

Our assumption is that the spike is due to both users trying to create an account in order to opt-out, as well as due to the new increased prominence of the create account link (which is sustaining the growth after the spike). Overall, over the course of the spike, we assume that about 15K-20K accounts were created (or roughly the same amount that would be created during 1.5 weeks in regular circumstances).

  • # of accounts in each of the above categories which have opted out

(for any major change w/ an opt-out, not just a skin.)

This is available here T328609#8651077. Overall, among users who created accounts between January 18 and Feb 1, 93% of them use vector-2022 skin.
Generally, we're seeing the data roughly line up with the spike above, with about 35% of new accounts opting out within 24 hours of registration for the first four days after the deployment, which drops significantly afterwards. Cumulatively, between January 18 and January 30, about 7% of accounts changed their preference 24 hours after registration (with the large majority opting back to Vector legacy, but some opting into other skins, or switching between Vector legacy and Vector 2022).
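
A rough sketch of how a "changed skin within 24 hours of registration" share could be computed is shown below. It assumes each new account can be paired with the timestamp of its first skin preference change, if any. The record layout, function name, and sample data are hypothetical and are not drawn from the event.prefupdate schema or the analysis in T328609.

```python
from datetime import datetime, timedelta


def share_changed_within(records, window=timedelta(hours=24)):
    """Percentage of accounts whose first skin change fell within `window` of registration.

    `records` is an iterable of (registered_at, first_skin_change_at) pairs,
    where the second element is None if the account never changed its skin.
    """
    total = 0
    changed = 0
    for registered_at, changed_at in records:
        total += 1
        if changed_at is not None and timedelta(0) <= changed_at - registered_at <= window:
            changed += 1
    return 100.0 * changed / total if total else 0.0


# Made-up sample data, for illustration only.
sample = [
    (datetime(2023, 1, 18, 9, 0), datetime(2023, 1, 18, 9, 30)),  # changed after 30 minutes
    (datetime(2023, 1, 18, 12, 0), None),                         # never changed
    (datetime(2023, 1, 19, 8, 0), datetime(2023, 1, 21, 8, 0)),   # changed after 48 hours
    (datetime(2023, 1, 20, 15, 0), None),                         # never changed
]

print(f"{share_changed_within(sample):.1f}% changed skin within 24 hours")
```

Grouping the records by registration date before calling the function would give the kind of daily trend described above.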

@Sj, We did consider measuring logged-in users, but since we don't track user logins, we don't have data on it. It was documented in T327440 description as below:

[Explored] Logins
Note: We don't have a schema that logs each login event. The user table only logs the time when the user logged in and changed settings, not the time when the user visited in logged-in mode. The event.CentralAuth schema only logs auto-logins into sister projects. It does not log readers who always stay on their local wikis, which is the majority of user scenarios.

@ori, Thanks for the comment. Here is the trend of edits excluding bots on English Wikipedia.

image.png (760×1 px, 110 KB)

Many thanks @jwang and @ovasileva.

Re: very active editors (100 edits/mo)

that's a very small percentage of users in terms of overall opt-out trends

Yes, and also a large percentage of the users ( editors, layout designers, &c ) who will be fixing templates broken by a major skin upgrade.
Please don't average out that group. You should be able to see how a change is/isn't working for them in the data before there is a public outcry.

Re: getting stats on logged-in users without unwanted tracking

If you can tell when someone changes any preference, perhaps the # of people changing skins could be shown as a proportion of all users that change any preferences?
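
One way this proportion could be computed is sketched below. It assumes preference-change events are available as (user, property) pairs and that skin changes are recorded under a property named "skin". Both the event layout and the property name are assumptions for illustration, not the confirmed shape of any Wikimedia schema.

```python
def skin_change_share(events, skin_property="skin"):
    """Percentage of users who changed their skin, among users who changed any preference.

    `events` is an iterable of (user_id, property_name) pairs, one per
    preference change. Each user is counted once, however many changes they made.
    """
    users_any = set()
    users_skin = set()
    for user_id, prop in events:
        users_any.add(user_id)
        if prop == skin_property:
            users_skin.add(user_id)
    if not users_any:
        return 0.0
    return 100.0 * len(users_skin) / len(users_any)


# Made-up events, for illustration only.
sample_events = [
    ("u1", "skin"),
    ("u1", "language"),
    ("u2", "timezone"),
    ("u3", "skin"),
    ("u4", "gadgets"),
]

print(f"{skin_change_share(sample_events):.1f}% of preference-changing users changed skin")
```

The denominator here is users who touched any preference, which serves as a rough proxy for "logged-in and active" without logging page visits.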

Re: cleaner account-creation + opt-out stats

Are many new account-creations still automated? It would be helpful to be able to classify the baseline of ~3500 new accounts/day, to capture signs of life other than edits. (maybe your observations about how people are using search + other novel interface uses, suggests a non-intrusive way to do this)

For cleaner opt-out data, I've said this before, but: a one-click option to change back (rather than a link to a confusing preferences page) would make a real difference.
A/B testing the effect of making the opt-out much more visible, would also be worth doing. (ditto for the width-toggle) I recall someone said that A/B testing small cohorts of logged-out readers wasn't easy or wasn't the norm... that's worth fighting for. :)

All done here. Will open new tickets for further inquiries.