
Explore global preference and its impact on our skin stats
Closed, Resolved · Public

Description

As @Bawolff mentioned in T325193#8545431, a global preference can override the skin value stored in user_properties. We'd like to find out how many users have a global skin preference and how much it skews our skin stats.

Event Timeline

jwang triaged this task as Medium priority. Jan 24 2023, 6:05 PM

Methodology

Because the skin value pulled from user_properties reflects only the current state, we reran the T327953 analysis in two versions at the same time: (1) taking global preferences into account and (2) ignoring them. We then compared the two skin distributions.
Global preferences were pulled from global_preferences in the centralauth database, as collected on 2023-02-01.
Editors were pulled from wmf.mediawiki_history. We focus on non-bot users who edited content pages on English Wikipedia at least once, or at least five times, between 2022-01-01 and 2022-12-31.
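
The two-version comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the queries actually run for this task: the dictionaries standing in for user_properties and global_preferences, the user keys, the function names, and the fallback to "vector-2022" for users with no stored preference are all hypothetical.

```python
from collections import Counter

# Assumption: a user with no stored skin preference gets the wiki default.
DEFAULT_SKIN = "vector-2022"


def effective_skin(user, local_prefs, global_prefs, default_skin=DEFAULT_SKIN):
    """Return the skin a user sees, letting a global preference override the local one."""
    if user in global_prefs:
        return global_prefs[user]
    return local_prefs.get(user, default_skin)


def skin_distribution(editors, local_prefs, global_prefs, default_skin=DEFAULT_SKIN):
    """Map each skin to (num_editors, editors_pct) for the given editors."""
    counts = Counter(
        effective_skin(u, local_prefs, global_prefs, default_skin) for u in editors
    )
    total = sum(counts.values())
    return {skin: (n, round(100 * n / total, 2)) for skin, n in counts.items()}


# Toy data (hypothetical): four editors, two with a local skin, two with a global one.
editors = ["A", "B", "C", "D"]
local_prefs = {"A": "monobook", "B": "vector"}
global_prefs = {"B": "timeless", "C": "vector"}

# Version 1 considers global preferences; version 2 ignores them.
with_global = skin_distribution(editors, local_prefs, global_prefs)
without_global = skin_distribution(editors, local_prefs, {})
```

Passing an empty mapping for the global preferences gives the "not considered" version, so both distributions come from the same code path and the same editor list.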

Summary
  • 627987 users edited English Wikipedia in 2022. 0.56% of them (3531 editors) have a global skin preference.
  • Among the 3531 editors who have a global skin preference, 51.29% use vector and 25.74% use vector-2022.
  • The skin distribution does not change materially once global preferences are considered: no skin's share shifts by more than about 0.5 percentage points in either table below.

The table below compares the skin distribution of editors (>=1 content edit per year on enwiki) with and without global preferences taken into account.

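| skin_name | num_editors (global pref considered) | editors_pct (global pref considered) | num_editors (global pref not considered) | editors_pct (global pref not considered) |
| --- | --- | --- | --- | --- |
| amethyst | 1 | 0.00% | 1 | 0.00% |
| cologneblue | 472 | 0.08% | 477 | 0.08% |
| minerva | 1243 | 0.20% | 1220 | 0.19% |
| modern | 1204 | 0.19% | 1185 | 0.19% |
| monobook | 13239 | 2.11% | 13228 | 2.11% |
| timeless | 2685 | 0.43% | 2632 | 0.42% |
| vector | 36724 | 5.85% | 35124 | 5.59% |
| vector-2022 | 572419 | 91.15% | 574120 | 91.42% |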
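
To make the size of the difference concrete, the sketch below recomputes each skin's share from the counts in the table above and reports the shift in percentage points. The counts are copied from the table; the variable names and the `pct` helper are illustrative only and are not part of the original analysis.

```python
# (num_editors with global pref considered, num_editors without), from the table above.
counts = {
    "amethyst": (1, 1),
    "cologneblue": (472, 477),
    "minerva": (1243, 1220),
    "modern": (1204, 1185),
    "monobook": (13239, 13228),
    "timeless": (2685, 2632),
    "vector": (36724, 35124),
    "vector-2022": (572419, 574120),
}

total_with = sum(w for w, _ in counts.values())
total_without = sum(wo for _, wo in counts.values())


def pct(n, total):
    """Share of editors as a percentage."""
    return 100 * n / total


# Shift in percentage points: share with global pref minus share without it.
shifts = {
    skin: pct(w, total_with) - pct(wo, total_without)
    for skin, (w, wo) in counts.items()
}
largest = max(shifts, key=lambda skin: abs(shifts[skin]))

for skin, delta in sorted(shifts.items()):
    print(f"{skin:12s} {delta:+.2f} pp")
```

Both columns sum to the same 627987 editors, and the largest shift (for vector-2022) stays under 0.3 percentage points, which is the basis for treating the two distributions as practically equivalent at this activity level.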

The table below compares the skin distribution of slightly active editors (>=5 content edits per year on enwiki) with and without global preferences taken into account.

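| skin_name | num_editors (global pref considered) | editors_pct (global pref considered) | num_editors (global pref not considered) | editors_pct (global pref not considered) |
| --- | --- | --- | --- | --- |
| cologneblue | 160 | 0.08% | 162 | 0.08% |
| minerva | 391 | 0.20% | 381 | 0.20% |
| modern | 486 | 0.25% | 480 | 0.25% |
| monobook | 5211 | 2.69% | 5205 | 2.69% |
| timeless | 1071 | 0.55% | 1049 | 0.54% |
| vector | 18273 | 9.44% | 17304 | 8.94% |
| vector-2022 | 167919 | 86.77% | 168930 | 87.30% |
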
  • We still recommend taking global preferences into account to make the results more accurate, provided the analysis environment can access the centralauth database.
  • The previous analysis, which did not consider global preferences, can still be used as a baseline for comparison even if later analyses take global preferences into account.

Hello, could you include data about currently active (5+ edits/mo) and very active (100+ edits/mo) editors? They may be more heavily affected by global prefs.

And could you use a different term than "active" for 5 edits/yr? Perhaps "slightly active".
"active" traditionally meant 5 edits/month.