Page MenuHomePhabricator

Re-run active editors skin statistics
Open, NormalPublic

Description

This was run around a year ago: T147696: Statistics about /active/ users of skins on the Wikimedia cluster.

I'd like for this to be re-run so we can see what impact (if any) Timeless has made. I think @Isarra would also appreciate if we could also get stats for the fr projects in addition to the global stats. Thanks :)

Event Timeline

Legoktm created this task.Nov 17 2017, 11:51 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 17 2017, 11:51 PM
Neil_P._Quinn_WMF triaged this task as Low priority.Nov 18 2017, 12:10 AM

I'm happy to do this! But it's a busy time for me and I can't make any particularly promises about timing. Feel free to ping me in the future if you want an update :)

@Neil_P._Quinn_WMF will the queries posted on the older task still work? I think I have access to the right analytics machine, so would I be able to run this report myself?

Isarra added a comment.Dec 1 2017, 2:23 AM

Yeah, I'd love to have this now that Timeless has had a chance to stew everywhere for a bit. Be good to have an idea where we're starting from.

Yep, those queries would still work if you wanted to do it yourself. A few notes:

  • The query for creating the intermediate table pulls from my editor-month table. I'm still keeping that updated, so that's not an issue.
  • You'd need to install the multiquery package.
  • I took the multiquery output and manually adjusted it in a spreadsheet to total across wikis and assign users with deactivated skins set to Vector. If I were doing it again, I'd automate that and put it in a Jupyter notebook so it was trival to rerun.

Hope that helps!

First you'd need to restore the actual user preferences after they were polluted/corrupted by T114208.

Neil_P._Quinn_WMF added a comment.EditedDec 1 2017, 9:22 PM

First you'd need to restore the actual user preferences after they were polluted/corrupted by T114208.

As I said in that task:

While that was definitely a bad incident, I doubt it had much effect on this data. I looked only at the skin preferences of active users (using varying definitions of 'active') on the wiki where they're most active.
The preference corruption happened on September 30, 2015. I pulled this data on December 1, 2016. If an active user didn't change their preference back on the one wiki where they're most active given 14 months to do so, I think it's fair to say they don't care :)

First, that issue is far outside the scope of this task. Second, please stop beating this horse that died two years ago.

Nemo_bis added a comment.EditedDec 2 2017, 3:09 PM

The horse is very much alive: for instance I still find myself fixing corrupt preferences for my account pretty much every week. It's also a bit hard to claim that the reliability of the underlying data doesn't matter to your analysis, or that the analysis cannot be fixed to keep into account the data corruption.

Neil_P._Quinn_WMF added a comment.EditedDec 4 2017, 10:01 PM

It's also a bit hard to claim that the reliability of the underlying data doesn't matter to your analysis

The underlying data is the user preferences that are actually operational now, which these queries will accurately reflect. If you think more remediation for the preferences corruption needs to be done, you can take that discussion to the engineers and ops folks who work in that area. As far as I can tell, they disagree with you, and to be honest I trust them more.

or that the analysis cannot be fixed to keep into account the data corruption.

I am sure that with extensive effort it would be possible. In addition to being of low value for the reasons I described above, it would be totally unhelpful for the point of this task, which is understanding the adoption of Timeless.

revi added a subscriber: revi.Mar 21 2018, 5:23 PM

@Neil_P._Quinn_WMF it would be useful in the same query to have a look at "Minerva Neue/Minerva skin". I expect adoption is low, but I'm interested to get a sense of how worthwhile splitting it out from MobileFrontend was.

Bawolff added a subscriber: Bawolff.EditedApr 16 2018, 8:00 AM

I was curious. I did some one off queries just now for quick stats (This is not exactly what's being asked for here.) i did not do enwiki or commons due to slowness.

In case anyone is interested - of people in Special:ActiveUser (1 edit in last 30 days) excluding anon users, with various versions of "vector" (0 '' null vector) being merged, but not simple, etc being merged into vector:

[The wikis queried were chosen pretty much at random as big but not huge wikis]

frwiki: 17007 vector, 513 monobook, 53 timeless, 51 modern, 26 cologneblue, 4 minerva
hewiki: 2580 vector, 13 monobook, 1 timeless, 1 modern, 1 clogneblue, 0 minerva
arwiki: 4620 vector, 29 monobook, 12 modern, 6 timeless, 6 cologneblue, 3 minerva
zhwiki: 7123 vector, 198 monobook, 27 modern, 24 timeless, 12 cologneblue, 3 minerva
dewiki: 15273 Vector, 1372 monobook, 81 modern, 30 cologneblue, 29 timeless, 6 minerva
enwiktionary: 1332 vector, 51 monobook, 2 modern, 1 minerva, 1 cologneblue, 1 timeless
frwiktionary: 413 vector, 16 monobook, 9 timeless, 1 cologneblue, 0 modern, 0 minerva

[edit late addition - run about a week after the initial stats were done. Did the remaining top 10, then some random small wikis for variety]
enwiki: 111969 vector, 4200 monobook, 492 modern, 202 cologneblue, 184 timeless, 51 minerva, [30 invalid skins]
commonswiki: 26978 vector, 883 monobook, 55 modern, 17 timeless, 14 cologneblue, 0 minerva [ 1 invalid]
officewiki: 122 vector, 2 timeless. 0 any other skin
jawiki: 11769 vector, 66 monobook, 24 modern, 11 timeless, 11 cologneblue, 1 minerva [0 invalid]
ruwiki: 9843 vector, 138 monobook, 34 modern, 11 timeless, 1 cologneblue, 1 minerva [ 0 invalid]
itwiki: 7413 vector, 244 monobook, 12 modern, 8 cologneblue, 5 timeless, 1 minerva [2 invalid]
eswiki: 15634 vector, 341 monobook, 32 modern, 8 timeless, 4 cologneblue, 4 minerva [ 1 invalid]
ptwiki: 5897 vector, 141 monobook, 9 modern, 6 timeless, 5 cologneblue, 1 minerva [4 invalid]
plwiki: 3728 vector, 241 monobook, 15 modern, 7 cologneblue, 5 timeless, 0 minerva [ 0 invalid]
scowiki: 95 vector, 4 monobook, 0 everthing else
simplewiki: 965 vector, 15 monobook, 1 modern, 1 cologneblue, 0 minerva [ 0 invalid]
eowiki: 280 vector, 7 monobook, 0 everything else
elwiki: 828 vecotr, 7 monobook, 1 modern, 1 cologneblue
nowiki: 1400 vector, 66 monobook, 3 timeless, 3 modern, 1 cologneblue
metawiki: 3916 vector, 141 monobook, 10 timeless, 8 modern, 1 cologneblue, 1 minerva [0 invalid]
wikidatawiki: 16292 vector, 208 monobook, 17 timeless, 17 modern, 3 cologneblue, 0 minerva [1 invalid]
ruwiki: 9844 vector, 138 monobook, 34 modern, 11 timeless, 1 cologneblue, 1 minerva [0 invalid]
mediawikiwiki: 1396 vector, 25 timeless, 22 monobook, 3 modern, 1 minerva, 1 cologneblue [0 invalid]
wikitech: 88 vector, 3 monobook, 1 timeless

Query used:

select count(*), up_value from querycachetwo  straight_join user on user_name = qcc_title left join user_properties on up_user = user_id and up_property = 'skin' where qcc_type = 'activeusers' and qcc_namespace = 2 group by 2 order by 1 desc limit 30;

Main takeaway seems to be that the french like Timeless. Timeless seems to be between modern and cologneblue in popularity. There are not that many people using Minerva, but I think this is expected as it was just newly introduced, rather quietly, and I imagine it will take some time before people "discover it" [same thing could probably also be said of timeless]

Bawolff added a comment.EditedApr 21 2018, 4:21 AM

The other question, is what about "readers" and non-editors. How do they choose skins? Here is the breakdown of what skin users of enwiki (who are using a non-default skin) and registered after timeless was enabled (Nov 22, 2017), and what their edit count is. There are 1110731 users total in this time period:

mysql:research@analytics-store.eqiad.wmnet [enwiki]>   select IFNULL(concat( power(10,floor(log(user_editcount)/log(10))), ' - ', power(10,floor(log(user_editcount)/log(10))+1)),'0')  'editcount', up_value, count(*) from user inner join  user_properties  where up_property = 'skin'  and up_user = user_id and up_user > 32449779 group by 2,1 order by 2,1 limit 60;
+--------------+-------------+----------+
| editcount    | up_value    | count(*) |
+--------------+-------------+----------+
| 0            | cologneblue |      151 |
| 1 - 10       | cologneblue |       60 |
| 10 - 100     | cologneblue |        9 |
| 100 - 1000   | cologneblue |        1 |
| 0            | minerva     |      228 |
| 1 - 10       | minerva     |       95 |
| 10 - 100     | minerva     |       11 |
| 100 - 1000   | minerva     |        1 |
| 0            | modern      |      269 |
| 1 - 10       | modern      |      196 |
| 10 - 100     | modern      |       10 |
| 100 - 1000   | modern      |        2 |
| 0            | monobook    |      134 |
| 1 - 10       | monobook    |       71 |
| 10 - 100     | monobook    |       35 |
| 100 - 1000   | monobook    |       10 |
| 1000 - 10000 | monobook    |        1 |
| 0            | timeless    |      554 |
| 1 - 10       | timeless    |      226 |
| 10 - 100     | timeless    |       67 |
| 100 - 1000   | timeless    |        8 |
+--------------+-------------+----------+

And for comparision here is over a longer term. Its hard to make direct comparision as the number of users is different. Also this may be unfair to timeless since it goes back before it was a choice. (registered after Feb 27, 2017 18:12:10 UTC. There are 3057136 users total in this time period):

select IFNULL(concat( power(10,floor(log(user_editcount)/log(10))), ' - ', power(10,floor(log(user_editcount)/log(10))+1)),'0')  'editcount', up_value 'skin', count(*) '# of users' from user inner join  user_properties  where up_property = 'skin'  and up_user = user_id and up_user > 30470109 group by 2,1 order by 2,1 limit 60;
+------------------+-------------+------------+
| editcount        | skin        | # of users |
+------------------+-------------+------------+
| 0                | cologneblue |        615 |
| 1 - 10           | cologneblue |        337 |
| 10 - 100         | cologneblue |         40 |
| 100 - 1000       | cologneblue |          6 |
| 0                | minerva     |        508 |
| 1 - 10           | minerva     |        209 |
| 10 - 100         | minerva     |         30 |
| 100 - 1000       | minerva     |          5 |
| 0                | modern      |        998 |
| 1 - 10           | modern      |        657 |
| 10 - 100         | modern      |         91 |
| 100 - 1000       | modern      |         14 |
| 1000 - 10000     | modern      |          1 |
| 0                | monobook    |        547 |
| 1 - 10           | monobook    |        364 |
| 10 - 100         | monobook    |        136 |
| 100 - 1000       | monobook    |         33 |
| 1000 - 10000     | monobook    |          5 |
| 10000 - 100000   | monobook    |          1 |
| 100000 - 1000000 | monobook    |          2 |
| 0                | timeless    |        627 |
| 1 - 10           | timeless    |        253 |
| 10 - 100         | timeless    |         98 |
| 100 - 1000       | timeless    |         14 |
| 1000 - 10000     | timeless    |          2 |
+------------------+-------------+------------+

And just to throw some variety, here is what commons looks like (time period of registered later than Nov 22):

[commonswiki]>   select IFNULL(up_value, 'vector') 'skin', IFNULL(concat( power(10,floor(log(user_editcount)/log(10))), ' - ', power(10,floor(log(user_editcount)/log(10))+1)),'0')  'editcount', count(user_id) '# of users' from user left join  user_properties  on up_user = user_id and up_property = 'skin'  where   user_id > 6908927 group by 1,2 order by 1,2 limit 60;
+-------------+----------------+------------+
| skin        | editcount      | # of users |
+-------------+----------------+------------+
| cologneblue | 0              |         16 |
| cologneblue | 1 - 10         |         11 |
| cologneblue | 10 - 100       |          1 |
| minerva     | 0              |         14 |
| minerva     | 1 - 10         |          6 |
| modern      | 0              |         22 |
| modern      | 1 - 10         |         14 |
| modern      | 10 - 100       |          6 |
| monobook    | 0              |          8 |
| monobook    | 1 - 10         |         17 |
| monobook    | 10 - 100       |          4 |
| timeless    | 0              |         33 |
| timeless    | 1 - 10         |         18 |
| timeless    | 10 - 100       |          3 |
| timeless    | 100 - 1000     |          1 |
| vector      | 0              |     179584 |
| vector      | 1 - 10         |      62047 |
| vector      | 10 - 100       |       4429 |
| vector      | 100 - 1000     |        295 |
| vector      | 1000 - 10000   |         27 |
| vector      | 10000 - 100000 |          3 |
+-------------+----------------+------------+

Probably more careful analysis is needed to figure out what this means, but some take aways I had:

  • Timeless seems to be quite popular among new users who don't use the default skin. This is especially true among users with few to no edits, but its still a strong second place among new users with high edit count.
  • Monobook is surprisingly popular among new users. I used to assume that monobooks primary popularity was really old users who hate change, but it seems like monobook is popular even among newly registered users
    • In particular, monobook seems to be very popular among new users who make lots of edits. It would be interesting to know why. Maybe its partially cultural (All the oldhands use monobook, maybe new super active contributors want to fit in) or maybe monobook is just better at the high activity use case.

So what exactly does one need to actually run these queries? Are you guys doing this on labs or quarry or what? Could someone rerun this or tell me exactly how I would do it myself?

So what exactly does one need to actually run these queries? Are you guys doing this on labs or quarry or what? Could someone rerun this or tell me exactly how I would do it myself?

They were being run on the analytics db ( https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas ). Getting access to that generally requires an NDA, approval, WMF sponser, etc.

Skin preferences are generally considered private, so the public quarry servers only have user_properties_anon. This is an anonymized query based on

select cast(extract(year_month from user_touched)*100+1 as date) upa_touched, up_property, up_value 
from user_properties, user, meta_p.properties_anon_whitelist
WHERE user_id=up_user and up_property like pw_property;

Which basically means you get a list of skin properties and the year_month of when user_touched last changed. user_touched behaviour is probably not what most people think it is, so its hard to interpret this table other then to get the total number of people who have set their skin to timeless (e.g. select count(*) from user_properties_anon where up_property = 'skin' and up_value = 'timeless'). To do any sort of more complex analysis than that you need access to the unredacted analytics dbs.

So no way to check against users that have even been active since the skin was created, or types of users in terms of edits.

Would have been really nice if we could have just set up something automated for this. And, well, other such features, for that matter. What's the buy-in on any given extension, generally? Should I even be bothering to support it?

In T180860#4147344, @Bawolff ran some queries...

Dumb question, but do we have any idea what's with all the 0 edit users?

In T180860#4147344, @Bawolff ran some queries...

Dumb question, but do we have any idea what's with all the 0 edit users?

Before GlobalPreferences, lots of power users with Opinions™ on their skin had a script to go to every wiki and set their skin to whatever.

(For Vector, I have no idea.)

Before GlobalPreferences, lots of power users with Opinions™ on their skin had a script to go to every wiki and set their skin to whatever.
(For Vector, I have no idea.)

Wikimedia is truly strange.

Neil_P._Quinn_WMF added a subscriber: kzimmerman.

@Isarra, @Bawolff is right that this needs special access to run. In addition, it's not possible to simply re-run the code I wrote in 2016; since then, the Analytics MediaWiki replica databases have been split across multiple hosts, so it's no longer possible for an analyst to create a new table in the staging database (as I created a new table of active users) and join it with the user_properties table of each wiki.

However, it seems like a lot of people are interested; in addition to all the comments here, @Jdlrobson recently asked me for this to help see the impact of T223824. So, I will raise this with my manager @kzimmerman and see if we can make time for this.

Since I'm thinking about it, let me ask some questions I was wondering about:

  • As I understand it, the only valid skin preferences are currently vector, monobook, modern, cologneblue, minerva, and timeless. Any other preferences should be normalized to the default (currently vector). Is that right?
  • Now that we have global preferences, is it necessary to take those into account in some way? Are global preferences replicated in individual wiki databases?

@Isarra, @Bawolff is right that this needs special access to run. In addition, it's not possible to simply re-run the code I wrote in 2016; since then, the Analytics MediaWiki replica databases have been split across multiple hosts, so it's no longer possible for an analyst to create a new table in the staging database (as I created a new table of active users) and join it with the user_properties table of each wiki.
However, it seems like a lot of people are interested; in addition to all the comments here, @Jdlrobson recently asked me for this to help see the impact of T223824. So, I will raise this with my manager @kzimmerman and see if we can make time for this.
Since I'm thinking about it, let me ask some questions I was wondering about:

  • As I understand it, the only valid skin preferences are currently vector, monobook, modern, cologneblue, minerva, and timeless. Any other preferences should be normalized to the default (currently vector). Is that right?

Technically no (although the difference is maybe too small to care). Certain numeric values will get converted to cologneblue or monobook that are artifacts ffrom before we used string names (i dont remember the conversion, you would have to look it up in the php)

  • Now that we have global preferences, is it necessary to take those into account in some way? Are global preferences replicated in individual wiki databases?

Yes. When i was running the stats i did i checked the global prefs to see if it affected much. It didnt seem to at the time and it significantly would have complicated things (esp. With split analytics db) so i decided to ignore at the time. Global prefs are stored in the centralauth db. They are not replicated to individual wikis (although users can override on a per wiki basis i think)

In T180860#4147344, @Bawolff ran some queries...

Dumb question, but do we have any idea what's with all the 0 edit users?

Before GlobalPreferences, lots of power users with Opinions™ on their skin had a script to go to every wiki and set their skin to whatever.
(For Vector, I have no idea.)

I dont remember when global prefs happened but i think my queries were post global prefs. That query was counting people who dont set a skin as vector.

kzimmerman raised the priority of this task from Low to Normal.
kzimmerman moved this task from Triage to Next Up on the Product-Analytics board.
kzimmerman removed a project: Contributors-Analysis.

It sounds like @Jdlrobson just asked @MNeisler a similar question, so looping them into this ticket.

I'm seeing questions about both editors (with varying degrees of activity) and non-editors/readers, and some questions about use over time. I'm guessing that the "over time" part would be a lot more involved, so maybe a comparison of current data to previous queries would help with product decisions?

Planned for end of Q2; please update ticket if more urgent needs relating to this arise.