
Contribution inequality graphs for Wikistats
Open, Low, Public

Description

After seeing the presentation for WikiChron, T192652, I'd be excited to see inequality statistics for Wikimedia projects. For example,

  • Gini coefficient for edits
  • 10:90 ratio
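
As a rough sketch of the two metrics above, assuming the input is just a list of edit counts with one entry per user: the Gini coefficient is written straight from its mean-absolute-difference definition, which compares every pair of users and is therefore quadratic (fine for small examples only). The "10:90 ratio" is read here as the number of edits made by the top 10% of editors divided by the number made by the remaining 90%. That reading is an assumption, since the task does not define the ratio, and a percentile-based 90/10 ratio is another common convention. Both function names are hypothetical.

```python
from typing import Sequence


def gini_by_definition(edits: Sequence[int]) -> float:
    """Gini coefficient from its definition:
    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean), over all ordered pairs.
    O(n^2) in the number of users, so only suitable for small examples."""
    n = len(edits)
    total = sum(edits)
    if n == 0 or total == 0:
        return 0.0
    mean = total / n
    abs_diff_sum = sum(abs(a - b) for a in edits for b in edits)
    return abs_diff_sum / (2 * n * n * mean)


def ratio_10_90(edits: Sequence[int]) -> float:
    """Assumed reading of the '10:90 ratio': edits by the top 10% of editors
    divided by edits by the remaining 90% (top group rounded down, minimum 1)."""
    ranked = sorted(edits, reverse=True)
    n_top = max(1, len(ranked) // 10)
    top = sum(ranked[:n_top])
    rest = sum(ranked[n_top:])
    return top / rest if rest else float("inf")
```

With nine users making one edit each and one user making 91 edits, the Gini coefficient is 0.81 and the ratio is 91/9, about 10.1.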

There is some academic precedent for using these measures to understand social dynamics on-wiki,

Event Timeline

awight created this task. May 19 2018, 8:41 AM
Restricted Application added projects: Analytics, Product-Analytics. May 19 2018, 8:41 AM
awight added a subscriber: Halfak. May 19 2018, 8:41 AM
Vvjjkkii renamed this task from Contribution inequality graphs for Wikistats to qpcaaaaaaa. Jul 1 2018, 1:09 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
CommunityTechBot renamed this task from qpcaaaaaaa to Contribution inequality graphs for Wikistats. Jul 1 2018, 6:46 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
Milimetric triaged this task as Low priority. Mar 9 2020, 4:26 PM
Milimetric moved this task from Backlog (Later) to Wikistats on the Analytics board.
Nemo_bis added a subscriber: Nemo_bis. Edited Mar 10 2020, 10:12 AM

Gini coefficient for edits

I've never tried anything very serious, but in my experience the Gini coefficient is needlessly expensive to compute, while the Theil index is very easy. We've been using the Theil index for a while on https://tools.wmflabs.org/erwin85/xcontribs.php and also on https://tools.wmflabs.org/pageviews/ to decide whether to use a log scale.
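
The comment doesn't say which variant of the Theil index is meant. Assuming the usual Theil T index, a minimal sketch shows why it is cheap: once the mean is known, it is a single pass over the per-user edit counts, with no sorting and no pairwise comparison. The function name is hypothetical and unrelated to the tools linked above.

```python
import math
from typing import Sequence


def theil_index(edits: Sequence[int]) -> float:
    """Theil T index: T = (1/N) * sum((x/mu) * ln(x/mu)), using 0 * ln(0) = 0.
    Ranges from 0 (everyone equal) to ln(N) (one editor made every edit)."""
    n = len(edits)
    total = sum(edits)
    if n == 0 or total == 0:
        return 0.0
    mean = total / n
    # Users with zero edits contribute nothing by the 0 * ln(0) = 0 convention.
    return sum((x / mean) * math.log(x / mean) for x in edits if x > 0) / n
```

Unlike the Gini coefficient, the upper bound ln(N) grows with the number of users, so values from wikis of very different sizes are not directly comparable without normalising.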

Great to learn about this, thank you for mentioning it. @Jan_Dittrich was also doing some work along these lines and might be interested in the Theil index.

Are you sure it is that expensive?
In WikiChron we don't have big resources, and we are still able to compute the Gini coefficient on very large wikis. It's just a matter of getting the number of edits per user and doing some calculations with them. (You can see the actual code here)
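
A sketch of that two-step process, assuming the edit log arrives as hypothetical (user, page) rows: count the edits per user, then sort the counts once and apply the rank-weighted form of the Gini coefficient. Sorting makes this O(n log n) in the number of users instead of the O(n²) of comparing every pair. This is not WikiChron's actual code, and the function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def edits_per_user(edit_rows: Iterable[Tuple[str, str]]) -> list:
    """Collapse edit rows into one edit count per user.
    The (user, page) row shape is a hypothetical stand-in for a real edit log."""
    counts = Counter(user for user, _page in edit_rows)
    return list(counts.values())


def gini_sorted(counts: Sequence[int]) -> float:
    """Gini coefficient via the rank-weighted sum of sorted values:
    G = 2 * sum(i * x_(i)) / (n * S) - (n + 1) / n, with ranks i = 1..n."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For example, the rows ("A", "p1"), ("A", "p2"), ("A", "p3"), ("B", "p1") give counts [3, 1] and a Gini coefficient of 0.25, the same value the pairwise definition gives.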

I wrote a small Python library that includes these two inequality metrics (https://github.com/Grasia/inequality_coefficients/). However, it doesn't use Apache Spark, Hadoop, or any other MapReduce or big data framework.
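
For the Theil index in particular, a MapReduce-style split would be straightforward, because it can be assembled from three sums that simply add up across partitions: the number of users N, the edit total S, and the sum of x ln x, giving T = (sum of x ln x)/S − ln(S/N). The sketch below only mimics the map and reduce steps with plain Python (functools.reduce) rather than Spark or Hadoop; the partitioning and all names are hypothetical, and none of this is part of the inequality_coefficients library. The sorted-rank Gini formula, by contrast, needs a global ordering of all users, which is harder to split this way.

```python
import math
from functools import reduce
from typing import Iterable, NamedTuple, Sequence


class TheilPartial(NamedTuple):
    """Partial sums for one partition of users (hypothetical structure)."""

    n: int  # number of users in the partition
    total: int  # sum of their edit counts, S
    x_log_x: float  # sum of x * ln(x) over users with x > 0


def map_partition(counts: Iterable[int]) -> TheilPartial:
    """'Map' step: summarise one partition of per-user edit counts."""
    n, total, x_log_x = 0, 0, 0.0
    for x in counts:
        n += 1
        total += x
        if x > 0:  # convention: 0 * ln(0) = 0
            x_log_x += x * math.log(x)
    return TheilPartial(n, total, x_log_x)


def merge(a: TheilPartial, b: TheilPartial) -> TheilPartial:
    """'Reduce' step: partial sums simply add up."""
    return TheilPartial(a.n + b.n, a.total + b.total, a.x_log_x + b.x_log_x)


def theil_from_partial(p: TheilPartial) -> float:
    """T = sum(x ln x) / S - ln(S / N), equal to (1/N) * sum((x/mu) ln(x/mu))."""
    if p.n == 0 or p.total == 0:
        return 0.0
    return p.x_log_x / p.total - math.log(p.total / p.n)


def theil_partitioned(partitions: Sequence[Sequence[int]]) -> float:
    partials = [map_partition(part) for part in partitions]
    return theil_from_partial(reduce(merge, partials, TheilPartial(0, 0, 0.0)))
```

However the users are split into partitions, the merged result is the same (up to floating-point rounding), which is what makes this form suitable for distributed computation.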

Quasipodo added a comment. Edited Mar 27 2020, 7:23 AM

Hello there!

I'm considering doing GSoC with the WMF this year. Would you see implementing this (and maybe adding some other metrics) as a project idea?

+1, I can mentor you unless someone else was already planning on doing it.

According to the WMF GSoC coordinators, it is best to have two mentors instead of one, so anyone else, feel free to jump in :)

@Quasipodo If you plan on submitting a proposal for this task, please note that the deadline is 31st March 6 pm UTC, in ~28 hours. :)

@Quasipodo I am happy to jump in as a second mentor. Though I am on the Research team rather than in Analytics, I hope I can be of help in this project.

I recently calculated the Hoover index for some contribution metrics, and it was rather easy to do using a table with the columns "edit count | count of accounts with this edit count", which has somewhere between 100 and 1000 rows. The Hoover index also seems rather easy to understand: it works without an area under the curve and varies between 0 (all equal) and 1 (maximally unequal).
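
A sketch of the Hoover index computed directly from such a frequency table, assuming the table is given as a mapping from edit count to number of accounts (a hypothetical input shape and function name). The cost depends only on the number of distinct edit counts, i.e. the rows of the table, not on the number of accounts. H can be read as the share of all edits that would have to be moved between accounts to give every account the same count.

```python
from typing import Mapping


def hoover_from_table(accounts_by_edit_count: Mapping[int, int]) -> float:
    """Hoover index from a frequency table {edit count: number of accounts}.
    H = sum(c_k * |e_k - mean|) / (2 * S), where S is the total number of edits
    and mean is the average edit count per account."""
    n_accounts = sum(accounts_by_edit_count.values())
    total_edits = sum(e * c for e, c in accounts_by_edit_count.items())
    if n_accounts == 0 or total_edits == 0:
        return 0.0
    mean = total_edits / n_accounts
    abs_dev = sum(c * abs(e - mean) for e, c in accounts_by_edit_count.items())
    return abs_dev / (2 * total_edits)
```

For the table {1: 9, 91: 1} (nine accounts with one edit, one account with 91), H = 0.81: 81 of the 100 edits sit above the mean of 10 edits per account.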

I am not sure whether the Hoover index has particular disadvantages, such as scores changing with higher data resolution or depending on the units of measurement; so far I have not found a summary of its caveats.
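
Two of these questions can at least be checked numerically. The sketch below (hypothetical helper name, toy numbers) shows that the Hoover index does not depend on the unit of measurement, since multiplying every value by the same positive factor leaves it unchanged, whereas adding a constant to every value does change it. It also shows that with N accounts the maximum is 1 − 1/N rather than exactly 1. It does not settle how the score behaves when the data is binned more coarsely or more finely.

```python
from typing import Sequence


def hoover(values: Sequence[float]) -> float:
    """Hoover index of a list of per-account values (0 = all equal)."""
    total = sum(values)
    if len(values) == 0 or total == 0:
        return 0.0
    mean = total / len(values)
    return sum(abs(v - mean) for v in values) / (2 * total)


edits = [1, 2, 2, 5, 40]

# Multiplying every value by the same positive factor (a change of unit)
# leaves the index unchanged, up to floating-point rounding.
assert abs(hoover(edits) - hoover([3.5 * e for e in edits])) < 1e-12

# Adding the same constant to every value is not a change of unit and
# lowers the index here.
assert hoover([e + 10 for e in edits]) < hoover(edits)

# With N accounts the maximum is 1 - 1/N (one account holds everything),
# so small populations cannot reach 1.
assert abs(hoover([0, 0, 0, 0, 10]) - (1 - 1 / 5)) < 1e-12
```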

Thanks – that’s great! I had written the calculation in JavaScript but did not yet have an R version.