
Please review: public data sets for the WDCM Biases Dashboard
Closed, Resolved, Public

Description

Please review the following data sets and image files, which need to be made public (i.e. placed in https://analytics.wikimedia.org/datasets/wdcm/) for the WDCM Biases Dashboard:

  • genderProjectDataSet.csv
  • globalIndicators.csv
  • mfPropProject.csv
  • genderUsage_Distribution.png
  • genderUsage_Distribution_jitterp.png
  • M_Items_Distribution.png
  • F_Items_Distribution.png
  • Gender_LorenzCurves.png

All PNG files are produced from aggregated data only. The three CSV tables contain no private data. They comprise only Wikidata usage statistics, produced by combining data from the WDQS (Wikidata Query Service) with data obtained by querying the WDCM Hive tables, so nothing in them refers to any particular user.

All files are currently stored on stat1005 in /home/goransm/RScripts/_pdReview/WDCM_Biases/.

A README.txt describing the files is also present in this directory.

Thank you.

Event Timeline

+1 from me, @GoranSMilovanovic: this doesn't involve any user data, individual or aggregate; it's just content statistics. It's always safe to publish data that's 100% based on publicly available information, and doing so is usually a service to the community. In this case, some of the findings are very interesting!