
Please review: public data sets for the WDCM Biases Dashboard
Closed, Resolved, Public


Please review the following data sets and files, which need to be made public (i.e., placed in a publicly accessible location) for the WDCM Biases Dashboard:

  • genderProjectDataSet.csv
  • globalIndicators.csv
  • mfPropProject.csv
  • genderUsage_Distribution.png
  • genderUsage_Distribution_jitterp.png
  • M_Items_Distribution.png
  • F_Items_Distribution.png
  • Gender_LorenzCurves.png
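Gender_LorenzCurves.png shows Lorenz curves, a standard way to visualize how unevenly usage is concentrated across items. The following is a minimal sketch of how such a curve (and the Gini coefficient derived from it) can be computed from a list of per-item usage counts. It is an illustration under assumptions and not the code used to produce the figure. The function names and the input format (a flat list of non-negative usage counts, one per item) are hypothetical.

```python
from itertools import accumulate


def lorenz_curve(counts):
    """Return (population_share, usage_share) points of a Lorenz curve.

    counts: hypothetical per-item usage counts (non-negative numbers).
    Items are sorted from least to most used; the curve starts at (0, 0)
    and ends at (1, 1).
    """
    values = sorted(counts)
    total = sum(values)
    if not values or total <= 0:
        raise ValueError("need at least one positive usage count")
    n = len(values)
    # x: cumulative share of items; y: cumulative share of total usage.
    xs = [0.0] + [(i + 1) / n for i in range(n)]
    ys = [0.0] + [c / total for c in accumulate(values)]
    return xs, ys


def gini(counts):
    """Gini coefficient = 1 - 2 * (area under the Lorenz curve)."""
    xs, ys = lorenz_curve(counts)
    # Trapezoidal rule over the piecewise-linear Lorenz curve.
    area = sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
        for i in range(1, len(xs))
    )
    return 1 - 2 * area


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))   # perfectly even usage
    print(gini([0, 0, 0, 10]))  # all usage on a single item
```

With perfectly even usage, `gini([1, 1, 1, 1])` is 0.0. When all usage falls on one of four items, `gini([0, 0, 0, 10])` is 0.75. With n items the discrete maximum is (n - 1)/n rather than 1.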

All PNG files are produced from aggregated data only. The three CSV tables contain no private data. They cover only Wikidata usage data, produced by combining data from the Wikidata Query Service (WDQS) with data obtained by querying the WDCM Hive tables, so nothing in them refers to any particular user.

All files are currently stored on stat1005 in /home/goransm/RScripts/_pdReview/WDCM_Biases/

The README.txt is also present in this directory.

Thank you.

Event Timeline

+1 from me, @GoranSMilovanovic. This doesn't involve any user data or aggregated user data; it's just content statistics. It's always safe to publish data that is 100% based on publicly available information, and doing so is usually a service to the community. In this case, some of these findings are very interesting!