WDCM: Wikidata Usage Biases
Closed, Resolved (Public)

Description

  • Develop statistical indexes to describe biases (gender divide, north-south divide, and similar) in Wikidata usage.
  • Serve as WDCM Biases dashboard.

Event Timeline

GoranSMilovanovic renamed this task from Њ to WDCM: Wikidata Usage Biases.
GoranSMilovanovic claimed this task.
GoranSMilovanovic triaged this task as Medium priority.
GoranSMilovanovic updated the task description.
GoranSMilovanovic added a subscriber: Lydia_Pintscher.
  • ETL completed;
  • Dashboard back-end engineering (almost) completed.

(Finally - this really took a lot to complete) the Dashboard is ready to go public.

Please review carefully: http://wdcm.wmflabs.org/WDCM_BiasesDashboard/ and let me know of anything that you would like changed.

For me, the results on Gender Bias in Wikidata item usage were so stunning that I have definitely decided to write a blog post on this. Of course, it has nothing to do with Wikidata in itself: it's simply a representation of the shared knowledge, assumptions, and sentiments in the editor community, and (given the size of the community) it thus truly reflects the state of the problem in the world in general. When I started working on this, I knew I would obtain a huge gender bias; but the results that I am seeing are way beyond my expectations. Example: please check out the Gender Bias per Occupation tab, and scroll down to the "Occupations were Female items are mentioned more than Male items" table.

Please, @Lydia_Pintscher , @Lea_Lacroix_WMDE: your feedback on this Dashboard is very important. No rush, but I really need to hear from you at some point. Thank you.

In the meantime, I am waiting for the resolution of T189653 to be able to make the data sets public and feed the dashboard, which is running on a Labs instance, from Production; then I will set up a monthly update for it.

@GoranSMilovanovic Looks great! Thanks for adding other genders :)

@Lydia_Pintscher
@Lea_Lacroix_WMDE

The first version of this Dashboard is finally deployed: http://wdcm.wmflabs.org/WDCM_BiasesDashboard/, and can be announced.

The Dashboard is now running on regular monthly updates. For many reasons, this was really tricky.

Please review and let me know what you think (I am not completely satisfied with some of the visualizations, but at this point I am calling it: good enough).

Thank you for your patience.

P.S. @Lea_Lacroix_WMDE Since I now have many interesting empirical findings on gender usage from Wikidata, the currently missing sections of the WDCM Journal will be gradually populated by them. I think that should cover most of March 2018.

  • Dashboard Update broken per T180891 - investigating now.
  • False alarm: the matter was related to another WDCM dashboard (WDCM Geo), not WDCM Biases.
  • Dashboard update broken again (this time for real).

Hello

I just discovered this Dashboard and found it very interesting.
I particularly love the "Occupations were Female items are mentioned more than Male items" table, which is perfect for my own needs as someone active in the Wiki Loves Women and Les sans pagEs projects.

That said, two things:

  1. Obviously, this tool overlaps with Dicare Tools (https://denelezh.dicare.org/gender-gap.php). It brings in very interesting new elements but does not go into fine detail on the matter of "occupation". As project leader of Wiki Loves Women, one of my needs is fine-grained information, and the ability to sort by year of birth, project, and country is an important feature for me. Dicare Tools will close at the end of October, so the invaluable source of information it provides to my team will be lost and not replaced by the WDCM. This leads me to ask whether you intend a V2 of the WDCM (has it been considered? discussed? funded?) that would go into more detail; that would be awesome for us.

Practically speaking, for example, "Occupations were Female items are mentioned more than Male items" is super interesting, but if it could be sorted by the nationality of the people concerned... it would be invaluable. So I am wondering what your plans are for the future...

  2. I would suggest that the last tab, with its separation of results into "North" and "South"... is... actually not the best choice you could make. I *do* understand that a choice needs to be made, and this one is not irrelevant or wrong. It is simply that the issue has been discussed quite a bit within the community, and there seems to be a sort of agreement that, for our community, a geographical definition is not the right thing to use. We want the community to grow, we want the article base to expand, we want the quality of content to increase, and the way we tackle this is not based on whether we are North or South. Rather, it depends on "well represented communities" and "poorly represented communities". There are areas in the North which are very badly covered. There are areas in the South which are well covered. It also depends on how exactly you define North and South... Would that be purely geographical? Or with some tweaks to make it more political (e.g. including Australia in the North)?

This North/South divide has been criticized because it is vague and not necessarily a good representation of our current situation. We need to be more proactive in some areas, but these are not "the South"; they are "underrepresented areas".
So when it comes to project leaders who need to communicate about the URGENT and IMPORTANT needs, the North/South approach just does not do. There are other options. Perhaps continents. Perhaps cultural influence areas (as in WIGI). Perhaps countries (as in Dicare Tools).
For me, the approach by country is the most interesting one as a working tool, as it provides the fine granularity I need when running a project. But for general communication, an approach based on continent or subcontinental area might be the best one.

Thank you for the WDCM

Hello @Anthere,

Thank you very much for your feedback.

Even though the two projects address the same topic, gender bias-related data, they are quite different. While the WDCM dashboard focuses on Wikidata usage across the WMF projects (e.g. how many times and where a particular Wikidata item is mentioned across all WMF projects), the Dicare gender gap tool assesses Wikidata statistics themselves (e.g. how many items in Q5 have gender = female, and similar).
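
As a rough illustration of this distinction, here is a toy sketch in Python. The data, item IDs, and function names are all made up for the example; this is not code from either tool, only a way to show that counting items and counting item usage are different measures.

```python
from collections import Counter

# Hypothetical toy data (not real Wikidata items or usage figures).
# ITEMS maps an item ID to its gender value; USAGE lists (item ID, project) mentions.
ITEMS = {"Q1": "female", "Q2": "male", "Q3": "male"}
USAGE = [("Q1", "enwiki"), ("Q1", "frwiki"), ("Q1", "dewiki"), ("Q2", "enwiki")]


def items_per_gender(items):
    """Item-statistics view: how many items carry each gender value."""
    return Counter(items.values())


def usage_per_gender(items, usage):
    """Usage view: how many times items of each gender are mentioned across projects."""
    return Counter(items[item_id] for item_id, _project in usage if item_id in items)


print(items_per_gender(ITEMS))
print(usage_per_gender(ITEMS, USAGE))
```

In this invented data set, male items are more numerous while female items are mentioned more often, so the two views can point in different directions.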

We are aware that Dicare Tools is about to be closed, and we are sad about that, because this tool is widely used and very useful for the community, especially for the projects working on the gender gap. However, this project is maintained by a volunteer, and their decision to shut down the tool is not in our hands. There are plenty of community tools based on Wikidata, and we cannot simply decide to take over one of them. We strongly think that the tools built by and for the community should stay in the community. If resources are a problem, the Foundation provides solutions such as wmflabs.org for hosting, and the IEG grants for funding. We hope that another volunteer will have the time and resources to take over the project and continue improving it. I can offer my help with this matter.

The WDCM Biases Dashboard is a project developed at the beginning of last year with a precise goal (showing the existing biases in the usage of Wikidata's data) that has been achieved, and no further development is planned. We may continue working on different kinds of biases next year.

Thanks for your feedback about South/North. This raises a central question: what categories in the real world (e.g. North/South, Male/Female, Global/Continental/Regional/Country level of aggregation, and similar) are really useful to be matched with the statistics that we can obtain through Wikidata/WDCM about the structure of our projects?