While discussions of the gender gap on Wikipedia often focus on the number of articles about people of different gender identities, another core consideration regarding bias is the distribution of links on Wikipedia -- i.e., it's not enough for content about women to be available, it should also be discoverable. Research has demonstrated that disparities do exist: Wagner et al. showed "...that there exists a bias in the generation of links by Wikipedia editors, favoring articles about men." Adding appropriate links within Wikipedia to articles about women and individuals with non-binary identities is therefore an important part of addressing some of these systemic biases on Wikipedia (example).
One potential approach to making these link gaps more visible is to expose data about the gendered nature of links on Wikipedia. To this end, I have already created a simple API that takes any article on Wikipedia and provides data on the gender distribution of the links in that article -- e.g., https://article-gender-data.wmcloud.org/api/v1/details?lang=en&title=Computer_programming. The API works for any Wikipedia article in any language but lacks a nice interface for showing the data.
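As a rough illustration of how a script might address the API, the sketch below builds a request URL in the same form as the example above. The helper name `buildDetailsUrl` is hypothetical; only the base URL and the `lang` and `title` query parameters come from the example.

```typescript
// Base endpoint taken from the example URL in this task description.
const API_BASE = "https://article-gender-data.wmcloud.org/api/v1/details";

// Hypothetical helper: builds the request URL for one article.
function buildDetailsUrl(lang: string, title: string): string {
  // Wikipedia titles use underscores in place of spaces.
  const normalized = title.trim().replace(/ /g, "_");
  return (
    `${API_BASE}?lang=${encodeURIComponent(lang)}` +
    `&title=${encodeURIComponent(normalized)}`
  );
}
```

In a user script, the two arguments could plausibly come from `mw.config.get("wgContentLanguage")` and `mw.config.get("wgPageName")`, though that choice is an assumption rather than part of the existing API.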
The goal of this project would be to build a simple user script that visualizes the data for Wikipedia articles as a user browses. There would be a few potential components to this:
- A button for turning the script on / off. The latency of the API isn't particularly high, but since gathering the data requires several API calls, it would be nice to make the request only when necessary.
- A simple summary of the distribution of links -- e.g., see the mock-up below, which uses the same colors as the Humaniki dashboard.
- Link-specific shading so that it's also possible to see if there are differences in the prominence of links to individuals of different gender identities -- e.g., see mock-up below.
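The summary component could be driven by a small aggregation step like the sketch below, which turns per-link data into the shares that a bar would display. The `LinkRecord` shape (a `gender` field that is `null` for links to non-people) and the names `summarizeLinks` and `Segment` are assumptions made for illustration; the actual API response may be structured differently.

```typescript
// Assumed shape of one outgoing link; the real API response may differ.
interface LinkRecord {
  title: string;
  gender: string | null; // null is assumed to mean "not a person"
}

// One segment of the summary bar.
interface Segment {
  label: string;
  count: number;
  share: number; // fraction of all links, between 0 and 1
}

// Counts links per label and converts the counts to shares.
// Returns an empty array when there are no links.
function summarizeLinks(
  links: LinkRecord[],
  nonPersonLabel: string = "non-person"
): Segment[] {
  const counts: Record<string, number> = {};
  for (const link of links) {
    const label = link.gender !== null ? link.gender : nonPersonLabel;
    counts[label] = (counts[label] || 0) + 1;
  }
  return Object.keys(counts).map((label) => ({
    label,
    count: counts[label],
    share: counts[label] / links.length,
  }));
}
```

The same labels could also be used to choose a highlight class for each link, so the summary and the link shading stay consistent. Dropping the non-person segment before rendering would be one way to exclude links to non-people.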
Contact is probably easiest via Phabricator, but you can also find me on IRC at #wikimedia-research as isaacj. I'm based on the east coast of the United States (UTC-4), so I may be faster or slower to respond depending on the time of day. If there's a better medium for communication, feel free to ask.
Summary (grey/orange/blue bar on top of article; could potentially also label the colors, show labels on hover, or exclude data on links to non-people):
Link highlighting (orange/blue highlights on links to people):