Improve understanding and description of Wikidata gender labels
Closed, ResolvedPublic

Description

During our presentation at Queering Wikipedia, we had confirmation that gender labels on Wikidata taken from P21 (https://w.wiki/6hma) do not accurately represent the notion of gender.
We are going to learn more about this and potentially change the way we label articles in the knowledge gaps index.

Details

Other Assignee
fkaelin

Event Timeline

This comment was removed by Miriam.

@fkaelin as you attended the session at Queering Wikipedia, are there any specific action items that we want to add here? Otherwise we might close this task for now. Thanks!

This comment was removed by Miriam.

I attended the session "Lectures and workshop on gender in Wikidata" at Queering Wikipedia, and pasted some bullet points for the two talks - the slides/videos are available on the website.

Takeaway

I don't have an immediate recommendation on how to improve the features used for the gender gap. Using the P21 Wikidata property seems like the best option currently available to us. The presentations illustrated not only the issues with this approach, but also the difficulty of devising and introducing an improved schema. Such an improvement is a large community effort and outside the scope of the knowledge gaps index. However, we should definitely switch the knowledge gaps pipeline to a more appropriate schema once one is available.

Presentation summaries

Talk by Daniele Metilli

  • center on marginalized gender identities
    1. modeling of gender in wikidata
    2. how is gender represented in wikidata
  • capture user discussions happening on Wikipedia about gender
  • P21 (gender or sex) is the main way gender is modelled in Wikidata
    • somewhat ambiguous (not clear whether it refers to sex assigned at birth or gender identity)
    • confuses users
    • after creation of P21, the question was which values do we use and how do we populate this?
    • two groups: one focused on completeness, one on correctness. Can't have both, but both can be metrics
      • completeness won. in 2 years, 95% of people have gender assigned.
      • happened quickly, at what cost? added quickly, contains errors.
      • mistaken assumptions:
        • e.g. each person must have a single gender value
        • difficult for non-binary/trans
      • how was gender assigned
        • manually annotated
        • imported from wikipedia, not a trusted source
        • also imported from external databases
        • two problematic approaches
          • based on the name of the person
          • based on pronouns used e.g. in wikipedia articles
    • transgender man, transgender woman
      • was renamed to trans man, trans woman
      • until 2016 was wrongly classified, e.g. trans woman was not classified as woman
    • gender properties
      • father/mother are all gendered
      • some are corrected, e.g. brother/sister as sibling
      • mother / father not replaced with parent by community, though parent was added
    • language
      • users' understanding is very much tied to language
      • e.g. france/germany doesn't have a term for sibling
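
To illustrate the single-value assumption noted above, here is a minimal Python sketch of how P21 values could be turned into labels while keeping every value an item carries, rather than collapsing each person to one gender. It is not the knowledge gaps pipeline's actual code: the function names, the "other" and "unknown" buckets, and the example item keys are all hypothetical, and the QID table is only a small subset of the values used with P21.

```python
from collections import Counter

# A small subset of Wikidata items used as P21 values (illustrative only).
P21_LABELS = {
    "Q6581097": "male",
    "Q6581072": "female",
    "Q1052281": "trans woman",
    "Q2449503": "trans man",
    "Q48270": "non-binary",
}


def gender_labels(p21_values):
    """Return a sorted list of labels for all P21 values of one item.

    Hypothetical helper: items without a P21 statement are labelled
    "unknown"; values missing from the table fall into "other".
    """
    if not p21_values:
        return ["unknown"]
    return sorted({P21_LABELS.get(qid, "other") for qid in p21_values})


def label_counts(items):
    """Count labels over a mapping of item id -> list of P21 values.

    An item with several P21 values contributes to several labels,
    so the counts can sum to more than the number of items.
    """
    counts = Counter()
    for p21_values in items.values():
        counts.update(gender_labels(p21_values))
    return counts


# Hypothetical items, keyed by placeholder ids rather than real QIDs.
example_items = {
    "item_a": ["Q6581072"],
    "item_b": ["Q1052281", "Q6581072"],
    "item_c": [],
}
example_counts = label_counts(example_items)
```

Counting per label rather than per item makes the multi-value case visible, at the cost of totals that no longer match the number of articles; whether that trade-off suits the index is an open question.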

Recommendations from the personal pronouns project

  • prerecorded talk: https://www.youtube.com/watch?v=WRIl6W8n7io
  • P21 used for
    • humans: living and deceased. fictional and non-fictional
    • animals
    • sex/gender and gender modality are distinct properties in living humans
    • P21 conflates these, plus no references are required
    • when references are used, inconsistent, without consent. might also violate privacy of living people.
  • personal pronouns because it seemed more tractable, still very complex
    • small number of existing statements
    • limited to enwiki due to the authors' limits of knowledge, hoping it would spread
    • P6553 personal pronoun
      • concern around accuracy, incorrect pronouns are assigned
      • conflation between personal pronouns and gender
      • P21 is used to assign P6553, can be incorrect
      • enwiki article on personal pronouns, lots of controversy (see talk page)
    • Tension between editors focused on honoring identity/feelings vs editors who focus on completeness
      • lack of guidance when to add personal pronouns
      • what requires a reference? reference quality varies widely
      • what to do when a person's personal pronouns change?
    • Issues with data modeling
      • individual
    • Proposed changes

Thank you @fkaelin for this great summary! I will close this task for now as we have reached a conclusion about this specific topic.