The Wikimedia Foundation's Research team has partnered with the Fair Ranking Track at TREC (a long-standing text retrieval benchmarking conference) for 2021. As a partner, our role is to work with the track organizers to identify and provide a Wikimedia-related dataset and a specific question for the track's participants to work on and compete over. (For example, in 2019 the track partnered with Semantic Scholar from the Allen Institute for Artificial Intelligence to design a competition on fair ranking of scholarly paper abstracts.)
NIST's interest is in how to measure and audit systems in terms of fairness.
The dataset will focus on English Wikipedia WikiProjects and on building fairly ranked lists of the articles relevant to a WikiProject. For example, for WikiProject Jazz: which articles are relevant, and how do you rank them in a way that fairly represents different gender identities and geographic regions? This initial challenge will cover English Wikipedia only, so that effort goes into the chosen fairness aspects rather than into the challenges of working with multilingual data. Future challenges may expand to other languages, but English Wikipedia is one of only a few wikis that use the PageAssessments extension, which greatly simplifies identifying WikiProjects and the articles tagged as relevant to them (along with any quality and importance ratings).
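As a rough illustration of why PageAssessments simplifies this step, the sketch below builds a MediaWiki Action API query using `prop=pageassessments` and pulls out the articles tagged for one WikiProject. It is not the track's actual data pipeline. The helper names, the placeholder article titles, and the sample response (which only mimics the shape the API is expected to return) are assumptions made for illustration.

```python
from urllib.parse import urlencode

# English Wikipedia's Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_assessment_query(titles, limit=50):
    """Return a URL asking for the WikiProject assessments of the given titles.

    Hypothetical helper: it only builds the URL and makes no network call.
    """
    params = {
        "action": "query",
        "prop": "pageassessments",   # module provided by the PageAssessments extension
        "titles": "|".join(titles),  # the API takes pipe-separated titles
        "palimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def articles_for_project(response, project):
    """Return sorted (title, quality class, importance) tuples for one WikiProject."""
    matches = []
    for page in response.get("query", {}).get("pages", {}).values():
        rating = page.get("pageassessments", {}).get(project)
        if rating is not None:
            matches.append(
                (page["title"], rating.get("class", ""), rating.get("importance", ""))
            )
    return sorted(matches)


# Made-up response with placeholder titles and ratings (assumed shape, not real data).
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1": {
                "title": "Example article A",
                "pageassessments": {"Jazz": {"class": "B", "importance": "High"}},
            },
            "2": {
                "title": "Example article B",
                "pageassessments": {"Biography": {"class": "C", "importance": ""}},
            },
        }
    }
}

url = build_assessment_query(["Example article A", "Example article B"])
jazz_articles = articles_for_project(SAMPLE_RESPONSE, "Jazz")
```

With the sample data, `jazz_articles` holds only the first placeholder article, since the second is tagged for a different WikiProject. A real run would fetch `url`, follow the API's continuation parameters, and handle pages with no assessments.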
@Isaac will act as the coordinator and point of contact on WMF's end.