ML Equity: Data Gaps
Closed, Resolved, Public

Description

Document data gaps that exist across Wikimedia projects and their relationship to modeling and equity: https://meta.wikimedia.org/wiki/User:Isaac_(WMF)/Content_tagging/Data_gaps

Related Objects

Status    Subtype  Assigned  Task
Open               Isaac
Resolved           Isaac

Event Timeline

Weekly updates:

  • Been getting caught up on DS' language-agnostic vandalism project as a potential use case to focus on in this data gaps work

Weekly updates:

  • Preparing to discuss ML Equity strategy more broadly with some folks -- I'll be joining ML Team's model cards meeting next week to get updated on their status, and I reached out to KR on Policy to discuss their interest in the space per a recommendation from JG.
  • Began coalescing my thoughts on the different approaches the org could take in this space. I had done a lot of this when preparing for this project but am now trying to better contextualize it for a broader audience.
  • Began to sketch out a talk for the NARA workshop: https://smithsonian.github.io/AIandPublicArchives2022/

Weekly updates:

  • Put together slides for and presented at the NARA workshop (slides). Interesting group of folks from libraries and cultural collections working to put together an Institutional AI Ethics Statement. Largely aligned with much of their thinking, though they are often working in very different contexts -- e.g., mostly not doing in-house ML work like us. A good reminder for us to consider this aspect as well, since we might use more third-party models ourselves -- e.g., pre-trained language models or machine translation APIs -- and would want to be clear about our expectations for them.

Weekly updates:

  • Did an interview with CMU researchers focused on understanding how to help the Wikimedia communities better evaluate ML on the platforms. Some takeaways from that to pass back to ML Platform (desire for dashboards on current model performance / when each model was last trained). I'll continue to watch the project as it will hopefully generate some good ideas around more community involvement with ML modeling: https://meta.wikimedia.org/wiki/Research:Community-centered_Evaluation_of_AI_Models_on_Wikipedia
  • Need to reach back out to KR in Legal to try to set up a meeting to discuss policy + ML.
  • Continuing to join the weekly model cards meeting to help move that work along and bring transparency to the details of the existing models

Weekly updates:

  • There is some interest in working on edit summaries as part of the edittypes work, which might be a good place to showcase the power of treating data as a first-class citizen in product design

Weekly updates:

  • Added details about sources to the data gaps doc as another important area where our lack of good structured data inhibits important modeling work
  • Finished revamping the planning doc for Ethical ML and can now share it with ML Platform folks to get their feedback

Closing this task. The data gaps are documented extensively on Meta, and there are promising plans with the Editing team to address the data gap related to edit summaries / intentions. I'll continue to push strongly for work on the citations data gap, as I think that's probably the most pressing one.