ML Equity: Data Gaps
Closed, Resolved (Public)

Assigned To

Authored By

	Isaac
	Aug 26 2022, 9:13 PM

Description

Document data gaps that exist across Wikimedia projects and their relationship to modeling and equity: https://meta.wikimedia.org/wiki/User:Isaac_(WMF)/Content_tagging/Data_gaps

Related Objects

		Status	Subtype	Assigned	Task
		Open		Isaac	T293516 Recommender Systems + Content Equity
		Resolved		Isaac	T316411 ML Equity: Data Gaps

Event Timeline

Isaac created this task. Aug 26 2022, 9:13 PM

Isaac moved this task from Backlog to FY2022-23-Research-July-September on the Research board.

Isaac edited projects, added Research (FY2022-23-Research-July-September); removed Research.

No updates

Weekly updates:

Cleaned up the intro to the meta page so it can be shared with less context
Added section on common causes / possible solutions: https://meta.wikimedia.org/wiki/User:Isaac_(WMF)/Content_tagging/Data_gaps#Common_causes_and_potential_fixes

Weekly updates:

Been getting caught up on DS' language-agnostic vandalism project as a potential use-case to focus on in this data gaps work

Weekly updates:

none

Isaac moved this task from FY2022-23-Research-July-September to FY2022-23-Research-October-December on the Research board. Oct 19 2022, 5:30 PM

Isaac edited projects, added Research (FY2022-23-Research-October-December); removed Research (FY2022-23-Research-July-September).

Weekly updates:

Preparing to discuss ML Equity strategy more broadly with some folks -- I'll be joining ML Team's model cards meeting next week to get updated on their status, and I reached out to KR on Policy to discuss their interest in the space per a recommendation from JG.
Began coalescing my thoughts on the different approaches the org could take in this space. I had done a lot of this when preparing for this project but am now trying to better contextualize it for a broader audience.
Began sketching out a talk for the NARA workshop: https://smithsonian.github.io/AIandPublicArchives2022/

Weekly updates:

Put together slides for and presented at NARA workshop (slides). Interesting group of folks from libraries and cultural collections working to put together an Institutional AI Ethics Statement. Largely aligned with much of their thinking though they are often working in very different contexts -- e.g., mostly not doing in-house ML work like us. A good reminder for us to consider this aspect too, as we might use more third-party models ourselves -- e.g., pre-trained language models or machine translation APIs -- and would want clarity around our expectations for them.

Weekly updates:

Did an interview with CMU researchers focused on understanding how to help the Wikimedia communities better evaluate ML on the platforms. Some takeaways from that to pass back to ML Platform (desire for dashboards on current model performance / when a model was last trained), and I'll continue to watch the project as it will hopefully generate some good ideas around more community involvement with ML modeling: https://meta.wikimedia.org/wiki/Research:Community-centered_Evaluation_of_AI_Models_on_Wikipedia
Need to reach back out to KR in Legal to try to set up a meeting to discuss policy + ML.
Continuing to join the weekly model cards meeting to help move that work along and bring transparency to the details of the existing models

Weekly updates:

There is some interest in working on edit summaries as part of the edittypes work, which might be a good place to showcase the power of treating data as a first-class citizen in product design

Weekly updates:

Added details about sources to the data gaps doc as another important area where our lack of good structured data inhibits important modeling work
Finished revamping the planning doc for Ethical ML and can now share it with ML Platform folks to get their feedback

Closing this task. Data gaps are documented extensively on meta, and there are promising plans with the Editing team to address the data gap related to edit summaries / intentions. I'll continue to push strongly for work on the citations data gap as I think that's probably the most pressing one.
