
[M] Document technical architecture for section-level image suggestions onwiki
Closed, Resolved · Public

Description

Now that the technical architecture for the section-level image suggestions pipeline is complete, let's document it onwiki at https://www.mediawiki.org/wiki/Structured_Data_Across_Wikimedia/Section-level_Image_Suggestions/Data_Pipeline

Documentation can be similar to what we did for the image suggestions pipeline here: https://www.mediawiki.org/wiki/Structured_Data_Across_Wikimedia/Image_Suggestions/Data_Pipeline

Architecture diagram is here https://miro.com/app/board/uXjVPXaPDdQ=/

Event Timeline

CBogen renamed this task from Document technical architecture for section-level image suggestions onwiki to [M] Document technical architecture for section-level image suggestions onwiki.Jan 11 2023, 5:30 PM
CBogen updated the task description.

Blocked until we have a full picture of the system

Noting here that it's a Q4 22-23 team KR that specific pipeline architecture decisions, image and section filtering, image suggestion generation logic, data quality (SLISE results), and bias are documented on wiki.

For the part about bias, I'm copying this text from Isaac and Miriam to include:

The section-level image suggestion algorithm works as follows: first, an ML algorithm matches an unillustrated section with similar sections in other languages; it then suggests images from the matching sections as potential candidates for the unillustrated section. Bias can occur in multiple places.
First, the set of articles for which the algorithm will find recommendations will likely carry the known selection biases on Wikipedia: the majority of articles are about men and English-speaking countries (see Metrics that Matter - Research/Content). Mitigation and measurement of this bias should happen at the “selection” stage, i.e., when the system/product decides which recommendations to surface. Topic filters and edit tags can be helpful for this purpose; see RecSys + Content Equity.
A complement to bias is harm. In this case, because the model is surfacing existing content on Wikipedia, the likelihood of harm is quite low because the recommendation space is quite constrained and already patrolled by the community.
(Note from Carly: incorporating the category-based image suggestion notifications into events with WELx is a good example of doing this.) However, because notifications are based on users' watchlists, we are propagating (but not worsening) the existing bias in those watchlists. This is similar to the Growth newcomer tool, where users choose which topics they are interested in.
The second question is whether, aside from the known selection biases of Wikipedia, we are amplifying the bias towards topics that are generally more illustrated on Wikipedia. Our recent research on visual knowledge gaps shows that most of the bias happens at the “selection” stage, while the quality of articles, and the proportion of illustrated articles, is similar across genders. Therefore, we can reasonably say that the algorithm will replicate but not amplify the existing gender gap. An initial exploration of the percentage of illustrated articles by country also shows no significant trend of Western countries being, on average, more illustrated than the rest.
Another aspect is that we are assuming that images acceptable for one language community are also culturally appropriate for other projects. While this is a big assumption, the evaluation of the image suggestion algorithm, which uses similar cross-language logic, showed that only 1% of wrong image-article matches were marked as “offensive”. (NOTE: Update with SLIS round 2 evaluation results.)
ML Algorithm and Language Bias: the section alignment at the heart of the image recommendation algorithm is a multilingual language model. Such models are known to work better on languages with a large presence on the Internet (and larger Wikipedias), while for under-resourced languages the model may be less precise. For example, the precision of the section alignment algorithm for the English-Spanish pair is 95%, while for English-Japanese it is 83%. Nevertheless, section alignments are available for all languages on Wikipedia, so expanding to new languages should be feasible. However, based on the algorithm evaluation, we can be more confident about section and image recommendations coming from larger languages.
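The cross-language matching idea described at the start of the quoted text can be sketched roughly as follows. This is a minimal illustration with toy data, not the actual pipeline code: the function name, data shapes, and the precomputed similarity lookup (standing in for the multilingual section-alignment model) are all hypothetical.

```python
# Hypothetical sketch: for an unillustrated section, find aligned
# sections in other languages and collect their images as candidates.

def suggest_images(target_section, sections_by_language, similarity, threshold=0.8):
    """Return candidate images from similar sections in other languages."""
    candidates = []
    for lang, sections in sections_by_language.items():
        for section in sections:
            # A multilingual model would score section alignment here;
            # we use a precomputed lookup purely for illustration.
            key = (target_section["title"], lang, section["title"])
            score = similarity.get(key, 0.0)
            if score >= threshold and section["images"]:
                candidates.extend(section["images"])
    # Deduplicate while preserving order.
    return list(dict.fromkeys(candidates))

# Toy data (entirely made up).
target = {"title": "History", "lang": "en"}
others = {
    "es": [{"title": "Historia", "images": ["Battle.jpg"]}],
    "ja": [{"title": "歴史", "images": ["Battle.jpg", "Map.png"]}],
}
scores = {
    ("History", "es", "Historia"): 0.95,
    ("History", "ja", "歴史"): 0.83,
}
print(suggest_images(target, others, scores))  # ['Battle.jpg', 'Map.png']
```

The threshold is where the language-bias trade-off discussed above shows up: raising it favors precision (and, in practice, larger languages with better alignment scores), while lowering it admits more candidates from under-resourced languages at the cost of more wrong matches.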

mfossati changed the task status from Open to In Progress.May 12 2023, 3:20 PM
mfossati added subscribers: matthiasmullie, Sannita.

First complete revision done, pinging @Sannita for general proofreading.
Also pinging @matthiasmullie to double-check the confidence scores section.

I made a couple of minor changes (diff), but overall it looks great to me, thank you! Feel free to close this task.

Awesome, closing.