[L] Make refinements to and incorporate P18 based section-level image suggestions
Closed, Resolved (Public)

Description

Based on the first round of section-level image suggestions evaluation results, we will be moving forward with the section alignment and intersection based suggestions, which all scored very well. We will not be moving forward with depicts-based suggestions.

We decided to temporarily pause section topics/P18 (Wikidata image) based suggestions until we can do more work to refine and evaluate those results. While those still remain in the pipeline data, we can remove them on the client side. However, the second round of evaluation showed that, with updates, section topics/P18 results were much more promising.

This ticket is to do more work to refine and evaluate the P18/section topics based results so that we can hopefully adopt them in the next version of the Growth tool and notifications.

Acceptance Criteria

Details

Title	Reference	Author	Source Branch	Dest Branch
Section topics suggestions, round 2	repos/structured-data/image-suggestions!28	mlitn	T330773	main

Event Timeline

CBogen renamed this task from Second round of manual evaluation for section-level image suggestions focused on P18 to Another round of manual evaluation for section-level image suggestions focused on P18. Feb 28 2023, 6:21 PM

I think that, before calling the evaluation round, we should tackle T330516 (Sections with images still appearing in the section-level image suggestions pipeline), T323505#8656938, and T330841 ([L] Exclude tables and lists from section alignment-based image suggestions).
Update: the section alignment ticket is actually not needed for P18 data.

As I understand things, simplified:

We’ve found that those suggestions often aren’t good enough.
That is unsurprising, because we only check that the image has a relationship to the topic. Unless the topic is highly specific to a certain subject, the image may not have anything to do with the page’s subject (as is the case for the above example).

IMO, we should check that an image has a relationship with both the entity associated with the topic AND the entity associated with the page.
If we did that, I theorize that our suggestions would be an awful lot more relevant.
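A minimal sketch of that double check, assuming each entity maps to a set of related image file names. The function name, the dictionary layout, and the entity IDs and file names are all hypothetical and not taken from the actual pipeline:

```python
from typing import Dict, Set


def cross_referenced_images(
    images_by_entity: Dict[str, Set[str]],
    topic_qid: str,
    page_qid: str,
) -> Set[str]:
    """Return images related to BOTH the section topic and the page entity."""
    topic_images = images_by_entity.get(topic_qid, set())
    page_images = images_by_entity.get(page_qid, set())
    # Set intersection keeps only images that show up for both entities.
    return topic_images & page_images


# Hypothetical data: placeholder entity IDs mapped to made-up file names.
images_by_entity = {
    "Q_topic": {"A.jpg", "B.jpg", "C.jpg"},
    "Q_page": {"B.jpg", "D.jpg"},
}
suggestions = cross_referenced_images(images_by_entity, "Q_topic", "Q_page")
print(suggestions)
```

An entity with no known images simply yields an empty result, which matches the intuition that this filter can only shrink the candidate set.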

Doing so, however, drastically reduces the number of suggestions.
I did a quick test, and the number of available suggestions (based on section topics) would drop from 173961759 (~174M) to 206245 (~206K), and that is even before filtering out sections with lists and images already used on the page; actual numbers would be even lower.
While significantly lower, it still leaves us a decent number of (theoretically likely good, I think) suggestions.

From there, I think we could take things further, and reconsider sources other than only Wikidata P18 (which suffered the same issues; here’s a quote from @mfossati about depicts statements: “the intuition is that a given image indeed depicts a section topic, but is unrelated to the actual Wikipedia content where that section topics originates”)

There are 2 benefits to including more sources:

  • Bringing in more data is likely to make the subset of suggestions matching both criteria (both page & topic) grow considerably
  • Bringing in more data and cross-referencing it helps us be more comfortable about some suggestions (i.e. a suggestion that matches in multiple sources is more likely to be relevant than one that matches in just 1)
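The second benefit could be sketched as a simple count of how many sources back each candidate. The source labels and the (section, image) pairs below are made-up illustrations, and ranking by source count is an assumption for the sake of the example, not something the pipeline is known to do:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Candidate = Tuple[str, str]  # (section identifier, image file name)


def rank_by_source_agreement(
    candidates_by_source: Dict[str, Iterable[Candidate]],
) -> List[Tuple[Candidate, int]]:
    """Order candidates by the number of distinct sources suggesting them."""
    support: Counter = Counter()
    for candidates in candidates_by_source.values():
        # set() so a source counts at most once per candidate.
        support.update(set(candidates))
    # Most-supported first; ties broken by the candidate tuple for stable output.
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical candidates per source.
candidates_by_source = {
    "P18": [("History", "A.jpg"), ("History", "B.jpg")],
    "P373": [("History", "A.jpg")],
    "lead_image": [("History", "A.jpg"), ("Geography", "C.jpg")],
}
ranked = rank_by_source_agreement(candidates_by_source)
for candidate, source_count in ranked:
    print(candidate, source_count)
```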

Luckily, we already have some of these things from page-level image suggestions, where we have 4 groups:

  • suggestions for an entity based on Wikidata P18 (Commons image)
  • suggestions for an entity based on Wikidata P373 (in Commons category)
  • suggestions for an entity based on lead image data
  • suggestions for an entity based on SDC/depicts

I figured I’d give those a shot: generate a bunch of suggestions and evaluate how relevant they subjectively feel to me.

But first, a baseline, for the implementations we already have:

  • Suggestions based on section alignment
    • Accuracy: 66-88%
  • Suggestions based on topics matching P18 only (which we’ve already discarded)
    • Accuracy: 8-12%

What do these numbers mean?
The lower number is the share of good suggestions; the higher number also includes those that are “ok, but not great” (e.g. relevant to the article’s contents, but belonging in another section).

Note that these ranges should be taken with a massive grain of salt: I only evaluated 50 suggestions each, all on enwiki. And I’m not an expert in 99% of these pages, so I may have misjudged some.
Some filters (e.g. duplicates, sections with tables) had not yet been applied, which may also skew findings slightly (i.e. it is possible that, after filtering out images that are already on page, accuracy is slightly lower)
Still, given that samples are random and have been evaluated consistently, they should be safe to compare.
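For reference, a range like the ones above can be derived from manual labels as follows. This is only a sketch: it assumes the higher number counts “good” and “ok” suggestions together, and the label names and per-label counts are hypothetical (chosen so that a sample of 50 gives 66-88%; the actual counts are not listed here).

```python
from collections import Counter
from typing import Iterable, Tuple


def accuracy_range(labels: Iterable[str]) -> Tuple[float, float]:
    """Return (strict, lenient) accuracy in percent.

    strict counts only "good" labels; lenient also counts "ok" labels.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels to evaluate")
    strict = 100 * counts["good"] / total
    lenient = 100 * (counts["good"] + counts["ok"]) / total
    return strict, lenient


# Hypothetical labels for one sample of 50 suggestions.
labels = ["good"] * 33 + ["ok"] * 11 + ["bad"] * 6
strict, lenient = accuracy_range(labels)
print(f"Accuracy: {strict:.0f}-{lenient:.0f}%")
```

With only 50 samples, a single relabelled suggestion moves either bound by 2 percentage points, which is one reason to read small differences between ranges cautiously.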

On to the meat of this comment - here are my findings after sampling/evaluating 50 images for all combinations of image suggestions from other sources, provided that they match both the topic & the page entities:

  • Suggestions where topic entity matches P18; page entity matches P18
    • Accuracy: 73-92%
  • Suggestions where topic entity matches P18; page entity matches P373
    • Accuracy: 72-85%
  • Suggestions where topic entity matches P18; page entity matches lead image
    • Accuracy: 70-84%
  • Suggestions where topic entity matches P373; page entity matches P18
    • Accuracy: 86-98%
  • Suggestions where topic entity matches P373; page entity matches P373
    • Accuracy: 58-76%
  • Suggestions where topic entity matches P373; page entity matches lead image
    • Accuracy: 80-92%
  • Suggestions where topic entity matches lead image; page entity matches P18
    • Accuracy: 72-92%
  • Suggestions where topic entity matches lead image; page entity matches P373
    • Accuracy: 70-82%
  • Suggestions where topic entity matches lead image; page entity matches lead image
    • Accuracy: 66-80%

Judging from the numbers above, suggestions based on section topics appear similarly relevant to suggestions based on section alignment, provided we cross-reference those images with both the topic AND the page.
This seems to be true for all sources tested.
See https://phabricator.wikimedia.org/P45919 for the full list of samples.

Sidenote: because topic & page entities are applied a little differently, I figured I’d also check whether there are substantial differences between the sources and the type of entity we’ll validate them for.
Broken down in a matrix, here’s the (subjective) accuracy of each source, for each type of entity:

			topic		page
P18 (image)		72-87%		77-94%
P373 (category)	74-88%		66-81%
lead image		69-84%		73-86%

P18 seems to have a slight edge overall, but the others aren’t far behind.
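How the matrix cells were derived is not stated. One plausible reading, sketched below, is a plain mean over the three combinations in which a source appears in a given role; that lands within about one percentage point of every cell. The `combos` dictionary copies the nine ranges listed earlier, while the function name and role labels are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

# (topic source, page source) -> (lower %, upper %), copied from the list above.
combos: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("P18", "P18"): (73, 92),
    ("P18", "P373"): (72, 85),
    ("P18", "lead"): (70, 84),
    ("P373", "P18"): (86, 98),
    ("P373", "P373"): (58, 76),
    ("P373", "lead"): (80, 92),
    ("lead", "P18"): (72, 92),
    ("lead", "P373"): (70, 82),
    ("lead", "lead"): (66, 80),
}


def average_range(source: str, role: str) -> Tuple[float, float]:
    """Mean (lower, upper) accuracy for a source used in the given role."""
    if role not in ("topic", "page"):
        raise ValueError("role must be 'topic' or 'page'")
    index = 0 if role == "topic" else 1
    ranges = [rng for key, rng in combos.items() if key[index] == source]
    if not ranges:
        raise ValueError(f"unknown source: {source}")
    return mean(low for low, _ in ranges), mean(high for _, high in ranges)


for source in ("P18", "P373", "lead"):
    for role in ("topic", "page"):
        low, high = average_range(source, role)
        print(f"{source:5} {role:5} {low:.1f}-{high:.1f}%")
```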

Note: I was unable to test SDC/depicts statements, but I’m fairly confident scores would be similarly good from that source.

I haven’t yet run a full count on how many suggestions we’d be left with if we were to use all of these sources, but it would definitely bring us back to many millions of section topics based image suggestions.
And judging from my sample evaluation, unlike the current “topic entity = P18” implementation, those are millions of relevant suggestions.

Quick recap:

  • current section topics-based suggestions are no good; we already knew that and planned to take that out
  • the reason they’re no good is probably because we don’t also cross-reference topics-based suggestions with the subject of the page
  • doing so appears to make suggestions relevant; looks to be of similar quality as alignment-based suggestions
  • ...but leaves us with significantly fewer suggestions
  • it looks like suggestions from additional sources (P373, lead image, probably also depicts) also remain relevant provided we cross-reference both topic & page
  • ...which significantly increases the number of suggestions again

In terms of work, that would mean:

  1. also cross-referencing images with page entity (in addition to topic entity)
  2. repurpose the work from page image suggestions for these other sources; some refactoring will be needed
  3. investigate/fix the depicts data skew issues (or drop that one as a source, for now)
  4. another round of manual evaluation once the actual work is done to confirm/deny my preliminary findings

1 & 2 aren’t that much work. I’d estimate about an L combined.
3 is unknown & has the potential to be big. But we could decide to skip SDC/depicts for now (although this may eventually require some looking into for page suggestions anyway)

IMO, these initial findings seem very promising, and the changes likely wouldn’t add much workload. I would recommend we proceed to implement the suggested changes & evaluate them properly (in a less quick-and-dirty fashion than above).

matthiasmullie renamed this task from Another round of manual evaluation for section-level image suggestions focused on P18 to Another round of manual evaluation for section-level image suggestions. Mar 23 2023, 1:30 PM
matthiasmullie updated the task description.
CBogen renamed this task from Another round of manual evaluation for section-level image suggestions to [L] Another round of manual evaluation for section-level image suggestions. Mar 23 2023, 2:01 PM
CBogen assigned this task to matthiasmullie.
CBogen updated the task description.

In terms of work, that would mean:

  1. also cross-referencing images with page entity (in addition to topic entity)
  2. repurpose the work from page image suggestions for these other sources; some refactoring will be needed
  3. investigate/fix the depicts data skew issues (or drop that one as a source, for now)
  4. another round of manual evaluation once the actual work is done to confirm/deny my preliminary findings

1 & 2 aren’t that much work. I’d estimate about an L combined.
3 is unknown & has the potential to be big. But we could decide to skip SDC/depicts for now (although this may eventually require some looking into for page suggestions anyway)

Based on discussion in Slack, we will move forward with 1 & 2 and we will skip 3 for now. 4 will be covered in T330784.

CBogen renamed this task from [L] Another round of manual evaluation for section-level image suggestions to [L] Make refinements to and incorporate P18 based section-level image suggestions. Mar 23 2023, 2:04 PM