[L] Create tool to manually test image recommendations POC results
Closed, Resolved (Public)

Description

The goal of this task is to create a tool to manually test the image recommendations POC API, once T260832 is complete. There will be a follow-up task to do the manual testing itself.

T273527 is the task to publish the API spec, so once that is complete, work on this can start - though it can't be finished until T260832 is complete.

  • The tool will evaluate results on Arabic, Cebuano, English, Vietnamese, Bengali and Czech wikis
  • The tool will allow the user to choose which wiki/language they want to evaluate
  • The tool will evaluate 500 unillustrated articles from each wiki
  • The tool will run the API to get 500 random unillustrated articles from each wiki and all of their image recommendations
  • The tool will ensure that the 500 unillustrated articles provide a (close to) equal number of results from the Image Recommendations Algorithm and from MediaSearch
  • The tool will display and evaluate the output (both a preview of the article text and the image), similar to https://media-search-signal-test.toolforge.org/
  • The tool will allow testers to manually decide whether the match is good, okay, or bad for each result for each unillustrated article. The tool will also allow users to say that they are unsure if the match is good.
  • The tool will allow testers to manually decide whether the recommended image is explicit/NSFW (okay/explicit/unsure)
  • The tool will output the results into a spreadsheet, showing how many good, okay, and bad matches were produced for each article, whether the annotator was "unsure", and what the source of each of those matches was
    • Spreadsheet columns will be: wiki; article name; image; match strength (good/okay/bad/unsure); source (Wikidata, interlinks, Commons category, or MediaSearch); explicit
  • The tool will log the API response time to evaluate performance.
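
A minimal sketch of the spreadsheet output described in the last bullets, assuming Python and CSV as the output format (the task does not say how the tool is implemented). The `Rating` record, the `write_results` and `tally_by_source` helpers, the header spellings, and the sample rows are all hypothetical; only the column list and the allowed values come from the criteria above.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# Column order follows the acceptance criteria; header spellings are assumptions.
COLUMNS = ["wiki", "article name", "image", "match strength", "source", "explicit"]
MATCH_VALUES = {"good", "okay", "bad", "unsure"}
EXPLICIT_VALUES = {"okay", "explicit", "unsure"}


@dataclass
class Rating:
    """One tester decision about one recommended image (hypothetical record)."""
    wiki: str
    article: str
    image: str
    match: str
    source: str
    explicit: str


def write_results(ratings, out):
    """Write one row per rated (article, image) pair to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for r in ratings:
        if r.match not in MATCH_VALUES or r.explicit not in EXPLICIT_VALUES:
            raise ValueError(f"unexpected rating value for {r.article!r}")
        writer.writerow([r.wiki, r.article, r.image, r.match, r.source, r.explicit])


def tally_by_source(ratings):
    """Count ratings per (source, match strength) pair."""
    return Counter((r.source, r.match) for r in ratings)


# Made-up sample data, only to show the shape of the output.
ratings = [
    Rating("cswiki", "Example article", "Example.jpg", "good", "Wikidata", "okay"),
    Rating("cswiki", "Example article", "Other.jpg", "bad", "MediaSearch", "okay"),
    Rating("bnwiki", "Another article", "Third.jpg", "good", "Wikidata", "unsure"),
]
buffer = io.StringIO()
write_results(ratings, buffer)
counts = tally_by_source(ratings)
```

Keeping one row per rated image, rather than one row per article, means the per-article and per-source totals can be derived afterwards from the same file.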

Event Timeline

CBogen renamed this task from Create tool to manually test image recommendations POC results to [L] Create tool to manually test image recommendations POC results. (Jan 27 2021, 5:57 PM)

The API POC documentation has been published here: https://image-suggestion-api.toolforge.org/?doc#/

@Cparle does that unblock being able to start this task?

Yes, we can begin it, though as the description says, we can't complete it until the PoC is ready.

The tool will run the API to get 500 random unillustrated articles from each wiki and their image recommendations

@CBogen Is that 1 image recommendation per article, or however many the API returns by default?
(asking because it'll have a significant impact on the number of images that will need to be evaluated)

The tool will evaluate 500 unillustrated articles from each wiki

@Miriam I suspect you already have such a list of unillustrated articles from those wikis - can you tell me where I can find that?

Good point. Yes, I think we should return all 10 images per article that the API will return by default, so that we can evaluate which match types are best. I suppose we could try one random match per article, but I worry we wouldn't get enough data that way. It does mean a significantly larger number of images to evaluate, so I'll see who I can get on board to help out with the manual testing.
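
As a rough illustration of the workload this implies, here is a small sketch in Python. It computes an upper bound only, on the assumption that every article returns the full 10 recommendations; the wiki list is the one in the task description, and the variable names are made up for this example.

```python
ARTICLES_PER_WIKI = 500
MAX_IMAGES_PER_ARTICLE = 10  # API default mentioned in the comment above
WIKIS = ["Arabic", "Cebuano", "English", "Vietnamese", "Bengali", "Czech"]

# Upper bounds: articles with fewer than 10 recommendations lower these numbers.
max_ratings_per_wiki = ARTICLES_PER_WIKI * MAX_IMAGES_PER_ARTICLE
max_ratings_total = max_ratings_per_wiki * len(WIKIS)
```

With one random match per article instead, the per-wiki bound would drop to the number of articles, 500.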

@matthiasmullie Sure. Do you want the list of unillustrated articles for which the ImageMatching algorithm found an image candidate, or just the full list of unillustrated articles for each wiki?

If you are looking for the former -> please find it on stat1005:/user/mirrys/edit/ImageRecommendation/V3.1/Output/
If you are looking for the latter -> I probably have to re-run the algorithm as I am not currently saving the list of all unillustrated articles

I hope this helps!

Do you want the list of unillustrated articles for which the ImageMatching algorithm found an image candidate, or just the full list of unillustrated articles for each wiki?

Thanks Miriam! We need the latter (the full list of unillustrated articles). This brings up an important question - is the API only going to work with unillustrated articles for which the ImageMatching algorithm found an image candidate? We definitely want to be able to expand that to include articles with matching candidates from MediaSearch and not just the algorithm. cc @sdkim

Got it, @CBogen!

@matthiasmullie, please find the full lists of unillustrated articles in this folder: stat1005:/home/mirrys/ImageRecommendation/V3.1/Output/full_unillustrated

  • These are the unillustrated articles as of December. I will re-run this shortly to update these lists with the February data.
CBogen updated the task description.

@Cparle is it possible to add Bengali and Czech Wikipedias to the tool, and if so, how much extra effort would that be? Growth has ambassadors in both languages who are willing to help, and the more evaluation the better.

Also, I realize I wasn't specific in the acceptance criteria that each language should be evaluated separately, so that each ambassador/evaluator can focus on their own language. Is it already built that way or would that require changes?

@Cparle FYI I updated a couple of acceptance criteria based on discussions with @MMiller_WMF and @Miriam:

  • We're now adding Bengali and Czech to the list, and also making sure that the user can choose which language/wiki they want to evaluate
  • We're changing the evaluation options from "strong, okay, weak" to "good, okay, bad"
  • We're adding an "unsure" checkbox so that annotators can note if they weren't confident about their selection

Please let me know if any of this presents any issues. Thanks!

Can't do this until either we get the /suggestions/ endpoint or randomization is implemented, so moving back into blocked

Actually, this is probably possible by picking random numbers and trawling through ALL the results. Will take ages to run, but that's just the initial setup, so doesn't matter that much. Moving out of blocked, will try that approach on Mon
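
One possible reading of that approach, sketched in Python: draw the wanted positions at random up front, then make a single pass over every result and keep only the items at those positions. The function name, the `total` parameter, and the stand-in generator are hypothetical; the real tool would page through the API instead, and this sketch assumes the total number of results is known in advance.

```python
import random


def sample_by_position(items, total, k, seed=None):
    """Keep the items whose positions were drawn at random, in one full pass."""
    rng = random.Random(seed)
    wanted = set(rng.sample(range(total), k))  # k distinct positions
    picked = []
    for position, item in enumerate(items):
        if position in wanted:
            picked.append(item)
    return picked


# Stand-in for paging through every unillustrated article the API returns.
all_articles = (f"Article {n}" for n in range(10_000))
sample = sample_by_position(all_articles, total=10_000, k=500, seed=1)
```

The pass still touches every result, which matches the "will take ages to run" caveat, but it only has to happen once during setup. If the stream turns out to be shorter than `total`, fewer than `k` items come back.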

Here's the link to the tool: https://image-recommendation-test.toolforge.org/

@Cparle as much as I enjoy the "meh" and "dunno" and "dodgy" colloquialisms, can you change them to the wording in the acceptance criteria so that our ambassadors who aren't as familiar with English don't struggle with their meaning? Thanks!