[EPIC] Experiment with incorporating Image Matching Algorithm data into commons search index
Closed, Resolved | Public

Description

The PET would like to have one data source for the image-suggestion-api, rather than the two we currently have. Our options are to incorporate the search signal data into the output from research's data pipeline, or to incorporate research's data into the search indices on commons.

This is a proposal for an experiment to investigate whether doing the latter will give us acceptable results for image-recommendations.

At the moment, the Image Matching Algorithm returns the following as suggested images for an article:

  1. the image for the wikidata item corresponding to the article
  2. images from the commons category associated with the wikidata item corresponding to the article
  3. lead images from articles in other wikis that correspond to the same wikidata item

I propose that we inject the above three pieces of data as property-value pairs into the weighted_tags field:

  • image.linked.from.wikidata.P18 can store any wikidata item for which this image is the primary image
  • image.linked.from.wikidata.P373 can store the wikidata item for any commons category that the image belongs to
  • image.linked.from.wikidata.sitelink can store wikidata item ids for any wiki page this image is included on

So, for example, an image of the Mona Lisa will gain the following elements in its weighted_tags array:

image.linked.from.wikidata.P18/Q12418
image.linked.from.wikidata.P18/Q13369744
image.linked.from.wikidata.sitelink/Q12418
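
As a rough illustration of the proposed tag format, the following sketch builds the weighted_tags entries for one image from the three kinds of wikidata links. The function name `build_weighted_tags` and its parameters are hypothetical and only mirror the `prefix.property/item` shape shown above; how the real pipeline derives the item ids is not covered here.

```python
from typing import Iterable, List

# Prefix taken from the proposal above.
TAG_PREFIX = "image.linked.from.wikidata"


def build_weighted_tags(
    p18_items: Iterable[str] = (),
    p373_items: Iterable[str] = (),
    sitelink_items: Iterable[str] = (),
) -> List[str]:
    """Return weighted_tags strings for one image (hypothetical helper).

    p18_items:      wikidata items for which this image is the primary image
    p373_items:     wikidata items for commons categories the image belongs to
    sitelink_items: wikidata items for wiki pages the image is included on
    """
    tags: List[str] = []
    for prop, items in (
        ("P18", p18_items),
        ("P373", p373_items),
        ("sitelink", sitelink_items),
    ):
        for qid in items:
            tags.append(f"{TAG_PREFIX}.{prop}/{qid}")
    return tags


# Reproduces the Mona Lisa example from the task description.
mona_lisa_tags = build_weighted_tags(
    p18_items=["Q12418", "Q13369744"],
    sitelink_items=["Q12418"],
)
```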

The experiment:

  1. Create a new search index (on relforge) based on a commons dump, and enhanced with the data described above (T286562)
  2. Incorporate searching for the new data into Search/ASTQueryBuilder/WordsQueryNodeHandler.php or its successor, using a new search profile (T286563)
  3. Tune the new search profile based on a training dataset extracted from our labeled data
  4. Calculate a balanced accuracy score for the labeled data we already have from the existing image-recommendations API via image-recommendation-test
  5. Run equivalent searches on the new search index, and calculate the balanced accuracy score for that
  6. Iterate on tuning if necessary
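
To make steps 2 and 3 a little more concrete, here is a minimal sketch of what searching on the new data might look like, expressed as an Elasticsearch query body built in Python. It is an assumption-heavy illustration: the real integration would live in the PHP handler named above, the helper `build_tag_clauses` and the boost values are hypothetical, and the sketch ignores any per-tag score that the weighted_tags field may carry. The boosts are the kind of parameter a search profile would tune.

```python
from typing import Any, Dict

TAG_PREFIX = "image.linked.from.wikidata"


def build_tag_clauses(article_qid: str, boosts: Dict[str, float]) -> Dict[str, Any]:
    """Build a bool/should query matching the new tags for one article.

    article_qid: wikidata item id of the article we want images for
    boosts:      hypothetical per-property boosts, e.g. {"P18": 3.0}
    """
    should = [
        {
            "term": {
                "weighted_tags": {
                    "value": f"{TAG_PREFIX}.{prop}/{article_qid}",
                    "boost": boost,
                }
            }
        }
        for prop, boost in boosts.items()
    ]
    return {"bool": {"should": should}}


# Example: favour P18 matches over category and sitelink matches.
# The numbers are placeholders, not tuned values.
query = build_tag_clauses("Q12418", {"P18": 3.0, "P373": 1.5, "sitelink": 1.0})
```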

Expected outcome:

If the balanced accuracy score using the new search index is better than, or at least no worse than, the score from the existing image-recommendations API, then we can make a case for injecting the new data into the production commons indices (probably using events) and using the new search profile in production.
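
The comparison can be sketched as follows. Balanced accuracy is the mean of the true positive rate and the true negative rate. The sketch assumes each labeled (article, image) pair carries a boolean "good match" label and that each system yields a boolean "would suggest this image" prediction for the same pairs; that framing and the function names are assumptions, not a description of image-recommendation-test.

```python
from typing import Sequence


def balanced_accuracy(labels: Sequence[bool], predictions: Sequence[bool]) -> float:
    """Mean of true positive rate and true negative rate."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    true_pos = sum(1 for y, p in zip(labels, predictions) if y and p)
    true_neg = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    return 0.5 * (true_pos / positives + true_neg / negatives)


def new_index_is_acceptable(
    labels: Sequence[bool],
    api_predictions: Sequence[bool],
    search_predictions: Sequence[bool],
) -> bool:
    """True if the search index scores better than or equal to the existing API."""
    return balanced_accuracy(labels, search_predictions) >= balanced_accuracy(
        labels, api_predictions
    )


# Toy data, made up purely for illustration.
labels = [True, True, False, False]
api_predictions = [True, False, False, False]
search_predictions = [True, True, True, False]
acceptable = new_index_is_acceptable(labels, api_predictions, search_predictions)
```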

Event Timeline

Cparle updated the task description.
Cparle renamed this task from "Proposal to incorporate Image Matching Algorithm data into commons search index" to "Experiment with incorporating Image Matching Algorithm data into commons search index". Jun 11 2021, 10:58 AM
Cparle updated the task description.
CBogen changed the subtype of this task from "Spike" to "Task". Jun 30 2021, 4:41 PM
CBogen renamed this task from "Experiment with incorporating Image Matching Algorithm data into commons search index" to "[EPIC] Experiment with incorporating Image Matching Algorithm data into commons search index". Jul 14 2021, 4:08 PM
CBogen added a project: Epic.

@dcausse and @EBernhardson: as discussed at our meeting earlier today, please let us know when you've been able to investigate whether there's enough space for this bigger index. Please post the results in this ticket. Thanks!

@CBogen the "experiment" part of this is done, and we've moved on to production-izing it. Should we close this epic? Do we need another one to cover the production-ization?

@Cparle I think a new epic about productionizing this work would be great. Do you mind creating one? Then we can move the production-related tickets over from this epic to that one and close this out.

Cparle claimed this task.

Epic to productionize this work: T299781

The experimental stage is complete, so closing this epic.