In order to experiment with integrating Image Matching Algorithm data into the Commons search index, we need to create a new index on relforge.
For an example of copying an index from production to relforge, [[ https://phabricator.wikimedia.org/P16419 | see here ]].
For an example of augmenting a wiki dump with extra data and writing the result to Elasticsearch, [[ https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/relevanceForge/+/refs/heads/master/other_tools/augmentdump.py | see here ]].
The new data we want in the dump consists of three sets of property-value pairs in the `weighted_tags` field:
* `wikidata.via.P18` will store the IDs of Wikidata items that link to the image via the P18 (image) property
** e.g. if the P18 (image) property of Wikidata items Q144 and Q38280 is set to **Image_X**
** then for **Image_X** we'll set the fields `wikidata.via.P18/Q144|1` and `wikidata.via.P18/Q38280|1`
* `wikidata.via.P373` will store the IDs of any Wikidata item that is linked via P373 (Commons category) to any Commons category the image belongs to
** e.g. if Wikidata item Q144 has its P373 (Commons category) property set to `Dogs`
** AND **Image_X** is in the Commons category `Dogs`
** then for **Image_X** we'll set the field `wikidata.via.P373/Q144|1`
* `wikidata.via.article` will store the Wikidata item IDs of any wiki article the image is used in
** e.g. if **Image_X** is used on `https://ga.wikipedia.org/Page_Y`
** AND `https://ga.wikipedia.org/Page_Y` has a corresponding Wikidata ID `Q12345`
** then for **Image_X** we'll set the field `wikidata.via.article/Q12345|1`
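The derivation of the three tag sets above can be sketched in Python. This is a minimal, hypothetical sketch: the input dictionaries (`p18_links`, `p373_links`, `image_cats`, `article_items`) stand in for whatever lookup tables are actually built from the Wikidata and Commons data, and the helper names `format_tags` and `build_weighted_tags` are invented for illustration. The constant score of `1` simply follows the examples above.

```python
from typing import Dict, Iterable, List, Set


def format_tags(prefix: str, item_ids: Iterable[str], score: int = 1) -> List[str]:
    """Format Wikidata item IDs as weighted_tags entries, e.g. 'wikidata.via.P18/Q144|1'.

    IDs are de-duplicated and sorted (lexicographically) so the output is deterministic.
    """
    return [f"{prefix}/{qid}|{score}" for qid in sorted(set(item_ids))]


def build_weighted_tags(
    image: str,
    p18_links: Dict[str, Set[str]],      # hypothetical: image -> items whose P18 is this image
    p373_links: Dict[str, Set[str]],     # hypothetical: Commons category -> items whose P373 is it
    image_cats: Dict[str, Set[str]],     # hypothetical: image -> Commons categories it is in
    article_items: Dict[str, Set[str]],  # hypothetical: image -> items of articles using it
) -> List[str]:
    """Build the three sets of weighted_tags entries for one image."""
    tags = format_tags("wikidata.via.P18", p18_links.get(image, set()))

    # P373: collect items linked to any Commons category the image belongs to.
    via_category: Set[str] = set()
    for category in image_cats.get(image, set()):
        via_category |= p373_links.get(category, set())
    tags += format_tags("wikidata.via.P373", via_category)

    tags += format_tags("wikidata.via.article", article_items.get(image, set()))
    return tags


if __name__ == "__main__":
    # The Image_X examples from the task description.
    print(build_weighted_tags(
        "Image_X",
        p18_links={"Image_X": {"Q144", "Q38280"}},
        p373_links={"Dogs": {"Q144"}},
        image_cats={"Image_X": {"Dogs"}},
        article_items={"Image_X": {"Q12345"}},
    ))
    # ['wikidata.via.P18/Q144|1', 'wikidata.via.P18/Q38280|1', 'wikidata.via.P373/Q144|1', 'wikidata.via.article/Q12345|1']
```

Item IDs are sorted only to make the output stable across runs; the sort is lexicographic, not numeric, which is harmless for tagging.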
The extra search data should **not** be added to any image that is excluded by the current Image Suggestions Algorithm, namely:
* images in any of the "placeholder images" categories (or their subcategories) on Commons
* images that are already used on a large number of pages on any wiki (as they are likely to be placeholders)
* images whose titles contain strings that indicate they are likely to be placeholders
For more exact definitions of the above, see [[ https://github.com/mirrys/ImageMatching/blob/main/algorithm.ipynb | the Image Suggestions Algorithm code ]].
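The three exclusion rules can be sketched as a single predicate. Everything concrete here is a placeholder assumption rather than a value taken from the Image Suggestions Algorithm: the usage threshold `MAX_PAGE_USAGE`, the title substrings, the example category names, and the input shapes (`subcats` as a category-to-subcategories map, `max_usage_on_any_wiki` as the highest page count for the image on any single wiki). The real definitions are in the notebook linked above. The part worth illustrating is the subcategory expansion, which walks the category graph breadth-first and tracks visited categories so that cycles in the category tree do not cause an infinite loop.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Placeholder assumptions, NOT the Image Suggestions Algorithm's real values.
MAX_PAGE_USAGE = 1000
PLACEHOLDER_TITLE_SUBSTRINGS = ("placeholder", "noimage", "no_image")


def expand_categories(roots: Iterable[str], subcats: Dict[str, Set[str]]) -> Set[str]:
    """Return the root categories plus all of their transitive subcategories (BFS)."""
    seen: Set[str] = set(roots)
    queue = deque(seen)
    while queue:
        category = queue.popleft()
        for child in subcats.get(category, set()):
            if child not in seen:  # skip visited categories so cycles terminate
                seen.add(child)
                queue.append(child)
    return seen


def is_excluded(
    title: str,
    categories: Set[str],
    max_usage_on_any_wiki: int,
    placeholder_cats: Set[str],
) -> bool:
    """Return True if the image should NOT receive the extra search data."""
    # Rule 1: the image is in a placeholder category or one of its subcategories.
    if categories & placeholder_cats:
        return True
    # Rule 2: the image is used on many pages of some wiki, so is likely a placeholder.
    if max_usage_on_any_wiki > MAX_PAGE_USAGE:
        return True
    # Rule 3: the title contains a placeholder-like string (case-insensitive).
    lowered = title.lower()
    return any(s in lowered for s in PLACEHOLDER_TITLE_SUBSTRINGS)


if __name__ == "__main__":
    # Hypothetical category names, for illustration only.
    placeholder_cats = expand_categories(
        ["Placeholder images"],
        {"Placeholder images": {"Placeholder images of people"}},
    )
    print(is_excluded("Image_X.jpg", {"Dogs"}, 3, placeholder_cats))  # False
    print(is_excluded("Man silhouette.svg", {"Placeholder images of people"}, 3, placeholder_cats))  # True
```

Only images for which `is_excluded` returns `False` would then be given the extra `weighted_tags` entries. Expanding the placeholder categories once up front keeps the per-image check to a cheap set intersection.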