
Reimport image recommendation data into search index
Closed, Declined; Public

Description

In T285817: Add an image: load static file to search index, a list of pages which have image recommendations was imported into the search index (as recommendation.image weighted tags) for use by the GrowthExperiments suggested edits feature. Initial versions of the suggested edits code can't handle infoboxes; our plan was to filter these out via additional search terms. Unfortunately we have found (T291232: Add an image: exclude certain articles) that the number of infobox templates is too large for that, even taking into account that some infobox templates reuse others as building blocks. So we need to rethink our approach.

The script that generated the list of pages can deal with infoboxes (the exact logic is: find Wikidata items which are "instance of: Wikimedia infobox template", follow the site links to the templates, exclude pages which transclude any such template). So one possible option we'd like to discuss is regenerating the list and redoing the import. (The existing index entries would presumably have to be deleted first.)
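The exclusion logic described above can be sketched roughly as follows. This is only an illustration on in-memory toy data, not the actual script: the item IDs, the `sitelinks` and `transclusions` mappings, and both function names are hypothetical, and a real run would have to obtain this data from Wikidata and the wiki itself.

```python
def infobox_templates_for_wiki(infobox_items, sitelinks, wiki):
    """Follow site links from infobox items to the template titles on one wiki."""
    return {
        sitelinks[item][wiki]
        for item in infobox_items
        if wiki in sitelinks.get(item, {})
    }


def exclude_infobox_pages(candidates, transclusions, infobox_templates):
    """Keep only candidate pages that transclude none of the infobox templates."""
    return [
        page
        for page in candidates
        if not (transclusions.get(page, set()) & infobox_templates)
    ]


# Hypothetical toy data: items assumed to be "instance of: Wikimedia infobox template".
infobox_items = {"Q_A", "Q_B"}
sitelinks = {
    "Q_A": {"enwiki": "Template:Infobox person"},
    "Q_B": {"enwiki": "Template:Infobox settlement", "dewiki": "Vorlage:Infobox Ort"},
}
# Hypothetical mapping from page title to the templates it transcludes.
transclusions = {
    "Page 1": {"Template:Infobox person", "Template:Reflist"},
    "Page 2": {"Template:Reflist"},
    "Page 3": set(),
}

templates = infobox_templates_for_wiki(infobox_items, sitelinks, "enwiki")
kept = exclude_infobox_pages(["Page 1", "Page 2", "Page 3"], transclusions, templates)
print(kept)  # ['Page 2', 'Page 3']
```

Because the template set is computed once, the result is a snapshot: it reflects the infobox templates and transclusions as they were when the list was generated.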

This is somewhat suboptimal: we couldn't change the list of infobox templates over time, and since this would be a one-time import, it couldn't follow articles that start or stop using an infobox. But we can deal with that much discrepancy.

The other idea we could come up with (preferred, if feasible) is T292141: Add an image: search keyword for articles which have infoboxes.

Event Timeline

If we still have the old file we sent, I believe we can diff the two versions of the file (adding support for tag removal via the __DELETE_GROUPING__ magic word) and ship the delta (we could also go for a brute-force "delete all, re-add all", I suppose). Having this possibility would allow us to iterate again and possibly fix other problems in the future.
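A minimal sketch of the diff idea, assuming each version of the file boils down to a set of page IDs for one wiki. The function name, the `exists` tag name and the exact `prefix/value` string layout are assumptions made for illustration; only the `recommendation.image` prefix and the `__DELETE_GROUPING__` magic word come from the discussion above.

```python
TAG_PREFIX = "recommendation.image"
DELETE_MARKER = "__DELETE_GROUPING__"  # magic word mentioned above
SET_TAG = "exists"  # assumed tag name, for illustration only


def build_delta(old_pages, new_pages):
    """Return {page_id: tag update} covering only pages whose status changed."""
    old_pages, new_pages = set(old_pages), set(new_pages)
    delta = {}
    # Pages dropped from the list: ask for the whole tag grouping to be removed.
    for page in sorted(old_pages - new_pages):
        delta[page] = f"{TAG_PREFIX}/{DELETE_MARKER}"
    # Pages new to the list: set the tag.
    for page in sorted(new_pages - old_pages):
        delta[page] = f"{TAG_PREFIX}/{SET_TAG}"
    # Pages present in both versions need no update and are left out.
    return delta


delta = build_delta(old_pages=[1, 2, 3], new_pages=[2, 3, 4])
print(delta)
```

The brute-force alternative would amount to emitting a delete for every page in the old file and a set for every page in the new one, at the cost of a much larger update.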

@Tgr, do you have a priority for this task? We want to make sure we're not blocking anything on your side.

We are hoping to get the application ready by the beginning of November, so if it would be possible to get either this or T292141 done by that time, that would be great. If we go with T292141, we can probably write the code ourselves, but would like to get a review.