
Some pages' search index docs indicate they have a suggestion when they do not
Closed, Resolved (Public)

Description

Example: https://pt.wikipedia.org/wiki/You_Need_to_Calm_Down

If you look at https://pt.wikipedia.org/wiki/You_Need_to_Calm_Down?action=cirrusDump you can see recommendation.image/exists|1 in weighted_tags. The page already has an image, so it shouldn't have a suggestion.
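
A minimal sketch of that check, assuming the cirrusDump output is a JSON list of documents whose `_source` holds a `weighted_tags` list of `name|weight` strings. The helper name `has_weighted_tag` and the sample document are hypothetical, and the code works on an already-fetched JSON string rather than calling the wiki:

```python
import json

# Tag name as it appears in the task description.
TAG = "recommendation.image/exists"


def has_weighted_tag(cirrus_dump_json: str, tag: str = TAG) -> bool:
    """Return True if any document in the dump carries the given weighted tag.

    Assumes the dump is a JSON list of documents, each with a `_source`
    object containing a `weighted_tags` list of "name|weight" strings.
    """
    docs = json.loads(cirrus_dump_json)
    for doc in docs:
        for entry in doc.get("_source", {}).get("weighted_tags", []):
            # Split off the weight; entries without a "|" are kept whole.
            name, _, _weight = entry.partition("|")
            if name == tag:
                return True
    return False


# Hypothetical, heavily trimmed dump for illustration only.
sample = json.dumps(
    [{"_source": {"weighted_tags": ["recommendation.image/exists|1"]}}]
)
flagged = has_weighted_tag(sample)
```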

The page actually has no suggestions, and in analytics_platform_eng.image_suggestions_search_index_delta for snapshot=2022-06-27 there is this line, which should have caused the flag to be removed from the search index:

ptwiki	0	6019454	recommendation.image	["__DELETE_GROUPING__"]

I suspect that the _full search index dataset (which doesn't contain deletes) was imported rather than the _delta dataset. We know that the _full dataset was being imported until recently, and we also know that old _delta snapshots contained far more deletes than they should have. It's not yet clear how to repair the data, but we need to find a way.
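
To illustrate why importing _full instead of _delta would leave a stale flag, here is a rough sketch. It assumes the row above is tab-separated as (wiki, namespace, page id, tag, JSON list of values), and that the _full dataset simply has no row for a page without suggestions. The in-memory `index` dict and the function names are hypothetical stand-ins, not the real import pipeline:

```python
import json

DELETE_MARKER = "__DELETE_GROUPING__"


def parse_row(line):
    """Split one tab-separated row into typed fields (column meanings assumed)."""
    wiki, namespace, page_id, tag, values = line.rstrip("\n").split("\t")
    return wiki, int(namespace), int(page_id), tag, json.loads(values)


def apply_rows(index, lines):
    """Apply rows to a toy index mapping (wiki, page_id) -> set of tag names."""
    for line in lines:
        wiki, _namespace, page_id, tag, values = parse_row(line)
        tags = index.setdefault((wiki, page_id), set())
        if DELETE_MARKER in values:
            tags.discard(tag)  # a delete row clears the tag
        else:
            tags.add(tag)  # any other row sets it
    return index


# The row quoted in the task description.
delta_row = 'ptwiki\t0\t6019454\trecommendation.image\t["__DELETE_GROUPING__"]'

key = ("ptwiki", 6019454)

# Importing the delta removes the stale tag.
after_delta = apply_rows({key: {"recommendation.image"}}, [delta_row])

# Importing a dataset with no delete row for this page leaves the tag in place.
after_full = apply_rows({key: {"recommendation.image"}}, [])
```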

Edit: one way we might repair the data is to search for all pages flagged as having suggestions, then use the API to check whether they really have suggestions, and delete the hassuggestion flag from the search index if not.
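
The repair idea could look roughly like the sketch below. Everything here is an assumption for illustration: `find_stale_flags` is a hypothetical helper, `has_suggestions` stands in for whatever API call reports real suggestions, and the page IDs other than 6019454 are made up. The search query and the deletion from the search index are deliberately left out:

```python
from typing import Callable, Iterable, List


def find_stale_flags(
    flagged_page_ids: Iterable[int],
    has_suggestions: Callable[[int], bool],
) -> List[int]:
    """Return the flagged page IDs for which no suggestion actually exists.

    `flagged_page_ids` would come from a search for pages carrying the flag;
    `has_suggestions` is a placeholder for the API check.
    """
    return [pid for pid in flagged_page_ids if not has_suggestions(pid)]


# Made-up stand-in for the API: only these pages really have suggestions.
pages_with_real_suggestions = {111, 222}

stale = find_stale_flags(
    [111, 6019454, 222],
    lambda pid: pid in pages_with_real_suggestions,
)
# `stale` lists the pages whose flag would need to be deleted.
```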

Event Timeline

With respect to the 3 target wikis (pt, id, ru), this looks like a minor issue, see report at T314473#8133342.

For the other wikis, I'm expecting a similar impact, which should be proportional to the size of the wiki.

@CBogen - just wanna make sure this is ok in the backlog

Yep, we discussed with product at backlog grooming and decided the impact was minimal enough to leave in the backlog for now.

Cparle claimed this task.

Fixed as a consequence of the data cleanup for T320656.