Unable to locate image suggestions for enwiki, and many hasrecommendation:image pages already have images
Closed, Resolved, Public

Description

While looking for image recommendations in a topic (https://en.wikipedia.org/w/index.php?search=hasrecommendation%3Aimage+articletopic%3A[…]3=1&ns15=1&ns101=1&ns119=1&ns711=1&ns829=1&ns2301=1&ns2303=1), I noticed that several of the results already have images.

Some examples found via hasrecommendation:image or hasrecommendation:image articletopic:food-and-drink on enwiki:
https://en.wikipedia.org/wiki/Not_out
https://en.wikipedia.org/wiki/Araqi_(drink)
https://en.wikipedia.org/wiki/Liebeck_v._McDonald%27s_Restaurants

Event Timeline

kostajh triaged this task as High priority. Jul 4 2022, 11:09 AM
kostajh created this task.

Currently running the cleanup_weighted_tags.py script from CirrusSearch to remove all recommendation.image/exists tags in the live search indices. Once that completes, I will re-import the latest dataset (from 2022-06-20).

Verified that all indices are cleared using the loop below. With every endpoint reporting zero matches, I've re-shipped the updates that were prepared in the June 20th dataset.

# Count documents still tagged recommendation.image/exists on each search endpoint.
for dc in eqiad codfw; do
  # Brace expansion yields ports 9243, 9443 and 9643.
  for port in 9{2,4,6}43; do
    echo -n "https://search.svc.$dc.wmnet:$port/ : "
    curl -s "https://search.svc.$dc.wmnet:$port/_search" -H 'Content-Type: application/json' -d '{"query":{"match":{"weighted_tags": "recommendation.image/exists"}}}' | jq .hits.total
  done
done

outputs:

https://search.svc.eqiad.wmnet:9243/ : 0
https://search.svc.eqiad.wmnet:9443/ : 0
https://search.svc.eqiad.wmnet:9643/ : 0
https://search.svc.codfw.wmnet:9243/ : 0
https://search.svc.codfw.wmnet:9443/ : 0
https://search.svc.codfw.wmnet:9643/ : 0
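
If this check needs to be repeated, the "endpoint : count" lines above could be verified mechanically instead of by eye. The following is only a sketch: the helper name all_cleared is hypothetical and not part of CirrusSearch, and it assumes the output format shown above, with the count after the last " : " separator.

```shell
# Hypothetical helper: reads "endpoint : count" lines on stdin and
# fails (non-zero exit status) if any count is not exactly 0.
all_cleared() {
  status=0
  while IFS= read -r line; do
    # Keep only the text after the last " : " separator.
    count=${line##* : }
    if [ "$count" != "0" ]; then
      printf 'still tagged: %s\n' "$line" >&2
      status=1
    fi
  done
  return "$status"
}
```

Piping the output of the verification loop into all_cleared would then give a single pass/fail exit status, with any endpoint that still has tagged documents reported on stderr.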

Thanks @EBernhardson!

@Cparle another issue, perhaps deserving a separate task, is that some articles with infoboxes are returned in the hasrecommendation:image results.

For example, this https://en.wikipedia.org/wiki/Red_wine article has an infobox, and an image is in the infobox. IIRC, in the initial batch of recommendations loaded into the search index for use with the old API, the algorithm excluded articles with infoboxes. Is that no longer the case?

I believe infobox filtering was done on the client side in T291232?

Oh sorry, that is my mistake. Thanks for the pointer to T291232. What we ended up doing is using a combined search of hasrecommendation:image -hastemplatecollection:infobox to get articles with image suggestions that don't have the set of community-defined infobox templates.
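
For anyone wanting to reproduce that combined search outside the web UI, here is a rough sketch that builds a request URL for the MediaWiki Action API (list=search with srsearch). The helper names urlencode and build_search_url, the enwiki endpoint, and the default srlimit of 10 are illustrative assumptions, not something taken from the Growth or CirrusSearch code.

```shell
# Percent-encode a string for use in a URL query (ASCII input assumed).
# Written for POSIX sh; the variables are globals, so call it in $(...).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Build an Action API search URL for a CirrusSearch query string.
# Assumption: enwiki endpoint; the second argument is an optional result limit.
build_search_url() {
  printf 'https://en.wikipedia.org/w/api.php?action=query&list=search&format=json&srlimit=%s&srsearch=%s\n' \
    "${2:-10}" "$(urlencode "$1")"
}

# Example (needs network access, so it is left commented out):
# curl -s "$(build_search_url 'hasrecommendation:image -hastemplatecollection:infobox')" \
#   | jq -r '.query.search[].title'
```

The leading minus on hastemplatecollection:infobox is what excludes articles that use the community-defined infobox templates, which is the filtering described above.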

kostajh claimed this task.

Just for posterity: the articles above no longer have image-suggestions-related data in weighted_tags, so we think the problem has been fixed in the image-suggestions pipeline.