Search result on Commons does not contain search term
Closed, Declined
Public

Assigned To
None
Priority
Normal
Author
Scott
Subscribers
McZusatz, Scott, Deskana and 2 others
Projects
Reference
bz48573
Description

(CAUTION: EXPLICIT SEXUAL CONTENT IN LINKS BELOW)

Searching the Wikimedia Commons for the word "purpose":

http://commons.wikimedia.org/w/index.php?search=purpose&button=&title=Special%3ASearch

Returns this file as the first result:

http://commons.wikimedia.org/wiki/File:Black_genitalia.jpg

Investigation into the matter (http://commons.wikimedia.org/wiki/Commons:Village_pump#Bizarre_search_result) has revealed that the word "purpose" is not in the file's description, EXIF, or any backlinks.

A suggestion has been made that it may somehow be related to the word "pose", which several other files in the same series include in their name, such as this (also explicit):

http://commons.wikimedia.org/wiki/File:Pose.jpg

That makes me wonder if this is somehow related to bug 2511, from way back in 2005, in that it could relate to stemming, but that's pure guessing.


Version: wmf-deployment
Severity: normal
Whiteboard: cirrus-fixed

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz48573.
Scott created this task. Via Legacy, May 17 2013, 10:32 AM
Aklapper added a comment. Via Conduit, May 17 2013, 11:03 AM

Confirming.

bzimport added a comment. Via Conduit, May 18 2013, 2:00 PM

russavia.wikipedia wrote:

The issue can be found at:

http://commons.wikimedia.org/wiki/Commons:Deletion_requests/Image:Black_genitalia.jpg

The DR contains the word "purpose", so the search evidently takes pages that link to the image into account.

I must say it is quite funny and sad that a DR which was trying to delete the file is what is responsible for shooting it to the top of the search results :)

bzimport added a comment. Via Conduit, May 18 2013, 2:14 PM

russavia.wikipedia wrote:

So that this does not occur in future, we should be making it so that deletion requests on Commons do not count for backlinks, etc.

The same issue occurs from Commons:Quality_images_candidates/Archives_December_2010#File:TortoiseshellCat.JPG where http://commons.wikimedia.org/wiki/File:PieCrust_masked.jpg is returned as the top result for a search for "result"

Ideally, project pages should not be included for such search results.

http://commons.wikimedia.org/wiki/Commons:Requests_for_comment/improving_search#A_little_bit_of_intelligence is looking to be a better and better solution to search

bzimport added a comment. Via Conduit, May 19 2013, 3:35 PM

peterjames28 wrote:

(In reply to comment #3)

So that this does not occur in future, we should be making it so that deletion requests on Commons do not count for backlinks, etc.

The same issue occurs from Commons:Quality_images_candidates/Archives_December_2010#File:TortoiseshellCat.JPG where http://commons.wikimedia.org/wiki/File:PieCrust_masked.jpg is returned as the top result for a search for "result"

Ideally, project pages should not be included for such search results.

http://commons.wikimedia.org/wiki/Commons:Requests_for_comment/improving_search#A_little_bit_of_intelligence is looking to be a better and better solution to search

The "purpose" result was from Commons:Deletion requests/File:Geschändetehostie.jpg. I've changed the links to go to the deletion request pages instead of the files; I don't know how long it takes for search results to update. The "result" link is at User_talk:Jonathunder/1#Masking_of_a_pie.

Deskana added a comment. Via Conduit, Feb 8 2014, 2:39 AM

This problem does not appear to exist in CirrusSearch, as the file in question was not returned in the first 2,000 results for the search query "purpose" on Commons.

As we're in the process of migrating from Lucene to CirrusSearch, I'm marking this as RESOLVED WONTFIX.
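The check described in this comment (whether a given file appears in the first 2,000 results for a query) could be scripted roughly as follows. This is only a sketch and makes several assumptions: it uses the MediaWiki Action API's `list=search` module rather than Special:Search, `search_params` and `rank_in_results` are hypothetical helper names, and `fetch_page` is a caller-supplied function that performs the HTTP request and returns the decoded JSON. Which search backend answers the request depends on the wiki's configuration.

```python
from typing import Callable, Optional


def search_params(term: str, offset: int, limit: int = 50) -> dict:
    """Query parameters for one page of File-namespace search results."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srnamespace": 6,  # 6 is the File namespace
        "srlimit": limit,
        "sroffset": offset,
        "format": "json",
    }


def rank_in_results(
    fetch_page: Callable[[int], dict], title: str, max_results: int = 2000
) -> Optional[int]:
    """Return the 1-based rank of `title` in the first `max_results` hits, or None."""
    offset = 0
    while offset < max_results:
        data = fetch_page(offset)  # hypothetical: GET api.php with search_params(...)
        hits = data.get("query", {}).get("search", [])
        for i, hit in enumerate(hits):
            rank = offset + i + 1
            if rank > max_results:
                return None
            if hit.get("title") == title:
                return rank
        # The API signals further pages through continue.sroffset.
        next_offset = data.get("continue", {}).get("sroffset")
        if next_offset is None or not hits:
            return None
        offset = next_offset
    return None
```

A caller would pass a `fetch_page` that requests `https://commons.wikimedia.org/w/api.php` with `search_params("purpose", offset)` and treat a `None` return as "not in the first 2,000 results".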

bzimport added a comment. Via Conduit, Mar 23 2014, 2:46 AM

peterjames28 wrote:

I changed the "purpose" link several months ago (comment 4 and https://commons.wikimedia.org/w/index.php?title=Commons:Deletion_requests/File:Gesch%C3%A4ndetehostie.jpg&diff=prev&oldid=96520537). The pie is still the top result for "result".

Scott added a comment. Via Conduit, Mar 23 2014, 4:02 AM

https://commons.wikimedia.org/w/index.php?search=result&title=Special%3ASearch&go=Go&uselang=en

Link for "result" search. I would say that this bug should remain open on that basis.

Aklapper added a comment. Via Conduit, Mar 23 2014, 1:28 PM

This valid bug will not be fixed and we are against wasting time fixing it as the search infrastructure will be replaced soon.
That's why this report was closed as "WONTFIX" (will not fix). See comment 5.

Scott added a comment. Via Conduit, Mar 23 2014, 8:11 PM

Yes, I saw what Dan said (and I know very well what WONTFIX means). His comment seemed to imply to me that Commons was already using CirrusSearch and the bug still existed in it. If the bug exists in some other element of the infrastructure that will soon be replaced, then that's fine.

Deskana added a comment. Via Conduit, Mar 24 2014, 4:05 PM

(In reply to Scott Martin from comment #9)

Yes, I saw what Dan said (and I know very well what WONTFIX means). His comment seemed to imply to me that Commons was already using CirrusSearch and the bug still existed in it. If the bug exists in some other element of the infrastructure that will soon be replaced, then that's fine.

My apologies. I wasn't very clear in my original comment.

Commons is still using LuceneSearch as its default search. I can reproduce the "result" bug in comment 7 using the provided URL.

However, Commons has CirrusSearch enabled as a Beta Feature. This means that users can, on a user-by-user basis, switch over to using CirrusSearch instead by ticking the "New search" box in their Beta preferences. I unticked the box to reproduce the bug above. If I tick the box again, and use CirrusSearch instead, I am unable to reproduce this bug in CirrusSearch.

I'm WONTFIXing this because we're in the process of (gradually) rolling out CirrusSearch to be the default on all wikis, so since this bug is fixed in CirrusSearch our time is better spent improving CirrusSearch rather than fixing LuceneSearch bugs.

Some wikis, such as all Wikivoyages, are already using CirrusSearch as the default search engine [1]. While we iron out bugs, we've got most wikis set to have CirrusSearch as a Beta Feature, so if there are any game-breaking bugs then people can turn off CirrusSearch really easily. A few unfortunate wikis still use LuceneSearch and don't have the Beta Feature enabled, though this is entirely for technical reasons: the server cluster that handles search cannot yet support CirrusSearch being turned on everywhere.

Hopefully this helps explain the situation. Let me know if you have any questions.

[1]: It's still possible to use Lucene on these wikis by appending &srbackend=LuceneSearch to the end of a search URL, though I have no idea why the end user would actually wish to do that.
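As a rough illustration of the footnote above, the parameter can be appended to an existing search URL programmatically. This is a minimal sketch using only Python's standard library; `with_search_backend` is a hypothetical helper name, and the only thing taken from this thread is the `srbackend=LuceneSearch` parameter itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_search_backend(search_url: str, backend: str = "LuceneSearch") -> str:
    """Return `search_url` with srbackend=<backend> set in its query string."""
    parts = urlsplit(search_url)
    # Drop any existing srbackend so the parameter appears only once.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "srbackend"
    ]
    query.append(("srbackend", backend))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = (
    "https://commons.wikimedia.org/w/index.php"
    "?search=result&title=Special%3ASearch&go=Go&uselang=en"
)
print(with_search_backend(url))
```

Re-encoding the query string rather than concatenating text keeps the URL valid even if it already carries an `srbackend` value.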

Scott added a comment. Via Conduit, Mar 24 2014, 5:28 PM

That's great. Thanks Dan.
