Search result on Commons does not contain search term
Closed, Declined (Public)

Description

(CAUTION: EXPLICIT SEXUAL CONTENT IN LINKS BELOW)

Searching the Wikimedia Commons for the word "purpose":

http://commons.wikimedia.org/w/index.php?search=purpose&button=&title=Special%3ASearch

Returns this file as the first result:

http://commons.wikimedia.org/wiki/File:Black_genitalia.jpg

Investigation into the matter (http://commons.wikimedia.org/wiki/Commons:Village_pump#Bizarre_search_result) has revealed that the word "purpose" is not in the file's description, EXIF, or any backlinks.

A suggestion has been made that it may somehow be related to the word "pose", which several other files in the same series include in their name, such as this (also explicit):

http://commons.wikimedia.org/wiki/File:Pose.jpg

That makes me wonder if this is somehow related to bug 2511, from way back in 2005, in that it could relate to stemming, but that's pure guessing.


Version: wmf-deployment
Severity: normal
Whiteboard: cirrus-fixed

Details

Reference
bz48573

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 1:43 AM)
bzimport set Reference to bz48573.

russavia.wikipedia wrote:

The issue can be found at:

http://commons.wikimedia.org/wiki/Commons:Deletion_requests/Image:Black_genitalia.jpg

The DR contains the word "purpose", so the search evidently takes the text of pages that link to the image into account.

I must say it is quite funny and sad that a DR which was trying to delete the file is what is responsible for shooting it to the top of the search results :)

russavia.wikipedia wrote:

So that this does not occur in future, we should be making it so that deletion requests on Commons do not count for backlinks, etc.

The same issue occurs with Commons:Quality_images_candidates/Archives_December_2010#File:TortoiseshellCat.JPG, where http://commons.wikimedia.org/wiki/File:PieCrust_masked.jpg is returned as the top result for a search for "result".

Ideally, project pages should not be included for such search results.

http://commons.wikimedia.org/wiki/Commons:Requests_for_comment/improving_search#A_little_bit_of_intelligence is looking to be a better and better solution to search
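The behaviour described in this comment can be illustrated with a toy index. This is only a sketch under stated assumptions: the page titles, the `PAGES` data, and the rule that a page is indexed under the words of pages linking to it are all hypothetical, and nothing here reflects how LuceneSearch or CirrusSearch actually build their indexes. It shows why a file can match a word that appears only on a deletion request, and how skipping links from project pages would avoid that.

```python
from collections import defaultdict

# Toy data only: titles and text are made up for illustration.
PAGES = {
    "File:Example_A.jpg": {"text": "a photograph", "links": []},
    "Commons:Deletion requests/File:Example_B.jpg": {
        "text": "this image serves no educational purpose",
        "links": ["File:Example_A.jpg"],
    },
}


def namespace(title):
    """Return the namespace prefix of a title ('' for the main namespace)."""
    return title.split(":", 1)[0] if ":" in title else ""


def build_index(pages, skip_link_namespaces=()):
    """Map each word to the set of titles it is indexed under.

    Assumption of this toy model: a page is indexed under its own words and
    under the words of every page that links to it, unless the linking page
    is in one of skip_link_namespaces.
    """
    index = defaultdict(set)
    for title, page in pages.items():
        words = page["text"].lower().split()
        for word in words:
            index[word].add(title)
        if namespace(title) in skip_link_namespaces:
            continue  # do not let this page's text leak onto its link targets
        for target in page["links"]:
            for word in words:
                index[word].add(target)
    return index


def search(index, word):
    """Return the sorted list of titles indexed under word."""
    return sorted(index.get(word.lower(), set()))
```

With the default settings, `search(build_index(PAGES), "purpose")` includes the file page even though its own text never mentions the word. Passing `skip_link_namespaces=("Commons",)` leaves only the deletion request itself in the results.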

peterjames28 wrote:

(In reply to comment #3)

> So that this does not occur in future, we should be making it so that deletion requests on Commons do not count for backlinks, etc.
>
> The same issue occurs from Commons:Quality_images_candidates/Archives_December_2010#File:TortoiseshellCat.JPG where http://commons.wikimedia.org/wiki/File:PieCrust_masked.jpg is returned as the top result for a search for "result"
>
> Ideally, project pages should not be included for such search results.
>
> http://commons.wikimedia.org/wiki/Commons:Requests_for_comment/improving_search#A_little_bit_of_intelligence is looking to be a better and better solution to search

The "purpose" result was from Commons:Deletion requests/File:Geschändetehostie.jpg. I've changed the links to go to the deletion request pages instead of the files; I don't know how long it takes for search results to update. The "result" link is at User_talk:Jonathunder/1#Masking_of_a_pie.

This problem does not appear to exist in CirrusSearch, as the file in question was not returned in the first 2,000 results for the search query "purpose" on Commons.

As we're in the process of migrating from Lucene to CirrusSearch, I'm marking this as RESOLVED WONTFIX.
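A check like "not returned in the first 2,000 results" can be automated by paging through results. The sketch below is an assumption-laden illustration, not the method used in this comment: `fetch_page(offset, limit)` is a hypothetical callable that returns a list of result titles from whichever backend is being tested, and the page size of 50 is arbitrary.

```python
def title_in_first_n(fetch_page, title, n, page_size=50):
    """Return the 0-based rank of title within the first n results, or None.

    fetch_page(offset, limit) is a hypothetical callable returning a list of
    result titles; it stands in for whatever search backend is queried.
    """
    offset = 0
    while offset < n:
        limit = min(page_size, n - offset)
        titles = fetch_page(offset, limit)
        if title in titles:
            return offset + titles.index(title)
        if len(titles) < limit:
            return None  # the backend ran out of results before reaching n
        offset += limit
    return None
```

Running the same check against both backends would show whether a given file still ranks for a query in one but not the other.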

peterjames28 wrote:

I changed the "purpose" link several months ago (comment 4 and https://commons.wikimedia.org/w/index.php?title=Commons:Deletion_requests/File:Gesch%C3%A4ndetehostie.jpg&diff=prev&oldid=96520537). The pie is still the top result for "result".

This valid bug will not be fixed and we are against wasting time fixing it as the search infrastructure will be replaced soon.
That's why this report was closed as "WONTFIX" (will not fix). See comment 5.

Yes, I saw what Dan said (and I know very well what WONTFIX means). His comment seemed to imply to me that Commons was already using CirrusSearch and the bug still existed in it. If the bug exists in some other element of the infrastructure that will soon be replaced, then that's fine.

(In reply to Scott Martin from comment #9)

> Yes, I saw what Dan said (and I know very well what WONTFIX means). His comment seemed to imply to me that Commons was already using CirrusSearch and the bug still existed in it. If the bug exists in some other element of the infrastructure that will soon be replaced, then that's fine.

My apologies. I wasn't very clear in my original comment.

Commons is still using LuceneSearch as its default search. I can reproduce the "result" bug in comment 7 using the provided URL.

However, Commons has CirrusSearch enabled as a Beta Feature. This means that users can, on a user-by-user basis, switch over to using CirrusSearch instead by ticking the "New search" box in their Beta preferences. I unticked the box to reproduce the bug above. If I tick the box again, and use CirrusSearch instead, I am unable to reproduce this bug in CirrusSearch.

I'm WONTFIXing this because we're in the process of (gradually) rolling out CirrusSearch to be the default on all wikis, so since this bug is fixed in CirrusSearch our time is better spent improving CirrusSearch rather than fixing LuceneSearch bugs.

Some wikis, such as all Wikivoyages, are already using CirrusSearch as the default search engine [1]. While we iron out bugs, we've got most wikis set to have CirrusSearch as a Beta Feature, so if there are any game-breaking bugs, people can turn off CirrusSearch really easily. A few unfortunate wikis still use LuceneSearch and don't have the Beta Feature enabled, though this is purely for technical reasons: the server cluster that handles search can't yet support CirrusSearch being turned on everywhere.

Hopefully this helps explain the situation. Let me know if you have any questions.

[1]: It's still possible to use Lucene on these wikis by appending &srbackend=LuceneSearch to the end of a search URL, though I have no idea why the end user would actually wish to do that.
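For comparing backends from a script, the URL described in the footnote can be assembled as follows. This is a minimal sketch: the endpoint and the `search` and `title` parameters are taken from the search URL in the task description, `srbackend` comes from the footnote, and the `search_url` helper is a hypothetical name. Whether the parameter is still honoured on any given wiki is not something this sketch can confirm.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the task description.
SEARCH_ENDPOINT = "http://commons.wikimedia.org/w/index.php"


def search_url(term, backend=None):
    """Build a Special:Search URL, optionally forcing a search backend.

    backend is appended as the srbackend parameter mentioned in the footnote,
    e.g. "LuceneSearch"; when it is None the wiki's default is used.
    """
    params = {"search": term, "title": "Special:Search"}
    if backend is not None:
        params["srbackend"] = backend
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```

Calling `search_url("result", backend="LuceneSearch")` yields a URL ending in `&srbackend=LuceneSearch`, which can then be opened next to the default-backend URL for a side-by-side comparison.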