
MediaSearch: "Sobbing" results in train media
Open, Needs Triage, Public

Description

Search: https://commons.wikimedia.org/w/index.php?search=sobbing&title=Special%3AMediaSearch&go=Go&type=image

I get why this is happening (I think it's stemming to SOB in the image name), but I don't think this is a good end result:

image.png (539×1 px, 1 MB)

It also seems to impact a search for crying too, though the trains are further down than in that image such that it's non-trivial to screenshot it "from the top". (I didn't know search had a thesaurus!) https://commons.wikimedia.org/w/index.php?search=crying&title=Special%3AMediaSearch&go=Go&type=image

Event Timeline

Wasn't sure what a good blue product name to file this in would be.

Aklapper renamed this task from MediaSearch: Sobbing brings us trains to MediaSearch: "Sobbing" results in train media. Apr 9 2025, 6:16 AM
Aklapper edited projects, added MediaSearch; removed Discovery-Search.

I didn't know search had a thesaurus!

I think this is simple suffix stripping and it is part of the language support for stemming.
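To make the suffix-stripping explanation concrete, here is a toy sketch in Python. It is not the analyzer MediaSearch actually runs; `toy_stem`, `tokenize`, `matches`, and both file titles are made up for illustration. It only shows how dropping "-ing" and undoing the doubled consonant can make "sobbing" collide with "SOB" in a file name.

```python
import re


def toy_stem(token: str) -> str:
    """Very rough suffix stripper, for illustration only (hypothetical helper)."""
    word = token.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling, e.g. "sobb" -> "sob".
        if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeioulsz":
            word = word[:-1]
    return word


def tokenize(text: str) -> list[str]:
    """Split text into alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[A-Za-z]+", text)


def matches(query: str, title: str) -> bool:
    """True if every stemmed query token also appears among the stemmed title tokens."""
    title_stems = {toy_stem(t) for t in tokenize(title)}
    return all(toy_stem(q) in title_stems for q in tokenize(query))


# Hypothetical file titles, not taken from the actual search results.
print(toy_stem("sobbing"))
print(matches("sobbing", "SOB Be 4-4 train.jpg"))
print(matches("sobbing", "Crying child.jpg"))
```

Under these assumptions the query and the train title both reduce to the stem "sob", so the train file matches even though it has nothing to do with crying. This says nothing about why a search for crying also surfaces the trains.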

I get that it's not really ideal in this particular case, but I doubt there is an easy way to solve this.

Google probably solves this by making multiple internal interpretations of your search query, based on their knowledge/language graph or simply on how likely it is that people typing that are looking for sobbing rather than an SOB train. They may then limit your search by adding a hidden subquery, and they also seem to use the quick grouping switches at the top to switch between even more subsets of their graph.

I didn't know search had a thesaurus!

I think this is simple suffix stripping and it is part of the language support for stemming.

Here I'm making the not-funny observation that crying also results in sob trains. I fully understand that stemming is how we're ending up at sob trains for the sobbing case.