
Search seems to ignore minus terms when only searching the file namespace
Closed, ResolvedPublic

Description

For example, searching Commons for "wikimedia.org -commons.wikimedia.org" in the main and file namespaces finds 168,000 results. Searching only the file namespace finds 37.5 million results. Searching for "wikimedia.org" in only the file namespace also finds 37.5 million results, so it appears to be ignoring the "-commons.wikimedia.org" part of the search when only the file namespace is selected, counterintuitively resulting in more results when fewer namespaces are selected.

Event Timeline

What's happening here is that StructuredDataOnCommons transforms the search query into many related structured data statements and uses them to build a second query. The result set becomes, roughly, provided text OR structured query. All of the additional matches come from the structured query, which does not respect the additional limits (such as minus terms) in the provided text query.

This is certainly a bug. Fixing it likely requires at least a light restructuring of how SDoC builds its query.
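The OR behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the documents, the `statements` field, and every function name are hypothetical and are not the actual SDoC or CirrusSearch code. It shows how a minus term is lost when the structured branch is OR-ed in, and one possible shape in which the exclusion is applied to the combined result instead.

```python
# Toy model only: hypothetical data and function names, not the real SDoC query builder.
DOCS = [
    {"id": 1, "text": "wikimedia.org commons.wikimedia.org", "statements": {"wikimedia.org"}},
    {"id": 2, "text": "wikimedia.org", "statements": {"wikimedia.org"}},
    {"id": 3, "text": "unrelated", "statements": set()},
]


def parse(query):
    """Split a query into required terms and excluded ('-' prefixed) terms."""
    terms = query.split()
    required = [t for t in terms if not t.startswith("-")]
    excluded = [t[1:] for t in terms if t.startswith("-")]
    return required, excluded


def text_match(doc, required, excluded):
    """Plain-text branch: all required terms present, no excluded term present."""
    words = doc["text"].split()
    return all(t in words for t in required) and not any(t in words for t in excluded)


def statement_match(doc, required):
    """Structured branch: built from the required terms only, so it never sees minus terms."""
    return any(t in doc["statements"] for t in required)


def buggy_search(query):
    """text OR structured: the structured branch re-admits excluded documents."""
    required, excluded = parse(query)
    return {
        d["id"]
        for d in DOCS
        if text_match(d, required, excluded) or statement_match(d, required)
    }


def fixed_search(query):
    """(text OR structured) AND NOT excluded: the exclusion wraps both branches."""
    required, excluded = parse(query)
    return {
        d["id"]
        for d in DOCS
        if (text_match(d, required, []) or statement_match(d, required))
        and not any(t in d["text"].split() for t in excluded)
    }
```

In this model, `buggy_search` returns the same set with and without the minus term, which mirrors the identical counts reported for the file namespace, while `fixed_search` drops the document containing the excluded term.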

The task description has three cases:

| search terms | (Gallery)+File | File |
| --- | --- | --- |
| wikimedia.org | 37,600,290 | 37,600,402 |
| wikimedia.org -commons.wikimedia.org | 168,000 | 37,597,840 |

There seems to be a discrepancy in the result counts even for a simple search, wikimedia.org. With the Gallery+File namespace selection, the number of results should be higher than with the File namespace alone.
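The expectation stated above (a superset of namespaces should not return fewer results) can be checked mechanically against the reported figures. The snippet below is a minimal sketch: the dictionary layout and the `violations` helper are hypothetical names, and only the counts are taken from the three cases in the task description.

```python
# Result counts copied from the reported cases; the key names are made up for this sketch.
counts = {
    "wikimedia.org": {"gallery_file": 37_600_290, "file": 37_600_402},
    "wikimedia.org -commons.wikimedia.org": {"gallery_file": 168_000, "file": 37_597_840},
}


def violations(counts):
    """Return, per query, by how much File alone exceeds Gallery+File (only when it does)."""
    return {
        query: c["file"] - c["gallery_file"]
        for query, c in counts.items()
        if c["file"] > c["gallery_file"]
    }


for query, excess in violations(counts).items():
    print(f"{query!r}: File alone returns {excess:,} more results than Gallery+File")
```

Both rows violate the expectation: the plain search by 112 results, and the search with the minus term by 37,429,840.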

Some additional testing was done on the "-" (minus sign) case for Special:MediaSearch, comparing it with Special:Search:

(1) Check whether adding a "-" term returns fewer results - ✅ Yes

| test | Special:Search (namespaces All) | Special:MediaSearch (filter All namespaces) |
| --- | --- | --- |
| wikimedia.org | 37,942,047 | 341,803 |
| wikimedia.org -commons.wikimedia.org | 172,551 | 3,887 |

(2) Check whether selecting more namespaces returns more results when searching with "-"

| test | Special:Search | Special:MediaSearch (Categories and Pages) |
| --- | --- | --- |
| wikimedia.org | (Gallery) + File: 37,600,142 | (Gallery) + Talk: 195 |
| wikimedia.org -commons.wikimedia.org | (Gallery) + File: 168,687 | (Gallery) + Talk: 29 |
| wikimedia.org | File: 37,600,526 | Talk: 87 |
| wikimedia.org -commons.wikimedia.org | File: 37,598,047 | Talk: 6 |

Testing simple search on Special:Search

| search terms | Namespaces | Number of results |
| --- | --- | --- |
| cats -dogs | File | 6,107,976 |
| cats -dogs | File + (Gallery) | 6,065,711 |
| cats -dogs | File + (Gallery) + User | 6,073,282 |
CBogen added a subscriber: CBogen.

@Etonkovidova moving this to Needs QA because at first glance it appears to be fixed. Matthias said it might have been fixed as part of the refactoring done in the synonyms patch.

Etonkovidova claimed this task.

Re-checked the searches from the task description on Commons (wmf.2).

| Task description | URL | Reported result | Actual result (Commons wmf.2) |
| --- | --- | --- | --- |
| "searching Commons for "wikimedia.org -commons.wikimedia.org" in the main and file namespace" | https://commons.wikimedia.org/w/index.php?search=wikimedia.org+-commons.wikimedia.org&ns0=1&ns6=1 | 168,000 | 169,518 |
| "Searching [the same search terms as above: "wikimedia.org -commons.wikimedia.org"] only the file namespace" | https://commons.wikimedia.org/w/index.php?search=wikimedia.org+-commons.wikimedia.org&ns6=1 | 37.5 million | 777 |
| "Searching for "wikimedia.org" in only the file namespace" | https://commons.wikimedia.org/w/index.php?search=wikimedia.org&ns6=1 | 37.5 million | 1,710 |
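For anyone re-running these checks, the three URLs above follow one pattern: a `search` parameter plus one `nsN=1` flag per selected namespace (`ns0` for main, `ns6` for File, as in the URLs above). The helper below is a small sketch of that pattern using only the Python standard library; the function name `search_url` is made up for this example.

```python
from urllib.parse import urlencode

BASE = "https://commons.wikimedia.org/w/index.php"


def search_url(query, namespaces):
    """Build a Special:Search style URL with one nsN=1 flag per namespace number."""
    params = [("search", query)] + [(f"ns{n}", 1) for n in namespaces]
    # urlencode uses quote_plus by default, so spaces become '+' and '-' is left as-is.
    return f"{BASE}?{urlencode(params)}"


print(search_url("wikimedia.org -commons.wikimedia.org", [0, 6]))
print(search_url("wikimedia.org -commons.wikimedia.org", [6]))
print(search_url("wikimedia.org", [6]))
```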