[Bug] deepcategory created empty result set for umlauts
Closed, Resolved (Public)

Description

Testing deepcategory more thoroughly, we think there might be a problem related to umlauts.
See the examples:
working:
deepcategory:"Friedhof in Hamburg"

not working:
deepcategory:"Friedhof in Köln"
deepcategory:"Friedhof in Münster"
deepcategory:"Sakralbau in Gießen"

Each example category has at most one subcategory, which itself has no further subcategories, and every example category contains articles.

This bug is blocking the inclusion of deepcategory search in advancedsearch.

Event Timeline

Lea_WMDE raised the priority of this task from Medium to High. (Apr 16 2018, 11:31 AM)

It seems like other non-ASCII characters, such as Cyrillic letters, don't work either.

@Smalyshev, do you know when you will have time to look into this bug and T188350#4133189?

Note:
deepcategory:"Friedhof in Hamburg" does search the category Friedhof in Hamburg, but not its subcategories Kriegsgräberstätte in Hamburg and Jüdischer Friedhof in Hamburg, presumably because of the umlauts in their names.

Looks like a problem with decoding results: the query dump shows it searching for the still percent-encoded names Friedhof_in_K%C3%B6ln and J%C3%BCdischer_Friedhof_in_K%C3%B6ln, which is obviously wrong.
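To illustrate the mismatch, here is a minimal Python sketch of the decoding step, using the two names from the query dump. It is not the CirrusSearch patch; the helper name decode_category_name is hypothetical, and replacing underscores with spaces is an assumption about how the names should look before they are matched.

```python
from urllib.parse import unquote


def decode_category_name(raw: str) -> str:
    """Percent-decode a category name (UTF-8) and replace underscores with spaces.

    Hypothetical helper for illustration only; the underscore handling is an
    assumption, not something confirmed by the task.
    """
    return unquote(raw, encoding="utf-8").replace("_", " ")


# Names as they appeared in the query dump, plus an ASCII-only one for contrast.
encoded = [
    "Friedhof_in_Hamburg",
    "Friedhof_in_K%C3%B6ln",
    "J%C3%BCdischer_Friedhof_in_K%C3%B6ln",
]

decoded = [decode_category_name(name) for name in encoded]
for before, after in zip(encoded, decoded):
    print(f"{before!r} -> {after!r}")
```

The ASCII-only name is unaffected by percent-encoding, which would fit the observation that "Friedhof in Hamburg" itself worked while names with umlauts did not.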

Change 427260 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/CirrusSearch@master] Decode category names received form the DB

https://gerrit.wikimedia.org/r/427260

Change 427260 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Decode category names received form the DB

https://gerrit.wikimedia.org/r/427260