Serbian language search does not allow use of the bald Latin alphabet
Open, Normal, Public

Description

When searching, most Internet users type in the bald Latin alphabet (without the letters č, ć, š, ž and đ). This is similar to German, where a search for "Muenchen" returns the results for "München". Serbian Wikipedia should support searching this way, but it currently does not. Example:

  1. Search for "marković": https://sr.wikipedia.org/w/index.php?title=%D0%9F%D0%BE%D1%81%D0%B5%D0%B1%D0%BD%D0%BE:%D0%9F%D1%80%D0%B5%D1%82%D1%80%D0%B0%D0%B6%D0%B8&profile=default&fulltext=Search&search=markovi%C4%87&searchToken=cibdktt9t7eu2hv4o3n1hgg84
    1. Observed: 207 search results.
  2. Search for "markovic": https://sr.wikipedia.org/w/index.php?title=%D0%9F%D0%BE%D1%81%D0%B5%D0%B1%D0%BD%D0%BE:%D0%9F%D1%80%D0%B5%D1%82%D1%80%D0%B0%D0%B6%D0%B8&profile=default&fulltext=Search&search=markovic&searchToken=gf3dawrz4tio3a91fujm144m
    1. Expected: all the 207 previous search results should appear.
    2. Observed: Only 47 results appear.
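
The expected behaviour can be illustrated with a minimal sketch of diacritic folding applied to both the query and the indexed text. This is not how CirrusSearch is implemented; `fold_to_bald_latin` and `matches` are hypothetical names, and the choice of "dj" for đ is an assumption (some users type plain "d" instead).

```python
# Hypothetical illustration only: fold Serbian Latin letters to bald Latin.
# The mapping đ -> "dj" is an assumption; plain "d" is also common in practice.
BALD_LATIN = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "Dj",
})


def fold_to_bald_latin(text: str) -> str:
    """Replace č, ć, š, ž and đ with their bald Latin equivalents."""
    return text.translate(BALD_LATIN)


def matches(query: str, title: str) -> bool:
    """Compare query and title after folding both sides, ignoring case."""
    return fold_to_bald_latin(query).casefold() == fold_to_bald_latin(title).casefold()


print(fold_to_bald_latin("marković"))
print(matches("markovic", "Marković"))
```

Folding both sides the same way is what lets a search for "markovic" find pages that contain "marković", while a search for "marković" keeps working as before.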

An overview of the issue is given at https://wiki.apache.org/solr/SerbianLanguageSupport

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. Jun 28 2016, 4:11 PM
Restricted Application added projects: Discovery, Discovery-Search. Jun 28 2016, 4:15 PM

Wikimedia sites do not use MediaWiki's default search backend (MediaWiki-Search), so I am setting the project to CirrusSearch instead.

debt triaged this task as Normal priority. Jul 1 2016, 4:45 PM
debt moved this task from needs triage to This Quarter on the Discovery-Search board.
debt added a subscriber: debt.

We'll take a look and hopefully it'll be fairly 'easy' to fix.

TJones added a subscriber: TJones. Jun 29 2017, 5:30 PM

If we want both bald Latin and Cyrillic-to-Latin mapping, it looks to be straightforward. See T138857#3391852 for more details.
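
As a rough sketch of what combining the two mappings could look like, the snippet below transliterates Serbian Cyrillic to Latin and then folds the result to bald Latin. It is an illustration under stated assumptions, not the configuration discussed in the linked comment: `normalize` is a hypothetical name, and mapping đ to "dj" is an assumption.

```python
# Hypothetical illustration only: Cyrillic-to-Latin mapping followed by
# bald Latin folding, so all three spellings normalize to the same string.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

# Assumption: đ folds to "dj"; some users type plain "d" instead.
LATIN_TO_BALD = {"č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj"}


def normalize(text: str) -> str:
    """Lowercase, transliterate Cyrillic to Latin, then drop diacritics."""
    lowered = text.lower()
    latin = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in lowered)
    return "".join(LATIN_TO_BALD.get(ch, ch) for ch in latin)


for spelling in ("Марковић", "Marković", "Markovic"):
    print(spelling, "->", normalize(spelling))
```

Applying the same normalization at index time and at query time would make the Cyrillic, Latin and bald Latin spellings interchangeable for matching purposes.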

Restricted Application added a subscriber: Petar.petkovic. Jan 29 2019, 6:40 PM