Serbian language search does not allow use of the bald Latin alphabet
Open, Medium, Public
Assigned To: None

Authored By: Nikola_Smolenski, Jun 28 2016, 4:11 PM

Description

When searching, most Internet users type in the bald Latin alphabet (that is, without the letters č, ć, š, ž and đ). This is similar to German, where a search for "Muenchen" returns the results for "München". Serbian Wikipedia should support searching this way, but it does not. Example:

Search for "marković": https://sr.wikipedia.org/w/index.php?title=%D0%9F%D0%BE%D1%81%D0%B5%D0%B1%D0%BD%D0%BE:%D0%9F%D1%80%D0%B5%D1%82%D1%80%D0%B0%D0%B6%D0%B8&profile=default&fulltext=Search&search=markovi%C4%87&searchToken=cibdktt9t7eu2hv4o3n1hgg84
1. Observed: 207 search results.
Search for "markovic": https://sr.wikipedia.org/w/index.php?title=%D0%9F%D0%BE%D1%81%D0%B5%D0%B1%D0%BD%D0%BE:%D0%9F%D1%80%D0%B5%D1%82%D1%80%D0%B0%D0%B6%D0%B8&profile=default&fulltext=Search&search=markovic&searchToken=gf3dawrz4tio3a91fujm144m
1. Expected: all 207 results from the previous search should appear.
2. Observed: Only 47 results appear.
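
The expected behaviour can be illustrated with a minimal sketch in Python. This is not how CirrusSearch is implemented; the table and the function name `fold_bald_latin` are hypothetical, and mapping đ to "dj" is an assumption (some users type a plain "d" instead).

```python
# Hypothetical folding table for Serbian Latin diacritics.
# Assumption: "đ" is folded to "dj"; a real analyzer might also accept "d".
BALD_LATIN = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "Dj",
})


def fold_bald_latin(text: str) -> str:
    """Replace Serbian Latin diacritic letters with their bald equivalents."""
    return text.translate(BALD_LATIN)


def same_term(query: str, indexed: str) -> bool:
    """Compare two terms after folding, ignoring case."""
    return fold_bald_latin(query).casefold() == fold_bald_latin(indexed).casefold()
```

If both the query and the indexed text are folded this way, "markovic" and "marković" become the same term, so both searches would be expected to return the same set of results.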

An overview of the issue is given at https://wiki.apache.org/solr/SerbianLanguageSupport

Related Objects

Mentioned In: T223787: Investigate impact of folding diacritics in Slovak
T138854: Serbian Wikipedia search offers to create existing articles
T138857: Serbian language search differentiates between Cyrillic and Latin alphabets
Mentioned Here: T138857: Serbian language search differentiates between Cyrillic and Latin alphabets

Event Timeline

Nikola_Smolenski created this task. Jun 28 2016, 4:11 PM

Restricted Application added subscribers: Zppix, Aklapper. Jun 28 2016, 4:11 PM

Nikola_Smolenski added projects: MediaWiki-Internationalization, MediaWiki-Search. Jun 28 2016, 4:15 PM

Restricted Application added projects: Discovery-ARCHIVED, Discovery-Search. Jun 28 2016, 4:15 PM

Wikimedia sites do not use MediaWiki's default search backend (MediaWiki-Search), hence setting the project to CirrusSearch.

We'll take a look; hopefully it will be fairly easy to fix.

debt mentioned this in T138857: Serbian language search differentiates between Cyrillic and Latin alphabets. Jul 1 2016, 4:46 PM

debt mentioned this in T138854: Serbian Wikipedia search offers to create existing articles.

If we want both bald Latin and Cyrillic-to-Latin mapping, it looks to be straightforward. See T138857#3391852 for more details.
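
As a rough illustration of combining the two mappings (a sketch only, not the configuration discussed in T138857), the Python below first transliterates Serbian Cyrillic to Latin and then folds the diacritics. The names `CYR_TO_LAT`, `FOLD` and `normalize` are hypothetical, input is lowercased for brevity, and folding đ to "dj" is an assumption.

```python
# Serbian Cyrillic to Latin, lowercase letters only (input is lowercased first).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Bald Latin folding. Assumption: "đ" becomes "dj".
FOLD = {"č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj"}


def normalize(text: str) -> str:
    """Lowercase, map Cyrillic to Latin, then strip Serbian diacritics."""
    lowered = text.lower()
    latin = "".join(CYR_TO_LAT.get(ch, ch) for ch in lowered)
    return "".join(FOLD.get(ch, ch) for ch in latin)
```

With this normalization applied to both queries and indexed text, "Марковић", "marković" and "markovic" all reduce to the same string.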

TJones moved this task from This Quarter to Tech Debt/Misc on the Discovery-Search board. Oct 24 2017, 5:35 PM

debt moved this task from Tech Debt/Misc to Language Stuff on the Discovery-Search board. Jan 29 2019, 6:40 PM

Restricted Application added a subscriber: Petar.petkovic. Jan 29 2019, 6:40 PM

TJones mentioned this in T223787: Investigate impact of folding diacritics in Slovak. Jun 3 2019, 4:13 PM

Aca added a project: Serbian-Sites. Jun 18 2019, 10:08 AM

Aca subscribed.

Aca moved this task from Backlog to Working on on the Serbian-Sites board. Jun 18 2019, 10:11 AM
