
Wikidata Query Service removes ZWNJ from the results
Closed, Duplicate · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

Open

https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%20%3Fitem%20rdfs%3Alabel%20%22%D9%82%D9%87%D9%88%D9%87%E2%80%8C%D8%AE%D8%A7%D9%86%D9%87%22%40fa.%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22fa%22.%20%7D%20%7D
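To confirm that the query itself carries the ZWNJ, the percent-encoded label from the URL above can be decoded and inspected. This is only a minimal sketch; the variable names are illustrative and not part of any Wikidata tooling.

```python
from urllib.parse import unquote

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# The label literal exactly as it appears percent-encoded in the query URL.
encoded_label = "%D9%82%D9%87%D9%88%D9%87%E2%80%8C%D8%AE%D8%A7%D9%86%D9%87"

# unquote() decodes the percent-escapes as UTF-8 by default.
label = unquote(encoded_label)

# List the code points so the invisible character becomes visible.
code_points = [f"U+{ord(ch):04X}" for ch in label]
print(code_points)
print("ZWNJ present in query label:", ZWNJ in label)
```

The `%E2%80%8C` sequence is the UTF-8 encoding of U+200C, so the label sent to the service does contain the ZWNJ between the two halves of the word.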

What happens?:
"قهوهخانه" returned as the result.

What should have happened instead?:
"قهوه‌خانه" should be returned (for those who don't know the script, look at the circle-like letter in the middle of the word, which should not be attached to the letter that follows it)

I guess some normalization is going on that removes whitespace, but ZWNJ shouldn't be removed, as is also discussed in https://github.com/w3c/charmod-norm/issues/44
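As a rough way to narrow down the guess above, the sketch below checks how Python's standard library treats the ZWNJ. It shows that the four standard Unicode normalization forms leave U+200C in place and that the character is not classified as whitespace. It says nothing about what the query service does internally; the helper name `zwnj_positions` is made up for this example.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, general category Cf (format)

# The expected label, written with escapes so the ZWNJ is explicit.
expected = "\u0642\u0647\u0648\u0647\u200c\u062e\u0627\u0646\u0647"


def zwnj_positions(text: str) -> list[int]:
    """Return the indexes at which a ZWNJ occurs in text."""
    return [i for i, ch in enumerate(text) if ch == ZWNJ]


# None of the standard normalization forms drops the ZWNJ.
normalized = {
    form: unicodedata.normalize(form, expected)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
for form, value in normalized.items():
    print(form, "keeps ZWNJ:", ZWNJ in value)

# The ZWNJ is a format character, not whitespace.
print("category:", unicodedata.category(ZWNJ))
print("isspace:", ZWNJ.isspace())

# What the reported (wrong) result looks like: the same word minus the ZWNJ.
stripped = expected.replace(ZWNJ, "")
print(zwnj_positions(expected), zwnj_positions(stripped))
```

If this holds for the service's stack as well, the removal would more likely come from a custom filter (for example one that strips format or invisible characters) than from Unicode normalization proper.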

As far as I can tell, this isn't coming from the interface, as the API shows the issue too:

image.png (346×1 px, 138 KB)
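The API check mentioned above could be scripted roughly as follows. This is a sketch under assumptions: it uses the public SPARQL endpoint at `https://query.wikidata.org/sparql` with `format=json` and reads the standard SPARQL JSON results layout. The function names and the User-Agent string are placeholders invented for this example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
ZWNJ = "\u200c"
LABEL = "\u0642\u0647\u0648\u0647\u200c\u062e\u0627\u0646\u0647"

# Same query as in the steps to replicate, with the ZWNJ kept in the literal.
QUERY = (
    "SELECT ?item ?itemLabel WHERE { "
    f'?item rdfs:label "{LABEL}"@fa. '
    'SERVICE wikibase:label { bd:serviceParam wikibase:language "fa". } }'
)


def build_url(query: str) -> str:
    """Build the GET URL for the SPARQL endpoint, asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{ENDPOINT}?{params}"


def fetch_labels(query: str) -> list[str]:
    """Run the query and return the itemLabel values (needs network access)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "zwnj-check-sketch/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["itemLabel"]["value"] for row in data["results"]["bindings"]]


# Example use (not executed here because it needs network access):
#   for label in fetch_labels(QUERY):
#       print(repr(label), "ZWNJ present:", ZWNJ in label)
```

Printing with `repr()` makes the `\u200c` escape visible, so a missing ZWNJ in the returned label is easy to spot without relying on font rendering.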

Event Timeline

(Added recent editors of LabelService.java, though I'd guess this isn't from that particular file; I just hope someone has a clue where this sort of normalization could come from.)