
Map labels fall back to an arbitrary name in the same writing system
Closed, Duplicate | Public | BUG REPORT

Assigned To
None
Authored By
mxn
Oct 9 2023, 5:30 PM

Description

When Wikimedia Maps is viewed in a language for which an OpenStreetMap (OSM) feature has no matching name:* tag, the map tiles fall back to the first tag in alphabetical order whose language code explicitly indicates the requested language’s writing system, even if that tag is for a language whose Latin alphabet is very different from the requested language’s.

In OSM, neither the boundary relation nor the place node for Los Angeles has a name:vi (Vietnamese) or name:tlh (Klingon) tag. This is perhaps suboptimal, but not entirely surprising in OSM. If I request a tile in Vietnamese, I would expect it to say either “Los Ángeles” based on the Spanish name, since Spanish is in my list of preferred languages, or perhaps just “Los Angeles” based on the local language (English) in name. Instead, it appears as “Los Anđeles”:

Vietnamese at z5:

ảnh.png (512×512 px, 51 KB)

Vietnamese at z12:

ảnh.png (512×512 px, 406 KB)

This happens in any language that isn’t explicitly tagged in OSM, such as Klingon:

ảnh.png (512×512 px, 406 KB)

“Los Anđeles” has never been tagged as a Vietnamese name on the boundary relation or place node for Los Angeles in OpenStreetMap, nor has it ever been used as a label in any language on Q65 on Wikidata. However, the place node in OSM does have a name:sr-Latn tag set to “Los Anđeles”.

As far as I can tell, the tiles are falling back to a name:*-Latn tag just because Vietnamese and Klingon happen to be written in the Latin script and the tag’s language code names Latin explicitly. The only reason name:sr-Latn contains “Latn” is that Serbian can also be written in Cyrillic, but this is completely irrelevant to choosing a good Latin-script fallback for Vietnamese. This is especially jarring in Vietnamese because the Serbian and Vietnamese alphabets share the letter “đ”. A name like “Los Anđeles” looks like someone tried to transliterate into the Vietnamese alphabet but completely missed the mark. (It would be either “Lốt An-giơ-lét” or “Lốt Angiơlét” in that case.)

This issue affects other places as well, but not necessarily in the same way. Here’s a Vietnamese tile labeling Fort Wayne, Indiana, as “Fōtou~ein” based on the name:ja-Latn tag on this OSM node. I believe name:ja-Latn took precedence over name:sr-Latn in this case just because it came first alphabetically. name:ja-Latn is a transliteration, not even intended for user display in most cases.

ảnh.png (512×512 px, 244 KB)
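
To make the suspected behavior concrete, here is a minimal sketch of the fallback as I understand it from the tiles. It is a model of the symptoms, not the actual tile-generation code: the function name, the `script` parameter, and the sample tag dictionaries are all made up for illustration, and the name:sr-Latn value for Fort Wayne is a placeholder because I have not quoted the real one above.

```python
def suspected_fallback(tags, lang, script="Latn"):
    """Hypothetical model of the observed fallback, not the real implementation."""
    # An exact match for the requested language wins if it exists.
    exact = tags.get(f"name:{lang}")
    if exact is not None:
        return exact
    # Otherwise take the alphabetically first name:*-<script> key,
    # regardless of which language that key actually belongs to.
    suffix = f"-{script}"
    candidates = sorted(
        key for key in tags
        if key.startswith("name:") and key.endswith(suffix)
    )
    if candidates:
        return tags[candidates[0]]
    # Last resort: the local name.
    return tags.get("name")


# Sample tags (assumed subset of what is on the OSM elements).
los_angeles = {
    "name": "Los Angeles",
    "name:es": "Los Ángeles",
    "name:sr-Latn": "Los Anđeles",
}
fort_wayne = {
    "name": "Fort Wayne",
    "name:ja-Latn": "Fōtou~ein",
    "name:sr-Latn": "<sr-Latn name>",  # placeholder value, not the real tag
}

print(suspected_fallback(los_angeles, "vi"))
print(suspected_fallback(fort_wayne, "vi"))
```

Under this model, Klingon (`tlh`) gets the same labels as Vietnamese, because the requested language plays no role once the exact match fails.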

This issue affects Vietnamese at 42,179 places worldwide. It affects at least one language that uses a Latin alphabet at 45,278 places and presumably affects at least one language in any writing system at 156,788 places worldwide. These statistics do not account for cases in which we happen to get lucky and the arbitrarily chosen language happens to spell the place name the same way as the requested language would.

T193198 would mitigate this issue by instead falling back to a Wikidata label in the requested language. Presumably Wikidata labels would offer better coverage of place names. The issue would still remain, but at least it would be easier for Wikimedians to work around the issue on a case-by-case basis.
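
For comparison, here is a rough sketch of the fallback order I would expect instead: the exact OSM tag, then a Wikidata label in the requested language (the T193198 idea), then my other preferred languages, then the local name. Everything here is hypothetical, including the function name, the `preferred` list, and the `wikidata_labels` mapping. It never borrows a name:*-Latn tag from an unrelated language.

```python
def expected_fallback(tags, requested, preferred=(), wikidata_labels=None):
    """Hypothetical fallback order; not an existing Wikimedia Maps function."""
    wikidata_labels = wikidata_labels or {}
    # 1. Exact OSM tag for the requested language.
    value = tags.get(f"name:{requested}")
    if value is not None:
        return value
    # 2. Wikidata label in the requested language, if one exists.
    if requested in wikidata_labels:
        return wikidata_labels[requested]
    # 3. OSM tags for the reader's other preferred languages, in order.
    for lang in preferred:
        value = tags.get(f"name:{lang}")
        if value is not None:
            return value
    # 4. The local name.
    return tags.get("name")


# Assumed subset of the tags on the Los Angeles place node.
la_tags = {
    "name": "Los Angeles",
    "name:es": "Los Ángeles",
    "name:sr-Latn": "Los Anđeles",
}

print(expected_fallback(la_tags, "vi", preferred=["es"]))
print(expected_fallback(la_tags, "tlh"))
```

With Spanish in the preferred list, the Vietnamese request yields “Los Ángeles”; with no preferences, it yields the local name “Los Angeles”.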

Event Timeline

See T195318. We have been considering whether it might be better to simply ignore suffixed language keys.

This issue affects Vietnamese at 42,179 places worldwide. It affects at least one language that uses a Latin alphabet at 45,278 places and presumably affects at least one language in any writing system at 156,788 places worldwide.

It isn't necessarily as bad as these usage counts suggest, because most uses of sr-Latn and ja-Latn are for places in Serbia and Japan respectively, and for most of these places the local-language latinization is likely fine in most Latin-script languages too. But it is indeed still pretty bad, because uses in other countries tend to be for more prominent place names, like those of countries and big cities, which shouldn't use Serbian or Japanese latinization in most cases.