All map location names should be in the user's language
Closed, Resolved (Public)

Description

Currently the map at https://maps.wikimedia.org shows location names in the language of the location.

This is problematic for several reasons.

  • Most importantly, most people want to see the location name in their own language. If a translation is not available, then there should be a reasonable fallback. If, for example, the name of a city in China is not translated to Hebrew, I would prefer to see it in English and not in Chinese; seeing Chinese alone is useless to me, unfortunately, although seeing a Chinese name beside a name in Hebrew or English could be nice.
  • Furthermore, some locations have people speaking different languages, and attaching a name in only one language is POV. For example, the name of the city of Tetovo in Macedonia is shown in the Cyrillic alphabet (i.e. Macedonian language), even though many of this city's residents speak Albanian, which is written in the Latin script.

Event Timeline

  • When is this likely to roll out in terms of becoming something users can actually see and use (and that we can brag about having completed)?

Best case scenario: end of next week. I'm working towards getting ready but I won't put it live before the whole team is ready, that includes testing, documentation, announcement, etc.

  • What will the actual experience be? How should we describe it in documentation, and what are the things users need to know? E.g.,
    • Assuming labels in a given language are available, will this just start working on maps around the world? Or do editors have to do something to trigger a local-language update for existing maps?

It will just start working. Nothing the editors have to do.

  • What about for new maps? Is this a feature editors have to turn on, or is this just how things work now?

It will be the default behavior. You can't turn it on or off.

  • What language will be the default? Will maps try to default to the language of the wiki, I gather?

It defaults to the page language, then the configured fallback(s) for that language, then English, then the local name.

  • If that's not available, my understanding is that there is an established language fallback protocol. What's that called, and can you point to it please?

Fallbacks have been extracted from the MediaWiki codebase. They are here: https://github.com/kartotherian/babel/blob/master/lib/fallbacks.json

  • Do you happen to know where the OSM documentation is that shows users how to add labels (is that the right word?) in new languages, in the event the one they want isn't available?

I don't know. Maybe @Trizek-WMF does.

Thanks!

See https://wiki.openstreetmap.org/wiki/Multilingual_names and https://wiki.openstreetmap.org/wiki/Names#Localization. That last one is the more detailed for a user.
Translations are really an advanced feature. There are ways to get place translations in the integrated editor, but that's not a first-line feature.

Change 420315 abandoned by Sbisson:
Configure maps source for localized labels

https://gerrit.wikimedia.org/r/420315

Change 422239 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[operations/puppet@production] Make 'style' and 'storage id' available to maps services

https://gerrit.wikimedia.org/r/422239

Change 422447 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[maps/tilerator/deploy@master] Configure sources here instead of in puppet

https://gerrit.wikimedia.org/r/422447

Change 422449 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[maps/kartotherian/deploy@master] Configure sources here instead of in puppet

https://gerrit.wikimedia.org/r/422449

Hi,
Would it be possible to consider labels from Wikidata when they are missing in OSM?
See T159205

That's a really interesting idea. Some time in the future, perhaps, but it's out of scope for our current Map Improvements project.

@jmatazzoni as far as I know, most consumers of OSM, e.g. Mapbox and Klokane, use Wikidata for international labels, simply because OSM's international coverage is nowhere near as good as Wikidata's. I have built the Sophox query service (based on Wikidata's query service). It's a database that combines both the Wikidata and OSM data in one place, which makes it easy to merge the two results.

Change 422239 merged by Gehel:
[operations/puppet@production] Make 'style' and 'storage id' available to maps services

https://gerrit.wikimedia.org/r/422239

Change 422449 merged by jenkins-bot:
[maps/kartotherian/deploy@master] Configure sources here instead of in puppet

https://gerrit.wikimedia.org/r/422449

Change 422447 merged by Gehel:
[maps/tilerator/deploy@master] Configure sources here instead of in puppet

https://gerrit.wikimedia.org/r/422447

Change 423721 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: remove sources.yaml

https://gerrit.wikimedia.org/r/423721

Change 423722 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: cleanup of sources.yaml code

https://gerrit.wikimedia.org/r/423722

Hi folks,

I'm completely new to Phabricator and this topic, but I have some experience in IT.

Question:

  • Is there already a place where all settled decisions on this topic are collected and open questions are discussed one by one? If required, I would be happy to moderate the topic as well as I can.

Regards, Olivier

Provide a map for each language under a separate URL: {locale}.maps.wikimedia.org

Multiple languages may be served from the same source, as long as no distinct versions exist

@Next2u I actually really like this idea! Makes testing and serving much better, and avoids ugly URL parameters. I just wonder if some sort of caching optimization is needed/possible in this case (cc: @Gehel)

To get the best match for each user, we have to group the languages:

But the fallback path of a language is a political rather than a technical problem.

Proposal:

  • This would be best solved by the owners of each Wikipedia language edition.

We've figured a number of these things out already, and will probably have this feature working in production in the next week or so. It's already working in beta: https://maps-beta.wmflabs.org/?lang=fr . We also have fallback paths defined here, which are just exported from MediaWiki for now.

@Next2u I actually really like this idea! Makes testing and serving much better, and avoids ugly URL parameters. I just wonder if some sort of caching optimization is needed/possible in this case (cc: @Gehel)

Caching is always an optimization problem, but I hope this proposal is not against it.

If we have something like a hash of every tile, this is for sure a good idea. It would be ideal if the "Top 4" empty tiles were more or less static:

Here is a list of all URLs served today from "maps.wikimedia.org" (may be incomplete):

@Catrope, @Next2u's proposal doesn't preclude all that work. It could be a one-line change in Varnish: simply turn the domain name into a lang parameter, and let the backend handle it as before. My only other concern with the proposal is that the lang parameter allows anything at all, whereas a domain name usually comes from a whitelist. Not sure which is better; there are merits to both.
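
To make the rewrite idea concrete, here is a rough sketch of mapping a per-language host name onto the lang query parameter. It is written in Python purely for illustration (the real change would live in Varnish configuration), and the whitelist contents and the function name are assumptions, not anything that exists in the maps code:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_HOST = "maps.wikimedia.org"
# Hypothetical whitelist; a real one would be generated from the fallback data.
ALLOWED_LOCALES = {"fr", "he", "sq"}


def host_to_lang_param(url: str) -> str:
    """Rewrite https://{locale}.maps.wikimedia.org/... to ...?lang={locale}.

    URLs whose host is not a whitelisted locale subdomain are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    suffix = "." + BASE_HOST
    if not host.endswith(suffix):
        return url
    locale = host[: -len(suffix)]
    if locale not in ALLOWED_LOCALES:
        return url
    # Drop any lang parameter already present, then append the one from the host.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "lang"]
    query.append(("lang", locale))
    return urlunsplit((parts.scheme, BASE_HOST, parts.path, urlencode(query), parts.fragment))
```

With this sketch, a request for https://fr.maps.wikimedia.org/osm-intl/15/16594/11273.png would reach the backend as https://maps.wikimedia.org/osm-intl/15/16594/11273.png?lang=fr, while an unknown subdomain would pass through untouched.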

The mapping is great work. It makes absolute sense to take this json data as master to generate all other technical configurations like the DNS entries, etc. periodically.

The DNS proposal is to match the user logic with the Wikipedia logic.

The URL paths won't be user-facing, but we still want sane ones. The industry convention is for changes to either URL parameters or the path, not DNS[1], not multiple domains. Part of the reason is that it's simpler to do URL changes than DNS changes, particularly in development when everything is served from localhost.

It would have to be a wildcard since we don't have a fixed list of languages. Turning a wildcard DNS name into a URL parameter requires Varnish changes, and I believe policy is not to do any URL mangling in Varnish. Even if it's not against policy, it adds complexity and impedes understanding, because the URL requested is not the URL the backend sees.

Given the above, I recommend sticking with query parameters.

None of this impacts language fallbacks which are done on the backend either way.

[1]: e.g. tegola, mapzen, mapbox, Jochen's MLM, and other works

Re caching, @Gehel has previously said that he was confident that Varnish would easily be able to cope with the fragmentation introduced by the ?lang=xx parameter.

As for the DNS idea: maps.wikimedia.org is really just a test page, it's not really intended for public consumption, and the main thing we care about is the URLs for individual tiles loaded by Kartographer, which would go from https://maps.wikimedia.org/osm-intl/15/16594/11273.png to https://maps.wikimedia.org/osm-intl/15/16594/11273.png?lang=xx . These URLs are programmatically generated, and for that purpose using a query param is easier than changing the domain name.
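
As a rough illustration of why the query parameter is easy to generate programmatically, a tile URL builder could look like the sketch below. The function name and its defaults are assumptions for the example; only the URL shape comes from the comment above:

```python
from typing import Optional
from urllib.parse import urlencode


def tile_url(
    zoom: int,
    x: int,
    y: int,
    lang: Optional[str] = None,
    style: str = "osm-intl",
    base: str = "https://maps.wikimedia.org",
) -> str:
    """Build a raster tile URL, appending ?lang=xx only when a language is given."""
    url = f"{base}/{style}/{zoom}/{x}/{y}.png"
    if lang:
        url += "?" + urlencode({"lang": lang})
    return url
```

Calling tile_url(15, 16594, 11273) gives the existing URL form, and passing lang="fr" only appends ?lang=fr, so the host and path stay the same for every language.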

@Pnorman makes sense. @Catrope wrt caching, I was thinking more in terms of caching fully identical tiles efficiently, e.g. using some optimized caching strategy when a tile is identical in multiple languages, or even when it only has identical content such as water. But I guess that since tiles are tiny, the management overhead won't make it worthwhile.

Deduplication of identical tiles should be in a separate issue if someone wants to follow up on it. Because this would require a third cache layer (WMF standard Varnish fast, WMF standard Varnish slow, something custom) I really doubt it's worth the complexity.

All patches and PRs are merged.

See T191655 to know when it can be tested in production.

So the next version will support the "?lang=xx" parameter, right?

Right

Question: Is there a reason "roundabouts" and "bus stops" are not transliterated (e.g. in Israel)?
https://maps-beta.wmflabs.org/?lang=fr#19/31.52647/394.60170

image.png (404×636 px, 80 KB)

The same place looks better in "www.openstreetmap.de":

image.png (302×475 px, 86 KB)

The system looks for the requested language first, then the fallback language(s) if those exist, then it requests English, and if none of those are found, it asks for the local language. It does this for each label.

In the case above, it might be that there are no labels in French, and French has no fallbacks, and there are no labels in English, which is why it's showing it in Hebrew (local).

Below, you seem to display German, which, apparently, exists, for those labels.
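
The per-label lookup described above can be sketched roughly as follows. This is not the babel implementation: the function names are made up for the example, the fallback table is an illustrative stand-in rather than a copy of fallbacks.json, and the tag values are sample data.

```python
from typing import Dict, List, Optional

# Illustrative stand-in for the fallback table exported from MediaWiki.
FALLBACKS: Dict[str, List[str]] = {"br": ["fr"]}


def fallback_chain(lang: str, fallbacks: Dict[str, List[str]]) -> List[str]:
    """Requested language, then its configured fallbacks, then English."""
    chain = [lang] + list(fallbacks.get(lang, []))
    if "en" not in chain:
        chain.append("en")
    return chain


def pick_label(tags: Dict[str, str], lang: str, fallbacks: Dict[str, List[str]]) -> Optional[str]:
    """Return the first name:xx tag found along the chain, else the local name."""
    for code in fallback_chain(lang, fallbacks):
        value = tags.get(f"name:{code}")
        if value:
            return value
    # Nothing matched: fall back to the local name, which may be absent.
    return tags.get("name")


# Sample OSM-style tags (values are illustrative).
tetovo = {"name": "Тетово", "name:sq": "Tetovë", "name:en": "Tetovo"}
```

With these sample tags, a request for "sq" uses the Albanian name, a request for "fr" falls through to the English name, and an object that only has a name tag is shown with its local name, which matches the Hebrew bus stop case above.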

The same place looks better in "www.openstreetmap.de":

image.png (302×475 px, 86 KB)

Actually, that looks worse; the Hebrew is backwards... :\

Not sure if this was discussed before. Many labels in OSM are encoded with an extra param, e.g. Latn (75k), _rm (250k+), _kana, _pinyin, hira, and others. Search for "name:" for the full list. How should these be handled? I think babel used to have some support for it, but it might have been recently removed.

It's been mentioned, but I'm not sure if it's been addressed yet.

Question: Is there a reason "roundabouts" and "bus stops" are not transliterated (e.g. in Israel)?

No labels are transliterated. Names from different languages are used.

In general, if you're not sure why a particular label is what it is, the first step to figuring out why is to find the object on openstreetmap.org and look at the value of the name and name:* tags.

Not sure if this was discussed before. Many labels in OSM are encoded with an extra param, e.g. Latn (75k), _rm (250k+), _kana, _pinyin, hira, and others. Search for "name:" for the full list. How should these be handled? I think babel used to have some support for it, but it might have been recently removed.

We've produced this list, and a lot of those labels are pretty non-standard (to say the least), which makes them very difficult to address without making assumptions about the language used. We could try to get those conversions going again, but I would be mildly happier if we considered, perhaps, doing it as a cleanup script after pulling the data from OSM.

We could just state that any "Latn" label is converted to "en" (but only if no other "en" label exists), which would make it the automatic fallback immediately after the requested language anyway. That would work, but it doesn't work well with a lot of the other non-standard labels, and it definitely won't work with all of them.

I'm not sure I understand why OSM allows arbitrary labels as it is, but we need to look into how to fix this properly, if it falls within our scope at all.
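
For reference, the "Latn becomes en unless en already exists" rule could be sketched as a small cleanup step like the one below. The function name and the decision to match only keys ending in "-Latn" are assumptions for the example; it deliberately ignores _rm, _kana and the other suffixes, which is exactly the limitation mentioned above.

```python
from typing import Dict


def promote_latn_to_en(tags: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the tags where a name:xx-Latn value fills a missing name:en.

    An existing name:en is never overwritten, and the input dict is not modified.
    """
    cleaned = dict(tags)
    if "name:en" in cleaned:
        return cleaned
    # Sort the keys so the choice is deterministic when several -Latn tags exist.
    for key in sorted(tags):
        if key.startswith("name:") and key.endswith("-Latn"):
            cleaned["name:en"] = tags[key]
            break
    return cleaned
```

Running this once after import would keep the rendering-time fallback chain unchanged, since the promoted value is then found by the normal English step.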

Getting OSM to even discuss standardization is very hard - there are always "territorial" people who say that they prefer things as is for their neck of the woods, even for small things like these (I had a fair share of fights on this one). I do agree that the system should be flexible to cover all use cases, but I also think all communities should work together to harmonize the data in order to increase its value, and to allow data consumers to actually make sense of it.

That said, the list of these things is not that big - there are several "semi-standard" suffixes, e.g. "-Latn", "_rm", and a few more that cover 99%. I think they should be part of the Babel processing - e.g. given "zh" and "ja-Latn", pick "ja-Latn" when rendering "fr" because French is in the "Latn" group, thus preferring Latin script over "zh". Obviously the question then is whether "name" should be used at all, and for that I think the answer is to analyze the name itself and check whether all of its characters are in the Latin range.

A more complex handling might involve feature location - e.g. create a map of "default" languages, and check against it, but that's much further out.
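
A minimal sketch of the script-aware idea, assuming a small hand-written language-to-script table (the real grouping would have to come from somewhere authoritative) and treating "-Latn" and "_rm" as Latin-script markers. None of these names exist in babel; they are only for illustration.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical default script per language code.
DEFAULT_SCRIPT: Dict[str, str] = {"fr": "Latn", "en": "Latn", "zh": "Hans", "ja": "Jpan", "ru": "Cyrl"}


def tag_script(code: str) -> Optional[str]:
    """Script of a name:<code> tag; '-Latn' and '_rm' suffixes mean Latin script."""
    if code.endswith("-Latn") or code.endswith("_rm"):
        return "Latn"
    return DEFAULT_SCRIPT.get(code)


def is_latin(text: str) -> bool:
    """True if the text has letters and every letter is a Latin-script character."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters)


def pick_by_script(tags: Dict[str, str], lang: str) -> Optional[str]:
    """Pick a label written in the requested language's script, if there is one."""
    wanted = DEFAULT_SCRIPT.get(lang)
    if wanted is None:
        return None
    for key, value in tags.items():
        if key.startswith("name:") and tag_script(key[len("name:"):]) == wanted:
            return value
    # Use the plain name only when it is itself written in Latin script.
    name = tags.get("name")
    if name and wanted == "Latn" and is_latin(name):
        return name
    return None
```

Given tags with "name:zh" and "name:ja-Latn", a request for "fr" would pick the "ja-Latn" value here. This would presumably run only after the normal language fallback chain has found nothing.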

Ouch. This is pretty bad.

I suspect that it will be easy to fix if the fonts for the relevant languages are installed on the server that renders the maps.

I tested it with a few more languages that often fail. These worked well:

  • Arabic
  • Amharic
  • Armenian
  • Bengali
  • Burmese
  • Chinese
  • Georgian
  • Gujarati
  • Greek
  • Hebrew
  • Hindi
  • Japanese
  • Kannada
  • Khmer
  • Korean
  • Lao
  • Odia
  • Russian
  • Sinhalese
  • Tamil
  • Thai
  • Tibetan
  • Urdu

In a few of them there was unnecessary word breaking, but otherwise it was OK: actual letters were displayed and not "tofu" boxes. I was surprised that some of these work well, because they fail very often, e.g. Burmese, Lao, or Tibetan. So there's hope this can be resolved easily.

These didn't work:

  • Aramaic (arc)
  • Cherokee (chr) (Even though it's a very small language, "Switzerland" is translated! And we do have a Cherokee Wikipedia.)
  • Divehi (dv)
  • Malayalam
  • Telugu

The first thing I suggest doing is to check which fonts are installed on the rendering server. Aramaic, Cherokee, and Divehi are indeed rare, and aren't always installed by default, but Malayalam and Telugu are fairly common, so I'm surprised that they don't work.

If fonts for these languages aren't installed, and can be installed, then we should do it and test.

I was able to get it to render Telugu by adding the Noto Sans Telugu ttf files to osm-bright-fonts/fonts/ and registering them in osm-bright.tm2/project.xml.

Perhaps @MaxSem can advise about how to add fonts properly in these repos.

name:ko and name:ko_rm exist because they're different, and are often found on the same object. I've seen street signs with both name:ja and name:ja_rm on them.

I know with some languages, to get acceptable rendering the latest version of the fonts is needed along with recent versions of fontconfig and harfbuzz.

@Amire80, thanks for checking these. It sounds like there is an issue with fonts we need to go into fully. Does this need to be a new subtask? If so, would you please write it up, since you have a good handle on the issues. Please tag it with "Collaboration-Feature-Rollouts (Collaboration-Maps)" and "Collaboration-Team-Triage (Collab-Team-This-Quarter)" Thanks!

Fixed, but will need to be pushed to beta and production in the next cycle. If it's easier to follow up in a separate task, we can open one, or continue following it here as part of the bigger language effort.

Change 423721 merged by Gehel:
[operations/puppet@production] maps: remove sources.yaml

https://gerrit.wikimedia.org/r/423721

Change 423722 merged by Gehel:
[operations/puppet@production] maps: cleanup of sources.yaml code

https://gerrit.wikimedia.org/r/423722

@Amire80
Aramaic (arc)
Cherokee (chr) (Even though it's a very small language, "Switzerland" is translated! And we do have a Cherokee Wikipedia.)
Divehi (dv)
Malayalam
Telugu

Only Aramaic (arc) still does not work. The others do work (not in beta labs; it has not been updated yet):
https://maps.wikimedia.org/?s=osm-intl-i18n&lang=chr#8/50.535/4.662
https://maps.wikimedia.org/?s=osm-intl-i18n&lang=chr#8/50.535/4.662
https://maps.wikimedia.org/?s=osm-intl-i18n&lang=ml#8/50.535/4.662
https://maps.wikimedia.org/?s=osm-intl-i18n&lang=te#8/50.535/4.662

The first thing I suggest doing is to check which fonts are installed on the rendering server. Aramaic, Cherokee, and Divehi are indeed rare, and aren't always installed by default, but Malayalam and Telugu are fairly common, so I'm surprised that they don't work.

If fonts for these languages aren't installed, and can be installed, then we should do it and test.

If configuring fonts for these languages works, please use any fonts available in Debian for Malayalam. I recommend 'Manjari' (Disclaimer: I designed it)

The way Mapnik works, it's much easier to create a repository of fonts and use it - which is what we're doing. Note that it's impossible to just point it at a directory with fonts - they have to be listed in the style's XML file. As for particular fonts, we standardised on Google's Noto fonts, which provide consistent styling across different alphabets.

Aramaic is probably my fault; there are several options for Aramaic, I went with Imperial Aramaic, it might not be correct?

The options are:

  • Aramaic (Imperial Aramaic script)
  • Noto Sans Imperial Aramaic (this is what I chose)
  • Samaritan Aramaic (Hebrew script)
  • Samaritan Aramaic (Samaritan script)

@Amire80 @santhosh did I pick incorrectly?

Also, @santhosh, from my rudimentary testing, the Google Noto font for Malayalam works well; we're using Noto fonts as standard, so I took what it has. We'll release soon so it can be fully tested, and at that point, if you think it's not good, we can try and look into changing it away from the "standard" Noto font.

My normal guideline is to use OpenStreetMap Carto's font list for ordering, because they've got more resources for appropriate font selection, but they don't have Aramaic. I filed an issue there.

We should not use any fonts other than Noto with our styles, unless Noto does not have coverage, e.g. some CJK characters outside the BMP.

Change 431612 had a related patch set uploaded (by Catrope; owner: Catrope):
[operations/mediawiki-config@master] Enable $wgKartographerUsePageLanguage everywhere in beta

https://gerrit.wikimedia.org/r/431612

Change 431612 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable $wgKartographerUsePageLanguage everywhere in beta

https://gerrit.wikimedia.org/r/431612

Checked in beta labs and in wmf.2 (for the lang attribute):
(1) lang works according to the description on Map improvements 2018 (the description needs to be updated):

lang="xx" Shows map labels in the language you specify.
lang="local" Shows map labels in the languages of the territory mapped (essentially opting out of internationalization).

Without the 'lang' attribute, maps will be displayed in the wiki language.

(2) Additional important tasks were resolved:

(3) I'll do follow-ups for

  • Aramaic (arc) support
  • reviewing related open tasks

QA Recommendation: Resolve

I wonder if you could add an option for a blank map? It could be useful for geolocation maps: https://en.wikipedia.org/wiki/Template:OSM_Location_map

@Ghybu I believe it's already available - https://maps.wikimedia.org/?s=osm#4/40.75/-73.96 (style = "osm")

Thanks but I do not see concretely how to use it with <mapframe> to have a blank map.
For example to have the same symbol as here but with a blank map (style = "osm" does not seem to work). The idea is to get closer to the model of the maps "Template:Location map".

I take this opportunity to ask another question:
<maplink> and <mapframe> display coordinates in English (N-S-E-W). I know we can translate them locally, but don't you think it would be better to translate them on translatewiki.net with the other translations of "Kartographer"? That would make it easier to copy code from one wiki to another, especially for small wikis.

Note: Another thing: it seems to me that the names of rivers, mountain peaks, and stretches of water are not displayed. Is that intended?

@Ghuron use nolabels=1 parameter.

style = "osm"

nolabels=1

I'm learning new things here! @Yurik can you help refresh my memory if we have these parameters documented anywhere?

I learned new things... this indicates a documentation gap...
checked, that's because it is indeed not documented :)

https://www.mediawiki.org/wiki/Help:Extension:Kartographer

@Ghuron use nolabels=1 parameter.

Is this working in Kartographer's <mapframe> or in the maps.wikimedia.org params? I can't find nolabels in the Kartographer codebase, but I might be missing something.
We should consider implementing it as a mapframe parameter, if needed.

I learned new things... this indicates a documentation gap...
checked, that's because it is indeed not documented :)

https://www.mediawiki.org/wiki/Help:Extension:Kartographer

That would be because it doesn't exist. The string "nolabels" does not appear anywhere in any of the maps-related code bases.