Maniphest T192662

name:<local name code> is not always available in OSM
Closed, Resolved, Public

Description

tl;dr In many cases, looking at a map with lang=<local lang> shows fewer labels (and more fallbacks, mostly English) than looking at the same map without the i18n feature.

Example
Localized map with lang=zh: https://maps.wikimedia.org/?s=osm-intl-i18n&lang=zh#13/39.9122/116.3925
Original map of the same area: https://maps.wikimedia.org/#13/39.9122/116.3925

The guideline from OSM seems to be to provide both name=<local name> AND name:<local lang code>=<local name>, but the latter is not always provided, so we end up falling back on many imperfect options instead of showing the local name.

Since we don't have information about the language of the name attribute, I don't know how we can solve this. Maybe we can just accept that the data is not good enough, but it feels like a regression when looking at a map with lang=<local lang>.
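
To make the problem concrete, here is a minimal sketch of a name lookup with language fallbacks. The function name `pick_label`, the fallback list, and the sample tags are assumptions for illustration only; this is not the actual maps code. It shows how an object that carries its local name only in `name` ends up with a fallback label when a fallback language is tried first.

```python
from typing import Mapping, Optional, Sequence


def pick_label(tags: Mapping[str, str], lang: str,
               fallbacks: Sequence[str] = ()) -> Optional[str]:
    """Return the label to draw for one OSM object, or None if it is unnamed."""
    # Try the requested language first, then each fallback language in order.
    for code in (lang, *fallbacks):
        value = tags.get(f"name:{code}")
        if value:
            return value
    # Last resort: the plain local name, whose language is not recorded.
    return tags.get("name")


# Hypothetical object: the local name is in `name`, but there is no `name:zh`.
place = {"name": "北京市", "name:en": "Beijing"}

# With an English fallback, the English name wins over the local name.
print(pick_label(place, "zh", fallbacks=["en"]))
# Without fallback languages, the lookup drops straight to `name`.
print(pick_label(place, "zh"))
```

Knowing that `name` is written in the requested language would let the lookup prefer it over the fallback languages, which is the missing piece described above.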

Related Objects

Status	Subtype	Assigned	Task
Resolved		None	T43307 Alow comparison with old revisions while editing an old version of a page
Resolved		TheDJ	T70008 PDF-related improvements needed at Wikivoyage, especially for dynamic maps
Declined	Feature	None	T64572 Media Viewer and location map overlays
Resolved		Etonkovidova	T52714 VisualEditor: Location map template does not display correctly
Declined		None	T32702 PDF export extension problem with <div style="position:> and any location templates
Declined		None	T92535 Exclude {{location map}} images from PageImages
Duplicate		None	T169507 Fix vector map visualisations once the user clicks them on an article (red dot and label missing)
Resolved		TheDJ	T120809 Generate location maps dynamically, producing actual images
Resolved		SBisson	T112948 All map location names should be in the user's language
Resolved		Mooeypoo	T192701 Create an optimized language-fallback system for Maps internationalization based on investigations
Resolved		SBisson	T192662 name:<local name code> is not always available in OSM

Event Timeline

SBisson created this task. Apr 20 2018, 3:32 PM

Restricted Application added a project: Discovery-ARCHIVED. Apr 20 2018, 3:32 PM

Restricted Application added a subscriber: Aklapper.

I think you meant name=<local name>, not name:<local name>. "name" should be the last fallback if other options are not available. "int_name" should also be considered (it's the most common tag besides the name itself -- over 400,000). As for telling the language of the "name", I proposed that OSM have regions with a primary language code; this is currently being discussed. Please participate.

SBisson updated the task description. Apr 20 2018, 4:27 PM

In T192662#4146229, @Yurik wrote:

I think you meant name=<local name>, not name:<local name>.

Yes, that's what I meant. Updated.

"name" should be the last fallback if other options are not available.

For the cases I'm describing here, "name" is by far the best fallback, but we can't tell without knowing the lang of "name."

"int_name" should also be considered (it's the most common tag besides the name itself -- over 400,000).

I don't see how "int_name" can be used if we don't know which lang it is written in.

As for telling the language of the "name", I proposed that OSM have regions with a primary language code; this is currently being discussed. Please participate.

Is there a wiki page for such proposals? There was a proposal about having name:language=<language used in the name tag>, but it was rejected. Even if it had been accepted, the data wouldn't appear overnight, but I thought it was a decent solution.

I looked at some numbers. For China*, there are 29332 named cities or towns in OSM. 7508 have no name:* tags, and 16288 have a name:zh. The labels for these would not appear in a different language for a Chinese map of China. This leaves 5536 which have name, a name:*, but not name:zh. If I restrict this to just cities, the proportion which have name:* but not name:zh decreases.

* Specifically, Geofabrik's extract for China, which covers a slightly bigger area.

SQL to generate the results from an osm2pgsql database is below:

-- Count named cities and towns, split by which name tags they carry.
-- extract_names() is not defined in this comment; it appears to return the
-- name:* tags keyed by language code.
SELECT
    *,
    "Total named cities and towns" - "Without any names but name=*" - "Has Chinese Han name specified" AS "Would switch language"
  FROM (SELECT
    COUNT(*) AS "Total named cities and towns",
    COUNT(*) FILTER (WHERE extract_names(tags) IS NULL) AS "Without any names but name=*",
    COUNT(*) FILTER (WHERE extract_names(tags) ? 'zh') AS "Has Chinese Han name specified"
  FROM planet_osm_point
  WHERE
    name IS NOT NULL
    AND place IN ('city', 'town')) _;
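
For readers without an osm2pgsql database, the same three-way split can be sketched on plain tag dictionaries. This is only an approximation of the query above: the function names are made up, the sample objects are invented, and matching the exact key `name:zh` is an assumption about what the SQL helper treats as a Chinese name.

```python
from collections import Counter
from typing import Iterable, Mapping


def classify(tags: Mapping[str, str], lang: str) -> str:
    """Place one named object into one of the three buckets used in the query."""
    localized = [key for key in tags if key.startswith("name:")]
    if not localized:
        return "name only"
    if f"name:{lang}" in tags:
        return "has target language"
    # Has some name:* tag, but not for the requested language.
    return "would switch language"


def summarize(objects: Iterable[Mapping[str, str]], lang: str) -> Counter:
    """Count buckets over all objects that have a plain `name` tag."""
    return Counter(classify(tags, lang) for tags in objects if "name" in tags)


# Invented sample data, one object per bucket plus one unnamed object.
sample = [
    {"name": "A"},
    {"name": "B", "name:zh": "B"},
    {"name": "C", "name:en": "C"},
    {"place": "town"},  # no `name`, so it is skipped like in the WHERE clause
]

print(summarize(sample, "zh"))
```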

@SBisson I think adding an extra tag to every single OSM object with the name tag is a bit excessive - there are 62 million of them. A much better solution IMO is to make it possible to calculate that language based on the geo position of the object. My suggestion was to create new "language meta-regions" carrying the language tags. Another solution is to add language tags to the existing regions. Both have pros and cons, but so far this has only been discussed in that mailing list thread (link above). If either of these solutions is implemented, it should be possible to generate a geo index for lookups - something that can be done during vtile data generation.
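
As a rough illustration of the position-based idea, the sketch below looks up a primary language from a list of regions. Everything in it is hypothetical: the `LanguageRegion` type, the crude bounding boxes, and the language assignments are made up, and a real implementation would use proper region polygons and a spatial index rather than a linear scan over rectangles.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LanguageRegion:
    """A rectangular stand-in for a region tagged with a primary language."""
    lang: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def language_at(lon: float, lat: float,
                regions: Sequence[LanguageRegion]) -> Optional[str]:
    """Return the language of the first region containing the point, if any."""
    for region in regions:
        if region.contains(lon, lat):
            return region.lang
    return None


# Made-up boxes, only roughly covering the areas discussed in this task.
REGIONS = [
    LanguageRegion("zh", 73.0, 18.0, 135.0, 54.0),
    LanguageRegion("en", -125.0, 24.0, -66.0, 50.0),
]

# The map centre from the example links in the description (lat 39.9122, lon 116.3925).
print(language_at(116.3925, 39.9122, REGIONS))
```

With such a lookup, an object whose region language matches the requested lang could have its plain `name` treated as if it were name:<lang>.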

In T192662#4147291, @Pnorman wrote:

I looked at some numbers. For China*, there are 29332 named cities or towns in OSM. 7508 have no name:* tags, and 16288 have a name:zh. The labels for these would not appear in a different language for a Chinese map of China. This leaves 5536 which have name, a name:*, but not name:zh. If I restrict this to just cities, the proportion which have name:* but not name:zh decreases.

Thanks Paul. In Stephane's examples above (internationalized vs. current), it looks like what's missing here is not the city name but many street and district names. Is there a way to sample data for those?

Also, can you show us the USA as well?

Thanks!

• Mholloway subscribed. Apr 23 2018, 6:11 PM

For Chinese highways

│ Total named highways           │ 412114 │
│ Without any names but name=*   │ 247410 │
│ Has Chinese Han name specified │ 50415  │
│ Would switch language          │ 114289 │

So about 19% of city and town labels (5536 of 29332) and 28% of road labels (114289 of 412114) would switch languages. I'll need to load the US data, so I can post that this evening.
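
The percentages follow directly from the counts posted above. A small sketch of the arithmetic, using only the China figures from this task (the helper name `switch_share` is made up):

```python
from typing import Tuple


def switch_share(total: int, name_only: int, has_target: int) -> Tuple[int, float]:
    """Return (count, percent) of labels that would switch language."""
    would_switch = total - name_only - has_target
    return would_switch, 100.0 * would_switch / total


# (total named, without any names but name=*, has name:zh), as posted in this task.
china = {
    "cities and towns": (29332, 7508, 16288),
    "highways": (412114, 247410, 50415),
}

for label, counts in china.items():
    count, percent = switch_share(*counts)
    print(f"{label}: {count} would switch ({percent:.1f}%)")
```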

• jmatazzoni added a subscriber: Amire80. Apr 23 2018, 7:07 PM

Pnorman added a parent task: T112948: All map location names should be in the user's language. Apr 23 2018, 9:52 PM

For the Geofabrik US West region

│ Total named highways             │ 2223811 │
│ Without any names but name=*     │ 2215127 │
│ Has English latin name specified │ 270     │
│ Would switch language            │ 8414    │

│ Total named cities and towns     │ 1236 │
│ Without any names but name=*     │ 1025 │
│ Has English latin name specified │ 63   │
│ Would switch language            │ 148  │

sommerluk subscribed. Apr 24 2018, 8:53 AM

Sent https://lists.openstreetmap.org/pipermail/talk/2018-April/080560.html

Restricted Application added a project: Collaboration-Team-Triage. Apr 24 2018, 6:14 PM

• jmatazzoni added a parent task: T192701: Create an optimized language-fallback system for Maps internationalization based on investigations. Apr 24 2018, 7:06 PM

Mooeypoo mentioned this in T192701: Create an optimized language-fallback system for Maps internationalization based on investigations. Apr 24 2018, 8:39 PM

Nikerabbit subscribed. Apr 27 2018, 7:17 AM

Joe asked for my comments here, although I'm a bit lost, because I'm really not an OSM expert. The big thing that I fail to understand is how can it happen that some labels are not shown at all. In practice, this is indeed a regression, but isn't there supposed to be some kind of a fallback?

The big thing that I fail to understand is how can it happen that some labels are not shown at all. In practice, this is indeed a regression, but isn't there supposed to be some kind of a fallback?

With the i18n changes we'd never fail to show a label that is currently shown. Different labels may show because labels are different sizes. For example, a Chinese label will generally take less space than an English one, so fewer English labels can be shown.
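
The effect of label width can be shown with a deliberately simplified one-dimensional collision model. This is not how the map renderer places labels; the function `place_labels`, the width units (two units per CJK glyph, one per Latin letter), the gap, and the anchor positions are all assumptions chosen to illustrate the point.

```python
from typing import List, Sequence, Tuple


def place_labels(labels: Sequence[Tuple[float, str]], char_width: float,
                 gap: float = 1.0) -> List[str]:
    """Greedily place labels along a line, skipping any that would overlap.

    `labels` holds (anchor_x, text) pairs in priority order; each label is
    centred on its anchor and is len(text) * char_width units wide.
    """
    placed: List[Tuple[float, float]] = []
    shown: List[str] = []
    for x, text in labels:
        half = len(text) * char_width / 2
        start, end = x - half, x + half
        # Keep the label only if it clears every already placed label by `gap`.
        if all(end + gap <= s or start - gap >= e for s, e in placed):
            placed.append((start, end))
            shown.append(text)
    return shown


# Same three anchors, once with Chinese names and once with English ones.
zh_labels = [(0.0, "东城"), (6.0, "西城"), (12.0, "朝阳")]
en_labels = [(0.0, "Dongcheng"), (6.0, "Xicheng"), (12.0, "Chaoyang")]

print(place_labels(zh_labels, char_width=2.0))
print(place_labels(en_labels, char_width=1.0))
```

In this toy setup all three short Chinese labels fit, while the wider English label in the middle collides with its neighbour and is dropped.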

In T192662#4163288, @Amire80 wrote:

Joe asked for my comments here, although I'm a bit lost, because I'm really not an OSM expert. The big thing that I fail to understand is how can it happen that some labels are not shown at all. In practice, this is indeed a regression, but isn't there supposed to be some kind of a fallback?

There's a series of fallbacks, explained thoroughly here: T192701: Create an optimized language-fallback system for Maps internationalization based on investigations

In T192662#4163346, @Pnorman wrote:

The big thing that I fail to understand is how can it happen that some labels are not shown at all. In practice, this is indeed a regression, but isn't there supposed to be some kind of a fallback?

With the i18n changes we'd never fail to show a label that is currently shown. Different labels may show because labels are different sizes. For example, a Chinese label will generally take less space than an English one, so fewer English labels can be shown.

The only time a label won't be shown is if it doesn't, at all, have a local value (name=) which, as I understand it, doesn't happen.

This isn't really a regression; the previous code would supposedly show a random language instead if there were no fallbacks at all and no local language, which is really not a very good result.

As I understand, there should never be labels that have no local values, though. Is that right, @Pnorman ?

• jmatazzoni edited projects, added Collaboration-Team-Triage (Collab-Team-This-Quarter); removed Collaboration-Team-Triage. May 3 2018, 1:43 AM

• jmatazzoni moved this task from Untriaged to In Development on the Collaboration-Team-Triage (Collab-Team-This-Quarter) board.

• jmatazzoni mentioned this in Collaboration-Team-Triage (Collab-Team-This-Quarter).

This task contains some interesting discussion but is not actionable. It was superseded by T192701: Create an optimized language-fallback system for Maps internationalization based on investigations.
