San José label has encoding errors
Closed, Resolved · Public

Description

The mis-encoded label appears at zoom 7, and a "San José" label appears at zoom 8.

The zoom behaviour and the encoding error tell me that

  • there is a Natural Earth encoding error, and
  • the city is failing to match against the OSM object.

The OSM node has official_name=San José de Mayo.

The current code considers name and name:en when matching.
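
To illustrate why matching only on name and name:en could miss this city, here is a minimal Python sketch. It is not the actual matching code: the function `matches`, the tag dictionary, and the assumption that the node's name tag is "San José" are hypothetical. Only the official_name value is taken from the node above.

```python
def matches(ne_name, osm_tags, keys=("name", "name:en")):
    """Return True if the Natural Earth name equals any of the listed OSM tags."""
    return any(osm_tags.get(key) == ne_name for key in keys)


# Hypothetical tag set: official_name comes from the task, name is an assumption.
osm_tags = {"name": "San José", "official_name": "San José de Mayo"}

garbled = "San JosÃ© de Mayo"  # name as currently stored in ne_places
clean = "San José de Mayo"     # the same name once the encoding is repaired

print(matches(garbled, osm_tags))  # False
print(matches(clean, osm_tags))    # False: name and name:en still do not match
print(matches(clean, osm_tags, keys=("name", "name:en", "official_name")))  # True
```

Under these assumptions, repairing the encoding alone does not produce a match; the comparison would also need to look at a tag such as official_name.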

Event Timeline

Pnorman created this task. Apr 23 2018, 8:47 PM
Restricted Application added a subscriber: Aklapper. Apr 23 2018, 8:47 PM
Catrope added a subscriber: Catrope. May 2 2018, 7:25 PM (edited)

This looks like a UTF-8 string is being (mis)interpreted as ISO-8859-1, which commonly happens on the web when the Content-Type HTTP header is missing. For historical reasons, that header defaults to iso-8859-1, even though almost everything is utf-8 now. Another context in which this can happen is database fields, which often have similarly unhelpful encoding defaults.
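
The kind of misreading described here is easy to reproduce. A minimal Python sketch, for illustration only (it is not part of the map stack, and the variable names are arbitrary):

```python
original = "José"

# Encode as UTF-8: "é" becomes the two bytes c3 a9.
utf8_bytes = original.encode("utf-8")

# Misread those bytes as ISO-8859-1, where every byte is one character.
misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex())  # 4a6f73c3a9
print(misread)           # JosÃ©
```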

Yes, it's an encoding error, but not one related to HTTP.

Looking at the database, this is geonameid=3440639 from ne_places, and it displays the same way in my terminal, which is set up appropriately for Unicode.

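The name is \x53616e204a6f73c383c2a9206465204d61796f, which, stripping the plain-ASCII prefix "San Jos" and suffix " de Mayo", leaves c3 83 c2 a9. That is the UTF-8 encoding of "Ã©" rather than of "é" (c3 a9), so the string has been encoded to UTF-8 twice.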
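
The stored bytes can be checked, and the damage reversed, with a short Python sketch. This is only an illustration of the diagnosis, not the fix that was applied. It assumes the extra layer came from reading UTF-8 bytes as WINDOWS-1252 (cp1252 in Python); the reverse step can fail for other strings, because some byte values have no cp1252 character.

```python
# Hex value of the name column, without the leading \x.
raw = bytes.fromhex("53616e204a6f73c383c2a9206465204d61796f")

# The database holds valid UTF-8, but of the wrong characters.
stored = raw.decode("utf-8")
print(stored)  # San JosÃ© de Mayo

# Undo the extra layer: back to cp1252 bytes, then decode as UTF-8.
repaired = stored.encode("cp1252").decode("utf-8")
print(repaired)  # San José de Mayo
```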

ogr2ogr -f PostgreSQL -lco GEOMETRY_NAME=way -lco SPATIAL_INDEX=FALSE \
  -lco EXTRACT_SCHEMA_FROM_LAYER_NAME=YES -nln loading.ne_places \
  --config SHAPE_ENCODING WINDOWS-1252 -explodecollections -t_srs EPSG:3857 \
  -clipsrc -179.999999999 -85.05112877980659 179.999999999 85.05112877980659 \
  PG:dbname=ct data/ne_places/ne_10m_populated_places_simple.shp

Running this by hand warns

Warning 1: One or several characters couldn't be converted correctly from WINDOWS-1252 to UTF-8.
This warning will not be emitted anymore

Natural Earth used to be in WINDOWS-1252, but this seems to have changed. Fixed in https://github.com/kartotherian/meddo/pull/12 and by pulling new external data. This won't fix the failure to match the label, which has implications for multilingual names.

Pnorman closed this task as Resolved. Jun 26 2018, 4:57 PM
Pnorman claimed this task.

Fixed.

Vvjjkkii renamed this task from San José label has encoding errors to deeaaaaaaa. Jul 1 2018, 1:14 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii triaged this task as High priority.
Vvjjkkii removed Pnorman as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
AntiCompositeNumber renamed this task from deeaaaaaaa to San José label has encoding errors. Jul 1 2018, 4:00 AM
AntiCompositeNumber closed this task as Resolved.
AntiCompositeNumber raised the priority of this task from High to Needs Triage.
AntiCompositeNumber assigned this task to Pnorman.
AntiCompositeNumber updated the task description.
AntiCompositeNumber added a subscriber: Aklapper.