Kartographer map labels for place names with ZWNJ character (U+200C) are rendered as white rectangular boxes
Open, Needs Triage, Public, BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

  • It is rendered partly with rectangular boxes as

హనుమా▯▯▯▯▯
జంక్ష▯▯▯▯▯

What should have happened instead?:

  • It should have been rendered properly, like most other Telugu labels, as

హనుమాన్ జంక్షన్

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
Firefox 144.0 (64bit), Mozilla Firefox snap for Ubuntu canonical-002 1.0 on Ubuntu 22.04.5 LTS
By zooming and panning the map, the problem can be seen for a few other labels, such as వేలేరు (rendered properly at high zoom but incorrectly at lower zoom, with the last character represented by rectangular boxes) and భూదాన్ పోచంపల్లి. Though the problem seems to affect very few places (maybe about 10 out of 16000 in Andhra Pradesh), it may lead users to perceive Wikimedia maps as being of poor quality.

Screenshot from 2025-10-31 07-00-40.png (791×957 px, 654 KB)

Event Timeline

Arjunaraoc renamed this task from Kartographer map labels for some place names in Telugu language are partly rendered with rectangular boxes to Kartographer map labels for a few place names in Telugu language are partly rendered with rectangular boxes. Oct 31 2025, 1:48 AM

I learnt from my study that the ZWNJ character (U+200C) is being rendered as white rectangular boxes followed by a newline. I fixed names like హనుమాన్ జంక్షన్ and భూదాన్ పోచంపల్లి in OSM today, as U+200C is not required when the following character is a space. But a place name like Jagdevpur (జగ్దేవ్ పూర్, written without a space in between as జగ్దేవ్‌పూర్) could not be fixed this way.
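A minimal sketch of the cleanup described above, assuming the rule is simply "drop U+200C when it is directly followed by whitespace or ends the name". The helper `drop_redundant_zwnj` is hypothetical and not part of any OSM or Kartographer tooling, and the sample strings are Latin placeholders rather than real place names.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical helper for illustration only.
def drop_redundant_zwnj(name: str) -> str:
    """Remove a ZWNJ only where whitespace or the end of the string follows it."""
    # The lookahead keeps the whitespace itself; only the ZWNJ is deleted.
    return re.sub(ZWNJ + r"(?=\s|$)", "", name)

# Placeholder inputs: a ZWNJ before a space is redundant and is removed,
# while a ZWNJ inside a word (the Jagdevpur-style case) is left alone.
before_space = "first" + ZWNJ + " second"
inside_word = "first" + ZWNJ + "second"
print(drop_redundant_zwnj(before_space) == "first second")
print(drop_redundant_zwnj(inside_word) == inside_word)
```

A name whose ZWNJ sits inside a word still needs the renderer itself to handle U+200C correctly, which is why this cleanup only works around part of the problem.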

Arjunaraoc renamed this task from Kartographer map labels for a few place names in Telugu language are partly rendered with rectangular boxes to Kartographer map labels for place names with ZWNJ character U+200C are rendered as white rectangular boxes. Oct 31 2025, 5:31 AM
Arjunaraoc renamed this task from Kartographer map labels for place names with ZWNJ character U+200C are rendered as white rectangular boxes to Kartographer map labels for place names with ZWNJ character (U+200C) are rendered as white rectangular boxes.

Screenshot from 2025-11-01 07-36-36.png (794×961 px, 688 KB)
Updated Wikimedia map for the screenshot posted in the task after changes in OSM

I can replicate this problem by zooming in and out.

One question might be which fonts are available and how they support/render those bytes. That is a riddle per T228591: Document how to request installing additional fonts for SVG thumbnails and generated PDF files on Wikimedia servers (which could also cover Maps/Kartographer, I guess).

Kartographer and SVG thumbnails are handled by two completely different software stacks.

I tried to look at the code to understand how one would go about fixing this and got hopelessly lost in a maze of abandoned repositories without finding anything.

TheDJ subscribed.

> Kartographer and SVG thumbnails are handled by two completely different software stacks.

While technically true, the services should be sharing the same extended font set. Although perhaps this got lost in the recent conversion to Kubernetes...

>> Kartographer and SVG thumbnails are handled by two completely different software stacks.
>
> While technically true, the services should be sharing the same extended font set. Although perhaps this got lost in the recent conversion to Kubernetes...

The maps servers which predated the move to Kubernetes had the following fonts installed:

ii  fonts-dejavu                         2.37-1                        all          metapackage to pull in fonts-dejavu-core and fonts-dejavu-extra
ii  fonts-dejavu-core                    2.37-1                        all          Vera font family derivate with additional characters
ii  fonts-dejavu-extra                   2.37-1                        all          Vera font family derivate with additional characters (extra variants)
ii  fonts-noto                           20181227-1                    all          metapackage to pull in all Noto fonts
ii  fonts-noto-cjk                       1:20170601+repack1-3+deb10u1  all          "No Tofu" font families with large Unicode coverage (CJK regular and bold)
ii  fonts-noto-core                      20181227-1                    all          "No Tofu" font families with large Unicode coverage (core)
ii  fonts-noto-unhinted                  20181227-1                    all          "No Tofu" font families with large Unicode coverage (unhinted)

The current container image has the same fonts installed (in more recent package versions, but those fonts should not have changed materially between Debian 10 and 12):

ii  fonts-dejavu                2.37-6                         all          metapackage to pull in fonts-dejavu-core and fonts-dejavu-extra
ii  fonts-dejavu-core           2.37-6                         all          Vera font family derivate with additional characters
ii  fonts-dejavu-extra          2.37-6                         all          Vera font family derivate with additional characters (extra variants)
ii  fonts-noto                  20201225-1                     all          metapackage to pull in all Noto fonts
ii  fonts-noto-cjk              1:20220127+repack1-1           all          "No Tofu" font families with large Unicode coverage (CJK regular and bold)
ii  fonts-noto-core             20201225-1                     all          "No Tofu" font families with large Unicode coverage (core)
ii  fonts-noto-unhinted         20201225-1                     all          "No Tofu" font families with large Unicode coverage (unhinted)

But syncing the list of packages between Thumbor and Kartotherian sounds like an idea worth considering.

Change #1201020 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[mediawiki/services/kartotherian@master] Sync list of fonts to what is also used on the Thumbor images

https://gerrit.wikimedia.org/r/1201020

This also happens with the zero-width joiner (ZWJ, U+200D), which you can see in the label for Sri Lanka (map):

Bildschirmfoto_2025-11-03_00-05-28.png (247×249 px, 29 KB)

This label is "ශ්‍රී ලංකාව இலங்கை" (OSM). The first line should be the first word in Sinhala (containing a ZWJ), the second line is the second word also in Sinhala (no ZWJ). The third line is Tamil.

Another example is Mongolian script (map):

Bildschirmfoto_2025-11-03_00-14-17.png (174×211 px, 37 KB)

This label is "ᠠ‍ᠨ᠋‍ᡍ᠋‍ᠠ‍ᠷ‍ᠠ" (OSM), but only the last character is displayed. Everything before the last ZWJ is missing.

As well as Telugu, Sinhala and Mongolian, I also found issues displaying Devanagari and Malayalam. You can see examples for those at https://test.wikidata.org/w/index.php?oldid=745896#Rendering_issues

> One question might be which fonts are available and how they support/render those bytes. That is a riddle per T228591: Document how to request installing additional fonts for SVG thumbnails and generated PDF files on Wikimedia servers (which could also cover Maps/Kartographer, I guess).

This seems like a rendering issue, not a font issue, because it affects various scripts (which are almost certainly in separate fonts) which otherwise render fine.

It would be good to know if this is recent or not. Having confirmation about that could say a lot about where to start looking for the cause of this problem.

Possibly the zero-width joiner is being stripped from the string in a transform or storage layer?
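One way to check that hypothesis would be to compare the invisible format characters in a label before and after it passes through a given layer. The sketch below uses only Python's standard library; the function names are hypothetical, and the "after" string is simulated by deleting the joiner, not taken from any real Kartotherian or Postgres output.

```python
import unicodedata
from collections import Counter

def format_chars(text: str) -> list[str]:
    """List the format characters (Unicode category Cf) found in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}"
        for ch in text
        if unicodedata.category(ch) == "Cf"
    ]

def lost_format_chars(before: str, after: str) -> list[str]:
    """Return format characters present in `before` but missing from `after`."""
    missing = Counter(format_chars(before)) - Counter(format_chars(after))
    return sorted(missing.elements())

# First Sinhala word of the Sri Lanka label, written with escapes so the
# ZWJ (U+200D) is visible in the source.
sri = "\u0dc1\u0dca\u200d\u0dbb\u0dd3"
# Simulated output of a layer that strips the joiner (an assumption).
stripped = sri.replace("\u200d", "")

print(format_chars(sri))
print(lost_format_chars(sri, stripped))
```

If the joiner survives every layer up to the renderer, the list of lost characters stays empty and the problem is more likely in text shaping than in storage.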

Thanks a lot for the great reports! For testing purposes, I think we could use these two:

https://maps.wikimedia.org/img/osm-intl,8,24.602,97.965,350x350.png?lang=tdd
https://maps.wikimedia.org/img/osm-intl,8,-5.2,119.8,350x350.png?lang=mak

@Nikki let me know if it sounds reasonable or not to use the above. I'd basically need some fixed URLs to use when testing changes or fixes, to figure out if it worked or not.

I used the above URIs on kartotherian staging (no traffic, so we can easily check for errors etc.) and I didn't find anything in the logs that would indicate a clear rendering issue (nor in tegola's logs). Testing the fonts in https://gerrit.wikimedia.org/r/1201020 seems to be worth doing. We'll hopefully have a new staging environment with a separate Postgres datastore to test the change and verify whether it improves things.

The other naive question that I have is the following: if we pick the above examples and check OpenStreetMap, are those rendered correctly? I am asking since we periodically import OSM data into Postgres, so if there is a discrepancy between upstream data and ours it may point us in the right direction. Please let me know if this is not the right understanding, and be patient because I am very new to maps data :)

I worked extensively for Coordinate Me 2024 in May 2024. I never observed this issue during that time or in the early years of Wikimedia Maps.

> I tried to look at the code to understand how one would go about fixing this and got hopelessly lost in a maze of abandoned repositories without finding anything.

@Pppery In https://wikitech.wikimedia.org/wiki/Maps/v2#Components there is hopefully a concise and high level list of links to check, up-to-date with the current infrastructure status. Let me know if it helps or not :)

> It would be good to know if this is recent or not. Having confirmation about that could say a lot about where to start looking for the cause of this problem.

Looking at the Wayback Machine, the name for Sri Lanka was rendering properly on the 16th of February:

7.png (256×256 px, 10 KB)

(from https://web.archive.org/web/20250216203103/https://maps.wikimedia.org/osm-intl/4/11/7.png)

but was broken by the 8th of March:

15.png (256×256 px, 5 KB)

(from https://web.archive.org/web/20250308191411/https://maps.wikimedia.org/osm-intl/5/23/15.png)

It's still broken in all later copies of those two tiles and also this tile: https://web.archive.org/web/20250000000000*/https://maps.wikimedia.org/osm-intl/3/5/3.png
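For anyone wanting to locate the tile that contains a given label, the standard slippy-map (Web Mercator) formula can be sketched as follows. The URL pattern is the one visible in the archived links above; the coordinates are approximate values for Sri Lanka taken from an OSM link elsewhere in this task, and the function names are hypothetical.

```python
import math

def tile_for(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) slippy-map tile containing a WGS84 coordinate."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    # Web Mercator projection of the latitude, scaled to tile rows.
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile_url(zoom: int, x: int, y: int) -> str:
    """Build a tile URL following the osm-intl pattern seen in this task."""
    return f"https://maps.wikimedia.org/osm-intl/{zoom}/{x}/{y}.png"

# Approximate centre of Sri Lanka (assumed from an OSM link in this task).
lat, lon = 7.81758, 80.9366
for zoom in (3, 4, 5):
    print(tile_url(zoom, *tile_for(lat, lon, zoom)))
```

For these coordinates the function yields tiles 3/5/3, 4/11/7 and 5/23/15, which are the three tiles referenced in the Wayback Machine links.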

> Thanks a lot for the great reports! For testing purposes, I think we could use these two:
>
> https://maps.wikimedia.org/img/osm-intl,8,24.602,97.965,350x350.png?lang=tdd
> https://maps.wikimedia.org/img/osm-intl,8,-5.2,119.8,350x350.png?lang=mak
>
> @Nikki let me know if it sounds reasonable or not to use the above. I'd basically need some fixed URLs to use when testing changes or fixes, to figure out if it worked or not.

I get a 403 when clicking those links, but based on the URLs, no, that's a different issue. Those are ones where the fonts seem to be missing entirely (the "Missing fonts" section of the page I linked). The problem in this ticket is that it has a font, but some words don't render properly (the "Rendering issues" section of the page).

Tiles for the two examples I mentioned in my previous comment:

Tiles for the examples on the page I linked:

> The other naive question that I have is the following: if we pick the above examples and check OpenStreetMap, are those rendered correctly? I am asking since we periodically import OSM data into Postgres, so if there is a discrepancy between upstream data and ours it may point us in the right direction. Please let me know if this is not the right understanding, and be patient because I am very new to maps data :)

It's difficult to check because OSM mostly shows English in India and Sri Lanka, and doesn't have a way to show tiles in another language, but the Devanagari example, at least, does render properly in OSM: https://www.openstreetmap.org/node/3386117460 (tile). OSM Deutschland's map also renders the label for Sri Lanka properly: https://openstreetmap.de/karte/?zoom=6&lat=7.81758&lon=80.9366

@Nikki great investigation and datapoints, thanks a lot!

The Wayback Machine captures that you posted don't match the recent Postgres upgrade, but they do match the migration of Kartotherian to Kubernetes. Before that, Kartotherian ran on the same bare metal hosts as Postgres, using a very old mapnik version etc. It was a big upgrade, and despite all of our tests something may have changed without us realizing it.

Moritz is currently working on adding a proper Postgres instance dedicated to maps staging, so we'll be able to test and report back more easily after that. The first thing to test will be https://gerrit.wikimedia.org/r/c/mediawiki/services/kartotherian/+/1201020/, which, given what I wrote above, may be really relevant.

Looping in also @Jgiannelos as FYI :)

After a chat with Yiannis, the following commit was highlighted: https://github.com/wikimedia/osm-bright.tm2/commit/e5b7a05b692199238ce54eb8e2e879331256d7ae. We added it while transitioning to the new kartotherian/mapnik versions, but we lost the context on what it was meant to resolve. The fonts are brought in by the osm-bright.tm2 nodejs package, so https://gerrit.wikimedia.org/r/c/mediawiki/services/kartotherian/+/1201020/ is not worth pursuing anymore. It may be worth testing a new kartotherian version with e5b7a05b692199238ce54eb8e2e879331256d7ae reverted; we now have a fully working staging environment that could be used for it.

Change #1201020 abandoned by Muehlenhoff:

[mediawiki/services/kartotherian@master] Sync list of fonts to what is also used on the Thumbor images

Reason:

Turned out to be a red herring

https://gerrit.wikimedia.org/r/1201020

Change #1214528 had a related patch set uploaded (by Elukey; author: Elukey):

[mediawiki/services/kartotherian@master] Revert mapnik style to its previous version

https://gerrit.wikimedia.org/r/1214528

> After a chat with Yiannis, the following commit was highlighted https://github.com/wikimedia/osm-bright.tm2/commit/e5b7a05b692199238ce54eb8e2e879331256d7ae. We added it while transitioning to the new kartotherian/mapnik versions, but we lost the context on what it was meant to resolve.

Some change from around that time likely resolved the digits font issue in T198235. So I guess we may need to check whether reverting the given commit would have an effect on that other font issue.

@Pikne wow, good to know, thanks! I am planning to create an endpoint for testing (likely maps-staging.wikimedia.org) that the community will be able to use to test changes like this one before they hit production. So I hope that in the future we'll be able to use the staging stack to carefully review a change before it hits real traffic (fingers crossed). I'll keep you in the loop when/if my patch is scheduled on staging.