
osm4wiki character encoding for Swedish characters has stopped working
Closed, Resolved | Public | BUG REPORT

Description

[https://tools.wmflabs.org/osm4wiki/cgi-bin/wiki/wiki-osm.pl?project=sv&article=Kategori%3AHerrg%25C3%25A5rdar_i_S%25C3%25B6dermanland link]

The above link gives a page with a map with markers on the right and a corresponding list of articles on the left.
The newly discovered problem is that the links to the articles in the list do not work when they contain Swedish characters like å, ä and ö.

Steps to Reproduce:

Click on the link above.

Actual Results:

It looks like the Swedish character "ä" has been encoded as %C3%83%C2%A4 instead of %C3%A4, that is, its UTF-8 bytes appear to have been encoded a second time.
As a result, links to articles with Swedish letters no longer work.

Expected Results:

All links, including those to articles with Swedish letters, should be encoded so that they lead to the correct page when you click on them.
In this case I would expect the link to be https://sv.wikipedia.org/wiki/Abbotn%C3%A4s
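
The broken sequence reported above is what results when the UTF-8 bytes of "ä" are misread as Latin-1 (or Windows-1252) and then encoded as UTF-8 a second time. The following is a minimal sketch of that assumed mechanism, not the tool's actual code; the helper name `repair` is hypothetical.

```python
from urllib.parse import quote, unquote

title = "Abbotn\u00e4s"  # "Abbotnäs", written with an escape so the file's own encoding does not matter

# Correct: encode the title as UTF-8 once, then percent-encode.
correct = quote(title)

# Assumed bug: the UTF-8 bytes are misread as Latin-1, then encoded again.
misread = title.encode("utf-8").decode("latin-1")
broken = quote(misread)

def repair(segment):
    """Undo one round of Latin-1/UTF-8 double encoding (hypothetical helper)."""
    text = unquote(segment)  # unquote decodes as UTF-8 by default
    return quote(text.encode("latin-1").decode("utf-8"))

print(correct)         # Abbotn%C3%A4s
print(broken)          # Abbotn%C3%83%C2%A4s
print(repair(broken))  # Abbotn%C3%A4s
```

Repairing after the fact only works while the misread text is still representable in Latin-1; fixing the place where the bytes are decoded with the wrong charset is the more robust option.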

Is there a missing "language" or "encoding" parameter that should be passed in the call to osm4wiki? None was needed before.

Event Timeline

Larske renamed this task from osm4wiki characted encoding for Swedish characters has stopped working to osm4wiki character encoding for Swedish characters has stopped working. Jun 25 2019, 10:01 AM
Aklapper added subscribers: Plenz, Kolossos.

https://tools.wmflabs.org/admin/tools lists @Kolossos and @Plenz as maintainers, hence subscribing. (If there is a better place to report bugs, please let us know.)

This might be due to the fact that the osm4wiki files claim to be in utf-8 encoding, but they are saved in a Western encoding (perhaps Windows-1250?). It should be easy to fix and might help with the vast majority of these issues.

Dvorapa: Hi any admin to help with simple Toolforge task?
Dvorapa: osm4wiki tool claims in its HTML/Perl files utf-8 encoding, but files are apparently saved in Windows-1250 encoding. Tool maintainers are not responding for almost a year. Could someone re-save osm4wiki files in utf-8 encoding as they claim and fix a bug task in Phabricator?
Dvorapa: (and thus make the tool work correctly again)
bd808: Dvorapa: most of us are busy right now, but that seems like a task that we could help with "soon". It would be helpful if there was a Phabricator task to track the request and what we do.

Adding cloud-services-team as this might be easily solved just by re-saving all necessary files to utf-8

I am sorry, I am currently sailing on the Russian ship SHTANDART and have internet access only from time to time, and all my Wikimedia access information is at home. Due to the coronavirus, I have no idea when I will be back home, so I can currently do nothing.

I have to say that I am not very familiar with this encoding stuff. There is also another tool involved (kmlexport) which does much of the encoding work. All I can say: osm4wiki worked for me, and it even worked for somebody who created a Czech translation.

So I am not sure that it would really help if osm4wiki claimed to produce Windows-1250 encoding.

I see. We can at least try, since the actual encoding of the files differs from the encoding they declare, which is not correct.

I also double checked kmlexport for encoding issues, but no issues occur anymore.

@Dvorapa Have you confirmed the server side encoding issue or is this speculation based on the behavior?

Here's what I see on the file system:

$ become osm4wiki
$ cd public_html
$ find . -type f -exec file -i {} \;
./cgi-bin/test.pl: text/x-perl; charset=us-ascii
./cgi-bin/work/marker-8blu.gif: image/gif; charset=binary
./cgi-bin/work/marker-2blu.gif: image/gif; charset=binary
./cgi-bin/work/check0.png: image/png; charset=binary
./cgi-bin/work/marker-5blu.gif: image/gif; charset=binary
./cgi-bin/work/tools.png: image/png; charset=binary
./cgi-bin/work/marker-0blu.gif: image/gif; charset=binary
./cgi-bin/work/marker-4red.gif: image/gif; charset=binary
./cgi-bin/work/popl.png: image/png; charset=binary
./cgi-bin/work/trapa.png: image/png; charset=binary
./cgi-bin/work/marker-6red.gif: image/gif; charset=binary
./cgi-bin/work/woicon.ico: image/x-icon; charset=binary
./cgi-bin/work/toolbar.png: image/png; charset=binary
./cgi-bin/work/marker-7blu.gif: image/gif; charset=binary
./cgi-bin/work/pop13.png: image/png; charset=binary
./cgi-bin/work/pop12.png: image/png; charset=binary
./cgi-bin/work/marker-6blu.gif: image/gif; charset=binary
./cgi-bin/work/open1.png: image/png; charset=binary
./cgi-bin/work/j: text/plain; charset=us-ascii
./cgi-bin/work/marker-1red.gif: image/gif; charset=binary
./cgi-bin/work/marker-8red.gif: image/gif; charset=binary
./cgi-bin/work/pop21.png: image/png; charset=binary
./cgi-bin/work/popo.png: image/png; charset=binary
./cgi-bin/work/marker-3blu.gif: image/gif; charset=binary
./cgi-bin/work/e: text/plain; charset=us-ascii
./cgi-bin/work/check1.png: image/png; charset=binary
./cgi-bin/work/pop22.png: image/png; charset=binary
./cgi-bin/work/j~: text/plain; charset=us-ascii
./cgi-bin/work/pop23.png: image/png; charset=binary
./cgi-bin/work/marker-5red.gif: image/gif; charset=binary
./cgi-bin/work/pop24.png: image/png; charset=binary
./cgi-bin/work/nix.png: image/png; charset=binary
./cgi-bin/work/marker-2red.gif: image/gif; charset=binary
./cgi-bin/work/util.js: text/plain; charset=utf-8
./cgi-bin/work/marker-4blu.gif: image/gif; charset=binary
./cgi-bin/work/marker-7red.gif: image/gif; charset=binary
./cgi-bin/work/popu.png: image/png; charset=binary
./cgi-bin/work/marker-3red.gif: image/gif; charset=binary
./cgi-bin/work/wikilog.txt: text/html; charset=utf-8
./cgi-bin/work/popr.png: image/png; charset=binary
./cgi-bin/work/lang.txt~: text/plain; charset=us-ascii
./cgi-bin/work/wiki-osm.pl: text/x-perl; charset=utf-8
./cgi-bin/work/marker-1blu.gif: image/gif; charset=binary
./cgi-bin/work/lang.txt: text/plain; charset=utf-8
./cgi-bin/work/marker-0red.gif: image/gif; charset=binary
./cgi-bin/work/pop14.png: image/png; charset=binary
./cgi-bin/work/open0.png: image/png; charset=binary
./cgi-bin/work/wiki-osm.pl~: text/x-perl; charset=utf-8
./cgi-bin/work/pop11.png: image/png; charset=binary
./cgi-bin/wiki/marker-8blu.gif: image/gif; charset=binary
./cgi-bin/wiki/marker-2blu.gif: image/gif; charset=binary
./cgi-bin/wiki/check0.png: image/png; charset=binary
./cgi-bin/wiki/marker-5blu.gif: image/gif; charset=binary
./cgi-bin/wiki/tools.png: image/png; charset=binary
./cgi-bin/wiki/marker-0blu.gif: image/gif; charset=binary
./cgi-bin/wiki/marker-4red.gif: image/gif; charset=binary
./cgi-bin/wiki/popl.png: image/png; charset=binary
./cgi-bin/wiki/trapa.png: image/png; charset=binary
./cgi-bin/wiki/marker-6red.gif: image/gif; charset=binary
./cgi-bin/wiki/woicon.ico: image/x-icon; charset=binary
./cgi-bin/wiki/toolbar.png: image/png; charset=binary
./cgi-bin/wiki/marker-7blu.gif: image/gif; charset=binary
./cgi-bin/wiki/pop13.png: image/png; charset=binary
./cgi-bin/wiki/pop12.png: image/png; charset=binary
./cgi-bin/wiki/marker-6blu.gif: image/gif; charset=binary
./cgi-bin/wiki/open1.png: image/png; charset=binary
./cgi-bin/wiki/marker-1red.gif: image/gif; charset=binary
./cgi-bin/wiki/marker-8red.gif: image/gif; charset=binary
./cgi-bin/wiki/pop21.png: image/png; charset=binary
./cgi-bin/wiki/popo.png: image/png; charset=binary
./cgi-bin/wiki/marker-3blu.gif: image/gif; charset=binary
./cgi-bin/wiki/e: text/plain; charset=us-ascii
./cgi-bin/wiki/check1.png: image/png; charset=binary
./cgi-bin/wiki/pop22.png: image/png; charset=binary
./cgi-bin/wiki/pop23.png: image/png; charset=binary
./cgi-bin/wiki/wiki-osm.pl.original: text/x-perl; charset=us-ascii
./cgi-bin/wiki/marker-5red.gif: image/gif; charset=binary
./cgi-bin/wiki/pop24.png: image/png; charset=binary
./cgi-bin/wiki/nix.png: image/png; charset=binary
./cgi-bin/wiki/marker-2red.gif: image/gif; charset=binary
./cgi-bin/wiki/util.js: text/plain; charset=utf-8
./cgi-bin/wiki/marker-4blu.gif: image/gif; charset=binary
./cgi-bin/wiki/marker-7red.gif: image/gif; charset=binary
./cgi-bin/wiki/popu.png: image/png; charset=binary
./cgi-bin/wiki/wiki-osm.pl.error: text/x-perl; charset=us-ascii
./cgi-bin/wiki/marker-3red.gif: image/gif; charset=binary
./cgi-bin/wiki/wikilog.txt: text/html; charset=utf-8
./cgi-bin/wiki/popr.png: image/png; charset=binary
./cgi-bin/wiki/wiki-osm.pl: text/x-perl; charset=utf-8
./cgi-bin/wiki/marker-1blu.gif: image/gif; charset=binary
./cgi-bin/wiki/lang.txt: text/plain; charset=utf-8
./cgi-bin/wiki/marker-0red.gif: image/gif; charset=binary
./cgi-bin/wiki/pop14.png: image/png; charset=binary
./cgi-bin/wiki/open0.png: image/png; charset=binary
./cgi-bin/wiki/wiki-osm.pl~: text/x-perl; charset=utf-8
./cgi-bin/wiki/pop11.png: image/png; charset=binary
./index.htm: text/html; charset=us-ascii
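
The listing above reports every text file as either us-ascii or utf-8, and a file that is pure ASCII is also valid UTF-8. As a rough cross-check that does not depend on `file`, the sketch below classifies files by attempting to decode them. It is a simplified heuristic, not how `file` itself works; the function names and the NUL-byte test for binary data are assumptions made for illustration.

```python
from pathlib import Path

def classify_bytes(data):
    """Classify raw bytes roughly the way the charset column above does."""
    if b"\x00" in data:  # heuristic: NUL bytes suggest binary data (assumption)
        return "binary"
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be Windows-1250, Windows-1252, ISO-8859-1, ...; bytes alone cannot tell.
        return "unknown-8bit"

def scan(root):
    """Yield (path, label) for every regular file under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            yield path, classify_bytes(path.read_bytes())

if __name__ == "__main__":
    for path, label in scan("."):
        print(f"{path}: {label}")
```

Any file labelled `unknown-8bit` would be a candidate for the "saved in a Western encoding" theory; in the listing above no such file appears.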

I used an external tool to check the web page's encoding, and it reported Windows-1252. In any case, there is another issue with how Wikipedia page titles are encoded directly in the code, so re-saving the files will not fix the tool outright; it would only address one part of the problem.

This sounds like you are reporting that the HTML generated by https://tools.wmflabs.org/osm4wiki/cgi-bin/wiki/wiki-osm.pl for some (or all?) inputs contains characters in the Windows-1252 encoding. Is that correct?

Regarding "Adding cloud-services-team as this might be easily solved just by re-saving all necessary files to utf-8": the output is dynamically generated, not a set of static files, so I don't think this suggestion is valid.

The rendering is mixed. The HTML declares utf-8 in its meta tag, but the Content-Type header of the HTTP response is Content-Type: text/html; charset=ISO-8859-1, so that's one issue.
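
When the HTTP header carries a charset, browsers generally honor it over the meta tag, so UTF-8 bytes served with this header are displayed as if they were ISO-8859-1. The sketch below illustrates the mismatch on a made-up response; the header value is the one quoted above, while the HTML body and the helper names are assumptions for illustration only.

```python
import re

def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, lowercased."""
    match = re.search(r'charset="?([\w.-]+)', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None

def meta_charset(html):
    """Return the charset declared in the first matching <meta> tag, lowercased."""
    match = re.search(r'''<meta[^>]+charset=["']?([\w.-]+)''', html, re.IGNORECASE)
    return match.group(1).lower() if match else None

# Hypothetical response: UTF-8 body, but an ISO-8859-1 header.
header = "text/html; charset=ISO-8859-1"
body = '<meta charset="utf-8"><title>Herrg\u00e5rdar</title>'.encode("utf-8")

declared_by_header = header_charset(header)
declared_by_meta = meta_charset(body.decode("ascii", errors="replace"))
mismatch = declared_by_header != declared_by_meta

# Decoding the UTF-8 bytes with the header's charset garbles the non-ASCII text.
as_rendered = body.decode(declared_by_header)

print(declared_by_header, declared_by_meta, mismatch)
print(as_rendered)
```

Making the header and the meta tag agree on utf-8 removes this particular source of garbling, whatever else is wrong in the generated page.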

The links are output in Unicode (they work if you view the page in Unicode mode), but the link names are in ISO-8859-1. The links are also not URI-encoded.
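
For illustration, a link can be made independent of the page's charset by percent-encoding the UTF-8 bytes of the title in the href and HTML-escaping the visible text. This is a minimal sketch, not the tool's code: the helper `article_link` is hypothetical, and the URL pattern is taken from the expected link in the task description.

```python
from html import escape
from urllib.parse import quote

def article_link(lang, title):
    """Build an HTML anchor for a Wikipedia article title (hypothetical helper)."""
    # Wikipedia URLs use underscores for spaces; quote() percent-encodes as UTF-8.
    path = quote(title.replace(" ", "_"), safe=":/")
    href = f"https://{lang}.wikipedia.org/wiki/{path}"
    return f'<a href="{escape(href)}">{escape(title)}</a>'

link = article_link("sv", "Abbotn\u00e4s")
category = article_link("sv", "Kategori:Herrg\u00e5rdar i S\u00f6dermanland")

print(link)
print(category)
```

The href then contains only ASCII characters, so it survives even if the page is served or read with the wrong charset; the visible link text still depends on the charset being declared correctly.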

Can someone maybe provide me with membership of the group tools.osm4wiki? I think it's easy to fix. Plenz is out at sea, and Kolossos is not reachable at the moment either.

Endorse! Links seem to be working; only link titles are broken (at least for the Czech language).

I think I know the cause of one of the issues. The section titles might break the encoding of the whole page; they do not seem to be utf-8 encoded/decoded properly.

So I think only the section titles and the HTML page's <title> need to be fixed, because if I fix them in the browser console, the browser finally re-renders the page correctly in utf-8.

Can someone maybe provide me with membership of the group tools.osm4wiki? I think it's easy to fix. Plenz is out at sea, and Kolossos is not reachable at the moment either.

If either of them leaves a comment here indicating that adding maintainers to the tool is ok, then I'm sure we can make that happen...

DB111 claimed this task.

Thank you, Kolossos kindly added me! So I could fix things:

  1. THIS bug was resolved before I touched any code, I assume Dvorapa fixed little encoding bugs in "kmlexport".
  2. Output converted to UTF-8, so everything looks good now.
  3. Dvorapa, there was NEVER a "kml" parameter in this tool :-) I added limited support now.

Weird, kml parameter worked there two years ago, I'm sure!

You're absolutely right! I've now checked the second-to-last version. So Plenz (?) did a major rewrite last April and dropped this.

Yes, there was some issue with it, Plenz mentioned it in T220164.

  1. THIS bug was resolved before I touched any code, I assume Dvorapa fixed little encoding bugs in "kmlexport".

Yeah, I fixed a small issue with URL encoding: I changed it to encode ASCII instead of UTF-8. This fixed the URLs in osm4wiki, but not the other issues you fixed. Good job, and thank you!