Encoding problem with dataset filename
Closed, Resolved · Public · Bug Report

Description

In https://lingualibre.org/datasets/, the filename "Q19858-bci-Baoulé.zip" is displayed with a garbled "é". It should display "Q19858-bci-Baoulé.zip".

In principle, in English, Baoulé should be written "Baoule", but in any case this shows that there is an encoding issue with the dataset filename. UTF-8 must be used IMHO.

Event Timeline

Pamputt changed the subtype of this task from "Task" to "Bug Report".

There's not much to worry about here: it's a purely "visual" HTML encoding issue. The code is working fine on the server's end: I just checked it by generating a .zip archive with a bunch of accented letters in its name, and it worked fine.

I might have found the issue. According to lingualibre.org's nginx configuration, requests to /datasets are forwarded to the datasets server (lingualibre.wikimedia.fr:9000). The nginx config on the datasets server is not publicly available, so I can't apply the fix myself.

@mickeybarber What you need to do is add the directive "charset UTF-8;" to the nginx config of the datasets server. You can add it either in the section that enables the autoindex, or at the "top" of the server block.
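
For illustration, the change might look roughly like the sketch below. Since the datasets server's actual config is not public, everything except the charset directive is an assumption: the server_name and port are taken from the address mentioned above, and the root path is a placeholder.

```nginx
# Sketch only: the real config of the datasets server is not public.
# server_name/port are assumed from lingualibre.wikimedia.fr:9000;
# the root path is a hypothetical placeholder.
server {
    listen 9000;
    server_name lingualibre.wikimedia.fr;

    # Option 1: declare the charset at the "top" of the server block,
    # so it applies to every location inside it.
    charset utf-8;

    location / {
        root /path/to/datasets;  # hypothetical path
        autoindex on;            # directory listing that shows the filenames

        # Option 2: declare it only in the section that enables the autoindex.
        # charset utf-8;
    }
}
```

With this directive, nginx appends "charset=utf-8" to the Content-Type header of the generated listing (text/html), so the browser no longer has to guess the encoding. After reloading nginx, the header can be checked with the browser's developer tools.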

@Yug Should we disclose the datasets server's nginx config on https://github.com/lingua-libre/operations/ ?

I have almost no back-end skills.

Poslovitch and Michael, you are best placed to lead on this.

@mickeybarber Any news about this? It seems this is still not fixed.