Encoding problem with dataset filename
Closed, Resolved | Public | Bug Report

Authored By

	Pamputt
	Feb 13 2021, 7:24 AM

Description

In https://lingualibre.org/datasets/, we see "Q19858-bci-BaoulÃ©.zip". It should display "Q19858-bci-Baoulé.zip"

In principle, the English spelling of Baoulé would be "Baoule", but either way this shows that there is an encoding issue in the dataset filename. UTF-8 should be used, IMHO.
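The garbled name is consistent with UTF-8 bytes being decoded as Latin-1 (ISO-8859-1). That cause is an assumption based on the symptom, not something confirmed in this report. A minimal Python sketch of the mechanism:

```python
# Assumption: the listing is sent as UTF-8 bytes but read as Latin-1.
name = "Q19858-bci-Baoulé.zip"

raw = name.encode("utf-8")        # "é" becomes the two bytes 0xC3 0xA9
garbled = raw.decode("latin-1")   # each byte is read as a separate character

print(garbled)  # Q19858-bci-BaoulÃ©.zip

# The damage is reversible as long as no bytes were lost on the way.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == name)  # True
```

If this is what happens, the filename on disk is intact and only the decoding on display is wrong.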

Event Timeline

Pamputt created this task. Feb 13 2021, 7:24 AM

Pamputt changed the subtype of this task from "Task" to "Bug Report".

Poslovitch claimed this task. Feb 13 2021, 9:07 PM

There's not much to worry about here: it's a purely "visual" HTML encoding issue. The code is working fine on the server's end: I just checked it by generating a .zip archive whose name contains a bunch of accented letters, and it worked fine.
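The server code used for that check is not shown in this thread. As a rough illustration only, a comparable check can be sketched in Python with the standard `zipfile` module; the names below are hypothetical and chosen just to exercise non-ASCII characters:

```python
import io
import zipfile

# Hypothetical member names, not taken from the real datasets.
names = ["Baoulé.wav", "français.wav", "Ελληνικά.wav"]

# Write an archive in memory with empty placeholder members.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for member in names:
        archive.writestr(member, b"")

# Reopen the archive and read the stored member names back.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    stored = archive.namelist()

# True when the accented names survive the round trip unchanged.
print(stored == names)
```

A check like this only exercises archive creation; it says nothing about how a web server labels the directory listing.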

I might have found the issue. According to lingualibre.org's nginx configuration, requests to /datasets are forwarded to the datasets server (lingualibre.wikimedia.fr:9000). The nginx config on the datasets server is not publicly available, so I can't apply the fix myself.

@mickeybarber What you need to do is add "charset UTF-8;" to the nginx config of the datasets server. You can add it either in the section that enables the autoindex or at the "top" of the server block.

@Yug Should we disclose the datasets server's nginx config on https://github.com/lingua-libre/operations/ ?

I have nearly no back-end abilities.

Poslovitch and Michael, you are best placed to lead on this side.

Yug moved this task from Query services to Bots and data management on the Lingua-Libre-Legacy board. Feb 15 2021, 3:06 PM

Poslovitch triaged this task as Low priority. Feb 16 2021, 11:51 PM

@mickeybarber Any news about this? It seems this is still not fixed.

Fixed.
