download tool issue with Cyrillic encoding in filenames (wget)
Closed, Invalid, Public

Assigned To
None
Priority
Needs Triage
Author
bzimport
Subscribers
liangent, Krenair, Platonides and 2 others
Projects
Reference
bz40844
Description

Author: a1

Description:
The https://toolserver.org/~platonides/catdown/catdown.php tool does not handle Cyrillic in file names. For example, it writes "Р%9FамС%8FС%82РЅРёРє_Р·Р°С%82опленнС%8BРј_РєРѕС%80аблС%8FРј_РІ_СеваС%81С%82ополе"
instead of "Памятник затопленным кораблям в Севастополе.JPG". Please fix it.


Version: unspecified
Severity: normal

bzimport added projects: Utilities-Other, Upstream. Via Conduit, Nov 22 2014, 1:05 AM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz40844.
bzimport created this task. Via Legacy, Oct 7 2012, 7:40 PM
Platonides added a comment. Via Conduit, Oct 7 2012, 8:43 PM

As answered on the mailing list, that's a wget problem.

The list generated by my tool correctly uses:
http://upload.wikimedia.org/wikipedia/commons/a/ad/%D0%9F%D0%B0%D0%BC%D1%8F%D1%82%D0%BD%D0%B8%D0%BA_%D0%B7%D0%B0%D1%82%D0%BE%D0%BF%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%BC_%D0%BA%D0%BE%D1%80%D0%B0%D0%B1%D0%BB%D1%8F%D0%BC_%D0%B2_%D0%A1%D0%B5%D0%B2%D0%B0%D1%81%D1%82%D0%BE%D0%BF%D0%BE%D0%BB%D0%B5.JPG

The problem seems to lie in wget when extracting to a local filename.
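
To illustrate the mechanism, here is a minimal Python sketch that imitates wget's default file name escaping, which percent-escapes bytes in the control ranges 0-31 and 128-159. The function name `simulate_wget_name` is hypothetical, and the assumption that the remaining bytes are displayed through the cp1251 code page is a guess about the reporter's system, not something stated in this report.

```python
from urllib.parse import unquote_to_bytes


def simulate_wget_name(encoded_basename: str, display_codepage: str = "cp1251") -> str:
    """Approximate the local file name wget writes with its default escaping.

    Hypothetical helper: display_codepage is an assumption about how the
    bytes are shown on the reporter's machine.
    """
    # The URL path component is percent-encoded UTF-8; recover the raw bytes.
    raw = unquote_to_bytes(encoded_basename)
    parts = []
    for b in raw:
        if b < 0x20 or 0x80 <= b <= 0x9F:
            # wget's default mode treats these bytes as control characters
            # and writes them as %XX.
            parts.append("%%%02X" % b)
        else:
            # Every other byte is shown as a single-byte character.
            parts.append(bytes([b]).decode(display_codepage))
    return "".join(parts)


if __name__ == "__main__":
    # First three letters of the file name from the URL above.
    print(simulate_wget_name("%D0%9F%D0%B0%D0%BC"))
```

Under these assumptions the result begins with the same `Р%9F` sequence as the name quoted in the report: the first UTF-8 byte of "П" is shown as "Р" and the second one (0x9F) falls in the escaped range.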

If you are using *nix with a utf-8 filesystem, pass the
--restrict-file-names=nocontrol parameter to wget.
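
As a sketch of what passing that parameter looks like, the Python snippet below builds the wget command line. The list file name `urls.txt` and both function names are hypothetical; only the `--restrict-file-names=nocontrol` and `-i` options are real wget options.

```python
import subprocess
from typing import List


def build_wget_command(url_list_path: str) -> List[str]:
    """Return a wget invocation that does not escape control-range bytes."""
    return [
        "wget",
        "--restrict-file-names=nocontrol",  # keep bytes 128-159 unescaped
        "-i", url_list_path,                # read the URLs from a file
    ]


def run_download(url_list_path: str) -> None:
    """Run wget; requires wget to be installed and on PATH."""
    subprocess.run(build_wget_command(url_list_path), check=True)


if __name__ == "__main__":
    # urls.txt is a hypothetical name for the list produced by the tool.
    print(" ".join(build_wget_command("urls.txt")))
```

The same option can be added directly to any existing wget command line.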

If you're using Windows you will end up with utf-8 encoded filenames, so
you'd need another script to decode them to the format used by Windows.
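
A minimal sketch of such a decoding script in Python follows. It assumes the mangled names are UTF-8 bytes displayed through the cp1251 ANSI code page, possibly with some bytes written as %XX; the helper name `repair_name` is hypothetical, and the code page should be adjusted to the actual system.

```python
from urllib.parse import unquote_to_bytes


def repair_name(mangled: str, ansi_codepage: str = "cp1251") -> str:
    """Recover the intended name from a mangled one (hypothetical helper).

    Assumes the name holds UTF-8 bytes shown through ansi_codepage,
    with some bytes possibly escaped as %XX.
    """
    # Map each displayed character back to its original byte.
    as_bytes = mangled.encode(ansi_codepage)
    # Turn any %XX escapes back into the bytes they stand for.
    raw = unquote_to_bytes(as_bytes)
    # The recovered byte string is the UTF-8 encoding of the real name.
    return raw.decode("utf-8")


if __name__ == "__main__":
    print(repair_name("\u0420%9F\u0420\u00b0\u0420\u0458.JPG"))
```

A name that legitimately contains a literal "%" followed by two hex digits would be decoded incorrectly by this approach, so results should be checked before renaming files in bulk (for example with `os.rename`).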

Aklapper added a comment. Via Conduit, Oct 9 2012, 4:02 PM

Andrij: Does comment 1 help?

bzimport added a comment. Via Conduit, Oct 9 2012, 5:07 PM

a1 wrote:

Unfortunately, no. I could not understand how I could "pass the
--restrict-file-names=nocontrol parameter to wget".

Platonides added a comment. Via Conduit, Oct 13 2012, 3:52 PM

Andrij, you would add that inside download.bat

I could try downloading the category for you if that helps.

I reported the problem upstream (https://savannah.gnu.org/bugs/index.php?37564). This should be fixed at the wget level.

liangent added a comment. Via Conduit, Oct 13 2012, 3:55 PM

Does this bug belong in this Bugzilla?

Aklapper added a comment. Via Conduit, Nov 14 2012, 3:21 PM

Andrij: Toolserver issues should be filed at https://jira.toolserver.org/secure/Dashboard.jspa

Closing as "INVALID" simply because this bug database is not the place where this report should be, but not because the report itself is invalid.
