
False unicode-encoding in Database creates un-reachable image-description-page
Closed, ResolvedPublic

Description

In the German Wikipedia there is an image with the filename "Kohlenstoffnanoröhre-Animation.gif" ('cur' DB entry
335479). I don't know why, but the "ö" is not encoded the usual way, as %C3%B6, but as %6F%CC%88, which is an
"o" followed by a combining diaeresis that modifies the previous letter. Both are valid encodings of "ö".

But MediaWiki only shows pages with "ö" in the %C3%B6 encoding. So even when I request the %6F%CC%88 encoding
directly, I get a header redirect to the %C3%B6 page
(Location: http://de.wikipedia.org/wiki/Bild:Kohlenstoffnanor%C3%B6hre-Animation.gif).
It is now impossible to view that page or delete it.
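
The two encodings described above can be compared with a minimal Python sketch. It only uses the standard library; the variable names are illustrative and nothing here reflects MediaWiki's actual code. It shows that the two percent-encoded names decode to different strings that become identical after Unicode normalisation.

```python
import unicodedata
from urllib.parse import unquote

# Percent-decode both spellings of the file name (UTF-8 is the default).
precomposed = unquote("Kohlenstoffnanor%C3%B6hre-Animation.gif")   # U+00F6
decomposed = unquote("Kohlenstoffnanoro%CC%88hre-Animation.gif")   # "o" + U+0308

# As raw strings the two names differ, so a byte-wise lookup treats them
# as two distinct titles.
print(precomposed == decomposed)

# After normalisation to the composed form (NFC) they are the same title.
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# The decomposed spelling is exactly the NFD form of the usual one.
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

The first comparison is false and the other two are true, which matches the behaviour reported here: the stored name and the name the software redirects to are different strings for the same visible text.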


Version: unspecified
Severity: major
URL: http://upload.wikimedia.org/wikipedia/de/e/e2/Kohlenstoffnanoro%CC%88hre-Animation.gif

Details

Reference
bz1225
Title: Use a builder format for the allpages generator
Reference: repos/mwbot-rs/mwbot!9
Author: legoktm
Source Branch: generator-redo
Dest Branch: main

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 8:06 PM
bzimport set Reference to bz1225.

wiki.bugzilla wrote:

The image has already been replaced by a well-named version
(http://de.wikipedia.org/wiki/Bild:Kohlenstoffnanoroehre_Animation.gif),
so the misspelled one has to be deleted. Thanks.

This should have been corrected some time ago. Confirmed?

Please look at http://de.wikipedia.org/w/index.php?title=Kohlenstoffnanor%C3%B6hre&oldid=3850707 and click
on the only image on this page. I don't know whether the creation of such images has been fixed, but this one
still exists ;)

gangleri wrote:

Hello!

If you select "view page source" in your browser for
http://de.wikipedia.org/w/index.php?title=Kohlenstoffnanor%C3%B6hre&oldid=3850707
you will find
http://upload.wikimedia.org/wikipedia/de/e/e2/Kohlenstoffnanoro%CC%88hre-Animation.gif

This is the image that should be deleted; it does not show up at
[[de:image:Kohlenstoffnanoröhre-Animation.gif]].

Re: comment 1: today the image is available at
[[de:image:Kohlenstoffnanoroehre-Animation.gif]] because of
[[commons:image:Kohlenstoffnanoroehre_Animation.gif]].

Re: old URL:
http://de.wikipedia.org/w/index.php?title=Spezial%3AAllpages&from=Kohlenstoffnanor&namespace=6

a) Please note that [[Special:Allpages]] is not suitable for identifying all media
files: either {{ns:Media}} is not supported at Allpages, or the page that is the
subject of this bug is not shown at Special:Allpages?namespace=6 because of
*another*, subsequent bug.

b) Please note that [[Special:Imagelist]] cannot be used either, because its
filter functionality is no longer available.

c) The image can neither be found in the upload log with
http://de.wikipedia.org/w/index.php?title=Special:Log/upload&page=image:Kohlenstoffnanor%C3%B6hre-Animation.gif
nor with
http://de.wikipedia.org/w/index.php?title=Special:Log/upload&page=image:Kohlenstoffnanoro%CC%88hre-Animation.gif

The reason might relate to earlier versions of [[Special:Upload]] / [[Special:Log]].

Please note that %CC%88 stands for
COMBINING DIAERESIS - U+0308
http://www.fileformat.info/info/unicode/char/0308/index.htm
Entity (decimal) &#776; (hex) &#x0308;
UTF-8 (hex) 0xCC 0x88 (cc88)
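
The character data listed above can be re-derived with a short Python sketch using only the standard library; the variable name is illustrative.

```python
import unicodedata

mark = "\u0308"  # the character behind %CC%88

# Official Unicode name of the code point.
print(unicodedata.name(mark))              # COMBINING DIAERESIS

# UTF-8 byte sequence, i.e. what appears percent-encoded in the URL.
print(mark.encode("utf-8").hex())          # cc88

# Decimal and hexadecimal numeric character references.
print(f"&#{ord(mark)};", f"&#x{ord(mark):04X};")   # &#776; &#x0308;

# A non-zero combining class marks it as a combining character that
# attaches to the preceding base letter.
print(unicodedata.combining(mark) > 0)     # True
```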

If this file "survived" the database's Unicode normalisation, please verify whether
other files using COMBINING DIAERESIS or other characters from the "Unicode
Block 'Combining Diacritical Marks'"
(http://www.fileformat.info/info/unicode/block/combining_diacritical_marks/index.htm)
still exist in the database.
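
A check of this kind could be sketched in Python as follows. This is only an illustration under stated assumptions: the titles are given as an in-memory list (a real check would read them from the wiki's database), NFC is assumed as the target normalisation form, and the function names `has_combining_mark` and `find_unnormalized` are hypothetical.

```python
import unicodedata

def has_combining_mark(title):
    """True if the title contains a character from the Unicode block
    'Combining Diacritical Marks' (U+0300 to U+036F)."""
    return any(0x0300 <= ord(ch) <= 0x036F for ch in title)

def find_unnormalized(titles):
    """Return the titles that change under NFC normalisation, i.e. the
    ones that 'survived' without being normalised."""
    return [t for t in titles if unicodedata.normalize("NFC", t) != t]

# Hypothetical sample data; escapes keep the two spellings distinguishable.
titles = [
    "Kohlenstoffnanoro\u0308hre-Animation.gif",  # "o" + COMBINING DIAERESIS
    "Kohlenstoffnanor\u00f6hre-Animation.gif",   # precomposed letter
    "Kohlenstoffnanoroehre_Animation.gif",       # plain ASCII
]

suspects = find_unnormalized(titles)
print(len(suspects))                    # 1
print(has_combining_mark(suspects[0]))  # True
```

Comparing each title with its NFC form catches more than the block test alone, since it also flags sequences outside the 'Combining Diacritical Marks' block that normalisation would change.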

Best regards, reinhardt [[user:gangleri]]

P.S. In addition to blocking Bug 3969: unicode compatibility (tracking),
this report now also blocks Bug 3985: character conversion (tracking).

gangleri wrote:

*addendum*

This report reminds me of
Bug 3860: links generated with precombined characters show red despite the fact
that the normalised links exist
which is a duplicate of
Bug 1527: *first* perform Unicode normalisation and check for existence of pages
*after* the normalisation
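
The order of operations asked for in Bug 1527 can be illustrated with a minimal Python sketch. It is not MediaWiki's implementation: `page_exists` is a hypothetical helper, the set of existing titles stands in for a database lookup, and NFC is assumed as the normalisation form.

```python
import unicodedata

def page_exists(title, existing_titles):
    """Normalise the requested title first, then check for existence."""
    normalized = unicodedata.normalize("NFC", title)
    return normalized in existing_titles

# Hypothetical store that holds only normalised (NFC) titles.
existing_titles = {"Kohlenstoffnanor\u00f6hre"}

requested = "Kohlenstoffnanoro\u0308hre"  # same text, decomposed spelling

# Checking before normalisation misses the page (the "red link" case).
print(requested in existing_titles)             # False

# Checking after normalisation finds it.
print(page_exists(requested, existing_titles))  # True
```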

The reason why the file from the url is not "recognized" at
http://de.wikipedia.org/wiki/Bild:Kohlenstoffnanor%C3%B6hre-Animation.gif
might be the same.

Because the file from the URL "survived" the database's Unicode normalisation, it
is possible that other files using precombined Unicode characters still exist in
the database. Many page titles at [[yi:]] used BiDi punctuation characters and
different spellings ("tsvey-vovn" versus "vov+vov" etc.). It is very
likely that contributors also used, and are still using, precombined Unicode
characters in the titles of files they upload.

Other wikis / languages where Unicode normalisation is involved might be
affected as well. Identifying these is a general issue that relates to all wikis.

Bug 830: Commons rejects upload of filenames in Hindi
might be "historical". Helpful / useful links from the original reporter are
missing there.

Best regards, reinhardt [[user:gangleri]]

Running a final check for remaining bad image names

Normalized remaining filenames last night.