Author: gangleri
Description:
Hello!
This request proposes a combined solution for several bugs:
a) Bug 1414: Unicode whitespaces allowed in article title
b) Bug 1524: usernames should use unicode whitelist
c) Bug 2593: Non-printing characters allowed in registration
d) Bug 3819: strip phantom general punctuation characters from page titles
Requests and solutions can be "restrictive", but these would make it impossible
to use these characters at all. Personally, I do not like restrictive solutions.
The solution proposed here is to implement a notification for "action=submit"
(preview or save) indicating that saving would generate "irregular links", i.e.
links containing "irregular characters".
The notification should list *all* "irregular links" individually (what counts as
an irregular link should be defined in a .php include file) and offer a "save
anyway" button.
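A minimal sketch of how such a list could be collected, written in Python only
for illustration (the real check would live in MediaWiki's PHP code). The names
IRREGULAR_CATEGORIES, is_irregular and find_irregular_links are hypothetical,
and the rule "Unicode categories Cc, Cf, Zs, Zl, Zp except the plain space" is
an assumption standing in for the definition the include file would provide:

```python
import re
import unicodedata

# Assumed rule set: control, format and separator characters count as irregular.
# The request leaves the real definition to a .php include file.
IRREGULAR_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}

# Matches [[target]] and [[target|label]]; group 1 is the link target.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def is_irregular(ch):
    """True for characters this sketch treats as irregular (plain space excluded)."""
    return ch != " " and unicodedata.category(ch) in IRREGULAR_CATEGORIES


def find_irregular_links(wikitext):
    """Return one (target, sorted code points) pair per link with irregular characters."""
    report = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1)
        bad = sorted({ord(ch) for ch in target if is_irregular(ch)})
        if bad:
            report.append((target, bad))
    return report
```

An empty result would mean the page can be saved without the notification;
otherwise each pair becomes one entry in the list shown next to "save anyway".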
*notifications* are not new in MediaWiki:
- Special:Upload notifies if the size of a file to be uploaded is above a limit.
- Special:Upload notifies if a file would be uploaded with a title that
already exists.
Both notifications use [[MediaWiki:Uploadwarning]], button:
[[MediaWiki:Savefile]], text: [[MediaWiki:Ignorewarning]] etc.
The proposed solution would meet the main goals:
- generating a warning if something could happen that causes trouble
- if the generation is intended, then it is up to the user to generate the link
Benefit: The warning should prevent the generation of "unintended" "irregular links".
The list of the "irregular links" should display the "irregular characters" as
HTML entities where one exists, otherwise in &#nnnn; notation, and *not* as UTF-8
because many of them cannot be seen / distinguished when shown as UTF-8.
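The display rule can be sketched as follows, in Python for illustration only.
The function names visible_form and show_irregular are hypothetical; the entity
table is the HTML 4 list shipped with Python's standard library, which is an
assumption about which named entities the notification would use:

```python
from html.entities import codepoint2name


def visible_form(ch):
    """Named entity if one exists for the character, else decimal &#nnnn; notation."""
    code_point = ord(ch)
    name = codepoint2name.get(code_point)
    return f"&{name};" if name else f"&#{code_point};"


def show_irregular(title, irregular):
    """Spell out every character of `title` that is in the set `irregular`."""
    return "".join(visible_form(ch) if ch in irregular else ch for ch in title)
```

For example, a left-to-right mark would be listed as &lrm;, while a zero width
space, which has no HTML 4 entity, would be listed as &#8203;.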
*main* "irregular characters" identified so far:
- whitespace / non-printing characters
- general punctuation characters
The notification should support all encodings of the "irregular
characters": UTF-8, HTML entities (&lrm;, &rlm;, ...), &#nnnn;, &#xnnnn;, and
%XX%YY%ZZ in links or their parameters (also inside {{localurl}}, {{fullurl}} ...).
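One way to cover all of these encodings is to reduce every link target to plain
Unicode before checking it. The sketch below does this in Python for
illustration; decode_link_target is a hypothetical name, and resolving character
references before percent-escapes is an assumption about the order, not
something MediaWiki is known to do in this form:

```python
import html
from urllib.parse import unquote


def decode_link_target(raw):
    """Resolve &name;, &#nnnn; and &#xnnnn; references, then %XX escapes (as UTF-8)."""
    return unquote(html.unescape(raw))
```

With this step in front, the same check sees a left-to-right mark whether the
editor typed it as UTF-8, as a named or numeric reference, or as %E2%80%8E.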
The proposed solution would make it easy to identify such forms of vandalism or
mistakes caused by copy and paste or incorrect editing due to insertion /
deletion of such characters. Detecting and fixing them is currently very
time-consuming.
*other* "irregular characters"
It should be evaluated whether this function can also be used for "Unicode
character normalisation". This concerns MediaWiki's conversion of Unicode
precomposed characters to a sequence of Unicode characters.
Ideally, the notification would generate "proposals" ("what to replace with
what") and offer checkboxes beside the links.
Example:
The Unicode character HEBREW LETTER ALEF WITH PATAH - U+FB2E would be replaced
anyway by MediaWiki with the two characters HEBREW LETTER ALEF - U+05D0 and
HEBREW POINT PATAH - U+05B7. So if we change the characters in the built-in
title normalisation, why not also be able to change
- the &#nnnn; representation &#64302; to &#1488;&#1463;
- the &#xnnnn; representation &#xFB2E; to &#x05D0;&#x05B7;
- the %EF%AC%AE to %D7%90%D6%B7
in the source of the page?
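The replacement values for such a proposal can be derived mechanically. The
sketch below uses Python's standard NFC normalisation for illustration, on the
assumption that it matches what MediaWiki does for this character;
normalized_forms is a hypothetical name:

```python
import unicodedata
from urllib.parse import quote


def normalized_forms(text):
    """Return the NFC form of `text` in decimal, hexadecimal and percent notation."""
    nfc = unicodedata.normalize("NFC", text)
    return {
        "decimal": "".join(f"&#{ord(ch)};" for ch in nfc),
        "hex": "".join(f"&#x{ord(ch):04X};" for ch in nfc),
        "percent": quote(nfc, safe=""),  # UTF-8 bytes as %XX escapes
    }
```

Called with U+FB2E, this yields the two-character sequence U+05D0 U+05B7 in each
notation, which is what a checkbox beside the link would offer as replacement.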
Keeping these only causes trouble. See Bug 3860: links generated with
precombined characters show red despite the fact that the normalised links exist.
Test case: [[wiktionary:yi:bugzilla/03860]]
Because changes would be controlled by checkboxes, it would still be possible to
maintain precombined characters for documentation, testing ... However, fixing /
"converting to the standard" would be supported by a built-in help / "knowledge
tool" and could save much time.
some bugs dealing with Unicode normalization:
- Bug 1375: Unicode normalization leaves red links
- Bug 1527: problem on URL with Devanagari characters
- Bug 2399: Unicode normalization interferes with Hebrew and Arabic with vowels
Best regards reinhardt [[user:gangleri]]
Version: unspecified
Severity: enhancement
URL: http://test.wikipedia.org/wiki/Bugzilla_003696