
Convert ASCII quotes to Unicode directional quotes, ellipses
Closed, DeclinedPublic

Description

Author: nhamblen

Description:
Although differentiated opening and closing quotes look better, most people
can't type them easily or at all. In keeping with the aim of fast editing, it
would be best to allow editors to type in standard quotes but display
distinctive quotes. Same goes for the ellipses…


Version: 1.5.x
Severity: enhancement

Details

Reference
bz1513

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 8:12 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz1513.

nhamblen wrote:

New parser function to convert various strings to UTF-8 entities

This patch uses two regular expressions to determine the correct quote or
ellipsis to use. It leaves alone everything in preformatted sections. I've
tested it in all the quote situations I can come up with and tweaked the regexp
until it worked. There may be more cases; it would be best to have this pointed
at a live backup for a while.

NOTE: I moved the em dash code into this function, so if a fix for bug #1485 is checked into HEAD it will need to be undone. (I'd be happy to update this patch to delete the other if that happens.)
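
For readers without the attachment, the general approach described above can be
sketched roughly as follows. This is not the code from the patch: it is a
minimal Python illustration, and the names PRE_BLOCK, smarten and convert are
made up for the example. It assumes preformatted sections are marked with
<pre> tags, and it handles only double quotes and three-dot ellipses, because
wikitext uses '' and ''' for italics and bold.

```python
import re

# Hypothetical sketch, not the attached patch.
# Assumption: preformatted text is wrapped in <pre>...</pre>.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.DOTALL | re.IGNORECASE)

# A double quote at the start of a chunk, or right after whitespace or an
# opening bracket, is treated as an opening quote.
OPENING_QUOTE = re.compile(r'(?<![^\s(\[{])"')


def smarten(chunk):
    """Convert ASCII quotes and ellipses in one non-preformatted chunk."""
    chunk = chunk.replace("...", "\u2026")           # three dots -> ellipsis
    chunk = OPENING_QUOTE.sub("\u201c", chunk)       # opening double quote
    return chunk.replace('"', "\u201d")              # the rest are closing


def convert(wikitext):
    """Apply smarten() everywhere except inside <pre> blocks."""
    parts = PRE_BLOCK.split(wikitext)
    # re.split with one capturing group puts the <pre> blocks at odd indices.
    return "".join(p if i % 2 else smarten(p) for i, p in enumerate(parts))


print(convert('He said "hi"... <pre>"raw"...</pre>'))
```

A heuristic like this will still get some cases wrong (quotes inside HTML
attributes, nested quotes, a quote directly after a dash), which is why testing
against real page text matters.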

Attached:

Please see section "Quote signs in several languages" in
http://en.wikipedia.org/wiki/Quotation_mark

Also, about dashes: in Russian, for example, there is no en dash. There is only the em dash.

nhamblen wrote:

(In reply to comment #2)

Thanks for that link, Alexander. I would have never guessed that ”this format”
was standard in Swedish. sv.wikipedia would have a legitimate gripe if their
ASCII quotes were converted to English-style opening and closing quotes.

I also checked Romanian, whose Wikipedia seems to use "these quotes" even though
they resemble none of the standard or alternative quotes in the table. In that
case, it's hard to call the quote conversion incorrect when the original was
also incorrect.

Ideally, languages whose quotation marks are very different from the ASCII ones
would not use the ASCII marks at all. Russian (I glanced over the
Russian-language article on the Russian language) seems to use UTF-8 codes. In
that case, there is no issue; the conversion routine will not touch them.

Of course, there's still the problem with Swedish and other languages that use
quoting schemes that are close to, but not exactly like, English. For them it
would be necessary to disable the conversion, or if someone wants to do it,
provide alternate conversion.

Would that be satisfactory? The principle I'm pitching is that we don't have to
provide the convenience function for every language, but we do have to avoid
making things worse for them.
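
One way to picture the per-language opt-out described above is a small lookup
table, sketched here in Python. Everything in it is an assumption made for
illustration: the QUOTE_STYLES table, the convert_quotes name, and the choice to
leave any unlisted language untouched. The Swedish entry follows the
Quotation_mark article linked earlier in this thread.

```python
import re

# Hypothetical table: language code -> (opening, closing) double quote.
# A language that is not listed gets no conversion at all.
QUOTE_STYLES = {
    "en": ("\u201c", "\u201d"),  # English: opening and closing marks differ
    "sv": ("\u201d", "\u201d"),  # Swedish: the same mark on both sides
}

# A double quote at the start of the text, or right after whitespace or an
# opening bracket, is treated as an opening quote.
OPENING_QUOTE = re.compile(r'(?<![^\s(\[{])"')


def convert_quotes(text, lang):
    """Convert ASCII double quotes using the style for lang, if there is one."""
    style = QUOTE_STYLES.get(lang)
    if style is None:
        return text  # conversion disabled: ASCII quotes stay as typed
    opening, closing = style
    text = OPENING_QUOTE.sub(opening, text)
    return text.replace('"', closing)


print(convert_quotes('say "hello"', "en"))
print(convert_quotes('say "hej"', "sv"))
print(convert_quotes('say "salut"', "ro"))  # unlisted, so left unchanged
```

Defaulting to "no conversion" for unlisted languages matches the principle of
not making things worse for anyone: a wiki only gets directional quotes once
someone has supplied a style for its language.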

I'm inclined to close this as WONTFIX. It's not possible to get it right automatically
in all cases, and wrong "smart" quotes are much more annoying than straight
quotes (which are always "right" even if they're not as pretty as you might
sometimes like).

nhamblen wrote:

(In reply to comment #4)

I think that would be premature. This is only a proposed enhancement for a
future version of the software; why not let it be? Besides, it's not for you or
me to say what is typographically "correct"; it's up to everyone using
Wikimedia. I suggest we see how things go with bug 1485. If it's a success, I'll
ask the users if they want something similar (but only 95% accurate) for quote
marks and ellipses.