Use Unicode quotes, ellipses, dashes, hyphens instead of ASCII ones in messages
Open, Lowest, Public, Feature

Description

Author: hendrik.maryns

Description:
replaces ’, ... and "

I work on the Dutch translation of the wiki software. Looking at MessagesEn.php, I noticed that single quotes that need to appear inside a string are handled very inconsistently: sometimes they are quoted: '\'', sometimes the string is put into double quotes: "'". This makes it confusing, and it is inelegant. There is an easy solution though: use the Unicode quote sign: ’. This can be used anywhere, since it has no meaning for PHP: '’', "’".
I have replaced all relevant occurrences of ' in MessagesEn.php; see the patch. (As a side effect, all ' in comments are replaced too.) Note that sometimes ‘ is the correct alternative, namely where it is an opening quote. See http://www.unicode.org/charts/PDF/U2000.pdf, code points 2018, 2019, 2026.

While we’re at it, I may as well suggest another improvement: use a real ellipsis (…) instead of three dots. That is also in the patch.
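The kind of replacement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used for the attached patch: the function name `typographic` and the opening-quote heuristic (an apostrophe at the start of the string, after whitespace, or after an opening bracket becomes ‘, any other lone apostrophe becomes ’) are hypothetical. Runs of two or more apostrophes are left alone, on the assumption that they are wiki markup for italics or bold.

```python
import re

LEFT_QUOTE = "\u2018"   # opening single quote
RIGHT_QUOTE = "\u2019"  # closing single quote / apostrophe
ELLIPSIS = "\u2026"     # horizontal ellipsis


def typographic(message: str) -> str:
    """Return message with lone ASCII apostrophes and '...' replaced.

    Hypothetical helper: the heuristics are assumptions and will
    misjudge some cases (for example leading apostrophes as in 'tis).
    """
    # Three literal periods become a single ellipsis character.
    message = message.replace("...", ELLIPSIS)

    def swap(match: re.Match) -> str:
        start = match.start()
        previous = message[start - 1] if start > 0 else " "
        # Treat an apostrophe after whitespace or an opening bracket
        # as an opening quote; everything else as closing/apostrophe.
        if previous.isspace() or previous in "([":
            return LEFT_QUOTE
        return RIGHT_QUOTE

    # Match only apostrophes that are not adjacent to another
    # apostrophe, so '' and ''' (assumed wiki markup) stay untouched.
    return re.sub(r"(?<!')'(?!')", swap, message)


if __name__ == "__main__":
    print(typographic("You don't have permission..."))
    print(typographic("See the 'foo' page, ''not'' this one"))
```

A heuristic like this would still need manual review, since only a human can tell whether a given ' is an apostrophe, a quote, or part of markup.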

If you’re worried about a11y: see the corresponding bug in Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=373623.

Oh, and by the way, even more exotic Unicode symbols are used already, ← for example (in: 'previousrevision' => '←Older revision',).

Once this is done, one could think of making the use of ' and " more consistent (I’d say: always use the second, since ' is needed in wiki markup from time to time). I replaced " by ' in the places where the double quotes were only used to allow a ' that is no longer there.

Of course, I will have overlooked some occurrences in the patch, but it is a start.


Version: 1.21.x
Severity: enhancement

attachment en.patch ignored as obsolete

Details

Reference
bz10352

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 9:50 PM
bzimport set Reference to bz10352.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

There are usability issues with using hard-to-input characters in messages that are supposed to be easily customizable to meet end-users' needs (which makes this rather different from Firefox). Besides, undifferentiated quotes are standard on the Internet especially, with curly quotes rare. The state of MessagesEn.php is certainly not a factor worthy of consideration. I would be inclined to resolve LATER, when input methods are superior or it's common to not use ASCII apostrophes/quotation marks.

(As for ellipses, in some common fonts literal ellipses look very ugly compared to three literal periods. Possibly I'm just thinking of fixed-width fonts, but I don't think so.)

hendrik.maryns wrote:

Doesn’t that contradict the use of those arrows ‘←’?

‘ and ’ aren’t that hard to input: on a qwerty US-int it is simply Right Alt+9 and 0. Ellipsis is another matter.

What does it matter that ‘undifferentiated quotes are standard on the Internet especially’? Of course they are, since the curly quotes are rather new and almost nobody knows that they even exist. But is that an argument against using them? With small type the difference is barely visible, and certainly no-one will get confused when seeing them.

Ellipsis indeed looks ugly in fixed-width fonts, but that’s obviously *because* it is fixed width. It looks rather good in my default FF fonts.

Damn, is it really enough for one guy to shoot this off? Pity.

ayg wrote:

(In reply to comment #2)

> Doesn’t that contradict the use of those arrows ‘←’?

No, because those only occur in a couple of places and aren't going to be necessary in routine message edits to ensure visual consistency.

> ‘ and ’ aren’t that hard to input: on a qwerty US-int it is simply Right
> Alt+9 and 0.

Nope. It varies widely depending on operating system (and window manager, if applicable). In GNOME, on Ubuntu, that doesn't work: I need Ctrl-Shift-u2018/2019. On Windows you'd generally use Alt-0145/0146, numbers from the numpad only, with Num Lock on (or was it off?). I recall Macs have something like what you describe, although I've never used Macs.
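For reference, the two sets of numbers in this comment name the same characters: 2018 and 2019 are hexadecimal Unicode code points, while 145 and 146 are the decimal byte values of those characters in the Windows-1252 code page. A minimal Python sketch that shows the mapping (the function name `describe` is made up for illustration):

```python
import unicodedata


def describe(code_point: int) -> tuple[str, int]:
    """Return the Unicode name and the Windows-1252 byte value
    (as a decimal number) for the given code point."""
    char = chr(code_point)
    # Encoding to cp1252 yields a single byte for these characters.
    byte_value = char.encode("cp1252")[0]
    return unicodedata.name(char), byte_value


if __name__ == "__main__":
    for code_point in (0x2018, 0x2019, 0x2026):
        name, byte_value = describe(code_point)
        print(f"U+{code_point:04X} {name}: Windows-1252 value {byte_value}")
```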

> What does it matter that ‘undifferentiated quotes are standard on the
> Internet especially’? Of course they are, since the curly quotes are rather
> new and almost nobody knows that they even exist. But is that an argument
> against using them?

It's an argument that they're unnecessary. If there were no drawbacks, may as well use them even if they're unnecessary, since they look nice. But as I outlined, there are drawbacks, even if you don't think they're substantial in the face of good typography.

> With small type the difference is barely visible, and
> certainly no-one will get confused when seeing them.

Even harder to keep consistency, then, and even less of an advantage to using them.

> Ellipsis indeed looks ugly in fixed-width fonts, but that’s obviously
> *because* it is fixed width. It looks rather good in my default FF fonts.

Mine too. Maybe I was imagining something.

> Damn, is it really enough for one guy to shoot this off? Pity.

Two guys: I expressed my reservations, and lead developer Brion Vibber resolved the bug INVALID. Web typography and monitor resolutions may eventually improve to the point that this would be nice, but not now.

hendrik.maryns wrote:

(In reply to comment #3)

> > Damn, is it really enough for one guy to shoot this off? Pity.

> Two guys: I expressed my reservations, and lead developer Brion Vibber resolved
> the bug INVALID. Web typography and monitor resolutions may eventually improve
> to the point that this would be nice, but not now.

Sorry, didn’t mean to be rude there. Well, OK then. I hope it won’t take too long; I like things to be correct... Cheers.

Is this something that we should reconsider?

(In reply to comment #5)

> Is this something that we should reconsider?

I don't think so. The bug's resolution could be changed to "wontfix" if "later" is bothering you.

Reopening from LATER and adjusting summary.

(In reply to comment #3)

> Web typography and monitor resolutions may eventually improve
> to the point that this would be nice, but not now.

Five years have passed; it’s 2012 now. I’m pretty sure this would now be acceptable from the accessibility point of view, and I don’t understand the concerns about input difficulties — yes, there are some (not much has changed in this respect), but why would anyone other than the people creating the interface need to input parts of it?

So, do we want to use correct typography in English language messages (and encourage using it in other languages’ ones)? I think that currently the general Unicode‐compatibility of browsers and OSes is good enough to do this.

(The original report also mentioned changes to comments in code, but I think this could be a bad idea — code should be very easily greppable, is still semi‐often displayed in Unicode‐crippled environments like Windows’ cmd.exe, and lacks explicit encoding information, unlike HTML pages.)

(I took special care to include several Unicode characters in this comment. Can you spot them all?)
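One way to answer that question, and more generally to audit a message file or source file for non-ASCII characters, is to list every such character with its Unicode name. The following is a minimal sketch; the function name `non_ascii_report` is hypothetical and the sample string is an invented example, not text taken from this task.

```python
import unicodedata
from collections import Counter


def non_ascii_report(text: str) -> dict[str, int]:
    """Count each non-ASCII character in text, keyed by its
    code point and Unicode name."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return {
        # Fall back to a placeholder for characters without a name.
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}": n
        for ch, n in counts.items()
    }


if __name__ == "__main__":
    sample = "it\u2019s 2012 \u2014 semi\u2010often"
    for label, count in non_ascii_report(sample).items():
        print(f"{label}: {count}")
```

Running such a report over source files would also show where a plain-ASCII search could miss text because a look-alike character was used.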

I'm missing a rationale/use case here.

(In reply to comment #0)

> handled very inconsistently: sometimes they are quoted:
> '\'', sometimes the string is put into double quotes: "'". This makes it
> confusing, and it is inelegant. There is an easy solution though: use the
> Unicode quote sign: ’.

There is an even easier solution: use "" around the string containing the message if there is a ' inside it, and '' if it contains "...
Dashes and hyphens have just been added to the summary and I've no idea why.

Since T126719 was closed in favour of a more general suggestion captured in this ticket, I wanted to share some info and resources about “smart quotes” in case they are useful.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:02 AM
Aklapper removed a subscriber: wikibugs-l-list.