«) converted to non-breaking space (&nbsp;) (French spaces)
Open, Medium, Public

Assigned To

None

Authored By

	• bzimport
	Jan 23 2008, 4:10 AM

Description

Author: x00000000

Description:
A space before "»" (&raquo; - right-pointing double angle quotation mark) or a space after "«" (&laquo; - left-pointing double angle quotation mark) will be converted to a no-break space (&nbsp;).

This may be appropriate for most French text, but it breaks line wrapping in languages where guillemets are used in the opposite order (»quote« instead of «quote» or « quote »). Compare http://en.wikipedia.org/wiki/Guillemets .

Details

Reference: bz12752

Related Objects

		Status	Subtype	Assigned	Task
		Open		None	T31145 Core features that shouldn't be in core (tracking)
		Open		None	T14752 Space before/after »guillemets« (»/«) converted to non-breaking space (&nbsp;) (French spaces)

Event Timeline

• bzimport raised the priority of this task from to Low. Nov 21 2014, 10:02 PM

• bzimport added a project: MediaWiki-Parser.

• bzimport set Reference to bz12752.

• bzimport added a subscriber: Unknown Object (MLST).

• bzimport created this task. Jan 23 2008, 4:10 AM

Agreed; for example, the use of guillemets on the Czech Wikisource is quite problematic because of this. The conversion should be applied only if the content language is French. Or, more generally, we should probably have per-language rules. See bug #13619.

See also bug #3158.

x00000000 wrote:

Workaround is to write something like text&#32;»quote«&#32;text.
MediaWiki doesn't recognize &#32; as a space at the point where it replaces them with &nbsp;s.

Sounds like checking for word breaks should do the job reasonably well here.

Eg:

...quoted » outside
\s»\W -> break

outside »quoted...
\s»\w -> no break

As long as nobody uses this form:
outside » quoted...

in which case it would be much more difficult to distinguish which side the non-break space belongs on, requiring heuristics to try to see where the quote was started.
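The word-break check described above can be sketched in Python as follows. This is only an illustration, not MediaWiki code: the function name is made up, a literal space is used instead of `\s` so that newlines are left alone, and "break" in the rules above is read as "insert a no-break space".

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def armor_closing_guillemet(text: str) -> str:
    """Hypothetical helper: armor the space before » only when » looks like a closing mark.

    » followed by a non-word character (or the end of the text) is treated as
    closing a French-style quote; » followed by a word character is treated as
    opening a »quote« and is left untouched.
    """
    # The lookahead inspects the character after » without consuming it.
    return re.sub(r" »(?=\W|$)", NBSP + "»", text)


french = armor_closing_guillemet("...quoted » outside")    # space is armored
german = armor_closing_guillemet("outside »quoted...")     # left unchanged
ambiguous = armor_closing_guillemet("outside » quoted...")  # armored, rightly or not
```

The last call shows the ambiguous form mentioned above: the heuristic cannot tell which side the no-break space belongs on and simply treats » as a closing mark.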

x00000000 wrote:

Would be better than now, assuming "break" means "nbsp" (i.e. "no break").

But it won't work for cases like "the sign »,« is a comma", citations starting/ending with an ellipsis or other punctuation (like »... text ...« or »[…] text!«) or Spanish-style »¿uh?« (but guillemets aren't common in Spanish).

And it doesn't work for most languages if the replacement operates on bytes instead of chars, like the code snippet in bug 13619 comment 3 suggests. The \w needs to match the appropriate Unicode classes.
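The bytes-versus-characters point can be illustrated with Python's `re` module, which behaves differently for `str` and `bytes` patterns. This is a sketch of the general problem only; it is assumed, not verified here, that a byte-oriented PHP/PCRE pattern without Unicode support behaves analogously.

```python
import re

word = "»čaj«"             # Czech word in »...« quotes
raw = word.encode("utf-8")  # » becomes C2 BB, č becomes C4 8D

# On str patterns, \w is Unicode-aware by default, so "č" is a word character.
str_match = re.search(r"»\w", word)

# On bytes patterns, \w matches only ASCII [a-zA-Z0-9_]; the first byte of
# "č" (0xC4) is not a word character, so the same rule fails to match.
bytes_match = re.search(rb"\xc2\xbb\w", raw)

print(str_match is not None, bytes_match is None)
```

A byte-oriented rule would therefore misclassify »čaj« as having a non-word character after », which is exactly the case the heuristic needs to get right for non-ASCII languages.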

BTW, I don't think these simple &nbsp; heuristics are useful at all. E.g., they cause code like <code>x = flag ? 0 : 1;</code> to be unusable after copying and break valid CSS like <span style="color : red ; background : yellow"/>.
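To make the breakage concrete, here is a small Python sketch. It assumes the rule in question is roughly the pattern quoted later in this thread (a space before `?`, `:`, `;`, `!`, `%` or » becomes `&nbsp;`) and that the text is a decoded string; the function name is hypothetical.

```python
import re


def armor_before_punctuation(text: str) -> str:
    """Hypothetical port: replace a space before ? : ; ! % » with &nbsp;."""
    return re.sub(r"(.) (?=[?:;!%»])", r"\1&nbsp;", text)


code = armor_before_punctuation("x = flag ? 0 : 1;")
css = armor_before_punctuation("color : red ; background : yellow")

print(code)  # x = flag&nbsp;? 0&nbsp;: 1;
print(css)   # color&nbsp;: red&nbsp;; background&nbsp;: yellow
```

Neither result is French prose, yet both pick up no-break spaces that survive copy and paste.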

x00000000 wrote:

This should fix most occurrences in French without breaking much elsewhere:

s/((?:[\s(]|^)«) /$1&nbsp;/
s/ »(?=\.?\)|[.,]?(?:\s|<ref[\s>]|$))/&nbsp;»/

Should also work with raw UTF-8 bytes if « and » are written as \302\253 and \302\273.
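A minimal Python port of the two substitutions above, for trying them on sample strings. It is a sketch under assumptions: the input is a decoded Unicode string rather than raw UTF-8 bytes, every match is replaced rather than only the first, and the name `armor_french_quotes` is invented for illustration.

```python
import re

# « preceded by whitespace, "(" or the start of the text, then a space.
OPENING = re.compile(r"((?:[\s(]|^)«) ")
# Space before » when » is followed by ")", ".)", optional "." or "," plus
# whitespace, a <ref> tag, or the end of the text.
CLOSING = re.compile(r" »(?=\.?\)|[.,]?(?:\s|<ref[\s>]|$))")


def armor_french_quotes(text: str) -> str:
    text = OPENING.sub(r"\1&nbsp;", text)
    return CLOSING.sub("&nbsp;»", text)


print(armor_french_quotes("Il dit « bonjour » à tous."))
print(armor_french_quotes("Er sagte »hallo« und ging."))
```

The French sentence gets both spaces armored, while the »hallo« sentence passes through unchanged because neither guillemet sits in a French-style position.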

BTW, the current code seems to have a bug:

'/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&nbsp;\\2'

should be either

'/(.) (\\?|:|;|!|%|\\302\\273)/' => '\\1&nbsp;\\2'

or

'/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&nbsp;'
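The two corrected forms can be compared in Python. This is a sketch on decoded strings, so » is written directly instead of as `\302\273`, and both function names are hypothetical. With the lookahead there is no second capture group, so the replacement needs only `\1`.

```python
import re


def armor_capturing(text: str) -> str:
    # Variant 1: capture the punctuation and put it back with \2.
    return re.sub(r"(.) (\?|:|;|!|%|»)", r"\1&nbsp;\2", text)


def armor_lookahead(text: str) -> str:
    # Variant 2: the lookahead never consumes the punctuation.
    return re.sub(r"(.) (?=\?|:|;|!|%|»)", r"\1&nbsp;", text)


sample = "Quoi ? Oui : 50 %"
print(armor_capturing(sample) == armor_lookahead(sample))  # True

# The variants differ when two marks are separated by a single space:
print(armor_capturing("« ici » !"))  # « ici&nbsp;» !
print(armor_lookahead("« ici » !"))  # « ici&nbsp;»&nbsp;!
```

In the capturing variant the consumed » cannot serve as the `(.)` of the next match, so the space before `!` is missed; the lookahead variant catches both.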

x00000000 wrote:

I missed the common cases ''« text »'' vs »''text''«, and <ref/>s seem to be already expanded at that stage (by looking at the code; I have no MediaWiki installation to test):

s/((?:[\s(]|<[a-zA-Z]+>|^)«) /$1&nbsp;/
s/ »(?=\.?\)|[.,]?(?:\s|<(?:\/|sup[\s>])|$))/&nbsp;»/

This also handles <blockquote>« citation »</blockquote> and similar cases (a line break isn't likely to occur at the beginning of a block element, but it makes a difference with text-align:justify in Unicode-compliant browsers). It doesn't handle start tags with attributes like <span style="...">« text »</span> because that would be very expensive if done properly.
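The revised pair of substitutions can be exercised the same way. The Python sketch below assumes decoded strings, replaces every match, and assumes that italics have already been expanded to `<i>` tags at this stage; the function name is hypothetical.

```python
import re

# « may now also follow a start tag without attributes, such as <i>.
OPENING = re.compile(r"((?:[\s(]|<[a-zA-Z]+>|^)«) ")
# » may now also be followed by any end tag or by a <sup ...> element.
CLOSING = re.compile(r" »(?=\.?\)|[.,]?(?:\s|<(?:/|sup[\s>])|$))")


def armor_french_quotes_v2(text: str) -> str:
    text = OPENING.sub(r"\1&nbsp;", text)
    return CLOSING.sub("&nbsp;»", text)


print(armor_french_quotes_v2("<i>« texte »</i>"))
print(armor_french_quotes_v2("Er sagte »<i>Text</i>« dazu."))
print(armor_french_quotes_v2('<span style="x">« text »</span>'))
```

The last call shows the limitation described above: the start tag has an attribute, so only the space before » is armored.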

The better solution would be a configuration switch to apply these substitutions only for languages where they make sense. The only one of the current substitutions that makes some sense in most languages is s/ %/&nbsp;%/ (but it still destroys <code>x = y % z</code>).

Quoting @Ankry from T99034:

Present MediaWiki (1.26wmf5) parser replaces ' »' with '&#160;»'. It is unintended behaviour for plwikisource as both types of quoting: »this one« and «this one» are used in Polish language texts, the first being even preferred. Preventing soft line breaking before '»' sign is not correct for Polish texts. How can it be disabled for plwikisource?

Test page for this behaviour: https://pl.wikisource.org/wiki/Wikiskryba:Zdzislaw/brudnopis/test3

Wieralee subscribed. May 14 2015, 10:04 PM

Is this still an issue? It definitely needs more detail to survive in modern times.

Still an issue, see the previous comment here for a more detailed summary.

jayvdb subscribed. Nov 29 2015, 11:44 PM

Danny_B subscribed. Jul 4 2016, 12:47 AM

Seems like all other languages suffer from this magic for French (no offense intended). As French is presumably the only language which needs that, this feature should definitely be removed by default.

Either have a config variable to turn such behavior on, or create an Extension:Guillemets that handles it on wikis where it is installed.
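The opt-in idea could look roughly like the Python sketch below. Everything in it is hypothetical: the setting name, the function name and the deliberately simplified substitutions are placeholders, not an existing MediaWiki option or API.

```python
import re

# Hypothetical setting: content-language codes that opt in to the armoring.
FRENCH_SPACE_LANGUAGES = frozenset({"fr"})


def armor_if_enabled(text: str, content_language: str) -> str:
    """Apply the guillemet armoring only for opted-in content languages."""
    if content_language not in FRENCH_SPACE_LANGUAGES:
        return text  # other languages keep ordinary, breakable spaces
    text = re.sub(r"(«) ", r"\1&nbsp;", text)
    return re.sub(r" (»)", r"&nbsp;\1", text)


print(armor_if_enabled("« citation »", "fr"))         # «&nbsp;citation&nbsp;»
print(armor_if_enabled("»cytat« i « cytat »", "pl"))  # unchanged
```

With a default of no opted-in languages, wikis such as plwikisource would see no conversion unless they enabled it.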

Danny_B added a parent task: T31145: Core features that shouldn't be in core (tracking). Jul 4 2016, 12:54 AM

Legoktm mentioned this in T134103: Create 'Technical-Tool-Request' project. Sep 10 2016, 6:03 AM

Harej moved this task from Incoming to Confirmed Extension Requests on the MediaWiki-extension-requests board. Jan 29 2018, 9:47 PM

JAnD subscribed. May 6 2019, 9:41 AM

I suspect I fixed this in T197902: Be more selective in applying French Space armoring; »quote« shouldn't add &nbsp; anymore.

cscott mentioned this in T197879: Fix mw:DisplaySpace to match PHP "armorFrenchSpaces". Nov 12 2019, 3:58 PM

Indeed looks fixed, the three test cases in the example page I linked earlier all behave the same now:

In T14752#1285359, @matmarex wrote:

Quoting @Ankry from T99034:

Present MediaWiki (1.26wmf5) parser replaces ' »' with '&#160;»'. It is unintended behaviour for plwikisource as both types of quoting: »this one« and «this one» are used in Polish language texts, the first being even preferred. Preventing soft line breaking before '»' sign is not correct for Polish texts. How can it be disabled for plwikisource?

Test page for this behaviour: https://pl.wikisource.org/wiki/Wikiskryba:Zdzislaw/brudnopis/test3