Linktrail for Norwegian (bokmål) (and similar problem at Swedish and Danish)
Open, Needs Triage · Public

Description

The linktrail for "no" should be adjusted.

The existing form is:

$linkTrail = '/^([æøåa-z]+)(.*)$/sDu';

The new form should be:

$linkTrail = '/^([-\'a-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö]+)(.*)$/sDu';

The new form includes characters from neighboring languages. It also adds the hyphen (-) and the apostrophe ('), both of which are used in suffixes in Norwegian.
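
To see what the two character classes accept, here is a minimal Python sketch. It is not MediaWiki code: the patterns are hand-translated from the PCRE forms above (/s becomes re.DOTALL, /D is approximated with \Z, and /u is the default for Python str patterns), and split_trail is a hypothetical helper that only mimics splitting the text that follows ]] into a trail and a remainder.

```python
import re

# Hand-translated from the PCRE patterns in the description (assumption:
# \Z stands in for the /D modifier, re.DOTALL for /s).
OLD_TRAIL = re.compile(r"^([æøåa-z]+)(.*)\Z", re.DOTALL)
NEW_TRAIL = re.compile(
    r"^([-'a-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö]+)(.*)\Z",
    re.DOTALL,
)


def split_trail(pattern, text):
    """Hypothetical helper: return (trail, rest) for the text after ']]'."""
    m = pattern.match(text)
    return (m.group(1), m.group(2)) if m else ("", text)


# [[bil]]ene: plain letters are a trail under both forms.
print(split_trail(OLD_TRAIL, "ene"))
# [[URL]]-en: the old class has no hyphen, so the trail is empty.
print(split_trail(OLD_TRAIL, "-en"))
print(split_trail(NEW_TRAIL, "-en"))
# [[Ap]]'s: the apostrophe is accepted by the new class.
print(split_trail(NEW_TRAIL, "'s"))
# ''[[Vesaas]]''': apostrophe plus closing italics all end up in the trail.
print(split_trail(NEW_TRAIL, "'''"))
```

The last call shows why the apostrophe needs care: with a plain apostrophe in the class, the quotes that close italic or bold markup are pulled into the trail as well.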

It should be verified that a hyphen and a single trailing apostrophe do not cause any problems, as in the following wikitext, all of which is valid Norwegian:

* [[bil]]ene
* [[by]]er
* ''[[Vesaas]]'''
* '''[[Vesaas]]''''
* '''''[[Vesaas]]''''''
* ''[[Ap]]'s''
* '''[[Ap]]'s'''
* '''''[[Ap]]'s'''''
* ''[[URL]]-en''
* '''[[URL]]-en'''
* '''''[[URL]]-en'''''

Note T29473 and T130454; please fix this properly rather than just hacking in a solution.

jeblad created this task. Mar 19 2016, 11:14 AM
Restricted Application added a subscriber: Aklapper. Mar 19 2016, 11:14 AM
Stigmj added a subscriber: Stigmj. Mar 19 2016, 11:40 AM

This should be changed to use a negative lookahead to avoid the problems, as in T29473 for the ca and kaa linktrails.

The new linkTrail should be:

$linkTrail = "/^((?:[-a-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö]|'(?!'))+)(.*)$/sDu";
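
As a rough illustration of what the negative lookahead does, the following Python sketch translates the pattern by hand (re.DOTALL for /s, \Z in place of /D; both are approximations) and runs it on a few trails. The helper split_trail is hypothetical and only sees the text that follows ]].

```python
import re

# Hand-translated from the proposed PCRE pattern: an apostrophe is only
# accepted when it is NOT immediately followed by another apostrophe.
LOOKAHEAD_TRAIL = re.compile(
    r"^((?:[-a-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö]|'(?!'))+)(.*)\Z",
    re.DOTALL,
)


def split_trail(pattern, text):
    """Hypothetical helper: return (trail, rest) for the text after ']]'."""
    m = pattern.match(text)
    return (m.group(1), m.group(2)) if m else ("", text)


# ''[[Ap]]'s'': the genitive stays in the trail, the closing italics do not.
print(split_trail(LOOKAHEAD_TRAIL, "'s''"))
# '''[[URL]]-en''': the suffix is kept, the closing bold is left alone.
print(split_trail(LOOKAHEAD_TRAIL, "-en'''"))
# ''[[Vesaas]]''': the lone apostrophe is followed by quotes, so no trail.
print(split_trail(LOOKAHEAD_TRAIL, "'''"))
# [[Vesaas]]' in unformatted text: the apostrophe is accepted.
print(split_trail(LOOKAHEAD_TRAIL, "'"))
```

In this sketch the genitive apostrophe directly before closing italic or bold quotes is never taken into the trail, because the lookahead cannot tell it apart from the markup.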
jeblad updated the task description. Mar 19 2016, 1:02 PM
jeblad added a comment. Edited · Mar 19 2016, 1:05 PM

Please do not add more hacks; this should be fixed in splitTrail.

The regex should be strictly for language-specific parsing, not wikitext parsing. Those two concerns should be kept separate. Adding wikitext parsing to this regex would introduce a new bug.

jeblad added a comment. Edited · Mar 19 2016, 3:28 PM

Genitive-s with an apostrophe is in use in Norwegian (Språkrådet), Swedish (svwiki, Språktidningen), and Danish (sproget.dk). There are probably more languages, but this should be sufficient reason for adding proper parsing.

A typical problem arises when parsing [[Vesaas]]', which is troublesome with both italic and bold, both when the link itself is wrapped in them and when some following link or text is wrapped in them. If the linkTrail is made as in T29473, then we can't add an apostrophe before any such formatters (they are directionless), and the apostrophe will go outside the link. If splitTrail is made as in T29473 (sort of; it's a bug there), then the apostrophe will go outside the link for italics but not for bold. In all cases the linktrail can be said to fail for fully or partly formatted links.

Note that Swedish also uses the form:

*  [[foo]]:s

which should also be considered, but this only creates a problem in definition lists (lines starting with ;). The following will fail (must be tested!):

;  [[foo]]:s
jeblad added a comment. Edited · Mar 19 2016, 5:34 PM

We can get around the problem partially by using negative lookahead and positive lookbehind, but neither of them solves the whole problem.

$linkTrail = '/^([-a-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö]+|[-a-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö]*(?:(?<=[sxzşŝșšśßžżź])\'|\'s|\'(?!\')))(.*)$/sDu';

This will fail for both s-lookalike-apostrophe-bold and s-lookalike-apostrophe-italics. The important case is s-lookalike-apostrophe-bold, because this case might turn up in the lead-in.
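
The sketch below exercises one balanced reading of the pattern above in Python, in which the three apostrophe alternatives share a single non-capturing group; that grouping is an assumption, as are the flag translations (re.DOTALL for /s, \Z for /D) and the hypothetical split_trail helper, which only sees the text that follows ]].

```python
import re

LETTERS = "-a-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö"
SIBILANTS = "sxzşŝșšśßžżź"

# Assumed grouping: letters+  OR  letters* followed by one of
#   (apostrophe preceded by an s-lookalike | 's | apostrophe not before ')
LOOKBEHIND_TRAIL = re.compile(
    "^([" + LETTERS + "]+|[" + LETTERS + "]*"
    "(?:(?<=[" + SIBILANTS + "])'|'s|'(?!')))" + r"(.*)\Z",
    re.DOTALL,
)


def split_trail(pattern, text):
    """Hypothetical helper: return (trail, rest) for the text after ']]'."""
    m = pattern.match(text)
    return (m.group(1), m.group(2)) if m else ("", text)


# [[Ap]]'s: the genitive is taken as trail.
print(split_trail(LOOKBEHIND_TRAIL, "'s"))
# [[Vesaas]]' in unformatted text: a lone apostrophe is taken as trail.
print(split_trail(LOOKBEHIND_TRAIL, "'"))
# '''[[Vesaas]]'''': apostrophe followed by closing bold, nothing is taken.
print(split_trail(LOOKBEHIND_TRAIL, "''''"))
# A trail of letters followed by an apostrophe stops after the letters,
# because the first alternative is tried first and already succeeds.
print(split_trail(LOOKBEHIND_TRAIL, "s'"))
```

In this reading, the helper cannot see the link target itself, so the lookbehind has nothing to inspect at the start of the trail, and the apostrophe-before-bold case is left outside the link.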

jeblad renamed this task from Linktrail for Norwegian (bokmål) Wikipedia to Linktrail for Norwegian (bokmål) Wikipedia (and similar problem at Swedish and Danish). Mar 19 2016, 5:44 PM
Restricted Application added a subscriber: Josve05a. Mar 19 2016, 5:44 PM

I'm wondering if we should use U+02BC (MODIFIER LETTER APOSTROPHE) for this and not U+0027 (APOSTROPHE), as we would then avoid the whole problem.

The apostrophe (U+0027) would then go outside the link, and the modifier letter apostrophe (U+02BC) would be added to the link text. See also "Which Unicode character should represent the English apostrophe? (And why the Unicode committee is very wrong.)" and "UDHR in Unicode".

jeblad added a comment. Edited · Mar 19 2016, 8:28 PM

Unless anyone has a better idea:

$linkTrail = '/^([‐ʼa-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö]+)(.*)$/sDu';

This uses the hyphen (U+2010) and the modifier letter apostrophe (U+02BC). It should not require any changes to splitTrail.
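
A small Python sketch of how this variant would treat the different characters follows. The pattern is translated by hand (re.DOTALL for /s, \Z for /D; both approximations), the two special characters are written as \u escapes so they are easy to tell apart from their ASCII lookalikes, and split_trail is a hypothetical helper, not the MediaWiki function.

```python
import re

LETTERS = "a-zàáâçčʒǯđðéèêëǧǥȟíìîïıǩŋñóòôõßšŧúùûýÿüžþæøåäö"

# U+2010 HYPHEN and U+02BC MODIFIER LETTER APOSTROPHE are in the class;
# U+002D HYPHEN-MINUS and U+0027 APOSTROPHE are not.
MODIFIER_TRAIL = re.compile(
    "^([\u2010\u02bc" + LETTERS + "]+)" + r"(.*)\Z",
    re.DOTALL,
)


def split_trail(pattern, text):
    """Hypothetical helper: return (trail, rest) for the text after ']]'."""
    m = pattern.match(text)
    return (m.group(1), m.group(2)) if m else ("", text)


# Genitive written with U+02BC inside italics: trail kept, markup untouched.
print(split_trail(MODIFIER_TRAIL, "\u02bcs''"))
# Suffix written with U+2010: accepted.
print(split_trail(MODIFIER_TRAIL, "\u2010en"))
# The same suffixes typed with the ASCII characters: no trail at all.
print(split_trail(MODIFIER_TRAIL, "'s"))
print(split_trail(MODIFIER_TRAIL, "-en"))
```

The last two calls show the trade-off: text typed with the ordinary keyboard apostrophe or hyphen-minus falls outside the link under this pattern.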

Krenair renamed this task from Linktrail for Norwegian (bokmål) Wikipedia (and similar problem at Swedish and Danish) to Linktrail for Norwegian (bokmål) (and similar problem at Swedish and Danish). Mar 19 2016, 8:31 PM
Krenair removed a project: MediaWiki-Sites.
Krenair updated the task description.
Krenair removed a subscriber: MediaWiki-Sites.

That doesn't seem to solve the practical problem at nowiki unless you also propose to replace all of the apostrophes in use at nowiki for the correct use cases. It would also be nearly impossible to get users to actually use these characters correctly unless they are readily available on their keyboards, in both the desktop and mobile versions.

Is this something for the Wikimedia Language Engineering team to look at?

No, it does not solve what's actually on-wiki; it is a description of how we solve this in the configuration. This is a task for one piece of work; it would then build on other tasks to solve the overall problem.

Restricted Application added a subscriber: jhsoby. Mar 31 2017, 2:00 PM
Liuxinyu970226 added a subscriber: Liuxinyu970226. Edited · Mar 31 2017, 2:05 PM

I'm afraid that this issue is unlikely to be resolved in the next several decades. ISO, CLDR, and even IETF have already *CLEARLY* defined that no is the Norwegian macrolanguage, which includes two languages that can both be called "Norwegian", and at least https://duckduckgo.com already avoids this. So why do a number of Norwegian users still claim that no=nb? Why? Why? Why?

Not sure what Liuxinyu970226 is saying in T130451#3146619, … Both Nynorsk and Bokmål are Norwegian and are standardized by the Norwegian Language Council, and when it comes to characters and other glyphs, both forms of Norwegian are equal.

Restricted Application added a subscriber: Danmichaelo. Oct 30 2017, 6:51 PM