<nowiki> tags should be applied to the minimal rather than maximal content (or well, a compromise thereof..)
Closed, ResolvedPublic

Description

Right now the following HTML entered by a user:

<p>Hello my [[name]] is Julian and I live in the city of Rochester, New York with my friend Sandy; we write [[books]].</p>

… turns into:

<p>Hello my <nowiki>[[name]] is Julian and I live in the city of Rochester, New York with my friend Sandy; we write [[books]].</nowiki></p>

… whereas ideally it should be minimal rather than maximal:

<p>Hello my <nowiki>[[</nowiki>name<nowiki>]]</nowiki> is Julian and I live in the city of Rochester, New York with my friend Sandy; we write <nowiki>[[</nowiki>books<nowiki>]]</nowiki>.</p>

This would significantly reduce issues with users accidentally inputting wikitext. We're taking actions in VisualEditor-land to discourage such input, but if Parsoid could help it would be hugely useful.


Version: unspecified
Severity: normal

bzimport added a project: Parsoid.Via ConduitNov 22 2014, 1:50 AM
bzimport set Reference to bz50841.
Jdforrester-WMF created this task.Via LegacyJul 5 2013, 10:29 PM
ssastry added a comment.Via ConduitJul 5 2013, 10:41 PM

Let us discuss more on irc monday, but the general problem of escaping minimal contexts can complicate the algorithm. That said, we could implement some special cases maybe.

Also, a lot of nowiki-tags (as in this example) looks ugly to me (personally), and could potentially induce greater grumbling among editors. So, my suggestion is to minimize incidences of nowiki contexts via VE (however you do it) first, and then implement a version of this proposal in Parsoid after that.

Yet another (not fully thought out) alternative is to convert wikitext chars into HTML entities which eliminates nowikis altogether. We currently use this entity-based solution when we have to escape pipes in url/ext. links in template arg contexts and can just use that more broadly.

liangent added a comment.Via ConduitJul 5 2013, 10:46 PM

I forgot whether I pointed this out or not: applying <nowiki> on "normal" text disabled language conversion on it too; we may want to <nowiki> wikitext meta characters only.

ssastry added a comment.Via ConduitJul 5 2013, 10:51 PM

Actually, another way is to use the existing algorithm to detect contexts that would have been wrapped in <nowiki>..</nowiki> and apply nowiki-escaping to all possible wikitext chars in there (which is safe, but excessive). This will not complicate the algorithm anymore, but will also introduce more nowiki tags than strictly necessary.

Dragons_flight added a comment.Via ConduitJul 5 2013, 10:55 PM

It should be noted, that at present VE aggressively grabs normal text on either side of the offending elements as well. For example, using VE to enter:

  • I am the very model of a modern [[Major-General]], I've information vegetable, animal, and mineral, I know the kings of England, and I quote the fights historical

Becomes

  • <nowiki>I am the very model of a modern [[Major-General]], I've information vegetable, animal, and mineral, I know the kings of England, and I quote the fights historical</nowiki>

Even though the plain text at either end of the wikilink could reasonably be ignored without increasing the number of nowikis. Since VE can not currently handle nowiki elements, that also means that the entire sentence becomes uneditable in VE after saving.

GWicke added a comment.Via ConduitJul 9 2013, 3:22 AM

Maybe we could still use a single wrapper but exclude unsuspicious leading / trailing text from the nowiki block. I agree with Subbu that escaping each run of dangerous characters individually would often result in too much nowiki noise.

Re language conversion:
How often is nowiki used to suppress language conversion? Could -{ }- or some other syntax be used instead?

ssastry added a comment.Via ConduitJul 9 2013, 4:11 AM

Actually, why not use entities for escaping? That will eliminate nowikis, make them editable, and display as text.

[[Foo]] ==> &#91;&#91;Foo&#93;&#93;

and the like. Am I missing something that makes this problematic?

liangent added a comment.Via ConduitJul 9 2013, 4:43 AM

(In reply to comment #5)

Re language conversion:
How often is nowiki used to suppress language conversion? Could -{ }- or some
other syntax be used instead?

It's more like some side effect...

GWicke added a comment.Via ConduitJul 9 2013, 4:51 PM

(In reply to comment #6)

Actually, why not use entities for escaping? That will eliminate nowikis,
make
them editable, and display as text.

[[Foo]] ==> &#91;&#91;Foo&#93;&#93;

and the like. Am I missing something that makes this problematic?

Numeric entities are very hard for users to decipher. While unwanted nowiki tags can simply be removed, fixing up unwanted entity escapes is much more work.

(In reply to comment #7)

(In reply to comment #5)
> Re language conversion:
> How often is nowiki used to suppress language conversion? Could -{ }- or some
> other syntax be used instead?

It's more like some side effect...

For us it would be helpful if we could separate those two aspects. If little existing content relies on language conversion being disabled in nowiki, then that might actually be possible.

ssastry added a comment.Via ConduitJul 9 2013, 4:52 PM

Gabriel and I discussed the entity solution on IRC, and Gabriel made a good observation that fixing those in 'edit source' will become much harder compared to nowiki tags. So, entity-based escaping makes fixups harder using the source editor, but makes it easier with VE. With nowiki, it is the opposite. But, presumably the noneditability of nowiki sections in VE is temporary.

I'll tackle this with nowikis today by only changing how identified text blocks are escaped without trying to optimize placement itself (Ex:, in reality it is sufficient to only escape "[[" or "]]" in [[Foo]], but that requires more global knowledge which will complicate this).

We'll try to wrap closely occurring wikitext chars in a single nowiki block. So, "[[foo]]" will be wrapped as <nowiki>[[foo]]</nowiki>. "Closely occurring" will be heuristic based, say within X chars.

liangent added a comment.Via ConduitJul 9 2013, 5:18 PM

(In reply to comment #8)

For us it would be helpful if we could separate those two aspects. If little
existing content relies on language conversion being disabled in nowiki, then
that might actually be possible.

There shouldn't be much. It's the case just because in Parser.php,

$text = $this->mStripState->unstripNoWiki( $text );

is placed after

$text = $this->getConverterLanguage()->convert( $text );

but I'm not 100% sure this is just something accidental (or placed in this order delibrately).

liangent added a comment.Via ConduitJul 9 2013, 5:20 PM

(In reply to comment #10)

(In reply to comment #8)
> For us it would be helpful if we could separate those two aspects. If little
> existing content relies on language conversion being disabled in nowiki, then
> that might actually be possible.

There shouldn't be much. It's the case just because in Parser.php,

$text = $this->mStripState->unstripNoWiki( $text );

is placed after

$text = $this->getConverterLanguage()->convert( $text );

but I'm not 100% sure this is just something accidental (or placed in this
order delibrately).

Oh I got it. We want to have <nowiki>-{ }-</nowiki> work, so we can't place -{ }- back until all other -{ }- markups have been processed, but this also mean other text in nowiki misses the chance to be converted...

GWicke added a comment.Via ConduitJul 9 2013, 6:37 PM

(In reply to comment #11)

Oh I got it. We want to have <nowiki>-{ }-</nowiki> work, so we can't place
-{
}- back until all other -{ }- markups have been processed, but this also mean
other text in nowiki misses the chance to be converted...

I guess -{ and }- in nowiki could be escaped before applying language conversion.

GWicke added a comment.Via ConduitJul 9 2013, 6:38 PM

Possibly interesting nowiki watch list to monitor nowiki application in live edits:
https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&offset=&limit=500&wpSearchFilter=550

gerritbot added a comment.Via ConduitJul 9 2013, 11:10 PM

Change 72858 had a related patch set uploaded by Subramanya Sastry:
(Bug 50841): First pass reducing scope of nowiki tags

https://gerrit.wikimedia.org/r/72858

gerritbot added a comment.Via ConduitJul 10 2013, 3:28 AM

Change 72858 merged by jenkins-bot:
(Bug 50841): First pass reducing scope of nowiki tags

https://gerrit.wikimedia.org/r/72858

gerritbot added a comment.Via ConduitJul 17 2013, 3:51 AM

Change 74110 had a related patch set uploaded by Subramanya Sastry:
Take #2: (Bug 50841) Reduce scope of nowiki tags

https://gerrit.wikimedia.org/r/74110

gerritbot added a comment.Via ConduitJul 18 2013, 9:04 PM

Change 74110 merged by jenkins-bot:
Take #2: (Bug 50841) Reduce scope of nowiki tags

https://gerrit.wikimedia.org/r/74110

GWicke added a comment.Via ConduitJul 18 2013, 9:10 PM

This will likely be deployed later today.

ssastry added a comment.Via ConduitJul 18 2013, 9:15 PM

There are still some scenarios where nowiki wrapping will be applied to a longer string than required. I haven't tried to optimize it all, and if there are still issues, I can revisit it.

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.