
Sometimes user-inserted ISBNs don't get converted to magic links somehow, meaning Parsoid has to nowiki them…
Closed, Resolved | Public | 8 Estimated Story Points

Event Timeline

Bgwhite raised the priority of this task from to Needs Triage.
Bgwhite updated the task description. (Show Details)
Bgwhite added a project: VisualEditor.
Bgwhite added subscribers: Bgwhite, Magioladitis, NicoV.
Jdforrester-WMF renamed this task from Putting nowiki tags around isbn numbers to Sometimes user-inserted ISBNs don't get converted to magic links somehow, meaning Parsoid has to nowiki them…. Nov 3 2015, 8:13 PM
Jdforrester-WMF triaged this task as Medium priority.
Jdforrester-WMF set Security to None.
Jdforrester-WMF moved this task from To Triage to TR1: Releases on the VisualEditor board.
Jdforrester-WMF moved this task from TR1: Releases to TR0: Interrupt on the VisualEditor board.
Jdforrester-WMF added a subscriber: cscott.

I took a look at the edits; it's hard to say what was going wrong there. There were a bunch of strange line breaks in the content as well. Perhaps this was a copy/paste from some other source, and that didn't trigger auto-link-ification?

@NicoV I understand you're finding these with an automated tool, but they don't tell us how the users are actually creating these. We need to know what sequence of user actions causes the autolink filter to be bypassed before we can fix that edge case.

@cscott
I'm only using the edit filter that detects the addition of nowiki tags, no automated tool at all to find them.
I will repeat what I have already said so many times: I'm not working for WMF, I'm not the one who decided to deploy tools too early on a large scale without properly testing them, I'm a simple volunteer and I already spend too much time reporting so many problems created by tools deployed without being properly tested (like Content Translation, which was deployed about 6 months ago and is still creating more damaged articles than clean articles).
So no, I'm not going to hunt down the users to understand what they did to produce the problem: I gave you an easy way to find problematic edits, so why can't the WMF contact these users to understand what they did?
Especially given that often, I don't even get an answer to the problems I'm reporting (see for example T110826), I'm not going to spend my time doing work that is obviously WMF's responsibility.

@cscott
As Shakespeare said, "Don't shoot the messenger"

As VE is the default for new users on de.wikipedia, you will find this error often if you search for pages titled "Entwurf" in the user namespace on the German Wikipedia. These new users obviously know neither that an ISBN can be nowikied nor that an ISBN is supposed to always be autolinked.

(And yes, the Content Translation tool can be improved: why offer third-party machine translation when there is no MT available for en -> de? Why not offer Wiktionary information, even though de.wikt and en.wikt do offer useful resources? Categories only work if the same structure is used in both projects, and there is no obvious way to add adequate cats. You can translate or delete paragraphs, but how do you add new ones? And where does a new user's translation from de to en end up: in the draft or the main namespace?)

One possible way to get an unlinked ISBN by normal editing is to enter the ISBN and then navigate the cursor away, e.g. click somewhere else. As in most cases the ISBN is at the end of your insertion, and you probably pasted it instead of typing it in, it is quite common not to enter another character after the ISBN, which means you don't trigger the automatic link.

Esanders added a subscriber: Xqt.

and you probably pasted it instead of typing it in

ISBNs are linked instantly on paste.

To reproduce this behaviour of VisualEditor, which always encloses an ISBN in <nowiki> tags when you enter it without a following space or line feed, try the following steps:

  • edit a page with VisualEditor
  • add any ISBN, but don't copy and paste it and don't finish with a space or line feed (brackets and dots don't trigger ISBN parsing either)!
  • click "Save changes"
  • click "Review your changes" to verify your edit and save it

You'll find the ISBN number enclosed in <nowiki> tags, like <nowiki>ISBN 978-3-9815841-5-8</nowiki>
See [1] for a sample

On de-wiki there is an abuse filter [2] now. Please refer to it for the problematic edits.

The tag filter says this edit is made by VisualEditor [3]

[1] https://de.wikipedia.org/w/index.php?title=Benutzer%3AXqt2FTest&type=revision&diff=158437087&oldid=157652172
[2] https://de.wikipedia.org/w/index.php?title=Spezial:Missbrauchsfilter-Logbuch&wpSearchFilter=236
[3] https://de.wikipedia.org/w/index.php?title=Spezial:Letzte_%C3%84nderungen&tagfilter=ISBN

That is correct behaviour. We can't autolink until they hit space as we don't know where the ISBN ends. ISBNs will almost always be copied and pasted.
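The ambiguity described above can be illustrated with a small sketch. The helper `looksLikeCompleteIsbn` is hypothetical and is not VisualEditor's actual matcher; it assumes an ISBN "looks complete" once it has 10 digits (the last may be X) or 13 digits. While a 13-digit ISBN is being typed, it already passes that test once after the tenth digit, so the editor cannot tell from the digits alone whether the user has finished.

```javascript
// Hypothetical helper for illustration only, not VisualEditor code.
// Treats "ISBN " followed by 10 digits (last may be X) or 13 digits as complete-looking.
function looksLikeCompleteIsbn(text) {
  const match = /^ISBN ([0-9][0-9 -]*[0-9Xx])$/.exec(text);
  if (!match) {
    return false;
  }
  // Ignore the optional spaces and hyphens between digit groups.
  const digits = match[1].replace(/[ -]/g, '');
  return /^[0-9]{9}[0-9Xx]$/.test(digits) || /^[0-9]{13}$/.test(digits);
}

const typedSoFar = 'ISBN 978-3-9815841-5-8';

// Collect every keystroke position at which the text already looks complete.
const completeAt = [];
for (let i = 1; i <= typedSoFar.length; i++) {
  const prefix = typedSoFar.slice(0, i);
  if (looksLikeCompleteIsbn(prefix)) {
    completeAt.push(prefix);
  }
}

console.log(completeAt);
```

Under these assumptions the list contains two entries: the 10-digit prefix and the full 13-digit ISBN. Linking at the first one would be premature, which is why some terminating event is needed.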

No. Either VisualEditor alone gets it right and hundreds [1] of human editors are wrong, or vice versa. I guess the latter.

[1] https://en.wikipedia.org/w/index.php?title=Special:Search&profile=default&fulltext=Search&search=insource%3A%2Fnowiki%5C%3EISBN+%5B0-9%5D%2F&searchToken=elwtrp45fc5er9qqbbl9mpj3z

https://www.mediawiki.org/wiki/Requests_for_comment/Future_of_magic_links is relevant to this conversation. If the various wikis embrace this proposal, this will cease to be a problem as far as I can tell. Looks like enwiki is beginning this transition already.

The magic link RFC won't directly affect this task; if anything, removing explicit auto links from wikitext will make VE's auto-link behavior even more important.

Note that Google Docs has exactly the same "must press space or return after a typed link" behavior that VE does. We could potentially consider having cursor movement (click away, defocus, etc.) trigger the auto-link, though.

The magic link RFC won't directly affect this task;

So yes, there are 2 parts to this ticket here: (a) VE doesn't autolink ISBNs (b) Parsoid nowikis them.

The RFC affects (b), not (a).

How would https://phabricator.wikimedia.org/T1084 affect this task?
EDIT: to answer myself, it wouldn't.

To reproduce this behaviour of VisualEditor, which always encloses an ISBN in <nowiki> tags when you enter it without a following space or line feed, try the following steps:

  • edit a page with VisualEditor
  • add any ISBN, but don't copy and paste it and don't finish with a space or line feed (brackets and dots don't trigger ISBN parsing either)!
  • click "Save changes"
  • click "Review your changes" to verify your edit and save it

You'll find the ISBN number enclosed in <nowiki> tags, like <nowiki>ISBN 978-3-9815841-5-8</nowiki>

More generally: if you type the full ISBN number, and then click away before typing anything else, it won't get autolinked.

The same thing can happen with a URL like http://example.com/.

I think we need to run the sequence matcher not just after typing, but also after a selection is changed (at the position of previous selection). I'll look into this.
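A toy model of that idea might look like the following. Everything here (`ToySurface`, `isbnEndingAt`, the `links` array) is a hypothetical stand-in and not a VisualEditor API. The assumption is that a match is checked at the previous cursor position both when a typed character cannot continue the ISBN and when the cursor moves away.

```javascript
// Toy model only: names are hypothetical, not VisualEditor APIs.
// Returns the ISBN text ending exactly at `offset`, or null if there is none.
function isbnEndingAt(text, offset) {
  const match = /ISBN ([0-9][0-9 -]*[0-9Xx])$/.exec(text.slice(0, offset));
  if (!match) {
    return null;
  }
  const digits = match[1].replace(/[ -]/g, '');
  return digits.length === 10 || digits.length === 13 ? match[0] : null;
}

class ToySurface {
  constructor() {
    this.text = '';
    this.cursor = 0;
    this.links = []; // ISBN strings that were autolinked
  }

  // Insert characters at the cursor, one keystroke at a time.
  type(chars) {
    for (const ch of chars) {
      const before = this.cursor;
      this.text = this.text.slice(0, before) + ch + this.text.slice(before);
      this.cursor = before + 1;
      // Digits, X and hyphens may continue an ISBN, so they keep the sequence open.
      if (!/[0-9Xx-]/.test(ch)) {
        this.checkAt(before);
      }
    }
  }

  // Moving the cursor also checks for a match at the position it left.
  moveCursor(offset) {
    const before = this.cursor;
    this.cursor = offset;
    if (offset !== before) {
      this.checkAt(before);
    }
  }

  checkAt(offset) {
    const isbn = isbnEndingAt(this.text, offset);
    if (isbn && !this.links.includes(isbn)) {
      this.links.push(isbn);
    }
  }
}

const surface = new ToySurface();
surface.type('ISBN 978-3-9815841-5-8');
const linkedAfterTyping = surface.links.length;
surface.moveCursor(0); // simulate clicking elsewhere
const linkedAfterClickAway = surface.links.slice();
console.log(linkedAfterTyping, linkedAfterClickAway);
```

In this sketch nothing is linked while the ISBN is still being typed, and the link appears once the cursor moves away, with no trailing space required.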

Change 341287 had a related patch set uploaded (by matmarex):
[VisualEditor/VisualEditor] Allow variable-length sequences without a fake space terminator, use for autolinking

https://gerrit.wikimedia.org/r/341287

Change 341288 had a related patch set uploaded (by matmarex):
[mediawiki/extensions/VisualEditor] ve.ui.MWLinkAction: Use delayed sequence

https://gerrit.wikimedia.org/r/341288

Change 341291 had a related patch set uploaded (by matmarex):
[mediawiki/extensions/VisualEditor] ve.ui.MWLinkAction: Improve ISBN and RFC/PMID autolinking

https://gerrit.wikimedia.org/r/341291

I noticed another case that could lead to this: converting a pasted ISBN into a magic link currently relies on the MW API to do the conversion (the same system we use to convert pasted wikitext like [[foo]] into content to avoid nowikifying it). Since we already implement handling of ISBNs, this is wasteful, and the user temporarily losing their internet connection while editing could lead to a failed conversion and nowikifying. (Realistically, I don't think this happens often, and the previous problem is probably the cause of most nowikied ISBNs, but this is also possible.)
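As a rough sketch of the local approach, pasted text could be classified on the client before deciding whether a round trip to the API is needed. The function name `classifyPastedText` and the shape of its return value are assumptions made for illustration; they do not describe the actual ve.ui.MWWikitextStringTransferHandler code.

```javascript
// Hypothetical sketch: decide locally whether pasted text is a magic link.
// The return shape ({ kind, type, target, needsApi }) is an assumption.
function classifyPastedText(text) {
  const trimmed = text.trim();

  // ISBN: 10 digits (last may be X) or 13 digits, with optional spaces or hyphens.
  const isbn = /^ISBN ([0-9][0-9 -]*[0-9Xx])$/.exec(trimmed);
  if (isbn) {
    const digits = isbn[1].replace(/[ -]/g, '');
    if (digits.length === 10 || digits.length === 13) {
      return { kind: 'magic-link', type: 'ISBN', target: digits.toUpperCase(), needsApi: false };
    }
  }

  // RFC and PMID magic links are a keyword followed by a number.
  const numbered = /^(RFC|PMID) ([0-9]+)$/.exec(trimmed);
  if (numbered) {
    return { kind: 'magic-link', type: numbered[1], target: numbered[2], needsApi: false };
  }

  // Anything else (for example [[foo]]) would still need server-side conversion.
  return { kind: 'wikitext', needsApi: true };
}

console.log(classifyPastedText('ISBN 978-3-642-14564-3'));
console.log(classifyPastedText('[[foo]]'));
```

With this split, losing the network connection would only affect pasted wikitext that genuinely needs parsing, not plain magic links.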

Change 341556 had a related patch set uploaded (by matmarex):
[mediawiki/extensions/VisualEditor] ve.ui.MWWikitextStringTransferHandler: Avoid API call for magic links

https://gerrit.wikimedia.org/r/341556

Change 341556 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor] ve.ui.MWWikitextStringTransferHandler: Avoid API call for magic links

https://gerrit.wikimedia.org/r/341556

Change 341287 merged by jenkins-bot:
[VisualEditor/VisualEditor] Allow variable-length sequences without a fake space terminator, use for autolinking

https://gerrit.wikimedia.org/r/341287

Change 342651 had a related patch set uploaded (by Jforrester):
[mediawiki/extensions/VisualEditor] Update VE core submodule to master (bc6417ba5)

https://gerrit.wikimedia.org/r/342651

Jdforrester-WMF removed a project: Patch-For-Review.

Theoretically this will now be fixed (the code will roll out to wikis starting on 2017-03-21).

Change 342651 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor] Update VE core submodule to master (41134af2b)

https://gerrit.wikimedia.org/r/342651

To reproduce this behaviour of VisualEditor, which always encloses an ISBN in <nowiki> tags when you enter it without a following space or line feed, try the following steps:

  • edit a page with VisualEditor
  • add any ISBN, but don't copy and paste it and don't finish with a space or line feed (brackets and dots don't trigger ISBN parsing either)!
  • click "Save changes"
  • click "Review your changes" to verify your edit and save it

This can actually be done with any dialog, not just the save dialog. Type ISBN, click e.g. Insert → Media, and it will never be autolinked. So, here's one final patch for that.

Change 342875 had a related patch set uploaded (by Bartosz Dziewoński):
[VisualEditor/VisualEditor] ve.ce.Surface: Check delayed sequences when deactivating surface

https://gerrit.wikimedia.org/r/342875

Change 342875 merged by jenkins-bot:
[VisualEditor/VisualEditor] ve.ce.Surface: Check delayed sequences when deactivating surface

https://gerrit.wikimedia.org/r/342875

Jdforrester-WMF changed the point value for this task from 8 to 0. Mar 15 2017, 10:22 PM
Jdforrester-WMF changed the point value for this task from 0 to 8.

Change 341288 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor] ve.ui.MWLinkAction: Use delayed sequence

https://gerrit.wikimedia.org/r/341288

Change 341291 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor] ve.ui.MWLinkAction: Improve ISBN and RFC/PMID autolinking

https://gerrit.wikimedia.org/r/341291

IMHO the behavior got actually worse:

  1. Copy 978-3-642-14564-3
  2. Open a page for editing in VE, type "ISBN ".
  3. Paste the copied ISBN.
  4. Try to get it autolinked.

Result in 1.29.0-wmf.15: It will get linked if (and only if) you type a space.
Result in 1.29.0-alpha: No matter what you do, the ISBN will not be linked.
Desired result: It should get linked either directly after you paste it or at least after your next action (whether you type anything or move the cursor or whatever)

Typing the ISBN now works as expected, but honestly, how often do you actually type 13 digits (plus some hyphens) when you can just copy and paste them?

IMHO the behavior got actually worse:

  1. Copy 978-3-642-14564-3
  2. Open a page for editing in VE, type "ISBN ".
  3. Paste the copied ISBN.
  4. Try to get it autolinked.

Result in 1.29.0-wmf.15: It will get linked if (and only if) you type a space.
Result in 1.29.0-alpha: No matter what you do, the ISBN will not be linked.

Yeah, we should fix that too. (It works as soon as you delete and re-add the last character, but that's not helpful.)