Third time today I've had to fix this.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | matmarex | T117165 Sometimes user-inserted ISBNs don't get converted to magic links somehow, meaning Parsoid has to nowiki them…
Resolved | | Xqt | T147180 isbn formatting
Event Timeline
I took a look at the edits; it's hard to say what was going wrong there. There were a bunch of strange line breaks in the content as well. Perhaps this was a copy/paste from some other source, and that didn't trigger auto-link-ification?
@cscott
If you want a much simpler example, you can take a look at this edit
https://fr.wikipedia.org/w/index.php?title=Alain_Guyard&curid=5509552&diff=120464764&oldid=120423956
or this edit
https://fr.wikipedia.org/w/index.php?title=Mireille_Calle-Gruber&curid=1645851&diff=120452815&oldid=118906045
It's quite easy to find examples by looking at the edits that triggered the nowiki filter
https://fr.wikipedia.org/w/index.php?title=Sp%C3%A9cial:Modifications_r%C3%A9centes&tagfilter=nowiki
@NicoV I understand you're finding these with an automated tool, but they don't tell us how the users are actually creating these. We need to know what sequence of user actions causes the autolink filter to be bypassed before we can fix that edge case.
@cscott
I'm only using the edit filter that detects the addition of nowiki tags, no automated tool at all to find them.
I will repeat what I have already said so many times: I'm not working for WMF, and I'm not the one who decided to deploy tools too early on a large scale without properly testing them. I'm a simple volunteer, and I already spend too much time reporting the many problems created by tools deployed without being properly tested (like Content Translation, which was deployed about 6 months ago and is still creating more damaged articles than clean ones).
So no, I'm not going to hunt down the users to understand what they did to produce the problem. I gave you an easy way to find problematic edits; why can't the WMF contact these users to understand what they did?
Especially given that often, I don't even get an answer to the problems I'm reporting (see for example T110826), I'm not going to spend my time doing work that is obviously WMF's responsibility.
As VE is the default for new users on de.wikipedia, you will find this error often if you search for pages titled "Entwurf" in the user namespace of the German Wikipedia. It is obvious that these new users neither know about the possibility to nowiki an ISBN, nor know that an ISBN is supposed to be autolinked in all cases.
(And yes, the Content Translation tool can be improved: Why offer third-party machine translation when there is no MT available for en -> de? Why not offer Wiktionary information, even though de.wikt and en.wikt do offer useful resources? Categories only work if the same structure is used in both projects, and there is no obvious way to add adequate categories. You can translate or delete paragraphs, but how do you add new ones? And where does a new user's translation from de to en end up: in the draft or the main namespace?)
One possible way to get an unlinked ISBN by normal editing is to enter the ISBN and then navigate the cursor away, e.g. click somewhere else. As in most cases the ISBN is at the end of your insertion, and you probably pasted it instead of typing it in, it is quite common not to enter another character after the ISBN, which means you don't trigger the automatic link.
To reproduce this behaviour of VisualEditor, which always encloses an ISBN in <nowiki> tags when you enter it without a following space or line feed, try the following steps:
- edit a page with VisualEditor
- add any ISBN, but don't copy and paste it, and don't finish with a space or line feed (note that brackets and dots don't trigger the ISBN parsing either)!
- click "Save changes"
- click "Review your changes" to verify your edit and save it
You'll find the ISBN number enclosed in <nowiki> tags, like <nowiki>ISBN 978-3-9815841-5-8</nowiki>
See [1] for a sample
On de-wiki there is an abuse filter [2] now. Please refer to it for the problematic edits.
The tag filter says this edit is made by VisualEditor [3]
[1] https://de.wikipedia.org/w/index.php?title=Benutzer%3AXqt2FTest&type=revision&diff=158437087&oldid=157652172
[2] https://de.wikipedia.org/w/index.php?title=Spezial:Missbrauchsfilter-Logbuch&wpSearchFilter=236
[3] https://de.wikipedia.org/w/index.php?title=Spezial:Letzte_%C3%84nderungen&tagfilter=ISBN
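To make the reproduction steps above concrete, here is a minimal sketch of a matcher that only runs when a terminator is typed. It is not VisualEditor's code: `ISBN_RE` is only an approximation of the ISBN magic-link syntax, and `TERMINATORS` and `autolink_on_terminator` are hypothetical names chosen for illustration.

```python
import re

# Approximation of the ISBN magic-link syntax (an assumption, not VE's regex):
# "ISBN", whitespace, then 10 or 13 digits with optional hyphens or spaces.
ISBN_RE = re.compile(r"ISBN\s+(?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx]$")

# Hypothetical set of characters that make the matcher run at all.
TERMINATORS = {" ", "\n"}


def autolink_on_terminator(typed: str) -> bool:
    """Return True if a terminator-triggered matcher would link the ISBN."""
    if not typed or typed[-1] not in TERMINATORS:
        # Nothing was typed after the ISBN (or only a dot or bracket),
        # so the matcher never runs and the text stays plain.
        return False
    # The terminator itself is not part of the ISBN.
    return ISBN_RE.search(typed[:-1]) is not None


print(autolink_on_terminator("ISBN 978-3-9815841-5-8"))    # False
print(autolink_on_terminator("ISBN 978-3-9815841-5-8 "))   # True
print(autolink_on_terminator("ISBN 978-3-9815841-5-8."))   # False
```

In this model, the plain text left behind in the `False` cases is what later ends up wrapped in <nowiki> on save.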
No. Either VisualEditor alone gets it right and hundreds [1] of human editors are wrong, or vice versa. I guess the latter.
https://www.mediawiki.org/wiki/Requests_for_comment/Future_of_magic_links is relevant to this conversation. If the various wikis embrace this proposal, this will cease to be a problem as far as I can tell. Looks like enwiki is beginning this transition already.
The magic link RFC won't directly affect this task; if anything, removing explicit auto links from wikitext will make VE's auto-link behavior even more important.
Note that Google Docs has exactly the same "must press space or return after a typed link" behavior that VE does. We could potentially consider having cursor movement (click away, defocus, etc.) trigger the auto-link, though.
So yes, there are 2 parts to this ticket here: (a) VE doesn't autolink ISBNs (b) Parsoid nowikis them.
The RFC affects (b), not (a).
How would https://phabricator.wikimedia.org/T1084 affect this task?
EDIT: to answer myself, it wouldn't.
More generally: if you type the full ISBN number, and then click away before typing anything else, it won't get autolinked.
The same thing can happen with a URL like http://example.com/.
I think we need to run the sequence matcher not just after typing, but also after a selection is changed (at the position of previous selection). I'll look into this.
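A rough sketch of that idea, under stated assumptions: the `Surface` class, its method names, and `ISBN_RE` below are all hypothetical and only model the behavior. They are not the VisualEditor API or the code in the patches that follow.

```python
import re
from dataclasses import dataclass, field

# Approximation of the ISBN magic-link syntax (an assumption, not VE's regex).
ISBN_RE = re.compile(r"ISBN\s+(?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx]$")


@dataclass
class Surface:
    """Toy editing surface: plain text, a cursor offset, and linked ranges."""
    text: str = ""
    cursor: int = 0
    links: list = field(default_factory=list)  # (start, end) ranges already linked

    def _check_sequence_at(self, offset: int) -> None:
        # Run the matcher against the text that ends at `offset`.
        match = ISBN_RE.search(self.text[:offset])
        if match and match.span() not in self.links:
            self.links.append(match.span())

    def type_text(self, chars: str) -> None:
        for ch in chars:
            self.text = self.text[:self.cursor] + ch + self.text[self.cursor:]
            self.cursor += 1
            if ch in " \n":
                # Existing behavior: check only once a terminator is typed.
                self._check_sequence_at(self.cursor - 1)

    def move_cursor(self, new_offset: int) -> None:
        previous = self.cursor
        self.cursor = new_offset
        # Proposed behavior: also check at the position of the previous selection.
        self._check_sequence_at(previous)


surface = Surface()
surface.type_text("See ISBN 978-3-9815841-5-8")
print(surface.links)   # [] because no terminator was typed
surface.move_cursor(0)
print(surface.links)   # [(4, 26)] after clicking away
```

The same check could presumably be hooked to other events that end typing at a position, such as the surface losing focus.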
Change 341287 had a related patch set uploaded (by matmarex):
[VisualEditor/VisualEditor] Allow variable-length sequences without a fake space terminator, use for autolinking
Change 341288 had a related patch set uploaded (by matmarex):
[mediawiki/extensions/VisualEditor] ve.ui.MWLinkAction: Use delayed sequence
Change 341291 had a related patch set uploaded (by matmarex):
[mediawiki/extensions/VisualEditor] ve.ui.MWLinkAction: Improve ISBN and RFC/PMID autolinking
I noticed another case that could lead to this: converting a pasted ISBN into a magic link currently relies on the MW API to do the conversion (the same system we use to convert pasted wikitext like [[foo]] into content to avoid nowikifying it). Since we already implement handling of ISBNs, this is wasteful, and the user temporarily losing their internet connection while editing could lead to a failed conversion and nowikifying. (Realistically, I don't think this happens often, and the previous problem is probably the cause of most nowikied ISBNs, but this is also possible.)
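As an illustration of handling magic links locally before falling back to the API, here is a small sketch. Everything in it is an assumption made for the example: `MAGIC_LINK_RE` approximates the ISBN/RFC/PMID syntax, `convert_pasted_text` and its `api_parse` callback are hypothetical, and the returned `(kind, value)` tuples merely stand in for whatever the editor would actually insert.

```python
import re
from typing import Callable, Optional, Tuple

# Approximation of magic-link syntax for ISBN, RFC and PMID (an assumption).
MAGIC_LINK_RE = re.compile(
    r"ISBN\s+(?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx]|(?:RFC|PMID)\s+[0-9]+"
)


def convert_pasted_text(
    text: str, api_parse: Callable[[str], Optional[str]]
) -> Tuple[str, str]:
    """Classify pasted text; `api_parse` returns None when the request fails."""
    stripped = text.strip()
    if MAGIC_LINK_RE.fullmatch(stripped):
        # Handled locally: no network round trip, so nothing can fail here.
        return ("magic_link", stripped)
    converted = api_parse(text)
    if converted is None:
        # API unreachable: the text stays plain and gets nowikified on save.
        return ("nowiki", text)
    return ("content", converted)


def offline_api(text: str) -> Optional[str]:
    """Stand-in for a failed API call (e.g. lost connection)."""
    return None


print(convert_pasted_text("ISBN 978-3-9815841-5-8", offline_api))
# ('magic_link', 'ISBN 978-3-9815841-5-8')
print(convert_pasted_text("[[foo]]", offline_api))
# ('nowiki', '[[foo]]')
```

With the local check first, a lost connection only affects pasted wikitext that genuinely needs the parser.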
Change 341556 had a related patch set uploaded (by matmarex):
[mediawiki/extensions/VisualEditor] ve.ui.MWWikitextStringTransferHandler: Avoid API call for magic links
Change 341556 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor] ve.ui.MWWikitextStringTransferHandler: Avoid API call for magic links
Change 341287 merged by jenkins-bot:
[VisualEditor/VisualEditor] Allow variable-length sequences without a fake space terminator, use for autolinking
Change 342651 had a related patch set uploaded (by Jforrester):
[mediawiki/extensions/VisualEditor] Update VE core submodule to master (bc6417ba5)
Theoretically this will now be fixed (the code will roll out to wikis starting on 2017-03-21).
Change 342651 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor] Update VE core submodule to master (41134af2b)
This can actually be done with any dialog, not just the save dialog. Type ISBN, click e.g. Insert → Media, and it will never be autolinked. So, here's one final patch for that.
Change 342875 had a related patch set uploaded (by Bartosz Dziewoński):
[VisualEditor/VisualEditor] ve.ce.Surface: Check delayed sequences when deactivating surface
Change 342875 merged by jenkins-bot:
[VisualEditor/VisualEditor] ve.ce.Surface: Check delayed sequences when deactivating surface
Change 341288 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor] ve.ui.MWLinkAction: Use delayed sequence
Change 341291 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor] ve.ui.MWLinkAction: Improve ISBN and RFC/PMID autolinking
IMHO the behavior actually got worse:
- Copy 978-3-642-14564-3
- Open a page for editing in VE, type "ISBN ".
- Paste the copied ISBN.
- Try to get it autolinked.
Result in 1.29.0-wmf.15: It will get linked if (and only if) you type a space.
Result in 1.29.0-alpha: No matter what you do, the ISBN will not be linked.
Desired result: It should get linked either directly after you paste it or at least after your next action (whether you type anything or move the cursor or whatever)
Typing the ISBN now works as expected, but honestly, how often do you actually type 13 digits (plus some hyphens) when you can just copy and paste them?
Yeah, we should fix that too. (It works as soon as you delete and re-add the last character, but that's not helpful.)