Apparently this is a moderately common problem:
Related Objects
- Mentioned In
  - T209493: VE is transforming citation templates into formatted text with "cite class" tags (when copying a reference defined within template-generated reflist)
  - T232461: Read-mode references pasted into VE via a third-party editor are not stripped
- Mentioned Here
  - T236220: As a contributor I want to know why references pasted from read mode do not appear in VE
Event Timeline
If we had an HTML-based, per-target blacklist, we could probably safely filter out something like sup.reference, or sup.reference:not([typeof]).
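To illustrate what a rule like sup.reference:not([typeof]) would do to pasted HTML, here is a minimal sketch in Python using only the standard library. It is not the VisualEditor implementation (which is JavaScript); the class and function names are hypothetical, and it only handles this one selector by hand rather than evaluating CSS selectors in general.

```python
from html.parser import HTMLParser


def is_read_mode_ref(tag, attrs):
    """True for <sup class="reference"> without a typeof attribute."""
    attr_map = dict(attrs)
    classes = (attr_map.get("class") or "").split()
    return tag == "sup" and "reference" in classes and "typeof" not in attr_map


class ReadModeRefFilter(HTMLParser):
    """Hypothetical filter: drops matching <sup> elements and their content."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []        # pieces of the filtered HTML
        self.skip_depth = 0  # > 0 while inside a dropped <sup>

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            # Count nested <sup> so the right end tag closes the skip.
            if tag == "sup":
                self.skip_depth += 1
            return
        if is_read_mode_ref(tag, attrs):
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if not self.skip_depth and not is_read_mode_ref(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "sup":
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_read_mode_refs(html: str) -> str:
    parser = ReadModeRefFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

A reference copied from read mode has no typeof attribute and is dropped together with its "[1]" link, while a sup.reference that does carry typeof passes through unchanged.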
Change 320764 had a related patch set uploaded (by Esanders):
Setup htmlBlacklist and add rule for read-mode MW references
Change 320764 merged by jenkins-bot:
Setup htmlBlacklist and add rule for read-mode MW references
Change 431655 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Add comment to htmlBlacklist item
Change 431655 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Add comment to htmlBlacklist item
Change 534480 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Fix HTML blacklist inheritance
Looks like this fix regressed during a recent refactor on target inheritance. The above patch should fix it again.
Change 534480 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Fix HTML blacklist inheritance
Change 534487 had a related patch set uploaded (by Jforrester; owner: Esanders):
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.21] Fix HTML blacklist inheritance
Change 534488 had a related patch set uploaded (by Jforrester; owner: Esanders):
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.20] Fix HTML blacklist inheritance
Change 534494 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Add unit tests for read-mode reference filter
Change 534487 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.21] Fix HTML blacklist inheritance
Change 534488 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.20] Fix HTML blacklist inheritance
Mentioned in SAL (#wikimedia-operations) [2019-09-04T17:45:23Z] <jforrester@deploy1001> Synchronized php-1.34.0-wmf.21/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: T150418 Fix HTML blacklist inheritance to avoid copy-pasted read <ref>s again (duration: 00m 56s)
Mentioned in SAL (#wikimedia-operations) [2019-09-04T17:47:33Z] <jforrester@deploy1001> Synchronized php-1.34.0-wmf.20/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: T150418 Fix HTML blacklist inheritance to avoid copy-pasted read <ref>s again (duration: 00m 57s)
I wrote a bot to fix this error when it shows up in the wikitext. It is non-trivial because the underlying citation has to be determined from its number. The most recent regression injected about 3000 bad citations on enwiki, which the bot has fixed. There are probably more on other wikis, and outside mainspace. If the bot is needed again, it is available at https://en.wikipedia.org/wiki/User:GreenC_bot/Job_18
This may still be active under certain conditions:
https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=861
It was added in this diff
I asked the editor how the edit was made:
https://en.wikipedia.org/wiki/User_talk:7804j#VisualEditor_bug_question
It appears the content was copy-pasted either:
(2) from another paragraph of the same article using the visual editor, or (3) from another paragraph of the same article using the visual editor, but opened in a new tab (i.e., with two tabs of the same article opened on my browser)
The garbled text did not exist anywhere beforehand; it was newly generated.
Change 535586 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Use MW import rules in MW tests
Change 535586 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Use MW import rules in MW tests
Change 534494 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Add unit tests for read-mode reference filter
@Esanders, how'd you arrive at not providing contributors any feedback about their attempted paste?
Reason for my question: I found it confusing that, despite nothing being shown on VE's edit surface, my attempted paste seemed to have some effect considering it activates the "Publish changes" button (watch this video, beginning at 0:09).
I'd have assumed that rather than pasting nothing, we'd paste the reference in plaintext to communicate to contributors something like: copy and paste is not broken; however, copying and pasting this type of content (in this case, a reference from read mode) is not supported
Technically, pasting from Wikipedia read mode is no different from pasting from an external website, and there are many sanitisations that happen during paste of external HTML, including:
- Removing certain tags that aren't editable in VE, and probably not intended to be preserved: <u>, <time>, <lang>, <span>, <font>, <fieldset> ...
- Removing additional tag attributes that also aren't editable and may be adding unwanted styling (font size/colour)
- Removing external links
- Removing wiki read-mode citations
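Two of the sanitisation steps listed above (unwrapping non-editable tags and stripping extra attributes) can be sketched as follows. This is an illustration in Python under stated assumptions, not VE's actual code: the tag set is taken from the list above, while the attribute allowlist KEEP_ATTRS and the name sanitize_paste are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Tags named in the list above: the wrapper is dropped, the content is kept.
UNWRAP_TAGS = {"u", "time", "lang", "span", "font", "fieldset"}
# Hypothetical allowlist: every attribute not listed here is stripped.
KEEP_ATTRS = {"href"}


class PasteSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []  # pieces of the sanitised HTML

    def _open_tag(self, tag, attrs):
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name in KEEP_ATTRS
        )
        return f"<{tag}{kept}>"

    def handle_starttag(self, tag, attrs):
        if tag not in UNWRAP_TAGS:
            self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags get no separate end tag.
        if tag not in UNWRAP_TAGS:
            self.out.append(self._open_tag(tag, attrs))

    def handle_endtag(self, tag):
        if tag not in UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped (convert_charrefs=True), so re-escape it.
        self.out.append(escape(data, quote=False))


def sanitize_paste(html: str) -> str:
    parser = PasteSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

A single paste can trip both rules at once: in the sketch, a styled paragraph containing <font> and <u> comes out as plain text with only the allowed attributes left, and nothing tells the user which rules fired.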
If the user pastes a large block of text, they may trigger several of these sanitisation rules, so rather than trying to display a large warning listing all of them, it is just understood that you can't paste just anything into VE.
Note that other rich content will not paste "correctly" into VE, such as templates, images, and extensions (code blocks, math equations).
Other than detecting things like citations, it would not be generically possible to know whether the pasted content had come from Wikipedia or any other site; it is just regular HTML.
When we switch to Parsoid HTML for read mode, it should be possible to preserve all rich content, including references.