
References pasted from read mode should be dropped until we can support them properly
Closed, Resolved · Public · Estimated Story Points: 1

Event Timeline

If we had an HTML-based, per-target blacklist, we could probably safely filter out something like sup.reference, or sup.reference:not([typeof]).
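A minimal sketch of what such a per-target removal rule could do. It uses a simplified node tree in place of a browser DOM so it runs standalone. HtmlNode, RemoveRule, applyHtmlBlacklist and the rule format are hypothetical names for illustration, not VE's API. The :not([typeof]) condition is what separates the two cases: read-mode footnote markers have no typeof attribute, while Parsoid-rendered references carry one (such as mw:Extension/ref) and are kept.

```typescript
// Simplified stand-in for a DOM tree (hypothetical, not VE's data model).
interface TextNode { kind: "text"; text: string; }
interface ElementNode {
  kind: "element";
  tag: string;
  attrs: { [name: string]: string };
  children: HtmlNode[];
}
type HtmlNode = TextNode | ElementNode;

// Hypothetical rule shape covering selectors of the form tag.class:not([attr]).
interface RemoveRule { tag: string; className: string; withoutAttr: string; }

// Roughly equivalent to sup.reference:not([typeof]).
const readModeRules: RemoveRule[] = [
  { tag: "sup", className: "reference", withoutAttr: "typeof" },
];

function matchesRule(el: ElementNode, rule: RemoveRule): boolean {
  const classes = (el.attrs["class"] || "").split(/\s+/);
  return el.tag === rule.tag &&
    classes.indexOf(rule.className) !== -1 &&
    !(rule.withoutAttr in el.attrs);
}

// Returns a filtered copy: a matching element is dropped with its whole subtree.
function applyHtmlBlacklist(nodes: HtmlNode[], rules: RemoveRule[]): HtmlNode[] {
  const kept: HtmlNode[] = [];
  for (const node of nodes) {
    if (node.kind === "text") {
      kept.push(node);
    } else if (!rules.some((rule) => matchesRule(node, rule))) {
      kept.push({ ...node, children: applyHtmlBlacklist(node.children, rules) });
    }
  }
  return kept;
}

function plainText(nodes: HtmlNode[]): string {
  return nodes.map((n) => (n.kind === "text" ? n.text : plainText(n.children))).join("");
}

// A read-mode footnote marker (simplified) next to a Parsoid ref.
const pastedHtml: HtmlNode[] = [
  { kind: "text", text: "Losses loom larger than gains." },
  { kind: "element", tag: "sup", attrs: { "class": "reference", "id": "cite_ref-1" },
    children: [{ kind: "text", text: "[1]" }] },
  { kind: "element", tag: "sup", attrs: { "class": "mw-ref reference", "typeof": "mw:Extension/ref" },
    children: [{ kind: "text", text: "[2]" }] },
];

console.log(plainText(applyHtmlBlacklist(pastedHtml, readModeRules)));
// prints: Losses loom larger than gains.[2]
```

In a browser the same rule could be applied directly with querySelectorAll("sup.reference:not([typeof])") and Element.remove() on each match.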

Change 320764 had a related patch set uploaded (by Esanders):
Setup htmlBlacklist and add rule for read-mode MW references

https://gerrit.wikimedia.org/r/320764

Jdforrester-WMF triaged this task as Medium priority.
Jdforrester-WMF set the point value for this task to 1.
Jdforrester-WMF moved this task from To Triage to TR1: Releases on the VisualEditor board.

Change 320764 merged by jenkins-bot:
Setup htmlBlacklist and add rule for read-mode MW references

https://gerrit.wikimedia.org/r/320764

Change 431655 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Add comment to htmlBlacklist item

https://gerrit.wikimedia.org/r/431655

Change 431655 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Add comment to htmlBlacklist item

https://gerrit.wikimedia.org/r/431655

Change 534480 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Fix HTML blacklist inheritance

https://gerrit.wikimedia.org/r/534480

It looks like this fix regressed during a recent refactor of target inheritance. The patch above should fix it again.
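The comment doesn't say exactly how the rule was lost in the refactor. One common way a per-target rule goes missing is a subclass rebuilding its rule set from the wrong ancestor. The sketch below illustrates that pattern; BaseTarget, WikiTarget, the article targets, ImportRules and extendRules are all hypothetical and only loosely modelled on the idea of per-target import rules, not on VE's actual code.

```typescript
// Hypothetical shape for per-target paste rules.
interface ImportRules {
  external: { htmlBlacklist: { remove: string[] } };
}

// Builds a new rules object from a parent's rules plus extra selectors.
// concat() returns a fresh array, so the parent's rules are never mutated.
function extendRules(parent: ImportRules, extraRemove: string[]): ImportRules {
  return {
    external: {
      htmlBlacklist: { remove: parent.external.htmlBlacklist.remove.concat(extraRemove) },
    },
  };
}

class BaseTarget {
  static importRules: ImportRules = { external: { htmlBlacklist: { remove: [] } } };
}

// The wiki-level target adds the read-mode reference rule.
class WikiTarget extends BaseTarget {
  static importRules: ImportRules =
    extendRules(BaseTarget.importRules, ["sup.reference:not([typeof])"]);
}

// Regression pattern: rebuilding from the wrong ancestor silently drops
// the rule that the direct parent added.
class BrokenArticleTarget extends WikiTarget {
  static importRules: ImportRules = extendRules(BaseTarget.importRules, ["span.hypothetical"]);
}

// Fix pattern: always extend the direct parent's rules.
class FixedArticleTarget extends WikiTarget {
  static importRules: ImportRules = extendRules(WikiTarget.importRules, ["span.hypothetical"]);
}

console.log(BrokenArticleTarget.importRules.external.htmlBlacklist.remove.join(", "));
// prints: span.hypothetical
console.log(FixedArticleTarget.importRules.external.htmlBlacklist.remove.join(", "));
// prints: sup.reference:not([typeof]), span.hypothetical
```

A regression like this produces no error at all, which is one reason a unit test on the resulting rule set (rather than on the base class) is useful.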

Change 534480 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Fix HTML blacklist inheritance

https://gerrit.wikimedia.org/r/534480

Change 534487 had a related patch set uploaded (by Jforrester; owner: Esanders):
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.21] Fix HTML blacklist inheritance

https://gerrit.wikimedia.org/r/534487

Change 534488 had a related patch set uploaded (by Jforrester; owner: Esanders):
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.20] Fix HTML blacklist inheritance

https://gerrit.wikimedia.org/r/534488

Change 534494 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Add unit tests for read-mode reference filter

https://gerrit.wikimedia.org/r/534494

Change 534487 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.21] Fix HTML blacklist inheritance

https://gerrit.wikimedia.org/r/534487

Change 534488 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.20] Fix HTML blacklist inheritance

https://gerrit.wikimedia.org/r/534488

Mentioned in SAL (#wikimedia-operations) [2019-09-04T17:45:23Z] <jforrester@deploy1001> Synchronized php-1.34.0-wmf.21/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: T150418 Fix HTML blacklist inheritance to avoid copy-pasted read <ref>s again (duration: 00m 56s)

Mentioned in SAL (#wikimedia-operations) [2019-09-04T17:47:33Z] <jforrester@deploy1001> Synchronized php-1.34.0-wmf.20/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: T150418 Fix HTML blacklist inheritance to avoid copy-pasted read <ref>s again (duration: 00m 57s)

I wrote a bot to fix this error where it shows up in the wikitext. This is non-trivial because the bot has to determine the underlying citation from its number. The most recent regression injected about 3,000 bad citations on enwiki, which the bot has fixed. There are probably more on other wikis and outside mainspace. If the bot is needed again, it is available at https://en.wikipedia.org/wiki/User:GreenC_bot/Job_18
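The bot's own code isn't reproduced here. As a rough illustration of why mapping a number back to a citation takes some work, the sketch below assumes (hypothetically) that a stray marker survives in the wikitext as a bracketed number such as [3], and that all refs belong to one default group. Read mode numbers footnotes by first appearance, and a reused named ref keeps its original number, so the n-th <ref> tag in the wikitext is not necessarily footnote n. RefOccurrence, citationsByNumber and restoreMarkers are made-up names.

```typescript
// One <ref> occurrence, in wikitext order (hypothetical shape).
// name is null for an unnamed ref; body is "" for a reuse like <ref name="a" />.
interface RefOccurrence { name: string | null; body: string; }

// Numbers citations as read mode displays them for a single default group:
// a new citation takes the next number, a reused named ref does not.
// Index n - 1 holds the wikitext to re-insert for footnote n.
function citationsByNumber(refs: RefOccurrence[]): string[] {
  const wikitext: string[] = [];
  const seenNames: { [name: string]: boolean } = {};
  for (const ref of refs) {
    if (ref.name === null) {
      wikitext.push(`<ref>${ref.body}</ref>`);
    } else if (!Object.prototype.hasOwnProperty.call(seenNames, ref.name)) {
      seenNames[ref.name] = true;
      // Re-insert named refs as a reuse so their body is not duplicated.
      wikitext.push(`<ref name="${ref.name}" />`);
    }
  }
  return wikitext;
}

// Replaces stray markers such as "[3]" with the ref they stood for.
// Numbers without a matching citation are left untouched.
function restoreMarkers(text: string, citations: string[]): string {
  return text.replace(/\[(\d+)\]/g, (marker: string, digits: string) => {
    const n = parseInt(digits, 10);
    return n >= 1 && n <= citations.length ? citations[n - 1] : marker;
  });
}

const refsInArticle: RefOccurrence[] = [
  { name: "a", body: "Source A" },
  { name: null, body: "Source B" },
  { name: "a", body: "" },          // reuse: does not take a new number
  { name: null, body: "Source C" }, // so this is footnote 3, not 4
];
const numbered = citationsByNumber(refsInArticle);
console.log(restoreMarkers("Losses loom larger than gains.[1][3]", numbered));
// prints: Losses loom larger than gains.<ref name="a" /><ref>Source C</ref>
```

A real bot would also have to handle ref groups, refs generated by templates, list-defined refs, and bracketed numbers that are legitimate prose; this sketch handles none of those.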

The bug may still occur under certain conditions:

https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=861

It was introduced in this diff:

https://en.wikipedia.org/w/index.php?title=Prospect_theory&type=revision&diff=914509204&oldid=913948047

I asked the editor how the edit was made:

https://en.wikipedia.org/wiki/User_talk:7804j#VisualEditor_bug_question

It appears the content was copy-pasted either:

(2) from another paragraph of the same article using the visual editor, or (3) from another paragraph of the same article using the visual editor, but opened in a new tab (i.e., with two tabs of the same article opened on my browser)

The garbled text did not exist anywhere beforehand; it was newly generated by the edit.

Change 535586 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Use MW import rules in MW tests

https://gerrit.wikimedia.org/r/535586

Change 535586 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Use MW import rules in MW tests

https://gerrit.wikimedia.org/r/535586

Change 534494 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Add unit tests for read-mode reference filter

https://gerrit.wikimedia.org/r/534494

@Esanders, how did you decide not to give contributors any feedback about their attempted paste?

The reason for my question: I found it confusing that, although nothing appeared on VE's edit surface, my attempted paste seemed to have some effect, since it activated the "Publish changes" button (watch this video, beginning at 0:09).

I'd have assumed that, rather than pasting nothing, we'd paste the reference as plain text to communicate something like this to contributors: copy and paste is not broken; however, copying and pasting this type of content (in this case, a reference from read mode) is not supported.

Technically, pasting from Wikipedia read mode is no different from pasting from an external website, and many sanitisation steps are applied when external HTML is pasted, including:

  • Removal of certain tags that aren't editable in VE and probably aren't intended to be preserved: <u>, <time>, <lang>, <span>, <font>, <fieldset> ...
  • Removal of additional tag attributes, which also aren't editable and may add unwanted styling (font size/colour)
  • Removal of external links
  • Removal of wiki read-mode citations

If the user pastes a large block of text, they may trigger several of these sanitisation rules, so rather than trying to display a large warning listing all of them, we rely on the general understanding that not everything can be pasted into VE.
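To make the sanitisation steps listed above concrete, here is a minimal sketch of a paste sanitiser applying all four kinds of rule to a simplified node tree. The tag list comes from the bullets (the real list is longer, as the "..." indicates). The attribute allow-list, the choice to unwrap tags and external links rather than delete their text, and all names (sanitizePaste, UNWRAP_TAGS, KEPT_ATTRS and so on) are assumptions for illustration, not VE's implementation.

```typescript
// Simplified stand-in for pasted DOM content (hypothetical, not VE's data model).
interface TextNode { kind: "text"; text: string; }
interface ElementNode {
  kind: "element";
  tag: string;
  attrs: { [name: string]: string };
  children: HtmlNode[];
}
type HtmlNode = TextNode | ElementNode;

// Hypothetical rule tables loosely following the list above.
const UNWRAP_TAGS: string[] = ["u", "time", "lang", "span", "font", "fieldset"];
const KEPT_ATTRS: string[] = ["href"];

function isReadModeCitation(el: ElementNode): boolean {
  const classes = (el.attrs["class"] || "").split(/\s+/);
  return el.tag === "sup" && classes.indexOf("reference") !== -1 && !("typeof" in el.attrs);
}

function isExternalLink(el: ElementNode): boolean {
  return el.tag === "a" && /^(https?:)?\/\//.test(el.attrs["href"] || "");
}

function sanitizePaste(nodes: HtmlNode[]): HtmlNode[] {
  const out: HtmlNode[] = [];
  for (const node of nodes) {
    if (node.kind === "text") {
      out.push(node);
      continue;
    }
    // Read-mode citations are dropped together with their "[n]" text.
    if (isReadModeCitation(node)) continue;
    const children = sanitizePaste(node.children);
    // Unwrapped tags and external links lose the wrapper but keep their text.
    if (UNWRAP_TAGS.indexOf(node.tag) !== -1 || isExternalLink(node)) {
      out.push(...children);
      continue;
    }
    // Other elements keep only allow-listed attributes (no style, color, size...).
    const attrs: { [name: string]: string } = {};
    for (const key of KEPT_ATTRS) {
      if (key in node.attrs) attrs[key] = node.attrs[key];
    }
    out.push({ kind: "element", tag: node.tag, attrs, children });
  }
  return out;
}

// Serialises the tree back to HTML (no escaping; adequate for this demo only).
function toHtml(nodes: HtmlNode[]): string {
  return nodes
    .map((n) => {
      if (n.kind === "text") return n.text;
      let attrText = "";
      for (const key in n.attrs) attrText += ` ${key}="${n.attrs[key]}"`;
      return `<${n.tag}${attrText}>${toHtml(n.children)}</${n.tag}>`;
    })
    .join("");
}

const pasted: HtmlNode[] = [{
  kind: "element", tag: "p", attrs: { "style": "font-size: 18px" }, children: [
    { kind: "element", tag: "font", attrs: { "color": "red" }, children: [{ kind: "text", text: "Losses" }] },
    { kind: "text", text: " loom " },
    { kind: "element", tag: "b", attrs: {}, children: [{ kind: "text", text: "larger" }] },
    { kind: "text", text: " than " },
    { kind: "element", tag: "a", attrs: { "href": "https://example.org/gains" }, children: [{ kind: "text", text: "gains" }] },
    { kind: "text", text: "." },
    { kind: "element", tag: "sup", attrs: { "class": "reference" }, children: [{ kind: "text", text: "[1]" }] },
  ],
}];

console.log(toHtml(sanitizePaste(pasted)));
// prints: <p>Losses loom <b>larger</b> than gains.</p>
```

In this one short paragraph all four kinds of rule fire (the <font> wrapper, the style attribute, the external link and the citation), and nothing in the sketch records which ones did.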

Note that other rich content, such as templates, images, and extensions (code blocks, math equations), will not paste "correctly" into VE either.

Apart from detecting things like citations, there is no general way to tell whether pasted content came from Wikipedia or from any other site; it is just regular HTML.

When we switch to Parsoid HTML for read mode, it should be possible to preserve all rich content, including references.

ppelberg closed this task as Resolved. (Edited Oct 23 2019, 1:05 AM)

I'm marking this task as "Resolved" since the patch works as intended.

We will look at whether this behavior should be revisited in this task: T236220

👆 The above is the outcome of a conversation between @Esanders and me earlier today.