VisualEditor: Put in hacks to scrub plugin garbage (e.g. myEventWatcherDiv)
Closed, Resolved (Public)

Description

These seem to be the leading cause of edit corruption right now.

myEventWatcherDiv: https://ru.wikipedia.org/?diff=64516612 https://ru.wikipedia.org/?diff=64516412 https://pt.wikipedia.org/?diff=39659121 https://pt.wikipedia.org/?diff=39659108

<embed> tags: https://pt.wikipedia.org/?diff=39696565

<object> tags: https://fr.wikipedia.org/?diff=105796883 https://fr.wikipedia.org/?diff=105796959 https://fr.wikipedia.org/?diff=105797061

I'm thinking we should put in hacks to remove these kinds of tags. Maybe at the point where we serialize the HTML and send it to Parsoid (ve.init.mw.Target#getHTML). If these tags are added immediately upon document creation (we'd need to get our hands on one of these bad plugins to test that) we could also consider trying to work around this in ve.createDocumentFromHtml instead. I suspect, though, that these tags are probably added asynchronously, and probably only in cases where we fall back to the iframe trick because DOMParser HTML support is not available.
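As a rough illustration of the "scrub at serialization time" idea, here is a minimal sketch using only standard DOM calls. The function name serializeWithoutPluginJunk and its parameters are hypothetical; this is not the body of ve.init.mw.Target#getHTML, just the general shape such a hack could take.

```javascript
// Sketch only: serializeWithoutPluginJunk is a hypothetical helper,
// not part of VisualEditor.
function serializeWithoutPluginJunk( doc, selectors ) {
	// Work on a deep copy so the document being edited is left untouched.
	const copy = doc.cloneNode( true );
	if ( selectors.length > 0 ) {
		// Collect every element matching any of the selectors into a
		// plain array before detaching anything.
		const junk = Array.from( copy.querySelectorAll( selectors.join( ', ' ) ) );
		junk.forEach( function ( node ) {
			if ( node.parentNode ) {
				node.parentNode.removeChild( node );
			}
		} );
	}
	// Serialize the cleaned copy.
	return copy.documentElement.outerHTML;
}
```

In a browser this could be called as serializeWithoutPluginJunk( document, [ '[id=myEventWatcherDiv]' ] ). Scrubbing at this point has the advantage that it does not matter whether the plugin injected its elements at document creation or asynchronously later on.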


Version: unspecified
Severity: normal

Details

Reference
bz68900

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:38 AM
bzimport set Reference to bz68900.
Catrope created this task. Jul 31 2014, 6:27 AM

I wonder if we can just do something like this:

    $( newDoc )
        .remove( '[id=myEventWatcherDiv]' ) // Bug 51423
        .remove( 'object[type=cosymantecnisbfw], script[id=NortonInternetSecurityBF]' ) // Bug 63229
        .remove( 'embed[id ^= xunlei_com_thunder_helper_plugin]' ) // Bug 63121
        .remove( 'div[id=sendToInstapaperResults]' ) // Bug 61776
        .remove( 'style[id=_clearly_component__css]' ) // Bug 53252
        .remove( 'script[id=FoxLingoJs]' ) // Bug 52884
        .remove( 'embed[type=application\\/x-datavault]' ) // Bug 52791
        .remove( 'embed[type=application\\/iodbc]' ); // Bug 51521

(In reply to Alex Monk from comment #2)

I wonder if we can just do something like this:

    $( newDoc )
        .remove( '[id=myEventWatcherDiv]' ) // Bug 51423
        .remove( 'object[type=cosymantecnisbfw], script[id=NortonInternetSecurityBF]' ) // Bug 63229
        .remove( 'embed[id ^= xunlei_com_thunder_helper_plugin]' ) // Bug 63121
        .remove( 'div[id=sendToInstapaperResults]' ) // Bug 61776
        .remove( 'style[id=_clearly_component__css]' ) // Bug 53252
        .remove( 'script[id=FoxLingoJs]' ) // Bug 52884
        .remove( 'embed[type=application\\/x-datavault]' ) // Bug 52791
        .remove( 'embed[type=application\\/iodbc]' ); // Bug 51521

Yeah I was thinking about doing something like that. We have no easy way to know in advance what that will fix, but we can try it.
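One way to get some visibility into what such a filter actually catches is to keep the bug number next to each selector and report which rules fired. The sketch below is an assumption-laden illustration, not the patch that was merged: PLUGIN_JUNK_RULES and scrubAndReport are hypothetical names, and the selector-to-bug pairing is taken as it reads in the comment above. It uses querySelectorAll because jQuery's .remove( selector ) only filters the already-matched set and does not search descendants, so a jQuery version would need something like .find( selector ).remove().

```javascript
// Hypothetical rule table: selectors and bug numbers come from the
// comment above; the structure and names are illustrative only.
const PLUGIN_JUNK_RULES = [
	{ bug: 51423, selector: '[id=myEventWatcherDiv]' },
	{ bug: 63229, selector: 'object[type=cosymantecnisbfw], script[id=NortonInternetSecurityBF]' },
	{ bug: 63121, selector: 'embed[id^=xunlei_com_thunder_helper_plugin]' },
	{ bug: 61776, selector: 'div[id=sendToInstapaperResults]' },
	{ bug: 53252, selector: 'style[id=_clearly_component__css]' },
	{ bug: 52884, selector: 'script[id=FoxLingoJs]' },
	{ bug: 52791, selector: 'embed[type="application/x-datavault"]' },
	{ bug: 51521, selector: 'embed[type="application/iodbc"]' }
];

// Remove every element under root that matches a rule, and return one
// entry per rule that matched at least once.
function scrubAndReport( root, rules ) {
	const hits = [];
	rules.forEach( function ( rule ) {
		const matches = Array.from( root.querySelectorAll( rule.selector ) );
		matches.forEach( function ( node ) {
			if ( node.parentNode ) {
				node.parentNode.removeChild( node );
			}
		} );
		if ( matches.length > 0 ) {
			hits.push( { bug: rule.bug, removed: matches.length } );
		}
	} );
	return hits;
}
```

Logging the returned list (for example, scrubAndReport( newDoc, PLUGIN_JUNK_RULES )) would show after the fact which of the reported bugs a given edit would otherwise have hit.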

Change 163961 had a related patch set uploaded by Alex Monk:
Remove certain blacklisted elements when getting HTML from document

https://gerrit.wikimedia.org/r/163961

Change 163961 merged by jenkins-bot:
Remove certain blacklisted elements when getting HTML from document

https://gerrit.wikimedia.org/r/163961

Marking this as fixed. I'll keep an eye on the other bugs over the next couple of weeks or so to see whether they are resolved by it.