Support preserving external links in pasted HTML content
Open, Low, Public

Description

We often need to import existing content into wikis, for example from Google Docs or Gmail. Most of those documents contain links, which are important for the semantics of the document.

Currently, VisualEditor strips all links unconditionally. It would be useful if it could offer users the option to instead preserve links.

One implementation option I could see working is a pop-up, triggered when links are detected, that offers appropriate guidance on project policies related to external linking.

Event Timeline

Restricted Application added a subscriber: Aklapper. Mar 10 2016, 7:53 PM
GWicke updated the task description. Mar 10 2016, 9:51 PM
GWicke updated the task description.

This would be extremely cool. I earlier settled on using a detour via LibreOffice's "Wiki Publisher" extension to export to wikitext: https://meta.wikimedia.org/wiki/User:Tbayer_(WMF)/Converting_Google_Docs_to_wikitext

(And while we are at it, preserving color and text attributes when copy-pasting HTML tables might make much of https://meta.wikimedia.org/wiki/User:Tbayer_(WMF)/Converting_Google_Slides_to_wikitext unnecessary.)

I don't think preserving inline CSS is going to be a good idea for WMF wikis. Most HTML sources contain a ton of unwanted inline styles, and 99% of the time we want to strip those.

BTW, if you are looking for a workaround, you can paste into standalone VE first, which doesn't do as much sanitisation, and then paste from there into MediaWiki VE as a VE->VE paste, which skips sanitisation.

@Esanders Re workaround, thanks for the hint. Is there an online instance of standalone VE where one could try this out? (E.g. https://wikimedia.github.io/VisualEditor/demos/ve/desktop-dist.html? I tried there, but it seems to strip color and text attributes too.)

There is no VisualEditor instance that doesn't strip those, and there won't be for a long time. There is almost no circumstance where readers of Wikipedia articles are better served by encouraging editors to use colours.

This was just a technical question about how to try out Ed's suggestion to use standalone VE. (Or did you mean to correct his statement that it uses less sanitization?)

It does use less sanitization, but style attributes are removed to fix an internal copy/paste issue. You could click 'Edit HTML' and paste the raw HTML (obtained via http://edg2s.github.io/content-editable-sandbox/, for example).

DLynch added a subscriber: DLynch. Apr 21 2016, 4:12 PM

Technically, it'd be simple enough: just remove link from the importRules (defined here). We already do something similar for meta-shift-v to force plaintext pasting, here.

The unanswered questions are mostly UI -- how to trigger it? (With a small dash of would-it-cause-problems?)

We are now only stripping external links. Links to the same wiki will get converted to internal links and so won't be removed.

Jdforrester-WMF renamed this task from Support preserving links in pasted HTML content to Support preserving external links in pasted HTML content. Apr 26 2016, 8:27 PM
Jdforrester-WMF triaged this task as Low priority.
Jdforrester-WMF moved this task from To Triage to Freezer on the VisualEditor board.

For those interested in temporary workarounds, you can remove links from the blacklist:

ve.init.target.constructor.static.importRules.external.blacklist.shift()
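
The one-liner above depends on the link rule being the first entry in the list. A slightly more defensive sketch removes the entry by name instead. It assumes the blacklist is a plain array of names and that the entry is called link/mwExternal (the name used further down this thread); `removeFromBlacklist` and `exampleOtherRule` are hypothetical, and other VisualEditor versions may store the blacklist differently.

```javascript
// Hypothetical helper: remove one entry from an array-style blacklist by name.
// Returns true if the entry was found and removed, false otherwise.
function removeFromBlacklist(blacklist, name) {
  const index = blacklist.indexOf(name);
  if (index === -1) {
    return false;
  }
  blacklist.splice(index, 1);
  return true;
}

// Stand-in for importRules.external.blacklist; the real contents vary by version.
const blacklist = ['link/mwExternal', 'exampleOtherRule'];
removeFromBlacklist(blacklist, 'link/mwExternal');
```

In a VisualEditor console session the first argument would presumably be ve.init.target.constructor.static.importRules.external.blacklist, provided it is still an array in the version being used.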

jrbs added a subscriber: jrbs. Aug 30 2017, 11:44 PM
jayvdb added a subscriber: jayvdb. Aug 23 2018, 1:14 AM

I don't care much about the style, but dropping links seems unnecessary, except as an odd abuse prevention mechanism. Perhaps this could be allowed on more internal wikis like mediawiki.org and meta, which see more content copy-pasted from office documents by staff, devs and affiliates.

e.g. https://www.mediawiki.org/wiki/Google_Summer_of_Code/2018/Code_analytics was a dump from Google Docs.

Thankfully I was able to do this by first pasting into https://wikimedia.github.io/VisualEditor/demos/ve/desktop-dist.html, and then copy-pasting to MediaWiki.org. But even I wasn't sure whether it would be worth opening Phabricator to find this issue. I was too scared, then very pleasantly surprised to find a workaround, and am now scratching my head as to why an excellent feature is disabled.

Bumping this thread. Maybe making this configurable would be the best option (e.g. via LocalSettings.php or similar), with external links blocked by default?

This isn't a priority for the team at the moment, but community patches are always welcome.

Change 493300 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Allow external link pasting to be enabled by config

https://gerrit.wikimedia.org/r/493300

Change 493300 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Allow external link pasting to be enabled by config

https://gerrit.wikimedia.org/r/493300

mingle added a comment. Edited Tue, Apr 9, 4:34 PM

@Esanders, thanks for merging.

Quick question, do you know why pasting external links into https://test2.wikipedia.org works, even when ve.init.mw.Target.static.importRules.external.blacklist contains link/mwExternal?

Is there another piece of code or component that controls whether or not external links are allowed? Or do you believe this is a difference between 1.31.1 and 1.33.0?

DLynch added a comment. Tue, Apr 9, 4:54 PM

@mingle: It doesn't seem to? I just checked on test2, and cannot paste external links.

Note that external.blacklist only applies to pastes that didn't come from another VE instance. If you are copying from one VE instance to another nothing is removed.
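
The rule described above can be modelled roughly as follows. This is a sketch of the described behaviour, not VisualEditor's actual code: `filterPaste`, the item shape, and the `textStyle/bold` entry are hypothetical and only illustrate that the blacklist is skipped for VE-to-VE pastes.

```javascript
// Hypothetical model: drop blacklisted item types unless the paste
// came from another VisualEditor instance.
function filterPaste(items, blacklist, fromVisualEditor) {
  if (fromVisualEditor) {
    // VE -> VE paste: nothing is removed.
    return items.slice();
  }
  return items.filter((item) => !blacklist.includes(item.type));
}

const pasted = [{ type: 'link/mwExternal' }, { type: 'textStyle/bold' }];
const external = filterPaste(pasted, ['link/mwExternal'], false);
const internal = filterPaste(pasted, ['link/mwExternal'], true);
console.log(external.length, internal.length);
```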

DLynch added a comment. Tue, Apr 9, 5:19 PM

@Esanders: ...actually, as well as pasting from external sources, I was copying external links within that instance on test2, and they were getting stripped. So we might have a separate issue?

mingle added a comment. Tue, Apr 9, 5:28 PM

@DLynch hmm, am I going crazy or missing something?

Also, I'm pasting from sources within the browser but outside of the MediaWiki instance: either from the URL bar, or by copy/pasting plain text from a page. Both seem to be converted to external links on test2.

osorio-juan-microsoft added a comment. Edited Tue, Apr 9, 6:14 PM

I think you are talking about different things. You are pasting the plain text https://google.com/foobar, which gets interpreted as a link. This issue addresses pasting a link (from a webpage, or other rich text source). In the latter case, what is in the clipboard is an actual HTML link (<a href="https://google.com/foobar">Foobar</a>).

Try copying this entire sentence with the link at the end foobar
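
The distinction drawn above can be sketched in code. In a browser, the two strings would typically come from event.clipboardData.getData('text/html') and event.clipboardData.getData('text/plain') in a paste handler; `classifyPaste` and its return values are hypothetical names, and the regular expressions are a deliberately rough check rather than a real HTML parser.

```javascript
// Hypothetical helper: tell a pasted HTML link apart from a pasted plain-text URL.
// `clipboard` is an object with optional `html` and `text` string properties.
function classifyPaste(clipboard) {
  const html = clipboard.html || '';
  const text = (clipboard.text || '').trim();
  // Rich-text sources put an actual anchor element in the HTML flavour.
  if (/<a\s[^>]*href\s*=/i.test(html)) {
    return 'html-link';
  }
  // Otherwise a bare URL in the plain-text flavour is just text that looks like a link.
  if (/^https?:\/\/\S+$/i.test(text)) {
    return 'plain-url';
  }
  return 'other';
}

console.log(classifyPaste({ text: 'https://google.com/foobar' }));
console.log(classifyPaste({
  html: '<a href="https://google.com/foobar">Foobar</a>',
  text: 'Foobar'
}));
```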

mingle added a comment. Edited Tue, Apr 9, 6:31 PM

Ah, thanks! That explains the confusion.

The issue on my instance, from the task that was merged here (T220462), is not about pasting an actual HTML link but about external plain text links not being automatically interpreted as a link.

This behavior seems to change based on the presence of link/mwExternal in the blacklist.

When link/mwExternal is in the blacklist:

  • External plaintext links remain plaintext

When link/mwExternal is not in the blacklist:

  • Internal plaintext links are automatically converted
  • External plaintext links are also automatically converted

edit: after some more tests, I think this is only affecting URLs that I'm copying/pasting from Chrome's URL bar. When link/mwExternal is in the blacklist, those do not copy/paste correctly.

It seems like things may not be as "plaintext" under the hood as I thought.

External plaintext links remain plaintext

I'm not seeing that in master - you can test this for yourself on any of our live sites (e.g. en.wikipedia.org).

It seems like things may not be as "plaintext" under the hood as I thought.

You are right: the content copied from the Chrome URL bar is an HTML anchor tag. For this page, for example, copying the URL from Chrome's address bar puts the following on the clipboard:

Version:0.9
StartHTML:0000000105
EndHTML:0000000274
StartFragment:0000000141
EndFragment:0000000238
<html>
<body>
<!--StartFragment--><a href="https://phabricator.wikimedia.org/T129546">https://phabricator.wikimedia.org/T129546</a><!--EndFragment-->
</body>
</html>
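
The dump above looks like the Windows "HTML Format" clipboard layout, in which the header fields are byte offsets counted from the start of the payload. As a minimal sketch, the fragment can be recovered from those offsets; `extractFragment` is a hypothetical helper, and the sample assumes CRLF line endings, which is what the offsets in the dump are consistent with.

```javascript
// Hypothetical helper: pull the fragment out of a CF_HTML-style payload
// using its StartFragment/EndFragment header offsets.
function extractFragment(payload) {
  const start = /StartFragment:(\d+)/.exec(payload);
  const end = /EndFragment:(\d+)/.exec(payload);
  if (!start || !end) {
    return null;
  }
  // Offsets count UTF-8 bytes, so slice the encoded bytes rather than the string.
  const bytes = new TextEncoder().encode(payload);
  const slice = bytes.slice(Number(start[1]), Number(end[1]));
  return new TextDecoder().decode(slice);
}

// The clipboard dump from this thread, rebuilt with CRLF line endings (an assumption).
const sample = [
  'Version:0.9',
  'StartHTML:0000000105',
  'EndHTML:0000000274',
  'StartFragment:0000000141',
  'EndFragment:0000000238',
  '<html>',
  '<body>',
  '<!--StartFragment--><a href="https://phabricator.wikimedia.org/T129546">https://phabricator.wikimedia.org/T129546</a><!--EndFragment-->',
  '</body>',
  '</html>'
].join('\r\n');

console.log(extractFragment(sample));
```

For this sample the extracted fragment is the anchor element alone, which is why the paste is treated as an HTML link rather than as plain text.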

mingle added a comment. Tue, Apr 9, 8:48 PM

That makes sense.

After some more tests, I can reproduce this on a fresh REL1_31 install but it appears to have been fixed in REL1_32 and upwards. I'll work on upgrading my instances and that should take care of the issue I'm facing.

Thanks for the help!