
Pasting text from Google docs into visual editor loses formatting if certain characters which look like wikitext are present
Closed, Resolved · Public · 8 Estimated Story Points

Description

Steps to reproduce:

  1. Create a Google Doc containing formatted text (bullet points, nested bullet points, headings)
  2. Add some wikitext-like characters
    1. Reproduced with [
    2. Reproduced with {{
  3. Paste into VisualEditor

Actual Results
All or almost all formatting is lost

Expected Results
Formatting is retained

Event Timeline

Jdforrester-WMF renamed this task from Pasting text from Google docs into visual editor loses formatting if certain characters present to Pasting text from Google docs into visual editor loses formatting if certain characters which look like wikitext are present. Sep 13 2016, 7:13 PM
Jdforrester-WMF triaged this task as Medium priority.
Jdforrester-WMF set the point value for this task to 8.
Jdforrester-WMF subscribed.

Looks like we're deciding whether something should be treated as wikitext based on it containing strings like "[5 minutes]", even if the document also has actual styling (<b>, <ul>, etc.).

Updating on this: it's actually more consistent now! We always lose formatting on pastes from Google Docs, regardless of whether wikitext-like characters are present. This is because the HTML that Google Docs puts on the clipboard expresses almost all of its formatting through inline style attributes.

For example, copying "Test document with formatting and" (with "with" in italics and "formatting" in bold) gets us HTML that looks like this:

"<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-a88d907f-7876-0b35-0539-c8d29f6c980a"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Test document </span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:italic;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">with</span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"> </span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">formatting</span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"> and</span></b>"

We strip all style attributes blindly, then unwrap the resulting attribute-less spans, and for simple formatting like this that leaves us with just the plain text.
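A minimal sketch of that sanitizing step, to show why the formatting vanishes. It is written in Python with the standard-library `html.parser` rather than VisualEditor's actual JavaScript, and the names `StyleStripper` and `strip_styles` are hypothetical. The sample below uses only the inner spans from the clipboard HTML above; the outer `<b id="docs-internal-guid-…">` wrapper is deliberately left out.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Hypothetical sketch: drop every style attribute, then unwrap any
    <span> that is left with no attributes at all."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        # One entry per open <span>: True if its tag was emitted, False if unwrapped.
        self.span_stack = []

    def handle_starttag(self, tag, attrs):
        kept = [(k, v) for k, v in attrs if k != "style"]
        if tag == "span":
            self.span_stack.append(bool(kept))
            if not kept:
                return  # unwrap: skip the tag itself but keep its children
        attr_text = "".join(
            f' {k}="{escape(v)}"' if v is not None else f" {k}" for k, v in kept
        )
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag == "span":
            emitted = self.span_stack.pop() if self.span_stack else True
            if not emitted:
                return  # matching start tag was unwrapped
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_styles(html: str) -> str:
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


SAMPLE = (
    '<span style="font-weight:400;font-style:normal;">Test document </span>'
    '<span style="font-weight:400;font-style:italic;">with</span>'
    '<span style="font-weight:400;font-style:normal;"> </span>'
    '<span style="font-weight:700;font-style:normal;">formatting</span>'
    '<span style="font-weight:400;font-style:normal;"> and</span>'
)
print(strip_styles(SAMPLE))  # → Test document with formatting and
```

Because the italic and bold live only in `style`, nothing semantic survives once that attribute is gone; a span carrying some other attribute (say `class`) would be kept.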

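Restoring the formatting of Google Docs content would be doable, but fiddly. Rather than removing the style attributes, we could go over them all and translate them into standard tags (turning a <span style="font-weight:700"> into a <b>, for instance). This opens up a whole can of worms, though: we'd have to decide which CSS declarations to honor and how, and even the outer <b style="font-weight:normal;"> wrapper that Google Docs adds would have to be recognized as not bold.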
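The translation approach can be sketched as below. This is a hypothetical Python illustration, not the patch that was eventually merged. It assumes only three mappings (numeric `font-weight` of 700 or more, or `bold`, becomes `<b>`; `font-style:italic` becomes `<i>`; `text-decoration` containing `underline` becomes `<u>`). It unwraps the Google Docs `<b style="font-weight:normal">` wrapper and assumes well-formed input. Lists, headings and any other void elements are not handled.

```python
from html import escape
from html.parser import HTMLParser


def parse_style(style):
    """Split an inline style string into a {property: value} dict."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            prop, _, value = part.partition(":")
            decls[prop.strip().lower()] = value.strip().lower()
    return decls


def tags_for_style(decls):
    """Hypothetical mapping from a few CSS declarations to semantic tags."""
    tags = []
    weight = decls.get("font-weight", "400")
    if weight == "bold" or (weight.isdigit() and int(weight) >= 700):
        tags.append("b")
    if decls.get("font-style") == "italic":
        tags.append("i")
    if "underline" in decls.get("text-decoration", ""):
        tags.append("u")
    return tags


class StyleToTags(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # tags to close for each open element ([] if unwrapped)

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            return  # clipboard metadata; void element, so no end tag follows
        if tag == "br":
            self.out.append("<br>")
            return
        decls = parse_style(dict(attrs).get("style") or "")
        if tag == "b" and decls.get("font-weight") == "normal":
            self.stack.append([])  # Google Docs wrapper: not actually bold
            return
        if tag == "span":
            tags = tags_for_style(decls)
        else:
            tags = [tag]  # other elements pass through, attributes dropped
        self.out.extend(f"<{t}>" for t in tags)
        self.stack.append(tags)

    def handle_endtag(self, tag):
        tags = self.stack.pop() if self.stack else []
        self.out.extend(f"</{t}>" for t in reversed(tags))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def styles_to_tags(html: str) -> str:
    parser = StyleToTags()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


GDOCS_SAMPLE = (
    '<meta charset="utf-8">'
    '<b style="font-weight:normal;" id="docs-internal-guid-x">'
    '<span style="font-weight:400;font-style:normal;">Test document </span>'
    '<span style="font-weight:400;font-style:italic;">with</span>'
    '<span style="font-weight:400;font-style:normal;"> </span>'
    '<span style="font-weight:700;font-style:normal;">formatting</span>'
    '<span style="font-weight:400;font-style:normal;"> and</span></b>'
)
print(styles_to_tags(GDOCS_SAMPLE))
# → Test document <i>with</i> <b>formatting</b> and
```

Every extra property (sizes, colours, fonts, list indentation) would need its own rule, which is where the fiddliness comes from.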

(The wikitext conversion then kicks in because it runs whenever we detect wikitext-like content in plain text, which is all we're left with. Pasting content whose markup survives sanitizing works fine.)
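The ordering described above can be sketched as a small decision function. This is a hypothetical Python illustration: `choose_paste_mode` and the `WIKITEXT_HINT` pattern are made up for the example and are not VisualEditor's actual detection rules. They only mimic the cases from this task (`[`, `{{`, strings like "[5 minutes]"). The point is that the wikitext check only sees the text once no elements survived sanitizing.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristic: [[links]], {{templates}}, [bracketed text], == headings ==.
WIKITEXT_HINT = re.compile(
    r"\[\[|\{\{|\[[^\]\n]*\]|^=+[^=\n]+=+[ \t]*$", re.MULTILINE
)


class _Inspector(HTMLParser):
    """Records whether any element survived sanitizing and collects the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.has_elements = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.has_elements = True

    def handle_data(self, data):
        self.text.append(data)


def choose_paste_mode(sanitized_html: str) -> str:
    """Return 'html' if markup survived, else 'wikitext' or 'plain'."""
    inspector = _Inspector()
    inspector.feed(sanitized_html)
    inspector.close()
    if inspector.has_elements:
        return "html"  # real markup survived: paste it as rich content
    text = "".join(inspector.text)
    return "wikitext" if WIKITEXT_HINT.search(text) else "plain"


print(choose_paste_mode("Meeting [5 minutes]"))         # → wikitext
print(choose_paste_mode("Meeting <b>[5 minutes]</b>"))  # → html
```

Once the Google Docs styling has been stripped, the paste looks like the first call rather than the second.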

Trying to parse out all the formatting and apply it correctly would be very fragile. I wonder if there's already a library for this out there somewhere?

@Deskana: After a bit of searching, all I turned up was one from the Draft.js people, which is aimed precisely at pasting Google Docs content. It's very small and narrowly tailored to the markup Google Docs produces.

Change 393813 had a related patch set uploaded (by DLynch; owner: DLynch):
[VisualEditor/VisualEditor@master] ce.Surface: support formatted google docs paste content again

https://gerrit.wikimedia.org/r/393813

Change 393813 merged by jenkins-bot:
[VisualEditor/VisualEditor@master] ce.Surface: support formatted google docs paste content again

https://gerrit.wikimedia.org/r/393813

Change 393932 had a related patch set uploaded (by DLynch; owner: DLynch):
[VisualEditor/VisualEditor@master] ce.Surface: more tests for Google Docs pasting

https://gerrit.wikimedia.org/r/393932

Change 393932 merged by jenkins-bot:
[VisualEditor/VisualEditor@master] ce.Surface: more tests for Google Docs pasting

https://gerrit.wikimedia.org/r/393932

Change 393968 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (49b182123)

https://gerrit.wikimedia.org/r/393968

Change 393968 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (49b182123)

https://gerrit.wikimedia.org/r/393968