
Insufficient span tags stripping from copy-and-paste in Safari
Closed, Resolved | Public | 8 Story Points

Description

From @TrevorParscal's report on T78540#1157939:

Reproduced with Safari 8.0.4 on MacOS X 10.10.2.

  1. Select an internal link, a space and some plain text
  2. Copy
  3. Paste
  4. Click save
  5. Click preview changes
  6. Notice that there's an extra span around the space and plain text in the pasted content

Details

Reference
bz69494

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:45 AM
bzimport added a project: VisualEditor.
bzimport set Reference to bz69494.

Wow that's weird. I wonder if this is related to copy/paste in any way?

Jdforrester-WMF changed the task status from Open to Stalled. Feb 2 2015, 7:24 PM
Whatamidoing-WMF added a subscriber: Whatamidoing-WMF.

This has suddenly started happening all over the place. It's also adding language codes. It might be related to copying. I've definitely seen this in Safari 6.2 on Mac OS 10.8.5.

https://en.wikipedia.org/w/index.php?title=Chandler_Bats&diff=prev&oldid=651740262 is a fairly clean example.

https://fr.wikipedia.org/w/index.php?title=Zach_Galifianakis&diff=next&oldid=112817147 adds left-to-right code.

https://fr.wikipedia.org/w/index.php?title=Zach_Galifianakis&diff=prev&oldid=112800605 (earlier) adds many span tags. Based on the content, it might be adding them to copy-paste content. This bit in particular:

<span lang="FR"><span lang="FR">[1]</span> http://www.zachgalifianakis.com/biographytext.htm</span>

looks rather like the editor copied a citation from the en.wp article and pasted it into the fr.wp article (and then translated the text).

Elitre added a subscriber: Elitre. Mar 18 2015, 1:04 PM
Jdforrester-WMF renamed this task from VisualEditor: Unnecessary <span>s inserted into articles to Unnecessary <span>s inserted into articles. Mar 19 2015, 2:28 AM
Jdforrester-WMF changed the task status from Stalled to Open.
Jdforrester-WMF triaged this task as High priority.
Jdforrester-WMF set Security to None.
Jdforrester-WMF edited a custom field.

When text is copied between different-language Wikipedias, it seems perfectly fine to me that it is wrapped in a <span> that states the original language. If the user copy/pastes and then erases the text, they should, theoretically (and practically -- it's marked), erase the language annotation as well.

I can't manage to reproduce the overlapping span tags (the double spans) that used to appear back in August. The current ones are more or less what we want to see, or are the result of the user copy/pasting in pieces.

The potential bugs I see here are:

  1. If the user did not see an indication that these copy/paste language spans are language annotations, that's a bug
  2. This line seems to be a bug, since it's a double-wrapper language span that shouldn't happen even if it was the result of a copy/paste from another language.
<span lang="FR"><span lang="FR">[1]</span> http://www.zachgalifianakis.com/biographytext.htm</span>

By the way, it also makes perfect sense to add directionality to a language block, especially if that language block is being edited. That's the point of language annotations, and it seems to be very convenient that this automatically happens between copy/pastes. It helps not only the editor, but also the page in read mode, as well as indexing, accessibility, etc. That part I wouldn't call a bug unless there's something I'm completely missing here.
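
As an illustration of the double-wrapper case quoted above, here is a minimal sketch in Python of collapsing a `<span>` whose only attribute is a `lang` that its enclosing `<span>` already declares. It is not VisualEditor or Parsoid code; the class and function names are hypothetical, and the sketch only tracks `lang` on `<span>` elements (comments and doctypes in the input are dropped).

```python
from html.parser import HTMLParser


class NestedLangSpanCollapser(HTMLParser):
    """Hypothetical helper: drops a <span lang=X> nested inside a span with the same lang."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        # One (dropped, effective_lang) pair per currently open <span>.
        self.open_spans = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            inherited = self.open_spans[-1][1] if self.open_spans else None
            only_lang = len(attrs) == 1 and attrs[0][0] == "lang"
            lang = (attrs[0][1] or "").lower() if only_lang else None
            # Redundant only if lang is the sole attribute and repeats the parent's.
            dropped = only_lang and lang == inherited
            own = dict(attrs).get("lang")
            effective = own.lower() if own else inherited
            self.open_spans.append((dropped, effective))
            if dropped:
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span" and self.open_spans:
            dropped, _ = self.open_spans.pop()
            if dropped:
                return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def collapse_nested_lang_spans(html: str) -> str:
    parser = NestedLangSpanCollapser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Applied to the line quoted above, this keeps the outer `<span lang="FR">` and removes only the inner duplicate; a nested span with a different `lang` is left alone.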

NicoV added a subscriber: NicoV. Mar 20 2015, 8:15 AM

@Mooeypoo What you seem to miss in your two comments is that the language code put in the lang tags doesn't seem to be the original language, but the language of the current wiki...
All the examples above show lang="EN" added to enwiki, lang="FR" added to frwiki: this is totally useless; and if it's due to a copy from a wiki in another language, it's just plain wrong.

Same for the directionality: default directionality on frwiki is "ltr", so adding a dir="ltr" is useless.

@NicoV, you're right. Apologies, I missed that. The <span> languages shouldn't be added from the same language.
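
The rule agreed on in this exchange (a `lang` equal to the wiki's own language, or a `dir` equal to the wiki's default direction, adds nothing) can be sketched as a small predicate. This is a hedged illustration in Python, not VisualEditor's actual code; the function name and parameters are hypothetical, and comparing only the primary language subtag (so that "EN-US" counts as "en") is an assumption.

```python
def is_redundant_span(attrs: dict[str, str], wiki_lang: str, wiki_dir: str = "ltr") -> bool:
    """Return True if every attribute on the span merely restates the wiki's defaults."""
    for name, value in attrs.items():
        name = name.lower()
        value = value.strip().lower()
        if name == "lang":
            # Assumption: compare the primary subtag only ("en-us" -> "en").
            if value.split("-")[0] != wiki_lang.lower():
                return False
        elif name == "dir":
            if value != wiki_dir.lower():
                return False
        else:
            # Any other attribute (class, style, id, ...) may carry meaning.
            return False
    return True
```

For example, `is_redundant_span({"lang": "FR"}, "fr")` and `is_redundant_span({"dir": "ltr"}, "fr")` are both true, while `is_redundant_span({"lang": "EN-US"}, "fr")` is false. Such a check cannot tell whether a non-matching `lang` is actually correct for the text, which is the "plain wrong" case described above.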

Here's another example, with no language tags: https://en.wikipedia.org/w/index.php?title=Moto_360&curid=42238402&diff=653207044&oldid=653203355

@ssastry thinks this is related to cut-and-paste, and that there used to be bogus ID attributes in the <span>s which were removed by Parsoid (see https://gerrit.wikimedia.org/r/197656 ).

In case it is useful to VE to debug, open https://logstash.wikimedia.org/#/dashboard/elasticsearch/parsoid and search for "html2wt" -- you will find logged warnings (1 warning per span found => multiple warnings per page in some cases).

Does this ticket cover all insertions of span tags? Is it useful to provide more diffs? (for example https://fr.wikipedia.org/w/index.php?diff=113225218 )

Jdforrester-WMF renamed this task from Unnecessary <span>s inserted into articles to Insufficient span tags stripping from copy-and-paste in Safari. Mar 27 2015, 10:50 PM
Jdforrester-WMF reassigned this task from Catrope to Esanders.
Jdforrester-WMF edited a custom field.

Change 200299 had a related patch set uploaded (by Esanders):
Simplify getClipboardHash

https://gerrit.wikimedia.org/r/200299

Change 200299 merged by jenkins-bot:
Simplify getClipboardHash

https://gerrit.wikimedia.org/r/200299

Elitre reopened this task as Open. Apr 3 2015, 7:32 AM

Still seeing those span tags in the wild...

Elitre closed this task as Resolved. Apr 3 2015, 7:33 AM

Maybe I should look at the Version before commenting though?

gpaumier moved this task from To Triage to Announce in next Tech/News on the User-notice board.
gpaumier moved this task from Backlog to Triaged on the Notice board.
matej_suchanek moved this task from Triaged to Archive on the Notice board. Apr 7 2015, 3:07 PM

Doesn't seem to be entirely fixed, we still get <span lang="EN-US"> on fr wikipedia, for text that is obviously not in English.

https://fr.wikipedia.org/w/index.php?title=La_route_M%C3%A9diterran%C3%A9e&curid=9025986&diff=113836353&oldid=113835437

Elitre reopened this task as Open. Apr 16 2015, 1:06 PM

Reopening because this doesn't look fixed. On cywiki there are still lang=CY span tags. The user shouldn't have copy/pasted wikitext, but span tags indicating that the content is in the same language as the wiki it's being pasted into do not seem useful.

NicoV added a comment. Apr 19 2015, 7:58 AM

Not only is it almost always useless, it can also be totally wrong...
In this edit, span tags were added with lang="FR" when it's clearly not in French.

NicoV raised the priority of this task from High to Unbreak Now!. Apr 20 2015, 9:40 AM
Aklapper lowered the priority of this task from Unbreak Now! to High. Apr 20 2015, 10:55 AM

Restoring previous priority "high" - Maintainers will take a look at this soon and are aware of this problem, but it is up to them to judge priority in comparison with other open urgent tasks (plus this got reopened on Thursday and there's been a weekend since then).
Sorry for the inconvenience caused by this. :-/

Examples of this not working have been posted both here and at enwiki since the 14th, with no answer or acknowledgement in either place since then (Tuesday last week; reopening it on Friday was already a consequence of no one answering).
I raised the priority so that someone will do something instead of ignoring the problem.

Jdforrester-WMF closed this task as Resolved. Apr 20 2015, 6:22 PM

Moved new bug reports to T96589: More <span> corruption (unknown source). The bug here was fixed, this appears to be a different source.