Page MenuHomePhabricator

Strikethrough in Reply tool adds <s> tags with href attribute
Closed, ResolvedPublic

Description

Reported here: https://www.mediawiki.org/w/index.php?title=Topic:Vmu502mrhvk3l7ex&topic_showPostId=vmu502mrhzi5tbd5#flow-post-vmu502mrhzi5tbd5

If you enter Foo in visual mode as an anonymous user and choose it to be stroke through, it gets to this in source mode:

<s>Foo</s>

But after you save the change, the following is in the diff:

<s href="Special:Contributions/10.0.3.1">Foo</s>

See https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Talk:Dogs&diff=424263&oldid=424262 for example

Open questions

  • What in the code might be causing an href tag to be inserted when formatting text in the Reply tool's visual mode?

Done

  • "Open questions" are answered.
  • If root causes are identified, tickets are filed to resolve said issues.
    • The root causes were identified and addressed within this ticket; see: T253584#6212591.

Event Timeline

Dvorapa created this task.May 25 2020, 6:35 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 25 2020, 6:35 PM

The moving of attributes to the wrong element looks like a Parsoid issue.

I haven't yet found a way to reproduce this. I've tried switching editors, and reply conflicts...

Why is this in Editing QA column? I also can't reproduce this by the way.

Looks like it was incorrectly created with that tag.

Dvorapa added a comment.EditedMay 29 2020, 1:52 PM

I copied tags from T253402 as these two bugs occur simultaneously. Editing QA does not have any description, so I didn't know what this tag stands for and copied it blindly. Sorry for me.

I copied tags from T253402 as these two bugs occur simultaneously. Editing QA does not have any description, so I didn't know what this tag stands for and copied it blindly. Sorry for me.

No worries! :) It's alright.

ppelberg added a subscriber: ppelberg.EditedJun 3 2020, 1:11 AM

Hmm. I haven't been able to reproduce this either. [1]

@Krenair, do you remember what steps you took to produce the diffs linked above?

...I suspect it'll be hard to recall at this point, but figured it's worth asking.


For context, the steps I took to produce the below [1] were as follows:

  1. Visit while logged in: https://en.wikipedia.beta.wmflabs.org/wiki/Talk:Cats
  2. Click the "Reply" link appended to the comment ending in: AronMan (talk) 01:58, 22 May 2020 (UTC) (this looks like it's the same comment you were responding to).
  3. In the Reply tool's visual mode enter:
hello 
how
are
you
  1. Make hello into a "superscript"
  2. Make how into "big" letters
  3. "Underline" are
  4. Post reply
  5. ✅Observe comment is posted as expected:
:::::<sup>hello</sup>
:::::<big>how</big>
:::::<u>are</u>
:::::you [[User:Ppelberg-test|Ppelberg-test]] ([[User talk:Ppelberg-test|talk]]) 01:04, 3 June 2020 (UTC)

  1. https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Talk%3ADogs&type=revision&diff=427019&oldid=426999
Pelagic added a subscriber: Pelagic.EditedJun 3 2020, 10:38 AM

I got this on sup and span lang whilst making a visual-editing reply at diff 426998, described in post vnjnk29hmsb98r71.

Note, two different user hrefs on two separate HTML elements. IIRC, I did switch from Visual back to Source and noticed the added href on one of those before saving. Also the contents of the hrefs weren’t URLs but the format that would normally be seen in a wikilink (one had a space, other was special:contribs as in a signature).

Hmm. I haven't been able to reproduce this either. [1]

@Krenair, do you remember what steps you took to produce the diffs linked above?

...I suspect it'll be hard to recall at this point, but figured it's worth asking.

Sorry, nope.

I got this on sup and span lang whilst making a visual-editing reply at diff 426998, described in post vnjnk29hmsb98r71.

Note, two different user hrefs on two separate HTML elements. IIRC, I did switch from Visual back to Source and noticed the added href on one of those before saving. Also the contents of the hrefs weren’t URLs but the format that would normally be seen in a wikilink (one had a space, other was special:contribs as in a signature).

Thank you for sharing these details, @Pelagic. We're going to investigate this; see "Next steps" below.

Sorry, nope.

Thank you for thinking about it.


Next steps
Notes from this morning's meeting:

  • Considering we have not yet been able to reproduce the issues being reported, we are going to think about the cause(s) of this issue through the following question: "What in the code might be causing an href tag to be inserted when formatting text in the Reply tool's visual mode."

The above has been added to the task description's "Open questions" section.

CCing @cscott @ssastry for ideas about what might cause this.

<s href="Special:Contributions/10.0.3.1">Foo</s> is trippy for sure :-) I think the data-parsoid id from an existing link is being transferred. I can replicate this on the commandline by manually adding the id to whatever tag I want.

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --html2wt --pbinfile /tmp/pbout < /tmp/test.html 
[[User:SSastry]]<s href="User:SSastry">foo</s><u href="User:SSastry">bar</u><big href="User:SSastry">baz</big>/p>

Parsoid doesn't have validity checks on whether a particular attribute is supported by a HTML5 tag, but even if it did, it is possible to still see corruptions like this in other scenarios where different tags accept a specific attribute.

So, it might be worth focusing efforts on what in the reply tool is causing this incorrect id assignment.

We discussed today that one place we clone IDs is when we duplicate the list item to create a new comment. @ssastry is it possible that an ID on the parent list-item could be causing this?

Change 604137 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/DiscussionTools@master] Avoid cloneNode in modifier, create new node without attributes

https://gerrit.wikimedia.org/r/604137

We discussed today that one place we clone IDs is when we duplicate the list item to create a new comment. @ssastry is it possible that an ID on the parent list-item could be causing this?

I don't follow your question.

But, looking at the diffs, it looks likes ids of some ancestor's child got duplicated ... but that seems so arbitrary ... why / how would that happen? An alternative explanation might be that some TID confusion is happening and so, data-parsoid blob from a different version is getting picked up ... but that would probably cause lot more corruption than this, no?

So, yes, still baffled.

I think I actually found the cause of the bug. It happens only when you type the reply in wikitext mode, then switch to visual mode before saving. You might need a few tags in your wikitext (depending on the existing contents of the page). I just reproduced here: https://en.wikipedia.beta.wmflabs.org/w/index.php?title=T253584&diff=428953&oldid=428952

When we're switching from source to visual mode, we send the wikitext to Parsoid without the "context" of the page, so Parsoid/RESTBase will generate the attributes like id="mwBA" from scratch. Then when saving, we insert the reply HTML into the original page HTML, and if there is already an element with the same ID on the page, its attributes will be reused.

Probably the easiest way to fix it would be to strip the id attributes from the reply HTML before saving. Alternatively, maybe there's some way to tell Parsoid where we're going to put the HTML and that it shouldn't use the existing IDs?

I think I actually found the cause of the bug. It happens only when you type the reply in wikitext mode, then switch to visual mode before saving. You might need a few tags in your wikitext (depending on the existing contents of the page). I just reproduced here: https://en.wikipedia.beta.wmflabs.org/w/index.php?title=T253584&diff=428953&oldid=428952

When we're switching from source to visual mode, we send the wikitext to Parsoid without the "context" of the page, so Parsoid/RESTBase will generate the attributes like id="mwBA" from scratch. Then when saving, we insert the reply HTML into the original page HTML, and if there is already an element with the same ID on the page, its attributes will be reused.

Probably the easiest way to fix it would be to strip the id attributes from the reply HTML before saving. Alternatively, maybe there's some way to tell Parsoid where we're going to put the HTML and that it shouldn't use the existing IDs?

I am confused .. are you doing something different wrt editor switching in DiscussionTools than how do it in VisualEditor? I'll hand this off to @cscott since he is following this more closely than me.

@subbu this is (vaguely) related to https://gerrit.wikimedia.org/r/565367 and the patch above it -- when using html2html mode, you need to take care that you don't overwrite the data-parsoid IDs assigned previously. I think we're doing that correctly for language conversion and (after the patch) red links -- there are unit tests verifying this -- but it's quite possible that the codepath being used for VE->wikitext (or wikitext->VE?) switching is overwriting the ids. @matmarex I don't think the solution is to strip the IDs, unless this is newly-added content we're talking about. Can we get details of the exact RESTBase API calls being used for the switch and/or if these are different from the API calls used by VE?

unless this is newly-added content we're talking about.

It is.

Can we get details of the exact RESTBase API calls being used for the switch and/or if these are different from the API calls used by VE?

We use the parsefragment action in ApiVisualEditor (the method is parseWikitextFragment).

Looking at other places that method is used in VE, for example in the wikitext past handler (MWWikitextStringTransferHandler) we do indeed have code to strip RESTBase IDs from the returned content.

Change 604494 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/VisualEditor@master] Move restbaseId stripping to ve.utils.parsoid

https://gerrit.wikimedia.org/r/604494

"When we're switching from source to visual mode, we send the wikitext to Parsoid without the "context" of the page"

We have the pageName context, but possibly we also need to send the etag?

Change 604495 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/DiscussionTools@master] Strip RESTBase IDs when switching to VE

https://gerrit.wikimedia.org/r/604495

matmarex reassigned this task from matmarex to Esanders.Jun 10 2020, 7:38 PM

Change 604494 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Move restbaseId stripping to ve.utils.parsoid

https://gerrit.wikimedia.org/r/604494

Change 604495 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] Strip RESTBase IDs when switching to VE

https://gerrit.wikimedia.org/r/604495

Restricted Application added a project: User-Ryasmeen. · View Herald TranscriptJun 15 2020, 9:41 PM
ppelberg edited projects, added Skipped QA; removed Editing QA.Jun 15 2020, 9:41 PM
ppelberg updated the task description. (Show Details)

I don't know if we had a consistently reproduceable test case, but I'm confident that @matmarex identified the root cause, and time will tell if that is correct.

I don't know if we had a consistently reproduceable test case, but I'm confident that @matmarex identified the root cause, and time will tell if that is correct.

Roger that. Thank you for clarifying, Ed.

Change 604137 abandoned by Bartosz Dziewoński:
Avoid cloneNode in modifier, create new node without attributes

Reason:
Not needed

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/ /604137