Page MenuHomePhabricator

Switching from source editing to visual editing mode is broken with the REST API
Open, MediumPublic

Description

See https://office.wikimedia.org/w/index.php?diff=prev&oldid=327737 (private wiki)

I suspect that this was caused by repeatedly switching from wikitext to visual modes, but the visual editor doesn't support association lists, so the wikitext mode was the only way to get it done. Note particularly the table corruption (I didn't even click on the table, much less edit it) and the replacement of interlanguage links: w:de: became :en:de:

Event Timeline

We should get to the bottom of this before we move Parsoid off RESTBase on production wikis in case this is related to failed stashing.

Something strange happened here

Who are you going to call?

Also, see this link where I made a bunch of edits fixing up wikitext and I wonder if that is actually something related .. i.e. whether the non canonical wikitext for tables, etc. was caused by the VE<->REST API<->Parsoid combo vs. human editors writing that source wikitext that I edited out.

ssastry triaged this task as High priority.EditedApr 6 2023, 3:58 AM

Another intsance .

I was able to reproduce this bug by switching editing modes. Could be an etag (or some other stashing related) bug causing the REST API to return a stale data-parsoid blob to Parsoid.

ssastry renamed this task from Something strange happened here (corrupted page) to Switching between editing modes is broken with the REST API .Apr 6 2023, 3:58 AM

Change 906542 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/extensions/VisualEditor@master] Don't force the "view" flavor when using stashing!

https://gerrit.wikimedia.org/r/906542

Change 906543 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] HtmlOutputRendererHelper: Force flavor when stashing

https://gerrit.wikimedia.org/r/906543

NOTE: the two patches will probably NOT fix the problem described in this ticket. It's a small bug that I found while investigating, but as far as I can tell, it is inconsequential. It seemed like a good idea to iron out all potential source of confusion.

Another intsance .

I was able to reproduce this bug by switching editing modes. Could be an etag (or some other stashing related) bug causing the REST API to return a stale data-parsoid blob to Parsoid.

So far, I have been unable to reproduce the issue locally. Can you give me more details on the exact settings you are using for VE, and the contents of the page before the edit, and what you are changing?

Change 896104 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Make VE on officewiki use Parsoid directly

https://gerrit.wikimedia.org/r/896104

I just realized that Officewiki is still using "vrs without restbase" mode, not direct mode (like mediawiki.org does). So dirty diffs are expected.

My expectation is that enabling direct mode on officewiki would fix the issue. https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/896104

We can switch to direct mode, but as we discussed on Slack, this is not a dirty diff, but a corrupted diff due to the fact that html2wt / selser is likely getting mismatched HTML and data-parsoid blobs.

On some further investigation, variant conversion is interfering with the injecting of data-parsoid attributes into the HTML DOM. I am working on a fix now, but it's a bit more tangled than I'd like.

There is the added mystery of understanding why variant conversion is being trigegred in the first place...

Steps to reproduce:

  • Start with this officewiki sandbox page. Click "Edit"
  • Italicize bar in VE
  • Switch to source editing
  • Add "----" below the now italicized bar
  • Switch back to VE
  • Click 'Save Changes' to open the save dialog. Click 'Review changes' to see the wikitext that is to be saved and you should see the corruption (Don't save so that we can continue exploring the bug with the same oldid).

Change 906651 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] LanguageVariantConverter: preserve data-parsoid array

https://gerrit.wikimedia.org/r/906651

Change 906652 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] HtmlOutputRendererHelper: skip redundant variant conversion

https://gerrit.wikimedia.org/r/906652

The issue can be reproduced locally with $wgVisualEditorDefaultParsoidClient = 'vrs'; and $wgVirtualRestConfig['modules']['parsoid'] = ...

The problem stems from the way that the output for the /html/ endpoint is generated. It used to be that Parsoid would stitch the data-parsoid inline and not add id="mw..." attributes. Now, pageBundle is always request and the REST API inlines them, resulting in HTML that looks like <p id="mwAg" data-parsoid='{"dsr":[0,4,0,0]}'>asfd</p>

This was noted in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/867571/1/includes/Rest/Handler/HtmlOutputRendererHelper.php#b545

Next, when switching back from wikitext editing, the HTML that's generated and that VE returns will have the same ids on different nodes from the original rendering of the revision, since it's a separate parse. VE doesn't send back any etag or renderid, but it does return the revid, which, sure, fine,

    [from] => html
    [format] => wikitext
    [stash] => 
    [original] => Array
        (
            [revid] => 1382
        )

)

so, the REST API decides to fetch original parser output for that revision id,
https://github.com/wikimedia/mediawiki/blob/master/includes/Rest/Handler/Helper/HtmlInputTransformHelper.php#L458-L462

The bug is that it then applies the original data-parsoid it just fetched to the newly edited html. And since that html has ids which may be the same but don't necessarily correspond, you get the corruption.

That didn't used to happen before new way data-parsoid is inlined. It also didn't happen when it was left to regenerate the old html,
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/src/Wikitext/ContentModelHandler.php#L70-L116

It doesn't happen for 'direct' mode, because the response is faked and a correct ParsoidRenderID is passed to setOriginal,
https://github.com/wikimedia/mediawiki-extensions-VisualEditor/blob/master/includes/DirectParsoidClient.php#L119-L124

Regenerating the DOM in the face of pageBundle was something that was never worked out, see T146187

Thanks @Arlolra. Since we now have a diagnosis and a mechanism to reproduce this locally, I propose we plug this hole by switching officewiki to direct mode. This will let us fix the underlying issues uncovered in Arlo's diagnosis a bit more leisurely.

FWIW, the direct mode also suffers from the same issue but is masked by the fact that a ParsoidRenderID is available so there's no need to regenerate.

If you comment out this etag, you get the same result,
https://github.com/wikimedia/mediawiki-extensions-VisualEditor/blob/master/includes/DirectParsoidClient.php#L262

VE doesn't send back any etag or renderid, but it does return the revid, which, sure, fine,

Oh, but that HTML was generated from the edited wikitext, so any dsr info in the inlined data-parsoid won't necessarily match up with the wikitext associated with that revid, so not fine

ssastry renamed this task from Switching between editing modes is broken with the REST API to Switching from source editing to visual editing mode is broken with the REST API .Apr 7 2023, 3:43 AM

Change 896104 merged by jenkins-bot:

[operations/mediawiki-config@master] Make VE on officewiki use Parsoid directly

https://gerrit.wikimedia.org/r/896104

Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:29:31Z] <lucaswerkmeister-wmde@deploy2002> Started scap: Backport for [[gerrit:896104|Make VE on officewiki use Parsoid directly (T320529 T333402)]]

Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:30:54Z] <lucaswerkmeister-wmde@deploy2002> lucaswerkmeister-wmde and daniel: Backport for [[gerrit:896104|Make VE on officewiki use Parsoid directly (T320529 T333402)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:39:20Z] <lucaswerkmeister-wmde@deploy2002> Finished scap: Backport for [[gerrit:896104|Make VE on officewiki use Parsoid directly (T320529 T333402)]] (duration: 09m 48s)

For the record, Arlo said on slack: Problem is also that VE is sending the oldid when it parsed the HTML from source edited wikitext.

Change 911419 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/VisualEditor@master] [WIP] Stop sending oldid when switched

https://gerrit.wikimedia.org/r/911419

Summary of a meeting I just had with Arlo about this:

  • I failed to reproduce locally because the content of my test page wasn't complex enough. P47271 will do it.
  • The VE client should not send an oldid after switching. The "original" revision is no longer the one in the database.
  • We don't have to worry about fixing non-restbase vrs mode, everything will be "direct" soon.
  • We should no longer have the hack in ContentModelHandler that re-generates the DOM on the flx.
  • HtmlInputTransformHelper should behave as follows when attempting selser, and no stashed base HTML can be found (because the stash is gone, or no etag was supplied):
    • if an oldid was supplied, try to re-generate HTML (or get it from parser cache).
    • if no oldid was supplied, the conversion should fail.
daniel lowered the priority of this task from High to Medium.Apr 25 2023, 3:13 PM

Office wiki is fixed, this should no longer affect any production issues. Dropping priority.
VRS mode is going to be removed anyway. We still should fix the conceptual issue.

I made a test case around clarifying the expected bahavior when stah expired and a revision ID is given: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/913106

Change 906543 merged by jenkins-bot:

[mediawiki/core@master] HtmlOutputRendererHelper: Force flavor when stashing

https://gerrit.wikimedia.org/r/906543

Change 906542 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Don't force the "view" flavor when using stashing!

https://gerrit.wikimedia.org/r/906542