Page MenuHomePhabricator

HTTP 404 error in VE possibly when confronted with an edit conflict
Closed, ResolvedPublic

Description

When I tried editing 2019 Abqaiq–Khurais attack today the Visual Editor gives me a "HTTP 404" error when I try and save.

I did a wide array of edits, including Wikilinking, text editing, text additions, and Infobox editing using TemplateData. It would be difficult to replicate the exact edits I did but I cannot think of a culprit. Possibly abuse filters, or edit conflict, or something of that sort? As this is not occuring on any other pages I edit.

Screenshot of the error:

Luckily, I can reload the page and my edits are recovered but I am unable to publish the changes I have made. I cannot switch to source editing either. Screenshot of switching:

Browser: Google Chrome Version 76.0.3809.132 (Official Build) (64-bit)
OS: macOS 10.14.6 (18G95)

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Reopening since T230272 has been resolved, but this error is still occurring according to reports. This may be a different underlying issue.

@Ryasmeen - Can you reproduce this bug? (Note that it's specifically happening when there is an edit conflict and the two-column edit conflict beta feature is active.)

Just happened again at September 2019 climate strike. HTTP 404 when there is an edit conflict

IndyJr added a subscriber: IndyJr.Sun, Sep 22, 10:42 PM

I no longer observed that bug after they (developers) claimed to have fixed it, but if it still persists today evidently the underlying problem remains.

Waddie96 added a comment.EditedMon, Sep 23, 8:46 AM

Just happened again, this time at Royal Game of Ur. I received an HTTP 404 error when trying to save from VE, and a "Error loading data from server: apierror-visualeditor-docserver-http" error when trying to switch to source edit mode. However, this time there was no edit conflict!

It is evident the bug has not been fixed, it has only decreased frequency.

marcella added a subscriber: marcella.
This comment was removed by Waddie96.

@Ryasmeen - Can you reproduce this bug? (Note that it's specifically happening when there is an edit conflict and the two-column edit conflict beta feature is active.)

Yes @kaldari, I was just able to reproduce this but only once:

@Ryasmeen - Can you reproduce this bug? (Note that it's specifically happening when there is an edit conflict and the two-column edit conflict beta feature is active.)

Yes @kaldari, I was just able to reproduce this but only once:

Did you encounter it secondary to an edit conflict or independent of that?

@Ryasmeen - Can you reproduce this bug? (Note that it's specifically happening when there is an edit conflict and the two-column edit conflict beta feature is active.)

Yes @kaldari, I was just able to reproduce this but only once:

Did you encounter it secondary to an edit conflict or independent of that?

I encountered this twice once with Two column edit conflict being enabled and once without. But only once on that day. So I think it happens only once a day? I'm still not sure of that yet. Will check again tomorrow and confirm.

@Ryasmeen - Can you reproduce this bug? (Note that it's specifically happening when there is an edit conflict and the two-column edit conflict beta feature is active.)

Yes @kaldari, I was just able to reproduce this but only once:

Did you encounter it secondary to an edit conflict or independent of that?

I encountered this twice once with Two column edit conflict being enabled and once without. But only once on that day. So I think it happens only once a day? I'm still not sure of that yet. Will check again tomorrow and confirm.

I've experienced it mostly with edit conflicts (both with the beta two-column feature enabled and disabled). It occurs, for me at least, more than once a day. There is no way to solve it other than to copy the entire article, exiting the edit and not saving changes, editing the article again, pasting your clipboard, then clicking publish—if there was an e/c then one would have to manually resolve this as to not undo these edits.

It would greatly help if you could provide us with the full request sent to the server and the full response message sent back from the server (including request and response headers), since there is a wide range of reasons why the server responds with a 404.

Waddie96 added a comment.EditedFri, Sep 27, 4:04 PM

It would greatly help if you could provide us with the full request sent to the server and the full response message sent back from the server (including request and response headers), since there is a wide range of reasons why the server responds with a 404.

Of course, how does one do that? I've created a .HAR file from the moment I load VE until the error, but I don't know which part of the .HAR I should post here that is the "full request sent to the server and the full response message sent back from the server (including request and response headers)"

It would greatly help if you could provide us with the full request sent to the server and the full response message sent back from the server (including request and response headers), since there is a wide range of reasons why the server responds with a 404.

Of course, how does one do that? I've created a .HAR file from the moment I load VE until the error, but I don't know which part of the .HAR I should post here that is the "full request sent to the server and the full response message sent back from the server (including request and response headers)"

I've emailed you my HAR file.

For the first time since I've encountered this bug I received a Resolve edit conflict instead of the HTTP 404 error. The beta feature is disabled.

@mobrovac @matmarex Hope you've had a chance to get what you need out the HAR file I sent you.

Yes, I actually started looking at it about an hour ago and just noticed that you were also talking about it on this task.

As far as I can tell, everything in the HAR file looks normal. VE receives content from RESTBase with the etag W/"917621012/06d7ab00-def2-11e9-bf13-931bd6eb9bdd", passes the same etag as an API query parameter, and then API responds that RESTBase responded with a HTTP 404 error.

Does RESTBase even store renders for old revisions at all? If it doesn't, does it even make sense for VE to send an If-Match header?

I just tested quickly and when requesting the latest revision, I get the same etag every time, as expected:

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947735?redirect=false&stash=true" | grep etag:
etag: W/"918947735/fcfcbb10-e3e8-11e9-a490-e9b3c9ebfa1e"

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947735?redirect=false&stash=true" | grep etag:
etag: W/"918947735/fcfcbb10-e3e8-11e9-a490-e9b3c9ebfa1e"

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947735?redirect=false&stash=true" | grep etag:
etag: "918947735/fcfcbb10-e3e8-11e9-a490-e9b3c9ebfa1e"

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947735?redirect=false&stash=true" | grep etag:
etag: "918947735/fcfcbb10-e3e8-11e9-a490-e9b3c9ebfa1e"

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947735?redirect=false&stash=true" | grep etag:
etag: W/"918947735/fcfcbb10-e3e8-11e9-a490-e9b3c9ebfa1e"

But when requesting an older revision, I get a different one every time:

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947627?redirect=false&stash=true" | grep etag:
etag: W/"918947627/378ddd40-e3e9-11e9-9dae-cf7d1d36d96c"

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947627?redirect=false&stash=true" | grep etag:
etag: W/"918947627/38644010-e3e9-11e9-bc9c-375e774db322"

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947627?redirect=false&stash=true" | grep etag:
etag: W/"918947627/393a2db0-e3e9-11e9-bc9c-375e774db322"

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947627?redirect=false&stash=true" | grep etag:
etag: W/"918947627/3a1ff9d0-e3e9-11e9-8a5e-6789ac21d536"

$ curl -s -I "https://en.wikipedia.org/api/rest_v1/page/html/User%3AMatma_Rex%2Fsandbox/918947627?redirect=false&stash=true" | grep etag:
etag: W/"918947627/3ad36b50-e3e9-11e9-be32-b5ef25c69fc0"

Does RESTBase even store renders for old revisions at all? If it doesn't, does it even make sense for VE to send an If-Match header?

RESTBase stores old revisions only on demand, which happens when the client provides stash=true. In order to minimise dirty diffs and conflicts, a new version of the HTML is generated every time you ask for an older revision, which is why you get a different ETag every time. These renders are stored for 24h and then discarded. The If-Match header sent during transform is used to fetch the exact render the user started editing, so yes, VE needs to send that header as otherwise RESTBase would always return a 404.

RESTBase can return a 404 during transforms in two cases: (i) the client supplied an If-Match header for which a render was not stored previously (i.e. the HTML was not obtained using stash=true); or (ii) the transform request happened more than 24h after the original HTML has been requested. Given the user reports, I don't think (ii) is likely, which leaves us with (i).

Does the VE API log 404s it receives from RB during transforms? It would be useful to have these to pin-point the problem.

Waddie96 renamed this task from HTTP 404 error in VE when using two-column edit conflict beta feature to HTTP 404 error in VE possibly when confronted with an edit conflict.Tue, Oct 1, 10:45 AM

@mobrovac Thanks for the explanation.

I wanted to spend a lot of time today experimenting, but couldn't reproduce the problem again, so instead I spent a lot of time reading the code. Did not find anything interesting.

One curious behavior I noticed is that RESTBase sometimes returns "weak" etags (prefixed with W/) and sometimes "strong" ones (not prefixed). Is there a reason for this? (I couldn't find where in the code this happens.)

Does the VE API log 404s it receives from RB during transforms? It would be useful to have these to pin-point the problem.

No. I'll add some logging.

Change 540233 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@master] ApiVisualEditor: Add logging for RESTBase HTTP errors

https://gerrit.wikimedia.org/r/540233

Change 540234 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.24] ApiVisualEditor: Add logging for RESTBase HTTP errors

https://gerrit.wikimedia.org/r/540234

Change 540235 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[operations/mediawiki-config@master] Add 'VisualEditor' logging channel to wmgMonologChannels

https://gerrit.wikimedia.org/r/540235

Change 540235 merged by jenkins-bot:
[operations/mediawiki-config@master] Add 'VisualEditor' logging channel to wmgMonologChannels

https://gerrit.wikimedia.org/r/540235

Mentioned in SAL (#wikimedia-operations) [2019-10-01T23:12:54Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T233127: Add VisualEditor logging channel to wmgMonologChannels (duration: 00m 59s)

Change 540234 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.24] ApiVisualEditor: Add logging for RESTBase HTTP errors

https://gerrit.wikimedia.org/r/540234

Mentioned in SAL (#wikimedia-operations) [2019-10-01T23:46:06Z] <ebernhardson@deploy1001> Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: T233127: ApiVisualEditor: Add logging for RESTBase HTTP errors (duration: 00m 58s)

One curious behavior I noticed is that RESTBase sometimes returns "weak" etags (prefixed with W/) and sometimes "strong" ones (not prefixed). Is there a reason for this? (I couldn't find where in the code this happens.)

That's done by Varnish, not RESTBase. When the length of the document exceeds a certain length and the client has indicated it accepts gzip, Varnish compresses the output and prepends W/ to the ETag.

No. I'll add some logging.

Thank you! I'll do the same on the RB side

Change 540427 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.25] ApiVisualEditor: Add logging for RESTBase HTTP errors

https://gerrit.wikimedia.org/r/540427

Change 540233 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] ApiVisualEditor: Add logging for RESTBase HTTP errors

https://gerrit.wikimedia.org/r/540233

Change 540427 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@wmf/1.34.0-wmf.25] ApiVisualEditor: Add logging for RESTBase HTTP errors

https://gerrit.wikimedia.org/r/540427

Mentioned in SAL (#wikimedia-operations) [2019-10-02T16:46:25Z] <lucaswerkmeister-wmde@deploy1001> Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: [[gerrit:540427|ApiVisualEditor: Add logging for RESTBase HTTP errors (T233127)]] + [[gerrit:540429|ApiVisualEditorEdit: Add logging for funny etags (T233320)]] (duration: 01m 04s)

So the logging wasn't actually working, and I spent most of today figuring out why. (Turns out nothing gets logged if you try to log structured data.) We just deployed another set of patches (https://gerrit.wikimedia.org/r/q/Iae778f95774df2e24b30387221e39e097e25a4cf) and the logging should work now, with the results logged under 'channel:VisualEditor': https://logstash.wikimedia.org/goto/4d053e9de35d9fccb40fabcabd46ff00 (this has logs for this task and T233320)

@mobrovac If you have the time to take a look, there are ~700 instances of a HTTP 404 response from RESTBase logged now: https://logstash.wikimedia.org/goto/4817263807e9f036af940cf848effc24. I apologize for the messy format (couldn't get structured logging to cooperate, see T234564), but the logs contain the URL we requested, the If-Match header we sent, and the response body.

Some of the cases have mangled or missing If-Match (let's come back to those later, in T233320), but at a glance most of the etags we sent look perfectly valid. Do you have a way to look up if RESTBase ever had those documents, if it purged them, or what?

Would it be useful to log anything else on our side?

@mobrovac If you have the time to take a look, there are ~700 instances of a HTTP 404 response from RESTBase logged now: https://logstash.wikimedia.org/goto/4817263807e9f036af940cf848effc24. I apologize for the messy format (couldn't get structured logging to cooperate, see T234564), but the logs contain the URL we requested, the If-Match header we sent, and the response body.
Some of the cases have mangled or missing If-Match (let's come back to those later, in T233320), but at a glance most of the etags we sent look perfectly valid. Do you have a way to look up if RESTBase ever had those documents, if it purged them, or what?

Thank you @matmarex for setting up logging on the VE side! I took a look at random entries there and found that all seem legit, in that there is no stashed version in storage for the transforms being requested. Now, why that is the case is not clear to me. If RB does not manage to stash the contents it would throw an error when the user starts editing, so I think we can rule that out. The other possibility I can think of is that sometimes stash=true is not supplied to RB, but I can't see how that would happen only sometimes. Perhaps there are other possibilities here, but they are not apparent to me now.

Oh, and I also found some legit 404 cases, where pages were deleted, presumably because someone deleted the page while the user was editing it. E.g., for request with ID XZbZkQpAIDwAAAwda-YAAAAS RB returned:

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/not_found#page_revisions","title":"Not found.","method":"get","detail":"Page was deleted","uri":"/en.wikipedia.org/v1/page/html/Dev_filmography/919164943"}

However, there were only 2 such cases in the last 24h. For that, it might be good for VE to inspect the response and emit an appropriate message to the client, but that's OT here.

Would it be useful to log anything else on our side?

Ideally, having the whole sequence of events from the VE side would be ideal, but I doubt possible. I will enable logging on the RB side Monday, hopefully that will provide some more info.

Change 541204 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] [TEMP] Log all VE requests

https://gerrit.wikimedia.org/r/541204

Change 541204 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] [TEMP] Log all VE requests

https://gerrit.wikimedia.org/r/541204

Mentioned in SAL (#wikimedia-operations) [2019-10-07T10:10:44Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772

Mentioned in SAL (#wikimedia-operations) [2019-10-07T10:16:42Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772 (duration: 05m 58s)

I spent most of yesterday trying to reproduce the 404 for a legit error, but was unable to do so.

I added logging to RESTBase so that it logs all VE requests and the responses. I'll keep it up for a while (up to 24h) to collect data. Hopefully these logs will shed some light on this issue.

Oh, and I also found some legit 404 cases, where pages were deleted, presumably because someone deleted the page while the user was editing it. E.g., for request with ID XZbZkQpAIDwAAAwda-YAAAAS RB returned:

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/not_found#page_revisions","title":"Not found.","method":"get","detail":"Page was deleted","uri":"/en.wikipedia.org/v1/page/html/Dev_filmography/919164943"}

However, there were only 2 such cases in the last 24h. For that, it might be good for VE to inspect the response and emit an appropriate message to the client, but that's OT here.

Interesting. So what is VE supposed to do in this case if we want to let the user save their edit anyway? (Consistent with MediaWiki, we display a warning about this first, but we let the user override it.) Should we retry the request without oldid and etag?

Alternatively, could RESTBase keep and return the render even if the page is deleted, as long as it's requested with the right etag?

Interesting. So what is VE supposed to do in this case if we want to let the user save their edit anyway? (Consistent with MediaWiki, we display a warning about this first, but we let the user override it.) Should we retry the request without oldid and etag?
Alternatively, could RESTBase keep and return the render even if the page is deleted, as long as it's requested with the right etag?

Sorry, I misspoke there. These 404s happen when VE requests the HTML itself via /page/html/{title}, not during transforms. I thought your logging included only transform data, so didn't even bother looking at the route.

In other news, I opened T234928: RESTBase sometimes not retaining stashed content? as my digging revealed it's likely a problem in RESTBase. Stay tuned as we figure out what/why/how this is happening.

Update: I think I found the bug in RESTBase. I deployed the fix yesterday and so far so good. I will leave RB logging on until Monday to make sure the problem is indeed fixed.

In the meantime, I also found how to reproduce the bug, so if there are any testers willing to try it, feel free to test it out. The bug essentially happens on a first VE-edit of a page (that is, open the browser, load a page, click on edit) for pages that have been last edited more than 24h prior.

Update: I think I found the bug in RESTBase. I deployed the fix yesterday and so far so good. I will leave RB logging on until Monday to make sure the problem is indeed fixed.
In the meantime, I also found how to reproduce the bug, so if there are any testers willing to try it, feel free to test it out. The bug essentially happens on a first VE-edit of a page (that is, open the browser, load a page, click on edit) for pages that have been last edited more than 24h prior.

It seems puzzled to me, it often happened to me editing a certain page several times the same day, certainly not at the first edit.

It seems puzzled to me, it often happened to me editing a certain page several times the same day, certainly not at the first edit.

It's actually related to the way VE asks for the HTML from RESTBase; sometimes it asks it via /page/html/{title} while other times it uses /page/html/{title}/{revision}. When the former form is used that's when the bug occurs, and making a first edit is a sure way to have VE use it. There might be other instances, though.

The RESTBase bug has been fixed. However, there are still 404s being emitted during transforms. They now happen at a rate of 4-5 per day (which is great improvement compared to 150 - 200 per day before the bug was fixed). I was looking at the RESTBase logs and they are all legitimate 404s from RESTBase's point of view, in that VE never asks RB for the content in the first place, i.e. there are no equivalent /page/html/{title} requests that reach RESTBase. That may be due to a couple of reasons:

  • VE does not ask the content with ?stash=true, thus hitting Varnish; or
  • VE does not ask for the content at all, presumably because it uses a (local) cache of some sort.

Either way, kicking the ball back now into the VE court.

@mobrovac I think I have some more questions…

I looked at the VE error logs after 10 October (assuming that's after the bug you found was fixed): https://logstash.wikimedia.org/goto/e6af566ed10567b8e30c7861e066ab96.

Out of 500 errors shown there, 310 are HTTP 404 errors. 80 of them have no etag (T233320#5540141). The remaining 230 requests have 38 unique etags (users retry after getting an error). I see two patterns in them:

  • (24 etags) Etags like W/"0/<uuid>/stash". This would indicate that the user was creating a new page (hence the 0) and switched from wikitext to visual editor (hence the /stash). But I tried doing that and did not get any errors. Example etags:
    • W/"0/805c0690-ee80-11e9-8474-0be43dc50466/stash"
    • W/"0/67cf3e90-edcb-11e9-abe0-4b6483c9b636/stash"
    • W/"0/baf79a10-ed01-11e9-9fa5-cb501a6d8bcd/stash"
  • (11 etags) Etags like W/"<revid>/<uuid>[/stash]", except that the time in the UUID is older than 24 hours (compared to the time of the failing request). The oldest one was 99 days old, but there are also several 1-2 days old. This would indicate that the user kept the editor open for this long (plausible for 1-2 days, not really for 99…), or that it was cached somewhere for longer than it should be. Example etags:
    • W/"101599133/05fd1d50-9fd7-11e9-886f-a5a72f81032f" (99 days)
    • W/"919455201/5a52c260-ec76-11e9-bfe4-23c2a3cb166a/stash" (1 day)
    • W/"193024633/0b46c150-eb9d-11e9-b282-8dfa484446ac" (1 day)

So my questions are:

  • Do you have any idea what is causing the first kind of failures?
  • What should VE do when it turns out we're editing a render older than 24 hours (so it is no longer available in RESTBase, as you wrote earlier), so that we can avoid the second kind of failures? Is there a way for VE to send the original render back to RESTBase?

@mobrovac I think I have some more questions…

Thank you for investigating further @matmarex !

  • Do you have any idea what is causing the first kind of failures?

Grrr it seems there is another bug lurking somewhere. I looked at the etags you posted and it is not clear to me how that ETag gets to VE. In all cases, the Etag contains the revision 0 but none of these pages are new - they have existed for quite a while. I tried reproducing this, but haven't managed to. I suspect the problem is somewhere in RESTBase. I will open another ticket and investigate further.

  • What should VE do when it turns out we're editing a render older than 24 hours (so it is no longer available in RESTBase, as you wrote earlier), so that we can avoid the second kind of failures? Is there a way for VE to send the original render back to RESTBase?

It is surprising to me that users are even able to send the data for transformation in these cases. AFAIK, a session lasts up to 9h after which it is scraped, so how does VE even obtain the metadata about the user/edit to be able to send the request to RESTBase? Isn't that info stored in the session? To more concretely answer your question, I am not sure there is something we can actually do, apart from (somehow) advertising to users that their edit is valid for only 24h (which, if you ask me, should be turned down to 9h to match the length of a session).

Grrr it seems there is another bug lurking somewhere. I looked at the etags you posted and it is not clear to me how that ETag gets to VE. In all cases, the Etag contains the revision 0 but none of these pages are new - they have existed for quite a while. I tried reproducing this, but haven't managed to. I suspect the problem is somewhere in RESTBase. I will open another ticket and investigate further.

I opened T235465: Stashing: revid mismatch between URI and Etag for this.

@mobrovac I think I have some more questions…

Thank you for investigating further @matmarex !

  • Do you have any idea what is causing the first kind of failures?

Grrr it seems there is another bug lurking somewhere. I looked at the etags you posted and it is not clear to me how that ETag gets to VE. In all cases, the Etag contains the revision 0 but none of these pages are new - they have existed for quite a while. I tried reproducing this, but haven't managed to. I suspect the problem is somewhere in RESTBase. I will open another ticket and investigate further.

I did a thorough investigation and I think I know what is happening, but I don't know why it is happening. In certain cases, VE calls /transform/wikitext/to/html/{title} (but without the revision). RESTBase then responds with the HTML, but since the revision was not provided by the client, RESTBase cannot guess it, so sets the revision in the ETag to 0 and stores the stash under that revision id. But then, in the next call, VE calls /transform/html/to/wikitext/{title}/{revision}. Since the revision is now specified, RESTBase expects the stash to be found using that revision. I put up a patch that deals with this on the RESTBase side. I expect it to be deployed in the next few days (hopefully tomorrow).

However, I am more and more convinced that something very weird is going on with VE. I think the root cause for this, the previous bug and the case where VE fails to send an ETag to RESTBase at all are all the same. In all of the instances the problem is the same: for the first call VE does not send the revision ID, but it does so in subsequent calls.

VE works this way because when you open the editor, we want to display the latest revision of the page, and we don't know what is the latest revision (we know which revision was shown in read mode, but the page might have been edited in the meantime). However, after you start editing, all subsequent operations must be done based on the revision that is shown in the editor (rather than the latest revision, in case the page was edited in the meantime again).

If this is a problem, then we could make a separate query to the MediaWiki API to get the latest revision ID when starting the editor, but this will slow down the loading of the editor a bit.

Sorry, I think I misread, you're talking about /transform/wikitext/to/html but I was thinking of the /page/html API for opening the editor. I don't know why we would call /transform/wikitext/to/html without a revision; I think we only use that for switching editor modes.

Hm. I understand the logic here, but what I don't understand is: why is VE asking for content without the revision only sometimes. E.g. the bug discovered in T234928: RESTBase sometimes not retaining stashed content? only manifested itself in a small percentage of cases, and in all of the instances the revision ID was not given. When the ID was provided, this bug would not occur, which points to an inconsistency.

Sorry, I think I misread, you're talking about /transform/wikitext/to/html but I was thinking of the /page/html API for opening the editor. I don't know why we would call /transform/wikitext/to/html without a revision; I think we only use that for switching editor modes.

Yes, wt2html is used only for swithcing modes, but the inconsistency persists: the first bug happened when /page/html/ wouldn't get the revision and the second bug happens because /transform/wikitext/to/html/ doesn't get it, hence why I think the root cause is the same.

I deployed the fix for RESTBase for T235465 some ~13h ago, and since then there were altogether 24 occurrences of 404s for transforms (8 unique, each with multiple retries, half for wt2html, half for html2wt) and they all seem legit from RESTBase's point of view as none of them had the If-Match header set.

Thanks. In VE logs since then I see only two kinds of problems – the very old etags I mentioned earlier, and unset etags which are probably caused by the same issue as T233320, but not caught by your workaround because /transform/wikitext/to/html doesn't output <meta property="mw:TimeUuid" content="…"/> tags.

I'll come back to this after looking at T233320 again. Also, I'll keep a VE open for 24 hours to see what actually happens with it. ;)

matmarex added a comment.EditedFri, Oct 18, 8:04 AM

I'll come back to this after looking at T233320 again. Also, I'll keep a VE open for 24 hours to see what actually happens with it. ;)

I tested this and got the expected results, namely:

  • When editing the latest revision of the page, if I never switched from wikitext, and if no one else saves an edit in the meantime, I could keep the editor open indefinitely and save my changes.
  • When editing an old revision, I couldn't save my changes, getting a HTTP 404 error.
  • When editing the latest revision of the page in wikitext, then switching to visual, I couldn't save my changes, getting a HTTP 404 error.

Notes:

  • I did not actually save the changes, only opened the save dialog, which causes a paction=serializeforcache API query to be sent.
  • The edit token apparently expired (I guess this is the 9 hour session expiration that you mentioned), causing the first request to fail with 'badtoken' error, but VE just fetched a new one and retried, and the second request succeeded (or failed with HTTP 404).
    • In the old wikitext editor, I think in this scenario you'd get the 'session_fail_preview' error ("Sorry! We could not process your edit due to a loss of session data. …"), and you'd need to click "Publish" again, but saving the edit would be possible.
matmarex closed this task as Resolved.Fri, Oct 18, 8:37 AM
matmarex claimed this task.
matmarex added a subscriber: ppelberg.

I'd like to summarize everything we discovered and fixed here, and close this task. It's getting huge and hard to keep track of.

There were/are several different issues that all result in a "HTTP 404" error when saving in VisualEditor:

  • Something mangles the "ETag" header when VE fetches the page HTML from RB (T230272). This was fixed by adding a special case in RB to undo the corruption we've seen, but we don't know the root cause of the problem.
  • Something removes the "ETag" header when VE fetches the page HTML from RB (T233320). This is only a problem when switching editor from wikitext to visual; on the initial page load, the missing data is duplicated in a <meta property="mw:TimeUuid" content="…"/> tag in the HTML content. However, the RB team would like to remove the duplication, when possible.
  • RB was sometimes incorrectly handling the case when VE fetches the page HTML for "latest" revision, then tries to transform modified HTML to wikitext for a specific revision, even though both revisions were the same (T234928, T235465). This was fixed.
  • RB only stores HTML renders for 24 hours, except for the render of latest revision of a page (T235822). This means that when a user keeps VE open for more than 24 hours while editing an old revision or after switching from wikitext, we can't transform their modified HTML back to wikitext, and so can't save their changes.

@mobrovac actually did most of the fixing, I did most of the complaining ;)

Our error logs say that the bugs that are already fixed were the most common causes. Work continues on T233320 and T235822.

Restricted Application added a project: User-Ryasmeen. · View Herald TranscriptFri, Oct 18, 8:37 AM

(Also, I submitted a small patch to change the "HTTP 404" error message to "Error contacting the Parsoid/RESTBase server (HTTP 404)": https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/544152 – hopefully makes it easier for us to understand any future bug reports, and easier for users to find this task and related tasks if they get the error.)