
Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200"
Closed, DeclinedPublic

Description

Known affected features:

  • MediaWiki stylesheets – Causing pages to sometimes be without any styles.
  • VisualEditor javascript – Causing the "editor is loading" screen to remain indefinitely after clicking "Edit".
  • Flow javascript – Causing the "editor is loading" screen to remain indefinitely when clicking e.g. "Summarize".

This is very intermittent and not easy to reproduce, but I've seen it several times today: when opening a page on Wikivoyage and loading VisualEditor, the process is aborted midway because one of the JavaScript downloads failed on the network.

GET https://nl.wikivoyage.org/w/load.php?… net::ERR_SPDY_PROTOCOL_ERROR 200

Inspecting the response in the Network tab shows that its headers look alright (no different from usual).

Status Code: 200 
-------
accept-ranges: bytes
age: 0
backend-timing: D=245623 t=1554313647202440
cache-control: public, max-age=2592000, s-maxage=2592000
content-encoding: gzip
content-type: text/javascript; charset=utf-8
date: Wed, 03 Apr 2019 17:47:27 GMT
etag: W/"0a31sua"
expires: Fri, 03 May 2019 17:47:27 GMT
server: mw1322.eqiad.wmnet
server-timing: cache;desc="miss"
status: 200
strict-transport-security: max-age=106384710; includeSubDomains; preload
vary: Accept-Encoding,X-Seven
via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
x-analytics: WMF-Last-Access=03-Apr-2019;WMF-Last-Access-Global=03-Apr-2019;https=1
x-cache: cp1075 pass, cp3042 miss, cp3030 miss
x-cache-status: miss

Might be relevant:

Event Timeline

ema triaged this task as Medium priority. Apr 8 2019, 11:36 AM

I proposed a possible fix for this on the task you merged: https://gerrit.wikimedia.org/r/521575

Seen again today when using Flow.

Failed to load resource: net::ERR_SPDY_PROTOCOL_ERROR

URL
https://www.mediawiki.org/w/load.php?lang=en&modules=diffMatchPatch%2Cpapaparse%2Crangefix%2Cspark-md5%2CtreeDiffer%2Cunicodejs%7Cext.CodeMirror.lib%2CvisualEditor%7Cext.CodeMirror.mode.mediawiki%7Cext.CodeMirror.visualEditor.init%7Cext.abuseFilter.visualEditor%7Cext.cite.styles%2CvisualEditor%7Cext.cite.visualEditor.core%2Cdata%7Cext.citoid.visualEditor%7Cext.citoid.visualEditor.data%7Cext.confirmEdit.visualEditor%7Cext.disambiguator.visualEditor%7Cext.flow.visualEditor%7Cext.flow.visualEditor.icons%7Cext.geshi.visualEditor%7Cext.graph.data%2CvisualEditor%7Cext.kartographer%7Cext.kartographer.editing%2Cutil%2CvisualEditor%7Cext.math.styles%2CvisualEditor%7Cext.score.visualEditor%7Cext.score.visualEditor.icons%7Cext.spamBlacklist.visualEditor%7Cext.templateDataGenerator.editPage%7Cext.titleblacklist.visualEditor%7Cext.visualEditor.base%2Ccore%2Cdata%2CdesktopTarget%2Cdiffing%2Cicons%2Clanguage%2Cmediawiki%2CmoduleIcons%2CmoduleIndicators%2Cmwalienextension%2Cmwcore%2Cmwextensionmessages%2Cmwextensions%2Cmwformatting%2Cmwgallery%2Cmwimage%2Cmwlanguage%2Cmwlink%2Cmwmeta%2Cmwtransclusion%2Cmwwikitext%2Cwelcome%7Cext.visualEditor.core.desktop%7Cext.visualEditor.mwextensions.desktop%7Cext.visualEditor.mwimage.core%7Cext.wikihiero.visualEditor%7Cext.wikimediaEvents.visualEditor%7Cmediawiki.action.view.redirectPage%7Cmediawiki.interface.helpers.styles%7Cmediawiki.language.names%7Cmediawiki.page.gallery.styles%7Cmediawiki.widgets.MediaSearch%7Coojs-ui.styles.icons-location%2Cicons-wikimedia&skin=vector&version=12hgy8c

The patch from the other task was merged: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/521575

If the problem stops occurring, that will mean it was caused by the extremely long responses previously generated by VE.

Krinkle renamed this task from Some load.php requests failing due to "ERR_SPDY_PROTOCOL_ERROR 200" to Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200". Sep 11 2019, 11:46 PM
Krinkle added subscribers: Friendly_Seven, Ayeshajii, Tgr and 6 others.
Krinkle added a subscriber: Agusbou2015.

I click on "Publish changes" (on any Wikimedia project) and changes are not saved.
Steps to reproduce:

  1. Open: https://es.wikipedia.org/wiki/The_Living_Daylights_(canci%C3%B3n)
  2. Click on: "Editar código" (Edit code)
  3. Click on: "Publicar cambios" (Publish changes)
  4. Changes are not saved and an error message (ERR_SPDY_PROTOCOL_ERROR) appears.

Expected result: Clicking "Publish changes" (on any Wikimedia project), changes should be saved and no error message should appear.
Actual result: Clicking "Publish changes" (on any Wikimedia project), changes won't be saved and ERR_SPDY_PROTOCOL_ERROR appears.

Has anyone seen this issue again in the past two weeks? If not, the VE patch might have fixed it…

Well, since that apparently fixed the problem, we should probably consider a more general solution that would limit the maximum size of load.php responses.

@matmarex I think at the very least, when we trip the limit, we would want that proactively logged, regardless of the solution?

I recall seeing it on other/smaller responses as well, but haven't seen those recently.

Well, since that apparently fixed the problem, we should probably consider a more general solution that would limit the maximum size of load.php responses.

Yep, I'm also considering this for performance, as there's some evidence suggesting that splitting the payload might result in quicker execution: compilation and stream-parsing in browsers are not as efficient as manually parallelising the payload as two separate requests. This is mostly anecdotal though, so worth putting some research into. We could then e.g. communicate in some fashion the rough ballpark size of a module (e.g. ≤1 KB, ≤10 KB, ≤100 KB, ≤1 MB, >1 MB), and let mw.loader use that (in addition to domain and cache-group sharding) to split requests in such a way as to not ask for more than X in a single request. Perhaps using a 1-char suffix in the version hash or something like that.
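The request-splitting idea could be sketched as a greedy batcher. This is a sketch only: the `sizeBucket` field and `batchModules` function are hypothetical illustrations, not part of the real mw.loader API.

```javascript
// Hypothetical sketch: split a module list into batches so that no
// single load.php request exceeds a byte budget. `sizeBucket` is an
// assumed per-module ballpark size (as discussed above), not a real
// ResourceLoader property.
function batchModules(modules, maxBytes) {
  const batches = [];
  let current = [];
  let currentSize = 0;
  for (const m of modules) {
    // Start a new batch if adding this module would blow the budget.
    // A single oversized module still gets its own batch.
    if (current.length > 0 && currentSize + m.sizeBucket > maxBytes) {
      batches.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(m);
    currentSize += m.sizeBucket;
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}
```

Each batch would then become one load.php URL, keeping every response comfortably under whatever size triggers the protocol error.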

@matmarex I think at the very least, when we trip the limit, we would want that proactively logged, regardless of the solution?

I'd like that. But, we don't yet know what the limit is (assuming it is deterministic by size, which might not be the case, e.g. could become gradually more common as size grows). And from what I know at this point, it is likely not easily detectable on either client or server side. To be continued :)
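Even without knowing the exact limit, failed subresource loads are at least observable client-side. A minimal sketch, assuming a hypothetical `/beacon/loader-error` logging endpoint (not a real Wikimedia API):

```javascript
// Sketch only: proactively report failed load.php subresource loads.
// The '/beacon/loader-error' endpoint is an assumption for illustration.
function isLoaderFailure(src) {
  return typeof src === 'string' && src.includes('/load.php');
}

if (typeof window !== 'undefined') {
  // Failed <script>/<link> loads fire 'error' on the element itself;
  // these events do not bubble, so listen in the capture phase.
  window.addEventListener('error', (e) => {
    const el = e.target;
    const src = el && (el.src || el.href);
    if (isLoaderFailure(src)) {
      navigator.sendBeacon('/beacon/loader-error', JSON.stringify({ url: src }));
    }
  }, true);
}
```

This would not identify the cause, but it would give server-side visibility into how often (and for which URLs) the failures occur.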

Closing this together with several other TLS/HTTP2 related issues as we've switched from Nginx to ATS for this traffic layer (per T238509#5674204). If this is still seen, feel free to re-open.