
RESTBase returns unknown_error when accessing page with large table
Open, Medium, Public

Description

For the page https://en.wikipedia.org/wiki/User:Sphilbrick/Mountains_of_New_Hampshire_table, the REST API returns a 500 error with the body {"type":"https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error","method":"get","uri":"/en.wikipedia.org/v1/page/html/User%3ASphilbrick%2FMountains_of_New_Hampshire_table"} when accessing https://en.wikipedia.org/api/rest_v1/page/html/User%3ASphilbrick%2FMountains_of_New_Hampshire_table?redirect=false. This causes the Visual Editor to fail to load when editing the page. The page isn't that big and doesn't have that many references, but maybe it has to do with the fact that the page consists of a large table?

Event Timeline

matmarex subscribed.

Your example URL now works for me, but this one fails as described: https://en.wikipedia.org/api/rest_v1/page/html/User%3ASphilbrick%2FMountains_of_New_Hampshire_table/956014749 (trying to view the previous version of the page)

The error message looks like RESTBase/HyperSwitch (I'm not sure how those two interact) reporting that its backend service, that is, Parsoid, has failed in an unknown way. I am guessing it's crashing while trying to parse that page. Changing projects accordingly.
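For anyone who wants to try other titles or revisions, here is a minimal Python sketch of how the failing URL is put together and how the error body can be recognized. The function names are hypothetical, and the URL layout is inferred only from the example URLs in this task, not from the RESTBase documentation.

```python
import json
from typing import Optional
from urllib.parse import quote


def rest_html_url(domain: str, title: str, revision: Optional[int] = None) -> str:
    """Build a rest_v1 page/html URL shaped like the ones quoted in this task."""
    # Spaces become underscores; ':' and '/' are percent-encoded (safe="").
    encoded = quote(title.replace(" ", "_"), safe="")
    url = f"https://{domain}/api/rest_v1/page/html/{encoded}"
    if revision is not None:
        url += f"/{revision}"
    return url


def is_unknown_error(body: str) -> bool:
    """Return True if the body is a JSON object whose "type" ends in unknown_error."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    if not isinstance(payload, dict):
        return False
    return str(payload.get("type", "")).endswith("/errors/unknown_error")
```

Neither function makes a network request; they only build the URL and classify a response body that was fetched some other way.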

Confirmed. I can parse it just fine on the command line on scandium, but when I ask the running Parsoid service for the same page, it fails with an HTTP 500. I'll look in logstash to see what, if anything, is logged.

ssastry@scandium:~$ time sudo -u www-data php /srv/mediawiki/multiversion/MWScript.php /srv/parsoid-testing/bin/parse.php --wiki=enwiki --integrated --pageBundle --pageName 'User:Sphilbrick/Mountains of New Hampshire table' < /dev/null > /dev/null

real	0m23.483s
user	0m21.608s
sys	0m1.664s

ssastry@scandium:~$ curl -v -x scandium.eqiad.wmnet:80 http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/User%3ASphilbrick%2FMountains_of_New_Hampshire_table/956266051
*   Trying 10.64.48.94...
* TCP_NODELAY set
* Connected to (nil) (10.64.48.94) port 80 (#0)
> GET http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/User%3ASphilbrick%2FMountains_of_New_Hampshire_table/956266051 HTTP/1.1
> Host: en.wikipedia.org
> User-Agent: curl/7.52.1
> Accept: */*
> Proxy-Connection: Keep-Alive
> 
< HTTP/1.1 500 Internal Server Error
< Date: Thu, 14 May 2020 17:38:07 GMT
< Server: scandium.eqiad.wmnet
< X-Powered-By: PHP/7.2.26-1+0~20191218.33+debian9~1.gbpb5a340+wmf1
< X-Content-Type-Options: nosniff
< Cache-control: no-cache
< P3P: CP="See https://en.wikipedia.org/wiki/Special:CentralAutoLogin/P3P for more info."
< Vary: Accept-Encoding
< Backend-Timing: D=6410333 t=1589477887236544
< Content-Length: 0
< Connection: close
< Content-Type: text/html; charset=UTF-8
< 
* Curl_http_done: called premature == 0
* Closing connection 0
LGoto triaged this task as Medium priority. May 14 2020, 6:17 PM
LGoto moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.

Looks like logstash says: [Xr2HIwpAMF4AAVLt3-QAAAAB] /w/rest.php/en.wikipedia.org/v3/page/html/User%3ASphilbrick%2FMountains_of_New_Hampshire_table/956266051 PHP Fatal Error from line 7513 of /srv/parsoid-testing/src/Wt2Html/Grammar.php: Allowed memory size of 1468006400 bytes exhausted (tried to allocate 20480 bytes)

It is not that large a table, so I suspect something else is going on there, maybe a grammar rule or some subtle wikitext issue that is causing backtracking. Anyway, we'll look at this at some point in the coming months (not immediately) as we work through our memory profile and GC issues.

And the reason it completes on the command line is probably that we don't have memory limits in that mode.
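To put the numbers from the logstash entry in perspective, here is a small Python sketch that pulls the limit and the failed allocation out of a PHP "Allowed memory size" message and converts them to more readable units. The helper names are hypothetical, and the regular expression is written against the single message quoted in this task.

```python
import re
from typing import Optional, Tuple

# Matches the wording of the PHP fatal error quoted from logstash.
_OOM_PATTERN = re.compile(
    r"Allowed memory size of (\d+) bytes exhausted \(tried to allocate (\d+) bytes\)"
)


def parse_oom(message: str) -> Optional[Tuple[int, int]]:
    """Return (limit_bytes, attempted_bytes), or None if the message does not match."""
    match = _OOM_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)
```

For the message above, the limit of 1468006400 bytes is exactly 1400 MiB and the allocation that failed was only 20 KiB, which suggests the request was already sitting at the limit when it died.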

FWIW, I managed to complete what I wanted to complete, so this is not at all urgent, and I'm happy to let it go on the back burner for some time. I can edit using the old editor; my main concern is that I like VE for working with tables, and I was surprised this choked. It's not tiny, but it's far smaller than some tables.

I faced a similar error while publishing a translation from the Content Translation tool. I translated "Pangong Tso" from English to Telugu. When I tried to publish, it threw the error below:

Error converting HTML to wikitext: docserver-http: HTTP 400:

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error","method":"post","uri":"/te.wikipedia.org/v1/transform/html/to/wikitext/%E0%B0%AA%E0%B0%BE%E0%B0%82%E0%B0%97%E0%B1%8B%E0%B0%82%E0%B0%97%E0%B1%8D_%E0%B0%B8%E0%B0%B0%E0%B0%B8%E0%B1%8D%E0%B0%B8%E0%B1%81"}

This happened on 15 Oct 2020. There are other articles published by other users from the tool, on Oct 15 and earlier.