500 error when retrieving HTML from REST API
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • query the https://cs.wiktionary.org/api/rest_v1/page/html/%2F page

What happens?:
You receive a 500 status code with the message:

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error","method":"get","uri":"/cs.wiktionary.org/v1/page/html/%2F"}

What should have happened instead?:
Successful retrieval of HTML.

Other information (browser name/version, screenshots, etc.):
Note that this page exists: https://cs.wiktionary.org/wiki//

Event Timeline

This is probably an edge case related to either URL parsing or title resolution of "%2F" which is the "/" character.
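For reference, the path segment decodes like this (plain PHP, nothing Parsoid-specific):

// "%2F" is the percent-encoding of "/", so the request is asking for the
// page whose title is the single character "/".
var_dump( rawurldecode( '%2F' ) ); // string(1) "/"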

And, this could actually be in RESTBase rather than the underlying core REST API code (that provides the Parsoid endpoints used by RESTBase).

I looked at the mediawiki-title library and it considers "/" as valid.

The log says that RESTBase is trying to reach {"internalURI"=>"http://localhost:6502/w/rest.php/cs.wiktionary.org/v3/page/pagebundle/%2F/1258994", "internalMethod"=>"get"} and getting HTTPError: 500: http_error
https://logstash.wikimedia.org/app/discover#/doc/0fade920-6712-11eb-8327-370b46f9e7a5/ecs-test-1-1.11.0-6-2023.20?id=tK2bL4gB_1-qnHEYGQy_

On scandium, executing NO_PROXY="" no_proxy="" curl -I --proxy scandium.eqiad.wmnet:80 http://cs.wiktionary.org/w/rest.php/cs.wiktionary.org/v3/page/pagebundle/%2F/1258994 returns HTTP/1.1 500 Internal Server Error

Without the -I it's an empty response. Locally, though, I'm able to get a response from that page.

The log for the 500 error says PHP Fatal Error: Allowed memory size of 1468006400 bytes exhausted (tried to allocate 53248 bytes)
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-default-1-7.0.0-1-2023.05.18?id=BquqL4gBs53OSt3dH-Ic
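For what it's worth, 1468006400 bytes is exactly 1400 MiB (1400 × 1024 × 1024), and the failed allocation is tiny (53248 bytes), so the request is creeping all the way up to the configured PHP memory_limit rather than failing on one large allocation.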

So, it must be something about the content of that page that makes Parsoid quickly consume memory. Let's see.

Oh, that went somewhere totally different than I would have expected!

To reproduce it, I needed to enable $wgNamespacesWithSubpages[NS_MAIN] = true; (or pick another namespace where this is enabled already) and then it is as simple as adding a heading to the content of the "/" page.
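For anyone else reproducing this locally, the setup amounts to the following (a minimal sketch, assuming a stock development LocalSettings.php):

// LocalSettings.php: treat "/" in main-namespace titles as a subpage separator
$wgNamespacesWithSubpages[NS_MAIN] = true;

Then edit the page titled "/" so its wikitext contains any heading (e.g. "== Test ==") and request its Parsoid HTML.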

The call to generate anchors gets us into an infinite loop of trying to resolve titles:

#0  Wikimedia\Parsoid\Config\Env->resolveTitle() called at [~/services/parsoid/src/Config/Env.php:673]
#1  Wikimedia\Parsoid\Config\Env->makeTitle() called at [~/services/parsoid/src/Config/Env.php:708]
#2  Wikimedia\Parsoid\Config\Env->makeTitleFromURLDecodedStr() called at [~/services/parsoid/src/Wt2Html/PP/Handlers/Headings.php:114]
#3  Wikimedia\Parsoid\Wt2Html\PP\Handlers\Headings::normalizeSectionName() called at [~/services/parsoid/src/Wt2Html/PP/Handlers/Headings.php:48]
#4  Wikimedia\Parsoid\Wt2Html\PP\Handlers\Headings::genAnchors()

The code for handling subpages evidently doesn't account for this case:
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/src/Config/Env.php#L587-L615
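To illustrate the failure mode, here is a deliberately simplified, hypothetical sketch (this is not Parsoid's actual code; see the Env.php link above for that). With subpages enabled, a title starting with "/" is treated as relative to the current page and resolved by prefixing the page title, and when the page title is itself just "/" that case never stops matching:

// Hypothetical sketch of the non-termination, not the real Env::resolveTitle().
function resolveTitle( string $str, string $pageTitle ): string {
    if ( $str !== '' && $str[0] === '/' ) {
        // Subpage-relative link: prefix the current page title and re-resolve.
        // When $pageTitle is "/", the result still starts with "/", so this
        // recurses forever while the string keeps growing, which matches the
        // memory exhaustion seen in the logs.
        return resolveTitle( $pageTitle . $str, $pageTitle );
    }
    return $str;
}
// resolveTitle( '/', '/' ); // "/" -> "//" -> "///" -> ... until OOM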

So, it is title resolution after all ;-) ... except in the 3rd copy of Title code ... in Parsoid ... rather than in mediawiki-title.

I wish there were a clean solution to this title duplication (across repos) mess.

Arlolra triaged this task as Medium priority.
Arlolra moved this task from Backlog to In Progress on the Content-Transform-Team-WIP board.
Arlolra added a project: Parsoid.
Arlolra moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.

Change 921086 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] No need to resolve subpages when resolving lonely fragments

https://gerrit.wikimedia.org/r/921086

Change 921106 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Punt on trying to resolve subpages on "/"

https://gerrit.wikimedia.org/r/921106
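Judging purely by the patch subjects (the real diffs are in the two changes linked above), the shape of the guards is roughly this hypothetical sketch:

// Hypothetical sketch, not the actual patches.
function resolveTitle( string $str, string $pageTitle ): string {
    if ( $str === '' || $str[0] === '#' ) {
        // Lonely fragment: it refers to the current page, so there is no
        // subpage-relative resolution to do at all.
        return $pageTitle . $str;
    }
    if ( $pageTitle === '/' ) {
        // Punt: subpage resolution relative to "/" cannot terminate.
        return $str;
    }
    // ... normal subpage handling ...
    return $str;
}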

Change 921086 merged by jenkins-bot:

[mediawiki/services/parsoid@master] No need to resolve subpages when resolving lonely fragments

https://gerrit.wikimedia.org/r/921086

Change 921106 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Punt on trying to resolve subpages on "/"

https://gerrit.wikimedia.org/r/921106

Can Parsoid be made to use the MW implementation of title resolution, by injecting an MW specific implementation of some generic title resolver interface?

Perhaps it could, but that would only work in the integrated mode of operation; standalone mode would no longer work. So, it is not a complete solution.
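As a sketch of what that injection could look like (hypothetical names; no such interface exists in Parsoid today):

// Hypothetical: a narrow interface Parsoid could consume. Integrated mode
// would bind it to MediaWiki core's title code; standalone mode would still
// need Parsoid's own implementation as the default, which is the caveat above.
interface TitleResolver {
    /** Resolve a possibly-relative title string against a base page title. */
    public function resolve( string $str, string $basePage ): string;
}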

Change 922148 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a11

https://gerrit.wikimedia.org/r/922148

Change 922148 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a11

https://gerrit.wikimedia.org/r/922148

This hasn't been deployed yet; it'll go out with the train this week.

https://cs.wiktionary.org/api/rest_v1/page/html/%2F now responds with a 200

We get this error on MW 1.40 (specifically, the "PHP Fatal Error: Allowed memory size" error). Would it be possible to backport this fix to MW 1.40?

Change 983536 had a related patch set uploaded (by Paladox; author: Arlolra):

[mediawiki/services/parsoid@REL1_40] No need to resolve subpages when resolving lonely fragments

https://gerrit.wikimedia.org/r/983536

Change 983537 had a related patch set uploaded (by Paladox; author: Arlolra):

[mediawiki/services/parsoid@REL1_40] Punt on trying to resolve subpages on "/"

https://gerrit.wikimedia.org/r/983537

Change 983536 merged by jenkins-bot:

[mediawiki/services/parsoid@REL1_40] No need to resolve subpages when resolving lonely fragments

https://gerrit.wikimedia.org/r/983536

Change 983537 merged by jenkins-bot:

[mediawiki/services/parsoid@REL1_40] Punt on trying to resolve subpages on "/"

https://gerrit.wikimedia.org/r/983537