Redirects on RESTBase testsuite are failing
Closed, ResolvedPublic

Description

We have some failing tests in RESTBase related to Parsoid redirects. An example of a failure is this URL:

https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User:Pchelolo%2fRedirect_Test_Hash

Previously it returned a 3xx redirect with a location header:

< location: Main_Page#Test%23123

Now it returns a 200, where the actual HTML content shows that it's a redirect:

about="https://en.wikipedia.beta.wmflabs.org/wiki/Special:Redirect/revision/381668"

The source of the requests was our CI on github running against deployment-prep.

Event Timeline

From my local installation of restbase pointing to a local MW on current master:

  • For a page that is a redirect
    • curl 127.0.0.1:7231/localhost/v1/page/html/Cat
    • returns status code 200

After I revert this patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/953342

  • For a page that is a redirect
    • curl 127.0.0.1:7231/localhost/v1/page/html/Cat
    • returns status code 301

An example request on testwiki is:

curl -v -o /dev/null https://test.wikipedia.org/api/rest_v1/page/html/Page-redirect-to-main
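
The curl checks in this thread can also be scripted. Below is a minimal Python sketch, not taken from the RESTBase test suite: it sends a HEAD request without following redirects and reports the status code and location header. The function names and the User-Agent string are hypothetical, and the "redirect" rule (any 3xx status with a non-empty location) is an assumption based on the 301/302 responses quoted in this task.

```python
import http.client
from urllib.parse import urlsplit


def head_status_and_location(url, timeout=10.0):
    """Send one HEAD request and return (status, location) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Hypothetical User-Agent; http.client sends none by default.
        conn.request("HEAD", path, headers={"User-Agent": "redirect-check-sketch/0.1"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("location")
    finally:
        conn.close()


def looks_like_redirect(status, location):
    """Assumed rule: a redirect page should answer with 3xx plus a location header."""
    return 300 <= status < 400 and bool(location)


# Offline illustration using the two outcomes reported in this task.
print(looks_like_redirect(302, "Main_Page#Test%23123"))
print(looks_like_redirect(200, None))
```

To run it against a live endpoint, pass one of the URLs from this task, for example `head_status_and_location("https://test.wikipedia.org/api/rest_v1/page/html/Page-redirect-to-main")`, and feed the result to `looks_like_redirect`.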

Looks like after my patch landed, Parsoid is getting empty content to parse. On my local wiki, this is the output Parsoid is returning:

<body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr" data-parsoid='{"dsr":[0,0,0,0]}'>
<section data-mw-section-id="0" id="mwAQ" data-parsoid="{}">
</section>
</body>

Note the zero-width dsr, which implies empty content. Will look there next. So, something is broken in the handoff from ParsoidOutputAccess -> ParserOutputAccess -> Content handlers -> ParsoidParser, which is giving Parsoid empty revision content.
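
The zero-width check can be automated when comparing outputs. The following Python sketch reads the data-parsoid attribute on the body tag and tests whether the dsr range is empty. It assumes that the first two dsr entries are the start and end offsets into the wikitext source; the class and function names are hypothetical and not part of Parsoid.

```python
import json
from html.parser import HTMLParser


class BodyDsrParser(HTMLParser):
    """Collects the data-parsoid "dsr" array from the first <body> tag."""

    def __init__(self):
        super().__init__()
        self.dsr = None

    def handle_starttag(self, tag, attrs):
        if tag == "body" and self.dsr is None:
            raw = dict(attrs).get("data-parsoid")
            if raw:
                self.dsr = json.loads(raw).get("dsr")


def body_is_zero_width(html):
    """Return True when the body's dsr start and end offsets are equal."""
    parser = BodyDsrParser()
    parser.feed(html)
    if parser.dsr is None:
        raise ValueError("no data-parsoid dsr found on <body>")
    # Assumption: dsr[0] and dsr[1] are start/end source offsets.
    return parser.dsr[0] == parser.dsr[1]


# Trimmed copy of the output quoted above.
sample = (
    '<body id="mwAA" lang="en" dir="ltr" data-parsoid=\'{"dsr":[0,0,0,0]}\'>'
    '<section data-mw-section-id="0" id="mwAQ" data-parsoid="{}"></section>'
    "</body>"
)
print(body_is_zero_width(sample))
```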

Change 966578 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/core@master] WIP: Pass full content to Parsoid for redirect pages

https://gerrit.wikimedia.org/r/966578

Parsoid receiving empty content matters because RESTBase looks for the <link rel="mw:PageProp/redirect"> tag in the body to find the redirect target.

From Yiannis: See this and this to see this handling in RESTBase.

While this does not feel like the best strategy, it may have been a solution from the early Parsoid / RESTBase days that has stuck around. So, while we are still using RESTBase and don't want to change this behavior there, we should preserve this output in Parsoid for redirect pages for now.

The patch above should take care of this.
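
To illustrate why the link tag has to survive in the body, here is a rough Python sketch of the kind of lookup described above. It is not RESTBase's actual code (RESTBase is a JavaScript service), and the names are hypothetical. The encoding rule (strip the leading "./" and percent-encode the fragment) is only inferred from the href and location header pair quoted in this task.

```python
from html.parser import HTMLParser
from urllib.parse import quote


class RedirectLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="mw:PageProp/redirect">."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags are routed here by HTMLParser's default handling.
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "mw:PageProp/redirect" and self.href is None:
            self.href = attrs.get("href")


def redirect_location(html):
    """Return a location value for a redirect page, or None if no redirect link exists."""
    parser = RedirectLinkParser()
    parser.feed(html)
    if parser.href is None:
        return None
    target = parser.href
    if target.startswith("./"):
        target = target[2:]
    title, sep, fragment = target.partition("#")
    # Assumption: only the fragment is re-encoded, matching the sample header.
    return title + ("#" + quote(fragment, safe="") if sep else "")


body = (
    '<body id="mwAA"><section data-mw-section-id="0" id="mwAQ">'
    '<link rel="mw:PageProp/redirect" href="./Main_Page#Test#123" id="mwAg"/>'
    "</section></body>"
)
print(redirect_location(body))
print(redirect_location("<body><section></section></body>"))
```

With the empty body that Parsoid produced during the regression, the lookup finds no link, which is consistent with RESTBase falling back to a plain 200 response.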

brennen triaged this task as Unbreak Now! priority. Oct 17 2023, 4:56 PM
brennen added a project: User-brennen.
brennen subscribed.

Raising to UBN as train blocker.

Change 966245 had a related patch set uploaded (by Brennen Bearnes; author: Subramanya Sastry):

[mediawiki/core@wmf/1.42.0-wmf.1] Pass full content to Parsoid for redirect pages

https://gerrit.wikimedia.org/r/966245

Change 966578 merged by jenkins-bot:

[mediawiki/core@master] Pass full content to Parsoid for redirect pages

https://gerrit.wikimedia.org/r/966578

Looks fixed now:

╭─subbu@earth ~/work/wmf/core  ‹master*› 
╰─➤  curl -I https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User:Pchelolo%2fRedirect_Test_Hash
HTTP/2 302 
content-type: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.8.0"
content-language: en
cache-control: s-maxage=1209600, max-age=0, must-revalidate
location: Main_Page#Test%23123
access-control-allow-origin: *
access-control-allow-methods: GET,HEAD
access-control-allow-headers: accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding
access-control-expose-headers: etag
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
referrer-policy: origin-when-cross-origin
x-xss-protection: 1; mode=block
content-security-policy: default-src 'none'; media-src *; img-src *; style-src *;frame-ancestors 'self'
x-content-security-policy: default-src 'none'; media-src *; img-src *; style-src *;frame-ancestors 'self'
x-webkit-csp: default-src 'none'; media-src *; img-src *; style-src *;frame-ancestors 'self'
server: deployment-restbase04
date: Tue, 17 Oct 2023 17:47:27 GMT
etag: W/"381668/380ab270-6d15-11ee-9560-6f7341f62882"
vary: Accept, origin,X-Forwarded-Proto, Accept-Encoding
age: 0
x-cache: deployment-cache-text08 miss, deployment-cache-text08 miss
x-cache-status: miss
server-timing: cache;desc="miss", host;desc="deployment-cache-text08"
report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
set-cookie: WMF-Last-Access=17-Oct-2023;Path=/;HttpOnly;secure;Expires=Sat, 18 Nov 2023 12:00:00 GMT
set-cookie: WMF-Last-Access-Global=17-Oct-2023;Path=/;Domain=.wikipedia.beta.wmflabs.org;HttpOnly;secure;Expires=Sat, 18 Nov 2023 12:00:00 GMT
x-client-ip: 65.128.138.73
set-cookie: GeoIP=US:MN:Saint_Paul:44.96:-93.17:v4; Path=/; secure; Domain=.beta.wmflabs.org
set-cookie: NetworkProbeLimit=0.001;Path=/;Secure;Max-Age=3600
content-length: 1514

╭─subbu@earth ~/work/wmf/core  ‹master*› 
╰─➤  curl https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User:Pchelolo%2fRedirect_Test_Hash
<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/" about="https://en.wikipedia.beta.wmflabs.org/wiki/Special:Redirect/revision/381668"><head prefix="mwr: https://en.wikipedia.beta.wmflabs.org/wiki/Special:Redirect/"><meta property="mw:TimeUuid" content="380ab270-6d15-11ee-9560-6f7341f62882"/><meta charset="utf-8"/><meta property="mw:pageId" content="174037"/><meta property="mw:pageNamespace" content="2"/><link rel="dc:replaces" resource="mwr:revision/331866"/><meta property="mw:revisionSHA1" content="326fd256d2f3e26a0e9bb392d70f5ddb2ee0e402"/><meta property="dc:modified" content="2018-07-04T16:39:10.000Z"/><meta property="mw:htmlVersion" content="2.8.0"/><meta property="mw:html:version" content="2.8.0"/><link rel="dc:isVersionOf" href="https://en.wikipedia.beta.wmflabs.org/wiki/User%3APchelolo/Redirect_Test_Hash"/><base href="https://en.wikipedia.beta.wmflabs.org/wiki/"/><title>User:Pchelolo/Redirect Test Hash</title><link rel="stylesheet" href="/w/load.php?lang=en&amp;modules=mediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Csite.styles&amp;only=styles&amp;skin=vector"/><meta http-equiv="content-language" content="en"/><meta http-equiv="vary" content="Accept"/></head><body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><section data-mw-section-id="0" id="mwAQ"><link rel="mw:PageProp/redirect" href="./Main_Page#Test#123" id="mwAg"/></section></body></html>%

Change 966245 merged by jenkins-bot:

[mediawiki/core@wmf/1.42.0-wmf.1] Pass full content to Parsoid for redirect pages

https://gerrit.wikimedia.org/r/966245

Mentioned in SAL (#wikimedia-operations) [2023-10-17T18:08:36Z] <brennen@deploy2002> Started scap: Backport for [[gerrit:966245|Pass full content to Parsoid for redirect pages (T349087)]]

Mentioned in SAL (#wikimedia-operations) [2023-10-17T18:09:56Z] <brennen@deploy2002> brennen: Backport for [[gerrit:966245|Pass full content to Parsoid for redirect pages (T349087)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-10-17T18:16:18Z] <brennen@deploy2002> Finished scap: Backport for [[gerrit:966245|Pass full content to Parsoid for redirect pages (T349087)]] (duration: 07m 42s)

ssastry claimed this task.

An example request on testwiki is:

curl -v -o /dev/null https://test.wikipedia.org/api/rest_v1/page/html/Page-redirect-to-main
╭─subbu@earth ~/work/wmf/core  ‹master*› 
╰─➤  curl -I https://test.wikipedia.org/api/rest_v1/page/html/Page-redirect-to-main
HTTP/2 302

Verified fixed!

Change 972020 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/core@REL1_41] Pass full content to Parsoid for redirect pages

https://gerrit.wikimedia.org/r/972020

Change 972020 merged by jenkins-bot:

[mediawiki/core@REL1_41] Pass full content to Parsoid for redirect pages

https://gerrit.wikimedia.org/r/972020

Change #967324 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] phpunit: Fix tests relying on implicit wgScript/wgArticlePath

https://gerrit.wikimedia.org/r/967324