
HyperSwitch/errors/not found (404) on beta cluster: There was an issue displaying this preview
Closed, Resolved | Public | BUG REPORT

Description

Since the beta cluster was switched over to beta.wmcloud.org (T289318: Move *.beta.wmflabs.org to *.beta.wmcloud.org), some key APIs used in our products are not available, which breaks those products and their associated integration tests (e.g. Popups, the previews that show on a page).

Screenshot 2025-08-18 at 11.35.18 AM.png (912×1 px, 208 KB)

Steps to replicate the issue (include links if applicable):

What happens?:

404 (route not found)

$ curl -s https://en.wikipedia.beta.wmcloud.org/api/rest_v1/page/summary/Polar_bear | jq .
{
  "type": "https://mediawiki.org/wiki/HyperSwitch/errors/not_found#route",
  "title": "Not found.",
  "method": "get",
  "uri": "/en.wikipedia.beta.wmcloud.org/v1/page/summary/Polar_bear"
}

What should have happened instead?:

This should match https://en.wikipedia.org/api/rest_v1/page/summary/Polar_bear

Event Timeline

Jdlrobson-WMF renamed this task from REST API not available on beta cluster to REST API not available on beta cluster: There was an issue displaying this preview. Aug 18 2025, 6:35 PM
Jdlrobson-WMF updated the task description.

I think the problem might be the way MW on beta talks to restbase.

On deployment-restbase05:

curl localhost:7231/en.wikipedia.beta.wmflabs.org/v1/page/summary/Polar_bear

returns the summary for Polar bear

Ideally, since RB is no longer used in production (other than for serving mathoid), we should replicate that behaviour in the beta cluster too.

bd808 renamed this task from REST API not available on beta cluster: There was an issue displaying this preview to HyperSwitch/errors/not found (404) on beta cluster: There was an issue displaying this preview. Aug 20 2025, 5:33 PM
bd808 updated the task description.
Krinkle subscribed.

I'm guessing that in production, even though it uses the URL https://en.wikipedia.org/api/rest_v1/page/summary/Rose_Cleveland, this is diverted by the REST Gateway instance of Envoy, to either a /w/rest.php endpoint, or to a standalone Node.js service.

If the former, then perhaps the Popups extension should be configured to call the new URL under the MW REST API directly instead. This is what VisualEditor does nowadays.

If the latter, then I suppose there is no way to hit that directly from any public URL, so that would require the CDN layer to divert these. Two ways come to mind:

  1. ATS plugin + REST Gateway — This would be most like prod
    • in ATS, keep the rb-mw-mangling plugin, which rewrites part of the RESTBase URL already.
    • in ATS, add a beta version of the gateway-check.lua plugin to direct various subpaths under /api/rest_v1/ to rest-gateway
    • set up REST Gateway somewhere (Helm chart). This may be non-trivial to do in Beta. I suspect it would not actually be able to reuse much since it seems fairly specific to production, and the indirection would presumably do very little. In prod it takes care of monitoring and in theory might do throttling, but we might not need that in Beta.
  2. ATS plugin only — This would be similar to prod but simpler.
    • in ATS, keep the rb-mw-mangling plugin, which rewrites part of the RESTBase URL already.
    • in ATS, expand the rb-mw-mangling-beta plugin to add more rewrites. This would be similar to what the gateway-check.lua plugin and rest-gateway service do together in prod. In other words, instead of rewriting /api/rest_v1/page/summary/(.*) to a rest-gateway call (which then proxies to mobileapps service), rewrite it directly to wherever the mobileapps service runs in beta.

There appears to be no "REST Gateway" tag in Phabricator.

I think the problem might be the way MW on beta talks to restbase.

On deployment-restbase05:

curl localhost:7231/en.wikipedia.beta.wmflabs.org/v1/page/summary/Polar_bear

returns the summary for Polar bear

The MediaWiki part here is the Popups extension, which uses client-side JavaScript to fetch https://en.wikipedia.beta.wmcloud.org/api/rest_v1/page/summary/Polar_bear, which looks correct. That's as far as the MediaWiki part of this goes, I think? Everything else happens in Varnish/ATS/RESTBase/mobileapps:

  • MediaWiki: https://en.wikipedia.beta.wmcloud.org/api/rest_v1/page/summary/Polar_bear
  • varnish: (mostly no-op, various normalization)
  • ATS: mapping_rules for http://(.*)/api/rest_v1
    • replacement: http://deployment-restbase05.deployment-prep.eqiad1.wikimedia.cloud:7231/api/rest_v1
    • rb-mw-mangling.lua plugin: set path to /en.wikipedia.beta.wmcloud.org/v1/page/summary/Polar_bear
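As a rough illustration of the path rewrite in the last step (the real plugin is Lua running inside ATS; this Python sketch only mirrors the input and output shown above, and the function name is hypothetical):

```python
def mangle_restbase_path(host, path):
    """Sketch of the rewrite: /api/rest_v1/<rest> -> /<host>/v1/<rest>.

    Assumption: the public Host header is used verbatim as the domain
    segment. Paths outside /api/rest_v1/ are returned unchanged.
    """
    prefix = "/api/rest_v1/"
    if not path.startswith(prefix):
        return path
    return f"/{host}/v1/{path[len(prefix):]}"


print(mangle_restbase_path(
    "en.wikipedia.beta.wmcloud.org",
    "/api/rest_v1/page/summary/Polar_bear",
))
```

The result is the same string that appears in the "uri" field of the 404 response in the task description, which suggests the request does reach RESTBase with the new domain in its path.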

Where does deployment-restbase05 get its list of accepted wikis/hostnames from? I don't see RESTBase-related references to beta.wmflabs.org in Code Search, but I'm guessing it is working with an outdated list.

tgr@deployment-restbase05:~$ sudo systemctl status restbase.service 
...
             ├─ 594 /usr/bin/nodejs restbase/server.js -c /etc/restbase/config.yaml
...
tgr@deployment-restbase05:~$ cat /etc/restbase/config.yaml
...
  paths:

    # BetaCluster
    /{domain:aa.wikipedia.beta.wmflabs.org}: *wikipedia.org
    /{domain:api.wikimedia.beta.wmflabs.org}: *default_project
...

Seems to be coming from https://gerrit.wikimedia.org/g/mediawiki/services/restbase/deploy/+/1586262e70251e81a12ea0f01482b7e45e2b683c/scap/environments/beta/vars.yaml
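
To illustrate why the request from the task description would end in not_found#route with such a config, here is a simplified sketch. How HyperSwitch actually matches the {domain:…} path keys is an assumption here; it is reduced to plain set membership, only the two domains from the excerpt above are listed, and the function name is made up.

```python
# Domains as they appear in the config excerpt (old wmflabs suffix only).
CONFIGURED_DOMAINS = {
    "aa.wikipedia.beta.wmflabs.org",
    "api.wikimedia.beta.wmflabs.org",
}


def route(path):
    """Return (status, body) for a /{domain}/v1/... request path."""
    domain = path.lstrip("/").split("/", 1)[0]
    if domain not in CONFIGURED_DOMAINS:
        # Same shape as the error body quoted in the task description.
        return 404, {
            "type": "https://mediawiki.org/wiki/HyperSwitch/errors/not_found#route",
            "title": "Not found.",
            "method": "get",
            "uri": path,
        }
    return 200, {}


status, body = route("/en.wikipedia.beta.wmcloud.org/v1/page/summary/Polar_bear")
print(status, body["type"])
```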

In production the request path is:

  • Edge (ATS/Varnish) [1]
    • For specific matches of /api/rest_v1/<path>
  • Rest gateway [2]
  • Backing services (for summary that is [3])

[1] https://gerrit.wikimedia.org/g/operations/puppet/+/2aec0180236bb72ae8fe37b767f824f475104ce4/modules/profile/files/trafficserver/gateway-check.lua.conf#24
[2] https://gerrit.wikimedia.org/g/operations/deployment-charts/+/a45a5e5fdec12c161563938edf2c9394167f5fd6/helmfile.d/services/rest-gateway/values.yaml#74
[3] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/mobileapps

Would it make sense to have a deployment of rest gateway in deployment-prep to replicate the same functionality ?

Change #1180864 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/services/restbase/deploy@master] Add new beta domains to domain list

https://gerrit.wikimedia.org/r/1180864

Would it make sense to have a deployment of rest gateway in deployment-prep to replicate the same functionality ?

It would certainly make sense, if someone is willing to do it, and then fix it if it breaks.

In production the request path is:

  • Edge (ATS/Varnish) [1]
    • For specific matches of /api/rest_v1/<path>
  • Rest gateway [2]
  • Backing services (for summary that is [3])

[…]
Would it make sense to have a deployment of rest gateway in deployment-prep to replicate the same functionality ?

I would recommend a slightly simpler setup that should achieve the same, with less maintenance, less initial setup, and less risk of breakage. That is, if this can indeed work. I don't fully understand it myself.

[…] Two ways come to mind:

  1. ATS plugin + REST Gateway — This would be most like prod
    • in ATS, keep the rb-mw-mangling plugin, which rewrites part of the RESTBase URL already.
    • in ATS, add a beta version of the gateway-check.lua plugin to direct various subpaths under /api/rest_v1/ to rest-gateway
    • set up REST Gateway somewhere (Helm chart). This may be non-trivial to do in Beta. I suspect it would not actually be able to reuse much since it seems fairly specific to production, and the indirection would presumably do very little. In prod it takes care of monitoring and in theory might do throttling, but we might not need that in Beta.
  2. ATS plugin only — This would be similar to prod but simpler.
    • in ATS, keep the rb-mw-mangling plugin, which rewrites part of the RESTBase URL already.
    • in ATS, expand the rb-mw-mangling-beta plugin to add more rewrites. This would be similar to what the gateway-check.lua plugin and rest-gateway service do together in prod. In other words, instead of rewriting /api/rest_v1/page/summary/(.*) to a rest-gateway call (which then proxies to mobileapps service), rewrite it directly to wherever the mobileapps service runs in beta.

I'm trying to understand the mobileapps service, but it doesn't seem to like my requests.

krinkle@deployment-mediawiki13

# What should work
$ curl -i http://deployment-docker-mobileapps02.deployment-prep.eqiad1.wikimedia.cloud:8888/en.wikipedia.beta.wmcloud.org/v1/page/summary/Main_Page
HTTP/1.1 404 Not Found
…
Domain not allowed

# What may have worked before
$ curl -i http://deployment-docker-mobileapps02.deployment-prep.eqiad1.wikimedia.cloud:8888/en.wikipedia.org/v1/page/summary/Main_Page

HTTP/1.1 404 Not Found
…
Domain not allowed

This led me to these recent changes:

16 Apr 2025

Use middleware to restrict access to public endpoints by project
https://gerrit.wikimedia.org/r/c/mediawiki/services/mobileapps/+/1136658

Which led me to this seemingly hardcoded list of domains in mediawiki/services/mobileapps.git:/lib/wmf-projects.js where we basically only allow *.(wikipedia|wikimedia|…).org$.

And the config in deployment-docker-mobileapps02:/etc/mediawiki-services-mobileapps/config.yaml (Hiera history) does not seem to disable this check or otherwise supply its own list of domains.
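
A minimal sketch of what such a production-only suffix check does to Beta hostnames. This is not the actual code from lib/wmf-projects.js; the pattern below is a hypothetical stand-in that lists only two projects, and the function name is made up.

```python
import re

# Hypothetical stand-in for the production pattern. The real project list
# in lib/wmf-projects.js is longer than the two names used here.
PROD_DOMAIN_RE = re.compile(r"^[a-z0-9-]+\.(wikipedia|wikimedia)\.org$")


def is_domain_allowed(domain):
    """True only for <label>.<project>.org, i.e. production-style hosts."""
    return PROD_DOMAIN_RE.match(domain) is not None


for domain in (
    "en.wikipedia.org",
    "en.wikipedia.beta.wmflabs.org",
    "en.wikipedia.beta.wmcloud.org",
):
    print(domain, is_domain_allowed(domain))
```

Under this kind of pattern, both the old and the new Beta suffix are rejected, which would be consistent with the wmflabs to wmcloud move not being the trigger.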

$ curl -i http://deployment-docker-mobileapps02.deployment-prep.eqiad1.wikimedia.cloud:8888/en.wikipedia.org/v1/page/summary/Main_Page
HTTP/1.1 404 Not Found
…

{"status":404,"type":"Internal error"}

This is a slightly different error. Anyway, that doesn't matter, but it suggests we are indeed using the production domain regex in beta.

This also suggests that the beta.wmflabs.org > beta.wmcloud.org transition is not the trigger for this breakage, as it already doesn't work.

However, that brings us back to the example @Jgiannelos shared, which certainly appeared to work:

On deployment-restbase05:

curl localhost:7231/en.wikipedia.beta.wmflabs.org/v1/page/summary/Polar_bear

returns the summary for Polar bear

Upon closer inspection, this is a stale copy, presumably from RESTBase storage (Cassandra?) rather than a live request from the page/summary endpoint of the mobileapps service.

krinkle@deployment-restbase05:~$ curl localhost:7231/en.wikipedia.beta.wmflabs.org/v1/page/summary/Polar_bear | jq
{
  …
  "revision": "650733",
  "tid": "1c18ed6d-216e-11f0-82da-33c0bac9164e",
  "timestamp": "2025-04-25T00:42:05Z",
  …
}

Last three revisions at https://en.wikipedia.beta.wmcloud.org/w/index.php?title=Polar_bear&action=history:

  • 655492 (15 July 2025)
  • 655491 (15 July 2025)
  • 650733 (25 April 2025)
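
The staleness check above boils down to comparing the revision in the summary response with the latest revision in the page history. A small sketch, using the values quoted above (the helper name is hypothetical):

```python
def is_stale(summary, latest_revision):
    """True if the summary was rendered from an older revision.

    The summary response carries "revision" as a string, so it is
    converted before comparing with the numeric revision ID.
    """
    return int(summary["revision"]) < latest_revision


cached = {"revision": "650733", "timestamp": "2025-04-25T00:42:05Z"}
print(is_stale(cached, 655492))
```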

I've also purged, deleted, and undeleted it, and nothing happened as a result of that. Anyway, that is likely due to the RESTBase config being outdated after the domain change, but that merely allows new requests to start reaching the next problem, which is that the mobileapps service isn't accepting any requests in Beta Cluster.

Change #1182652 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/puppet@production] trafficserver: Add missing REST Gateway for Beta Cluster

https://gerrit.wikimedia.org/r/1182652

Any ideas on the timeline for this? I am a little concerned that the page previews feature is not getting smoke test coverage, and I am wondering whether I need to invest time in rewriting the test to work with production APIs.

My idea/patch here is a long-term improvement to solve a pre-existing problem, and would make this issue go away as well. That is volunteer work and has no timeline.

The immediate issue here is RESTBase not responding to beta.wmcloud.org. I'm not familiar with how that works, but it looks like Gergo and Yiannis were onto the root cause and there's a configuration patch by Gergo to fix this. I could not find documentation on how to deploy RESTBase changes to the Beta Cluster, or I would have done this myself already. Perhaps ask @Jgiannelos / Content Transform Team for help?

Note that after this fix, the next issue will be that RESTBase has not been updating content since April 2025 (two months before the wmflabs/wmcloud transition). I expect that, after updating the RESTBase configuration to recognise beta.wmcloud.org, there will still be no page previews in Beta, because there is no four-month-old copy under those names.

Do the tests assert any particular summary text being displayed? (e.g. based on an edit or newly created page?)

I could not find documentation on how to deploy RESTBase changes to the Beta Cluster

I think you just need to run scap deploy from /srv/deployment/restbase/deploy/?
(Maybe something does that automatically, although if so, I couldn't find it.)

Certainly would be good to have it documented at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated.

Change #1186950 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/mobileapps@master] Fix domains suffix for beta deployment

https://gerrit.wikimedia.org/r/1186950

After applying this ^ patch locally in the deployment-prep and after allowing the new domain in the config I get this from PCS on deployment-prep:

jgiannelos@deployment-docker-mobileapps02:~$ curl localhost:8889/en.wikipedia.beta.wmcloud.org/v1/page/summary/Earth | jq .

{
  "type": "standard",
  "title": "Earth",
  "displaytitle": "<span class=\"mw-page-title-main\">Earth</span>",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "wikibase_item": "Q81566",
  "titles": {
    "canonical": "Earth",
    "normalized": "Earth",
    "display": "<span class=\"mw-page-title-main\">Earth</span>"
  },
  "pageid": 89382,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/The_Blue_Marble_%28remastered%29.jpg/330px-The_Blue_Marble_%28remastered%29.jpg",
    "width": 320,
    "height": 320
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/c/cb/The_Blue_Marble_%28remastered%29.jpg",
    "width": 3000,
    "height": 3000
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "586831",
  "tid": "2a95f501-fec0-11ed-8ce6-86f854a8d480",
  "timestamp": "2023-05-30T08:01:22Z",
  "description": "Third planet from the Sun",
  "description_source": "local",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth",
      "revisions": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth?action=history",
      "edit": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth?action=edit",
      "talk": "https://en.wikipedia.beta.wmcloud.org/wiki/Talk:Earth"
    },
    "mobile": {
      "page": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth",
      "revisions": "https://en.wikipedia.beta.wmcloud.org/wiki/Special:History/Earth",
      "edit": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth?action=edit",
      "talk": "https://en.wikipedia.beta.wmcloud.org/wiki/Talk:Earth"
    }
  },
  "extract": "Earth is the third planet from the Sun and the only place known in the universe where life has originated and found habitability. While Earth may not contain the largest volumes of water in the Solar System, only Earth sustains liquid surface water, extending over 70.8% of the planet with its ocean, making it an ocean world. The polar regions currently retain most of all other water with large sheets of ice covering ocean and land, dwarfing Earth's groundwater, lakes, rivers and atmospheric water. The other 29.2% of the Earth's surface is land, consisting of continents and islands, and is widely covered by vegetation. Below the planet's surface lies the crust, consisting of several slowly moving tectonic plates, which interact to produce mountain ranges, volcanoes, and earthquakes. Inside the Earth's crust is a liquid outer core that generates the magnetosphere, deflecting most of the destructive solar winds and cosmic radiation.",
  "extract_html": "<p><b>Earth</b> is the third planet from the Sun and the only place known in the universe where life has originated and found habitability. While Earth may not contain the largest volumes of water in the Solar System, only Earth sustains liquid surface water, extending over 70.8% of the planet with its ocean, making it an ocean world. The polar regions currently retain most of all other water with large sheets of ice covering ocean and land, dwarfing Earth's groundwater, lakes, rivers and atmospheric water. The other 29.2% of the Earth's surface is land, consisting of continents and islands, and is widely covered by vegetation. Below the planet's surface lies the crust, consisting of several slowly moving tectonic plates, which interact to produce mountain ranges, volcanoes, and earthquakes. Inside the Earth's crust is a liquid outer core that generates the magnetosphere, deflecting most of the destructive solar winds and cosmic radiation.</p>"
}

Change #1186950 merged by jenkins-bot:

[mediawiki/services/mobileapps@master] Fix domains suffix for beta deployment

https://gerrit.wikimedia.org/r/1186950

I just verified it; PCS on deployment-prep should be back to a working state.

Thanks for looking at this! I'm still not seeing it on https://en.wikipedia.beta.wmcloud.org/api/rest_v1/page/summary/Polar_bear - does this need to be deployed still or am I using the wrong URL?

Change #1180864 merged by Jgiannelos:

[mediawiki/services/restbase/deploy@master] Add new beta domains to domain list

https://gerrit.wikimedia.org/r/1180864

I ran scap on deployment-deploy04:

scap deploy --environment beta

It looks like it's working with cache-control: no-cache, which invalidates caches:

curl -v http://deployment-restbase05.deployment-prep.eqiad1.wikimedia.cloud:7231/en.wikipedia.beta.wmcloud.org/v1/page/summary/Earth -H "cache-control: no-cache" | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.16.4.101:7231...
* Connected to deployment-restbase05.deployment-prep.eqiad1.wikimedia.cloud (172.16.4.101) port 7231 (#0)
> GET /en.wikipedia.beta.wmcloud.org/v1/page/summary/Earth HTTP/1.1
> Host: deployment-restbase05.deployment-prep.eqiad1.wikimedia.cloud:7231
> User-Agent: curl/7.74.0
> Accept: */*
> cache-control: no-cache
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200
< etag: "586831/d20c0130-8ef1-11f0-b5f2-2df501353921"
< cache-control: s-maxage=1209600, max-age=300
< content-language: en
< content-type: application/json; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/Summary/1.5.0"
< vary: x-restbase-compat, Accept-Encoding
< content-location: https://en.wikipedia.beta.wmcloud.org/api/rest_v1/page/summary/Earth
< access-control-allow-origin: *
< access-control-allow-methods: GET,HEAD
< access-control-allow-headers: accept, content-type, content-length, cache-control, accept-language, api-user-agent, if-match, if-modified-since, if-none-match, dnt, accept-encoding
< access-control-expose-headers: etag
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
< referrer-policy: origin-when-cross-origin
< x-xss-protection: 1; mode=block
< content-security-policy: default-src 'none'; frame-ancestors 'none'
< x-content-security-policy: default-src 'none'; frame-ancestors 'none'
< x-webkit-csp: default-src 'none'; frame-ancestors 'none'
< x-request-id: d18cbf10-8ef1-11f0-859a-4327c217d1e3
< server: deployment-restbase05
< content-length: 3321
< Date: Thu, 11 Sep 2025 09:29:31 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
{ [3321 bytes data]
100  3321  100  3321    0     0   3452      0 --:--:-- --:--:-- --:--:--  3448
* Connection #0 to host deployment-restbase05.deployment-prep.eqiad1.wikimedia.cloud left intact
{
  "type": "standard",
  "title": "Earth",
  "displaytitle": "<span class=\"mw-page-title-main\">Earth</span>",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "wikibase_item": "Q81566",
  "titles": {
    "canonical": "Earth",
    "normalized": "Earth",
    "display": "<span class=\"mw-page-title-main\">Earth</span>"
  },
  "pageid": 89382,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/The_Blue_Marble_%28remastered%29.jpg/330px-The_Blue_Marble_%28remastered%29.jpg",
    "width": 320,
    "height": 320
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/c/cb/The_Blue_Marble_%28remastered%29.jpg",
    "width": 3000,
    "height": 3000
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "586831",
  "tid": "2a95f503-fec0-11ed-9b92-699017414826",
  "timestamp": "2023-05-30T08:01:22Z",
  "description": "Third planet from the Sun",
  "description_source": "local",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth",
      "revisions": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth?action=history",
      "edit": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth?action=edit",
      "talk": "https://en.wikipedia.beta.wmcloud.org/wiki/Talk:Earth"
    },
    "mobile": {
      "page": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth",
      "revisions": "https://en.wikipedia.beta.wmcloud.org/wiki/Special:History/Earth",
      "edit": "https://en.wikipedia.beta.wmcloud.org/wiki/Earth?action=edit",
      "talk": "https://en.wikipedia.beta.wmcloud.org/wiki/Talk:Earth"
    }
  },
  "extract": "Earth is the third planet from the Sun and the only place known in the universe where life has originated and found habitability. While Earth may not contain the largest volumes of water in the Solar System, only Earth sustains liquid surface water, extending over 70.8% of the planet with its ocean, making it an ocean world. The polar regions currently retain most of all other water with large sheets of ice covering ocean and land, dwarfing Earth's groundwater, lakes, rivers and atmospheric water. The other 29.2% of the Earth's surface is land, consisting of continents and islands, and is widely covered by vegetation. Below the planet's surface lies the crust, consisting of several slowly moving tectonic plates, which interact to produce mountain ranges, volcanoes, and earthquakes. Inside the Earth's crust is a liquid outer core that generates the magnetosphere, deflecting most of the destructive solar winds and cosmic radiation.",
  "extract_html": "<p><b>Earth</b> is the third planet from the Sun and the only place known in the universe where life has originated and found habitability. While Earth may not contain the largest volumes of water in the Solar System, only Earth sustains liquid surface water, extending over 70.8% of the planet with its ocean, making it an ocean world. The polar regions currently retain most of all other water with large sheets of ice covering ocean and land, dwarfing Earth's groundwater, lakes, rivers and atmospheric water. The other 29.2% of the Earth's surface is land, consisting of continents and islands, and is widely covered by vegetation. Below the planet's surface lies the crust, consisting of several slowly moving tectonic plates, which interact to produce mountain ranges, volcanoes, and earthquakes. Inside the Earth's crust is a liquid outer core that generates the magnetosphere, deflecting most of the destructive solar winds and cosmic radiation.</p>"
}
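
For reference, the cache-control response header in the output above (s-maxage=1209600, max-age=300) can be read as follows. This is a small sketch with a hypothetical helper; it handles only simple comma-separated directives, not quoted values.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header into a dict of directives.

    Numeric arguments become ints; bare directives map to True.
    """
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        directives[name.lower()] = int(arg) if arg.isdigit() else (arg or True)
    return directives


cc = parse_cache_control("s-maxage=1209600, max-age=300")
print(cc["s-maxage"] // 86400, "days in shared caches")
print(cc["max-age"] // 60, "minutes in browsers")
```

So even with RESTBase fixed, a shared cache in front of it may keep serving an older response for up to 14 days unless it is purged or bypassed.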

FWIW I strongly suggest we stop relying on RB on beta, because the more time passes, the more the prod and beta environments are going to diverge.

Note @Jgiannelos: beta cluster is pointing to production now, but given that https://en.wikipedia.beta.wmcloud.org/api/rest_v1/page/summary/Polar_bear now returns the summary, I can change that back and we can consider this resolved. Thanks!

FWIW I strongly suggest we stop relying on RB on beta, because the more time passes, the more the prod and beta environments are going to diverge.

What would this mean for the summary endpoint? Is there a plan for moving this API off of Node.js RESTBase? I am not sure who maintains it right now.

PCS is off RESTBase. Summary is one of the endpoints of the PCS service [1]. In production it is currently served via rest-gateway.
RESTBase was a way to expose APIs (routing + some logic) and it's deprecated. It only exists because mathoid is still served via RB; other than that, all of the endpoints that RB was serving have been migrated outside RB.

The Content Transform team and the mobile apps teams maintain the PCS service.

tl;dr The way we expose services has changed (restbase deprecation), not the services per se.

[1] https://gerrit.wikimedia.org/g/mediawiki/services/mobileapps
