
Spike: Investigate LanguageConverter redirects for language variants
Open, Low, Public

Description

Background Information

To improve PCS performance, we need to understand why we have to perform wiki redirect resolution for titles. RESTBase is not equipped to resolve redirects created by MediaWiki-Language-converter. For example, https://sr.wikipedia.org/wiki/%D0%A1%D0%B8%D1%80 and https://sr.wikipedia.org/wiki/Sir point to the same wiki page.
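To make the example concrete: the percent-encoded path segment in the first URL decodes to a Cyrillic title, while the second URL uses the Latin-script form. The following is a minimal sketch using only the Python standard library; the helper name `title_from_url` is hypothetical and not part of any MediaWiki or PCS code.

```python
from urllib.parse import unquote, urlsplit


def title_from_url(url: str) -> str:
    """Return the decoded page title from a /wiki/<title> URL (hypothetical helper)."""
    path = urlsplit(url).path  # the path is still percent-encoded at this point
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /wiki/ URL: {url}")
    # unquote() decodes the %XX escapes as UTF-8 by default
    return unquote(path[len(prefix):])


cyrillic = title_from_url("https://sr.wikipedia.org/wiki/%D0%A1%D0%B8%D1%80")
latin = title_from_url("https://sr.wikipedia.org/wiki/Sir")
```

The two decoded titles are different strings, so anything that keys on the raw title will treat them as different pages unless variant resolution happens somewhere.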

Open Questions

  • Can this be removed from PCS?

Event Timeline

This is an architecture/design issue, and probably applies to the API Gateway more generally. A number of APIs use "title" as a parameter, and MediaWiki core does a relatively large amount of title remapping: space to underscore, capitalization (which can be quite tricky in certain languages), language variant redirects (which depend on the presence of certain pages on the wiki), and of course explicit #REDIRECT. It would be worth handling these in a uniform way for "all" MediaWiki APIs which don't live in core; inside core, most of this is magically handled by existing code.
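As a rough illustration of the two simplest remappings mentioned above (space to underscore, and first-letter capitalization), here is a deliberately naive sketch. It is an assumption-laden approximation and not MediaWiki's actual logic: the function name `normalize_title` is hypothetical, and it ignores namespaces, per-wiki and per-language casing rules, language variants, and explicit #REDIRECT pages.

```python
def normalize_title(title: str) -> str:
    """Naive approximation of basic title normalization (hypothetical helper)."""
    # Treat underscores and whitespace as equivalent, trim the ends,
    # and collapse runs into a single underscore.
    text = "_".join(title.replace("_", " ").split())
    # Upper-case only the first character; the rest is left untouched.
    # Real wikis apply language-specific rules here, which this ignores.
    return text[:1].upper() + text[1:]
```

A service outside core that wanted uniform behaviour would more likely ask MediaWiki to resolve the title than reimplement rules like these, since the hard cases are exactly the ones this sketch skips.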

The MW REST endpoints (at least the ones under /page/) are designed to make title resolution explicit to the caller by responding with a redirect, to avoid duplicate caching. It is up to the client to decide how to handle that redirect. End user clients would follow them, but something like PCS should pass them on to its caller, converting them into an equivalent redirect to its own API. IIRC @Jgiannelos worked on this a while ago.
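One way to picture "converting them into an equivalent redirect to its own API" is a small mapping from the upstream Location to a path on the intermediate service. This is only a sketch under stated assumptions: `PCS_TEMPLATE` is a hypothetical route chosen for illustration, `translate_redirect` is not an existing PCS function, and the upstream target is assumed to always have the /w/rest.php/v1/page/{title}/html shape used in this thread.

```python
from urllib.parse import urlsplit

CORE_PREFIX = "/w/rest.php/v1/page/"  # MW REST path as used in this thread
CORE_SUFFIX = "/html"
PCS_TEMPLATE = "/api/rest_v1/page/mobile-html/{title}"  # hypothetical route


def translate_redirect(status: int, location: str) -> tuple[int, str]:
    """Map an upstream redirect onto an equivalent redirect to our own API."""
    if status not in (301, 302, 307, 308):
        raise ValueError(f"not a redirect status: {status}")
    path = urlsplit(location).path  # works for absolute and relative Locations
    if not (path.startswith(CORE_PREFIX) and path.endswith(CORE_SUFFIX)):
        raise ValueError(f"unexpected redirect target: {location}")
    # Keep the title percent-encoded exactly as the upstream sent it.
    title = path[len(CORE_PREFIX):-len(CORE_SUFFIX)]
    if not title:
        raise ValueError(f"no title in redirect target: {location}")
    # Preserve the status code so caches see the same permanence semantics.
    return status, PCS_TEMPLATE.format(title=title)
```

Preserving the upstream status code means the caller can still tell a normalization redirect from a wiki redirect.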

Title normalization (x -> X, space to underscore, etc.) will result in a 301 redirect. Try:

curl -v -o /dev/null https://sr.wikipedia.org/w/rest.php/v1/page/x/html 2>&1 | grep -E '^<|^>'

Wiki redirects trigger a 307 redirect. Try:

curl -v -o /dev/null https://sr.wikipedia.org/w/rest.php/v1/page/Serbia/html 2>&1 | grep -E '^<|^>'

It is possible to specify redirect=no to get the HTML representation of the redirect, just like we would do for page views.
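For clients that build these URLs programmatically, a small helper might look like the sketch below. It assumes that redirect=no is passed as a query parameter on the same endpoint; the function name `page_html_url` and its `follow_redirects` flag are hypothetical.

```python
from urllib.parse import quote, urlencode


def page_html_url(domain: str, title: str, follow_redirects: bool = True) -> str:
    """Build the page HTML endpoint URL used in the curl examples (hypothetical helper)."""
    # Encode the title as a single path segment, so "/" in a title is escaped too.
    url = f"https://{domain}/w/rest.php/v1/page/{quote(title, safe='')}/html"
    if not follow_redirects:
        # Assumption: redirect=no is sent as a query parameter.
        url += "?" + urlencode({"redirect": "no"})
    return url
```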

However, variant redirects are not currently supported by this endpoint. If you try

curl -v -o /dev/null https://sr.wikipedia.org/w/rest.php/v1/page/Sir/html 2>&1 | grep -E '^<|^>'

you get a 404. I suppose that needs fixing; we want the HTML API to behave the same as page views.
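The behaviour described in the comment above can be summarised as a small lookup that a client might use when deciding what to do with a response. This is a sketch that reflects only what is stated in this thread, not a specification of the endpoint; the names `STATUS_MEANING` and `describe_status` are hypothetical.

```python
# Status codes as described in this thread for /w/rest.php/v1/page/{title}/html.
STATUS_MEANING = {
    200: "page found; body is the HTML",
    301: "title normalization; follow Location to the canonical title",
    307: "wiki redirect; follow Location to the redirect target",
    404: "no such page (currently also returned for variant titles such as Sir)",
}


def describe_status(status: int) -> str:
    """Return a human-readable meaning for a status code (hypothetical helper)."""
    return STATUS_MEANING.get(status, f"unhandled status {status}")
```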

It's not quite clear to me what code this should use though. On the one hand, it's a type of normalization, so it should use 301. On the other hand, it's not necessarily permanent, since it is possible to actually create the page "Sir", and then this redirect would go away. So it should be a 307...

Filed as T338605: Add support for variant redirects to page endpoints.

MSantos lowered the priority of this task from High to Low. Dec 18 2023, 3:25 PM

Since we decided to keep supporting pre-generation for PCS, this is not a pressing issue anymore. I'll remove it from the WIP backlog and lower the priority.