Page MenuHomePhabricator

Patch Location headers of HTTP redirects coming from the MW API in Lift Wing services
Open, MediumPublic3 Estimated Story Points

Description

Problem statement

Sometimes the MW API returns HTTP 301 responses carrying a Location header with https links, like:

Location: https://en.wikipedia.org/w/some-path

In our isvcs we explicitly force HTTP when contacting the MW API to allow the Istio/Envoy sidecar to inspect the HTTP content and produce meaningful metrics (so for example, if something goes wrong we can clearly see in Grafana what response codes the MW API is returning).

We also use the aiohttp's default when dealing with redirects, namely follow them "transparently" behind the scenes. This means that if the MW API returns a redirect to a HTTPS domain, aiohttp automatically tries to contact the new endpoint/URL ending up in a almost sure timeout (since https is not allowed as egress traffic from the isvc).

Possible solution

We should try to force aiohttp to not follow redirects, and add a little wrapper in the code that calls the MW API to deal with redirects (if they happen), for example looping until a non HTTP 30X response is returned.

Things like mwapi's async_session.py may need to be patched as well, since the allow_redirect=False option is not a ClientSession one, but something that needs to be enabled/disabled for every request.

Event Timeline

As discussed in the team meeting this task will be restricted to providing a solution for the revertrisk-language-agnostic that currently handles these redirects by rewriting the hosts from the values available in a configuration file. We want to remove this "hack".
Also, a potential idea would be to allow only a specific number of redirects (e.g. up to 3).

Once we figure out a nice way to deal with redirects we will implement and roll them out in all the other model servers in another task(s).

isarantopoulos moved this task from Ready To Go to In Progress on the Machine-Learning-Team board.

Updates:

I worked on this a while ago and created a patch for mwapi's async_session.py, but it didn't function as expected when I tested it today. The idea was to update the api_url of the mwapi.AsyncSession with the new url returned in the Location header. However, for some reason it didn't redirect to the new url in subsequent requests, ended up running into the too many redirects error.

Behind mwapi, we use aiohttp.ClientSession to call MW API. P67828 is an example that only uses aiohttp which demonstrates the idea in the patch, and it works as expected. That said, mwapi adds additional complexity to this task. We need to figure out why it's not working within mwapi.

Alternatively, we could handle redirects in the code that calls mwapi, rather than within mwapi itself. This could be done in the model server's code or knowledge integrity, though it might not simplify the solution. P67830 shows an example where redirects are managed in the function calling the request, not within the request itself.