Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL
Closed, Resolved (Public)

Description

In T343123#9813736, the Mobile Team reported an issue where the article-descriptions LiftWing API returns a 404 when an encoded colon is used in the request URL. We have been able to reproduce it as shown below:

curl https://api.wikimedia.org/service/lw/inference/v1/models/article-descriptions%3Apredict -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 3}' -H "Content-type: application/json"
{"httpReason":"Not Found","httpCode":404}

We ran the article-descriptions model-server on the ML sandbox and the URL with an encoded colon returned a prediction without any issues:

curl localhost:8080/v1/models/article-descriptions%3Apredict -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 3}' -H "Content-Type: application/json" --http1.1
{"prediction":["Hamlet in Alberta, Canada","human settlement in Alberta, Canada","Town in Alberta, Canada"],"blp":false,"lang":"en","title":"Clandonald","num_beams":3}

We are going to investigate whether there are network settings on LiftWing causing this issue.
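
For context on how a client ends up sending the encoded form, here is a minimal Python sketch that builds both variants of the endpoint URL. The helper name `predict_url` is hypothetical; the base URL and model name come from the reproduction above. Generic URL-quoting helpers such as `urllib.parse.quote` escape the colon, which is presumably how `%3A` reaches the gateway from some clients.

```python
from urllib.parse import quote, unquote

# Base URL taken from the reproduction in this task.
BASE = "https://api.wikimedia.org/service/lw/inference/v1/models"


def predict_url(model: str, encode_colon: bool) -> str:
    """Build the ':predict' endpoint URL (hypothetical helper).

    With encode_colon=True the last path segment is passed through quote(),
    which leaves only letters, digits and "_.-~" unescaped when safe="",
    so the colon becomes %3A.
    """
    segment = f"{model}:predict"
    if encode_colon:
        segment = quote(segment, safe="")
    return f"{BASE}/{segment}"


encoded = predict_url("article-descriptions", encode_colon=True)
literal = predict_url("article-descriptions", encode_colon=False)
print(encoded)
print(literal)
```

Percent-decoding the encoded URL gives back the literal one, so the two requests are meant to reach the same endpoint.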

Event Timeline

The internal URLs also behave properly, so the issue does not appear to be on the Lift Wing side; it seems to lie in how the API Gateway translates or encodes the URL.

curl "https://inference.svc.codfw.wmnet:30443/v1/models/article-descriptions%3Apredict" -X POST -d '{"lang": "en", "title": "Clandonald", "num_beams": 2}' -H "Host: article-descriptions.article-descriptions.wikimedia.org" -H "Content-Type: application/json"

{"prediction":["Hamlet in Alberta, Canada","human settlement in Alberta, Canada"],"blp":false,"lang":"en","title":"Clandonald","num_beams":2}

I suspect the fix for this is a relatively small change on the API gateway, but the change is a global one, so I will need to take some time to test it, even if the impact is only to make things standards-compliant. Hoping to get to it later this week.

Thank you for looking into this @hnowlan. I've assigned the task to you.

Change #1035481 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] api-gateway: add normalise_paths option, enable in api-gateway

https://gerrit.wikimedia.org/r/1035481

Change #1035481 merged by jenkins-bot:

[operations/deployment-charts@master] api-gateway: add normalise_paths option, enable in api-gateway

https://gerrit.wikimedia.org/r/1035481

The normalisation change has unfortunately not fixed this issue. The docs indicate that it should have, but I suspect this has something to do with the use of regex matching as opposed to static matching. I'll try to come up with a workaround for the short term.

It seems Envoy only normalises a subset of urlencoded characters:

hnowlan@plunkett ~/Code/deployment-charts (hnowlan/T365439-apigw_normalise_path_urls *) $ curl -s localhost:8087/core/v1/wikisource/a/%3A| grep original-path
    "x-envoy-original-path": "/core/v1/wikisource/a/%3A"
hnowlan@plunkett ~/Code/deployment-charts (hnowlan/T365439-apigw_normalise_path_urls *) $ curl -s localhost:8087/core/v1/wikisource/a/%31| grep original-path
    "x-envoy-original-path": "/core/v1/wikisource/a/1"

I'll file an issue upstream, but short-term we'll need a workaround.
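
The two responses above are consistent with RFC 3986 style normalisation, which decodes percent-encoded octets only when they correspond to unreserved characters (letters, digits, `-`, `.`, `_`, `~`). `1` is unreserved, so `%31` is decoded; `:` is reserved, so `%3A` is left alone. Whether Envoy implements exactly this rule is an assumption; the Python sketch below only reproduces the observed behaviour, and `normalise_unreserved` is a hypothetical name.

```python
import re
import string

# RFC 3986 section 2.3 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalise_unreserved(path: str) -> str:
    """Decode percent-encoded octets only if they map to unreserved characters.

    Any other triplet (e.g. %3A for ":") is kept, with its hex digits
    upper-cased.
    """
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return PCT_TRIPLET.sub(replace, path)


print(normalise_unreserved("/core/v1/wikisource/a/%31"))
print(normalise_unreserved("/core/v1/wikisource/a/%3A"))
```

Under this rule a route that matches on a literal `:` never sees one when the client sends `%3A`, which would explain why enabling path normalisation alone did not help.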

Change #1037449 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] api-gateway: add workaround for urlencoded characters in liftwing

https://gerrit.wikimedia.org/r/1037449

Change #1037449 merged by jenkins-bot:

[operations/deployment-charts@master] api-gateway: add workaround for urlencoded characters in liftwing

https://gerrit.wikimedia.org/r/1037449

I am now seeing results for queries with urlencoded characters. Unfortunately, we will need to add another manual hack if other non-alphanumeric characters appear elsewhere in the URL in future, but for now I think this works:

hnowlan@plunkett ~ $ curl -s "https://api.wikimedia.org/service/lw/inference/v1/models/outlink-topic-model%3Apredict?hnowlan=123" -X POST -d '{ "rev_id": 123555, "lang": "en", "page_title": "foo" }'  -H "Authorization: Bearer $TOKEN"
{"prediction":{"article":"https://en.wikipedia.org/wiki/foo","results":[{"topic":"STEM.STEM*","score":0.9481645226478577},{"topic":"STEM.Computing","score":0.8080772161483765}]}}
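
The contents of the merged workaround are not shown in this task. As an illustration only, one way such a workaround can look is a route pattern that accepts either the literal colon or its percent-encoded form. The Python sketch below uses a hypothetical pattern and a hypothetical `match_route` helper; it is not the actual api-gateway configuration.

```python
import re
from typing import Optional, Tuple

# Hypothetical route pattern: the separator before "predict" may be a literal
# ":" or the percent-encoded "%3A" / "%3a". Matches the path only; the query
# string is assumed to have been split off beforehand.
ROUTE = re.compile(
    r"^/service/lw/inference/v1/models/"
    r"(?P<model>[A-Za-z0-9_-]+)"
    r"(?::|%3[Aa])"
    r"(?P<verb>predict)$"
)


def match_route(path: str) -> Optional[Tuple[str, str]]:
    """Return (model, verb) if the path matches the route, else None."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return match.group("model"), match.group("verb")


print(match_route("/service/lw/inference/v1/models/outlink-topic-model%3Apredict"))
print(match_route("/service/lw/inference/v1/models/outlink-topic-model:predict"))
```

This also shows why the approach is a manual hack: each additional encoded character that may appear in the path needs its own alternation in the pattern.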