Setup allowed list for MCS decom
Closed, Resolved, Public

Description

Background Information

The MCS decommission is scheduled for July, but some third parties aren't ready to switch over, and a hard cut would be a big disruption. Still, we shouldn't allow newcomers onto the deprecated endpoints.

What

Create an allowed list for a few projects using the MCS endpoints and slowly take them off it until we phase out the endpoints. The initial allowed list is:

  • Kiwix
  • Wikiwand

Acceptance Criteria

  • Current users who asked for an extension before the deadline are allowed to keep consuming the MCS endpoints
  • New users can't access the endpoints

Event Timeline

So, we need something to identify those users. Wikiwand, if I understand its usage of the MCS endpoints correctly, should be doable by allowing the https://www.wikiwand.com/ referer. I based this on browsing wikiwand, searching for an article, and seeing requests to e.g. api/rest_v1/page/mobile-sections-remaining/Yosemite_National_Park in the browser's devtools. Let me know if that isn't the whole picture, but otherwise I think we already have a solution here.

Kiwix is an offline reader, and IIRC it scrapes content every now and then to refresh the offline stores. A simple solution would be to target their User-Agent HTTP header. Now, this one is spoofable of course, but if someone decides to impersonate them to access a soon-to-be-removed, previously open-to-anyone API, they are asking for it.

How do these sound?

Sounds great to me, no objections.

Cool. Do we have Kiwix's User-Agent ?

@Arlolra helped me to find the probable user agent:

Probably MWOffliner/HEAD (contact@kiwix.org), from https://github.com/openzim/mwoffliner/blob/main/src/config.ts#L2:

userAgent: 'MWOffliner/HEAD',

I got the email from clicking on the first link in https://phabricator.wikimedia.org/T324866

Rules created, but NOT enabled.

The corresponding VCL is:

// FILTER T340036
// Give wikiwand and kiwix an extension to MCS decom. See T340036
// This filter is generated from data in etcd. To disable it, run the following command:
// sudo requestctl disable 'cache-text/T340036'
if (req.url ~ "^/api/rest_v1/page/mobile-" && !(req.http.Referer ~ "https://www.wikiwand.com/" || req.http.User-Agent ~ "^MWOffliner/.*$")) {
    set req.http.X-Requestctl = req.http.X-Requestctl + ",T340036";
    return (synth(403, "Mobile Content Service is decommissioned. See https://phabricator.wikimedia.org/T328036"));
}

So, if a request is for a URL that matches /api/rest_v1/page/mobile-, the Referer does not match https://www.wikiwand.com/, and the User-Agent does not match MWOffliner/.*, we send back a 403 with the message: Mobile Content Service is decommissioned. See https://phabricator.wikimedia.org/T328036
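
For readers less familiar with VCL, the condition above can be modeled in a few lines of Python. This is only an illustrative sketch, not the deployed configuration: the function name `is_blocked` is hypothetical, and it assumes a missing header behaves like an empty string. The three patterns are copied from the VCL as written, and `re.search` stands in for VCL's unanchored `~` operator.

```python
import re

# Patterns copied verbatim from the VCL rule in this task.
MCS_URL = re.compile(r"^/api/rest_v1/page/mobile-")
WIKIWAND_REFERER = re.compile(r"https://www.wikiwand.com/")
MWOFFLINER_UA = re.compile(r"^MWOffliner/.*$")


def is_blocked(url, referer="", user_agent=""):
    """Return True when the modeled rule would answer with a 403.

    Hypothetical helper: a missing header is treated as an empty string.
    """
    if not MCS_URL.search(url):
        return False  # the rule only targets the MCS endpoints
    allowed = (
        WIKIWAND_REFERER.search(referer) is not None
        or MWOFFLINER_UA.search(user_agent) is not None
    )
    # Blocked only when neither the Referer nor the User-Agent matches.
    return not allowed
```

A request is therefore let through as soon as either header matches; both have to fail for the 403 to be returned.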

serviceops is waiting to be told when to enable the rule and whether you'd like a different message to be sent to everyone else.

@akosiaris the deadline we defined for the deprecation is July 1st 2023 (or after depending on your availability), we can flip the switch then.

Does the rule apply to both internal and external connections?

That's a Saturday at the start of a long US weekend (July 4th is on Tuesday). I suggest we aim for Wednesday the 5th, if possible.

The rule applies to external connections only.

@MSantos, change deployed today. For example, https://en.wikipedia.org/api/rest_v1/page/mobile-sections now returns a 403 with the above message. I've just tested wikiwand.com; it still works fine and requests from it succeed.

Should we resolve this?

MSantos assigned this task to akosiaris.

Thanks @akosiaris, let's resolve this. I just tested and it works normally. I'll send a message to wikitech-l with the announcement.

@MSantos @akosiaris thanks for your help with this!
We call /mobile-sections-lead on the server side and got 403s for the last few hours. We realized the User-Agent we use, "Wikiwand/0.1 (https://www.wikiwand.com; admin@wikiwand.com)", is probably being ignored. We added "https://www.wikiwand" as referer and it seems to solve the issue.

Should we use "^MWOffliner/.*$" instead?

Also, it seems client-side requests with Api-User-Agent are ignored as well. Since we cannot overwrite the Referer or User-Agent in the browser, /mobile-sections-remaining is not working locally (it will only work on production with referer="https://www.wikiwand.

Is it possible Api-User-Agent is being ignored as well?

Let's avoid using MWOffliner, as it is a different API consumer and we won't be able to track the deprecation.

It seems "Wikiwand/0.1 (https://www.wikiwand.com; admin@wikiwand.com)" is blocked on some (if not all) endpoints (mobile-sections-lead for sure).

I think for wikiwand we only allow requests based on referer. Should we add or replace the rule with the user agent?

From comms with wikiwand:

It seems User-Agent and Api-User-Agent (for client-side requests) are ignored, can you please allow requests with "Wikiwand/0.1 (https://www.wikiwand.com; admin@wikiwand.com)" on both?

Wikiwand/0.1 (https://www.wikiwand.com; admin@wikiwand.com) added to the list of user-agents. Please advise if it doesn't work, otherwise please resolve.

Regarding the Api-User-Agent question, we have 0 rules about this one and if the Referer approach works fine (I understand from T340036#8997290 that it does), it would be preferable to not add one more rule.
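
The thread does not show how the new Wikiwand entry was expressed in requestctl, so the following Python sketch is an assumption throughout: it treats the Wikiwand string as a literal match next to the existing MWOffliner regex. The names `ALLOWED_USER_AGENTS` and `user_agent_allowed` are hypothetical. The point it illustrates is that a User-Agent containing dots and parentheses needs escaping before it is used as a pattern.

```python
import re

# Assumption: the Wikiwand entry is matched literally. re.escape keeps the
# dots and parentheses in the string from being read as regex syntax.
WIKIWAND_UA = "Wikiwand/0.1 (https://www.wikiwand.com; admin@wikiwand.com)"

ALLOWED_USER_AGENTS = [
    re.compile(r"^MWOffliner/.*$"),                  # regex quoted in this task
    re.compile("^" + re.escape(WIKIWAND_UA) + "$"),  # hypothetical literal entry
]


def user_agent_allowed(user_agent):
    """Return True if any allow-list pattern matches the User-Agent."""
    return any(p.search(user_agent) for p in ALLOWED_USER_AGENTS)
```

Note that only the User-Agent header is consulted here. A browser request that sets Api-User-Agent instead would not match, which is consistent with the reply above that no rule exists for that header.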

Thank you @akosiaris

We can only run client requests from the production URL. I guess it'll do for now until we complete the transition.

Hi there! I'm responsible for the Kiwix migration to another API, but given the discussion above I'm curious whether you plan to add MWOffliner to the allowed list so it can access /mobile-sections and, if so, how long that access will last. I assume that even though the MWOffliner User-Agent was added earlier, MCS is already completely disabled, because I got a 403 error page for this curl request:

curl -H "User-Agent: MWoffliner/1.13.0 (contact@kiwix.org)" https://en.wikipedia.org/api/rest_v1/page/mobile-sections

@akosiaris @MSantos May I underline Vadim's request: clarifying whether we (at Kiwix) can still use the mobile-sections API until we finish the MWoffliner code update is an important and relatively time-sensitive question.

@Kelson the allowed list policy will end by the end of September. I was assuming you already got access to it, is this resolved?

Hi @vadim-kovalenko, the regex is "^MWOffliner/.*$" as noted above, but your User-Agent starts with MWoffliner/. Note the difference in capitalization of the O after MW.
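
The mismatch is easy to reproduce outside Varnish. The short Python check below is a sketch that applies the regex quoted in this task to the User-Agent from the curl command; the function name `matches_allow_list` and the corrected string `fixed` are illustrative only.

```python
import re

# Regex quoted in this task; matching is case-sensitive by default.
PATTERN = re.compile(r"^MWOffliner/.*$")


def matches_allow_list(user_agent):
    """Return True if the User-Agent matches the MWOffliner pattern."""
    return PATTERN.search(user_agent) is not None


sent = "MWoffliner/1.13.0 (contact@kiwix.org)"   # value from the curl command
fixed = "MWOffliner/1.13.0 (contact@kiwix.org)"  # illustrative: capital O

print(matches_allow_list(sent))   # False: lowercase "o" after "MW"
print(matches_allow_list(fixed))  # True
```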

@akosiaris, I've updated the regexp, and now it works, thank you!

MSantos updated the task description.

I guess it's about time I ask if it is ok to remove those exceptions now and return 403 to everyone for these endpoints.