Switchover plan from RESTbase to REST Gateway for Reading Lists endpoints
Closed, Resolved, Public

Description

Summary: certain Reading Lists endpoints currently handled by RESTbase should be rerouted to equivalent MediaWiki REST endpoints. This will unblock this portion of RESTbase sunset. Rerouting is necessary (as opposed to switching callers immediately) because the primary (and perhaps only) callers are WMF mobile apps installed on end user devices, and it will take some time to transition all app users to new app versions that support new paths. See T336693: Re-implement reading lists REST interface outside RESTBase and T348493: Reading List REST Interface: reroute calls for more info.

Mapping of production URLs to be routed to MediaWiki-REST-API endpoints implemented in the Reading Lists extension:


  1. Set up lists (for the logged-in user)
  2. Tear down lists (for the logged-in user)
  3. Get all lists (for the logged-in user)
  4. Create a new list (for the logged-in user)
  5. Update a list (belonging to the logged-in user)
  6. Delete a list (belonging to the logged-in user)
  7. Create multiple new lists (for the logged-in user)
  8. Get all entries for the given list (belonging to the logged-in user)
  9. Add a new list entry (to a list belonging to the logged-in user)
  10. Delete a list entry (from a list belonging to the logged-in user)
  11. Add multiple new entries (to a list belonging to the logged-in user)
  12. Get lists belonging to the logged-in user which contain a given page
  13. Get recent changes to the logged-in user's lists

Additional Info

  • An interactive sandbox for the existing endpoints is here: https://en.wikipedia.org/api/rest_v1/#/Reading%20lists
  • An OpenAPI spec is here: https://en.wikipedia.org/api/rest_v1/?spec (search for tag "Reading Lists")
  • All POST/PUT/DELETE operations may include a "csrf_token" query parameter. The new endpoints also allow this to be passed in the body as "token", so callers may transition to that. But at least initially, expect csrf_token query parameters to be present.
  • Example query parameters and bodies are not provided here for the endpoints that consume them. If that information is needed, we can provide it. (Or see the sandbox/spec linked above.)
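
As a rough illustration of the two token placements described above, the sketch below builds the URL and body for a write call either way. The helper name `build_write_request`, the JSON body encoding, and the `name`/`description` payload fields are assumptions made for illustration; the linked sandbox/spec remains the reference for the real request shapes.

```python
import json
from urllib.parse import urlencode


def build_write_request(url, payload, csrf_token, token_in_body=False):
    """Return (url, body_bytes) for a POST/PUT/DELETE call (hypothetical helper)."""
    if token_in_body:
        # Newer style: the token travels in the body under the key "token".
        body = dict(payload, token=csrf_token)
        return url, json.dumps(body).encode("utf-8")
    # Legacy style: the token travels as the "csrf_token" query parameter.
    separator = "&" if "?" in url else "?"
    query = urlencode({"csrf_token": csrf_token})
    return f"{url}{separator}{query}", json.dumps(payload).encode("utf-8")


# Example values only; the token is a placeholder, not a real token.
example_payload = {"name": "Example", "description": "Example list"}
legacy_url, legacy_body = build_write_request(
    "https://test.wikipedia.org/api/rest_v1/data/lists/",
    example_payload,
    "abc123+\\",
)
body_url, body_with_token = build_write_request(
    "https://test.wikipedia.org/api/rest_v1/data/lists/",
    example_payload,
    "abc123+\\",
    token_in_body=True,
)
```

Note that the token has to be percent-encoded when it is sent in the query string, which `urlencode` handles here.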

TIMING NOTES
If possible, let's schedule the initial switch for the week of April 14, starting with testwiki ONLY for ~2 weeks to support manual testing
Iterative rollout to additional wikis tentatively starting week of April 28. We will provide the desired rollout list and timing soon.

Additional Configuration:

All forwarded calls should include an x-restbase-compat header with a value of 'true'. This instructs the MW REST endpoints to return RESTbase-compatible data.
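
For manual checks against a MediaWiki endpoint directly, the same header can be set by hand. The following is a minimal sketch using only the Python standard library; the function name `compat_request` is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def compat_request(mw_url, method="GET"):
    """Build a request carrying the compatibility header (hypothetical helper)."""
    req = Request(mw_url, method=method)
    # Header name and value as given in the task description.
    req.add_header("x-restbase-compat", "true")
    return req


# The path below is the MW REST path reported later in this task.
req = compat_request("https://test.wikipedia.org/w/rest.php/readinglists/v0/lists/")
# urllib normalizes header capitalization, so compare case-insensitively.
headers = {name.lower(): value for name, value in req.header_items()}
```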

Acceptance Criteria

  • This list is reviewed and approved by @MSantos and @HCoplin-WMF
  • This list is reviewed and approved by ServiceOps
  • Endpoints are routed through REST Gateway
  • Monitoring and testing indicate success

Event Timeline

Task itself looks good to me! One request: let's hold off on execution until we confirm the plan and test strategy with the mobile team, to avoid any unexpected outages. I'll reach out to them today and report back with their preferences for timing, and whether they'd like everything released to testwiki first.

Confirmed that we will want to roll out iteratively based on feedback from the mobile apps team:

  • Testwiki first, to enable internal/manual testing (~1-2+ weeks, depending on issues found).
  • ~6 small to medium wikis to ensure stability and solicit feedback (~1-2 weeks).
  • All remaining wikis.

We should have more information soon about their desired schedule too. Will report back once I have that info, so we can align on timeline.

Timeline update based on mobile team capacity:

  • If possible, let's schedule the initial switch for the week of April 14.
    • Start with testwiki ONLY for ~2 weeks to support manual testing
  • Iterative rollout to additional wikis tentatively starting week of April 28.

Following up on this -- can we get an updated timeline from SRE (@akosiaris; @MSantos) on when this work can get pulled in, with the assumption that we will first launch to testwiki only, then roll out across remaining wikis?

Adding @hnowlan to give some input as well.

Thanks for the details, I am going to get started on this change today. I note that the URLs for MW endpoints are of the format <domain>/w/rest.php/readinglists/v0/lists/teardown - is this correct? Most other API endpoints have v1 in the path (for example page html revisions are of the form /w/rest.php/v1/revision/$PAGE/html)
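
To make the path question concrete, here is a small sketch of the rewrite as a plain prefix swap. It assumes every Reading Lists path maps by replacing the public `/api/rest_v1/data/lists` prefix with `/w/rest.php/readinglists/v0/lists`; only the root and `teardown` paths are confirmed in this task, and the function name is hypothetical.

```python
RESTBASE_PREFIX = "/api/rest_v1/data/lists"
MW_PREFIX = "/w/rest.php/readinglists/v0/lists"


def to_mw_path(public_path):
    """Map a public Reading Lists path to its MediaWiki REST path (assumed prefix swap)."""
    is_root = public_path == RESTBASE_PREFIX
    is_child = public_path.startswith(RESTBASE_PREFIX + "/")
    if not (is_root or is_child):
        raise ValueError(f"not a Reading Lists path: {public_path}")
    # Keep everything after the prefix unchanged.
    return MW_PREFIX + public_path[len(RESTBASE_PREFIX):]


teardown = to_mw_path("/api/rest_v1/data/lists/teardown")
root = to_mw_path("/api/rest_v1/data/lists/")
```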

Change #1143127 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] rest-gateway: route reading lists API

https://gerrit.wikimedia.org/r/1143127

@hnowlan -- just want to confirm that we will follow this rollout path, so that the mobile apps team has time to test and respond:

  • Testwiki first, to enable internal/manual testing (~1-2+ weeks, depending on issues found).
  • ~6 small to medium wikis to ensure stability and solicit feedback (~1-2 weeks).
  • All remaining wikis.

It was in an earlier comment, so it may have been overlooked :)

Also confirming my understanding: all readinglist endpoints are under v0, as they are all marked as 'unstable'.

For internal testing, once the change is live on the rest-gateway (which will not route any external production traffic), testing can begin immediately. That rollout plan sounds good to me and can easily be done via the existing logic we have for migrating APIs.

Awesome. @Seddon confirmed that the apps team is good to go with internal testing whenever you are, and would like to get this rolling as soon as possible to avoid conflicts with other commitments this quarter. Let's plan on 1 week for each phase unless they report something here.

Thanks, @hnowlan !

Change #1143127 merged by jenkins-bot:

[operations/deployment-charts@master] rest-gateway: route reading lists API

https://gerrit.wikimedia.org/r/1143127

Internal testing can now be run against the REST gateway using URLs of the form https://rest-gateway.discovery.wmnet:4113/en.wikipedia.org/v0/data/lists/ - don't forget to set the Host header per-wiki though! :) Please let me know if there's anything I can help with or any issues. I will prepare a change for the edge that we can move forward with once the gateway has been validated (or we can move directly ahead with the edge for more extensive testing as this will only affect testwiki at first)
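
A minimal sketch of building such an internal test request follows. It assumes the Host header should carry the wiki's public domain, which is one reading of "per-wiki" above; the helper name is hypothetical, the request is only constructed (not sent), and the gateway host is reachable only from inside the production network.

```python
from urllib.request import Request

GATEWAY = "https://rest-gateway.discovery.wmnet:4113"


def gateway_request(wiki_domain, path="/v0/data/lists/"):
    """Build an internal REST gateway request for one wiki (hypothetical helper)."""
    # URL shape taken from the comment above: <gateway>/<wiki domain>/v0/data/lists/
    req = Request(f"{GATEWAY}/{wiki_domain}{path}")
    # Assumption: the Host header is set to the wiki's domain.
    req.add_header("Host", wiki_domain)
    return req


req = gateway_request("en.wikipedia.org")
```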

Change #1148285 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] trafficserver: route testwiki reading lists APIs without restbase

https://gerrit.wikimedia.org/r/1148285

Change #1148285 merged by Hnowlan:

[operations/puppet@production] trafficserver: route testwiki reading lists APIs without restbase

https://gerrit.wikimedia.org/r/1148285

Change #1148813 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] rest-gateway: fix typo in incoming URL

https://gerrit.wikimedia.org/r/1148813

Change #1148813 merged by jenkins-bot:

[operations/deployment-charts@master] rest-gateway: fix typo in incoming URL

https://gerrit.wikimedia.org/r/1148813

The lists API on testwiki is now routed via the rest-gateway and without restbase. You can verify this by the server header changing from restbase to envoy. There are already some (unsurprising) differences in responses for errors:

hnowlan@plunkett ~ $ curl -s https://en.wikipedia.org/api/rest_v1/data/lists/ | jq .
{
  "type": "https://mediawiki.org/wiki/HyperSwitch/errors/unauthorized",
  "title": "notloggedin",
  "method": "get",
  "detail": "You must be logged in to view your private information.",
  "uri": "/en.wikipedia.org/v1/data/lists/"
}
hnowlan@plunkett ~ $ curl -s https://test.wikipedia.org/api/rest_v1/data/lists/ | jq .
{
  "type": "MediaWikiError/Bad_Request",
  "title": "rest-permission-denied-anon",
  "method": "get",
  "detail": "Not accessible by anonymous user",
  "uri": "/w/rest.php/readinglists/v0/lists/",
  "errorKey": "rest-permission-denied-anon",
  "messageTranslations": {
    "en": "Not accessible by anonymous user"
  },
  "httpCode": 400,
  "httpReason": "Bad Request"
}
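
The two checks above (the server header and the error shape) can be sketched in a few lines. The error payloads are copied from the output above; the helper name and the substring matching on the server header are assumptions for illustration.

```python
def backend_from_server_header(server_header):
    """Guess which backend answered, based on the Server header (hypothetical helper)."""
    value = (server_header or "").lower()
    if "envoy" in value:
        return "rest-gateway"
    if "restbase" in value:
        return "restbase"
    return "unknown"


restbase_error = {
    "type": "https://mediawiki.org/wiki/HyperSwitch/errors/unauthorized",
    "title": "notloggedin",
    "method": "get",
    "detail": "You must be logged in to view your private information.",
    "uri": "/en.wikipedia.org/v1/data/lists/",
}
gateway_error = {
    "type": "MediaWikiError/Bad_Request",
    "title": "rest-permission-denied-anon",
    "method": "get",
    "detail": "Not accessible by anonymous user",
    "uri": "/w/rest.php/readinglists/v0/lists/",
    "errorKey": "rest-permission-denied-anon",
    "messageTranslations": {"en": "Not accessible by anonymous user"},
    "httpCode": 400,
    "httpReason": "Bad Request",
}

# Keys present in both responses, keys only the new response has,
# and shared keys whose values differ between the two backends.
shared_keys = sorted(set(restbase_error) & set(gateway_error))
new_keys = sorted(set(gateway_error) - set(restbase_error))
changed_keys = [k for k in shared_keys if restbase_error[k] != gateway_error[k]]
```

In these two samples every RESTbase key is still present in the new response, so a client that reads only those keys would keep working, although most of their values differ.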

Change #1149624 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] trafficserver: restbaseless reading lists API for ~group0

https://gerrit.wikimedia.org/r/1149624

Change #1149625 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] trafficserver: restbaseless reading lists API for all wikis

https://gerrit.wikimedia.org/r/1149625

Change reviews are open for moving the API to something that kinda mimics group1 and then to full production. Let me know when you have an idea of when you'd like to move these changes forwards.

Changes look good, +1'd. I'll let @HCoplin-WMF and/or @Seddon comment on timing. From a technical perspective, there are no blockers I'm aware of.

I've done a round of testing of our Reading Lists feature and its various workflows, all while communicating with testwiki.
Everything seems to work as expected; I'm not seeing anything unusual from any particular endpoint.

Jdlrobson removed a project: Web-Team.
Jdlrobson subscribed.

Does not impact web team.

Are we ready to proceed with the quasi-group1 migration at this point? And do we have an idea of the timeframe we'd like to do full production after that point?

According to the rollout path confirmed earlier (testwiki first, then ~6 small to medium wikis for ~1-2 weeks, then all remaining wikis), we should be good to go with ~6 small to medium wikis for 2 weeks before full roll-out.

Change #1149624 merged by Hnowlan:

[operations/puppet@production] trafficserver: restbaseless reading lists API for ~group1

https://gerrit.wikimedia.org/r/1149624

Restbaseless reading list APIs are being rolled out for cawiki, hewiki, itwiki, meta and commons over the next 30 minutes or so. Some basic tests on my end looked okay, please let me know if there's anything I can do on our end to follow up on this.
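
For testers who want to know which backend to expect for a given wiki, the phased rollout can be sketched as a simple lookup. The phase numbers and the function name are hypothetical; the host sets follow the wikis named in this task (testwiki, then cawiki, hewiki, itwiki, meta and commons), and the real routing lives in the trafficserver configuration, not in code like this.

```python
# Hosts routed without RESTbase in each phase (phases are cumulative).
PHASE_HOSTS = {
    1: {"test.wikipedia.org"},
    2: {
        "test.wikipedia.org",
        "ca.wikipedia.org",
        "he.wikipedia.org",
        "it.wikipedia.org",
        "meta.wikimedia.org",
        "commons.wikimedia.org",
    },
}
FULL_ROLLOUT_PHASE = 3  # hypothetical label for "all remaining wikis"


def routed_without_restbase(host, phase):
    """Return True if the host is expected to bypass RESTbase in the given phase."""
    if phase >= FULL_ROLLOUT_PHASE:
        return True
    return host in PHASE_HOSTS.get(phase, set())


italian_in_phase_two = routed_without_restbase("it.wikipedia.org", 2)
english_in_phase_two = routed_without_restbase("en.wikipedia.org", 2)
```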

Given that the migration of reading lists is blocking the deprecation of parsoid in restbase (T344944), is there any chance we could move the timeline up a bit for a full rollout? Details of the service's condition in the rest-gateway can be seen on this dashboard

Just confirming here that we spoke in Slack, and everyone is good to move up the timeline. @hnowlan confirmed starting the full roll out as early as tomorrow (June 11).

Change #1149625 merged by Hnowlan:

[operations/puppet@production] trafficserver: restbaseless reading lists API for all wikis

https://gerrit.wikimedia.org/r/1149625

HCoplin-WMF claimed this task.

Marking as resolved! Feel free to submit a bug or reopen if any issues come up. Thanks, all!