Page MenuHomePhabricator

[Bug] GET /data/lists doesn't include "next" in an incomplete response
Open, NormalPublic

Description

Steps to Reproduce
  1. Set up an account with 22 reading lists
  2. Perform a get request (https://en.wikipedia.org/api/rest_v1/data/lists/)

Expected results: the response includes the "next" parameter that can be piped into the next request as documented here https://en.wikipedia.org/api/rest_v1/#/Reading%20lists/get_data_lists_

Actual results: the response does not include the "next" parameter and does not include all reading lists (the default reading list is missing and the count is not right)

Affects all GET endpoints, not just lists.

Analysis

rMWd9f688698ce0: rdbms: clean up and refactor ResultWrapper classes (deployed with wmf/1.34.0-wmf.8, so between June 4-6) changed a detail of how MediaWiki iterates DB results (when iterated in a foreach loop, indexes changed from 1-based to 0-based), which broke the check in the ReadingLists extension detecting whether there are more results available (when 10 results are requested, the extension would fetch 11 and check if the 11th exists). Since extension CI tests are only run when the extension code changes, not when core code changes, this was only caught the next time extension code was changed (T226640: ReadingLists CI broken). The error was fixed in rERLS151625b93bc9: Fix API continuation and that was deployed on July 3.

Impact

Applications which have updated their local copy of the lists based on one of the erroneous responses have incorrect local copies (partial or outdated, depending on whether the bug affected them during a fetch or a sync) and their local copies will have to be invalidated.

Conclusions

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 27 2019, 7:06 PM

@Tgr any idea what's going on here?

Adding iOS for tracking

Restricted Application added a project: Wikipedia-iOS-App-Backlog. · View Herald TranscriptJun 27 2019, 8:12 PM
Tgr added a comment.Jun 28 2019, 5:54 AM

See T226640: ReadingLists CI broken. Should be fixed with the next MediaWiki deploy.

JMinor triaged this task as Normal priority.Jul 1 2019, 6:38 PM
JMinor moved this task from Needs Triage to Tracking on the Wikipedia-iOS-App-Backlog board.

Fix should be live now.

fixed. thank you!

Tgr added a subscriber: Dbrant.Jul 5 2019, 2:46 PM

Is this fully resolved, or does it need follow-up for clearing incorrect client-side caches?

Ping @Dbrant, this probably affected Android as well.

Looks good on iOS

@Tgr We just had a user mention this is still happening for him - see T226291 - so we may want to have a look at any potential lingering cacheing issues.

Tgr added a comment.EditedJul 10 2019, 9:41 AM

There are two ways to use the Reading Lists API: fetch all lists (or all pages in a given list) for a given user, or fetch changes since a given timestamp. I'm not sure about the current app logic, but AIUI the idea was to do a full fetch first and then syncs and only do another full fetch of the app has been offline for so long that the API does not remember all changes anymore (currently that means more than 30 days). During the ~1 month the bug was live, any response longer than 11 items was terminated without indicating that it is a partial response and should be continued in another request (this was a bug with the DB result iteration logic so it affected everything - GET /data/lists/, GET /data/lists/{id}/entries/, GET /lists/changes/since/{date}). So if a client did a partial fetch or sync during that time and was made believe it was a full one, later it would only try to sync changes after that time, it wouldn't know to request the missing items.

If that is the problem (and not, say, another server-side bug we are not yet aware of), the only reliable way to fix it is to force the client to make a full fetch instead of a sync. That probably needs to happen on the client side.
(Other things considered that wouldn't work: change the last sync timestamp stored in the client to be before the start of the bug, or change the list / list item last change times in the DB to be after the fixing of the bug. Neither would work because the bug has started more than 30 days ago so some changes might have been garbage collected already.)

This issue seems disruptive enough that there should be an incident report + announcement, I will do that.

Charlotte moved this task from Needs Triage to Tracking on the Wikipedia-Android-App-Backlog board.

Well darn. Thanks for this clear explanation @Tgr.

@Dbrant let's have a wee chat about this with the others at standup.

Tgr updated the task description. (Show Details)Jul 10 2019, 10:23 AM
Tgr updated the task description. (Show Details)

Moving this to tracking on the RI board as I believe the API is fixed, and we probably can't help with the client-side cleanup.