Description
See attached screenshot. The "Meghan, Duchess of Sussex" article appears twice in the Trending feed. I would expect the same article to appear just once. App version: 2.7.232-r-2018-04-17
Details
Project | Branch | Lines +/- | Subject
---|---|---|---
mediawiki/services/mobileapps | master | +285 -24 | Most-read: Filter redirect-caused duplicates
Status | Assigned | Task
---|---|---
Resolved | Mholloway | T195390 Trending shows same article twice
Declined | Mholloway | T199455 Move summary hydration and result duplication into MCS
Event Timeline
It looks like the API has recently been returning the same article more than once in its "most read" results: not only "Meghan, Duchess of Sussex" but also "Prince_Harry,_Duke_of_Sussex" appears twice.
I took a quick look and found that the bug occurs in the feeds for the following days:
https://en.wikipedia.org/api/rest_v1/feed/featured/2018/05/22
https://en.wikipedia.org/api/rest_v1/feed/featured/2018/05/21
https://en.wikipedia.org/api/rest_v1/feed/featured/2018/05/20
https://en.wikipedia.org/api/rest_v1/feed/featured/2018/05/19
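For anyone who wants to check other dates, here is a rough Python sketch of how the duplicates could be spotted in an already-downloaded feed response. It assumes the response nests the most-read list under `mostread` → `articles`, with a `title` on each entry; the function name `find_duplicate_titles` is made up for this example and is not part of any Wikimedia code.

```python
from collections import Counter
from typing import Any, Dict, List


def find_duplicate_titles(feed: Dict[str, Any]) -> List[str]:
    """Return titles that occur more than once in the most-read list.

    Assumes the (parsed JSON) feed looks like
    {"mostread": {"articles": [{"title": ...}, ...]}}.
    Titles are returned in the order they first appear.
    """
    articles = feed.get("mostread", {}).get("articles", [])
    # Counter keeps first-seen insertion order on Python 3.7+.
    counts = Counter(article["title"] for article in articles)
    return [title for title, count in counts.items() if count > 1]
```

Fetching the JSON for one of the URLs above and passing the parsed body to this function should list whichever titles are repeated for that day; a missing or empty `mostread` section simply yields an empty list.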
Tagging the iOS app too since I noticed it in the iOS app. This is probably just for tracking for the iOS team, though.
I think this is sort of a one-off situation caused by the recent move of the "Meghan Markle" page to "Meghan, Duchess of Sussex" (by Jimbo Wales himself). We pick up both separately in our pageview data (example), and the redirect is resolved during processing. I guess we'll need to add a dedupe step or something.
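A dedupe step along those lines might look roughly like the Python sketch below. This is only an illustration of the idea, not the patch that was uploaded or the code that was eventually deployed: the `redirects` mapping, the `dedupe_most_read` name, and the decision to sum the view counts of merged entries are all assumptions made for the example.

```python
from typing import Any, Dict, List


def dedupe_most_read(
    entries: List[Dict[str, Any]],
    redirects: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Merge most-read entries that resolve to the same page.

    entries:   [{"title": str, "views": int}, ...] as counted per raw title.
    redirects: hypothetical map from an old title to its redirect target.
    Views of merged entries are summed (an assumption, not confirmed
    behaviour), and the result is sorted by views, highest first.
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for entry in entries:
        # Resolve the redirect, falling back to the title itself.
        canonical = redirects.get(entry["title"], entry["title"])
        if canonical in merged:
            merged[canonical]["views"] += entry["views"]
        else:
            merged[canonical] = {"title": canonical, "views": entry["views"]}
    return sorted(merged.values(), key=lambda e: e["views"], reverse=True)
```

Summing the views keeps the merged article's rank honest during the window where traffic is split across the old and new titles; simply dropping the lower-ranked duplicate would be a simpler alternative with the same visible effect in the feed.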
Change 434806 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Most-read: Filter redirect-caused duplicates
Triaging this as low-priority since the situation in which it arises (an article being moved precisely as it spikes in popularity) is relatively rare.
Change 434806 abandoned by Mholloway:
Most-read: Filter redirect-caused duplicates
Reason:
After some discussion, it seems we'll be going in the direction of generalizing result deduplication in RESTBase.
Mentioned in SAL (#wikimedia-operations) [2018-07-17T10:31:32Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@622941d]: Expose the data/mobile/javascript end point, deduplicate most-read results and increase page/related response size to 20 - T199458 T195390
Mentioned in SAL (#wikimedia-operations) [2018-07-17T10:43:36Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@622941d]: Expose the data/mobile/javascript end point, deduplicate most-read results and increase page/related response size to 20, take #2 - T199458 T195390
Mentioned in SAL (#wikimedia-operations) [2018-07-17T11:03:19Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@622941d]: Expose the data/mobile/javascript end point, deduplicate most-read results and increase page/related response size to 20, take #2 - T199458 T195390 (duration: 19m 43s)