
Wikipedia app - random article randomness is weak
Closed, Resolved; Public; 8 Estimated Story Points

Description

I regularly use the Wikipedia app on Android, and I often use the "Random article" function.
The "randomness" seems very weak: the same article often comes up again within 50 clicks.

I recommend improving the random generator.
In the meantime, I suggest storing the articles already shown (up to 100,000 of them) and not showing any article that is still in that list.
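The suggestion above (remember what was already shown and skip repeats) could be sketched roughly as follows. This is only an illustration, not code from the app: `RecentHistory`, `next_unseen`, `fetch_random`, and `max_attempts` are hypothetical names, and the retry limit is an assumption added so the loop cannot run forever.

```python
from collections import OrderedDict
from typing import Callable


class RecentHistory:
    """Remembers the most recently shown titles, evicting the oldest first."""

    def __init__(self, capacity: int = 100_000) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def __contains__(self, title: object) -> bool:
        return title in self._seen

    def add(self, title: str) -> None:
        # Insert (or refresh) the title as the newest entry.
        self._seen[title] = None
        self._seen.move_to_end(title)
        # Drop the oldest entry once the capacity is exceeded.
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)


def next_unseen(
    fetch_random: Callable[[], str],
    history: RecentHistory,
    max_attempts: int = 20,
) -> str:
    """Fetch random titles until one is not in the history.

    After max_attempts fetches a repeat is accepted rather than looping forever.
    """
    title = fetch_random()
    attempts = 1
    while title in history and attempts < max_attempts:
        title = fetch_random()
        attempts += 1
    history.add(title)
    return title
```

A history like this would live on the client, so it only hides repeats for one user and does not change how the server picks articles.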

Event Timeline

LGoto renamed this task from Wikipedia Android app - random article randomness is weak to Wikipedia app - random article randomness is weak. Dec 7 2020, 5:10 PM
LGoto triaged this task as Low priority.
LGoto raised the priority of this task from Low to Needs Triage.
LGoto triaged this task as Low priority.
LGoto moved this task from Needs Triage to Tracking on the Wikipedia-Android-App-Backlog board.
LGoto subscribed.

Sounds like a caching issue.

AnnaMikla set the point value for this task to 8. Jan 4 2021, 1:14 PM

Hi! I tried to reproduce the issue on Android. I tested the "Random article" function with over 300 articles and did not get any duplicates.

The same goes for iOS: I tested 300+ random articles and saw no duplicates. I also looked through the PHP code that selects the random articles, and the "randomness" there appears to be very high.

There is a setting here: https://github.com/wikimedia/wikifeeds/blob/master/lib/random.js#L71 -> 12 items are requested. Maybe we could increase this value?
Or is it possible that the testing needs to be performed in some specific way?

What would you suggest for reproducing the issue? Or can we just increase the number of items requested from 12 to, for instance, 24?

Thanks!

Could this be a caching related issue that either got resolved or is only affecting specific cases?

There's a bug with caching here, but not what you'd think. RESTBase is setting a very low cache-control, 1 second for clients and 2 seconds for shared caches, but it uses an incorrect cache_control header :) So, in reality, this is all uncached.

Perhaps fixing RESTBase to remove cache_control altogether could be a good idea.
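To illustrate why a misnamed header leaves responses uncached, here is a minimal sketch. It assumes the problem is the underscore spelling `cache_control` in place of the standard `Cache-Control` header name; the helper names are hypothetical and this is not RESTBase code.

```python
from typing import Dict, Optional


def cache_control_headers(client_seconds: int, shared_seconds: int) -> Dict[str, str]:
    """Build a standard Cache-Control header.

    max-age applies to clients, s-maxage to shared caches such as CDNs.
    """
    value = f"max-age={client_seconds}, s-maxage={shared_seconds}"
    return {"Cache-Control": value}


def find_cache_control(headers: Dict[str, str]) -> Optional[str]:
    """Return the Cache-Control value the way a cache would look it up.

    Header names are case-insensitive, but an underscore spelling is a
    different header name and is therefore not found.
    """
    for name, value in headers.items():
        if name.lower() == "cache-control":
            return value
    return None


# Misnamed header (assumed form of the bug): caches do not see any directive.
misnamed = {"cache_control": "max-age=1, s-maxage=2"}
print(find_cache_control(misnamed))

# Correctly named header with the same values.
print(find_cache_control(cache_control_headers(1, 2)))
```

Under that assumption, removing the misnamed header changes nothing for caches, which is consistent with the observation that the endpoint is effectively uncached already.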

As for fixing the randomness - this API is just a wrapper over the MW action API's random article feature, which also filters out some undesirable articles (which is why it fetches a list of 12 but returns just one).

One thing I can imagine improving: in wikifeeds, after 12 pages are selected, a score is calculated for each and the page with the max score is returned. But often the score will be the same for all pages, so the first of the 12 is returned, which is deterministic. So perhaps adding a random number between 0 and 1 to each score would make the randomness a little better.
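The tie problem described here can be sketched in a few lines. This is an illustration in Python, not the wikifeeds code (which is JavaScript): `pick_best` and the `title`/`score` keys are hypothetical. Instead of adding random noise to the score, this variant chooses uniformly among the pages that share the maximum score, which avoids the deterministic "first of 12" result without letting a lower-scored page win.

```python
import random
from typing import Dict, List, Optional


def pick_best(pages: List[Dict], rng: Optional[random.Random] = None) -> Dict:
    """Return one page chosen uniformly at random among those with the top score."""
    if not pages:
        raise ValueError("pages must not be empty")
    if rng is None:
        rng = random.Random()
    top = max(page["score"] for page in pages)
    # All pages tied at the maximum score are equally eligible.
    candidates = [page for page in pages if page["score"] == top]
    return rng.choice(candidates)


pages = [
    {"title": "A", "score": 1.0},
    {"title": "B", "score": 1.0},
    {"title": "C", "score": 0.5},
]
# Repeated calls return "A" or "B", never the lower-scored "C".
print(pick_best(pages)["title"])
```

Adding a random offset to each score would have a similar effect when scores are tied, but it could also reorder pages whose scores differ by less than the offset range.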

Change 674002 had a related patch set uploaded (by Art.tsymbar; owner: arttsymbar):
[mediawiki/services/wikifeeds@master] Retrieving random page from a list of items with max score. This change is required but does not resolves the issue completely as fixing RESTBase to remove cache_control is needed as well.

https://gerrit.wikimedia.org/r/674002

Change 674876 had a related patch set uploaded (by Art.tsymbar; author: arttsymbar):
[mediawiki/services/restbase@master] Remove cache_control for random

https://gerrit.wikimedia.org/r/674876

Change 674002 merged by jenkins-bot:

[mediawiki/services/wikifeeds@master] Retrieving random page from a list of items with max score. This change is required but does not resolves the issue completely as fixing RESTBase to remove cache_control is needed as well.

https://gerrit.wikimedia.org/r/674002