Special:RandomInCategory does not return all pages with equal probability
Closed, Duplicate · Public

Description

RandomInCategory uses the page_random field to draw a random page. While this field is uniformly distributed among all pages in a large wiki, it is certainly not uniformly distributed among the pages within a category, particularly a small to medium-sized category (think hundreds of pages, not tens of thousands).
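To illustrate the effect, here is a minimal sketch in Python. It assumes (this is my simplification, not a statement about the actual query) that a uniform threshold r in [0, 1) is drawn and the page with the smallest page_random at or above r is returned, wrapping around to the smallest value overall. Under that assumption each page's chance of being picked equals the gap between its value and the one before it, and those gaps are far from equal in a small sample. The function name and the 100-page sample are made up for illustration.

```python
import random


def selection_probabilities(keys):
    """Chance that each key is chosen when a uniform threshold r in [0, 1)
    selects the smallest key >= r, wrapping around to the smallest key.

    Assumed selection model, for illustration only.
    """
    ordered = sorted(keys)
    probs = {}
    # Wrap-around: the gap before the smallest key starts at the largest key.
    prev = ordered[-1] - 1.0
    for k in ordered:
        probs[k] = k - prev  # width of the interval of r values that lands on k
        prev = k
    return probs


# Simulate a category of 100 pages with uniformly random keys (hypothetical data).
rng = random.Random(0)
keys = [rng.random() for _ in range(100)]
probs = selection_probabilities(keys)

# Ratio between the most and least likely page; 1.0 would mean a fair draw.
print(max(probs.values()) / min(probs.values()))
```

With evenly spaced keys the ratio would be 1; with random keys the most favoured page is typically picked many times more often than the least favoured one.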

Use case

The draft review process on enwiki uses Category:Pending AfC submissions to keep track of all the drafts awaiting review, and Special:RandomInCategory to let reviewers pick the next draft to work on. The problem is that the results are very much non-random. See this Village Pump thread for my analysis of the problem.

Possible solution

Methods to draw "fairly" from a non-uniform distribution could be implemented. A quick search leads to this page on StackOverflow, which in turn leads to the alias method, which apparently draws each sample in O(1) time once a lookup table has been built.
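For reference, a rough sketch of the alias method (Vose's variant) in Python. This is only meant to show the shape of the algorithm, not how it would be wired into RandomInCategory; the function names and the example weights are hypothetical, and how weights would be assigned to pages is left open here.

```python
import random


def build_alias_table(weights):
    """Build the probability and alias tables in O(n) for the given weights."""
    n = len(weights)
    total = sum(weights)
    # Scale so that the average weight is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # keep s with this probability
        alias[s] = l          # otherwise fall through to l
        # l donated (1 - scaled[s]) of its mass to fill up slot s.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftovers are (up to rounding) exactly 1 and never use their alias.
    for i in large + small:
        prob[i] = 1.0
    return prob, alias


def draw(prob, alias, rng=random):
    """Draw one index in O(1): pick a slot, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Hypothetical weights: index 2 should come up about 70% of the time.
prob, alias = build_alias_table([1, 2, 7])
print(draw(prob, alias))
```

Building the table needs every weight up front, so the whole set of candidates has to be known before the first draw.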

This approach would require retrieving all pages in a category, though, which goes against what we currently do (rely on the database's ORDER BY mechanism and retrieve only the pertinent rows). I think a smarter approach would be to keep the existing code when the category is large, and use a more balanced approach (say, the alias method) when the category is smaller.
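Roughly, the size-based switch could look like the sketch below (Python, for illustration only). Everything in it is an assumption: the threshold value, the function name, and the two callables standing in for "load every member of the category" and "use the existing ORDER BY query". For the small-category branch it simply makes a uniform choice over the loaded members as a stand-in for whichever balanced method is eventually chosen.

```python
import random

# Hypothetical cut-off; the right value would depend on the analytics.
SMALL_CATEGORY_LIMIT = 5000


def pick_random_page(category_size, fetch_all_members, pick_via_database, rng=random):
    """Pick a page, switching strategy on category size.

    fetch_all_members: callable returning a list of every page in the category.
    pick_via_database: callable wrapping the existing database-side selection.
    Both are placeholders, not real MediaWiki functions.
    """
    if category_size <= SMALL_CATEGORY_LIMIT:
        members = fetch_all_members()
        if not members:
            return None
        # Every member is equally likely, independent of its page_random value.
        return rng.choice(members)
    # Large category: too many rows to load, so keep the current behaviour.
    return pick_via_database()


# Usage with stand-in callables.
page = pick_random_page(
    category_size=3,
    fetch_all_members=lambda: ["Draft:A", "Draft:B", "Draft:C"],
    pick_via_database=lambda: "Draft:FromDatabase",
)
print(page)
```

The cost of the small-category branch is one query that returns every member, which is why the cut-off matters.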

I can start patching something together, but would rather hold off a few days so that:

  • We decide if we need any analytics before we move forward (such as how often RandomInCategory is used on, say, enwiki; how often it is for small-to-medium categories, etc.)
  • We decide which method we want to implement (I recall there are more options, though none comes immediately to mind)