RandomInCategory uses the `page_random` field to draw a random page. While this field is uniformly distributed among all pages in a large wiki, it is certainly not uniformly distributed among the pages within a category, particularly a small- to medium-sized category (think hundreds of pages, not tens of thousands).
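To illustrate why this matters, here is a rough simulation. It assumes a simplified selection rule (draw `r` uniformly from [0, 1) and return the first page whose `page_random` is at least `r`, wrapping around at the end); this is a model for discussion, not the actual MediaWiki code. The function name and the 100-page category are made up for the example. Under this rule each page is chosen with probability equal to the gap below its `page_random` value, so in a small category some pages come up far more often than others.

```python
import random
from bisect import bisect_left
from collections import Counter


def simulate_draws(page_randoms, draws, rng):
    """Count wins per page under the assumed 'first page_random >= r' rule."""
    ordered = sorted(page_randoms)
    counts = Counter()
    for _ in range(draws):
        r = rng.random()
        i = bisect_left(ordered, r)
        # Wrap around to the smallest value when r exceeds every page_random.
        counts[ordered[i % len(ordered)]] += 1
    return counts


rng = random.Random(0)
# Hypothetical category: 100 pages with independent uniform page_random values.
category = [rng.random() for _ in range(100)]
counts = simulate_draws(category, 100_000, rng)

# Include pages that were never drawn (Counter returns 0 for missing keys).
per_page = [counts[p] for p in category]
most, least = max(per_page), min(per_page)
print(f"expected per page if uniform: 1000, most drawn: {most}, least drawn: {least}")
```

The spread between the most and least drawn page reflects the uneven spacing of the `page_random` values, not any bug in the random number generator.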
===== Use cases
On enwiki: The draft review process on en uses [[:Category:Pending AfC submissions]] to keep track of all the drafts awaiting review, and Special:RandomInCategory to let reviewers pick the next draft to work on. The problem is, the results are very much non-random. See [[ https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=911424503#Picking_a_draft_to_review:_RandomInCategory_isn't_very_random | this Village Pump thread ]] for my analysis of the problem.
On Commons: I (DerHexer) have bookmarked RandomInCategory links in my browser for some of the Commons categories of images which I uploaded, e.g. https://commons.wikimedia.org/wiki/Special:RandomInCategory/Files_by_DerHexer/Internationales_Deutsches_Turnfest_Berlin_2017 . The problem is even more obvious in smaller categories like https://commons.wikimedia.org/wiki/Special:RandomInCategory/Elisabeth_Seitz_at_Internationales_Deutsches_Turnfest_Berlin_2017 .
Contrary to what I expected, I do not get random images: some come up very often, sometimes even in a similar order each time I click the bookmark, and sometimes it opens only subcategories rather than files. How can this happen? The function is very helpful and I would very much like to promote it on-wiki, but while it is broken I am a bit hesitant.
Thanks in advance for fixing this!
===== Possible solution
Methods to draw "fairly" from a non-uniform distribution could be implemented. A quick search leads to [[ https://stackoverflow.com/questions/14915899/pick-random-element-from-set-with-non-uniform-distribution | this page on StackOverflow ]] which leads to [[ https://pandasthumb.org/archives/2012/08/lab-notes-the-a.html | the alias method ]] which apparently takes O(1) time per draw once its lookup table has been built.
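For reference, a minimal sketch of the alias method (the Vose variant) in Python. The function names `build_alias_table` and `alias_draw` are made up for this example, and nothing here is tied to MediaWiki. Building the table takes O(n); each draw afterwards needs one random index and one comparison.

```python
import random


def build_alias_table(weights):
    """Build (prob, alias) tables for the alias method from positive weights."""
    n = len(weights)
    total = sum(weights)
    # Scale so that the average entry is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        sm = small.pop()
        big = large.pop()
        prob[sm] = scaled[sm]
        alias[sm] = big
        # The large entry donates the mass needed to fill the small slot.
        scaled[big] = scaled[big] + scaled[sm] - 1.0
        (small if scaled[big] < 1.0 else large).append(big)
    # Leftovers are 1.0 up to floating-point error.
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_draw(prob, alias, rng):
    """Draw one index in O(1): pick a slot, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


prob, alias = build_alias_table([1, 2, 3, 4])
rng = random.Random(1)
sample = [alias_draw(prob, alias, rng) for _ in range(10)]
```

Note that the table has to be rebuilt whenever the set of items or their weights change, which matters for categories that are edited often.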
This approach would require retrieving all pages in a category though; this runs counter to what we currently do (which is to rely on the database's ORDER BY mechanism, only retrieving the pertinent rows). I think a smarter approach would be to use the existing code when the category is large, and use a more balanced approach (say, the alias method) when the category is small.
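A rough sketch of that size-based switch, in Python for brevity. Everything in it is hypothetical: the cutoff value, the function name, and the two callables standing in for "fetch every member" and "use the existing database query". For the small-category branch it uses a plain uniform index pick as a stand-in for the balanced method; the alias method or another scheme could slot into the same place.

```python
import random

# Hypothetical cutoff; the right value would need measuring.
SMALL_CATEGORY_LIMIT = 1000


def pick_random_member(category_size, fetch_all_members, pick_via_database, rng):
    """Choose a member uniformly for small categories, else defer to the database."""
    if category_size == 0:
        return None
    if category_size <= SMALL_CATEGORY_LIMIT:
        # Small category: loading every member is affordable.
        members = fetch_all_members()
        if not members:
            # The stored size can be stale, so guard against an empty result.
            return None
        return members[rng.randrange(len(members))]
    # Large category: keep the existing ORDER BY based lookup.
    return pick_via_database()


# Usage with stand-in callables instead of real database access.
rng = random.Random(0)
small_pick = pick_random_member(3, lambda: ["A", "B", "C"], lambda: "from-db", rng)
large_pick = pick_random_member(50_000, lambda: [], lambda: "from-db", rng)
```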
I (Huji) can start patching something together, but would rather hold off a few days so that:
* We decide if we need any analytics before we move forward (such as how often RandomInCategory is used on, say, enwiki; how often it is for small-to-medium categories, etc.)
* We decide which method we want to implement (I recall there are more options, though none comes immediately to mind)