page_random is 0 for many pages, excluding them from "random page" requests
Closed, Resolved, Public

Description

Playing with the dumps on zedler, I just found out that a lot of pages have a
page_random value of 0, which causes them to never show up as a "random page".
This seems to be the case mainly for image description pages (namespace 6).
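To see why a zero key hides a page, here is a minimal simulation. It assumes the random-page lookup draws a threshold r uniformly from [0, 1) and returns the page with the smallest page_random that is >= r. This report does not spell out the exact query used by MediaWiki 1.6, or whether it wraps around when r exceeds every key, so treat the model as an assumption. `pick_random_page` and the page titles are hypothetical.

```python
import random
from bisect import bisect_left
from collections import Counter


def pick_random_page(pages, rng):
    """Assumed selection model: draw r uniformly from [0, 1) and return the
    title with the smallest page_random >= r, or None if no key is that large.

    `pages` is a list of (page_random, title) tuples sorted by page_random.
    No wrap-around fallback is modelled here.
    """
    r = rng.random()
    keys = [key for key, _ in pages]
    i = bisect_left(keys, r)  # first index whose key is >= r
    return pages[i][1] if i < len(pages) else None


# Hypothetical pages; one of them carries the problematic zero key.
pages = sorted([
    (0.0, "Image:Zero.jpg"),
    (0.25, "A"),
    (0.5, "B"),
    (0.75, "C"),
    (0.999, "D"),
])

rng = random.Random(3946)
counts = Counter(pick_random_page(pages, rng) for _ in range(100_000))
print(counts["Image:Zero.jpg"])  # 0: only a draw of exactly r == 0.0 reaches it
print(counts["A"] > 0)           # True
```

Each nonzero page wins the draws that fall between its predecessor's key and its own (page "A" wins whenever 0 < r <= 0.25), while the zero-key page sits below every threshold except r = 0 exactly.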

This not only causes the "random page" and "random image" features to behave
incorrectly, it also affects tools that rely on a random selection of pages.
Please have a look at what is causing this.

Below are some of the results I found when playing with this:

mysql> use commonswiki;
mysql> select count(*) from page where page_random = 0;
+----------+
| count(*) |
+----------+
|    55832 |
+----------+
mysql> select count(*) from page where page_random = 0 and page_namespace = 6;
+----------+
| count(*) |
+----------+
|    54409 |
+----------+
mysql> use enwiki;
mysql> select count(*) from page where page_random = 0;
+----------+
| count(*) |
+----------+
|    62879 |
+----------+
mysql> select count(*) from page where page_random = 0 and page_namespace = 6;
+----------+
| count(*) |
+----------+
|    56830 |
+----------+

(I apologize in advance for broken formatting - MediaZilla needs a preview button...)
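The per-namespace breakdown above takes one query per namespace. A single GROUP BY returns all namespaces at once. The sketch below uses an in-memory SQLite table purely so it is self-contained; the table keeps only the two columns the query touches, and the rows are invented, not taken from the dumps. The SELECT itself is plain SQL and should run unchanged on MySQL.

```python
import sqlite3

# Toy stand-in for the `page` table (invented rows, only the relevant columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, "
    "page_namespace INTEGER NOT NULL, page_random REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO page (page_namespace, page_random) VALUES (?, ?)",
    [(0, 0.41), (0, 0.0), (6, 0.0), (6, 0.0), (6, 0.73), (14, 0.0)],
)

# Count zero-key rows for every namespace in one pass.
zero_by_ns = dict(conn.execute(
    "SELECT page_namespace, COUNT(*) FROM page "
    "WHERE page_random = 0 "
    "GROUP BY page_namespace ORDER BY page_namespace"
).fetchall())
total_zero = sum(zero_by_ns.values())

print(zero_by_ns)  # {0: 1, 6: 2, 14: 1}
print(total_zero)  # 4
```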


Version: 1.6.x
Severity: normal

Details

Reference
bz3946

Event Timeline

bzimport raised the priority of this task to High. Nov 21 2014, 8:54 PM
bzimport added a project: Wikimedia-Rdbms.
bzimport set Reference to bz3946.
bzimport added a subscriber: Unknown Object (MLST).
daniel created this task. Nov 13 2005, 1:45 AM

This is a conversion issue -- since Special:Randompage previously only applied to non-
redirects in namespace 0, code which inserted image description pages and redirects
tended to miss setting cur_random. Because of the consolidation of article insert code
in 1.5, this is now resolved for new pages. However there will still be many pages left
with page_random=0, as Daniel points out.

I've fixed this in Wikimedia databases, but it might be nice to have a query like

UPDATE page SET page_random=RAND() WHERE page_random=0

in future installers.
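The suggested update can be tried on a toy table before running it for real. In this sketch an in-memory SQLite database stands in for MySQL, the rows are invented, and because SQLite has no built-in RAND(), one is registered from Python to mimic MySQL's RAND() (a float in [0, 1)). The UPDATE statement is the one proposed in the comment above, unchanged.

```python
import random
import sqlite3

# Toy `page` table with invented rows; a redirect is included on purpose.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, "
    "page_namespace INTEGER NOT NULL, page_is_redirect INTEGER NOT NULL, "
    "page_random REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO page (page_namespace, page_is_redirect, page_random) "
    "VALUES (?, ?, ?)",
    [(0, 0, 0.52), (0, 1, 0.0), (6, 0, 0.0), (6, 0, 0.0)],
)

# Assumed stand-in for MySQL's RAND(). It is registered as non-deterministic,
# so SQLite calls it once per updated row rather than caching one value.
rng = random.Random(3946)
conn.create_function("RAND", 0, rng.random)

# Every zero-key row (redirects included) gets a fresh key; others are untouched.
cur = conn.execute("UPDATE page SET page_random = RAND() WHERE page_random = 0")
conn.commit()
updated = cur.rowcount

remaining = conn.execute(
    "SELECT COUNT(*) FROM page WHERE page_random = 0"
).fetchone()[0]

print(updated)    # 3
print(remaining)  # 0
```

Since RAND() can in principle return exactly 0, a rerun of the same query is a cheap safeguard, though in practice one pass suffices.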

robchur wrote:

Bear with me if I'm missing the trick or the point, but wouldn't you want that
to be UPDATE page SET page_random=RAND() WHERE page_random=0,
page_is_redirect=0; in order to exclude redirects?

avarab wrote:

(In reply to comment #2)

> to be UPDATE page SET page_random=RAND() WHERE page_random=0,
> page_is_redirect=0; in order to exclude redirects?

First, that's invalid SQL; second, page_random should be a nonzero value for
every row, including for rows where page_is_redirect = 0.

robchur wrote:

Sorry; I wasn't thinking. Yes, it's invalid SQL, and yes, I made the glaring
balls-up of specifying WHERE page_is_redirect = 0, when in fact my comment on it
made it clear I meant the other way round.

avarab wrote:

Added a database update for this bug to HEAD; marking this as FIXED in CVS HEAD.