Add option to exclude disambiguation pages from random/morelike (&others?) results.
Open, Low, Public

Description

For example, to show a single random article that is not a disambiguation page, we currently have to use one of the following two approaches:

  • add "pageprops" to the random query, run the query, check the returned pageprops for "disambiguation", and if it is found, keep requesting another random article until we get one without the "disambiguation" pageprop
  • make the random query fetch multiple random articles, say 5, and pick one without the "disambiguation" pageprop, hoping that we didn't get 5 disambiguation pages
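
For illustration, the second workaround (combined with the retry from the first) might look roughly like the sketch below. It assumes the standard MediaWiki Action API with generator=random, prop=pageprops and formatversion=2; the endpoint URL, the helper names and the User-Agent string are made up for this example and are not taken from any existing client.

```python
import json
import urllib.parse
import urllib.request

# Assumption: any MediaWiki api.php endpoint would do; this one is just an example.
API_URL = "https://en.wikipedia.org/w/api.php"


def random_query_params(batch_size=5):
    """Build parameters for a random-page query that also returns the
    'disambiguation' page prop for each page."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "random",
        "grnnamespace": "0",
        "grnlimit": str(batch_size),
        "prop": "pageprops",
        "ppprop": "disambiguation",
    }


def first_non_disambiguation(response):
    """Return the first page without a 'disambiguation' page prop, or None
    if every page in the batch is a disambiguation page."""
    pages = response.get("query", {}).get("pages", [])
    for page in pages:
        if "disambiguation" not in page.get("pageprops", {}):
            return page
    return None


def fetch_random_article(max_attempts=3):
    """Hypothetical helper: request batches until one contains a usable page."""
    for _ in range(max_attempts):
        url = API_URL + "?" + urllib.parse.urlencode(random_query_params())
        request = urllib.request.Request(
            url, headers={"User-Agent": "random-article-sketch/0.1"}
        )
        with urllib.request.urlopen(request) as reply:
            page = first_non_disambiguation(json.load(reply))
        if page is not None:
            return page
    return None
```

Even in this form the client has to over-fetch and may still need a second round trip, which is the overhead a server-side exclusion parameter would remove.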

What would be nice to have is a parameter to exclude disambig pages, so, in the case of random, we could just ask for a single non-disambig random article.

Event Timeline

Mhurd raised the priority of this task from to Needs Triage.
Mhurd updated the task description.
Mhurd added a project: Discovery-ARCHIVED.

Probably want to exclude redirects as well.

I've re-added Discovery-ARCHIVED to this; since the task relates to content discovery mechanisms, this is definitely within Discovery's realm.

Note that the API queries for list=random are currently along the lines of

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001

You'll need to make sure that whatever changes you propose have decent performance when combined into these queries.
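
As a rough sketch of what such a combined query could look like, the snippet below adds a NOT EXISTS check against page_props to one of the query shapes above and runs it on an in-memory SQLite database. It assumes the disambiguation flag is stored as a page_props row with pp_propname = 'disambiguation' (which the pageprops API output suggests); the sample rows and the helper name are invented. It only shows the shape of the query and says nothing about performance on the production database.

```python
import sqlite3

# Toy schema with only the columns the queries above touch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL,
    page_is_redirect INTEGER NOT NULL DEFAULT 0,
    page_random REAL NOT NULL
);
CREATE TABLE page_props (
    pp_page INTEGER NOT NULL,
    pp_propname TEXT NOT NULL,
    pp_value TEXT NOT NULL,
    PRIMARY KEY (pp_page, pp_propname)
);
-- Invented sample data: one disambiguation page, one redirect, two articles.
INSERT INTO page VALUES
    (1, 0, 'Mercury', 0, 0.10),
    (2, 0, 'Mercury_(planet)', 0, 0.20),
    (3, 0, 'Hg', 1, 0.30),
    (4, 0, 'Mercury_(element)', 0, 0.40);
INSERT INTO page_props VALUES (1, 'disambiguation', '');
""")

# Same shape as the namespace + redirect query above, plus the exclusion.
QUERY = """
SELECT page_id, page_title, page_namespace, page_random
FROM page
WHERE page_namespace IN (0)
  AND page_is_redirect = 0
  AND page_random >= ?
  AND NOT EXISTS (
      SELECT 1 FROM page_props
      WHERE pp_page = page_id AND pp_propname = 'disambiguation'
  )
ORDER BY page_random, page_id
LIMIT ?
"""


def random_non_disambiguation(start, limit=1):
    """Hypothetical helper: rows at or after `start` that are neither
    redirects nor disambiguation pages."""
    return conn.execute(QUERY, (start, limit)).fetchall()
```

Whether the extra lookup stays cheap would depend on how the real database plans the subquery against its page_props indexes, so this would need checking with EXPLAIN on production-sized data.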

This can't really be prioritised right now.

There are two ways to solve this:

  1. Using a bot, put everything into a category called Category:Index and then use Special:RandomInCategory/Index

Note Wikipedia doesn't have a proper index, which for an encyclopedia is ... odd.

  2. Make a Disambiguation: namespace and move disambiguation pages out of the main namespace, where they have no business really.