
Add option to exclude disambiguation pages from random/morelike (&others?) results.
Open, Low, Public

Description

For example, to show a single random article that is not a disambiguation page, we currently have to use one of the following two approaches:

  • add "pageprops" to the random query, run the query, check the returned pageprops for "disambiguation" and, if it is found, keep requesting another random article until we get one without the "disambiguation" pageprop
  • make the random query fetch multiple random articles, say 5, and pick one without the "disambiguation" pageprop, hoping that we didn't get 5 disambiguation pages

What would be nice to have is a parameter to exclude disambig pages, so, in the case of random, we could just ask for a single non-disambig random article.
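
A minimal sketch of the second workaround, for illustration only. It assumes the query is issued as `generator=random` combined with `prop=pageprops` and that the response uses the default JSON format, where `query.pages` is an object keyed by page ID. The endpoint URL, the batch size, and the helper names `build_random_query_url` and `first_non_disambiguation` are assumptions, not part of any existing client.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; any MediaWiki api.php URL could be substituted.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_random_query_url(batch_size: int = 5) -> str:
    """Build a URL asking for several random main-namespace pages plus pageprops."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "random",
        "grnnamespace": 0,           # articles only
        "grnlimit": batch_size,      # fetch a batch, as in the second workaround
        "prop": "pageprops",
        "ppprop": "disambiguation",  # only return the property we check for
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def first_non_disambiguation(response: dict) -> Optional[dict]:
    """Return the first page without the "disambiguation" pageprop, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "disambiguation" not in page.get("pageprops", {}):
            return page
    # Every page in the batch was a disambiguation page; the caller must retry.
    return None
```

The `None` branch is exactly the case the requested parameter would make unnecessary: the client can never be sure a batch contains a usable article.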

Event Timeline

Mhurd raised the priority of this task from to Needs Triage.
Mhurd updated the task description.
Mhurd added a project: Discovery.

Probably want to exclude redirects as well.

I've re-added Discovery to this; since the task relates to content discovery mechanisms, this is definitely within Discovery's realm.

Note that the API queries for list=random are currently along the lines of:

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001

You'll need to make sure that whatever changes you propose have decent performance when combined into these queries.
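
One possible shape for such a change, sketched against an in-memory SQLite database rather than a real MediaWiki replica. It assumes the exclusion would be expressed as a NOT EXISTS subquery on the page_props table (columns pp_page, pp_propname, pp_value); the demo rows and the helper names `make_demo_db` and `random_non_disambig` are made up for illustration. It shows only that the filter composes with the existing WHERE and ORDER BY clauses and says nothing about performance on a production database, which would still have to be checked.

```python
import sqlite3

# One of the list=random queries above, with a hypothetical extra filter that
# skips pages carrying the "disambiguation" page property.
QUERY = """
SELECT page_id, page_title, page_namespace, page_random
FROM page
WHERE page_namespace IN (0)
  AND page_is_redirect = 0
  AND page_random >= :start
  AND NOT EXISTS (
    SELECT 1 FROM page_props
    WHERE pp_page = page_id AND pp_propname = 'disambiguation'
  )
ORDER BY page_random, page_id
LIMIT :limit
"""


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny stand-in for the page and page_props tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY, page_title TEXT,
            page_namespace INTEGER, page_random REAL, page_is_redirect INTEGER
        );
        CREATE TABLE page_props (
            pp_page INTEGER, pp_propname TEXT, pp_value TEXT,
            PRIMARY KEY (pp_page, pp_propname)
        );
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?, ?)", [
        (1, "Mercury", 0, 0.10, 0),           # disambiguation page
        (2, "Mercury_(planet)", 0, 0.20, 0),  # ordinary article
        (3, "Merc", 0, 0.15, 1),              # redirect
        (4, "Mercury", 1, 0.05, 0),           # talk namespace
    ])
    conn.execute("INSERT INTO page_props VALUES (1, 'disambiguation', '')")
    return conn


def random_non_disambig(conn: sqlite3.Connection, start: float, limit: int = 1):
    """Return up to `limit` non-redirect, non-disambiguation articles from `start`."""
    return conn.execute(QUERY, {"start": start, "limit": limit}).fetchall()
```

With the demo rows, `random_non_disambig(make_demo_db(), 0.0)` skips the disambiguation page and the redirect and returns the "Mercury_(planet)" row.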

This can't really be prioritised right now.