Add option to exclude disambiguation pages from random/morelike (&others?) results.
Open, Low, Public

Description

For example, to show a single random article that is not a disambiguation page, we currently have to use one of the following two approaches:

  • add "pageprops" to the random query, run the query, and check the returned pageprops for "disambiguation"; if it is present, keep requesting another random article until we get one without the "disambiguation" pageprop
  • make the random query fetch multiple random articles, say 5, and pick one without the "disambiguation" pageprop, hoping that we didn't get 5 disambiguation pages

It would be nice to have a parameter that excludes disambiguation pages so that, in the case of random, we could just ask for a single non-disambiguation random article.
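
The second workaround can be sketched as follows. This is a minimal illustration, not the app's actual code: the function name and the sample data are hypothetical, and it assumes the response was parsed from a request along the lines of action=query&generator=random&grnnamespace=0&grnlimit=5&prop=pageprops&ppprop=disambiguation in the default JSON format, where "pages" is an object keyed by page ID.

```python
from typing import Any, Optional


def first_non_disambiguation(response: dict) -> Optional[dict]:
    """Return the first page without the "disambiguation" page prop.

    Returns None when every returned page is a disambiguation page,
    in which case the caller has to issue another request.
    """
    pages: Any = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages with no page props at all have no "pageprops" key.
        if "disambiguation" not in page.get("pageprops", {}):
            return page
    return None


# Hypothetical sample response with one disambiguation page and one article.
sample = {
    "query": {
        "pages": {
            "101": {
                "pageid": 101,
                "ns": 0,
                "title": "Mercury",
                "pageprops": {"disambiguation": ""},
            },
            "102": {"pageid": 102, "ns": 0, "title": "Mercury (planet)"},
        }
    }
}

chosen = first_non_disambiguation(sample)
```

The None case is exactly the failure mode described above: with a fixed batch size there is always some chance that every result is a disambiguation page, so the client still needs a retry path.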

Mhurd created this task. Nov 25 2015, 6:34 PM
Mhurd updated the task description.
Mhurd raised the priority of this task from to Needs Triage.
Mhurd added a project: Discovery.
Restricted Application added subscribers: StudiesWorld, Aklapper. Nov 25 2015, 6:34 PM

Probably want to exclude redirects as well.

MaxSem edited projects, added MediaWiki-API; removed Discovery. Nov 25 2015, 10:05 PM

I've re-added Discovery to this; since the task relates to content discovery mechanisms, it is definitely within Discovery's realm.

Anomie added a subscriber: Anomie. Nov 26 2015, 1:47 AM

Note that the API queries for list=random are currently along the lines of

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND page_random >= $START ORDER BY page_random, page_id LIMIT 5001

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) ORDER BY page_random, page_id LIMIT 5001

SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random, page_is_redirect FROM page WHERE page_namespace IN (...) AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001
SELECT page_id, page_title, page_namespace, page_random FROM page WHERE page_namespace IN (...)  AND page_is_redirect = $REDIR AND (page_random = $START AND page_id >= $STARTID OR page_random > $START) AND page_random < $END ORDER BY page_random, page_id LIMIT 5001

You'll need to make sure that whatever changes you propose have decent performance when combined into these queries.
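
For illustration only, one possible shape of such a change is an anti-join against the page_props table added to one of the queries above. The sketch below runs that shape against an in-memory SQLite database with made-up rows, so it demonstrates the semantics only and says nothing about performance on the production database. The NOT EXISTS condition and the helper name are assumptions, not a proposed patch; the pp_page and pp_propname columns are taken to follow the standard MediaWiki page_props schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_title TEXT,
        page_namespace INTEGER,
        page_random REAL,
        page_is_redirect INTEGER
    );
    CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT);

    -- Made-up rows: a disambiguation page, an article, and a redirect.
    INSERT INTO page VALUES
        (1, 'Mercury', 0, 0.10, 0),
        (2, 'Mercury_(planet)', 0, 0.20, 0),
        (3, 'Hg', 0, 0.30, 1);
    INSERT INTO page_props VALUES (1, 'disambiguation', '');
    """
)

# Same structure as the non-redirect query above, plus a hypothetical
# NOT EXISTS clause that skips pages carrying the disambiguation prop.
QUERY = """
SELECT page_id, page_title, page_namespace, page_random
FROM page
WHERE page_is_redirect = 0
  AND page_random >= ?
  AND NOT EXISTS (
      SELECT 1 FROM page_props
      WHERE pp_page = page_id AND pp_propname = 'disambiguation'
  )
ORDER BY page_random, page_id
LIMIT 1
"""


def random_non_disambiguation(start: float):
    """Return the first non-redirect, non-disambiguation row at or after start."""
    return conn.execute(QUERY, (start,)).fetchone()


row = random_non_disambiguation(0.0)
```

Whether a condition like this can use suitable indexes when combined with the namespace, redirect, and continuation filters is the open question raised above, and it would have to be checked against the real schema.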

Deskana triaged this task as Low priority. Dec 29 2015, 10:25 PM

This can't really be prioritised right now.

Deskana moved this task from Needs triage to Search on the Discovery board. Dec 29 2015, 10:25 PM