Add pagination to Special:Lonelypages
Open · Public

Description

Author: mapellegrini

Description:
[[Special:Lonelypages]] (which is linked from the English Wikipedia main page to
increase the visibility of orphaned articles) cannot display more than the first
1,000 orphaned pages (in alphabetical order).


Version: unspecified
Severity: enhancement
URL: https://meta.wikimedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=500
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=69660

bzimport added a project: MediaWiki-Special-pages. Via Conduit · Nov 21 2014, 8:33 PM
bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz2415.
bzimport created this task. Via Legacy · Jun 14 2005, 9:37 PM
hashar added a comment. Via Conduit · Jul 13 2005, 5:37 PM

The limit of 1000 pages is set to make the query faster. Resolve
some of those lonely pages and you will get the next ones.

bzimport added a comment. Via Conduit · Jul 14 2005, 2:04 AM

zigger wrote:

(In reply to comment #1)

The limit of 1000 pages is set to make the query faster. Resolve
some of those lonely pages and you will get the next ones.

This is prevented by the combination of the limit, the caching, and the lack
of any visual indication of resolved orphans. See bug 2599.

Sj added a comment. Via Conduit · Jan 21 2006, 2:06 AM

Static dumps of these special-page outputs once a [month? for starters] -- the entire output, even if it's 20k article titles -- would be helpful. Likewise for Ancientpages, etc.

bzimport added a comment. Via Conduit · Jan 21 2006, 2:23 AM

avarab wrote:

Enabling this without bogging down the servers depends on bug 4699, so I am
adding the dependency. Changing the component since this is a Wikimedia
configuration issue.

bzimport added a comment. Via Conduit · Jan 21 2007, 4:39 PM

shunpiker wrote:

Patch for files in includes/ and maintenance/ (updated 3/17 to include updaters.inc and the archives patch; qc_serial is nullable to support this)

Instead of using LIMIT to restrict the number of rows fetched from the query
cache, SELECT an indexed range. This should make reads from the query cache
trivially cheap, so that the limits can be disabled in the $wgQueryPages
array. (Loading the cache without a LIMIT is only a little more expensive,
since the bulk of the expense of these queries is in assembling the results
rather than returning them.)

For the purpose of this patch, I only turned off the cache limit on
SpecialLonelypages, but I recommend that the rest of the cache limits be
likewise disabled.

Tested with a build of the SimpleEnglish data.

attachment consolidated.patch ignored as obsolete

bzimport added a comment. Via Conduit · Jan 21 2007, 4:40 PM

shunpiker wrote:

Comment on attachment 3106
Patch for files in includes/ and maintenance/ (updated 3/17 to include updaters.inc and the archives patch; qc_serial is nullable to support this)

Might also be relevant to bug 4699:
http://bugzilla.wikimedia.org/show_bug.cgi?id=4699

bzimport added a comment. Via Conduit · Jan 22 2007, 3:02 AM

shunpiker wrote:

Comment on attachment 3106
Patch for files in includes/ and maintenance/ (updated 3/17 to include updaters.inc and the archives patch; qc_serial is nullable to support this)

Please note schema changes -- before deploying code:

alter table querycache add column qc_serial int(5) unsigned NOT NULL default '0';
alter table querycache add unique key (qc_type,qc_serial);
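A quick way to confirm the change landed (a sketch, using MySQL's standard inspection statements):

DESCRIBE querycache;          -- qc_serial should now be listed
SHOW INDEX FROM querycache;   -- the unique (qc_type, qc_serial) key should appear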

bzimport added a comment. Via Conduit · Jan 22 2007, 11:32 PM

shunpiker wrote:

Wish I could edit previous comments! Anyway, with deeper study of MySQL's LIMIT
optimizations, it seems that some queries may benefit from LIMITs more than
others, particularly ones whose results can be compiled during a range scan.
But queries like this one (and its evil twin, Deadendpages) seem to do most of
their work before the LIMIT is applied. Would it be possible to test this once
-- disabling the LIMIT for Lonelypages and comparing run time and row count
to the LIMITed version?

bzimport added a comment. Via Conduit · Mar 17 2007, 9:32 AM

shunpiker wrote:

Comment on attachment 3106
Patch for files in includes/ and maintenance/ (updated 3/17 to include updaters.inc and the archives patch; qc_serial is nullable to support this)

Index: includes/QueryPage.php
===================================================================
--- includes/QueryPage.php	(revision 19480)
+++ includes/QueryPage.php	(working copy)
@@ -19,7 +19,7 @@
 	array( 'DoubleRedirectsPage', 'DoubleRedirects' ),
 	array( 'ListUsersPage', 'Listusers' ),
 	array( 'ListredirectsPage', 'Listredirects' ),
-	array( 'LonelyPagesPage', 'Lonelypages' ),
+	array( 'LonelyPagesPage', 'Lonelypages', false ),
 	array( 'LongPagesPage', 'Longpages' ),
 	array( 'MostcategoriesPage', 'Mostcategories' ),
 	array( 'MostimagesPage', 'Mostimages' ),
@@ -114,6 +114,15 @@
 		return "SELECT 'sample' as type, 0 as namespace, 'Sample result' as title, 42 as value";
 	}
 
+	// ASSUMPTION: arguments are SQL clean
+	function getCacheSQL( $querycache, $type, $limit, $offset ) {
+		$first = $offset + 1;
+		$last = $offset + $limit;
+		$sql = "SELECT qc_type as type, qc_serial as serial_id, qc_namespace as namespace, qc_title as title, qc_value as value FROM $querycache WHERE qc_type='$type' AND qc_serial BETWEEN $first AND $last ORDER BY qc_serial";
+		// Cache results are assumed to be sorted on insert
+		return $sql;
+	}
+
 	/**
 	 * Override to sort by increasing values
 	 */
@@ -221,8 +230,9 @@
 		if ( $res ) {
 			$num = $dbr->numRows( $res );
 			# Fetch results
-			$insertSql = "INSERT INTO $querycache (qc_type,qc_namespace,qc_title,qc_value) VALUES ";
+			$insertSql = "INSERT INTO $querycache (qc_type,qc_serial,qc_namespace,qc_title,qc_value) VALUES ";
 			$first = true;
+			$serial_id = 0;
 			while ( $res && $row = $dbr->fetchObject( $res ) ) {
 				if ( $first ) {
 					$first = false;
@@ -237,6 +247,7 @@
 				$insertSql .= '(' .
 					$dbw->addQuotes( $row->type ) . ',' .
+					$dbw->addQuotes( ++$serial_id ) . ',' .
 					$dbw->addQuotes( $row->namespace ) . ',' .
 					$dbw->addQuotes( $row->title ) . ',' .
 					$dbw->addQuotes( $value ) . ')';
@@ -288,13 +299,13 @@
 		if ( !$this->isCached() ) {
 			$sql = $this->getSQL();
+			$sql .= $this->getOrder();
+			$sql = $dbr->limitResult( $sql, $limit, $offset );
 		} else {
 			# Get the cached result
 			$querycache = $dbr->tableName( 'querycache' );
 			$type = $dbr->strencode( $sname );
-			$sql =
-				"SELECT qc_type as type, qc_namespace as namespace, qc_title as title, qc_value as value
-				 FROM $querycache WHERE qc_type='$type'";
+			$sql = $this->getCacheSQL( $querycache, $type, $limit, $offset );
 
 			if( !$this->listoutput ) {
@@ -324,8 +335,6 @@
 		}
-		$sql .= $this->getOrder();
-		$sql = $dbr->limitResult( $sql, $limit, $offset );
 		$res = $dbr->query( $sql );
 		$num = $dbr->numRows( $res );

Index: includes/SpecialLonelypages.php
===================================================================
--- includes/SpecialLonelypages.php	(revision 19480)
+++ includes/SpecialLonelypages.php	(working copy)
@@ -35,8 +35,7 @@
 		return
 		  "SELECT 'Lonelypages'  AS type,
 			  page_namespace AS namespace,
-			  page_title AS title,
-			  page_title AS value
+			  page_title AS title
 		     FROM $page
 		LEFT JOIN $pagelinks
 		       ON page_namespace=pl_namespace AND page_title=pl_title
@@ -45,6 +44,11 @@
 		      AND page_is_redirect=0";
 	}
+
+	function getOrder() {
+		return ' ORDER BY title '; // no value column for this query
+	}
+
 }
 
 /**

Index: maintenance/archives/patch-querycache-serial.sql
===================================================================
--- maintenance/archives/patch-querycache-serial.sql	(revision 0)
+++ maintenance/archives/patch-querycache-serial.sql	(revision 0)
@@ -0,0 +1,8 @@
+-- Add a serial column to store an id for each row according to qc_type.
+-- Add an index (qc_type, qc_serial) to facilitate paged sequential reads.
+
+ALTER TABLE querycache
+  ADD (qc_serial int(5) unsigned);
+
+ALTER TABLE querycache
+  ADD UNIQUE KEY(qc_type, qc_serial);

Index: maintenance/mysql5/tables-binary.sql
===================================================================
--- maintenance/mysql5/tables-binary.sql	(revision 19480)
+++ maintenance/mysql5/tables-binary.sql	(working copy)
@@ -946,6 +946,9 @@
   -- A key name, generally the base name of the special page.
   qc_type char(32) NOT NULL,
 
+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',

Index: maintenance/mysql5/tables.sql
===================================================================
--- maintenance/mysql5/tables.sql	(revision 19480)
+++ maintenance/mysql5/tables.sql	(working copy)
@@ -937,6 +937,9 @@
   -- A key name, generally the base name of the special page.
   qc_type char(32) NOT NULL,
 
+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',

Index: maintenance/postgres/tables.sql
===================================================================
--- maintenance/postgres/tables.sql	(revision 19480)
+++ maintenance/postgres/tables.sql	(working copy)
@@ -362,10 +362,12 @@
 CREATE TABLE querycache (
   qc_type       TEXT      NOT NULL,
+  qc_serial     SMALLINT,
   qc_value      SMALLINT  NOT NULL,
   qc_namespace  SMALLINT  NOT NULL,
   qc_title      TEXT      NOT NULL
 );
+CREATE UNIQUE INDEX querycache_type_serial ON querycache (qc_type, qc_serial);
 CREATE INDEX querycache_type_value ON querycache (qc_type, qc_value);
 
 CREATE TABLE querycache_info (

Index: maintenance/tables.sql
===================================================================
--- maintenance/tables.sql	(revision 19480)
+++ maintenance/tables.sql	(working copy)
@@ -925,6 +925,9 @@
   -- A key name, generally the base name of the special page.
   qc_type char(32) NOT NULL,
 
+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',
 
@@ -932,6 +935,7 @@
   qc_namespace int NOT NULL default '0',
   qc_title char(255) binary NOT NULL default '',
 
+  UNIQUE KEY (qc_type,qc_serial),
   KEY (qc_type,qc_value)
 
 ) TYPE=InnoDB;

Index: maintenance/updaters.inc
===================================================================
--- maintenance/updaters.inc	(revision 19480)
+++ maintenance/updaters.inc	(working copy)
@@ -40,6 +40,7 @@
 $wgNewFields = array(
 	# table          field          patch file (in maintenance/archives)
+	array( 'querycache',    'qc_serial',     'patch-querycache-serial.sql' ),
 	array( 'ipblocks',      'ipb_id',        'patch-ipblocks.sql' ),
 	array( 'ipblocks',      'ipb_expiry',    'patch-ipb_expiry.sql' ),
 	array( 'recentchanges', 'rc_type',       'patch-rc_type.sql' ),
bzimport added a comment. Via Conduit · Mar 17 2007, 9:37 AM

shunpiker wrote:

Querycache enhancement (serial range scans for paging), version 2

Update to the previous patch to include updaters.inc and the archives patch.
qc_serial is now nullable with no default, in order to support updating the
table without having to empty the legacy querycache.
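A note on why nullability matters here, as a minimal sketch (it relies on MySQL's rule that a UNIQUE key permits any number of NULLs):

ALTER TABLE querycache ADD (qc_serial int(5) unsigned);       -- nullable, no default
ALTER TABLE querycache ADD UNIQUE KEY (qc_type, qc_serial);
-- Legacy rows keep qc_serial = NULL and never collide on the new key;
-- they are rewritten with sequential serials at the next cache refresh.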

attachment consolidated3.patch ignored as obsolete

bzimport added a comment. Via Conduit · Nov 9 2011, 4:59 PM

sumanah wrote:

The bug is still reproducible. https://meta.wikimedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=500

Riley, I'm sorry it's taken so long for you to get a response to your patch. I am trying to get developers to respond soon.

bzimport added a comment. Via Conduit · Nov 9 2011, 7:15 PM

shunpiker wrote:

Thanks, Sumana. I'd be glad to look into the details if the points of contact for this patch have changed since (wow!) 2007.

Catrope added a comment. Via Conduit · Nov 20 2011, 6:39 PM

Marking patch as obsolete since it doesn't apply anymore, at all. The QueryPage subsystem saw major architectural changes in December 2010. For one thing, there is now no raw SQL left, which means that most of the patch will need to be rewritten.

bzimport added a comment. Via Conduit · Nov 26 2011, 8:49 PM

sumanah wrote:

Removed "patch" and "need-review" keywords.

Riley, it does look like -- as you put it -- the points of contact for the patch have changed rather a lot since 2007. You can look around on mediawiki.org with a search for "QueryPage", visit the MediaWiki development community on IRC -- https://www.mediawiki.org/wiki/MediaWiki_on_IRC -- or on the mailing list -- https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- to learn more about the current state of this component and talk about your approach. Thank you for your interest; hope we can help!

bzimport added a comment. Via Conduit · Nov 28 2011, 5:51 AM

shunpiker wrote:

Rewrite of previous patch using branch REL1_17 (104384)

Rewrote the patch to conform to changes since it was originally written.

Attached: 3353_rewrite.patch

bzimport added a comment. Via Conduit · Nov 28 2011, 7:05 AM

shunpiker wrote:

With a querycache of 250,000 rows, including 50,000 lonelypages rows, a query with a large offset without the patch:

select SQL_NO_CACHE qc_type, qc_namespace, qc_title, qc_value from querycache
where qc_type = 'lonelypages'
order by qc_value ASC
limit 500 offset 35000;

--> about 0.191 seconds

The same offset with the patch:

select SQL_NO_CACHE qc_type, qc_namespace, qc_title, qc_value from querycache
where qc_type = 'lonelypages'
and qc_seq between 35001 and 35500
order by qc_value ASC;

--> about 0.004 seconds
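The gap matches the access paths; a quick way to confirm it (a sketch -- the EXPLAIN column values are what MySQL typically reports, and the key name is assumed):

explain select qc_type, qc_namespace, qc_title, qc_value from querycache
where qc_type = 'lonelypages'
and qc_seq between 35001 and 35500;

-- Expected: access type "range" on the unique (qc_type, qc_seq) key with
-- roughly 500 rows examined, while the LIMIT/OFFSET form must read and
-- discard 35,000 rows before returning the page.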

bzimport added a comment. Via Conduit · Nov 28 2011, 3:49 PM

sumanah wrote:

Riley, you based your current patch on branch REL1_17 -- for better freshness and reviewability, you should base it on SVN trunk.

bzimport added a comment. Via Conduit · Nov 28 2011, 6:54 PM

sumanah wrote:

Riley noted to me:

"You linked Lonelypages on meta with a note that the bug is still reproducible:

https://meta.wikimedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=500

But when I click through, I don't see the problem -- that is, I'm able to click through to the following page of 500 results, and the next, with the last result being the 1948th. Am I failing to notice something?

On the other hand, I do see the problem in the Spanish wikipedia:

http://es.wikipedia.org/w/index.php?title=Especial:P%C3%A1ginasHu%C3%A9rfanas&limit=500&offset=4500

Here you can see that the results are limited to 5,000, which falls somewhat short of the letter "C"."

I thought the bug was reproducible on meta when I posted that comment on the 9th, but now I cannot reproduce it there. However, I can, like Riley, reproduce it on Spanish Wikipedia.

bzimport added a comment. Via Conduit · Nov 28 2011, 7:39 PM

shunpiker wrote:

Ok, will rewrite patch for trunk. Thanks!

bzimport added a comment. Via Conduit · May 16 2012, 7:57 PM

sumanah wrote:

Riley, when you rewrite the patch, please submit it directly into Git, our source control system. You can do that with developer access: https://www.mediawiki.org/wiki/Developer_access

Thanks!

bzimport added a comment. Via Conduit · Oct 12 2012, 1:32 AM

sumanah wrote:

Riley, were you able to rewrite for trunk?

Aklapper added a comment. Via Conduit · Nov 26 2012, 1:46 PM

Riley: Did you manage to rewrite the patch? If not, how could we help?
