An extension to create list of most recent articles in 1-3 categories.
Closed, Resolved, Public

Description

Author: wmf.amgine3691

Description:
An extension to list the n most recent articles which are in
category x [AND category y [AND category z]].

The extension is vital to automate the main page of Wikinews, to allow
contributors to concentrate on producing articles rather than maintaining a very
rapidly aging news list. It can allow the most current versions of articles to
appear while they are news. Additional applications exist for Wikipedias.

This version has been tested on small-scale installations of MediaWiki 1.4b3
and 1.4b5.

Source:

<pre>
<?php
/*

Contributors: n:User:Amgine, n:User:IlyaHaykinson

To install: add following to LocalSettings.php

include("extensions/dynamicpagelist.php");

*/

$wgExtensionFunctions[] = "wfDynamicPageList";

function wfDynamicPageList() {

global $wgParser;

$wgParser->setHook( "DynamicPageList", "DynamicPageList" );

}

# The callback function for converting the input text to HTML output

function DynamicPageList( $input ) {

global $wgScriptPath, $wgServer, $wgUser;

$aParams = array();

$sTok = strtok($input, "\n");
while ($sTok)
{
  $aParams[] = $sTok;
  $sTok = strtok("\n");
}

foreach($aParams as $sParam)
{
  $aParam = explode("=", $sParam);
  $sType = $aParam[0];
  $sArg = $aParam[1];
  if ($sType == 'category')
  {
    $sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);
    $sCatName = str_replace('\'','\\\'',$sCatName);
    $aCategories[] = $sCatName;
  }
  else if ('count' == $sType)
  {
    $iCount = (1 * $sArg);
  }
}

$iCatCount = count($aCategories);
if ($iCatCount < 1)
  return "!!too few categories!!";
if ($iCatCount > 3)
  return "!!too many categories!!";
if ($iCount < 1)
  $iCount = 1;
if ($iCount > 50)
  $iCount = 50;

$sSql = 'SELECT cur_namespace, cur_title FROM cur';
for ($i = 0; $i < $iCatCount; $i++) {
  $sSql .= ', categorylinks AS c' . ($i+1);
}

$sSql .= ' WHERE 1=1 ';

for ($i = 0; $i < $iCatCount; $i++) {
  if ($i > 0)
    $sSql .= ' AND c1.cl_from = c'.($i+1).'.cl_from';
  $sSql .= ' AND c'.($i+1).'.cl_to = \''.$aCategories[$i].'\'';
}

$sSql .= ' AND cur_id = c1.cl_from ORDER BY cur_timestamp DESC LIMIT 0,' . $iCount;

//$output .= $sSql . "<br />";

# process the query
$res = wfQuery($sSql, DB_READ);

$sk =& $wgUser->getSkin();

$output .= "<ul>\n";

# process results of query
while ($row = wfFetchObject( $res ) ) {
    $title = Title::makeTitle( $row->cur_namespace, $row->cur_title);
    $output .= '<li>' . $sk->makeLinkObj($title) . '</li>' . "\n";
}
$output .= "</ul>\n";

return $output;

}
?>
</pre>


Version: unspecified
Severity: enhancement
URL: http://www.ilya.us/wiki/index.php?title=DynamicPageList_extension

Details

Reference
bz1411

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:10 PM
bzimport set Reference to bz1411.
bzimport added a subscriber: Unknown Object (MLST).

wmf.amgine3691 wrote:

The extension, updated 1-jan-05.

The extension has been tested on MediaWiki 1.4b3 and 1.4b5.

attachment dynamicpagelist.php ignored as obsolete

jeluf wrote:

I think this line

$sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);

would break in UTF-8 wikis. PHP considers e.g. 0xA0 as whitespace, but 0xA0 is
also the second byte of the Cyrillic character Р. Do you really need "any
whitespace" here, or just "blank"?
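
To illustrate the concern, here is a small sketch in modern PHP. The helper name `spacesToUnderscores` is hypothetical and not part of the extension. The Cyrillic letter Р (U+0420) is encoded in UTF-8 as the two bytes 0xD0 0xA0, so a byte-oriented pattern that treats 0xA0 as whitespace could split the character in half. Replacing only the ASCII space avoids this, because byte 0x20 never occurs inside a multi-byte UTF-8 sequence.

```php
<?php
// Hypothetical helper: replace only ASCII spaces with underscores.
// Byte 0x20 never appears inside a multi-byte UTF-8 sequence, so this
// cannot corrupt non-ASCII characters.
function spacesToUnderscores(string $name): string
{
    return str_replace(' ', '_', $name);
}

// "Р" (U+0420) is two bytes in UTF-8; the second one is 0xA0.
$er = "\u{0420}";
echo bin2hex($er), "\n";                            // d0a0
echo spacesToUnderscores("Новости России"), "\n";   // Новости_России
```

This only covers the "blank" case asked about above; full title normalization is a separate question.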

mediazilla wrote:

(In reply to comment #2)

> I think this line
>
> $sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);
>
> would break in UTF-8 wikis. PHP considers e.g. 0xA0 as whitespace, but 0xA0 is
> also the second byte of the Cyrillic character Р. Do you really need "any
> whitespace" here, or just "blank"?

Good point. Both preg_replace lines are now instead:

$title = Title::makeTitle('',$sArg);
$sCatName = wfStrencode($title->getDbKey(), DB_READ);

updated version at http://www.ilya.us/wiki/index.php?title=DynamicPageList_extension

jeluf wrote:

Some more remarks:

if ($sType == 'category')
 ...
else if ('count' == $sType)

Could you use MagicWord's for these? Some languages prefer to use localized
versions of those tags, e.g. when categories aren't named "Category" (Perhaps
we should extend the extension framework to allow localized versions of tags, too??)

return "!!too few categories!!";

Use wfMsg for any output strings, so that the messages can be translated.

if ($iCatCount > 3)

I don't like that hardcoded limit. I'm not sure we need a limit, will benchmark
this tonight. If we need it, it should be an option.

PS: I changed the severity to "enhancement", since this is a new feature.

mediazilla wrote:

(In reply to comment #4)

> Some more remarks:
>
> if ($sType == 'category')
>  ...
> else if ('count' == $sType)
>
> Could you use MagicWord's for these? Some languages prefer to use localized
> versions of those tags, e.g. when categories aren't named "Category" (Perhaps
> we should extend the extension framework to allow localized versions of tags,
> too??)

I disagree for three reasons.

  1. The current magic word architecture isn't meant for extensions as much as
pages and the Parser.

  2. "category" and "count" are internal commands for the extension only. The
namespace for category can still be localized in the wiki (i.e. category=blah
will map to the category named blah no matter what the namespace is).

  3. Other extensions (e.g. EasyTimeline, which is deployed on Wikipedia) also do
not localize their internal commands.

> return "!!too few categories!!";
>
> Use wfMsg for any output strings, so that the messages can be translated.

Done.

> if ($iCatCount > 3)
>
> I don't like that hardcoded limit. I'm not sure we need a limit, will benchmark
> this tonight. If we need it, it should be an option.

Done. Now uses parameters near the top of the function. I still think it's
better to have a limit, otherwise we run the risk of DOS via queries with
immense amounts of joins.
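
A minimal sketch of why the cap matters: each category adds one more self-join on categorylinks, so the limit is checked before any SQL is generated. The function name `buildCategoryQuery` and the sample category names are hypothetical, the category keys are assumed to be already escaped by the caller, and the table and column names are simply those of the 1.4-era query posted in this thread. It is written for modern PHP, not as a drop-in replacement for the attachment.

```php
<?php
// Sketch only: $dbKeys are assumed to be already escaped for SQL by the
// caller; table and column names follow the 1.4-era query in this thread.
function buildCategoryQuery(array $dbKeys, int $maxCategories, int $limit): ?string
{
    $n = count($dbKeys);
    if ($n < 1 || $n > $maxCategories) {
        return null; // refuse before any join is generated
    }

    $from  = ['cur'];
    $where = ['cur_id = c1.cl_from'];
    foreach (array_values($dbKeys) as $i => $key) {
        $alias  = 'c' . ($i + 1);
        $from[] = "categorylinks AS $alias"; // one extra join per category
        if ($i > 0) {
            $where[] = "c1.cl_from = $alias.cl_from";
        }
        $where[] = "$alias.cl_to = '$key'";
    }

    return 'SELECT cur_namespace, cur_title FROM ' . implode(', ', $from)
        . ' WHERE ' . implode(' AND ', $where)
        . ' ORDER BY cur_timestamp DESC LIMIT ' . $limit;
}

// Two categories produce two joins; a fourth category would be rejected.
echo buildCategoryQuery(['Politics', 'Published'], 3, 5), "\n";
```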

I've updated the code on the web site. I urge those with proper powers to please
deploy this asap if there are no more objections.

mediazilla wrote:

Comment on attachment 230
The extension, updated 1-jan-05.

<?php
/*

Purpose: outputs a bulleted list of most recent
         items residing in a category, or a union
         of several categories.

Contributors: n:User:Amgine, n:User:IlyaHaykinson

To install: add following to LocalSettings.php

include("extensions/dynamicpagelist.php");

*/

$wgExtensionFunctions[] = "wfDynamicPageList";

function wfDynamicPageList() {

    global $wgParser, $wgMessageCache;

    $wgMessageCache->addMessages( array(
        'dynamicpagelist_toomanycats' => 'DynamicPageList: Too many categories!',
        'dynamicpagelist_toofewcats' => 'DynamicPageList: Too few categories!'
    ) );

    $wgParser->setHook( "DynamicPageList", "DynamicPageList" );

}

// The callback function for converting the input text to HTML output
function DynamicPageList( $input ) {

global $wgScriptPath, $wgServer, $wgUser;

// parameters

//minimum and maximum number of category unions
$iMinCategories = 1;
$iMaxCategories = 3;

//minimum and maximum number of results allowed.
$iMinResultCount = 1;
$iMaxResultCount = 50;

//whether unlimited results are allowed (when count is omitted)
$bAllowUnlimitedResults = true;

// end params


$aParams = array();
$bCountSet = false;

$sTok = strtok($input, "\n");
while ($sTok)
{
  $aParams[] = $sTok;
  $sTok = strtok("\n");
}

foreach($aParams as $sParam)
{
  $aParam = explode("=", $sParam);
  $sType = $aParam[0];
  $sArg = $aParam[1];
  if ($sType == 'category')
  {
    $title = Title::makeTitle('',$sArg);
    $sCatName = wfStrencode($title->getDbKey(), DB_READ);
    $aCategories[] = $sCatName;
  }
  else if ('count' == $sType)
  {
    //ensure that $iCount is a number;
    $iCount = (1 * $sArg);
    $bCountSet = true;
  }
}

$iCatCount = count($aCategories);
if ($iCatCount < $iMinCategories)
  return wfMsg( 'dynamicpagelist_toofewcats' ); // "!!too few categories!!";

if ($iCatCount > $iMaxCategories)
  return wfMsg( 'dynamicpagelist_toomanycats' ); // "!!too many categories!!";

if (true == $bCountSet)
{
  if ($iCount < $iMinResultCount)
    $iCount = $iMinResultCount;
  if ($iCount > $iMaxResultCount)
    $iCount = $iMaxResultCount;
}
else
{
  if (false == $bAllowUnlimitedResults)
  {
    $iCount = $iMaxResultCount;
    $bCountSet = true;
  }
}


//build the SQL query						        

$sSql = 'SELECT cur_namespace, cur_title FROM cur';
for ($i = 0; $i < $iCatCount; $i++) {
  $sSql .= ', categorylinks AS c' . ($i+1);
}

$sSql .= ' WHERE 1=1 ';

for ($i = 0; $i < $iCatCount; $i++) {
  if ($i > 0)
    $sSql .= ' AND c1.cl_from = c'.($i+1).'.cl_from';
  $sSql .= ' AND c'.($i+1).'.cl_to = \''.$aCategories[$i].'\'';
}

$sSql .= ' AND cur_id = c1.cl_from ORDER BY cur_timestamp DESC';

if (true == $bCountSet)
{
  $sSql .= ' LIMIT 0,' . $iCount;
}

//DEBUG: output SQL query
//$output .= $sSql . "<br />";

// process the query
$res = wfQuery($sSql, DB_READ);

$sk =& $wgUser->getSkin();

//start unordered list
$output .= "<ul>\n";

//process results of query, outputting equivalent of <li>[[Article]]</li>
//for each result
while ($row = wfFetchObject( $res ) ) {
  $title = Title::makeTitle( $row->cur_namespace, $row->cur_title);
  $output .= '<li>' . $sk->makeLinkObj($title) . '</li>' . "\n";
}

//end unordered list
$output .= "</ul>\n";

return $output;

}
?>

Please don't paste large amounts of code into comments. It's hard to read,
clutters up the page, and doesn't get formatted correctly. Use the 'Create
attachment' link.

wmf.amgine3691 wrote:

Cleaned up, simplified, renamed to avoid conflict

This is a rewrite to simplify and speed up the previous version. The extension
is renamed as a kluge so I could compare it with the previous on my test
installation.

This version includes:

  • configurable maximum category searches (may be configured for unlimited,
    default 3)
  • configurable maximum return articles (may be configured to allow unlimited,
    default 5)

attachment latestArticles.php ignored as obsolete

mediazilla wrote:

Updated with Amgine's optimizations merged in

Incorporated Amgine's optimizations. Kept greater configurability (min/max
categories, min/max results, and unlimited categories and results are all
parameters).

attachment dynamicpagelist.php ignored as obsolete

Created attachment 275
Updated with some fixes

I've made a few fixes:

  • If a line doesn't contain a "=", avoid triggering a notice error for unset
    variables in $aParam[1]
  • Title::makeTitle is not safe for user-provided data as it does no sanitizing;
    use Title::newFromText. (This also will allow people to write 'Category:Foo'
    and get the expected thing.)
  • If invalid, $title will be null and the getDbKey() call can fail, so skip the
    line.
  • Avoid encoding the dbkey so early, it makes the code harder to follow.
  • Use IntVal instead of 1 * $sArg. (Note that multiplication will allow float
    values; this may be harmless here but is not really what we meant.)
  • Fixed some tabs -- when using spaces for indentation, try to avoid mixing
    tabs as the tab size may not be the same for all editors and it uglifies the
    code.
  • Extensions return raw HTML; avoid returning a raw message as this allows
    careless or rogue sysops or hijacked sysop accounts to break the wiki (invalid
    HTML) or create security risks (JavaScript exploits etc). Use
    htmlspecialchars() to force plaintext, or run the message through the wiki
    parser. (Used htmlspecialchars for now.)
  • Use booleans as booleans rather than false == and true ==, for readability
    and consistency.
  • Switched database calls to the new OO functions.
  • In MediaWiki 1.4 tables may have a configurable prefix; get the canonical
    name with $dbr->tableName().
  • For readability, move the 'cur_id=cl.cl_from' clause to the top and eliminate
    the 1=1 clause.
  • LIMIT N,N is a MySQL-ism. Since no offset is needed, just use LIMIT N for
    portability (use $dbr->limitResults() if needed).
  • If there are no results, the <ul></ul> produced doesn't validate (a list must
    contain at least one item). Return an error message instead.
  • Initialize $output before using it, or a notice is thrown when error
    reporting is put up high.
  • Use $sk->makeKnownLinkObj() instead of $sk->makeLinkObj(). We know the pages
    exist, so we can avoid hitting the database again to check.

You might also consider making the configuration options settable from
LocalSettings.php, so the extension code doesn't have to be altered.
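
Several of the points above (skipping lines without "=", integer conversion, escaping output, not emitting an empty list) can be sketched in standalone modern PHP. This is not the attached file: `parseListInput` and `renderList` are hypothetical names, the trimming of keys and values is an extra assumption, and the MediaWiki-specific parts (Title::newFromText, the database calls, the skin links) are deliberately left out.

```php
<?php
// Hypothetical helper: parse "key=value" lines from the tag body.
function parseListInput(string $input): array
{
    $categories = [];
    $count = null;
    foreach (explode("\n", $input) as $line) {
        $parts = explode('=', $line, 2);
        if (count($parts) < 2) {
            continue; // no "=" on this line: skip instead of reading $parts[1]
        }
        $type = trim($parts[0]);
        $arg  = trim($parts[1]);
        if ($type === 'category' && $arg !== '') {
            // The extension itself would validate via Title::newFromText() here.
            $categories[] = $arg;
        } elseif ($type === 'count') {
            $count = intval($arg); // integer only; "2.7" becomes 2
        }
    }
    return ['categories' => $categories, 'count' => $count];
}

// Hypothetical helper: build the HTML list, escaping every label.
function renderList(array $labels, string $emptyMessage): string
{
    if (count($labels) === 0) {
        // An empty <ul></ul> does not validate, so return escaped text instead.
        return htmlspecialchars($emptyMessage);
    }
    $out = "<ul>\n";
    foreach ($labels as $label) {
        $out .= '<li>' . htmlspecialchars($label) . "</li>\n";
    }
    return $out . "</ul>\n";
}

$params = parseListInput("category=Politics\nnonsense line\ncount=2.7\n");
var_dump($params['count']);            // int(2)
echo renderList([], 'No <results>');   // No &lt;results&gt;
```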

attachment dynamicpagelist.php ignored as obsolete

wmf.amgine3691 wrote:

Moved variables to LocalSettings.php

Moves configuration variables to LocalSettings.php

attachment dynamicpagelist.php ignored as obsolete

wmf.amgine3691 wrote:

Correct version - with the changes implemented and not just doodling...

Correctly implemented the parameter variables.

attachment dynamicpagelist.php ignored as obsolete

bugzilla_wikipedia_org.to.jamesd wrote:

Not good enough on the database side. Try EXPLAIN SELECT for a set of three
categories. You'll find that the EXPLAIN output has the magic words "Using
filesort". That translates as "every record in the category will be retrieved,
I'll sort them, then I'll return the number matching the limit". That is, it
scales very badly.

To avoid this there are several approaches you can use. First, most important,
is to arrange to have the key matching the order by. If you get records from,
say, recent changes, you can eliminate records not matching the category pretty
quickly and recent changes is usually very well cached, so it's not too painful
to scan back in timestamp order to find matching entries. Getting distinct hits
might be an issue - need to see what makes the limit effective.

Next, try a union with each select in the union subject to the limit and a final
overall limit. Because of the way MySQL before version 5 handles indexes this
can be substantially faster when a multipart index is used and you're using
different values of the leading key parts.

avarab wrote:

Comments (note that I've only read the code, not executed it):

  • You're using extension input as parameters. Don't; write it for 1.5, where we
    have them in the arguments (like <DynamicPageList category="foo"
    count="5"></DynamicPageList>).
  • It'll only work with the 1.4 schema
  • If there's no row returned at line 158 the output will be <ul>\n</ul>, which is
    invalid XHTML
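
A rough sketch of the attribute-based approach from the first point, assuming the tag callback receives the attributes as an associative array, as the comment describes. The function name `paramsFromAttributes` and the default and clamping behaviour are hypothetical, and the example is standalone modern PHP, not MediaWiki code.

```php
<?php
// Sketch: $argv is assumed to be the associative array of tag attributes,
// e.g. ['category' => 'foo', 'count' => '5'] for
// <DynamicPageList category="foo" count="5"></DynamicPageList>.
function paramsFromAttributes(array $argv, int $minCount, int $maxCount): array
{
    $category = isset($argv['category']) ? trim($argv['category']) : '';
    // Assumption: a missing count falls back to the configured maximum.
    $count = isset($argv['count']) ? intval($argv['count']) : $maxCount;
    // Clamp to the configured range.
    $count = max($minCount, min($maxCount, $count));
    return ['category' => $category, 'count' => $count];
}

print_r(paramsFromAttributes(['category' => 'foo', 'count' => '5'], 1, 50));
```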

wmf.amgine3691 wrote:

version 2.0

  • Add unset cache
  • break out parameter parsing, query build, output build
  • Add category OR
  • Add namespace OR
  • expand error handling

Most new content adapted from DPL2 by w:de:Benutzer:Unendlich (Fabian)

Attached:

wmf.amgine3691 wrote:

Comment on attachment 1128
version 2.0

<grumble>

wmf.amgine3691 wrote:

Diff version 1.9, 2.0

Diff as per IRC suggestion

Attached:

Latest attachment is very awkwardly written; use an object instead of passing
this associative array.

ayg wrote:

Wasn't this long ago enabled on Wikinews?

Long-since incorporated to the features of DPL.