Page MenuHomePhabricator

Magic word for number of items in a category
Closed, ResolvedPublic


Author: ernestcline

It would be very handy for things like Tables of Contents, if there was a way
for them to be rendered only if the number of articles and subcategories was
sufficent to cause the the category to span more than one page. If there was a
variable for the number of items (and or pages) in a category, then the existing
#if functionality would enable this to be done without any other changes.

Version: unspecified
Severity: enhancement



Related Objects

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 9:18 PM
bzimport set Reference to bz6943.
bzimport added a subscriber: Unknown Object (MLST).

Jmax.code wrote:

For what it's worth, this feature could be used in many applications, including
keeping the count of featured articles on the main page (on en wikipedia -- this
is currently implemented using a bot I coded, en:User:Jmax-bot).

I will attach a patch that implements this. It is a bit slow, as it is
recursive. But, it seems to work fine on my machine (p4-class celeron, 1G ram,
low load). I'm not sure that I've implemented it perfectly, as I'm still
getting used to the mediawiki framework. Please be gentle!

Jmax.code wrote:

implemented PAGESINCATEGORY magicword

attachment 6943.diff ignored as obsolete

Jmax.code wrote:

non-recursive PAGESINCATEGORY magicword

Updated patch -- this one does *not* recurse (and actually works, heh)

attachment 6943.diff ignored as obsolete

argento3k wrote:

*** Bug 4540 has been marked as a duplicate of this bug. ***

argento3k wrote:

I have a question. Does this magicword sum the number of articles in the
subcategories of the selected category?

For example in this case
10 pages with ONLY the category [[Category:Coats of arms]]
10 pages with ONLY the category [[Category:Blasons]]
[[Category:Blasons]] is a subcategory of [[Category:Coats of arms]]

the {{PAGESINCATEGORY}} magicword value is 10 or 20?

If articles in subcategories are not counted, it would have some sense to
implement another magicworld with the total sum, something like

ayg wrote:

"Non-recursive" = "doesn't count subcategories".

ernestcline wrote:

The problem with a recursive version would be what to do with a category loop?
Detecting loops would likely be expensive.

dan.bolser wrote:

The problem with a recursive version would be what to do with a category loop?
Detecting loops would likely be expensive.

Hash map?

This is much easier now that there is a category table that includes a count of all the pages in each category.

Created attachment 4741
PAGESINCATEGORY using category table

New patch that uses the cat_pages field in the category table.

attachment patch2.patch ignored as obsolete

Created attachment 4743
PAGESINCATEGORY using category table

Slight fix to previous patch for pages defined in category table but not containing any pages, now correctly returns 0 instead of the nosuchcategory message.

attachment patch2.patch ignored as obsolete

Created attachment 4748
Yet another patch

This time using existing functions in the Category class to get the page count, possibly more efficient than previous patches that used the selectField wrapper to query the database (if it isn't, use patch number 4).

attachment patch4.patch ignored as obsolete

ayg wrote:

They should be the same, basically, just the Category wrapper makes it more readable. A couple of comments:

  1. This is another one of those nasty parser tags that does a DB query for every time it's included in the page. If no system is devised for batching this sort of thing so it's one query per page render, or at least one query per preprocessor pass, I'd be really leery about committing more stuff like this. We have hacky limits on the #ifexists parser functions or whatever it's called; this should at least have similar to avoid someone making a template that uses it 4500 times and does 4500 queries (albeit very fast queries).
  1. Use hyphens to separate words in new message names: no-such-category, not nosuchcategory.

Created attachment 4750
Patch with limit

Sets a limit on PAGESINCATEGORY calls.

attachment patch6.patch ignored as obsolete

ayg wrote:

+ global $wgPICcount, $wgMaxPICcount;

What are these globals supposed to do? They should be added to DefaultSettings.php with documentation and default values. Don't try to use variables without initializing them. You should be using error_reporting( E_ALL | E_STRICT ) when doing development, and fixing any printouts you get.

For $wgPICcount, I'd go more in line with r27946 and make it a Parser member variable that's cleared on clearState(). Currently it's per-request, so for instance it would be incorrectly high if a job is being run along with the current request. r27977 also showcases a good thing to add here, as do some later commits to that.

Actually, what I would do is make the current ifexist-specific-limit into a global parser variable like mExpensiveFunctionCount, and have both ifexist and this increment it independently. Otherwise the limit of 100 simple queries that should really be combined has now crept up to 200, and then 300, 400, ... The code for ifexist limit tracking is more complicated but much more complete and useful: it provides warnings on preview if you go over the limit, things like that. It would be best if that could be moved into some central Parser methods that all expensive functions can share.

+ if (!$wgMaxPICcount) {
+ $wgMaxPICcount = 100;
+ }

Never do this in PHP < 6. Initialization needs to be unconditional. Otherwise you're opening yourself up to register_globals vulnerabilities: if register_globals in on and $wgMaxPICcount is uninitialized, remote parties can initialize it by just setting &wgMaxPICcount=1000000000 in the query URL. (And if you are going to do it, use isset() rather than checking the variable directly, to avoid an E_NOTICE. This also allows people to set the limit to 0 if they want.)

+ $count = $category->getPageCount();
+ if ($count == '') {
+ return wfMsgForContent('no-such-category');
+ }

Checking for $count == '' is a little odd, since $count will possibly be false and possibly be 0, but never the empty string. !$count would be a more sensible way to write that. Also, is this the best way to handle the error? It seems like returning 0 would make the most sense in both cases (there are, after all, 0 members of the category). So perhaps

			if( $count === false ) {
				$count = 0;

(This seems to highlight a flaw in the design of the Category methods: you have no way to tell the difference between a category that happens not to exist, and one that cannot possibly exist because the name is invalid. I should fix that.)

+ return wfMsgForContent('too-many-pagesincategory', $wgMaxPICcount);

I was going to say this should be 'too-many-pages-in-category', but of course you're correct, it shouldn't be. :)

Overall it looks okay with these fixed. I'd be willing to commit it, barring disapproval from some Higher Power.

I'll do the fixes later, probably tomorrow. If I recall correctly, I used "if ($count == '')" to differentiate between categories that currently have no pages in them but otherwise exist (or are at least defined in the category table) and categories don't actually exist. if ($count === false) will give the same results, so I'll use that.

ayg wrote:

(In reply to comment #16)

I'll do the fixes later, probably tomorrow. If I recall correctly, I used "if
($count == '')" to differentiate between categories that currently have no
pages in them but otherwise exist (or are at least defined in the category
table) and categories don't actually exist.

That doesn't differentiate between them. Remember that in PHP, 0 == '' and '' == false, so it will be true for both scenarios.

I wouldn't distinguish between the two cases, personally. The distinction is a little arcane. A category with 0 pages in it should be treated as nonexistent for most purposes; this is what we've historically done. The category row gets left there instead of being deleted because otherwise you'd be giving the same category different ID's for no real reason, as members are removed and re-added, instead of preserving the ID.

Created attachment 4769
Updated patch

*Now uses Parser member variable mExpensiveFunctionCount

*Changes also to ParserFunctions extension to use mExpensiveFunctionCount instead of ifexist-specific limit.

*Global $wgExpensiveParserFunctionLimit, set by default at 100

*Moves the on-preview over-limit warning, category, and limit report out of the extension and into Parser.php and MessagesEn (changed to say "expensive parser functions" instead of #ifexist).

*PAGESINCATEGORY now returns 0 if no title is passed, a non-existent category is given, or the limit is exceeded. This is close to how ifexist works (returns the "else" text if the limit is reached and could help if its used with other parser functions - {{#ifexpr:{{PAGESINCATEGORY:{{{1|}}} > ... }} won't give a big red expression error if the limit is exceeded or an incorrect title is used.

attachment parserpatch.patch ignored as obsolete

Created attachment 4770
Slight fix

Slight cosmetic fix from previous patch (had a line commented out from testing I forgot to remove).

attachment parserpatch.patch ignored as obsolete

Created attachment 4793
Per Brion

Per Brion in IRC:
if (!$category) {

return 0;

will return 0 for Category:0


Looking good! Applied on r32932.

Wiki.Melancholie wrote:

Please see [[Bug 13691]].

Wiki.Melancholie wrote:

Bug 13691 (this way?)