PAGESINCATEGORY should differentiate between pages and subcategories
Closed, Resolved, Public


Author: londenp

This magic word PAGESINCATEGORY, which is very useful, counts the number of articles and subcategories in a category.

I don't know whether this is the intended behaviour for this magic word, but it would help a lot if either:

a) <nowiki>{{Pagesincategory:category}}</nowiki> did not count the subcategories of a category, or
b) a new magic word were created, alongside the existing one, that counts only the articles in a category.

I did not find an existing bug or feature request about this; if there already is one, sorry for the duplicate.


Version: unspecified
Severity: enhancement


bzimport set Reference to bz14237.
bzimport created this task. May 23 2008, 3:21 PM

Wiki.Melancholie wrote:

*** Bug 13691 has been marked as a duplicate of this bug. ***

londenp wrote:

It seems that bug 13691 is not an exact duplicate of this bug, although it is about the same magic word. That bug says it has been resolved, so this bug becomes a feature request for a new magic word, which is option b) in the comment above.


redekopmark wrote:

What about a new magic word {{ARTICLESINCATEGORY}} that would only display the number of mainspace pages in a category? This would be similar to the difference between NUMBEROFPAGES and NUMBEROFARTICLES.

I think ARTICLES should never be used in magic words. PAGES should be used instead.

I think the current behaviour is confusing. I could imagine PAGESINCATEGORY only reporting the number of pages in a category, excluding FILES and CATEGORIES. This implies there would be 4 magic words to report on either all category members (MEMBERSINCATEGORY), files in category (FILESINCATEGORY), categories in category (CATEGORIESINCATEGORY), and pages in category (PAGESINCATEGORY).

*** Bug 15645 has been marked as a duplicate of this bug. ***

I'm opposed to Siebrand's view. Pages for me are any pages, including subcategories, files, talk pages, articles, project pages. They only differ by the namespace in which they reside, and there are possibly many other namespaces (don't assume that all wikis will behave like Wikipedia).

If you want to have counts by namespace, then what would be needed is a two-parameter syntax like:


to make the restriction (the same magic keyword can be used, to provide separate counts for each namespace).

or even possibly like
if you want to include a list of several namespaces to include in the count.

The existing difference between NUMBEROFPAGES and NUMBEROFARTICLES does not rely on namespace differentiation but on statistical parameters (notably the page size, excluding included templates).

Introducing the term "member" will just add more confusion.

demon added a comment. Sep 30 2010, 3:19 PM
*** Bug 21822 has been marked as a duplicate of this bug. ***
demon added a comment. Sep 30 2010, 3:19 PM
*** Bug 25376 has been marked as a duplicate of this bug. ***
demon added a comment. Sep 30 2010, 3:19 PM

Duping both of those bugs to this. Implementation per comment 6 (or similar) would solve all of these bugs at once.

note that multiple parameters for the syntax I propose may be reduced to just one:
where restriction may be:

  • "" : no namespace id at all, useful to add namespaces
  • "*" : all namespace ids (the default), useful to remove namespaces

followed by one or more of:

  • "+id" : add this namespace id to the current list
  • "-id" : remove this namespace id from the list

If the restriction does not start with "*", "+" or "-", then "+" is implied.
The namespace id could be either the numeric id, or a selector like "talk" to select all talk namespaces, or "subject" to select all subject namespaces.

The namespace id can then take the forms:

  • an integer, the raw namespace number
  • a name, a namespace name (converted to a namespace id, should recognize the synonyms, notably localized names or English names, or site-specific names)
  • "odd": all odd namespace ids (i.e. "talk" namespaces associated to any subject namespace)
  • "even": all even namespace ids (i.e. "subject" namespaces)

For example:

  • {{PAGESINCATEGORY:categoryname|*}} : equivalent to {{PAGESINCATEGORY:categoryname|}} and to {{PAGESINCATEGORY:categoryname}} (existing syntax)
  • {{PAGESINCATEGORY:categoryname|:}} : count only pages of the main namespace, that are members of the specified category name
  • {{PAGESINCATEGORY:categoryname|0}} : count only pages of the main namespace, that are members of the specified category name ; equivalent to {{PAGESINCATEGORY:categoryname|+0}}
  • {{PAGESINCATEGORY:categoryname|+project:+talk}} : count only pages of the "project:" or of any talk namespaces, that are members of the specified category name ; equivalent to {{PAGESINCATEGORY:categoryname|project:+talk}} (the leading "+" is implied)
  • {{PAGESINCATEGORY:categoryname|-talk}} : count all pages of any namespace excluding the talk namespaces (odd ids) that are members of the specified category name; equivalent to {{PAGESINCATEGORY:categoryname|*-talk}}

The restriction can easily be implemented as WHERE clauses in the SQL select that will match the specified namespace ids, combined as a parenthetic list of 'OR id=value' (positive selections), followed by a list of exclusions with 'AND NOT id=value' (negative selections), and possibly with the "IN" operator if sets are available in the SQL syntax.
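The restriction rules and the WHERE-clause idea above can be sketched as follows. This is only an illustration in Python, not MediaWiki code: the namespace table, the function names `resolve`, `parse_restriction` and `build_count_query`, and the `categorymembers` table with its columns are all assumptions (the table and column names are the placeholders used later in this comment, not the real schema). The sketch also assumes that namespace names contain no "+" or "-" characters, and that an empty restriction means "all namespaces", as in the examples.

```python
import re

# Hypothetical namespace table for illustration; a real wiki defines its own.
NAMESPACE_IDS = {
    "": 0, "talk": 1, "user": 2, "user talk": 3, "project": 4,
    "project talk": 5, "file": 6, "file talk": 7,
    "category": 14, "category talk": 15,
}
ALL_IDS = frozenset(NAMESPACE_IDS.values())


def resolve(token):
    """Turn one namespace token into a set of namespace ids."""
    name = token.strip().rstrip(":").lower()
    if name in ("talk", "odd"):        # selector: every talk namespace
        return {i for i in ALL_IDS if i % 2 == 1}
    if name in ("subject", "even"):    # selector: every subject namespace
        return {i for i in ALL_IDS if i % 2 == 0}
    if name.isdigit():                 # raw namespace number
        return {int(name)}
    if name in NAMESPACE_IDS:          # namespace name; ":" means main
        return {NAMESPACE_IDS[name]}
    raise ValueError(f"unknown namespace: {token!r}")


def parse_restriction(restriction):
    """Return the set of namespace ids selected by a restriction string."""
    text = restriction.strip()
    if text == "":
        return set(ALL_IDS)            # no restriction: count everything
    if text.startswith("*"):
        selected, text = set(ALL_IDS), text[1:]
    elif text.startswith("-"):
        selected = set(ALL_IDS)        # "-id" removes from the default "*"
    else:
        selected = set()               # "+id" (or bare "id") adds to nothing
        if not text.startswith("+"):
            text = "+" + text          # a leading "+" is implied
    for sign, token in re.findall(r"([+-])([^+-]*)", text):
        ids = resolve(token)
        selected = selected | ids if sign == "+" else selected - ids
    return selected


def build_count_query(category_id, namespace_ids):
    """Build a parameterised COUNT query using the IN operator."""
    ids = sorted(namespace_ids)
    if not ids:                        # nothing selected: match no rows
        return ("SELECT COUNT(*) FROM categorymembers "
                "WHERE category_pageid = ? AND 1 = 0", [category_id])
    placeholders = ", ".join("?" for _ in ids)
    sql = ("SELECT COUNT(*) FROM categorymembers "
           "WHERE category_pageid = ? "
           f"AND member_namespaceid IN ({placeholders})")
    return sql, [category_id, *ids]
```

Resolving the restriction to a final set of ids before building the query means exclusions never reach the SQL: the statement only ever contains one positive IN list.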

Some ideas about the SQL server-side cost of counting members in a specific category:

With the restrictions above, the SQL cost should be the same as (or lower than) that of a select without the namespace restriction, because this is just a restriction of the existing query: it should never reduce the selectivity of the SQL query and may in fact improve it.

However, this means that the existing restriction (for costly parser functions) should remain, because counting the pages that are members of a category, independently of the namespace they belong to, may be costly in very populated categories, depending on how members of categories are indexed.

As this cost is effectively the cost of a:

SELECT COUNT(*) from categorymembers
WHERE category_pageid = $CATEGORYPAGEID
AND member_namespaceid = $CATEGORYNAMESPACEID

aggregate (note: I don't know the exact schema implementation, which varies across MediaWiki versions, so replace the table names and column names appropriately), one way to solve it would be to use:

SELECT 1 from categorymembers
WHERE category_pageid = $CATEGORYPAGEID
AND member_namespaceid = $CATEGORYNAMESPACEID
LIMIT 50

and then let the PHP code count the returned "1" rows. If there are 50 rows, the category is heavily populated and COUNT(*) may take time, so the function can be considered costly. If the cost limit is reached, just return this limit value to the page calling the function. Otherwise, either perform the same select with "SELECT 1" replaced by "SELECT COUNT(*)" (without the LIMIT clause) to return the exact value, or return the last known estimate from a separate caching aggregate table that is updated separately (using a max timestamp of validity). The cache avoids recomputing the same aggregate repeatedly when templated pages use this function and many users frequently view or edit various pages.

The value specified in the "LIMIT" clause above (here "50") may be tuned. This first performance check may also be removed completely if the SQL schema includes an index that precomputes aggregates for counting members in each specific category. In that case there is no need to perform a SELECT COUNT(*) aggregate, because the count is retrieved directly from a precomputed aggregate caching table. That table should be updated asynchronously, either as a batch or when the selective SELECT in the cache detects that the stored value is out of date; it then performs the SELECT COUNT(*) on the non-cached table just to update the caching table and its timestamp.
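The probe-then-count idea can be sketched like this, using Python and an in-memory SQLite database purely for illustration (the real code would be PHP against the wiki's own database). The table and column names are the placeholders from this comment, and `PROBE_LIMIT`, `make_demo_db` and `count_members` are hypothetical names. One deviation from the description: when the probe returns fewer rows than the limit, the number of probe rows is already the exact count, so this sketch returns it directly and skips the follow-up COUNT(*). The caching table is not modelled here.

```python
import sqlite3

PROBE_LIMIT = 50  # tunable threshold, "50" in the comment above


def make_demo_db():
    """Create an in-memory table with the placeholder schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categorymembers ("
        "category_pageid INTEGER, member_pageid INTEGER, "
        "member_namespaceid INTEGER)"
    )
    return conn


def count_members(conn, category_id, namespace_id, limit=PROBE_LIMIT):
    """Return (count, exact) for one category and one namespace.

    If the probe hits the limit, the category is treated as costly and
    the limit itself is returned with exact=False.
    """
    probe = conn.execute(
        "SELECT 1 FROM categorymembers "
        "WHERE category_pageid = ? AND member_namespaceid = ? "
        "LIMIT ?",
        (category_id, namespace_id, limit),
    ).fetchall()
    if len(probe) >= limit:
        return limit, False   # capped value, not the real total
    return len(probe), True   # fewer rows than the limit: exact count
```

A caller would show the number as-is when `exact` is true, and could render something like "50+" (or fall back to a cached estimate) when it is false.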

philippe.vigneau wrote:

I don't know if the index on the two columns (category_pageid, member_namespaceid) exists on the table categorymembers, but it seems to me that is the only thing that may be added in the database... performance can only be better...
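As a rough illustration of the index being asked about, here is a sketch in Python with an in-memory SQLite database. It says nothing about the real MediaWiki schema or about MySQL's query planner: the table and column names are the placeholders from the earlier comment, and `idx_category_namespace` and `query_plan_for_count` are hypothetical names. It only shows how a composite index on the two columns can be declared, and how to ask the database whether the count query would use it.

```python
import sqlite3


def query_plan_for_count():
    """Create the placeholder table plus a two-column index and
    return SQLite's query plan text for the namespace-restricted count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categorymembers ("
        "category_pageid INTEGER, member_pageid INTEGER, "
        "member_namespaceid INTEGER)"
    )
    # Composite index covering both columns used in the WHERE clause.
    conn.execute(
        "CREATE INDEX idx_category_namespace "
        "ON categorymembers (category_pageid, member_namespaceid)"
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT COUNT(*) FROM categorymembers "
        "WHERE category_pageid = ? AND member_namespaceid = ?",
        (1, 0),
    ).fetchall()
    # The human-readable plan description is the fourth column of each row.
    return " ".join(row[3] for row in rows)


if __name__ == "__main__":
    print(query_plan_for_count())
```

If the plan text names the index, the count can be answered from the index alone instead of scanning every member of the category.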

So when will this improvement be done?

I would love to see this one implemented. I was just looking up how to count files in a directory (excluding sub-directories) when I learned that you cannot. I was hoping to use that to allow the Commons template [[commons:Template:MetaCat]] to list metacategories (categories which should contain only other categories) that contain files.

test5555 wrote:

For files, see Bug 21822

A patch was committed as Gerrit change 12790.

Successfully merged.

You can use {{PAGESINCATEGORY:catname|subcats}} or {{PAGESINCATEGORY:catname|subcats|R}} or {{PAGESINCATEGORY:catname|R|subcats}}
to get the count of subcats in the category, or use 'pages' in place of 'subcats' to get the count of pages.
