PAGESINCATEGORY should decode HTML entities of input - if {{PAGENAME}} contains ' or " it will display 0
Open, Normal, Public

Description

From the linked URL:


{{PAGESINCATEGORY:{{PAGENAME}}}} doesn't work if {{PAGENAME}} contains ' or " or other characters that are accepted in MediaWiki page names but are unexpectedly returned HTML-encoded (in that case it will display 0).

{{PAGENAME}} is the cause of the problem, because it works when you replace it with the title in plain text.


This can be tested for https://www.mediawiki.org/wiki/Category:Chris_G%27s_botclasses

Indeed, if I use {{subst:PAGENAME}} and hit "show changes", I see it's being substituted as "Chris G&#39;s botclasses".

I don't know why {{subst:PAGENAME}} is giving HTML-encoded entities as output, but that's odd. Since fixing this may break things, PAGESINCATEGORY should check for HTML entities and decode them before looking up the page name, just as was done in bug 35628.
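To illustrate the mismatch outside MediaWiki, here is a minimal Python sketch. It is not MediaWiki code: the `CATEGORY_COUNTS` table, its count of 3, and both function names are hypothetical, and the choice of `&#39;` / `&#34;` for the quotes is an assumption about what PAGENAME returns.

```python
import html

# Hypothetical stand-in for the category table: decoded title -> member count.
# The count is made up for the example.
CATEGORY_COUNTS = {"Chris G's botclasses": 3}


def encode_like_pagename(title: str) -> str:
    """Mimic the reported behaviour: quotes come back as numeric entities (assumed)."""
    return title.replace("'", "&#39;").replace('"', "&#34;")


def pages_in_category(name: str, decode: bool = True) -> int:
    """Look up a category count, optionally decoding HTML entities first."""
    key = html.unescape(name) if decode else name
    return CATEGORY_COUNTS.get(key, 0)


encoded = encode_like_pagename("Chris G's botclasses")
print(encoded)                                   # the entity-encoded title
print(pages_in_category(encoded, decode=False))  # lookup on the raw, encoded input
print(pages_in_category(encoded, decode=True))   # lookup after decoding
```

Without decoding, the encoded string matches no stored title, which corresponds to the 0 reported above; decoding first restores the match.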


This bug was already reported several years ago (long before Phabricator emerged) and was never fixed. It has caused various bugs, notably with templates that are used to detect redirected categories or other tracking categories: PAGESINCATEGORY is used to count the number of pages remaining in those tracked categories and recategorize them appropriately depending on their state (this is used for maintenance).

For now, the workaround is to pass the result of {{PAGENAME}} (or {{SUBPAGENAME}} and similar) as the only parameter of {{#titleparts: }} in order to HTML-decode those entities before passing the result to {{PAGESINCATEGORY: }}. This has its own issues, though: sometimes a category name (or subpage name) also starts with an existing namespace name, and {{#titleparts: }} will transform such titles by "canonicalizing" them, for example by replacing namespace names, changing the capitalization, or substituting some initial sequences with . and ..

There is currently no other built-in MediaWiki function that HTML-decodes a string without performing other transforms. Before using PAGESINCATEGORY, we must first make sure that we are actually in the Category namespace, then take {{FULLPAGENAME}} and HTML-decode it, and finally drop the initial namespace prefix, which {{PAGESINCATEGORY:}} rejects in its parameter even when it is "Category:". However, there is no simple way in MediaWiki to drop the namespace without once again using the PAGENAME parser function, which reconverts the HTML-decoded characters to their HTML-encoded form.
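As a rough Python sketch of what wikitext currently cannot do in one pass (decode the full page name, confirm it is a category, then drop the prefix without any canonicalization or re-encoding): the function name `category_lookup_name` is hypothetical, and the literal prefix "Category:" is an assumption that ignores localized namespace names and aliases.

```python
import html
from typing import Optional

# Assumed canonical English namespace prefix; real wikis may localize it.
CATEGORY_PREFIX = "Category:"


def category_lookup_name(full_page_name: str) -> Optional[str]:
    """Return the decoded category name without its namespace prefix.

    Returns None when the page is not in the Category namespace.
    The rest of the title is left untouched (no canonicalization).
    """
    decoded = html.unescape(full_page_name)
    if not decoded.startswith(CATEGORY_PREFIX):
        return None
    return decoded[len(CATEGORY_PREFIX):]


print(category_lookup_name("Category:Chris G&#39;s botclasses"))
print(category_lookup_name("Category:Help:Contents"))
print(category_lookup_name("Talk:Some page"))
```

The second call shows the point of the complaint: a category whose name itself begins with a namespace-like prefix is passed through unchanged rather than being rewritten.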

The simplest solution is then to change the PAGESINCATEGORY parser function to HTML-decode its parameter. Another solution would be to fix the PAGENAME parser function so that it never HTML-encodes its returned value. It should never have done so, but it has behaved this way for so many years, in so many MediaWiki versions, that reversing it would be difficult. Fixing PAGESINCATEGORY will be much simpler, because no valid MediaWiki category name can contain literal ampersands.

In summary, fix {{PAGESINCATEGORY:}} to process its parameter exactly like what is done for the {{#ifeq:}} or {{#switch:}} parser functions.

  • If testing shows that this solution causes compatibility problems, then provide a new parser function that HTML-decodes an input parameter, so that we can safely fix the templates that need to use {{PAGESINCATEGORY:{{PAGENAME}}}} or {{PAGESINCATEGORY:{{PAGENAME: Some category name}}}}.

For the list of characters to decode, see the documentation of the {{PAGENAMEE}} encoding on mediawiki.org, which compares the various encodings used (and has documented this issue for several years):

https://www.mediawiki.org/wiki/Manual:PAGENAMEE_encoding


Version: 1.24rc
Severity: normal
URL: https://www.mediawiki.org/wiki/Thread:Project:Support_desk/0_doesnt_work_%28allways%29
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=35628

Details

Reference
bz67196

Event Timeline

bzimport raised the priority of this task to Normal. Nov 22 2014, 3:29 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz67196.
bzimport added a subscriber: Unknown Object (MLST).

The problem is not limited to PAGESINCATEGORY: all other parser functions that take a title have this problem. See also bug 16474.

Verdy_p added a comment. Edited Jun 27 2014, 9:53 PM

There's a workaround which is to redecode the parameter of PAGESINCATEGORY with #titleparts.

But I'm still convinced that we should not have to use this trick in wikicode, given that there should not exist any valid category name containing verbatim character entities. It is still possible that they exist, because we have allowed literal ampersands in page names without requiring them to be HTML-encoded with named entities, so this causes an ambiguity. But I'm not convinced that we have any valid page name containing verbatim named entities; and note that it's impossible to include verbatim sharp signs "#", so you cannot include verbatim numeric entities.

So what we could do is HTML-encode quotes, ampersands, and less-than/greater-than signs using numeric entities instead of named entities (&quot; &apos; &lt; &gt;), so that they can safely be HTML-decoded by PAGESINCATEGORY (which would continue to treat named entities as verbatim text, without decoding them automatically as it would numeric entities).
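A minimal Python sketch of this proposal, under stated assumptions: the helper names `encode_numeric` and `decode_numeric_only` are hypothetical, and the character-to-entity table covers only the characters named in the comment. It is not how MediaWiki implements anything; it only shows that numeric-only encoding round-trips while named entities stay verbatim.

```python
import re

# Characters the comment proposes to encode, mapped to numeric entities only.
_NUMERIC = {'"': "&#34;", "'": "&#39;", "&": "&#38;", "<": "&#60;", ">": "&#62;"}

# Matches decimal (&#39;) and hexadecimal (&#x27;) numeric entities, never named ones.
_NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def encode_numeric(title: str) -> str:
    """Encode the special characters as numeric entities."""
    return "".join(_NUMERIC.get(ch, ch) for ch in title)


def decode_numeric_only(text: str) -> str:
    """Decode numeric entities; leave named entities such as &amp; untouched."""
    def repl(match: "re.Match[str]") -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            return match.group(0)  # not a valid code point: keep as written
    return _NUMERIC_ENTITY.sub(repl, text)


title = 'Tom & "Jerry" &amp; co'  # a title containing a verbatim named entity
encoded = encode_numeric(title)
print(encoded)
print(decode_numeric_only(encoded) == title)
```

Because the ampersand itself is encoded numerically, a verbatim "&amp;" in a title survives the round trip instead of collapsing to "&", which is the ambiguity the comment is trying to avoid.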

Change 145724 had a related patch set uploaded by Brian Wolff:
Have Title::makeTitleSafe decode html entities.

https://gerrit.wikimedia.org/r/145724

It's not just ', " too.

Aklapper renamed this task from PAGESINCATEGORY should decode HTML entities of input - if {{PAGENAME}} contains ' it will display 0 to PAGESINCATEGORY should decode HTML entities of input - if {{PAGENAME}} contains ' or " it will display 0. Jun 12 2015, 3:42 PM
Aklapper set Security to None.
Verdy_p updated the task description. Jun 12 2015, 6:23 PM
Verdy_p updated the task description.
Verdy_p updated the task description. Jun 12 2015, 6:26 PM