$wgUseCategoryBrowser generates many dupes

Description

Author: srainwater

Description:
I turned on $wgUseCategoryBrowser and discovered it displays a very large number of duplicate entries. I'm using this on a large wiki (Camera-Wiki.org) with several thousand pages and hundreds of categories. In some cases it displays the top-level category entry as many as 10 or 20 times, and many categories are displayed 3 to 5 times.

Seems like a simple fix to add code to filter out duplicates. If someone can point me to the appropriate piece of code I'd be happy to provide a patch.

Here's a typical display from the bottom of one page in our wiki:

Root category
Root category
Root category
Root category
Root category
Root category
Root category
Root category
Root category > Cameras
Root category > Cameras
Root category > Cameras > Cameras by first letter > B
Root category > Cameras > Cameras by first letter > C
Root category > Cameras > Medium format > 127 film
Root category > Companies > Camera makers
Root category > Countries > Italy
Root category > Countries > Italy > Bencini
Root category > Imaging media > Film > Film formats
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Special categories
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Hidden categories > Image by AWCam
Root category > Templates > Wiki > Hidden categories > Image by Dirk HR Spennemann
Root category > Templates > Wiki > Hidden categories > Image by Rick Soloway
Root category > Templates > Wiki > Hidden categories > Image by jgs4309976


Version: 1.18.x
Severity: normal

bzimport added a project: MediaWiki-Categories. Via Conduit, Nov 22 2014, 12:04 AM
bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz33614.
bzimport created this task. Via Legacy, Jan 9 2012, 8:47 PM
bzimport added a comment. Via Conduit, Jan 9 2012, 9:08 PM

srainwater wrote:

Found a fairly trivial fix for this. In Skin.php, I wrapped the explode() call in array_unique(). The line was:

$tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );

I changed it to:

$tempout = array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) );

The only drawback now is that it still displays hidden categories, which doesn't seem right. Probably a separate bug however.

Here's the current output from the same page as shown in the initial comment:

Root category
Root category > Cameras
Root category > Cameras > Cameras by first letter > B
Root category > Cameras > Cameras by first letter > C
Root category > Cameras > Medium format > 127 film
Root category > Companies > Camera makers
Root category > Countries > Italy
Root category > Countries > Italy > Bencini
Root category > Imaging media > Film > Film formats
Root category > Special categories
Root category > Templates > Wiki > Flickr image
Root category > Templates > Wiki > Hidden categories > Image by AWCam
Root category > Templates > Wiki > Hidden categories > Image by Dirk HR Spennemann
Root category > Templates > Wiki > Hidden categories > Image by Rick Soloway
Root category > Templates > Wiki > Hidden categories > Image by jgs4309976
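
As a minimal sketch of what that one-line change does, the snippet below runs the same explode() plus array_unique() combination on a hard-coded string. The string is only a stand-in for whatever drawCategoryBrowser() returns; the real output is rendered by the skin, so treat the sample data as an assumption.

```php
<?php
// Stand-in for the newline-separated string returned by drawCategoryBrowser().
$rendered = "Root category\nRoot category\nRoot category > Cameras\n"
    . "Root category > Cameras\nRoot category > Special categories";

// array_unique() keeps the first occurrence of each value and preserves its
// original key, so array_values() is used here to renumber the result.
$tempout = array_values( array_unique( explode( "\n", $rendered ) ) );

print_r( $tempout );
```

Note that array_unique() leaves gaps in the keys (0, 2, 4 here) unless the result is renumbered, which matters only if later code indexes the array by position.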

bzimport added a comment. Via Conduit, Jan 9 2012, 10:24 PM

srainwater wrote:

Upon further thought, there's still redundancy here. For example:

If the page is in:

A > B > C > D

There's really no point in also displaying these lines:

A
A > B
A > B > C

As they're all included in D. They're not really paths to the given page anyway. Really what's wanted is a list of unique paths through the hierarchy to the given page. There's no need to provide additional paths to each point along the way. If that makes sense.
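
One way to sketch the "unique paths only" idea is to drop every line that is a proper prefix of another line. The helper name dropPrefixPaths() and the " > " separator are assumptions made for illustration; the real browser lines are rendered output, so this operates on plain text only.

```php
<?php
/**
 * Keep only breadcrumb lines that are not a proper prefix of another line.
 * Hypothetical helper; lines are assumed to be plain text joined by $sep.
 */
function dropPrefixPaths( array $lines, string $sep = ' > ' ): array {
    $lines = array_values( array_unique( $lines ) );
    $kept = [];
    foreach ( $lines as $candidate ) {
        $isPrefix = false;
        foreach ( $lines as $other ) {
            // "A > B" is redundant when some other line starts with "A > B > ".
            if ( $other !== $candidate && strpos( $other, $candidate . $sep ) === 0 ) {
                $isPrefix = true;
                break;
            }
        }
        if ( !$isPrefix ) {
            $kept[] = $candidate;
        }
    }
    return $kept;
}

$paths = dropPrefixPaths( [ 'A', 'A > B', 'A > B > C', 'A > B > C > D', 'A > E' ] );
print_r( $paths );
```

The nested loop is quadratic in the number of lines, which should be acceptable for the few dozen breadcrumb lines shown on a page.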

MarkAHershberger added a comment. Via Conduit, Jan 11 2012, 5:37 PM

Adding patch keyword for solution in comment #1

hashar added a comment. Via Conduit, Jan 12 2012, 10:38 PM

If the page is in:

A > B > C > D

Well, the whole idea of the category browser is to put the article in category D and skip A, B, C :-b

array_unique() works there, but it acts on the rendered output. We should be able to filter before rendering, i.e. when building the category tree.

Technical13 added a comment. Via Conduit, Mar 12 2012, 2:26 AM

I've also noticed that the hiddencats display regardless of the status of the Show Hidden Categories checkbox in user preferences. Need a way to actually hide the hidden cats..

bzimport added a comment. Via Conduit, Mar 12 2012, 2:43 AM

ken wrote:

Thanks for this bug report! I thought for sure I had something in my wiki configured incorrectly.

I ended up hacking my 1.17 wiki to fix this. I replaced this line from includes/Skin.php:

$tempout = array_unique(explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ));

with this:

if ( $wgUser->getBoolOption( 'showhiddencats' ) ) {
    $tempout = array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) );
} else {
    $tempout = preg_grep( "/Hidden categories/", array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) ), PREG_GREP_INVERT );
}
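
For readers unfamiliar with PREG_GREP_INVERT, here is a small standalone sketch of the filtering step in the else branch. The sample lines are copied from the breadcrumb output earlier in this report; everything else is plain PHP with no MediaWiki dependency.

```php
<?php
$lines = [
    'Root category > Cameras',
    'Root category > Templates > Wiki > Hidden categories > Image by AWCam',
    'Root category > Countries > Italy',
];

// With PREG_GREP_INVERT, preg_grep() returns the entries that do NOT match
// the pattern. Original keys are preserved, so renumber with array_values().
$visible = array_values( preg_grep( '/Hidden categories/', $lines, PREG_GREP_INVERT ) );

print_r( $visible );
```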

bzimport added a comment. Via Conduit, Mar 12 2012, 2:49 AM

ken wrote:

Sorry, I pasted the wrong line for the "original" line. The original line is this (it does not have array_unique in it):

$tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );

Bawolff added a comment. Via Conduit, Mar 12 2012, 2:10 PM

(In reply to comment #6)

> Thanks for this bug report! I thought for sure I had something in my wiki
> configured incorrectly.
>
> I ended up hacking my 1.17 wiki to fix this. I replaced this line from
> includes/Skin.php:

Glad to hear you got this working on your wiki.

(In response more to the patch keyword added by others than to your comment) We can't directly incorporate your code into core MediaWiki, since there is no guarantee that the hidden category's name is actually "Hidden categories" (with i18n and all).

Ideally this filtering would be done when querying the db/building the list of categories, as opposed to after the fact.
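
A rough sketch of what filtering at build time could look like, as opposed to grepping rendered lines: paths are kept as arrays of category names, and any path that passes through a category flagged as hidden is dropped. The function name dropHiddenPaths() and the shape of the data are assumptions for illustration; how the list of hidden categories is obtained from the database is deliberately left out.

```php
<?php
/**
 * Drop every path that contains a hidden category.
 * Hypothetical helper: $paths is a list of arrays of category names,
 * $hiddenNames is a list of names flagged as hidden by some earlier lookup.
 */
function dropHiddenPaths( array $paths, array $hiddenNames ): array {
    // Flip to a name => index map so membership checks are constant time.
    $hidden = array_flip( $hiddenNames );
    $visible = array_filter( $paths, function ( array $path ) use ( $hidden ) {
        foreach ( $path as $category ) {
            if ( isset( $hidden[$category] ) ) {
                return false;
            }
        }
        return true;
    } );
    return array_values( $visible );
}

$paths = [
    [ 'Root category', 'Cameras' ],
    [ 'Root category', 'Templates', 'Wiki', 'Image by AWCam' ],
    [ 'Root category', 'Countries', 'Italy' ],
];
$visible = dropHiddenPaths( $paths, [ 'Image by AWCam' ] );
print_r( $visible );
```

Because the check uses a flag rather than a display name, it does not depend on what the hidden-categories container is called in a given language.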

Technical13 added a comment. Via Conduit, Mar 12 2012, 2:35 PM

Correct me if I am wrong, but wouldn't it be feasible to replace:

$tempout = explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) );

With this:

if ( $wgUser->getBoolOption( 'showhiddencats' ) ) {
    $tempout = array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) );
} else {
    $tempout = preg_grep( "/MediaWiki:Hidden-categories/", array_unique( explode( "\n", $this->drawCategoryBrowser( $parenttree, $this ) ) ), PREG_GREP_INVERT );
}

So that instead of hard-coding the hidden category's name as "Hidden categories", the code refers to the MediaWiki page on which the name is actually set?

MarkAHershberger added a comment. Via Conduit, Mar 14 2012, 5:50 PM

$tempout = preg_grep( "/MediaWiki:Hidden-categories/",

can't be used:

> We can't directly incorporate your code into core MediaWiki
> since there is no guarantee that the hidden category's name is
> actually "Hidden categories" (with i18n and all).

Bawolff added a comment. Via Conduit, Mar 14 2012, 6:00 PM

preg_grep( "/" . preg_quote( wfMsgForContent( "MediaWiki:Hidden-categories" ), "/" ) ."/", ...

Would work, which is what I believe you were trying to get at (In theory anyways, I haven't tested it). However, I think it would be preferable to look for the cat_hidden prop in page_props table when doing the actual db query.
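
To show why the preg_quote() call matters in that pattern, here is a standalone sketch. The localized name below is made up for illustration and stands in for whatever the message lookup would return on a given wiki; the parentheses are included on purpose because they are regex metacharacters.

```php
<?php
// Hypothetical localized name; on a real wiki it would come from a message lookup.
$hiddenName = 'Catégories cachées (wiki)';

$lines = [
    'Racine > Catégories cachées (wiki) > Image',
    'Racine > Caméras',
];

// preg_quote() escapes regex metacharacters (and the "/" delimiter passed as
// the second argument), so the name is matched literally.
$pattern = '/' . preg_quote( $hiddenName, '/' ) . '/';
$visible = array_values( preg_grep( $pattern, $lines, PREG_GREP_INVERT ) );

// Without quoting, "(wiki)" is read as a capture group and the literal
// parentheses in the line are never matched.
$unquotedMatches = preg_match( '/' . $hiddenName . '/', $lines[0] );

print_r( $visible );
```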

Technical13 added a comment. Via Conduit, Mar 14 2012, 6:12 PM

(In reply to comment #11)

> preg_grep( "/" . preg_quote( wfMsgForContent( "MediaWiki:Hidden-categories" ),
> "/" ) ."/", ...
>
> Would work, which is what I believe you were trying to get at (In theory
> anyways, I haven't tested it). However, I think it would be preferable to look
> for the cat_hidden prop in page_props table when doing the actual db query.

That was what I was trying to get at. That way, it wouldn't matter what the hiddencat name actually was, as it would be defined correctly in all instances on that page anyway.

bzimport added a comment. Via Conduit, Jun 20 2012, 7:24 PM

balano wrote:

I was also seeing duplicates in my small (1000 non-stubs) MW 1.18.2 wiki. I believe at least some of the duplicates arise because the category browser drops the bottom-level category off some entries. I've documented this behavior at

http://www.mediawiki.org/w/index.php?title=Help:Categories&stable=0&shownotice=1&fromsection=Adding_a_page_to_a_category#Adding_a_page_to_a_category

and I repeat it here:

(At least in MediaWiki 1.18.2) if a category is a subcategory of more than one parent, both hierarchies will be listed, but the tagged category will be stripped off all but one of these. This creates the potential for what appear to be duplicate entries if a category with multiple parents and one of its parents are both tagged on a page. For example suppose Maryanne is a subcategory of both Mary and Anne. If a page tags categories Maryanne and Anne then the Category breadcrumbs will show

Anne
Anne
Mary -> Maryanne

"Anne" appears to be duplicated, but what is meant is

Anne
Anne -> Maryanne
Mary -> Maryanne
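
The intended output in this example can be reproduced with a small sketch that walks a parent map and emits one full path per route from a top-level category to each tagged category. The helper pathsTo() and the parent-map layout are assumptions for illustration, not MediaWiki's actual data structures, and the sketch assumes the category graph has no cycles.

```php
<?php
/**
 * Return every path from a top-level category down to $category.
 * Hypothetical helper; $parents maps a category name to its parent names.
 */
function pathsTo( string $category, array $parents ): array {
    if ( empty( $parents[$category] ) ) {
        // No parents: the category is itself a top-level entry.
        return [ [ $category ] ];
    }
    $paths = [];
    foreach ( $parents[$category] as $parent ) {
        foreach ( pathsTo( $parent, $parents ) as $path ) {
            // Keep the tagged category at the end of every path.
            $path[] = $category;
            $paths[] = $path;
        }
    }
    return $paths;
}

$parents = [ 'Maryanne' => [ 'Mary', 'Anne' ] ];
$lines = [];
foreach ( [ 'Maryanne', 'Anne' ] as $tagged ) {
    foreach ( pathsTo( $tagged, $parents ) as $path ) {
        $lines[] = implode( ' -> ', $path );
    }
}
sort( $lines );
print_r( $lines );
```

Because the tagged category is appended on every branch, the two "Anne" lines stay distinguishable instead of collapsing into apparent duplicates.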

Liuxinyu970226 added a subscriber: Liuxinyu970226. Via Web, Jan 11 2015, 3:49 AM
