CategoryTree is inefficient
Closed, ResolvedPublic

Description

categorytree forces scans of millions of rows (for large categories) to return 0 row datasets quite often.

this has been discussed in the past a lot, I'll write it down yet again.

there're few fixes, one is by prefixing cl_sortkey with character not used in category names and filtering by it - would need rewriting sortindex only for subset of rows.

another is creating separate columns, creating or replacing indexes, etc - in categorylinks table

another is maintaining subcategories table per category


Version: unspecified
Severity: enhancement

bzimport set Reference to bz23682.
Domas created this task.Via LegacyMay 27 2010, 11:55 AM
Stepro added a comment.Via ConduitMay 27 2010, 2:35 PM

In fact this feature has been disabled today: Please check if it's really necessary to disable this in all projects. Maybe this ist usefull for large projects such as wikipedias, but I don't believe this feature must be kicked off in the small ones.

Krinkle added a comment.Via ConduitMay 27 2010, 2:39 PM

On the wiki's it is disabled it would be nice for the extension to contain some dummy code or empty string return to prevent errors all over the place.

This extension is/was widely used. And to discourage it from being unlinked everywhere (since we optimists all asume it might be effecient one day) a blank return on the wiki's it's disabled on is better then an unexpected red error message or raw return of the <categorytree /> syntax

bzimport added a comment.Via ConduitMay 27 2010, 2:44 PM

ayg wrote:

We also want to mark subcategories specially in categorylinks so that we can page subcategories separately from actual category members. Currently in a large category, you'd never find any subcategories, because they're on page 236 under "D" or whatever. People hack around this by setting manual sort keys like " ".

Domas added a comment.Via ConduitMay 27 2010, 2:47 PM

Stepro,
small problems doing inefficient stuff means there are thousand projects doing inefficient stuff :(

Aryeh,
sure, we could just use automatic sort key hacks too - or have separate column for this or ... (I prefer sortkey hacks :-)

Krinkle,
good idea - do you have any placeholder code for filling in extension parser tags with ""?

bzimport added a comment.Via ConduitMay 27 2010, 3:35 PM

ayg wrote:

I also prefer sortkey hacks. Much easier to deploy.

tstarling added a comment.Via ConduitMay 28 2010, 1:21 AM

Thanks Domas, your point is taken. I've re-enabled CategoryTree so that we can have functional wikis while a solution is being developed.

bzimport added a comment.Via ConduitMay 28 2010, 1:36 AM

dpld wrote:

Thanks Tim for putting common sense first and re-enabling the extension until a real solution is provided.

tstarling added a comment.Via ConduitMay 28 2010, 2:18 AM

Proposed fix for the problem of expensive queries returning zero rows in r66987.

Domas added a comment.Via ConduitMay 28 2010, 5:16 AM

I knew it!!! :-)

zero rows is just one of cases, there are other expensive queries that return 10, 20, 30 rows - all with low selectivity. I have no idea why anyone thinks that is a feasible fix.

Philippe-WMF added a comment.Via ConduitMay 28 2010, 5:29 AM

I don't know the technical issues here, but from a user perspective, disabling this without notification when it's widely used across many wikis is less than ideal. If there are performance issues, wouldn't it be best to work them out and put a change in... not just turn it off, unless there is critical reason to do so.

Domas added a comment.Via ConduitMay 28 2010, 5:40 AM

Philippe, developers should've known that subcategory fetches are absolutely inefficient (and features like that have been disabled quite a few times, starting with simple categories listing, then API/ajax, then this).

Instead of fixing the root problem, some people prefer to hide this functionality in 'extensions', probably expecting to have different roll-out and less review/attention (just like with DPL :)

OTOH, users don't announce about their million-page-sized actions beforehand, do they?

tstarling added a comment.Via ConduitMay 28 2010, 7:33 AM

(In reply to comment #9)

I knew it!!! :-)

zero rows is just one of cases, there are other expensive queries that return
10, 20, 30 rows - all with low selectivity. I have no idea why anyone thinks
that is a feasible fix.

I didn't say it was a feasible fix for the whole bug, just that it was a feasible fix for the case where there are zero subcategories.

Domas added a comment.Via ConduitMay 28 2010, 7:44 AM

makes sense! :) Thanks Tim! This would cover some edge cases like 'Living people', until, of course, there's a single subcategory there.

Ilmari_Karonen added a comment.Via ConduitMay 28 2010, 7:44 PM

(In reply to comment #11)
I remember asking before at bug 19640 if this was an issue with CategoryTree too, but never got a response there. Apparently it is, then. Maybe now that Brion is no longer CTO, someone could finally implement the sortkey hack you proposed? ;)

tstarling added a comment.Via ConduitJun 1 2010, 9:09 AM

I've committed r67179 now, which comes closer to a complete workaround. I'll deploy it shortly.

For a final fix, we should aim to have a carefully considered design, possibly including solutions for bug 164, bug 1211, etc.

bzimport added a comment.Via ConduitJul 21 2010, 9:04 PM

ayg wrote:

I'll be working on this bug. I hope to have a (proper) solution coded up within a couple of weeks. I wrote a post to wikitech-l about it, and encourage people to respond there rather than here (since this involves several bugs):

http://lists.wikimedia.org/pipermail/wikitech-l/2010-July/048399.html

bzimport added a comment.Via ConduitAug 16 2010, 10:06 PM

ayg wrote:

This should be fixed in r71174, although I don't understand CategoryTree well enough to claim that it's fixed for all uses of the extension. Running the query that MediaWiki now produces, on a category with 1000 subcategories and 1000 other pages in the category:

mysql> SHOW STATUS LIKE 'Handler\_%';
+----------------------------+-------+

Variable_nameValue

+----------------------------+-------+

Handler_commit0
Handler_delete0
Handler_discover0
Handler_prepare0
Handler_read_first0
Handler_read_key0
Handler_read_next0
Handler_read_prev0
Handler_read_rnd0
Handler_read_rnd_next62
Handler_rollback0
Handler_savepoint0
Handler_savepoint_rollback0
Handler_update0
Handler_write60

+----------------------------+-------+
15 rows in set (0.00 sec)

mysql> SELECT page_id,page_namespace,page_title,page_is_redirect,page_len,page_latest,cl_to,cl_from,cat_id,cat_title,cat_subcats,cat_pages,cat_files FROM page JOIN categorylinks ON ((cl_from = page_id)) LEFT JOIN category ON ((cat_title = page_title AND page_namespace = 14)) WHERE cl_to = 'Test' AND cl_type = 'subcat' ORDER BY cl_type, cl_sortkey LIMIT 200;
...snip...

mysql> SHOW STATUS LIKE 'Handler\_%';
+----------------------------+-------+

Variable_nameValue

+----------------------------+-------+

Handler_commit1
Handler_delete0
Handler_discover0
Handler_prepare0
Handler_read_first0
Handler_read_key404
Handler_read_next199
Handler_read_prev0
Handler_read_rnd0
Handler_read_rnd_next62
Handler_rollback0
Handler_savepoint0
Handler_savepoint_rollback0
Handler_update0
Handler_write60

+----------------------------+-------+
15 rows in set (0.00 sec)

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.