Categorymembers namespace filtering is inefficient, uses ugly hack in miser mode
Closed, DeclinedPublic

Description

On r53052 domas disabled querying specific namespace if miser mode is enabled.

The query now produce erroneus results.
*It silently skips the page_namespace where clause instead of giving an error.
*It's not documented on api.php nor www.mediawiki.org

Either 'fix' the above issues or implement whatever domas told about how to have efficient subcategory access.


Version: 1.16.x
Severity: normal

bzimport added a project: MediaWiki-API.Via ConduitNov 21 2014, 10:38 PM
bzimport set Reference to bz19640.
Platonides created this task.Via LegacyJul 10 2009, 2:09 PM
bzimport added a comment.Via ConduitJul 10 2009, 7:19 PM

cbm.wikipedia wrote:

This change is now live on the WMF servers, giving results that conflict with the documentation dynamically generated by api.php.

Anomie added a comment.Via ConduitJul 10 2009, 10:02 PM

(In reply to comment #0)

Either 'fix' the above issues or implement whatever domas told about how to
have efficient subcategory access.

It would be nice if any of the rest of us could even '''know''' whatever domas told.

Platonides added a comment.Via ConduitJul 11 2009, 12:15 AM

It would be nice if any of the rest of us could even '''know''' whatever domas
told.

I only know what is in r53052 comment.

Mr.Z-man added a comment.Via ConduitJul 11 2009, 12:29 AM

I made a couple tweaks to the disabling in r53087 so that it would die and give an error, rather than silently ignoring it, but it hasn't yet been synced to Wikimedia.

bzimport added a comment.Via ConduitJul 11 2009, 12:39 AM

herd wrote:

(In reply to comment #3)

> It would be nice if any of the rest of us could even '''know''' whatever domas
> told.

I only know what is in r53052 comment.

From http://toolserver.org/~amidaniel/chanlogs/%23mediawiki/20090710.txt :

[19:42:35] <domas> that was ages ago
[19:42:49] <domas> there're few ways how to provide subcategories
[19:43:04] <domas> it is simple
[19:43:09] <domas> you just structure data that way

(more in the log, well... more of domas' brand of humor, heh)

Anomie added a comment.Via ConduitJul 11 2009, 3:17 AM

(In reply to comment #5)

From http://toolserver.org/~amidaniel/chanlogs/%23mediawiki/20090710.txt :

[19:42:35] <domas> that was ages ago
[19:42:49] <domas> there're few ways how to provide subcategories
[19:43:04] <domas> it is simple
[19:43:09] <domas> you just structure data that way

(more in the log, well... more of domas' brand of humor, heh)

Isn't the data already structured "that way", i.e. each page is specifically tagged with a namespace number? And in that log, when asked about it, he says the nothing quoted above and then starts complaining about multi-language templates on commons, which seems to have nothing to do with the API's cmnamespace.

I must be missing something big in there, hopefully someone who understands this will come by to enlighten me. And I still have to wonder how something that's been in the code since '''June 2007''' is suddenly such a problem that requires breaking things in the hastiest possible way.

Anomie added a comment.Via ConduitJul 15 2009, 4:02 AM

Created attachment 6341
Patch against r53286 to apply cmnamespace in php rather than sql (and rather than just ignoring it with a warning)

Rather than just ignoring cmnamespace (even with a warning), why not filter by cmnamespace in the PHP code that builds the result dataset when $wgMiserMode is enabled?

While this could result in the categorymembers node being empty if the requested namespaces are particularly sparse in the category, that sort of thing can happen anyway since r46845 so it's much less of a breaking change. It also would save bandwidth in the same circumstances, by not returning 5000 results the client doesn't want; when categorymembers is being used as a generator, it would also save having the generated modules querying those unwanted pages.

attachment diff ignored as obsolete

Domas added a comment.Via ConduitJul 15 2009, 6:32 AM

well, proper fix would be prefixing categories sortkey with something what sorts high up.
brion thought it was too dirty, so the change didn't get much traction, and we have current situation.

Brad's PHP filtering is... oh... well... I won't comment ;-)

Catrope added a comment.Via ConduitJul 15 2009, 10:07 AM

(In reply to comment #8)

well, proper fix would be prefixing categories sortkey with something what
sorts high up.
brion thought it was too dirty, so the change didn't get much traction, and we
have current situation.

Brad's PHP filtering is... oh... well... I won't comment ;-)

...a dirty, temporary hack, but good enough for now. Committed in r53304. Of course this filtering should be moved to the database side soon.

Anomie added a comment.Via ConduitJul 19 2009, 3:06 AM
  • Bug 19816 has been marked as a duplicate of this bug. ***
Ilmari_Karonen added a comment.Via ConduitAug 2 2009, 11:36 AM

Okay, this is just FUBAR. If we don't currently have an efficient way of getting the subcategories of a category, then one should be created. Unlike Domas, I'm no DB expert, but I'd suggest adding a cl_fromns (?) field to the categorylinks table and creating new indexes to use it. (Unfortunately, I don't think MySQL is smart enough to use those indexes for queries that don't involve the namespace, so we'd either have to keep some redundant indexes or use the UNION hack. Domas can probably say which one he thinks would be the lesser evil.) Or just use the sortkey hack that Domas originally suggested.

Ilmari_Karonen added a comment.Via ConduitAug 2 2009, 11:42 AM

By the way, doesn't the CategoryTree extension have the exact same performance problem, or is the fact that it uses memcached enough to avoid any issues there?

bzimport added a comment.Via ConduitAug 5 2009, 5:29 AM

herd wrote:

*** Bug 20072 has been marked as a duplicate of this bug. ***

Raymond added a comment.Via ConduitAug 5 2009, 1:27 PM
  • Bug 20072 has been marked as a duplicate of this bug. ***
Ilmari_Karonen added a comment.Via ConduitAug 24 2009, 11:41 PM

Anyway, I'm reopening this (with a slightly changed summary) because the temporary hack applied in r53304 really is ugly and inefficient and should be replaced by a better fix (e.g. adding new fields and indexes to the categorylinks table, or using the sortkey hack suggested by Domas).

duplicatebug added a comment.Via ConduitJul 13 2010, 5:56 PM
  • Bug 24354 has been marked as a duplicate of this bug. ***
Reedy added a comment.Via ConduitJan 7 2011, 2:21 AM

mysql> DESCRIBE SELECT /* ApiQueryCategoryMembers::run */ cl_from,cl_sortkey,page_namespace,page_title,page_id FROM mw_page,mw_categorylinks FORCE INDEX (cl_sortkey) WHERE (cl_from=page_id) AND cl_to = 'Oh_rly' ORDER BY cl_sortkey, cl_from LIMIT 11\G
+----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

+----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+

1SIMPLEmw_categorylinksrefcl_sortkeycl_sortkey257const2Using where; Using index; Using filesort
1SIMPLEmw_pageeq_refPRIMARYPRIMARY4wikidb.mw_categorylinks.cl_from1

+----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+
2 rows in set (0.00 sec)

mysql>

Was wondering.

Just removing the order by removes the filesort ;)

bzimport added a comment.Via ConduitJan 7 2011, 8:05 AM

Bryan.TongMinh wrote:

The proper order by for the cl_sortkey index would be cl_to,cl_type,cl_sortkey

Reedy added a comment.Via ConduitJan 7 2011, 6:21 PM

IF that is all that's required to fix it... It's worth adding an index (though, I've got a feeling adding the index might be quite slow especially on wmf et al. And is it still right post the sorting changes?), and backporting...

Catrope added a comment.Via ConduitJan 7 2011, 7:43 PM

(In reply to comment #19)

IF that is all that's required to fix it... It's worth adding an index (though,
I've got a feeling adding the index might be quite slow especially on wmf et
al. And is it still right post the sorting changes?), and backporting...

That's not the problem. The problem is that you have to join in the page table to filter by namespace, because the namespace isn't in the categorylinks table.

Peachey88 added a comment.Via ConduitApr 30 2011, 12:10 AM

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

Reedy added a comment.Via ConduitMay 10 2011, 11:35 PM

RESOLVED LATER

There's nothing actionable by the API at the moment

If there does become something in future, feel free to reopen the bug at that point

Do we have a tracking bug for that or something?

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.