Automatic category redirects
Open, Public

Description

Author: p_simoons

Description:
Suppose category:A redirects to category:B.
Would it be feasible to automatically move all articles placed in cat:A into
cat:B instead?
Alternatively, would it be possible to create a special page that lists all
categories that are redirects, so that a bot can do the moving?


Version: unspecified
Severity: enhancement
See Also: T7346

brion added a comment via Conduit, Feb 5 2006, 6:48 PM

*** Bug 4879 has been marked as a duplicate of this bug. ***

bzimport added a comment via Conduit, Feb 24 2006, 12:29 AM

brianna.laugher wrote:

Having this problem solved for the Commons would be fantastic. Because of this
problem we currently have a restriction that all category names should be in
English. As you can imagine that hardly does anything to promote the
multilingual policy of the Commons and I'm sure it's one of the things that
turns off would-be contributors whose native tongue is not English.

If this was implemented (as I understand it), we'd be able to have
[[:Category:Maus]], [[:Category:Mouse]] and [[:Category:Mysz]], and the effect
of putting an image in any of them would be the same in the end.

bzimport added a comment via Conduit, Feb 24 2006, 9:12 AM

dbenbenn wrote:

It seems to me that we don't want to "automatically move all articles". When
[[A]] redirects to [[B]] and you link to A in an article, MediaWiki doesn't
replace that with [[B|A]]. I think all that's necessary is that if Category:A
redirects to Category:B, then every article in Category:A appears in the
category listing for Category:B. Implementing this wouldn't require having
articles store category links as metadata, since it has nothing to do with
''how'' a particular article got into a category.

JakobVoss added a comment via Conduit, Apr 9 2006, 3:44 PM

It would also help to show a message in the category view, next to the list of
pages in the category, listing the categories that redirect to the viewed category.
Example:

  1. There is Category:Mouse
  2. People regularly use Category:Maus instead of Category:Mouse
  3. So you create Category:Maus as a redirect to Category:Mouse
  4. If you view Category:Mouse you miss all the articles in Category:Maus
  5. So a message is shown: "Category:Mouse is also known as Category:Maus. Please move the articles to the redirected category."

The current status is unsatisfying. On Commons, 857 categories are redirects and
1263 pages use these categories (on the English Wikipedia: 452 pages in 677
categories):

SELECT COUNT(*) from page where page_is_redirect=1 AND page_namespace=14;

SELECT COUNT(DISTINCT cl_from) from categorylinks, page WHERE
page_is_redirect=1 AND page_namespace=14 AND cl_to=page_title;
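For anyone wanting to try these two counts outside a live wiki, here is a small sketch using Python's built-in sqlite3 module. It assumes a cut-down `page`/`categorylinks` schema containing only the columns the queries touch, and the sample rows are invented; the real MediaWiki tables have many more columns and indexes.

```python
import sqlite3


def count_redirect_categories(conn: sqlite3.Connection) -> tuple:
    """Return (number of redirect categories, distinct pages filed in them)."""
    n_redirects = conn.execute(
        "SELECT COUNT(*) FROM page "
        "WHERE page_is_redirect = 1 AND page_namespace = 14"
    ).fetchone()[0]
    n_pages = conn.execute(
        "SELECT COUNT(DISTINCT cl_from) FROM categorylinks, page "
        "WHERE page_is_redirect = 1 AND page_namespace = 14 AND cl_to = page_title"
    ).fetchone()[0]
    return n_redirects, n_pages


# Toy data: Maus and Mysz redirect to Mouse; two pages are filed under them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                   page_title TEXT, page_is_redirect INTEGER);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
INSERT INTO page VALUES
  (1, 0, 'Some_mouse', 0),
  (2, 0, 'Another_mouse', 0),
  (3, 14, 'Mouse', 0),
  (4, 14, 'Maus', 1),
  (5, 14, 'Mysz', 1);
INSERT INTO categorylinks VALUES
  (1, 'Maus'), (2, 'Maus'), (2, 'Mysz'), (1, 'Mouse');
""")

# Page 2 sits in both redirect categories but is counted once.
print(count_redirect_categories(conn))  # (2, 2)
```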

bzimport added a comment via Conduit, Apr 9 2006, 3:50 PM

rotemliss wrote:

(In reply to comment #11)

> It would also help get a message in category view at the list of pages in this
> category that shows a list of categories that redirect to the viewed category.
> Example:
>
>   1. There is Category:Mouse
>   2. People regularly use Category:Maus instead of Category:Mouse
>   3. So you create Category:Maus as a redirect to Category:Mouse
>   4. If you view Category:Mouse you miss all the articles in Category:Maus
>   5. So a message is shown: "Category:Mouse is also known as Category:Maus. Please move the articles to the redirected category."
>
> The current status is unsatisfying. In commons 857 categories are redirects and 1263 pages use these categories (452 pages in 677 categories for the English Wikipedia):
>
> SELECT COUNT(*) from page where page_is_redirect=1 AND page_namespace=14;
>
> SELECT COUNT(DISTINCT cl_from) from categorylinks, page WHERE page_is_redirect=1 AND page_namespace=14 AND cl_to=page_title;

There is no need to do that: if this bug is fixed, the pages in Category:Maus are
automatically shown in Category:Mouse. Your suggestion is just a workaround.

bzimport added a comment via Conduit, May 11 2006, 1:35 PM

Eugene.Zelenko wrote:

*** Bug 5893 has been marked as a duplicate of this bug. ***

bzimport added a comment via Conduit, May 11 2006, 4:17 PM

moses.mason wrote:

Category redirects are especially useful on the Chinese Wikipedia and Wikinews.

As you may know, in Chinese the same thing can be written in both traditional
and simplified characters. Both forms represent the same thing, both are
correct, and both should exist. For example, [[Category:災難]] and
[[Category:灾难]] are the same (BTW: "災難" and "灾难" mean "disaster").

However, redirecting [[Category:災難]] to [[Category:灾难]] is currently useless:
articles categorized under 災難 still cannot be seen in [[Category:灾难]].
A category redirect only redirects the category *itself*, not the articles
categorized in it.

bzimport added a comment via Conduit, Jul 20 2006, 12:55 AM

ayg wrote:

*** Bug 6750 has been marked as a duplicate of this bug. ***

bzimport added a comment via Conduit, Dec 19 2006, 6:33 PM

rotemliss wrote:

Patch

This patch changes Parser::replaceInternalLinks: when parsing a link to a
category (e.g. [[Category:A]], *not* [[:Category:A]]), it checks whether the
category exists in the "redirect" table (using a slave, which I think is OK),
and if so, and the redirect points to another category, it overrides the
current title (variable $nt) with the redirect target.

This fixes both the display in the page and the DB (table "categorylinks"), as
the additions to this table are done using the same parser method.
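A minimal sketch of the parse-time resolution this patch describes, written in Python rather than PHP. A plain dict is a hypothetical stand-in for the redirect table (redirecting category title to target title, Category namespace only), and at most one redirect is followed. Only the resolved title comes back, so nothing records which category the page originally named.

```python
from typing import Dict, List

# Hypothetical stand-in for the "redirect" table, Category namespace only:
# redirecting category title -> target category title.
CategoryRedirects = Dict[str, str]


def resolve_category_link(title: str, redirects: CategoryRedirects) -> str:
    """Title to record for [[Category:title]]: one redirect hop at most."""
    return redirects.get(title, title)


def categories_for_page(cats: List[str], redirects: CategoryRedirects) -> List[str]:
    """Resolve every category link on a page, dropping duplicates in order.

    Only the resolved titles survive; the title the page actually used
    (e.g. "Maus") is not kept anywhere.
    """
    resolved: List[str] = []
    for cat in cats:
        target = resolve_category_link(cat, redirects)
        if target not in resolved:
            resolved.append(target)
    return resolved


redirects = {"Maus": "Mouse", "Mysz": "Mouse"}
print(categories_for_page(["Maus", "Rodents", "Mouse"], redirects))
# ['Mouse', 'Rodents']
```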

The patch works for me, however it should still be checked for regressions.

attachment patch ignored as obsolete

bzimport added a comment via Conduit, Jan 6 2007, 6:13 PM

rotemliss wrote:

Patch

Two additions:

  1. Allowing categories to be moved.
  2. Updating categorylinks when moving categories.

TODO: Fix the new redirect (it currently links to [[Category:B]] instead of
[[:Category:B]]), and update categorylinks when a category is edited.

attachment patch ignored as obsolete

bzimport added a comment via Conduit, Jan 6 2007, 8:06 PM

rotemliss wrote:

Patch

Fixing the redirect page, the move-page message and the links to the page when
it becomes a redirect.

Summary of the patch changes:

  1. Redirect the categories: if a category is a redirect, make the links to it
     and the rows in the "categorylinks" table refer to the target category when
     parsing the page.
  2. Update categorylinks when a category is edited and becomes a redirect.
  3. Make category moves possible: remove the Category namespace from the
     forbidden namespaces.
  4. Update categorylinks when a category is moved.
  5. Fix the redirect page left behind when a category is moved: use a leading
     colon to prevent the redirect page from being included in the category.
  6. Fix the links in pagemovedtext.

Things which still have to be done:

  1. Update categorylinks when a redirect category becomes a redirect to another
     category.
  2. Update categorylinks when a redirect category becomes a regular category
     which is not a redirect.

I think that these require another field in categorylinks, "cl_original_to"
(which may be NULL, or perhaps the same as "cl_to" if not redirected), which
specifies the original target (which is now a redirect). If it's not added, it's
not possible to update categorylinks, because it's not *known* which pages are in
this category. I don't know whether this field should be added, whether these
things should be left unfixed, or whether there is another way to do it. Any ideas?

Attached: patch

bzimport added a comment via Conduit, Feb 25 2007, 6:39 PM

rickblock wrote:

As an expediency, somewhat short of full category redirect support, can a change be made so that when an article is
saved it is added to the target of a redirected category rather than to the redirected category itself (i.e. if category:A is redirected
to category:B, adding an article to category:A actually adds it to category:B)? This
one change, in combination with recat bots like RobotG, would make category redirects work nearly perfectly.

bzimport added a comment via Conduit, Mar 8 2007, 10:27 PM

fabartus wrote:

I endorse the expediency request above with fervency! Moving/renaming would be
nice too, but per
[http://en.wikipedia.org/wiki/Wikipedia_talk:Categories_for_discussion#Propose_tagging_with_both_and_expanding_use_of_Cat_redirects_overall this]
the method of combined soft and hard redirects, together with the proper linking
at the time a page is saved, would pretty well cover the normal editing
objections to redirecting categories.

CONSIDER: There are multiple ways to phrase an equivalent page classification
in English, but note the three-legged stool: almost every language's wiki, no
matter what type of project, connects one way or another via interwikis to the
English Wikipedia article pages (if only through its own Wikipedia's article on
the topic), the Commons category, and/or the Wikipedia category (which I labor
mightily to synchronize as much as possible, so I know them well).

How another language translates into English is, more often than not, a phrasing
we native speakers would find awkward, AND VICE VERSA. So on Commons, category
redirects are tolerated specifically to cover such 'ambiguities', including
redirects from foreign category names to the "official" English name (unlike
most names on Commons, principal category names are English by fiat; articles,
images, etc. can all be in other languages. So the importance should be
obvious, I hope!).

So too do we native speakers have our choice of how to state a category name
(e.g. Countries in Europe vs. Countries of Europe); the halls of the
en.Wikipedia CFD discussions (and, I suspect, every other wiki's!) are ankle
deep in blood from some of those debates! That exists partly because of
different schemes in related categorisation (Geography, Maps, and History all
intersect Countries, so complications follow, or a need to alias, however
belatedly! <g>). Frankly, that would be a needless waste of time if it were easy
to alias a name and have that name be the one 'online', per this proposal.

In one sense these naming issues are trivialities, but they are important
trivialities, as recollection and modes of phrase formulation are inherently
personal things involving the way each of us thinks. So there is a natural
factionalism, as some think like me and some like him, and even when
compromises are reached, they involve a lot of work for someone... hopefully
the guys not thinking like me! <BSEG> Bottom line: aliasing would prevent and
eliminate a lot of uselessly wasted man-hours spent renaming things. The
computers should be doing that drudgery, not we humans, save for you
developers... if you do it once, your effort pays back over and over for the many.

In sum, this matter is inherently more important and of higher priority than
the convenience of one editor; it matters to many, many editors across all
nationalities! I guess I'm saying this has had far too low a priority
heretofore. It seems simple, and so unimportant, but categories are fundamental
to organising the projects, hence the effects are vastly magnified. In one swell
foop, all the nit-picky (local language) name choices can be trivialized and one
uniform name can emerge in each locale, while interconnectivity between sister
projects and within a given project is not only retained but actually enhanced.
So please do expedite both a decision on Rick's interim proposal and a full
implementation allowing name moves and the like.

Just cutting down the debate would free many man-hours each day on CFD, so
delay is, frankly, costly, and world-wide costly at that. Best regards // ~~~~

bzimport added a comment via Conduit, Apr 22 2007, 2:12 PM

ayg wrote:

>   1. Update categorylinks when a category is edited and becomes a redirect. ...
>   2. Update categorylinks when a category is moved.

This might be an issue. I don't know if an UPDATE of, for the current worst case,
a couple hundred thousand rows is acceptable. An alternative would be to check the
redirect status at display time rather than on update, as we do for pages: retrieve
all pages that are in category X or anything that redirects to it. Of course that
means faster UPDATE and slower SELECT, which is generally the reverse of what's
wanted. We should ask Domas or Tim or someone what's best, I guess.

> Things which still have to be done:
>
>   1. Update categorylinks when a redirect category becomes a redirection to another category.
>   2. Update categorylinks when a redirect category becomes a regular category which is not a redirection. I think that these require another field in categorylinks, "cl_original_to" (may be null, or maybe same to "cl_to" if not redirected?), which specifies the original target (which is now a redirection). If it's not added, it's not possible to update categorylinks because it's not *known* which pages are in this category. I don't know if this field should be added, these things should not be fixed, or there is another way to do it. Any ideas?

Short of reparsing every page in the category, this probably does require an
extra field, yes. Other schema updates might be necessary to make the updates
for large categories efficient. I think that whatever happens will be more
efficient than bots loading tons of pages and forcing them to be reparsed, though.
:)

bzimport added a comment via Conduit, Jun 13 2007, 12:09 AM

robchur wrote:

*** Bug 10236 has been marked as a duplicate of this bug. ***

Catrope added a comment via Conduit, Jun 25 2007, 3:33 PM

Created attachment 3823
Include redirected members in category view

The patch submitted earlier kind of scares me. Consider the following scenario:

  1. Page is categorized in Category:A
  2. Category:A becomes a redirect to Category:B
  3. Page is updated accordingly
  4. Category:A becomes a redirect to Category:C
  5. Page is NOT updated accordingly, since it is treated as a member of Category:B.

Someone suggested a new DB field to counter this, but that isn't necessary.

The attached patch fixes this bug in a simpler way, without the problem described above. When you view Category:B, the code will check if any other categories redirect to B. If Category:A redirects to Category:B, both A and B's members will show up when you view Category:B. As usual, double redirects won't work, i.e. if A redirects to B and B redirects to C, Category:C will show B and C's members, but not A's.
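The display-time lookup described above can be sketched in two queries: first find the categories that redirect to the viewed one, then run an ordinary category lookup over all of those titles. This is an illustration in Python/sqlite3, not the attached patch. It assumes a simplified `redirect` table whose `rd_from` holds the redirecting category's title (MediaWiki stores a page id there) and leaves out the `page` table entirely.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categorylinks (cl_from TEXT, cl_to TEXT, cl_sortkey TEXT);
-- Simplified: rd_from is the redirecting category's title here
-- (MediaWiki's redirect table stores a page id instead).
CREATE TABLE redirect (rd_from TEXT, rd_namespace INTEGER, rd_title TEXT);
INSERT INTO categorylinks VALUES
  ('Page_in_A', 'A', 'PAGE IN A'),
  ('Page_in_B', 'B', 'PAGE IN B'),
  ('Page_in_C', 'C', 'PAGE IN C');
-- A redirects to B, and B redirects to C (a double redirect).
INSERT INTO redirect VALUES ('A', 14, 'B'), ('B', 14, 'C');
""")


def category_members(conn: sqlite3.Connection, category: str) -> List[str]:
    """Members of `category` plus members of categories redirecting to it.

    Only one level of redirects is followed.
    """
    # Query 1: which categories redirect straight to this one?
    sources = [row[0] for row in conn.execute(
        "SELECT rd_from FROM redirect WHERE rd_namespace = 14 AND rd_title = ?",
        (category,))]
    titles = [category, *sources]
    # Query 2: an ordinary category lookup over all of those titles.
    placeholders = ", ".join("?" for _ in titles)
    rows = conn.execute(
        f"SELECT cl_from FROM categorylinks WHERE cl_to IN ({placeholders}) "
        "ORDER BY cl_sortkey", titles)
    return [row[0] for row in rows]


print(category_members(conn, "B"))  # ['Page_in_A', 'Page_in_B']
print(category_members(conn, "C"))  # ['Page_in_B', 'Page_in_C']
```

Because only direct redirects are collected, viewing C shows B's members but not A's, which is the double-redirect limitation described in the comment.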

With the attached patch, moving categories becomes easy: just remove NS_CATEGORY from the forbidden namespaces. Since everything is handled transparently through redirects (just as with normal pages), no problems should ensue.

Attached: category-redir.patch

bzimport added a comment via Conduit, Jun 25 2007, 3:40 PM

lowzl wrote:

A solution was proposed using the redirects table in bug 8685 ...

Catrope added a comment via Conduit, Jun 25 2007, 3:45 PM

Created attachment 3824
Include redirected members in API list=categorymembers

This patch does the same thing as my previous patch, with the difference that this one fixes listing category members in the MediaWiki API as opposed to the category page itself.

Attached: category-redir-api.patch

bzimport added a comment via Conduit, Jun 25 2007, 5:10 PM

ayg wrote:

*** Bug 8685 has been marked as a duplicate of this bug. ***

bzimport added a comment via Conduit, Jun 25 2007, 5:16 PM

ayg wrote:

I'm hardly an SQL expert, I'm afraid, but any particular reason you added an extra query rather than joining? I doubt it makes much difference, though, performance-wise.

I'm a bit alarmed that the change has to be made separately for the API, rather than both calling a general-purpose public method of something, but I guess that's a separate issue.

I'll take a look at this and hopefully commit something today or tomorrow. Although I notice a few bugs assigned to me that I've totally forgotten about, so let's hope this doesn't become one. :D

Catrope added a comment via Conduit, Jun 25 2007, 5:48 PM

(In reply to comment #24)

> A solution was proposed using the redirects table in bug 8685 ...

(In reply to comment #27)

> I'm hardly an SQL expert, I'm afraid, but any particular reason you added an
> extra query rather than joining? I doubt it makes much difference, though,
> performance-wise.

The JOIN proposed in bug 8685 didn't work (it selected the wrong data from the page table), and since I'm not particularly good at writing complex SQL queries either, I decided to do it this way. I think my way may actually be faster, since the second query is still a regular, indexed category lookup. A complex JOIN statement wouldn't have the indexing benefit (correct me if I'm wrong).

> I'm a bit alarmed that the change has to be made separately for the API, rather
> than both calling a general-purpose public method of something, but I guess
> that's a separate issue.

This is partly because the API provides much more information and many more filtering options than you'll ever need on a regular page. Also, the current code mixes DB code with UI code, which makes a lot of functions unusable for the API. Article.php and EditPage.php are good examples.

> I'll take a look at this and hopefully commit something today or tomorrow.
> Although I notice a few bugs assigned to me that I've totally forgotten about,
> so let's hope this doesn't become one. :D

We're all humans, we all need breaks ;) take your time.

bzimport added a comment via Conduit, Jun 26 2007, 12:35 AM

ayg wrote:

Checking EXPLAIN shows that the query will use a filesort due to replacement of simple equality with a check for IN. I got the same trying the one-query join technique, adjusted to give correct results. Domas will probably kill me if I add a gratuitous filesort to every category, so I (we) will have to ask him why it's filesorting and how to stop it.

(By the way, more easily fixed but also significant, your check for rd_title='title' alone can't use the redirect table's indexes, because the index is on (rd_namespace, rd_title). You need to add 'rd_namespace' => NS_CATEGORY to the conditions for that query to be efficient.)
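The index point can be checked with any engine that follows the leftmost-prefix rule. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for MySQL's `EXPLAIN`, so the output wording differs, and the index name `rd_ns_title` is just a label chosen for the example. Filtering on `rd_title` alone cannot use an index that starts with `rd_namespace`; adding the namespace condition can.

```python
import sqlite3
from typing import List, Sequence

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE redirect (rd_from INTEGER, rd_namespace INTEGER, rd_title TEXT);
-- Index name is illustrative; what matters is the column order.
CREATE INDEX rd_ns_title ON redirect (rd_namespace, rd_title);
""")


def plan(sql: str, params: Sequence = ()) -> List[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# rd_title alone: not a leftmost prefix of the index.
title_only = plan("SELECT rd_from FROM redirect WHERE rd_title = ?", ("B",))
# rd_namespace + rd_title: a leftmost prefix, so the index is usable.
with_ns = plan(
    "SELECT rd_from FROM redirect WHERE rd_namespace = ? AND rd_title = ?",
    (14, "B"))

print(title_only)
print(with_ns)
```

On a typical SQLite build, the first plan reports a full `SCAN` of the table and the second a `SEARCH ... USING INDEX rd_ns_title`.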

Catrope added a comment via Conduit, Jun 26 2007, 1:14 PM

(In reply to comment #29)

> (By the way, more easily fixed but also significant, your check for
> rd_title='title' alone can't use the redirect table's indexes, because the
> index is on (rd_namespace, rd_title). You need to add 'rd_namespace' =>
> NS_CATEGORY to the conditions for that query to be efficient.)

By all means do so. I know just enough about MySQL to get by, and have no idea how all those optimizations work.

Catrope added a comment via Conduit, Jun 30 2007, 7:54 AM

(In reply to comment #29)

> Checking EXPLAIN shows that the query will use a filesort due to replacement of
> simple equality with a check for IN. I got the same trying the one-query join
> technique, adjusted to give correct results. Domas will probably kill me if I
> add a gratuitous filesort to every category, so I (we) will have to ask him why
> it's filesorting and how to stop it.

I don't really understand any of that (for instance, my query doesn't use IN AFAIK), but I understand it's a performance problem. How can that be solved?

bzimport added a comment via Conduit, Jul 1 2007, 2:30 AM

ayg wrote:

Your code contains an IN because you have 'cl_to' => $titles as a condition, with $titles an array, and that translates to (cl_to IN ($titles)). I don't know how it can be solved; try asking Domas or someone.
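To make the array-to-IN translation concrete, here is a hypothetical Python helper that follows the rule described above: a scalar value becomes an equality test and a list becomes an `IN (...)` list. It is not MediaWiki's database layer, only an illustration of why passing an array of titles turns `cl_to = ...` into `cl_to IN (...)`.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, int]


def build_where(conds: Dict[str, Union[Value, List[Value]]]) -> Tuple[str, List[Value]]:
    """Turn a field => value map into a parameterized WHERE clause.

    Hypothetical helper: scalars become `field = ?`, lists become
    `field IN (?, ?, ...)`; all conditions are ANDed together.
    """
    clauses: List[str] = []
    params: List[Value] = []
    for field, value in conds.items():
        if isinstance(value, list):
            if not value:
                raise ValueError(f"empty list for {field}")
            clauses.append(f"{field} IN ({', '.join('?' for _ in value)})")
            params.extend(value)
        else:
            clauses.append(f"{field} = ?")
            params.append(value)
    return " AND ".join(clauses), params


print(build_where({"cl_to": "Mouse"}))
# ('cl_to = ?', ['Mouse'])
print(build_where({"cl_to": ["Mouse", "Maus"], "cl_type": "page"}))
# ('cl_to IN (?, ?) AND cl_type = ?', ['Mouse', 'Maus', 'page'])
```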

bzimport added a comment via Conduit, Jul 3 2007, 3:08 AM

ayg wrote:

After discussion with Domas, it seems that any attempt to check for redirects in the current schema will *probably* cause a filesort, or at least all the ones suggested did. We probably need a new field, cl_real_to or something, that has the redirect pre-resolved. When adding a category to a page, the actual target would be put in cl_to as now; then if it's a redirect, the redirect target would be put in cl_real_to, otherwise that would be a copy of cl_to (or it would be NULL, depending on which works better). Then cl_real_to would be used for displaying category pages in place of cl_to. Whenever a category is changed to a redirect, or the target of a category redirect is changed, categorylinks would be updated appropriately.

River pointed out that if cl_real_to is an id instead of a title, it will persist across renames of the category. But Rob pointed out that that only works if the category has an associated page. River then suggested a cat_id, which may or may not be going too far for this exercise. We can always stick updates for cl_real_to in the job queue, basically mimicking the current bot-update situation.
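A minimal sketch of the `cl_real_to` idea in Python/sqlite3. It assumes a hypothetical `categorylinks` layout with both columns plus a simplified `category_redirect` lookup table (title to target); apart from the two column names from the comment, everything here is invented for illustration. Saving a page resolves the redirect once, category pages list by `cl_real_to`, and turning a category into a redirect (or retargeting it) is a single `UPDATE` keyed on `cl_to`, with no reparsing.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sketch: cl_to is what the page wrote, cl_real_to is the
-- pre-resolved target (a copy of cl_to when there is no redirect).
CREATE TABLE categorylinks (
  cl_from TEXT, cl_to TEXT, cl_real_to TEXT, cl_sortkey TEXT);
CREATE INDEX cl_real_to_sortkey ON categorylinks (cl_real_to, cl_sortkey);
-- Simplified redirect lookup: category title -> target title.
CREATE TABLE category_redirect (title TEXT PRIMARY KEY, target TEXT);
""")


def add_category_link(page: str, category: str) -> None:
    """On save: store the written category plus its resolved target."""
    row = conn.execute(
        "SELECT target FROM category_redirect WHERE title = ?",
        (category,)).fetchone()
    real = row[0] if row else category
    conn.execute("INSERT INTO categorylinks VALUES (?, ?, ?, ?)",
                 (page, category, real, page.upper()))  # toy sortkey


def set_redirect(category: str, target: str) -> None:
    """Category becomes (or is retargeted as) a redirect; no reparse needed."""
    conn.execute("INSERT OR REPLACE INTO category_redirect VALUES (?, ?)",
                 (category, target))
    conn.execute("UPDATE categorylinks SET cl_real_to = ? WHERE cl_to = ?",
                 (target, category))


def members(category: str) -> List[str]:
    """Category pages list by cl_real_to instead of cl_to."""
    return [row[0] for row in conn.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_real_to = ? "
        "ORDER BY cl_sortkey", (category,))]


add_category_link("Some_mouse", "Maus")
add_category_link("Other_mouse", "Mouse")
print(members("Mouse"))  # ['Other_mouse']
set_redirect("Maus", "Mouse")
print(members("Mouse"))  # ['Other_mouse', 'Some_mouse']
set_redirect("Maus", "Mice")
print(members("Mouse"))  # ['Other_mouse']
print(members("Mice"))   # ['Some_mouse']
```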

Catrope added a comment via Conduit, Jul 3 2007, 9:42 AM

(In reply to comment #33)
I think cl_real_to is the way to go. Queries would be indexed: you'd have something like WHERE cl_to='title' OR cl_real_to='title';. Updating categorylinks would be the simple (but potentially massive) query UPDATE categorylinks SET cl_real_to='redirtarget' WHERE cl_to='redirname';.

bzimport added a comment via Conduit, Sep 29 2008, 5:46 PM

Eugene.Zelenko wrote:

*** Bug 15742 has been marked as a duplicate of this bug. ***

bzimport added a comment via Conduit, Nov 5 2008, 1:15 PM

catlow wrote:

Sorry, the technical stuff is over my head, but can someone explain what the chances are of this being fixed? What effect do the patches referred to have? Can we expect members of redirected categories to show up on the target category page?

bzimport added a comment via Conduit, Dec 13 2008, 9:42 PM

Sebastian wrote:

Just came here from http://en.wikipedia.org/wiki/Wikipedia_talk:Categories_for_discussion#Category_redirects and wanted to add a vote for this bug.

Disclaimer: It has been a long time since I was last here, and I didn't find a "vote" feature like the one in Mozilla's bug database. Also, I didn't spend much time trying to understand the discussion of this and related bugs, and I don't understand the difference from bug 710, which is supposedly fixed.

bzimport added a comment via Conduit, Dec 13 2008, 11:10 PM

ayg wrote:

(In reply to comment #37)

> I didn't find a "vote"
> feature, as the Mozilla db has.

Bottom right, "Vote for this bug" (Ctrl-F for "vote" would have found it).

> I don't understand the difference
> to bug 710, which is supposedly fixed.

Bug 710 is about redirects to a category working when you navigate directly to the page, the same way they work for other pages. Prior to that bug's resolution, I guess "#REDIRECT [[:Category:XYZ]]" would do nothing, or be buggy or something (it was before my time). This is about redirects actually including one category's contents in another.

bzimport added a comment via Conduit, Feb 3 2009, 2:27 AM

PhiLiP.NPC wrote:

May be fixed in r46706.

bzimport added a comment via Conduit, Feb 7 2009, 5:02 PM

catlow wrote:

This seems not to be working (at least, I just tried it on English Wikipedia and it didn't work - I don't know if it's supposed to be live there yet).

Platonides added a comment via Conduit, Feb 7 2009, 5:05 PM

Wikipedia is still on r46424, see Special:Version.

bzimport added a comment via Conduit, Feb 12 2009, 12:47 PM

catlow wrote:

I'm glad this is going to be solved soon, but there is the problem of potential exploitation by vandals. I've filed a new bug (bug 17461) to address this.

bzimport added a comment via Conduit, Feb 18 2009, 3:01 PM

catlow wrote:

Don't like to sound impatient, but when can we expect this fix to actually come live?

bzimport added a comment via Conduit, Feb 19 2009, 5:01 PM

catlow wrote:

OK, it is live, thanks. There still seems to be a slight problem, though, in that you can't get a list of members of the redirected category specifically. I've raised this in a new bug (bug 17571).

bzimport added a comment via Conduit, Apr 3 2009, 10:10 PM

ipatrol6010 wrote:

It has been decided that the change will be made in the next full MediaWiki release (http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/RELEASE-NOTES?view=markup), so just be patient. ☺

tstarling added a comment via Conduit, May 19 2009, 8:06 AM

Reverted, see CodeReview r46706.

bzimport added a comment via Conduit, May 19 2009, 9:50 AM

catlow wrote:

Why has this potentially helpful change been reverted? It seemed to be working well; the only problem was bug 17571, which surely can't be difficult to fix. We know that the tables don't get updated straight away when a category is changed to/from a redirect, and I presume we wouldn't want them to. Bots would handle emptying existing categories when they get redirected, exactly as they do now.

tstarling added a comment via Conduit, May 19 2009, 11:47 AM

(In reply to comment #47)

> Why has this potentially helpful change been reverted? It seemed to be working
> well; the only problem was bug 17571, which surely can't be difficult to fix.
> We know that the tables don't get updated straight away when a category is
> changed to/from a redirect, and I presume we wouldn't want them to. Bots would
> handle emptying existing categories when they get redirected, exactly as they
> do now.

The tables indeed were not updated straight away, in fact they were not updated at all, ever. You'd have to have a bot go through and edit every page in the category, every time the redirect status or redirect target changed.

It's possible to do these updates immediately, with negligible performance loss, and to retire the bots. But it would be much more difficult to implement that feature if the categorylinks table was significantly polluted with spurious links from r46706.

bzimport added a comment via Conduit, May 19 2009, 12:34 PM

catlow wrote:

I don't think it was ever envisaged that the tables would be updated automatically (I didn't think that was desirable anyway, since inappropriate redirects of large categories, and subsequent reversions, would cause lots of extra processing, of the sort that doesn't seem to happen when e.g. templates with categories get updated). But if you say it can be done, then we'll wait in eager anticipation...

Catrope added a comment via Conduit, May 19 2009, 1:49 PM

(In reply to comment #49)

> I don't think it was ever envisaged that the tables would be updated
> automatically (I didn't think that was desirable anyway, since inappropriate
> redirects of large categories, and subsequent reversions, would cause lots of
> extra processing, of the sort that doesn't seem to happen when e.g. templates
> with categories get updated). But if you say it can be done, then we'll wait in
> eager anticipation...

Templates with categories don't cause immediate updates because those updates are put in the job queue and executed later. Presumably, updating for category redirect changes would also use the job queue.

bzimport added a comment via Conduit, May 19 2009, 2:09 PM

ayg wrote:

Templates with categories don't cause immediate updates because those updates require reparsing of large numbers of pages. Category redirects don't, I don't see any reason why they should need the job queue. Except for really giant categories, maybe, where you'd want to batch the updates to not lag the slaves.
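For the "really giant categories" case, batching could look roughly like the sketch below. It assumes the hypothetical `cl_real_to` column proposed earlier in this thread (kept non-NULL, as a copy of `cl_to` when there is no redirect), uses SQLite only for illustration, and picks an arbitrary batch size. A production version would presumably also wait for replicas to catch up between batches, which is not shown.

```python
import sqlite3


def redirect_in_batches(conn: sqlite3.Connection, source: str, target: str,
                        batch_size: int = 1000) -> int:
    """Point every link written as `source` at `target`, one batch at a time.

    Each batch is its own short transaction instead of one huge UPDATE.
    Assumes cl_real_to is never NULL (otherwise `<>` would skip those rows).
    Returns the number of rows changed.
    """
    total = 0
    while True:
        # Rows already updated no longer match, so the loop terminates.
        ids = [row[0] for row in conn.execute(
            "SELECT rowid FROM categorylinks "
            "WHERE cl_to = ? AND cl_real_to <> ? LIMIT ?",
            (source, target, batch_size))]
        if not ids:
            return total
        conn.executemany(
            "UPDATE categorylinks SET cl_real_to = ? WHERE rowid = ?",
            [(target, rowid) for rowid in ids])
        conn.commit()
        total += len(ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks "
             "(cl_from INTEGER, cl_to TEXT, cl_real_to TEXT)")
conn.executemany("INSERT INTO categorylinks VALUES (?, 'Big', 'Big')",
                 [(i,) for i in range(2500)])
conn.commit()
changed = redirect_in_batches(conn, "Big", "Bigger", batch_size=1000)
print(changed)  # 2500
```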

Platonides added a comment via Conduit, May 22 2009, 6:02 PM

Turning a "normal" category into a redirect can be done straight away, but un-redirecting a category requires reparsing all of its members.

bzimport added a comment via Conduit, May 22 2009, 7:19 PM

ayg wrote:

Or adding an extra column to categorylinks. That seems like a better idea, unless un-redirecting is expected to be very rare.

Platonides added a comment via Conduit, May 22 2009, 10:55 PM

That's probably the way to go. What would that column be?

bzimport added a comment via Conduit, May 24 2009, 1:56 PM

ayg wrote:

cl_to_original or such, an unredirected variant of cl_to. Then if a redirect chain changes, you could do UPDATE categorylinks SET cl_to='New_redirect_target' WHERE cl_to_original IN ('Original_category1', 'Original_category2');. You'd want an index on cl_to_original, of course, so this is a pretty heavyweight addition to the table.
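Here is a sketch of why keeping the unredirected name helps, using the hypothetical `cl_to_original` column from the comment above (SQLite stand-in, invented sample rows). A change in a redirect chain becomes the `UPDATE ... WHERE cl_to_original IN (...)` described, and un-redirecting a category is just copying `cl_to_original` back into `cl_to`, again without reparsing any page.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical layout: cl_to holds the resolved target,
-- cl_to_original holds the category the page actually wrote.
CREATE TABLE categorylinks (cl_from TEXT, cl_to TEXT, cl_to_original TEXT);
CREATE INDEX cl_to_original_idx ON categorylinks (cl_to_original);
INSERT INTO categorylinks VALUES
  ('P1', 'Mouse', 'Maus'),   -- Maus currently redirects to Mouse
  ('P2', 'Mouse', 'Mysz'),   -- so does Mysz
  ('P3', 'Mouse', 'Mouse');  -- filed directly
""")


def retarget(originals: List[str], new_target: str) -> None:
    """A redirect chain changed: repoint everything written as `originals`."""
    placeholders = ", ".join("?" for _ in originals)
    conn.execute(
        "UPDATE categorylinks SET cl_to = ? "
        f"WHERE cl_to_original IN ({placeholders})",
        [new_target, *originals])


def unredirect(category: str) -> None:
    """The category is no longer a redirect: restore what pages wrote."""
    conn.execute(
        "UPDATE categorylinks SET cl_to = cl_to_original "
        "WHERE cl_to_original = ?", (category,))


def members(category: str) -> List[str]:
    return sorted(row[0] for row in conn.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ?", (category,)))


retarget(["Maus", "Mysz"], "Mice")
print(members("Mice"))   # ['P1', 'P2']
unredirect("Mysz")
print(members("Mysz"))   # ['P2']
print(members("Mice"))   # ['P1']
```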

bzimport added a comment via Conduit, Jul 11 2009, 9:37 PM

ipatrol6010 wrote:

I think that the best solution is this: you place [[A]] in [[Category:Foo]], Foo redirects to Bar, so you see [[A]] in [[Category:Bar]], and clicking the category link to Foo leads you to Bar. Commons could have "co-categories", where a member of one co-category is visible in all the other co-categories. This could be done by having each of the categories contain [[;Category:Fu]] [[;Category:Fuz]] [[;Category:Faz]] [[;Category:(...)]]

bzimport added a comment via Conduit, Nov 17 2009, 12:34 PM

catlow wrote:

Hello, is anyone still working on this? Any progress lately? It all seemed to be going so well at one point...

Jidanni added a comment via Conduit, Apr 1 2010, 4:38 AM

Sorry, my following observation has probably been noted above, but I didn't check.

On [[Page A]] put "[[Category:C1]]".
Now on [[Category:C1]] put
"#REDIRECT [[Category:C2]]".

Note how Page A is not listed on Category:C2.
Instead the only way to hunt down Page A in the categories is to
visit Category:1&redirect=no !

Jidanni added a comment via Conduit, Apr 1 2010, 4:40 AM

(In reply to comment #58)

> visit Category:1&redirect=no !

I meant Category:C1&redirect=no. The redirect=no part is not something the average user will know to try, so the category entry is effectively lost in this sense.

Peachey88 added a comment via Conduit, Apr 30 2011, 12:10 AM

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

Bawolff added a comment via Conduit, Nov 8 2011, 9:13 PM

*** Bug 32262 has been marked as a duplicate of this bug. ***

bzimport added a comment via Conduit, Nov 9 2011, 4:15 AM

sumanah wrote:

Adding the keywords that seem right -- if the patches still need reviewing, please change "reviewed" to "need-review".

Qgil added a comment via Conduit, Mar 23 2013, 6:24 PM

This feature request is being proposed at

http://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Automatic_category_redirects

and I'm considering whether to add it or not to

https://www.mediawiki.org/wiki/Summer_of_Code_2013#Project_ideas

Question:

Is there a potential mentor willing to help potential students interested in
this project?

Is there a reasonable support from the MediaWiki core maintainers to incorporate this feature if it's developed and meets the quality criteria?

Without these qualifications in place we can't even consider the proposal for
GSOC 2013.

Bawolff added a comment via Conduit, Mar 23 2013, 6:26 PM

(In reply to comment #63)

> Question:
>
> Is there a potential mentor willing to help potential students interested in
> this project?

Yes, me :)

> Is there a reasonable support from the MediaWiki core maintainers to
> incorporate this feature if it's developed and meets the quality criteria?

I think so. It would require schema changes, which is the only part that could potentially be sticky.

Parent5446 added a comment via Conduit, Apr 23 2013, 6:38 AM

The more you know...

The current query for getting category members is:
SELECT ...
FROM page
INNER JOIN categorylinks FORCE INDEX (cl_sortkey)
  ON ((cl_from = page_id))
LEFT JOIN category
  ON ((cat_title = page_title) AND page_namespace = '14')
WHERE cl_to = 'Test' AND cl_type = 'page'
ORDER BY cl_sortkey
LIMIT 201

And, true enough, if you change the cl_to check from a comparison to an IN operator, it triggers a filesort. *However*, if you instead move the contents of the WHERE clause into the INNER JOIN condition, then the filesort disappears. The resulting query is:

SELECT ...
FROM page
INNER JOIN categorylinks FORCE INDEX (cl_sortkey)
  ON ((cl_from = page_id) AND (cl_to IN ('Test')) AND (cl_type = 'page'))
LEFT JOIN category
  ON ((cat_title = page_title) AND page_namespace = '14')
ORDER BY cl_sortkey
LIMIT 201

Now I'm not too much of an expert on databases, but theoretically this should produce the exact same results (since it's an INNER JOIN) but still be efficient (because the cl_sortkey index includes the cl_from and cl_to columns).

This would eliminate the need for any new columns and whatnot.
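As a quick sanity check of the claim that moving the WHERE predicates into the INNER JOIN condition cannot change the result set, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, a heavily cut-down schema, and made-up rows; FORCE INDEX is dropped because SQLite has no such hint. It compares results only, not query plans, so it says nothing about filesorts.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Toy, simplified stand-ins for the page/categorylinks/category tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT, cl_type TEXT);
        CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT);
    """)
    db.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 0, "Apple"), (2, 0, "Banana"), (3, 14, "Fruit"), (4, 0, "Cherry"),
    ])
    db.executemany("INSERT INTO categorylinks VALUES (?, ?, ?, ?)", [
        (1, "Test", "APPLE", "page"),
        (2, "Test", "BANANA", "page"),
        (3, "Test", "FRUIT", "subcat"),  # filtered out by cl_type = 'page'
        (4, "Foo", "CHERRY", "page"),
    ])
    db.execute("INSERT INTO category VALUES (1, 'Fruit')")
    return db

# Filter in WHERE (original form), with several categories as a redirect would need.
WHERE_FORM = """
    SELECT page_title, cl_sortkey FROM page
    INNER JOIN categorylinks ON cl_from = page_id
    LEFT JOIN category ON cat_title = page_title AND page_namespace = 14
    WHERE cl_to IN ('Foo', 'Test') AND cl_type = 'page'
    ORDER BY cl_sortkey LIMIT 201
"""

# Same filter moved into the INNER JOIN condition.
JOIN_FORM = """
    SELECT page_title, cl_sortkey FROM page
    INNER JOIN categorylinks
        ON cl_from = page_id AND cl_to IN ('Foo', 'Test') AND cl_type = 'page'
    LEFT JOIN category ON cat_title = page_title AND page_namespace = 14
    ORDER BY cl_sortkey LIMIT 201
"""

db = build_db()
where_rows = db.execute(WHERE_FORM).fetchall()
join_rows = db.execute(JOIN_FORM).fetchall()
print(where_rows == join_rows, where_rows)
```

The equivalence holds because the moved predicates only reference categorylinks, which is inner-joined; the question in this thread is purely whether MySQL picks a cheaper plan for one form.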

Qgil added a comment.Via ConduitApr 23 2013, 9:08 PM

Just a note to say that Liangent has applied to GSoC with a proposal related to this report. Good luck!

https://www.mediawiki.org/wiki/User:Liangent/cat-redir

Bawolff added a comment.Via ConduitApr 23 2013, 9:40 PM

Re comment 66:

If I put more than one category in the IN condition when doing that, I still get a filesort:

mysql> describe SELECT /* CategoryViewer::doCategoryQuery Bawolff */ page_id,page_title,page_namespace,page_len,page_is_redirect,cl_sortkey,cat_id,cat_title,cat_subcats,cat_pages,cat_files,cl_sortkey_prefix,cl_collation FROM page INNER JOIN categorylinks FORCE INDEX (cl_sortkey) ON ((cl_from = page_id) AND cl_to in ('Foo', 'se') and cl_type = 'page') LEFT JOIN category ON ((cat_title = page_title) AND page_namespace = '14') ORDER BY cl_sortkey LIMIT 2\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: categorylinks
         type: range
possible_keys: cl_sortkey
          key: cl_sortkey
      key_len: 258
          ref: NULL
         rows: 559
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: wikidb.categorylinks.cl_from
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: category
         type: eq_ref
possible_keys: cat_title
          key: cat_title
      key_len: 257
          ref: wikidb.page.page_title
         rows: 1
        Extra:

3 rows in set (0.00 sec)

Parent5446 added a comment.Via ConduitApr 24 2013, 5:13 AM

Hmm, damn databases.

Parent5446 added a comment.Via ConduitApr 28 2013, 9:17 PM

Success! The issue is that the cl_sortkey index on categorylinks puts the cl_to column before the cl_sortkey column, so once the condition becomes "cl_to IN ..." with several values, the index returns rows grouped by category and can no longer supply the cl_sortkey order required by the ORDER BY clause.

After adding the following index:

ALTER TABLE categorylinks
ADD UNIQUE cl_newsort ( cl_type, cl_sortkey, cl_to, cl_from )

And then running the following query:

EXPLAIN EXTENDED SELECT cl_from
FROM categorylinks
INNER JOIN page ON page_id = cl_from
LEFT JOIN category ON cat_title = page_title AND page_namespace = 14
WHERE cl_type = 'page'
    AND cl_to IN ( 'Foo', 'Test' )
ORDER BY cl_sortkey

With that, the filesort finally went away. (I was even able to get rid of the FORCE INDEX usage.) If somebody could please check this and make sure I'm still sane, and that MySQL isn't just inventing things to trick my mind, that'd be great.
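The index-ordering argument can be modeled without a database. The sketch below treats an index as a sorted list of key tuples and a range scan as reading one contiguous key prefix. The toy rows are invented, and the existing cl_sortkey index is assumed here to be keyed on (cl_to, cl_type, cl_sortkey, cl_from); the second layout is the proposed cl_newsort.

```python
from bisect import bisect_left

# Toy categorylinks rows: (cl_from, cl_to, cl_sortkey, cl_type). Illustrative data only.
ROWS = [
    (1, "Foo", "DELTA", "page"),
    (2, "Foo", "ALPHA", "page"),
    (3, "Test", "CHARLIE", "page"),
    (4, "Test", "BRAVO", "page"),
    (5, "Bar", "ECHO", "page"),
]

def build_index(key):
    """Model a B-tree index as (key tuple, row) pairs sorted by key."""
    return sorted(((key(r), r) for r in ROWS), key=lambda entry: entry[0])

def prefix_scan(index, prefix):
    """Yield rows whose key starts with `prefix`, in index order (a range scan)."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, prefix)
    while i < len(index) and index[i][0][:len(prefix)] == prefix:
        yield index[i][1]
        i += 1

# Assumed existing layout: (cl_to, cl_type, cl_sortkey, cl_from)
OLD = build_index(lambda r: (r[1], r[3], r[2], r[0]))
# Proposed cl_newsort layout: (cl_type, cl_sortkey, cl_to, cl_from)
NEW = build_index(lambda r: (r[3], r[2], r[1], r[0]))

def old_plan(cats):
    # One range scan per IN value: each run is sorted, their concatenation is not.
    return [r for c in sorted(cats) for r in prefix_scan(OLD, (c, "page"))]

def new_plan(cats):
    # A single scan over every cl_type = 'page' entry, filtered on cl_to.
    return [r for r in prefix_scan(NEW, ("page",)) if r[1] in cats]

cats = {"Foo", "Test"}
old = [r[2] for r in old_plan(cats)]
new = [r[2] for r in new_plan(cats)]
print(old, old == sorted(old))  # ['ALPHA', 'DELTA', 'BRAVO', 'CHARLIE'] False -> needs a sort
print(new, new == sorted(new))  # ['ALPHA', 'BRAVO', 'CHARLIE', 'DELTA'] True
```

Note that new_plan also reads the "Bar" entry it then discards; that cost is what the reply below is about.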

Bawolff added a comment.Via ConduitApr 28 2013, 9:25 PM

I haven't tested this, but I would guess that unless it is doing something very fancy with merging indices, this would cause very large scans of the categorylinks table, since it wouldn't be able to skip to only the results in the relevant category. Filesort isn't the only way that a DB query can be inefficient.

Parent5446 added a comment.Via ConduitApr 28 2013, 9:33 PM

(In reply to comment #71)

> I haven't tested this, but I would guess that unless it is doing something
> very fancy with merging indices, this would cause very large scans of the
> categorylinks table. (Since it wouldn't be able to skip to only results in
> the relevant category). Filesort isn't the only way that a db query can be
> inefficient.

Hmm, you're right. Come to think of it, this would require scanning the entire cl_sortkey index (I think).
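Bawolff's concern can be put into rough numbers with a toy model. The counts below are entries read under two assumed index layouts for a hypothetical workload with one very large category and two small ones; they are not MySQL measurements, and a real optimizer may choose a different plan.

```python
# Hypothetical workload: one huge category ("Big") plus two small ones we query.
# Rows are (cl_from, cl_to, cl_sortkey, cl_type).
rows = [(i, "Big", f"K{i:06d}", "page") for i in range(10_000)]
rows += [(10_000 + i, "Foo", f"F{i:03d}", "page") for i in range(3)]
rows += [(20_000 + i, "Test", f"T{i:03d}", "page") for i in range(3)]

def examined_with_type_first_index(rows, cats, limit):
    """Count entries read when walking a (cl_type, cl_sortkey, cl_to, cl_from)
    index for cl_type = 'page', filtering on cl_to until `limit` matches."""
    examined = matched = 0
    index_order = sorted((r for r in rows if r[3] == "page"),
                         key=lambda r: (r[2], r[1], r[0]))
    for r in index_order:
        examined += 1
        if r[1] in cats:
            matched += 1
            if matched == limit:
                break
    return examined

def examined_with_category_first_index(rows, cats):
    """Count entries that one range scan per category on a
    (cl_to, cl_type, cl_sortkey, cl_from) index would read: only matching
    entries, which then still need a sort to merge into cl_sortkey order."""
    return sum(1 for r in rows if r[1] in cats and r[3] == "page")

cats = {"Foo", "Test"}
print(examined_with_type_first_index(rows, cats, limit=201))  # 10006
print(examined_with_category_first_index(rows, cats))         # 6
```

The type-first walk only stops early when enough matching rows happen to sort near the front (with limit=3 here it reads just 3 entries); otherwise it reads every entry of that cl_type, while the category-first scans read only matching rows but must then sort them.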

Qgil added a comment.Via ConduitMay 3 2013, 9:56 PM

Just a note to say that Andre Saboia has submitted a GSoC proposal related to this report: https://www.mediawiki.org/wiki/User:Anboia/Automatic_category_redirects

gerritbot added a comment.Via ConduitMay 24 2013, 7:44 AM

Related URL: https://gerrit.wikimedia.org/r/65176 (Gerrit Change I29a629a514f9568d0ee4d967c516dfd599dc11ba)

Aklapper added a comment.Via ConduitSep 26 2013, 2:37 PM

Tyler: The patch received a -1; do you plan to rework it?

Bawolff added a comment.Via ConduitSep 26 2013, 2:44 PM

If I ever have free time again (read: probably not for a while), I offer to help Tyler address some of the issues with the patch.

Parent5446 added a comment.Via ConduitSep 26 2013, 2:59 PM

(In reply to comment #76)

> If I ever have free time again (read: probably not for a while), I offer to
> help Tyler address some of the issues with the patch.

That would be great. I don't have much free time myself, although once I do I'll definitely work on it.

Bawolff added a comment.Via ConduitSep 26 2013, 3:00 PM

(In reply to comment #77)

> (In reply to comment #76)
>> If I ever have free time again (read: probably not for a while), I offer to
>> help Tyler address some of the issues with the patch.
>
> That would be great. I don't have much free time myself, although once I do
> I'll definitely work on it.

Yeah, somebody should make a graph of the number of commits to MediaWiki by volunteers vs. when the school semester starts.

Qgil added a comment.Via ConduitOct 24 2013, 8:35 PM

I'm delisting this project from https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Automatic_category_redirects since it looks like you are almost there.

Bawolff added a comment.Via ConduitOct 24 2013, 8:37 PM

Removing the 1.22 milestone: given that this has somewhat stalled due to lack of time on the part of the interested parties, it seems unlikely to make it into 1.22.

He7d3r awarded a token.Via WebNov 24 2014, 11:58 AM
Liuxinyu970226 removed a subscriber: Liuxinyu970226.Via WebNov 28 2014, 3:57 AM
reeves87 removed a subscriber: reeves87.Via EmailNov 28 2014, 12:52 PM
Cenarium added a comment.Via WebDec 11 2014, 6:54 PM

T77903 isn't a duplicate of this task; it's more of a relatively easy fix to use until the long-standing, difficult core problems here are solved, which may not happen for a very long time.

Nemo_bis awarded a token.Via WebDec 12 2014, 8:00 AM
Kozuch awarded a token.Via WebDec 17 2014, 8:30 PM

This task was mentioned in https://www.mediawiki.org/w/index.php?title=Outreach_programs/Possible_projects&oldid=1404823#Very_raw_projects as a possible candidate for Google Summer of Code or similar programs. Do you think it is a good candidate?

Qgil added a comment.Via WebFeb 11 2015, 1:44 PM

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

Qgil added a comment.Via WebFeb 17 2015, 9:39 AM

@Parent5446, this task is assigned to you. Do you want to work on it or propose it to the next GSoC/Outreachy round?

Qgil moved this task to Need Discussion on the Possible-Tech-Projects workboard.Via WebFeb 17 2015, 9:39 AM
NiharikaKohli added a subscriber: NiharikaKohli.Via WebMar 6 2015, 5:58 AM

@Bawolff @Parent5446 Has this already been done? If not, is there interest in pushing this for upcoming GSoC round?

Nemo_bis added a comment.EditedVia WebMar 6 2015, 9:32 AM

> Has this already been done?

No.

Bawolff added a comment.Via WebMar 6 2015, 11:54 AM

There is a partial patch from way back. See comments on gerrit.

NiharikaKohli added a comment.Via WebMar 6 2015, 12:43 PM

> (In reply to comment #63)
>
>> Question:
>>
>> Is there a potential mentor willing to help potential students interested in
>> this project?
>
> Yes me :)

Does this still hold true @Bawolff?

Bawolff added a comment.Via WebMar 6 2015, 3:18 PM

>> (In reply to comment #63)
>>
>>> Question:
>>>
>>> Is there a potential mentor willing to help potential students interested in
>>> this project?
>>
>> Yes me :)
>
> Does this still hold true @Bawolff?

Not really. If there were a student who really, really wanted to do this idea and no other, maybe, but generally I'd rather not mentor this idea this round.
