Automatic category redirects
Open, Low, Public

Description

Also proposed in the Community-Wishlist-Survey-2016. The proposal received 47 support votes and was ranked #25 out of 265 proposals. View full proposal with discussion and votes here.


Author: p_simoons

Description:
Suppose category:A redirects to category:B.
Would it be feasible to automatically move all articles placed in cat:A into
cat:B instead?
Alternatively, would it be possible to create a special page that lists all
categories that are redirects, so that a bot can do the moving?


Version: unspecified
Severity: enhancement
See Also: T7346

Details

Reference
bz3311

catlow wrote:

OK, it is live, thanks. There still seems to be a slight problem, though, in that you can't get a list of members of the redirected category specifically. I've raised this in a new bug (bug:17571).

ipatrol6010 wrote:

It has been decided that the change will be made in the next full MediaWiki release (http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/RELEASE-NOTES?view=markup), so just be patient. ☺

Reverted, see CodeReview r46706.

catlow wrote:

Why has this potentially helpful change been reverted? It seemed to be working well; the only problem was bug 17571, which surely can't be difficult to fix. We know that the tables don't get updated straight away when a category is changed to/from a redirect, and I presume we wouldn't want them to. Bots would handle emptying existing categories when they get redirected, exactly as they do now.

(In reply to comment #47)

> Why has this potentially helpful change been reverted? It seemed to be working
> well; the only problem was bug 17571, which surely can't be difficult to fix.
> We know that the tables don't get updated straight away when a category is
> changed to/from a redirect, and I presume we wouldn't want them to. Bots would
> handle emptying existing categories when they get redirected, exactly as they
> do now.

The tables indeed were not updated straight away; in fact, they were not updated at all, ever. You'd have to have a bot go through and edit every page in the category, every time the redirect status or redirect target changed.

It's possible to do these updates immediately, with negligible performance loss, and to retire the bots. But it would be much more difficult to implement that feature if the categorylinks table were significantly polluted with spurious links from r46706.

catlow wrote:

I don't think it was ever envisaged that the tables would be updated automatically (I didn't think that was desirable anyway, since inappropriate redirects of large categories, and subsequent reversions, would cause lots of extra processing, of the sort that doesn't seem to happen when e.g. templates with categories get updated). But if you say it can be done, then we'll wait in eager anticipation...

(In reply to comment #49)

> I don't think it was ever envisaged that the tables would be updated
> automatically (I didn't think that was desirable anyway, since inappropriate
> redirects of large categories, and subsequent reversions, would cause lots of
> extra processing, of the sort that doesn't seem to happen when e.g. templates
> with categories get updated). But if you say it can be done, then we'll wait in
> eager anticipation...

Templates with categories don't cause immediate updates because those updates are put in the job queue and executed later. Presumably, updating for category redirect changes would also use the job queue.

ayg wrote:

Templates with categories don't cause immediate updates because those updates require reparsing of large numbers of pages. Category redirects don't, so I don't see any reason why they should need the job queue. Except for really giant categories, maybe, where you'd want to batch the updates to not lag the slaves.
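The batching idea for very large categories could look roughly like the following. This is only a sketch under stated assumptions: it uses an in-memory SQLite database rather than MySQL, a cut-down categorylinks table with just cl_from and cl_to, and a made-up function name (retarget_in_batches) and batch size. It is not MediaWiki code.

```python
import sqlite3


def retarget_in_batches(conn, old_target, new_target, batch_size=2):
    """Move category links from old_target to new_target a few rows at a time.

    Returns the number of batches that changed at least one row.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "UPDATE categorylinks SET cl_to = ? "
            "WHERE rowid IN (SELECT rowid FROM categorylinks "
            "WHERE cl_to = ? LIMIT ?)",
            (new_target, old_target, batch_size),
        )
        # Committing per batch keeps each write small; a real deployment
        # would presumably also wait for replicas to catch up here.
        conn.commit()
        if cur.rowcount == 0:
            return batches
        batches += 1


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table (assumption: two columns only).
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(page_id, "Old_category") for page_id in range(1, 6)],
)

batches = retarget_in_batches(conn, "Old_category", "New_category")
remaining = conn.execute(
    "SELECT COUNT(*) FROM categorylinks WHERE cl_to = 'Old_category'"
).fetchone()[0]
moved = conn.execute(
    "SELECT COUNT(*) FROM categorylinks WHERE cl_to = 'New_category'"
).fetchone()[0]
```

With five links and a batch size of two, the loop runs three non-empty batches and then stops on the first empty one.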

Making a "normal" category a redirect can be done straight away, but un-redirecting a category requires reparsing all category members.

ayg wrote:

Or adding an extra column to categorylinks. That seems like a better idea, unless un-redirecting is expected to be very rare.

That's probably the way to go. What would that column be?

ayg wrote:

cl_to_original or such, an unredirected variant of cl_to. Then if a redirect chain changes, you could do:

UPDATE categorylinks SET cl_to='New_redirect_target' WHERE cl_to_original IN ('Original_category1', 'Original_category2');

You'd want an index on cl_to_original, of course, so this is a pretty heavyweight addition to the table.
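To make the extra-column idea concrete, here is a minimal sketch, assuming an in-memory SQLite database and a cut-down categorylinks table. The cl_to_original column is the hypothetical addition discussed above; the index name and the helper names (members, set_redirect, remove_redirect) are made up for illustration and are not part of MediaWiki.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categorylinks (
    cl_from INTEGER,        -- page id
    cl_to TEXT,             -- category the page is currently listed under
    cl_to_original TEXT     -- hypothetical: category written in the wikitext
);
CREATE INDEX cl_to_original_idx ON categorylinks (cl_to_original);
INSERT INTO categorylinks VALUES (1, 'C1', 'C1');
INSERT INTO categorylinks VALUES (2, 'C2', 'C2');
""")


def members(category):
    """Page ids listed under a category, using only cl_to."""
    rows = conn.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ? ORDER BY cl_from",
        (category,),
    )
    return [row[0] for row in rows]


def set_redirect(source, target):
    """Category `source` now redirects to `target`: repoint its links."""
    conn.execute(
        "UPDATE categorylinks SET cl_to = ? WHERE cl_to_original = ?",
        (target, source),
    )


def remove_redirect(source):
    """Category `source` is no longer a redirect: restore the original links."""
    conn.execute(
        "UPDATE categorylinks SET cl_to = cl_to_original "
        "WHERE cl_to_original = ?",
        (source,),
    )


set_redirect("C1", "C2")
after_redirect = members("C2")
remove_redirect("C1")
after_unredirect = (members("C1"), members("C2"))
```

Because the original category is kept in its own column, un-redirecting becomes a single UPDATE rather than a reparse of every member page. A redirect chain would need the IN (...) form of the UPDATE quoted above; this sketch handles only one source category at a time.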

ipatrol6010 wrote:

I think the best solution is this: you place [[A]] into [[Category:Foo]], and Foo redirects to Bar, so you see [[A]] in [[Category:Bar]], and clicking on the catlink to Foo leads you to Bar. For Commons, there could be "co-categories", where a member of one co-category is visible in all other co-categories. This could be done by having all the categories contain [[;Category:Fu]] [[;Category:Fuz]] [[;Category:Faz]] [[;Category:(...)]]

catlow wrote:

Hello, is anyone still working on this? Any progress lately? It all seemed to be going so well at one point...

Sorry, my following observation is probably noted above, but I didn't check.

On [[Page A]] put "[[Category:C1]]".
Now on [[Category:C1]] put
"#REDIRECT [[Category:C2]]".

Note how Page A is not listed on Category:C2.
Instead the only way to hunt down Page A in the categories is to
visit Category:1&redirect=no !

(In reply to comment #58)

> visit Category:1&redirect=no !

I meant Category:C1&redirect=no. The redirect=no part is not something the average user will know to try. So the category entry is effectively lost in this sense.

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

*** Bug 32262 has been marked as a duplicate of this bug. ***

sumanah wrote:

Adding the keywords that seem right -- if the patches still need reviewing, please change "reviewed" to "need-review".

Qgil added a comment. Mar 23 2013, 6:24 PM

This feature request is being proposed at

http://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Automatic_category_redirects

and I'm considering whether to add it or not to

https://www.mediawiki.org/wiki/Summer_of_Code_2013#Project_ideas

Question:

Is there a potential mentor willing to help potential students interested in
this project?

Is there reasonable support from the MediaWiki core maintainers to incorporate this feature if it's developed and meets the quality criteria?

Without these qualifications in place we can't even consider the proposal for
GSOC 2013.

(In reply to comment #63)

> Question:
>
> Is there a potential mentor willing to help potential students interested in
> this project?

Yes me :)

> Is there a reasonable support from the MediaWiki core maintainers to
> incorporate this feature if it's developed and meets the quality criteria?

I think so. Would require schema changes which is the only bit that could potentially be sticky.

The more you know...

The current query for getting category members is:
SELECT ...
FROM page
INNER JOIN categorylinks
    FORCE INDEX (cl_sortkey)
    ON ((cl_from = page_id))
LEFT JOIN category
    ON ((cat_title = page_title) AND page_namespace = '14')
WHERE cl_to = 'Test' AND cl_type = 'page'
ORDER BY cl_sortkey
LIMIT 201

And, true enough, if you change the cl_to check from a comparison to an IN operator, it triggers a filesort. *However*, if you instead move the contents of the WHERE clause into the INNER JOIN condition, then the filesort disappears. The resulting query is:

SELECT ...
FROM page
INNER JOIN categorylinks
    FORCE INDEX (cl_sortkey)
    ON ((cl_from = page_id) AND (cl_to IN ('Test')) AND (cl_type = 'page'))
LEFT JOIN category
    ON ((cat_title = page_title) AND page_namespace = '14')
ORDER BY cl_sortkey
LIMIT 201

Now I'm not too much of an expert on databases, but theoretically this should produce the exact same results (since it's an INNER JOIN) but still be efficient (because the cl_sortkey index includes the cl_from and cl_to columns).

This would eliminate the need for any new columns and whatnot.

Qgil added a comment. Apr 23 2013, 9:08 PM

Just a note to say that Liangent has applied to GSoC with a proposal related to this report. Good luck!

https://www.mediawiki.org/wiki/User:Liangent/cat-redir

Re comment 66:

If I have more than a single category in the IN condition when doing that, I get a filesort:

mysql> describe SELECT /* CategoryViewer::doCategoryQuery Bawolff */ page_id,page_title,page_namespace,page_len,page_is_redirect,cl_sortkey,cat_id,cat_title,cat_subcats,cat_pages,cat_files,cl_sortkey_prefix,cl_collation FROM page INNER JOIN categorylinks FORCE INDEX (cl_sortkey) ON ((cl_from = page_id) AND cl_to in ('Foo', 'se') and cl_type = 'page') LEFT JOIN category ON ((cat_title = page_title) AND page_namespace = '14') ORDER BY cl_sortkey LIMIT 2\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: categorylinks
         type: range
possible_keys: cl_sortkey
          key: cl_sortkey
      key_len: 258
          ref: NULL
         rows: 559
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: wikidb.categorylinks.cl_from
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: category
         type: eq_ref
possible_keys: cat_title
          key: cat_title
      key_len: 257
          ref: wikidb.page.page_title
         rows: 1
        Extra:

3 rows in set (0.00 sec)

Hmm, damn databases.

Success! So the issue is that the cl_sortkey index on categorylinks puts the cl_to column before the cl_sortkey column, so when you add the "cl_to IN ...", it can no longer use the index to sort by cl_sortkey (from the ORDER BY clause).

After adding the following index:

ALTER TABLE categorylinks
ADD UNIQUE cl_newsort ( cl_type, cl_sortkey, cl_to, cl_from )

And then running the following query:

EXPLAIN EXTENDED SELECT cl_from
FROM categorylinks
INNER JOIN page ON
page_id = cl_from
LEFT JOIN category ON
cat_title = page_title AND
page_namespace = 14
WHERE
cl_type = 'page' AND
cl_to IN ( 'Foo', 'Test' )
ORDER BY cl_sortkey

I finally got no more filesort. (I was even able to get rid of the FORCE INDEX usage.) If somebody could please check this and make sure I'm still sane, and that MySQL isn't just inventing things to trick my mind, that'd be great.

I haven't tested this, but I would guess that unless it is doing something very fancy with merging indices, this would cause very large scans of the categorylinks table (since it wouldn't be able to skip to only the results in the relevant category). A filesort isn't the only way that a DB query can be inefficient.

(In reply to comment #71)

> I haven't tested this, but I would guess that unless it is doing something very
> fancy with merging indices, this would cause very large scans of the
> categorylinks table (since it wouldn't be able to skip to only the results in
> the relevant category). A filesort isn't the only way that a DB query can be
> inefficient.

Hmm, you're right. Now that I realize it, this would require scanning the entire cl_sortkey index (I think).
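The trade-off between the two index orders can be illustrated without a database. The following is a rough model only, assuming an index behaves like a sorted list of tuples; it ignores cl_type and cl_from, the data is invented, and "Big" stands in for all the unrelated categories. It says nothing about what MySQL's optimizer actually does.

```python
from bisect import bisect_left

# (cl_to, cl_sortkey) pairs; "Big" stands in for every other category.
rows = [("Big", f"K{i:03d}") for i in range(100)] + [
    ("Foo", "K050"),
    ("Test", "K010"),
]
wanted = {"Foo", "Test"}

# Index ordered by (cl_to, cl_sortkey): we can seek straight to each wanted
# category, but the per-category runs must then be sorted together.
by_category = sorted(rows)
touched_seek = 0
found = []
for cat in sorted(wanted):
    i = bisect_left(by_category, (cat, ""))
    while i < len(by_category) and by_category[i][0] == cat:
        found.append(by_category[i])
        touched_seek += 1
        i += 1
result_seek = sorted(found, key=lambda r: r[1])  # plays the role of the filesort

# Index ordered by (cl_sortkey, cl_to): entries already come out in ORDER BY
# order, but every entry has to be read to find the wanted categories.
by_sortkey = sorted(rows, key=lambda r: (r[1], r[0]))
touched_scan = 0
result_scan = []
for row in by_sortkey:
    touched_scan += 1
    if row[0] in wanted:
        result_scan.append(row)
```

Both approaches return the same rows in the same order, but the first touches 2 index entries and needs an extra sort, while the second needs no sort and touches all 102. A LIMIT would let the second approach stop early only if enough matching rows happen to appear near the start of the index.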

Qgil added a comment. May 3 2013, 9:56 PM

Just a note to say that Andre Saboia has submitted a GSoC proposal related to this report: https://www.mediawiki.org/wiki/User:Anboia/Automatic_category_redirects

Related URL: https://gerrit.wikimedia.org/r/65176 (Gerrit Change I29a629a514f9568d0ee4d967c516dfd599dc11ba)

Tyler: The patch received a -1, do you plan to rework it?

If I ever have free time again (read: probably not for a while), I offer to help Tyler address some of the issues with the patch.

(In reply to comment #76)

> If I ever have free time again (read: probably not for a while), I offer to
> help Tyler address some of the issues with the patch.

That would be great. I don't have much free time myself, although once I do I'll definitely work on it.

(In reply to comment #77)

> (In reply to comment #76)
> > If I ever have free time again (read: probably not for a while), I offer to
> > help Tyler address some of the issues with the patch.
>
> That would be great. I don't have much free time myself, although once I do
> I'll definitely work on it.

Yeah, somebody should make a graph of the number of commits to MediaWiki by volunteers vs. when the school semester starts.

Qgil added a comment. Oct 24 2013, 8:35 PM

I'm delisting this project from https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Automatic_category_redirects since it looks like you are almost there.

Remove milestone 1.22 - Given that this has somewhat stalled due to lack of time on the part of interested parties, seems unlikely it could possibly make it to 1.22.

T77903 isn't a duplicate of this task; it's more of a relatively easy fix until we get those long-standing, difficult core problems solved, which may not happen for a very long time.

This task was mentioned in https://www.mediawiki.org/w/index.php?title=Outreach_programs/Possible_projects&oldid=1404823#Very_raw_projects as a possible candidate for Google Summer of Code or similar programs. Do you think it is a good candidate?

Qgil added a comment. Feb 11 2015, 1:44 PM

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

Qgil added a comment. Feb 17 2015, 9:39 AM

@Parent5446, this task is assigned to you. Do you want to work on it or propose it to the next GSoC/Outreachy round?

@Bawolff @Parent5446 Has this already been done? If not, is there interest in pushing this for upcoming GSoC round?

Nemo_bis added a comment. Edited Mar 6 2015, 9:32 AM

> Has this already been done?

No.

There is a partial patch from way back. See comments on gerrit.

> (In reply to comment #63)
> > Question:
> >
> > Is there a potential mentor willing to help potential students interested in
> > this project?
>
> Yes me :)

Does this still hold true @Bawolff?

In T5311#1095413, @NiharikaKohli wrote:

> > (In reply to comment #63)
> > > Question:
> > >
> > > Is there a potential mentor willing to help potential students interested in
> > > this project?
> >
> > Yes me :)
>
> Does this still hold true @Bawolff?

Not really. Like if there was a student who really super wanted to do this idea and no other, maybe, but generally I'd rather not mentor this idea, this round.

Qgil added a comment. Sep 23 2015, 9:10 AM

This is a message posted to all tasks under "Re-check in September 2015" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

Qgil added a comment. Sep 23 2015, 9:36 AM

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

This is the last call for Possible-Tech-Projects missing mentors. The application deadline for Outreachy-Round-11 is 2015-11-02. If this proposal doesn't have two mentors assigned by the end of Thursday, October 22, it will be moved as a candidate for the next round.

Interested in mentoring? Check the documentation for possible mentors.

As previously mentioned, this task is moved to 'Recheck in February 2016' as it doesn't have two mentors assigned to it as of today, October 23, 2015. The project will be included in the discussion of the next iteration of GSoC/Outreachy, and is excluded from #Outreachy-11. Potential candidates are discouraged from submitting proposals to this task for #Outreachy-11 as it lacks mentors in this round.

Billghost removed a subscriber: Billghost.
Sumit added a subscriber: Sumit. Feb 18 2016, 1:33 PM
NOTE: This task is a proposed project for Google-Summer-of-Code (2016) and Outreachy-Round-12. GSoC 2016 and Outreachy round 12 are around the corner, and this task is listed under Possible-Tech-Projects for them. Projects listed for the internship programs should have a well-defined scope within the timeline of the event, a minimum of two mentors, and should take about 2 weeks for a senior developer to complete. Interested in mentoring? Please add your details to the task description, if not done yet. Prospective interns should go through the Life of a successful project doc to find out how to come up with a strong proposal.
Niharika removed a subscriber: Niharika. Feb 19 2016, 3:21 AM
Meno25 removed a subscriber: Meno25. Feb 22 2016, 7:15 PM
Zppix moved this task from Unsorted to Working on on the Editing-Department board. Apr 26 2016, 2:31 PM
Sumit added a comment. Sep 10 2016, 9:49 PM

@Parent5446 this task has been assigned to you. Do you plan on working on this or mentoring this for the upcoming Outreachy-13 round?

Verdy_p added a subscriber: Verdy_p. Edited Sep 19 2016, 9:47 AM

Instead of changing the indexes for every member page of a redirected category, couldn't there be an index within each target category listing the names of the other categories that redirect to it?
In that case, listing the content of the category would list all pages of the target category and of the redirected source categories, merged into a single sorted list.
This could, however, have an impact on performance if there are many redirected categories to list.
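The read-time merge described above could be sketched like this. It is only an illustration under assumptions: the two dictionaries are invented in-memory stand-ins for the per-category member lists and for the proposed "redirects pointing here" index, and list_category is a hypothetical name.

```python
import heapq
import itertools

# Hypothetical data: each category's members, already sorted by sort key.
members = {
    "C1": ["Apple", "Mango"],
    "C2": ["Banana", "Zebra"],
    "C3": ["Cherry"],
}
# Hypothetical index: target category -> categories that redirect to it.
redirects_to = {"C2": ["C1", "C3"]}


def list_category(target, limit):
    """First `limit` members of `target`, including those of its redirects."""
    sources = [target] + redirects_to.get(target, [])
    # heapq.merge lazily combines sorted inputs into one sorted stream,
    # so only as many entries as needed are pulled from each list.
    merged = heapq.merge(*(members.get(cat, []) for cat in sources))
    return list(itertools.islice(merged, limit))


listing = list_category("C2", 4)
```

The cost of each step of the merge grows with the number of source lists, which matches the performance concern about targets with many redirected categories.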

Note that the source categories would still list their specific members (though to view them you have to visit these redirected categories with redirect=no), which remain indexed there and still need maintenance on all member pages to empty the contents of the redirected categories (using a job queue, or probably still preferably using some authorized bots that may be paced down and will not run before the new redirect is accepted and not reverted).

Note that most often, a redirected category just comes from an edit that renames it, preserving its description and edit history in the new target category; however, the member pages are still not recategorized. For this reason, renaming a category should not create a category page under the old name containing just the "#REDIRECT". It would be better to create, by default, a category page with a template showing a soft redirect, so that its members are still viewable in a single click, even if they are also listed in the merged list shown when viewing the target category.

Adding a hard redirect before the soft redirect template in the category description page should preferably be delayed.

Or, if possible, the #REDIRECT in a category should not be immediately followed when visiting it via any link on any page as long as the category is not empty. (This does not require any edits to many articles or any change to the indexes in the database, just a small patch in MediaWiki to provide better navigation.)

> @Parent5446 this task has been assigned to you. Do you plan on working on this or mentoring this for the upcoming Outreachy-13 round?

I don't have time to work on it (my patch in Gerrit has been untouched for ages), but I can help mentor it since I remember most of the problems involved.

Qgil removed a subscriber: Qgil. Jan 7 2017, 10:50 AM

It looks from the Community Wishlist Survey results like this feature is in quite high demand; see the updated task description :)

@Parent5446 How much work is remaining to be done? Would you be interested in mentoring this for GSOC 2017/ Outreachy Round 14?

This task was proposed in the Community-Wishlist-Survey-2016 and in its current state needs an owner. Wikimedia is participating in Google Summer of Code 2017 and Outreachy Round 14. To the subscribers -- would this task or a portion of it be a good fit for either of these programs? If so, would you be willing to help mentor this project? Remember, each outreach project requires a minimum of one primary mentor and a co-mentor.

@Parent5446 Hi! We are in the process of recruiting tasks suitable for volunteers from the community wishlist for the Wikimedia-Hackathon-2017. This task seems like a good fit, but I'm wondering: even if it gets picked up, would you be around to answer any questions someone might have? Also, if someone claims it, would it be a start over? The hackathon is May 19-21.