Allow for subcategories to be included when parsing categories
Open, Medium, Public, 5 Estimated Story Points

Description

In a conversation with Sati today, this came up as a real use case that people rely on extensively. So we want to let users choose whether to include subcategories for a given category.

Tasks for this ticket:

Here's a rough mock of what it would look like:

image.png (660×1 px, 157 KB)

We should have some limits to prevent issues. Here are the proposed limits, which we can tweak later as we test the feature:

Event Timeline

Niharika triaged this task as Medium priority. Jul 27 2018, 12:42 AM

I think a limit on the number of categories is redundant if we already limit the number of pages we process. If we need one, I think 50 is a good place to start. The limits I put in the description are flexible and I'd prefer to go higher if we can.

The issue (I believe) is that the [[ https://www.mediawiki.org/wiki/Manual:Categorylinks_table#cl_to | cl_to ]] field in the categorylinks table is a string: the actual category name. So if there are 10,000 subcategories at, say, 10 characters each, that's a big query! I don't really know this to be the core issue; it's just a theory based on some testing I did with Massviews. There I capped the categories at 5,000 and couldn't find a parent category that made it break, so we'll try going with the same.

We'll just have to do more testing to see what limitations we need to impose.
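To put a rough number on the theory above, here is a minimal Python sketch that measures the text of a SQL `IN (...)` list holding 10,000 category names of 10 characters each. The function name `in_clause_size` and the generated names are hypothetical, and this only counts characters; it says nothing about how MariaDB actually handles such a query.

```python
def in_clause_size(category_names):
    """Return the character length of a SQL IN (...) list for the given names."""
    # Each name is wrapped in single quotes (embedded quotes doubled),
    # and entries are separated by commas.
    quoted = ["'" + name.replace("'", "''") + "'" for name in category_names]
    return len("(" + ",".join(quoted) + ")")


# 10,000 hypothetical category names, each exactly 10 characters long.
names = [f"Cat_{i:06d}" for i in range(10_000)]
size = in_clause_size(names)
print(size)  # 130001 characters of query text for the list alone
```

With quotes and commas included, the list alone comes to about 130,000 characters, and real category names are often longer than 10 characters.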

@Mooeypoo @aezell @MusikAnimal I want us to estimate this ticket in our Tuesday meeting. If there is something missing in the task description, let me know before then.

This looks good, as a summary of the requirements. I think we still need to do more tests before we can settle on an implementation approach. It will probably end up being a mess. Estimate accordingly! :)

@Samwilson @Mooeypoo @MusikAnimal @MaxSem @aezell This ticket will be estimated in our Tuesday meeting. Let's discuss any potential concerns/missing information on the ticket ahead of that meeting.

I know some questions came up about filtering out some of the included subcategories. I think we should consider that for a future task and not part of this work.

I also think we can simplify this by saying that we won't show the users the subcategories that would be included. Maybe we could show them a count though?

That's a good idea, but let's move that to a future task too. It would mean adding more queries to compute the count, deciding whether to store it in the database, working out what the UI would look like, and so on.

We also want another task for giving the user a message if we stop after a certain number of categories.
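As an illustration of stopping after a certain number of categories and knowing that we stopped, here is a minimal Python sketch. It assumes an in-memory mapping `children` as a stand-in for the real subcategory lookup, and the names `collect_subcategories` and `max_categories` are hypothetical; the default of 5,000 is just the Massviews cap mentioned earlier, not a decided value.

```python
from collections import deque


def collect_subcategories(root, children, max_categories=5000):
    """Breadth-first walk of a category tree.

    Returns (categories, truncated), where truncated is True if the walk
    stopped because max_categories was reached while more remained.
    """
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child in seen:
                continue  # category graphs can contain cycles and shared children
            if len(order) >= max_categories:
                return order, True  # caller can show the "we stopped" message
            seen.add(child)
            order.append(child)
            queue.append(child)
    return order, False


# Hypothetical category graph; "B" links back to "A" to show cycle handling.
children = {"A": ["B", "C"], "B": ["D", "A"], "C": ["D"]}
full, full_truncated = collect_subcategories("A", children)
capped, capped_truncated = collect_subcategories("A", children, max_categories=2)
```

The `truncated` flag is what a later task could use to decide whether to show the user a message.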

From T194707#4492874, it seems setting a limit on the number of pages may not be that much of an issue. At least, the more efficient way to do it is to search in categories via a subquery, and MariaDB won't let you set a LIMIT in a subquery. We still need more testing, but given that it went through Category:Living people (870,000 pages) in a matter of seconds, I'm mostly confident it can handle numerous smaller categories, which is what we would expect.
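For readers unfamiliar with the subquery approach, here is a rough Python sketch using SQLite as a stand-in. It is not the query from T194707: the tables are heavily simplified versions of MediaWiki's `page` and `categorylinks` (only `page_id`, `page_title`, `cl_from`, `cl_to`, and `cl_type` are kept), the sample rows and the function `pages_in_category_tree` are hypothetical, it only goes one level of subcategories deep, and SQLite's behavior differs from MariaDB's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT);
""")
conn.executemany("INSERT INTO page VALUES (?, ?)", [
    (1, "Living_people"),  # parent category page
    (2, "Living_poets"),   # hypothetical subcategory page
    (3, "Alice"),          # hypothetical article
    (4, "Bob"),            # hypothetical article
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", [
    (2, "Living_people", "subcat"),  # Living_poets is a subcategory
    (3, "Living_people", "page"),    # Alice is directly in the parent
    (4, "Living_poets", "page"),     # Bob is only in the subcategory
])


def pages_in_category_tree(conn, parent):
    """Return page IDs in the parent category or its direct subcategories."""
    rows = conn.execute("""
        SELECT DISTINCT cl.cl_from
        FROM categorylinks AS cl
        WHERE cl.cl_type = 'page'
          AND (cl.cl_to = :parent OR cl.cl_to IN (
              SELECT p.page_title
              FROM page AS p
              JOIN categorylinks AS sub ON sub.cl_from = p.page_id
              WHERE sub.cl_to = :parent AND sub.cl_type = 'subcat'))
        ORDER BY cl.cl_from
    """, {"parent": parent})
    return [row[0] for row in rows]


page_ids = pages_in_category_tree(conn, "Living_people")
```

The subcategory names never leave the database here, which avoids building a long list of category-name strings in the query text.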

On showing users a count of the included subcategories: I recommend linking to the category wiki page, e.g. Category:Living people. That page shows the number of pages in the category and lets you browse through its subcategories.

Niharika set the point value for this task to 5. Aug 14 2018, 11:31 PM
Niharika moved this task from To Be Estimated/Discussed to Estimated on the Community-Tech board.