[Epic] Track metrics by categories
Closed, ResolvedPublic

Description

I made this task into an Epic so we can have subtasks for what we need to do here. Here's how I'm thinking of breaking it up:

  1. Frontend for supporting categories as inputs from users. (T194703: Add support for categories on the frontend)
    • Have an accordion for "Categories" under the current "Participants" accordion.
    • Tentative mock:
      [Image: image.png, tentative mock]
  2. Frontend for categories being fetched from database (T202762: Behavior on category save/pre-population)
  3. Backend for supporting categories as input for users.

Event Timeline

Niharika renamed this task from Track metrics by categories to [Epic] Track metrics by categories. May 13 2018, 3:15 AM
Niharika updated the task description.

This all seems doable. We'd need a new model for categories that belongs to an event and contains the category name and a foreign key to the event wiki. I think that's it for infrastructure. Then of course there's logic to check edits within the given categories. I did something similar for https://xtools.wmflabs.org/categoryedits and can steal the code.
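
For illustration only, the model described above could look roughly like this. It is a Python sketch, not the tool's actual code: the names `EventWiki`, `EventCategory` and `categories_for_wiki`, and all field names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EventWiki:
    """A wiki attached to an event (hypothetical shape)."""
    id: int
    event_id: int
    domain: str


@dataclass(frozen=True)
class EventCategory:
    """A category that belongs to an event and points at one event wiki."""
    id: int
    event_id: int        # the owning event
    event_wiki_id: int   # foreign key to the event wiki
    title: str           # the category name


def categories_for_wiki(categories: Iterable[EventCategory],
                        wiki: EventWiki) -> List[EventCategory]:
    """Return the categories stored against the given event wiki."""
    return [c for c in categories if c.event_wiki_id == wiki.id]
```

Keeping the wiki as a foreign key rather than a free-text field is one way to make sure a category can only be attached to a wiki the event already tracks.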

@MusikAnimal Does splitting this into two tickets - one for backend and one for frontend make sense? I'm wondering if the backend should be split further.

Yes, the frontend won't be that bad either but should be a different ticket. For backend I guess it makes sense for there to be a ticket for the model changes (mostly busy work), and another for the statistics generation. I'm going to really want test coverage for this... which will probably add a few points but will pay off given the peace of mind that our stats are correct. I don't think T186917 is necessary for this but it would help.

Okay, so here's how I'm gonna split this --

  1. Frontend
  2. Backend - model changes
  3. Backend - stats generation
  4. Adding tests

Sound good?

This sounds like something that will also be useful for the Editathon Tool (which will likely piggy-back on Grant Metrics down the road). Just so I understand: this feature will let users pull stats about all activity in a certain category or group of categories, right?

Will users also be able to restrict the search by a particular time frame?

This sounds like something that will also be useful for the Editathon Tool (which will likely piggy-back on Grant Metrics down the road). Just so I understand: this feature will let users pull stats about all activity in a certain category or group of categories, right?

Yes, they can pull all activity if they don't provide a set of participants. Else, they can specify users and the tool will pull all activity for those users on pages in those categories.

Will users also be able to restrict the search by a particular time frame?

Yes. Specifying a time frame is a restriction for creating any event. Right now there is no overall restriction about the length of the event - it can be as long or as short as you like.
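
To make the two answers above concrete, here is a rough Python sketch of the selection rule: edits must fall inside the event's time frame and on a page in one of the chosen categories, and the participant list only narrows things when it is non-empty. The `Edit` type and the `edits_for_event` function are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Edit:
    """One edit, with the categories of the edited page (hypothetical shape)."""
    user: str
    page_categories: FrozenSet[str]
    timestamp: datetime


def edits_for_event(edits: Iterable[Edit],
                    categories: Iterable[str],
                    start: datetime,
                    end: datetime,
                    participants: Optional[Iterable[str]] = None) -> List[Edit]:
    """Select edits within the time frame and categories of an event.

    With no participants, every user's activity counts; otherwise only
    edits by the listed users are kept.
    """
    wanted = set(categories)
    users = set(participants) if participants else None
    selected = []
    for edit in edits:
        if not (start <= edit.timestamp <= end):
            continue  # outside the event's time frame
        if not (wanted & edit.page_categories):
            continue  # page is in none of the event's categories
        if users is not None and edit.user not in users:
            continue  # participants were given and this user is not one
        selected.append(edit)
    return selected
```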

I think this ticket could benefit from Design attention. In particular, if we're automatically including subcategories, there is a job to be done UX-wise in informing users what subcategories just got added without them asking. If we don't provide some transparency into that, then it seems quite likely (given the whack nature of Categories) that users will get a bunch of results they don't want. E.g., a Scot interested in the category Scottish People might not want the subcategory Fictional Scottish People, but if we don't tell him it's there he won't know why his metrics include stats for the article Connor MacLeod.

Furthermore, if we start automatically providing subcategories in the search, do we need the ability to let users deselect some of those automatic subcategories? Otherwise, people would have to figure out that they're getting unwanted results, then figure out where the unwanted results came from (how?), and then deselect "include subcategories" from the main category, then manually include whatever subcategories they originally DID want, leaving out the offending subcategory. Which sounds basically impossible.

Overall, I'd split off the whole idea of automatic subcategories into a separate task, which would begin with some design explorations. @Prtksxna what do you think?

Good points, @jmatazzoni. I do worry about listing all subcategories in plain text -- there could be thousands. I would suggest simply having a link to the category itself on-wiki, that way they can browse around and see what would be included.

I should mention "Include subcategories" (either all or a depth) is consistent with existing tooling used by event organizers and outreach, such as Massviews, Petscan and TreeViews.

if we start automatically providing subcategories in the search, do we need the ability to let users deselect some of those automatic subcategories?

I don't think this has come up but it would likely be useful. Petscan for instance does allow you to enter "negative categories".

Overall, I'd split off the whole idea of automatic subcategories into a separate task

T200481

About listing the subcategories - I agree it'll be a useful feature to have. I share Leon's concern there that it can be way too many. Plus, we aren't storing them in the database so they'd have to be shown on demand which is different from everything else on the Event page so we'd have to come up with a good way to do that. Maybe a popup.

Furthermore, if we start automatically providing subcategories in the search, do we need the ability to let users deselect some of those automatic subcategories? Otherwise, people would have to figure out that they're getting unwanted results, then figure out where the unwanted results came from (how?), and then deselect "include subcategories" from the main category, then manually include whatever subcategories they originally DID want, leaving out the offending subcategory. Which sounds basically impossible.

That's a good idea Joe but I would ship the all subcategories feature first and then talk to our users to see if this is something they want before we put work into it.

If we don't provide some transparency into that, then it seems quite likely (given the whack nature of Categories) that users will get a bunch of results they don't want. E.g., a Scot interested in the category Scottish People might not want the subcategory Fictional Scottish People, but if we don't tell him it's there he won't know why his metrics include stats for the article Connor MacLeod.
Furthermore, if we start automatically providing subcategories in the search, do we need the ability to let users deselect some of those automatic subcategories?

Yeah, the users' expectations might be different from the way the categories are actually organized on the wikis. While it might be difficult (impossible?) for us to smartly suggest which categories to exclude, we should surely have a way to exclude them if the user has figured out that they want to.

In the short term, @MusikAnimal's suggestion seems appropriate and in line with the users' experience from other tools they might be familiar with:

Petscan for instance does allow you to enter "negative categories".

We'd need to look at how and where we want to surface this to the users.


That's a good idea Joe but I would ship the all subcategories feature first and then talk to our users to see if this is something they want before we put work into it.

Makes sense. I'd be happy to do a design exploration here. Or maybe, as you suggested — ship this first and then talk to users, or conduct a test to figure out if and how people need this.

! In T190463#4487089, @MusikAnimal wrote:

I do worry about listing all subcategories...there could be thousands.

Thousands sounds like we're taking the wrong approach. How many levels of subcategories are we planning to include?

I would advise not including very many, for a couple of reasons. My investigations have not been thorough. But what I've observed is that as you travel down the category tree, it's like a game of telephone: the more levels away from the original category you get, the more the original concept becomes distorted. E.g., take the category Romanticism (the theme of a recent editathon I saw a post for): five levels down, you get the category "Members of the Frankfurt parliament" (not a group known for their romantic tendencies, I presume).

Limiting the levels will also help with performance, of course. Also, the use case we're aiming at here is not that someone comes in with a broad category like Science. We are imagining more targeted categories, like Art&Feminism USA.

Three levels seems like a reasonable place to start to me. Enough to get past initial categorizing and down to the article level in most cases. And if someone wants to, they can always include a subcategory or two, then get three more levels after that.
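
As a sketch of what a depth limit could mean in practice, the snippet below walks a category tree breadth-first and stops expanding once it is three levels below the starting category. It assumes the subcategory links are already available as a plain dictionary. The function name `collect_categories`, the `subcats` mapping and the `excluded` parameter (a stand-in for the "negative categories" idea mentioned earlier in this thread) are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def collect_categories(root: str,
                       subcats: Dict[str, Iterable[str]],
                       max_depth: int = 3,
                       excluded: Iterable[str] = ()) -> Set[str]:
    """Return the root plus subcategories at most max_depth levels below it.

    Categories listed in `excluded` are skipped, along with anything that
    is only reachable through them.
    """
    skip = set(excluded)
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend past the depth limit
        for child in subcats.get(category, ()):
            if child in seen or child in skip:
                continue  # already visited (cycles happen) or excluded
            seen.add(child)
            queue.append((child, depth + 1))
    return seen
```

Starting a second call from one of the returned subcategories gives "three more levels after that", as described above.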

Niharika claimed this task.

Pretty sure yes.