
Investigation: Fix Mr.Z-bot's popular pages report
Closed, Resolved · Public · 3 Story Points

Description

Background info at T141154.
Existing code at https://github.com/alexz-enwp/popularpages
Existing interface at http://tools.wmflabs.org/popularpages/
Sample report: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spiders/Popular_pages

Questions to answer:

  • Would it be better to work off of the existing code in GitHub or start from scratch?
  • Does Mr.Z-man want to be involved in this project?
  • Should we use the new Pageviews API to get the pageview data?
  • Should we use PageAssessments for the assessment data? If so, do any changes need to be made to PageAssessments to have feature parity with the existing reports?
  • What improvements can be made to the tool (besides fixing it)?

Event Timeline

kaldari created this task. Dec 20 2016, 9:02 PM
Restricted Application added a subscriber: Aklapper. Dec 20 2016, 9:02 PM
kaldari triaged this task as Normal priority. Dec 20 2016, 9:02 PM
kaldari moved this task from Untriaged to To be estimated/discussed on the Community-Tech board.
kaldari set the point value for this task to 3. Dec 20 2016, 10:24 PM
kaldari moved this task from To be estimated/discussed to Estimated on the Community-Tech board.

Heard back from Alex. He says that he thinks rewriting the tool makes sense. He also says that he might be willing to be involved, just not as the primary maintainer.

Should we use PageAssessments for the assessment data? If so, do any changes need to be made to PageAssessments to have feature parity with the existing reports?

Mr.Z-bot's popular pages report currently supports Task Forces (basically WikiProject subprojects), but PageAssessments does not. Supporting them would probably require adding a new column to the page_assessments_projects table. The column would either be a boolean indicating whether a "project" is actually a task force (these are handled differently by the WikiProject templates, and we probably don't want to just lump them all together) or record the "parent" of the task force (as a project ID). The template code change would look something like this: https://en.wikipedia.org/w/index.php?title=Template%3AWPBannerMeta%2Ftaskforce%2Fsandbox&type=revision&diff=750206083&oldid=748630796.
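To make the "parent" option concrete, here is a minimal sketch using an in-memory SQLite database. It is not the actual PageAssessments schema: the column names (including the assumed new `pap_parent_id` column), the sample rows, and the `task_forces_of` helper are all hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical, simplified stand-in for page_assessments_projects.
# pap_parent_id is the assumed new column: NULL for a top-level
# WikiProject, otherwise the project ID of the task force's parent.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE page_assessments_projects (
        pap_project_id    INTEGER PRIMARY KEY,
        pap_project_title TEXT NOT NULL,
        pap_parent_id     INTEGER NULL
    )
    """
)

# Illustrative rows only; these are not real assessment data.
conn.executemany(
    "INSERT INTO page_assessments_projects VALUES (?, ?, ?)",
    [
        (1, "Medicine", None),
        (2, "Medicine/Cardiology task force", 1),
        (3, "Spiders", None),
    ],
)


def task_forces_of(db, parent_title):
    """Return the titles of task forces whose parent has the given title."""
    rows = db.execute(
        """
        SELECT child.pap_project_title
        FROM page_assessments_projects AS child
        JOIN page_assessments_projects AS parent
          ON child.pap_parent_id = parent.pap_project_id
        WHERE parent.pap_project_title = ?
        ORDER BY child.pap_project_title
        """,
        (parent_title,),
    ).fetchall()
    return [title for (title,) in rows]


def top_level_projects(db):
    """Return the titles of projects that are not task forces."""
    rows = db.execute(
        """
        SELECT pap_project_title
        FROM page_assessments_projects
        WHERE pap_parent_id IS NULL
        ORDER BY pap_project_title
        """
    ).fetchall()
    return [title for (title,) in rows]
```

Recording the parent ID rather than a boolean flag keeps the "is this a task force?" check available (the parent is non-NULL) while also allowing a report to group task forces under their project.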

Niharika claimed this task. Jan 3 2017, 1:09 PM
Niharika edited projects, added Community-Tech-Sprint; removed Community-Tech.
Niharika moved this task from Ready to In Development on the Community-Tech-Sprint board.
  • Would it be better to work off of the existing code in GitHub or start from scratch?
    • I spent a good while trying to understand the existing code but it's beyond comprehension, pretty much. Without any documentation, it's almost impossible to understand what's going on. As far as I know, the bot uses dumps to create and populate tables which are then queried for the reports.
  • Does Mr.Z-man want to be involved in this project?
    • Per Ryan's comment above, yes, to some extent.
  • Should we use the new Pageviews API to get the pageview data?
    • This seems to be the logical way to do it. The pageviews API does not yet have the data for Wikiproject pages. According to T141010: Adding top counts for wiki projects (ex: WikiProject:Medicine) to pageview API this is probably going to happen this quarter. A comment in the ticket says that this will probably give back total views for top 1000 pages or so, which might not be perfect for our needs.
    • Using the API will be more accurate than the dumps, most likely.
    • A comment also mentions that this will work by following how enwiki names its WikiProjects, for example, "Wikiproject:Medicine". This might cause an issue with how other projects title their WikiProjects. I am wondering if there's a way to standardize the term "Wikiproject" across projects, much like "User" or "Template" are. Maybe a way to do this is using Wikidata (Wikimedia Portal)
    • Using the API will also provide us with more granular data - desktop and mobile views.
  • Should we use PageAssessments for the assessment data? If so, do any changes need to be made to PageAssessments to have feature parity with the existing reports?
    • We probably should. The PageAssessments API and database could prove useful. Besides the point made in T153790#2897180, I couldn't find anything else that'd need to change.
  • What improvements can be made to the tool (besides fixing it)?
    • Availability for all projects (not just enwiki)
    • Granular data (mobile view stats)
    • Data over a longer period of time, a year perhaps depending on whether pageviews API can provide that data (Requested on the author's talk page)
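As a rough illustration of the desktop/mobile breakdown mentioned above, the sketch below builds request URLs for the Pageviews API's per-article endpoint and sums the views in a response. The helper names (`per_article_url`, `total_views`) and the sample payload are hypothetical, and no network request is made here; the URL layout and the `items`/`views` response fields reflect the public REST API as I understand it and should be checked against its documentation.

```python
from urllib.parse import quote

# Base of the per-article endpoint of the Wikimedia Pageviews REST API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL.

    access is one of "all-access", "desktop", "mobile-web", "mobile-app";
    start and end are dates in YYYYMMDD form.
    """
    # Titles use underscores instead of spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


def total_views(payload):
    """Sum the "views" field over all items of a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))


# Made-up payload in the shape the endpoint returns; not real data.
sample_payload = {
    "items": [
        {"article": "Jumping_spider", "timestamp": "2016120100", "views": 120},
        {"article": "Jumping_spider", "timestamp": "2016120200", "views": 95},
    ]
}

desktop_url = per_article_url(
    "en.wikipedia", "Jumping spider", "20161201", "20161231", access="desktop"
)
mobile_url = per_article_url(
    "en.wikipedia", "Jumping spider", "20161201", "20161231", access="mobile-web"
)
```

Fetching one URL per article and access type would multiply the number of requests for a large WikiProject, which is one reason a per-project endpoint such as the one proposed in T141010 is attractive.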

Next steps:

  1. We should push for T141010: Adding top counts for wiki projects (ex: WikiProject:Medicine) to pageview API to start with. It has gone a couple of quarters without any work, so it's likely low-priority for Analytics for now. We need to discuss whether the best it can do is provide the top 1000 pages (with continuation or not?) and, if so, whether we have a way of directly accessing the database and whether that would be any better.
  2. Second, we need to think about whether this makes more sense as an extension (Special page) which would work much like the tool. Currently the bot populates a subpage for each project that the bot is configured to run on. This functionality would then no longer be available. Is that a good or a bad thing?
    • The extension would have a config flag to turn it on/off depending on which wikis support wikiprojects.
  3. After that we can proceed to changing PageAssessments DB accordingly.

A comment in the ticket says that this will probably give back total views for top 1000 pages or so, which might not be perfect for our needs.

I don't think that will be an issue. The existing reports are limited to 1000 pages maximum with a default of 500.

Stevietheman added a comment. Edited · Jan 5 2017, 11:45 PM

I think it should stay on WikiProject subpages, generated monthly as it is now (I think that's what survey voters expected to keep, not have this product changed into something else, noting that changes in the program that generated it probably don't matter to the casual reader). A key to this report's value is its production _for_ WikiProjects. And I want to be able to point people to a page in projects I work for, not a separate tool page.

For WikiProject Louisville, I was using the report for outreach to the Louisville community, and it was generating real interest in Louisville-related articles. Also, I think folks were gradually incorporating its results into project processes. I sure know I was. Having this report appear in the format of our project, inside our project, I think is important for these things.

I also concur with DanielPenfield's comments in the 2016 Community Wishlist Survey proposal.

Stevietheman added a comment. Edited · Jan 6 2017, 12:06 AM

As for improvements, Niharika's ideas look useful to consider. I also submitted one on March 15, 2015 that was never acted upon. It's basically to show changes from month to month for each entry, like Billboard charts do. Obviously, the first monthly generation wouldn't show these changes.
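One way the month-to-month comparison could work is sketched below, assuming each monthly report is available as a list of page titles ordered by views. The function name `rank_changes` and the sample titles are hypothetical; a new entry gets `None` instead of a change, which also covers the first monthly generation (pass an empty list as the previous month).

```python
def rank_changes(previous, current):
    """Compare two ranked lists of page titles.

    Returns (rank, title, change) tuples for the current month, where
    change is positive if the page moved up, negative if it moved down,
    0 if unchanged, and None if it was not in the previous list.
    """
    # Map each title to its 1-based rank in the previous month.
    prev_rank = {title: rank for rank, title in enumerate(previous, start=1)}
    rows = []
    for rank, title in enumerate(current, start=1):
        old = prev_rank.get(title)
        change = None if old is None else old - rank
        rows.append((rank, title, change))
    return rows


# Illustrative titles only; not taken from a real report.
last_month = ["Page A", "Page B", "Page C"]
this_month = ["Page B", "Page A", "Page D"]
report = rank_changes(last_month, this_month)
```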

kaldari closed this task as Resolved. Edited · Jan 6 2017, 10:52 PM

@Stevietheman: Thanks for sharing your thoughts. I like the monthly changes idea and keeping it as a monthly static report seems reasonable.

Here's a Billboard chart for reference (you may want to turn down your speaker volume first): http://www.billboard.com/charts/hot-100.

DannyH moved this task from Estimated to Archive on the Community-Tech board. Jan 17 2017, 11:07 PM