Fix page count by namespace report
Closed, ResolvedPublic

Description

Report for page count per namespace
Ref: https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Page_count_by_namespace

  • Code merged
  • Cronjob setup
  • Tested

Status: Fixed.
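
For readers unfamiliar with the report, here is a minimal sketch of the kind of query involved. It is not the code that was merged for this task. It assumes MediaWiki's `page` table with its `page_namespace` and `page_is_redirect` columns, uses an in-memory SQLite database with made-up rows as a stand-in for a wiki replica, and the function name `page_count_by_namespace` is hypothetical. Splitting counts by redirect status is an illustrative choice, not necessarily what the report does.

```python
import sqlite3

# In-memory stand-in for a wiki database; only the two columns used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_namespace INTEGER, page_is_redirect INTEGER)")
conn.executemany(
    "INSERT INTO page VALUES (?, ?)",
    [(0, 0), (0, 0), (0, 1), (1, 0), (2, 0), (2, 1)],  # made-up sample rows
)


def page_count_by_namespace(connection):
    """Return (namespace, is_redirect, count) rows, ordered by namespace."""
    query = """
        SELECT page_namespace, page_is_redirect, COUNT(*)
        FROM page
        GROUP BY page_namespace, page_is_redirect
        ORDER BY page_namespace, page_is_redirect
    """
    return connection.execute(query).fetchall()


rows = page_count_by_namespace(conn)
for namespace, is_redirect, count in rows:
    print(namespace, is_redirect, count)
```

Against a real wiki the same query would run on the replica database instead of SQLite, and the namespace numbers would still need to be mapped to names before publishing.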

Event Timeline

Niharika claimed this task.
Niharika raised the priority of this task to Medium.
Niharika updated the task description. (Show Details)
Niharika moved this task from New & TBD Tickets to Ready on the Community-Tech board.
Niharika moved this task from Ready to Needs Review/Feedback on the Community-Tech board.
Niharika added subscribers: kaldari, Legoktm, Krenair and 3 others.

Looks good. Here are my suggestions:

{{done}} https://test.wikipedia.org/wiki/Page_count_by_namespace

kaldari renamed this task from Page count by namespace report to Fix page count by namespace report. Aug 28 2015, 11:26 PM
kaldari closed this task as Resolved.
kaldari moved this task from Needs Review/Feedback to Done on the Community-Tech board.

Actually, I guess we have to complete the Bot permission request and start running it before this is technically finished.

Can we keep all the database reports code in the same repository (https://github.com/mzmcbride/database-reports)? I or @MZMcBride can give you access to the git repo and tool labs tool so you can set up and fix reports.

We already have a standardized class system for reports, similar to the QueryPage abstraction in MediaWiki; see the reports in https://github.com/mzmcbride/database-reports/tree/master/reports. It would be nice if future reports also followed that format.

That makes sense, and that was my original intention as well, but I didn't find any helpful documentation to ease my job. Can you add some documentation to the repo? Especially how to run and test specific scripts?

@MZMcBride, @Legoktm, I have some questions about the labs setup:

  1. There's a file bin/dbreps which apparently does all the parsing and formatting work for a report. Why is this file not a part of the main database-reports directory? There's a recurring error about it not being able to concatenate strings and ints/longs which I'd like to fix in that file once and for all.
  2. I want to test on wikis other than enwiki. How do I do that? What is the data/project/enwiki folder for? It was a prerequisite when I tried running my report on testwiki.
  3. What is the purpose of ~/.dbreps.ini and what do the fields dumpdate and userdb mean?

In T110575#1606937, @NiharikaKohli wrote:

@MZMcBride, @Legoktm, I have some questions about the labs setup:

  1. There's a file bin/dbreps which apparently does all the parsing and formatting work for a report. Why is this file not a part of the main database-reports directory? There's a recurring error about it not being able to concatenate strings and ints/longs which I'd like to fix in that file once and for all.

That file is in the repository: https://github.com/mzmcbride/database-reports/blob/master/dbreps. I think you'll need to run some version of setup.py install to update the copy at bin/dbreps.
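
The concatenation error mentioned in the question is what Python raises when a string is joined to an int or long with `+`. A minimal sketch of the usual fix follows, assuming the formatter builds wikitext table rows. The function `format_row` is hypothetical and is not taken from dbreps.

```python
def format_row(cells):
    """Join cells into one wikitext table row, converting each cell to text first."""
    # str() handles ints and longs, so "+" only ever sees strings.
    return "| " + " || ".join(str(cell) for cell in cells)


row = format_row([1, "Talk", 12345])
print(row)
```

Converting every cell in one place means individual reports do not each have to remember to cast their counts.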

  2. I want to test on wikis other than enwiki. How do I do that? What is the data/project/enwiki folder for? It was a prerequisite when I tried running my report on testwiki.

Reports of the general type take the database name as an argument, IIRC. I'm not sure what data/project/enwiki is for...

  3. What is the purpose of ~/.dbreps.ini and what do the fields dumpdate and userdb mean?

It has the bot's username/password and some other configuration. Those are for scripts that read dump files instead of using a database. Grepping for "dumpdate" only shows one script that uses it though.
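
As a rough illustration of how a script might read such a file, here is a sketch using Python's standard `configparser`. Only the field names `dumpdate` and `userdb` come from this thread. The section name `dbreps`, the `username` and `password` keys, and all values are assumptions made up for the example, and the real file layout may differ.

```python
import configparser

# Made-up sample standing in for the contents of ~/.dbreps.ini.
SAMPLE = """
[dbreps]
username = ExampleBot
password = not-a-real-password
dumpdate = 20150901
userdb = u_example
"""


def load_settings(text):
    """Parse INI text and return the (assumed) dbreps section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["dbreps"])


settings = load_settings(SAMPLE)
print(settings["dumpdate"], settings["userdb"])
```

A real script would call `parser.read()` on the expanded path of `~/.dbreps.ini` rather than parsing an inline string.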

Looks good. Here are my suggestions:

  • Remove the 'No.' column as I don't see any useful purpose for it (although I know it is in the original version of the report)

For what it's worth, many/most database reports have this No. column. It's just basic enumeration, but that itself can be helpful to get an understanding of the size of a data set or to point to a particular row. This simple enumeration is common in spreadsheet programs and elsewhere, of course.

Actually, I guess we have to complete the Bot permission request and start running it before this is technically finished.

Yeah, https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Page_count_by_namespace should be updated before this task is considered resolved, in my opinion.

@MZMcBride I could see the No. column being useful for some reports, but in reports like this one it is just confusing. For example, I first thought that it was the number of the namespace. It also isn't very useful in reports that are already limited to a specific number of entries.

Bot request approved. Re-activated the cron job and updated the edit summary.