Page MenuHomePhabricator

Add column for MediaWiki namespace edits to adminstats
Open, LowPublic

Description

Originally reported at: https://github.com/x-tools/xtools-legacy/issues/59

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
MusikAnimal removed a project: Internet-Archive.
MusikAnimal moved this task from Backlog to Other on the XTools board.
MusikAnimal added a subscriber: MusikAnimal.

I will admit this is not a straightforward task, especially with the new architectural design of Admin Stats. Basically everything revolves around logged actions (in the logging table). We'd have to add some special code to do an edit count on the MediaWiki namespace. I suspect this could be really slow, too.

@MusikAnimal I don't think it would be that slow - see this Quarry query - it took 0.53 seconds to see that you have made 388 edits in the MediaWiki namespace (with "AND rev_user_text = MusikAnimal") and 4.84 seconds to get the edit count of all users to have edited in the MediaWiki namespace (1309 editors, including a lot of IPs)

@MusikAnimal I don't think it would be that slow - see this Quarry query - it took 0.53 seconds to see that you have made 388 edits in the MediaWiki namespace (with "AND rev_user_text = MusikAnimal") and 4.84 seconds to get the edit count of all users to have edited in the MediaWiki namespace (1309 editors, including a lot of IPs)

It took 8 seconds on my try (maybe you ran the same query a few times and it was cached). But regardless, that's only one user. We need to check the number of interface edits from all users, and group it by user, something like:

SELECT rev_user_text, COUNT(rev_id) AS count
FROM revision_userindex
JOIN page ON rev_page = page_id
WHERE page_namespace = 8
AND rev_timestamp BETWEEN 20180101000000 AND 20190101000000
GROUP BY rev_user_text
ORDER BY count DESC

This is for the max 1-year time span, and it actually is fast enough. So from there it's just a matter of intertwining it into the data structure used for the log counts, which shouldn't be terrible.

I think this may be feasible! I'll try to take this on in the coming days, unless someone beats me to it :)

I don't think we need to limit the timespan. Also, what is the "revision_userindex" table?

For reference, my code was:

SELECT rev_user_text, COUNT(*) FROM revision
WHERE rev_page IN (SELECT page_id FROM page WHERE page_namespace = 8)
GROUP BY rev_user_text

I don't think we need to limit the timespan.

We do because the Admin Stats is given a date range, so the counting needs to also be within that date range.

Also, what is the "revision_userindex" table?

_userindex views should be used if you're filtering by user: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs