https://test.wikipedia.org/wiki/Special:GadgetUsage Looks good. :)
Please add another column for "active users".
https://test.wikipedia.org/wiki/Special:GadgetUsage Looks good. :)
|Resolved||kaldari||T21288 Gadget usage statistics|
|Resolved||Niharika||T116894 Add recently active users count to Special:GadgetUsage|
Ideally it would use user_touched somehow (?), because that counts anyone who logs in. (Manual:User_table#user_touched)
That way, we would include readers, and semi-active editors, who log in just to have access to specific preferences/gadgets/tools/watchlist/customizations/notifications/etc.
If that isn't possible, then $wgActiveUserDays is good.
That's not too bad. It would add a few false positives, but would be including the people (non-editing readers and site-preference-users) whom we really do want to include, which $wgActiveUserEditCount wouldn't include.
Ah, right. That would be terrible for stats purposes then. Oh well. :(
@NiharikaKohli: It looks like Special:ActiveUsers stores the currently active users in the querycachetwo table (where qcc_type = 'activeusers'). The qcc_title field is the username. So you'll need to join with the querycachetwo table and do a count of all the usernames that are both gadget users and activeusers. This will be a fairly complicated query though, so you may have to find someone with serious SQL kung-fu to help (like @aaron or @jcrespo).
SELECT up_property, SUM(up_value) FROM user_properties WHERE up_property LIKE 'gadget-%' GROUP BY up_property
Now we want to add an additional column to the output for the number of 'active users' per gadget, where an active is user is someone who's made an edit in the last X days. This list of users is computed by another special page and stored in querycachetwo table, as pointed out by @kaldari above (it stores the user name only and not the user id) and I'm unable to figure out the right query for doing that. Help?
Joining on the revision table is another option, but it's likely to be costlier?
He're what I've got. It's not terribly fast, but it works:
SELECT up_property, SUM(up_value) FROM user_properties INNER JOIN ( SELECT rev_user FROM revision_userindex WHERE rev_timestamp BETWEEN "201510" AND "201511" GROUP BY rev_user HAVING COUNT(rev_id) >= 5 ) AS active_editor ON rev_user = up_user WHERE up_property LIKE 'gadget-%' GROUP BY up_property;
SELECT up_property, SUM(up_value) FROM user_properties WHERE up_user IN ( SELECT rev_user FROM revision WHERE rev_timestamp BETWEEN "201510" AND "201511" GROUP BY rev_user HAVING COUNT(rev_id) >= 5 ) AND up_property LIKE 'gadget-%' GROUP BY up_property;
I'm not sure which version is faster. I suppose we can substitute in the querycachetwo table, but the computation of active editors in the last month is actually the fastest part of the query (5 seconds in my tests) because it uses the user_timestamp index on revision and doesn't even need to read the table.
This query has the following assumptions:
- up_value = 0 means disabled, 1 means enabled
- querycachetwo reliably records the active users
- There is an existing user owning every existing user_property (do not ask me why, but this is false)
- I will not allow this to run on real time on page load, it is too expensive for medium and large size wikis. Here it is the results on enwikivoyage:
MariaDB DBSTORE localhost enwikivoyage > SELECT up_property, SUM(up_value), count(qcc_title) FROM user_properties LEFT JOIN user ON up_user = user_id LEFT JOIN querycachetwo ON user_name = qcc_title AND qcc_type = 'activeusers' AND up_value = 1 WHERE up_property LIKE 'gadget-%' GROUP BY up_property; +------------------------------+---------------+------------------+ | up_property | SUM(up_value) | count(qcc_title) | +------------------------------+---------------+------------------+ | gadget-Carousel | 0 | 0 | | gadget-DotsSyntaxHighlighter | 7 | 3 | | gadget-JS_toolset | 74 | 7 | | gadget-ListingEditor | 0 | 0 | | gadget-ListingEditor2 | 7 | 2 | | gadget-MapFrame | 0 | 0 | | gadget-RTRC | 109 | 15 | | gadget-RenameWizard | 0 | 0 | | gadget-UTCLiveClock | 164 | 10 | | gadget-edittop | 70 | 8 | | gadget-popups | 182 | 10 | | gadget-vector-headanchor | 33 | 1 | | gadget-wikEd | 102 | 4 | +------------------------------+---------------+------------------+ 13 rows in set (0.07 sec)
Ideally/eventually, we hope to be able to track changes over time (monthly stat dumps to csv?), and have an indication of which gadgets are on by default, and a few other parameters, too (per my and He7d3r's musings in T21288#240675 and below).
Re: "expensive", I saw the description at T115152 says
It would probably be an expensive query (at least on larger wikis), so it should have isExpensive() return true. See /includes/specials/SpecialMediaStatistics.php for example.
does that help at all?
Also, the data is cached for at least 24 hours, per the top-note at e.g. https://www.mediawiki.org/wiki/Special:GadgetUsage
(Mostly just in case you hadn't seen either of those. I know almost nothing of databases! :)
@jcrespo: Thanks, that's exactly what we need! A few replies to your comments...
up_value = 0 means disabled, 1 means enabled
This is correct.
querycachetwo reliably records the active users
This seems pretty much correct to me. The info is cached, but the cache gets updated every time a user makes an edit. We could check the querycache_info table first to make sure there is an entry with qci_type'= 'activeusers' (and possibly check the cache staleness as well). @aaron: Do you know of any reason we wouldn't be able to rely on querycachetwo having up to date active user data? Is there any check we should perform first?
There is an existing user owning every existing user_property (do not ask me why, but this is false)
Noted, although I'm hoping this is a rare case.
I will not allow this to run on real time on page load, it is too expensive for medium and large size wikis.
Is it OK as an expensive QueryPage (what the page is already marked as)? In theory we could skip joining against the user table if RecentChangesUpdateJob::updateActiveUsers() were changed to store the userID in the qcc_namespacetwo field. Right now it just stores 0 in all cases. @aaron: Would that be a terrible hack or a great idea? How often are the querycache fields abused for things other than their stated field names?
@NiharikaKohli: No, can't rename the table column since that table is already being used for tons of other stuff, and in most cases it is actually storing a second namespace (see https://en.wikipedia.org/wiki/Special:DisambiguationPageLinks for example). I have no idea why Special:ActiveUsers is caching results in querycachetwo instead of querycache though, as it's not making use of the 2 extra columns at all (which is the only difference in the two tables). As long as @jcrespo is OK with this running as a properly designated expensive query, I think we should go with the query that he suggested.
Excuse my lack of knowledge of the mediawiki terminology regarding Special pages. The query I sent is hackish, but I would be ok with it running with cron on terbium, but not on every page reload (Is that what you mean with "expensive query"?). It can run in real time (for each http request) for wikis ~ smaller in number of users than enwikivoyage.
@jcrespo: The QueryPage class (which is a special subclass of SpecialPage specifically tailored for executing database queries) has a method called IsExpensive(). If IsExpensive() returns true (as it is hardcoded to do in this case), and the wiki is set to run in "MiserMode" (https://www.mediawiki.org/wiki/Manual:$wgMiserMode), as all WMF wikis are, the actual database query for the special page is only run periodically and the results are normally pulled from a database cache (typically the querycache table). I don't remember what controls the expiry of expensive QueryPage caches, but I imagine @aaron does.
As a quick benchmark for how this affects performance of the query, I ran ...
mwscript maintenance/updateSpecialPages.php --wiki=frwiktionary --only=GadgetUsage
around 01:45 UTC today (before the code change) and will try running it again tomorrow for comparison.
Today's result was:
GadgetUsage [QueryPage] got 74 rows in 0.77s
Running the new query on French Wiktionary took about 3 times longer:
GadgetUsage [QueryPage] got 74 rows in 2.67s
Running the new query on Commons took quite a while, however:
GadgetUsage [QueryPage] SlowTimer [3486901ms] got 113 rows in 58m 6.96s