Implement "actual watchers" count into MediaWiki's info action
Closed, Resolved (Public)

Description

Add an "active watchers" count to MediaWiki's info action.

Not quite sure how we'd do this, but it seems potentially useful. On older wikis, the number of page watchers stat can quickly become meaningless without further context (i.e., a number in a vacuum doesn't mean much). If we limited the count to "active" users (defined by having made an edit or action in the past 30 days, I suppose), it might be more helpful.

From a suggestion here: https://en.wikipedia.org/w/index.php?title=User_talk:MZMcBride&oldid=559641049#Number_of_watchers.

Description of functionality now available in 1.26: https://www.mediawiki.org/wiki/MediaWiki_1.26#Information_of_actual_watchers_of_a_page


Version: 1.22.0
Severity: enhancement

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz49506.
MZMcBride created this task.Via LegacyJun 12 2013, 11:29 PM
Schnark added a comment.Via ConduitJun 13 2013, 9:49 AM

Just for reference: Dispenser's toolserver tool does this, too: https://toolserver.org/~dispenser/cgi-bin/watcher.py?page=de:Wikipedia:Hauptseite

Whatamidoing-WMF added a comment.Via ConduitJul 12 2013, 3:00 AM

IMO the older the public wiki projects get, the more important this will become.

Nemo_bis added a comment.Via ConduitJan 21 2014, 8:22 AM

I was going to file this today, :) I think it's a high-impact feature which should be high priority.
I hope it will be rather easy to implement: with [[mw:Manual:$wgShowUpdatedMarker]] enabled, we know exactly what's the last revision each watching user visited.

Requirements:

  • count the "watching watchers", i.e. users with the page in watchlist who visited it in the last 30 days;
  • add the count as "Number of actual watchers" under "Number of page watchers" in action=info;
  • hide it if it's lower than 30, unless the user has "unwatchedpages" permission.

We can adjust the numbers with later bugs, possibly reusing some config if an appropriate one is found ($wgRCMaxAge is not ok, can be years if one doesn't have database constraints).
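
The display rule in the requirements above can be sketched roughly as follows. This is an illustrative Python sketch, not MediaWiki code: the function name and the way user rights are passed in are hypothetical, and only the threshold of 30 and the "unwatchedpages" right come from the list above.

```python
from typing import Optional, Set

MIN_VISIBLE_COUNT = 30  # threshold taken from the requirements above


def visible_watcher_count(count: int, user_rights: Set[str]) -> Optional[int]:
    """Return the count to display, or None if it should be hidden."""
    # Users holding the "unwatchedpages" right always see the real number.
    if "unwatchedpages" in user_rights:
        return count
    # Everyone else only sees counts at or above the threshold.
    if count < MIN_VISIBLE_COUNT:
        return None
    return count


print(visible_watcher_count(12, set()))
print(visible_watcher_count(12, {"unwatchedpages"}))
```

Returning None rather than 0 keeps "hidden" distinguishable from "no watchers" for whatever renders the row.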

John_of_Reading added a comment.Via ConduitOct 3 2014, 8:48 PM

@Nemo - If user A is watching page B, I don't think it should matter whether user A has *visited* page B recently. What's important is whether user A is still active at the site, and therefore is likely to notice edits to page B when they show up via the watchlist.

I have many pages in my watchlist that I don't visit, but if they show up in my watchlist I'll check their recent history using popups.

Does the software track the last time that each user displayed his/her watchlist?

Nemo_bis added a comment.Via ConduitOct 3 2014, 8:55 PM

(In reply to john_of_reading from comment #4)

> @Nemo - If user A is watching page B, I don't think it should matter whether
> user A has *visited* page B recently. What's important is whether user A is
> still active at the site, and therefore is likely to notice edits to page B
> when they show up via the watchlist.

Sure. And I think this likelihood correlates more with actual visits.

> I have many pages in my watchlist that I don't visit, but if they show up in
> my watchlist I'll check their recent history using popups.

And what does this say about the extent to which you notice edits there? There is another bug about making action=history visits count as visits btw.

> Does the software track the last time that each user displayed his/her
> watchlist?

No.

Whatamidoing-WMF added a comment.Via ConduitOct 3 2014, 11:06 PM

The problem with counting only people who visited the page in the last 30 days is that many pages aren't edited every 30 days, and thus even though I, as an active user keenly interested in that page, will definitely see each and every change ever made to that page, possibly within minutes, there may have been no reason at all for me to visit that low-traffic, low-edit page in the last year (much less in the last 30 days).

I've got many pages on my watchlist that average one or two edits per year. The fact that they rarely appear in my watchlist does not mean that I would not notice them being edited.

Nemo_bis added a comment.Via ConduitOct 4 2014, 6:12 AM

That's easy to fix: step 1 in comment 3 becomes "check whether there are recent unvisited edits and, if so, how old and how many they are". We could discuss what's the most sensible filter for 60 more comments but the reality is that, if done, this will be done at first with the simplest filter possible for performance reasons and then improved in later steps.

Nemo_bis awarded a token.Via WebDec 12 2014, 8:21 AM
gerritbot added a subscriber: gerritbot.Via ConduitMar 2 2015, 4:17 PM

Change 193838 had a related patch set uploaded (by Nemo bis):
Attempt to count actual watchers in the info action

https://gerrit.wikimedia.org/r/193838

gerritbot added a project: Patch-For-Review.Via ConduitMar 2 2015, 4:17 PM
Ricordisamoa added a subscriber: Ricordisamoa.Via WebMar 2 2015, 4:30 PM
Nemo_bis added a comment.EditedVia WebMar 2 2015, 4:37 PM

No other implementation was proposed yet, so I went ahead and implemented what I had mentioned above.

Given the concerns above that one may not visit the page despite seeing the edit summary, for now I used 6 months of "absence" from the page as the threshold: someone who doesn't visit an updated page for that long is very unlikely to be following it. It would be simpler to just use $wgRCMaxAge though, because 1) the "recent editors" count and others do the same, and 2) when one waits more than $wgRCMaxAge to check an edit, that edit disappears from recent changes and the watchlist and is unlikely ever to be seen.

That sort of query is very simple and on a page with about a hundred watchers it takes 600 ms on translatewiki.net (thanks Nikerabbit):

MariaDB [translatewiki_net]> explain SELECT count(*) FROM bw_watchlist WHERE wl_namespace = 0 AND wl_title = 'Support' and wl_notificationtimestamp <= '20150101000000';
+------+-------------+--------------+------+-----------------+-----------------+---------+-------------+------+------------------------------------+
| id   | select_type | table        | type | possible_keys   | key             | key_len | ref         | rows | Extra                              |
+------+-------------+--------------+------+-----------------+-----------------+---------+-------------+------+------------------------------------+
|    1 | SIMPLE      | bw_watchlist | ref  | namespace_title | namespace_title | 261     | const,const |   66 | Using index condition; Using where |
+------+-------------+--------------+------+-----------------+-----------------+---------+-------------+------+------------------------------------+
1 row in set (0.00 sec)
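
For readers who want to try the idea outside MediaWiki, here is a minimal sketch of such a count using Python's built-in sqlite3 module. It is not the merged patch. It assumes, as I understand the watchlist table manual, that wl_notificationtimestamp is NULL when the user has seen every change and otherwise holds the timestamp of the first unseen change. The simplified table, the sample rows and the function name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for the real watchlist table.
conn.execute(
    "CREATE TABLE watchlist ("
    " wl_user INTEGER, wl_namespace INTEGER, wl_title TEXT,"
    " wl_notificationtimestamp TEXT)"
)
rows = [
    (1, 0, "Support", None),              # has seen every change
    (2, 0, "Support", "20150301000000"),  # unseen edit, newer than threshold
    (3, 0, "Support", "20140601000000"),  # unseen edit, older than threshold
    (4, 0, "Other", None),                # watches a different page
]
conn.executemany("INSERT INTO watchlist VALUES (?, ?, ?, ?)", rows)


def count_watchers(conn, namespace, title, threshold):
    """Return (all watchers, watchers with no unseen edit older than threshold)."""
    total, stale = conn.execute(
        "SELECT COUNT(*),"
        " COALESCE(SUM(wl_notificationtimestamp <= ?), 0)"
        " FROM watchlist WHERE wl_namespace = ? AND wl_title = ?",
        (threshold, namespace, title),
    ).fetchone()
    return total, total - stale


total, actual = count_watchers(conn, 0, "Support", "20150101000000")
print(total, actual)
```

Rows with a NULL timestamp never match the comparison, so they stay in the second number; only watchers with an unseen edit older than the threshold are subtracted.
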
Liuxinyu970226 added a subscriber: Liuxinyu970226.Via WebMar 31 2015, 1:43 AM
Nemo_bis added a comment.Via WebMay 12 2015, 6:05 PM

valhallasw pointed out https://www.mediawiki.org/wiki/Manual:Watchlist_table#wl_notificationtimestamp didn't agree with my understanding of wl_notificationtimestamp; I (hopefully) corrected the manual, noting also very recent updates in 1.26 (T91284).

The main question is still what threshold to use. Maybe keep 6 months, but add a separate configuration setting to control it?

Reedy added a subscriber: Reedy.EditedVia WebMay 12 2015, 6:22 PM

For @Nemo_bis per IRC

mysql:wikiadmin@db1052 [enwiki]> explain SELECT count(*) FROM watchlist WHERE wl_namespace = 4 AND wl_title = 'Administrators\'_noticeboard/Incidents' and wl_notificationtimestamp <= '20141112000000';
+------+-------------+-----------+------+-----------------+-----------------+---------+-------------+-------+------------------------------------+
| id   | select_type | table     | type | possible_keys   | key             | key_len | ref         | rows  | Extra                              |
+------+-------------+-----------+------+-----------------+-----------------+---------+-------------+-------+------------------------------------+
|    1 | SIMPLE      | watchlist | ref  | namespace_title | namespace_title | 261     | const,const | 14048 | Using index condition; Using where |
+------+-------------+-----------+------+-----------------+-----------------+---------+-------------+-------+------------------------------------+
1 row in set (0.00 sec)

mysql:wikiadmin@db1052 [enwiki]>
Nemo_bis added a comment.Via WebMay 12 2015, 6:49 PM

Thanks! Scanning 14048 rows for such an extreme case sounds very reasonable; it looks like there are no performance issues.

Glaisher added a subscriber: Glaisher.Via WebMay 15 2015, 12:58 PM
Nemo_bis added a subscriber: Darenwelsh.Via WebMay 24 2015, 1:25 PM

Semi-relatedly, NASA at T100061 just showed us Special:WatchAnalytics, which would be useful for wiki-wide information.

Dodger67 added a subscriber: Dodger67.EditedVia WebJun 19 2015, 8:09 AM

See https://en.wikipedia.org/wiki/Wikipedia_talk:Database_reports/WikiProject_watchers#Active_watchers_again
That page used to be updated by a bot on Toolserver. The criterion it used for "active watcher" was simply users who have logged in at least once in the previous 30 days. There is no need to see if the user visited or edited any particular page; the bot simply determined how many of the total page watchers are still active Wikipedians.
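
A login-based criterion like the one described above could be sketched as follows. This is only an illustration in Python, not the bot's code: the function name, the user names and the last-login mapping are hypothetical, and only the 30-day window comes from the description.

```python
from datetime import datetime, timedelta


def count_active_watchers(watchers, last_login, now, window_days=30):
    """Count watchers whose last recorded login falls within the window."""
    cutoff = now - timedelta(days=window_days)
    return sum(
        1 for user in watchers
        if user in last_login and last_login[user] >= cutoff
    )


now = datetime(2015, 6, 19)
last_login = {
    "Alice": datetime(2015, 6, 10),  # logged in within the window
    "Bob": datetime(2014, 1, 5),     # last login long ago
}
watchers = ["Alice", "Bob", "Carol"]  # Carol has no recorded login
print(count_active_watchers(watchers, last_login, now))
```

Note that this says nothing about whether a user looked at the page or the watchlist, only that the account was used recently.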

Nemo_bis claimed this task.Via WebJun 19 2015, 10:56 AM
Nemo_bis removed a project: Community-Tech.
Nemo_bis set Security to None.
Sitic added a subscriber: Sitic.Via WebJun 19 2015, 11:39 AM
Harej awarded a token.Via WebJun 25 2015, 4:03 PM
Harej added a subscriber: Harej.Via WebJun 25 2015, 4:07 PM

I think the criterion of logging in at least once in the past 30 days is a reasonable criterion. I am wary of referring to such users as "active users" however since "active users" is a WMF analytics term for users making 5+ edits in the past 30 days.

> I think the criterion of logging in at least once in the past 30 days is a reasonable criterion.

How is the last time the user logged in tracked on WMF wikis? Note that WMF wikis have central auth, which means a user may be active in one project while still having watched pages on other projects that he no longer visits. This may count some watchers as "active" when they aren't, if the login action on one wiki propagates the last login timestamp to all wikis.

This also may interfere with T68699: Increase "remember me" login cookie expiry from 30 days to 1 year on Wikimedia wikis

Bawolff added a subscriber: Bawolff.Via WebWed, Jul 1, 10:25 AM

> I think the criterion of logging in at least once in the past 30 days is a reasonable criterion. I am wary of referring to such users as "active users" however since "active users" is a WMF analytics term for users making 5+ edits in the past 30 days.

Not to mention Special:ActiveUsers having a different definition from either of them.

Logging in does not necessarily imply looking at your watchlist. I log in to Wikipedia all the time. I edit (content pages) almost never, and look at my watchlist even less.

gerritbot added a comment.Via ConduitWed, Jul 1, 10:59 AM

Change 193838 merged by jenkins-bot:
Attempt to count actual watchers in the info action

https://gerrit.wikimedia.org/r/193838

Glaisher added a project: user-notice.Via WebWed, Jul 1, 12:10 PM
Elitre added a subscriber: Elitre.Via WebWed, Jul 1, 12:30 PM
Harej added a comment.Via EmailWed, Jul 1, 12:42 PM

Doesn't CentralAuth work by automatically furnishing login credentials for individual wikis, meaning the concept of a local login still exists at some level? I know for example visiting a wiki you've never been to before causes a record to be created in the new account log.

MGChecker added a subscriber: MGChecker.Via WebThu, Jul 2, 10:24 AM
Restricted Application added a subscriber: Luke081515.Via HeraldThu, Jul 2, 10:24 AM
MGChecker added a comment.Via WebThu, Jul 2, 10:37 AM

It looks like you can't see the count of active watchers without unwatchedpages at all, even if there are more than 30. Definitely there isn't anything about active watchers if there are fewer than thirty watchers, as you can see here.

gpaumier moved this task to Announce in next Tech/News on the user-notice workboard.Via WebThu, Jul 2, 8:04 PM
gpaumier moved this task to In current Tech News draft on the user-notice workboard.Via WebThu, Jul 2, 8:54 PM
gpaumier moved this task to Recently announced in Tech/News on the user-notice workboard.Via WebFri, Jul 3, 11:45 PM
Nemo_bis added a comment.Via WebWed, Jul 8, 8:44 AM

> It looks like you can't see the count of active watchers without unwatchedpages at all, even if there are more than 30. Definitely there isn't anything about active watchers if there are fewer than thirty watchers, as you can see here.

Beta is not suitable to test real-life situations. https://www.mediawiki.org/w/index.php?title=Project:Support_desk&action=info works.

Bawolff added a comment.Via WebWed, Jul 8, 8:46 AM

> Definitely there isn't anything about active watchers if there are fewer than thirty watchers, as you can see here.

That's intentional.

Nemo_bis edited the task description.Via WebWed, Jul 8, 9:07 AM
gpaumier moved this task to Archive on the user-notice workboard.Via WebWed, Jul 8, 10:15 PM
Kipod added a subscriber: Kipod.Via WebWed, Jul 8, 11:33 PM

some time ago i received "sysop" perms on test.wikipedia.org.

i decided to make some use of these permissions - lo and behold, i can see "active watchers" now. kudos.

however, i could not figure out how to extract the "active watchers" datum via the API ({action: 'query', prop: 'info'} when using the apisandbox), and i did not see any documentation regarding this column.

is it available via the api ? anyone cares to instruct me how to access it? IOW, what inprop will cause it to appear, and what is the item name?
(if it's simply _not_ accessible via API, maybe "someone" should open a separate ticket to fix it).

thanks a bunch,
peace.

Bawolff added a comment.Via WebWed, Jul 8, 11:43 PM

It is not yet available via the API. That is a future todo.

> It is not yet available via the API. That is a future todo

see T105392 (and comment on it as needed).

peace.

Nemo_bis added a comment.Via WebFri, Jul 10, 5:28 PM

https://en.wikipedia.org/?curid=3252662&action=info works (644 / 2835), https://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard/Incidents&action=info not quite; probably only a handful of pages in total across all the wikis fail.

Should we catch the DB timeout and carry on? Reduce the data involved/rows scanned (would reducing the timespan even help?)?
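
The "catch the timeout and carry on" option could look roughly like this. It is a generic Python sketch, not MediaWiki code: the function names are hypothetical, and Python's built-in TimeoutError stands in for whatever exception the database layer actually raises.

```python
from typing import Callable, Optional


def count_or_none(run_query: Callable[[], int]) -> Optional[int]:
    """Run the count query; return None instead of failing on a timeout."""
    try:
        return run_query()
    except TimeoutError:
        # Carry on without the statistic rather than breaking the whole page.
        return None


def slow_query() -> int:
    # Hypothetical stand-in for a query that exceeds the time limit.
    raise TimeoutError("query took too long")


print(count_or_none(lambda: 644))
print(count_or_none(slow_query))
```

The caller would then omit the row (or show a placeholder) when it gets None, so the rest of the info page still renders.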

Nemo_bis closed this task as "Resolved".Via WebWed, Jul 15, 8:39 AM

That was split to T105852, so I think all issues found are tracked elsewhere. Marking this fixed.

Liuxinyu970226 removed a subscriber: Liuxinyu970226.Via WebWed, Jul 15, 9:22 AM
