
Implement "actual watchers" count into MediaWiki's info action
Closed, ResolvedPublic

Description

Add an "active watchers" count to MediaWiki's info action.

Not quite sure how we'd do this, but it seems potentially useful. On older wikis, the number of page watchers stat can quickly become meaningless without further context (i.e., a number in a vacuum doesn't mean much). If we limited the count to "active" users (defined by having made an edit or action in the past 30 days, I suppose), it might be more helpful.

From a suggestion here: https://en.wikipedia.org/w/index.php?title=User_talk:MZMcBride&oldid=559641049#Number_of_watchers.

Description of functionality now available in 1.26: https://www.mediawiki.org/wiki/MediaWiki_1.26#Information_of_actual_watchers_of_a_page


Version: 1.22.0
Severity: enhancement

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:44 AM
bzimport set Reference to bz49506.
bzimport added a subscriber: Unknown Object (MLST).

IMO the older the public wiki projects get, the more important this will become.

I was going to file this today, :) I think it's a high-impact feature which should be high priority.
I hope it will be rather easy to implement: with [[mw:Manual:$wgShowUpdatedMarker]] enabled, we know exactly which revision each watching user last visited.

Requirements:

  • count the "watching watchers", i.e. users with the page in watchlist who visited it in the last 30 days;
  • add the count as "Number of actual watchers" under "Number of page watchers" in action=info;
  • hide it if it's lower than 30, unless the user has "unwatchedpages" permission.

We can adjust the numbers with later bugs, possibly reusing some config if an appropriate one is found ($wgRCMaxAge is not ok, can be years if one doesn't have database constraints).
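The display rule in the requirements above can be sketched as follows. This is only an illustration in Python, not MediaWiki code: the function name, the `MIN_VISIBLE_COUNT` constant, and the way user rights are passed in are assumptions made for the example.

```python
from typing import Optional, Set

# Hypothetical constant taken from the requirement "hide it if it's lower than 30".
MIN_VISIBLE_COUNT = 30


def visible_watcher_count(count: int, user_rights: Set[str]) -> Optional[int]:
    """Return the count to display, or None if it should be hidden."""
    # Users with the "unwatchedpages" right always see the number.
    if "unwatchedpages" in user_rights:
        return count
    # Everyone else only sees it once it reaches the threshold.
    if count < MIN_VISIBLE_COUNT:
        return None
    return count
```

With this rule, a count of 12 is hidden from an ordinary user but shown to one holding "unwatchedpages", and a count of 30 or more is shown to everyone.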

@Nemo - If user A is watching page B, I don't think it should matter whether user A has *visited* page B recently. What's important is whether user A is still active at the site, and therefore is likely to notice edits to page B when they show up via the watchlist.

I have many pages in my watchlist that I don't visit, but if they show up in my watchlist I'll check their recent history using popups.

Does the software track the last time that each user displayed his/her watchlist?

(In reply to john_of_reading from comment #4)

> @Nemo - If user A is watching page B, I don't think it should matter whether
> user A has *visited* page B recently. What's important is whether user A is
> still active at the site, and therefore is likely to notice edits to page B
> when they show up via the watchlist.

Sure. And I think this likelihood correlates more with actual visits.

> I have many pages in my watchlist that I don't visit, but if they show up in
> my watchlist I'll check their recent history using popups.

And what does this say about the extent to which you notice edits there? There is another bug about making action=history visits count as visits btw.

> Does the software track the last time that each user displayed his/her
> watchlist?

No.

The problem with counting only people who visited the page in the last 30 days is that many pages aren't edited every 30 days. Even though I, as an active user keenly interested in such a page, will definitely see each and every change ever made to it, possibly within minutes, there may have been no reason at all for me to visit that low-traffic, low-edit page in the last year (much less in the last 30 days).

I've got many pages on my watchlist that average one or two edits per year. The fact that they rarely appear in my watchlist does not mean that I would not notice them being edited.

That's easy to fix: step 1 in comment 3 becomes "check recent unvisited edits and, if there are some, how old and how many they are". We could discuss the most sensible filter for 60 more comments, but the reality is that, if done, this will first be done with the simplest filter possible for performance reasons and then improved in later steps.

Change 193838 had a related patch set uploaded (by Nemo bis):
Attempt to count actual watchers in the info action

https://gerrit.wikimedia.org/r/193838

No other implementation was proposed yet, so I went ahead and implemented what I had mentioned above.

Given concerns above about the possibility that one doesn't visit the page despite seeing the edit summary, for now I used 6 months of "absence" from the page as the threshold: someone who doesn't visit an updated page for that long is very unlikely to be following it. It would be simpler to just consider $wgRCMaxAge though, because 1) the "recent editors" count and others do the same, 2) when one waits more than $wgRCMaxAge to check an edit, that edit disappears from recent changes and watchlist and is unlikely ever to be seen.

That sort of query is very simple and on a page with about a hundred watchers it takes 600 ms on translatewiki.net (thanks Nikerabbit):

MariaDB [translatewiki_net]> explain SELECT count(*) FROM bw_watchlist WHERE wl_namespace = 0 AND wl_title = 'Support' and wl_notificationtimestamp <= '20150101000000';
+------+-------------+--------------+------+-----------------+-----------------+---------+-------------+------+------------------------------------+
| id   | select_type | table        | type | possible_keys   | key             | key_len | ref         | rows | Extra                              |
+------+-------------+--------------+------+-----------------+-----------------+---------+-------------+------+------------------------------------+
|    1 | SIMPLE      | bw_watchlist | ref  | namespace_title | namespace_title | 261     | const,const |   66 | Using index condition; Using where |
+------+-------------+--------------+------+-----------------+-----------------+---------+-------------+------+------------------------------------+
1 row in set (0.00 sec)

valhallasw pointed out https://www.mediawiki.org/wiki/Manual:Watchlist_table#wl_notificationtimestamp didn't agree with my understanding of wl_notificationtimestamp; I (hopefully) corrected the manual, noting also very recent updates in 1.26 (T91284).

The main question is still what threshold to use. Maybe keep 6 months, but add a separate configuration setting to control it?
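To make the filter discussed above concrete, here is a small self-contained sketch using Python's `sqlite3` with a cut-down `watchlist` table. It assumes that a NULL `wl_notificationtimestamp` means the watcher has no unvisited edits and that a non-NULL value is the timestamp of the first edit they have not yet visited; the sample rows, the function name and the `threshold` parameter are invented for illustration, and the query is not necessarily the one in the patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE watchlist (
           wl_user INTEGER,
           wl_namespace INTEGER,
           wl_title TEXT,
           wl_notificationtimestamp TEXT  -- assumed: NULL = no unvisited edits
       )"""
)
rows = [
    (1, 0, "Support", None),              # has seen the latest edit
    (2, 0, "Support", "20150220000000"),  # recent unvisited edit
    (3, 0, "Support", "20140301000000"),  # unvisited edit pending for about a year
    (4, 0, "Other", None),                # watches a different page
]
conn.executemany("INSERT INTO watchlist VALUES (?, ?, ?, ?)", rows)


def count_actual_watchers(conn, namespace, title, threshold):
    """Count watchers with no unvisited edit older than `threshold`."""
    cur = conn.execute(
        """SELECT COUNT(*) FROM watchlist
           WHERE wl_namespace = ? AND wl_title = ?
             AND (wl_notificationtimestamp IS NULL
                  OR wl_notificationtimestamp >= ?)""",
        (namespace, title, threshold),
    )
    return cur.fetchone()[0]


# Threshold roughly six months before March 2015 (an illustrative value).
actual = count_actual_watchers(conn, 0, "Support", "20140901000000")
print(actual)  # prints 2: users 1 and 2 qualify, user 3 does not
```

Because the timestamps are fixed-width 14-digit strings, plain string comparison orders them correctly. Making `threshold` a parameter mirrors the idea of a separate configuration setting: only the cutoff value changes, not the query.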

For @Nemo_bis per IRC

mysql:wikiadmin@db1052 [enwiki]> explain SELECT count(*) FROM watchlist WHERE wl_namespace = 4 AND wl_title = 'Administrators\'_noticeboard/Incidents' and wl_notificationtimestamp <= '20141112000000';
+------+-------------+-----------+------+-----------------+-----------------+---------+-------------+-------+------------------------------------+
| id   | select_type | table     | type | possible_keys   | key             | key_len | ref         | rows  | Extra                              |
+------+-------------+-----------+------+-----------------+-----------------+---------+-------------+-------+------------------------------------+
|    1 | SIMPLE      | watchlist | ref  | namespace_title | namespace_title | 261     | const,const | 14048 | Using index condition; Using where |
+------+-------------+-----------+------+-----------------+-----------------+---------+-------------+-------+------------------------------------+
1 row in set (0.00 sec)

mysql:wikiadmin@db1052 [enwiki]>

Thanks! Scanning 14048 rows for such an extreme case sounds very reasonable; it looks like there are no performance issues.

Semi-relatedly, NASA at T100061 just showed us Special:WatchAnalytics, which would be useful for wiki-wide information.

See https://en.wikipedia.org/wiki/Wikipedia_talk:Database_reports/WikiProject_watchers#Active_watchers_again
That page used to be updated by a bot on Toolserver. The criterion it used for "active watcher" was simply users who have logged in at least once in the previous 30 days. There is no need to check whether the user visited or edited any particular page; the bot simply determined how many of the total page watchers are still active Wikipedians.
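The login-based criterion described above could be expressed roughly like this. It is a sketch only: it assumes a per-user last-login timestamp is available from somewhere, and the function name, the `last_login` mapping and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable


def count_recently_logged_in(
    watcher_ids: Iterable[int],
    last_login: Dict[int, datetime],
    now: datetime,
    window_days: int = 30,
) -> int:
    """Count watchers whose last login falls within the last `window_days`."""
    cutoff = now - timedelta(days=window_days)
    # Watchers with no recorded login are treated as inactive.
    return sum(
        1
        for uid in watcher_ids
        if uid in last_login and last_login[uid] >= cutoff
    )


# Hypothetical data: three watchers, one of whom has no recorded login.
now = datetime(2015, 3, 1)
logins = {1: datetime(2015, 2, 20), 2: datetime(2014, 11, 5)}
recent = count_recently_logged_in([1, 2, 3], logins, now)
```

Here only user 1 logged in within the 30-day window, so `recent` is 1. Note that this counts site-wide activity and says nothing about whether the user looked at the page or at their watchlist.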

Nemo_bis removed a project: Community-Tech.
Nemo_bis set Security to None.

I think the criterion of logging in at least once in the past 30 days is a reasonable criterion. I am wary of referring to such users as "active users" however since "active users" is a WMF analytics term for users making 5+ edits in the past 30 days.

> I think the criterion of logging in at least once in the past 30 days is a reasonable criterion.

How is the last time the user logged in tracked on WMF wikis? Note that WMF wikis have central auth, which means a user may be active in one project while still having watched pages on other projects that they no longer visit. This may count some watchers as "active" when they aren't, if the login action on one wiki propagates the last login timestamp to all wikis.

This also may interfere with T68699: Increase "remember me" login cookie expiry from 30 days to 1 year on Wikimedia wikis

> I think the criterion of logging in at least once in the past 30 days is a reasonable criterion. I am wary of referring to such users as "active users" however since "active users" is a WMF analytics term for users making 5+ edits in the past 30 days.

Not to mention Special:ActiveUsers having a different definition from either of them


Logging in does not necessarily imply looking at your watchlist. I log in to wikipedia all the time. I edit (content pages) almost never, and look at my watchlist even less.

Change 193838 merged by jenkins-bot:
Attempt to count actual watchers in the info action

https://gerrit.wikimedia.org/r/193838

Doesn't CentralAuth work by automatically furnishing login credentials for individual wikis, meaning the concept of a local login still exists at some level? I know for example visiting a wiki you've never been to before causes a record to be created in the new account log.

It looks like you can't see the count of active watchers without unwatchedpages at all, even if there are more than 30. Definitely there isn't anything about active watchers if there are less than thirty watchers, as you can see here.

> It looks like you can't see the count of active watchers without unwatchedpages at all, even if there are more than 30. Definitely there isn't anything about active watchers if there are less than thirty watchers, as you can see here.

Beta is not suitable to test real-life situations. https://www.mediawiki.org/w/index.php?title=Project:Support_desk&action=info works.

> Definitely there isn't anything about active watchers if there are less than thirty watchers, as you can see here.

that's intentional.

some time ago i received "sysop" perms on test.wikipedia.org.

i decided to make some use of these permissions - lo and behold, i can see "active watchers" now. kudos.

however, i could not figure out how to extract the "active watchers" datum via the API ({action: 'query', prop: 'info'}) when using the apisandbox, and i did not see any documentation regarding this column.

is it available via the api? would anyone care to instruct me how to access it? IOW, what inprop will cause it to appear, and what is the item name?
(if it's simply _not_ accessible via API, maybe "someone" should open a separate ticket to fix it).

thanks a bunch,
peace.

It is not yet available via the API. That is a future todo

> It is not yet available via the API. That is a future todo

see T105392 (and comment on it as needed).

peace.

https://en.wikipedia.org/?curid=3252662&action=info works (644 / 2835), https://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard/Incidents&action=info not quite; probably only a handful of pages in total across all the wikis fail.

Should we catch the DB timeout and carry on? Or reduce the data involved or the rows scanned (would reducing the timespan even help)?

That was split to T105852, so I think all issues found are tracked elsewhere. Marking this fixed.