Add a new filter for Special:RecentChanges to display changes made to unwatched pages
Open, Stalled, Needs Triage · Public · Feature

Description

Summary: Please add a new filter for Special:RecentChanges to filter changes made to unwatched pages.

Issue: Special:UnwatchedPages is barely helpful for administrators on large wikis (e.g. eswiki) where it has thousands and thousands of entries (the cache is limited to 5,000 results as well). It was suggested on-wiki that it'd be more useful if we could patrol changes to these pages via Special:RecentChanges or, in the alternative, a dedicated special page.

Request: For users with unwatchedpages permissions, please allow a new filter option on Special:RecentChanges to list/filter/highlight changes made to unwatched pages.

Longer description

Approximately 5-7% of non-redirect Wikipedia articles are on zero users' watchlists. This presents a potential vandalism risk: edits to pages on users' watchlists are more likely to be viewed by those users, so vandalism is more likely to be caught. Edits to pages with no page watchers are less likely to be noticed. It would therefore be helpful to be able to monitor edits made to pages with few or no watchers.

Given that this Recent Changes filter would be visible to all users, this presents a related risk - vandals could use this data/filter to find pages to target. This is why, currently, the exact number of watchers a given page has is visible only to users with the unwatchedpages right (generally just administrators) if the value is less than 30, along with Special:UnwatchedPages. Past discussions have indicated that this may not be as risky for a Recent Changes filter - any pages showing up in this view as not being watched have just been edited by someone, so there is an increased likelihood that any subsequent vandalism would be noticed.

However, to help address this problem, we could:

  • Limit API access to this data to users who have the unwatchedpages user right.
  • Prevent this data from being copied to the database replicas. This would prevent tools like Quarry from accessing the data.
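As a rough illustration of the first safeguard, the sketch below gates the watcher count on a user right before it is returned in an API-style record. It is Python rather than MediaWiki code, and everything except the `unwatchedpages` right name is hypothetical: the `User` class, the `watcher_count` field, and `serialize_change` are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical stand-in for a wiki user and the rights they hold."""
    name: str
    rights: frozenset = frozenset()


def serialize_change(title: str, watcher_count: Optional[int], user: User) -> dict:
    """Build an API-style record, exposing the watcher count only to
    users holding the unwatchedpages right (assumed policy)."""
    record = {"title": title}
    if "unwatchedpages" in user.rights:
        record["watcher_count"] = watcher_count
    return record


admin = User("ExampleAdmin", frozenset({"unwatchedpages"}))
reader = User("ExampleReader")

print(serialize_change("Example page", 0, admin))
print(serialize_change("Example page", 0, reader))
```

Omitting the field entirely for users without the right, rather than returning a placeholder, keeps the response from revealing whether a count exists at all.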

Event Timeline

Restricted Application added a subscriber: Aklapper.

I was always told that the unwatched pages list was restricted to administrators for security reasons (so that those pages were less tempting for vandals to attack), although I am the first to admit it's essentially useless on large wikis, given the number of pages involved. Would this also mean that the "Page Information" would now include the correct number of page watchers, which I understand is now not included for non-admins if it is lower than 5?

Please don't take this as a critique of the idea; having those pages flagged for RC patrol makes sense to me. Perhaps I might suggest pages that have a low number of watchers (5 or less), though. And I do think it is important to openly talk about the change in philosophy that takes us from "only admins should be aware of these pages" to "anyone who looks at the RC patrol page should be aware of these pages", since it is a big change.

Good questions! The 'Page information' data won't change, it will still not show a specific number below its current value, which I think is 30.

The way we're implementing the data storage for this filter (T20790) is to store the actual page watcher count, not just whether the page is being watched at all. As I understand it that opens up the possibility of the filter being 'no watchers', 'under 5 watchers', or any other value, so I'm open to thoughts on what that should be.

We did consider the security angle, but a couple of things led me to conclude that this feature would be low-risk from an 'encouraging vandalism' point of view. The first is discussions like this one where this feature has been proposed and discussed. I appreciate that was 15 years ago now, and perhaps opinions have changed, but what I saw there was relatively low concern about this angle, and a consensus that the benefits outweighed the negatives. The second aspect is that the Unwatched pages Special page is a blanket list of all pages with no watchers, regardless of whether they've been edited recently. It would be possible to pick a page that no editor has looked at for some time. With RecentChanges, especially on big wikis, vandals will only see pages where someone has just edited the page, and is therefore more likely to be engaged in tracking subsequent edits. It's also a page that will show up twice in quick succession for patrollers, once for the unwatched edit, and once for the followup vandalism. Do you think I'm drawing the right conclusion here, or do you still have concerns?

@Samwalton9-WMF I'm not Risker, but, thinking out loud here, perhaps it's worth adding this behind a config variable, and/or adding a config variable to determine who has access to the new RC filter (so that a wiki could, e.g., choose to restrict its use to autoconfirmed users/sysops/another user group)? Your mention of "especially on big wikis" made me think about small wikis with fewer counter-vandalism/RecentChanges patrollers. Obviously I wouldn't be able to be certain about this without doing some research into it, but it seems possible that, when there aren't as many RC patrollers, the benefits of 'more RC-patroller eyes on unwatched pages' may not outweigh the drawbacks of 'vandals/malicious actors can discover which pages are unwatched'. I can also imagine that third-party wikis may not want this functionality available to everyone for a similar reason (though I'm not confident one way or the other on that particular front).

Thanks for your response, @Samwalton9-WMF. My suggestions would be as follows:

  • Make the flag "less than 10 watchers". That will cover the contingencies.
  • Apply this only on wikis that have more than 100 very active editors (more than 100 edits/month) AND more than 500 edits/day average. On probably 90% of our projects, almost every page will have fewer than 10 watchers, and most will have only one or two. In other words, if 90% or more of edits in the RC patrol page will wind up getting this flag, it's just clutter on the page for those projects.

To be honest, I've never really bought the "security issue" related to unwatched pages; edits to those pages still show up in RC patrols. On the other hand, I'm not sure of the history of when RC patrol pages were created and wonder if it was some time after the decision to make unwatched pages admin-only. I know by the time I started editing in 2005, RC patrol was one of the primary "entry points" for new accounts. After 25 years, there are a lot of remnants that were based on situations that are no longer germane.

After we merge T20790 it will be very easy to write some data queries to assess what % of RecentChanges entries, on various projects, have fewer than X watchers. I think that will help us decide where this threshold should be, and how that threshold might vary by wiki, if needed.

I agree that this filter probably isn't so useful, especially at a high threshold, on smaller wikis, but if that's the case I don't think it's worth the effort to specifically exclude them from having the filter. That list already has 22 entries (30 with ORES), so adding one extra isn't a huge deal in my opinion.
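To make the proposed measurement concrete, here is a minimal sketch of the "what % of RecentChanges entries have fewer than X watchers" calculation. It assumes the watcher count stored with each change is available as a plain list of integers; the `share_below` helper and the sample values are made up for illustration and are not real wiki data.

```python
def share_below(watcher_counts: list[int], threshold: int) -> float:
    """Fraction of recent-changes entries whose page has fewer than
    `threshold` watchers. Returns 0.0 for an empty sample."""
    if not watcher_counts:
        return 0.0
    below = sum(1 for count in watcher_counts if count < threshold)
    return below / len(watcher_counts)


# Made-up watcher counts for ten recent changes (not real data).
sample = [0, 0, 1, 2, 4, 7, 12, 30, 45, 210]

for threshold in (1, 5, 10, 30):
    print(f"fewer than {threshold:>2} watchers: {share_below(sample, threshold):.0%}")
```

Running the same calculation per wiki at several thresholds would show where a flag stops being selective, for example where it would cover most of the feed.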

Does that also mean that this data would be available via Quarry etc.? Because in that case, it is a definite no-go. If this data is generally not available in APIs unless you have permissions, it should not be available in data replicas.

As for the general idea, this does feel like a way to circumvent the ‘lower than 30 watchers’ restriction set by MediaWiki. The upside is that it would also allow experienced users to track those kinds of changes, but it feels like the sort of filter that people could abuse to collect data that is withheld for a reason. Putting it behind a user right would be fair, but that right can’t be autoconfirmed, since that is easily gamed.

I believe it would be, but I can confirm. Do you see vandals going to this level of effort to figure out where to vandalise? I figured that on most wikis you can just click Random Page and chances are you'll land on a page that very few people, if anyone, is actively watching, which seems like it would lead to higher risk than a page that was recently edited.

"I figured that on most wikis you can just click Random Page and chances are you'll land on a page that very few people, if anyone, is actively watching, which seems like it would lead to higher risk than a page that was recently edited."

Let's take a step back here. If almost all pages have fewer than 30 watchers, even on large wikis....then one can assume that the vast majority of recent changes are to pages that don't have 30 or more watchers. What is the benefit of flagging almost every edit in the Recent Changes list? (There should be some stats available about the percentage of recent changes involving low-watched articles; do we have that?)

Even not considering that, if this restriction on who can see this data is useless, then it should be dropped entirely, not just in RC filters. But for that, I think, asking the community first would be needed. So I am not a fan of doing the same by proxy, given that this restriction clearly exists for some reason and it should not be chipped away in one component while being present in the rest of MW (most notably, on action=info page).

Some initial data on page watching, based on my analysis which should always come with an "is not a data analyst" warning:

  • On English Wikipedia there are 18,281,692 main namespace pages, 6,956,877 of which are not redirects.
  • 1,252,313 non-redirect content pages (18%) have 10 or more page watchers.
  • 5,704,564 non-redirect content pages (82%) have fewer than 10 page watchers.
  • 516,518 non-redirect content pages (7%) have 0 page watchers.

Some additional data points:

  • 6,568,993 non-redirect content pages (94%) have fewer than 30 page watchers (this is the number of pages where I believe you can't currently get the precise number through Page Information).
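The percentages above can be re-derived from the raw counts. The short Python check below uses only the figures quoted in this comment; the variable names and bucket labels are my own.

```python
# Figures quoted above for English Wikipedia's main namespace.
total_pages = 18_281_692
non_redirects = 6_956_877

counts = {
    "10 or more watchers": 1_252_313,
    "fewer than 10 watchers": 5_704_564,
    "0 watchers": 516_518,
    "fewer than 30 watchers": 6_568_993,
}

# Share of non-redirect pages in each bucket.
shares = {label: n / non_redirects for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1%}")

# The two 10-watcher buckets should partition the non-redirect pages.
assert counts["10 or more watchers"] + counts["fewer than 10 watchers"] == non_redirects

# Redirects per non-redirect page.
redirects = total_pages - non_redirects
print(f"redirects per non-redirect page: {redirects / non_redirects:.2f}")
```

The shares round to the 18%, 82%, 7% and 94% quoted above, and the redirect ratio comes out at about 1.6 redirects per non-redirect page.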

I checked the numbers above also on some other wikis, including a couple of smaller ones. The fraction of pages with 0 watchers was broadly the same (dewiki: 5%; trwiki: 7%; azwiki: 7%, though skwiki was a randomly-selected outlier at 34%). As expected, the fraction of pages with more than an absolute number of page watchers decreases as wiki size decreases. Just 0.6% of azwiki's articles have more than 10 page watchers, as compared to enwiki's 18%.

"There should be some stats available about the percentage of recent changes involving low-watched articles; do we have that?"

I started looking at this but it's somewhat beyond my personal querying capabilities so I might need to find an analyst to support. It will be hard to get accurately, because if a user started watching a page after an edit was made, that would inflate the page watcher data. If we went ahead and merged the patch to start tracking page watcher count in the RC table, perhaps on a trial basis, we could get this data very easily.

Thanks for this response, Sam. Really appreciate the work you have put in on the statistical review, because it tells us a lot. (I am surprised that there are so many redirects, at roughly 2 redirects per actual article.)

Let's suppose that the threshold for the flag is set to fewer than 10 watchers - information that already is not available to non-admins. It does not seem to me that a RC flag that shows up on 80% or more of all edits in the queue is going to be that useful for RC patrollers - and that percentage would be even higher on smaller projects. All of those edits show up in the RC queues today without a "few watchers" flag. Do we have reason to believe that the edits to unwatched or low-watched pages are any less likely to be reviewed by a RC patroller? Do we have reason to believe that more unreverted vandalism or otherwise problem edits occurs on those pages than on more highly watched pages?

(As a point of interest only, and not related to this project, it would be interesting to know the average page views for those unwatched pages on enwiki. Don't feel in any way pressured to provide this.)

That's true, though <10 watchers doesn't need to be the threshold. Originally we were aiming for 0 watchers, which doesn't sound unreasonable at 7% of articles, seemingly consistently across project sizes. As you pointed out earlier, though, it's not currently clear if this is also the % of unwatched articles that are actually edited.

One other aspect to consider here is that the data I pulled is all page watchers, not active page watchers. Some (probably substantial) fraction of those numbers will be people who don't edit anymore.

"Do we have reason to believe that the edits to unwatched or low-watched pages are any less likely to be reviewed by a RC patroller? Do we have reason to believe that more unreverted vandalism or otherwise problem edits occurs on those pages than on more highly watched pages?"

I don't think we have a quantitative indication of this. I suspect it would be hard to extract 'page was watched' as a clear variable, since so many other correlated variables would affect whether edits are patrolled on pages with high or low numbers of page watchers. Qualitatively, though, we do know that users use their watchlist, and revert vandalism when they see it there, so it stands to reason that edits on low-watched pages are less likely to be reviewed.

"I checked the numbers above also on some other wikis, including a couple of smaller ones. The fraction of pages with 0 watchers was broadly the same (dewiki: 5%; trwiki: 7%; azwiki: 7%, though skwiki was a randomly-selected outlier at 34%)"

Ah, figured it out - https://sk.wikipedia.org/wiki/Redaktor:Wizzo-Bot, a bot automatically creating pages.

Per T20790#10587213, we have determined that we could both prevent this data from appearing in the wiki replicas, and limit access to it via the API. I think this should address both of the major avenues through which a bad actor could generate lists of pages to target.

I suggest we go ahead with deploying this filter (with these two safeguards in place), with the filter set at 0 page watchers, and then monitor data on how many pages are appearing via this filter on a selection of wikis. If it's a very low number, we could consider increasing the filter to, say, 3 or 5 page watchers. If we see an influx of vandalism on these pages then we can always revert our changes, but this seems unlikely to me.

Since we are adding a column that, by default, will be NULL, I suggest we first add the column without adding the filter and then wait a month. We can add the filter when we ensure data is in the new column. Let me know if this is a good plan and I will split my existing patch in two (with the filter patch attached to this ticket).
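One reason the waiting period matters: rows written before the column is populated would hold NULL rather than 0, and a filter has to keep the two apart. Below is a minimal sketch in Python, with `None` standing in for NULL. The row layout, the `watcher_count` name, and the `unwatched_changes` helper are hypothetical and do not reflect the actual schema or patch.

```python
from typing import Optional, TypedDict


class Change(TypedDict):
    title: str
    watcher_count: Optional[int]  # None models a NULL column value


def unwatched_changes(changes: list[Change], max_watchers: int = 0) -> list[Change]:
    """Keep changes whose recorded watcher count is at most max_watchers.
    Rows with no recorded count (None) are skipped, not treated as zero."""
    return [
        change for change in changes
        if change["watcher_count"] is not None
        and change["watcher_count"] <= max_watchers
    ]


rows: list[Change] = [
    {"title": "Old edit", "watcher_count": None},  # predates the column
    {"title": "Unwatched page", "watcher_count": 0},
    {"title": "Popular page", "watcher_count": 42},
]

print([row["title"] for row in unwatched_changes(rows)])
```

Treating a missing count as "unknown" rather than "zero" avoids flagging every pre-existing row as unwatched while the column fills in.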

Oh yes, it totally makes sense to me to do this in two steps. It also gives us a little chance to collect data on the watch count of edited pages before moving forward with the actual filter.

It is possible to make a filter conditional on a role. The Growth team does this with mentorship: the mentor role provides access to specific filters (they highlight their mentees' edits). We could imagine a configuration setting that grants access to a specific role (editor, autopatrolled, rollbacker...).

Regarding the APIs, shouldn't we have the same conditions as on the RC page, for consistency?

Scardenasmolinar changed the task status from Open to In Progress. · Mar 13 2025, 3:31 AM
Scardenasmolinar claimed this task.

Change #1127215 had a related patch set uploaded (by Scardenasmolinar; author: Scardenasmolinar):

[mediawiki/core@master] Add watch count filter to Recent Changes

https://gerrit.wikimedia.org/r/1127215

I think this is something we overlooked earlier, but I agree we could limit this in the RecentChanges UI, too. Perhaps access could be restricted to users with the patrol or rollback user right as a starting point?

If access to this RC filter could potentially be tied to user groups defined on local wikis, might it potentially be worth consulting with local communities to see if they have an opinion on which groups could have access to this filter? /gen

I'll add both and we can decide which is the best fit during reviews.

Scardenasmolinar changed the task status from In Progress to Stalled. · Mar 18 2025, 5:07 PM
Scardenasmolinar removed Scardenasmolinar as the assignee of this task.

Whether to make it role-specific is pretty dependent on what wikis will be used for testing, and whether the specs for that role on that wiki are consistent with someone who might be reviewing recent changes. Rollback, maybe, as it is a permission that is fairly consistent across all projects. However, many projects do not have a patroller permission, or it is linked to something other than a recent-changes type activity.

I'm not really seeing a rationale for limiting this to particular roles, except that perhaps it might be easier for the trial period on certain projects. But monitoring and responding to recent changes is generally an entry-level task for new users on a large swath of our projects, and those new users aren't going to hold any permission other than confirmed user.

(Maybe I am missing something here?)

I think the primary argument is consistency between the RecentChanges API and user interface. If we're concerned that bad actors might use the API to generate lists of pages which aren't being watched, they could do this too in Special:RecentChanges, albeit with more manual effort.

I don't personally feel that strongly about it - it would be harder to do this in the Special page itself, so it doesn't strike me as a big deal, but it's possible that there's a technical reason the API and the front-end need to match.