Create a tool to unwatchlist large numbers of pages
Open, Needs Triage, Public

Description

User story: As an active Wikipedia editor, I want to bulk-remove pages from my Watchlist, so that I can shrink it to a more manageable size and improve the performance of using my Watchlist.
User story: As an active Wikipedia editor, I want to bulk-remove pages from my Watchlist, so that I can focus on the kinds of edits I am most interested in.

Many experienced editors have very large watchlists. These watchlists can be bloated and slow, and can cause database performance issues. The main ways to address this currently are to manually un-watchlist pages one at a time, or to blank your watchlist entirely (if you can even do so without hitting timeouts). See T363622 and T41510.

Screenshot 2025-08-06 at 09.47.58.png (1,618×992 px, 153 KB)

It would be helpful if users could un-watchlist pages based on some criteria. This could be a simple Toolforge tool.

Unwatching criteria could include:

  • User and talk pages of users who haven't contributed in X years
  • User and talk pages of blocked users/IPs (T405140)
  • Pages that have not been edited in X days (this ticket)
  • Pages in a given namespace (T405141)
  • Pages that are redirects (T405143)
  • Non-existing (i.e. deleted) pages (T405142)

MVP spec
The tool should allow users to:

  1. Log in via OAuth (using a consumer which allows viewing and editing watched pages)
  2. Enter a number of days.
  3. Submit the form.

All pages on the user's watchlist that have not been edited within this number of days should be removed from their watchlist.
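The core rule of the MVP can be sketched in a few lines. This is only an illustration under assumptions, not the tool's actual code: the function name `stale_titles` and the shape of its input (a mapping from page title to the timestamp of the latest edit, or `None` for pages without revisions) are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def stale_titles(last_edited, days, now=None):
    """Return titles whose most recent edit is older than `days` days.

    `last_edited` (hypothetical input shape) maps a page title to a
    timezone-aware datetime of its latest edit, or to None when the page
    has no revisions (for example because it was deleted).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return sorted(
        title
        for title, timestamp in last_edited.items()
        # Pages without revisions are left alone here; deleted pages are a
        # separate criterion in the list above.
        if timestamp is not None and timestamp < cutoff
    )


now = datetime(2025, 9, 20, tzinfo=timezone.utc)
watchlist = {
    "Example": datetime(2025, 9, 1, tzinfo=timezone.utc),
    "Old page": datetime(2020, 1, 1, tzinfo=timezone.utc),
    "Deleted page": None,
}
print(stale_titles(watchlist, days=365, now=now))  # ['Old page']
```

The returned titles would then be passed to whatever performs the actual unwatching.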

MVP stretch goals

  • Retain a log of the user's watchlist so they can undo the action
  • Inform the user how many pages were removed / what the new size of their watchlist is now

Event Timeline

This is a suggested Wikimania-Hackathon-2025 project from me. I won't be writing any code personally, so feel free to jump in, but I'm going to be popping by the Hackathon room throughout the week so happy to provide input and feedback!

If I'm not in the Hackathon room please feel free to ping me here.

Hi, since I’m in the small yellow bar of people with more than 10k pages in my watchlist (and want to decrease that), I’d like to add some additional insights or ideas:

  • batch remove non existing pages (because they were deleted previously)
  • allow converting a page watchlisted "indefinitely" to a shorter watch period: that could help people "soft-unwatchlist" pages, for example "I convert my whole watchlist to a 1-month watch, then after one month I'll only be watching pages I've been actively interested in". I think for now the only way to do that is manually?
  • using Wikidata items linked to the pages (requirement: pages need to be properly linked to a wikidata item), filter in/out pages based on that (ex. I have a lot of football/soccer players due to previous patrolling, I don’t want to follow them anymore -> it would be easier to select them by "wikidata occupation")
  • provide a "watchlist analysis" by type of page (biographies, countries, places…), occupation, country… to help people better understand what pages they are watching and manage their focus

I’m not attending the hackathon this year, but feel free to ping me (here, I guess) if you need more context or explanations (and I’ll be glad to beta-test any tool around that purpose :D)

Discussed quite a few things with @Samwalton9-WMF, and I'm willing to form an MVP for this. Of course, performance is gonna be meh (because the API can only return so many watchlisted pages per request), but it's a start. :)
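Since the API hands back the watchlist in batches, the tool has to follow continuation values until nothing is left. Below is a rough sketch of that loop, assuming the MediaWiki Action API's `list=watchlistraw` module and its standard `continue` mechanism. The `fetch` callable, the `items` key, and the continuation values in the fake responses are placeholders invented for the example; a real response is laid out differently.

```python
def iterate_batches(fetch, params):
    """Yield every response batch, following the API's `continue` values.

    `fetch` is a hypothetical callable that takes a dict of request
    parameters and returns the decoded JSON response as a dict.
    """
    request = dict(params)
    while True:
        response = fetch(request)
        yield response
        if "continue" not in response:
            break
        # Merge the continuation values into the original parameters.
        request = {**params, **response["continue"]}


def fake_fetch(request):
    """Stand-in for real HTTP calls so the sketch runs offline."""
    if "wrcontinue" not in request:
        return {
            "items": ["A", "B"],  # placeholder key, not the real layout
            "continue": {"wrcontinue": "made-up-value", "continue": "-||"},
        }
    return {"items": ["C"]}


params = {
    "action": "query",
    "list": "watchlistraw",
    "wrlimit": "max",
    "format": "json",
}
titles = [
    title
    for batch in iterate_batches(fake_fetch, params)
    for title in batch["items"]
]
print(titles)  # ['A', 'B', 'C']
```

With a very large watchlist this means many sequential requests, which is where the "meh" performance would come from.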

  • batch remove non existing pages (because they were deleted previously)

This is a good idea! Could include it in the first version.

For an MVP, "Pages that have not been edited in X days" seems like a good bet: it is useful and achievable in a short time.

Oh, yeah, I forgot I already filed that. I'll merge that here since this ticket is more fleshed out.

I think this is a good idea generally for watchlist management. But for those of us who already have large watchlists, it's not a solution to the inability to edit the list (which, ironically, is what you need to do to significantly prune it), something reported years ago and still not addressed. It seems an internal timeout is exceeded, so why not just have a longer timeout?

An update on this: I left the project untouched since Wikimania but I just finished the bulk of the code yesterday when I remembered this task existed. I aim to release the (probably-super-buggy-but-not-so-much-because-I-did-my-best) beta today.

I think this is a good idea generally for watchlist management. But for those of us who already have large watchlists, it's not a solution to the inability to edit the list (which, ironically, is what you need to do to significantly prune it), something reported years ago and still not addressed. It seems an internal timeout is exceeded, so why not just have a longer timeout?

That's being addressed at T41510.

why not just have a longer timeout?

A longer timeout is not scalable. If we double the timeout, Special:EditWatchlist will still time out for really large watchlists. The solution to this is probably to paginate Special:EditWatchlist.

image.png (1,034×71 px, 16 KB)

https://unwatchlist.toolforge.org has been created and deployed but it's currently pending OAuth application approval for it to be usable by anyone besides myself.

Amazing! Let me know when it's approved and I can QA.

App has been approved! :)

I'm getting a Toolforge error when attempting to login. I click login, get taken to Meta to approve the OAuth login, and then got sent to https://unwatchlist.toolforge.org/login?code=[a long code available on request]

Screenshot 2025-09-20 at 10.30.08.png (1,726×1,156 px, 110 KB)

Careful posting links like that in public tickets. I'm not 100% sure, but that URL is giving me a session ID vibe.

Good point, the thought crossed my mind, redacted!

Hrmm... I tested this both locally and on the tool itself. For some reason, logins are only failing on the tool. And that's odd because I tested the login yesterday after approval and it was working just fine.

To my knowledge, errors emitted from tools-proxy should only show up on proxy-specific errors and not tool errors (at least according to this). If it did come from the tool, I would have expected to see an error message coming from the tool directly, but that doesn't seem to be the case here. Nothing showing up in the logs either.

I'm now getting "403, Server Error, Invalid state parameter", which seems like an actual tool error message rather than a Toolforge one.

Yeah, I'm in the middle of trying different things out to see if the proxy is blocking something specific. According to the logs, the entire "login" process is fully completed, so the part that fails is somewhere between setting the session cookie and redirecting the user to the main page. The error there was because I set the cookie security to Strict, which, it turns out, isn't a good idea when handling OAuth responses from Meta-wiki. I've set it back to Lax and yep, still not working.

Should work now. It seems like Nginx just doesn't like super long cookies.
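For anyone hitting the same pair of problems, here is one possible way to keep the cookie both short and `SameSite=Lax`. It is a sketch under assumptions, not what the tool actually does: the in-memory `SESSIONS` store, the `start_session` helper, and the idea of keeping only a random ID in the cookie are all illustrative.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store; a real tool would likely use Redis or a
# database so that sessions survive restarts.
SESSIONS = {}


def start_session(data):
    """Store `data` server-side and return a short Set-Cookie header value."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = data
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    cookie["session"]["secure"] = True
    cookie["session"]["httponly"] = True
    # Lax still sends the cookie on the top-level redirect back from
    # Meta-wiki; Strict withholds it on that cross-site navigation.
    cookie["session"]["samesite"] = "Lax"
    return cookie["session"].OutputString()


# Even with a large payload, the header only carries the random ID.
header = start_session({"oauth_token": "x" * 5000})
print(len(header), "SameSite=Lax" in header)
```

The trade-off is that the server has to hold session state, but the header size no longer depends on how much data is attached to the session.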

I tested with my main account (with confidence thanks to the numerous options to download lists of pages!) and it all seems to work as expected! It's also more performant than I was expecting. With a ~2000 page watchlist, and searching for no edits in the last 2500 days, it responded and acted surprisingly quickly.

Results from some more thorough testing using a test account:

  • The randomly selected message under the Unwatchlist title briefly shows one message and then switches to another on each page load.
  • This is minor, but if your watchlist has 0 pages the tool probably shouldn't enact or allow submission of Step 3.
  • In Step 2 I wonder if it's worth adding a note that the number presented here for size of watchlist is double what Special:Watchlist shows, because it's counting both articles and their talk pages. Alternatively, the tool could just use the same number (i.e. half what it currently shows)
  • The days field in Step 3 could probably use some validation. Entering -1, abc, 0.5, and ⌚️ were all allowed, but probably shouldn't be.
  • I noticed that the tool seems to be assessing talk pages and articles separately: a discussion page showed up that hadn't been edited in X days, even though its content page had been edited. As I understand it, a page and its discussion page can only be watchlisted together, so you can't remove one without removing the other. This is a hard one to handle; I suppose we have two options:
    • 'Group' the pages and assess whether either the page or its discussion page has been edited in X days.
    • Ignore discussion pages and only assess the content page. I think this is my preference.
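On the validation point above, a strict parser for the days field could look roughly like this. The name `parse_days` and the upper bound of 36500 are arbitrary choices for the sketch, not anything the tool defines.

```python
def parse_days(raw, maximum=36500):
    """Return `raw` as a positive whole number of days, or raise ValueError.

    `maximum` is an arbitrary illustrative cap (about 100 years).
    """
    text = raw.strip()
    # isascii() plus isdigit() rejects signs, decimals, letters, and emoji.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("Enter a whole number of days.")
    days = int(text)
    if not 1 <= days <= maximum:
        raise ValueError(f"Enter a number between 1 and {maximum}.")
    return days


for sample in ["30", "-1", "abc", "0.5", "⌚️"]:
    try:
        print(repr(sample), "->", parse_days(sample))
    except ValueError as error:
        print(repr(sample), "-> rejected:", error)
```

Only the first sample is accepted; the other four are the inputs reported above as wrongly allowed.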

Chlod and I talked about this some more on Discord and clarified what we think the best approach is. In short, we think we should give users the option of having page-related criteria apply to both subject and discussion page, or just the subject page. I made a mock of what a setting could look like that applies to all page-related criteria (e.g. page edited in X days, page is deleted):

Screenshot 2025-09-20 at 20.29.46.png (1,982×992 px, 282 KB)

Would love to hear what subscribers on this task think of this approach.
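To make the two modes concrete, here is a small sketch of how a "subject page only" versus "subject and discussion page" setting might be evaluated. It relies on the MediaWiki convention that a talk namespace ID is its subject namespace ID plus one (even IDs are subject namespaces, odd IDs are talk namespaces). The function names, the `include_talk` flag, and the input shape are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def subject_namespace(ns):
    """Map a talk namespace (odd ID) to its subject namespace (even ID)."""
    return ns - (ns % 2)


def stale_pairs(pages, days, now, include_talk):
    """Return (subject_ns, title) pairs with no recent edit.

    `pages` (hypothetical shape) maps (ns, title) to the latest edit time,
    with `title` given without its namespace prefix. With include_talk=True
    a pair counts as recently edited if either its subject page or its
    discussion page was edited within `days` days.
    """
    cutoff = now - timedelta(days=days)
    latest = {}
    for (ns, title), timestamp in pages.items():
        if ns % 2 == 1 and not include_talk:
            continue  # ignore discussion pages entirely
        key = (subject_namespace(ns), title)
        if key not in latest or timestamp > latest[key]:
            latest[key] = timestamp
    return sorted(key for key, stamp in latest.items() if stamp < cutoff)


now = datetime(2025, 9, 20, tzinfo=timezone.utc)
pages = {
    (0, "Example"): datetime(2019, 1, 1, tzinfo=timezone.utc),
    (1, "Example"): datetime(2025, 9, 10, tzinfo=timezone.utc),
    (0, "Other"): datetime(2018, 5, 5, tzinfo=timezone.utc),
    (1, "Other"): datetime(2018, 6, 6, tzinfo=timezone.utc),
}
print(stale_pairs(pages, 365, now, include_talk=False))
print(stale_pairs(pages, 365, now, include_talk=True))
```

In subject-only mode both pairs are reported as stale; in the combined mode the recent edit to Talk:Example keeps the "Example" pair on the watchlist.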