Ignore auto-watchlist preferences for bots
Closed, Resolved · Public

Description

See also T258098 and T252812

Problem

The DBAs have expressed concerns about the ever-growing size of the watchlist table. In Community-Tech's research for Expiring-Watchlist-Items, we discovered that the accounts with the most watched items are almost always bots, apparently only because the "add pages and files I create to my watchlist" preference is on by default. It's not surprising that bots create so many pages, since they naturally operate at a higher rate than humans.

Proposed solution

Ignore auto-watch preferences when an account belongs to a user group with the bot permission. This seems rather uncontroversial, as bots conceivably have little use for the watchlist. If they do need it for whatever reason, we need to hear the use case. We should solicit input from the community before making any decisions.
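
To illustrate the proposal, here is a minimal Python sketch of the check. It is not MediaWiki's actual code: the `User` class and `should_auto_watch` function are hypothetical, and the preference names are used only as labels for the auto-watch options.

```python
from dataclasses import dataclass, field

# Labels for the auto-watch options (illustrative; not an exhaustive list).
AUTO_WATCH_PREFS = {"watchcreations", "watchdefault", "watchmoves", "watchuploads"}


@dataclass
class User:
    """Hypothetical stand-in for a wiki account."""
    name: str
    rights: set = field(default_factory=set)
    prefs: dict = field(default_factory=dict)


def should_auto_watch(user: User, pref: str) -> bool:
    """Return True if an action governed by `pref` should add the page to the watchlist."""
    if pref not in AUTO_WATCH_PREFS:
        raise ValueError(f"unknown auto-watch preference: {pref}")
    # Accounts with the bot right never auto-watch, whatever the stored preference says.
    if "bot" in user.rights:
        return False
    return bool(user.prefs.get(pref, False))


bot = User("ExampleBot", rights={"bot"}, prefs={"watchcreations": True})
human = User("Example", prefs={"watchcreations": True})
print(should_auto_watch(bot, "watchcreations"))    # False
print(should_auto_watch(human, "watchcreations"))  # True
```

The stored preference is left untouched in this sketch, so it would take effect again if the bot right were ever removed from the account.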

Some quick data

For perspective:

  • commonswiki: 14% (~22 million rows) of the watchlist table is attributable to bots, apparently ones that upload files and simply have the default preference set to watch them
  • wikidatawiki: 1.1% (~91 million rows) -- probably bots that automatically create items after articles on Wikipedia are created
  • enwiki: 4.7% (~10.3 million rows) -- e.g. counter-vandalism bots that create User talk pages when issuing warnings
  • mgwiktionary: 99.8% (~13.2 million rows) -- a single bot mass-created nearly every entry on the wiki
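
The task does not say how these figures were produced. As a rough sketch only, a bot share like the ones above could be computed by counting watchlist rows whose owner is in the bot group. The example below runs against an in-memory SQLite database with a simplified schema and made-up rows; the column names follow MediaWiki's `watchlist` and `user_groups` tables, and `bot_watchlist_share` is a hypothetical helper.

```python
import sqlite3


def bot_watchlist_share(conn: sqlite3.Connection) -> float:
    """Return the fraction of watchlist rows owned by users in the 'bot' group."""
    total = conn.execute("SELECT COUNT(*) FROM watchlist").fetchone()[0]
    bot_rows = conn.execute(
        """
        SELECT COUNT(*) FROM watchlist
        WHERE wl_user IN (SELECT ug_user FROM user_groups WHERE ug_group = 'bot')
        """
    ).fetchone()[0]
    return bot_rows / total if total else 0.0


# Toy data (hypothetical): user 1 is a bot watching 3 pages, user 2 is a human watching 1.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE watchlist (wl_user INTEGER, wl_namespace INTEGER, wl_title TEXT);
    CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
    INSERT INTO user_groups VALUES (1, 'bot');
    INSERT INTO watchlist VALUES (1, 0, 'A'), (1, 0, 'B'), (1, 6, 'C.jpg'), (2, 0, 'A');
    """
)
share = bot_watchlist_share(conn)
print(f"{share:.0%} of watchlist rows belong to bots")  # 75% with the toy data
```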

Event Timeline

Restricted Application added a subscriber: Aklapper.

It might be nice to prune some of the existing bot entry rows too. There's precedent for this sort of action, see T184485

That is T258098: Purge unused watchlist rows
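
For illustration only, pruning existing bot rows could be done in small batches so that no single statement deletes millions of rows at once. This is a sketch against an in-memory SQLite database with a simplified schema, not the approach taken in T258098; `purge_bot_watchlist_rows` and the batch size are assumptions.

```python
import sqlite3


def purge_bot_watchlist_rows(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Delete watchlist rows owned by bot users in batches; return the number deleted."""
    deleted = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM watchlist WHERE rowid IN (
                SELECT w.rowid FROM watchlist AS w
                JOIN user_groups AS g ON g.ug_user = w.wl_user
                WHERE g.ug_group = 'bot'
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # Commit each batch so every transaction stays short.
        if cur.rowcount == 0:
            break
        deleted += cur.rowcount
    return deleted


# Toy data (hypothetical): user 1 is a bot with 5 rows, user 2 is a human with 2 rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE watchlist (wl_user INTEGER, wl_namespace INTEGER, wl_title TEXT);
    CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
    INSERT INTO user_groups VALUES (1, 'bot');
    INSERT INTO watchlist VALUES
        (1, 0, 'A'), (1, 0, 'B'), (1, 0, 'C'), (1, 0, 'D'), (1, 0, 'E'),
        (2, 0, 'A'), (2, 0, 'B');
    """
)
removed = purge_bot_watchlist_rows(conn, batch_size=2)
remaining = conn.execute("SELECT COUNT(*) FROM watchlist").fetchone()[0]
print(removed, remaining)  # 5 2
```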

It would make more sense to me to ignore the user preference when the user has the bot right, rather than to change the preference in the DB whenever the group changes.

What if there is a use-case for bots to utilize the watchlist? I can't think of any examples, but I didn't want to rule out that possibility.

Then we would break that use case. The point of the bug is to rectify a performance problem. Someone wanting it is not enough; the value has to justify the costs.

In T270481 (ClearUserWatchlistJob/WatchedItemStore::removeWatchBatchForUser bad database performance on enwiki and others, causing database lag), it is thought that this issue might have contributed to the problems seen there.
Could this be prioritized so it gets some love soon?

Thanks

MusikAnimal renamed this task from Turn off auto-watchlist preferences for bots to Ignore auto-watchlist preferences for bots. Dec 19 2020, 9:28 AM
MusikAnimal updated the task description.

Change 650784 had a related patch set uploaded (by Umherirrender; owner: Umherirrender):
[mediawiki/core@master] [API] Ignore watchlist preferences for bot users

https://gerrit.wikimedia.org/r/650784

Change 650784 merged by jenkins-bot:
[mediawiki/core@master] [API] Ignore watchlist preferences for bot users

https://gerrit.wikimedia.org/r/650784

JJMC89 assigned this task to Umherirrender.

Should this change be added to RELEASE_NOTES to inform third-party users?

Thanks again to Umherirrender for the patch! I think the solution of ignoring the watch preference for bots, rather than not allowing watching at all, just warrants a note in Tech News. Am I correct that the next deployment train goes out January 5? It would seem the first Tech News of the year isn't until January 11. The chance that any bots are actually relying on the watch preference is likely very slim, so I suppose it's OK to be a little late on getting the word out.

Should this change be added to RELEASE_NOTES to inform third-party users?

I would think so, yes. We also probably need to update some documentation on mediawiki.org. I'll try to help with that.

Change 653188 had a related patch set uploaded (by MusikAnimal; owner: MusikAnimal):
[mediawiki/core@master] RELEASE-NOTES: Note that watchlist prefs for bots are ignored in the API

https://gerrit.wikimedia.org/r/653188

Change 653188 merged by jenkins-bot:
[mediawiki/core@master] RELEASE-NOTES: Note that watchlist prefs for bots are ignored in the API

https://gerrit.wikimedia.org/r/653188

Wouldn't the best fix be to make the processes scalable? I'm amazed that 100 million rows here or there cause a problem.