
Opening Special:EditWatchlist with a large watchlist hits server timeout (Create watchlist pager)
Open, Medium, Public

Description

From T220245#5105063 (April 2019) where @kostajh wrote:

Plan as discussed with @Catrope and @SBisson:

  1. Implement a pager that respects the groupings displayed on EditWatchlist. Max number of items rendered per page would be 500.
  2. Add a namespace filter at the top of the page which defaults to "(all)"
Original 2012 report by @Nemo_bis

A user on translatewiki.net: "And I cannot now edit my watchlist due to its huge volume. The server does not allow me to do that (HTTP error 500)". From the log:
I don't know whether he tried Special:EditWatchlist or Special:EditWatchlist/raw. He also mentioned having 15,000 items in his watchlist, but that may be hyperbole (it works for me with only 3000 watchlisted pages).
Examples of errors in the logs that were possibly caused by this:
-rakkaus:#mediawiki-i18n- [20-Aug-2012 14:52:34] PHP Fatal error: Allowed memory size of 204472320 bytes exhausted (tried to allocate 32 bytes) in /www/w/includes/db/Database.php on line 1746
-rakkaus:#mediawiki-i18n- [20-Aug-2012 14:53:06] PHP Fatal error: Allowed memory size of 204472320 bytes exhausted (tried to allocate 32 bytes) in /www/w/includes/db/Database.php on line 1733
-rakkaus:#mediawiki-i18n- [20-Aug-2012 14:56:14] PHP Fatal error: Allowed memory size of 204472320 bytes exhausted (tried to allocate 83 bytes) in /www/w/includes/db/DatabaseMysql.php on line 210


Details

Reference
bz39510
Event Timeline


ab.zachaeus wrote:

I've been whittling my Commons watchlist down (originally +39K, list emptying tool failed, see #66212). I broke 36K items and raw editing works fine now. Normal edit still fails and gives 504.

For large watchlists, we should have another querying interface that lets us download the list as an external file. We should also be able to query the list, for example by date range of last edit, by the date when we started watching a page, by namespace, by filters on names, or by page category.

We should then be able to upload this edited sublist using the same filter, meaning that pages matching the filter but not in the uploaded list would be discarded from the watchlist.
The watchlist should also accept some basic level of compression, such as matching all subpages of a parent page by using the parent page followed by a final slash (meaning that this watched item matches all subpages), or by using a trailing wildcard (a basic level of regexp). That list should also be exported automatically sorted (at least lexicographically, if not in CLDR order) and automatically deduplicated.
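
To make the proposal above concrete, here is a minimal sketch in Python. It is not MediaWiki code, and the function names (`matches`, `is_watched`, `export`) are hypothetical. It assumes an entry ending in "/" matches only the subpages of that parent, an entry ending in "*" is a plain prefix match, and anything else is an exact title. Sorting is by code point only, not CLDR order.

```python
def matches(entry, title):
    """Return True if a single watchlist entry covers the given page title."""
    if entry.endswith("/"):
        # "Parent/" covers every subpage of Parent (assumed: not Parent itself).
        return title.startswith(entry)
    if entry.endswith("*"):
        # Trailing wildcard: plain prefix match on everything before the "*".
        return title.startswith(entry[:-1])
    return title == entry


def is_watched(entries, title):
    """Return True if any entry in the compressed list covers the title."""
    return any(matches(entry, title) for entry in entries)


def export(entries):
    """Export the list deduplicated and sorted by code point."""
    return sorted(set(entries))
```

With entries such as "Help:Foo/" and "Template:Info*", `is_watched` would cover "Help:Foo/Bar" and "Template:Infobox" without storing each page separately.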

If the watched page is a category, there should be an option saying whether we watch only the category, all pages listed in it, or all pages we have edited ourselves that are in that category (options can be set for each page name using special <tags> between angle brackets, since the default is one line per watched page, which currently cannot include those brackets; or even by adding these options after a "?" question mark, as the question mark is also not a possible pagename).

Possibly the lists of pages within <bracketed options> or "?query options" could be edited separately.

Finally we could also have several watchlists per user, giving them user-defined names and working more or less like personal categories. When adding a page to watch we could have a combobox to select which one to use (more or less like "labels" in Gmail and other IMAP-based webmails).

This would help split the lists and filters and manage them more easily. (It would also avoid users creating their own public categories for their own uploaded files on Commons, something that is just pollution of the common category tree...)

Why not try a new user namespace such as "user_label:<username>/<labelname>"? Each of these user labels could then have its own watch options (notably email notifications or not), and when viewing these personal pages, we could have options such as listing page edit/revert histories, reviews, new talks, changes in lists of contributors, changes in the list of watchers (only if watchers accept making public the fact that they watch it), and making the "user label" publicly viewable or private (if private, the label name would be invisible, as if it did not exist).

Finally we could have "shared user labels" (a user creates a label and authorizes other users to subscribe to it, either to watch it, to add or remove items from the list, or to propose additions/suppressions such that the owner can accept these changes with a simple click). But unlike categories, all these user labels are owned by their creator, and anyone can create them. It would be very useful for managing lists of pages in a community project (these projects may be temporary, such as collaborative maintenance work: no more need to pollute the shared public namespaces with lists of links): just start by creating your own label, add pages to it, share this list by making it publicly visible, allow users to propose additions/suppressions, and choose users that can manage it by adding/deleting items (pagenames, categories, filters).

[Separate enhancement requests should go to separate tickets.
This ticket is not about brainstorming ideas.]

It is fully related to the subject of managing large watchlists (and the fact that they are completely inefficient and don't scale correctly).
It is something to work on to REPLACE watchlists (or only support existing basic watchlists via a basic gateway).
I propose the transition to something more manageable (and more useful) because it could also solve other problems (notably for the management of content maintenance projects or quality projects).
My watchlist fills up too rapidly and I have more and more difficulty sorting the notifications it generates and organizing what to do about them, and I'm certainly not alone. They have become largely unusable (and the server itself cannot support them correctly without f).

(In reply to Philippe Verdy from comment #29)

The watchlist should also accept some basic level of compression (such as
matching all subpages of a parent page, by using the parent page followed by
a final slash, meaning that this watched item matches all subpages, or by

This is the same as bug 15072 (given the way pages are organized on Wikibooks projects).

If the watched page is a category, there should exist an option saying if we
watch only the category or all pages listed in them, or all pages we have
edited ourselves and that are in that category

See bug 1710 and bug 7148.

(In reply to Philippe Verdy from comment #30)

Finally we could also have several watchlists per user; giving to them
user-defined names and working more or less like personal categories. When
adding a page to watch we could have a combobox to select which one to use
(more or less like "labels" in Gmail and other IMAP-based webmails).

Bug 5875 or bug 20444.

(In reply to Philippe Verdy from comment #32)

It is something to work on to REPLACE watchlists (or only support existing
basic watchlists via a basic gateway).
I propose the transition to something more manageable (and more useful)
because it could also solve other problems (notably for the management of
content maintenance projects or quality projects).

Bug 33888.

(In reply to Philippe Verdy from comment #32)

It is fully related to the subject

Yes, only *related*.
Hence please see the tickets that Helder was kind enough to identify...

AFAICT, all of those enhancements are low priority, having no more than 30 votes. This is at least a normal-priority bug.

Pitke added a subscriber: Pitke.

ab.zachaeus wrote:
I've been whittling my Commons watchlist down (originally +39K, list emptying tool failed, see #66212). I broke 36K items and raw editing works fine now. Normal edit still fails and gives 504.

Normal edit has been working starting somewhere between 33.5K and 35K, but it is very, very slow, and selecting multiple items on the list increases lag greatly.

Anyone willing to clear my watchlist? 103,948 pages and counting, cannot clear it manually..

Krenair added a subscriber: Krenair. Apr 5 2016, 9:31 PM

Anyone willing to clear my watchlist? 103,948 pages and counting, cannot clear it manually..

Yes, but I need to know which wiki you're requesting this on.

Commons.wikimedia.org please

(per chat on IRC, cleared watchlist for Riley Huntley @ commonswiki)

Krenair removed Riley_Huntley as the assignee of this task. Apr 5 2016, 9:42 PM
Restricted Application added a project: Growth-Team. May 28 2019, 3:37 PM
JTannerWMF closed this task as Resolved. May 28 2019, 6:02 PM
JTannerWMF claimed this task.
JTannerWMF added a subscriber: JTannerWMF.

If this is still happening please reopen.

matmarex reopened this task as Open. May 28 2019, 6:43 PM
matmarex removed JTannerWMF as the assignee of this task.
matmarex added a subscriber: kostajh.

This is most likely still a problem, I don't think we've made any changes that would fix it. (Although software and hardware updates that happened since 2012, when this task was filed, probably increased the watchlist size that can be handled without hitting limits.)

There's a recently filed task T220245, which is a similar issue (although possibly triggered by a recent code change, the root cause is the same – trying to handle thousands of entries at once exceeds time/memory limits). It actually has a patch pending by @kostajh that should fix this task too! Let me connect it.

Change 505784 had a related patch set uploaded (by Bartosz Dziewoński; owner: Kosta Harlan):
[mediawiki/core@master] Introduce alphabetic pager for Special:EditWatchlist

https://gerrit.wikimedia.org/r/505784

Verdy_p added a comment. Edited Sep 5 2019, 8:20 AM

For now this old bug is still present on various wikis, even those running the most recent versions of MediaWiki, and it is more serious on wikis running on smaller servers with more limited CPU/memory resources: the HTTP error 500 occurs even when we ask to purge the list completely (and nothing is purged at all).

You closed the topic about translatewiki.net, but that wiki is still affected today!

All that can be done, still, is to use the list editor in raw mode (from the user's preferences panel). Even then, when we submit a list (sorted and filtered using an external text editor) that contains over ~5000 items, we still get an HTTP error 500. If we reload the raw list editor, sometimes nothing has been recorded (the old list is still present), sometimes the list has been saved correctly, and in rare cases the recorded list has been arbitrarily truncated. So the raw list editor is still very unreliable.

MediaWiki should include an option to automatically sort the list by full page name, should allow editing it in groups (by selected namespaces) or by page (200 items per page, as in categories), and should include a search filter (at least by prefix or suffix). Most filters applicable to normal page searches should exist when viewing/editing the user's tracking list. Additional options should include a way to automatically select items whose last edit by the user predates a given period (but this may require the tracking list to keep a record of these last edit times).

As these lists are not easily editable, or are impossible to manage, many users have long since abandoned the idea of purging them selectively. The lists stored per user must now take considerable space on the servers and are probably a severe drain on storage resources for the most active users (and if these lists have their own histories, for some users this could easily take gigabytes of storage for the thousands of versions created by incrementally adding each item or trying to manage them). These users may receive frequent notifications but have learnt to manage them, probably by opting out of receiving such notifications by email because of their volume and frequency in their mailbox, where their ISP may consider them to be spam. Some users may even have chosen to no longer contribute to wikis if they are notified too frequently, or if their mailbox is constantly full or unusable for receiving other mail. Others may not notice that non-notification mails written by humans from the wiki have arrived, and so will not be able to reply to them in a reasonable time, or may drop these mails too easily within a huge flow of notifications.

So this is a really serious problem that undermines the wiki projects in multiple respects.
Allowing users to manage their tracking list efficiently and more selectively is a real need.

kostajh moved this task from Inbox to Q2 2019-20 on the Growth-Team board. Sep 5 2019, 2:37 PM

T220245 should fix this problem. Shall we merge them? (Moving this one to Q2, in case T220245 isn't resolved, so we don't forget about this.)

Teles added a subscriber: Teles. Sep 8 2019, 4:26 PM
This comment was removed by Teles.
Teles added a comment. Sep 8 2019, 4:32 PM

I was trying to edit my watchlist on pt.wikipedia. It has more than 27k pages. I received this error message:
Request from 2804:14d:72b3:8094:4df3:6ecc:9539:aeed via cp1075.eqiad.wmnet, ATS/8.0.5
12:48:27 Error: 504, Connection Timed Out at 2019-09-08 15:40:37 GMT

Daimona added a subscriber: Daimona. Sep 8 2019, 5:03 PM

I was trying to edit my watchlist on pt.wikipedia. It has more than 27k pages. I received this error message:
Request from 2804:14d:72b3:8094:4df3:6ecc:9539:aeed via cp1075.eqiad.wmnet, ATS/8.0.5
12:48:27 Error: 504, Connection Timed Out at 2019-09-08 15:40:37 GMT

See logs snapshot. That's telling us the resultset is too big...

Krinkle updated the task description. Sep 8 2019, 6:37 PM
Krinkle renamed this task from Editing large (10k+) watchlist [Special:EditWatchlist but not /raw] fails with HTTP error 500 (fatal error, OOM) to Editing large watchlists via Special:EditWatchlist fails due to a server timeout. Sep 8 2019, 6:42 PM
Krinkle updated the task description.
Krinkle edited subscribers, added: SBisson, Catrope; removed: wikibugs-l-list.
Krinkle added a subscriber: Krinkle. Sep 8 2019, 6:46 PM

Plan as discussed with @Catrope and @SBisson:

  1. Implement a pager that respects the groupings displayed on EditWatchlist. Max number of items rendered per page would be 500.
  2. Add a namespace filter at the top of the page which defaults to "(all)"

The solution proposed in the patch for this task is to implement an alphabetic pager for Special:EditWatchlist. The pager will respect the namespace groupings currently on EditWatchlist, but the maximum number of items rendered per page will be 500 (as opposed to the unlimited number currently). Also, the TOC at the top of EditWatchlist is replaced with a namespace filter, which defaults to (all).
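
As a rough illustration of the behaviour described above (not the code in the Gerrit patch), the following Python sketch pages an in-memory list of (namespace id, title) pairs. The function name `render_page` and the data shape are assumptions made for this example. Only the 500-item cap, the namespace grouping, and the namespace filter defaulting to "(all)" come from the plan.

```python
from itertools import groupby

PAGE_SIZE = 500  # per-page cap named in the plan


def render_page(items, page, namespace=None, page_size=PAGE_SIZE):
    """Return one page of watchlist items grouped by namespace.

    items: iterable of (namespace_id, title) pairs (hypothetical shape).
    page: zero-based page number.
    namespace: None stands for the "(all)" default; otherwise a namespace id.
    Result: list of (namespace_id, [titles]) in namespace, then title order.
    """
    if namespace is not None:
        items = [item for item in items if item[0] == namespace]
    # Sorting by (namespace, title) keeps each namespace's items contiguous.
    ordered = sorted(items)
    start = page * page_size
    chunk = ordered[start:start + page_size]
    # A namespace that straddles a page boundary appears on both pages.
    return [
        (ns, [title for _, title in group])
        for ns, group in groupby(chunk, key=lambda item: item[0])
    ]
```

Because the slice is taken after sorting, the grouping headers on each page only cover the items actually rendered on that page.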

Qgil removed a subscriber: Qgil. Sep 9 2019, 10:18 AM
kostajh claimed this task. Sep 18 2019, 10:04 AM

@eprodromou can you please weigh in on this comment before I rework the patch?

kostajh changed the task status from Open to Stalled. Oct 15 2019, 12:35 PM
kostajh removed kostajh as the assignee of this task.
kostajh edited projects, added Growth-Team; removed Growth-Team (Current Sprint).

Marking as stalled pending T41510#5506021, and moving out of Growth-Team's current sprint as we are working on other things at the moment. I've also unassigned myself, if someone else wants to pick this up please do, otherwise I will come back to it later this year.

Krinkle edited projects, added Core Platform Team; removed Patch-For-Review.

Re-triaging in CPT inbox per the below question:

@eprodromou can you please weigh in on this comment before I rework the patch?

@eprodromou can you please weigh in on this comment before I rework the patch?

Sorry I missed this. I'll review tomorrow and get a response.

OK, I had a talk with Brad about this. I think from the CPT side, it's best to order the pages by namespace then title, because our database indexes are set up to do that query efficiently, and the page won't work very well without the indexes.
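
A minimal sketch of what ordering by namespace then title allows, using Python's sqlite3 module rather than MediaWiki's database layer. The table and column names are loosely modeled on a watchlist table and should be read as assumptions for this example, as should the helper names `make_db`, `fetch_page` and `iter_pages`. Each page resumes from the last (namespace, title) pair seen, so the query can follow an index on (user, namespace, title) instead of scanning and skipping earlier rows.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory table with an index on (user, namespace, title)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE watchlist "
        "(wl_user INTEGER, wl_namespace INTEGER, wl_title TEXT)"
    )
    db.execute(
        "CREATE UNIQUE INDEX wl_user_ns_title "
        "ON watchlist (wl_user, wl_namespace, wl_title)"
    )
    db.executemany("INSERT INTO watchlist VALUES (?, ?, ?)", rows)
    return db


def fetch_page(db, user, limit, after=None):
    """Return up to `limit` (namespace, title) rows that sort after `after`."""
    if after is None:
        sql = (
            "SELECT wl_namespace, wl_title FROM watchlist WHERE wl_user = ? "
            "ORDER BY wl_namespace, wl_title LIMIT ?"
        )
        params = (user, limit)
    else:
        ns, title = after
        sql = (
            "SELECT wl_namespace, wl_title FROM watchlist WHERE wl_user = ? "
            "AND (wl_namespace > ? OR (wl_namespace = ? AND wl_title > ?)) "
            "ORDER BY wl_namespace, wl_title LIMIT ?"
        )
        params = (user, ns, ns, title, limit)
    return db.execute(sql, params).fetchall()


def iter_pages(db, user, limit=500):
    """Yield successive pages, using the last row of each page as the cursor."""
    after = None
    while True:
        page = fetch_page(db, user, limit, after)
        if not page:
            return
        yield page
        after = page[-1]
```

Only `limit` rows are held in memory at a time, which is the property the pager needs for watchlists with tens of thousands of entries.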

@kostajh if that works for you, let me know. Otherwise, I'll see if we can be helpful in other ways.

kostajh changed the task status from Stalled to Open. Nov 19 2019, 8:03 PM

sounds good, thanks @eprodromou

CCicalese_WMF added a subscriber: CCicalese_WMF.

Let us know if you need any future review from CPT.

Krinkle renamed this task from Editing large watchlists via Special:EditWatchlist fails due to a server timeout to Editing large watchlists via Special:EditWatchlist fails due to server timeout (Create watchlist pager). Sat, Jan 11, 6:58 PM
Krinkle renamed this task from Editing large watchlists via Special:EditWatchlist fails due to server timeout (Create watchlist pager) to Opening Special:EditWatchlist with a large watchlist hits server timeout (Create watchlist pager). Wed, Jan 15, 3:11 AM