Editing large (10k+) watchlist [Special:EditWatchlist but not /raw] fails with HTTP error 500 (fatal error, OOM)
Open, Public

Description

A user on translatewiki.net wrote: "And I cannot now edit my watchlist due to its huge volume. The server does not allow me to do that (HTTP error 500)".
I don't know whether he tried Special:EditWatchlist or Special:EditWatchlist/raw. He also mentioned having 15,000 items in his watchlist, but that may be hyperbole (it works for me with only 3000 watched pages).

Examples of errors in the logs that were possibly caused by this:

-rakkaus:#mediawiki-i18n- [20-Aug-2012 14:52:34] PHP Fatal error: Allowed memory size of 204472320 bytes exhausted (tried to allocate 32 bytes) in /www/w/includes/db/Database.php on line 1746

-rakkaus:#mediawiki-i18n- [20-Aug-2012 14:53:06] PHP Fatal error: Allowed memory size of 204472320 bytes exhausted (tried to allocate 32 bytes) in /www/w/includes/db/Database.php on line 1733

-rakkaus:#mediawiki-i18n- [20-Aug-2012 14:56:14] PHP Fatal error: Allowed memory size of 204472320 bytes exhausted (tried to allocate 83 bytes) in /www/w/includes/db/DatabaseMysql.php on line 210


Version: 1.22.0
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=20483
https://bugzilla.wikimedia.org/show_bug.cgi?id=66212

bzimport added a project: MediaWiki-Watchlist. Via Conduit, Nov 22 2014, 1:00 AM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz39510.
Nemo_bis created this task. Via Legacy, Aug 20 2012, 5:28 PM
bzimport added a comment. Via Conduit, Aug 20 2012, 7:41 PM

verdy_p wrote:

I can reply: I used the normal Watchlist page, which causes the error 500.

Editing the raw list worked for me. I sorted it in an external editor, deleting duplicate entries plus some entries that I have NEVER subscribed to myself, notably the "Support" page, which I did not want to subscribe to and which suddenly started to flood me (it started when I had not even visited the site for a long time).

Apparently someone added the Support page to my watchlist, or it was a side effect of a recent upgrade with necessary maintenance (and an unreasonable default) of LQT on the TranslateWiki site. Tons of threads were then added to it implicitly, by a cascading effect of the LQT extension. I have removed this "Support" page, which is used implicitly as a base for most questions asked by users there; in my opinion, moving all support questions under the same base page is causing this.

Maybe the subscription to this page was caused by someone moving or redirecting one of my past pages (posted elsewhere) into the "Support" page: instead of moving the watched item only to its new Thread:* subpage, the full "Support" page is added. The tool that allows the admin to move old Talk pages, when they are already watched, into a new LQT thread on Support is probably causing the issue.

Still, even though my watchlist is now cleaned up a bit and is editable in raw mode, just trying to view the Special:Watchlist page in normal mode generates an HTTP error 500.

Note: I have contributed a lot of translations (more than 5000 for MediaWiki alone, and many more in other translation projects), so most French translation subpages ("<project>:<ResourceName>/fr") are in the watchlist. It has never flooded me like this before.

bzimport added a comment. Via Conduit, Aug 20 2012, 7:48 PM

verdy_p wrote:

And no, this is not hyperbole: after cleanup, I still have about 9600 items watched. Most of them are for pages "<projectname>:<resource>/fr".

I was complaining because I started to receive notifications of LQT messages about ALL translations in ALL languages of ALL projects sent by ALL users, even those I had not even participated in, including every time the TranslateWiki admin moved questions from sections of some talk pages into new LQT threads.

Nemo_bis added a comment. Via Conduit, Aug 20 2012, 7:57 PM

Thank you, I've summarised the info in the summary.

bzimport added a comment. Via Conduit, Aug 20 2012, 9:07 PM

verdy_p wrote:

One additional thing: if our watchlist is too large to allow a nice presentation in HTML mode with simple clicks, you should detect that the maximum size of the generated HTML or wiki text has been reached, making it impossible to render that way.

Instead of producing an error 500, you should catch the error and display a page in raw mode (where the content of the watchlist just appears unparsed in a single text within a basic text input box, as a single HTML element, exactly like with the old format).
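A minimal sketch of that fallback, written in Python for brevity. A memory-exhaustion fatal error generally cannot be caught once it has happened, so the sketch decides up front from the item count instead. The threshold value and the function names are hypothetical and not part of MediaWiki.

```python
RAW_FALLBACK_THRESHOLD = 5000  # hypothetical limit, to be tuned per wiki


def choose_edit_mode(item_count, requested_mode="normal",
                     threshold=RAW_FALLBACK_THRESHOLD):
    """Return "raw" when the list is too large for the normal (HTML) editor."""
    if requested_mode == "raw" or item_count > threshold:
        return "raw"
    return "normal"


def render_raw(titles):
    """Render the watchlist as plain text, one title per line, for a textarea."""
    return "\n".join(titles)
```

With these assumptions, a watchlist of 9600 items would be served in raw mode even when the normal editor was requested, instead of ending in an error 500.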

This bug is not specific to TranslateWiki. In fact this happens now as well in Wikipedia (where I also have lots of pages watched, but still a very small amount of notifications).

About the origin of the bug: the new version of LQT installed on TranslateWiki does not interpret our watchlist correctly. When parsing it:

  • it discards the "Talk" part of the namespace
  • it discards the specification of the subpage (containing a language restriction)
  • it follows back the redirections that may have been created by moving sections of text of a Talk page into the central "Support" page. Those redirects are apparently confusing it because it has the equivalent of subscribing us to EVERYTHING that is linked to the Support page (almost all discussions made using LQT threads in TranslateWiki, by everyone, in any language and in all project namespaces and ignoring the difference between the project's article space and the project's talk space).

As a consequence we get flooded even if we've NOT watched the "Support" page directly on TranslateWiki.

There, we were taught to check if the "Support" page (in the main namespace) was watched. It was not initially, but it was added at some point during maintenance of the site, when old talk pages were redirected or converted to LQT threads instead of sections.

That's when I tried to edit the watchlist (which now fails miserably) and had to use a manual edit in "raw" mode. However, the raw mode is not linked anywhere on the site and is not documented. The raw mode should still be the default if the new cooked mode cannot be generated.

matmarex added a comment. Via Conduit, May 11 2013, 10:47 AM

Reported on pl.wp as well: https://pl.wikipedia.org/wiki/Wikipedia:Kawiarenka/Kwestie_techniczne#Edycja_listy_obserwowanych . Raising severity.

See also bug 20483 about unpaginated watchlist, and the patch for it - I11451c33.

bzimport added a comment. Via Conduit, May 13 2013, 6:59 AM

winne2i wrote:

@up: P.S. The user states that he has 32.029 items on his watchlist and he cannot edit the list even in raw mode.

Nemo_bis added a comment. Via Conduit, May 13 2013, 8:28 AM

(In reply to comment #7)

@up: P.S. The user states that he has 32.029 items on his watchlist and he
cannot edit the list even in raw mode.

Ah. That's unfortunate. I assume the watch/unwatch buttons work, though (otherwise it would be a deeper bug than this)?

I see the user does a lot of ns0 copyediting, so there's no easy bulk of pages to watch, still as a workaround someone could make him a script to list the whole watchlist and clear it with [[mw:API:Watchlistraw]] / [[mw:API:Watch]], then the usual special page could be used to restore the desired part of the watchlist...
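A rough sketch of what such a script might look like, in Python. It only builds the request parameters for [[mw:API:Watchlistraw]] and [[mw:API:Watch]] and batches the titles; logging in, fetching a watch token and sending the requests are left out. The helper names and the batch size of 50 are assumptions, and the parameter names follow my reading of the API documentation, so they should be checked against the API version of the target wiki.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def watchlistraw_params(continuation=None, limit=500):
    """Parameters for one list=watchlistraw request (one page of the list)."""
    params = {
        "action": "query",
        "list": "watchlistraw",
        "wrlimit": str(limit),
        "format": "json",
    }
    if continuation:
        # Continuation values returned by the previous response, passed back as-is.
        params.update(continuation)
    return params


def unwatch_params(titles, token):
    """Parameters for one action=watch request that unwatches a batch of titles."""
    return {
        "action": "watch",
        "unwatch": "1",
        "titles": "|".join(titles),
        "token": token,
        "format": "json",
    }


def unwatch_batches(all_titles, token, batch_size=50):
    """Build one unwatch request per batch, so no single request is huge."""
    return [unwatch_params(batch, token) for batch in chunked(all_titles, batch_size)]
```

Saving the full list of titles to a file before sending any unwatch request would let the desired part be restored afterwards through the usual special page.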

duplicatebug added a comment. Via Conduit, Jun 12 2013, 6:12 PM
  • Bug 49483 has been marked as a duplicate of this bug.
bzimport added a comment. Via Conduit, Jun 12 2013, 11:11 PM

dcduring wrote:

Simply wiping out the list is not good enough. Nor are most conceivable ways of bulk-reducing it (e.g., by namespace, or by limiting to ASCII characters). If the normal edit processes don't work for large watchlists, then some kind of offline editing and reloading of the watchlist would be required.

The effective limit on watchlist size at English Wiktionary is about 20K entries. We have one contributor with 150K, another with 50K items and others with counts exceeding 20K. With 3.5M entries we really need to have large watchlists to avoid overloading our patrollers or letting many low-quality changes through. It is not realistic to have users watch or unwatch each page they contribute to individually. Users either watch every page they edit by default or virtually no page they edit. As the list presented for editing is alphabetical within each namespace, reviewing it manually as it approaches 20K items means reviewing not just recent additions (since the last review), but the whole list. Going through one's recent contributions to unwatch items one at a time is also quite slow.

bzimport added a comment. Via Conduit, Jun 12 2013, 11:34 PM

verdy_p wrote:

Watchlists stored and managed like large text files are just a bad solution.
A better data model where the list will just be indexed pointers linking a user account and an article, in such a way that it can be searched and edited within subselections with the integration of a search feature, will be more useful.
We should be able for example to query the list of articles we watch that are linked to or from some page or category and we should be able, when searching in articles in Wikipedia, to see immediately in the found results those pages that we have contributed in the past, without having to watch or unwatch them.
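As an illustration of that kind of model, here is a minimal sketch using Python's built-in sqlite3 module: one indexed row per (user, namespace, title), queried one page at a time so that no request has to load the whole list. The table and function names are hypothetical and are not MediaWiki's actual schema.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one row per watched page."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE watch ("
        " user_id INTEGER NOT NULL,"
        " namespace INTEGER NOT NULL,"
        " title TEXT NOT NULL,"
        " PRIMARY KEY (user_id, namespace, title))"
    )
    return db


def watch(db, user_id, namespace, title):
    """Add a page to a user's watchlist; duplicates are silently ignored."""
    db.execute(
        "INSERT OR IGNORE INTO watch VALUES (?, ?, ?)",
        (user_id, namespace, title),
    )


def watched_page(db, user_id, namespace, after="", limit=100):
    """Return up to `limit` watched titles in one namespace, sorted, after `after`."""
    rows = db.execute(
        "SELECT title FROM watch"
        " WHERE user_id = ? AND namespace = ? AND title > ?"
        " ORDER BY title LIMIT ?",
        (user_id, namespace, after, limit),
    )
    return [row[0] for row in rows]
```

Passing the last title of one page as `after` for the next call walks through the list in fixed-size steps, whatever its total size.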

For this we already have the history list for every page. This history already links the user, the page, the date of edit and its version number. It could as well contain a simple flag for watched items. So instead of watching pages, we would watch a list of the last versions we have edited in the past. And we should be immediately offered a way to compare versions of a page since our last edit on it, using the existing diff tool.

So integrate these watch lists within the history list of pages, and forget the large watch list themselves.

We should also be allowed to query watched pages by the time of their edit or last time we clicked on the "watch" button.

We should have an interface showing a calendar of our own activities.

bzimport added a comment. Via Conduit, Jun 12 2013, 11:49 PM

dcduring wrote:

I could support that if it also allowed the user to filter out titles that include any character outside a given Unicode range. At Wiktionary, we have page names in many different character sets.

Pending the arrival of the UI of our dreams, perhaps by early 2014, I would like users with very large watchlists not to be prevented from seeing more than the last few hours' worth of changes to pages in their watchlist.

matmarex added a comment. Via Conduit, Jun 13 2013, 12:02 AM

(In reply to comment #12)

Pending the arrival of the UI of our dreams, perhaps by early 2014,

Don't be so pessimistic, this bug already has a patch (I11451c33), which should at least allow you to view and edit the watchlist contents.

Nemo_bis added a comment. Via Conduit, Jun 13 2013, 5:14 AM

(In reply to comment #10)

The effective limit on watchlist size at English Wiktionary is about 20K
entries. We have one contributor with 150K, another with 50K items and others
with counts exceeding 20K.

Just to clarify: Special:Watchlist *does* load for them, doesn't it? Otherwise it would be an additional issue.

bzimport added a comment. Via Conduit, Jun 13 2013, 12:59 PM

dcduring wrote:

For the guy with the 150K list, apparently it didn't load until he reduced the number of items displayed, by limiting the time period (or, possibly, the maximum number of items). That is the only time I have heard of such a problem at en.wikt as long as I've been there (late 2007) and paying attention to GP matters (2008).

The inability to edit one's watchlist has been mentioned two or three times, with two or three of the heaviest users acknowledging that they'd experienced the issue each time. I solved my personal problem with it by no longer using the default option of watching all items I edit. That means I rarely add principal namespace pages to my watchlist. I find that quite unsatisfactory from the perspective of the project as a whole or of patrollers.

bzimport added a comment. Via Conduit, Jun 13 2013, 1:16 PM

dcduring wrote:

(In reply to comment #13)

(In reply to comment #12)
> Pending the arrival of the UI of our dreams, perhaps by early 2014,

Don't be so pessimistic, this bug already has a patch (I11451c33), which
should
at least allow you to view and edit the watchlist contents.

When might that patch be deployed? What do we have to do to get it deployed?

Nemo_bis added a comment. Via Conduit, Jun 13 2013, 1:26 PM

(In reply to comment #16)

When might that patch be deployed? What do we have to do to get it deployed?

You have to hunt for a developer working on it. :) It has some issues, but the original author has been inactive for a while.

bzimport added a comment. Via Conduit, Jun 13 2013, 4:38 PM

dcduring wrote:

Does the existence of this not-yet-ready-for-primetime patch mean that no quick-and-dirty patch can be expected? Does this mean that the UI of my dreams (See comments 11 and 12 above) will be pushed back from early 2014?

Nemo_bis added a comment. Via Conduit, Jun 13 2013, 4:40 PM

(In reply to comment #18)

Does the existence of this not-yet-ready-for-primetime patch mean that no
quick-and-dirty patch can be expected? Does this mean that the UI of my
dreams
(See comments 11 and 12 above) will be pushed back from early 2014?

Yes, I suggest not to have hopes.

bzimport added a comment. Via Conduit, Jul 25 2013, 2:06 PM

verdy_p wrote:

Could we edit the watchlist per namespace?

This would facilitate the cleanup a lot. Notably, it's true that the main space has little interest for the watchlist, when we expect to follow more sensible things like templates, modules, personal pages, and some specialized subproject pages with few contributors (mostly technical edits there, not much editorial info, since most useful info should go to the main articles).

In the main namespaces, some pages may still be followed: pages containing lists, or pages detailing some formulas, or containing data (but most data should now go to Wikidata, and is more easily watched using its relational features, acting like complex categories with multiple search axes).

We have other tools for monitoring the main namespace, notably the QA evaluation tools.

bzimport added a comment. Via Conduit, Jul 25 2013, 4:33 PM

dcduring wrote:

(In reply to comment #20)

Could we edit the watchlist per namespace?

This would facilitate the cleanup a lot (notably it's true that the main space
has little interest for the watchlist, when we expect to follow more sensible
things like templates, modules, personal pages, and some specialized subproject
pages with few contributors (mostly technical edits there, not much editorial
info there when most useful info should go to the main articles).

In the main namespaces, some pages may still be followed: pages containing
lists, or pages detailing some formulas, or containing data (but most data
should go now to Wikidata, and is more easily watched using its relational
features, acting like complex categories with multiple search axis).

We have other tools for monitoring the main namespace, notably the QA
evaluation tools.

This seems to be an exclusively WP-oriented comment. At English Wiktionary we have vandalism problems and difficulties with PoV-pushing editors which we address in part by having large watchlists, e.g., as many as 150K pages, most of which are in the principal namespace. If we had tools such as language-section-specific "recent changes" and could look back a year or more rather than 30 days on such language-section-specific recent changes, we could also be dismissive of the expressed concerns.

matmarex added a comment. Via Conduit, Jul 25 2013, 5:17 PM

(In reply to comment #20)

Could we edit the watchlist per namespace?

That would be cumbersome and probably wouldn't help to relieve the issue caused by this bug anyway. File it as a separate bug if you feel strongly about it.

(In reply to comment #21)

If we had tools such as
language-section-specific "recent changes" and could look back a year or more
rather than 30 days on such language-section-specific recent changes we could
also be dismissive of the expressed concerns.

The language-section-specific "recent changes" you describe seem quite hard to implement "correctly", but should be possible with a gadget filtering recent changes on edit summaries.

The length of time that recent changes entries are kept is configurable per-wiki (for example translatewiki.net has it set to 5 years IIRC), and technically it's even possible to rebuild entries which expired and have been removed if the time limit is raised (but I'm not sure if that would actually work on large wikis). File a bug under "Wikimedia -> Site requests" if you want this changed for your wiki.

bzimport added a comment. Via Conduit, Jul 25 2013, 7:26 PM

dcduring wrote:

(In reply to comment #22)

(In reply to comment #20)
> Could we edit the watchlist per namespace?
>
> …

That would be cumbersome and probably wouldn't help to relieve the issue
caused
by this bug anyway. File it as a separate bug if you feel strongly about it.

(In reply to comment #21)
> If we had tools such as
> language-section-specific "recent changes" and could look back a year or more
> rather than 30 days on such language-section-specific recent changes we could
> also be dismissive of the expressed concerns.

The language-section-specific "recent changes" you describe seem quite hard
to
implement "correctly", but should be possible with a gadget filtering recent
changes on edit summaries.

The length of time that recent changes entries are kept is configurable
per-wiki (for example translatewiki.net has it set to 5 years IIRC), and
technically it's even possible to rebuild entries which expired and have been
removed if the time limit is raised (but I'm not sure if that would actually
work on large wikis). File a bug under "Wikimedia -> Site requests" if you
want
this changed for your wiki.

I thought that the time-limit configurability might be available, but it is of value only for some of our content, specifically, content in language sections for languages with few contributors.

A roughly right implementation of language-section watchlists would be so much better than the nothing that we have now that I am interested in any gadgetry that might accomplish it. en.wikt has some JS and Lua capability, but has lost some veteran technical contributors. All of our technical contributors are aware of the problem, so there are probably difficulties that I couldn't understand well enough to ever communicate them here.

Perhaps there is some cleverness that can overcome the problem.

hoo added a comment. Via Conduit, Dec 10 2013, 4:18 PM
  • Bug 58257 has been marked as a duplicate of this bug.
matmarex added a comment. Via Conduit, Mar 4 2014, 3:48 PM
  • Bug 62208 has been marked as a duplicate of this bug.
bzimport added a comment. Via Conduit, Jun 6 2014, 10:50 PM

ab.zachaeus wrote:

Getting 504 Gateway Time-out nginx/1.1.19 on zh.wikipedia normal watchlist edit. 27K foo entries on watchlist.

Raw watchlist edit works normally.

Trying to save a 30K-ish list in raw edit mode reliably results, at this time of day, in:

http://zh.wikipedia.org/wiki/Special:%E7%BC%96%E8%BE%91%E7%9B%91%E8%A7%86%E5%88%97%E8%A1%A8/raw, from 10.64.0.105 via cp1068 cp1068 ([10.64.0.105]:3128), Varnish XID 3113761383
Forwarded for: 84.250.106.149, 91.198.174.102, 208.80.154.133, 10.64.0.105
Error: 503, Service Unavailable at Fri, 06 Jun 2014 22:32:50 GMT

Emptying (/clear) works fine so far.

Bawolff added a comment. Via Conduit, Jun 26 2014, 5:37 PM
  • Bug 67123 has been marked as a duplicate of this bug.
bzimport added a comment. Via Conduit, Sep 30 2014, 7:13 AM

ab.zachaeus wrote:

I've been whittling my Commons watchlist down (originally 39K+; the list emptying tool failed, see #66212). I got below 36K items and raw editing works fine now. Normal edit still fails and gives 504.

bzimport added a comment. Via Conduit, Sep 30 2014, 8:10 AM

verdy_p wrote:

For large watchlists, we should have another querying interface, allowing us to download the list to an external file. We should also be able to query the list, for example by range of dates of last edit, by the date when we started watching a page, by namespace, by filters on names, or by category of page.

We should then be able to upload this edited sublist using the same filter, meaning that pages matching the filter but not in the uploaded list would be discarded from the watchlist.
The watchlist should also accept some basic level of compression, such as matching all subpages of a parent page by using the parent page followed by a final slash (meaning that this watched item matches all subpages), or by using a trailing wildcard (a basic level of regexp). That list should also be exported automatically sorted (at least lexicographically if not by CLDR order) and automatically deduplicated.
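A minimal Python sketch of the three ideas above: a sorted and deduplicated export, a trailing-slash entry that matches all subpages, and a filtered upload that replaces only the part of the list covered by the filter. All function names are hypothetical, and the sort is plain lexicographic rather than CLDR order.

```python
def normalise(titles):
    """Strip whitespace, drop empty lines, deduplicate and sort lexicographically."""
    cleaned = (title.strip() for title in titles)
    return sorted({title for title in cleaned if title})


def entry_matches(entry, title):
    """An entry ending in "/" matches every subpage; otherwise match exactly."""
    if entry.endswith("/"):
        return title.startswith(entry)
    return entry == title


def apply_filtered_upload(current, uploaded, in_filter):
    """Replace the part of `current` selected by `in_filter` with `uploaded`.

    Titles matching the filter but absent from the upload are discarded;
    titles outside the filter are left untouched.
    """
    untouched = [title for title in current if not in_filter(title)]
    return normalise(untouched + list(uploaded))
```

For example, uploading only "Help:A" under a filter on the "Help:" prefix would drop every other watched "Help:" page while keeping all pages in other namespaces.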

If the watched page is a category, there should exist an option saying whether we watch only the category, or all pages listed in it, or all pages we have edited ourselves and that are in that category (options can be set for each page name using special <tags> between angle brackets, the default being one line per watched page, which currently cannot include those brackets; or even by adding these options after a "?" question mark, as the question mark is also not a possible page name).

bzimport added a comment. Via Conduit, Sep 30 2014, 8:11 AM

verdy_p wrote:

Possibly the lists of pages within <bracketed options> or "?query options" could be edited separately.

Finally, we could also have several watchlists per user, giving them user-defined names and working more or less like personal categories. When adding a page to watch, we could have a combobox to select which one to use (more or less like "labels" in Gmail and other IMAP-based webmails).

This would help splitting the lists and filters and managing them more easily. (It would also avoid users creating their own public categories for their own uploaded files on Commons, something that is just pollution of the common category tree...)

Why not try a new user namespace such as "user_label:<username>/<labelname>"? Each of these user labels could then have its own watch options (notably email notifications or not), and when viewing these personal pages, we could have options such as listing page edit/revert histories, reviews, new talks, changes in lists of contributors, changes in list of watchers (only if watchers accept making the fact that they watch a page public), and making the "user label" publicly viewable or private (if private, the label name used would be invisible, as if it did not exist).

Finally, we could have "shared user labels": a user creates a label and authorizes other users to subscribe to it, either to watch it, to add or remove items from the list, or to propose additions/suppressions such that the owner can accept these changes with a simple click. But unlike categories, all these user labels are owned by their creator and anyone can create them. It would be very useful for managing lists of pages in a community project (these projects may be temporary, such as collaborative maintenance work: no more need to pollute the shared public namespaces with lists of links): just start by creating your own label, add pages to it, share this list by making it publicly visible, allow users to propose additions/suppressions, and choose users that can manage it by adding/deleting items (page names, categories, filters).

Aklapper added a comment. Via Conduit, Sep 30 2014, 9:03 AM

[Separate enhancement requests should go to separate tickets.
This ticket is not about brainstorming ideas.]

bzimport added a comment. Via Conduit, Sep 30 2014, 10:28 AM

verdy_p wrote:

It is fully related to the subject of managing large watchlists (and the fact that they are completely inefficient and don't scale correctly).
It is something to work on to REPLACE watchlists (or only support existing basic watchlists via a basic gateway).
I propose the transition to something more manageable (and more useful) because it could also solve other problems (notably for the management of content maintenance projects or quality projects).
My watchlist fills up too rapidly and I have more and more difficulty sorting its generated notifications and organizing things to do about them, and I'm certainly not alone. They have become largely unusable (and the server itself cannot support them correctly without f).

He7d3r added a comment. Via Conduit, Sep 30 2014, 12:35 PM

(In reply to Philippe Verdy from comment #29)

The watchlist should also accept some basic level of compression (such as
matching all subpages of a parent page, by using the parent page followed by
a final slash, meaning that this watched item matches all subpages, or by

This is the same as bug 15072 (given the way pages are organized on Wikibooks projects).

If the watched page is a category, there should exist an option saying if we
watch only the category or all pages listed in them, or all pages we have
edited ourself and that are in that category

See bug 1710 and bug 7148.

(In reply to Philippe Verdy from comment #30)

Finally we could also have several watchlists per user; giving to them
user-defined names and working more or less like personal categories. When
adding a page to watch we could have a combobox to select which one to use
(more or less like "labels" in Gmail and other IMAP-based webmails).

Bug 5875 or bug 20444.

(In reply to Philippe Verdy from comment #32)

It is something to work on to REPLACE watchlists (or only support existing
basic watchlists via a basic gateway).
I propose the transition to something more manageable (and more useful)
because it could also solve other problems (notably for the management of
content maintenance projects or quality projects).

Bug 33888.

Aklapper added a comment. Via Conduit, Sep 30 2014, 2:48 PM

(In reply to Philippe Verdy from comment #32)

It is fully related to the subject

Yes, only *related*.
Hence please see the tickets that Helder was kind enough to identify...

bzimport added a comment. Via Conduit, Sep 30 2014, 3:11 PM

dcduring wrote:

AFAICT, all of those enhancements are low priority, having no more than 30 votes. This is at least a normal-priority bug.
