
refreshLinks.php eats memory => fatal error
Closed, Resolved, Public

Description

Author: al2.baeckeroot

Description:
$ php refreshLinks.php wikidb
50
..../....
7150

Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 224 bytes)
in /Big/Wikipedia/mediawiki-1.3.8/includes/Parser.php on line 787

Hmm, that's 64M for a PHP request; I think it should be enough ;)
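
As a stopgap (not a fix for the leak itself), the limit can be raised for a single run with PHP's -d switch; the 128M value here is only an example:

  $ php -d memory_limit=128M refreshLinks.php wikidb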

Then I tried to restart from there, having modified the .inc to use a
reporting interval of 1 instead of 50 (to find the offending article)
and to start at 7150, and it runs on... until it crashes again later:
17510
17511

Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 17833 bytes)
in /Big/Wikipedia/mediawiki-1.3.8/includes/MagicWord.php on line 174

It looks like a memory "leak" in the script?
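
For reference, the kind of .inc tweak described above amounts to something like the following; the constant and variable names here are illustrative guesses, not the exact 1.3.8 code:

  <?php
  # Illustrative sketch only; constant and variable names are assumptions,
  # not the exact refreshLinks.inc code from 1.3.8.
  define( 'REPORTING_INTERVAL', 1 );   # was 50: report after every article
  $start = 7150;                       # resume just before the crash point
  $end   = 20000;                      # example upper bound

  for ( $id = $start; $id <= $end; $id++ ) {
      # ... reparse article $id and update its link tables ...
      if ( $id % REPORTING_INTERVAL == 0 ) {
          print "$id\n";
      }
  }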


Version: 1.3.x
Severity: normal
OS: Linux
Platform: PC

Details

Reference
bz1101

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:06 PM
bzimport set Reference to bz1101.
bzimport added a subscriber: Unknown Object (MLST).

domas.mituzas wrote:

Could be one of my memory-leak patches not yet backported from HEAD.
I'll take a look over there.

al2.baeckeroot wrote:

Debian Sarge, Athlon 2 GHz, 750 MB RAM
MySQL 4.0, PHP 4.3

PHP client: memory_limit = 64M !!! That should be enough ;)

I have written a small bash script which splits the task into several
blocks, which can optionally be run simultaneously:

For 5000 articles (the beginning of the French cur DB) it always takes 30 min
(on a single-CPU machine):

  • 1 x 5000
  • 1 pipeline with 10 x 500 articles
  • 40 pipelines in parallel with 5 x 25 articles
  • 10 x (5 * 250)

etc ...

It works fine and ALWAYS takes the same time, but the memory use varies
(a rough sketch of this kind of wrapper follows below).
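
A minimal sketch of that kind of wrapper, assuming the script has been modified (as above) to take a start ID and a block size; the stock 1.3 script may not accept these arguments as-is:

  #!/bin/bash
  # Illustrative splitter: run refreshLinks.php over fixed-size blocks so each
  # block gets a fresh PHP process and any leaked memory is freed in between.
  DB=wikidb
  BLOCK=500      # articles per block (10 x 500 = 5000, as in the timings above)
  TOTAL=5000

  for (( start = 1; start <= TOTAL; start += BLOCK )); do
      php -d memory_limit=64M refreshLinks.php "$DB" "$start" "$BLOCK"
  done

Appending & to the php line and a final wait would give the parallel variants listed above; either way each block starts in a fresh process, so a per-process leak cannot accumulate past one block.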

al2.baeckeroot wrote:

I think this has to do with the LinkCache or some other kind of cache. One of
them seems to keep filling up, and it takes a lot of space (this may be
a feature, not a bug; if so, it should be documented).

This cache is probably useless in maintenance, because:

  • we never do the same thing twice,
  • the time needed is always the same: for 1 x 5000 articles, or for 50
    consecutive runs of 100 articles.

That means the cache is local to the article but useless for a new one,
so in maintenance (at least) this cache could be flushed after each article
has been processed (see the sketch below).

Hmm, well! TODO: find the offending cache and kill it ;) It might take
some time; for the moment I understand the cache code very little...
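
If the LinkCache really is the culprit, the per-article flush could look roughly like this inside the maintenance loop; this is only a sketch, and it assumes the 1.3-era global $wgLinkCache object exposes a clear() method:

  <?php
  # Sketch only: assumes the global $wgLinkCache of the 1.3 code base and its
  # clear() method; the real loop lives in refreshLinks.inc.
  global $wgLinkCache;                 # as it would appear inside the maintenance function
  $articleIds = range( 1, 5000 );      # example batch

  foreach ( $articleIds as $id ) {
      # ... reparse article $id and update its link tables ...

      # Drop whatever the cache accumulated for this article, so the next
      # iteration starts empty instead of growing the cache forever.
      $wgLinkCache->clear();
  }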

This was probably mostly the ever-growing $wgLinkHolders array
(bug 1132, now fixed). A smaller (and not data-destructive) leak
would have been the cache for Title::newFromText(), which is now
capped so it doesn't grow indefinitely in batch operations.
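
For illustration, the capping mentioned for the Title::newFromText() cache boils down to a pattern like this (a generic sketch, not the actual Title.php code):

  <?php
  # Generic sketch of a size-capped memoization cache; not the actual
  # Title::newFromText() implementation, just the idea of the fix.
  function cached_lookup( $key ) {
      static $cache = array();
      $maxEntries = 1000;              # arbitrary example cap

      if ( isset( $cache[$key] ) ) {
          return $cache[$key];
      }
      # Once the cap is hit, throw the whole cache away so a long batch run
      # (like refreshLinks.php) cannot grow it without bound.
      if ( count( $cache ) >= $maxEntries ) {
          $cache = array();
      }
      return $cache[$key] = expensive_lookup( $key );
  }

  function expensive_lookup( $key ) {
      # Stand-in for the real work (e.g. normalising and validating a title).
      return strtoupper( $key );
  }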