
forceSearchIndex.php hangs at the end of the process when running on large wikis
Closed, Resolved, Public

Description

This has happened twice on large wikis when running forceSearchIndex.php after a reindex.
The process prints its final message
Indexed a total of 127258 pages at 2/second
and then hangs, apparently indefinitely.

strace appears to show a large number of short-lived threads being created: P6404
gdb backtrace: P6401

ps and htop show that the process is actively using the CPU:

UID        PID  PPID  C STIME TTY          TIME CMD
www-data 30146 30144 69 Nov29 pts/4    21:05:32 php5 /srv/mediawiki-staging/multiversion/MWScript.php extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki enwiki --from 2017-11-27T18:11:01Z
ps  -T -o pid,tid,user,tty,stat,pcpu,rss,vsz,time,cmd -g $(ps -o sid= -p 30146)
  PID   TID USER     TT       STAT %CPU   RSS    VSZ     TIME CMD
17134 17134 dcausse  pts/4    S+    0.0  3028  13264 00:00:00 bash reindex.sh wiki_en_group2.lst
30139 30139 dcausse  pts/4    S+    0.0  3068  13244 00:00:00 /bin/bash /usr/local/bin/mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki enwiki --from 2017-11-27T18:
30140 30140 dcausse  pts/4    S+    0.0  1784   5812 00:00:00 tee -a ./cirrus_log/enwiki.reindex.log
30144 30144 root     pts/4    S+    0.0  3504  40544 00:00:00 sudo -u www-data php5 /srv/mediawiki-staging/multiversion/MWScript.php extensions/CirrusSearch/maintenance/forceSearchIndex.ph
30146 30146 www-data pts/4    Rl+  69.5 4641292 5701404 21:08:55 php5 /srv/mediawiki-staging/multiversion/MWScript.php extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki enwik
30146 30350 www-data pts/4    Sl+   0.1 4641292 5701404 00:03:32 php5 /srv/mediawiki-staging/multiversion/MWScript.php extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki enwik

I'll keep the script running in case someone wants to debug it further.

Event Timeline

dcausse created this task. Nov 30 2017, 1:37 PM
Restricted Application added a subscriber: Aklapper.
debt triaged this task as Normal priority. Nov 30 2017, 6:02 PM
debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.

P6410 has a PHP backtrace. It looks like statsd is trying to send out a huge amount of data. I have a vague recollection that we have already dealt with this kind of problem, but I don't remember what the fix was. I'll try to look it up.
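A minimal sketch of why a buffering stats collector can stall a long-running script at exit, assuming (as the backtrace suggests) that metrics are held in memory for the whole run and only sent at shutdown. `BufferingStats`, `run_indexer` and `calls_per_page` are hypothetical names, and five metrics per page is an arbitrary illustrative figure. Only the page count is taken from the log above.

```python
class BufferingStats:
    """Hypothetical stand-in for a buffering statsd collector.

    Metrics are only held in memory while the script runs and are
    sent in one go when the process shuts down.
    """

    def __init__(self):
        self.buffer = []

    def increment(self, key):
        # Each call adds one entry; nothing is sent yet.
        self.buffer.append(f"{key}:1|c")

    def flush(self):
        # At shutdown the whole buffer is sent at once.
        # Here we only return how many packets would be sent.
        payload = self.buffer
        self.buffer = []
        return len(payload)


def run_indexer(stats, pages, calls_per_page=5):
    # Hypothetical loop: every indexed page records a few metrics.
    for _ in range(pages):
        for i in range(calls_per_page):
            stats.increment(f"indexer.metric{i}")


stats = BufferingStats()
run_indexer(stats, pages=127258)
print(len(stats.buffer))  # 636290 entries waiting to be flushed
```

The buffer grows linearly with the number of pages processed. A run over a large wiki therefore ends with a very large flush, which matches the "hangs after the last message" symptom.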

Smalyshev added a comment. (Edited) Nov 30 2017, 7:21 PM

It looks like we have Maintenance::disablePoolCountersAndLogging() in Cirrus Maintenance.php to work around this exact problem (T165203: Archive reindex gets stuck), but for some reason it did not work.
Most likely this is because ObjectCache initializes the statsd factory very early during setup, so our later override never takes effect. We need a better solution than the hack in disablePoolCountersAndLogging().
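The suspected failure mode can be illustrated with a small sketch. This is a conceptual Python analogue, not MediaWiki's PHP code: `StatsFactory`, `Services` and this `ObjectCache` class are hypothetical stand-ins. A component that captures a reference to the shared factory during setup keeps using that object, so replacing the container's factory later has no effect on it.

```python
class StatsFactory:
    """Hypothetical stats collector that buffers every metric in memory."""

    def __init__(self, name):
        self.name = name
        self.buffer = []

    def increment(self, key):
        self.buffer.append(key)


class Services:
    """Hypothetical service container holding the shared stats factory."""

    stats = StatsFactory("real")


class ObjectCache:
    def __init__(self):
        # Captured once, very early during setup.
        self.stats = Services.stats

    def get(self, key):
        self.stats.increment("objectcache.get")
        return None


cache = ObjectCache()                  # setup: cache grabs the real factory
original = Services.stats
Services.stats = StatsFactory("null")  # later "override" by the maintenance script

cache.get("some-key")
print(cache.stats.name)            # real
print(len(original.buffer))        # 1: metrics still accumulate
print(len(Services.stats.buffer))  # 0: the replacement is never used
```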

Change 394380 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/CirrusSearch@master] Disable statsd collection instead of replacing statsd

https://gerrit.wikimedia.org/r/394380
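The approach named in the patch title, disabling collection rather than replacing the factory, can be sketched as follows. The sketch assumes the shared collector exposes an on/off switch. All names are hypothetical, and this does not reproduce the actual change in Gerrit 394380. Because the switch is flipped on the same object that early consumers already hold, every holder stops buffering.

```python
class StatsFactory:
    """Hypothetical buffering stats collector with an on/off switch."""

    def __init__(self):
        self.enabled = True
        self.buffer = []

    def set_enabled(self, enabled):
        self.enabled = enabled

    def increment(self, key):
        # When disabled, drop the metric instead of buffering it.
        if self.enabled:
            self.buffer.append(key)


class ObjectCache:
    def __init__(self, stats):
        # Still captures the factory early, as before.
        self.stats = stats

    def get(self, key):
        self.stats.increment("objectcache.get")
        return None


shared = StatsFactory()
cache = ObjectCache(shared)  # early capture during setup

# Maintenance script: switch off collection on the *same* object
# instead of swapping in a new one.
shared.set_enabled(False)

for _ in range(1000):
    cache.get("k")
print(len(shared.buffer))  # 0
```

The design point is that mutating shared state reaches every component holding a reference. Rebinding a global only affects code that reads the global after the change.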

Change 394380 merged by jenkins-bot.

debt closed this task as Resolved. Dec 8 2017, 8:48 PM
debt claimed this task.