
kmlexport perl script memory usage
Closed, DeclinedPublic

Description

/data/project/kmlexport/public_html//kmlexport.pl

was using excessive memory on tools-webgrid-lighttpd-1402, causing Puppet runs to fail for lack of memory. There were six processes, each using ~8% of memory (~650M each; 6 x 650M is almost 4GB). In addition, the lighttpd process itself was using ~5% of memory. The webgrid nodes have 8GB of memory, so this was a problem.
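For reference, here is the arithmetic behind these figures as a small Python sketch. It assumes each node has exactly 8 GiB of RAM and that %MEM is a share of that total; `mem_from_percent` is a made-up helper for illustration.

```python
# Back-of-the-envelope check of the memory figures in the description.
# Assumption: a webgrid node has exactly 8 GiB of RAM, and %MEM is a share of that.
MIB = 1024 ** 2
GIB = 1024 ** 3
NODE_TOTAL = 8 * GIB


def mem_from_percent(percent: float, total_bytes: int = NODE_TOTAL) -> float:
    """Convert a top/ps %MEM value into bytes (hypothetical helper)."""
    return total_bytes * percent / 100.0


per_process = mem_from_percent(8.0)                    # one kmlexport.pl process (~8%)
six_processes = 6 * per_process                        # the six processes seen on -1402
with_lighttpd = six_processes + mem_from_percent(5.0)  # plus lighttpd itself (~5%)

print(f"per process:    {per_process / MIB:.0f} MiB")
print(f"six processes:  {six_processes / GIB:.2f} GiB")
print(f"incl. lighttpd: {with_lighttpd / GIB:.2f} GiB of {NODE_TOTAL / GIB:.0f} GiB")
```

That works out to roughly 655 MiB per process and about 3.8 GiB for all six, which is a bit over half the node once lighttpd is included.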

Killed the processes and restarted the webservice, but this needs a longer-term solution.

Related Objects

Status    Subtype  Assigned  Task
Declined  None
Declined  None

Event Timeline

valhallasw raised the priority of this task to Needs Triage.
valhallasw updated the task description.
valhallasw added a project: Toolforge.
valhallasw subscribed.
valhallasw moved this task from Backlog to Ready to be worked on on the Toolforge board.

Happening again:

  PID USER                PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
19202 tools.kmlexport     20   0  474656 423336   1648 S   0.0  5.2   3:11.94 /usr/sbin/lighttpd -f /var/run/lighttpd/kmlexport.conf -D
22130 tools.kmlexport     20   0  487712 318620   3092 S   2.7  3.9  33:15.10 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22306 tools.kmlexport     20   0  487708 316136   3092 S   1.4  3.9  33:28.56 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22264 tools.kmlexport     20   0  487716 315612   3092 S   1.4  3.9  32:24.78 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22332 tools.kmlexport     20   0  487716 315348   3092 S   2.7  3.9  33:01.57 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22126 tools.kmlexport     20   0  487716 313124   3104 S   1.4  3.8  32:14.82 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22160 tools.kmlexport     20   0  487716 312436   3092 S   1.8  3.8  32:44.59 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22298 tools.kmlexport     20   0  487712 311908   3092 S   0.9  3.8  32:04.44 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22361 tools.kmlexport     20   0  487708 311376   3092 S   1.4  3.8  31:40.89 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22334 tools.kmlexport     20   0  487708 310928   3092 S   1.8  3.8  31:51.22 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
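Summing the RES column shows the actual resident footprint of this incident. The following is a rough Python sketch. It assumes top's default column order and RES reported in KiB without a unit suffix, which holds for every row pasted above. `total_res_kib` is a hypothetical helper, not part of any existing tool.

```python
# Sum the RES column for matching processes in a pasted `top` listing.
# Assumptions: top's default columns
#   PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
# and RES in KiB with no unit suffix (true for every row pasted above).
KIB_PER_GIB = 1024 * 1024

TOP_LISTING = """\
19202 tools.kmlexport 20 0 474656 423336 1648 S 0.0 5.2 3:11.94 /usr/sbin/lighttpd -f /var/run/lighttpd/kmlexport.conf -D
22130 tools.kmlexport 20 0 487712 318620 3092 S 2.7 3.9 33:15.10 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22306 tools.kmlexport 20 0 487708 316136 3092 S 1.4 3.9 33:28.56 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22264 tools.kmlexport 20 0 487716 315612 3092 S 1.4 3.9 32:24.78 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22332 tools.kmlexport 20 0 487716 315348 3092 S 2.7 3.9 33:01.57 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22126 tools.kmlexport 20 0 487716 313124 3104 S 1.4 3.8 32:14.82 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22160 tools.kmlexport 20 0 487716 312436 3092 S 1.8 3.8 32:44.59 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22298 tools.kmlexport 20 0 487712 311908 3092 S 0.9 3.8 32:04.44 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22361 tools.kmlexport 20 0 487708 311376 3092 S 1.4 3.8 31:40.89 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
22334 tools.kmlexport 20 0 487708 310928 3092 S 1.8 3.8 31:51.22 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
"""


def total_res_kib(listing: str, command_substring: str) -> int:
    """Total resident memory (KiB) of rows whose COMMAND contains command_substring."""
    total = 0
    for line in listing.splitlines():
        fields = line.split(None, 11)  # at most 12 fields; COMMAND keeps its spaces
        if len(fields) == 12 and command_substring in fields[11]:
            total += int(fields[5])    # column 6 is RES
    return total


perl_kib = total_res_kib(TOP_LISTING, "kmlexport.pl")
lighttpd_kib = total_res_kib(TOP_LISTING, "/usr/sbin/lighttpd")
print(f"kmlexport.pl:  {perl_kib / KIB_PER_GIB:.2f} GiB")
print(f"with lighttpd: {(perl_kib + lighttpd_kib) / KIB_PER_GIB:.2f} GiB")
```

With these rows, the nine kmlexport.pl processes come to about 2.7 GiB resident, or about 3.1 GiB including lighttpd.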

All of these processes started more than 24 hours ago:

root@tools-webgrid-lighttpd-1403:/home/valhallasw# ls -ld /proc/{22130,22306,22264,22332,22126,22160,22298,22361,22334}
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:04 /proc/22126
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:04 /proc/22130
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:05 /proc/22160
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:08 /proc/22264
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:09 /proc/22298
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:09 /proc/22306
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:10 /proc/22332
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:10 /proc/22334
dr-xr-xr-x 9 tools.kmlexport tools.kmlexport 0 May 15 19:11 /proc/22361
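Instead of reading the timestamps on the /proc/<pid> directories, you can use the start time that the kernel records for each process. In /proc/<pid>/stat, field 22 (starttime) is the start time in clock ticks since boot. The age is therefore the uptime from /proc/uptime minus starttime divided by the clock-tick rate. This is a minimal Python sketch: the function names are hypothetical, and `age_of_pid` only works when run on the webgrid node itself.

```python
import os


def process_age_seconds(stat_line: str, uptime_seconds: float, clk_tck: int) -> float:
    """Age of a process, given the contents of /proc/<pid>/stat and /proc/uptime."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces or
    # ')', so split only after the *last* ')'. What remains starts at field 3.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    start_ticks = int(rest[22 - 3])  # field 22: starttime, in clock ticks since boot
    return uptime_seconds - start_ticks / clk_tck


def age_of_pid(pid: int) -> float:
    """Read the live values for pid (Linux only; hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        stat_line = f.read()
    with open("/proc/uptime") as f:
        uptime = float(f.read().split()[0])  # first value: seconds since boot
    return process_age_seconds(stat_line, uptime, os.sysconf("SC_CLK_TCK"))
```

For example, `age_of_pid(22130) / 3600` would give the age of the first process in hours.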

So the main issue seems to be that some requests can start long-running processes, and those processes aren't killed after a reasonable timeout (say, 1 minute for HTTP requests). All the requests seem to come from the same user; they are all requesting

root@tools-webgrid-lighttpd-1403:/home/valhallasw# sed /proc/{22130,22306,22264,22332,22126,22160,22298,22361,22334}/environ -e 's/\x00/\n/g' | grep REQUEST_URI
REQUEST_URI=/kmlexport/?project=ca&l=0&article=Categoria:Fran%C3%A7a

and all use the user agent:
HTTP_USER_AGENT=Wikipedia-OSM tool by Plenz

which, unfortunately, is used more widely, so it can't be used to pick out just these requests.
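/proc/<pid>/environ is a list of NAME=value entries separated by NUL bytes. That is why the sed call above turns \x00 into newlines. This Python sketch does the same lookup; the helper names are hypothetical. Reading another user's environ still needs root, as in the session above.

```python
from typing import Optional


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated NAME=value entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        # Split on the first '=' only: values such as REQUEST_URI contain '='.
        name, sep, value = entry.partition(b"=")
        if sep:  # also skips the empty piece after the trailing NUL
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def request_uri(pid: int) -> Optional[str]:
    """REQUEST_URI of a CGI process, or None if it exited or is unreadable."""
    try:
        with open(f"/proc/{pid}/environ", "rb") as f:
            return parse_environ(f.read()).get("REQUEST_URI")
    except (FileNotFoundError, PermissionError):
        return None
```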

Killing them again for now.
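The fix suggested above is a hard time limit per request (say, one minute). One generic way to get that is to run the work in a child process and kill it when it overruns. The sketch below does this with Python's `subprocess.run`, which kills the child once its `timeout` expires. The wrapper name and the 60-second value are assumptions for illustration. Inside kmlexport.pl itself, Perl's built-in `alarm` would probably be the more natural tool.

```python
import subprocess
import sys
from typing import Optional

REQUEST_TIMEOUT_S = 60.0  # the "say, 1 minute" suggested above; an assumption


def run_with_timeout(cmd: list[str], timeout: float = REQUEST_TIMEOUT_S) -> Optional[subprocess.CompletedProcess]:
    """Run cmd to completion, or kill it once it exceeds timeout seconds.

    Returns the CompletedProcess, or None if the command was killed.
    """
    try:
        # On expiry, subprocess.run kills the child and waits for it before
        # re-raising TimeoutExpired. Only the direct child is killed; anything
        # it spawned itself would need a process group to clean up.
        return subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None


# Example: a "request" that finishes well within its limit.
result = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=10)
```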

And again...

20892 tools.kmlexport     20   0  322464 183984   3088 S   2.0  2.3  15:58.87 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
 9888 tools.kmlexport     20   0  272124 142968   3088 S   3.6  1.7  11:48.41 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
 9950 tools.kmlexport     20   0  272120 142232   3088 S   1.0  1.7  12:12.27 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
 9866 tools.kmlexport     20   0  272120 140920   3088 S   2.7  1.7  11:48.92 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
 9909 tools.kmlexport     20   0  272120 140920   3088 S   3.6  1.7  11:34.11 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
 9925 tools.kmlexport     20   0  272128 140328   3088 S   2.3  1.7  12:08.40 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
 9934 tools.kmlexport     20   0  272120 140060   3088 S   3.0  1.7  11:38.29 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
 5247 tools.kmlexport     20   0  212944  92628   3092 R   7.3  1.1   2:29.61 /usr/bin/perl /data/project/kmlexport/public_html//kmlexport.pl
root@tools-webgrid-lighttpd-1403:/home/valhallasw# ps aux | grep kmlexport | sed -e 's/^[a-z\.\+]*\s*\([0-9]*\).*/\1/' | while read pid; do echo $pid `sed /proc/$pid/environ -e 's/\x00/\n/g' | grep REQUEST_URI`; done
5247 REQUEST_URI=/kmlexport/?project=no&l=0&article=Kategori%3AS%25C3%25B8r-Amerika&redir=bing
9866 REQUEST_URI=/kmlexport/?project=ca&l=0&article=Categoria:Geografia_de_la_pen%C3%ADnsula_Ib%C3%A8rica
9888 REQUEST_URI=/kmlexport/?project=ca&l=0&article=Categoria:Geografia_de_la_pen%C3%ADnsula_Ib%C3%A8rica
9909 REQUEST_URI=/kmlexport/?project=ca&l=0&article=Categoria:Geografia_de_la_pen%C3%ADnsula_Ib%C3%A8rica
9925 REQUEST_URI=/kmlexport/?project=ca&l=0&article=Categoria:Geografia_de_la_pen%C3%ADnsula_Ib%C3%A8rica
9934 REQUEST_URI=/kmlexport/?project=ca&l=0&article=Categoria:Geografia_de_la_pen%C3%ADnsula_Ib%C3%A8rica
9950 REQUEST_URI=/kmlexport/?project=ca&l=0&article=Categoria:Geografia_de_la_pen%C3%ADnsula_Ib%C3%A8rica
sed: can't read /proc/15974/environ: No such file or directory
15974
19202
20892 REQUEST_URI=/kmlexport/?project=ca&l=0&article=Categoria:Estats_d'Europa
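Tallying that output shows the same pattern as before: six of the eight perl processes are serving one identical Catalan Wikipedia category request. This is a small Python sketch over the values pasted above; `processes_per_uri` is a made-up helper.

```python
from collections import Counter

IBERIA = "/kmlexport/?project=ca&l=0&article=Categoria:Geografia_de_la_pen%C3%ADnsula_Ib%C3%A8rica"

# PID -> REQUEST_URI as printed by the loop above. PIDs without a REQUEST_URI
# (19202, the lighttpd process, and 15974, whose environ could no longer be
# read) are left out.
OBSERVED = {
    5247: "/kmlexport/?project=no&l=0&article=Kategori%3AS%25C3%25B8r-Amerika&redir=bing",
    9866: IBERIA,
    9888: IBERIA,
    9909: IBERIA,
    9925: IBERIA,
    9934: IBERIA,
    9950: IBERIA,
    20892: "/kmlexport/?project=ca&l=0&article=Categoria:Estats_d'Europa",
}


def processes_per_uri(pid_to_uri: dict[int, str]) -> Counter:
    """How many live processes are serving each distinct request."""
    return Counter(pid_to_uri.values())


for uri, count in processes_per_uri(OBSERVED).most_common():
    print(f"{count}  {uri}")
```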

Killed them again.

A stopgap 'solution' for now is to remove the raised memory limit for kmlexport; it's back at 4G. The hypothesis is that the tool will hit the limit, GridEngine will kill the processes, and the service monitor will restart the webservice. Maybe.
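Whether the processes actually hit the limit depends on what the limit is measured against. This Python sketch applies a 4G limit to the summed VIRT column of the two listings above. It assumes, without confirmation, that the limit is compared against the total virtual size of the webservice job. How GridEngine actually accounts memory depends on its configuration.

```python
# Would the summed virtual size of the processes in each listing exceed 4G?
# Assumption (not confirmed here): the limit is compared against the total VIRT
# of the webservice job. GridEngine's real accounting depends on its config.
KIB_PER_GIB = 1024 * 1024
LIMIT_GIB = 4

# VIRT column (KiB) from the first listing: lighttpd, then nine kmlexport.pl processes.
FIRST = [474656, 487712, 487708, 487716, 487716, 487716, 487716, 487712, 487708, 487708]
# VIRT column (KiB) from the second listing: eight kmlexport.pl processes only
# (no lighttpd row was pasted there).
SECOND = [322464, 272124, 272120, 272120, 272120, 272128, 272120, 212944]


def exceeds_limit(virt_kib: list[int], limit_gib: float = LIMIT_GIB) -> bool:
    """True if the summed VIRT (KiB) is above limit_gib GiB."""
    return sum(virt_kib) > limit_gib * KIB_PER_GIB


for name, rows in (("first", FIRST), ("second", SECOND)):
    print(f"{name}: {sum(rows) / KIB_PER_GIB:.2f} GiB, over limit: {exceeds_limit(rows)}")
```

Under that assumption, the first incident (about 4.6 GiB) would have tripped a 4G limit. The second (about 2.1 GiB, without lighttpd) would not have.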

I'm actually wondering whether a 4GB limit is enough to protect the node. I just killed a few kmlexport processes on tools-webgrid-lighttpd-1401 because Puppet couldn't run. Combined, they were using ~25% of memory (= 2GB). Without kmlexport, the server was already using 4-5GB of the 8GB available.
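Putting the numbers from this comment together gives a rough memory budget for -1401. This is a Python sketch that assumes 8 GiB per node and %MEM measured against that total; `headroom_gib` is a made-up helper.

```python
# Rough memory budget for tools-webgrid-lighttpd-1401, from the numbers above.
# Assumption: 8 GiB per node and %MEM measured against that total.
NODE_GIB = 8.0


def headroom_gib(baseline_gib: float, kmlexport_percent: float, node_gib: float = NODE_GIB) -> float:
    """Memory left after the baseline load plus kmlexport's share of the node."""
    kmlexport_gib = node_gib * kmlexport_percent / 100.0  # 25% of 8 GiB = 2 GiB
    return node_gib - (baseline_gib + kmlexport_gib)


for baseline in (4.0, 5.0):  # "already using 4-5GB"
    print(f"baseline {baseline:.0f} GiB -> {headroom_gib(baseline, 25.0):.1f} GiB left")
```

So with kmlexport at ~25%, the node had somewhere between 1 and 2 GiB left.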


What I don't get, though, is why Puppet can't run with 2GB of memory available.