
Bingbot scraping tools?
Closed, Resolved (Public)

Description

Since yesterday, two of my tools (catscan2, glamtools) die every few hours (minutes?). I haven't changed anything there, so I thought it might be the latest Labs database trouble. However, access.log contains a lot of bingbot requests:

10.68.21.49 www.tools.wmflabs.org - [16/Feb/2016:13:53:45 +0000] "GET /glamtools/baglama.php?group=Images+from+the+National+Archives+and+Records+Administration&date=201211 HTTP/1.1" 200 86633 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
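To put a number on "a lot", the requests can be tallied per user agent. This is a minimal sketch, assuming every log line has the same quoting as the line above, with the user agent as the last double-quoted field. The function name `count_user_agents` is made up for illustration.

```shell
# Tally requests per user agent, most frequent first.
# Reads the files named as arguments, or stdin if none are given.
# Assumes the user agent is the last double-quoted field of each line.
count_user_agents() {
    awk -F'"' 'NF >= 7 { count[$(NF - 1)]++ }
        END { for (ua in count) printf "%d\t%s\n", count[ua], ua }' "$@" |
        sort -rn
}
```

For example, `count_user_agents access.log | head` would list the busiest clients first.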

Did the lighttpd default config change? But why do only these tools die?

Event Timeline

Magnus set the priority of this task to Needs Triage.
Magnus updated the task description.
Magnus added a project: Cloud-Services.
Magnus added a subscriber: Magnus.

Now also seeing Yahoo Slurp:

10.68.21.49 tools.wmflabs.org - [16/Feb/2016:14:12:55 +0000] "HEAD /glamtools/glamorous.php HTTP/1.1" 200 0 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

I don't know what's going on, but many of my tools keep crashing, including catscan2, autolist, glamtools, all of which are of not insignificant value to the community. Looking at my overview tool
https://tools.wmflabs.org/magnustools/multistatus.html
it appears many connections don't close properly.

I don't think it's anything I did. I tried to force another DB server, but to no avail. Some help, please?

If there are a lot of open connections, that's probably T104799: lighttpd does not correctly close connections (CLOSE_WAIT). Could you check if there are indeed a large number of child PHP processes?
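One way to check for the CLOSE_WAIT symptom is to count TCP connections per state on the host running the webservice. This is a rough sketch, assuming `netstat` (net-tools) is installed there and prints the state in the sixth column, as it does on Linux. The function name `count_tcp_states` is made up for illustration.

```shell
# Count TCP connections per state from `netstat -tan` output on stdin.
# Header lines are skipped because only lines starting with "tcp" are counted.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
        END { for (s in count) printf "%d\t%s\n", count[s], s }' |
        sort -rn
}
```

Run as `netstat -tan | count_tcp_states`. A large CLOSE_WAIT count would be consistent with T104799, though it would not prove that is the cause.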

Lots of active connections:
https://tools.wmflabs.org/catscan2/server-statistics

Not sure which server to check for php processes.

I do have some code in PHP to force-close the connection:
header("Connection: close");
It doesn't seem to work, though.

Sorry, that's indeed not entirely obvious.

$ qstat -u 'tools.catscan2' -xml
...
      <queue_name>webgrid-lighttpd@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs</queue_name>
...

$ ssh tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs

$ ps aux | grep catscan2
tools.c+ 24907  0.0  0.2 128520 23420 ?        Ss   14:41   0:01 /usr/sbin/lighttpd -f /var/run/lighttpd/catscan2.conf -D
valhall+ 32756  0.0  0.0  35700   940 pts/1    S+   15:28   0:00 grep --color=auto catscan2

$ ps aux | grep "tools.c"
valhall+   318  0.0  0.0  35696   956 pts/1    S+   15:29   0:00 grep --color=auto tools.c
tools.m+ 10901  0.1  0.7 108272 58912 ?        Ss   08:51   0:36 /usr/sbin/lighttpd -f /var/run/lighttpd/magnustools.conf -D
tools.c+ 12174  0.0  0.0  15016  1332 ?        S    09:14   0:14 /usr/lib/gamin/gam_server
tools.c+ 21948  0.0  0.1 334592 12752 ?        S    Feb16   0:50 /usr/bin/php-cgi
tools.c+ 21981  0.0  0.1 334580 11700 ?        S    Feb16   0:14 /usr/bin/php-cgi
tools.c+ 24931  0.0  0.1 336192 15208 ?        S    14:41   0:01 /usr/bin/php-cgi
tools.c+ 24932  0.3  0.1 336300 14432 ?        S    14:41   0:09 /usr/bin/php-cgi
tools.c+ 24933  0.4  0.1 336192 14156 ?        S    14:41   0:13 /usr/bin/php-cgi
tools.c+ 24935  0.0  0.1 335104 13900 ?        S    14:41   0:00 /usr/bin/php-cgi
tools.c+ 24936  0.3  0.1 335092 13148 ?        S    14:41   0:09 /usr/bin/php-cgi
tools.c+ 24937  0.0  0.1 334856 12040 ?        S    14:41   0:00 /usr/bin/php-cgi
tools.c+ 25266  0.0  0.0  15016   836 ?        S    Feb09   6:14 /usr/lib/gamin/gam_server
tools.c+ 25273  0.0  0.0  52456  2300 ?        Ss   Feb09   3:02 /usr/sbin/lighttpd -f /var/run/lighttpd/commonshelper.conf -D
tools.c+ 25295  0.0  0.1 329340 12296 ?        Ss   Feb09   0:00 /usr/bin/php-cgi
tools.c+ 25296  0.0  0.1 329340 12400 ?        Ss   Feb09   0:00 /usr/bin/php-cgi
tools.c+ 25297  0.0  0.1 331576 12652 ?        S    Feb09   0:04 /usr/bin/php-cgi
tools.c+ 25298  0.0  0.1 331576 12512 ?        S    Feb09   0:37 /usr/bin/php-cgi
tools.c+ 32767  0.0  0.0  23784  1656 ?        Rs   15:29   0:00 /usr/sbin/lighttpd -f /var/run/lighttpd/catscan2.conf -D

so not a crazy number of cgi processes either. Hrm.
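Note that `ps aux` truncates long user names, so the `tools.c+` rows above appear to mix more than one tool (the catscan2 and commonshelper lighttpd processes both show up under it). A sketch for counting only one tool's php-cgi processes, assuming the tool runs as the user `tools.catscan2` (the name passed to `qstat` above). The function name `count_php_cgi` is made up for illustration.

```shell
# Count lines on stdin that describe a php-cgi process.
# A line matches if any field is "php-cgi" or a path ending in "/php-cgi".
count_php_cgi() {
    awk '{
            for (i = 1; i <= NF; i++)
                if ($i == "php-cgi" || $i ~ /\/php-cgi$/) { n++; break }
        }
        END { print n + 0 }'
}
```

For example, `ps -u tools.catscan2 -o pid= -o args= | count_php_cgi` selects processes by the full user name, so the count is not affected by the truncated USER column.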

Oh, but now active-requests is also back to 2...

max_execution_time is set to 30 in php.ini, so if it's just PHP, this should only happen for short periods of time.

I keep restarting it, because it becomes unresponsive. Restarted two times since my last comment here...

And four minutes later, back up to 24 active requests.

It does not seem to be reflected in the number of php processes, but:

fastcgi.active-requests: 226
(...)
fastcgi.requests: 233

does suggest it's related to requests hanging/waiting in a fcgi request.
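To watch these counters over time instead of reloading the page, a single value can be pulled out of the statistics output. This is a minimal sketch, assuming the page is plain text with one `key: value` pair per line, as in the excerpt above. The function name `stat_value` is made up for illustration.

```shell
# Print the value of one counter from "key: value" lines on stdin.
# Usage: stat_value fastcgi.active-requests
stat_value() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}
```

For example, `curl -s https://tools.wmflabs.org/catscan2/server-statistics | stat_value fastcgi.active-requests`, repeated every minute or so, would show whether active requests keep climbing after a restart.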

Magnus triaged this task as High priority. (Feb 20 2016, 2:31 PM)

So I have no idea if it's the bots, or the database (enwiki_p, for example, is slow as hell), or something else, but catscan2 now becomes unusable a few minutes after a "webservice restart".

AFAICT it's nothing I did. This is a much-used tool. Someone, help. Please.

UPDATE: catscan2 seems to work now. Thanks to whoever fixed Labs.