Security test load made search and page loads extremely slow on beta cluster
Closed, Resolved (Public)

Description

I am observing degraded search performance in Betalabs, which includes searching for matching link target names, images, category names, templates etc.
In some cases it takes as long as 7-8 seconds to bring up the matched results.


Version: unspecified
Severity: normal

Details

Reference
bz70103

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 3:31 AM)
bzimport set Reference to bz70103.
bzimport added a subscriber: Unknown Object (MLST).

I see it taking a long time too. The load on the search servers is quite low. Beta's ganglia seems to be down so I can't see what the load was historically. Is this a new thing? I always remember beta being slow slow slow.

18:29 < bd808> For some currently unknown reason varnish is not caching anything

That may have been my user-agent doing something strange.

Antoine also found that we were undergoing a high-rate vulnerability scan from a volunteer.

A security audit is being run on the beta cluster. Unfortunately the script is not throttled and causes a fair number of queries on the backend server.

We apparently only had one (hhvm) application server until today, which does not help either.

I have blacklisted the volunteer IP on the beta cluster varnish caches using:

ip route add blackhole x.y.z.a/32

Actual IP can be found by using 'route -n'.

To remove the blacklist one can:

ip route del blackhole "THE IP ADDRESS/32"
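
For reference, the block, check and unblock steps can be sketched as a small shell helper. This is only a sketch: the function names, the APPLY switch and the 192.0.2.10 example address (taken from the documentation range) are made up for illustration and are not what was run on the varnish hosts. By default it only prints the commands; with APPLY=1 it actually runs them, which needs root.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: echo the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Drop all traffic routed back to a single IPv4 address.
block_ip() { run ip route add blackhole "$1/32"; }

# Remove the blackhole route again.
unblock_ip() { run ip route del blackhole "$1/32"; }

# List the blackhole routes currently installed.
list_blocked() { run ip route show type blackhole; }

# Example calls (dry run, so these only print the commands).
block_ip 192.0.2.10
list_blocked
unblock_ip 192.0.2.10
```

Keeping the add and del commands side by side in one place makes it harder to mix them up when lifting the block later.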

gjg@deployment-bastion:/data/project/logs$ grep -c REDACTED xff.log
4960
gjg@deployment-bastion:/data/project/logs/archive$ zgrep -c REDACTED xff.log-20140*
...bunch of 0s...
xff.log-20140816.gz:0
xff.log-20140817.gz:0
xff.log-20140818.gz:0
xff.log-20140819.gz:0
xff.log-20140820:0
xff.log-20140821:2034
xff.log-20140824:199048
xff.log-20140827:184299

And total lines:

 9130 xff.log-20140820
 20285 xff.log-20140821
208037 xff.log-20140824
197124 xff.log-20140827

Basically, about 95% of the traffic to the Beta Cluster on the two busiest days was from this tool.
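
As a rough check on that figure, the per-file share can be recomputed from the two listings above. The sketch below hard-codes the match counts and line totals quoted in this comment (only the four files for which both numbers are given) and prints matches as a percentage of total lines. The function name share_report is made up for illustration.

```shell
#!/bin/sh
# Columns: log file, matching lines (grep -c), total lines (wc -l).
# All numbers are copied from the listings in this comment.
share_report() {
    awk '{ printf "%s %.1f%%\n", $1, 100 * $2 / $3 }' <<'EOF'
xff.log-20140820 0 9130
xff.log-20140821 2034 20285
xff.log-20140824 199048 208037
xff.log-20140827 184299 197124
EOF
}

share_report
```

On these numbers the two busiest files come out at 95.7% and 93.5%.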

This is the root cause of the slowness. The awesome volunteer will throttle his bot for us.

The udp2log-mw service on deployment-bastion.eqiad.wmflabs logs the average number of packets it receives per second over 5 minutes.

The file is /var/log/udp2log/udp2log.log

It shows we went from 0.0xx k/s to 2.500 k/s, which indicates a huge number of requests being made.

Closing this.

Thanks to Rummana for the heads up and to all who helped debug this multilayered issue.

It's slow while searching for pages, link target names, media files etc., but page loading is fine at my end.

I just sat next to Rummana to see the symptoms. They're a bit sporadic but noticeable.

I'll open a new bug and consider this one closed, covering just the security testing that was going on.