
upload.beta.wmflabs.org is throwing 503s
Closed, ResolvedPublic

Description

While trying to reproduce T75787 ([Regression pre-wmf10] upload.beta.wmflabs.org is throwing 503s so Math function parsing is completely broken inside VE), I hit 503s from upload just by loading the main page (the main logo didn't load):

Request: GET http://upload.beta.wmflabs.org/wikipedia/en/b/bc/Wiki.png, from 98.234.250.248 via deployment-cache-upload02 frontend ([10.68.17.51]:80), Varnish XID 1630474886
Forwarded for: 98.234.250.248
Error: 503, Service Unavailable at Tue, 25 Nov 2014 20:23:34 GMT

Event Timeline

greg raised the priority of this task from to High.
greg updated the task description.
greg changed Security from none to None.
greg moved this task from To Triage to Next: Maintenance on the Beta-Cluster-Infrastructure board.
greg added subscribers: greg, hashar, mmodell, Reedy.

The two VE-related bugs are closed, but I'm still getting a 503 when trying to load the Beta Cluster logo: http://upload.beta.wmflabs.org/wikipedia/en/b/bc/Wiki.png

greg raised the priority of this task from High to Unbreak Now!. Nov 25 2014, 11:31 PM

@hashar / @Reedy: Can you take a look at this ASAP?

hashar added a subscriber: BBlack.

Logging in to deployment-cache-upload02.eqiad.wmflabs, which is the Varnish instance in charge of serving upload:

$ varnishncsa -n frontend
blabla miss/503 blba
blabla miss/503 boba
blabla miss/503 boba
blabla miss/503 beer
blabla miss/503 bob
$

So the Varnish frontend emits 503s because it cannot contact the Varnish backend, which is not surprising since the backend process is not running:

# ps -u varnish  f|cat
  PID TTY      STAT   TIME COMMAND
 6607 ?        Sl   214:03 /usr/sbin/varnishd -P /var/run/varnishd-frontend.pid ...
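
The same check can be scripted. The following is a minimal sketch, not an official tool: the function name check_varnish_instances is made up for illustration, and it assumes the backend is started with varnishd.pid and the frontend with varnishd-frontend.pid, as in the ps listings in this task.

```shell
# Report which varnishd instances appear in `ps` output read from stdin.
# Assumption: the backend uses varnishd.pid and the frontend uses
# varnishd-frontend.pid, as shown in the ps listings in this task.
check_varnish_instances() {
    awk '
        /varnishd -P [^ ]*varnishd-frontend\.pid/ { frontend = 1 }
        /varnishd -P [^ ]*varnishd\.pid/          { backend = 1 }
        END {
            print "frontend: " (frontend ? "running" : "MISSING")
            print "backend: "  (backend  ? "running" : "MISSING")
            # Exit non-zero unless both instances were seen.
            exit !(frontend && backend)
        }
    '
}
```

Usage would be something like `ps -u varnish f | check_varnish_instances`, which exits non-zero when either instance is absent.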

Attempting to start it yields:

# /etc/init.d/varnish start
 * Starting HTTP accelerator                [fail]
sizeof(struct smp_ident) = 112 = 0x70
sizeof(struct smp_sign) = 40 = 0x28
sizeof(struct smp_segptr) = 32 = 0x20
sizeof(struct smp_object) = 56 = 0x38
WARNING: (-spersistent) file size reduced to 19770756300 (80% of available disk space)
min_nseg = 10, max_segl = 1976655455
max_nseg = 104850, min_segl = 188522
aim_nseg = 1023, aim_segl = 19322145
free_reserve = 193221450
sizeof(struct smp_ident) = 112 = 0x70
sizeof(struct smp_sign) = 40 = 0x28
sizeof(struct smp_segptr) = 32 = 0x20
sizeof(struct smp_object) = 56 = 0x38
WARNING: (-spersistent) file size reduced to 19770756300 (80% of available disk space)
Could not mmap SILO (/srv/vdb/varnish.main2) at target 0x7efcfa33c000, was mapped at 0x7f61d2649000 instead

I mentioned this to @BBlack a few months ago, and I am pretty sure he came up with a fix in Varnish for that mmap error; I upgraded Varnish when the new package was made available. This should probably be filed as a new task for investigation.

The workaround is to delete the backend's cache files (rm /srv/vdb/varnish.*) and start the backend again:

# /etc/init.d/varnish start
 * Starting HTTP accelerator                [ OK ]
#
# ps -u varnish f
  PID TTY      STAT   TIME COMMAND
13746 ?        Sl     0:00 /usr/sbin/varnishd -P /var/run/varnishd.pid ...
 6607 ?        Sl   214:04 /usr/sbin/varnishd -P /var/run/varnishd-frontend.pid ...
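
For next time, the workaround could be wrapped in a small helper. This is only a sketch under stated assumptions: the function names and the DRY_RUN, SILO_GLOB and INIT_SCRIPT variables are hypothetical, the default paths are the ones used in this task, and `rm -f` is used instead of plain `rm` so that a missing file is not an error. It only prints the commands unless DRY_RUN=0 is set.

```shell
# Run a command, or just print it when DRY_RUN is anything other than 0.
run() {
    if [ "${DRY_RUN:-1}" != "0" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Sketch of the workaround: drop the backend's persistent cache files and
# start the backend again. Defaults are the paths used in this task.
reset_varnish_backend() {
    silo_glob=${SILO_GLOB:-/srv/vdb/varnish.*}
    init_script=${INIT_SCRIPT:-/etc/init.d/varnish}
    # $silo_glob is left unquoted on purpose so the shell expands the glob.
    run rm -f -- $silo_glob
    run "$init_script" start
}
```

Calling `reset_varnish_backend` prints what would be executed; `DRY_RUN=0 reset_varnish_backend` (as root) actually deletes the cache files and starts the backend, which throws away the whole backend cache.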

Confirmed it works by hitting http://upload.beta.wmflabs.org/wikipedia/en/b/bc/Wiki.png?really :

# varnishncsa -n frontend
deployment-cache-upload02.eqiad.wmflabs 1 2014-11-25T23:42:36 0.004356384 82.x.x.x \
 miss/200 22120 GET http://upload.beta.wmflabs.org/wikipedia/en/b/bc/Wiki.png??? - - - - Mozilla/5.0...
 ^^^^^^^^

Magic.
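
To confirm this without eyeballing the log, the status field can be tallied. A rough sketch, assuming each varnishncsa line contains one field of the form cache-status/HTTP-status (such as miss/503 or miss/200, as in the output in this task); the function name tally_cache_status is made up for illustration.

```shell
# Count occurrences of the "<cache status>/<HTTP status>" field in
# varnishncsa-style lines read from stdin.
# Assumption: the field looks like miss/503 or miss/200, as in this task.
tally_cache_status() {
    awk '
        {
            # Take the first field that looks like word/NNN on each line.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[a-z]+\/[0-9][0-9][0-9]$/) { count[$i]++; break }
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}
```

For example, `varnishncsa -n frontend | head -n 100 | tally_cache_status` would summarise the next 100 requests seen by the frontend.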

For later reference:

deployment-cache-upload02:~$ apt-cache policy varnish
varnish:
  Installed: 3.0.5plus~x-wm7
  Candidate: 3.0.5plus~x-wm7
  Version table:
 *** 3.0.5plus~x-wm7 0
       1001 http://apt.wikimedia.org/wikimedia/ precise-wikimedia/main amd64 Packages
        100 /var/lib/dpkg/status
     3.0.2-1ubuntu0.1 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ precise-updates/universe amd64 Packages
     3.0.2-1 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ precise/universe amd64 Packages

Thanks for unbreaking this one. All related bugs are resolved now.

greg lowered the priority of this task from Unbreak Now! to Lowest. Nov 26 2014, 12:03 AM
greg moved this task from Next: Maintenance to Done on the Beta-Cluster-Infrastructure board.
greg raised the priority of this task from Lowest to Unbreak Now!. Nov 26 2014, 12:08 AM
Restricted Application added subscribers: Jay8g, TerraCodes.