
Wikimedia servers won't serve images
Closed, ResolvedPublic

Description

Details of the error are attached to the report, which will be publicly viewable. If you are not comfortable with that, you can edit the report below and remove all the data you don't want to share.

Error details:

error: could not load image from https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Falcon_9_Flight_20_OG2_first_stage_post-landing_%2823273082823%29_cropped.jpg/800px-Falcon_9_Flight_20_OG2_first_stage_post-landing_%2823273082823%29_cropped.jpg
URL: https://en.wikipedia.org/wiki/Main_Page#/media/File:Falcon_9_Flight_20_OG2_first_stage_post-landing_(23273082823)_cropped.jpg
user agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36
screen size: 1680x1050
canvas size: 1108x573
image size: 1104x1472
thumbnail size: CSS: 431x573, screen width: 646.5, real width: 800

Event Timeline

Ssoulakiotis raised the priority of this task to Needs Triage.
Ssoulakiotis updated the task description.
Ssoulakiotis added a project: MediaViewer.
Ssoulakiotis subscribed.
Glaisher subscribed.

Works for me too, but someone else also reported this error on #wikimedia. He said he was located in the UK.

Request from 10.20.0.183 via cp3048 frontend ([10.20.0.183]:80), Varnish XID 1428481750
Forwarded for: 84.80.97.16, 10.20.0.183
Error: 429, Request Rate Exceeded at Sat, 26 Dec 2015 17:32:14 GMT

By this error, I mean that images load only intermittently.

Also quite a few 5xx errors 45 minutes ago, and later a big group of 4xx. Maybe something triggered an internal limit?

https://grafana.wikimedia.org/dashboard/db/varnish-http-errors

Glaisher triaged this task as Unbreak Now! priority. Dec 26 2015, 5:48 PM

Others are also reporting this.
<yannf> https://commons.wikimedia.org/wiki/Special:NewFiles no image :/

Probably related to <icinga-wm> PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
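
For readers unfamiliar with this style of alert, the "11.11% of data above the critical threshold" figure reads as the share of sampled datapoints that exceeded 1000 requests per minute. The sketch below is only an illustration of that arithmetic; the function name, the sample values, and the nine-point window are assumptions, not the actual check or its data.

```python
def percent_above(values, threshold):
    """Percentage of non-null datapoints strictly above the threshold."""
    points = [v for v in values if v is not None]
    if not points:
        return 0.0
    above = sum(1 for v in points if v > threshold)
    return 100.0 * above / len(points)


# Hypothetical window: nine datapoints, one of them above 1000 reqs/min.
sample = [200, 180, 220, 210, 190, 205, 215, 195, 1500]
print(f"{percent_above(sample, 1000.0):.2f}% of data above the threshold")
```

One datapoint out of nine is just one combination that yields 11.11%; the real window size is not stated in the alert.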

I haven't encountered this so far and this location is not served by esams.

faidon set Security to None.
faidon added a subscriber: BBlack.

This is all very preliminary but this appears to have happened:

  • cp3048 ran out of memory due to what looks like a memory leak (T122455)
  • The OOM killer killed the varnish-frontend
  • puppet started varnish-frontend again
  • cp3048 started throwing 429 (TBF rate limit) errors for some reason (corrupted bdb? surge of traffic? something else?)
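
For context on the last step: a token bucket filter (the "TBF" above) lets a client through while its bucket holds tokens and answers 429 once the bucket is empty. The following is a minimal sketch of that general technique, not the vmod_tbf implementation; the class name, parameters, and in-memory state are all hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical token bucket; not the vmod_tbf implementation."""

    def __init__(self, rate, capacity, now=None):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None, cost=1.0):
        """Return True if the request may pass, False if it should be limited."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill according to elapsed time, never beyond the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def status_for(bucket, now=None):
    """Map the bucket decision to an HTTP status code."""
    return 200 if bucket.allow(now) else 429
```

With `rate=1` and `capacity=2`, two back-to-back requests pass and a third gets a 429 until a second has elapsed. In a limiter of this shape, either a genuine surge of traffic or bad stored state (for example an empty bucket read back after a restart) would show up as a run of 429s, which is why both remain open questions above.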

I stopped varnish-frontend, cleaned up /run/vmod_tbf (keeping a backup copy in my home directory), and started varnish-frontend again. I initially got reports of errors still happening, so I stopped varnish-frontend entirely to investigate further; after a while puppet started it again, and this time everything works.

I'll investigate further and report back with a few actionables.

Thanks faidon, it looks like the images are back on the wikis. Good luck on the investigation part.

Two days later: Does this problem still happen?

faidon claimed this task.

@Aklapper this should be fixed now, yeah. There are a couple of other bugs here: one is tracked in T122455, and the other is in a feature ("TBF") that was removed today because of this bug. Let's resolve this for now, yes.