Certain images failing to load in ulsfo
Closed, Duplicate (Public)

Description

The error occurs in Chrome as well as Safari. The same images fail consistently, but which images are affected appears fairly random.

EDIT: This appears to be an issue with images in general since the raw image thumbnails aren't loading.

Error details:

error: could not load image from https://upload.wikimedia.org/wikipedia/commons/thumb/0/02/Joe_Sutherland_photo.jpg/2560px-Joe_Sutherland_photo.jpg
URL: https://wikimediafoundation.org/wiki/Staff_and_contractors#/media/File:Joe_Sutherland_photo.jpg
user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
screen size: 1440x900
canvas size: 1397x690
image size: 5184x3456
thumbnail size: CSS: 1039x690, screen width: 2078, real width: 2560

Event Timeline

jrbs renamed this task from "MediaViewer failing to load certain images" to "Certain images failing to load". Aug 30 2016, 6:18 AM
jrbs updated the task description.

Worked for me. This error message is used for HTTP errors. It is not impossible that this is a MediaViewer bug (it seemed to happen a lot at one point and we never figured out why; cf. T115563), but usually it is just a poor connection.

This is consistent on a 200 Mbps Wi-Fi connection as well as an LTE connection, though admittedly both are in San Francisco. I imagine this is probably a problem with the images themselves; you are right that it is likely not MediaViewer.

AlexMonk-WMF renamed this task from "Certain images failing to load" to "Certain images failing to load in ulsfo". Aug 30 2016, 8:18 AM
AlexMonk-WMF subscribed.
alex@alex-laptop:~$ curl -Ik -H "Host: upload.wikimedia.org" https://upload-lb.{esams,codfw,eqiad,ulsfo}.wikimedia.org/wikipedia/commons/thumb/0/02/Joe_Sutherland_photo.jpg/2560px-Joe_Sutherland_photo.jpg 2>/dev/null | grep Content-Length:
Content-Length: 412261
Content-Length: 412261
Content-Length: 412261
Content-Length: 0
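
The per-site comparison above can also be scripted. The following is a minimal sketch in Python, assuming the response headers for each site have already been captured as text (for example with `curl -I`). The function names `content_length` and `odd_sites` are hypothetical, and the status lines in the sample data are illustrative; only the Content-Length values come from the output above.

```python
from collections import Counter


def content_length(raw_headers):
    """Return the Content-Length value from raw response headers, or None."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-length":
            return int(value.strip())
    return None


def odd_sites(headers_by_site):
    """Return sites whose Content-Length differs from the most common value."""
    lengths = {site: content_length(raw) for site, raw in headers_by_site.items()}
    if not lengths:
        return []
    most_common, _ = Counter(lengths.values()).most_common(1)[0]
    return sorted(site for site, n in lengths.items() if n != most_common)


# Content-Length values taken from the curl output above;
# the status lines are illustrative placeholders.
sample = {
    "esams": "HTTP/1.1 200 OK\r\nContent-Length: 412261\r\n",
    "codfw": "HTTP/1.1 200 OK\r\nContent-Length: 412261\r\n",
    "eqiad": "HTTP/1.1 200 OK\r\nContent-Length: 412261\r\n",
    "ulsfo": "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n",
}
print(odd_sites(sample))
```

Comparing against the most common value, rather than against zero only, would also flag a site that serves a truncated but non-empty object.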

The fact that ulsfo fails while the others do not might be related to Varnish 4: Traffic recently switched ulsfo cache_misc to it.

Yes, we finished upgrading cache_upload in ulsfo to Varnish 4 yesterday: T131502.

I've banned the specific image from the frontends in ulsfo and I now get the right Content-Length. Please let Traffic know if the issue pops up again.

ema triaged this task as High priority. Aug 30 2016, 8:56 AM

Unfortunately quite a few requests on all ulsfo upload frontends are affected, as confirmed with:

varnishlog -n frontend -q 'RespHeader ~ "Content-Length: 0" and RespStatus == 200 and ReqURL ~ "^/wikipedia/"'

The equivalent query on a Varnish 3 frontend in eqiad does not yield any output:

varnishlog -n frontend -m TxHeader:"Content-Length: 0" -m TxStatus:200

Also, ulsfo upload backends don't seem to be affected. A rolling restart of the frontends in ulsfo is probably the easiest way to fix this.

Banning empty objects with status code 200 seems like a better idea. :) I've just done that.

Mentioned in SAL [2016-08-30T14:39:58Z] <ema> banning objects with status code 200 and content-length 0 from upload backends in ulsfo T144257

Mentioned in SAL [2016-08-30T14:46:54Z] <ema> banning objects with status code 200 and content-length 0 from upload frontends in ulsfo T144257

Change 307742 had a related patch set uploaded (by Ema):
upload VCL: do not cache objects with CL:0 and status 200

https://gerrit.wikimedia.org/r/307742

Change 307742 merged by Ema:
upload VCL: do not cache objects with CL:0 and status 200

https://gerrit.wikimedia.org/r/307742
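
The idea behind this change, refusing to cache a 200 response that claims an empty body, can be illustrated outside VCL. The following is a sketch in Python and not the actual patch: `is_cacheable` is a hypothetical name, and the real VCL in change 307742 may differ in detail.

```python
def is_cacheable(status, headers):
    """Return False for a 200 response whose Content-Length is 0.

    `headers` maps header names to string values; names are compared
    case-insensitively. Every other response is left to the normal
    caching rules, which this sketch represents simply as True.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    if status == 200 and lowered.get("content-length") == "0":
        return False
    return True


print(is_cacheable(200, {"Content-Length": "0"}))
print(is_cacheable(200, {"Content-Length": "412261"}))
```

A rule like this only keeps empty 200 objects out of the cache. It does not address whatever produces them in the first place.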

Change 307964 had a related patch set uploaded (by Ema):
Revert "Upgrade upload ulsfo to Varnish 4"

https://gerrit.wikimedia.org/r/307964

Change 307964 merged by Ema:
Revert "Upgrade upload ulsfo to Varnish 4"

https://gerrit.wikimedia.org/r/307964

@JasperStPierre reported another occurrence of this issue on IRC. The reporting time was 2016-09-14 22:18 UTC. I'm not sure what the exact repro time was.

https://upload.wikimedia.org/wikipedia/commons/thumb/1/18/Bartagame_fcm.jpg/800px-Bartagame_fcm.jpg
Content-Length: 0
X-Cache: cp1048 hit/2, cp2020 hit/8, cp4007 miss, cp4005 miss

cp4007's backend was affected by T145661 between 21:06 and 22:29 and the same URL was involved in a 503 LRU_Fail at 22:23:40:

  • ObjHeader X-Cache-Int: cp1048 hit/2, cp2020 hit/14
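
The `X-Cache` value quoted above is easier to compare across reports once it is split per cache host. A small sketch in Python, assuming the format seen in that header (comma-separated entries, each a host name followed by `miss` or `hit/N`); `parse_x_cache` is a hypothetical helper, not an existing tool.

```python
def parse_x_cache(value):
    """Split an X-Cache value into (host, state, hit_count) tuples."""
    entries = []
    for part in value.split(","):
        fields = part.split()
        if len(fields) != 2:
            continue  # skip entries that do not look like "host state"
        host, state = fields
        name, _, count = state.partition("/")
        entries.append((host, name, int(count) if count.isdigit() else None))
    return entries


header = "cp1048 hit/2, cp2020 hit/8, cp4007 miss, cp4005 miss"
for host, state, hits in parse_x_cache(header):
    print(host, state, hits)
```

For the header in this report, the two cp4xxx entries come out as misses while cp1048 and cp2020 come out as hits.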

Change 310767 had a related patch set uploaded (by Ema):
cache_upload fe: do not set do_stream=true on Varnish 4

https://gerrit.wikimedia.org/r/310767

Change 310767 merged by Ema:
cache_upload fe: do not set do_stream=true on Varnish 4

https://gerrit.wikimedia.org/r/310767

The Varnish error triggering this seems to be:

-- FetchError straight insufficient bytes

Full varnishlog here: https://phabricator.wikimedia.org/P4053.
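
Assuming "straight" here refers to a body that is fetched according to its Content-Length, the error would mean that fewer bytes arrived than the header announced. A minimal Python sketch of that kind of check follows; `read_straight` and `InsufficientBytes` are hypothetical names, and this is an illustration of the condition, not Varnish code.

```python
import io


class InsufficientBytes(Exception):
    """Raised when a body ends before Content-Length bytes were read."""


def read_straight(stream, content_length, chunk_size=65536):
    """Read exactly content_length bytes from stream, or raise."""
    chunks = []
    remaining = content_length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # end of stream before the announced length
            received = content_length - remaining
            raise InsufficientBytes(
                f"expected {content_length} bytes, got {received}"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


body = read_straight(io.BytesIO(b"abcdef"), 6)
print(len(body))
```

A cache that stores an object before this check has completed could end up holding a body shorter than its headers claim, which would be consistent with the empty responses seen in this task.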

Is anybody actively investigating this? / Does this need more investigation? Or did the merged patches eliminate the issue?

The patch has worked for my draft article https://en.wikipedia.org/wiki/Draft:Flags_of_the_Imperial_Austrian_Army_of_the_Napoleonic_Wars absolutely fine. Thanks, Dave Hollins

I don't think the do_stream patch above had any effect. This issue is most likely part of the same bug as T145661, where we've been working more actively, and it is probably effectively resolved now, although we're still doing follow-up work in that ticket. I will merge this task into it.