+++ This bug was initially created as a clone of Bug #41130 comment 87 +++
Splitting off from bug 41130 comment 87 and 88.
It appears that HTCP purges are not clearing the upload Squids in esams. Purging briefly appeared to work, but it is broken again.
Furthermore, non-image HTCP purges appear to be working fine (so articles are getting purged properly). I did the equivalent of the procedure described below with an article, and everything worked.
Examples of broken files include http://upload.wikimedia.org/wikipedia/commons/thumb/c/c2/Wappen_Landkreis_Aurich.svg/140px-Wappen_Landkreis_Aurich.svg.png
Steps to reproduce:
- Find a file that is in the caches (it's OK if the correct version is in the cache). In this example let's take http://upload.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg (I'm using the SVG source to ensure that this bug is not mixed up with bug 44428).
- Get the file, accessing it from both esams and eqiad. Note the HTTP headers, specifically the Age header.
To test from esams I used the command:
wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg'
(Occasionally a Varnish server responded to that request, and it didn't have the described problem. If that happens, just make the request again.)
This had an age header like:
Age: 753820
which is roughly 8.7 days.
To test from eqiad I used the command below. (Actually, I originally just used upload.wikimedia.org since I'm in North America, but I'm trying to make the steps to reproduce generic.)
wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg'
It had an age header of
Age: 11199
- Purge the image description page: http://commons.wikimedia.org/wiki/File:Wappen_Landkreis_Aurich.svg?action=purge
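As a quick sanity check on the Age values noted in the steps above: the header is a count of seconds, so converting it makes the gap between the two sites obvious. A minimal Python sketch (the helper name `describe_age` is made up for illustration):

```python
def describe_age(age_seconds: int) -> str:
    """Render an HTTP Age header value (seconds) as whole days plus hours."""
    days, remainder = divmod(age_seconds, 86400)  # 86400 seconds per day
    hours = remainder / 3600
    return f"{days}d {hours:.1f}h"


print(describe_age(753820))  # Age seen via esams
print(describe_age(11199))   # Age seen via eqiad
```

So the esams copy had been cached for over a week, while the eqiad copy was only a few hours old.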
Expected behaviour:
Doing
wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg'
and
wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg'
should both return responses that either have cache-miss headers or have a very small Age header.
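The expected-behaviour check can be written down as a small predicate. This is only a sketch of how I'd judge a response: the function name and the 60-second cutoff for "very small" are my own assumptions, not anything the caches define.

```python
import re


def looks_purged(age, x_cache, max_age=60):
    """Return True if a response looks freshly fetched rather than stale.

    age:     Age header value in seconds, or None if the header is absent.
    x_cache: list of X-Cache header values from the response.
    max_age: assumed cutoff for "very small"; 60 s is an arbitrary choice.
    """
    if age is not None and age <= max_age:
        return True
    # Otherwise require that no cache layer reports a hit.
    hit = re.compile(r"\bhit\b", re.IGNORECASE)
    return bool(x_cache) and not any(hit.search(value) for value in x_cache)


# A fresh fetch: Age of 0 and only misses reported.
print(looks_purged(0, ["cp1030 miss (0), cp1029 frontend miss (0)"]))
# A stale object: large Age and one layer reporting a hit.
print(looks_purged(754915, ["HIT from sq84.wikimedia.org"]))
```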
Actual behavior:
North America ( wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' ) gets the expected response:
[..]
Age: 0
X-Cache: cp1030 miss (0), cp1029 frontend miss (0)
Europe ( wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' ) does not have its cache cleared:
[..]
Age: 754915
X-Cache: HIT from sq84.wikimedia.org
X-Cache: MISS from amssq62.esams.wikimedia.org
X-Cache: MISS from amssq61.esams.wikimedia.org
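For anyone repeating this many times, the headers that wget -S prints can be pulled apart with a few lines of Python. This is a rough sketch, assuming the output has been captured as text with one header per line; `parse_headers` is a made-up helper, and the sample only contains the headers quoted above.

```python
def parse_headers(raw: str) -> dict:
    """Collect 'Name: value' lines into {lowercased name: [values, ...]}."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.strip().partition(":")
        # Skip blank lines, the status line, and wget's own progress lines.
        if not sep or not name or " " in name:
            continue
        headers.setdefault(name.lower(), []).append(value.strip())
    return headers


esams_sample = """
  Age: 754915
  X-Cache: HIT from sq84.wikimedia.org
  X-Cache: MISS from amssq62.esams.wikimedia.org
  X-Cache: MISS from amssq61.esams.wikimedia.org
"""

parsed = parse_headers(esams_sample)
print(int(parsed["age"][0]))
print(parsed["x-cache"])
```

Keeping repeated headers in a list matters here, since the esams response carries one X-Cache line per cache layer and the HIT is only on one of them.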
Things worked when I tested yesterday, and some users have reported success here and there, which suggests the issue is either intermittent or recently broken. However, I repeated this procedure several times, so it's not just that one of my packets happened to be lost (unless a significant portion of the HTCP packets are being lost).
Several users have reported that broken thumbs are magically fixed on one out of every so many page loads. I think this is because a Varnish server sometimes responds from esams, and it doesn't appear to have the cache-clearing issue. (In that case the headers say the cache hit is from cp1030, which also seems to respond from eqiad, so I'm guessing that server is actually in eqiad. This goes quite a bit beyond my knowledge of WMF's network layout.)
Version: wmf-deployment
Severity: critical