A user (details known to @Bawolff) reported in #wikimedia-tech that they are seeing broken images on the Wikimedia Commons home page.
Specifically, viewing https://commons.wikimedia.org/wiki/Main_Page in their browser (Firefox 62) led to a broken image (404 Not Found) for today's picture of the day, as requested from https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Sphinx_at_Universitetskaya_Embankment_(img1).jpg/500px-Sphinx_at_Universitetskaya_Embankment_(img1).jpg.
For most users (including @Joe, @faidon and myself), opening the above URL returns the expected JPEG thumbnail of https://commons.wikimedia.org/wiki/File:Sphinx_at_Universitetskaya_Embankment_(img1).jpg.
But our logs confirm that the request was made to our servers, and did get a 404 Not Found response:
{ "hostname": "cp3041.esams.wmnet", .. "dt":"2018-10-17T20:..", .. "cache_status":"hit-local", "http_status":"404", .. "http_method":"GET", "uri_host":"upload.wikimedia.org", "uri_path":"/wikipedia/commons/thumb/e/ec/Sphinx_at_Universitetskaya_Embankment_%28img1%29.jpg/500px-Sphinx_at_Universitetskaya_Embankment_%28img1%29.jpg", "uri_query":"", "content_type":"text/html; charset=utf-8", "referer":"https://commons.wikimedia.org/", "user_agent":"Mozilla/5.0 (Windows .. rv:62.0) .. Firefox/62.0", .. "x_cache":"cp1081 pass, cp3033 hit/1, cp3041 miss" }
Upon closer inspection we realised this was a request for upload.wikimedia.org that was directed at a cache_text Varnish server. This is odd because the upload.wikimedia.org hostname is meant to resolve to a load balancer that directs traffic to the cache_upload cluster of Varnish servers.
In other words, the request was made by the user's browser to the wrong IP address / connection.
The 404 Not Found response is correct and expected for the given request to the given server. The question is: Why was this request directed to text-lb?
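To look for other occurrences of the same mismatch, a rough sketch along the following lines could scan webrequest-style JSON lines for upload.wikimedia.org requests that were served by a cache_text host. The field names (`hostname`, `uri_host`) are taken from the log line above. The `CACHE_TEXT_HOSTS` set and the helper names are hypothetical: only cp3041 is known from this report, and a real list would have to come from the actual cluster configuration.

```python
import json

# Assumption: hosts belonging to the cache_text cluster. Only cp3041 comes
# from the log line in this report; the real list lives in ops configuration.
CACHE_TEXT_HOSTS = {"cp3041.esams.wmnet"}
UPLOAD_HOST = "upload.wikimedia.org"


def is_misrouted(record: dict) -> bool:
    """Return True if an upload.wikimedia.org request was served by a cache_text host."""
    return (
        record.get("uri_host") == UPLOAD_HOST
        and record.get("hostname") in CACHE_TEXT_HOSTS
    )


def find_misrouted(lines):
    """Yield parsed records from an iterable of JSON lines that look misrouted."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if is_misrouted(record):
            yield record
```

This only checks the hostname pairing. It does not look at `http_status`, since a misrouted request may not always end in a 404.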
What we know so far is that it probably affects multiple users (it is not an isolated incident) and that the issue may have started around October 11, according to a Hive query on wmf_raw.webrequest and a Turnilo query on webrequest_sampled_128.
Results of the latter pictured below (src):