The issues described in T154780 were caused by varnishd crashes, which were triggered by multiple concurrent node depools/repools in codfw. The crashes need further investigation. See https://phabricator.wikimedia.org/P4724 for a sample crash log.
Related incident: https://wikitech.wikimedia.org/wiki/Incident_documentation/2017-01-06_Cache-upload