In T427450 we saw that the mwext-codehealth-master-non-voting job is the slowest one when syncing data with Castor. Its cache appears to be dirty on every run, so it is synced every time.
Could we simply disable cache saving for this job? That is, delete everything and let it drift over time, always downloading everything fresh.
Or should we investigate whether something that changes on every run, and shouldn't be saved, ends up in the cache?
If we can remove this bottleneck, all other jobs will spend less time queued waiting to save their cache.
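For the second option, a rough first pass could be to count which parts of the cache directory were modified recently. This is only a sketch: the function name `recent_changes_by_subdir` is made up for illustration, and it assumes that file modification time is a reasonable proxy for what the save step treats as changed, which has not been verified here.

```shell
# Hypothetical helper: count files modified in the last MINUTES minutes,
# grouped by the first path component below CACHE_DIR.
# Files sitting directly in CACHE_DIR are listed under their own name.
recent_changes_by_subdir() (
    cache_dir=$1
    minutes=$2
    cd "$cache_dir" || exit 1
    # find prints ./<subdir>/...; field 2 of the /-separated path is <subdir>
    find . -type f -mmin "-$minutes" \
        | cut -d/ -f2 | sort | uniq -c | sort -rn
)
```

Run right after a build, for example `recent_changes_by_subdir /srv/castor/castor-mw-ext-and-skins/master/mwext-codehealth-master-non-voting 60` from a shell that can read the cache, it should show which subdirectories churn the most.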
One build this morning waited about 6m36s for the castor-save-workspace-cache job to start:
00:13:59.135 Waiting for the completion of castor-save-workspace-cache
00:20:35.295 castor-save-workspace-cache #6662819 started.
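The wait is simply the difference between those two console timestamps. A small sketch to compute it, assuming both timestamps are in `HH:MM:SS.mmm` form and do not wrap past 24 hours (`elapsed` is an illustrative name):

```shell
# Print the time elapsed between two HH:MM:SS.mmm timestamps.
elapsed() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, ":"); split(b, y, ":")
        # Convert both timestamps to seconds and subtract
        s = (y[1]*3600 + y[2]*60 + y[3]) - (x[1]*3600 + x[2]*60 + x[3])
        printf "%dm%.3fs\n", int(s / 60), s - 60 * int(s / 60)
    }'
}

elapsed 00:13:59.135 00:20:35.295   # prints 6m36.160s
```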
The saves that took 100 seconds or longer all point to mwext-codehealth-master-non-voting:
$ grep -l -P '<duration>\d\d\d\d\d\d+' /srv/jenkins/builds/castor-save-workspace-cache/*/build.xml | xargs grep -A1 TRIGGERED_JOB_NAME | grep value
/srv/jenkins/builds/castor-save-workspace-cache/6670854/build.xml- <value>mwext-codehealth-master-non-voting</value>
/srv/jenkins/builds/castor-save-workspace-cache/6671387/build.xml- <value>mwext-codehealth-master-non-voting</value>
/srv/jenkins/builds/castor-save-workspace-cache/6671868/build.xml- <value>mwext-codehealth-master-non-voting</value>
/srv/jenkins/builds/castor-save-workspace-cache/6672133/build.xml- <value>mwext-codehealth-master-non-voting</value>
/srv/jenkins/builds/castor-save-workspace-cache/6672160/build.xml- <value>mwext-codehealth-master-non-voting</value>
/srv/jenkins/builds/castor-save-workspace-cache/6672824/build.xml- <value>mwext-codehealth-master-non-voting</value>
/srv/jenkins/builds/castor-save-workspace-cache/6674308/build.xml- <value>mwext-codehealth-master-non-voting</value>
/srv/jenkins/builds/castor-save-workspace-cache/6675449/build.xml- <value>mwext-codehealth-master-non-voting</value>
/srv/jenkins/builds/castor-save-workspace-cache/6675904/build.xml- <value>mwext-codehealth-master-non-voting</value>
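The one-liner above only shows the job name. A variant that also prints each duration could look like the sketch below. It assumes, as the one-liner does, that build.xml carries a `<duration>` element in milliseconds and that the `<value>` line directly follows the line containing TRIGGERED_JOB_NAME. The function name `slow_saves` is made up for illustration.

```shell
# Hypothetical helper: list builds whose duration is at least MIN_MS,
# as "duration_ms<TAB>triggering job<TAB>build.xml path", slowest first.
slow_saves() (
    builds_dir=$1
    min_ms=$2
    for f in "$builds_dir"/*/build.xml; do
        [ -f "$f" ] || continue
        # First <duration> value in the file, digits only
        dur=$(sed -n 's:.*<duration>\([0-9]*\)</duration>.*:\1:p' "$f" | head -n 1)
        [ -n "$dur" ] || continue
        [ "$dur" -ge "$min_ms" ] || continue
        # The <value> on the line after TRIGGERED_JOB_NAME
        job=$(grep -A1 'TRIGGERED_JOB_NAME' "$f" \
            | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' | head -n 1)
        printf '%s\t%s\t%s\n' "$dur" "$job" "$f"
    done | sort -rn
)
```

`slow_saves /srv/jenkins/builds/castor-save-workspace-cache 100000` covers the same builds as the regex with six or more digits, and appending `| cut -f2 | sort | uniq -c` gives a count per triggering job.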
On integration-castor06.integration.eqiad1.wikimedia.cloud, the cache for that job holds 166k files and takes up 17 GB:
$ sudo find /srv/castor/castor-mw-ext-and-skins/master/mwext-codehealth-master-non-voting -type f | wc -l
166205
$ sudo du -hs /srv/castor/castor-mw-ext-and-skins/master/mwext-codehealth-master-non-voting
17G	/srv/castor/castor-mw-ext-and-skins/master/mwext-codehealth-master-non-voting
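To see where those files and gigabytes sit, the same two measurements can be taken per first-level subdirectory. This is a minimal sketch, and `cache_breakdown` is an illustrative name. It uses find rather than a `*/` glob so that dot-directories are included, on the assumption that some of the cached data may live in hidden directories.

```shell
# Hypothetical helper: for each first-level subdirectory of CACHE_DIR print
# "size_kb<TAB>file_count<TAB>name", largest first.
cache_breakdown() (
    cache_dir=$1
    find "$cache_dir" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r d; do
        files=$(find "$d" -type f | wc -l | tr -d ' ')
        kb=$(du -sk "$d" | cut -f1)
        printf '%s\t%s\t%s\n' "$kb" "$files" "$(basename "$d")"
    done | sort -rn
)
```

On the Castor host this would need a shell with read access to the cache, for example `cache_breakdown /srv/castor/castor-mw-ext-and-skins/master/mwext-codehealth-master-non-voting` in a root shell.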