Twice today, while deploying the wikimedia/search/analytics repository, a deploy failed with our promote script reporting that a wheel package is not in the expected format. Logging into stat1005.eqiad.wmnet and checking /srv/deployment/wikimedia/discovery/analytics/artifacts/ confirms that one or more files were not initialized. I looked through the relevant logs generated by scap on deploy1001 (first scap-sync-2018-08-16-0007-1-g76dddd2.log, then scap-sync-2018-08-16-0014-1-ge9e1e21.log). Nothing looks out of the ordinary in these logs: the code is synced and git-fat is run. The failure is also intermittent: the same repo may deploy fine when deployed again with --force.
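For anyone who wants to repeat the check on the artifacts directory, here is a minimal sketch that lists files which still look like git-fat placeholders rather than real wheels. It assumes that a placeholder begins with the cookie `#$# git-fat `, which is how I understand git-fat writes its stubs. The names is_placeholder and find_placeholders are made up for this example and are not part of scap or git-fat.

```python
import os

# Assumption: git-fat placeholder (stub) files begin with this cookie.
GIT_FAT_COOKIE = b"#$# git-fat "


def is_placeholder(path):
    """Return True if the file at path still looks like a git-fat stub."""
    with open(path, "rb") as fh:
        return fh.read(len(GIT_FAT_COOKIE)) == GIT_FAT_COOKIE


def find_placeholders(root):
    """Walk root and return the sorted paths of files that are still stubs."""
    stubs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if is_placeholder(full):
                stubs.append(full)
    return sorted(stubs)
```

Calling find_placeholders on the artifacts path from the description should return an empty list after a healthy deploy. Any path it returns is a file that git-fat did not replace with real content.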
operations/debs/git-fat (debian, +8 -0): Release 0.1.3 with both upstream and wikimedia improvements
It is that same problem!
The current version of git-fat doesn't have my commit in it: https://github.com/wikimedia/operations-debs-git-fat/commit/0e3abb0c5e8b1e4d81470397ec17138c6d24d9e8
If you look on deploy1001:
[thcipriani@deploy1001 analytics (master % u=)]$ grep -A1 'trick' $(which git-fat)
# also does the trick.
os.utime(fname, None)
Tagging SRE to repackage git-fat.
I don't totally remember the context here, but I just built a new version of git-fat with your fix and installed it on deploy1001:
Unpacking git-fat (0.1.3-1~stretch1) over (0.1.2-1~stretch1) ...
Yeah, we'll need to update the targets, since that's where we run into the problem the most. The problem can be summarized as running git fat init and git fat pull within the same second in which the git workdir was created, which really only happens in scap. Maybe there's some magic cumin command that could take care of the update?
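To illustrate why "within the same second" matters, here is a simplified sketch. It is not git's or git-fat's actual logic. It assumes a hypothetical change detector that compares a cached (size, whole-second mtime) pair, so touching a file in the same second in which it was checked out leaves the cached value looking fresh. The helper stat_signature and the fixed timestamps are invented for the demonstration.

```python
import os
import tempfile


def stat_signature(path):
    """Hypothetical change detector: size plus mtime truncated to whole seconds."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def demo():
    """Return (touch_in_same_second_is_missed, later_touch_is_missed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.whl")
        with open(path, "wb") as fh:
            fh.write(b"stub")

        # Pretend the checkout wrote the file at a fixed point in time.
        os.utime(path, (1_000_000_000.2, 1_000_000_000.2))
        cached = stat_signature(path)

        # A "touch" later in the same second leaves the signature unchanged.
        os.utime(path, (1_000_000_000.9, 1_000_000_000.9))
        missed_same_second = stat_signature(path) == cached

        # A touch two seconds later changes the signature, so it is noticed.
        os.utime(path, (1_000_000_002.0, 1_000_000_002.0))
        missed_later = stat_signature(path) == cached

        return missed_same_second, missed_later
```

Under these assumptions demo() reports that the same-second touch is missed while the later one is detected. That matches the symptom that a fast scap deploy can leave files uninitialized while a repeated deploy works.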
It looks like T201341: rack/setup/install cloudservices1004.wikimedia.org is done. Can anyone with access to that box confirm that git fat is on 0.1.3-2? And then we can call this task resolved :)