Twice today, while deploying the wikimedia/search/analytics repository, a deploy failed with our promote script reporting that a wheel package is not in the expected format. Logging into stat1005.eqiad.wmnet and checking /srv/deployment/wikimedia/discovery/analytics/artifacts/ confirms that one or more files were not initialized. I looked through the relevant logs generated by scap on deploy1001 (first scap-sync-2018-08-16-0007-1-g76dddd2.log, then scap-sync-2018-08-16-0014-1-ge9e1e21.log). Nothing looks out of the ordinary in these logs: the code is synced and git-fat is run. The failure is also intermittent: the same repo may deploy fine when deployed again with --force.
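For anyone who wants to check a target for uninitialized files, here is a minimal sketch. It assumes that an uninitialized file is still a git-fat placeholder and that placeholders start with the `#$# git-fat ` cookie; the function name and the directory passed in are hypothetical and not part of scap or the promote script.

```python
import os

# Assumption: git-fat placeholder stubs begin with this cookie.
GIT_FAT_COOKIE = b"#$# git-fat "


def find_placeholders(directory):
    """Return the names of regular files in `directory` that still look
    like git-fat placeholders rather than real artifacts."""
    stale = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        # Only the first few bytes are needed to recognise a stub.
        with open(path, "rb") as handle:
            if handle.read(len(GIT_FAT_COOKIE)) == GIT_FAT_COOKIE:
                stale.append(name)
    return stale
```

Pointing it at the artifacts directory mentioned above should list whichever wheels were left as stubs, if the cookie assumption holds.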
Details
Project | Branch | Lines +/- | Subject |
---|---|---|---|
operations/debs/git-fat | debian | +8 -0 | Release 0.1.3 with both upstream and wikimedia improvements |
Event Timeline
In the past this could have been caused by a git index cache race condition; however, I had hoped we'd solved that previously (T147856#2885665 is the summary). I'll dig into logs and see if anything else jumps out as the cause.
It is that same problem!
The current version of git-fat doesn't have my commit in it: https://github.com/wikimedia/operations-debs-git-fat/commit/0e3abb0c5e8b1e4d81470397ec17138c6d24d9e8
If you look on deploy1001:
[thcipriani@deploy1001 analytics (master % u=)]$ grep -A1 'trick' $(which git-fat)
# also does the trick.
os.utime(fname, None)
Tagging SRE to repackage git-fat.
Adding @Ottomata since they were the person who initially helped me get git-fat packaged after my tweak.
Change 454017 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/git-fat@debian] Release 0.1.3 with both upstream and wikimedia improvements
I don't totally remember the context here, but I just built a new version of git-fat with your fix and installed it on deploy1001:
Unpacking git-fat (0.1.3-1~stretch1) over (0.1.2-1~stretch1) ...
Change 454017 merged by Ottomata:
[operations/debs/git-fat@debian] Release 0.1.3 with both upstream and wikimedia improvements
Yeah, we'll need to update the targets since that's where we run into the problem the most. The problem can be summarized as running git fat init and git fat pull within the same second as you created your git workdir, which really only happens in scap. Maybe there's some magic cumin command that could take care of the update?
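The same-second problem can be sketched in Python. This is an illustration only, assuming the cache involved compares whole-second modification times. The file name and helper names are hypothetical, and the one-second bump is shown as one possible workaround, not as a copy of the linked commit.

```python
import os
import tempfile


def whole_second_mtime(path):
    """Modification time truncated to whole seconds, which is what a
    coarse stat cache would compare."""
    return int(os.lstat(path).st_mtime)


def bump_mtime(path):
    """Push the mtime forward by one second so the whole-second value
    always changes, even right after the file was created."""
    st = os.lstat(path)
    os.utime(path, (st.st_atime, st.st_mtime + 1))


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "artifact.whl")
        with open(path, "w") as handle:
            handle.write("placeholder")

        # Pin the mtime to a known value to keep the demo deterministic.
        os.utime(path, (1000, 1000))
        created = whole_second_mtime(path)

        # Simulate a plain touch half a second later: the whole-second
        # mtime is unchanged, so a coarse cache sees nothing new.
        os.utime(path, (1000, 1000.5))
        touched = whole_second_mtime(path)

        # Bumping by a full second is always visible.
        bump_mtime(path)
        bumped = whole_second_mtime(path)
        return created, touched, bumped
```

Here `demo()` returns the whole-second mtime at creation, after the simulated same-second touch, and after the bump. Only the last value differs, which is why a plain touch is not enough when everything happens within one second.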
We just updated it for wdqs* hosts. If that worked fine for @Gehel and Erik, we'll update the rest of the fleet (all nodes!) with @MoritzMuehlenhoff when he's back around.
Mentioned in SAL (#wikimedia-operations) [2018-08-29T18:31:28Z] <ottomata> debdeploy git-fat update for all nodes - T202100
Done, worked everywhere except:
The following hosts were unreachable: cloudservices1004.wikimedia.org
FYI, I've upstreamed my commit to git-fat: https://github.com/jedbrown/git-fat/commit/2fad58fd0631ed8dcb77358bb9b80e4cd091d3fe
It looks like T201341: rack/setup/install cloudservices1004.wikimedia.org is done. Can anyone with access to that box confirm that git-fat is on 0.1.3-2? Then we can call this task resolved :)
[cloudservices1004:~] $ dpkg -l git-fat
..
ii git-fat 0.1.3-2~jessie1 all Manage large files with git, without checking