
Intermittent git-fat failure during deploy
Closed, Resolved · Public

Description

Twice today, while deploying the wikimedia/search/analytics repository, a deploy failed with our promote script reporting that a wheel package is not in the expected format. Logging into stat1005.eqiad.wmnet and checking /srv/deployment/wikimedia/discovery/analytics/artifacts/ confirms that one or more files were not initialized. I looked through the relevant logs generated by scap on deploy1001 (first scap-sync-2018-08-16-0007-1-g76dddd2.log, then scap-sync-2018-08-16-0014-1-ge9e1e21.log). Nothing looks out of the ordinary in these logs: the code is synced and git-fat is run. The failure is also intermittent: the same repo may deploy fine when deployed again with --force.

Event Timeline

thcipriani moved this task from Needs triage to Debt on the Scap board.
thcipriani subscribed.

In the past this could have been caused by a git index cache race condition; however, I had hoped we'd solved that previously (T147856#2885665 is the summary). I'll dig into logs and see if anything else jumps out as the cause.

thcipriani moved this task from Debt to External/Watching on the Scap board.

It is that same problem!

The current version of git-fat doesn't have my commit in it: https://github.com/wikimedia/operations-debs-git-fat/commit/0e3abb0c5e8b1e4d81470397ec17138c6d24d9e8

If you look on deploy1001:

[thcipriani@deploy1001 analytics (master % u=)]$ grep -A1 'trick' $(which git-fat)
                # also does the trick.
                os.utime(fname, None)

Tagging SRE to repackage git-fat.

Adding @Ottomata since they were the person who initially helped me get git-fat packaged after my tweak.

Change 454017 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/git-fat@debian] Release 0.1.3 with both upstream and wikimedia improvements

https://gerrit.wikimedia.org/r/454017

I don't totally remember the context here, but I just built a new version of git-fat with your fix and installed it on deploy1001:

Unpacking git-fat (0.1.3-1~stretch1) over (0.1.2-1~stretch1) ...

It might need to also be updated on targets hm.

Change 454017 merged by Ottomata:
[operations/debs/git-fat@debian] Release 0.1.3 with both upstream and wikimedia improvements

https://gerrit.wikimedia.org/r/454017

Yeah, we'll need to update the targets, since that's where we run into the problem the most. The problem can be summarized as running git fat init and git fat pull within the same second in which the git workdir was created, which really only happens in scap. Maybe there's some magic cumin command that could take care of the update?
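
To make the "same second" part concrete, here is a rough Python sketch of the general idea. It is not git-fat's actual code or the fix in the commit linked above; same_second and bump_mtime are hypothetical helper names invented for illustration. It only shows that two files touched within one whole second look identical to a comparison that uses second-granularity mtimes, and that moving one mtime forward makes them distinguishable.

```python
import os
import tempfile


def same_second(path_a, path_b):
    """Return True if both files have an mtime in the same whole second."""
    return int(os.stat(path_a).st_mtime) == int(os.stat(path_b).st_mtime)


def bump_mtime(path, seconds=1):
    """Push a file's mtime forward (hypothetical helper, not git-fat's fix)."""
    st = os.stat(path)
    os.utime(path, (st.st_atime, st.st_mtime + seconds))


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "first")
    second = os.path.join(workdir, "second")
    for name in (first, second):
        open(name, "w").close()

    # Created back to back, so very likely inside the same second.
    print("same second before bump:", same_second(first, second))
    bump_mtime(second)
    print("same second after bump:", same_second(first, second))
```

A slow manual checkout rarely lands in that window, which would fit with the failure only showing up under scap.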

We just updated it for wqds* hosts. If that works fine for @Gehel and Erik, we'll update the rest of the fleet (all nodes!) with @MoritzMuehlenhoff when he's back around.

Did this happen? Anything left to do here?

Mentioned in SAL (#wikimedia-operations) [2018-08-29T18:31:28Z] <ottomata> debdeploy git-fat update for all nodes - T202100

Done, worked everywhere except:

The following hosts were unreachable:
cloudservices1004.wikimedia.org

That should be because of T201341#4543268.

It looks like T201341: rack/setup/install cloudservices1004.wikimedia.org is done. Can anyone with access to that box confirm that git-fat is on 0.1.3-2? Then we can call this task resolved :)

[cloudservices1004:~] $ dpkg -l git-fat
..
ii  git-fat              0.1.3-2~jessie1 all             Manage large files with git, without checking
Dzahn assigned this task to Ottomata.