zfilipin@deploy1001:/srv/mediawiki-staging$ find . -mindepth 2 -maxdepth 2 -type f -path './php-*/README' -ctime +30 -exec dirname {} \;
./php-1.33.0-wmf.17

zfilipin@deploy1001:/srv/mediawiki-staging$ scap clean --delete 1.33.0-wmf.17
___ ____
⎛ ⎛ ,----
\ //==--'
_//|,.·//==--' ____________________________
_OO≣=- ︶ ᴹw ⎞_§ ______ ___\ ___\ ,\__ \/ __ \
(∞)_, ) ( | ______/__ \/ /__ / /_/ / /_/ /
¨--¨|| |- ( / ______\____/ \___/ \__^_/ .__/
««_/ «_/ jgs/bd808 /_/

13:36:52 Checking for new runtime errors locally
13:36:53 Started clean-l10nupdate-cache
13:36:53 Finished clean-l10nupdate-cache (duration: 00m 00s)
13:36:53 Started clean-l10nupdate-owned-files
13:36:53 Finished clean-l10nupdate-owned-files (duration: 00m 00s)
13:36:53 Started clean-ExtensionMessages
13:36:53 Unable to delete /srv/mediawiki-staging/wmf-config/ExtensionMessages-1.33.0-wmf.17.php, already missing
13:36:53 Finished clean-ExtensionMessages (duration: 00m 00s)
13:36:53 Started prune-git-branches
Received disconnect from 2620:0:861:3:208:80:154:85 port 29418:2: Too many authentication failures: 7
Authentication failed.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Received disconnect from 2620:0:861:3:208:80:154:85 port 29418:2: Too many authentication failures: 7
Authentication failed.
fatal: Could not read from remote repository.

...
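For reference, the `find` invocation at the top of this task can be approximated in Python. This is only a sketch, not part of scap: the function name `old_staging_dirs` is hypothetical, and it assumes the same layout as the command above (a `php-*/README` file one level below the staging root). It follows find's rule that `-ctime +30` counts whole 24-hour periods and drops the fraction.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def old_staging_dirs(staging_root, days=30, now=None):
    """Return php-* dirs whose README would match `find ... -ctime +days`.

    find ignores the fractional part of the age in days, so `-ctime +30`
    selects files whose inode changed at least 31 full days ago.
    `now` can be overridden (e.g. in tests); it defaults to the current time.
    """
    if now is None:
        now = time.time()
    old = []
    for readme in sorted(Path(staging_root).glob("php-*/README")):
        # Mirror -type f: skip symlinks and anything that is not a regular file.
        if readme.is_symlink() or not readme.is_file():
            continue
        age_days = int((now - readme.stat().st_ctime) // SECONDS_PER_DAY)
        if age_days > days:
            old.append(readme.parent)  # same result as `-exec dirname {} \;`
    return old
```

Usage would be something like `for d in old_staging_dirs("/srv/mediawiki-staging"): print(d)`.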
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | fsero | T219989 mwdebug2001 and mwdebug2002 "/" almost full
Invalid | | None | T218783 `scap clean` failure
Event Timeline
13:44:00 Started prune-git-branches
Received disconnect from 2620:0:861:3:208:80:154:85 port 29418:2: Too many authentication failures: 7
Authentication failed.
prune-git-branches comes from our scap plugin:
/operations/mediawiki-config(master u=)$ git grep -A7 prune.git
scap/plugins/clean.py
with log.Timer('prune-git-branches', self.get_stats()):
    # Prune all the submodules' remote branches
    with utils.cd(self.branch_stage_dir):
        submodule_cmd = 'git submodule foreach "{} ||:"'.format(
            ' '.join(git_prune_cmd))
        subprocess.check_output(submodule_cmd, shell=True)
        if subprocess.call(git_prune_cmd) != 0:
            logger.info('Failed to prune core branch')
To summarize, our scap plugin does:
git submodule foreach git push origin --delete wmf/1.33.0-wmf.17
This deletes the old branch from Gerrit.
When scap clean is run, the Gerrit sshd_log eventually shows:
AUTH FAILURE FROM 2620:0:861:103:10:64:32:16 no-matching-key
But ssh -p 29418 zfilipin@gerrit.wikimedia.org works.
So the insteadOf rule in .gitconfig works. It seems the proper SSH key is not passed when using scap? :-(
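To illustrate why the push ends up on Gerrit's SSH port (29418) at all: git's `url.<base>.insteadOf` setting rewrites any URL starting with the configured prefix, and when several prefixes match, the longest one wins. The sketch below emulates that rule. The mapping in `RULES` is a hypothetical example; the actual ~/.gitconfig on deploy1001 is not shown in this task.

```python
def rewrite_url(url, instead_of):
    """Rewrite `url` the way git applies url.<base>.insteadOf settings.

    `instead_of` maps each insteadOf prefix to its <base>. If several
    prefixes match, git uses the longest one, and so does this sketch.
    """
    matches = [prefix for prefix in instead_of if url.startswith(prefix)]
    if not matches:
        return url  # no rule applies; git uses the URL unchanged
    longest = max(matches, key=len)
    return instead_of[longest] + url[len(longest):]


# Hypothetical ~/.gitconfig equivalent:
#   [url "ssh://zfilipin@gerrit.wikimedia.org:29418/"]
#       insteadOf = https://gerrit.wikimedia.org/r/
RULES = {"https://gerrit.wikimedia.org/r/": "ssh://zfilipin@gerrit.wikimedia.org:29418/"}

print(rewrite_url("https://gerrit.wikimedia.org/r/mediawiki/core", RULES))
# ssh://zfilipin@gerrit.wikimedia.org:29418/mediawiki/core
```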
One thing to try is manually replaying the commands that scap/plugins/clean.py runs, namely:
cd /srv/mediawiki-staging/php-1.33.0-wmf.17
git submodule foreach "git push origin --quiet --delete wmf/1.33.0-wmf.17 || :" ; echo $?
And for core:
cd /srv/mediawiki-staging/php-1.33.0-wmf.17
git push origin --quiet --delete wmf/1.33.0-wmf.17 ; echo $?
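The two manual commands above can be wrapped in a small helper that prints them (dry run) or runs them and reports each exit status, like `; echo $?`. This is a hypothetical sketch, not part of scap; `prune_commands` and `replay` are illustrative names. One detail worth keeping in mind: `git submodule foreach` stops at the first submodule whose command fails, and the trailing `|| :` overrides that, which also means the foreach itself exits 0 even when individual pushes fail, so the stderr output is what reveals the authentication errors.

```python
import subprocess


def prune_commands(branch):
    """The two shell commands from the comment above: submodules, then core."""
    push = "git push origin --quiet --delete {}".format(branch)
    return [
        # git submodule foreach stops at the first failing submodule;
        # '|| :' turns each failure into success so every submodule is tried.
        'git submodule foreach "{} || :"'.format(push),
        push,
    ]


def replay(stage_dir, branch, dry_run=True):
    """Print (dry run) or run each command in stage_dir, reporting exit codes."""
    statuses = []
    for cmd in prune_commands(branch):
        if dry_run:
            print("[dry run] cd {} && {}".format(stage_dir, cmd))
            continue
        status = subprocess.call(cmd, shell=True, cwd=stage_dir)
        print("exit {}: {}".format(status, cmd))  # same information as `; echo $?`
        statuses.append(status)
    return statuses
```

For a real run: `replay("/srv/mediawiki-staging/php-1.33.0-wmf.17", "wmf/1.33.0-wmf.17", dry_run=False)`.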
Change 497781 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/mediawiki-config@master] scap: add logging to clean > prune-git-branches
Mentioned in SAL (#wikimedia-operations) [2019-03-26T19:18:52Z] <marxarelli> scap clean failure due to T218783. train is rolling without cleanup
I think what's happening is that scap overwrites the SSH_AUTH_SOCK environment variable with the value of ssh_auth_sock from /etc/scap.cfg (which is /run/keyholder/proxy.sock).
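A minimal sketch of the behaviour suspected above, assuming scap simply replaces the variable in the environment it hands to child processes (this is not scap's actual code; `child_env` is a hypothetical helper). Once SSH_AUTH_SOCK points at keyholder, ssh only offers the keys loaded there, which would fit the `no-matching-key` entries in Gerrit's sshd_log.

```python
def child_env(parent_env, ssh_auth_sock=None):
    """Return the environment a child git/ssh process would see.

    If a configured ssh_auth_sock is given, it replaces the caller's
    SSH_AUTH_SOCK, so the operator's own agent (and keys) is no longer used.
    The parent mapping is copied, never modified.
    """
    env = dict(parent_env)
    if ssh_auth_sock:
        env["SSH_AUTH_SOCK"] = ssh_auth_sock
    return env
```

For example, `child_env(os.environ, "/run/keyholder/proxy.sock")` yields an environment in which `git push` authenticates via keyholder rather than via the deployer's forwarded agent.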
Change 502316 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[operations/mediawiki-config@master] Train: scap clean, feature flag prune branches
Change 502316 merged by jenkins-bot:
[operations/mediawiki-config@master] Train: scap clean, feature flag prune branches
mwdebug2001 and mwdebug2002 are now full:
root@mwdebug2002:~# df -hT /
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/vda1      ext4   39G   37G     0 100% /
root@mwdebug2001:~# df -hT /
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/vda1      ext4   39G   37G     0 100% /
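The numbers above can look inconsistent (37G used of 39G, yet 100%). GNU df computes Use% as used / (used + avail), rounded up, so Size is not in the formula. Blocks reserved for root count toward Size but not toward Avail; the roughly 2G gap is most likely that reservation (ext4 reserves 5% by default, about 1.95G of 39G), though the `-h` figures are rounded. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def df_use_percent(used, avail):
    """Use% as GNU df computes it: used / (used + avail), rounded up."""
    total = used + avail
    if total == 0:
        return None  # percentage is undefined for an empty filesystem
    return (100 * used + total - 1) // total  # integer ceiling division


size, used, avail = 39, 37, 0  # GiB, from the df -hT output above
print(df_use_percent(used, avail))  # 100
print(size - used - avail)          # 2
```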
Since these are VMs, can their disks be expanded easily? I ask because they're the oddballs in the MW fleet :) I know Tyler wants to fix this issue this week, though.
To be precise, the last version of the train did not deploy correctly to any of the debug servers.
Deployments (train and SWAT) should be considered blocked until we have manually pruned at least some directories.
Also note that with the next train, the eqiad servers will fill up their disks as well.
It's some work (basically, reimaging two servers) and, more importantly, it takes time to complete. We will do it, but it's not a quick, painless fix.
Ugh, I did "fix" scap clean (for some value of "fix": it should no longer fail, but it also no longer does everything it used to, namely deleting old branches on Gerrit), but we still need to run scap clean for versions 17-20.
I ran it for version 17 yesterday and that seems to have worked. I'll clean up 18-20 today.
OK, I ran scap clean for wmf.18 through wmf.20 and it seems to have cleaned everything up.
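Cleaning several leftover versions is just one `scap clean --delete <version>` per version, as in the invocation at the top of this task. The sketch below only builds and prints the command lines; it assumes the versions in question are 1.33.0-wmf.18 to 1.33.0-wmf.20 (the comment gives only the wmf numbers), and `clean_commands` is a hypothetical helper.

```python
def clean_commands(major, first, last):
    """Build one `scap clean --delete` argv per wmf version, first..last inclusive."""
    return [
        ["scap", "clean", "--delete", "{}-wmf.{}".format(major, n)]
        for n in range(first, last + 1)
    ]


for cmd in clean_commands("1.33.0", 18, 20):
    print(" ".join(cmd))
# scap clean --delete 1.33.0-wmf.18
# scap clean --delete 1.33.0-wmf.19
# scap clean --delete 1.33.0-wmf.20
```

Running them one at a time (for example with `subprocess.check_call(cmd)`) keeps a failure in one version from being hidden by the next.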
The error no longer shows up, since branch pruning has been put behind a feature flag by https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/502316/
So one now has to run scap clean --delete-gerrit-branch, which we do not do. The repository still has the old branches:
$ git ls-remote https://gerrit.wikimedia.org/r/mediawiki/core 'refs/heads/wmf/*'
5e425f78328119c494f85bde2b62bf882af957b3 refs/heads/wmf/1.34.0-wmf.13
7df499c9ac0da2ec9c5dfae79091a7dcfab0721b refs/heads/wmf/1.34.0-wmf.14
2cf6cfe13f23dca5430e47d337936f503bfe6115 refs/heads/wmf/1.34.0-wmf.15
4c69337a8a74c57294bf3d5b3ef8a73aec333b13 refs/heads/wmf/1.34.0-wmf.16
f84a4abb418de8e2c53c87f5a3dc1379acfd2f63 refs/heads/wmf/1.34.0-wmf.17
5d54761bb099434397d9e2accb3a2d396c07989c refs/heads/wmf/1.34.0-wmf.19
ddfeb42049c2ce35157ab9e941d24b294cbb2924 refs/heads/wmf/1.34.0-wmf.20
2286620289584e937ed9f595942988e28ba6b4f7 refs/heads/wmf/1.34.0-wmf.21
5a907677b69dd008498f170c3683e5d5e9e821b3 refs/heads/wmf/1.34.0-wmf.22
56f788d5bb941b109119c5ed374e1e49004776bf refs/heads/wmf/1.34.0-wmf.23
68ccab1e007112d8f45954088ac3c2fa4b0692d0 refs/heads/wmf/1.34.0-wmf.24
8df64470fdead29ddd42c88ee1ecba6e77d88311 refs/heads/wmf/1.34.0-wmf.24-bak
54dbe5f05df6636dc2c3f7f20616a7a2acd07f47 refs/heads/wmf/1.34.0-wmf.25
f01da543f11746acc1aa42cd46e02688b2b7b9de refs/heads/wmf/1.34.0-wmf.25-bak
0ba3e32a84a5960d266ed9cfba5e36ac4f1a9256 refs/heads/wmf/1.35.0-wmf.1
21e7707b532027baf3bd8f1563b920f237327ca8 refs/heads/wmf/1.35.0-wmf.2
50fc302ed798530164baf5839b375e099591d608 refs/heads/wmf/1.35.0-wmf.3
7af252a5be5b00646446532f61f59bd4111033d0 refs/heads/wmf/1.35.0-wmf.4
6ecbe69f4d1e5e9d2b15113036d3552bb674e4ec refs/heads/wmf/1.35.0-wmf.5
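To see which of those branches are stale, the `git ls-remote` output can be parsed and sorted by version number (plain string sorting would put wmf.2 after wmf.13). This is a sketch under an assumed retention policy: `keep` (hypothetically 2) is the number of newest versions to leave alone, and suffixed branches such as `-bak` are grouped with their base version. `stale_wmf_branches` is not an existing tool.

```python
import re

WMF_REF = re.compile(r"^refs/heads/wmf/(\d+)\.(\d+)\.(\d+)-wmf\.(\d+)(.*)$")


def stale_wmf_branches(ls_remote_output, keep=2):
    """Return wmf/* branch names older than the newest `keep` versions.

    Input is `git ls-remote <url> 'refs/heads/wmf/*'` output, one
    "<sha> <ref>" pair per line. Results are ordered by version.
    """
    branches = []
    for line in ls_remote_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        m = WMF_REF.match(parts[1])
        if m:
            version = tuple(int(g) for g in m.groups()[:4])
            branches.append((version, parts[1][len("refs/heads/"):]))
    versions = sorted({v for v, _ in branches})
    kept = set(versions[-keep:]) if keep > 0 else set()
    return [name for v, name in sorted(branches) if v not in kept]
```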
I had a patch adding at least some logging to the scap clean command: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497781/
Then @thcipriani wrote:
Gerrit's use of HTTP auth tokens has been temporarily removed. Until it
is re-enabled we rely on Gerrit's SSH auth. In the case of scap clean,
since scap overrides the SSH_AUTH_SOCK env var with a pointer to
keyholder, we are not authorized to prune any branches on Gerrit since
there are no appropriate keys in keyholder.

Until either Gerrit's HTTP auth tokens are re-enabled, or we have a
shared key to prune branches in keyholder, we need to feature flag
branch prune so that we ensure that we're doing the rest of scap clean
(removing old branches on the application servers themselves).
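The feature-flag approach described in the quote can be sketched with argparse: the rest of the clean steps always run, and the Gerrit prune only runs when `--delete-gerrit-branch` is passed, which is why branches accumulate by default. Only the option names and step names come from this task; the parser layout and `planned_steps` are hypothetical and are not the actual scap plugin code (what `--delete` itself does is omitted).

```python
import argparse


def parse_clean_args(argv):
    """Hypothetical parser accepting the invocations quoted in this task."""
    parser = argparse.ArgumentParser(prog="scap clean")
    parser.add_argument("branch", help="version to clean, e.g. 1.33.0-wmf.17")
    parser.add_argument("--delete", action="store_true",
                        help="remove the version from the deployment hosts")
    parser.add_argument("--delete-gerrit-branch", action="store_true",
                        help="also delete the wmf/<branch> branch on Gerrit")
    return parser.parse_args(argv)


def planned_steps(args):
    """Steps seen in the scap clean log, with the Gerrit prune gated by the flag."""
    steps = ["clean-l10nupdate-cache", "clean-l10nupdate-owned-files",
             "clean-ExtensionMessages"]
    # Off by default: without the flag, wmf/* branches stay on Gerrit.
    if args.delete_gerrit_branch:
        steps.append("prune-git-branches")
    return steps
```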
I am in favor of creating a maintenance/branch-cutter user in Gerrit with an SSH key held in the deployment hosts' keyholder :] The branch deletion step would then need to use that username (is that mwdeploy?) and we could drop the feature flag.
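A sketch of what the branch deletion could look like with a dedicated account: push to an explicit `ssh://user@host:29418/project` URL so the Gerrit user is stated in the command rather than depending on how `origin` is configured. The user name `branch-cutter` is a hypothetical placeholder (the comment above leaves the account undecided), and the key would still have to be available through keyholder.

```python
def gerrit_delete_branch_cmd(project, branch, user,
                             host="gerrit.wikimedia.org", port=29418):
    """Build a `git push --delete` argv that authenticates as `user` over SSH."""
    url = "ssh://{}@{}:{}/{}".format(user, host, port, project)
    return ["git", "push", "--quiet", "--delete", url, branch]


cmd = gerrit_delete_branch_cmd("mediawiki/core", "wmf/1.34.0-wmf.13",
                               user="branch-cutter")  # hypothetical account
print(" ".join(cmd))
# git push --quiet --delete ssh://branch-cutter@gerrit.wikimedia.org:29418/mediawiki/core wmf/1.34.0-wmf.13
```

It could then be run with something like `subprocess.call(cmd, env=dict(os.environ, SSH_AUTH_SOCK="/run/keyholder/proxy.sock"))`, so the key comes from keyholder rather than from the operator's agent.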
The issue is with prune-git-branches. Apparently we are no longer pruning wmf branches from extensions: AbuseFilter has every branch since 1.35.0-wmf.1.
Anyway, that does not seem related to Gerrit, so I am removing that tag.
We've refactored this a few times: no one runs scap clean manually any more, since it is part of scap train and runs every week. I don't think this specific task is still actionable.
The branch cleaning got moved behind a feature flag (scap clean --delete-gerrit-branch), which means the wmf branches are never deleted and keep accumulating. I eventually wondered why they were no longer deleted and later filed that as T303828.
Tyler had an explanation, which I quoted at T218783#5639668.