Scap deployment on Hadoop test cluster broken
Closed, Resolved, Public

Description

It looks like the analytics/refinery scap config is still trying to deploy to an-test-client1001, which was decommissioned in favor of an-test-client1002.

A proposed patch: https://gerrit.wikimedia.org/r/c/analytics/refinery/scap/+/961390

Manually updating the scap config to point at 1002 did not work, so some commands may need to be run on the new host first.

scap deploy -e hadoop-test "Regular analytics weekly train TEST [analytics/refinery@$(git rev-parse --short HEAD)]"
13:35:09 Started deploy [analytics/refinery@223be0f] (hadoop-test)
13:35:09 Deploying Rev: HEAD = 223be0fb1aacc19fc923e23083e9a06534e1b355
13:35:09 Started deploy [analytics/refinery@223be0f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@223be0fb]
13:35:09
== DEFAULT ==
:* an-test-client1002.eqiad.wmnet
:* an-test-coord1001.eqiad.wmnet
13:35:14 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'analytics/refinery', '-g', 'default', 'fetch', '--refresh-config'] (ran as analytics-deploy@an-test-client1002.eqiad.wmnet) returned [70]: Registering scripts in directory '/srv/deployment/analytics/refinery-cache/revs/223be0fb1aacc19fc923e23083e9a06534e1b355/scap/scripts'
Fetch from: http://deploy2002.codfw.wmnet/analytics/refinery/.git
Running ['git', 'remote', 'set-url', 'origin', 'http://deploy2002.codfw.wmnet/analytics/refinery/.git'] with {'cwd': '/srv/deployment/analytics/refinery-cache/cache', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
Command exited with code 0
Running ['git', 'fetch', '--tags', '--jobs', '1', '--no-recurse-submodules'] with {'cwd': '/srv/deployment/analytics/refinery-cache/cache', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
Command exited with code 0
Running ['git', 'clone', '--jobs', '1', '--reference', '/srv/deployment/analytics/refinery-cache/cache', '/srv/deployment/analytics/refinery-cache/cache', '/srv/deployment/analytics/refinery-cache/revs/223be0fb1aacc19fc923e23083e9a06534e1b355'] with {'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
Command exited with code 0
Checkout rev: 223be0fb1aacc19fc923e23083e9a06534e1b355
Checking out rev: 223be0fb1aacc19fc923e23083e9a06534e1b355 at location: /srv/deployment/analytics/refinery-cache/revs/223be0fb1aacc19fc923e23083e9a06534e1b355
Running ['git', 'checkout', '--force', '--quiet', '223be0fb1aacc19fc923e23083e9a06534e1b355'] with {'cwd': '/srv/deployment/analytics/refinery-cache/revs/223be0fb1aacc19fc923e23083e9a06534e1b355', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
Command exited with code 0
Pulling large objects [using git-fat]
Running ['git', 'fat', 'init'] with {'cwd': '/srv/deployment/analytics/refinery-cache/revs/223be0fb1aacc19fc923e23083e9a06534e1b355', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
Command exited with code 1
Unhandled error:
deploy-local failed: <FailedCommand> {'exitcode': 1, 'stdout': '', 'stderr': "git: 'fat' is not a git command. See 'git --help'.\n\nThe most similar commands are\n\tfetch\n\tmktag\n\tstage\n\tstash\n\ttag\n\tvar\n"}
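
The "'fat' is not a git command" error above means git could not find an executable named git-fat on the target host. A minimal sketch of a check that could be run on a host before deploying is shown here; the function name has_git_subcommand is hypothetical and is not part of scap or git.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if `git <name>` can resolve to an external
# subcommand, i.e. an executable named git-<name> exists on PATH or in
# git's exec-path directory.
has_git_subcommand() {
    name="$1"
    # External subcommands such as git-fat are normally found via PATH.
    if command -v "git-$name" >/dev/null 2>&1; then
        return 0
    fi
    # Fall back to git's own exec-path; fail if git itself is missing.
    exec_path="$(git --exec-path 2>/dev/null)" || return 1
    [ -x "$exec_path/git-$name" ]
}

if has_git_subcommand fat; then
    echo "git-fat is available"
else
    echo "git-fat is missing; 'git fat init' would fail" >&2
fi
```

Running this as the deploy user on each target would surface the missing package before scap reaches the git-fat step.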

Event Timeline

I have manually installed git-fat on an-test-client1002, so you should be able to complete the deployment.

btullis@an-test-client1002:~$ sudo apt install git-fat
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  python3-debconf
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  git-fat
0 upgraded, 1 newly installed, 0 to remove and 14 not upgraded.
Need to get 10.1 kB of archives.
After this operation, 38.9 kB of additional disk space will be used.
Get:1 http://apt.wikimedia.org/wikimedia bullseye-wikimedia/main amd64 git-fat all 0.1.3-2+deb10u1 [10.1 kB]
Fetched 10.1 kB in 0s (369 kB/s)
su: warning: cannot change directory to /nonexistent: No such file or directory

You do not have a valid Kerberos ticket in the credential cache, remember to kinit.
INFO:debmonitor:Got 1 updates from dpkg hook version 3
INFO:debmonitor:Successfully sent the dpkg_hook update to the DebMonitor server
Selecting previously unselected package git-fat.
(Reading database ... 321976 files and directories currently installed.)
Preparing to unpack .../git-fat_0.1.3-2+deb10u1_all.deb ...
Unpacking git-fat (0.1.3-2+deb10u1) ...
Setting up git-fat (0.1.3-2+deb10u1) ...
btullis@an-test-client1002:~$

This was only possible because we have already retained python2 on this host; otherwise installing git-fat would pull python2 in, which we don't want.
Really, we need to find a replacement for git-fat pretty soon.
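
For reference, a rough way to check the python2 concern on another host without installing anything is to run apt-get in simulation mode and see which packages it would add. This is only a sketch: the helper names are hypothetical, and the package name python2 is an assumption that may differ between distributions.

```shell
#!/bin/sh
# Hypothetical helpers that read `apt-get -s install <pkg>` output on stdin.
# In simulation mode, packages that would be installed appear on lines
# beginning with "Inst ".

# Print the name of every package the simulated run would install.
packages_to_install() {
    awk '$1 == "Inst" { print $2 }'
}

# Succeed if the named package (default: python2, an assumed name)
# is among the packages that would be installed.
would_pull_in() {
    packages_to_install | grep -qx -- "${1:-python2}"
}

# Usage (simulation only, nothing is installed):
#   apt-get -s install git-fat | would_pull_in python2 \
#       && echo "installing git-fat would pull in python2"
```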

@Antoine_Quhen - Are you happy for us to resolve the ticket? Have you tested another deployment to the Hadoop test cluster, since I installed git-fat on an-test-client1002?

Thanks @BTullis! I've tested it. It's now resolved!