
beta-code-update-eqiad: fatal: No url found for submodule path {repo} in .gitmodules
Open, Needs Triage, Public

Description

Repeated failures, see https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/449119/console

13:15:21 12:15:21 https://gerrit.wikimedia.org/r/mediawiki/skins checked out at commit 225e32a657f6995da4b8ff884875675abc308a5f
13:15:25 12:15:25 Finished scap prep auto (duration: 02m 24s)
13:15:25 12:15:25 Unhandled error:
13:15:25 Traceback (most recent call last):
13:15:25   File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/cli.py", line 530, in run
13:15:25     exit_status = app.main(app.extra_arguments)
13:15:25   File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/plugins/prep.py", line 164, in main
13:15:25     self._prep_mw_branch(version, logger, apply_patches=self.arguments.apply_patches)
13:15:25   File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/plugins/prep.py", line 211, in _prep_mw_branch
13:15:25     update_update_strategy(dest_dir)
13:15:25   File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/plugins/prep.py", line 39, in update_update_strategy
13:15:25     git.gitcmd("submodule", "foreach", "--recursive", base_cmd, cwd=path)
13:15:25   File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/runcmd.py", line 91, in gitcmd
13:15:25     return _runcmd(["git", subcommand] + list(args), **kwargs)
13:15:25   File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/runcmd.py", line 78, in _runcmd
13:15:25     raise FailedCommand(argv, p.returncode, stdout, stderr)
13:15:25 scap.runcmd.FailedCommand: Command 'git submodule foreach --recursive /usr/bin/git -C /srv/mediawiki-staging/php-master config submodule.$name.update rebase' failed with exit code 128;
13:15:25 stdout:
13:15:25 
13:15:25 stderr:
13:15:25 fatal: No url found for submodule path 'ApiFeatureUsage' in .gitmodules
13:15:25 
13:15:25 12:15:25 prep failed: <FailedCommand> Command 'git submodule foreach --recursive /usr/bin/git -C /srv/mediawiki-staging/php-master config submodule.$name.update rebase' failed with exit code 128;
13:15:25 stdout:
13:15:25 
13:15:25 stderr:
13:15:25 fatal: No url found for submodule path 'ApiFeatureUsage' in .gitmodules
13:15:25 
13:15:25 Build step 'Execute shell' marked build as failure
13:15:26 IRC notifier plugin: Sending notification to: #wikimedia-releng
13:15:34 No emails were triggered.
13:15:34 Finished: FAILURE

Some previous runs (e.g. https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/449111/console) failed with:

11:54:09 stderr:
11:54:09 /usr/lib/git-core/git-submodule: 567: cd: can't cd to LdapAuthentication
11:54:09 Unable to find current revision in submodule path 'LdapAuthentication'

Event Timeline

TheresNoTime triaged this task as Unbreak Now! priority. Jun 21 2023, 12:43 PM

Guess this is blocking testing things on beta (ci-test-error (WMF-deployed Build Failure)?), so setting UBN!

Mentioned in SAL (#wikimedia-releng) [2023-06-21T12:47:47Z] <TheresNoTime> deployment-prep: [samtar@deployment-deploy03 ~]$ sudo -u jenkins-deploy scap prep auto --no-log-message --verbose T340030

Verbose run didn't really give anything useful :/

12:48:47 https://gerrit.wikimedia.org/r/mediawiki/skins checked out at commit 225e32a657f6995da4b8ff884875675abc308a5f
12:48:47 Updating submodules
12:48:47 git submodule sync
12:48:47 Running ['git', 'submodule', 'sync', '--recursive'] with {'cwd': '/srv/mediawiki-staging/php-master/skins', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
12:48:47 Command exited with code 0
12:48:47 Fetch submodules
12:48:47 Using upstream submodules
12:48:47 Running ['git', 'submodule', 'update', '--init', '--recursive', '--jobs', '2', '--checkout', '--force'] with {'cwd': '/srv/mediawiki-staging/php-master/skins', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
12:48:51 Command exited with code 0
12:48:51 Running ['git', 'remote', 'set-url', '--push', 'origin', 'ssh://gerrit.wikimedia.org:29418/mediawiki/skins'] with {'cwd': '/srv/mediawiki-staging/php-master/skins', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
12:48:51 Command exited with code 0
12:48:51 Running ['git', 'add', '--all'] with {'cwd': '/srv/mediawiki-staging/php-master', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
12:48:51 Command exited with code 0
12:48:51 Running ['git', 'commit', '--quiet', '-m', 'scap prep auto setup'] with {'cwd': '/srv/mediawiki-staging/php-master', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
12:48:51 Command exited with code 0
12:48:51 Running ['git', 'submodule', 'foreach', '--recursive', '/usr/bin/git -C /srv/mediawiki-staging/php-master config submodule.$name.update rebase'] with {'cwd': '/srv/mediawiki-staging/php-master', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
12:48:51 Command exited with code 128
12:48:51 Finished scap prep auto (duration: 02m 00s)
12:48:51 Unhandled error:
Traceback (most recent call last):
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/cli.py", line 530, in run
    exit_status = app.main(app.extra_arguments)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/plugins/prep.py", line 164, in main
    self._prep_mw_branch(version, logger, apply_patches=self.arguments.apply_patches)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/plugins/prep.py", line 211, in _prep_mw_branch
    update_update_strategy(dest_dir)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/plugins/prep.py", line 39, in update_update_strategy
    git.gitcmd("submodule", "foreach", "--recursive", base_cmd, cwd=path)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/runcmd.py", line 91, in gitcmd
    return _runcmd(["git", subcommand] + list(args), **kwargs)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/runcmd.py", line 78, in _runcmd
    raise FailedCommand(argv, p.returncode, stdout, stderr)
scap.runcmd.FailedCommand: Command 'git submodule foreach --recursive /usr/bin/git -C /srv/mediawiki-staging/php-master config submodule.$name.update rebase' failed with exit code 128;
stdout:

stderr:
fatal: No url found for submodule path 'ApiFeatureUsage' in .gitmodules

12:48:51 prep failed: <FailedCommand> Command 'git submodule foreach --recursive /usr/bin/git -C /srv/mediawiki-staging/php-master config submodule.$name.update rebase' failed with exit code 128;
stdout:

stderr:
fatal: No url found for submodule path 'ApiFeatureUsage' in .gitmodules
hashar subscribed.
$ scap prep auto
...
scap.runcmd.FailedCommand: Command 'git submodule foreach --recursive /usr/bin/git -C /srv/mediawiki-staging/php-master config submodule.$name.update rebase' failed with exit code 128;

I think that is the root cause. On Beta-Cluster-Infrastructure we clone the repositories as such:

Path | Repo | Branch | Submodules?
/srv/mediawiki-staging/php-master | mediawiki/core | master | NO
/srv/mediawiki-staging/php-master/extensions | mediawiki/extensions | master | Yes
/srv/mediawiki-staging/php-master/skins | mediawiki/skins | master | Yes

Whereas in production, for a given wmf version, we clone A SINGLE REPO:

Path | Repo | Branch | Submodules?
/srv/mediawiki-staging/php-1.41.0-wmf.X | mediawiki/core | wmf/1.41.0-wmf.X | YES

And in production the wmf branch has the extensions and skins as submodules, so scap backport can do a submodule update from the mediawiki/core root. On beta that is not possible, since mediawiki/core has no submodules.
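To make the layout difference concrete, here is a minimal Python sketch (scap itself is Python) that reports which directories of a staging tree are submodule superprojects, i.e. carry a .gitmodules file. The helper name superprojects() and its candidate list are hypothetical, chosen to mirror the two tables above; this is not scap code.

```python
import os


def superprojects(root, candidates=("", "extensions", "skins")):
    """Return the candidate directories under root that contain a .gitmodules file.

    Hypothetical helper: the candidates mirror the layouts in the tables above,
    where "" stands for the mediawiki/core checkout itself.
    """
    found = []
    for rel in candidates:
        path = os.path.join(root, rel) if rel else root
        # A superproject records its submodules (path + url) in .gitmodules.
        if os.path.isfile(os.path.join(path, ".gitmodules")):
            found.append(rel or ".")
    return found
```

On a beta-style tree (separate extensions/ and skins/ clones) this would return ['extensions', 'skins']; on a production-style wmf checkout it would return ['.'], matching the point that only production can run a single submodule pass from the core root.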

Mentioned in SAL (#wikimedia-releng) [2023-06-21T13:08:08Z] <TheresNoTime> deployment-prep: [samtar@deployment-deploy03 mediawiki-staging (master u=)]$ sudo puppet agent -tv T340030, nb. taking a while to do corrective actions..

^

[samtar@deployment-deploy03 mediawiki-staging (master u=)]$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(22958ce0f91) root - fix-staging-perms: set group name from Puppet'
Notice: /Stage[main]/Profile::Beta::Mediawiki_packages/Package[lilypond/buster-backports]/ensure: created (corrective)
Notice: /Stage[main]/Profile::Beta::Mediawiki_packages/Package[lilypond-data/buster-backports]/ensure: created (corrective)
Notice: /Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-core]/Exec[git_set_origin_beta-mediawiki-core]/returns: executed successfully (corrective)
Info: Git::Clone[beta-mediawiki-core]: Scheduling refresh of Exec[/bin/rm -r /srv/mediawiki-staging/php-master/extensions]
Notice: /Stage[main]/Beta::Autoupdater/Exec[/bin/rm -r /srv/mediawiki-staging/php-master/extensions]: Triggered 'refresh' from 1 event
Notice: /Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-extensions]/File[/srv/mediawiki-staging/php-master/extensions]/ensure: created (corrective)
Notice: /Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-extensions]/Exec[git_clone_beta-mediawiki-extensions]/returns: executed successfully (corrective)
Notice: Applied catalog in 331.49 seconds

It tried to correct something there, but it still fails with fatal: No url found for submodule path 'ApiFeatureUsage' in .gitmodules

Oh wait, it's looking for .gitmodules in /srv/mediawiki-staging/php-master, but it's in /srv/mediawiki-staging/php-master/extensions?

I think there is a race condition with Puppet git::clone. The beta-code-update-eqiad job triggered at 09:23:00 UTC and failed at 09:24:09 UTC:

09:24:09 /usr/lib/git-core/git-submodule: 567: cd: can't cd to LdapAuthentication
09:24:09 Unable to find current revision in submodule path 'LdapAuthentication'

Looking at the Puppet log on deployment-deploy03:

Jun 21 09:23:43 Loading facts
Jun 21 09:23:55 Caching catalog for deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud
Jun 21 09:23:55 Applying configuration version '(09f0cdfaba3) root - fix-staging-perms: set group name from Puppet'
Jun 21 09:24:04 (/Stage[main]/Profile::Beta::Mediawiki_packages/Package[lilypond/buster-backports]/ensure) created (corrective)
Jun 21 09:24:05 (/Stage[main]/Profile::Beta::Mediawiki_packages/Package[lilypond-data/buster-backports]/ensure) created (corrective)
Jun 21 09:24:07 (/Stage[main]/Scap::Master/Git::Clone[operations/mediawiki-config]/Exec[git_set_origin_operations/mediawiki-config]/returns) executed successfully
Jun 21 09:24:07 (/Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-core]/Exec[git_set_origin_beta-mediawiki-core]/returns) executed successfully
Jun 21 09:24:07 (Git::Clone[beta-mediawiki-core]) Scheduling refresh of Exec[/bin/rm -r /srv/mediawiki-staging/php-master/extensions]
Jun 21 09:24:20 (/Stage[main]/Beta::Autoupdater/Exec[/bin/rm -r /srv/mediawiki-staging/php-master/extensions]) Triggered 'refresh' from 1 event
Jun 21 09:24:20 (/Stage[main]/Beta::Autoupdater/Git::Clone[beta-portal]/Exec[git_set_origin_beta-portal]/returns) executed successfully
Jun 21 09:24:20 (/Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-extensions]/File[/srv/mediawiki-staging/php-master/extensions]/ensure) created (corrective)
Jun 21 09:29:31 (/Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-extensions]/Exec[git_clone_beta-mediawiki-extensions]/returns) executed successfully (corrective)
Jun 21 09:29:31 (/Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-extensions]/Exec[git_set_origin_beta-mediawiki-extensions]/returns) executed successfully
Jun 21 09:29:31 (/Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-skins]/Exec[git_set_origin_beta-mediawiki-skins]/returns) executed successfully
Jun 21 09:29:31 (/Stage[main]/Beta::Autoupdater/Git::Clone[mediawiki/vendor]/Exec[git_set_origin_mediawiki/vendor]/returns) executed successfully
Jun 21 09:29:31 (/Stage[main]/Scap::Master/Git::Clone[repos/releng/scap]/Exec[git_set_origin_repos/releng/scap]/returns) executed successfully
Jun 21 09:29:38 Applied catalog in 343.10 seconds

So Puppet went on to run:

Jun 21 09:24:20 (/Stage[main]/Beta::Autoupdater/Exec[/bin/rm -r /srv/mediawiki-staging/php-master/extensions]) Triggered 'refresh' from 1 event

That would certainly cause the git submodule update to fail.
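The overlap can be checked with a few lines of Python. This is a sketch that takes the job window from the Jenkins console quoted above and assumes the rm ran somewhere between the refresh being scheduled (09:24:07) and Puppet logging "Triggered 'refresh'" (09:24:20); the exact start and end of the rm are not in the logs.

```python
from datetime import datetime


def parse(ts):
    # Times as printed in the Jenkins console and the Puppet log (same day, UTC).
    return datetime.strptime(ts, "%H:%M:%S")


def overlaps(a_start, a_end, b_start, b_end):
    """True when the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return max(a_start, b_start) <= min(a_end, b_end)


# beta-code-update-eqiad run: triggered 09:23:00, failed 09:24:09.
job = (parse("09:23:00"), parse("09:24:09"))
# Assumed rm window: refresh scheduled 09:24:07, 'Triggered refresh' logged 09:24:20.
rm = (parse("09:24:07"), parse("09:24:20"))

print(overlaps(*job, *rm))
```

Under that assumption the two windows share 09:24:07 to 09:24:09, which is consistent with the job failing while extensions/ was being removed.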

Oh wait, it's looking for .gitmodules in /srv/mediawiki-staging/php-master, but it's in /srv/mediawiki-staging/php-master/extensions ?

Yes, the extensions are in the superproject mediawiki/extensions.git, because on the beta cluster php-master holds mediawiki/core.git @ master, which does NOT have any submodules :]
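The error message is about git looking up the url recorded for a submodule path in a .gitmodules file. The simplified parser below (hypothetical, and ignoring much of git-config syntax such as includes, comments inside values and quoting) illustrates that lookup: against an extensions-style .gitmodules the path resolves, while against a .gitmodules with no entry for it the lookup fails with the same kind of message. The sample file content is illustrative, not copied from mediawiki/extensions.git.

```python
import re

SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')
KEYVAL = re.compile(r'^\s*(?P<key>\w+)\s*=\s*(?P<val>.*?)\s*$')


def parse_gitmodules(text):
    """Map submodule name -> {key: value} from .gitmodules text (simplified)."""
    modules = {}
    current = None
    for line in text.splitlines():
        section = SECTION.match(line.strip())
        if section:
            current = modules.setdefault(section.group("name"), {})
            continue
        kv = KEYVAL.match(line)
        if kv and current is not None:
            current[kv.group("key")] = kv.group("val")
    return modules


def url_for_path(text, path):
    """Return the url registered for a submodule path, or fail like git does."""
    for module in parse_gitmodules(text).values():
        if module.get("path") == path and "url" in module:
            return module["url"]
    raise LookupError(f"No url found for submodule path '{path}' in .gitmodules")


# Illustrative content for an extensions superproject and for core (no submodules).
EXTENSIONS_GITMODULES = (
    '[submodule "ApiFeatureUsage"]\n'
    "\tpath = ApiFeatureUsage\n"
    "\turl = https://gerrit.wikimedia.org/r/mediawiki/extensions/ApiFeatureUsage\n"
)
CORE_GITMODULES = ""
```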

Notice: /Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-core]/Exec[git_set_origin_beta-mediawiki-core]/returns: executed successfully (corrective)

This happens because scap::master and beta::autoupdater both manage the same repo in /srv/mediawiki-staging, each with a different origin URL, so each of them updates the origin URL on every Puppet run. The rm -rf happens because there is an exec that runs rm -rf on the extensions directory every time there is a change to Git::Clone['beta-mediawiki-core'].
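A toy model shows why two owners with different desired URLs never settle, even when each applies the change only if the URL differs. It is a hypothetical Python simulation, not Puppet code: set_origin_if_needed() stands in for an Exec guarded by unless, and its True return value stands in for a change that would notify subscribers.

```python
def set_origin_if_needed(repo, desired):
    """Mimic an Exec guarded by `unless`: act only when the origin differs.

    Returns True when the exec "ran", which in Puppet would also notify subscribers.
    """
    if repo["origin"] == desired:
        return False
    repo["origin"] = desired
    return True


def simulate(runs, desired_a, desired_b):
    """Two resources own the same repo and each applies its own URL on every run."""
    repo = {"origin": desired_a}
    fired = []
    for _ in range(runs):
        fired.append(
            (set_origin_if_needed(repo, desired_a), set_origin_if_needed(repo, desired_b))
        )
    return fired
```

With two different URLs, both execs fire on every run after the first; with a single agreed URL, neither ever fires. The guard only helps once there is exactly one desired value.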

Mentioned in SAL (#wikimedia-releng) [2023-06-21T13:28:55Z] <TheresNoTime> deployment-prep: [samtar@deployment-deploy03 php-master (master *% u=)]$ sudo rm -rfv /srv/mediawiki-staging/php-master/* T340030

Following up with sudo puppet agent -tv: the /srv/mediawiki-staging/php-master/ directories appear to be created successfully so far.

Change 931936 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: disable beta cluster jobs

https://gerrit.wikimedia.org/r/931936

Change 931936 merged by jenkins-bot:

[integration/config@master] jjb: disable beta cluster jobs

https://gerrit.wikimedia.org/r/931936

Mentioned in SAL (#wikimedia-releng) [2023-06-21T13:44:57Z] <TheresNoTime> deployment-prep: [samtar@deployment-deploy03 php-master (master *% u=)]$ sudo -u jenkins-deploy scap prep auto --no-log-message --verbose T340030

^ The above got further than before, but then errored with:

13:46:50 Update https://gerrit.wikimedia.org/r/mediawiki/skins (master branch) in /srv/mediawiki-staging/php-master/skins
13:46:50 Fetching from origin
13:46:50 Running ['git', 'remote', 'set-url', 'origin', 'https://gerrit.wikimedia.org/r/mediawiki/skins'] with {'cwd': '/srv/mediawiki-staging/php-master/skins', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
13:46:50 Command exited with code 0
13:46:50 Running ['git', 'fetch', '--tags', '--jobs', '2', '--no-recurse-submodules', 'origin', 'master'] with {'cwd': '/srv/mediawiki-staging/php-master/skins', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
13:46:51 Command exited with code 0
13:46:51 Running ['git', 'config', 'core.sharedRepository', 'group'] with {'cwd': '/srv/mediawiki-staging/php-master/skins', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
13:46:51 Command exited with code 0
13:46:51 Running ['git', 'log', 'HEAD..@{upstream}'] with {'cwd': '/srv/mediawiki-staging/php-master/skins', 'stdout': -1, 'stderr': -1, 'text': True, 'stdin': -3}
13:46:51 Command exited with code 128
13:46:51 Finished scap prep auto (duration: 02m 17s)
13:46:51 prep failed: <FailedCommand> Command 'git log HEAD..@{upstream}' failed with exit code 128;
stdout:

stderr:
fatal: no such branch: 'HEAD..'

Mentioned in SAL (#wikimedia-releng) [2023-06-21T13:55:58Z] <TheresNoTime> deployment-prep: Pulled skins/, then sudo -u jenkins-deploy scap prep auto --verbose T340030

This time failed with:

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
fatal: Unable to create '/srv/mediawiki-staging/php-master/extensions/.git/modules/ContentStabilization/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
fatal: Unable to create '/srv/mediawiki-staging/php-master/extensions/.git/modules/DiscussionTools/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
fatal: Unable to create '/srv/mediawiki-staging/php-master/extensions/.git/modules/ElectronPdfService/index.lock': File exists.

Unsure whether it conflicted with a Puppet run, maybe?
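Before removing anything by hand, it helps to list every leftover lock under the superproject's git dir. A minimal Python sketch, assuming the locks live under paths like php-master/extensions/.git/modules/<Name>/index.lock as in the errors above; it only reports them, since a lock is safe to delete only once no git, scap or Puppet process is still working in that repository.

```python
import os


def find_index_locks(git_dir):
    """List index.lock files under a git dir, e.g. .git/modules/<Name>/index.lock.

    Reports only: deleting a lock is safe once no process is using that repository.
    """
    locks = []
    for dirpath, _dirnames, filenames in os.walk(git_dir):
        if "index.lock" in filenames:
            locks.append(os.path.join(dirpath, "index.lock"))
    return sorted(locks)
```

For example, find_index_locks("/srv/mediawiki-staging/php-master/extensions/.git") would list the ContentStabilization, DiscussionTools and ElectronPdfService locks if they are still present.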

Change 931941 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] Revert "jjb: disable beta cluster jobs"

https://gerrit.wikimedia.org/r/931941

*aaaaaah!* The sync-world failed right at the end:

14:25:20 php-fpm-restart: 100% (in-flight: 0; ok: 2; fail: 0; left: 0)
14:25:20 Finished php-fpm-restarts (duration: 00m 00s)
14:25:20 Running /usr/local/bin/mwscript purgeMessageBlobStore.php
14:25:20 sudo_check_call running sudo -u www-data -n PHP="php7.4" -- /usr/local/bin/mwscript purgeMessageBlobStore.php
14:25:20 Error: The PagedTiffHandler extension cannot be loaded. Check that all of its files are installed properly.
14:25:20
14:25:20 #0 /srv/mediawiki-staging/php-master/includes/GlobalFunctions.php(52): ExtensionRegistry->queue('/srv/mediawiki-...')
14:25:20 #1 /srv/mediawiki-staging/wmf-config/CommonSettings.php(719): wfLoadExtension('PagedTiffHandle...')
14:25:20 #2 /srv/mediawiki-staging/php-master/LocalSettings.php(4): require('/srv/mediawiki-...')
14:25:20 #3 /srv/mediawiki-staging/php-master/includes/Setup.php(210): require_once('/srv/mediawiki-...')
14:25:20 #4 /srv/mediawiki-staging/php-master/maintenance/run.php(49): require_once('/srv/mediawiki-...')
14:25:20 #5 /srv/mediawiki-staging/multiversion/MWScript.php(144): require_once('/srv/mediawiki-...')
14:25:20 #6 {main}
14:25:20 Fatal error: Error Loading extension. Unable to open file /srv/mediawiki-staging/php-master/extensions/PagedTiffHandler/extension.json: filemtime(): stat failed for /srv/mediawiki-staging/php-master/extensions/PagedTiffHandler/extension.json in /srv/mediawiki-staging/php-master/includes/registration/MissingExtensionException.php on line 100
14:25:20 Last output:
Error: The PagedTiffHandler extension cannot be loaded. Check that all of its files are installed properly.

#0 /srv/mediawiki-staging/php-master/includes/GlobalFunctions.php(52): ExtensionRegistry->queue('/srv/mediawiki-...')
#1 /srv/mediawiki-staging/wmf-config/CommonSettings.php(719): wfLoadExtension('PagedTiffHandle...')
#2 /srv/mediawiki-staging/php-master/LocalSettings.php(4): require('/srv/mediawiki-...')
#3 /srv/mediawiki-staging/php-master/includes/Setup.php(210): require_once('/srv/mediawiki-...')
#4 /srv/mediawiki-staging/php-master/maintenance/run.php(49): require_once('/srv/mediawiki-...')
#5 /srv/mediawiki-staging/multiversion/MWScript.php(144): require_once('/srv/mediawiki-...')
#6 {main}
Fatal error: Error Loading extension. Unable to open file /srv/mediawiki-staging/php-master/extensions/PagedTiffHandler/extension.json: filemtime(): stat failed for /srv/mediawiki-staging/php-master/extensions/PagedTiffHandler/extension.json in /srv/mediawiki-staging/php-master/includes/registration/MissingExtensionException.php on line 100
14:25:20 Unhandled error:
Traceback (most recent call last):
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/cli.py", line 530, in run
    exit_status = app.main(app.extra_arguments)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/main.py", line 813, in main
    return super().main(*extra_args)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/main.py", line 188, in main
    self._after_cluster_sync()
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/main.py", line 871, in _after_cluster_sync
    tasks.clear_message_blobs(self.config)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/utils.py", line 401, in context_wrapper
    return func(*args, **kwargs)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/tasks.py", line 902, in clear_message_blobs
    utils.sudo_check_call("www-data", cmd)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/utils.py", line 401, in context_wrapper
    return func(*args, **kwargs)
  File "/var/lib/scap/scap/lib/python3.7/site-packages/scap/utils.py", line 530, in sudo_check_call
    raise subprocess.CalledProcessError(proc.returncode, cmd)
subprocess.CalledProcessError: Command '/usr/local/bin/mwscript purgeMessageBlobStore.php' returned non-zero exit status 255.
14:25:20 scap failed: CalledProcessError Command '/usr/local/bin/mwscript purgeMessageBlobStore.php' returned non-zero exit status 255. (duration: 18m 22s)

Mentioned in SAL (#wikimedia-releng) [2023-06-21T14:43:20Z] <TheresNoTime> deployment-prep: [samtar@deployment-deploy03]$ sudo disable-puppet "T340030" T340030, seeing if it *is* puppet to blame here

Ah, this is never-ending :] From IRC:

14:24:45 deployment-deploy03 puppet-agent[31824]: (/Stage[main]/Beta::Autoupdater/Exec[/bin/rm -r /srv/mediawiki-staging/php-master/extensions]) Triggered 'refresh' from 1 event

That is Puppet deleting the extensions repo and thus MediaWiki can't find /srv/mediawiki-staging/php-master/extensions/PagedTiffHandler/extension.json anymore since that got deleted.
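A quick pre-flight check for this failure mode is to confirm that every loaded extension still has its extension.json before running maintenance scripts. A minimal Python sketch; missing_extensions() is a hypothetical helper, and the list of names would have to come from wherever wfLoadExtension() is called (CommonSettings.php here).

```python
import os


def missing_extensions(extensions_dir, names):
    """Return the extensions whose extension.json is absent from extensions_dir."""
    return [
        name
        for name in names
        if not os.path.isfile(os.path.join(extensions_dir, name, "extension.json"))
    ]
```

Run against /srv/mediawiki-staging/php-master/extensions right after Puppet removed the directory, it would have reported PagedTiffHandler (and every other extension) as missing.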

The reason is:

14:24:24 deployment-deploy03 puppet-agent[31824]: (/Stage[main]/Beta::Autoupdater/Git::Clone[beta-mediawiki-core]/Exec[git_set_origin_beta-mediawiki-core]/returns) executed successfully (corrective)

That is Puppet emitting a notification because Git::Clone[beta-mediawiki-core] changed, and we subscribe to that event to delete php-master/extensions. The use case is that on a fresh host, after cloning mediawiki/core, we need to remove its extensions directory so that mediawiki/extensions.git can be cloned there (otherwise git clone complains that the directory already exists).
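The notify/subscribe chain can be modelled in a few lines of Python. This is a hypothetical simplification of one Puppet run, not the actual manifest: the set-origin exec runs only when the origin differs, and only when it runs does the subscribed rm of php-master/extensions fire. The starting URL is made up; the desired URL is the one from the debug log quoted further down.

```python
def puppet_run(state, desired_core_url):
    """One simplified Puppet run for the two resources discussed above."""
    actions = []
    # Exec[git_set_origin_beta-mediawiki-core]: guarded, runs only on a mismatch.
    if state["core_origin"] != desired_core_url:
        state["core_origin"] = desired_core_url
        actions.append("set core origin")
        # Subscribed refresh: fires only because the exec above made a change.
        state["extensions_present"] = False
        actions.append("rm -r php-master/extensions")
    return actions


DESIRED = "https://gerrit.wikimedia.org/r/mediawiki/core.git"
state = {"core_origin": "https://gerrit.wikimedia.org/r/mediawiki/core", "extensions_present": True}
```

If the guard holds, the rm fires at most once; anything that keeps resetting the origin makes it fire again on the next run.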

The issue comes from a Puppet change to git::clone made this morning (c707c24768f494af0e65e74484fbe9d3e536b754). It sets the git remote origin, which I guess is executed on every run.

Fixes would be:

A) have git::clone only set the origin when it has changed (I caught that in code review and there is a guard: unless => "[ \"\$(${git} remote get-url ${remote_name})\" = \"${remote}\" ]", but that is apparently not working?)

B) do not rm php-master/extensions when mediawiki/extensions is already cloned there (that is defined in modules/beta/manifests/autoupdater.pp).
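The idea behind fix B can be expressed as a small predicate. This Python sketch is hypothetical and does not reproduce the actual Puppet patch (which is not quoted here): php-master/extensions is cleared only while it is still core's plain directory, never once a clone with its own .git is in place.

```python
import os


def should_remove_extensions(extensions_dir):
    """Hypothetical fix B: only clear the directory core ships, never an existing clone."""
    if not os.path.isdir(extensions_dir):
        return False  # nothing to remove
    # .git may be a directory or a gitfile; either way it is already a repository.
    return not os.path.exists(os.path.join(extensions_dir, ".git"))
```

In Puppet terms this corresponds to adding a condition on the rm exec so that a refresh notification alone is no longer enough to delete a populated clone.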

Change 931949 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] beta: avoid erasing extensions when already present

https://gerrit.wikimedia.org/r/931949

A) have git::clone only set the origin when it has changed (I caught that in code review and there is a guard: unless => "[ \"\$(${git} remote get-url ${remote_name})\" = \"${remote}\" ]", but that is apparently not working?)

We have a puppet run with debug log on deployment-deploy03:

Jun 21 13:13:19 deployment-deploy03 puppet-agent[9220]: Executing with uid=jenkins-deploy gid=wikidev: '/bin/sh -c [ "$(/usr/bin/git remote get-url origin)" = "https://gerrit.wikimedia.org/r/mediawiki/core.git" ]'
Jun 21 13:13:19 deployment-deploy03 puppet-agent[9220]: (Exec[git_set_origin_beta-mediawiki-core](provider=shell)) Executing '["/bin/sh", "-c", "/usr/bin/git remote set-url origin https://gerrit.wikimedia.org/r/mediawiki/core.git"]'

I tested on one of the CI agents (they do use git::clone) and there is no refresh happening (and I have confirmed there are indeed some Exec[git_set_origin*] resources defined). So I think the guard preventing git remote set-url is correct. No clue WHY it does not work on beta though :\

I have cherry-picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/931949 to the puppet master. I had to fix the [ command, which was missing its path, and replaced it with /usr/bin/test in PS3.

Running Puppet on the deployment host twice yields the same output, without any git clone refresh, so I think that solves the issue.

No clue WHY it does not work on beta though :

I think it does work on beta. Ignore what I was saying before about it happening. I think what we saw was something along these lines:

  • I merged the git::clone patch
  • deployment-deploy03 runs Puppet
    • Exec[git_set_origin_beta-mediawiki-extensions] and Exec[git_set_origin_beta-mediawiki-core] execute
    • Exec[git_set_origin_beta-mediawiki-core] notifies the resource that runs rm -rf ${stage_dir}/php-master/extensions (the folder used by Exec[git_set_origin_beta-mediawiki-extensions])
  • In the next run, Exec[git_set_origin_beta-mediawiki-extensions] checks whether it needs to run again
    • at this point ${stage_dir}/php-master/extensions is no longer a git folder
    • as such, the unless guard and the subsequent command act on the first parent folder that is a git folder, which is /srv/mediawiki-staging
  • On the next Puppet run, Exec[git_set_origin_beta-mediawiki-core] checks whether /srv/mediawiki has the correct URL; it doesn't, so it updates it to the correct one
  • On the next Puppet run, Exec[git_set_origin_beta-mediawiki-extensions] checks whether /srv/mediawiki/php-master/extensions has the correct URL, but actually checks /srv/mediawiki, so it doesn't match and it "fixes" it

From there on, we loop on every other Puppet run.
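The key step in that sequence is git's repository discovery: when a directory is not itself a repository, git walks up and uses the first ancestor that is one. A simplified Python model of that walk, which ignores GIT_DIR, GIT_CEILING_DIRECTORIES, bare repositories and filesystem boundaries that real git also takes into account:

```python
import os


def discover_git_root(start):
    """Walk up from start until a directory containing .git is found (simplified)."""
    path = os.path.abspath(start)
    while True:
        # .git may be a directory or a gitfile ("gitdir: ..."), so test existence only.
        if os.path.exists(os.path.join(path, ".git")):
            return path
        parent = os.path.dirname(path)
        if parent == path:
            return None  # reached the filesystem root without finding a repository
        path = parent
```

This is why an unless check and a git remote set-url that run with cwd set to a directory that was just emptied silently read and rewrite the remote of whichever ancestor repository they land on.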

Change 931949 merged by Jbond:

[operations/puppet@production] beta: avoid erasing extensions when already present

https://gerrit.wikimedia.org/r/931949

Mentioned in SAL (#wikimedia-releng) [2023-06-21T15:36:02Z] <hashar> Reenabling deployment-prep Jenkins jobs # T340030

Change 931941 merged by jenkins-bot:

[integration/config@master] Revert "jjb: disable beta cluster jobs"

https://gerrit.wikimedia.org/r/931941

TheresNoTime lowered the priority of this task from Unbreak Now! to Needs Triage. Jun 21 2023, 3:41 PM

Looks [almost] resolved, no longer blocking beta deployments, lowering priority

So while https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/ is running:

deployment-deploy03$ git -C /srv/mediawiki-staging/php-master/extensions remote get-url origin
https://gerrit.wikimedia.org/r/mediawiki/extensions

But before it ran Puppet had it set to:

https://gerrit.wikimedia.org/r/mediawiki/extensions.git

So what happens is:

  • Puppet sets the remote with the .git suffix
  • The Jenkins job runs scap prep auto, which sets the remote URL to its own view of the canonical URL (WITHOUT .git)
  • Puppet runs, sets the remote back to its own version with .git, and triggers a refresh
  • EDIT WAR

I think the guard deployed previously prevents Puppet from deleting the extensions, but I am not sure after all.
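The guard seen in the debug log compares URLs as literal strings ([ "$(git remote get-url origin)" = "..." ]), so https://gerrit.wikimedia.org/r/mediawiki/extensions and the same URL with .git count as different. One way out of the edit war is to make both sides agree on a single canonical URL; another is a tolerant comparison. The Python sketch below shows the latter; normalize_remote() and origin_matches() are hypothetical helpers, not part of git::clone or scap.

```python
def normalize_remote(url):
    """Treat '.../repo', '.../repo.git' and a trailing slash as the same remote."""
    url = url.rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


def origin_matches(current, desired):
    """Tolerant replacement for a literal string comparison in the guard."""
    return normalize_remote(current) == normalize_remote(desired)
```

With such a comparison, neither Puppet nor scap would see the other's form of the URL as a change, so no set-url and no refresh would be triggered.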