Automatic pickup of Gerrit clone master doesn't happen due to missing git-lfs – new deployment env
Closed, Resolved, Public

Description

After the latest changes to the Gerrit clone for Git LFS, made as part of T235013, the Gerrit clone master state of design/style-guide is no longer updated on the website.

Deployer note

Event Timeline

The changes made in T235013 added a requirement to have git-lfs installed and use a different command to pull data.

This broke puppet runs on the production misc webservers where the design site lives because git-lfs is not installed and the git::clone puppet class does not support changing the command.

T235013#5580957

Notice: /Stage[main]/Profile::Microsites::Design/Git::Clone[design/style-guide]/Exec[git_pull_design/style-guide]/returns: fatal: refusing to merge unrelated histories
Error: '/usr/bin/git  pull --quiet' returned 128 instead of one of [0]
Error: /Stage[main]/Profile::Microsites::Design/Git::Clone[design/style-guide]/Exec[git_pull_design/style-guide]/returns: change from 'notrun' to ['0'] failed: '/usr/bin/git  pull --quiet' returned 128 instead of one of [0]
Notice: Applied catalog in 14.59 seconds
Aklapper renamed this task from Automatic pickup of Gerrit clone master doesn't happen to Automatic pickup of Gerrit clone master doesn't happen (due to git-lfs not installed on production misc). Oct 17 2019, 3:48 PM

Judging by the puppet output, this repo is periodically pulled down by puppet to deploy it:

Error: /Stage[main]/Profile::Microsites::Design/Git::Clone[design/style-guide]/Exec[git_pull_design/style-guide]/returns: change from 'notrun' to ['0'] failed: '/usr/bin/git  pull --quiet' returned 128 instead of one of [0]

Also, it looks like this repo maybe got force-pushed at some point?

Notice: /Stage[main]/Profile::Microsites::Design/Git::Clone[design/style-guide]/Exec[git_pull_design/style-guide]/returns: fatal: refusing to merge unrelated histories

Not sure how much work it would be to add an lfs option to git::clone. Another option would be deploying via scap.

It was indeed force-pushed with the help of @20after4, as the Git history needed to be rewritten as a side effect of the LFS change.

That explains the "refusing to merge unrelated histories" bit.
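
For context, a checkout that tracks a rewritten branch cannot be fast-forwarded with git pull, but it can be realigned by fetching and hard-resetting. The following is only a sketch of that general git technique, not something git::clone does; resync_checkout is a hypothetical helper name, and it assumes the remote is called origin and that local changes in the checkout can be discarded.

```shell
#!/bin/sh
# Sketch: realign a deployed checkout with a remote branch whose history
# was rewritten (force-pushed). This discards local commits and any
# uncommitted changes to tracked files in the checkout.
# resync_checkout is a hypothetical helper, not part of git::clone.
resync_checkout() {
    dir="$1"
    branch="${2:-master}"
    # Fetch the rewritten history without trying to merge it.
    git -C "$dir" fetch --quiet origin || return 1
    # Point the current branch and the working tree at the fetched commit.
    git -C "$dir" reset --quiet --hard "origin/$branch"
}

# Usage (path is illustrative):
#   resync_checkout /path/to/checkout master
```

Unlike git pull, this never attempts a merge, so unrelated histories are not a problem; the trade-off is that anything committed only in the checkout is lost.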

Looking at git::clone, it already has quite a lot of options, so adding an lfs option there is probably not the best idea.

It looks like the design microsite is on bromine, which is a jessie machine, which means that the git::lfs module isn't going to install git lfs... not sure why.

The simplest option here might be (if it's possible) to install git lfs and run the following after the git::clone operation: git lfs install && git -C /srv/org/wikimedia/design-style-guide lfs pull
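
The two commands from the comment above could be wrapped in a small function for easier previewing. This is a minimal sketch, assuming git-lfs is already installed on the host; lfs_sync and the RUNNER dry-run variable are hypothetical names introduced here, and only the git commands and the path come from the comment.

```shell
#!/bin/sh
# Sketch: fetch LFS objects into an existing checkout after git::clone
# has run. RUNNER is a hypothetical dry-run hook: set RUNNER=echo to
# print the commands instead of executing them.
lfs_sync() {
    repo_dir="${1:-/srv/org/wikimedia/design-style-guide}"
    # Register the LFS filters and hooks, then download the LFS objects.
    ${RUNNER:-} git lfs install &&
        ${RUNNER:-} git -C "$repo_dir" lfs pull
}

# Usage:
#   RUNNER=echo lfs_sync    # preview the commands
#   lfs_sync                # run them against the default path
```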

We've reverted Git LFS for now in https://github.com/wikimedia/WikimediaUI-Style-Guide/pull/259. But given that the binary files get updated rather frequently, LFS would still be the right solution AFAICT.

So I think this should really be deployed with scap. That would allow you to make changes and then push them instead of waiting for puppet, which is unreliable and asynchronous, not to mention difficult to debug without getting an SRE involved.

Change 546253 had a related patch set uploaded (by Dzahn; owner: 20after4):
[operations/puppet@production] scapify design/style-guide microsite (1 of 2)

https://gerrit.wikimedia.org/r/546253

Change 546253 merged by Dzahn:
[operations/puppet@production] scapify design/style-guide microsite (1 of 2)

https://gerrit.wikimedia.org/r/546253

And how would deploying with scap differ, for designers (my process), from the current setup, where we defined a gerrit remote and pushed to it manually when we felt the time was right for a site update? Puppet then regularly updated the site via git::clone.

Merged and deployed part 1 of the "scapify" changes.

Error: Execution of '/usr/bin/scap deploy-local --repo design/style-guide -D log_json:False' returned 70: 19:00:35 WARNING  - Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 335, in run
    app._load_config()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 114, in _load_config
    overrides = self._get_config_overrides()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 530, in _get_config_overrides
    config = self._get_remote_overrides()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 545, in _get_remote_overrides
    raise IOError(errno.ENOENT, 'Config file not found', cfg_url)
IOError: [Errno 2] Config file not found: 'deploy1001.eqiad.wmnet/design/style-guide/.git/DEPLOY_HEAD'
19:00:35 ERROR    - deploy-local failed: <IOError> [Errno 2] Config file not found: 'deploy1001.eqiad.wmnet/design/style-guide/.git/DEPLOY_HEAD'

Error: /Stage[main]/Profile::Microsites::Design/Scap::Target[design/style-guide]/Package[design/style-guide]/ensure: change from 'absent' to 'present' failed: Execution of '/usr/bin/scap deploy-local --repo design/style-guide -D log_json:False' returned 70: 19:00:35 WARNING  - Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 335, in run
    app._load_config()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 114, in _load_config
    overrides = self._get_config_overrides()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 530, in _get_config_overrides
    config = self._get_remote_overrides()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 545, in _get_remote_overrides
    raise IOError(errno.ENOENT, 'Config file not found', cfg_url)
IOError: [Errno 2] Config file not found: 'deploy1001.eqiad.wmnet/design/style-guide/.git/DEPLOY_HEAD'
19:00:35 ERROR    - deploy-local failed: <IOError> [Errno 2] Config file not found: 'deploy1001.eqiad.wmnet/design/style-guide/.git/DEPLOY_HEAD'

This might be normal before the first deploy?

Please deploy and verify. If all looks good to you we can merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/546254/ to actually switch the webserver config to point to the deployment path.

Ran scap deploy and it succeeded on the first try.

So the process to deploy is pretty simple:

ssh deployment.wikimedia.org
cd /srv/deployment/design/style-guide
scap deploy 'reason for this deployment (this will be logged)'

Here's a terminal session recording:

https://asciinema.org/a/wXUqAcyAFRzLqoW4fwMKDm14m

@mmodell How important are the deploy messages? Can they equal git log messages?

The message can be anything you want it to be. It gets logged to the SAL so that others will know what got deployed and why.
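
Since the message is free-form, one option is to reuse the subject of the latest commit. This is a hedged sketch, not an established convention for this repo; deploy_message is a hypothetical helper, and only the scap deploy invocation itself comes from the steps above.

```shell
#!/bin/sh
# Sketch: print the subject line of the latest commit so it can be
# passed to scap deploy as the message that ends up in the SAL.
deploy_message() {
    # %s is the commit subject; the optional argument is the repo path.
    git -C "${1:-.}" log -1 --pretty=format:%s
}

# Usage on the deployment server, from the checkout directory:
#   cd /srv/deployment/design/style-guide
#   scap deploy "$(deploy_message)"
```

A hand-written reason may still be more useful to other deployers when a deployment bundles several commits.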

I created a new keypair for deployment (keyholder) and committed it to the private repo on the puppetmaster.

Then, https://gerrit.wikimedia.org/r/c/operations/puppet/+/547014 has been merged and deployed.

This removed the deploy-service user on bromine.eqiad.wmnet and vega.codfw.wmnet, the backend servers, and added the deploy-design user, group and new key.

On the deployment server, deploy1001.eqiad.wmnet, the 3 new users have been created and are members of the new deploy-design group.

Now would be the time for a deployment test.

heh amazing timing! I just tried it and it failed (because I'm not a member of the new group!) but I think it should work for @Volker_E now.

Once deployment has been confirmed, keep in mind there is still https://gerrit.wikimedia.org/r/c/operations/puppet/+/546254, which links the docroot to the new deployment dir. Ping me or somebody in SRE to merge that when it's time. Of course, this is just a one-time change. After that you won't have to ask SRE to deploy anymore.

Regarding the attempt that failed because you're not a member of the new group: if you want, I can add you to the group (for now).

Change 547296 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] deployment: add deploy-design trusted group stanza

https://gerrit.wikimedia.org/r/547296

Change 547297 had a related patch set uploaded (by 20after4; owner: 20after4):
[operations/puppet@production] add deploy-design to keyholder agents

https://gerrit.wikimedia.org/r/547297

Change 547296 merged by Dzahn:
[operations/puppet@production] deployment: add deploy-design trusted group stanza

https://gerrit.wikimedia.org/r/547296

Change 547297 abandoned by 20after4:
add deploy-design to keyholder agents

Reason:
https://gerrit.wikimedia.org/r/c/operations/puppet/+/547296

https://gerrit.wikimedia.org/r/547297

Change 547300 had a related patch set uploaded (by 20after4; owner: 20after4):
[operations/puppet@production] Add design/style-guide to scap::sources on deployment server

https://gerrit.wikimedia.org/r/547300

Change 547300 merged by Dzahn:
[operations/puppet@production] Add design/style-guide to scap::sources on deployment server

https://gerrit.wikimedia.org/r/547300

Mentioned in SAL (#wikimedia-operations) [2019-10-31T21:46:07Z] <mutante> deploy1001 - move apach2modsec deployment key out of keyholder dir, keyholder arm to reload all other deployment keys including the new one for design (T235677)

Mentioned in SAL (#wikimedia-operations) [2019-10-31T21:59:49Z] <mutante> deploy1001 - recreating deploy_design deployment key as ED25519 and with the correct comment (the comment matters and must match path to the file for keyholder) (T235677)

  • deployment works now :)

Now all this is waiting for is a time to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/546254

The document root has been switched to the deployment dir.

This should be resolved now.

Volker_E claimed this task.

Thanks @Dzahn and @mmodell!

Volker_E renamed this task from Automatic pickup of Gerrit clone master doesn't happen (due to git-lfs not installed on production misc) to Automatic pickup of Gerrit clone master doesn't happen due to missing git-lfs – new deployment env. Nov 1 2019, 2:02 AM
Volker_E removed a project: Patch-For-Review.
Volker_E updated the task description.