Use scap to deploy itself to scap targets
Closed, Resolved (Public)

Description

Overview of how it would work

Scap will no longer rely on a Debian package to install its Python dependencies. Instead, it will use a Python virtual environment populated with pip. This venv will act as a self-contained scap that can then be rsync'd to targets.

For external dependencies/system configuration, scap will rely on Puppet.

Deploy workflow

Once a new release has been created (i.e. a new tag exists), on the deploy servers:

  • On both masters:
    • cd /srv/deployment/scap (which is a checkout of the scap git repo)
    • git checkout <new tag>
    • python3 -m venv /home/scap/scap
    • /home/scap/scap/bin/pip install --upgrade /srv/deployment/scap
  • scap install-world (note that /usr/bin/scap -> /var/lib/scap/scap/bin/scap), which will:
    • Read a file to collect the list of hosts that should have scap installed
    • rsync ~scap/scap to scap@target:scap/ for each target listed in /etc/dsh/group/scap_targets (masters excluded).
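
The fan-out step above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names install_targets and rsync_command are hypothetical, and the rsync flags are a guess rather than what scap install-world actually passes. The source and destination paths are the ones given in the workflow above.

```python
import os


def install_targets(all_hosts, masters):
    """Return the hosts that should receive scap, masters excluded."""
    excluded = set(masters)
    seen = set()
    targets = []
    # Keep the input order while dropping masters and duplicate entries.
    for host in all_hosts:
        if host in excluded or host in seen:
            continue
        seen.add(host)
        targets.append(host)
    return targets


def rsync_command(host, source_dir="~scap/scap"):
    """Build the rsync argv that would copy the venv to one target.

    The flags are an assumption made for this sketch, not scap's real
    invocation.
    """
    # A trailing slash makes rsync copy the directory contents into scap/.
    source = os.path.expanduser(source_dir).rstrip("/") + "/"
    return ["rsync", "--archive", "--delete", source, f"scap@{host}:scap/"]
```

Building the argv as a list (rather than a shell string) keeps host names from being interpreted by a shell if the list is later handed to subprocess.run.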

Prereqs

Puppet will configure the deploy servers to provide the following:

  • Checkout of the scap git repo at /srv/deployment/scap
  • A symlink /usr/bin/scap -> /var/lib/scap/scap/bin/scap
  • A way to query the list of scap targets. See https://phabricator.wikimedia.org/T302919#7748986 for a recent list. I assume that cumin is involved. Whatever we use needs to be accessible from deployment.eqiad.wmnet (this should be addressed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/771441).
  • A user (perhaps named scap, which is assumed in the rest of this text) that can be ssh'd into on each of those hosts, similar to how deployers can ssh as mwdeploy to mediawiki targets during scap sync operations (that access is set up with keyholder).
  • The following dependencies:
    • git
    • rsync
    • bash-completion
    • python3
    • python3-venv

The scap user, the symlink to the venv and some of the deps need to be provisioned on the scap targets too, not just the deploy servers. The targets also need to be able to access /var/lib/scap/scap on the deploy servers via rsync.
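
A host could check for these prerequisites with something like the sketch below. It is an assumption-laden illustration, not part of scap or the Puppet manifests: the function name missing_prerequisites is hypothetical, bash-completion is skipped because it has no single executable to look for, and testing for the ensurepip module is only a heuristic for the python3-venv package that should be verified on the target Debian release.

```python
import importlib.util
import shutil

# Executables named in the dependency list above.
REQUIRED_COMMANDS = ("git", "rsync", "python3")


def missing_prerequisites(
    commands=REQUIRED_COMMANDS,
    which=shutil.which,
    find_spec=importlib.util.find_spec,
):
    """Return the names of prerequisites that cannot be found on this host."""
    missing = [name for name in commands if which(name) is None]
    # Assumption: python3-venv is what makes `ensurepip` importable on Debian,
    # and `python3 -m venv` needs ensurepip to install pip into the venv.
    if find_spec("ensurepip") is None:
        missing.append("python3-venv")
    return missing
```

The which and find_spec parameters exist only so the check can be exercised without touching the real system.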

Transition plan

  1. Run scap install-world to prime the targets. Any uses of scap on these targets will continue to use the code from the current scap deb.
  2. Apply all the Puppet changes except the /usr/bin/scap -> /var/lib/scap/scap/bin/scap symlink
  3. Thoroughly test the new scap deploy process
  4. Apply the Puppet change that creates the symlink
  5. Uninstall scap deb package

Outstanding notes/problems

  • What about beta? Puppet config and the scap beta scripts will be updated to also use the new self-installation mechanism.
  • Installing scap somewhere other than /usr/lib/python3/dist-packages will break scripts that expect to be able to import scap (such as stage-train in the releases repo). Hopefully allowing scap to deploy itself will reduce the need/desire to extend scap outside of scap.
  • What about freshly-provisioned hosts? Running scap on them will fail until someone runs scap install-world on the deploy server. This will be resolved by Puppet configuration that makes scap targets self-install scap via rsync if it is not already installed (https://gerrit.wikimedia.org/r/c/operations/puppet/+/806397).
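
The "self-install if not installed already" idea for fresh hosts can be sketched as below. This is a guess at the decision logic, not the contents of the linked Puppet change: the function name bootstrap_command_if_needed is hypothetical, and the script path and argument order are copied from the manual bootstrap-scap-target.sh invocation recorded in this task's timeline.

```python
import os


def bootstrap_command_if_needed(
    deploy_host, scap_home="/var/lib/scap", exists=os.path.exists
):
    """Return the bootstrap argv if scap is absent, or None if it is present."""
    scap_bin = os.path.join(scap_home, "scap", "bin", "scap")
    if exists(scap_bin):
        return None  # Already installed, nothing to do.
    # Script path and arguments as seen in the manual run on this task.
    return ["/usr/local/bin/bootstrap-scap-target.sh", deploy_host, scap_home]
```

Returning None for an already-installed host makes the check safe to repeat on every Puppet run.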

Event Timeline

Hi @MoritzMuehlenhoff and @Volans. Can you comment on how this part of the proposal could be achieved?

Find a way to query the list of scap targets. See https://phabricator.wikimedia.org/T302919#7748986 for a recent list. I assume that cumin is involved. Whatever we use needs to be accessible from deployment.eqiad.wmnet.

I'm interested in any alternate ideas you might have that would move things in the right direction.

What do you need specifically? A list of all the servers where scap is installed? That can be obtained using a puppetdb query. I don't think we have Python libraries for querying puppetdb, but cumin at least implements puppetdb queries, so it might be useful to have a puppetdb module in https://doc.wikimedia.org/wmflib/master/

No, that's not an option: we allow direct connections to the PuppetDB APIs only from a very limited number of root-only hosts.
What could be considered is allowing access to the puppetdb microservice ( https://wikitech.wikimedia.org/wiki/Puppet#Micro_Service ) from the deployment hosts. Access to this microservice is limited to an allow list of trusted hosts, and we can consider adding the deployment servers to that list. I'll defer to @jbond and @MoritzMuehlenhoff on that.

To me the simpler option would be to have Puppet drop a configuration file on the deployment hosts with the list of target hosts doing a puppetdb lookup.

I instinctively went for that approach too. The list of scap targets tends to be static enough to warrant populating that file every 30m or so. It is also going to be way easier to consume a file in the application than to learn to talk to the puppetdb query API (even with libraries, there will be a learning curve).

This would be fine, especially if the list were dropped into /etc/dsh/group/<some-file>.
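
Consuming such a file from the application could look roughly like this. It is a minimal sketch that assumes the usual dsh group layout (one host per line, with "#" starting a comment); the function names parse_dsh_group and read_dsh_group are hypothetical and not taken from scap.

```python
def parse_dsh_group(lines):
    """Return the host names found in an iterable of dsh group file lines."""
    hosts = []
    for line in lines:
        # Drop anything after '#' and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if entry:
            hosts.append(entry)
    return hosts


def read_dsh_group(path):
    """Read a dsh group file from disk and return its hosts in file order."""
    with open(path, encoding="utf-8") as handle:
        return parse_dsh_group(handle)
```

For example, read_dsh_group("/etc/dsh/group/scap_targets") would return the target list once Puppet has written that file.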

What do you need specifically? A list of all the servers where scap is installed?

Confirming for the record that what is needed is the list of all servers where scap (and/or the scap-stub) is installed.

Change 771437 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] wmflib: add class_hosts

https://gerrit.wikimedia.org/r/771437

Change 771441 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] P:scap::dsh: Add scpa targets as a dsh group

https://gerrit.wikimedia.org/r/771441

Sorry for the slow response on this. There is already a function, wmflib::role_hosts, which does almost what is requested here. I have created a PS to generalise it a bit so it can work for this use case, and have also created a PoC CR showing how it might be used here, but I suspect I may need some pointers to get that right.

No, that's not an option, we allow to connect directly to PuppetDB APIs only from a very limited number of root-only hosts.

Although if you are running a script on one of those hosts you can use python3-pypuppetdb; there are a couple of scripts using it in the repo, so you can git grep pypuppetdb for some examples.

What's could be considered is to allow access to the puppetdb microservice ( https://wikitech.wikimedia.org/wiki/Puppet#Micro_Service )

I suspect pypuppetdb will also work for this, for allowed queries, but it's not tested.

Change 771437 merged by Jbond:

[operations/puppet@production] wmflib: add class_hosts

https://gerrit.wikimedia.org/r/771437

Change 771441 merged by Jbond:

[operations/puppet@production] P:scap::dsh: Add scap targets as a dsh group

https://gerrit.wikimedia.org/r/771441

Change 803556 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[mediawiki/tools/scap@master] startup: warn user if not running from a Python virtual environment

https://gerrit.wikimedia.org/r/803556
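
One common way to implement such a startup check in Python is to compare sys.prefix with sys.base_prefix, which differ inside a venv created by python3 -m venv. The sketch below illustrates that technique only; the function names and the warning text are hypothetical, and the actual patch may do this differently.

```python
import sys


def running_in_venv(prefix=None, base_prefix=None):
    """Return True if the interpreter appears to run from a virtual environment."""
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        # base_prefix equals prefix outside a venv on Python 3.3+.
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


def warn_if_not_in_venv(stream=None):
    """Print a warning and return True when not running from a venv."""
    if running_in_venv():
        return False
    # Hypothetical wording; the real message in scap may differ.
    print(
        "warning: scap is not running from a Python virtual environment",
        file=stream if stream is not None else sys.stderr,
    )
    return True
```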

Change 804306 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] scap: switch over from Debian package to self-installed scap

https://gerrit.wikimedia.org/r/804306

Change 804311 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] scap: remove scap Debian package from targets

https://gerrit.wikimedia.org/r/804311

Change 804306 merged by Muehlenhoff:

[operations/puppet@production] scap: switch over from Debian package to self-installed scap

https://gerrit.wikimedia.org/r/804306

Change 804311 merged by Muehlenhoff:

[operations/puppet@production] scap: remove scap Debian package from targets

https://gerrit.wikimedia.org/r/804311

Change 805374 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] cap: add version back

https://gerrit.wikimedia.org/r/805374

Change 805374 merged by Jbond:

[operations/puppet@production] cap: add version back

https://gerrit.wikimedia.org/r/805374

Change 805406 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[mediawiki/tools/scap@master] install-world: remove concurrency lock

https://gerrit.wikimedia.org/r/805406

Change 805406 merged by jenkins-bot:

[mediawiki/tools/scap@master] install-world: remove concurrency lock

https://gerrit.wikimedia.org/r/805406

Ref the removal of the concurrency lock, I have recently seen a couple of instances where beta deployments did trigger a concurrency lock (if I'm correct in assuming the lock in question causes that "lock file found, waiting up to x minutes" message?), though this tends to happen after a job gets stuck.. just thought it was worth mentioning just in case!

Hi @TheresNoTime, thank you for your comment!

The scap operation running in beta on a cron is different from the one in the patch you referenced. So beta shouldn't be affected, it will keep using its own lock :)

Change 805727 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[mediawiki/tools/scap@master] install-world: use bootstrap script from scap source dir

https://gerrit.wikimedia.org/r/805727

Change 805727 merged by jenkins-bot:

[mediawiki/tools/scap@master] install-world: use bootstrap script from scap source dir

https://gerrit.wikimedia.org/r/805727

Change 805736 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] scap: remove config for scap Debian package

https://gerrit.wikimedia.org/r/805736

Change 805736 merged by Muehlenhoff:

[operations/puppet@production] scap: remove config for scap Debian package

https://gerrit.wikimedia.org/r/805736

Change 805797 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[mediawiki/tools/scap@master] release readme: update to reflect installation via scap-over-scap

https://gerrit.wikimedia.org/r/805797

Change 805800 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[mediawiki/tools/scap@master] install-world: update minimum allowed scap version

https://gerrit.wikimedia.org/r/805800

Change 803556 merged by jenkins-bot:

[mediawiki/tools/scap@master] startup: warn user if not running from a Python virtual environment

https://gerrit.wikimedia.org/r/803556

Change 805797 merged by jenkins-bot:

[mediawiki/tools/scap@master] release readme: update to reflect installation via scap-over-scap

https://gerrit.wikimedia.org/r/805797

Change 805800 merged by jenkins-bot:

[mediawiki/tools/scap@master] install-world: update minimum allowed scap version

https://gerrit.wikimedia.org/r/805800

Change 807186 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[mediawiki/tools/scap@master] scap install-world rsync fix

https://gerrit.wikimedia.org/r/807186

dancy updated the task description.

Change 807186 merged by jenkins-bot:

[mediawiki/tools/scap@master] scap install-world rsync fix

https://gerrit.wikimedia.org/r/807186

Change 811987 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[mediawiki/tools/scap@master] startup: warn user if not running from a Python virtual environment

https://gerrit.wikimedia.org/r/811987

Change 811987 merged by jenkins-bot:

[mediawiki/tools/scap@master] startup: warn user if not running from a Python virtual environment

https://gerrit.wikimedia.org/r/811987

Change 816140 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] scap: allow `scap` user to login into deployment-prep scap targets

https://gerrit.wikimedia.org/r/816140

Change 817762 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] scap: enable target bootstrap in beta cluster

https://gerrit.wikimedia.org/r/817762

Change 816140 merged by Giuseppe Lavagetto:

[operations/puppet@production] scap: allow `scap` user to login into deployment-prep scap targets

https://gerrit.wikimedia.org/r/816140

Change 817762 merged by Giuseppe Lavagetto:

[operations/puppet@production] scap: enable target bootstrap in beta cluster

https://gerrit.wikimedia.org/r/817762

Change 820139 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] scap: add option to selectivlely disable bootstrapping

https://gerrit.wikimedia.org/r/820139

Change 820139 merged by Cwhite:

[operations/puppet@production] scap: add option to selectivlely disable bootstrapping

https://gerrit.wikimedia.org/r/820139

Mentioned in SAL (#wikimedia-operations) [2024-10-16T20:16:30Z] <mutante> phab2002 - manually bootstrapping scap since puppet did not do it due to dependency cycles: sudo -u scap /usr/local/bin/bootstrap-scap-target.sh deploy2002.codfw.wmnet /var/lib/scap T303559 T310740 T377374

Mentioned in SAL (#wikimedia-operations) [2024-10-16T20:17:26Z] <mutante> phab2002 - after manually running bootstrap-scap-target.sh and "Scap from local bullseye wheels successfully installed at /var/lib/scap/scap" still "cannot open `/usr/bin/scap' (No such file or directory)" though. T303559 T310740 T377374

Today I ran into this issue again, after reimaging phab2002.

The initial puppet run failed with some dependency problems because there had been no scap deploy yet.

But that scap deploy was not possible because scap simply did not exist on the machine yet (/usr/bin/scap: No such file or directory).

Vaguely remembering this from the past, I then manually ran the scap bootstrap command that is expected to run automatically:

sudo -u scap /usr/local/bin/bootstrap-scap-target.sh deploy2002.codfw.wmnet /var/lib/scap

This told me "Scap from local bullseye wheels successfully installed at /var/lib/scap/scap", but I still got "cannot open `/usr/bin/scap' (No such file or directory)".

So I also manually created the missing symlink: ln -s /var/lib/scap/scap/bin/scap /usr/bin/scap
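
For illustration, the manual symlink step could be made idempotent along these lines. This is a sketch only: the function name ensure_symlink is hypothetical, it is not how Puppet or bootstrap-scap-target.sh handles the link, and it would need to run as root to write under /usr/bin.

```python
import os


def ensure_symlink(link_path, target_path):
    """Create link_path -> target_path if needed; return True if changed."""
    if os.path.islink(link_path):
        if os.readlink(link_path) == target_path:
            return False  # Already correct, nothing to do.
        os.unlink(link_path)  # Stale link pointing somewhere else.
    elif os.path.exists(link_path):
        # Refuse to clobber a real file or directory.
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target_path, link_path)
    return True
```

Calling ensure_symlink("/usr/bin/scap", "/var/lib/scap/scap/bin/scap") a second time is then a no-op, unlike a bare ln -s, which fails if the link already exists.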