scap / puppet dependency issue on reimaged phabricator hosts
Open, Needs Triage, Public

Description

user story:

Recently the passive failover Phabricator host, phab2002, had to be reimaged (T377374) to move it to a new VLAN.

Everything else stayed the same, and the same production role(phabricator) was applied before and after the reimage.

The reimaging cookbook we use expects the initial puppet runs on a reimaged system to just work; if they don't, the entire cookbook ends with a failure message.

This is fine for a lot of services, but it is often an issue for services that are deployed by scap.

The procedure needed here is usually more like:

  • run puppet once to trigger a command that bootstraps scap itself
  • if this fails (T257317, T257317#10212656), run the scap bootstrap command manually (scap deploy --init; see T363415#9762416, T257317#9851645)
  • run puppet again (this sometimes runs into a dependency issue, which is what this ticket is specifically about)
  • deploy using scap
  • run puppet again
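The steps above can be written down as a small driver. This is only a sketch of the ordering, not something the reimage cookbook does today: the four hooks (`run_puppet`, `scap_is_bootstrapped`, `bootstrap_scap`, `scap_deploy`) are hypothetical callables the caller would have to supply, and no real command lines are assumed beyond the `scap deploy --init` mentioned in the list.

```python
from typing import Callable, List


def bring_up_scap_host(
    run_puppet: Callable[[], None],
    scap_is_bootstrapped: Callable[[], bool],
    bootstrap_scap: Callable[[], None],
    scap_deploy: Callable[[], None],
) -> List[str]:
    """Walk through the manual procedure and return the steps that were taken.

    All four callables are hypothetical hooks supplied by the caller.
    """
    steps: List[str] = []

    # 1. First puppet run, which is supposed to bootstrap scap itself.
    run_puppet()
    steps.append("puppet")

    # 2. If that did not work, bootstrap scap by hand (scap deploy --init).
    if not scap_is_bootstrapped():
        bootstrap_scap()
        steps.append("manual scap bootstrap")

    # 3. Second puppet run (the one that can hit the dependency issue).
    run_puppet()
    steps.append("puppet")

    # 4. Deploy the software with scap.
    scap_deploy()
    steps.append("scap deploy")

    # 5. Final puppet run to converge everything that depends on the deployment.
    run_puppet()
    steps.append("puppet")
    return steps
```

Keeping the hooks injectable means the same ordering can be exercised with fakes before anything is run against a real host.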

...or, trying to avoid these:

The latter won't work for reimages of existing hosts, though, unless we first remove the production role, add the migration role, reimage, and then revert the process.

One part of this is a puppet / scap dependency issue, which manifests like this:

Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Config/Scap::Target[phabricator/deployment]/Package[phabricator/deployment]) Provider scap3 is not functional on this host
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Config/File[/srv/phab]) Dependency Package[phabricator/deployment] has failures: true
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Config/File[/srv/phab]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Config/File[/srv/phab/phabricator/scripts/]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Config/File[/srv/phab/phabricator/scripts/mail/]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (Class[Phabricator::Config]) Unscheduling all events on Class[Phabricator::Config]
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator/File[/usr/local/bin/arc]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Vcs/File[/usr/local/bin/git-http-backend]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Vcs/File[/usr/libexec]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Vcs/File[/usr/libexec/phabricator-ssh-hook.sh]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Tools/Package[python3-mysqldb]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Tools/Package[python3-pymysql]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Tools/File[/etc/phabtools.conf]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Tools/File[/srv/dumps]) Skipping because of failed dependencies

...followed by many, many other resources that are all skipped because of failed dependencies.
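With this much noise it helps to separate the one real failure from the resources that were merely skipped. The following is a rough sketch that does so for a few lines copied from the excerpt above; the regular expression and the `summarize` helper are my own assumptions about the line format, not an existing tool.

```python
import re

# A few lines copied from the puppet-agent excerpt in this task.
LOG = """\
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Config/Scap::Target[phabricator/deployment]/Package[phabricator/deployment]) Provider scap3 is not functional on this host
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Config/File[/srv/phab]) Dependency Package[phabricator/deployment] has failures: true
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Config/File[/srv/phab]) Skipping because of failed dependencies
Oct 16 19:14:56 phab2002 puppet-agent[3358]: (/Stage[main]/Phabricator::Tools/File[/srv/dumps]) Skipping because of failed dependencies
"""

# Assumed line shape: "... puppet-agent[PID]: (RESOURCE PATH) MESSAGE"
LINE_RE = re.compile(r"puppet-agent\[\d+\]: \((?P<resource>.+?)\) (?P<message>.*)$")


def summarize(log_text):
    """Return (root_failures, skipped_resources) found in the log text."""
    root_failures = []
    skipped = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        resource = match.group("resource")
        message = match.group("message")
        if message == "Skipping because of failed dependencies":
            skipped.append(resource)
        elif message.startswith("Provider ") and "is not functional" in message:
            # The non-functional provider is what everything else waits on.
            root_failures.append((resource, message))
    return root_failures, skipped


root_failures, skipped = summarize(LOG)
for resource, message in root_failures:
    print("root failure:", resource, "->", message)
print("skipped resources:", len(skipped))
```

On the full log the skipped list is much longer, but the root failure stays the same single `Package[phabricator/deployment]` resource.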

As discussed with @jnuche on IRC: "that is triggered by phabricator::config when it tries to init the phabricator/deployment repo, and at that point puppet should have installed scap to be able to work correctly, but it hasn't yet".

We have tracked down that puppet tries to install scap from the scap::target class and "we should be able to solve the order problem and make sure scap gets installed before the stuff in phabricator::config runs".

Event Timeline

Change #1092841 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] phabricator: ensure scap is installed on host before it is required

https://gerrit.wikimedia.org/r/1092841

Thank you, @jnuche! :)

Per the IRC discussion and Gerrit: the patch above looks good to me! I agree scap::target should REQUIRE scap and scap should REQUIRE scap::user, rather than merely including them. Including says nothing about the order of things; require does.
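To illustrate the include/require difference outside of Puppet, here is a toy model in Python (3.9+ for `graphlib`). It is only an analogy under my own assumptions: the two dictionaries are hand-written stand-ins for the ordering edges in a catalog, not output from the real compiler.

```python
from graphlib import TopologicalSorter

TARGET = "Scap::Target[phabricator/deployment]"
SCAP = "Class[Scap]"
SCAP_USER = "Class[Scap::User]"

# "include" only adds the classes to the catalog: there are no ordering
# edges, so nothing forces scap to be applied before the target.
include_only = {TARGET: set(), SCAP: set(), SCAP_USER: set()}

# "require" adds the classes AND an ordering edge. Each key maps to the
# set of things that must be applied before it.
with_require = {TARGET: {SCAP}, SCAP: {SCAP_USER}, SCAP_USER: set()}


def apply_order(must_come_first):
    """Return one valid application order for the given ordering edges."""
    return list(TopologicalSorter(must_come_first).static_order())


# Any permutation would be valid here; the result is not something to rely on.
print("include only:", apply_order(include_only))

# Here the edges leave exactly one valid order: user, then scap, then target.
print("with require:", apply_order(with_require))
```

In the include-only graph the sorter happens to return some order, but nothing guarantees scap comes before the target, which matches what the first puppet run on phab2002 showed.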

This also reverts https://gerrit.wikimedia.org/r/c/operations/puppet/+/825262, which seems correct to me. The assumption made there that "nothing in the scap::target resource requires a resource managed in the scap class" does not seem correct (anymore).

Since scap::target is a defined type rather than a class, we have to target hosts in the puppet compiler and cumin with R:scap::target, which hits 181 hosts in production.

The patch is still WIP, though, and we should be careful deploying it. We should test it on phab2002 by removing scap and then running puppet.

Change #1092841 merged by Dzahn:

[operations/puppet@production] scap target: ensure scap is installed on host before it is required

https://gerrit.wikimedia.org/r/1092841

Deployed Jaime's fix.

No issues on any of the 180 scap::target hosts, with one notable exception.

It broke puppet on the 2 phabricator machines with:

Error: Found 1 dependency cycle:
(Exec[Refresh sysusers] => User[scap] => Exec[bootstrap-scap-target] => Class[Scap] => Scap::Target[phabricator/deployment] => Package[phabricator/deployment] => Class[Phabricator::Vcs] => Systemd::Sysuser[vcs] => File[/etc/sysusers.d/vcs.conf] => Exec[Refresh sysusers])

Unfortunately the compiler doesn't catch this type of thing so it's hard to test without just merging.
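For reasoning about the cycle offline, the chain from the error message can be fed into a small graph check. This is a sketch only (Python 3.9+ for `graphlib`): it models just the edges quoted in the error above, and `parse_chain` and `find_cycle` are helper names I made up. It does not replace testing on a real host.

```python
from graphlib import CycleError, TopologicalSorter

# The cycle exactly as puppet reported it on the phabricator hosts.
REPORTED = (
    "(Exec[Refresh sysusers] => User[scap] => Exec[bootstrap-scap-target] => "
    "Class[Scap] => Scap::Target[phabricator/deployment] => "
    "Package[phabricator/deployment] => Class[Phabricator::Vcs] => "
    "Systemd::Sysuser[vcs] => File[/etc/sysusers.d/vcs.conf] => "
    "Exec[Refresh sysusers])"
)


def parse_chain(text):
    """Turn "(A => B => C)" into the edge list [(A, B), (B, C)]."""
    nodes = text.strip()[1:-1].split(" => ")
    return list(zip(nodes, nodes[1:]))


def find_cycle(edges):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic."""
    must_come_first = {}
    for before, after in edges:
        must_come_first.setdefault(before, set())
        must_come_first.setdefault(after, set()).add(before)
    try:
        TopologicalSorter(must_come_first).prepare()
    except CycleError as err:
        # graphlib puts the offending cycle in the second exception argument.
        return err.args[1]
    return None


edges = parse_chain(REPORTED)
print("cycle found:", find_cycle(edges) is not None)

# Hypothetical experiment: drop a single edge and check whether the rest orders.
without_one = [e for e in edges if e != ("User[scap]", "Exec[bootstrap-scap-target]")]
print("cycle after dropping one edge:", find_cycle(without_one) is not None)
```

Which edge is the right one to drop in the real manifests is a separate question; the sketch only shows that the reported chain is a single ring, so removing any one relationship in it is enough to break the cycle.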

Change #1098933 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] scap target: ensure scap is installed on host before it is required

https://gerrit.wikimedia.org/r/1098933

That dependency cycle sure looks curious to me. Judging by the config in the Puppet files, I would have expected Scap::Target[phabricator/deployment] to be at the root and everything else to be dependencies of it. Something like this:

  1. Scap::Target[phabricator/deployment] => Package[phabricator/deployment] => Class[Phabricator::Vcs] => Systemd::Sysuser[vcs] => ...
  2. Scap::Target[phabricator/deployment] => Class[Scap] => Exec[bootstrap-scap-target] => User[scap] => ...

But somehow the stuff to the left of Scap::Target[phabricator/deployment] got reversed and causes a cycle.

Based on my scant knowledge of Puppet, I can only guess that there is some hidden magic at work here and Exec[Refresh sysusers] is somehow connected to the root cause. Whatever the case, there could be some non-obvious relation to this change that I piggybacked in the fix: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1092841/5/modules/scap/manifests/init.pp#20. Strictly speaking it wasn't necessary, so we can remove it and see if that has any effect. Created a change for that: https://gerrit.wikimedia.org/r/1098933

Change #1098933 merged by Dzahn:

[operations/puppet@production] scap target: ensure scap is installed on host before it is required

https://gerrit.wikimedia.org/r/1098933

Change #1160217 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: add /etc/phabricator/script-vars for scap

https://gerrit.wikimedia.org/r/1160217

Change #1160217 merged by Dzahn:

[operations/puppet@production] phabricator::migration: add /etc/phabricator/script-vars for scap

https://gerrit.wikimedia.org/r/1160217

Change #1160231 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: add missing sudo_defaults file for scap

https://gerrit.wikimedia.org/r/1160231

Change #1160231 merged by Dzahn:

[operations/puppet@production] phabricator::migration: add missing sudo_defaults file for scap

https://gerrit.wikimedia.org/r/1160231

Change #1160270 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: include phabricator::httpd, add /srv/phab dir

https://gerrit.wikimedia.org/r/1160270

Change #1160270 merged by Dzahn:

[operations/puppet@production] phabricator::migration: include phabricator::httpd, add /srv/phab dir

https://gerrit.wikimedia.org/r/1160270

Change #1160280 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: install PHP

https://gerrit.wikimedia.org/r/1160280

Change #1160280 merged by Dzahn:

[operations/puppet@production] phabricator::migration: install PHP

https://gerrit.wikimedia.org/r/1160280

Change #1160310 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] phabricator::migration: ensure /srv/phab is the correct symlink

https://gerrit.wikimedia.org/r/1160310

Change #1160310 merged by Dzahn:

[operations/puppet@production] phabricator::migration: ensure /srv/phab is the correct symlink

https://gerrit.wikimedia.org/r/1160310