`become`, `crontab` et al missing from Trusty hosts
Closed, Resolved (Public)

Description

valhallasw@tools-bastion-03:~$ become
-bash: become: command not found
valhallasw@tools-bastion-03:~$ crontab -l
no crontab for valhallasw

vs

valhallasw@tools-sgebastion-07:~$ become --help
usage: become <toolname> [command [args...]]
valhallasw@tools-sgebastion-07:~$ crontab --version
/usr/local/bin/crontab: only tools are allowed crontabs

The issue seems to be that the misctools package for Trusty is largely empty:

valhallasw@tools-bastion-03:~$ dpkg -L misctools
/etc/bash_completion.d/misctools

vs

valhallasw@tools-sgebastion-07:~$ dpkg -L misctools
/.
/usr
/usr/bin
/usr/bin/become
/usr/bin/list-user-databases
/usr/bin/oge-crontab
/usr/bin/setup-tomcat
/usr/bin/sql
/usr/bin/take
/usr/share
/usr/share/bash-completion
/usr/share/bash-completion/completions
/usr/share/bash-completion/completions/misctools
/usr/share/doc
/usr/share/doc/misctools
/usr/share/doc/misctools/changelog.gz
/usr/share/doc/misctools/copyright
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/misctools
/usr/share/man
/usr/share/man/man1
/usr/share/man/man1/become.1.gz
/usr/share/man/man1/list-user-databases.1.gz
/usr/share/man/man1/sql.1.gz
/usr/share/man/man1/take.1.gz
/etc/bash_completion.d/misctools
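
A quick way to see the gap without reading the full file list is to check which of the expected commands resolve on PATH. This is a minimal sketch, not part of misctools: the function name missing_commands is made up for illustration, and the command names in the usage note come from the Stretch listing above.

```shell
# Print each named command that cannot be found on PATH, one per line.
# (missing_commands is a hypothetical helper, not shipped by any package.)
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}
```

For example, `missing_commands become sql take list-user-databases` should print at least `become` on the affected Trusty bastion and nothing on a host where the package is fully installed.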

Event Timeline

I ran into the same issue while trying to migrate jobs from the Trusty grid to the new one (or, better, to delete the tool altogether) :/

Workarounds:

  • become mytoolname: sudo -niu tools.mytoolname
  • crontab -l: ssh `cat /etc/toollabs-cronhost` crontab -l
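
The two workarounds can be wrapped as shell functions until the package is back. This is only a sketch: the function names are hypothetical, and it covers just the invocations listed above, not everything the real become and crontab wrappers do.

```shell
# Hypothetical stand-in for the missing `become` wrapper.
# Usage: become_fallback <toolname> [command [args...]]
become_fallback() {
    if [ "$#" -lt 1 ]; then
        echo "usage: become_fallback <toolname> [command [args...]]" >&2
        return 1
    fi
    tool=$1
    shift
    # -n: never prompt, -i: login shell, -u: target user (tools.<toolname>)
    sudo -niu "tools.$tool" "$@"
}

# Hypothetical stand-in for `crontab -l`: run it on the cron host, whose
# name is read from /etc/toollabs-cronhost (path taken from the workaround).
crontab_list_fallback() {
    ssh "$(cat /etc/toollabs-cronhost)" crontab -l
}
```

With these defined, `become_fallback mytoolname` behaves like the first workaround and `crontab_list_fallback` like the second.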

Tools like qstat-full are also missing.

This seems to be a different issue (a tool missing on Stretch that was available on Trusty), so I created a separate task for it: T218504

I forced the missing misctools package back with a manual puppet run.

$ sudo -i puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for tools-bastion-03.tools.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1552861777'
Notice: /Stage[main]/Toollabs::Exec_environ/Package[mariadb-client]/ensure: created
Notice: /Stage[main]/Toollabs::Exec_environ/Package[misctools]/ensure: created
Notice: Applied catalog in 61.30 seconds

It looks like the next automated puppet run removed it again. Another manual run restored it:

$ sudo -i puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for tools-bastion-03.tools.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1552862910'
Notice: /Stage[main]/Toollabs::Exec_environ/Package[mariadb-client]/ensure: created
Notice: /Stage[main]/Toollabs::Exec_environ/Package[misctools]/ensure: created
Notice: Applied catalog in 70.90 seconds

My current guess is that Puppet defines a non-deterministic state: some package we install forces removal of the mariadb-client package, which in turn removes the misctools package. This might be related to T218009 (Puppet failure emails sent to non-admin members of tools project causing user confusion), but that is not confirmed yet.

bd808 triaged this task as Unbreak Now! priority. Mar 17 2019, 10:53 PM

I caught Puppet causing the removal, and yes this is from T218009:

$ sudo -i puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for tools-bastion-03.tools.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1552863224'
Notice: /Stage[main]/Openstack::Clientpackages::Mitaka::Trusty/Package[mysql-client-5.5]/ensure: created
Notice: Applied catalog in 66.48 seconds
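
The flip-flop is easier to see when each run is reduced to the packages it reports as created. A small filter over the agent output can do that. This is a sketch with a hypothetical function name, and it assumes the Notice line format shown in the logs above.

```shell
# Read puppet agent output on stdin and print the name of every package
# resource reported as "ensure: created", one per line.
# (created_packages is a hypothetical helper name.)
created_packages() {
    sed -n 's|.*Package\[\([^]]*\)]/ensure: created.*|\1|p'
}
```

Piping a run through it, for example `sudo -i puppet agent -tv | created_packages`, would list mariadb-client and misctools for the first two runs above and mysql-client-5.5 for the third.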

Change 497210 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] cloud-vps: Remove mysql packages from openstack::clientpackages::mitaka::*

https://gerrit.wikimedia.org/r/497210

bd808 lowered the priority of this task from Unbreak Now! to High. Mar 17 2019, 11:59 PM

The cherry-pick documented in T218494#5031116 should stop the misctools package from being uninstalled for now. This is not a permanent fix, but should keep things working as expected until we can determine the proper fix in the coming days. Lowering priority from UBN! to High.

Confirmed that in Trusty this happens because:

aborrero@tools-bastion-03:~$ dpkg -s mariadb-client | grep Depends
Depends: mariadb-client-5.5 (>= 5.5.63-1ubuntu0.14.04.1)

aborrero@tools-bastion-03:~$ dpkg -s mariadb-client-5.5 | grep Breaks
Breaks: mariadb-server-5.5 (<< 5.5.63-1ubuntu0.14.04.1), mysql-client, mysql-client-5.5, mysql-client-5.6, virtual-mysql-client

I.e., mariadb-client and mysql-client can't be installed at the same time. These two tend to be problematic; we had a similar issue in T215578 (Weird dependency issue on the stretch grid around libmariadb-dev-compat and friends).
We should probably standardize on one or the other across all the Puppet code, if that is even possible (I'm aware this may have consequences for Toolforge exec nodes).
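
The same check can be scripted when auditing other package pairs. The sketch below tests whether a package name appears in a relationship field such as the Breaks line above. The function name is hypothetical, and it only handles plain comma-separated lists, not `|` alternatives.

```shell
# Succeed if package $2 is listed in the comma-separated relationship
# field $1 (e.g. a "Breaks: ..." line from `dpkg -s`).
# Version constraints in parentheses are ignored.
# (field_lists_package is a hypothetical helper name.)
field_lists_package() {
    printf '%s\n' "$1" |
        sed -e 's/^[A-Za-z-]*: *//' -e 's/ *([^)]*)//g' |
        tr ',' '\n' |
        sed -e 's/^ *//' -e 's/ *$//' |
        grep -qxF -- "$2"
}
```

For instance, `field_lists_package "$(dpkg -s mariadb-client-5.5 | grep '^Breaks:')" mysql-client-5.5` should succeed on the Trusty host shown above.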

I'm fine with this patch, +1'd

Change 497210 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud-vps: Remove mysql packages from openstack::clientpackages::mitaka::*

https://gerrit.wikimedia.org/r/497210

Change 497445 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] Follow-up I71678b27: Remove stray MariaDB reference in openstack::clientpackages::newton::stretch

https://gerrit.wikimedia.org/r/497445

bd808 claimed this task.

Verified that the tools puppetmaster has the real patch and not just the cherry-pick.

Change 497445 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] openstack: Follow-up I71678b27: Remove stray MariaDB reference

https://gerrit.wikimedia.org/r/497445

This happened over the weekend on Stretch hosts as well, and then the next puppet run resolved it. It was weird.
I'm so glad it makes more sense now. :)

Change 497445 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: Follow-up I71678b27: Remove stray MariaDB reference

https://gerrit.wikimedia.org/r/497445