
Cleanup Netbox stuff from netmon hosts
Open, Medium, Public

Description

Now that the migration of Netbox to the dedicated netbox hosts is complete, it has been running there for a while, and we are certainly not going back, we should clean up the Netbox leftovers from the netmon hosts, unless they are scheduled to be reimaged soon.

A non-exhaustive list of things to clean up:

  • any remaining puppet code
  • anything not cleaned up by Puppet (either because ensure:absent was not used or not supported):
    • postgres
    • checkouts
  • ACME certs, see T238900#5683710

Event Timeline

Volans created this task. Fri, Nov 22, 12:29 PM
Restricted Application added a subscriber: Aklapper. Fri, Nov 22, 12:29 PM
Volans triaged this task as Medium priority. Fri, Nov 22, 12:29 PM

Also the configurations in /etc/netbox, anything related to deploys in /srv

Change 552680 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] acme_chief: Revoke access from netmon boxes to netbox certificate

https://gerrit.wikimedia.org/r/552680

Change 552680 merged by Vgutierrez:
[operations/puppet@production] acme_chief: Revoke access from netmon boxes to netbox certificate

https://gerrit.wikimedia.org/r/552680

Doing a quick check on netmon2001:

  • Stuff was left in /srv: I removed /srv/netbox-dumps and /srv/deployment/netbox*
  • Nothing is left in /etc; I believe this was cleaned up by Puppet.
  • Postgres is left running. It does not appear to be used by anything else on the box. @ayounsi is this the case? If so I can remove that as well.

Indeed.

Thanks, I'll go ahead and remove it then.

Okay, I believe I have removed all traces on netmon1002 and netmon2001:

  • Purged postgres packages
  • Purged uwsgi packages
  • Made sure all of the configuration files were cleaned up (most were by puppet)
  • removed things from /var, /srv related to netbox, uwsgi and postgres

And rechecking the top list, I have removed acme certs, and other miscellany from /etc.
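For anyone wanting to double check a host after a cleanup like this, here is a minimal sketch that looks for the leftover paths named in this task (/etc/netbox, /srv/netbox-dumps, /srv/deployment/netbox*). The function name `find_leftovers` and the `root` parameter are made up for illustration; the pattern list is only what this thread mentions and is almost certainly not exhaustive.

```python
import glob
import os

# Paths named in this task, relative to the filesystem root.
# The trailing * mirrors /srv/deployment/netbox* from the comment above.
LEFTOVER_PATTERNS = [
    "etc/netbox",
    "srv/netbox-dumps",
    "srv/deployment/netbox*",
]


def find_leftovers(root="/", patterns=LEFTOVER_PATTERNS):
    """Return the sorted paths under root that match any leftover pattern.

    This only reports what exists; it never deletes anything.
    """
    found = set()
    for pattern in patterns:
        found.update(glob.glob(os.path.join(root, pattern)))
    return sorted(found)
```

Calling `find_leftovers()` on a netmon host would list whatever still exists; the `root` argument is there so it can be tried against a scratch directory first.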

@Volans what do you mean by "any remaining puppet code" ?

Any Puppet code included in the netmon catalogs that still refers to Netbox, if any.
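One way to check this could be to scan a compiled catalog for resources that mention Netbox. The sketch below assumes the catalog has already been loaded from its JSON form into a dict with a top-level `resources` list whose entries carry `type` and `title` keys; the function name `resources_mentioning` is hypothetical, and how the catalog is obtained is left out on purpose.

```python
import json


def resources_mentioning(catalog, needle="netbox"):
    """Return 'Type[title]' for every catalog resource that mentions needle.

    The whole resource (title, parameters, tags, ...) is serialized and
    searched case-insensitively, so a match in any field counts.
    """
    needle = needle.lower()
    hits = []
    for res in catalog.get("resources", []):
        if needle in json.dumps(res).lower():
            hits.append(f"{res.get('type')}[{res.get('title')}]")
    return hits
```

A resource that matches only through a tag or a file path is still reported, so the output is a list of candidates to review rather than a list of things to delete.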

Mentioned in SAL (#wikimedia-operations) [2019-11-28T13:21:31Z] <volans> cumin 'netmon*' 'rm -v /var/spool/cron/crontabs/postgres' T238919

The postgres user and its crontab are still present on the hosts, and the backup job failed because there is no longer a DB to back up.
I've just removed the crontab for now.

I've also removed the crontab entries for wmf_auto_restart_uwsgi-netbox and prometheus-postgres-exporter.
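To spot similar stale entries elsewhere, a small sketch like the following can flag crontab lines that mention the services cleaned up here. The keyword list is taken from this thread (netbox, postgres, uwsgi); the function name `stale_cron_lines` is made up, and the code only reports lines for review, it does not edit any crontab.

```python
import re

# Keywords taken from this thread; extend as needed.
KEYWORDS = ("netbox", "postgres", "uwsgi")


def stale_cron_lines(text, keywords=KEYWORDS):
    """Return (line_number, line) pairs for crontab lines mentioning a keyword.

    Comment lines are checked too, because Puppet-managed entries are
    preceded by a "# Puppet Name: ..." comment that carries the name.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

Since a match may be only on the "# Puppet Name:" comment, the line right after each hit should be reviewed as well before removing anything.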

Mentioned in SAL (#wikimedia-operations) [2019-11-28T13:28:22Z] <volans> cleanup root's crontab entries on netmon hosts from netbox/postres stuff - T238919

It looks like not all of the Puppet code was switched to ensure => 'absent' before being removed. As a result, there may be many more small things still lying around.