netbox: drop profile::netbox::active_server parameter
Closed, DeclinedPublic

Description

With the most recent netbox upgrade we switched netbox to use conftool and DNS discovery for failover, so we should remove profile::netbox::active_server from the puppet config. Currently this parameter is used to enable a bunch of monitoring and systemd::timers. We should investigate if:

  • it's safe to monitor both servers, probably yes?
  • it's OK to run the reports on both servers, likely no?

In both cases, if we really do need to set things up only on the active host, then we should use conftool to discover which one is the active node.

Event Timeline

Yes I agree.

  • it's safe to monitor both servers, probably yes?

Actually, not really: the monitoring includes the reports monitoring, which currently alerts in the dcops IRC channel. Those checks should probably run only on the active host.

  • it's OK to run the reports on both servers, likely no?

Not only those, but also the Ganeti sync systemd timers and such.

So I think we'll need to have that information somewhere.
The simplest thing to use would be DNS: if the PTR of the IP returned by the discovery record is the FQDN of the host, then it's the active one.
The other option is to have a confd template write a file in /etc/$NAME.

One approach could be to have Puppet deploy everything, but export resources only for the active one based on the DNS check above.
On the host, the scripts should check the chosen method (either DNS directly or the file generated by confd) and act accordingly.
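
A minimal sketch of the DNS check described above, assuming Python on the host. The function name `is_active_host`, the injectable `resolve`/`reverse` parameters, and the example hostnames are hypothetical and not taken from the Puppet repo. It fails closed (reports "not active") if DNS lookups fail, which is a design assumption rather than something decided in this task. Note that `socket.gethostbyname` only returns IPv4 addresses.

```python
import socket
from typing import Callable


def normalize(name: str) -> str:
    """Lower-case a hostname and drop any trailing dot so names compare cleanly."""
    return name.rstrip(".").lower()


def is_active_host(
    discovery_name: str,
    local_fqdn: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> bool:
    """Return True if the PTR of the discovery record's IP matches local_fqdn.

    discovery_name is the DNS discovery record (hypothetical name supplied
    by the caller); resolve/reverse are injectable so the check can be
    tested without real DNS.
    """
    try:
        ip = resolve(discovery_name)   # forward lookup of the discovery record
        ptr = reverse(ip)              # reverse (PTR) lookup of that IP
    except OSError:
        # socket.gaierror and socket.herror are OSError subclasses.
        # Fail closed: without DNS, do not claim to be the active host.
        return False
    return normalize(ptr) == normalize(local_fqdn)
```

A timer-driven script could call `is_active_host(<discovery record>, socket.getfqdn())` at startup and exit early when it returns False, so that Puppet can deploy the same units on both servers.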

Volans triaged this task as Medium priority. May 23 2022, 3:52 PM

In the medium/longer term, the reports status should go through Prometheus, so we could revisit at that point. Until then I agree with Riccardo.

As it's the same database, the Ganeti sync could run from both servers, but, for example, at half its current frequency (similar to the Homer diff emails).

Change 805125 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] hieradata: netbox1001 to specify netbox1002 as the active server.

https://gerrit.wikimedia.org/r/805125

Change 805125 merged by Jbond:

[operations/puppet@production] hieradata: netbox1001 to specify netbox1002 as the active server.

https://gerrit.wikimedia.org/r/805125

ayounsi changed the task status from Open to Stalled. Aug 23 2024, 9:59 AM
ayounsi lowered the priority of this task from Medium to Low.

The active server parameter now controls rq-netbox as well, so it's unlikely we'll get rid of it (see T341843: Netbox rq.timeouts.JobTimeoutException).
As we're not going in the active/active direction anytime soon (see T234997: Make Netbox Active/Active), I'm going to close this task in favor of T330883: Improve Netbox active/passive failover process.