Forking from T296452#8653039
During the recent DC switchover, Netbox was moved to codfw and became very slow. This means that, in the current setup:
- active/active may not be the best idea
- we need to update the dc-switch cookbook to also fail over the PostgreSQL DB
The issue comes from the extra latency of having the frontend in codfw and the DB in eqiad.
There are two main ways of solving the issue:
- We always move the primary DB to where the primary frontend is
  - But this prevents doing active/active
- We split reads and writes: reads are always done on the local node, and writes are done where the primary DB is
  - This means slower writes, but as Netbox is read-heavy this is most likely fine
  - This permits active/active (and ensures all nodes are healthy)
Option 2 seems better to me, but it doesn't have any built-in support in Netbox. I've been pointed to this Django module: https://github.com/jbalogh/django-multidb-router but it hasn't been updated in a while.
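For illustration, the read/write split in option 2 could also be expressed with Django's own database router hooks (`db_for_read`, `db_for_write`, `allow_relation`, `allow_migrate`) rather than a third-party module. This is only a rough sketch: the class name and the `default` / `local_replica` aliases are hypothetical, and whether Netbox's configuration lets us set `DATABASE_ROUTERS` has not been checked here.

```python
class ReadLocalWritePrimaryRouter:
    """Sketch of a Django database router: reads go local, writes go to the primary."""

    # Hypothetical aliases; they would have to match the keys of DATABASES.
    PRIMARY_ALIAS = "default"        # primary DB, possibly in the remote DC
    LOCAL_ALIAS = "local_replica"    # replica in the same DC as this frontend

    def db_for_read(self, model, **hints):
        # Reads are always served by the local node to avoid cross-DC latency.
        return self.LOCAL_ALIAS

    def db_for_write(self, model, **hints):
        # Writes always go to wherever the primary DB currently is.
        return self.PRIMARY_ALIAS

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same dataset, so any relation is allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run migrations on the primary only; the replica gets them via replication.
        return db == self.PRIMARY_ALIAS


# In Django settings this would be enabled with something like (path is hypothetical):
# DATABASE_ROUTERS = ["netbox_routers.ReadLocalWritePrimaryRouter"]
```

One thing to keep in mind with this approach: a read issued right after a write may hit a replica that has not caught up yet, so replication lag would need to be acceptable for Netbox's usage.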
A cookbook to ease master switchover would be valuable in either case.