Currently arclamp uses redis in an active/standby architecture (active in eqiad, standby in codfw).
This has a few downsides. For instance, failover is manual and requires commits in several different places, and the architecture lacks durability at the site level.
Since we're in the process of dedicating hardware to arclamp/redis (T327277), let's evaluate possible approaches to deploy arclamp with a basic active/active architecture.
For the purposes of this task, I think it's best to focus as much as possible on minimally invasive, config-only changes.
Current status: draft, gathering options. Please edit, add notes, leave feedback, etc.
Option: independent redis instances on arclamp hosts
- Deploy individual redis instances to arclamp1001 and arclamp2001
- Direct writes to the site local redis instance
- Add/update arclamp-log jobs to use a per-site layout, e.g. both arclamp hosts run:
  - arclamp-log.py /etc/arclamp-log-excimer-eqiad.yaml (redis_host: arclamp1001)
  - arclamp-log.py /etc/arclamp-log-excimer-codfw.yaml (redis_host: arclamp2001)
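A minimal Python sketch of the per-site layout, for illustration only. The `SITES` mapping and the helper names `writer_redis_host` and `arclamp_log_jobs` are hypothetical and not part of arclamp or its puppet config; only the host names, config paths and the `redis_host` key come from the list above. It shows both halves of the idea: writers in each site publish to their site-local redis, and every arclamp host runs one arclamp-log job per site.

```python
from typing import Dict, List

# Hypothetical datacenter -> site-local arclamp redis host mapping
# (host names taken from the option above; the structure is an assumption).
SITES: Dict[str, str] = {
    "eqiad": "arclamp1001",
    "codfw": "arclamp2001",
}


def writer_redis_host(site: str) -> str:
    """Redis host that a profiler running in `site` would write to (site-local)."""
    return SITES[site]


def arclamp_log_jobs() -> List[Dict[str, str]]:
    """One arclamp-log job per site; every arclamp host would run all of them.

    Dicts keep insertion order, so jobs come out eqiad first, then codfw.
    """
    return [
        {
            "config": f"/etc/arclamp-log-excimer-{site}.yaml",
            "redis_host": host,
        }
        for site, host in SITES.items()
    ]


if __name__ == "__main__":
    # Print the arclamp-log invocations each arclamp host would run.
    for job in arclamp_log_jobs():
        print(f"arclamp-log.py {job['config']}  # redis_host: {job['redis_host']}")
```

Running the sketch prints one arclamp-log invocation per site, in the same order as the list above.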
In theory this option would remove the current manual failover steps. Possible wrinkles: two arclamp-log.py processes writing to the same output files may have side effects (I haven't tested this), low-activity thresholds may need adjusting, etc.