take steps outlined at techops offsite to (try to) address salt reliability
Closed, Resolved · Public

Description

These are short-term steps that may or may not address the issue completely.

  • Move the salt master off the puppet master onto a host running jessie
  • Set up a syndic in codfw and eqiad on separate hardware, and in the other dcs on the bastion hosts (won't need more than that), reporting to the master, all on jessie
  • Check dependencies for salt 2014.7 supplied by jessie and do backporting as needed for libzmq

Note that if the master-syndic model works out, we'll want to go multimaster (second master in codfw probably) with a fallback syndic in each dc for redundancy.
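
For illustration only, here is a rough sketch of the settings the master-syndic layout above would involve. It assumes Salt's `order_masters` option on the top-level master and `syndic_master` on each syndic host; the host names and the `other-dc` key are hypothetical placeholders, not real hosts, and nothing here is the configuration that was actually deployed.

```python
# Sketch only: every host name below is a hypothetical placeholder.
MASTER = "saltmaster.example.org"

# Datacenter -> host that would run the syndic (hypothetical names).
SYNDICS = {
    "eqiad": "syndic-eqiad.example.org",
    "codfw": "syndic-codfw.example.org",
    "other-dc": "bastion-other-dc.example.org",
}


def master_settings():
    """Settings for the top-level master, which must accept syndics."""
    return {"order_masters": True}


def syndic_settings(master):
    """Settings for a syndic host, pointing it at the top-level master."""
    return {"syndic_master": master}


def render(settings):
    """Render a flat settings dict as simple 'key: value' lines."""
    lines = []
    for key, value in sorted(settings.items()):
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(f"# {MASTER}")
    print(render(master_settings()))
    for dc, host in sorted(SYNDICS.items()):
        print(f"# {dc}: {host}")
        print(render(syndic_settings(MASTER)))
```

In this sketch each syndic gets the same one-line setting, so the only per-dc decision is which host runs it.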

Event Timeline

ArielGlenn claimed this task.
ArielGlenn raised the priority of this task from to Needs Triage.
ArielGlenn updated the task description.
ArielGlenn added projects: Salt, acl*sre-team.
ArielGlenn added a subscriber: ArielGlenn.
Restricted Application added subscribers: Matanya, Aklapper. · Oct 12 2015, 10:44 PM
ArielGlenn closed this task as Resolved. · Feb 3 2016, 3:14 PM

Huh, well it looks like salt is reliable after just moving to its own server with some patches to the master and minion. So while I'm eventually going to deploy multimaster for redundancy's sake (and syndics for the same reason), that's not part of this ticket. And I can close it, w00t!

Restricted Application added a subscriber: StudiesWorld. · Feb 3 2016, 3:14 PM
ArielGlenn moved this task from Backlog to Done on the Salt board. · Feb 3 2016, 3:15 PM