
setup/install gerrit2001/WMF6408
Closed, Resolved (Public)

Description

This task will track the setup and deployment of gerrit2001/WMF6408 in codfw. This system was allocated via T148187 and procurement task T150885.

Partition Scheme: RAID1 the SSDs and have a small / and a large /srv. Disable the SATA disks entirely in the BIOS, since they won't be used.

Naming scheme: Since these are dedicated gerrit boxes that will now have a warm standby, @Dzahn suggested (and it makes sense to @RobH) to name the new system gerrit2001. If/when the eqiad gerrit box is reimaged/replaced, it should likely be renamed to gerrit1001.

  • update physical label / racktables via sub-task
  • install SSDs via sub-task
  • network port updated (desc/enable/vlan public)
  • dns update (mgmt and production public vlan) - https://gerrit.wikimedia.org/r/#/c/325860/
  • install server update (dhcp and partitioning)
  • install os (jessie)
  • accept/sign salt/puppet
  • handoff for service implementation

Event Timeline

RobH created this task. Dec 6 2016, 7:48 PM
RobH added a parent task: Unknown Object (Task).
RobH updated the task description.
RobH updated the task description. Dec 8 2016, 12:13 AM
RobH updated the task description.
RobH reassigned this task from RobH to Papaul. Feb 1 2017, 6:28 PM
RobH updated the task description.

Please update this task with the network port this system is plugged into. I neglected to ask you to do that via the sub-task. Then assign it back to me, thanks!

RobH claimed this task. Feb 1 2017, 6:34 PM
RobH added a subscriber: Papaul.
RobH updated the task description. Feb 1 2017, 6:44 PM
RobH updated the task description. Feb 1 2017, 6:47 PM

Change 335483 had a related patch set uploaded (by RobH):
gerrit2001 dns update

https://gerrit.wikimedia.org/r/335483

Change 335483 merged by RobH:
gerrit2001 dns update

https://gerrit.wikimedia.org/r/335483

Change 335700 had a related patch set uploaded (by RobH):
fixing my dns typo

https://gerrit.wikimedia.org/r/335700

Change 335700 merged by RobH:
fixing my dns typo

https://gerrit.wikimedia.org/r/335700

RobH updated the task description. Feb 2 2017, 10:18 PM
RobH updated the task description. Feb 2 2017, 10:26 PM
RobH reassigned this task from RobH to demon. Feb 2 2017, 10:32 PM

Assigning this task to Chad. Once he is aware that this system is all theirs, he can resolve.

Dzahn added a comment. Feb 2 2017, 11:07 PM

@demon So if we'd just put the role gerrit::server on this one as well, let's figure out which things need to be stopped or skipped when not on the "active" server / what is different between prod and "warm standby" from a puppet point of view. Can all services just run, or do we need to stop some, etc.?

demon added a comment. Feb 3 2017, 3:58 PM

Assigning this task to Chad. Once he is aware that this system is all theirs, he can resolve.

Confirmed.

@demon So if we'd just put the role gerrit::server on this one as well, let's figure out which things need to be stopped or skipped when not on the "active" server / what is different between prod and "warm standby" from a puppet point of view. Can all services just run, or do we need to stop some, etc.?

I did some work towards this end, but I don't think it's 100% ready for just blind application of the role. I'll have to review it again :)

Change 336658 had a related patch set uploaded (by Dzahn):
Gerrit: Add gerrit-roots to new gerrit2001 in Dallas

https://gerrit.wikimedia.org/r/336658

Change 336658 merged by Dzahn:
Gerrit: Add gerrit-roots to new gerrit2001 in Dallas

https://gerrit.wikimedia.org/r/336658

@demon, @20after4 and @Catrope can now SSH to gerrit2001 and have root, like on the current prod server.

Change 344072 had a related patch set uploaded (by Dzahn):
[operations/puppet] site.pp: add gerrit2001 with just standard and IPv6

https://gerrit.wikimedia.org/r/344072

Change 344072 merged by Dzahn:
[operations/puppet] site.pp: add gerrit2001 with just standard and IPv6

https://gerrit.wikimedia.org/r/344072

Change 344074 had a related patch set uploaded (by Dzahn):
[operations/dns] add IPv6 for gerrit2001.wikimedia.org

https://gerrit.wikimedia.org/r/344074

Change 344074 merged by Dzahn:
[operations/dns] add IPv6 for gerrit2001.wikimedia.org

https://gerrit.wikimedia.org/r/344074

Change 344187 had a related patch set uploaded (by Jcrespo):
[operations/dns@master] Add m2 aliases for db2011- in the future that should be a proxy

https://gerrit.wikimedia.org/r/344187

Change 344187 merged by Jcrespo:
[operations/dns@master] Add m2 aliases for db2011- in the future that should be a proxy

https://gerrit.wikimedia.org/r/344187

Dzahn added a comment. May 2 2017, 9:48 PM

Gerrit: Finish replication prep - https://gerrit.wikimedia.org/r/#/c/351520/ has been deployed.

Change 351525 had a related patch set uploaded (by Dzahn; owner: Chad):
[operations/puppet@production] Gerrit: Go ahead and apply gerrit role to new slave in codfw

https://gerrit.wikimedia.org/r/351525

Change 351525 merged by Dzahn:
[operations/puppet@production] Gerrit: Go ahead and apply gerrit role to new slave in codfw

https://gerrit.wikimedia.org/r/351525

Dzahn added a comment. May 2 2017, 10:23 PM

We need to allow SSH between both servers for clustering, just like for Phabricator in T137928#2565556 (https://gerrit.wikimedia.org/r/#/c/305277/). First the ferm rules, and then we have to check whether we also need ACLs like in T143363.

Change 351533 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: ferm rules to allow ssh between servers for clustering

https://gerrit.wikimedia.org/r/351533

Change 351533 merged by Dzahn:
[operations/puppet@production] gerrit: ferm rules to allow ssh between servers for clustering

https://gerrit.wikimedia.org/r/351533

Mentioned in SAL (#wikimedia-operations) [2017-05-02T22:38:26Z] <mutante> gerrit (cobalt/gerrit2001) - deployed firewall change to allow ssh between gerrit servers for clustering, new iptables rules exist now (T152525)

Change 351547 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: also allow ssh via IPv6 between servers

https://gerrit.wikimedia.org/r/351547

Change 351547 merged by Dzahn:
[operations/puppet@production] gerrit: also allow ssh via IPv6 between servers

https://gerrit.wikimedia.org/r/351547
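
The firewall changes above open SSH between the two gerrit servers over both IPv4 and IPv6. A quick way to confirm that from one of the hosts is to try a TCP connection to every address the peer resolves to. The following is only a minimal sketch: the function name `ssh_port_reachable` is hypothetical, port 22 is assumed to be where sshd listens, and a successful TCP connect says nothing about whether key authentication will work.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=3.0):
    """Try a TCP connect to every IPv4/IPv6 address that `host` resolves to.

    Returns a dict mapping each resolved address to True (connect succeeded)
    or False (refused, filtered, or timed out). Name resolution errors
    (socket.gaierror) are not caught and propagate to the caller.
    """
    results = {}
    # getaddrinfo returns one entry per address family the name resolves to,
    # so a dual-stack host yields both its A and AAAA addresses.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
                results[sockaddr[0]] = True
            except OSError:
                # Covers connection refused, unreachable network and timeouts.
                results[sockaddr[0]] = False
    return results
```

Run from cobalt against gerrit2001.wikimedia.org (and the other way round), a False entry for only the IPv6 address would point at a missing IPv6 rule rather than a general firewall problem.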

Dzahn added a comment. May 2 2017, 11:54 PM

After some debugging: we need to move the SSH public key from gerrit2's home dir to /etc/ssh/userkeys/ to make SSH work for replication, and we should use ssh::userkey for it. This is because the sshd config nowadays has:

#AuthorizedKeysFile %h/.ssh/authorized_keys

AuthorizedKeysFile  /etc/ssh/userkeys/%u /etc/ssh/userkeys/%u.d/cumin
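
To make the effect of that setting concrete, here is a small sketch that expands the `%u`, `%h` and `%%` tokens the way the sshd_config manual describes them, so you can see which files sshd would consult for the gerrit2 user. The function name and the home directory `/var/lib/gerrit2` are assumptions for illustration only; it does not read the real sshd configuration.

```python
import posixpath


def expand_authorized_keys_paths(directive_value, user, home):
    """Expand an AuthorizedKeysFile value into concrete paths for one user.

    Handles the %u (user name), %h (home directory) and %% (literal percent)
    tokens. Paths that are still relative after expansion are resolved
    against the user's home directory, as sshd does.
    """
    replacements = {"u": user, "h": home, "%": "%"}
    paths = []
    for pattern in directive_value.split():
        out = []
        i = 0
        while i < len(pattern):
            ch = pattern[i]
            if ch == "%" and i + 1 < len(pattern):
                token = pattern[i + 1]
                # Unknown tokens are left untouched instead of guessed at.
                out.append(replacements.get(token, "%" + token))
                i += 2
            else:
                out.append(ch)
                i += 1
        path = "".join(out)
        if not posixpath.isabs(path):
            path = posixpath.join(home, path)
        paths.append(path)
    return paths


# Hypothetical home directory, used only for this illustration.
GERRIT2_HOME = "/var/lib/gerrit2"

current = expand_authorized_keys_paths(
    "/etc/ssh/userkeys/%u /etc/ssh/userkeys/%u.d/cumin", "gerrit2", GERRIT2_HOME
)
old_default = expand_authorized_keys_paths(
    "%h/.ssh/authorized_keys", "gerrit2", GERRIT2_HOME
)
```

With the active directive, `current` contains only paths under /etc/ssh/userkeys/, which is why a key left in the home directory (`old_default`) is never read.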

Change 351565 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: use ssh::userkey to install ssh key in proper location

https://gerrit.wikimedia.org/r/351565

Change 351566 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: use new ecdsa key for replication, add pub key

https://gerrit.wikimedia.org/r/351566

Change 351565 merged by Dzahn:
[operations/puppet@production] gerrit: use ssh::userkey to install ssh key in proper location

https://gerrit.wikimedia.org/r/351565

Change 351566 abandoned by Dzahn:
gerrit: use new ecdsa key for replication, add pub key

Reason:
fair enough

https://gerrit.wikimedia.org/r/351566

Change 351734 had a related patch set uploaded (by Dzahn; owner: Chad):
[operations/puppet@production] Gerrit: Start replicating to slaves

https://gerrit.wikimedia.org/r/351734

Change 351734 merged by Dzahn:
[operations/puppet@production] Gerrit: Start replicating to slaves

https://gerrit.wikimedia.org/r/351734

demon closed this task as Resolved. May 4 2017, 7:10 PM

Gerrit is running on gerrit2001.wikimedia.org in codfw. Git data is being replicated just fine.

Dzahn updated the task description. May 4 2017, 7:26 PM
Dzahn awarded a token.