setup/install gerrit2001/WMF6408
Closed, ResolvedPublic

Description

This task will track the setup and deployment of gerrit2001/WMF6408 in codfw. This system was allocated via T148187 and procurement task T150885.

Partition Scheme: RAID1 the SSDs, with a small / and a large /srv. Disable the SATA disks entirely in the BIOS, since they won't be used.

Naming scheme: Since these are dedicated gerrit boxes that will now have a warm standby, @Dzahn suggested (and it makes sense to @RobH) naming the new system gerrit2001. If/when the eqiad gerrit box is reimaged/replaced, it should likely be renamed to gerrit1001.

  • update physical label / racktables via sub-task
  • install SSDs via sub-task
  • network port updated (desc/enable/vlan public)
  • dns update (mgmt and production public vlan) - https://gerrit.wikimedia.org/r/#/c/325860/
  • install server update (dhcp and partitioning)
  • install os (jessie)
  • accept/sign salt/puppet
  • handoff for service implementation

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH updated the task description.

Please update this task with the network port this system is plugged into. I neglected to ask you to do that via the sub-task. Then assign it back to me, thanks!

RobH added a subscriber: Papaul.

Change 335483 had a related patch set uploaded (by RobH):
gerrit2001 dns update

https://gerrit.wikimedia.org/r/335483

Change 335700 had a related patch set uploaded (by RobH):
fixing my dns typo

https://gerrit.wikimedia.org/r/335700

Assigning this task to Chad. Once he is aware that this system is all theirs, he can resolve.

Confirmed.

@demon So if we'd just put the role gerrit::server on this one as well, let's figure out which things need to be stopped or skipped when not on the "active" server, i.e. what is different between prod and "warm standby" from a puppet point of view. Can all services just run, or do we need to stop some?

I did some work towards this end, but I don't think it's 100% ready for just blind application of the role. I'll have to review it again :)

Change 336658 had a related patch set uploaded (by Dzahn):
Gerrit: Add gerrit-roots to new gerrit2001 in Dallas

https://gerrit.wikimedia.org/r/336658

Change 336658 merged by Dzahn:
Gerrit: Add gerrit-roots to new gerrit2001 in Dallas

https://gerrit.wikimedia.org/r/336658

@demon @20after4 and @Catrope can now SSH to gerrit2001 and have root like on the current prod server

Change 344072 had a related patch set uploaded (by Dzahn):
[operations/puppet] site.pp: add gerrit2001 with just standard and IPv6

https://gerrit.wikimedia.org/r/344072

Change 344072 merged by Dzahn:
[operations/puppet] site.pp: add gerrit2001 with just standard and IPv6

https://gerrit.wikimedia.org/r/344072

Change 344074 had a related patch set uploaded (by Dzahn):
[operations/dns] add IPv6 for gerrit2001.wikimedia.org

https://gerrit.wikimedia.org/r/344074

Change 344074 merged by Dzahn:
[operations/dns] add IPv6 for gerrit2001.wikimedia.org

https://gerrit.wikimedia.org/r/344074

Change 344187 had a related patch set uploaded (by Jcrespo):
[operations/dns@master] Add m2 aliases for db2011- in the future that should be a proxy

https://gerrit.wikimedia.org/r/344187

Change 344187 merged by Jcrespo:
[operations/dns@master] Add m2 aliases for db2011- in the future that should be a proxy

https://gerrit.wikimedia.org/r/344187

Change 351525 had a related patch set uploaded (by Dzahn; owner: Chad):
[operations/puppet@production] Gerrit: Go ahead and apply gerrit role to new slave in codfw

https://gerrit.wikimedia.org/r/351525

Change 351525 merged by Dzahn:
[operations/puppet@production] Gerrit: Go ahead and apply gerrit role to new slave in codfw

https://gerrit.wikimedia.org/r/351525

We need to allow SSH between both servers for clustering, just like for Phabricator in T137928#2565556 (https://gerrit.wikimedia.org/r/#/c/305277/). First the ferm rules; then we have to check whether we also need ACLs like in T143363.

Change 351533 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: ferm rules to allow ssh between servers for clustering

https://gerrit.wikimedia.org/r/351533

Change 351533 merged by Dzahn:
[operations/puppet@production] gerrit: ferm rules to allow ssh between servers for clustering

https://gerrit.wikimedia.org/r/351533

Mentioned in SAL (#wikimedia-operations) [2017-05-02T22:38:26Z] <mutante> gerrit (cobalt/gerrit2001) - deployed firewall change to allow ssh between gerrit servers for clustering, new iptables rules exist now (T152525)

Change 351547 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: also allow ssh via IPv6 between servers

https://gerrit.wikimedia.org/r/351547

Change 351547 merged by Dzahn:
[operations/puppet@production] gerrit: also allow ssh via IPv6 between servers

https://gerrit.wikimedia.org/r/351547
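A quick way to confirm that the two firewall changes above took effect is to try a plain TCP connect to port 22 over each address family. The following is only a minimal sketch: the helper name `reachable_families` is made up for illustration, and it only shows that the port is open, not that key-based login for replication works.

```python
import socket


def reachable_families(host, port=22, timeout=3.0):
    """Return the address families ("IPv4"/"IPv6") over which a TCP
    connect to host:port succeeds. Hypothetical helper, not part of puppet."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    ok = set()
    try:
        # Resolve every A/AAAA record the resolver returns for the host.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return ok
    for family, socktype, proto, _canonname, sockaddr in infos:
        if family not in names:
            continue
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                ok.add(names[family])
        except OSError:
            # Refused, filtered, timed out, or no route for this family.
            pass
    return ok
```

Run from one gerrit server against the other, for example `reachable_families("gerrit2001.wikimedia.org")`; a result containing both "IPv4" and "IPv6" would suggest both rule sets are active.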

After some debugging: we need to move the SSH public key from gerrit2's home directory to /etc/ssh/userkeys/ to make SSH work for replication, and we should use ssh::userkey for it. This is because the sshd config nowadays has:

#AuthorizedKeysFile %h/.ssh/authorized_keys

AuthorizedKeysFile  /etc/ssh/userkeys/%u /etc/ssh/userkeys/%u.d/cumin
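To make the effect of that setting concrete, here is a small sketch that expands the `%u`, `%h` and `%%` tokens of an `AuthorizedKeysFile` value the way sshd_config documents them. The function name and the home directory used in the usage note are illustrative assumptions, and other tokens are deliberately not handled.

```python
import posixpath


def expand_authorized_keys_paths(value, user, home):
    """Expand %u, %h and %% in an AuthorizedKeysFile value and return the
    resulting list of paths. Hypothetical helper for illustration only."""
    paths = []
    for pattern in value.split():
        out = []
        i = 0
        while i < len(pattern):
            ch = pattern[i]
            if ch == "%" and i + 1 < len(pattern):
                token = pattern[i + 1]
                if token == "u":
                    out.append(user)
                elif token == "h":
                    out.append(home)
                elif token == "%":
                    out.append("%")
                else:
                    raise ValueError("token not handled in this sketch: %" + token)
                i += 2
            else:
                out.append(ch)
                i += 1
        path = "".join(out)
        # After expansion, a relative path is taken relative to the home directory.
        if not posixpath.isabs(path):
            path = posixpath.join(home, path)
        paths.append(path)
    return paths
```

With the active line above, `expand_authorized_keys_paths("/etc/ssh/userkeys/%u /etc/ssh/userkeys/%u.d/cumin", "gerrit2", "/home/example")` yields only paths under /etc/ssh/userkeys/, so a key left in the user's `.ssh/authorized_keys` is never consulted.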

Change 351565 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: use ssh::userkey to install ssh key in proper location

https://gerrit.wikimedia.org/r/351565

Change 351566 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: use new ecdsa key for replication, add pub key

https://gerrit.wikimedia.org/r/351566

Change 351565 merged by Dzahn:
[operations/puppet@production] gerrit: use ssh::userkey to install ssh key in proper location

https://gerrit.wikimedia.org/r/351565

Change 351566 abandoned by Dzahn:
gerrit: use new ecdsa key for replication, add pub key

Reason:
fair enough

https://gerrit.wikimedia.org/r/351566

Change 351734 had a related patch set uploaded (by Dzahn; owner: Chad):
[operations/puppet@production] Gerrit: Start replicating to slaves

https://gerrit.wikimedia.org/r/351734

Change 351734 merged by Dzahn:
[operations/puppet@production] Gerrit: Start replicating to slaves

https://gerrit.wikimedia.org/r/351734

Gerrit is running on gerrit2001.wikimedia.org in codfw. Git data is being replicated just fine.
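One possible way to spot-check replication like this is to compare the refs that both servers advertise for a repository. The sketch below assumes `git` is installed and that each server exposes the repository at some URL; the helper names are invented for illustration and this is not necessarily how replication was verified here.

```python
import subprocess


def parse_ls_remote(output):
    """Parse `git ls-remote` output ("<sha>\t<ref>" per line) into {ref: sha}."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        refs[ref.strip()] = sha
    return refs


def diff_refs(primary, replica):
    """Return {ref: (primary_sha, replica_sha_or_None)} for every ref that is
    missing on the replica or points at a different commit there."""
    return {
        ref: (sha, replica.get(ref))
        for ref, sha in primary.items()
        if replica.get(ref) != sha
    }


def ls_remote(url):
    """Run `git ls-remote` against a repository URL and parse the result."""
    result = subprocess.run(
        ["git", "ls-remote", url],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_remote(result.stdout)
```

An empty result from `diff_refs(ls_remote(primary_url), ls_remote(replica_url))` would indicate that the replica has caught up for that repository; both URLs are placeholders to be filled in by the operator.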

Dzahn awarded a token.