Split carbon's install/mirror roles, provision install1001
Closed, Resolved, Public

Description

OK, so we have a few things in-flight right now (e.g. jessie upgrades, T132450/HTTPS for carbon etc.), so it's about time to clean this all up.

This would be my suggestion:

  • Upgrade install2001 to jessie, make sure our manifests work there (trivial)
  • Deploy a new install1001 VM
  • move install_server (DHCP, TFTP, preseed web) to install1001
  • move reprepro/apt (remember /root/.gnupg/!) to install1001
  • Remove all install_server-related roles out of carbon and designate it as a mirror server (T84817); reinstall carbon with jessie (T123733)
  • (optionally) split the reprepro role out of install_server in puppet, they're really different roles
  • Upgrade install2001 to match install1001 (i.e. add DHCP, point codfw's DHCP relays to install2001) (T84380)
  • Set up apt on install2001 too (sync done with rsync in cron)
  • use reprepro hooks to sync apt across and possibly make apt.wikimedia.org HA (perhaps with gdnsd). -> (T158022)
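
As a rough sketch only, the end state of the steps above might look like this in site.pp. The installserver::* and aptrepo::wikimedia role names are the ones applied to carbon; the FQDN used for install1001 and the `mirrors` role name are assumptions for illustration.

```puppet
# Sketch of the target state, not the actual manifests.
# Assumptions: install1001's FQDN and the 'mirrors' role name.
node 'install1001.wikimedia.org' {
    # everything needed to netboot and preseed a host, plus the APT repo
    role(installserver::tftp,
        installserver::dhcp,
        installserver::http,
        installserver::proxy,
        installserver::preseed,
        aptrepo::wikimedia)
}

node 'carbon.wikimedia.org' {
    # mirror only, no install_server roles left (T84817)
    role(mirrors)
}
```

If the reprepro role is later split out of install_server, only the `aptrepo::wikimedia` entry would need to move to a different node.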

Any takers? @Dzahn maybe?

related patches: https://gerrit.wikimedia.org/r/#/q/topic:installserver+%28status:open+OR+status:merged%29

Related Objects

Change 325739 had a related patch set uploaded (by Dzahn):
install: add http & proxy roles on install1001

https://gerrit.wikimedia.org/r/325739

Dzahn added a comment. Dec 7 2016, 6:14 AM
node 'carbon.wikimedia.org' {
    role(installserver::tftp,
        installserver::dhcp,
        installserver::http,
        installserver::proxy,
        installserver::preseed,
        aptrepo::wikimedia)
}

Change 325743 had a related patch set uploaded (by Dzahn):
install: add http & proxy roles on install2001

https://gerrit.wikimedia.org/r/325743

Change 325737 merged by Dzahn:
install: add 'preseed'-role to install1001

https://gerrit.wikimedia.org/r/325737

Change 325864 had a related patch set uploaded (by Dzahn):
install: copy/move apt.wm.org setup to aptrepo module

https://gerrit.wikimedia.org/r/325864

Change 325864 abandoned by Dzahn:
install: copy/move apt.wm.org setup to aptrepo module

Reason:
wrong approach after rethinking

https://gerrit.wikimedia.org/r/325864

Change 325864 restored by Dzahn:
install: copy/move apt.wm.org setup to aptrepo module

https://gerrit.wikimedia.org/r/325864

Change 325739 merged by Dzahn:
install: add http & proxy roles on install1001

https://gerrit.wikimedia.org/r/325739

Change 328429 had a related patch set uploaded (by Dzahn):
install: add hiera override to skip Letsencrypt cert creation

https://gerrit.wikimedia.org/r/328429

Change 328429 merged by Dzahn:
install: add hiera override to skip Letsencrypt cert creation

https://gerrit.wikimedia.org/r/328429

Change 325743 merged by Dzahn:
install: add http & proxy roles on install2001

https://gerrit.wikimedia.org/r/325743

Change 328439 had a related patch set uploaded (by Dzahn):
install/dhcp: switch all "next-server" from carbon to install1001

https://gerrit.wikimedia.org/r/328439

Change 328448 had a related patch set uploaded (by Dzahn):
dhcp: switch private-c-eqiad to install1001 as tftp

https://gerrit.wikimedia.org/r/328448

Change 328448 merged by Dzahn:
dhcp: switch private-c-eqiad to install1001 as tftp

https://gerrit.wikimedia.org/r/328448

Change 328450 had a related patch set uploaded (by Dzahn):
move ganglia aggregator eqiad from carbon to install1001

https://gerrit.wikimedia.org/r/328450

Change 328439 merged by Dzahn:
install/dhcp: switch all "next-server" from carbon to install1001

https://gerrit.wikimedia.org/r/328439

so current status is now:

if in public1-b-eqiad or public1-c-eqiad then install1001 is used for everything (DHCP -> TFTP -> serves installer), if in other subnets we still use carbon as DHCP but then TFTP is also install1001 already

Dzahn added a comment. Dec 21 2016, 8:48 PM

other things we need:

  • move ganglia aggregator
  • change ACLs / ferm rules for webproxy
    • (it's still webproxy 1H IN CNAME carbon.wikimedia.org. in DNS)
  • rsync apt repo again
  • confirm if we still want to reinstall carbon as a "role mirror" as the ticket says or if that is already done because sodium is now a mirror and we just decom carbon
  • ?

Change 328597 had a related patch set uploaded (by Dzahn):
openstack: switch tftp server from carbon to install1001

https://gerrit.wikimedia.org/r/328597

Dzahn edited the task description. Dec 22 2016, 12:35 AM
  • @akosiaris configured switches so that public1-b-eqiad and public1-c-eqiad are using install1001 as DHCP

Correction: private1-c-eqiad, private1-b-codfw

so current status is now:

if in public1-b-eqiad or public1-c-eqiad then install1001 is used for everything (DHCP -> TFTP -> serves installer), if in other subnets we still use carbon as DHCP but then TFTP is also install1001 already

Same correction here as well.

Dzahn added a comment. Dec 22 2016, 4:17 PM

oops, yea, of course, private(!)-eqiad, consider that a typo

Change 328450 abandoned by Dzahn:
move ganglia aggregator eqiad from carbon to install1001

Reason:
duplicate of https://gerrit.wikimedia.org/r/#/c/328599/

https://gerrit.wikimedia.org/r/328450

Change 328599 had a related patch set uploaded (by Dzahn):
ganglia: switch eqiad aggregator from carbon to install1001

https://gerrit.wikimedia.org/r/328599

Change 328597 merged by Dzahn:
openstack: switch tftp server from carbon to install1001

https://gerrit.wikimedia.org/r/328597

Change 328599 merged by Dzahn:
ganglia: switch eqiad aggregator from carbon to install1001

https://gerrit.wikimedia.org/r/328599

Change 333676 had a related patch set uploaded (by Dzahn):
aptrepo: setup rsync between 2 APT servers

https://gerrit.wikimedia.org/r/333676

Dzahn edited the task description. Jan 23 2017, 8:47 PM

Change 333676 merged by Dzahn:
aptrepo: setup rsync between 2 APT servers

https://gerrit.wikimedia.org/r/333676

Dzahn added a comment. Jan 24 2017, 7:45 AM

I moved the eqiad Ganglia aggregator from carbon to install1001 today. This part is unblocked.

Change 334221 had a related patch set uploaded (by Dzahn):
switch install_server to carbon for initial rsync of APT repo data

https://gerrit.wikimedia.org/r/334221

Change 334221 merged by Dzahn:
switch install_server to carbon for initial rsync of APT repo data

https://gerrit.wikimedia.org/r/334221

Change 334237 had a related patch set uploaded (by Dzahn):
aptrepo: fix rsyncd 'hosts allow' syntax

https://gerrit.wikimedia.org/r/334237

Change 334237 merged by Dzahn:
aptrepo: fix rsyncd 'hosts allow' syntax

https://gerrit.wikimedia.org/r/334237

Change 334465 had a related patch set uploaded (by Dzahn):
aptrepo: add second rsync module for entire /srv/

https://gerrit.wikimedia.org/r/334465

Change 334465 merged by Dzahn:
aptrepo: add second rsync module for entire /srv/

https://gerrit.wikimedia.org/r/334465

Change 335372 had a related patch set uploaded (by Dzahn):
add install1002/2002 to replace 1001/2001

https://gerrit.wikimedia.org/r/335372

Change 335372 merged by Dzahn:
add install1002/2002 to replace 1001/2001

https://gerrit.wikimedia.org/r/335372

Change 335376 had a related patch set uploaded (by Dzahn):
add install1002/2002 to replace install1001/2001

https://gerrit.wikimedia.org/r/335376

Change 335376 merged by Papaul:
add install1002/2002 to replace install1001/2001

https://gerrit.wikimedia.org/r/335376

Change 335379 had a related patch set uploaded (by Dzahn):
move install1002 to lower free IP address

https://gerrit.wikimedia.org/r/335379

Change 335379 merged by Dzahn:
move install1002 to lower free IP address

https://gerrit.wikimedia.org/r/335379

Mentioned in SAL (#wikimedia-operations) [2017-02-01T01:15:26Z] <mutante> ganeti: install1001 - remove virtual disk 1 from instance | create instance install1002 instead (T132757)

Change 335386 had a related patch set uploaded (by Dzahn):
DHCP: add install1001,install2001

https://gerrit.wikimedia.org/r/335386

Change 335386 merged by Dzahn:
DHCP: add install1001,install2001

https://gerrit.wikimedia.org/r/335386

Mentioned in SAL (#wikimedia-operations) [2017-02-01T02:48:05Z] <mutante> install1002, install2002 - install jessie, sign puppet certs, initial puppet run (T132757, T156440)

Change 335388 had a related patch set uploaded (by Dzahn):
installserver: add install1002/2002 to hiera

https://gerrit.wikimedia.org/r/335388

Change 335388 merged by Dzahn:
installserver: add install1002/2002 to hiera

https://gerrit.wikimedia.org/r/335388

Mentioned in SAL (#wikimedia-operations) [2017-02-01T22:54:51Z] <mutante> carbon - rsyncing /srv/ data to install1002 (T132757)

Change 334241 had a related patch set uploaded (by Dzahn):
aptrepo: add cron to rsync APT data automatically

https://gerrit.wikimedia.org/r/334241

Change 334241 merged by Dzahn:
aptrepo: add cron to rsync APT data automatically

https://gerrit.wikimedia.org/r/334241

Change 335585 had a related patch set uploaded (by Dzahn):
installserver: add firewall hole for rsync also for IPv6

https://gerrit.wikimedia.org/r/335585

Change 335585 merged by Dzahn:
installserver: add firewall hole for rsync also for IPv6

https://gerrit.wikimedia.org/r/335585

Mentioned in SAL (#wikimedia-operations) [2017-02-02T02:02:09Z] <mutante> carbon - remove unmapped IPv6 address making ferm rules fail, use only the _mapped_ IP (ip addr del 2620:0:861:1:7a2b:cbff:fe09:ea0/64 dev eth0) (T84380 T132757)

Change 335594 had a related patch set uploaded (by Dzahn):
aptrepo: disable autoconfigured EUI64 addresses

https://gerrit.wikimedia.org/r/335594

Change 335594 abandoned by Dzahn:
aptrepo: disable autoconfigured EUI64 addresses

Reason:
it just affects precise so we can just ignore the issue on carbon until it's down. (rsync -4 to work around it for example)

https://gerrit.wikimedia.org/r/335594

Change 335734 had a related patch set uploaded (by Dzahn):
switch apt.wm.org from carbon to install1002

https://gerrit.wikimedia.org/r/335734

Dzahn edited the task description. Feb 3 2017, 12:25 AM

Change 336959 had a related patch set uploaded (by Dzahn):
install/TFTP: use install1002 and install2002 as next-servers

https://gerrit.wikimedia.org/r/336959

Change 337084 had a related patch set uploaded (by Dzahn):
remove install1001/install2001 from site.pp

https://gerrit.wikimedia.org/r/337084

Change 336959 merged by Dzahn:
install/DHCP/TFTP: use install1002 and install2002 as next-servers

https://gerrit.wikimedia.org/r/336959

Change 337084 merged by Dzahn:
remove install1001/install2001 from site.pp

https://gerrit.wikimedia.org/r/337084

Change 337093 had a related patch set uploaded (by Dzahn):
remove install1001 and install2001, keep 2001 mgmt

https://gerrit.wikimedia.org/r/337093

Mentioned in SAL (#wikimedia-operations) [2017-02-10T21:18:27Z] <mutante> install1001, install2001 - revoke puppet certs, puppet node deactivate, delete salt keys (T84380, T132757)

Mentioned in SAL (#wikimedia-operations) [2017-02-10T21:27:47Z] <mutante> install1001, install2001 - removed from Icinga, shutting down (T84380, T132757)

Mentioned in SAL (#wikimedia-operations) [2017-02-10T22:17:16Z] <mutante> install1001 - shutdown ganeti instance and deleting it and its disk (T132757)

Mentioned in SAL (#wikimedia-operations) [2017-02-10T22:29:17Z] <mutante> carbon - stopping puppet and most services, adding deprecation warning to motd, rsyncing data one last time (T132757)

Change 335734 merged by Dzahn:
switch apt.wm.org from carbon to install1002

https://gerrit.wikimedia.org/r/335734

16:02 < mutante> !log switching apt.wikimedia.org from carbon to install1002 - there might be a short time until the LE SSL cert is also adjusted

Change 337197 had a related patch set uploaded (by Dzahn):
install: remove carbon from puppet and netboot

https://gerrit.wikimedia.org/r/337197

Change 325864 abandoned by Dzahn:
install: copy/move apt.wm.org setup to aptrepo module

Reason:
thinking about it again i guess these things should stay in the "webserver" class even though "aptrepo" might sound like it sets up apt.wikimedia.org

https://gerrit.wikimedia.org/r/325864

Mentioned in SAL (#wikimedia-operations) [2017-02-13T19:38:04Z] <mutante> carbon - synced /srv/ data to install1002/2002 for the last time, switching apt.wikimedia.org CNAME to install1002 - carbon deprecated (T132757)

Change 337443 had a related patch set uploaded (by Dzahn):
install: enable Letsencrypt on install1002

https://gerrit.wikimedia.org/r/337443

Change 337443 merged by Dzahn:
install: enable Letsencrypt on install1002

https://gerrit.wikimedia.org/r/337443

Change 337198 had a related patch set uploaded (by Dzahn):
let install1002 be the new source for APT data rsync

https://gerrit.wikimedia.org/r/337198

Change 337198 merged by Dzahn:
let install1002 be the new source for APT data rsync

https://gerrit.wikimedia.org/r/337198

Mentioned in SAL (#wikimedia-operations) [2017-02-13T20:51:57Z] <mutante> carbon/install - adjusted Letsencrypt cert creation, deactivated reprepro to protect from accidental use, switching rsync direction from install1002->install2002, disabled cron on carbon (T132757)
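
A minimal sketch of how the rsync direction could be driven by a single setting, so that switching the source from carbon to install1002 is a one-line change. The class name, the parameter, the FQDN, the rsync module name `install-srv` and the schedule are all hypothetical; only the cron-driven rsync of /srv/ between the APT servers comes from the changes above.

```puppet
# Hypothetical class: every APT server except the primary pulls /srv/
# from the primary once an hour. Names and schedule are assumptions.
class aptrepo_sync_sketch (
    $primary_server = 'install1002.wikimedia.org',
) {
    # The primary never pulls from itself, so it gets no cron job.
    if $::fqdn != $primary_server {
        cron { 'rsync-aptrepo-srv':
            ensure  => present,
            user    => 'root',
            # -a preserves permissions and timestamps; --delete removes
            # files on the replica that no longer exist on the primary.
            command => "/usr/bin/rsync -a --delete rsync://${primary_server}/install-srv/ /srv/ > /dev/null 2>&1",
            # hour is left unset, so the job runs every hour at :30
            minute  => '30',
        }
    }
}
```

With this layout, changing `$primary_server` reverses the sync direction on the next puppet run: the old primary starts pulling and the new primary stops.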

Dzahn edited the task description. Feb 13 2017, 9:52 PM
Dzahn edited the task description.
Dzahn closed this task as "Resolved". Feb 14 2017, 1:07 AM

All the things originally listed in this ticket have been done, except the "make APT HA" one, which has been split out into T158022.

The decom of carbon is tracked in T158020.

Based on that, I am resolving this.