
Replace bast3001
Closed, ResolvedPublic

Description

bast3001 is having disk issues (T154603) that are unlikely to be fixed soon by a site visit (the hardware is super old). esams does have other unused servers, though, and we could use one of those to replace the bastion.

A good start would be hooft. This was our previous bastion, but we switched to slauerhoff when we were unable to reformat hooft to jessie, if memory serves. I think what happened back then with hooft was that the PXE firmware was getting confused by all of the Ganglia UDP packets hooft was receiving, an issue that we only figured out later in a different part of the infrastructure.

Event Timeline

@Dzahn, any chance you could take this?

Any news about this? @Dzahn, I see you already claimed it :)

< mutante> paravoid: i'm back and will try the bast3001 reinstall today. first thing was "should it be install3002 or actually re-use the existing name". i guess keep it 3001 because the inconsistency is ugly..
< mutante> but if it wasn't that i would use new names for new servers

Change 339681 merged by Dzahn:
(re-)add hooft as bast3002

https://gerrit.wikimedia.org/r/339681

Change 339684 had a related patch set uploaded (by Dzahn):
add bast3002 to network constants

https://gerrit.wikimedia.org/r/339684

Change 339687 had a related patch set uploaded (by Dzahn):
dhcp/site: add bast3002

https://gerrit.wikimedia.org/r/339687

Change 339687 merged by Dzahn:
dhcp/site: add bast3002

https://gerrit.wikimedia.org/r/339687

Change 339698 had a related patch set uploaded (by Dzahn):
install: don't use http install method for bast3002

https://gerrit.wikimedia.org/r/339698

Change 339698 merged by Dzahn:
install: don't use http install method for bast3002

https://gerrit.wikimedia.org/r/339698

I was able to install jessie on the-server-formerly-known-as-hooft as "bast3002". It did not work over http. Over tftp it was still very slow and needed patience but did eventually finish.

Debian GNU/Linux 8 bast3002 ttyS1

bast3002 login:

Mentioned in SAL (#wikimedia-operations) [2017-02-25T01:43:26Z] <mutante> bast3002 - sign puppet cert, initial run with basic "bastion" role, to replace broken bast3001, but WIP, ganglia/prometheus roles not moved yet (T156506)

[bast3002:~] $ gen_fingerprints
+---------+---------+-------------------------------------------------+
| Cipher  | Algo    | Fingerprint                                     |
+---------+---------+-------------------------------------------------+
| RSA     | MD5     | 5e:2b:0b:da:fa:16:c2:9d:0e:f2:a0:ab:42:4d:b7:17 |
| RSA     | SHA-256 | 3IMu0Zs5cTA6V4k81wUpsEihM3ZP5WMj7gn8V7Nwy/0=    |
+---------+---------+-------------------------------------------------+
| DSA     | MD5     | b7:65:77:e7:a6:2e:af:e1:ed:3f:74:a8:14:57:83:9d |
| DSA     | SHA-256 | Xh9tdp6FrtaFwblK5s2fixW1a0AKqXxcbu7uksuqPhM=    |
+---------+---------+-------------------------------------------------+
| ECDSA   | MD5     | fc:8a:2b:af:ea:e1:27:72:1d:d5:25:0f:e3:0f:ab:d9 |
| ECDSA   | SHA-256 | 4jFetkjXoXVKbwm5mhwzdDVWTd+ejLIBdeujmz7cvLo=    |
+---------+---------+-------------------------------------------------+
| ED25519 | MD5     | 07:f4:7b:af:18:f9:74:8c:bc:b1:2a:94:db:6d:b2:c8 |
| ED25519 | SHA-256 | Y0mvj3+P7/yP2C9n681H5goh4wwvkkGKXyl7KHOx0AA=    |
+---------+---------+-------------------------------------------------+
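For anyone who wants to check a key against the table above without access to `gen_fingerprints`, here is a minimal sketch of how the two fingerprint formats are usually derived from an OpenSSH public key line. It is not the source of `gen_fingerprints`, whose implementation is not shown in this task; the function name `fingerprints` is hypothetical. It assumes the common convention that the MD5 form is the colon-separated hex digest of the decoded key blob and the SHA-256 form is the base64 digest of the same blob (padding kept, to match the table).

```python
import base64
import hashlib


def fingerprints(pubkey_line: str) -> dict:
    """Return MD5 and SHA-256 fingerprints for one OpenSSH public key line.

    Hypothetical helper for illustration; not the gen_fingerprints script.
    """
    # A public key line looks like: "<key type> <base64 blob> [comment]".
    fields = pubkey_line.split()
    key_type = fields[0]
    blob = base64.b64decode(fields[1])

    # MD5 style: hex digest of the raw blob, split into colon-separated bytes.
    md5_hex = hashlib.md5(blob).hexdigest()
    md5 = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))

    # SHA-256 style: base64 of the raw digest, with "=" padding kept.
    sha256 = base64.b64encode(hashlib.sha256(blob).digest()).decode("ascii")

    return {"type": key_type, "MD5": md5, "SHA-256": sha256}
```

To use it, pass one line from a host's public key file (for example the contents of an `ssh_host_*_key.pub` file) and compare the result with the matching row of the table.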

Next is https://gerrit.wikimedia.org/r/#/c/339684/ and moving these roles from 3001 to 3002, then shutting down 3001:

- installserver::tftp
- prometheus::ops
- ganglia::monitor::aggregator

Any preference on whether the final step should be renaming 3002 back to 3001, leaving it as 3002, or adding a CNAME from 3001 to 3002?

Change 340163 had a related patch set uploaded (by Dzahn; owner: Dzahn):
ganglia: move esams aggregator from bast3001 to bast3002

https://gerrit.wikimedia.org/r/340163

Change 340165 had a related patch set uploaded (by Dzahn; owner: Dzahn):
install/bast: move tftp server from bast3001 to bast3002

https://gerrit.wikimedia.org/r/340165

Change 340166 had a related patch set uploaded (by Dzahn; owner: Dzahn):
install/prometheus: move prometheus::ops from bast3001 to 3002

https://gerrit.wikimedia.org/r/340166

Change 340169 had a related patch set uploaded (by Dzahn; owner: Dzahn):
install: remove bast3001 from puppet and smokeping

https://gerrit.wikimedia.org/r/340169

Change 339684 merged by Dzahn:
add bast3002 to network constants

https://gerrit.wikimedia.org/r/339684

Change 340173 had a related patch set uploaded (by Dzahn; owner: Dzahn):
prometheus: add bast3002 as second esams host

https://gerrit.wikimedia.org/r/340173

Change 340173 merged by Dzahn:
prometheus: add bast3002 as second esams host

https://gerrit.wikimedia.org/r/340173

Change 340163 merged by Dzahn:
ganglia: move esams aggregator from bast3001 to bast3002

https://gerrit.wikimedia.org/r/340163

Change 340165 merged by Dzahn:
install/bast: move tftp server from bast3001 to bast3002

https://gerrit.wikimedia.org/r/340165

Change 340166 merged by Dzahn:
install/prometheus: add prometheus::ops to bast3002

https://gerrit.wikimedia.org/r/340166

Mentioned in SAL (#wikimedia-operations) [2017-02-28T02:18:40Z] <mutante> rsyncing prometheus metrics data from bast3001 to bast3002 (T156506)

Change 340272 had a related patch set uploaded (by Dzahn; owner: Dzahn):
switch prometheus.eqiad to bast3002

https://gerrit.wikimedia.org/r/340272

Change 340272 merged by Dzahn:
[operations/dns] switch prometheus.esams to bast3002

https://gerrit.wikimedia.org/r/340272

Change 340169 merged by Dzahn:
[operations/puppet] smokeping: replace bast3001 with bast3002

https://gerrit.wikimedia.org/r/340169

Change 340811 had a related patch set uploaded (by Dzahn):
[operations/puppet] prometheus: remove bast3001 as esams server, keep bast3002

https://gerrit.wikimedia.org/r/340811

Change 340811 merged by Dzahn:
[operations/puppet] prometheus: remove bast3001 as esams node, keep bast3002

https://gerrit.wikimedia.org/r/340811

Change 340812 had a related patch set uploaded (by Dzahn):
[operations/puppet] bast3001: remove puppet roles, add role::spare for decom

https://gerrit.wikimedia.org/r/340812

Change 340813 had a related patch set uploaded (by Dzahn):
[operations/puppet] bast3001: remove from network/constants.pp

https://gerrit.wikimedia.org/r/340813

Change 340812 merged by Dzahn:
[operations/puppet] bast3001: remove puppet roles, add role::spare for decom

https://gerrit.wikimedia.org/r/340812

Change 340833 had a related patch set uploaded (by Dzahn):
[operations/puppet] bastion: rsync home dir data bast3001->bast3002

https://gerrit.wikimedia.org/r/340833

Change 340833 merged by Dzahn:
[operations/puppet] bastion: rsync home dir data bast3001->bast3002

https://gerrit.wikimedia.org/r/340833

Change 340842 had a related patch set uploaded (by Dzahn):
[operations/puppet] bast3002: remove bastionhost::migration role

https://gerrit.wikimedia.org/r/340842

bast3001 has been replaced by bast3002 for all practical purposes (the prometheus and ganglia roles have moved too).

I copied the home dir data, mailed the ops list about it, edited the wikitech pages, and pasted the new fingerprints above.

I created a follow-up decom ticket at T159480.

The last step here will be removing bast3001 from the firewall rules: https://gerrit.wikimedia.org/r/#/c/340813/

Then I will hand over the decom steps to dc-ops.

Change 340842 merged by Dzahn:
[operations/puppet] bast3002: remove bastionhost::migration role

https://gerrit.wikimedia.org/r/340842

Mentioned in SAL (#wikimedia-operations) [2017-03-02T22:09:37Z] <mutante> bast3002 - stop rsyncd, remove rsyncd config snippets (T156506)

Change 340813 merged by Dzahn:
[operations/puppet] bast3001: remove from network/constants.pp

https://gerrit.wikimedia.org/r/340813

Change 341451 had a related patch set uploaded (by dzahn):
[operations/puppet] delete unused bastionhost::migration class

https://gerrit.wikimedia.org/r/341451

Change 341451 merged by Dzahn:
[operations/puppet] delete unused bastionhost::migration class

https://gerrit.wikimedia.org/r/341451