
Migrate hydrogen/chromium to jessie
Closed, Resolved (Public)


Still on precise, migrate to jessie (maerlant and nescio are already on jessie, others are using trusty)

Event Timeline

MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff raised the priority of this task from to Needs Triage.
MoritzMuehlenhoff added a project: Operations.
Restricted Application added subscribers: StudiesWorld, Aklapper. Jan 15 2016, 12:21 PM

since these are dnsrecursors (in addition to urldownloader), what steps have to be taken before one of them can be taken down for reinstall? any?

Restricted Application added a subscriber: TerraCodes. Apr 19 2016, 8:39 PM

For the dnsrec service the server should be depooled via confctl. For NTP all our servers are configured to use multiple NTP servers, so as long as only one system is being reimaged at a time, it should be fine.

fgiunchedi triaged this task as Normal priority. Apr 27 2016, 2:58 PM

Change 285753 had a related patch set uploaded (by Dzahn):
installserver: let hydrogen use jessie installer

For the dnsrec service the server should be depooled via confctl.


[palladium:~] $ sudo confctl --tags dc=eqiad,cluster=dns,service=pdns_recursor --action get
{"": {"pooled": "yes", "weight": 10}, "tags": "dc=eqiad,cluster=dns,service=pdns_recursor"}

set (not executed) would then be:

sudo confctl --tags dc=eqiad,cluster=dns,service=pdns_recursor --action set/pooled=no

and no edit needed in the puppet repo in conftool-data.. does that all seem right?
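A minimal sketch of guarding the depool step, assuming confctl prints one JSON object per line in the shape pasted above. The helper names `is_pooled` and `depool_if_pooled` and the `CONFCTL` override are hypothetical, not part of conftool. The selector is passed through unchanged, so the set action applies to whatever those tags match.

```shell
# Hypothetical helper: succeeds if any confctl JSON line on stdin reports pooled=yes.
is_pooled() {
  grep -Eq '"pooled"[[:space:]]*:[[:space:]]*"yes"'
}

# Hypothetical wrapper around the get and set calls shown above.
# CONFCTL can be overridden (for example with "echo") to rehearse without changing state.
depool_if_pooled() {
  local tags=$1
  if ${CONFCTL:-sudo confctl} --tags "$tags" --action get | is_pooled; then
    ${CONFCTL:-sudo confctl} --tags "$tags" --action set/pooled=no
  else
    echo "nothing pooled for $tags" >&2
    return 1
  fi
}
```

Called as `depool_if_pooled dc=eqiad,cluster=dns,service=pdns_recursor`, this only issues the set when the preceding get reports something as pooled.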

Change 285753 merged by Dzahn:
installserver: let hydrogen use jessie installer

hydrogen and chromium also appear on T136562 for not having RAID.

that should be done as part of this ticket too

we picked hydrogen to start with.

removes it from /etc/resolv.conf on LVS servers

after that we are going to depool it

hydrogen was in netboot.cfg twice with different partman recipe

had to racreset to see console output after reboot, booted into PXE, reinstalled with jessie now

re-added to puppet, re-added to salt

[hydrogen:~] $ gen_fingerprints
| Cipher | Algo | Fingerprint |
| RSA | MD5 | 3d:b7:20:30:7f:3b:d4:78:6d:b0:f9:96:2b:f9:32:00 |
| RSA | SHA-256 | 7tJdX+OpxpRab4RniQJdC0gh4xwEO5anOMjRPHAhZ9o= |
| DSA | MD5 | d8:b1:0f:dd:7e:3a:06:09:97:82:6c:0b:32:e3:f5:d1 |
| DSA | SHA-256 | fJQI1z+Fc8NXy7oatkUGZqA1wZpZYiIxql08X8qw/L0= |
| ECDSA | MD5 | 23:99:0d:af:94:0d:1f:33:4b:e8:bb:c6:a2:ec:50:23 |
| ECDSA | SHA-256 | Pg6ebLcSEsMAqo7PAC4SAdoPOwFg7Z+JnYzwo4bcMQM= |
| ED25519 | MD5 | 64:f1:05:5e:19:fa:c3:d5:6f:14:7f:c9:7d:d2:50:09 |
| ED25519 | SHA-256 | AEo1xsX4w0bBOKtmDPSld8/87rh8ibTYpA+g9zaaBgg= |

Dzahn added a comment. Edited Aug 23 2016, 10:01 PM

21:05 mutante: hydrogen - reinstall finished, re-added to salt, restarted ntpd
20:42 mutante: hydrogen - signing new puppet cert
20:22 mutante: hydrogen - reinstalling one more time, wrong partitioning
19:55 mutante: re-signing new puppet certs for hydrogen, initial run, new salt key

installed a second time. now with RAID (/dev/md0)

restarted NTP server, checked that it was in sync with chromium.. icinga recovered..

checked that /etc/powerdns was populated, service is running after second puppet run

tested with dig from palladium that it answers requests
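A sketch of what such a dig check could look like before repooling. The function name `recursor_answers` and the `DIG` override are hypothetical; the server address and query name are left as arguments because the thread does not give them.

```shell
# Hypothetical check: does the recursor at $1 return at least one A record for $2?
# DIG can be overridden for rehearsal; by default the real dig binary is used.
recursor_answers() {
  local server=$1 name=$2 answer
  # +short prints only the answer data; +time/+tries keep a dead server from hanging the check.
  answer=$(${DIG:-dig} +short +time=2 +tries=1 "@${server}" "$name" A) || return 1
  [ -n "$answer" ]
}
```

For example, `recursor_answers <recursor address> <some hostname> && echo ok` would print `ok` only if the freshly installed recursor returned an answer.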

21:27 logmsgbot: dzahn@palladium conftool action : set/pooled=yes; selector: dc=eqiad,cluster=dns,

saw traffic coming back in ganglia

waited a little while and then reverted the LVS config change:

ran puppet on lvs100x...

NTP in sync with chromium:

root@hydrogen:~# ntpdc -c peers | grep chrom
+chromium.wikime 2620:0:861:1:20 3 128 377 0.00009 -0.012633 0.08954

and Icinga: NTP OK: Offset -0.006912 secs
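The same sync check can be scripted. This is a sketch that assumes the offset is the 7th whitespace-separated column of `ntpdc -c peers` output, as in the line pasted above; the function name `peer_offset_ok` and the threshold are hypothetical and not what Icinga uses.

```shell
# Hypothetical check: read `ntpdc -c peers` lines on stdin and succeed only if
# the line whose first column contains the given peer name has |offset| <= max seconds.
peer_offset_ok() {
  local peer=$1 max=$2
  awk -v peer="$peer" -v max="$max" '
    index($1, peer) {
      off = $7 + 0                 # offset column, forced to a number
      if (off < 0) off = -off      # compare the absolute value
      found = 1
      ok = (off <= max + 0)
    }
    END { exit !(found && ok) }    # fail if the peer is missing or out of bounds
  '
}
```

Usage would be along the lines of `ntpdc -c peers | peer_offset_ok chrom 0.05`.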

also, counter increasing here on lvs1002

root@lvs1002:~# ipvsadm -Ln -u
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP wrr
  ->             Route   10     0          1215214   
  ->            Route   10     0          2709388
Dzahn set Security to None.
Dzahn claimed this task.
Dzahn added a comment. Edited Aug 24 2016, 11:50 PM

21:27 mutante: depooling chromium for reinstall. scheduled downtime for host and service IPs
21:50 mutante: running puppet on lvs servers, removing chromium from resolv.conf for reinstall
22:05 mutante: stopping puppet and pdns-recursor on chromium
22:20 mutante: rebooting chromium into PXE

22:49 mutante: chromium - revoking and re-signing puppet certs, salt keys, initial puppet run..

reinstalled with jessie

< mutante> !log chromium - install ntpdate, stop ntp, sync time with hydrogen, start ntp, remove ntpdate
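The sequence in that log line could be sketched roughly as follows. The package and service names (`ntpdate`, `ntp`) come from the log entry; the exact commands, the `run` and `resync_ntp` helpers, and the `DRY_RUN` switch are assumptions, and the peer host name is left as an argument. By default the sketch only prints what it would do.

```shell
# Hypothetical runner: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical one-shot resync against a given peer, stopping at the first failing step.
resync_ntp() {
  local peer=$1
  run apt-get install -y ntpdate &&
  run service ntp stop &&          # ntpd has to be stopped before ntpdate can set the clock
  run ntpdate "$peer" &&
  run service ntp start &&
  run apt-get remove -y ntpdate
}
```

`resync_ntp <peer host>` prints the five steps; `DRY_RUN=0 resync_ntp <peer host>` as root would actually run them.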

< icinga-wm> RECOVERY - NTP peers on chromium is OK: NTP OK: Offset -0.00151 secs

16:49 < logmsgbot> !log dzahn@palladium conftool action : set/pooled=yes; selector:

Dzahn closed this task as Resolved. Aug 25 2016, 12:09 AM

17:14 < mutante> !log chromium back in service - both eqiad DNS recursors now on jessie