Reboot irc.wikimedia.org for kernel upgrades
Closed, Resolved · Public

Description

We need to reboot irc.wikimedia.org (along with many other hosts) to apply some long-standing kernel updates. Downtime is expected to be a few minutes, as the reboot will happen alongside the reboot of the entire codfw ganeti infrastructure.

Normally this would be easy and would not require a ticket, if not for the nature of the service. Hence this task is being filed as advance notice for interested parties. https://wikitech.wikimedia.org/wiki/Irc.wikimedia.org has some information about the things that may need extra help recovering. We should probably update whatever is needed there before (and possibly after) the reboot.

This will happen on June 21st.

Details

Related Gerrit Patches:
operations/puppet : production · Switch codfw puppetdb hosts to eqiad

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 12 2017, 9:19 AM
akosiaris triaged this task as Normal priority. · Jun 12 2017, 9:19 AM
Johan added a subscriber: Johan. · Jun 14 2017, 2:02 AM

(As this has been marked with user-notice.)

How tentative is that date? Is it worth spreading the word and mentioning a specific date yet?

It depends on a kernel upgrade due to be released on the 19th. That release has already been rescheduled once. I wish I could provide a degree of certainty, but it does not depend on something we control. It does look like it's not going to be rescheduled again, though. I would suggest we spread the word and just re-spread it if we have to reschedule.

The date is no longer tentative. It's now fixed.

Johan added a comment. · Jun 20 2017, 1:22 PM

It was included in the issue of Tech News that went out yesterday.

Gestrid updated the task description. · Jun 20 2017, 2:38 PM

Per @akosiaris' last comment, I've taken the liberty of updating the task description to reflect the now non-tentative date.

This is happening now.

Mentioned in SAL (#wikimedia-operations) [2017-06-21T10:35:28Z] <akosiaris> rebooting the entire codfw ganeti cluster for kernel upgrades. Silenced hosts in icinga already. T167643

Change 360629 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Switch codfw puppetdb hosts to eqiad

https://gerrit.wikimedia.org/r/360629

Change 360629 merged by Alexandros Kosiaris:
[operations/puppet@production] Switch codfw puppetdb hosts to eqiad

https://gerrit.wikimedia.org/r/360629

This has been completed successfully. I see 92 users (well, bots actually/probably) already in #en.wikipedia and another 64 in #de.wikipedia. ClueBot_NG is among them. That's a bit lower than the 120 and 75 respectively we had before the reboot, but that's probably down to clients not handling disconnections very well. Overall this has gone very well. I am going to leave the task open for a few days for monitoring.
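
For a rough sense of scale, the channel counts quoted above can be turned into reconnection ratios. This is only a minimal sketch: the helper name `reconnect_ratio` is hypothetical, and the numbers are simply the before/after counts from the comment, not output from any monitoring tool.

```python
def reconnect_ratio(before: int, after: int) -> float:
    """Fraction of pre-reboot clients present again after the reboot."""
    return after / before


# (before, after) user counts per channel, as quoted in the comment above.
counts = {
    "#en.wikipedia": (120, 92),
    "#de.wikipedia": (75, 64),
}

for channel, (before, after) in counts.items():
    ratio = reconnect_ratio(before, after)
    print(f"{channel}: {after}/{before} clients back ({ratio:.1%})")
```

By this measure roughly three quarters of the #en.wikipedia clients and a bit over four fifths of the #de.wikipedia clients had reconnected at the time of the comment.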

akosiaris closed this task as Resolved. · Jun 30 2017, 8:44 AM

No issues reported in a week, resolving.