Upgrade all mw* servers to debian jessie
Closed, Resolved · Public

Description

We should upgrade the entire MediaWiki layer to Debian jessie. The clusters are:

  • Deployment servers (tin/mira)
  • Script servers (terbium/wasat)
  • Appservers
  • Api appservers
  • Jobrunners
  • Imagescalers
  • Videoscalers

In most of these cases, it's just a matter of reimaging a large number of servers.

Event Timeline

Joe created this task. Aug 22 2016, 8:03 AM
Restricted Application added a subscriber: Aklapper. Aug 22 2016, 8:03 AM
Joe added a comment. Aug 22 2016, 8:18 AM

Number of systems to reimage:

  • Jobrunners: 25
  • Appservers: 123
  • Api: 97
  • Videoscalers: 3
  • Script/deployment servers: 4

So we need to reimage 252 servers. The videoscalers and the script/deployment servers need some preparation work before they can be reimaged, but everything else (the bulk of the machines) can be reimaged rather mechanically using wmf-reimage right now.
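As a quick sanity check of the totals above, here is a minimal sketch that sums the per-cluster counts from this comment. The variable names are illustrative only, and the split into "needs preparation" versus "mechanical" follows the wording of the comment rather than any tooling.

```python
# Per-cluster counts copied from the list in this comment.
hosts_to_reimage = {
    "Jobrunners": 25,
    "Appservers": 123,
    "Api": 97,
    "Videoscalers": 3,
    "Script/deployment servers": 4,
}

# Clusters the comment flags as needing preparation work first.
needs_prep = {"Videoscalers", "Script/deployment servers"}

total = sum(hosts_to_reimage.values())
mechanical = sum(
    count for name, count in hosts_to_reimage.items() if name not in needs_prep
)

print(f"total: {total}, mechanical: {mechanical}")
```

With these numbers, 245 of the 252 hosts fall into the group that can be reimaged without preparation work.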

Six of the image scalers in codfw also need to be reimaged: all of them except mw208[67].
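For readers unfamiliar with the `mw208[67]` shorthand, the sketch below expands a single bracketed character class into explicit host names. `expand_host_pattern` is a hypothetical helper written for this illustration; it is not part of wmf-reimage or salt, and it deliberately rejects ranges such as `[1-3]`.

```python
import re
from typing import List

# Matches exactly one bracketed class of letters/digits, e.g. "mw208[67]".
_PATTERN = re.compile(r"([^\[\]]*)\[([A-Za-z0-9]+)\]([^\[\]]*)")


def expand_host_pattern(pattern: str) -> List[str]:
    """Expand one [..] character class into the host names it denotes."""
    if "[" not in pattern and "]" not in pattern:
        # Plain host name, nothing to expand.
        return [pattern]
    match = _PATTERN.fullmatch(pattern)
    if match is None:
        # Ranges, nested or multiple classes are out of scope for this sketch.
        raise ValueError(f"unsupported pattern: {pattern!r}")
    prefix, choices, suffix = match.groups()
    return [f"{prefix}{choice}{suffix}" for choice in choices]


print(expand_host_pattern("mw208[67]"))
```

Here the pattern stands for the two hosts mw2086 and mw2087, which are the ones excluded from the reimage.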

Papaul added a subscriber: Papaul. Aug 22 2016, 4:41 PM

@Joe I can help with the re-image

greg added a subscriber: greg. Aug 22 2016, 8:49 PM

Out of curiosity (and so we know what to do with MW-Vagrant), what's the timeline for this?

elukey added a subscriber: elukey. Aug 26 2016, 8:53 AM

Change 307482 had a related patch set uploaded (by Volans):
Reimaging: add option to reboot after the reimage

https://gerrit.wikimedia.org/r/307482

Change 307482 abandoned by Volans:
Reimaging: add option to reboot after the reimage

Reason:
Changed approach: creating a Python script to be run on the salt master instead.

https://gerrit.wikimedia.org/r/307482

Change 308520 had a related patch set uploaded (by Volans):
Automation: automatically reimage host

https://gerrit.wikimedia.org/r/308520

Change 308554 had a related patch set uploaded (by Volans):
Salt: reducing permissions on the master's Job cache

https://gerrit.wikimedia.org/r/308554

Volans moved this task from Backlog to In Code Review on the SRE-tools board.

Change 308554 merged by Volans:
Salt: reducing permissions on the master's Job cache

https://gerrit.wikimedia.org/r/308554

MoritzMuehlenhoff triaged this task as High priority. Sep 6 2016, 12:58 PM

Mentioned in SAL [2016-09-12T10:07:53Z] <volans> reimage mw2198, mw2199 to Jessie (again) T143536

All mw* servers in codfw with the exception of mw2152 (the video scaler) are now running jessie.

Change 308520 merged by Volans:
Automation: automatically reimage host

https://gerrit.wikimedia.org/r/308520

Change 310278 had a related patch set uploaded (by Volans):
Salt: fix source path typo

https://gerrit.wikimedia.org/r/310278

Change 310278 merged by Volans:
Salt: fix source path typo

https://gerrit.wikimedia.org/r/310278

Change 310283 had a related patch set uploaded (by Volans):
Salt: include password module

https://gerrit.wikimedia.org/r/310283

Change 310283 merged by Volans:
Salt: include password module

https://gerrit.wikimedia.org/r/310283

Script wmf_auto_reimage was launched by volans on neodymium.eqiad.wmnet for hosts:

['mw2198.codfw.wmnet', 'mw2199.codfw.wmnet']

The log can be found in /var/log/wmf_auto_reimage.log.

Script wmf_auto_reimage was launched by jmm on neodymium.eqiad.wmnet for hosts:

['mw2100.codfw.wmnet']

The log can be found in /var/log/wmf_auto_reimage.log.

Completed auto-reimage of hosts:

['mw2198.codfw.wmnet', 'mw2199.codfw.wmnet']

Those hosts were successful:

['mw2199.codfw.wmnet']

To set back the conftool status to their previous values run:

confctl --quiet select 'name=mw2199.codfw.wmnet' set/pooled=yes
confctl --quiet select 'name=mw2198.codfw.wmnet' set/pooled=yes
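The report above lists two requested hosts but only one successful one. A minimal sketch of how to derive the hosts that were not reported as successful, using the two lists exactly as printed in the report (the variable names are illustrative):

```python
# Lists copied from the auto-reimage report above.
requested = ["mw2198.codfw.wmnet", "mw2199.codfw.wmnet"]
successful = ["mw2199.codfw.wmnet"]

# Keep the original order while filtering out the successful hosts.
successful_set = set(successful)
not_successful = [host for host in requested if host not in successful_set]

print(not_successful)
```

This only says which hosts are missing from the success list; the reason has to be looked up in /var/log/wmf_auto_reimage.log.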

Completed auto-reimage of hosts:

['mw2100.codfw.wmnet']

Those hosts were successful:

['mw2100.codfw.wmnet']

To set back the conftool status to their previous values run:

confctl --quiet select 'name=mw2100.codfw.wmnet' set/pooled=yes
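The repool commands printed in these reports follow a fixed shape, so they can be generated from a host list. The sketch below is an assumption-laden illustration: `repool_commands` is a hypothetical helper, not the code used by wmf_auto_reimage, and `pooled=yes` is simply the value shown in the reports above. It only builds the command strings and does not run them.

```python
import re
from typing import Iterable, List

# Conservative host name check so nothing unexpected ends up inside the quotes.
_HOSTNAME = re.compile(r"[a-z0-9.-]+")


def repool_commands(hosts: Iterable[str], pooled: str = "yes") -> List[str]:
    """Build one confctl command per host, in the shape shown in the report."""
    commands = []
    for host in hosts:
        if _HOSTNAME.fullmatch(host) is None:
            raise ValueError(f"unexpected host name: {host!r}")
        commands.append(
            f"confctl --quiet select 'name={host}' set/pooled={pooled}"
        )
    return commands


for command in repool_commands(["mw2100.codfw.wmnet"]):
    print(command)
```

Printing the commands instead of executing them leaves the decision to repool with the operator, which matches how the report presents them.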

Change 310309 had a related patch set uploaded (by Volans):
Automation: improve wmf-auto-reimage

https://gerrit.wikimedia.org/r/310309

Change 310309 merged by Volans:
Automation: improve wmf-auto-reimage

https://gerrit.wikimedia.org/r/310309

Change 310587 had a related patch set uploaded (by Volans):
Salt: use puppetmaster CNAME

https://gerrit.wikimedia.org/r/310587

Change 310587 merged by Volans:
Salt: use puppetmaster CNAME

https://gerrit.wikimedia.org/r/310587

Change 310680 had a related patch set uploaded (by Volans):
auto-reimage: improved output

https://gerrit.wikimedia.org/r/310680

Change 310680 merged by Volans:
auto-reimage: improved output

https://gerrit.wikimedia.org/r/310680

Change 310768 had a related patch set uploaded (by Volans):
Auto-reimage: increase timeout for Icinga command

https://gerrit.wikimedia.org/r/310768

Change 310768 merged by Volans:
Auto-reimage: increase timeout for Icinga command

https://gerrit.wikimedia.org/r/310768

Volans moved this task from In Code Review to Done on the SRE-tools board. Sep 17 2016, 5:57 PM

Change 311605 had a related patch set uploaded (by Volans):
Reimage: improve output in case of errors

https://gerrit.wikimedia.org/r/311605

Change 311605 merged by Volans:
Reimage: improve output in case of errors

https://gerrit.wikimedia.org/r/311605

Change 311701 had a related patch set uploaded (by Volans):
Reimage: minor improvements

https://gerrit.wikimedia.org/r/311701

Change 311701 merged by Volans:
Reimage: minor improvements

https://gerrit.wikimedia.org/r/311701

Change 311940 had a related patch set uploaded (by Volans):
Reimage: fix import

https://gerrit.wikimedia.org/r/311940

Change 311940 merged by Volans:
Reimage: fix import

https://gerrit.wikimedia.org/r/311940

elukey updated the task description. Oct 19 2016, 12:46 PM
Volans added a subscriber: Volans.

What is the status of terbium? From the summary it appears to have been upgraded, but the host is still running trusty.

MoritzMuehlenhoff closed this task as Resolved. Aug 29 2017, 11:44 AM
MoritzMuehlenhoff claimed this task.
MoritzMuehlenhoff updated the task description.

Migration to jessie is complete.