
Upgrade all mw* servers to debian jessie
Closed, Resolved (Public)

Description

We should upgrade the entire MediaWiki layer to Debian jessie. The clusters are:

  • Deployment servers (tin/mira)
  • Script servers (terbium/wasat)
  • Appservers
  • Api appservers
  • Jobrunners
  • Imagescalers
  • Videoscalers

In most of these cases, it's just a matter of reimaging a large number of servers.

Event Timeline

Number of systems to reimage:

  • Jobrunners: 25
  • Appservers: 123
  • Api: 97
  • Videoscalers: 3
  • Script/deployment servers: 4

So we need to reimage 252 servers. The videoscalers and the script/deployment servers need some preparation work before they can be reimaged, but everything else (the vast bulk of the machines) can be reimaged rather mechanically using wmf-reimage right now.
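As a quick sanity check on the arithmetic, the per-cluster counts listed above can be summed in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the split into "needs preparation" versus "mechanical" simply follows the sentence above.

```python
# Per-cluster counts copied from the list in this comment.
reimage_counts = {
    "Jobrunners": 25,
    "Appservers": 123,
    "Api": 97,
    "Videoscalers": 3,
    "Script/deployment servers": 4,
}

# Clusters described as needing preparation work before a reimage.
needs_prep = {"Videoscalers", "Script/deployment servers"}

total = sum(reimage_counts.values())
mechanical = sum(
    count for name, count in reimage_counts.items() if name not in needs_prep
)

print(f"total to reimage: {total}")
print(f"reimageable mechanically right now: {mechanical}")
```

With these numbers, 245 of the 252 hosts fall into the mechanical group.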

Six of the image scalers in codfw also need to be reimaged: all of them except mw208[67].
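For anyone unfamiliar with the bracket shorthand, here is a minimal sketch of how a pattern such as mw208[67] could be expanded into individual host names. It assumes the brackets are a glob-style character class (one character chosen from the set); `expand_hosts` is a hypothetical helper, not part of any tool mentioned in this task, and it does not handle ranges like [0-9].

```python
import re


def expand_hosts(pattern):
    """Expand a single bracket class, e.g. 'mw208[67]', into host names.

    Hypothetical helper for illustration only. Patterns without a
    bracket class are returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        return [pattern]
    prefix, choices, suffix = match.groups()
    # One host per character inside the brackets.
    return [f"{prefix}{choice}{suffix}" for choice in choices]


excluded = expand_hosts("mw208[67]")
print(excluded)
```

Under that reading, the two excluded hosts are mw2086 and mw2087.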

Out of curiosity (and so we know what to do with MW-Vagrant): what's the timeline for this?

Change 307482 had a related patch set uploaded (by Volans):
Reimaging: add option to reboot after the reimage

https://gerrit.wikimedia.org/r/307482

Change 307482 abandoned by Volans:
Reimaging: add option to reboot after the reimage

Reason:
Changed approach; creating a Python script to be run on the salt master instead

https://gerrit.wikimedia.org/r/307482

Change 308520 had a related patch set uploaded (by Volans):
Automation: automatically reimage host

https://gerrit.wikimedia.org/r/308520

Change 308554 had a related patch set uploaded (by Volans):
Salt: reducing permissions on the master's Job cache

https://gerrit.wikimedia.org/r/308554

Change 308554 merged by Volans:
Salt: reducing permissions on the master's Job cache

https://gerrit.wikimedia.org/r/308554

Mentioned in SAL [2016-09-12T10:07:53Z] <volans> reimage mw2198, mw2199 to Jessie (again) T143536

All mw* servers in codfw with the exception of mw2152 (the video scaler) are now running jessie.

Change 308520 merged by Volans:
Automation: automatically reimage host

https://gerrit.wikimedia.org/r/308520

Change 310278 had a related patch set uploaded (by Volans):
Salt: fix source path typo

https://gerrit.wikimedia.org/r/310278

Change 310278 merged by Volans:
Salt: fix source path typo

https://gerrit.wikimedia.org/r/310278

Change 310283 had a related patch set uploaded (by Volans):
Salt: include password module

https://gerrit.wikimedia.org/r/310283

Change 310283 merged by Volans:
Salt: include password module

https://gerrit.wikimedia.org/r/310283

Script wmf_auto_reimage was launched by volans on neodymium.eqiad.wmnet for hosts:

['mw2198.codfw.wmnet', 'mw2199.codfw.wmnet']

The log can be found in /var/log/wmf_auto_reimage.log.

Script wmf_auto_reimage was launched by jmm on neodymium.eqiad.wmnet for hosts:

['mw2100.codfw.wmnet']

The log can be found in /var/log/wmf_auto_reimage.log.

Completed auto-reimage of hosts:

['mw2198.codfw.wmnet', 'mw2199.codfw.wmnet']

Those hosts were successful:

['mw2199.codfw.wmnet']

To set back the conftool status to their previous values run:

confctl --quiet select 'name=mw2199.codfw.wmnet' set/pooled=yes
confctl --quiet select 'name=mw2198.codfw.wmnet' set/pooled=yes

Completed auto-reimage of hosts:

['mw2100.codfw.wmnet']

Those hosts were successful:

['mw2100.codfw.wmnet']

To set back the conftool status to their previous values run:

confctl --quiet select 'name=mw2100.codfw.wmnet' set/pooled=yes
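When several hosts come back from a reimage at once, the repool command lines shown in these reports can be generated rather than typed by hand. The sketch below only builds the command strings in the same shape as the reports above; it does not run confctl. `repool_commands` is a hypothetical helper, and `pooled=yes` is assumed to be the previous value, as it was for the hosts in this task.

```python
def repool_commands(hosts, pooled="yes"):
    """Build (but do not run) one confctl command line per host.

    Hypothetical helper; the command shape is copied from the
    auto-reimage reports in this task.
    """
    return [
        f"confctl --quiet select 'name={host}' set/pooled={pooled}"
        for host in hosts
    ]


commands = repool_commands(["mw2198.codfw.wmnet", "mw2199.codfw.wmnet"])
for command in commands:
    print(command)
```

Printing the commands first, instead of executing them directly, leaves room to review the list before any host is repooled.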

Change 310309 had a related patch set uploaded (by Volans):
Automation: improve wmf-auto-reimage

https://gerrit.wikimedia.org/r/310309

Change 310309 merged by Volans:
Automation: improve wmf-auto-reimage

https://gerrit.wikimedia.org/r/310309

Change 310587 had a related patch set uploaded (by Volans):
Salt: use puppetmaster CNAME

https://gerrit.wikimedia.org/r/310587

Change 310587 merged by Volans:
Salt: use puppetmaster CNAME

https://gerrit.wikimedia.org/r/310587

Change 310680 had a related patch set uploaded (by Volans):
auto-reimage: improved output

https://gerrit.wikimedia.org/r/310680

Change 310680 merged by Volans:
auto-reimage: improved output

https://gerrit.wikimedia.org/r/310680

Change 310768 had a related patch set uploaded (by Volans):
Auto-reimage: increase timeout for Icinga command

https://gerrit.wikimedia.org/r/310768

Change 310768 merged by Volans:
Auto-reimage: increase timeout for Icinga command

https://gerrit.wikimedia.org/r/310768

Change 311605 had a related patch set uploaded (by Volans):
Reimage: improve output in case of errors

https://gerrit.wikimedia.org/r/311605

Change 311605 merged by Volans:
Reimage: improve output in case of errors

https://gerrit.wikimedia.org/r/311605

Change 311701 had a related patch set uploaded (by Volans):
Reimage: minor improvements

https://gerrit.wikimedia.org/r/311701

Change 311701 merged by Volans:
Reimage: minor improvements

https://gerrit.wikimedia.org/r/311701

Change 311940 had a related patch set uploaded (by Volans):
Reimage: fix import

https://gerrit.wikimedia.org/r/311940

Volans subscribed.

What is the status of terbium? From the summary it appears to have been upgraded, but the host is still running trusty.

MoritzMuehlenhoff claimed this task.
MoritzMuehlenhoff updated the task description.

Migration to jessie is complete.