
Upgrade mw* servers to Debian Stretch (using HHVM)
Closed, ResolvedPublic

Description

Bug to track the upgrade of the MediaWiki servers from Debian 8 Jessie to Debian 9 Stretch. It consists of:

The following preliminary steps need to be fulfilled:

  • Build HHVM for stretch-wikimedia
  • Build HHVM extensions for stretch-wikimedia (luasandbox, tidy, wikidiff2)
  • ICU has changed its ABI again (libicu52 in jessie, libicu57 in stretch); we could deploy a backport of libicu57 to jessie and do the migration there

Lilypond is not in stretch, but can be installed from stretch-backports.
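The ICU step above hinges on the ABI jump between releases. A minimal hedged sketch (the release-to-package mapping is taken from the versions noted above; the function name is illustrative):

```shell
# Map a Debian release to the libicu ABI package it ships,
# per the jessie/stretch versions noted in the checklist above.
icu_for_release() {
  case "$1" in
    jessie)  echo libicu52 ;;
    stretch) echo libicu57 ;;
    *)       echo unknown ;;
  esac
}
```

Because the soname changes, everything linked against libicu (HHVM and its extensions included) must be rebuilt or carried over via a backport, which is why the migration needs coordination.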

These clusters are complete:

  • mwdebug servers
  • application servers
  • API servers
  • job runners

Related Objects

Event Timeline

Legoktm subscribed.

I updated the steps based on the plan to use PHP 7 instead of HHVM.

I updated the steps based on the plan to use PHP 7 instead of HHVM.

There is no way we'll embark on both migrations at the same time.

Upgrading to stretch while remaining on HHVM is a fairly straightforward task that ops can perform "in the background". Upgrading to stretch AND PHP 7 would be a much larger project, requiring resources we don't currently have dedicated to it. For reference, we pitched this as a project for the current annual plan but it didn't make the cut, so if we want to perform this transition, we'll have to drop other things.

So a logical sequence of events I see is:

  • Migrate ICU version (this *is* painful and user noticeable, and will need coordination)
  • We upgrade all mw* servers to stretch, keep using HHVM
  • Once, or if, proper resources and a timeline are set, we swap HHVM for PHP7 on stretch.

That will need a completely separate ticket.

Understood, and undid my changes.

I'll take care of builds for stretch-wikimedia

Change 384713 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Fix setup of libapache2-mod-security2 on stretch

https://gerrit.wikimedia.org/r/384713

Change 384713 merged by Muehlenhoff:
[operations/puppet@production] Fix setup of libapache2-mod-security2 on stretch

https://gerrit.wikimedia.org/r/384713

mw2246 today reported a failure in logrotate:

/etc/cron.daily/logrotate:
Job for apache2.service failed because the control process exited with error code.
See "systemctl status apache2.service" and "journalctl -xe" for details.
error: error running shared postrotate script for '/var/log/apache2/*.log '
run-parts: /etc/cron.daily/logrotate exited with return code 1

The only useful thing that I found in the logs is the following:

root@mw2246:/var/log# grep apache2 syslog.1
Dec 27 06:25:01 mw2246 systemd[17882]: apache2.service: Failed at step NAMESPACE spawning /usr/sbin/apachectl: No such file or directory
Dec 27 06:25:01 mw2246 systemd[1]: apache2.service: Control process exited, code=exited status=226

root@mw2246:/var/log# systemctl status apache2
● apache2.service - The Apache HTTP Server
   Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
   Active: active (running) (Result: exit-code) since Mon 2017-12-18 15:24:04 UTC; 1 weeks 1 days ago
  Process: 17882 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=226/NAMESPACE)
 Main PID: 8330 (apache2)
    Tasks: 55 (limit: 6144)
   CGroup: /system.slice/apache2.service
           ├─3610 /usr/sbin/apache2 -k start
           ├─3611 /usr/sbin/apache2 -k start
           └─8330 /usr/sbin/apache2 -k start

Dec 23 06:25:02 mw2246 systemd[1]: Reloaded The Apache HTTP Server.
Dec 24 06:25:02 mw2246 systemd[1]: Reloading The Apache HTTP Server.
Dec 24 06:25:02 mw2246 systemd[1]: Reloaded The Apache HTTP Server.
Dec 25 06:25:02 mw2246 systemd[1]: Reloading The Apache HTTP Server.
Dec 25 06:25:02 mw2246 systemd[1]: Reloaded The Apache HTTP Server.
Dec 26 06:25:02 mw2246 systemd[1]: Reloading The Apache HTTP Server.
Dec 26 06:25:02 mw2246 systemd[1]: Reloaded The Apache HTTP Server.
Dec 27 06:25:01 mw2246 systemd[1]: Reloading The Apache HTTP Server.
Dec 27 06:25:01 mw2246 systemd[1]: apache2.service: Control process exited, code=exited status=226
Dec 27 06:25:01 mw2246 systemd[1]: Reload failed for The Apache HTTP Server.
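The status=226 in the log above is systemd's NAMESPACE exit step: the control process spawned for `apachectl graceful` failed while setting up the unit's mount namespace, typically because paths the namespace references were replaced or removed after the service started (for example by a package upgrade), so a full restart rather than a reload is what recovers it. A small hedged helper decoding the relevant codes from systemd's exit-status table (the wording of the messages is mine; the numeric codes are systemd's):

```shell
# Decode the systemd process exit statuses seen in logs like the one
# above; 226 (EXIT_NAMESPACE) and 203 (EXIT_EXEC) come from the
# exit-status table in systemd.exec(5).
explain_systemd_status() {
  case "$1" in
    226) echo "NAMESPACE: failed setting up the service's mount namespace" ;;
    203) echo "EXEC: the service binary could not be executed" ;;
    *)   echo "status $1: see the exit-status table in systemd.exec(5)" ;;
  esac
}
```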
Krinkle renamed this task from Migration of mw* servers to stretch to Upgrade mw* servers to Debian Stretch (using HHVM).Jan 10 2018, 10:58 PM
Krinkle updated the task description. (Show Details)

Change 425269 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Reimage mw1265 with stretch

https://gerrit.wikimedia.org/r/425269

Change 425269 merged by Muehlenhoff:
[operations/puppet@production] Reimage mw1265 with stretch

https://gerrit.wikimedia.org/r/425269

Change 425772 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Reimage mw1279 (API canary) with stretch

https://gerrit.wikimedia.org/r/425772

Change 425772 merged by Muehlenhoff:
[operations/puppet@production] Reimage mw1279 (API canary) with stretch

https://gerrit.wikimedia.org/r/425772

Mentioned in SAL (#wikimedia-operations) [2018-04-13T09:03:18Z] <moritzm> reimaging mw1276-mw1278 to stretch (T174431)

Mentioned in SAL (#wikimedia-operations) [2018-04-13T10:59:54Z] <moritzm> reimaging mw1261-mw1264 to stretch (T174431)

Change 427608 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Switch all mw hosts to stretch

https://gerrit.wikimedia.org/r/427608

Change 427608 merged by Muehlenhoff:
[operations/puppet@production] Switch all mw hosts to stretch

https://gerrit.wikimedia.org/r/427608

Change 428923 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Reimage mwdebug servers with stretch

https://gerrit.wikimedia.org/r/428923

While investigating cronspam from recent reimages I took a look at mw1247 (for example) and noticed it has two disks but no software RAID (T106381). I think we should also fix that while we're reimaging with stretch anyway.

Change 428923 merged by Muehlenhoff:
[operations/puppet@production] Reimage mwdebug servers with stretch

https://gerrit.wikimedia.org/r/428923

I checked and all the mw22* are getting RAID due to this:

mw22*) echo partman/mw-raid1.cfg ;; \

But mw216* hosts like mw2163, mw2164, and mw2165 are not getting RAID after reinstall. So you pointed this out at just the right moment, as I was starting to get to those in the list.
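The gap comes straight from shell glob matching in the partman recipe selection: `mw22*` matches mw22xx hosts only, so mw216x hosts fell through to the default. A hedged sketch of that logic (only `partman/mw-raid1.cfg` and the `mw22*`/`mw21[6-9][0-9]` patterns appear in this thread; the default recipe name here is assumed for illustration):

```shell
# Sketch of the netboot recipe selection: which partman config a
# hostname gets, including the widened mw21[6-9][0-9] pattern from
# the follow-up patch. The fallback recipe name is hypothetical.
partman_recipe() {
  case "$1" in
    mw22*)           echo partman/mw-raid1.cfg ;;  # the line quoted above
    mw21[6-9][0-9]*) echo partman/mw-raid1.cfg ;;  # added by the later patch
    *)               echo partman/flat.cfg ;;      # assumed default
  esac
}
```

With only the first pattern in place, `partman_recipe mw2163` hits the default branch, which is exactly why those hosts came back without software RAID.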

Change 428961 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: let mw21[6-9] have software RAID

https://gerrit.wikimedia.org/r/428961

Change 428961 merged by Dzahn:
[operations/puppet@production] install_server: let mw21[6-9][0-9] have software RAID

https://gerrit.wikimedia.org/r/428961

Mentioned in SAL (#wikimedia-operations) [2018-04-26T02:05:17Z] <mutante> mw2163 through mw2166: since the wmf-auto-reimage failed after OS but before puppet run due to "Failed to puppet_generate_certs" i manually logged in with install-console and signed puppet certs (T174431)

All mwdebug servers are now running stretch.

Script wmf-auto-reimage was launched by dzahn on neodymium.eqiad.wmnet for hosts:

['mw2229.codfw.wmnet', 'mw2231.codfw.wmnet', 'mw2240.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201805031807_dzahn_2697.log.

Completed auto-reimage of hosts:

['mw2231.codfw.wmnet', 'mw2229.codfw.wmnet', 'mw2240.codfw.wmnet']

and were ALL successful.

All (regular) codfw appservers are now on stretch.

All application servers are now running stretch (excluding job runners and API servers).

All API servers in eqiad are now running stretch.

All job runners in eqiad and codfw are now running stretch.

All appservers are running stretch now. (one of them is broken, creating subtask)

This now just needs to stay open for deployment and maintenance servers.

Can "Deployment servers" be checked off since the two tasks next to it are resolved?

Can "Deployment servers" be checked off since the two tasks next to it are resolved?

Thanks for the note, I just fixed that.

np~

MoritzMuehlenhoff claimed this task.
MoritzMuehlenhoff updated the task description. (Show Details)

Script runners are now also migrated to stretch, closing the task.