
Upgrade MediaWiki appservers to Debian Buster (debian 10)
Open, Medium, Public

Description

Our four MediaWiki clusters (application, API, jobrunners/videoscalers, and Parsoid) need to be migrated to Debian Buster.

Provisional plan for the migration:

  • Upgrade all current stretch servers to ICU 63 (T264991)
  • Rebuild all our php-7.2 packages for Debian Buster (buster-wikimedia)
    • php7.2-cli
    • php7.2-common
    • php7.2-curl
    • php7.2-dba
    • php7.2-fpm
    • php7.2-gd
    • php7.2-gmp
    • php7.2-mysql
    • php7.2-opcache
    • php7.2-phpdbg
    • php7.2-readline
    • php7.2-xml
  • Build missing packages for Buster
    • ploticus
    • prometheus-nutcracker-exporter
    • prometheus-php-fpm-exporter
  • Fix puppet code to support Buster
    • ttf-alee replaced with fonts-alee
    • ttf-wqy-zenhei replaced with fonts-wqy-zenhei
    • code to add PHP72 component on buster
  • Reimage mwdebug1001 to buster OR introduce mwdebug1003, so as not to disrupt development testing
    • first iteration done with testvm1001, since decommissioned
    • mwdebug1003 to be introduced early December
    • add PHP72 APT component on mwdebug1003
  • Reimage parse2001 to buster (parsoid)
  • Reimage mw2243 to buster (jobrunner)

Q3

Related Objects

Status     Assigned
Open       None
Open       None
Stalled    None
Stalled    None
Open       None
Open       None
Open       None
Resolved   Jdforrester-WMF
Open       MoritzMuehlenhoff
Resolved   jijiki
Open       None
Open       None
Resolved   Aklapper
Resolved   Trizek-WMF
Resolved   Dzahn
Resolved   Gilles
Open       None
Open       Dzahn
Open       None
Open       Dzahn
Resolved   Papaul
Resolved   Cmjohnson

Event Timeline

jijiki moved this task from Incoming 🐫 to Unsorted on the serviceops board. Aug 17 2020, 11:46 PM
jijiki renamed this task from upgrade MediaWiki appservers to Debian 10 (buster) to Upgrade MediaWiki appservers to Debian Buster (debian 10). Oct 28 2020, 3:39 PM
jijiki changed the task status from Stalled to Open.
jijiki triaged this task as Medium priority.
jijiki updated the task description.

Change 638159 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] decom testvm1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/638159

Change 638159 merged by Dzahn:
[operations/puppet@production] decom testvm1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/638159

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: testvm1001.eqiad.wmnet

  • testvm1001.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above

Dzahn updated the task description. Tue, Nov 3, 1:21 AM
Dzahn updated the task description. Tue, Nov 3, 1:23 AM

A new ticket for this has been created at T264991.

Ignore my previous comment. This is different and should not be merged.

Change 638218 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: introduce mwdebug1003 as debug server on buster

https://gerrit.wikimedia.org/r/638218

jijiki added subscribers: jbond, jijiki. Edited Mon, Nov 9, 8:51 PM

There are a couple of things that we will need to keep in mind:

  • mcrouter version we are going to use: @jbond has packaged 0.41 for buster (T251574), while we currently run 0.37. We should consider packaging 0.37 for buster and continuing with the 0.41 upgrade after the OS upgrade. Or the other way around :)
  • If T244340 has been completed by then, those hosts will have a memcached instance installed as well

Change 638218 merged by Dzahn:
[operations/puppet@production] site: introduce mwdebug1003 as debug server on buster

https://gerrit.wikimedia.org/r/638218

jijiki updated the task description. Wed, Nov 18, 7:48 PM

Change 642073 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Also apply the PHP 7.2 component for Buster

https://gerrit.wikimedia.org/r/642073

Change 642567 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mediawiki::php: allow opting-in to use the PHP72 component on buster

https://gerrit.wikimedia.org/r/642567

Dzahn updated the task description. Fri, Nov 20, 11:45 PM
Dzahn updated the task description. Fri, Nov 20, 11:47 PM
Dzahn updated the task description.

Change 642567 merged by Dzahn:
[operations/puppet@production] mediawiki::php: allow opting-in to use the PHP72 component on buster

https://gerrit.wikimedia.org/r/642567

After the change above, you can now opt servers in via Hiera to use the PHP72 APT component. I did so for mwdebug1003, and for the first time we have the PHP 7.2 packages installed from the Puppet role on a buster machine.

I will check the boxes for these packages above.

[mwdebug1003:~] $ dpkg -l | grep php
ii  php-common                           2:69+0~20190215163918.14+stretch~1.gbpfa617b+wmf1 all          Common files for PHP packages
ii  php7.2-bcmath                        7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        Bcmath module for PHP
ii  php7.2-bz2                           7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        bzip2 module for PHP
ii  php7.2-common                        7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        documentation, examples and common module for PHP
ii  php7.2-dba                           7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        DBA module for PHP
ii  php7.2-gd                            7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        GD module for PHP
ii  php7.2-gmp                           7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        GMP module for PHP
ii  php7.2-mbstring                      7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        MBSTRING module for PHP
ii  php7.2-mysql                         7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        MySQL module for PHP
ii  php7.2-opcache                       7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        Zend OpCache module for PHP
ii  php7.2-xml                           7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1   amd64        DOM, SimpleXML, WDDX, XML, and XSL module for PHP
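
The version strings above still carry +stretch and debian9 tags even though the host runs buster. A minimal sketch for spotting such packages, assuming those two substrings are a reliable marker of a stretch build (the helper name is hypothetical):

```shell
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the name of
# every installed ("ii") package whose version string (column 3) contains
# "stretch" or "debian9", i.e. packages that look like stretch builds.
find_stretch_builds() {
  awk '$1 == "ii" && $3 ~ /stretch|debian9/ { print $2 }'
}

# Example use on a host:
#   dpkg -l | grep php | find_stretch_builds
```

Every package in the listing above matches, which is consistent with the stretch component mix-up identified later in this ticket.
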

Dzahn updated the task description. Sat, Nov 21, 12:22 AM

While the packages above are installed, which is a nice step, we still have these issues to solve:

php-geoip : Depends: phpapi-20170718 or
                     phpapi-20151012 but it is not installable
php-msgpack : Depends: phpapi-20170718 or
                       phpapi-20151012 but it is not installable
php-redis : Depends: php-igbinary but it is not going to be installed
            Depends: phpapi-20170718
php-luasandbox : Depends: phpapi-20170718
php-wikidiff2 : Depends: phpapi-20170718
php-memcached : Depends: php-igbinary but it is not going to be installed
                Depends: php-msgpack but it is not going to be installed
                Depends: phpapi-20170718
php-igbinary : Depends: phpapi-20170718
php-mongodb : Depends: phpapi-20170718 or
                       phpapi-20151012 but it is not installable
php-tideways-xhprof : Depends: phpapi-20170718 or
                               phpapi-20151012 but it is not installable
php-wmerrors : Depends: phpapi-20170718

Simulating a manual install (apt-get -s install, as opposed to Puppet) of the packages whose checkboxes are still open gives:

php7.2-cli : Depends: libsodium18 (>= 1.0.10) but it is not installable
php7.2-curl : Depends: libcurl3 (>= 7.44.0) but it is not installable
php7.2-fpm : Depends: php7.2-cli but it is not going to be installed
             Depends: libsodium18 (>= 1.0.10) but it is not installable
php7.2-phpdbg : Depends: php7.2-cli but it is not going to be installed
                Depends: libsodium18 (>= 1.0.10) but it is not installable

php7.2-readline would install fine; it just did not get pulled in by Puppet.
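
libsodium18 and libcurl3 are stretch-era library names (buster ships libsodium23 and libcurl4 instead), which fits packages built for stretch being offered to a buster host. A rough sketch for collecting such names from a simulated install, assuming apt's English "but it is not installable" wording; the helper name is hypothetical, and for an "or" group it only reports the alternative on the flagged line:

```shell
# Hypothetical helper: reads `apt-get -s install ...` output on stdin and
# prints, once each, the dependencies apt explicitly reports as
# "not installable". Unmet dependencies reported differently
# (e.g. "not going to be installed") are deliberately ignored.
list_uninstallable_deps() {
  grep 'but it is not installable' |
    sed -e 's/.*Depends: //' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]].*//' |
    LC_ALL=C sort -u
}

# Example:
#   apt-get -s install php7.2-cli php7.2-curl php7.2-fpm php7.2-phpdbg 2>&1 |
#     list_uninstallable_deps
```

Fed the full output quoted above, this should print libcurl3, libsodium18 and phpapi-20151012.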

Change 643028 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] mediawiki: Use python-pil on buster

https://gerrit.wikimedia.org/r/643028

Mentioned in SAL (#wikimedia-operations) [2020-11-23T13:17:31Z] <moritzm> imported ploticus 2.42-4.2~wmf1 to buster-wikimedia T245757

Change 643034 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/debs/prometheus-php-fpm-exporter@master] Rebuild for Buster: - Use golang-github-prometheus-client-golang-dev and add adapt-to-prometheus090.patch - debhelper 12 - Disable dh_dwz

https://gerrit.wikimedia.org/r/643034

Mentioned in SAL (#wikimedia-operations) [2020-11-23T14:01:22Z] <moritzm> imported prometheus-php-fpm-exporter 0.4.1+git20181018.d0d1837-2 to buster-wikimedia T245757

While the packages above are installed, which is a nice step, we still have these issues to solve:

This is caused by your https://gerrit.wikimedia.org/r/c/operations/puppet/+/642567 patch, which adds the component for stretch, instead of buster.

I did the following on 1003:

  • I edited the apt config to point to the correct buster component,
  • upgraded first the base PHP packages
  • then re-ran Puppet (which again tried to force the stretch packages)
  • fixed the apt config again and upgraded the addons and FPM manually so that Puppet doesn't try to install the Stretch versions
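
The first manual step, repointing the sources entry from the stretch suite to the buster one, could look like the sketch below. It assumes the wikimedia-php72.list layout shown in the Puppet diff in this ticket, and the helper name is hypothetical. As the steps above describe, Puppet reverts such a hand edit while its manifest still hardcodes stretch.

```shell
# Hypothetical helper: rewrite the suite in an APT sources fragment,
# e.g. stretch-wikimedia -> buster-wikimedia.
# $1 = sources file, $2 = current codename, $3 = target codename.
retarget_suite() {
  tmp=$(mktemp) || return 1
  if sed "s/ $2-wikimedia / $3-wikimedia /" "$1" > "$tmp"; then
    # cat > keeps the original file's owner and permissions, unlike mv.
    cat "$tmp" > "$1"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}

# Example (as root):
#   retarget_suite /etc/apt/sources.list.d/wikimedia-php72.list stretch buster
#   apt-get update
```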

With that, everything works except for three missing pieces:

Change 643039 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Avoid transitional package ttf-wqy-zenhei in favour of fonts-wqy-zenhei

https://gerrit.wikimedia.org/r/643039

Dzahn added a comment. Mon, Nov 23, 9:00 PM

While the packages above are installed, which is a nice step, we still have these issues to solve:

This is caused by your https://gerrit.wikimedia.org/r/c/operations/puppet/+/642567 patch, which adds the component for stretch, instead of buster.

Aaah, duh, yea, I see the mistake. But doesn't your change do the same thing at https://gerrit.wikimedia.org/r/c/operations/puppet/+/642073 then?

Change 643093 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mediawiki::php: fix hardcoded stretch dist name for PHP 72 packages

https://gerrit.wikimedia.org/r/643093

Dzahn added a comment. Mon, Nov 23, 9:18 PM

How about using dist => "${::lsbdistcodename}-wikimedia" instead of trying to set the right distro name at all, to fix that forever?

https://gerrit.wikimedia.org/r/c/operations/puppet/+/643093/1/modules/profile/manifests/mediawiki/php.pp
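
The same idea in shell: derive the suite from the host's codename instead of hardcoding it. This is only an illustration, not the Puppet change itself; it assumes VERSION_CODENAME in /etc/os-release as a stand-in for the lsbdistcodename fact, and the helper name is hypothetical.

```shell
# Shell analogue of dist => "${::lsbdistcodename}-wikimedia": build the APT
# lines for a wikimedia component from the host's own codename, so the suite
# cannot drift from the installed release.
# $1 = component (e.g. component/php72)
# $2 = optional codename override; defaults to VERSION_CODENAME from
#      /etc/os-release (a stand-in for the Facter fact).
render_component_sources() {
  codename=${2:-$(. /etc/os-release && printf '%s' "$VERSION_CODENAME")}
  for kind in deb deb-src; do
    printf '%s http://apt.wikimedia.org/wikimedia %s-wikimedia %s\n' \
      "$kind" "$codename" "$1"
  done
}

# Example:
#   render_component_sources component/php72 buster
```

With buster as the codename, the output matches the deb and deb-src lines in the Puppet diff later in this ticket.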

Change 643093 merged by Dzahn:
[operations/puppet@production] mediawiki::php: fix hardcoded stretch dist name for PHP 72 packages

https://gerrit.wikimedia.org/r/643093

Dzahn added a comment. Mon, Nov 23, 9:38 PM

  • I edited the apt config to point to the correct buster component,
  • then re-ran Puppet (which again tried to force the stretch packages)
  • fixed the apt config again

Fixed this. Puppet now sets buster here:

Notice: /Stage[main]/Profile::Mediawiki::Php/Apt::Repository[wikimedia-php72]/File[/etc/apt/sources.list.d/wikimedia-php72.list]/content: 
--- /etc/apt/sources.list.d/wikimedia-php72.list	2020-11-23 12:24:58.439031179 +0000
+++ /tmp/puppet-file20201123-22451-1mjca4x	2020-11-23 21:34:36.323486491 +0000
@@ -1,2 +1,2 @@
-deb http://apt.wikimedia.org/wikimedia stretch-wikimedia component/php72
-deb-src http://apt.wikimedia.org/wikimedia stretch-wikimedia component/php72
+deb http://apt.wikimedia.org/wikimedia buster-wikimedia component/php72
+deb-src http://apt.wikimedia.org/wikimedia buster-wikimedia component/php72

How about using dist => "${::lsbdistcodename}-wikimedia" instead of trying to set the right distro name at all, to fix that forever?

https://gerrit.wikimedia.org/r/c/operations/puppet/+/643093/1/modules/profile/manifests/mediawiki/php.pp

If that is the case, then @Muehlenhoff's https://gerrit.wikimedia.org/r/642073 patch should be updated, rather than creating a new patch. In any case, I also think there is no need for an $enable_php72_component flag.

Change 643028 merged by Dzahn:
[operations/puppet@production] mediawiki: Use python-pil on buster

https://gerrit.wikimedia.org/r/643028

Change 643039 merged by Dzahn:
[operations/puppet@production] Avoid transitional package ttf-wqy-zenhei in favour of fonts-wqy-zenhei

https://gerrit.wikimedia.org/r/643039

Dzahn added a comment. Edited Mon, Nov 23, 10:48 PM

@jijiki I see the "mcrouter 0.37" checkbox above. On mwdebug1003 we now have "mcrouter 0.41.0-1" installed. Does this mean it is resolved and we are using 0.41, or was there a reason to specifically list 0.37 as open?

Edit: Never mind, I saw T245757#6614669 after writing this, and that explains it.

upgraded the addons and FPM manually so that Puppet doesn't try to install the Stretch versions

After the fixes above were merged, I removed the php* packages manually and then ran Puppet to have it reinstall them.

It all works now. The PHP packages are installed, Puppet runs show no more errors or warnings, and Puppet no longer repeats itself because of that font package.

With that, everything works except for three missing pieces:

Merged. python-pil is now installed on mwdebug1003, and the change is a no-op in production.

  • ploticus which I have backported from bullseye and uploaded to apt.wikimedia.org

Thanks! It is also installed on mwdebug1003 now.

prometheus-php-fpm-exporter is installed as well.

Dzahn updated the task description. Mon, Nov 23, 10:59 PM
Dzahn updated the task description. Tue, Nov 24, 12:07 AM

If that is the case, then @Muehlenhoff's https://gerrit.wikimedia.org/r/642073 patch should be updated

No worries, that's exactly what I did and the new version is waiting for your review.

rather than creating a new patch

... after confirming that my suggestion works and fixing my previous patch. I still preferred doing that separately, though, and this way mwdebug1003 is already done.

In any case, I also think there is no need for an $enable_php72_component flag.

No worries, it wasn't going to stay permanently, and removing it is already part of the rebased change. These Boolean flags seemed pretty standard to me for activating things on a single host before switching everything over.

Change 642073 merged by Dzahn:
[operations/puppet@production] Also apply the PHP 7.2 component for Buster

https://gerrit.wikimedia.org/r/642073

Change 643034 merged by Muehlenhoff:
[operations/debs/prometheus-php-fpm-exporter@master] Rebuild for Buster: - Use golang-github-prometheus-client-golang-dev and add adapt-to-prometheus090.patch - debhelper 12 - Disable dh_dwz

https://gerrit.wikimedia.org/r/643034

With all Puppet patches and debs landed, mwdebug1003 should be reimaged again: there were plenty of manual intermediate steps until we made it work, and reimaging lets us conclusively confirm that it works.

Change 643211 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] profile::mediawiki::videoscaler: Add Support for Buster

https://gerrit.wikimedia.org/r/643211

With all Puppet patches and debs landed, mwdebug1003 should be reimaged again: there were plenty of manual intermediate steps until we made it work, and reimaging lets us conclusively confirm that it works.

Actually, before reimaging, let's merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/445604. The patch has no impact on existing servers, as the timidity and freepats debs will remain installed on existing app servers, but mwdebug1003 will come up in a clean state.

jijiki updated the task description. Tue, Nov 24, 3:18 PM
jijiki added a project: User-jijiki.
jijiki added a subscriber: brion. Tue, Nov 24, 3:35 PM

There are a couple of things that we will need to keep in mind:

  • mcrouter version we are going to use: @jbond has packaged 0.41 for buster (T251574), while we currently run 0.37. We should consider packaging 0.37 for buster and continuing with the 0.41 upgrade after the OS upgrade. Or the other way around :)
  • If T244340 has been completed by then, those hosts will have a memcached instance installed as well

According to T251574#6148741, we can't port mcrouter 0.37 to buster, so we will have to keep in mind that we will be upgrading mcrouter too. We will monitor mcrouter on mwdebug1003 and see how it behaves during testing.

@Dzahn @hnowlan After discussing with @Muehlenhoff, since we will be using the standard packages for multi-threaded VP9 encoding on Buster, it makes sense to reimage one jobrunner/videoscaler on codfw and make sure everything is in order. I added this to the description. In theory, it shouldn't be much of an issue since we are almost done with mwdebug1003.

@brion, can you help us test this once we have a jobrunner/videoscaler upgraded?

jijiki updated the task description. Tue, Nov 24, 3:38 PM
jijiki updated the task description.

Change 643211 merged by Dzahn:
[operations/puppet@production] profile::mediawiki::videoscaler: Add Support for Buster

https://gerrit.wikimedia.org/r/643211

Mentioned in SAL (#wikimedia-operations) [2020-11-25T16:10:43Z] <mutante> shutting down mwdebug1003 - reimaging for T245757

With all Puppet patches and debs landed, mwdebug1003 should be reimaged again: there were plenty of manual intermediate steps until we made it work, and reimaging lets us conclusively confirm that it works.

Done! I reimaged mwdebug1003 and it went just fine. Apart from having to run Puppet twice on the new host due to some dependencies, there were no errors or warnings.

Packages are installed and things look fine to me.

Confirmed this was done.

the patch has no impact on existing servers, as the timidity and freepats debs will remain installed on existing app servers, but mwdebug1003 will come up in a clean state.

Confirmed that mwdebug1003 is now without timidity, as opposed to, say, mwdebug1001.
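
One way to confirm a difference like this is to compare the installed package lists of the two hosts. A minimal sketch, assuming each list was exported with dpkg-query -W -f='${Package}\n' on its host; the helper and file names are hypothetical:

```shell
# Hypothetical helper: prints packages listed in file $1 but not in file $2
# (one package name per line in each file, any order).
only_in_first() {
  a_sorted=$(mktemp) || return 1
  b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 1; }
  LC_ALL=C sort -u "$1" > "$a_sorted"
  LC_ALL=C sort -u "$2" > "$b_sorted"
  # comm -23 suppresses lines unique to file 2 and lines common to both.
  LC_ALL=C comm -23 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}

# Example, after running on each host
#   dpkg-query -W -f='${Package}\n' > "$(hostname).pkgs"
# and copying both files to one place:
#   only_in_first mwdebug1001.pkgs mwdebug1003.pkgs | grep -E 'timidity|freepats'
```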

Icinga is all green with one exception, opcache-health, for which I tried "reschedule next service check".

Freshly downtimed the host and all services for 20 days.

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=mwdebug1003

Mentioned in SAL (#wikimedia-operations) [2020-11-25T22:55:25Z] <mutante> mwdebug1003 - scap pull - which rsyncs from deploy1001 and runs php-fpm restart check script (T245757)

Done! I reimaged mwdebug1003 and it went just fine. Apart from having to run Puppet twice on the new host due to some dependencies, there were no errors or warnings.

Also looks fine to me. And most of our non-trivial roles require more than one initial Puppet run anyway.