Is there a reason for T263437 to be a subtask? It's unrelated work and only needed when we move to a new OS (with a new ICU), not when we merely migrate to a new PHP release.
Mon, Oct 18
Point of contact for any data which might possibly need to be retained is @WMDE-leszek
Created https://phabricator.wikimedia.org/T293676 for the Data Engineering team to review data in HDFS/stat homes.
Fri, Oct 15
Thanks, Timo. This quarter we're drafting our plans for 2FA requirements (and the implied tradeoffs between convenience and security). We can and will also revisit session times as part of that.
Thu, Oct 14
Wed, Oct 13
Tue, Oct 12
Mon, Oct 11
In the latest upload of ca-certificates to Debian unstable, the old X3 cert has now been removed:
Everything that doesn't need features from -extras or -full has been migrated to -light.
Fri, Oct 8
Tue, Oct 5
I've added routinator to apt.wikimedia.org as "thirdparty/routinator" for bullseye-wikimedia and adapted the Puppet code, so that when these hosts get reinstalled with Bullseye, the thirdparty component is picked up.
With T291458 done, I've already rebuilt the bullseye (which was not affected) and buster main images (with libgnutls30 3.6.7-4+deb10u7), so I think the base layers are done.
I'll delete docker-registry.wikimedia.org/wikimedia/mediawiki-services-graphoid:2019-06-10-060747-production, as graphoid is no longer around.
I'll also rebuild
@ayounsi Riccardo suggested using a separate disk/partition for the routinator data. That was partly a quick-and-dirty way to avoid a rebuild, but we have reason to rebuild anyway, so let's do that.
Do you think it would still make sense to have a separate disk/partition for the Routinator data?
https://packages.nlnetlabs.nl/ also provides the routinator debs for bullseye (plus it's a static Go binary anyway), so if we're recreating the VMs anyway, let's also switch to Bullseye?
Hi there, just wanted to share that I worked around this issue in the py2 web situation by switching to PyOpenSSL, which brings along a newer version of OpenSSL. The changes were pretty minimal and can be seen here: https://github.com/hatnote/montage/commit/1be5d09ff5b80e2a57eb71802096fc1fcb98e60f
A technical detail which may be of some help: The Python on the Jessie image we were using was linking against OpenSSL 1.0.0, even though 1.0.2 was available, but openssl-dev appears to have been removed from the Wikimedia apt repo, so it was nontrivial to rebuild against the newer SSL.
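As a quick way to see which OpenSSL a given interpreter is actually linked against (the mismatch described above), the `ssl` module exposes the build-time version. This is a minimal sketch; the 1.0.2 threshold is only an illustrative assumption taken from the comment above, and `is_at_least` is a hypothetical helper name. It avoids annotations and f-strings so it runs on both Python 2.7 and Python 3.

```python
import ssl

# Illustrative threshold (assumption): the comment above notes that 1.0.2 was
# available on the image but the interpreter was linked against 1.0.0.
MIN_VERSION = (1, 0, 2)


def linked_openssl():
    """Return the OpenSSL version string the ssl module was built against."""
    return ssl.OPENSSL_VERSION


def is_at_least(version_info, minimum=MIN_VERSION):
    """Compare the leading (major, minor, fix) fields of an OPENSSL_VERSION_INFO tuple."""
    return tuple(version_info[:len(minimum)]) >= minimum


if __name__ == "__main__":
    print(linked_openssl())
    print("meets minimum: %s" % is_at_least(ssl.OPENSSL_VERSION_INFO))
```

Comparing only the first three fields ignores the patch letter and status fields of `ssl.OPENSSL_VERSION_INFO`, which is enough to tell 1.0.0 from 1.0.2.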
Mon, Oct 4
Fri, Oct 1
Related is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995432 with a comment by our own @BBlack. From chat seen on IRC I think @BBlack is working on some Puppet code to remove the DST Root CA X3 cert from the system trust store for WMF servers.
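To verify on a given host whether the DST Root CA X3 cert is still in the system trust store, one option is to load the concatenated bundle with Python's `ssl` module and look for its subject commonName. This is a sketch only: it assumes the Debian default bundle path `/etc/ssl/certs/ca-certificates.crt`, and the helper names are hypothetical. It is not the Puppet change mentioned above.

```python
import ssl

# Assumption: Debian's default location for the concatenated CA bundle.
DEBIAN_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def load_ca_certs(cafile=DEBIAN_CA_BUNDLE):
    """Load a PEM bundle into a fresh context and return the decoded CA entries."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    return ctx.get_ca_certs()


def common_names(certs):
    """Collect every subject commonName from SSLContext.get_ca_certs() output."""
    names = []
    for cert in certs:
        # "subject" is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.append(value)
    return names


def trusts_cn(certs, cn="DST Root CA X3"):
    """True if a CA with the given commonName is present in the loaded bundle."""
    return cn in common_names(certs)
```

Usage on a host: `trusts_cn(load_ca_certs())` returns `False` once the cert has been dropped from the bundle.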
Thu, Sep 30
mx1001/mx2001 have been reimaged to Bullseye (reusing the VM/IP for potential IP reputation issues).
Znuny 6.0.36 fixes the following issues: (https://www.znuny.org/en/releases/znuny-6-0-36)
Tue, Sep 28
Bullseye preparations have completed and it's in active use, closing. For future migration tracking, T291916 can be used.
@MoritzMuehlenhoff I don't know if you saw my comment on Sep 10th, but I am having issues installing Bullseye. I am getting the error below: "Failed to load ldlinux.c32. Boot failed: press a key to retry, or wait for reset..."
Mon, Sep 27
Check is now gone.
Looking at /var/log/installer, it seems stat1005 was installed in 2019 with Stretch and later dist-upgraded to Buster (something we rarely do since we prefer reimages, but it happens). Installing usrmerge in this case sounds good to me (and let's check whether other stat* hosts have the same issue).
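To check whether other stat* hosts have the same issue, a host can be tested for a merged /usr by checking that the top-level directories are symlinks into /usr. A minimal sketch, assuming that checking `bin`, `sbin` and `lib` is sufficient (`lib64` is left out since it only exists on some architectures); `is_usr_merged` is a hypothetical helper, not part of the usrmerge package.

```python
from pathlib import Path

# Top-level directories that a merged-/usr layout turns into symlinks into /usr.
MERGED_DIRS = ("bin", "sbin", "lib")


def is_usr_merged(root=Path("/")):
    """Return True if every directory in MERGED_DIRS is a symlink to root/usr/<name>."""
    root = Path(root)
    for name in MERGED_DIRS:
        top = root / name
        if not top.is_symlink():
            return False  # a real directory means this host was not merged
        if top.resolve() != (root / "usr" / name).resolve():
            return False  # a symlink pointing somewhere unexpected
    return True
```

Taking `root` as a parameter keeps the check usable against a mounted image or a test tree, not just the live filesystem.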
Fri, Sep 24
The two VMs (mx1002/mx2002) which were used to test the Bullseye setup have been taken down.
Thu, Sep 23
Both mx1001 and mx2001 are now running Bullseye. There's a little cleanup/followup work, but the core of the work is completed.
I can easily rebuild/upload a fixed package for apt.wikimedia.org, though. Just let me know.
Tue, Sep 21
The expected version numbers are
Mon, Sep 20
I was able to get a working python3-eventlet package by integrating the upstream PR; the easy solution for now, IMHO, is to upload the package internally for Bullseye.
Script is now deployed on the masters
- OpenSSL in Buster and Bullseye is not affected (both only ship OpenSSL 1.1)
- OpenSSL updates for openssl 1.0.2 in Stretch have been rolled out
- GNUTLS in Bullseye is not affected
- GNUTLS in Buster was already fixed in Buster 10.10 (rolled out via T285206)
- GNUTLS updates for Stretch have been rolled out
Sep 20 2021
Sep 17 2021
@Jgiannelos One of the tests fails with Python 3.7 (the Python version in Buster):
Ack, I'll upload to apt.wikimedia.org on Monday.
Sep 16 2021
The approach of the CLI looks good to me. We should now figure out how to backport the script to Debian Buster for use on the maps clusters.
@MoritzMuehlenhoff do you have any thoughts regarding the Debian packaging backport? How can we proceed with this?
Status update: mx2001 is reimaged to Bullseye and working fine so far. The smarthost config on our servers has been switched to prefer mx2001 over mx1001, and the MX records of a handful of lesser-used domains now point to mx2001.
If there are no further issues, the remaining DNS records will be updated on Monday, and after that mx1001 will be reimaged sometime mid next week.
scandium has been upgraded. If tests are fine, I'd upload to apt.wikimedia.org
Since the Thanos hosts run Buster and a more recent kernel/glibc/systemd, I disabled the cleanup cron job on these hosts, so that we can check whether this got fixed. If Buster is still affected we can add the cron job back.
Hi @cmooney, I just checked again (80 minutes later) and I do have the access I need now. Maybe it took a while for everything to fall into place?
Sep 15 2021
Hi @cmooney, thanks for noticing that. Yes, the 'mraish' account was set up when I was still contracting, and I set up 'Mikeraish' when I converted and linked it to my WMF email. It would be great to remove the original 'mraish' account and add access to 'Mikeraish' as you suggested. I just signed in to the old account looking for a way to delete it, but I wasn't able to find one. Should the deletion ideally come from your end or from mine?
Sure thing, I'll upgrade scandium tomorrow morning then.
I've made an updated PHP 7.2 package with a 7.2 backport of https://github.com/php/php-src/commit/781e6b4d214012e9b9c0cf96a239cdf9f948da91
That page mentions that at least firmware version NVM 6.01 (for the NIC) and a current driver version are required. According to ethtool, the X710 in ms-be1051 has firmware 6.8, which should be OK. But it doesn't show the LLDP disable option when I run the ethtool "--show-priv-flags" command:
- Decide on a way to have this done at boot time for affected hosts.
- That also involves working out how to handle this via automation. One difficulty is identifying the hosts using the affected Intel NIC, and the PCI ID of the affected interface on each (which is part of the path the command gets echoed to).
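The identification step from the list above can be sketched by walking /sys/class/net and keeping interfaces whose backing device is bound to the i40e driver, which yields the PCI address needed for the command path. This is a sketch under assumptions: that filtering on the i40e driver name is a good enough proxy for "affected Intel NIC" (i40e also drives other Intel models), and that the path is `/sys/kernel/debug/i40e/<PCI address>/command` with a payload of `lldp stop`. The function names are hypothetical, and actually writing the file (which needs root and a mounted debugfs) is left out.

```python
import os
from pathlib import Path

# Assumption: per-device debugfs command file exposed by the i40e driver.
I40E_DEBUGFS = Path("/sys/kernel/debug/i40e")


def i40e_interfaces(sysfs_net=Path("/sys/class/net")):
    """Map interface name -> PCI address for NICs bound to the i40e driver."""
    found = {}
    for iface in sorted(Path(sysfs_net).iterdir()):
        driver = iface / "device" / "driver"
        if not driver.exists():
            continue  # virtual interfaces such as lo have no backing device
        if Path(os.path.realpath(driver)).name != "i40e":
            continue
        # The resolved "device" link ends in the PCI address, e.g. 0000:3b:00.0
        found[iface.name] = Path(os.path.realpath(iface / "device")).name
    return found


def lldp_stop_targets(interfaces):
    """Build (path, payload) pairs; writing them is left to the caller."""
    return [(I40E_DEBUGFS / pci / "command", "lldp stop\n")
            for pci in sorted(set(interfaces.values()))]
```

Deduplicating by PCI address avoids issuing the command twice when more than one interface name maps to the same device.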
Sep 14 2021
Sep 13 2021
mx2001 is now filtered on the routers, in case there are any issues, this can be reverted by merging https://gerrit.wikimedia.org/r/720783 and running 'homer "cr*" merge' on cumin2002.
Not sure why restbase is ticked off, though? The restbase hosts in production currently run nodejs 6.11 still.
@MoritzMuehlenhoff I created the new sre-admins LDAP group manually, as I couldn't see a Puppet way to do it. Pinging in case I missed something.
Sep 10 2021
Bullseye is out and there is no rsyslog-kubernetes in it. Maybe we could start working with upstream to get it into unstable first and possibly into backports?
As mentioned in the task description, Debian backported the fix for OpenSSL, as can be seen in a current Debian Jessie container:
root@69310d82543d:~# cat /etc/debian_version
8.11
root@69310d82543d:~# openssl version
OpenSSL 1.0.1t 3 May 2016
root@69310d82543d:~# openssl verify -CAfile rsa-2048.chain.crt rsa-2048.crt
rsa-2048.crt: OK
root@69310d82543d:~# openssl x509 -dates -noout -in rsa-2048.crt
notBefore=May 10 13:15:07 2021 GMT
notAfter=Aug 8 13:15:07 2021 GMT
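The notBefore/notAfter lines printed by `openssl x509 -dates -noout` can also be checked from a script, for example to confirm that a test certificate is inside its validity window at a given time. A minimal sketch using the standard library's `ssl.cert_time_to_seconds`; `parse_dates` and `valid_at` are hypothetical helper names, and this only checks the leaf's own dates, not chain building.

```python
import ssl


def parse_dates(output):
    """Parse `openssl x509 -dates -noout` output into (notBefore, notAfter) epoch seconds."""
    dates = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in ("notBefore", "notAfter"):
            # Accepts the "Mon DD HH:MM:SS YYYY GMT" format openssl prints.
            dates[key] = ssl.cert_time_to_seconds(value.strip())
    return dates["notBefore"], dates["notAfter"]


def valid_at(output, when):
    """True if the epoch timestamp `when` falls inside the certificate's validity window."""
    not_before, not_after = parse_dates(output)
    return not_before <= when <= not_after
```

With the output above, a timestamp in June 2021 falls inside the window, while one in September 2021 does not.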
If it's not too much trouble, it would be nice if cumin2001 could have a MOTD pointing you to cumin2002. If you accidentally log into cumin2001 you'll end up trying to run cookbooks that haven't been updated since May :/
Sep 9 2021
Adding this functionality goes a little beyond the scope of the logout.d scripts, I think. Right now running these scripts is fully idempotent, and every logout action really only logs out, while this would actually modify account state.