Ensure Cloud Services platforms will accept new LE issuance chain
Closed, Resolved, Public

Description

See parent task for more details, but in short we need to make sure SSL libraries are up to date on our platforms.

  • Cloud VPS (should be handled by unattended upgrades)
  • Toolforge containers
  • PAWS user container

Event Timeline

The expected package versions are:
  • openssl1.0: 1.0.2u-1~deb9u5
  • gnutls28: 3.5.8-5+deb9u6
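Checking a host against these minimums needs Debian's version ordering, not plain string comparison: for example `~` sorts before everything, even the end of the string, so `1~deb9u5` is older than `1`. On a real system `dpkg --compare-versions` does this. The following is a minimal Python sketch of the same ordering, for illustration only; the helper names (`debian_version_cmp`, `meets_minimum`) and the `EXPECTED` mapping are hypothetical, and the installed version strings would have to be obtained separately (e.g. from `dpkg-query`).

```python
def _order(c: str) -> int:
    """Sort weight of a single character in dpkg's non-digit comparison."""
    if c == "" or c.isdigit():
        return 0            # end of string or start of a digit run
    if c == "~":
        return -1           # tilde sorts before anything, even the end of the string
    if c.isalpha():
        return ord(c)       # letters sort before other symbols
    return ord(c) + 256     # '+', '.', '-' and friends sort after letters


def _at(s: str, k: int) -> str:
    """Character at position k, or '' past the end."""
    return s[k] if k < len(s) else ""


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream-version or revision strings following dpkg's rules."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Non-digit prefix, compared character by character.
        while (_at(a, i) and not _at(a, i).isdigit()) or (_at(b, j) and not _at(b, j).isdigit()):
            ac, bc = _order(_at(a, i)), _order(_at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2. Following digit run, compared numerically (leading zeros ignored).
        while _at(a, i) == "0":
            i += 1
        while _at(b, j) == "0":
            j += 1
        first_diff = 0
        while _at(a, i).isdigit() and _at(b, j).isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _at(a, i).isdigit():
            return 1        # a's number has more significant digits, so it is larger
        if _at(b, j).isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def debian_version_cmp(v1: str, v2: str) -> int:
    """Return <0, 0 or >0 as v1 is older than, equal to, or newer than v2."""
    def split(v: str):
        epoch, sep, rest = v.partition(":")          # epoch is before the first ':'
        if not sep:
            epoch, rest = "0", v
        upstream, sep, revision = rest.rpartition("-")  # revision is after the last '-'
        if not sep:
            upstream, revision = rest, ""
        return int(epoch), upstream, revision

    e1, u1, r1 = split(v1)
    e2, u2, r2 = split(v2)
    if e1 != e2:
        return e1 - e2
    return _verrevcmp(u1, u2) or _verrevcmp(r1, r2)


# Hypothetical mapping of the minimum versions quoted in this task.
EXPECTED = {
    "openssl1.0": "1.0.2u-1~deb9u5",
    "gnutls28": "3.5.8-5+deb9u6",
}


def meets_minimum(package: str, installed: str) -> bool:
    """Hypothetical helper: is the installed version at least the expected one?"""
    return debian_version_cmp(installed, EXPECTED[package]) >= 0
```

For example, `meets_minimum("gnutls28", "3.5.8-5+deb9u10")` is true even though `"deb9u10" < "deb9u6"` as plain strings.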

We will probably need to do something like T194665: Provide an up-to-date mono environment on toolforge to get a Mono that works. h/t @Legoktm for the link to the earlier task.

The Jessie-based containers in Toolforge are likely all broken by the change and not worth the effort to fix. I was able to "fix" T292243: POI/marker disappeared on Wikivoyage maps generated on Toolforge by moving it from the php5.6 container to a php7.4 container.

bd808 triaged this task as High priority. Sep 30 2021, 8:30 PM

Jessie container use in Toolforge as of 2021-09-30T20:40Z (from https://k8s-status.toolforge.org/images/):

image               active pods
node6-sssd-base               1
node6-sssd-web               13
php5-sssd-web               216
python2-sssd-base             4
python2-sssd-web             24
python34-sssd-web            88
ruby21-sssd-web               2
TOTAL                       348

@aborrero Do you have time to do the needful for this? We need a Mono on the bastions and grid that can handle the LE trust changes. I'm not sure what version that will be, but I'd hope it is mostly a matter of which TLS library it is linked against.

Alternatively, we could try using cert-sync to just update the trust store (https://www.mono-project.com/docs/faq/security/).

Re: Mono: if there turns out to be no good option within Mono itself, you could also configure a generic outbound HTTPS proxy on the same host, i.e. a proxy whose own HTTPS implementation works.

I've gotten reports of users (two students in Wiki Education courses) getting an invalid certificate error when visiting en.wikipedia.org. Is that perhaps caused by gadget traffic going to Toolforge while the user is on en.wikipedia.org, or are there also related certificate problems on Wikipedia itself?

Yes, the production wikis also use Let's Encrypt certificates at some of our edge servers. That is actually the cause of most of the Toolforge/Cloud VPS issues (connecting to the content wikis). See https://meta.wikimedia.org/wiki/HTTPS/2021_Let%27s_Encrypt_root_expiry for information about expected problems (which, funnily enough, you will have a hard time seeing if you are affected).
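Whether a particular client is affected can be tested directly by attempting a fully verified TLS handshake from it. Below is a minimal sketch using only Python's standard library (3.7+ for `ssl.SSLCertVerificationError`); the function names, default port and timeout are illustrative assumptions, and the result reflects only the trust store and OpenSSL build of the Python process running it, not a browser on the same machine.

```python
import socket
import ssl
from typing import Tuple


def make_verifying_context() -> ssl.SSLContext:
    """Default client context: system trust store, certificate and hostname checks on."""
    return ssl.create_default_context()


def probe(host: str, port: int = 443, timeout: float = 10.0) -> Tuple[bool, str]:
    """Try a verified TLS handshake with host:port and return (ok, detail)."""
    context = make_verifying_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # The issuer is a tuple of RDNs, each a tuple of (key, value) pairs.
                issuer = dict(rdn[0] for rdn in tls.getpeercert()["issuer"])
                return True, "verified; issuer CN=%s" % issuer.get("commonName", "?")
    except ssl.SSLCertVerificationError as exc:
        # Chain could not be validated, typically "certificate has expired"
        # when the client insists on building the chain to the expired root.
        return False, exc.verify_message
    except (ssl.SSLError, OSError) as exc:
        return False, "connection/TLS error: %s" % exc
```

For example, `probe("en.wikipedia.org")` run from a Toolforge container or Cloud VPS instance returns `(True, ...)` when the chain verifies and `(False, <reason>)` otherwise.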

Thanks, that's helpful. So this is probably breaking Wikipedia for like ~2% of Mac OS users, ~0.3% of overall desktop users? Yikes! (Plus about that many who are still on even older versions, which I guess were already incompatible with Wikipedia.)

Change 725486 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/docker-images/toollabs-images@master] openssl: update stretch container TLS libraries before using LE certs

https://gerrit.wikimedia.org/r/725486

Change 725486 merged by jenkins-bot:

[operations/docker-images/toollabs-images@master] openssl: update stretch container TLS libraries before using LE certs

https://gerrit.wikimedia.org/r/725486

Mentioned in SAL (#wikimedia-cloud) [2021-10-03T21:29:13Z] <bstorm> rebuilt stretch containers for potential issues with LE cert updates T291387

Mentioned in SAL (#wikimedia-cloud) [2021-10-03T21:30:58Z] <bstorm> rebuilding buster containers since they are also affected T291387 T292355

I should have said "may be affected" for buster.

Yeah, it is affected if the Buster image was created before the Buster 10.10 point release (i.e. before mid-June); see https://phabricator.wikimedia.org/T283165#7365637
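For a Buster-based image, one quick way to tell whether it predates the 10.10 point release is to read `/etc/debian_version`. As strings, `"10.9" > "10.10"`, so the release has to be compared numerically. A minimal sketch; the function names are hypothetical, and since the file comes from the installed `base-files` package, an image that was built early but upgraded later will report the newer point release.

```python
from pathlib import Path
from typing import Tuple


def parse_debian_version(text: str) -> Tuple[int, ...]:
    """Turn e.g. '10.9' into (10, 9). Raises ValueError for values like 'bullseye/sid'."""
    return tuple(int(part) for part in text.strip().split("."))


def predates_point_release(text: str, minimum: Tuple[int, ...] = (10, 10)) -> bool:
    """True if the Debian release in `text` is older than `minimum` (tuple comparison)."""
    return parse_debian_version(text) < minimum


def image_predates_buster_10_10(path: str = "/etc/debian_version") -> bool:
    """Hypothetical check for a Buster image: was it built before 10.10?"""
    return predates_point_release(Path(path).read_text())
```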

Marking Toolforge containers as done, since there is no hope for the Jessie containers.

Hi there, just wanted to share that I worked around this issue in the py2 web situation by switching to PyOpenSSL, which brings along a newer version of OpenSSL. The changes were pretty minimal and can be seen here: https://github.com/hatnote/montage/commit/1be5d09ff5b80e2a57eb71802096fc1fcb98e60f

More papertrail available here.
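For code that goes through `requests`/`urllib3`, urllib3 ships an opt-in module, `urllib3.contrib.pyopenssl`, whose `inject_into_urllib3()` makes urllib3 use pyOpenSSL instead of the interpreter's `ssl` module; pyOpenSSL in turn uses the OpenSSL that the `cryptography` package brings along. This is one common way to do what is described above, not necessarily what the linked commit does. The sketch assumes `urllib3` and `pyOpenSSL` may or may not be installed and falls back to the built-in `ssl` module if they are missing; the function name is hypothetical.

```python
def enable_pyopenssl() -> bool:
    """Route urllib3 (and therefore requests) TLS through pyOpenSSL if available.

    Returns True if the injection happened, False if urllib3 or pyOpenSSL is
    missing, in which case the interpreter's own ssl module stays in use.
    """
    try:
        import urllib3.contrib.pyopenssl as pyopenssl
    except ImportError:
        return False
    pyopenssl.inject_into_urllib3()
    return True


if __name__ == "__main__":
    print("pyOpenSSL injected:", enable_pyopenssl())
```

The call belongs early at startup, before any sessions or connection pools are created. On current Python versions, whose `ssl` module is usually linked against a modern OpenSSL, this workaround is generally unnecessary.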

A technical detail that may help: the Python on the Jessie image we were using was linked against OpenSSL 1.0.0 even though 1.0.2 was available, and openssl-dev appears to have been removed from the Wikimedia apt repo, so rebuilding against the newer OpenSSL was nontrivial.
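To see which OpenSSL a given interpreter's built-in `ssl` module is linked against, `ssl.OPENSSL_VERSION` (a string) and `ssl.OPENSSL_VERSION_INFO` (a five-integer tuple) report it; both exist on Python 2.7 and 3. A small sketch; the function names are hypothetical and the 1.1.0 threshold is an assumption for illustration: 1.1.0 and later handle the expired-root chain by default, but patched distribution builds of 1.0.2 (such as the Stretch version listed at the top of this task) can also work.

```python
import ssl
from typing import Tuple


def linked_openssl() -> Tuple[str, Tuple[int, ...]]:
    """Version string and (major, minor, fix, patch, status) tuple of the linked OpenSSL."""
    return ssl.OPENSSL_VERSION, tuple(ssl.OPENSSL_VERSION_INFO)


def at_least(info: Tuple[int, ...], minimum: Tuple[int, ...] = (1, 1, 0)) -> bool:
    """Compare only as many leading fields as `minimum` specifies."""
    return tuple(info[: len(minimum)]) >= minimum


if __name__ == "__main__":
    text, info = linked_openssl()
    print(text, "->", "1.1.0 or newer" if at_least(info) else "older than 1.1.0")
```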

Also surfacing a note on the workaround: it breaks some of requests' timeout behavior, so if you rely on the timeout parameter in requests, you may see system errors (EAGAIN) instead of the behavior you expect. Hope this helps!
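If code relied on the `timeout` parameter in requests and now sees raw `EAGAIN` errors instead, one defensive option is to treat those like a timeout at the call site. A minimal standard-library sketch; the helper name is hypothetical, and in real code it would sit alongside the existing handling of `requests.exceptions.Timeout` rather than replace it. It also follows `__cause__`/`__context__`, on the assumption that the `EAGAIN` may arrive wrapped in another exception.

```python
import errno
import socket

# errno values meaning "no data yet, try again" rather than a hard failure.
_RETRYABLE_ERRNOS = {errno.EAGAIN, errno.EWOULDBLOCK}


def is_timeout_like(exc: BaseException) -> bool:
    """True for socket timeouts and for EAGAIN/EWOULDBLOCK errors, including
    when such an error is the cause or context of another exception."""
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if isinstance(current, socket.timeout):
            return True
        if isinstance(current, OSError) and current.errno in _RETRYABLE_ERRNOS:
            return True
        current = current.__cause__ or current.__context__
    return False
```

At a call site this would go in an `except` clause next to the existing timeout handling, e.g. retrying or reporting a timeout when `is_timeout_like(exc)` is true.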

That is a neat hack. Please do be aware that Python 2.7 was the last ever release of the 2.x series and it reached end of life upstream on 2020-01-01 (~22 months ago). Chances are really, really good that things like this will keep breaking for your Python 2.x projects. See https://docs.python.org/3/howto/pyporting.html for tips on porting py2 code to py3.

You're tellin me! 😭 I've got a branch or five dotted about, we'll get to 3 soon.

Hi Mahmoud,
we do have a -dev package for OpenSSL 1.1 on jessie, but it's called libssl11-dev, not libssl-dev. For background: Jessie originally only provided OpenSSL 1.0.2, but back then we needed OpenSSL 1.1 to provide modern crypto for our TLS terminators. Since the API changed between OpenSSL 1.0 and 1.1, we had to provide a separate -dev package so that we could opt selected applications in to OpenSSL 1.1.