
Replace deployment-maps-master01 with a Bullseye or Bookworm instance
Closed, Resolved, Public

Description

Buster support ends in a few months. Maps prod hosts are still running on Buster, so let's take this chance to do the upgrade in deployment-prep first.

Event Timeline

Andrew added a subscriber: hnowlan.

@hnowlan, a glance at the puppet repo suggests that you're the person most likely to rebuild these servers. Please re-assign this task as you see fit!

@hnowlan I see you have created deployment-maps-master02. Other than possibly replacing the old master in https://github.com/wikimedia/maps-kartotherian-deploy/blob/master/scap/environments/beta/targets, is there anything needed before deleting master01?

It appears there have been issues imaging the new host, which are blocking the rebuild. I'm afraid I can't really commit much time to this at the moment, as even getting a shell on the new host seems to be a problem.

Puppet is failing on the new host because of:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: No matching entry for selector parameter with value 'bookworm' (file: /srv/puppet_code/environments/production/modules/postgresql/manifests/postgis.pp, line: 17, column: 35) on node deployment-maps-master02.deployment-prep.eqiad1.wikimedia.cloud

Errors like that should be visible on the 'log' tab in horizon.

bd808 subscribed.

Not sure why @Andrew closed this as resolved.

bd808@deployment-maps-master02:~$ sudo -i puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: No matching entry for selector parameter with value 'bookworm' (file: /srv/puppet_code/environments/production/modules/postgresql/manifests/postgis.pp, line: 17, column: 35) on node deployment-maps-master02.deployment-prep.eqiad1.wikimedia.cloud
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

The ::postgresql::postgis puppet module does not seem to support anything other than buster because of this declaration:

class postgresql::postgis(
    $ensure = 'present',
    $postgresql_postgis_package = $::lsbdistcodename ? {
        'buster' => 'postgresql-11-postgis-3',
    },
) {
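
For readers less familiar with Puppet selectors, the failure mode can be sketched in shell terms: a selector with no `default` branch behaves like a `case` statement that aborts on any unmatched value. The helper name `postgis_package_for` is hypothetical, and the bookworm mapping is the value later set as a Prefix Puppet override in this task, not something taken from the module itself.

```shell
#!/bin/sh
# Hypothetical helper mirroring the selector in postgresql::postgis.
# Only the two release/package pairs mentioned in this task are listed.
postgis_package_for() {
    case "$1" in
        buster)   echo 'postgresql-11-postgis-3' ;;
        bookworm) echo 'postgresql-15-postgis-3' ;;
        *)
            # Unlike the Puppet selector, fail with a readable message.
            echo "no postgis package known for release '$1'" >&2
            return 1
            ;;
    esac
}

postgis_package_for buster
postgis_package_for bookworm
postgis_package_for unknown-release || echo 'unmatched release handled explicitly'
```

The original declaration only has the `buster` branch, so any other release hits the equivalent of the `*)` case and catalog compilation fails.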

Mentioned in SAL (#wikimedia-releng) [2025-01-06T19:27:07Z] <bd808> Added postgresql::postgis::postgresql_postgis_package: ignored to deployment-maps Prefix Puppet to work around default parameter problem (T361381)

Mentioned in SAL (#wikimedia-releng) [2025-01-06T19:31:47Z] <bd808> Issued new Puppet cert for deployment-maps-master02.deployment-prep.eqiad1.wikimedia.cloud (T361381)

Mentioned in SAL (#wikimedia-releng) [2025-01-06T19:32:14Z] <bd808> Added postgresql::postgis::postgresql_postgis_package: postgresql-15-postgis-3 to deployment-maps Prefix Puppet to work around default parameter problem (T361381)

Mentioned in SAL (#wikimedia-releng) [2025-01-06T19:35:21Z] <bd808> Manually generated missing en_US.UTF-8 locale on deployment-maps-master02.deployment-prep.eqiad1.wikimedia.cloud (T361381)

$ sudo -i puppet agent -tv
...
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libmapnik3.0' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libmapnik3.0
E: Couldn't find any package by glob 'libmapnik3.0'
E: Couldn't find any package by regex 'libmapnik3.0'
Error: /Stage[main]/Kartotherian/Package[libmapnik3.0]/ensure: change from 'purged' to 'present' failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libmapnik3.0' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libmapnik3.0
E: Couldn't find any package by glob 'libmapnik3.0'
E: Couldn't find any package by regex 'libmapnik3.0'
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install osmborder' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package osmborder
Error: /Stage[main]/Osm/Package[osmborder]/ensure: change from 'purged' to 'present' failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install osmborder' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package osmborder
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install swift' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
swift is already the newest version (2.33.0-7~bpo12+1).
The following package was automatically installed and is no longer required:
  python3-debconf
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 102 not upgraded.
2 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up python3-swift (2.33.0-7~bpo12+1) ...
dpkg: error processing package python3-swift (--configure):
 installed python3-swift package post-installation script subprocess returned error exit status 6
dpkg: dependency problems prevent configuration of swift:
 swift depends on python3-swift (= 2.33.0-7~bpo12+1); however:
  Package python3-swift is not configured yet.

dpkg: error processing package swift (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 python3-swift
 swift
E: Sub-process /usr/bin/dpkg returned an error code (1)
Error: /Stage[main]/Profile::Maps::Osm_master/Package[swift]/ensure: change from 'absent' to 'present' failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install swift' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
swift is already the newest version (2.33.0-7~bpo12+1).
The following package was automatically installed and is no longer required:
  python3-debconf
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 102 not upgraded.
2 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up python3-swift (2.33.0-7~bpo12+1) ...
dpkg: error processing package python3-swift (--configure):
 installed python3-swift package post-installation script subprocess returned error exit status 6
dpkg: dependency problems prevent configuration of swift:
 swift depends on python3-swift (= 2.33.0-7~bpo12+1); however:
  Package python3-swift is not configured yet.

dpkg: error processing package swift (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 python3-swift
 swift
E: Sub-process /usr/bin/dpkg returned an error code (1) (corrective)
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install python3-maps-deduped-tilelist' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package python3-maps-deduped-tilelist
Error: /Stage[main]/Profile::Maps::Osm_master/Package[python3-maps-deduped-tilelist]/ensure: change from 'purged' to 'present' failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install python3-maps-deduped-tilelist' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package python3-maps-deduped-tilelist
bd808 moved this task from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.

This is still happening, and I'm currently struggling to figure out how to get verbose errors to explain the error exit status 6 result:

dpkg: error processing package python3-swift (--configure):
 installed python3-swift package post-installation script subprocess returned error exit status 6

The issue is this bit in the post install script:

if ! getent passwd ${VAR_UG_PKG_NAME} > /dev/null 2>&1 ; then
        adduser --system \
                --home /var/lib/${VAR_UG_PKG_NAME} \
                --no-create-home \
                --quiet \
                --disabled-password \
                --shell ${VAR_UG_SHELL} \
                --group ${VAR_UG_PKG_NAME} ${ADDUSER_PARAM}
else
        usermod \
                --shell ${VAR_UG_SHELL} \
                ${VAR_UG_PKG_NAME} >/dev/null 2>&1
fi

It tries to usermod the 'swift' service user, but since our swift user is managed by LDAP, that command fails.
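
A quick way to confirm this kind of situation is to compare what `getent` resolves with what is in the local passwd file. The sketch below uses a hypothetical helper, `user_source`, and assumes that a user visible to `getent` but absent from `/etc/passwd` comes from another NSS source such as LDAP. The usermod man page lists exit status 6 as "specified user doesn't exist", which would be consistent with usermod only looking at local accounts, though that link is an inference rather than something verified in this task.

```shell
#!/bin/sh
# user_source NAME [PASSWD_FILE]
# Prints "local" if NAME has an entry in PASSWD_FILE (default /etc/passwd),
# "nss-only" if only getent can resolve it (for example via LDAP),
# and "missing" if neither lookup finds it.
user_source() {
    name=$1
    passwd_file=${2:-/etc/passwd}
    if grep -q "^${name}:" "$passwd_file" 2>/dev/null; then
        echo local
    elif getent passwd "$name" >/dev/null 2>&1; then
        echo nss-only
    else
        echo missing
    fi
}

user_source swift
```

On a host where the swift account exists only in LDAP, this would be expected to report `nss-only`, which is the case the postinst script's `else` branch does not handle.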

At first blush, I think that usermod is just wrong -- since when does installing a debian package modify an already-existing user on the system? Is that... normal?

> I'm currently struggling to figure out how to get verbose errors to explain the error exit status 6 result

By the way, I also couldn't get any logging at all from this package. I only found the issue by extracting the postinstall script from the package and adding set -x at the top, and even then it didn't report any actual error messages.
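
The tracing approach described above can be sketched roughly as follows. The helper `add_trace` and the file names in the usage comments are hypothetical; `dpkg-deb -e` is the standard way to extract a package's maintainer scripts. Note that the usermod call in the script redirects its output to /dev/null, which likely explains why even a traced run shows the command but no error text.

```shell
#!/bin/sh
# add_trace SCRIPT: print SCRIPT with `set -x` inserted after its first
# line (normally the shebang), leaving the original file untouched.
add_trace() {
    head -n 1 "$1"
    echo 'set -x'
    tail -n +2 "$1"
}

# Hypothetical usage against a downloaded package (names are examples only):
#   dpkg-deb -e python3-swift_VERSION_all.deb control/   # extract maintainer scripts
#   add_trace control/postinst > postinst.traced
#   sudo sh postinst.traced configure
```

Running a maintainer script by hand like this is only an approximation of what dpkg does during configuration, so the trace is best treated as a debugging aid.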

> At first blush, I think that usermod is just wrong -- since when does installing a debian package modify an already-existing user on the system? Is that... normal?

Seems pretty abnormal to me, but I'm not a DM/DD.

I guess the next question is how to get past this bit of WTF so that deployment-maps-master02 can actually complete a puppet run. We have the user in LDAP: https://ldap.toolforge.org/user/swift

bd808@deployment-maps-master02:~$ getent passwd swift
swift:*:45282:500:Swift:/home/swift:/bin/bash

I decided to really hack around it for the moment by manually adjusting the postinst script and then configuring the package:

bd808@deployment-maps-master02:~/T361381-python3-swift$ apt-get download python3-swift
Get:1 http://mirrors.wikimedia.org/osbpo bookworm-dalmatian-backports/main amd64 python3-swift all 2.34.0-5~bpo12+1 [733 kB]
Fetched 733 kB in 0s (12.2 MB/s)
bd808@deployment-maps-master02:~/T361381-python3-swift$ ls
python3-swift_2.34.0-5~bpo12+1_all.deb
bd808@deployment-maps-master02:~/T361381-python3-swift$ sudo dpkg --unpack python3-swift_2.34.0-5~bpo12+1_all.deb
(Reading database ... 55967 files and directories currently installed.)
Preparing to unpack python3-swift_2.34.0-5~bpo12+1_all.deb ...
Unpacking python3-swift (2.34.0-5~bpo12+1) over (2.34.0-5~bpo12+1) ...
bd808@deployment-maps-master02:~/T361381-python3-swift$ cp /var/lib/dpkg/info/python3-swift.postinst .
bd808@deployment-maps-master02:~/T361381-python3-swift$ sudo vim /var/lib/dpkg/info/python3-swift.postinst
  # commented out the failing bit of the script. see diff below
bd808@deployment-maps-master02:~/T361381-python3-swift$ diff -uw python3-swift.postinst /var/lib/dpkg/info/python3-swift.postinst
--- python3-swift.postinst      2025-05-12 20:47:32.049053356 +0000
+++ /var/lib/dpkg/info/python3-swift.postinst   2025-05-12 20:49:42.229574337 +0000
@@ -892,9 +892,11 @@
                        --shell ${VAR_UG_SHELL} \
                        --group ${VAR_UG_PKG_NAME} ${ADDUSER_PARAM}
        else
-               usermod \
-                       --shell ${VAR_UG_SHELL} \
-                       ${VAR_UG_PKG_NAME} >/dev/null 2>&1
+               # bd808: hack for https://phabricator.wikimedia.org/T361381
+               # usermod \
+               #       --shell ${VAR_UG_SHELL} \
+               #       ${VAR_UG_PKG_NAME} >/dev/null 2>&1
+               echo "Not messing with the ${VAR_UG_PKG_NAME} user"
        fi
 }
bd808@deployment-maps-master02:~/T361381-python3-swift$ sudo dpkg --configure python3-swift
Setting up python3-swift (2.34.0-5~bpo12+1) ...
Not messing with the swift user
bd808@deployment-maps-master02:~/T361381-python3-swift$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-maps-master02.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(66c4f85d44) gitpuppet - [LOCAL HACK] Hack mw-cli-wrapper to work without conftool'
Notice: /Stage[main]/Profile::Maps::Osm_master/Package[swift]/ensure: created (corrective)
Notice: /Stage[main]/Osm::Imposm3/Systemd::Service[imposm]/Service[imposm]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Osm::Imposm3/Systemd::Service[imposm]/Service[imposm]: Unscheduling refresh on Service[imposm]
Notice: Applied catalog in 12.15 seconds

I have a feeling that everything Maps-related is still busted in deployment-prep, but Puppet is at least running cleanly on deployment-maps-master02.deployment-prep.eqiad1.wikimedia.cloud for the first time in months (possibly ever). Things will blow up again when and if a new python3-swift package that wants to mess with the swift user in post-install arrives.
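
One possible way to notice when a package upgrade has replaced the hand-edited script is to look for an uncommented usermod call in the installed postinst. This is only a sketch: the helper name `has_active_usermod` is hypothetical, and it assumes the upstream script keeps calling usermod at the start of a line, as in the diff above.

```shell
#!/bin/sh
# has_active_usermod FILE: succeed if FILE contains a usermod call on a
# line that is not commented out.
has_active_usermod() {
    grep -Eq '^[[:space:]]*usermod([[:space:]]|$)' "$1"
}

postinst=/var/lib/dpkg/info/python3-swift.postinst
if [ -r "$postinst" ] && has_active_usermod "$postinst"; then
    echo "python3-swift postinst calls usermod again; the workaround is gone"
fi
```

A check like this does not prevent the breakage; it only makes the cause quicker to identify the next time Puppet fails on the swift package.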