Debian Buster is well out of upstream support and all Buster VMs need to be replaced.
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| LabsServices: convert more services to svc records | operations/mediawiki-config | master | +2 -2 |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Andrew | T327742 Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm |
| Resolved | | Southparkfan | T370461 Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) |
Event Timeline
Great, can't ssh into my new instance:
$ ssh deployment-sessionstore05.deployment-prep.eqiad1.wikimedia.cloud
Connection closed by UNKNOWN port 65535
Mentioned in SAL (#wikimedia-cloud) [2024-07-20T15:52:31Z] <Southparkfan> add deployment-sessionstore05 (bookworm) - T370461
I'm not sure what went wrong here, but I forced a puppet run and switched this over to the deployment-prep puppetserver, so it will most likely accept your keys now.
Had to delete sessionstore05 (bookworm) due to T357791; will replace it with a bullseye instance for Cassandra.
Puppet fails to install the Cassandra instance:
Error: 'install -o cassandra -g cassandra -m 750 -d /var/lib/cassandra/data' returned 1 instead of one of [0]
Error: /Stage[main]/Cassandra/Cassandra::Instance[default]/Exec[install-/var/lib/cassandra/data]/returns: change from 'notrun' to ['0'] failed: 'install -o cassandra -g cassandra -m 750 -d /var/lib/cassandra/data' returned 1 instead of one of [0] (corrective)
Error: 'install -o cassandra -g cassandra -m 750 -d /var/lib/cassandra/data' returned 1 instead of one of [0]
Error: /Stage[main]/Cassandra/Cassandra::Instance[default]/Exec[install-/var/lib/cassandra/data]/returns: change from 'notrun' to ['0'] failed: 'install -o cassandra -g cassandra -m 750 -d /var/lib/cassandra/data' returned 1 instead of one of [0] (corrective)
The user 'cassandra' does not exist. Asked for help in #wikimedia-sre.
I didn't get a response in -sre, but Andrew has provided me with extra information.
GID/UID on sessionstore1004:
uid=114(cassandra) gid=121(cassandra) groups=121(cassandra)
GID/UID on sessionstore04:
uid=115(cassandra) gid=122(cassandra) groups=122(cassandra)
Added user/group manually:
groupadd -g 122 cassandra
useradd cassandra -u 115 -r -s /sbin/nologin -d /var/lib/cassandra -g 122
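For anyone repeating this step, here is a minimal sketch of the same user/group creation made safe to re-run. It only prints the commands that are still needed, so nothing changes until the output is run as root. The function name `plan_cassandra_user` is made up for illustration; the IDs are the ones from sessionstore04 above.

```shell
# Print the groupadd/useradd commands still needed to create the cassandra
# group and user with fixed IDs. Hypothetical helper: it only prints, it does
# not modify the system. Run the printed commands as root.
plan_cassandra_user() {
    uid="$1"
    gid="$2"
    # Skip the group if it already exists.
    if ! getent group cassandra >/dev/null 2>&1; then
        echo "groupadd -g $gid cassandra"
    fi
    # Skip the user if it already exists.
    if ! getent passwd cassandra >/dev/null 2>&1; then
        echo "useradd cassandra -u $uid -r -s /sbin/nologin -d /var/lib/cassandra -g $gid"
    fi
}

# UID 115 / GID 122 match the values seen on sessionstore04.
plan_cassandra_user 115 122
```

On a host where the user and group already exist, this prints nothing.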
It failed to install cassandra 3.11.14:
Error: /Stage[main]/Cassandra/Apt::Package_from_component[cassandra]/Package[cassandra]/ensure: change from 'purged' to '3.11.14' failed: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install cassandra=3.11.14' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Version '3.11.14' for 'cassandra' was not found
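A quick way to confirm this kind of failure before running puppet again is to check which versions apt actually offers. The sketch below filters `apt-cache madison` output for an exact version. The helper name `has_version` is hypothetical, and it assumes madison's usual `package | version | source` column layout.

```shell
# Succeed if the apt-cache madison output on stdin lists the wanted version.
# Hypothetical helper; assumes the "package | version | source" layout.
has_version() {
    awk -F'|' -v want="$1" '
        { gsub(/[ \t]/, "", $2); if ($2 == want) found = 1 }
        END { exit !found }
    '
}

# Usage on the affected host (not run here):
#   apt-cache madison cassandra | has_version 3.11.14 || echo "3.11.14 not available"
```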
Per T313814, we should be using cassandra 4.x. For deployment-prep, this was done in https://gerrit.wikimedia.org/r/c/operations/puppet/+/939750. However, sessionstore was left behind on 3.x. The component cassandra41 is in buster-wikimedia and bullseye-wikimedia, so I'll see if I can get the Buster machine upgraded to cassandra 4.x, before migrating to a Bullseye machine where 4.x is a mandatory choice.
Mentioned in SAL (#wikimedia-cloud) [2024-07-23T15:34:41Z] <Southparkfan> starting kask maintenance - T370461
Couldn't upgrade Buster to 4.x, because there are no packages in buster-wikimedia. Installing Cassandra was a rather interesting process.
The bootstrap failed:
Error: Execution of '/usr/bin/scap deploy-local --repo cassandra/logstash-logback-encoder -D log_json:False' returned 70:
Error: /Stage[main]/Cassandra::Logging/Scap::Target[cassandra/logstash-logback-encoder]/Package[cassandra/logstash-logback-encoder]/ensure: change from 'absent' to 'present' failed: Execution of '/usr/bin/scap deploy-local --repo cassandra/logstash-logback-encoder -D log_json:False' returned 70: [...]
scap.runcmd.FailedCommand: Command 'git remote set-url origin http://deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud/cassandra/logstash-logback-encoder/.git' failed with exit code 128; stdout: stderr: error: could not lock config file .git/config: Permission denied
Fixed manually by running:
root@deployment-sessionstore06:/srv/deployment# chown -R deploy-service cassandra/
root@deployment-sessionstore06:/srv/deployment# sudo -u deploy-service scap deploy-local --repo cassandra/logstash-logback-encoder -D log_json:False
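The "could not lock config file" error points at files in the repo that the deploy user cannot write to. Here is a small sketch for checking that the `chown` covered everything before re-running scap. The helper name `not_owned_by` is made up for illustration.

```shell
# List every path under a directory that is NOT owned by the given user
# (name or numeric UID). Empty output means the whole tree is owned by them.
# Hypothetical helper for illustration.
not_owned_by() {
    find "$2" ! -user "$1" -print
}

# Usage on the host from this task (not run here):
#   cd /srv/deployment && not_owned_by deploy-service cassandra/
```

Any path it prints would still trigger the same "Permission denied" when scap runs as deploy-service.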
Cassandra failed to start properly because /etc/cassandra/service-enabled was missing; I had to touch this file.
Afterwards, Kask started to complain about keyspaces:
Jul 23 16:41:32 deployment-sessionstore06 docker-mediawiki-services-kask[92037]: {"msg":"error: failed to connect to \"[HostInfo hostname=\\\"172.16.2.> [...] =\\\"v4.1.5\\\" state=UP num_tokens=256]\" due to error: Keyspace 'sessions' does not exist","appname":"sessions","time":"2024-07-23T16:41:32Z","level">
Thought I'd have it fixed by creating the schema:
CQLSH_HOST=172.16.2.225 cqlsh -f cassandra_schema.cql -u cassandra -p cassandra
However, that uses the keyspace kask, not sessions. After a bit of fiddling, I have adjusted the new node's hiera to reflect the new keyspace.
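This mismatch can be spotted before applying a schema by listing the keyspaces the file creates. The sketch below is only an illustration: the helper name `schema_keyspaces` is hypothetical, and it assumes each `CREATE KEYSPACE` statement starts at the beginning of its own line. The contents of cassandra_schema.cql are not shown in this task, so its exact layout is an assumption.

```shell
# Print the keyspace names created by a CQL schema file.
# Hypothetical helper; assumes each CREATE KEYSPACE statement starts a line.
schema_keyspaces() {
    awk 'toupper($1) == "CREATE" && toupper($2) == "KEYSPACE" {
        # Skip over an optional "IF NOT EXISTS" clause.
        name = (toupper($3) == "IF") ? $6 : $3
        gsub(/[";]/, "", name)
        print name
    }' "$1"
}

# Usage (not run here):
#   schema_keyspaces cassandra_schema.cql
```

The output can be compared with the keyspace Kask is configured to use. On a live node, `cqlsh -e "DESCRIBE KEYSPACES"` shows what actually exists.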
And finally: the container has started.
I have run out of time, so I will not be migrating to the new node today.
Mentioned in SAL (#wikimedia-cloud) [2024-07-23T16:55:49Z] <Southparkfan> cancel kask maintenance, not going to perform switchover yet, see https://phabricator.wikimedia.org/T370461
Change #1056513 had a related patch set uploaded (by Southparkfan; author: Southparkfan):
[operations/mediawiki-config@master] LabsServices: convert more services to svc records
Change #1056513 merged by Andrew Bogott:
[operations/mediawiki-config@master] LabsServices: convert more services to svc records
Mentioned in SAL (#wikimedia-cloud) [2024-07-24T16:02:45Z] <Southparkfan> moved sessionstorage/kask from sessionstorage04 to sessionstorage06 T370461