User Details
- User Since: Jun 7 2021, 7:25 AM (252 w, 5 d)
- Availability: Available
- IRC Nick: jelto
- LDAP User: Jelto
- MediaWiki User: JWodstrcil (WMF)
Yesterday
Puppet is happy on the test instance now.
I think this is related to enabling QoS for rsync in production: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1234984
I pointed os-reports to the read-only endpoint in the change above and ran authdns-update. The service was back after a few minutes, so I'll resolve the task.
Thu, Apr 9
Bumping the priority a bit because it's quite confusing to see miscweb Kubernetes alerts firing for people hosts (for example T422819).
os-reports.wikimedia.org is down because of maintenance in the aux-k8s-eqiad cluster. I think this is acceptable even though it should be deployed in both aux clusters (eqiad and codfw).
Wed, Apr 8
The patch above fixed the noisy rsyslog configuration on the devtools test instance; the syslog/journal is much quieter now. I also truncated the big syslog files.
The disk on the gitlab-1002 test instance was full again. I disabled rsyslog on the host temporarily using:
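The exact commands aren't quoted above; a plausible sketch, assuming a systemd host (the service name and log path are standard defaults, not taken from the comment):

```shell
# Sketch only: stop rsyslog so it stops writing to disk for now,
# and reclaim space from the oversized log file.
sudo systemctl stop rsyslog
sudo truncate -s 0 /var/log/syslog

# Later, once the disk pressure is resolved:
sudo systemctl start rsyslog
```

Truncating in place (rather than deleting the file) keeps the inode alive, so any process that still holds the file open keeps working.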
Tue, Apr 7
Tue, Mar 17
The change above fixed the SSH issue: ssh-gitlab.service now listens on the secondary IPv4 and IPv6 addresses properly after the reboot (tested on gitlab2002).
Mon, Mar 16
I'm un-assigning the task from me until we have a decision about how to move forward in April 2026.
This happened due to a reboot in T419960.
Mar 12 2026
Adding collaboration-services for the tcp-proxy hosts.
I can log in and log out normally in Firefox; I can't reproduce the CSRF error.
Mar 10 2026
Mar 4 2026
Thank you @ABran-WMF for merging the DNS change. I can confirm gerrit-replica host is accessible over the CDN now:
Thank you @ABran-WMF for merging the DNS change. I can confirm gerrit-spare host is accessible over the CDN now:
Thank you for the key. You should have access now. I also created a kerberos principal because Jupyter + Hive/Spark was mentioned.
That happened during a CDN incident (https://portal.victorops.com/ui/wikimedia/incident/7705/details). It resolved after 5 minutes. I'll resolve the task.
Mar 3 2026
The access should be available in roughly 30 minutes. I'll resolve the task. Please re-open if you have trouble with the access.
Mar 2 2026
@Milimetric can you ping your manager to sign this access request too?
Unfortunately the provided SSH key cannot be used: you copied the fingerprint, not the public key. Can you update the task with the actual public key? You should find it in ~/.ssh/id_ed25519.pub or ~/.ssh/id_rsa.pub.
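For illustration, the difference between the two (the key filename here is the common default, not necessarily the requester's):

```shell
# The *fingerprint* is a short hash of the key, as printed by ssh-keygen -l.
# It starts with something like "256 SHA256:..." and cannot grant access:
ssh-keygen -lf ~/.ssh/id_ed25519.pub

# The *public key* is the file's contents, a single line starting with
# "ssh-ed25519 AAAA..." (or "ssh-rsa AAAA...") - this is what the task needs:
cat ~/.ssh/id_ed25519.pub
```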
This was a CDN issue and not Gerrit-related. The alert fired twice and recovered after 5 minutes. I'll resolve the task.
Feb 27 2026
This was because of a reboot in T418483.
@MoritzMuehlenhoff all GitLab server hosts are on 18.9.1 and use the new OpenSSL version now:
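One way such a check can look (the embedded path is the standard Omnibus GitLab layout, and an assumption for these hosts):

```shell
# Omnibus GitLab bundles its own OpenSSL under /opt/gitlab/embedded,
# separate from the distribution's copy (standard Omnibus layout; an
# assumption for these hosts):
/opt/gitlab/embedded/bin/openssl version

# For comparison, the system OpenSSL may report a different version:
openssl version
```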
Thank you for the quick sign-off @Khantstop! I'll reach out out of band to confirm the SSH key.
Thanks for the quick feedback. I double-checked the memberships and couldn't find anything in LDAP or shell groups.
Feb 26 2026
thank you @Aklapper
The key is merged into Puppet; you should have access in 30 minutes. I'll resolve the task, please re-open if you have any problems with the access.
@Lucas_Werkmeister_WMDE is still a member of the deployment group. So approval from @thcipriani is not really needed. I'll proceed with this change then.
Thanks for the access request, we have to confirm the key out of band and another approval from @thcipriani is needed for the deployment group.
@MoritzMuehlenhoff or @SLyngshede-WMF this task sounds like an offboarding procedure. Can you check what is needed on our side for that? Do we disable LDAP accounts? The account is cn: AndreiJirohOnDevsCentral
Hi, thank you for the access request.
All hosts updated, I'll resolve the task.
Hi, thanks for opening the access request.
Replicas done, I'll proceed with production.
Test hosts done, I'll proceed with the replicas soon.
I bumped the apt-package for gitlab-ce and gitlab-runner to the newer 18.8 versions. gitlab-runner-helper-images was updated properly as well with the fix in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1244600.
@MoritzMuehlenhoff 18.9.1 finally contains openssl 3.4.4. I'll update the hosts to 18.8.5 first and then to 18.9.1.
Is this documentation on Wikitech up to date with the process? Does the failover consist of just the three steps?
Likely related to T418392?
Feb 25 2026
@MoritzMuehlenhoff for the record, GitLab is still using the old OpenSSL version in the 18.7 patch release:
All hosts updated, I'll resolve the task.
Replicas updated, I'll proceed with the production host.