User Details
- User Since: Jan 23 2023, 12:05 PM
- Availability: Available
- LDAP User: EoghanGaffney
- MediaWiki User: EGaffney-WMF
Wed, May 1
Both hosts have now been reprovisioned with public IPs. Thanks @Arnoldokoth for taking care of lists1004!
Tue, Apr 23
I've enabled this setting and approved @brennen's change to include this in the settings file.
Fri, Apr 19
This is related to T362989
Fri, Apr 12
The switchover of the replicas completed successfully.
Thu, Apr 11
This is related to T361219 and was fixed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1018796
This is related to T361219
Tue, Apr 9
We'll need a maintenance window of around 4 hours, and we'll use most of it.
Apr 8 2024
The documentation for this setting is at https://docs.gitlab.com/ee/security/two_factor_authentication.html#enforce-2fa-for-administrator-users
Mar 30 2024
I reverted this change in this CR since it was running into an issue where it couldn't find /srv/gitlab_backup/*_gitlab_backup.tar. This is because the rsync is run as a systemd unit with an ExecStart which doesn't wrap commands in a shell, so no globbing will work. Alternative solutions are either to wrap it in something like sh -c, or to wrap the whole thing in a shell script which gets executed by the systemd service.
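To illustrate the failure mode: systemd's ExecStart= passes its arguments verbatim, with no shell in between, so a pattern like *_gitlab_backup.tar reaches the command as a literal string. A small throwaway demo (paths here are made up; this is not the real backup directory or unit):

```shell
# Create a scratch dir with one file matching the backup naming pattern.
demo=$(mktemp -d)
touch "$demo/20240330_gitlab_backup.tar"

# Emulate ExecStart= handing the pattern over literally (no shell expansion):
ls "$demo/*_gitlab_backup.tar" 2>/dev/null \
  && echo "literal glob: found" \
  || echo "literal glob: no such file"

# Wrapping in `sh -c` hands the pattern to a shell, which expands the glob:
sh -c "ls $demo/*_gitlab_backup.tar" >/dev/null \
  && echo "sh -c: glob expanded"
```

So the fix is either an ExecStart of the form /bin/sh -c '…' or moving the whole command into a wrapper script that the unit executes.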
This was related to some backup testing work I was doing on gitlab1004 to see if the rsync changes in T361219 would work. I think what happened was that the gitlab backup script stopped the ssh-gitlab service, but the backup process was terminated, so the auto_restart service gave up.
Mar 25 2024
The cookbook completed successfully; however, it registered a failure because the final puppet run was blocked by a cron job. I've rectified this in another patch to the failover cookbook.
This was related to the switchover between gitlab-replica and gitlab-replica-old. I've made improvements to the cookbook to leave a rollback in a better position.
Mar 22 2024
Moving this back to the backlog for the moment; it's not something we expect to do right now. Before we consider this, we'll need to get some input from the community about their workflow and how they typically use the service.
Mar 8 2024
We've ticked all the boxes here for the most part. The two outstanding items are monitoring spam rates and one-click unsubscribe.
Feb 23 2024
Hey @Krd, the attachment storage change will be invisible to the VRT community: attachments will continue to be loaded and displayed as normal, and we're not anticipating needing to purge old attachments/tickets yet. For this particular piece of work we don't anticipate making any user-facing changes, but if we do we'll definitely involve the community. (That's likely to come out of T358065; I'd be interested to hear the best way to solicit feedback from the community for changes like these.)
Feb 22 2024
@Matthewrbowker That's one of the things we're intending to establish before we make any kind of recommendation to the VRT team in terms of cost/benefit. I believe this will not affect permalinks to tickets, and tickets will still be searchable. As far as I can see, the only change is that to find archived tickets the user may need to add an "archived" attribute to the search form.
@brennen Good catch. I didn't spot this in our replica/test instance because I was logged in. Perhaps worth remembering to try a logged-out browser as well.
Feb 20 2024
I've configured a GenericAdmin job to archive tickets that were closed more than a year ago; it runs twice an hour. I'm going to leave it running and start looking at results on Thursday. I believe the article_search_index table is the one that the search functions use; before archival it is 8937 MB in size.
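For reference, a sketch of how a table-size figure like this can be read from information_schema (the database name here is an assumption, not the actual VRTS schema name):

```shell
# Hypothetical database name "vrts"; the information_schema columns are standard MariaDB/MySQL.
sudo mysql vrts -e "
  SELECT table_name,
         ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
  FROM information_schema.tables
  WHERE table_name = 'article_search_index';"
```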
We've answered the questions that we're able to answer for the moment, so I'm moving this back to the Backlog until we have physical machines in place. Once that happens, we'll work on a plan to migrate.
@jcrespo That's useful context, thanks!
This was due to work to remove the spare disk we had for testing in T355980
Feb 13 2024
We ran the migration yesterday with --tolerant allowing it to continue on failures. Unfortunately, we filled up the disk. Some observations:
Feb 12 2024
This was deployed and running correctly!
This is not the simple transition that I expected it to be. Here's what we know so far:
Feb 6 2024
I'm in the process of adding the disk image to the ganeti instance of vrts1002, using this command: sudo gnt-instance modify --disk add:size=600g vrts1002.eqiad.wmnet
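As a hedged sketch of the surrounding steps (device name and the need for a reboot are assumptions; Ganeti generally needs an instance restart for a cold-added disk to appear in the guest):

```shell
# On the Ganeti master: add the disk, then restart the instance so the guest sees it.
sudo gnt-instance modify --disk add:size=600g vrts1002.eqiad.wmnet
sudo gnt-instance reboot vrts1002.eqiad.wmnet

# Inside the guest: confirm the new ~600G block device, then format and mount it.
lsblk                      # device name assumed, e.g. /dev/vdb
sudo mkfs.ext4 /dev/vdb    # hypothetical device; adjust to what lsblk shows
```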
Feb 1 2024
We got a reply from znuny support to say that they'll add the index in a future release, but that there's no reason we shouldn't be able to do this ourselves now.
Deployed this to the production instance today. It doesn't seem to have brought us much in terms of improvement, but we are closer to implementing all of the performance warnings from the support data collector page.
Jan 30 2024
We tried rolling this out to ticket-test.wm.o earlier, and it went pretty quickly. The questions are answered below:
There shouldn't be anything directly sending from phabricator.wm.o; it should route through the mx* hosts
@thcipriani Can you give a quick approval for @Aklapper, please?
Jan 29 2024
@jhathaway and I took a look through, and I've updated the checklist above. The tl;dr is that we're looking OK from the Phabricator side. We should probably remove the existing SPF records that point to ip6:2620:0:861:102:10:64:16:101 and ip6:2620:0:860:103:10:192:32:54: these are essentially redundant, but they also don't resolve publicly, so they might run foul of the requirement that records be forward-and-reverse resolvable.
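As a sketch only (the surrounding mechanisms are stand-ins, not the actual record contents), the cleanup amounts to dropping the two ip6 mechanisms from the SPF TXT record:

```dns
; Hypothetical before/after; "mx" and "-all" stand in for whatever else the record contains.
; before: "v=spf1 mx ip6:2620:0:861:102:10:64:16:101 ip6:2620:0:860:103:10:192:32:54 -all"
; after:  "v=spf1 mx -all"
```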
The patch did get the script to run, but there seems to be an error from Phabricator in addition to this:
Jan 23 2024
While we keep working on getting this fixed, I think the best option is to remove the timer for the job entirely, to avoid spurious alerts.
Jan 8 2024
@akosiaris That's great, thanks so much for digging into those and writing it up!
Jan 6 2024
I restarted clamav-daemon and apache2 on the host; both had been stopped. It looks like it was a memory pressure issue again. Short term, we could look at auto-restarting apache/clamav on failure; longer term, we should investigate whether increasing the memory allocation of the VM would be possible/worthwhile. I'll take care of more in-depth investigation and follow-up on Monday.
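For the short-term option, a systemd drop-in is the usual mechanism; a minimal sketch (the override path is the standard drop-in location, applied via systemctl edit plus systemctl daemon-reload, with the same pattern for apache2):

```ini
# /etc/systemd/system/clamav-daemon.service.d/override.conf
[Service]
Restart=on-failure
RestartSec=30s
```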
Dec 7 2023
This is done; please reach out if there are any issues!
Hi @XiaoXiao-WMF, this should be done! Please reach out if you're having any problems!