All instances are updated. For cloud-runners I opened https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/304.
Fri, Dec 1
Replicas are done. I'll upgrade production at 11:00 UTC.
Test instances updated successfully. I'll proceed with replicas.
I updated gitlab-ce to 16.4.3-ce.0 and gitlab-runner to 16.4.2 on the apt hosts. I'll start updating the test instance and test runners soon.
Thu, Nov 30
I can take care of the upgrade tomorrow morning (UTC). If somebody else wants to do that, feel free to claim the task :)
We chose to use the rails command rather than the API to avoid having to create a service user and API key, but since we'll likely need to do this for the apt-staging work anyway, that reason no longer applies.
Hmm, it's still not the most beautiful but at least recognizable on light grey. I'd say let's go for it for now. :)
To summarize our meeting on Tuesday:
Both logo and background are single color. I tried setting a thicker outline to get contrast on any background. Would you want to try the 4px stroke logo above?
So something like a2a9b1ff?
#A2A9B1 on the Gray setting (#303030) will fail WCAG AAA. #A2A9B1 on the Light Gray setting (#F0F0F0) will also fail WCAG AA.
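The contrast claims above can be checked with a short calculation. This is a minimal sketch of the WCAG 2.x contrast-ratio formula; the 4.5:1 (AA) and 7:1 (AAA) thresholds are the ones for normal-size text, which is an assumption here since the comment does not say which success criterion was applied to the logo.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a colour given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """Contrast ratio between two colours, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


LOGO = "#A2A9B1"
# Background values as quoted in the comment above.
for theme, background in (("Gray", "#303030"), ("Light Gray", "#F0F0F0")):
    ratio = contrast_ratio(LOGO, background)
    # Thresholds for normal-size text (assumed criterion).
    print(f"{theme}: {ratio:.2f}:1  AA={ratio >= 4.5}  AAA={ratio >= 7}")
```

Under these thresholds the Gray background lands between 4.5:1 and 7:1 (passes AA, fails AAA), and the Light Gray background falls below 4.5:1, which matches the statement above.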
hmm yes true, with the Light Gray theme the logo is a bit too light.
GitLab uses WMF flavored logos for the navbar header and login logo now.
GitLab uses just a square logo without text for the login screen (see T285354#9367898 for example).
The new logo is configured on all instances using https://gitlab.wikimedia.org/repos/releng/gitlab-settings.
Wed, Nov 29
I can make sure the same text is set by gitlab-settings (appearance API) after https://gitlab.wikimedia.org/repos/releng/gitlab-settings/-/merge_requests/51/ is merged.
We don't set sign_in_text in the config for the application settings API (see gitlab2002.yaml for example). And yes, there is a deprecation warning for that field in the API:
Tue, Nov 28
As discussed in our last Monday meeting, I double checked bastion and ssh config. The test instance uses a public address which does not need a bastion host:
Mon, Nov 27
So to sum this up, we need a bullseye, bookworm and maybe a buster image which contain the following packages:
Fri, Nov 24
Thu, Nov 23
Application Not Authorized to Use CAS
I uploaded the new logo to the test instance. The logo works with the white theme.
I updated the MR to upload the new WMF GitLab logo. The code works but duplicates some of the application settings script. I also did not want to make the scripts too complicated. I'll wait for review from RelEng.
There is a test job for the research-landing-page now: https://gitlab.wikimedia.org/repos/sre/miscweb/research-landing-page/-/jobs/168533
I think @Dzahn added the new host plane1003 for the bullseye migration (T348392).
Puppet still has some certificate errors on the host. A speculative guess: is certificate signing different on puppet7? SAL states:
Wed, Nov 22
Backups look fine again since 18:30 UTC yesterday: https://grafana.wikimedia.org/d/413r2vbWk/bacula?orgId=1&var-site=eqiad&var-job=gerrit1003.wikimedia.org-Hourly-Fri-productionEqiad-gerrit-repo-data&from=1700471853896&to=1700642496593. No elevated backup freshness or gaps in backups.
Tue, Nov 21
@hashar gerrit2002 was migrated to puppet7. I restarted gerrit and apache processes and the instance looks fine so far. Could you double check gerrit2002?
run existing lint job in gitlab-ci
Mon, Nov 20
The old profile::gitlab::runner::registration_token was removed from private puppet and the two CI variables TF_VAR_runner_registration_token and RUNNER_REGISTRATION_TOKEN_STAGING were removed from https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/settings/ci_cd.
There is no proper way of deleting the old reimaged runner besides deleting the row from the ci_runner_machines table. There is no API endpoint for deleting a single grouped runner. We could have used the gitlab-runner unregister command, but that needs the old authentication token. So I'll leave the stale Trusted Runner in the group of Runners for now. If it causes any issues, we can test dropping the row in postgres or extracting its authentication token from the database.
Reimage of a Trusted Runner worked, the Runner is available again after the reimage in GitLab.
Fri, Nov 17
I think I managed to run the tests which were configured in the old Gerrit repo for jenkins:
Alert recovered after 5 minutes. syslog on contin2002 shows some DNS issues during the time of the alert.
The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag on this task. Thanks!
Thu, Nov 16
Cloud Runners have been migrated to the new authentication scheme. Thanks @dancy for the help deploying and testing the new runners!
I migrated all WMCS Shared Runners to the new authentication scheme. It was necessary to unregister the runner, clear the machine-id and generate a new one:
Tue, Nov 14
Trusted Runners were migrated to the new authentication scheme.
Mon, Nov 13
The converted Shared Runner (https://gitlab.wikimedia.org/admin/runners/1479#/) and Trusted Runner (https://gitlab.wikimedia.org/admin/runners/1484#/) behave as expected and executed jobs successfully.
Fri, Nov 10
Yea, style-guide is deployed by scap and other parts are not. This is because there were large files in there afair.
We gotta have a chat with @Volker_E about this before kicking it off.
One Shared Runner runner-1021.gitlab-runners.eqiad1.wikimedia.cloud and one Trusted Runner gitlab-runner1002.eqiad.wmnet were migrated to the new authentication scheme. I'll leave them running and migrate the other Runners next week, if everything looks good.
I updated the settings to limit 20000 and children 4 again.
Thu, Nov 9
Unregistering and re-registering runners with the new authentication scheme still works and settings (like protected, tags) look good.
I migrated the two devtools test runners in WMCS to the new authentication scheme. The steps were:
Wed, Nov 8
File to be edited is operations/container/miscweb/html/static-bugzilla/index.html.gz
The script failed to load a project which was deleted some time ago (id 1033 repos/releng/mathoid). The first change above added proper error handling for adding and removing runners. So the issue is fixed and Trusted Runner CI works again.
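The failure mode described above (one deleted project aborting the whole run) can be illustrated with a small sketch. This is not the actual script: `ProjectNotFound`, `load_projects`, and the fake loader are hypothetical names used only to show the skip-and-report pattern.

```python
from typing import Callable, Iterable, List, Tuple


class ProjectNotFound(Exception):
    """Hypothetical error raised when a project ID no longer exists."""


def load_projects(
    project_ids: Iterable[int],
    load_project: Callable[[int], str],
) -> Tuple[List[str], List[int]]:
    """Load each project, skipping deleted ones instead of aborting."""
    loaded: List[str] = []
    skipped: List[int] = []
    for project_id in project_ids:
        try:
            loaded.append(load_project(project_id))
        except ProjectNotFound:
            # Record the missing project and continue with the rest.
            skipped.append(project_id)
    return loaded, skipped


def fake_loader(project_id: int) -> str:
    """Stand-in for a real lookup; treats ID 1033 as deleted."""
    if project_id == 1033:
        raise ProjectNotFound(project_id)
    return f"project-{project_id}"


loaded, skipped = load_projects([1, 1033, 2], fake_loader)
print("loaded:", loaded)
print("skipped:", skipped)
```

Returning the skipped IDs separately keeps the run going while still making stale entries visible for later cleanup.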
From reading the merge train docs, it seems merge trains are for a single project only.
After merging and applying the lower buildkitd_gckeepstorage limits there were no more failed docker-gc runs. From my perspective this is resolved now. So I'm closing the task optimistically.
Tue, Nov 7
Is the premium feature "Merge request dependencies" similar to the feature we need here?
The script fails when trying to access project https://gitlab.wikimedia.org/repos/qte/catalyst/prototype-api with the ID 1658.
@Jelto Let's see if https://gerrit.wikimedia.org/r/c/operations/puppet/+/971502/ helps.