
Thu, Jun 11

bd808 triaged T428976: Investigate dynamic threshold model comparisons for canary and production log volume checks as Medium priority.
Thu, Jun 11, 10:59 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 created T428976: Investigate dynamic threshold model comparisons for canary and production log volume checks.
Thu, Jun 11, 10:59 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 renamed T428972: Configure additional httpbb checks to perform during a deployment from Configure optional httpbb checks to perform during a deployment to Configure additional httpbb checks to perform during a deployment.
Thu, Jun 11, 10:47 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 triaged T428972: Configure additional httpbb checks to perform during a deployment as Medium priority.
Thu, Jun 11, 10:47 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 created T428972: Configure additional httpbb checks to perform during a deployment.
Thu, Jun 11, 10:45 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 added a comment to T428971: Allow configuration of canary and production checks based on deployment target.

@Scott_French Does it seem reasonable that what I called the "control group" will always be all logs minus the target deployment's logs? I'm wondering if we need to find a way to configure both lookups separately or if it can be formulaic with only the thresholds needing to be set independently.
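A minimal sketch of the formulaic option. It assumes the check compares how fast the target's log volume grows against the control group over the same window. `Window`, `THRESHOLDS`, the add-one smoothing and the ratio comparison are all hypothetical illustrations, not the existing canary check. The point is that the control group can be derived from the total, so only a per-target threshold needs configuring:

```python
from dataclasses import dataclass

# Hypothetical per-target thresholds: the only per-deployment-target setting.
THRESHOLDS = {"canary": 2.0, "production": 1.5}


@dataclass
class Window:
    """Log event counts for one check window (hypothetical shape)."""
    total: int   # events from all hosts
    target: int  # events from the hosts in the deployment target


def control_count(window: Window) -> int:
    """The formulaic control group: all logs minus the target deployment's logs."""
    if not 0 <= window.target <= window.total:
        raise ValueError("target count must be between 0 and the total count")
    return window.total - window.target


def growth(before: int, after: int) -> float:
    # Add-one smoothing so a quiet window does not divide by zero.
    return (after + 1) / (before + 1)


def excess_ratio(before: Window, after: Window) -> float:
    """How much faster the target's log volume grew than the control group's."""
    target_growth = growth(before.target, after.target)
    control_growth = growth(control_count(before), control_count(after))
    return target_growth / control_growth


def check_passes(target_name: str, before: Window, after: Window) -> bool:
    """Pass when the target's excess growth stays within its own threshold."""
    return excess_ratio(before, after) <= THRESHOLDS[target_name]
```

Consider a target that goes from 100 to 200 events while the control group stays flat at 900. The excess ratio is about 1.99, so it passes under the hypothetical canary threshold (2.0) but fails under the production one (1.5).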

Thu, Jun 11, 10:36 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 triaged T428971: Allow configuration of canary and production checks based on deployment target as Medium priority.
Thu, Jun 11, 10:30 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 created T428971: Allow configuration of canary and production checks based on deployment target.
Thu, Jun 11, 10:29 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 merged T428944: Puppet agent failure detected on instance deployment-db15 in project deployment-prep into T428930: Set up deployment-db15 with Trixie and wmf-mariadb1011.
Thu, Jun 11, 10:09 PM · Patch-For-Review, Beta-Cluster-Infrastructure
bd808 merged task T428944: Puppet agent failure detected on instance deployment-db15 in project deployment-prep into T428930: Set up deployment-db15 with Trixie and wmf-mariadb1011.
Thu, Jun 11, 10:09 PM · Beta-Cluster-Infrastructure
bd808 merged T428934: Last Puppet run was over 24 hours ago on instance deployment-db15 in project deployment-prep into T428930: Set up deployment-db15 with Trixie and wmf-mariadb1011.
Thu, Jun 11, 10:08 PM · Patch-For-Review, Beta-Cluster-Infrastructure
bd808 merged task T428934: Last Puppet run was over 24 hours ago on instance deployment-db15 in project deployment-prep into T428930: Set up deployment-db15 with Trixie and wmf-mariadb1011.
Thu, Jun 11, 10:08 PM · Beta-Cluster-Infrastructure
bd808 closed T428901: Widespread puppet agent failures in project deployment-prep as Invalid.

Resolved per https://prometheus-alerts.wmcloud.org/?q=%40state%3Dactive&q=project%3Ddeployment-prep monitoring.

Thu, Jun 11, 10:08 PM · Beta-Cluster-Infrastructure
bd808 closed T428905: No Puppet resources found on instance deployment-poolcounter07 on project deployment-prep as Invalid.
bd808@deployment-poolcounter07:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-poolcounter07.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(e1acc46365) gitpuppet - beta: Add a wmf-beta-update-all timer and script'
Notice: Applied catalog in 6.53 seconds

Self-resolved?

Thu, Jun 11, 10:07 PM · Beta-Cluster-Infrastructure
bd808 moved T428944: Puppet agent failure detected on instance deployment-db15 in project deployment-prep from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
Thu, Jun 11, 10:04 PM · Beta-Cluster-Infrastructure
bd808 moved T428934: Last Puppet run was over 24 hours ago on instance deployment-db15 in project deployment-prep from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
Thu, Jun 11, 10:04 PM · Beta-Cluster-Infrastructure
bd808 moved T428930: Set up deployment-db15 with Trixie and wmf-mariadb1011 from To Triage to Future on the Beta-Cluster-Infrastructure board.
Thu, Jun 11, 10:04 PM · Patch-For-Review, Beta-Cluster-Infrastructure
bd808 moved T428910: Beta Cluster MariaDB is still 10.6.17, MW now requires 10.11 from To Triage to Future on the Beta-Cluster-Infrastructure board.
Thu, Jun 11, 10:04 PM · MediaWiki-libs-Rdbms, Beta-Cluster-Infrastructure
bd808 added a comment to T402454: Replace deprecated (frozen) Phabricator Conduit API calls with their stable equivalents.

Wikibugs has a lot of maintainers in theory: https://toolsadmin.wikimedia.org/tools/id/wikibugs

Thu, Jun 11, 9:34 PM · Patch-For-Review, Phabricator, Wikibugs
bd808 added a comment to T402454: Replace deprecated (frozen) Phabricator Conduit API calls with their stable equivalents.

Now I wonder who could review that.

Thu, Jun 11, 6:03 PM · Patch-For-Review, Phabricator, Wikibugs
bd808 merged task T428851: Replace deprecated Phabricator Conduit API calls with their stable equivalents into T418328: Move away from deprecated `maniphest.query` Conduit API.
Thu, Jun 11, 5:55 PM · Tool-Phabricator-bug-status, Phabricator
bd808 merged T428851: Replace deprecated Phabricator Conduit API calls with their stable equivalents into T418328: Move away from deprecated `maniphest.query` Conduit API.
Thu, Jun 11, 5:55 PM · Phabricator, Tool-Phabricator-bug-status
bd808 updated the task description for T428910: Beta Cluster MariaDB is still 10.6.17, MW now requires 10.11.
Thu, Jun 11, 2:48 PM · MediaWiki-libs-Rdbms, Beta-Cluster-Infrastructure
bd808 added a subtask for T401839: Migrate deployment-prep away from Debian Bullseye to Bookworm/Trixie: T428910: Beta Cluster MariaDB is still 10.6.17, MW now requires 10.11.
Thu, Jun 11, 2:47 PM · Epic, Release-Engineering-Team (Priority Backlog 📥), Cloud-VPS (Debian Bullseye Deprecation), Beta-Cluster-Infrastructure
bd808 added a parent task for T428910: Beta Cluster MariaDB is still 10.6.17, MW now requires 10.11: T401839: Migrate deployment-prep away from Debian Bullseye to Bookworm/Trixie.
Thu, Jun 11, 2:47 PM · MediaWiki-libs-Rdbms, Beta-Cluster-Infrastructure
bd808 added a comment to T256168: Move beta cluster automatic deployment to a dedicated infrastructure.

In production, the train-presync.service uses /usr/local/bin/systemd-timer-mail-wrapper for this purpose. I wonder if it works in beta.

Thu, Jun 11, 12:08 AM · Patch-For-Review, User-bd808, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure, Quality-and-Test-Engineering-Team (Test Infrastructure), Jenkins, Continuous-Integration-Config, Beta-Cluster-Infrastructure

Wed, Jun 10

bd808 moved T428819: No Puppet resources found on instance deployment-cirrussearch14 on project deployment-prep from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
Wed, Jun 10, 11:34 PM · Beta-Cluster-Infrastructure
bd808 moved T428822: No Puppet resources found on instance deployment-cirrussearch13 on project deployment-prep from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
Wed, Jun 10, 11:34 PM · Data-Platform-SRE (2026-06-05 - 2026-06-26), Beta-Cluster-Infrastructure
bd808 added a comment to T428354: Specifying --filelog-stdout or --filelog-stderr requires --filelog.

We'll update the documentation to reflect the fact that --filelog should be added
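The constraint in the task title can be expressed as a parse-time check. This is a minimal sketch using Python's `argparse`. It assumes `--filelog` is a plain switch and that `--filelog-stdout`/`--filelog-stderr` each take a path. The program name and option shapes are illustrative, not the actual Toolforge jobs CLI code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical program name; option shapes are assumptions for illustration.
    parser = argparse.ArgumentParser(prog="jobs-run-sketch")
    parser.add_argument("--filelog", action="store_true",
                        help="write job output to log files")
    parser.add_argument("--filelog-stdout", metavar="PATH")
    parser.add_argument("--filelog-stderr", metavar="PATH")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # The log path options only make sense when file logging is enabled.
    if not args.filelog and (args.filelog_stdout is not None
                             or args.filelog_stderr is not None):
        # parser.error() prints usage to stderr and exits with status 2.
        parser.error("--filelog-stdout/--filelog-stderr require --filelog")
    return args
```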

Wed, Jun 10, 7:59 PM · tools-platform-team, Toolforge

Tue, Jun 9

bd808 added a comment to T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s.

So no, there was no data loss; the data had only been populated on one cluster.

Tue, Jun 9, 9:18 PM · Patch-For-Review, Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub
bd808 added a comment to T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s.

@bd808 I noticed that the opensearch-toolhub-test.svc.codfw.wmnet wasn't populated, I've done it myself using your script.

Tue, Jun 9, 3:40 PM · Patch-For-Review, Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub
bd808 closed T428432: Horizon project access to cluebotng-trainer adn cluebotng-editsets needed for DamianZaremba as Resolved.

The projects did not have any members at all. I added @DamianZaremba to both as a "member", which is the name our Keystone setup gives to sysop users. You can add other member and reader accounts as needed, Damian.

Tue, Jun 9, 3:36 PM · User-bd808, cloud-services-team, Cloud-VPS
bd808 renamed T428432: Horizon project access to cluebotng-trainer adn cluebotng-editsets needed for DamianZaremba from Horizon project access to Horizon project access to cluebotng-trainer adn cluebotng-editsets needed for DamianZaremba.
Tue, Jun 9, 3:29 PM · User-bd808, cloud-services-team, Cloud-VPS

Mon, Jun 8

bd808 added a comment to T428515: Quota increase request for zuul.

@dduvall The tls-server-name: 127.0.0.1 trick in the kubeconfig does not work for the executor access?

Mon, Jun 8, 11:19 PM · Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests)
bd808 added a parent task for T423970: [tofu-cloudvps] Add support for importing legacy cloudvps_puppet_prefix objects: T394316: Use infrastructure as code techniques to rebuild the Beta Cluster.
Mon, Jun 8, 6:06 PM · cloud-services-team, Cloud-VPS
bd808 added a subtask for T394316: Use infrastructure as code techniques to rebuild the Beta Cluster: T423970: [tofu-cloudvps] Add support for importing legacy cloudvps_puppet_prefix objects.
Mon, Jun 8, 6:06 PM · Epic, Beta-Cluster-Infrastructure
bd808 closed T262284: horizon: hiera config reseted to an empty state for deployment-prep instances as Declined.

Too old to do anything about

Mon, Jun 8, 6:05 PM · cloud-services-team, Beta-Cluster-Infrastructure, Cloud-VPS
bd808 closed T428410: Puppet agent failure detected on instance deployment-schema-3 in project deployment-prep as Invalid.
bd808@deployment-schema-3:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-schema-3.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(2c7ddab7b7) git-sync-upstream - [BETA HACK] haproxy: disable warn-blocked-traffic-after'
Notice: Applied catalog in 5.28 seconds
Mon, Jun 8, 5:42 PM · Beta-Cluster-Infrastructure
bd808 closed T428447: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep as Resolved.
bd808@deployment-puppetserver-1.deployment-prep.eqiad1:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(2c7ddab7b7) git-sync-upstream - [BETA HACK] haproxy: disable warn-blocked-traffic-after'
Notice: /Stage[main]/Profile::Puppetserver::Volatile/File[/srv/puppet_fileserver/volatile/datacenter_vendors]: Not removing directory; use 'force' to override
Notice: /Stage[main]/Profile::Puppetserver::Volatile/File[/srv/puppet_fileserver/volatile/datacenter_vendors]/ensure: removed (corrective)
Notice: Applied catalog in 12.12 seconds

Puppet works on the puppetserver itself again. The alert for the original error here has cleared as well.

Mon, Jun 8, 4:55 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 added a comment to T428447: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep.
bd808@deployment-puppetserver-1.deployment-prep.eqiad1:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Must define a private entry in profile::puppetserver::git::repos to use profile::puppetserver::volatile (file: /srv/puppet_code/environments/production/modules/profile/manifests/puppetserver/volatile.pp, line: 30, column: 9) on node deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

It looks like maybe I messed up the cherry-pick? I'll try again.

Mon, Jun 8, 4:26 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 added a comment to T428447: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep.

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/e6bffc4dcf5ec95407f045c1092446610533a979%5E%21/#F0

diff --git a/deployment-prep/deployment-puppetserver.yaml b/deployment-prep/deployment-puppetserver.yaml
index bb6dcfa..3833832 100644
--- a/deployment-prep/deployment-puppetserver.yaml
+++ b/deployment-prep/deployment-puppetserver.yaml
Mon, Jun 8, 4:22 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 added a comment to T428447: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep.

New problems:

bd808@deployment-puppetserver-1.deployment-prep.eqiad1:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'profile::conftool::hiddenparma::root_token' (file: /srv/puppet_code/environments/production/modules/profile/manifests/puppetserver/volatile.pp, line: 13) on node deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

This is https://gerrit.wikimedia.org/r/c/operations/puppet/+/1298768 conflicting with the Beta Cluster config. Apparently we need to rename our profile::conftool::hiddenparma::api_tokens to profile::conftool::hiddenparma::root_token in the deployment-puppetserver prefix hiera.

Mon, Jun 8, 4:20 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 added a comment to T428447: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep.

Hack patch updated in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1137013/6. Then I did the dance on the puppetserver: fetch, interactive rebase to drop the old patch, and cherry-pick the updated patch.
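The "dance" can be sketched as the git commands it consists of. This sketch makes two assumptions: the puppetserver checkout's `origin` remote points at Gerrit's operations/puppet repository, and the old hack commit is dropped by hand in the interactive rebase editor. `gerrit_change_ref` and `refresh_commands` are hypothetical helpers. Gerrit publishes each patchset under `refs/changes/<last two digits of change>/<change>/<patchset>`, so patchset 6 of change 1137013 is `refs/changes/13/1137013/6`:

```python
def gerrit_change_ref(change: int, patchset: int) -> str:
    """Gerrit ref for a patchset: refs/changes/<last two digits>/<change>/<patchset>."""
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def refresh_commands(change: int, patchset: int,
                     remote: str = "origin",
                     branch: str = "production") -> list[list[str]]:
    """Commands for the fetch / drop-old-patch / cherry-pick sequence (sketch only)."""
    return [
        ["git", "fetch", remote],
        # Interactive: mark the old [BETA HACK] commit as "drop" in the editor.
        ["git", "rebase", "--interactive", f"{remote}/{branch}"],
        # Fetch the updated patchset, then apply it on top of the rebased branch.
        ["git", "fetch", remote, gerrit_change_ref(change, patchset)],
        ["git", "cherry-pick", "FETCH_HEAD"],
    ]
```

Each command could be run in order with `subprocess.run(cmd, check=True)` from /srv/git/operations/puppet as the gitpuppet user. In practice the rebase step is interactive and may stop on a conflict, as it did here, so running them by hand is more realistic.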

Mon, Jun 8, 4:10 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 triaged T428447: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep as High priority.
gitpuppet@deployment-puppetserver-1:/srv/git/operations/puppet$ git fetch
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
gitpuppet@deployment-puppetserver-1:/srv/git/operations/puppet$ git rebase --interactive origin/production
INFO: Deploying Puppet code...
INFO: Exiting, skipping Puppet code deploy, because a git rebase is in progress
INFO: Deploying Puppet code...
INFO: Exiting, skipping Puppet code deploy, because a git rebase is in progress
INFO: Deploying Puppet code...
INFO: Exiting, skipping Puppet code deploy, because a git rebase is in progress
INFO: Deploying Puppet code...
INFO: Exiting, skipping Puppet code deploy, because a git rebase is in progress
Auto-merging modules/profile/manifests/puppetserver/volatile.pp
CONFLICT (content): Merge conflict in modules/profile/manifests/puppetserver/volatile.pp
error: could not apply 6c0417d261... [BETA HACK] Changes to profile::puppetserver::volatile
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 6c0417d261... [BETA HACK] Changes to profile::puppetserver::volatile
Mon, Jun 8, 3:35 PM · User-bd808, Beta-Cluster-Infrastructure

Fri, Jun 5

bd808 closed T428239: No Puppet resources found on instance deployment-docker-mobileapps02 on project deployment-prep as Resolved.
bd808@deployment-docker-mobileapps02:/var/log$ sudo rm syslog.1
bd808@deployment-docker-mobileapps02:/var/log$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           796M  624K  795M   1% /run
/dev/sda1        20G   15G  4.2G  78% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      124M   12M  113M  10% /boot/efi
tmpfs           796M     0  796M   0% /run/user/0
tmpfs           796M     0  796M   0% /run/user/3518

This will probably get us by for a while.

Fri, Jun 5, 2:58 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 added a comment to T428239: No Puppet resources found on instance deployment-docker-mobileapps02 on project deployment-prep.
bd808@deployment-docker-mobileapps02:/var/log$ sudo du -sh *| sort -h | tail -10
28M     syslog.27.gz
29M     syslog.28.gz
30M     syslog.21.gz
42M     syslog.24.gz
45M     syslog.2.gz
56M     syslog.12.gz
1.7G    account
2.9G    syslog
3.2G    syslog.1
3.7G    journal
bd808@deployment-docker-mobileapps02:/var/log$ ls -lh account/pacct
-rw-r----- 1 root adm 1.7G Jun  4 15:12 account/pacct
bd808@deployment-docker-mobileapps02:/var/log$ sudo truncate -s 0 /var/log/account/pacct
bd808@deployment-docker-mobileapps02:/var/log$ df -h | grep sda1
/dev/sda1        20G   18G  776M  96% /
/dev/sda15      124M   12M  113M  10% /boot/efi
bd808@deployment-docker-mobileapps02:/var/log$ sudo rm syslog.??.gz
bd808@deployment-docker-mobileapps02:/var/log$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-docker-mobileapps02.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(68641ed297) gitpuppet - [BETA HACK] haproxy: disable warn-blocked-traffic-after'
Notice: Applied catalog in 8.03 seconds
Fri, Jun 5, 2:56 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 added a comment to T428239: No Puppet resources found on instance deployment-docker-mobileapps02 on project deployment-prep.
bd808@deployment-docker-mobileapps02:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-docker-mobileapps02.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(68641ed297) gitpuppet - [BETA HACK] haproxy: disable warn-blocked-traffic-after'
Error: Could not prefetch package provider 'apt': Execution of '/usr/bin/apt-mark showmanual' returned 100: E: Write error - write (28: No space left on device)
E: IO Error saving source cache
E: The package lists or status file could not be parsed or opened.
...
Warning: /Stage[main]/Rsyslog/Service[rsyslog]: Skipping because of failed dependencies
Error: Failed to apply catalog: No space left on device @ dir_s_mkdir - /var/lib/puppet/state/state.yaml20260605-3112715-1t5l9pe.lock
Error: Could not save last run local report: No space left on device @ dir_s_mkdir - /var/cache/puppet/public/last_run_summary.yaml20260605-3112715-1ivm4du.lock
Error: Could not send report: No space left on device @ dir_s_mkdir - /var/lib/puppet/state/last_run_report.yaml20260605-3112715-odci0o.lock

The disk is full. The culprit seems to be /var/log.
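On this host, the biggest consumers were found with `du -sh * | sort -h | tail -10`. Below is a rough Python equivalent, offered as a sketch under stated assumptions. It reports apparent sizes (`st_size`) rather than allocated blocks, so figures can differ from `du` for sparse or compressed files. It also skips symlinks and files that vanish mid-scan (for example during log rotation). The function names are hypothetical:

```python
import os
from pathlib import Path


def apparent_size(path: Path) -> int:
    """Apparent size in bytes of a file or directory tree (like `du --apparent-size`)."""
    try:
        if path.is_symlink():
            return 0
        if path.is_file():
            return path.stat().st_size
    except FileNotFoundError:
        return 0
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            try:
                if not file_path.is_symlink():
                    total += file_path.stat().st_size
            except FileNotFoundError:
                continue  # rotated away while we were scanning
    return total


def largest_entries(directory: str, count: int = 10) -> list[tuple[int, str]]:
    """The `count` largest top-level entries, smallest first, like `sort -h | tail`."""
    sizes = [(apparent_size(entry), entry.name) for entry in Path(directory).iterdir()]
    return sorted(sizes)[-count:]
```

Run as root (for example via sudo), `largest_entries("/var/log")` would list the same kind of culprits as the shell pipeline: syslog, the rotated syslog files, journal and account.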

Fri, Jun 5, 2:47 PM · User-bd808, Beta-Cluster-Infrastructure

Thu, Jun 4

bd808 added a comment to T428090: Request creation of humaniki-2 VPS project.

A different path would be to talk with the folks who run the wikidumpparse Cloud VPS project, which is the current home of the https://humaniki.wmcloud.org project, and see if your project can become part of theirs. The humaniki-2 name you have chosen makes it seem like you would like your project to be associated with theirs in the minds of users.

Thu, Jun 4, 10:59 PM · Cloud-VPS (Project-requests)
bd808 added a comment to T428090: Request creation of humaniki-2 VPS project.

I'll answer your remarks in order:

  1. I don't want to run a MediaWiki instance alongside my tool.
Thu, Jun 4, 10:56 PM · Cloud-VPS (Project-requests)
bd808 merged T427813: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep into T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword.
Thu, Jun 4, 12:10 AM · Traffic, SRE, Beta-Cluster-Infrastructure
bd808 merged task T427813: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep into T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword.
Thu, Jun 4, 12:10 AM · Traffic, Beta-Cluster-Infrastructure
bd808 merged T427819: Puppet agent failure detected on instance deployment-cache-upload08 in project deployment-prep into T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword.
Thu, Jun 4, 12:10 AM · Traffic, SRE, Beta-Cluster-Infrastructure
bd808 merged task T427819: Puppet agent failure detected on instance deployment-cache-upload08 in project deployment-prep into T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword.
Thu, Jun 4, 12:10 AM · Traffic, Beta-Cluster-Infrastructure

Wed, Jun 3

bd808 moved T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
Wed, Jun 3, 11:30 PM · Traffic, SRE, Beta-Cluster-Infrastructure
bd808 added a comment to T428090: Request creation of humaniki-2 VPS project.

using an SQLite database

Wed, Jun 3, 11:16 PM · Cloud-VPS (Project-requests)
bd808 added a subtask for T401839: Migrate deployment-prep away from Debian Bullseye to Bookworm/Trixie: T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword.
Wed, Jun 3, 4:11 PM · Epic, Release-Engineering-Team (Priority Backlog 📥), Cloud-VPS (Debian Bullseye Deprecation), Beta-Cluster-Infrastructure
bd808 added a parent task for T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword: T401839: Migrate deployment-prep away from Debian Bullseye to Bookworm/Trixie.
Wed, Jun 3, 4:11 PM · Traffic, SRE, Beta-Cluster-Infrastructure
bd808 added a comment to T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword.

The Beta Cluster cache nodes are Debian Bullseye running HAProxy version 2.8.18-1~bpo11+1 2025/12/26. It looks like the prod CDN edge is using Debian Trixie and HAProxy 3.2.15-1~bpo13+1.

Wed, Jun 3, 4:09 PM · Traffic, SRE, Beta-Cluster-Infrastructure
bd808 added a comment to T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword.

Pushed this on beta as a stopgap:

Wed, Jun 3, 3:58 PM · Traffic, SRE, Beta-Cluster-Infrastructure
bd808 renamed T428052: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword from haproxy in Beta cluster has invalid config to Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword.
Wed, Jun 3, 3:58 PM · Traffic, SRE, Beta-Cluster-Infrastructure

Tue, Jun 2

bd808 updated subscribers of T421653: New 2.20.0 upstream release for Pygments.

I missed seeing this for some reason. There is work in progress to upgrade the base image for shellbox to bookworm. Once that is done this should be a relatively simple follow up. I think this will be the first time that we can use the createPygmentizeBundle.php script that @SD0001 built too.

Tue, Jun 2, 11:37 PM · SyntaxHighlight
bd808 triaged T427968: [FY25-26 WE6.7.3] Automated supervision as Medium priority.
Tue, Jun 2, 5:50 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 created T427968: [FY25-26 WE6.7.3] Automated supervision.
Tue, Jun 2, 5:48 PM · Release-Engineering-Team (Priority Backlog 📥)
bd808 updated the task description for T421653: New 2.20.0 upstream release for Pygments.
Tue, Jun 2, 4:01 PM · SyntaxHighlight

Mon, Jun 1

bd808 changed the subtype of T427659: WE6.7.2 (FY25-26) Pretrain MVP deployment environment from "Task" to "Goal".
Mon, Jun 1, 11:22 PM · MW-on-K8s, ServiceOps-Mediawiki, Epic, ServiceOps new
bd808 added a subtask for T369112: Pretrain (née Group -1) QTE validation environment: T427659: WE6.7.2 (FY25-26) Pretrain MVP deployment environment.
Mon, Jun 1, 11:22 PM · Release-Engineering-Team (Priority Backlog 📥), Quality-and-Test-Engineering-Team (Test Infrastructure), Epic
bd808 added a parent task for T427659: WE6.7.2 (FY25-26) Pretrain MVP deployment environment: T369112: Pretrain (née Group -1) QTE validation environment.
Mon, Jun 1, 11:22 PM · MW-on-K8s, ServiceOps-Mediawiki, Epic, ServiceOps new
bd808 created T427826: "Application Not Authorized to Use CAS" error when attempting to authenticate to IDP.
Mon, Jun 1, 3:35 PM · cloud-services-team, Infrastructure-Foundations, CAS-SSO, Cloud-VPS

Thu, May 28

bd808 added a comment to T415293: Shut down the API Portal.

I haven't reviewed the whole task tree, but is there a planned step to replace the custom skin with something we generally use such as vector-2022 as part of the decomm process? It would be nice to be able to drop and archive the wikimediaapiportal skin (T259661: Restrict skin options on API Portal Beta Site) rather than carry it indefinitely as baggage in the train.

Thu, May 28, 8:14 PM · User-notice, Wiki-Setup (Close), API-Portal, Tech-Docs-Team
bd808 added a comment to T415293: Shut down the API Portal.

I haven't reviewed the whole task tree, but is there a planned step to replace the custom skin with something we generally use such as vector-2022 as part of the decomm process? It would be nice to be able to drop and archive the wikimediaapiportal skin (T259661: Restrict skin options on API Portal Beta Site) rather than carry it indefinitely as baggage in the train.

Thu, May 28, 8:09 PM · User-notice, Wiki-Setup (Close), API-Portal, Tech-Docs-Team
bd808 reassigned T426073: Migrate toolhub indices from production OpenSearch to OpenSearch on k8s from bd808 to atsuko.

Hi, access to the indices should now work without HTTP auth. If the test cluster is working, I'll provision the prod cluster as well.

Thu, May 28, 12:36 AM · Patch-For-Review, Discovery-Search (2026.06.01 - 2026.07.03), Data-Platform-SRE (2026-06-05 - 2026-06-26), User-bd808, Toolhub

Wed, May 27

bd808 added a comment to T256168: Move beta cluster automatic deployment to a dedicated infrastructure.

Can we claim victory on this one, or did you have further steps in mind? The ones I can think of are removing the agent in Jenkins and deleting the jobs (I can take care of that).

Wed, May 27, 3:50 PM · Patch-For-Review, User-bd808, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure, Quality-and-Test-Engineering-Team (Test Infrastructure), Jenkins, Continuous-Integration-Config, Beta-Cluster-Infrastructure

Tue, May 26

bd808 renamed T427284: Consider redirecting toolhub.toolforge.org to toolhub.wikimedia.org from toolhub.toolforge.org doesn't redirect to toolhub.wikimedia.org to Consider redirecting toolhub.toolforge.org to toolhub.wikimedia.org.
Tue, May 26, 5:17 PM · Toolhub
bd808 changed the subtype of T427284: Consider redirecting toolhub.toolforge.org to toolhub.wikimedia.org from "Bug Report" to "Feature Request".

https://tools.wmflabs.org/toolhub was where we published the initial versions of the toolinfo schema. In those early versions, we used identifiers like https://tools.wmflabs.org/toolhub/schema/1.1.1, which we intended to be usable as URIs to retrieve the specification. It looks like https://tools.wmflabs.org/toolhub/schema/1.2.0-draft01, published on 2020-09-15, was the last specification published there. Later we began using https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/toolhub/+/refs/heads/main/jsonschema/toolinfo as the git source tree location for schemas, which ends up exposing them to the internet via URLs like https://toolhub.wikimedia.org/static/jsonschema/toolinfo/1.2.2.json.

Tue, May 26, 5:17 PM · Toolhub

Sat, May 23

bd808 added a comment to T427037: Export a dataset of licenses of Toolforge tools (Toolforge Licenses Catalogue).
  • and if Toolforge tools are in scope anyway 🤔 - maybe a question for the Wikidata bar
Sat, May 23, 1:19 AM · Technical-Tool-Request, Wikimania-Hackathon-2026

Thu, May 21

bd808 added a comment to T426790: Quota increase request for project osmit.

WMCS has an ancient and largely locally undocumented tie to OpenStreetMap projects. The connection comes from an OpenStreetMap-Wikimedia cooperation that included OpenStreetMap tools and servers being part of the Toolserver environment. When Toolserver was disbanded, some of the OSM projects came to what we now call Cloud VPS. See also osmwiki:Collaboration with Wikipedia, which talks about some of this.

Thu, May 21, 5:17 PM · WMIT-Infrastructure, Cloud-VPS (Quota-requests)
bd808 added a comment to T152581: Expand the Toolforge definition of "free license" to include FSF-approved and DFSG-compatible licenses.

→ So, in all cases, we need answers. And we can produce a small export with:

  • tool name
  • license
  • repo link

Unfortunately I do not know Toolforge internals (Striker internals? etc.) well enough to produce such an export. But this sounds like a feasible sub-task.

Thu, May 21, 4:47 PM · cloud-services-team, Software-Licensing, Toolforge
bd808 added a comment to T420833: Beta cluster is being crawled to death by bot traffic coming from residential proxies.

That is 11,118 distinct /24 networks each sending 2 requests in the last ~2 hours.
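For scale: 11,118 networks at 2 requests each is 22,236 requests, and blocking any single /24 removes only 2 of them. Here is a minimal sketch of how such a tally could be produced. It assumes client IPs have already been extracted from the CDN edge logs (the log source and format are not shown here). The function names are hypothetical, and IPv6 is skipped for simplicity:

```python
import ipaddress
from collections import Counter


def requests_per_slash24(client_ips):
    """Count requests per IPv4 /24 network; IPv6 addresses are skipped in this sketch."""
    counts = Counter()
    for raw in client_ips:
        addr = ipaddress.ip_address(raw)
        if addr.version != 4:
            continue  # IPv6 would need its own prefix length (e.g. /64)
        # strict=False masks the host bits: 192.0.2.200/24 -> 192.0.2.0/24
        counts[ipaddress.ip_network(f"{addr}/24", strict=False)] += 1
    return counts


def networks_sending_exactly(counts, n):
    """How many distinct /24 networks sent exactly n requests."""
    return sum(1 for c in counts.values() if c == n)
```

`len(counts)` gives the number of distinct /24 networks. `networks_sending_exactly(counts, 2)` measures how much of the traffic follows the "2 requests per /24" pattern.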

Thu, May 21, 4:24 PM · Beta-Cluster-Infrastructure
bd808 added a comment to T420833: Beta cluster is being crawled to death by bot traffic coming from residential proxies.

The current traffic is full of the "2 requests per /24 network" pattern. Range blocks will need to be vast again to make much difference. We really need to get requestctl and more fine-grained pattern-matching tech into the Beta Cluster CDN edge.

Thu, May 21, 2:09 AM · Beta-Cluster-Infrastructure
bd808 triaged T420833: Beta cluster is being crawled to death by bot traffic coming from residential proxies as High priority.
Thu, May 21, 1:58 AM · Beta-Cluster-Infrastructure
bd808 merged T426831: Beta cluster down: Error: 502, Backend fetch failed into T420833: Beta cluster is being crawled to death by bot traffic coming from residential proxies.
Thu, May 21, 1:55 AM · Beta-Cluster-Infrastructure
bd808 merged task T426831: Beta cluster down: Error: 502, Backend fetch failed into T420833: Beta cluster is being crawled to death by bot traffic coming from residential proxies.
Thu, May 21, 1:55 AM · Beta-Cluster-Infrastructure
bd808 added a comment to T420833: Beta cluster is being crawled to death by bot traffic coming from residential proxies.

The bots are back.

[Screenshot 2026-05-20 at 19.53.50.png]

Thu, May 21, 1:54 AM · Beta-Cluster-Infrastructure

Wed, May 20

bd808 added a comment to T426822: No Puppet resources found on instance deployment-cache-upload08 on project deployment-prep.

This is currently blocking an unblock request for Beta Cluster.

Wed, May 20, 9:17 PM · Traffic, Beta-Cluster-Infrastructure
bd808 assigned T426822: No Puppet resources found on instance deployment-cache-upload08 on project deployment-prep to ssingh.

This is currently blocking an unblock request for Beta Cluster.

Wed, May 20, 9:11 PM · Traffic, Beta-Cluster-Infrastructure

Tue, May 19

bd808 closed T405150: k8s-status can't show information about one tool as Invalid.

https://k8s-status.toolforge.org/namespaces/tool-jimmy/ is rendering today. It is very difficult to say what API crawling failure was breaking this particular tool's display 8 months ago.

Tue, May 19, 10:27 PM · Tool-k8s-status
bd808 moved T423124: ScheduleDeploymentBot should escape wikitext in commit message ({{deploy}} |title= parameter) from Backlog to Ready to Go on the Tool-schedule-deployment board.
Tue, May 19, 9:40 PM · Tool-schedule-deployment
bd808 moved T385007: Extend functionality to support MediaWiki infrastructure Windows and related repos from Backlog to Doing on the Tool-schedule-deployment board.
Tue, May 19, 9:39 PM · User-jijiki, Release-Engineering-Team, ServiceOps new, Patch-For-Review, Wikimedia-Hackathon-2026, Tool-schedule-deployment

Mon, May 18

bd808 added a comment to T393782: Investigate new Magnum drivers.

"The new CAPI driver and the old Heat driver are compatible and can both be active on the same deployment"

Mon, May 18, 10:48 PM · Openstack-Magnum, cloud-services-team
bd808 added a comment to Volunteer.

I created T426638: Make it possible for users to self-claim the "Volunteer" badge for discussion of finding a way for this badge to be self-awarded again.

Mon, May 18, 3:47 PM
bd808 added a comment to T426638: Make it possible for users to self-claim the "Volunteer" badge.

I suppose one available option could be a tool hosted on Toolforge or elsewhere that performs the granting action on behalf of an authenticated user. That is more moving parts to keep track of than just button clicks within Phabricator itself, but if the permissions model of Badges does not separate editing the badge from awarding the badge, that direct use might not be reasonably possible.

Mon, May 18, 3:46 PM · Phabricator
bd808 created T426638: Make it possible for users to self-claim the "Volunteer" badge.
Mon, May 18, 3:44 PM · Phabricator
bd808 awarded Volunteer to recipient: Gerges.
Mon, May 18, 3:36 PM

Sat, May 16

bd808 moved T425039: Set a default wikipage target for saving the final state of an Etherpad from Backlog to Needs Discussion on the Tool-etherpad-backup board.
Sat, May 16, 11:42 PM · Tool-etherpad-backup, Wikimedia-Etherpad
bd808 added a comment to T415237: etherpad table size is 233GB / plan to delete all etherpads.

@bd808 Could you please import all the pads referenced by mwstake.org? Special:LinkSearch can quickly list all of them.
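Special:LinkSearch has an Action API counterpart, `list=exturlusage`, so the same list can be fetched programmatically. This sketch assumes the mwstake.org API lives at `https://mwstake.org/w/api.php` (unverified) and that the pads are linked as `https://etherpad.wikimedia.org/...`; the helper names `http_get_json` and `find_pad_links` are hypothetical. It follows the API's standard `continue` protocol and returns the de-duplicated pad URLs.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint path is not verified; check the wiki's Special:Version.
API_URL = "https://mwstake.org/w/api.php"
PAD_HOST = "etherpad.wikimedia.org"


def http_get_json(url, params):
    """GET `url` with the given query parameters and decode the JSON body."""
    full_url = f"{url}?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(
        full_url, headers={"User-Agent": "pad-link-lister-sketch/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_pad_links(fetch=http_get_json, api_url=API_URL, host=PAD_HOST):
    """Return the sorted, de-duplicated external links to `host` on the wiki.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    params = {
        "action": "query",
        "list": "exturlusage",   # API counterpart of Special:LinkSearch
        "euquery": host,
        "euprotocol": "https",   # only https:// links are matched here
        "euprop": "title|url",
        "eulimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    urls = set()
    while True:
        data = fetch(api_url, params)
        for hit in data.get("query", {}).get("exturlusage", []):
            urls.add(hit["url"])
        if "continue" not in data:
            break
        # Standard continuation: copy every key from "continue" into the next request.
        params = {**params, **data["continue"]}
    return sorted(urls)
```

Calling `find_pad_links()` with no arguments performs the real HTTP requests; the resulting URL list could then feed whatever import step is used for the pads.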

Sat, May 16, 5:22 PM · User-notice, collaboration-services, Wikimedia-Etherpad, Data-Persistence
bd808 closed T426501: Wikibugs is no longer in some IRC channels (2026-05-16) as Resolved.

Probably a recurrence of T410540: Wikibugs does not rejoin channels automatically following a BNC restart.

tools.wikibugs@tools-bastion-14:~$ kubectl get po
NAME                        READY   STATUS    RESTARTS   AGE
gerrit-6464bb8dc9-4dr2f     1/1     Running   0          17h
gitlab-7cb775c5f9-7kjzg     1/1     Running   0          18h
irc-66579f4f68-k4dpq        1/1     Running   0          95s
phorge-6cb946dfdb-krtd9     1/1     Running   0          20h
wikibugs-5bf6746d8f-2j6cr   1/1     Running   0          17h
znc-6658b7c4f4-jtlfh        1/1     Running   0          17h

That znc pod restart was likely the trigger.

Sat, May 16, 5:17 PM · User-bd808, Wikibugs

Fri, May 15

bd808 renamed T426394: Developer account email updated via idm.wikimedia.org not showing as changed in toolsadmin.wikimedia.org from Cannot update email for "Wikimedia developer account" in toolsadmin.wikimedia.org to Developer account email updated via idm.wikimedia.org not showing as changed in toolsadmin.wikimedia.org.
Fri, May 15, 5:08 PM · Patch-For-Review, Striker
bd808 added a comment to T426394: Developer account email updated via idm.wikimedia.org not showing as changed in toolsadmin.wikimedia.org.

I'm pretty sure this is a variation of the older bug at T144943: Groups and tools only refreshed at login.

Fri, May 15, 5:07 PM · Patch-For-Review, Striker
bd808 removed a project from T426399: Add Prove to CSP Policy Exception: cloud-services-team.
Fri, May 15, 5:02 PM · 2026-user-javascript-incident, Wikidata, ContentSecurityPolicy
bd808 renamed T426378: Tools may not allow non-interactive commands via 'become' due to dotfile configuration from Tools may not allow non-interactive commands via 'become' to Tools may not allow non-interactive commands via 'become' due to dotfile configuration.
Fri, May 15, 4:42 PM · tools-platform-team, cloud-services-team, Toolforge