bd808 (Bryan Davis)
Principal Software Engineer · Administrator

User Details

User Since
Oct 3 2014, 2:36 PM (590 w, 1 d)
Roles
Administrator
Availability
Available
IRC Nick
bd808
LDAP User
BryanDavis
MediaWiki User
BDavis (WMF) [ Global Accounts ]

I'm BDavis (WMF) on wiki, bd808 on IRC & GitLab, and BryanDavis on Gerrit & Wikitech.

I've got a thing for 🦄s. Don't judge.

I work for or provide services to the Wikimedia Foundation, but this is my only Phabricator account. Edits, statements, or other contributions made from this account are my own, and may not reflect the views of the Foundation.

Recent Activity

Today

bd808 closed T415389: Create a Fundraising specific SAL as Resolved.
Sat, Jan 24, 12:58 AM · User-bd808, fundraising-tech-ops, Fundraising-Backlog, Stashbot
bd808 added a comment to T415389: Create a Fundraising specific SAL.

Stashbot's config is kept in private files on Toolforge because I have been too lazy to update things to read secrets from envvars instead of embedding them in a YAML document. I will log the changes here for posterity.
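The envvar approach mentioned above could look roughly like the following. This is a minimal sketch, not Stashbot's actual code: it assumes the YAML file has already been parsed into a dict, and the `STASHBOT_` prefix, the key names, and the `overlay_env` helper are all hypothetical.

```python
import os


def overlay_env(config, prefix="STASHBOT_", environ=None):
    """Return a copy of config with values replaced by matching envvars.

    A key such as "irc_password" is looked up as PREFIX + "IRC_PASSWORD".
    The prefix and key names are illustrative assumptions only.
    """
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for key in config:
        env_name = prefix + key.upper()
        if env_name in environ:
            merged[key] = environ[env_name]
    return merged


# Non-secret settings stay in the file; the secret comes from the environment.
file_config = {"irc_nick": "example-bot", "irc_password": "placeholder"}
fake_env = {"STASHBOT_IRC_PASSWORD": "from-envvar"}
print(overlay_env(file_config, environ=fake_env))
```

With something like this in place, the YAML document would only need to hold non-secret settings and could live in a public repository.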

Sat, Jan 24, 12:54 AM · User-bd808, fundraising-tech-ops, Fundraising-Backlog, Stashbot
bd808 changed the status of T415389: Create a Fundraising specific SAL from Open to In Progress.
Sat, Jan 24, 12:43 AM · User-bd808, fundraising-tech-ops, Fundraising-Backlog, Stashbot
bd808 renamed T415389: Create a Fundraising specific SAL from Create a Fundraising specific channel to Create a Fundraising specific SAL.
Sat, Jan 24, 12:19 AM · User-bd808, fundraising-tech-ops, Fundraising-Backlog, Stashbot
bd808 added a comment to T415389: Create a Fundraising specific SAL.

Both are fine. And I assume the separate view on https://sal.toolforge.org/ is just going to be via having a fundraising entry on the /projects list?

Sat, Jan 24, 12:19 AM · User-bd808, fundraising-tech-ops, Fundraising-Backlog, Stashbot
bd808 added a comment to T415389: Create a Fundraising specific SAL.

Would https://wikitech.wikimedia.org/wiki/Fundraising/SAL be a reasonable place to record the !log messages?

Sat, Jan 24, 12:10 AM · User-bd808, fundraising-tech-ops, Fundraising-Backlog, Stashbot

Yesterday

bd808 added a comment to T415352: Add serviceops new to the monitored channels on IRC bot/wikibugs.

The semi-secret magic here was that the #wikimedia-operations channel config matches all Phabricator tags and alternate tags that start with sre- (case insensitive). The case insensitive part is likely non-obvious from reading the current config file.
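To make the case-insensitive prefix match concrete, here is a small sketch. It does not reproduce the real wikibugs config format; the pattern, the `matches_channel` helper, and the first example tag name are assumptions for illustration.

```python
import re

# Hypothetical pattern: any tag beginning with "sre-", ignoring case.
SRE_TAG = re.compile(r"^sre-", re.IGNORECASE)


def matches_channel(tags):
    """Return True if any project tag matches the channel's pattern."""
    return any(SRE_TAG.match(tag) for tag in tags)


print(matches_channel(["SRE-Example-Tag"]))  # upper-case prefix still matches
print(matches_channel(["ServiceOps new"]))   # no "sre-" prefix, so no match
```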

Fri, Jan 23, 10:58 PM · Wikibugs, ServiceOps new
bd808 added a comment to T337570: Get GitLab to render `T{\d}+` in MR overviews, comments, etc. as links to Phabricator.

I have a hunch that this configuration state is the problem, and that the Phorge integration being enabled at the instance level may be complicating disabling the legacy "Custom issue tracker" integration.

Fri, Jan 23, 10:46 PM · Phabricator, GitLab (Integrations), User-brennen, Release-Engineering-Team (Priority Backlog 📥)
bd808 added a comment to T337570: Get GitLab to render `T{\d}+` in MR overviews, comments, etc. as links to Phabricator.

Phabricator linking seems to be working on https://gitlab.wikimedia.org/toolforge-repos/versions/-/merge_requests/8. I wonder if there is something interesting going on in that wikifunctions MR description? Does the regex that is in use allow for trailing whitespace? Is there a newline requirement?

Interesting. By the way it was https://gitlab.wikimedia.org/repos/releng/cli/-/merge_requests/641 that made me notice this.

Fri, Jan 23, 10:29 PM · Phabricator, GitLab (Integrations), User-brennen, Release-Engineering-Team (Priority Backlog 📥)
bd808 claimed T125678: Scap reliance on extension-list in turn requires extensions to be deployed to WMF production before being deployed to the Beta Cluster.

The train ran for a week and continued to run in the next week without any identified problems. I think we can call this experiment a success and proceed to the cleanup phase where we update docs on mediawiki.org to stop telling people that they need to rush to get their extension/skin on the train early.

Fri, Jan 23, 10:03 PM · User-bd808, Scap, MediaWiki-Configuration
bd808 closed T411516: Add ability to ignore missing extensions in mergeMessageFileList's `--list-file` input, a subtask of T125678: Scap reliance on extension-list in turn requires extensions to be deployed to WMF production before being deployed to the Beta Cluster, as Resolved.
Fri, Jan 23, 9:57 PM · User-bd808, Scap, MediaWiki-Configuration
bd808 closed T411516: Add ability to ignore missing extensions in mergeMessageFileList's `--list-file` input as Resolved.

We could technically close this as invalid because it turned out that the requested functionality was already in core, but resolved seems more correct because of the effort that was put into proving that the desired change was actually sufficient to keep scap working as desired. It would also be fair to toss a coin to decide if @thcipriani or I deserve the claim on resolution; I proved that core worked as desired and @thcipriani was able to show that scap worked in testing with a missing extension. The production test was a trivial but necessary follow-up. I used https://justflipacoin.com/ with Heads == @thcipriani, Tails == @bd808 to determine that @thcipriani gets the credit.

Fri, Jan 23, 9:57 PM · User-bd808, Release-Engineering-Team (Doing 😎), Scap, MediaWiki-Internationalization
bd808 added a comment to T411516: Add ability to ignore missing extensions in mergeMessageFileList's `--list-file` input.

The train ran for a week and continued to run in the next week without any identified problems. I think we can call this experiment a success and proceed to the cleanup phase where we update docs on mediawiki.org to stop telling people that they need to rush to get their extension/skin on the train early.

Fri, Jan 23, 9:48 PM · User-bd808, Release-Engineering-Team (Doing 😎), Scap, MediaWiki-Internationalization
bd808 closed T415318: versions % doesn't add up to 100% as Resolved.

The display values add up to 99.99% now. We could get incrementally closer to 100% by showing more digits, but hopefully this is good enough.
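Why the displayed values stop just short of 100% can be shown with a small sketch. It assumes each share is rounded independently to a fixed number of digits, which may not be exactly how the versions tool formats its output; `display_shares` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def display_shares(counts, places=2):
    """Return each count's percentage share rounded to `places` digits."""
    total = Decimal(sum(counts))
    quantum = Decimal(10) ** -places  # 0.01 when places == 2
    return [
        (Decimal(count) * 100 / total).quantize(quantum, rounding=ROUND_HALF_UP)
        for count in counts
    ]


# Three equal shares each display as 33.33, so the visible total is 99.99.
shares = display_shares([1, 1, 1])
print(shares, sum(shares))

# More digits gets closer to 100 but never reaches it for this input.
print(sum(display_shares([1, 1, 1], places=4)))
```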

Fri, Jan 23, 9:48 PM · Release-Engineering-Team, Tools
bd808 added a comment to T337570: Get GitLab to render `T{\d}+` in MR overviews, comments, etc. as links to Phabricator.

If a merge request description body consists of a single Bug: TXXXXX line, the link to Phabricator does not get created. I'm wondering if this is a new problem since the recent GitLab UI changes. See the job link referenced in the task description for an example (https://gitlab.wikimedia.org/repos/abstract-wiki/wikifunctions/function-orchestrator/-/merge_requests/18).
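One way to reason about the single-line case is to test a candidate pattern against a few description bodies. The regex below is an illustrative assumption, not the pattern GitLab actually uses, and `task_links` is a hypothetical helper.

```python
import re

# Illustrative pattern only; GitLab's real issue-tracker regex may differ.
TASK_REF = re.compile(r"\bT(\d+)\b")


def task_links(description, base="https://phabricator.wikimedia.org/"):
    """Return Phabricator URLs for every T<number> reference found."""
    return [base + "T" + number for number in TASK_REF.findall(description)]


# Same reference with no newline, trailing spaces, and a trailing newline.
for body in ("Bug: T337570", "Bug: T337570  ", "Bug: T337570\n"):
    print(repr(body), task_links(body))
```

A pattern like this one is indifferent to trailing whitespace and newlines, so if the real integration misses the single-line case, the cause is more likely in how the description is rendered than in a simple pattern of this shape.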

Fri, Jan 23, 9:34 PM · Phabricator, GitLab (Integrations), User-brennen, Release-Engineering-Team (Priority Backlog 📥)
bd808 closed T268288: Launch API Portal as Declined.

Superseded by T415293: Shut down the API Portal

Fri, Jan 23, 5:57 PM · API-Portal
bd808 added a comment to T415360: Request creation of Psychopragmatics VPS project.

Alright, no worries, we'll check out a possible Toolforge implementation.

Fri, Jan 23, 5:28 PM · Cloud-VPS (Project-requests)
bd808 added a comment to T415360: Request creation of Psychopragmatics VPS project.

The wiki has not been created yet on Miraheze and is awaiting approval (https://meta.miraheze.org/wiki/Special:RequestWikiQueue/73470).

Fri, Jan 23, 4:31 PM · Cloud-VPS (Project-requests)
bd808 added a comment to T415239: Toolforge SSH login: connection closed after publickey authentication.

There is a user record for the https://ldap.toolforge.org/user/jacobhung Developer account in the https://toolsadmin.wikimedia.org/ database. That is also connected to the JacobHung SUL account which would keep toolsadmin from letting you create another Developer account using that same SUL account.

Fri, Jan 23, 4:17 PM · cloud-services-team, Toolforge
bd808 added a comment to T415360: Request creation of Psychopragmatics VPS project.

It is generally possible to host a MediaWiki instance as a Toolforge tool. Some examples:

Fri, Jan 23, 4:02 PM · Cloud-VPS (Project-requests)

Thu, Jan 22

bd808 changed the subtype of T415313: Investigate switching to the matterbridge-org/matterbridge fork from "Task" to "Spike".

I think the thing to try here would be a branch of https://gitlab.wikimedia.org/toolforge-repos/bridgebot that pulls in a treeish from https://github.com/matterbridge-org/matterbridge instead of the one currently being used from https://gitlab.wikimedia.org/toolforge-repos/bridgebot-matterbridge.

Thu, Jan 22, 10:20 PM · Tool-bridgebot
bd808 created T415313: Investigate switching to the matterbridge-org/matterbridge fork.
Thu, Jan 22, 9:56 PM · Tool-bridgebot

Wed, Jan 21

bd808 added a comment to T414836: Create a reusable container to replace nginx ingress anonymizing reverse proxy setups.

Looking at P87548 I just realized that the current rproxy solution I have built really only works to provide a single reverse proxy per Toolforge tool. This is because the envvar based configuration only allows one set of config data to be supplied. I need to think a bit harder about reasonable ways to support N deployments per tool. It would be relatively simple to add support for a config file. I would like a solution that avoids using NFS if I can dream one up.

Wed, Jan 21, 9:40 PM · User-bd808, Tool-containers
bd808 edited P87548 (An Untitled Masterwork).
Wed, Jan 21, 9:25 PM
bd808 edited P87548 (An Untitled Masterwork).
Wed, Jan 21, 9:21 PM
bd808 edited P87548 (An Untitled Masterwork).
Wed, Jan 21, 9:17 PM
bd808 changed the status of T414860: [bug] File list is not shown in the Terminal from Resolved to Invalid.
Wed, Jan 21, 6:14 PM · cloud-services-team, PAWS
bd808 edited P87548 (An Untitled Masterwork).
Wed, Jan 21, 5:39 PM
bd808 added a comment to T414836: Create a reusable container to replace nginx ingress anonymizing reverse proxy setups.

I have an initial working solution at https://gitlab.wikimedia.org/toolforge-repos/containers-rproxy. I have deployed it to replace the tool-bd808-test.proxy-scdn ingress only proxy from the task description. That looked something like:

bd808@laptop:~$ ssh dev.toolforge.org
bd808@tools-bastion-14:~$ become bd808-test
tools.bd808-test@tools-bastion-14:~$ kubectl delete ingress proxy-scdn
tools.bd808-test@tools-bastion-14:~$ kubectl delete service i-scdn-co
tools.bd808-test@tools-bastion-14:~$ toolforge envvars create RPROXY_UPSTREAM_URL 'https://i.scdn.co'
tools.bd808-test@tools-bastion-14:~$ toolforge envvars create RPROXY_PATH_REGEX '/scdn(/|$)(.*)'
tools.bd808-test@tools-bastion-14:~$ toolforge envvars create RPROXY_PATH_TEMPLATE '/$2'
tools.bd808-test@tools-bastion-14:~$ toolforge envvars create GO_LOG debug
tools.bd808-test@tools-bastion-14:~$ toolforge jobs run \
    --image tool-containers/rproxy:latest \
    --command web \
    --continuous \
    --port 8000 \
    --health-check-http '/healthz' \
    rproxy-scdn
tools.bd808-test@tools-bastion-14:~$ kubectl apply --validate=true -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rproxy-scdn
spec:
  rules:
    - host: bd808-test.toolforge.org
      http:
        paths:
          - path: /scdn
            pathType: Prefix
            backend:
              service:
                name: rproxy-scdn
                port:
                  number: 8000
EOF
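
The effect of the RPROXY_PATH_REGEX and RPROXY_PATH_TEMPLATE values above can be previewed with a short sketch. It assumes the container applies the regex to the request path and substitutes the template, which is a guess at the behavior rather than a description of the rproxy code; `upstream_url` is a hypothetical helper.

```python
import re

# Values taken from the envvars above; Python spells the "$2" group
# reference as "\2", so the template is translated by hand here.
PATH_REGEX = re.compile(r"/scdn(/|$)(.*)")
PATH_TEMPLATE = r"/\2"


def upstream_url(request_path, upstream="https://i.scdn.co"):
    """Map an incoming request path to the URL fetched from the upstream."""
    return upstream + PATH_REGEX.sub(PATH_TEMPLATE, request_path, count=1)


print(upstream_url("/scdn/image/abc123"))
print(upstream_url("/scdn"))
```

Under that assumption the `/scdn` prefix is stripped, so a request for `/scdn/image/abc123` on the tool would be forwarded to `/image/abc123` on the upstream.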
Wed, Jan 21, 4:56 PM · User-bd808, Tool-containers

Tue, Jan 20

bd808 added a comment to T413803: 1.46.0-wmf.12 deployment blockers.

And the Beta wikis are down now, there is just not enough time.

Tue, Jan 20, 10:25 PM · Release-Engineering-Team (Doing 😎), Essential-Work, Release, Train Deployments
bd808 added a comment to T415021: Project deployment-prep instance deployment-sessionstore06 is down.

sudo journalctl --since "2026-01-20 10:05:00" --until "2026-01-20 10:15:00" turned up the kernel oom-killer going after cassandra in the time range where we had a data collection gap.

Tue, Jan 20, 9:18 PM · Beta-Cluster-Infrastructure
bd808 moved T415133: Puppet agent failure detected on instance deployment-cache-upload08 in project deployment-prep from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
Tue, Jan 20, 9:01 PM · Traffic, Beta-Cluster-Infrastructure
bd808 moved T415115: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
Tue, Jan 20, 9:01 PM · Traffic, Beta-Cluster-Infrastructure
bd808 merged task T415133: Puppet agent failure detected on instance deployment-cache-upload08 in project deployment-prep into T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library.
Tue, Jan 20, 9:01 PM · Traffic, Beta-Cluster-Infrastructure
bd808 merged task T415115: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep into T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library.
Tue, Jan 20, 9:00 PM · Traffic, Beta-Cluster-Infrastructure
bd808 merged tasks T415115: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep, T415133: Puppet agent failure detected on instance deployment-cache-upload08 in project deployment-prep into T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library.
Tue, Jan 20, 9:00 PM · User-bd808, Patch-For-Review, Traffic, Beta-Cluster-Infrastructure
bd808 closed Restricted Task, a subtask of T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers), as Resolved.
Tue, Jan 20, 8:56 PM · Epic, Beta-Cluster-Infrastructure
bd808 changed the status of T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library from Open to In Progress.

Cherry-pick has things running again. Hopefully @SLyngshede-WMF or @Joe can make time to review and merge my patch upstream in ops/puppet.git. Thanks for the nudge in the right direction @ssingh.

Tue, Jan 20, 8:44 PM · User-bd808, Patch-For-Review, Traffic, Beta-Cluster-Infrastructure
bd808 added a comment to T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library.
bd808@deployment-cache-text08.deployment-prep.eqiad1:~$ sudo run-puppet-agent
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-cache-text08.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(f43e818c6d) git-sync-upstream - haproxy: guard call to private function with feature flag'
Notice: /Stage[main]/Profile::Cache::Haproxy/Haproxy::Site[tls]/File[/etc/haproxy/conf.d/tls.cfg]/content:
--- /etc/haproxy/conf.d/tls.cfg 2026-01-20 16:51:38.138281007 +0000
+++ /tmp/puppet-file20260120-30406-xkqdin       2026-01-20 20:38:14.131855754 +0000
@@ -214,8 +214,7 @@
     # bots that honour our UA and so have contact info go in class D, unless set to another value before. This means that requests coming from abusive networks will
     # still get an F score.
     http-request set-var(req.trusted_request,ifnotexists) str(D) if { var(req.ua_class) -m str "robot" }
-    # Temp fix for bot-password
-    http-request lua.check_traffic_class
+
Tue, Jan 20, 8:40 PM · User-bd808, Patch-For-Review, Traffic, Beta-Cluster-Infrastructure
bd808 renamed T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library from HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 to HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library.
Tue, Jan 20, 8:29 PM · User-bd808, Patch-For-Review, Traffic, Beta-Cluster-Infrastructure
bd808 added a comment to T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library.

I wonder if that lua.check_traffic_class method is coming from a private location in production?

bd808@mbp03:~/projects/wmf/operations/puppet$ git grep check_traffic_class
modules/profile/templates/cache/haproxy/tls_terminator.cfg.erb:    http-request lua.check_traffic_class
Tue, Jan 20, 8:28 PM · User-bd808, Patch-For-Review, Traffic, Beta-Cluster-Infrastructure
bd808 added a comment to T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library.

Running the config check with all of the config files gives a different error:

bd808@deployment-cache-text08.deployment-prep.eqiad1:~$ sudo haproxy -f /etc/haproxy/haproxy.cfg -f /etc/haproxy/conf.d -c
[NOTICE]   (24908) : haproxy version is 2.8.18-1~bpo11+1
[NOTICE]   (24908) : path to executable is /usr/sbin/haproxy
[ALERT]    (24908) : config : parsing [/etc/haproxy/conf.d/tls.cfg:218]: 'http-request' expects 'wait-for-handshake', 'set-log-level', 'set-nice', 'use-service', 'sc-add-gpc(*)', 'sc-inc-gpc(*)', 'sc-inc-gpc0(*)', 'sc-inc-gpc1(*)', 'sc-set-gpt(*)', 'sc-set-gpt0(*)', 'send-spoe-group', 'do-resolve(*)', 'cache-use', 'add-acl(*)', 'add-header', 'allow', 'auth', 'capture', 'del-acl(*)', 'del-header', 'del-map(*)', 'deny', 'disable-l7-retry', 'early-hint', 'normalize-uri', 'redirect', 'reject', 'replace-header', 'replace-path', 'replace-pathq', 'replace-uri', 'replace-value', 'return', 'set-header', 'set-map(*)', 'set-method', 'set-path', 'set-pathq', 'set-query', 'set-uri', 'strict-mode', 'tarpit', 'track-sc(*)', 'set-timeout', 'wait-for-body', 'set-var-fmt(*)', 'set-var(*)', 'unset-var(*)', 'set-dst', 'set-dst-port', 'set-mark', 'set-src', 'set-src-port', 'set-tos', 'silent-drop', 'set-priority-class', 'set-priority-offset', 'set-bandwidth-limit', 'lua.is_datacenter', 'lua.res_proxy', 'lua.set_contact_info', but got 'lua.check_traffic_class'.
[ALERT]    (24908) : config : Error(s) found in configuration file : /etc/haproxy/conf.d/tls.cfg
Tue, Jan 20, 8:25 PM · User-bd808, Patch-For-Review, Traffic, Beta-Cluster-Infrastructure
bd808 triaged T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library as High priority.
Tue, Jan 20, 7:24 PM · User-bd808, Patch-For-Review, Traffic, Beta-Cluster-Infrastructure
bd808 created T415113: HAProxy failing to start on deployment-cache-text08 and deployment-cache-upload08 because of missing `traffic_class.lua` library.
Tue, Jan 20, 7:24 PM · User-bd808, Patch-For-Review, Traffic, Beta-Cluster-Infrastructure
bd808 moved T414864: Unblock running tests against Beta Cluster from Digital Ocean GitLab CI runners from To Triage to Backlog on the Beta-Cluster-Infrastructure board.
Tue, Jan 20, 7:18 PM · GitLab, m3api, Beta-Cluster-Infrastructure
bd808 triaged T415021: Project deployment-prep instance deployment-sessionstore06 is down as Medium priority.

Looks to be a duplicate of the behavior from T412774: Project deployment-prep instance deployment-sessionstore06 is down. The instance is up, but something spiked its load to a point where Prometheus scrapes failed, causing a downtime alert.

Tue, Jan 20, 7:17 PM · Beta-Cluster-Infrastructure
bd808 closed T414934: Puppet agent failure detected on instance deployment-puppetserver-1 in project deployment-prep as Invalid.

Already resolved.

bd808@deployment-puppetserver-1.deployment-prep.eqiad1:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(51d830ad4e) gitpuppet - varnish: Move error message from footer to body for HTTP 4xx responses'
Notice: /Stage[main]/Profile::Puppetserver::Volatile/File[/srv/puppet_fileserver/volatile/datacenter_vendors]: Not removing directory; use 'force' to override
Notice: /Stage[main]/Profile::Puppetserver::Volatile/File[/srv/puppet_fileserver/volatile/datacenter_vendors]/ensure: removed (corrective)
Notice: Applied catalog in 11.95 seconds
Tue, Jan 20, 7:11 PM · Beta-Cluster-Infrastructure
bd808 renamed T414864: Unblock running tests against Beta Cluster from Digital Ocean GitLab CI runners from Unblock IPs for Beta Cluster access to Unblock running tests against Beta Cluster from Digital Ocean GitLab CI runners.
Tue, Jan 20, 7:08 PM · GitLab, m3api, Beta-Cluster-Infrastructure
bd808 changed the status of Restricted Task, a subtask of T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers), from Open to In Progress.
Tue, Jan 20, 5:57 PM · Epic, Beta-Cluster-Infrastructure
bd808 added a comment to T414864: Unblock running tests against Beta Cluster from Digital Ocean GitLab CI runners.

For this to be feasible we would need to set up an egress gateway node on the D.O. side and use the DOKS routing agent to ensure that outbound traffic exits through the gateway.

Tue, Jan 20, 4:49 PM · GitLab, m3api, Beta-Cluster-Infrastructure
bd808 updated subscribers of T414864: Unblock running tests against Beta Cluster from Digital Ocean GitLab CI runners.

The runners you would like unblocked are hosted on Digital Ocean. I do not think that it would be reasonable to open the Beta Cluster to the full DO IPv4 address space. If we have a fixed sub-space for egress we can unblock that. @dancy can you help me figure out if there is a restricted range that the runners egress through?
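The difference between unblocking a whole provider and unblocking a fixed egress range comes down to a membership check like the one sketched here. The network below is a documentation-only range (RFC 5737) standing in for a real egress block, and `is_allowed` is a hypothetical helper, not part of any Beta Cluster tooling.

```python
import ipaddress

# Documentation-only range (RFC 5737) standing in for a real egress block.
ALLOWED_EGRESS = [ipaddress.ip_network("203.0.113.0/28")]


def is_allowed(client_ip, allowed=ALLOWED_EGRESS):
    """Return True if client_ip falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)


print(is_allowed("203.0.113.5"))   # inside the /28
print(is_allowed("203.0.113.77"))  # same provider block, outside the /28
```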

Tue, Jan 20, 3:23 AM · GitLab, m3api, Beta-Cluster-Infrastructure
bd808 added a comment to T414920: On deployment server, skip fetching MediaWiki branches/tags we have no use for.

wmf/next might be needed though.

Tue, Jan 20, 3:06 AM · Release-Engineering-Team, Scap

Fri, Jan 16

bd808 added a subtask for T392356: Replace ingress-nginx before upstream EOL date: T414836: Create a reusable container to replace nginx ingress anonymizing reverse proxy setups.
Fri, Jan 16, 10:22 PM · cloud-services-team, Toolforge
bd808 added a parent task for T414836: Create a reusable container to replace nginx ingress anonymizing reverse proxy setups: T392356: Replace ingress-nginx before upstream EOL date.
Fri, Jan 16, 10:22 PM · User-bd808, Tool-containers
bd808 added a comment to T410730: Continuous integration and continuous delivery.

Effectively the script would do the following steps:

  1. ssh into each encoder
  2. Apply a patched Puppet manifest with credentials
  3. Run systemctl --no-block reload v2ccelery.service to initiate the Celery restart (non-blocking)
  4. Repeat until there are no more instances
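
The loop described in the steps above might be sketched as follows. The host names are hypothetical, and step 2 is left out because the source does not say which command applies the patched manifest; only the reload command is taken from the list.

```python
import subprocess

# Command from step 3; --no-block returns without waiting for the reload.
RELOAD = "systemctl --no-block reload v2ccelery.service"


def reload_commands(hosts):
    """Build one ssh command per encoder for the non-blocking reload."""
    return [["ssh", host, RELOAD] for host in hosts]


def run_all(hosts, runner=subprocess.run, dry_run=True):
    """Visit each encoder in turn; only print the commands when dry_run."""
    for command in reload_commands(hosts):
        if dry_run:
            print(" ".join(command))
        else:
            runner(command, check=True)


# Hypothetical host names; dry_run=True means nothing is executed here.
run_all(["encoder01.example", "encoder02.example"])
```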
Fri, Jan 16, 9:30 PM · video2commons
bd808 claimed T414836: Create a reusable container to replace nginx ingress anonymizing reverse proxy setups.
Fri, Jan 16, 6:35 PM · User-bd808, Tool-containers
bd808 created T414836: Create a reusable container to replace nginx ingress anonymizing reverse proxy setups.
Fri, Jan 16, 6:29 PM · User-bd808, Tool-containers
bd808 added a comment to T414674: Remove remaining uses of ingress-nginx specific annotations.

I will try to make a reverse proxy container in the same spirit as the redirect container. I can borrow from things I learned building https://gitlab.wikimedia.org/toolforge-repos/gitlab-content to make something relatively lightweight, I think.

That would be very helpful, thank you!

Do you have thoughts already on how that would deal with tools that currently host a "normal" web service in addition to those reverse proxies? (e.g. tool-moedata has four proxy ingresses in addition to the mail webservice, it seems.)

Fri, Jan 16, 6:14 PM · Toolforge, cloud-services-team
bd808 added a comment to T414674: Remove remaining uses of ingress-nginx specific annotations.

The current list of tools relying on this functionality is at P87548. Of those, a couple are simple redirects that can be replaced with https://wikitech.wikimedia.org/wiki/Tool:Containers#Redirect_container, but the ones proxying to external upstream servers will need another solution.

Fri, Jan 16, 5:35 PM · Toolforge, cloud-services-team
bd808 added a comment to T414756: fresh-node24 is not installed by fresh-install due to bad checksum.

fresh-node24 is also not in the most recently tagged 24.05.1 release, so I guess after the checksum is fixed a new release is needed as well.

Fri, Jan 16, 12:44 AM · Patch-For-Review, Fresh
bd808 created T414756: fresh-node24 is not installed by fresh-install due to bad checksum.
Fri, Jan 16, 12:34 AM · Patch-For-Review, Fresh
bd808 renamed T142237: Adopt and move phabricator-bug-status to the Toolforge Jobs Framework from Usurp and move phabricator-bug-status to the Toolforge Jobs Framework to Adopt and move phabricator-bug-status to the Toolforge Jobs Framework.
Fri, Jan 16, 12:00 AM · User-bd808, Tool-Phabricator-bug-status

Thu, Jan 15

bd808 set Due Date to Fri, Jan 30, 12:00 AM on T142237: Adopt and move phabricator-bug-status to the Toolforge Jobs Framework.
Thu, Jan 15, 11:59 PM · User-bd808, Tool-Phabricator-bug-status
bd808 added a subtask for T329927: BugStatusUpdate gadget can't work: T142237: Adopt and move phabricator-bug-status to the Toolforge Jobs Framework.
Thu, Jan 15, 11:58 PM · Tool-Phabricator-bug-status
bd808 added a parent task for T142237: Adopt and move phabricator-bug-status to the Toolforge Jobs Framework: T329927: BugStatusUpdate gadget can't work.
Thu, Jan 15, 11:58 PM · User-bd808, Tool-Phabricator-bug-status
bd808 triaged T329927: BugStatusUpdate gadget can't work as Medium priority.
Thu, Jan 15, 11:57 PM · Tool-Phabricator-bug-status
bd808 changed the status of T142237: Adopt and move phabricator-bug-status to the Toolforge Jobs Framework from Open to Stalled.
Thu, Jan 15, 11:56 PM · User-bd808, Tool-Phabricator-bug-status
bd808 moved T142237: Adopt and move phabricator-bug-status to the Toolforge Jobs Framework from Backlog to Doing on the Tool-Phabricator-bug-status board.
Thu, Jan 15, 11:55 PM · User-bd808, Tool-Phabricator-bug-status
bd808 added a comment to T142237: Adopt and move phabricator-bug-status to the Toolforge Jobs Framework.

It would be nice though, so I will try not to lose track of this for years again.

It was only one year this time... I was about to finally request adoption, but I saw that Matt has had some edits on enwiki really recently. I bumped the threads on his enwiki and wikitech Talk pages one more time. I am setting a reminder for myself to go ahead and file the adoption request on 2026-01-29 if I haven't heard back.

Thu, Jan 15, 11:53 PM · User-bd808, Tool-Phabricator-bug-status
bd808 added a comment to T414719: Opt-in testing of Gerrit-via-CDN.

I think this may just be a quirk of tunnelencabulator, but when I run it and use ssh gerrit -- gerrit show-connections --wide I see myself entering via IPv6 and not an edge proxy. For me ssh -4 is necessary to route traffic over the tunnel.

Thu, Jan 15, 8:57 PM · Gerrit, collaboration-services

Wed, Jan 14

bd808 closed Restricted Task, a subtask of T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers), as Resolved.
Wed, Jan 14, 3:58 PM · Epic, Beta-Cluster-Infrastructure
bd808 changed the status of Restricted Task, a subtask of T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers), from Open to In Progress.
Wed, Jan 14, 3:52 PM · Epic, Beta-Cluster-Infrastructure

Tue, Jan 13

bd808 changed the status of T411516: Add ability to ignore missing extensions in mergeMessageFileList's `--list-file` input, a subtask of T125678: Scap reliance on extension-list in turn requires extensions to be deployed to WMF production before being deployed to the Beta Cluster, from Open to In Progress.
Tue, Jan 13, 9:00 PM · User-bd808, Scap, MediaWiki-Configuration
bd808 changed the status of T411516: Add ability to ignore missing extensions in mergeMessageFileList's `--list-file` input from Open to In Progress.
Tue, Jan 13, 9:00 PM · User-bd808, Release-Engineering-Team (Doing 😎), Scap, MediaWiki-Internationalization
bd808 added a comment to T411516: Add ability to ignore missing extensions in mergeMessageFileList's `--list-file` input.
IMPORTANT: Do this early! Ideally, at least three weeks prior to your target deployment date, to ensure that your extension is present as a submodule in the required branches. (The extension submodule must be present in all branches currently running on the cluster, or the localization cache builder will fail.)
Tue, Jan 13, 8:58 PM · User-bd808, Release-Engineering-Team (Doing 😎), Scap, MediaWiki-Internationalization
bd808 closed T414512: Testwikis header on tools.versions is very hard to read as Resolved.
Tue, Jan 13, 5:37 PM · User-bd808, Accessibility, Release-Engineering-Team, Tools
bd808 changed the status of T414512: Testwikis header on tools.versions is very hard to read from Open to In Progress.
Tue, Jan 13, 5:34 PM · User-bd808, Accessibility, Release-Engineering-Team, Tools
bd808 closed T414498: Requesting GitLab account activation for Lwilson-ctr as Resolved.

https://wikitech.wikimedia.org/w/index.php?title=Tool:Gitlab-account-approval/Log&diff=prev&oldid=2374449

Tue, Jan 13, 5:22 PM · User-bd808, GitLab (Account Approval), Release-Engineering-Team
bd808 added a comment to T414504: Puppet failure on etherpad-bookworm.devtools.eqiad1.wikimedia.cloud.

Yeah, it's going to be the same problem. The quick fix is adding profile::tlsproxy::envoy::upstream_sni: null and profile::tlsproxy::envoy::upstream_tls: false to the project's hiera. The longer fix is either https://gerrit.wikimedia.org/r/c/operations/puppet/+/702326 or moving the defaults into the lookup calls in the ::profile::tlsproxy::envoy module.

Tue, Jan 13, 5:14 PM · Release-Engineering-Team
bd808 added a comment to T414498: Requesting GitLab account activation for Lwilson-ctr.

Welcome to the projects @Lwilson-ctr. I have added you to the Trusted-Contributors group here in Phabricator. After you link your https://ldap.toolforge.org/user/lwilson-ctr Developer account to your Phabricator profile in User Settings the Tool-gitlab-account-approval bot should approve your GitLab account within a few minutes. That group membership should also make using Phabricator a little bit nicer for you.

Tue, Jan 13, 5:10 PM · User-bd808, GitLab (Account Approval), Release-Engineering-Team
bd808 added a member for Trusted-Contributors: Lwilson-ctr.
Tue, Jan 13, 5:02 PM

Mon, Jan 12

bd808 closed Restricted Task, a subtask of T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers), as Resolved.
Mon, Jan 12, 9:08 PM · Epic, Beta-Cluster-Infrastructure
bd808 changed the status of Restricted Task, a subtask of T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers), from Open to In Progress.
Mon, Jan 12, 9:02 PM · Epic, Beta-Cluster-Infrastructure
bd808 added a comment to T285539: Easing pain points caused by divergence between cloudservices and production puppet usecases.

I just nudged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702326 as a possible fix for problems like T414304: No Puppet resources found on instance deployment-shellbox01 on project deployment-prep which are caused by the failure to load /etc/puppet/hieradata/common/$classpath.yaml configuration in the Cloud VPS realm(s).

Mon, Jan 12, 7:09 PM · Puppet-Core, cloud-services-team, Patch-For-Review, User-jbond, Infrastructure-Foundations, Cloud-VPS, Cloud Services Proposals
bd808 closed T414304: No Puppet resources found on instance deployment-shellbox01 on project deployment-prep as Resolved.

A local fix for Beta Cluster is copying the new hiera settings into the project global hiera:
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/e1aea5f956fbc945bd7bb34ede02b24e77efbaf5%5E%21/#F0

diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index 2151103..5f2d748 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml
Mon, Jan 12, 6:49 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 updated subscribers of T414304: No Puppet resources found on instance deployment-shellbox01 on project deployment-prep.

Failure appears to be caused by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1219770 where @ABran-WMF introduced new Hiera settings profile::tlsproxy::envoy::upstream_tls and profile::tlsproxy::envoy::upstream_sni. These have been given Production global defaults in hieradata/common/profile/tlsproxy/envoy.yaml, but that config file is not in the puppetserver's hiera lookup scope within the Beta Cluster or other WMCS projects.

Mon, Jan 12, 6:45 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 moved T414304: No Puppet resources found on instance deployment-shellbox01 on project deployment-prep from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
bd808@deployment-shellbox01:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'profile::tlsproxy::envoy::upstream_tls' (file: /srv/puppet_code/environments/production/modules/profile/manifests/tlsproxy/envoy.pp, line: 75) on node deployment-shellbox01.deployment-prep.eqiad1.wikimedia.cloud
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Mon, Jan 12, 6:26 PM · User-bd808, Beta-Cluster-Infrastructure
bd808 merged task T414262: Puppet agent failure detected on instance deployment-mx03 in project deployment-prep into T412975: Replace deployment-mx03 with a bookworm-based instance (was Puppet failure: "Unable to locate package spamd").
Mon, Jan 12, 6:25 PM · Beta-Cluster-Infrastructure
bd808 merged T414262: Puppet agent failure detected on instance deployment-mx03 in project deployment-prep into T412975: Replace deployment-mx03 with a bookworm-based instance (was Puppet failure: "Unable to locate package spamd").
Mon, Jan 12, 6:25 PM · Patch-For-Review, collaboration-services, Beta-Cluster-Infrastructure
bd808 merged task T414366: Project deployment-prep instance deployment-mx03 is down into T412975: Replace deployment-mx03 with a bookworm-based instance (was Puppet failure: "Unable to locate package spamd").
Mon, Jan 12, 6:19 PM · Beta-Cluster-Infrastructure
bd808 merged T414366: Project deployment-prep instance deployment-mx03 is down into T412975: Replace deployment-mx03 with a bookworm-based instance (was Puppet failure: "Unable to locate package spamd").
Mon, Jan 12, 6:18 PM · Patch-For-Review, collaboration-services, Beta-Cluster-Infrastructure

Sat, Jan 10

bd808 closed Restricted Task, a subtask of T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers), as Resolved.
Sat, Jan 10, 12:28 AM · Epic, Beta-Cluster-Infrastructure
bd808 changed the status of Restricted Task, a subtask of T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers), from Open to In Progress.
Sat, Jan 10, 12:23 AM · Epic, Beta-Cluster-Infrastructure
bd808 created T414249: No new changesets added to cloud/instance-puppet.git since df63d60.
Sat, Jan 10, 12:20 AM · Cloud-VPS, cloud-services-team
bd808 moved T414206: Puppet agent failure detected on instance deployment-mx03 in project deployment-prep from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
Sat, Jan 10, 12:02 AM · Beta-Cluster-Infrastructure
bd808 merged T414206: Puppet agent failure detected on instance deployment-mx03 in project deployment-prep into T412975: Replace deployment-mx03 with a bookworm-based instance (was Puppet failure: "Unable to locate package spamd").
Sat, Jan 10, 12:02 AM · Patch-For-Review, collaboration-services, Beta-Cluster-Infrastructure
bd808 merged task T414206: Puppet agent failure detected on instance deployment-mx03 in project deployment-prep into T412975: Replace deployment-mx03 with a bookworm-based instance (was Puppet failure: "Unable to locate package spamd").
Sat, Jan 10, 12:02 AM · Beta-Cluster-Infrastructure

Fri, Jan 9

bd808 added a comment to T409493: Toolforge interwiki link handling no longer strips URL-encoding before redirecting when it previously did, breaking existing on-wiki links.

I added some user-facing docs that someone might accidentally see at https://wikitech.wikimedia.org/wiki/Tool:Iw#URLs_with_query_parameters describing the problem and some workarounds.

Fri, Jan 9, 11:47 PM · Tool-iw, cloud-services-team, Toolforge
bd808 updated subscribers of T411516: Add ability to ignore missing extensions in mergeMessageFileList's `--list-file` input.

@jeena will be shipping my risky patch with the 1.46-wmf.11 train (T413802: 1.46.0-wmf.11 deployment blockers). Let's hope for a smooth run like @thcipriani's local and Beta Cluster tests have already seen.

Fri, Jan 9, 10:29 PM · User-bd808, Release-Engineering-Team (Doing 😎), Scap, MediaWiki-Internationalization
bd808 added a comment to T413802: 1.46.0-wmf.11 deployment blockers.
  1. Risky Patch! 🚂🔥
Fri, Jan 9, 9:14 PM · Essential-Work, Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments