
Bstorm (Brooke)
Ops Witch -- Wikimedia Cloud Services Team


User Details

User Since
Jan 22 2018, 10:09 PM (94 w, 6 d)
Availability
Available
IRC Nick
bstorm_
LDAP User
Bstorm
MediaWiki User
BStorm (WMF) [ Global Accounts ]

On the wikis, I'm BStorm (WMF), bstorm_ on IRC and Bstorm on gerrit and WikiTech.

I work for or provide services to the Wikimedia Foundation, but this is my only Phabricator account. Edits, statements, or other contributions made from this account are my own, and may not reflect the views of the Foundation.

Recent Activity

Thu, Nov 14

Bstorm added a comment to T236203: Add CI checks for golang admission controllers.

@Jdforrester-WMF 👋🏻 This is the ticket I mentioned at TechConf

Thu, Nov 14, 10:30 PM · Release-Engineering-Team-TODO (201911), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm closed T238337: IABot being limited to 10 connections as Declined.
Thu, Nov 14, 7:01 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T236974: Establish a process for increasing a toolforge tool's connections to the wiki replicas.

Bursting past the connection limit like that isn't directly possible within the database system, as I understand it. I can poke around to see where it might be possible.
Overall, I think the Data-Services (Quota-requests) name sounds good to me.

Thu, Nov 14, 7:00 PM · cloud-services-team (Kanban), Data-Services
Bstorm added a comment to T238337: IABot being limited to 10 connections.

We are working on a process at T236974 for increasing them on a case by case basis, but I can assure you that you have the same connection limit as everyone else.

Thu, Nov 14, 6:55 PM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T238337: IABot being limited to 10 connections.

All users of the wiki replicas are limited to a maximum of 10 connections. The toolsdb service is limited separately.

Thu, Nov 14, 6:52 PM · cloud-services-team (Kanban), Toolforge

Tue, Nov 12

Bstorm added a parent task for T238162: Establish a process for renewing TLS certs for the 2 webhook controllers: T215553: Figure out cert management for Toolforge kubernetes and make it clear in documents, etc. for the upgrade.
Tue, Nov 12, 10:30 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a subtask for T215553: Figure out cert management for Toolforge kubernetes and make it clear in documents, etc. for the upgrade: T238162: Establish a process for renewing TLS certs for the 2 webhook controllers.
Tue, Nov 12, 10:30 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm created T238162: Establish a process for renewing TLS certs for the 2 webhook controllers.
Tue, Nov 12, 10:28 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T237643: toolforge: new k8s: figure out metrics / observability.

I imagine that if prometheus is running inside the cluster, it uses a service account, right?

Tue, Nov 12, 10:25 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T235743: Prepare and check storage layer for mnwwiki.

@Marostegui if you can get those steps, that would be great. I'm at TechConf, and it might be simpler for whoever grabs this.

Tue, Nov 12, 12:53 PM · cloud-services-team (Kanban), Data-Services, DBA
Bstorm added a comment to T237354: "bigram" instance for Language team.

Sorry about that. Not sure why the math seemed to work out to me at the time, but it's good to go now.

Tue, Nov 12, 12:51 PM · Language-Team (Language-2019-October-December), Cloud-Services, cloud-services-team

Sun, Nov 10

Bstorm merged T235756: Toolforge: webservice utility: add support for thew new k8s setup into T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.
Sun, Nov 10, 1:56 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
Bstorm merged task T235756: Toolforge: webservice utility: add support for thew new k8s setup into T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.
Sun, Nov 10, 1:55 AM · Goal, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T215531: Deploy upgraded Kubernetes to toolsbeta.

@aborrero I have noticed a strange behavior in the new proxy in toolsbeta. If I spin up new tools on the old cluster, they are sometimes unreachable over the flannel IP until I reboot the proxy server (!?!). Restarting flannel did not help; only a reboot did. I also saw the problem return when I moved a service to the new cluster and then back to the old one.

Sun, Nov 10, 1:55 AM · Patch-For-Review, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a subtask for T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster: T237836: `webservice restart` regression with backend=kubernetes in webservice 0.51.
Sun, Nov 10, 1:49 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
Bstorm added a parent task for T237836: `webservice restart` regression with backend=kubernetes in webservice 0.51: T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.
Sun, Nov 10, 1:49 AM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T237836: `webservice restart` regression with backend=kubernetes in webservice 0.51.

In tests the new version is able to correctly find all webservice pods and delete them in a sensible fashion (old and new cluster).

Sun, Nov 10, 1:48 AM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T237836: `webservice restart` regression with backend=kubernetes in webservice 0.51.

Workaround for users until the fix is deployed:
Delete ALL existing objects in the webservice:

Sun, Nov 10, 1:14 AM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T237836: `webservice restart` regression with backend=kubernetes in webservice 0.51.

Also, by running again with the old version, you changed the labels a second time. Yes, it all makes sense. Fix coming.

Sun, Nov 10, 1:08 AM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T237836: `webservice restart` regression with backend=kubernetes in webservice 0.51.

Yes, that makes sense: by adding a new label, I broke deletion of older objects that lack it, because the code looks for the ENTIRE list of labels. The easiest fix is probably a quick patch and deploy.

Sun, Nov 10, 1:05 AM · cloud-services-team (Kanban), Toolforge
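
This failure follows from how equality-based label selectors work. An object matches only if it carries every key/value pair in the selector, so adding a label to the selector silently excludes older objects created without it. Below is a minimal Python sketch of that matching rule. `tools.wmflabs.org/webservice` appears elsewhere on this page; the other label keys are hypothetical.

```python
from typing import Mapping


def matches(selector: Mapping[str, str], labels: Mapping[str, str]) -> bool:
    """Equality-based selector: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


# Labels on an object created by the older webservice version (hypothetical set).
old_object_labels = {"name": "test", "tools.wmflabs.org/webservice": "true"}

old_selector = {"name": "test", "tools.wmflabs.org/webservice": "true"}
# The newer version searches for one extra label (hypothetical key).
new_selector = {**old_selector, "example.org/new-label": "true"}

print(matches(old_selector, old_object_labels))  # True
print(matches(new_selector, old_object_labels))  # False: the old object is never found, so never deleted
```

Looking an object up by its name avoids the problem, since the name does not change between webservice versions.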
Bstorm added a comment to T237836: `webservice restart` regression with backend=kubernetes in webservice 0.51.

Wait...maybe that's it: you restarted, but the labels it looks for are different, because it finds objects by label rather than by name. There is absolutely no reason not to use the object's name unless pykube is incapable of it.

Sun, Nov 10, 1:03 AM · cloud-services-team (Kanban), Toolforge
Bstorm added a comment to T237836: `webservice restart` regression with backend=kubernetes in webservice 0.51.

Having run this every which way in testing, I now think I *did* see this, but I took it for an odd one-off because it didn't seem consistent.
The problem in the second case isn't the deployment; it's the Service object, and only in the old cluster. Running kubectl delete service --all will clear it up. That's why the start phase misses it: it checks only for deployments and their descendants, not services. It also shows up in the traceback above for start.

Sun, Nov 10, 1:02 AM · cloud-services-team (Kanban), Toolforge
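
The start phase misses leftovers because it checks Deployments and their descendants but not Services. A small Python sketch that scans every managed kind shows the difference. `KubeObject`, `MANAGED_KINDS`, and `leftovers` are hypothetical stand-ins for what a client such as pykube would return; this is not webservice's actual code.

```python
from typing import Iterable, List, NamedTuple, Sequence


class KubeObject(NamedTuple):
    kind: str
    name: str


# Hypothetical: kinds a start/stop cycle should account for, including Service.
MANAGED_KINDS = ("Deployment", "ReplicaSet", "Pod", "Service")


def leftovers(objects: Iterable[KubeObject], kinds: Sequence[str] = MANAGED_KINDS) -> List[KubeObject]:
    """Return existing objects of the managed kinds, which would block a clean start."""
    return [obj for obj in objects if obj.kind in kinds]


# A namespace where the Deployment is gone but the Service leaked.
namespace = [KubeObject("Service", "test"), KubeObject("ConfigMap", "unrelated")]

print(leftovers(namespace, kinds=("Deployment", "ReplicaSet", "Pod")))  # [] -> start assumes a clean slate
print(leftovers(namespace))  # [KubeObject(kind='Service', name='test')]
```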

Sat, Nov 9

Bstorm moved T237789: Document (and execute) the upgrade process for the new Toolforge K8s cluster from Inbox to Important on the cloud-services-team (Kanban) board.
Sat, Nov 9, 12:41 AM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm triaged T237789: Document (and execute) the upgrade process for the new Toolforge K8s cluster as Normal priority.
Sat, Nov 9, 12:40 AM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T237789: Document (and execute) the upgrade process for the new Toolforge K8s cluster.

The upgrade cycle is meant to run every six months to stay ahead of certificate renewals. CVEs could easily have us doing it more often than that, so we should expect to upgrade fairly often.

Sat, Nov 9, 12:40 AM · Toolforge, cloud-services-team (Kanban), Kubernetes
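
A minimal Python sketch makes the timing argument concrete. It assumes kubeadm's default one-year validity for the certificates it issues and that each control-plane upgrade renews them. Under those assumptions, a six-month cycle leaves plenty of headroom. The 30-day safety margin and the function name are hypothetical.

```python
from datetime import date, timedelta

CERT_VALIDITY = timedelta(days=365)  # assumed kubeadm default for issued certificates
SAFETY_MARGIN = timedelta(days=30)   # hypothetical buffer before expiry


def upgrade_keeps_certs_fresh(last_renewal: date, upgrade_interval: timedelta) -> bool:
    """True if the next scheduled upgrade (assumed to renew certs) lands before expiry minus the margin."""
    next_upgrade = last_renewal + upgrade_interval
    deadline = last_renewal + CERT_VALIDITY - SAFETY_MARGIN
    return next_upgrade <= deadline


renewed = date(2019, 11, 9)
print(upgrade_keeps_certs_fresh(renewed, timedelta(days=182)))  # True: six months
print(upgrade_keeps_certs_fresh(renewed, timedelta(days=350)))  # False: inside the margin
```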
Bstorm created T237789: Document (and execute) the upgrade process for the new Toolforge K8s cluster.
Sat, Nov 9, 12:39 AM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T234032: Toolforge ingress: create a default landing page for unknown/default URLs.

Found T107697: Extend 'tool not found' 404 page and T180262: Reduce byte-size of Toolforge 404 page
Just for reference and possible connection/closure.

Sat, Nov 9, 12:34 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T234032: Toolforge ingress: create a default landing page for unknown/default URLs.

Perhaps if we can leave off the "host" field, it will work for both toolforge.org and wmflabs.org?

Sat, Nov 9, 12:18 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
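
In the Kubernetes Ingress API, a rule that omits `host` applies to requests for any hostname. That is what would let one route serve both toolforge.org and wmflabs.org. Below is a minimal Python sketch that builds such a manifest as a dict, using the `networking.k8s.io/v1beta1` API version seen in the ingress logs on this page. The object name, namespace, service name, and port are hypothetical.

```python
import json


def default_ingress(name: str, namespace: str, service: str, port: int) -> dict:
    """Build an Ingress whose single rule has no "host" key, so it matches any hostname."""
    backend = {"serviceName": service, "servicePort": port}
    return {
        "apiVersion": "networking.k8s.io/v1beta1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # No "host" here: the rule applies to toolforge.org and wmflabs.org alike.
            "rules": [{"http": {"paths": [{"path": "/", "backend": backend}]}}],
        },
    }


manifest = default_ingress("default-landing", "tool-fourohfour", "fourohfour", 8000)  # hypothetical values
print(json.dumps(manifest, indent=2))
```

Because kubectl also accepts JSON, the printed manifest could be piped to `kubectl apply -f -` in a test namespace.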
Bstorm added a comment to T234032: Toolforge ingress: create a default landing page for unknown/default URLs.

@bd808: is that deployed in toolsbeta already? If not, we can try to figure that out as well (the LDAP nonsense, that is).

Sat, Nov 9, 12:17 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T234032: Toolforge ingress: create a default landing page for unknown/default URLs.

@bd808 has created https://phabricator.wikimedia.org/source/tool-fourohfour/ which we could apply as a default route in the ingress.

Sat, Nov 9, 12:16 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes

Fri, Nov 8

Bstorm triaged T237784: Document migration plans and timelines as High priority.
Fri, Nov 8, 11:44 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm moved T237784: Document migration plans and timelines from Inbox to Doing on the cloud-services-team (Kanban) board.
Fri, Nov 8, 11:44 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T237784: Document migration plans and timelines.

User documentation started here: https://wikitech.wikimedia.org/wiki/User:Bstorm/New_k8s_migration

Fri, Nov 8, 11:43 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm created T237784: Document migration plans and timelines.
Fri, Nov 8, 11:43 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm changed the status of T215553: Figure out cert management for Toolforge kubernetes and make it clear in documents, etc. for the upgrade, a subtask of T214513: Upgrade Toolforge Kubernetes, from Open to Stalled.
Fri, Nov 8, 11:34 PM · Wikimedia-Incident, Goal, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm changed the status of T215553: Figure out cert management for Toolforge kubernetes and make it clear in documents, etc. for the upgrade from Open to Stalled.

This is on hold until monitoring shows the new kubelets (T237643), which should happen soon if it hasn't already.

Fri, Nov 8, 11:34 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T237509: maintain views has to run or has to be updated to fix errors on globalblocks and protected_titles for wikireplicas.

I think that the globalblocks part of this can be cleaned up with re-runs with the --clean option. The pt_reason error surprises me, but there could have been a miss there.
@jcrespo was that pt_reason error only on ruwiki_p or across most/all of them?

Fri, Nov 8, 8:40 PM · cloud-services-team (Kanban), Data-Services
Bstorm added a comment to T237354: "bigram" instance for Language team.

I meant bigram in that, but you get the idea :)

Fri, Nov 8, 8:35 PM · Language-Team (Language-2019-October-December), Cloud-Services, cloud-services-team
Bstorm added a comment to T237354: "bigram" instance for Language team.

I expanded your quota, which should give you the ability to launch the new instance.

Fri, Nov 8, 8:34 PM · Language-Team (Language-2019-October-December), Cloud-Services, cloud-services-team
Bstorm added a comment to T234037: Toolforge ingress: decide on final layout of north-south proxy setup.

Ingress logs for our reference. In this version of webservice, I have to edit the ingress after it is launched to be "toolsbeta.wmflabs.org" (the UPDATE below). That will not be true after https://gerrit.wikimedia.org/r/c/operations/software/tools-webservice/+/549613. I see I should also switch things up in the code so the ingress is created last and deleted first. The leaking service objects are interesting on the old grid. I will try to figure that out, if possible. It may be a problem with the API versions used in pykube.

I1108 15:39:05.350076       6 event.go:258] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"tool-test", Name:"test", UID:"81f75096-a8f7-469b-b9d0-244981433249", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"2803724", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress tool-test/test
W1108 15:39:08.689924       6 controller.go:878] Service "tool-test/test" does not have any active Endpoint.
I1108 15:39:08.690033       6 controller.go:133] Configuration changes detected, backend reload required.
I1108 15:39:09.000514       6 controller.go:149] Backend successfully reloaded.
I1108 15:39:52.141743       6 controller.go:133] Configuration changes detected, backend reload required.
I1108 15:39:52.141813       6 event.go:258] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"tool-test", Name:"test", UID:"81f75096-a8f7-469b-b9d0-244981433249", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"2803832", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress tool-test/test
I1108 15:39:52.325561       6 controller.go:149] Backend successfully reloaded
Fri, Nov 8, 3:44 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
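
The ordering mentioned above is to create the ingress last and delete it first. That way the route never points at a Service or Deployment that does not exist. Below is a minimal Python sketch of the ordering. `CREATE_ORDER`, `create_all`, and `delete_all` are hypothetical, and the callbacks stand in for real API calls.

```python
from typing import Callable, List

# Hypothetical dependency order: the ingress routes to the service, which selects the deployment's pods.
CREATE_ORDER = ("Deployment", "Service", "Ingress")


def create_all(create: Callable[[str], None]) -> None:
    for kind in CREATE_ORDER:
        create(kind)  # Ingress goes last, once its backend exists


def delete_all(delete: Callable[[str], None]) -> None:
    for kind in reversed(CREATE_ORDER):
        delete(kind)  # Ingress goes first, so traffic stops before the backend disappears


log: List[str] = []
create_all(lambda kind: log.append(f"create {kind}"))
delete_all(lambda kind: log.append(f"delete {kind}"))
print(log)
```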
Bstorm added a comment to T234037: Toolforge ingress: decide on final layout of north-south proxy setup.

It works now!

Fri, Nov 8, 3:40 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T237354: "bigram" instance for Language team.

Sorry, I'll try to get this today.

Fri, Nov 8, 3:25 PM · Language-Team (Language-2019-October-December), Cloud-Services, cloud-services-team
Bstorm added a comment to T234037: Toolforge ingress: decide on final layout of north-south proxy setup.

BTW, if you are set to the new cluster, use /usr/bin/kubectl, which you've probably already noticed, but just in case.

Fri, Nov 8, 3:24 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T234037: Toolforge ingress: decide on final layout of north-south proxy setup.

Apparently webservice failed to delete the Service object in the old cluster (which I'll check into today). So that explains the first error at least.

Fri, Nov 8, 3:23 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes

Thu, Nov 7

Bstorm added a comment to T234037: Toolforge ingress: decide on final layout of north-south proxy setup.

@aborrero I have lots of local hacks on the toolsbeta bastion right now, so please don't enable puppet, but!
I have a good example of a tool running via webservice in the new setup and it isn't working with the ingress.

Thu, Nov 7, 10:25 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.

Doesn't look quite like it. I need the contents inserted into the KubernetesBackend object with a self.project assignment (so I'll just do a with open). That just mounts it in the old cluster (the new cluster does it automatically with PodPreset because I wanted to make sure the resulting pods looked roughly the same). I need to commit my fix for the ingress object anyway :)

Thu, Nov 7, 5:35 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
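
Here is a minimal sketch of the approach described in these comments: read `/etc/wmcs_project` with a `with open` block and store the result on the backend as `self.project`. Only the file path, the `KubernetesBackend` name, and the `project` attribute come from these comments. The constructor signature and the `project_file` parameter are hypothetical, and this is not the actual webservice code.

```python
class KubernetesBackend:
    """Illustrative stand-in for webservice's backend class; only the project lookup is sketched."""

    def __init__(self, tool: str, project_file: str = "/etc/wmcs_project"):
        self.tool = tool
        # Assumed to hold the Cloud VPS project name (e.g. "tools" or "toolsbeta").
        with open(project_file, encoding="utf-8") as f:
            self.project = f.read().strip()
```

The `project_file` parameter exists only so the lookup can be exercised against a temporary file outside a Cloud VPS instance.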
Bstorm added a comment to T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.

Hrm...that frontend is used in the pods, too, isn't it? Well, either way, it should work.

Thu, Nov 7, 5:09 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.

We would need to mount /etc/wmcs_project into the Pods for that to work I think, but other than that no reason not to.

Thu, Nov 7, 5:07 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.

Currently, webservice has no idea what "project" you are in. @bd808 any reason not to have it grok that from /etc/wmcs_project?

Thu, Nov 7, 3:59 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.

Ok, the problem was my format for the ingress (which I am fixing). The current ingress setup won't work with the existing toolsbeta routing, but I couldn't edit the ingress object with the 1.10.6 version of kubectl. If that old client throws random errors in unexpected places, we will need to install a newer one for people to use.

Thu, Nov 7, 12:14 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
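
Kubernetes' version-skew policy supports kubectl within one minor version (older or newer) of the API server. A 1.10 client is therefore outside the supported range for the new 1.15 cluster. Below is a minimal Python sketch of that check; the function names are hypothetical.

```python
def minor_version(version: str) -> tuple:
    """Parse "1.15.5" or "v1.10.6" into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def kubectl_skew_ok(client: str, server: str) -> bool:
    """kubectl is supported within one minor version of kube-apiserver."""
    c_major, c_minor = minor_version(client)
    s_major, s_minor = minor_version(server)
    return c_major == s_major and abs(c_minor - s_minor) <= 1


print(kubectl_skew_ok("1.10.6", "1.15.5"))  # False
print(kubectl_skew_ok("1.15.4", "1.15.5"))  # True
```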

Wed, Nov 6

Bstorm added a comment to T236202: Modify webservice and maintain-kubeusers to allow switching to the new cluster.

The changes to maintain-kubeusers and such worked exactly as desired.
What didn't work on the first test:

Wed, Nov 6, 11:17 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
Bstorm committed rODIT849798dbbcfb: jessie fixes: port the fix from the base image to the jessie-sssd one (authored by Bstorm).
jessie fixes: port the fix from the base image to the jessie-sssd one
Wed, Nov 6, 9:08 PM
Bstorm committed rODITc67b6afec6ab: bugfix: actually fix the typo this time. (authored by Bstorm).
bugfix: actually fix the typo this time.
Wed, Nov 6, 9:08 PM
Bstorm committed rODITede79082bddc: bugfix: fix typo in tcl sssd image definition (authored by Bstorm).
bugfix: fix typo in tcl sssd image definition
Wed, Nov 6, 9:08 PM
Bstorm added a comment to T237557: new proxy and etcd nodes unreachable by ssh for tools-prometheus.

Looks like tools-acme-chief nodes also don't allow it.

Wed, Nov 6, 7:36 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T237443: toolsbeta: new k8s: deploy a front proxy (dynamicproxy).
toolsbeta.test@toolsbeta-sgebastion-04:~$ curl http://toolsbeta.wmflabs.org/test/
Hello World!
Wed, Nov 6, 7:19 PM · Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T237443: toolsbeta: new k8s: deploy a front proxy (dynamicproxy).

That did it! I moved the proxy config out of the prefix puppet into the project puppet, added a second node, restarted flannel and switched the master name from an FQDN to a hostname. That made it a master, opened flannel, etc.

Wed, Nov 6, 7:15 PM · Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T237443: toolsbeta: new k8s: deploy a front proxy (dynamicproxy).

Had to flip a bunch of hiera switches to get the proxy a bit more functional. kube2proxy is still failing due to some redis thing. Checking that.

Wed, Nov 6, 6:40 PM · Toolforge, cloud-services-team (Kanban)
Bstorm created T237557: new proxy and etcd nodes unreachable by ssh for tools-prometheus.
Wed, Nov 6, 5:50 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T215531: Deploy upgraded Kubernetes to toolsbeta.

It created working configs so far. Will try migrating a tool today in toolsbeta.

Wed, Nov 6, 4:55 PM · Patch-For-Review, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T215531: Deploy upgraded Kubernetes to toolsbeta.

Redeployed maintain-kubeusers in toolsbeta:

Wed, Nov 6, 4:51 PM · Patch-For-Review, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm triaged T237509: maintain views has to run or has to be updated to fix errors on globalblocks and protected_titles for wikireplicas as High priority.
Wed, Nov 6, 4:40 PM · cloud-services-team (Kanban), Data-Services
Bstorm moved T237509: maintain views has to run or has to be updated to fix errors on globalblocks and protected_titles for wikireplicas from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Wed, Nov 6, 4:39 PM · cloud-services-team (Kanban), Data-Services
Bstorm moved T237509: maintain views has to run or has to be updated to fix errors on globalblocks and protected_titles for wikireplicas from Backlog to Wiki replicas on the Data-Services board.
Wed, Nov 6, 4:39 PM · cloud-services-team (Kanban), Data-Services
Bstorm edited projects for T237509: maintain views has to run or has to be updated to fix errors on globalblocks and protected_titles for wikireplicas, added: cloud-services-team (Kanban); removed cloud-services-team.
Wed, Nov 6, 4:39 PM · cloud-services-team (Kanban), Data-Services
Bstorm added a subtask for T215531: Deploy upgraded Kubernetes to toolsbeta: T237541: CoreDNS in the new k8s cluster cannot talk to the Cloud recursors.
Wed, Nov 6, 4:18 PM · Patch-For-Review, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a parent task for T237541: CoreDNS in the new k8s cluster cannot talk to the Cloud recursors: T215531: Deploy upgraded Kubernetes to toolsbeta.
Wed, Nov 6, 4:18 PM · cloud-services-team
Bstorm closed T237541: CoreDNS in the new k8s cluster cannot talk to the Cloud recursors as Resolved.

Turns out it was a need to reboot the nodes! It was an iptables routing thing.

Wed, Nov 6, 4:15 PM · cloud-services-team
Bstorm triaged T237541: CoreDNS in the new k8s cluster cannot talk to the Cloud recursors as High priority.
Wed, Nov 6, 3:25 PM · cloud-services-team

Tue, Nov 5

Bstorm closed T237468: tools-k8s-master-01 (Kubernetes API server for toolforge) has failing puppet staleness cron as Resolved.

And that did it! Thanks @Krenair

Tue, Nov 5, 10:38 PM · Toolforge, cloud-services-team (Kanban)
Bstorm committed rLTMKe9d42fc5eff9: deploy: prepare for deployment in toolsbeta (authored by Bstorm).
deploy: prepare for deployment in toolsbeta
Tue, Nov 5, 10:05 PM
Bstorm added a comment to T237468: tools-k8s-master-01 (Kubernetes API server for toolforge) has failing puppet staleness cron.

Happens after prometheus_client import

Tue, Nov 5, 9:54 PM · Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T237468: tools-k8s-master-01 (Kubernetes API server for toolforge) has failing puppet staleness cron.

For the record this is sudo /usr/local/bin/prometheus-puppet-agent-stats -d --outfile /var/lib/prometheus/node.d/puppet_agent.prom
That produces the error (-d should enable debug logging).

Tue, Nov 5, 9:53 PM · Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T237468: tools-k8s-master-01 (Kubernetes API server for toolforge) has failing puppet staleness cron.

The weird thing is that it should crash a lot more if that's happening here...but other python3 things appear to be working?

Tue, Nov 5, 9:51 PM · Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T237468: tools-k8s-master-01 (Kubernetes API server for toolforge) has failing puppet staleness cron.

That was https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=931044

Tue, Nov 5, 9:50 PM · Toolforge, cloud-services-team (Kanban)
Bstorm added a comment to T237468: tools-k8s-master-01 (Kubernetes API server for toolforge) has failing puppet staleness cron.

That happened a few months ago when someone at Debian backported a change into 3.4.2-1+deb8u3, which we ended up with across the fleet due to unattended upgrades. It was corrected in 3.4.2-1+deb8u4. We are using the same version of python3 as the rest of the k8s cluster, so I am surprised to see this:

Tue, Nov 5, 9:49 PM · Toolforge, cloud-services-team (Kanban)
Bstorm triaged T237468: tools-k8s-master-01 (Kubernetes API server for toolforge) has failing puppet staleness cron as Low priority.
Tue, Nov 5, 9:33 PM · Toolforge, cloud-services-team (Kanban)
Bstorm closed T237222: Request creation of srwiki-dev VPS project as Resolved.

This should be all set. When you refresh, you'll see the project in https://horizon.wikimedia.org

Tue, Nov 5, 9:23 PM · Cloud-VPS (Project-requests)
Bstorm moved T235743: Prepare and check storage layer for mnwwiki from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
Tue, Nov 5, 3:09 PM · cloud-services-team (Kanban), Data-Services, DBA

Mon, Nov 4

Bstorm added a comment to T236824: Toolforge: new k8s: get new deb packages for 1.15.4 or 1.15.5.

Couple of things:

  • Why is this in stretch-wikimedia? We don't have any stretch servers for the new k8s cluster.
Mon, Nov 4, 7:05 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T237270: The spacemedia tool keeps crashing and filling kubernetes nodes.

@Phamhi: That's docker, not kubernetes. Users cannot do that directly.

Mon, Nov 4, 4:25 PM · Tool-spacemedia, cloud-services-team (Kanban)
Bstorm added a comment to T237270: The spacemedia tool keeps crashing and filling kubernetes nodes.
org.jsoup.HttpStatusException: HTTP error fetching URL
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:760) ~[jsoup-1.12.1.jar!/:na]
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:705) ~[jsoup-1.12.1.jar!/:na]
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:295) ~[jsoup-1.12.1.jar!/:na]
        at org.jsoup.helper.HttpConnection.get(HttpConnection.java:284) ~[jsoup-1.12.1.jar!/:na]
        at org.wikimedia.commons.donvip.spacemedia.service.agencies.EsaService.updateMedia(EsaService.java:337) ~[classes!/:0.0.1-SNAPSHOT]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_232]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_232]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_232]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_232]
        at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) [spring-context-5.1.9.RELEASE.jar!/:5.1.9.RELEASE]
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.1.9.RELEASE.jar!/:5.1.9.RELEASE]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_232]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_232]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_232]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_232]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]
Mon, Nov 4, 3:23 PM · Tool-spacemedia, cloud-services-team (Kanban)
Bstorm moved T237270: The spacemedia tool keeps crashing and filling kubernetes nodes from Inbox to Watching on the cloud-services-team (Kanban) board.
Mon, Nov 4, 3:20 PM · Tool-spacemedia, cloud-services-team (Kanban)
Bstorm triaged T237270: The spacemedia tool keeps crashing and filling kubernetes nodes as High priority.
Mon, Nov 4, 3:20 PM · Tool-spacemedia, cloud-services-team (Kanban)

Thu, Oct 31

Bstorm added a comment to T236203: Add CI checks for golang admission controllers.

Whichever one you folks want to use is alright by me. I don't know much about the deployment pipeline. These get deployed to a Cloud VPS-based Kubernetes cluster via the Toolforge docker registry, if that affects anything here. My big concern is getting tests run when reviewing a commit.

Thu, Oct 31, 10:24 PM · Release-Engineering-Team-TODO (201911), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm closed T227290: Design and document how to integrate the new Toolforge k8s cluster with PodSecurityPolicy, a subtask of T215531: Deploy upgraded Kubernetes to toolsbeta, as Resolved.
Thu, Oct 31, 10:19 PM · Patch-For-Review, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm closed T227290: Design and document how to integrate the new Toolforge k8s cluster with PodSecurityPolicy, a subtask of T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup, as Resolved.
Thu, Oct 31, 10:19 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm closed T227290: Design and document how to integrate the new Toolforge k8s cluster with PodSecurityPolicy as Resolved.

Moved the doc into the tree with the others. I think this is done for now.

Thu, Oct 31, 10:19 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm closed T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup, a subtask of T214513: Upgrade Toolforge Kubernetes, as Resolved.
Thu, Oct 31, 9:29 PM · Wikimedia-Incident, Goal, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm closed T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup as Resolved.

And that completes this task. With two webhooks, PodSecurityPolicy, PodPreset, RBAC and the new maintain-kubeusers, we do not need the compiled-in custom controllers to make Kubernetes what we want it to be, and it will now know several new tricks.

Thu, Oct 31, 9:29 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm updated the task description for T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup.
Thu, Oct 31, 9:28 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm updated subscribers of T236974: Establish a process for increasing a toolforge tool's connections to the wiki replicas.
Thu, Oct 31, 12:45 AM · cloud-services-team (Kanban), Data-Services
Bstorm added a comment to T236974: Establish a process for increasing a toolforge tool's connections to the wiki replicas.

It seems clear that a Phabricator task requesting the increase and describing the need, followed by a review, would be a sensible part of the process. I imagine the review should include WMCS and the DBA team, perhaps on a work board like what we have for https://phabricator.wikimedia.org/project/view/2880/

Thu, Oct 31, 12:44 AM · cloud-services-team (Kanban), Data-Services
Bstorm moved T236974: Establish a process for increasing a toolforge tool's connections to the wiki replicas from Backlog to Wiki replicas on the Data-Services board.
Thu, Oct 31, 12:42 AM · cloud-services-team (Kanban), Data-Services
Bstorm triaged T236974: Establish a process for increasing a toolforge tool's connections to the wiki replicas as Normal priority.
Thu, Oct 31, 12:39 AM · cloud-services-team (Kanban), Data-Services

Wed, Oct 30

Bstorm triaged T236945: Make the labs-ip-alias-dump.py script a bit smarter as Low priority.
Wed, Oct 30, 7:18 PM · Cloud-VPS, cloud-services-team (Kanban)

Mon, Oct 28

Bstorm moved T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup from Doing to Needs discussion on the cloud-services-team (Kanban) board.
Mon, Oct 28, 11:54 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup.

This is a working PodPreset I used locally that functions well alongside the PodSecurityPolicies we are using:

Mon, Oct 28, 11:54 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm added a comment to T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup.

Unless a pod has a custom annotation, if it matches certain labels defined on a namespace, the pod will be altered by the PodPreset. The obvious use-case is the label tools.wmflabs.org/webservice: "true" which is applied by webservice now and perhaps an additional one to make it easy for non-webservice pods to use a preset to mount all the "standard" items as well.

Mon, Oct 28, 11:53 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
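
The PodPreset matching rule described above can be sketched in Python. A preset applies to a pod in its own namespace when the pod's labels satisfy the preset's `matchLabels` selector, unless the pod opts out by annotation. To my understanding, the upstream opt-out annotation is `podpreset.admission.kubernetes.io/exclude: "true"`. Treat that, and the helper name `preset_applies`, as assumptions to verify against the Kubernetes documentation.

```python
from typing import Mapping

# Opt-out annotation as understood from the upstream PodPreset docs (verify before relying on it).
EXCLUDE_ANNOTATION = "podpreset.admission.kubernetes.io/exclude"


def preset_applies(match_labels: Mapping[str, str],
                   pod_labels: Mapping[str, str],
                   pod_annotations: Mapping[str, str]) -> bool:
    """Return True if a PodPreset with this selector would mutate the pod."""
    if pod_annotations.get(EXCLUDE_ANNOTATION) == "true":
        return False
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


selector = {"tools.wmflabs.org/webservice": "true"}
webservice_pod = {"tools.wmflabs.org/webservice": "true", "name": "test"}
print(preset_applies(selector, webservice_pod, {}))  # True
print(preset_applies(selector, webservice_pod, {EXCLUDE_ANNOTATION: "true"}))  # False
print(preset_applies(selector, {"name": "cronjob"}, {}))  # False
```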
Bstorm added a comment to T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup.

I've done a POC of PodPreset locally and observed how it works. I think it might be worth it.

Mon, Oct 28, 11:50 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes