
aborrero (arturo)
SRE at Wikimedia Cloud Services Team

User Details

User Since
Oct 23 2017, 12:19 PM (282 w, 4 d)
Availability
Available
IRC Nick
arturo
LDAP User
Arturo Borrero Gonzalez
MediaWiki User
ABorrero (WMF) [ Global Accounts ]

I'm Arturo Borrero Gonzalez from Seville, Spain. I'm a Site Reliability Engineer (SRE) on the Wikimedia Cloud Services Team and a Wikimedia Foundation staff member.

You may find me in some FLOSS projects, like Netfilter and Debian.

Recent Activity

Yesterday

Aklapper awarded T332907: Cloud Services: introduce some kind of email list as feedback collector a Dislike token.
Fri, Mar 24, 1:30 PM · Cloud-Services

Thu, Mar 23

Aklapper awarded T332904: Toolforge: consider introducing some kind of CLI feedback reporting tool a Dislike token.
Thu, Mar 23, 3:45 PM · Toolforge
aborrero created T332907: Cloud Services: introduce some kind of email list as feedback collector.
Thu, Mar 23, 3:39 PM · Cloud-Services
aborrero created T332906: Cloud Services: introduce feedback in webpages for some of our services.
Thu, Mar 23, 3:37 PM · Cloud Services Proposals
aborrero created T332904: Toolforge: consider introducing some kind of CLI feedback reporting tool.
Thu, Mar 23, 3:34 PM · Toolforge

Wed, Mar 22

aborrero added a comment to T332762: New tool not allowed to connect to toolsdb.

this could be related to work on the parent task

Wed, Mar 22, 9:26 AM · cloud-services-team, Toolforge
aborrero added a subtask for T303663: Split maintain-dbusers.py into two parts, one to run on cloudcontrol nodes and one to run on an NFS server VM: T332762: New tool not allowed to connect to toolsdb.
Wed, Mar 22, 9:26 AM · Patch-For-Review, cloud-services-team (FY2022/2023-Q3), Cloud-VPS
aborrero added a parent task for T332762: New tool not allowed to connect to toolsdb: T303663: Split maintain-dbusers.py into two parts, one to run on cloudcontrol nodes and one to run on an NFS server VM.
Wed, Mar 22, 9:26 AM · cloud-services-team, Toolforge
aborrero moved T332762: New tool not allowed to connect to toolsdb from Triage to Backlog on the Toolforge board.
Wed, Mar 22, 9:25 AM · cloud-services-team, Toolforge
aborrero triaged T332762: New tool not allowed to connect to toolsdb as High priority.
Wed, Mar 22, 9:25 AM · cloud-services-team, Toolforge

Tue, Mar 21

aborrero added a comment to T327919: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it.

@cmooney Please see first batch proposal. We can move all those servers next week. @aborrero can you please let us know what the best day and time next week would be for us to move those servers? Thank you

Tue, Mar 21, 9:36 AM · cloud-services-team (FY2022/2023-Q3), SRE, Infrastructure-Foundations, netops

Fri, Mar 17

aborrero added a parent task for T332406: cloudcephosd1025: power supply temperature critical: Unknown Object (Task).
Fri, Mar 17, 3:29 PM · SRE, cloud-services-team (Hardware), ops-eqiad
aborrero moved T332406: cloudcephosd1025: power supply temperature critical from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Fri, Mar 17, 3:28 PM · SRE, cloud-services-team (Hardware), ops-eqiad
aborrero moved T332406: cloudcephosd1025: power supply temperature critical from Backlog to Hardware faults on the cloud-services-team (Hardware) board.
Fri, Mar 17, 3:28 PM · SRE, cloud-services-team (Hardware), ops-eqiad
aborrero created T332406: cloudcephosd1025: power supply temperature critical.
Fri, Mar 17, 3:28 PM · SRE, cloud-services-team (Hardware), ops-eqiad

Thu, Mar 16

aborrero updated subscribers of T327919: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it.

In terms of the move we need to work with @aborrero and the team to decide when is good to do the work. We can do it in a number of batches or all in one go, whatever you guys think is best. I can move the interfaces in Netbox and configure the new switch in advance in either case.

Thu, Mar 16, 11:30 AM · cloud-services-team (FY2022/2023-Q3), SRE, Infrastructure-Foundations, netops

Wed, Mar 15

aborrero added a comment to T332191: Decision request - Choose a subdomain for new cloud-private subnets.

I have a question: will these be shared in https://config-master.wikimedia.org/known_hosts.ecdsa ?
I use https://pypi.org/project/wm-ssh/, and it uses that URL to fetch lists of hosts. Just curious.

Wed, Mar 15, 4:25 PM · Cloud Services Proposals
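The comment above mentions that wm-ssh fetches lists of hosts from a known_hosts URL. As a rough illustration only, and not wm-ssh's actual implementation, a client could extract host names from text in the standard OpenSSH known_hosts format roughly as follows. The function name and the sample data are hypothetical.

```python
def parse_known_hosts(text):
    """Return the sorted, de-duplicated host names found in known_hosts text."""
    hosts = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Entries may start with a marker such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if len(fields) < 3:
            continue  # not a "hosts keytype key" entry
        for name in fields[0].split(","):
            # Hashed entries (|1|...) cannot be mapped back to a name.
            if not name.startswith("|"):
                hosts.add(name)
    return sorted(hosts)


# Hypothetical sample data, not taken from the real file.
SAMPLE = """\
# comment line
host-a.example.org,192.0.2.10 ecdsa-sha2-nistp256 PLACEHOLDERKEY
host-b.example.org ecdsa-sha2-nistp256 PLACEHOLDERKEY
"""

print(parse_known_hosts(SAMPLE))
```

The first field of each entry is a comma-separated list of names and addresses, so a single line can contribute several hosts.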
aborrero updated the task description for T332191: Decision request - Choose a subdomain for new cloud-private subnets.
Wed, Mar 15, 3:49 PM · Cloud Services Proposals
aborrero updated the task description for T332191: Decision request - Choose a subdomain for new cloud-private subnets.
Wed, Mar 15, 3:46 PM · Cloud Services Proposals
aborrero updated the task description for T332191: Decision request - Choose a subdomain for new cloud-private subnets.
Wed, Mar 15, 3:45 PM · Cloud Services Proposals
aborrero triaged T332191: Decision request - Choose a subdomain for new cloud-private subnets as Medium priority.
Wed, Mar 15, 3:37 PM · Cloud Services Proposals
aborrero created T332191: Decision request - Choose a subdomain for new cloud-private subnets.
Wed, Mar 15, 3:37 PM · Cloud Services Proposals
aborrero added a comment to T324992: cloudlb: create PoC on codfw.

cloudlb2001-dev lacks the right switch vlan trunk in the main interface: https://netbox.wikimedia.org/dcim/interfaces/16653/ We need to enable 2151 there like in the other cloudlb hosts (example: https://netbox.wikimedia.org/dcim/interfaces/28615/)

Wed, Mar 15, 12:43 PM · cloud-services-team (FY2022/2023-Q3), Patch-For-Review
aborrero added a comment to T324992: cloudlb: create PoC on codfw.

cloudlb2001-dev lacks the right switch vlan trunk in the main interface: https://netbox.wikimedia.org/dcim/interfaces/16653/ We need to enable 2151 there like in the other cloudlb hosts (example: https://netbox.wikimedia.org/dcim/interfaces/28615/)

Wed, Mar 15, 12:39 PM · cloud-services-team (FY2022/2023-Q3), Patch-For-Review
aborrero created T332153: cloudlb: prepare backends.
Wed, Mar 15, 12:25 PM · Patch-For-Review, cloud-services-team (FY2022/2023-Q3)

Tue, Mar 14

aborrero changed the status of T324992: cloudlb: create PoC on codfw from Stalled to In Progress.

just rebased https://gerrit.wikimedia.org/r/c/operations/puppet/+/868731 to introduce the BIRD configuration. Please take a look and comment.
We will need to think about the public IPv4 address to use as VIP before merging it.

Tue, Mar 14, 12:39 PM · cloud-services-team (FY2022/2023-Q3), Patch-For-Review
aborrero changed the status of T324992: cloudlb: create PoC on codfw, a subtask of T297596: have cloud hardware servers in the cloud realm using a dedicated LB layer, from Stalled to In Progress.
Tue, Mar 14, 12:39 PM · cloud-services-team (FY2022/2023-Q3)
aborrero moved T331984: cloudcontrol1007: power supply temperature critical from Backlog to Hardware faults on the cloud-services-team (Hardware) board.
Tue, Mar 14, 12:05 PM · SRE, cloud-services-team (Hardware), ops-eqiad
aborrero triaged T331984: cloudcontrol1007: power supply temperature critical as Medium priority.
Tue, Mar 14, 12:05 PM · SRE, cloud-services-team (Hardware), ops-eqiad
aborrero created T331984: cloudcontrol1007: power supply temperature critical.
Tue, Mar 14, 12:05 PM · SRE, cloud-services-team (Hardware), ops-eqiad

Fri, Mar 10

aborrero committed rCTKFa8e53830c57e: cli: introduce support for lima-kilo latest isolation setup (authored by aborrero).
cli: introduce support for lima-kilo latest isolation setup
Fri, Mar 10, 12:06 PM

Thu, Mar 9

aborrero added a comment to T293649: [tbs.harbor] Pre-create namespaces.

I would like to know more details about why maintain_harbor is planned to run as a Toolforge jobs framework cronjob rather than a standalone application (or cronjob) in the kubernetes cluster.

I'm mentioning this because I feel that tying the two things together can make it cumbersome to operate (both things) in the future, for little added value.

The change to run as a standalone cronjob deployment in k8s would be very small.

Thu, Mar 9, 2:09 PM · cloud-services-team, Toolforge Build Service (Iteration 08), User-Raymond_Ndibe, Cloud-Services-Worktype-Project, Cloud-Services-Origin-Team, User-dcaro
aborrero created T331619: toolforge: rbac: change existing roles to reference PSP in the policy group.
Thu, Mar 9, 12:17 PM · cloud-services-team, Toolforge
aborrero added a comment to T327919: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it.

Please let me know if there is something I can do to help with this (no switch config but perhaps testing, double checking stuff, IP allocation, connectivity, etc)

Thanks @aborrero

Config of the new switch is progressing well, just waiting on two cable moves (see T331470#8676018) and I will migrate the uplink/gateway for the cloud vlans from CR routers to the new switch.

Once that's done we can try to reimage / install OS on the new cloudlb's. If that goes to plan we can migrate the existing hosts over from old switch to new. If you can have a think about what's involved to do both of those that'd be great. No IPs etc. need to change so I think it should just be a matter of arranging the downtime and co-ordinating with DC-Ops. Thanks!

Thu, Mar 9, 11:46 AM · cloud-services-team (FY2022/2023-Q3), SRE, Infrastructure-Foundations, netops
aborrero added a comment to T331572: maintain-kubeusers container in CrashLoopBackoff preventing new tool creation after 'user-maintainer' ClusterRole changes.

I can't explain how it is possible that the code was working before. PSPs were in policy/v1beta1 in 1.21: https://v1-21.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#podsecuritypolicy-v1beta1-policy

Thu, Mar 9, 10:03 AM · cloud-services-team, Toolforge
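As context for the API group question above: in Kubernetes 1.21, PodSecurityPolicy is served from the policy API group, so an RBAC rule granting use of a PSP would reference that group. A minimal sketch of such a rule, written as a Python dict, follows. The role, namespace, and policy names are hypothetical, and this is not the actual maintain-kubeusers code.

```python
import json


def psp_use_rule(policy_name):
    """Build an RBAC rule that allows using one named PodSecurityPolicy."""
    return {
        "apiGroups": ["policy"],              # PSP lives in the policy group
        "resources": ["podsecuritypolicies"],
        "resourceNames": [policy_name],
        "verbs": ["use"],
    }


# Hypothetical Role object; all names are placeholders for illustration.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "example-psp-user", "namespace": "example"},
    "rules": [psp_use_rule("example-psp")],
}

print(json.dumps(role, indent=2))
```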

Wed, Mar 8

aborrero added a comment to T326758: Research Openstack Deployment Paradigms.

Another option is to make sure we're using an aio1 hostname from the beginning. Will try that!

Wed, Mar 8, 5:16 PM · cloud-services-team (FY2022/2023-Q3)
aborrero added a comment to T293649: [tbs.harbor] Pre-create namespaces.

I would like to know more details about why maintain_harbor is planned to run as a Toolforge jobs framework cronjob rather than a standalone application (or cronjob) in the kubernetes cluster.

Wed, Mar 8, 3:39 PM · cloud-services-team, Toolforge Build Service (Iteration 08), User-Raymond_Ndibe, Cloud-Services-Worktype-Project, Cloud-Services-Origin-Team, User-dcaro
aborrero added a comment to T326758: Research Openstack Deployment Paradigms.

Therefore I don't see an easy way to play with openstack-ansible in AIO mode within Cloud VPS VMs.

Wed, Mar 8, 1:22 PM · cloud-services-team (FY2022/2023-Q3)
aborrero added a comment to T326758: Research Openstack Deployment Paradigms.

Note, by default openstack-ansible all-in-one setup renames the VM hostname, introducing a severe drift wrt. other Cloud VPS context (like puppet, etc), therefore making it difficult to operate inside Cloud VPS for evaluation & testing purposes. Will investigate next if this renaming can be disabled.

Wed, Mar 8, 1:17 PM · cloud-services-team (FY2022/2023-Q3)

Tue, Mar 7

aborrero added a comment to T324992: cloudlb: create PoC on codfw.

@aborrero I'd propose to allocate the following for the cloud-private subnets / vlans, look ok to you?

"supernet": 172.20.0.0/16

Tue, Mar 7, 11:30 AM · cloud-services-team (FY2022/2023-Q3), Patch-For-Review
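The quoted comment proposes 172.20.0.0/16 as the supernet, but the per-subnet split is not shown here. For illustration only, the sketch below uses Python's ipaddress module to enumerate candidate subnets. The /24 prefix length is an assumption, not the team's actual allocation.

```python
import ipaddress

supernet = ipaddress.ip_network("172.20.0.0/16")

# Assumption: /24 subnets; the real prefix length is not stated in the comment.
subnets = list(supernet.subnets(new_prefix=24))

print(len(subnets))         # how many /24 networks fit inside the /16
print(subnets[0])           # first candidate subnet
print(supernet.is_private)  # whether the range is private address space
```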
aborrero added a comment to T329073: eqiad row A switches upgrade.

Sent a ping to @Marostegui regarding clouddb[1013-1014,1021]

Tue, Mar 7, 11:25 AM · Patch-For-Review, Discovery-Search (Current work), Shared-Data-Infrastructure, Data-Engineering-Planning, DBA, SRE, Platform Engineering, Infrastructure-Foundations, Traffic, serviceops, Machine-Learning-Team, cloud-services-team, Data-Persistence, SRE Observability, serviceops-collab
aborrero updated the task description for T329073: eqiad row A switches upgrade.
Tue, Mar 7, 11:25 AM · Patch-For-Review, Discovery-Search (Current work), Shared-Data-Infrastructure, Data-Engineering-Planning, DBA, SRE, Platform Engineering, Infrastructure-Foundations, Traffic, serviceops, Machine-Learning-Team, cloud-services-team, Data-Persistence, SRE Observability, serviceops-collab
aborrero updated the task description for T329073: eqiad row A switches upgrade.
Tue, Mar 7, 11:24 AM · Patch-For-Review, Discovery-Search (Current work), Shared-Data-Infrastructure, Data-Engineering-Planning, DBA, SRE, Platform Engineering, Infrastructure-Foundations, Traffic, serviceops, Machine-Learning-Team, cloud-services-team, Data-Persistence, SRE Observability, serviceops-collab
aborrero updated the language for P45185 Masterwork From Distant Lands from autodetect to diff.
Tue, Mar 7, 9:26 AM
aborrero added a comment to T286856: Upgrade Toolforge Kubernetes to latest 1.22.

That could work, but mind that it is Easter/Holy Week, and some countries (including mine) have at least 2 bank holidays, so it could be a short week. Anyway, I'm planning to be on the laptop Monday to Wednesday.

I'm aware but I think it's fine if we do it early in the week. Would Monday work for you? Do you have any time preferences?

Tue, Mar 7, 8:55 AM · cloud-services-team, Toolforge

Mon, Mar 6

aborrero added a comment to T286856: Upgrade Toolforge Kubernetes to latest 1.22.

Good to know, thanks. Next week does not work for me, nor does the week starting the 27th, so looks like this needs to be pushed into April. What about the week starting April 3rd?

Mon, Mar 6, 5:15 PM · cloud-services-team, Toolforge
aborrero added a comment to T286856: Upgrade Toolforge Kubernetes to latest 1.22.

All of the blockers have been resolved so we can now start thinking about timelines for the actual upgrade. A change to the timeline this time is that PAWS no longer uses the same Puppetization and will be upgraded separately. With that and my personal schedules in mind I propose the following:

  • toolsbeta: Upgrade this week, either tomorrow or on Wednesday.
  • tools: Upgrade on Wednesday, March 22nd.

Any objections?

Mon, Mar 6, 4:22 PM · cloud-services-team, Toolforge
aborrero added a comment to T328539: toolforge: consider relocating core k8s components out of puppet into its own repository.

Will this include creating a repo for toolforge where we bundle up all these components or similar? (something like a toolforge repo with a helmfile pulling all the others)

I'm a bit concerned about the sprawl of components without keeping track of the combinations that are deployed.

If so, that might belong to that repository.

Mon, Mar 6, 1:03 PM · cloud-services-team
aborrero added a comment to T328539: toolforge: consider relocating core k8s components out of puppet into its own repository.

The base RBAC is a good candidate for moving into the maintain-kubeusers repository, I think.

Mon, Mar 6, 12:43 PM · cloud-services-team
aborrero updated the task description for T328539: toolforge: consider relocating core k8s components out of puppet into its own repository.
Mon, Mar 6, 12:28 PM · cloud-services-team
aborrero updated the task description for T328539: toolforge: consider relocating core k8s components out of puppet into its own repository.
Mon, Mar 6, 12:28 PM · cloud-services-team
aborrero committed rCTKF10fc50ca460d: tests: add support for new api-gateway setup (authored by aborrero).
tests: add support for new api-gateway setup
Mon, Mar 6, 11:16 AM

Fri, Mar 3

aborrero added a comment to T327919: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it.

thanks for the update!

Fri, Mar 3, 3:36 PM · cloud-services-team (FY2022/2023-Q3), SRE, Infrastructure-Foundations, netops

Thu, Mar 2

aborrero added a comment to T327919: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it.

This ticket had little activity in the last month. Did something happen offline that wasn't recorded in here?

Thu, Mar 2, 3:20 PM · cloud-services-team (FY2022/2023-Q3), SRE, Infrastructure-Foundations, netops
aborrero changed the status of T324992: cloudlb: create PoC on codfw, a subtask of T297596: have cloud hardware servers in the cloud realm using a dedicated LB layer, from Open to Stalled.
Thu, Mar 2, 3:18 PM · cloud-services-team (FY2022/2023-Q3)
aborrero changed the status of T324992: cloudlb: create PoC on codfw from Open to Stalled.
Thu, Mar 2, 3:18 PM · cloud-services-team (FY2022/2023-Q3), Patch-For-Review

Wed, Mar 1

aborrero added a subtask for T127367: Provide modern, non-NFS error log solution for Toolforge webservices and bots: T330715: Allow TJF job logs to go to Kuberenetes output buffer rather than disk.
Wed, Mar 1, 4:58 PM · cloud-services-team, Epic, Toolforge
aborrero added a parent task for T330715: Allow TJF job logs to go to Kuberenetes output buffer rather than disk: T127367: Provide modern, non-NFS error log solution for Toolforge webservices and bots.
Wed, Mar 1, 4:58 PM · Toolforge Jobs framework
aborrero committed rCTKF6af7a857f5c8: cli: drop unused _flush_and_wait() function (authored by aborrero).
cli: drop unused _flush_and_wait() function
Wed, Mar 1, 12:27 PM

Fri, Feb 24

aborrero added a comment to T326758: Research Openstack Deployment Paradigms.

Note, by default openstack-ansible all-in-one setup renames the VM hostname, introducing a severe drift wrt. other Cloud VPS context (like puppet, etc), therefore making it difficult to operate inside Cloud VPS for evaluation & testing purposes. Will investigate next if this renaming can be disabled.

Fri, Feb 24, 4:49 PM · cloud-services-team (FY2022/2023-Q3)

Thu, Feb 23

aborrero closed T324774: Openstack hacks as Resolved.

Created https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Local_customization_and_hacks with this information

Thu, Feb 23, 10:42 AM · cloud-services-team, Cloud-VPS

Feb 22 2023

aborrero closed T330192: Request increased quota for text-to-speech Cloud VPS project as Resolved.

This should be done now.

Feb 22 2023, 3:54 PM · cloud-services-team, Cloud-VPS (Quota-requests)
aborrero added a comment to T330102: Decision request - What buildpacks to allow and include for toolforge build service beta.

On a quick read perhaps Option 4 provides the most value/flexibility?

Feb 22 2023, 3:48 PM · Toolforge Build Service (Iteration 11), Cloud Services Proposals
aborrero added a comment to T326789: Toolforge: improve local kubernetes development setup.

Created some additional docs: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/lima-kilo

Feb 22 2023, 12:55 PM · Patch-For-Review, cloud-services-team, Toolforge
aborrero added a comment to T330075: [cloudvirt] Move to jumbo frames.

Next question would be:

Feb 22 2023, 11:18 AM · Infrastructure-Foundations, netops, Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-Team, cloud-services-team (FY2022/2023-Q3), User-dcaro
aborrero moved T324992: cloudlb: create PoC on codfw from In progress to Blocked on the cloud-services-team (FY2022/2023-Q3) board.

Blocked by T327919: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it

Feb 22 2023, 11:07 AM · cloud-services-team (FY2022/2023-Q3), Patch-For-Review

Feb 21 2023

aborrero added a project to T330075: [cloudvirt] Move to jumbo frames: netops.
Feb 21 2023, 5:14 PM · Infrastructure-Foundations, netops, Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-Team, cloud-services-team (FY2022/2023-Q3), User-dcaro
aborrero added a comment to T326758: Research Openstack Deployment Paradigms.

Bootstrapped: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/2023_Openstack_deployment_workflow

Feb 21 2023, 1:44 PM · cloud-services-team (FY2022/2023-Q3)
aborrero added a comment to T326758: Research Openstack Deployment Paradigms.

Patching kolla containers: https://docs.openstack.org/kolla/latest/admin/image-building.html

Feb 21 2023, 1:27 PM · cloud-services-team (FY2022/2023-Q3)
aborrero created T330146: Toolforge: grid: sqid tool using a lot of resources.
Feb 21 2023, 12:54 PM · Tools, Toolforge

Feb 20 2023

aborrero added a comment to T330075: [cloudvirt] Move to jumbo frames.

Questions for NetOps: they live in the cloud-hosts vlan. Is it OK if some hosts attached to that VLAN use a high MTU and others don't?

Feb 20 2023, 1:30 PM · Infrastructure-Foundations, netops, Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-Team, cloud-services-team (FY2022/2023-Q3), User-dcaro

Feb 17 2023

aborrero closed T254636: mysqldump is not present in Kubernetes container images as Resolved.
tools.arturo-test-tool@tools-sgebastion-11:~$ toolforge-jobs images
Short name    Container image URL
------------  ----------------------------------------------------------------------
bullseye      docker-registry.tools.wmflabs.org/toolforge-bullseye-sssd:latest
golang1.11    docker-registry.tools.wmflabs.org/toolforge-golang111-sssd-base:latest
jdk17         docker-registry.tools.wmflabs.org/toolforge-jdk17-sssd-base:latest
mariadb       docker-registry.tools.wmflabs.org/toolforge-mariadb-sssd-base:latest
mono6.8       docker-registry.tools.wmflabs.org/toolforge-mono68-sssd-base:latest
node16        docker-registry.tools.wmflabs.org/toolforge-node16-sssd-base:latest
perl5.32      docker-registry.tools.wmflabs.org/toolforge-perl532-sssd-base:latest
php7.4        docker-registry.tools.wmflabs.org/toolforge-php74-sssd-base:latest
python3.9     docker-registry.tools.wmflabs.org/toolforge-python39-sssd-base:latest
ruby2.1       docker-registry.tools.wmflabs.org/toolforge-ruby21-sssd-base:latest
ruby2.7       docker-registry.tools.wmflabs.org/toolforge-ruby27-sssd-base:latest
tcl8.6        docker-registry.tools.wmflabs.org/toolforge-tcl86-sssd-base:latest
tools.arturo-test-tool@tools-sgebastion-11:~$ toolforge-jobs run mariadb --command 'sleep 3600' --image mariadb
tools.arturo-test-tool@tools-sgebastion-11:~$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
mariadb-49nnz           1/1     Running   0          8s
test-6d76568d94-mbzrh   1/1     Running   1          23d
test2-bcb6c74d9-6ncd7   1/1     Running   0          3d16h
tools.arturo-test-tool@tools-sgebastion-11:~$ kubectl exec -it mariadb-49nnz -- bash
tools.arturo-test-tool@mariadb-49nnz:~$ sql -h
usage: sql [-h] [-v] [-N] [--cluster {analytics,web}] DATABASE ...
[..]
tools.arturo-test-tool@mariadb-49nnz:~$ mysql --help
mysql  Ver 15.1 Distrib 10.5.18-MariaDB, for debian-linux-gnu (x86_64) using  EditLine wrapper
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Feb 17 2023, 11:39 AM · User-bd808, Toolforge (Software install/update)
aborrero closed T254636: mysqldump is not present in Kubernetes container images, a subtask of T319642: Migrate commtech-commons from Toolforge GridEngine to Toolforge Kubernetes, as Resolved.
Feb 17 2023, 11:38 AM · Community-Tech, Grid-Engine-to-K8s-Migration
aborrero closed T254636: mysqldump is not present in Kubernetes container images, a subtask of T319779: Migrate grantmetrics from Toolforge GridEngine to Toolforge Kubernetes, as Resolved.
Feb 17 2023, 11:38 AM · Community-Tech, Grid-Engine-to-K8s-Migration
aborrero closed T254636: mysqldump is not present in Kubernetes container images, a subtask of T319914: Migrate musikbot from Toolforge GridEngine to Toolforge Kubernetes, as Resolved.
Feb 17 2023, 11:38 AM · Grid-Engine-to-K8s-Migration
aborrero added a comment to T320178: Migrate wmcz from Toolforge GridEngine to Toolforge Kubernetes.

We now have a new image available which contains both curl and some mysql client tools, see:

Feb 17 2023, 11:36 AM · Patch-For-Review, Grid-Engine-to-K8s-Migration, User-Urbanecm

Feb 16 2023

aborrero added a comment to T320178: Migrate wmcz from Toolforge GridEngine to Toolforge Kubernetes.

Thanks for the answer! It's possible I'm missing something here, but I don't understand how the bullseye image would help in this case. As far as I can see, the bullseye image has neither mysql nor wget available (both of those utilities are needed by the script):

Feb 16 2023, 11:07 AM · Patch-For-Review, Grid-Engine-to-K8s-Migration, User-Urbanecm
aborrero added a comment to T312935: tool-fourohfour: "uWSGI listen queue of socket ":8000" (fd: 4) full!".

What about having more pods in the deployment?

Feb 16 2023, 10:08 AM · cloud-services-team, Toolforge

Feb 15 2023

aborrero added a comment to T320178: Migrate wmcz from Toolforge GridEngine to Toolforge Kubernetes.

Since the script also runs a Python script, I first tried tf-python39, where there is no wget. I was unable to find wget installed in other containers, too. By shelling into the jobs container, I also figured that mysql is missing as well.

How can I migrate similar simple shell scripts (in tools.wmcz and elsewhere), please? Thanks!

Feb 15 2023, 2:10 PM · Patch-For-Review, Grid-Engine-to-K8s-Migration, User-Urbanecm
aborrero added a comment to T319700: Migrate dow from Toolforge GridEngine to Toolforge Kubernetes.

Hello.

Thank you for letting me know.

Will the jlocal command be saved (for small tasks)?

Feb 15 2023, 1:52 PM · Grid-Engine-to-K8s-Migration

Feb 14 2023

aborrero committed rCCKB56da0a305cac: toolforge.worker.depool_and_remove_node: improve SAL messages (authored by aborrero).
toolforge.worker.depool_and_remove_node: improve SAL messages
Feb 14 2023, 1:42 PM
aborrero triaged T329530: Convert all Toolforge custom components to standardized Helm based deployment as Medium priority.
Feb 14 2023, 1:33 PM · cloud-services-team, Toolforge
aborrero moved T329530: Convert all Toolforge custom components to standardized Helm based deployment from Inbox to Watching on the cloud-services-team board.
Feb 14 2023, 1:33 PM · cloud-services-team, Toolforge
aborrero triaged T328539: toolforge: consider relocating core k8s components out of puppet into its own repository as Medium priority.
Feb 14 2023, 1:32 PM · cloud-services-team
aborrero updated the task description for T329530: Convert all Toolforge custom components to standardized Helm based deployment.
Feb 14 2023, 1:28 PM · cloud-services-team, Toolforge
aborrero closed T329611: Toolforge grid: start webservices after outage as Resolved.
<taavi> Feb 14 13:21:28 tools-sgecron-2 collector-runner[4667]: 2023-02-14 13:21:28,798 Service monitor run completed, 283 webservices restarted
Feb 14 2023, 1:27 PM · Sustainability (Incident Followup), Toolforge, cloud-services-team (FY2022/2023-Q3), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, User-dcaro
aborrero closed T329611: Toolforge grid: start webservices after outage, a subtask of T329535: Cloud Ceph outage 2023-02-13, as Resolved.
Feb 14 2023, 1:26 PM · User-notice, Patch-For-Review, cloud-services-team (FY2022/2023-Q3), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, User-dcaro, Wikimedia-Incident, Cloud-VPS
aborrero committed rCCKB3d6919ba5831: toolforge.worker.depool_and_remove_node: handle toolsbeta special prefix (authored by aborrero).
toolforge.worker.depool_and_remove_node: handle toolsbeta special prefix
Feb 14 2023, 1:24 PM
aborrero added a comment to T329611: Toolforge grid: start webservices after outage.

Some additional information.

Feb 14 2023, 1:12 PM · Sustainability (Incident Followup), Toolforge, cloud-services-team (FY2022/2023-Q3), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, User-dcaro
aborrero triaged T329619: Toolforge: decide what to do with tools that have 'stretch' in their service manifests as Lowest priority.
Feb 14 2023, 12:27 PM · cloud-services-team, Toolforge
aborrero updated the task description for T329619: Toolforge: decide what to do with tools that have 'stretch' in their service manifests.
Feb 14 2023, 12:25 PM · cloud-services-team, Toolforge
aborrero created T329619: Toolforge: decide what to do with tools that have 'stretch' in their service manifests.
Feb 14 2023, 12:24 PM · cloud-services-team, Toolforge
aborrero triaged T329467: remove webservicemonitor (down due to DNS errors) as Medium priority.

Option #2 has been implemented (drop statsd support). The service is now up and running.

Feb 14 2023, 12:20 PM · Patch-For-Review, cloud-services-team, Toolforge
aborrero triaged T329611: Toolforge grid: start webservices after outage as High priority.
Feb 14 2023, 12:18 PM · Sustainability (Incident Followup), Toolforge, cloud-services-team (FY2022/2023-Q3), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, User-dcaro
aborrero closed T244809: Remove or fix stats collecting from tools-manifest (webservice-monitor), a subtask of T194333: [Epic] Provide logging/metrics/monitoring SaaS for Cloud VPS tenants, as Resolved.
Feb 14 2023, 12:14 PM · cloud-services-team, Epic, Cloud-VPS
aborrero closed T244809: Remove or fix stats collecting from tools-manifest (webservice-monitor) as Resolved.

Removed them.

Feb 14 2023, 12:14 PM · cloud-services-team, Toolforge
aborrero added a comment to T329611: Toolforge grid: start webservices after outage.

The webservicemonitor doing its thing:

Feb 14 2023, 12:13 PM · Sustainability (Incident Followup), Toolforge, cloud-services-team (FY2022/2023-Q3), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, User-dcaro
aborrero added a comment to T329611: Toolforge grid: start webservices after outage.

I think the actual fix here may be to fix T329467: remove webservicemonitor (down due to DNS errors) and let it recover webservices instead of me doing it manually.

Feb 14 2023, 10:48 AM · Sustainability (Incident Followup), Toolforge, cloud-services-team (FY2022/2023-Q3), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, User-dcaro
aborrero created T329611: Toolforge grid: start webservices after outage.
Feb 14 2023, 9:51 AM · Sustainability (Incident Followup), Toolforge, cloud-services-team (FY2022/2023-Q3), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, User-dcaro
aborrero created P44617 toolforge grid webservices redis keys.
Feb 14 2023, 9:49 AM