
aborrero (arturo)
Operations Engineer at Wikimedia Cloud Services Team


Today

  • Clear sailing ahead.

Tomorrow

  • Clear sailing ahead.

Saturday

  • Clear sailing ahead.

User Details

User Since
Oct 23 2017, 12:19 PM (111 w, 3 d)
Availability
Available
IRC Nick
arturo
LDAP User
Arturo Borrero Gonzalez
MediaWiki User
ABorrero (WMF)

I'm Arturo Borrero Gonzalez, from Seville, Spain. I'm a Site Reliability Engineer (SRE) on the Wikimedia Cloud Services Team, part of the Wikimedia Foundation staff.

You may find me in some FLOSS projects, like Netfilter and Debian.

Recent Activity

Yesterday

aborrero raised the priority of T238766: openstack: dns_floating_ip_updater mechanism improvements to better handle transient errors from Low to Medium.

This is annoying. Raising priority.

Wed, Dec 11, 5:47 PM · Cloud-Services, cloud-services-team (Kanban)
aborrero added a comment to T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes.

For some reason, the metrics-server doesn't work on the tools project cluster.

Wed, Dec 11, 4:31 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
Bstorm awarded T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes a Party Time token.
Wed, Dec 11, 3:35 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes.

I think this is what we are looking for in this ticket:

Wed, Dec 11, 12:58 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes.

My first impulse was to look at https://github.com/kubernetes/kube-state-metrics instead, but at this point I'm not sure whether they provide the same thing or are competing options.

Wed, Dec 11, 9:46 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero claimed T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes.
Wed, Dec 11, 9:26 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero removed a parent task for T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes: T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.
Wed, Dec 11, 9:26 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero removed a subtask for T239405: toolforge: new k8s: evaluate ingress controller reload behaviour: T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes.
Wed, Dec 11, 9:26 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a subtask for T237643: toolforge: new k8s: figure out metrics / observability: T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes.
Wed, Dec 11, 9:22 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a parent task for T240402: Deploy or consciously decide not to deploy metrics-server in toolforge kubernetes: T237643: toolforge: new k8s: figure out metrics / observability.
Wed, Dec 11, 9:22 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes

Tue, Dec 10

aborrero added a comment to T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.

@Bstorm please give this a final review.

Tue, Dec 10, 5:52 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero updated the task description for T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.
Tue, Dec 10, 5:51 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero moved T238820: CloudVPS: consider mirroring debian repos for openstack packages from Needs discussion to Blocked on the cloud-services-team (Kanban) board.

Waiting for the folks in charge of the repo to rename the directory before enabling the mirror.

Tue, Dec 10, 5:11 PM · cloud-services-team (Kanban), Cloud-Services
aborrero added a comment to T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.

I just gave this another test: 3 pods with 20k ingress objects. It works pretty well. I'm using that as the replica factor for now.

Tue, Dec 10, 1:23 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero updated subscribers of T239347: create a 'normal' network for codf1dev neutron w/public IPs.

I think the first step would be to have a proper routing_source_ip address in codfw1dev.

Tue, Dec 10, 12:56 PM · cloud-services-team (Kanban)
aborrero triaged T239347: create a 'normal' network for codf1dev neutron w/public IPs as Medium priority.
Tue, Dec 10, 12:31 PM · cloud-services-team (Kanban)
aborrero moved T240245: Create an SOP for handling of Cloud/Toolforge open vulnerability issues from Inbox to Watching on the cloud-services-team (Kanban) board.
Tue, Dec 10, 10:02 AM · cloud-services-team (Kanban), Security-Team, Security, Cloud-VPS, Toolforge
aborrero added a project to T240245: Create an SOP for handling of Cloud/Toolforge open vulnerability issues: cloud-services-team (Kanban).
Tue, Dec 10, 10:01 AM · cloud-services-team (Kanban), Security-Team, Security, Cloud-VPS, Toolforge
aborrero assigned T238424: Enable Shinken monitoring for 'gratitude' Cloud VPS Project to Andrew.

@Andrew is the clinic-duty person this week. Assigning this task to him.

Tue, Dec 10, 9:48 AM · Patch-For-Review, VPS-Projects, Shinken, cloud-services-team (Kanban)
aborrero added a comment to T239406: toolforge: new k8s: evalute and test firewalling via calico.

Saw the patch, thanks!

Tue, Dec 10, 9:46 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes

Thu, Dec 5

aborrero updated the task description for T238641: toolforge: some additional testing before final migration.
Thu, Dec 5, 12:59 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T215531: Deploy upgraded Kubernetes to toolsbeta, a subtask of T214513: Upgrade Toolforge Kubernetes, as Resolved.
Thu, Dec 5, 12:54 PM · Wikimedia-Incident, Goal, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T215531: Deploy upgraded Kubernetes to toolsbeta as Resolved.

I think the upgraded k8s cluster in toolsbeta has been up and running stably for some time now. Resolving this task in the hope that we can better focus on the several subtasks that precede the final operations in the tools project.

Thu, Dec 5, 12:54 PM · Patch-For-Review, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero triaged T239407: toolfoge: new k8s: package newer/more convenient python3 k8s client libs as Low priority.

Triaging this as low priority, since this would be required for k8s 1.16. We are targeting 1.15 for the initial migration.

Thu, Dec 5, 12:51 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T239409: toolforge: new k8s: introduce more robust controls for deb pkg versions, a subtask of T214513: Upgrade Toolforge Kubernetes, as Resolved.
Thu, Dec 5, 12:50 PM · Wikimedia-Incident, Goal, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T239409: toolforge: new k8s: introduce more robust controls for deb pkg versions as Resolved.

This should be done now. Please reopen if required.

Thu, Dec 5, 12:50 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T237749: Upgrade wmcs OpenStack version to Ocata.

codfw1-dev is running Ocata, and I've scheduled an upgrade window for eqiad1. Approximate steps are...

Thu, Dec 5, 12:45 PM · cloud-services-team (Kanban)
aborrero added a comment to T236583: "discourse" Cloud VPS project jessie deprecation.

Doing apt-get dist-upgrade is technically possible, but we don't encourage it because we would lose track of what the base operating system is. This is something that could be improved on our side, but I'm afraid that won't happen in the short term.

Thu, Dec 5, 12:24 PM · Space (Oct-Dec-2019), Cloud-VPS (Debian Jessie Deprecation)
aborrero closed T236545: "otrs" Cloud VPS project jessie deprecation as Resolved.

I deleted all the VMs and the project itself. Thanks!

Thu, Dec 5, 12:22 PM · Cloud-VPS (Debian Jessie Deprecation)

Mon, Dec 2

aborrero claimed T239409: toolforge: new k8s: introduce more robust controls for deb pkg versions.
Mon, Dec 2, 2:15 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.

BTW, the reload time with 20k ingress objects and 2 pods is about 90s:

Mon, Dec 2, 1:31 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.

Good news: after downscaling nginx-ingress to 2 pod replicas and creating 20k ingress objects again, the issue above didn't show up. Perhaps our magic number could be 2.
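For anyone reproducing these scale tests, here is a minimal sketch of how a batch of throwaway ingress objects could be generated. It assumes the `networking.k8s.io/v1beta1` Ingress API (served by k8s 1.15), a pre-existing namespace named `reload-test`, and a backend service `nonexistent-svc` that deliberately does not exist; all of these names, and the helper functions, are hypothetical and not taken from the actual test setup.

```python
import json


def fake_ingress_list(count, namespace="reload-test",
                      host_suffix="example.invalid", service="nonexistent-svc"):
    """Build a v1 List of `count` Ingress objects, suitable for `kubectl apply -f`.

    Every object routes to a Service that does not exist, so only the size of
    the controller's generated config grows; no real traffic is involved.
    """
    items = []
    for i in range(count):
        items.append({
            "apiVersion": "networking.k8s.io/v1beta1",  # Ingress API available in k8s 1.15
            "kind": "Ingress",
            "metadata": {"name": f"reload-test-{i}", "namespace": namespace},
            "spec": {
                "rules": [{
                    "host": f"t{i}.{host_suffix}",  # unique host per object
                    "http": {"paths": [{
                        "path": "/",
                        "backend": {"serviceName": service, "servicePort": 80},
                    }]},
                }],
            },
        })
    return {"apiVersion": "v1", "kind": "List", "items": items}


def manifest_json(count):
    """Serialize the List as JSON, which kubectl accepts as a manifest file."""
    return json.dumps(fake_ingress_list(count))
```

Writing `manifest_json(20000)` to a file and running `kubectl apply -f` on it would create the objects in one go; deleting the namespace afterwards cleans them all up.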

Mon, Dec 2, 12:59 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.

Oh, some more news. I scaled the deployment to 5 pods, created 20k ingress objects, and the whole thing crashed.
I was unable to curl our test tool or fetch the ingress-nginx logs. The pods crashed and were re-created by k8s.

Mon, Dec 2, 12:12 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero triaged T239584: CloudVPS: keystone may need update in hardware servers running Buster as Low priority.
Mon, Dec 2, 10:21 AM · cloud-services-team (Kanban)
aborrero created T239584: CloudVPS: keystone may need update in hardware servers running Buster.
Mon, Dec 2, 10:20 AM · cloud-services-team (Kanban)

Fri, Nov 29

aborrero added a comment to T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.

Regarding the number of nginx-ingress pods, I have some comments. We may not need any autoscaling mechanism. Let's assume we use the magic static number of 5 nginx-ingress pods.

  • If we have very little usage on the cluster, this is no problem. We are not paying anything per use. We can afford having 5 pods doing nothing. No need to downscale. The magic number works.
  • If we suddenly have very high load in the cluster, this is indicative of other kinds of problems. We may need to manually scale the cluster itself (workers), not just the number of nginx-ingress pods. Adjusting the magic number is just a very minor step in that scaling workflow, so the magic number works in this case too.
  • We currently have 30 worker nodes in the legacy cluster. A magic number of 5 nginx-ingress pods means 1 pod per 6 worker nodes. I think nginx can handle this.
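The arithmetic in the last point can be written down as a tiny sketch. The second helper is hypothetical: it assumes one wanted to keep the same workers-per-pod ratio when manually adjusting the static replica count as the cluster grows, which is an illustration rather than an agreed policy.

```python
import math


def workers_per_ingress_pod(worker_nodes, ingress_pods):
    """Worker nodes served per nginx-ingress pod for a static replica count."""
    return worker_nodes / ingress_pods


def pods_for_ratio(worker_nodes, max_workers_per_pod):
    """Hypothetical helper: smallest static replica count that keeps at most
    `max_workers_per_pod` worker nodes per nginx-ingress pod."""
    return math.ceil(worker_nodes / max_workers_per_pod)


print(workers_per_ingress_pod(30, 5))  # 30 workers / 5 pods -> 6.0
print(pods_for_ratio(45, 6))           # 45 / 6 = 7.5, rounded up -> 8
```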
Fri, Nov 29, 4:59 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero updated the task description for T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.
Fri, Nov 29, 4:49 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.

With an actual destination svc/pod, we get similar reload times. I can confirm there is no downtime for current traffic while the reload is taking place, i.e.:
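One way to check a "no downtime during reload" claim is a simple polling probe that keeps requesting a test URL while the ingress config is being reloaded and counts failures. This is a minimal sketch with a hypothetical `probe` function; the URL, duration, and polling interval are assumptions, and the `fetch` parameter exists only so the probe can be exercised without a live cluster.

```python
import time
import urllib.request


def probe(url, duration_s, interval_s=0.2, timeout_s=2.0, fetch=None):
    """Request `url` repeatedly for `duration_s` seconds and return (ok, failed).

    Run it while an ingress config reload is triggered: a non-zero failure
    count would suggest traffic was dropped during the reload.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=timeout_s) as resp:
                return resp.status

    ok = failed = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            status = fetch(url)
            if 200 <= status < 400:
                ok += 1
            else:
                failed += 1
        except OSError:
            # HTTPError, URLError and socket timeouts are all OSError subclasses.
            failed += 1
        time.sleep(interval_s)
    return ok, failed
```

For example, `probe("https://<test-tool-url>/", duration_s=120)` could run in one terminal while thousands of ingress objects are created in another; the placeholder URL must be replaced with a real test tool.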

Fri, Nov 29, 4:47 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.

Some initial simple tests with a single nginx-ingress pod and fake ingress objects that point to nowhere (non-existent svc or endpoint):

  • with 100 ingress objects, config reload takes about 0.5s
  • with 1k ingress objects, config reload takes about 2s
  • with 5k ingress objects, config reload takes about 10s
  • with 10k ingress objects, config reload takes about 20s
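These four data points suggest reload time grows roughly linearly with the number of ingress objects, at about 2 ms per object. A minimal sketch fitting a line by ordinary least squares to the numbers above; extrapolating from it is only a rough guide, since it covers a single pod with fake backends and says nothing about multi-pod deployments.

```python
# Reload times reported above for a single nginx-ingress pod: (objects, seconds).
SAMPLES = [(100, 0.5), (1_000, 2.0), (5_000, 10.0), (10_000, 20.0)]


def linear_fit(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(SAMPLES)
print(f"~{slope * 1000:.1f} ms per ingress object, intercept ~{intercept:.2f}s")
```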
Fri, Nov 29, 4:20 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero claimed T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.
Fri, Nov 29, 12:58 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T239403: toolforge: new k8s: scale up a bit the cluster before final tests and initial migrations as Resolved.

Joined the 3 new VMs to the cluster and updated the docs at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Deploying_k8s to explain what to do if the kubeadm token has expired.

Fri, Nov 29, 12:18 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T239403: toolforge: new k8s: scale up a bit the cluster before final tests and initial migrations, a subtask of T238641: toolforge: some additional testing before final migration, as Resolved.
Fri, Nov 29, 12:18 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes

Thu, Nov 28

aborrero created T239409: toolforge: new k8s: introduce more robust controls for deb pkg versions.
Thu, Nov 28, 12:45 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero created T239407: toolfoge: new k8s: package newer/more convenient python3 k8s client libs.
Thu, Nov 28, 12:42 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero created T239406: toolforge: new k8s: evalute and test firewalling via calico.
Thu, Nov 28, 12:39 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero created T239405: toolforge: new k8s: evaluate ingress controller reload behaviour.
Thu, Nov 28, 12:36 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero created T239404: toolforge: new k8s: evaluate DNS (coredns) autoscale options.
Thu, Nov 28, 12:33 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero renamed T228660: Toolforge: new k8s: issues with the initial coredns setup from Toolforge: new k8s: evaluate DNS setup for coredns to Toolforge: new k8s: issues with the initial coredns setup.
Thu, Nov 28, 12:31 PM · Toolforge, cloud-services-team (Kanban)
aborrero created T239403: toolforge: new k8s: scale up a bit the cluster before final tests and initial migrations.
Thu, Nov 28, 12:29 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T210750: Track NFS statistics through Prometheus.

We might lack some of the things described in the task description, but for the record, here are some Grafana dashboards based on the Prometheus metrics we have for NFS:

Thu, Nov 28, 12:07 PM · cloud-services-team (Kanban)

Wed, Nov 27

aborrero moved T239347: create a 'normal' network for codf1dev neutron w/public IPs from Important to Doing on the cloud-services-team (Kanban) board.
Wed, Nov 27, 5:07 PM · cloud-services-team (Kanban)
aborrero closed T237643: toolforge: new k8s: figure out metrics / observability as Resolved.

I declare this is mostly done, at least until we start to have real traffic in the service and see where we lack metrics.

Wed, Nov 27, 5:07 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T237643: toolforge: new k8s: figure out metrics / observability, a subtask of T214513: Upgrade Toolforge Kubernetes, as Resolved.
Wed, Nov 27, 5:07 PM · Wikimedia-Incident, Goal, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero triaged T190377: Keep track of tools without stated default licenses as Lowest priority.
Wed, Nov 27, 5:05 PM · cloud-services-team (Kanban), Toolforge-standards-committee, Toolforge
aborrero triaged T190376: Phase out .description files in favor of tool records as Lowest priority.
Wed, Nov 27, 5:04 PM · cloud-services-team (Kanban), Toolhub, Toolforge
aborrero triaged T193648: Improve documentation on SSH host fingerprints as Lowest priority.
Wed, Nov 27, 5:03 PM · cloud-services-team (Kanban), Cloud-Services
aborrero moved T192733: Remove old symlinks to trunk/rewrite/compat/pywikipedia in /shared from Inbox to Watching on the cloud-services-team (Kanban) board.
Wed, Nov 27, 5:02 PM · cloud-services-team (Kanban), Pywikibot, Toolforge
aborrero moved T192059: CloudVPS: VMs created with non-allowed characters in the hostname fail to be autosigned by puppet from Inbox to Graveyard on the cloud-services-team (Kanban) board.
Wed, Nov 27, 5:02 PM · cloud-services-team (Kanban), Horizon
aborrero triaged T192059: CloudVPS: VMs created with non-allowed characters in the hostname fail to be autosigned by puppet as Low priority.
Wed, Nov 27, 5:01 PM · cloud-services-team (Kanban), Horizon
aborrero added a subtask for T239352: CloudVPS: horizon improvements: T192059: CloudVPS: VMs created with non-allowed characters in the hostname fail to be autosigned by puppet.
Wed, Nov 27, 5:01 PM · Epic, cloud-services-team (Kanban)
aborrero added a parent task for T192059: CloudVPS: VMs created with non-allowed characters in the hostname fail to be autosigned by puppet: T239352: CloudVPS: horizon improvements.
Wed, Nov 27, 5:01 PM · cloud-services-team (Kanban), Horizon
aborrero added a subtask for T239352: CloudVPS: horizon improvements: Unknown Object (Task).
Wed, Nov 27, 4:59 PM · Epic, cloud-services-team (Kanban)
aborrero added a parent task for T229660: Horizon warning for instance deletion does not include instance name: T239352: CloudVPS: horizon improvements.
Wed, Nov 27, 4:59 PM · Upstream, Horizon, cloud-services-team (Kanban)
aborrero added a subtask for T239352: CloudVPS: horizon improvements: T229660: Horizon warning for instance deletion does not include instance name.
Wed, Nov 27, 4:59 PM · Epic, cloud-services-team (Kanban)
aborrero triaged T239352: CloudVPS: horizon improvements as Medium priority.
Wed, Nov 27, 4:59 PM · Epic, cloud-services-team (Kanban)
aborrero created T239352: CloudVPS: horizon improvements.
Wed, Nov 27, 4:58 PM · Epic, cloud-services-team (Kanban)
aborrero triaged T190414: Delete legacy svn accounts from LDAP directory as Low priority.
Wed, Nov 27, 4:56 PM · cloud-services-team (Kanban), Cloud-Services, LDAP
aborrero removed a project from T190451: Find out who maintains http://hikebikemap.org/: cloud-services-team (Kanban).
Wed, Nov 27, 4:55 PM · Cloud-VPS, Tools
aborrero removed a project from T191955: [Investigation] Ability for users to "claim" Toolhub entries: cloud-services-team (Kanban).
Wed, Nov 27, 4:54 PM · Toolhub
aborrero moved T236399: Upgrade mariadb on toolsdb servers to 10.1.42 as soon as it is available from Inbox to Important on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:53 PM · Data-Services, cloud-services-team (Kanban), Tools
aborrero triaged T200649: Port operations/docker-images/toollabs-images to use docker-pkg as Lowest priority.
Wed, Nov 27, 4:52 PM · cloud-services-team (Kanban), docker-pkg, Toolforge
aborrero triaged T202431: Move labweb hosts from nutcracker to mcrouter? as Low priority.
Wed, Nov 27, 4:51 PM · cloud-services-team (Kanban), wikitech.wikimedia.org, Striker, Horizon
aborrero triaged T202949: Compile the frequently used webpage design snippets for Tools authors as Low priority.

Removing the cloud-services-team (Kanban) tag, since I don't think we, the SREs in the WMCS team, have many action items here.

Wed, Nov 27, 4:50 PM · Documentation, Toolforge
aborrero moved T202949: Compile the frequently used webpage design snippets for Tools authors from Triage to Feature requests on the Toolforge board.
Wed, Nov 27, 4:49 PM · Documentation, Toolforge
aborrero triaged T199555: HTTP 500 from edithiera endpoint when given certain hiera data as Low priority.
Wed, Nov 27, 4:48 PM · cloud-services-team (Kanban), Horizon
aborrero closed T199575: labs/private.git hieradata/codfw.yaml is missing keys set in hieradata/eqiad.yaml as Resolved.

I don't think the diff is such a big deal. Closing task now.

Wed, Nov 27, 4:46 PM · cloud-services-team (Kanban), Cloud-VPS
aborrero moved T236565: "tools" Cloud VPS project jessie deprecation from Inbox to Important on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:23 PM · cloud-services-team (Kanban), Toolforge, Cloud-VPS (Debian Jessie Deprecation)
aborrero closed T236826: Toolforge: new k8s: initial build of the new kubernetes cluster, a subtask of T214513: Upgrade Toolforge Kubernetes, as Resolved.
Wed, Nov 27, 4:22 PM · Wikimedia-Incident, Goal, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T236826: Toolforge: new k8s: initial build of the new kubernetes cluster as Resolved.
Wed, Nov 27, 4:22 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a project to T229441: CloudVPS: codfw1dev: missing bits: Epic.
Wed, Nov 27, 4:21 PM · Epic, cloud-services-team (Kanban)
aborrero added a subtask for T229441: CloudVPS: codfw1dev: missing bits: T239347: create a 'normal' network for codf1dev neutron w/public IPs.
Wed, Nov 27, 4:20 PM · Epic, cloud-services-team (Kanban)
aborrero added a parent task for T239347: create a 'normal' network for codf1dev neutron w/public IPs: T229441: CloudVPS: codfw1dev: missing bits.
Wed, Nov 27, 4:20 PM · cloud-services-team (Kanban)
aborrero moved T239347: create a 'normal' network for codf1dev neutron w/public IPs from Inbox to Important on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:19 PM · cloud-services-team (Kanban)
aborrero moved T238766: openstack: dns_floating_ip_updater mechanism improvements to better handle transient errors from Inbox to Important on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:18 PM · Cloud-Services, cloud-services-team (Kanban)
aborrero moved T176757: CamelCase vs. VPS instance naming from Inbox to Graveyard on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:18 PM · cloud-services-team (Kanban)
aborrero moved T158883: Issues in enabling NFS for new projects (was 'adding project to nfs-mounts.yaml does not create directories') from Inbox to Graveyard on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:16 PM · Data-Services, cloud-services-team (Kanban), Patch-For-Review
aborrero moved T143639: Write a simple script that handles failovering proxies from Inbox to Important on the cloud-services-team (Kanban) board.
Wed, Nov 27, 4:14 PM · cloud-services-team (Kanban), Wikimedia-Incident, Cloud-Services
aborrero closed T183436: Add memory limit configuration for Kubernetes pods, a subtask of T175593: Increases the memory available to the corenlp tool container, as Resolved.
Wed, Nov 27, 4:11 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
aborrero closed T183436: Add memory limit configuration for Kubernetes pods as Resolved.

We are working on a new kubernetes cluster for Toolforge. This cluster has better user quota management, see T234702: Review and establish configurable quotas for users in the new Kubernetes cluster for reference.
Closing this task now.

Wed, Nov 27, 4:11 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
aborrero closed T183436: Add memory limit configuration for Kubernetes pods, a subtask of T230284: Raise spacemedia tool memory limit, as Resolved.
Wed, Nov 27, 4:11 PM · Tool-spacemedia, Toolforge
aborrero closed T175593: Increases the memory available to the corenlp tool container as Resolved.

We are working on a new kubernetes cluster for Toolforge. This cluster has better user quota management, see T234702: Review and establish configurable quotas for users in the new Kubernetes cluster for reference.
Closing this task now.

Wed, Nov 27, 4:10 PM · cloud-services-team (Kanban), Kubernetes, Toolforge

Tue, Nov 26

aborrero claimed T237643: toolforge: new k8s: figure out metrics / observability.

Created a couple of grafana dashboards:

  • this one is for haproxy in front of the apiserver and nginx-ingress:

https://grafana-labs.wikimedia.org/d/5O3YKfbWz/toolforge-k8s-haproxy

  • this one aggregates metrics for the whole ingress path:

https://grafana-labs.wikimedia.org/d/R7BPaEbWk/toolforge-ingress?refresh=1m&orgId=1

Tue, Nov 26, 7:05 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero moved T224585: Migrate labmon* to Buster from Needs discussion to Doing on the cloud-services-team (Kanban) board.

In the WMCS team meeting, we decided to rename these servers to better reflect what they do and to avoid naming clashes:

  • from labmon1001 to cloudmetrics1001 and
  • from labmon1002 to cloudmetrics1002
Tue, Nov 26, 5:33 PM · Patch-For-Review, Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban), Operations

Mon, Nov 25

aborrero closed T238655: toolforge: new k8s: issues with the apiserver and etcd, a subtask of T215531: Deploy upgraded Kubernetes to toolsbeta, as Resolved.
Mon, Nov 25, 10:49 AM · Patch-For-Review, Epic, Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T238655: toolforge: new k8s: issues with the apiserver and etcd as Resolved.

This seems solved! Please reopen if required.

Mon, Nov 25, 10:49 AM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero added a comment to T238655: toolforge: new k8s: issues with the apiserver and etcd.

This seems to be solved by using puppet cert SANs to include all the other servers. This can be done easily via hiera.

Mon, Nov 25, 10:28 AM · Toolforge, cloud-services-team (Kanban), Kubernetes

Fri, Nov 22

aborrero closed T234032: Toolforge ingress: create a default landing page for unknown/default URLs, a subtask of T107697: Extend 'tool not found' 404 page, as Resolved.
Fri, Nov 22, 5:48 PM · Toolforge, Cloud-Services
aborrero closed T234032: Toolforge ingress: create a default landing page for unknown/default URLs, a subtask of T180262: Reduce byte-size of Toolforge 404 page, as Resolved.
Fri, Nov 22, 5:48 PM · Toolforge
aborrero closed T234032: Toolforge ingress: create a default landing page for unknown/default URLs as Resolved.

This seems solved. Thanks @bd808 and @Bstorm !! Closing task now, please reopen if required.

Fri, Nov 22, 5:48 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
aborrero closed T234032: Toolforge ingress: create a default landing page for unknown/default URLs, a subtask of T228500: Toolforge: evaluate ingress mechanism, as Resolved.
Fri, Nov 22, 5:47 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban), Kubernetes