
aborrero (arturo)
SRE at Wikimedia Cloud Services Team

User Details

User Since
Oct 23 2017, 12:19 PM (243 w, 5 d)
Availability
Available
IRC Nick
arturo
LDAP User
Arturo Borrero Gonzalez
MediaWiki User
ABorrero (WMF) [ Global Accounts ]

I'm Arturo Borrero Gonzalez from Seville, Spain. I'm a Site Reliability Engineer (SRE) on the Wikimedia Cloud Services Team and a Wikimedia Foundation staff member.

You may find me in some FLOSS projects, like Netfilter and Debian.

Recent Activity

Apr 18 2022

MusikAnimal awarded T285944: Toolforge: beta phase for the new jobs framework a Love token.
Apr 18 2022, 8:53 PM · Toolforge Jobs framework, cloud-services-team (Kanban)

Apr 13 2022

aborrero added a comment to T302178: prometheus-openstack-exporter No module named 'urlparse'.

current status:

  • the exporter was deployed into the 3 cloudcontrol servers
  • the API request rate increased significantly, but no problems were detected (meaning, the new API hammering didn't bring down the service)
  • the exporter takes a long time to return the metrics and prometheus times out while scraping it, so we are currently exploring options
Apr 13 2022, 10:06 AM · Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro, Patch-For-Review, Cloud-VPS, cloud-services-team (Kanban)

Apr 12 2022

aborrero added a comment to T302178: prometheus-openstack-exporter No module named 'urlparse'.

A few things detected upon the initial deployment:

  • request rate to openstack API has significantly increased.
    • this is probably because there is no caching (i.e., every exporter run fetches and generates all metrics)
    • also because we run the exporter in all 3 cloudcontrol nodes
  • prometheus is still seeing the exporter as down, because it takes a lot of time to return the GET to /metrics, which can also be related to the caching issue mentioned above.
Apr 12 2022, 3:29 PM · Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro, Patch-For-Review, Cloud-VPS, cloud-services-team (Kanban)
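
The caching gap described in the comment above can be illustrated with a small time-to-live (TTL) cache placed in front of the expensive collection step. This is only a sketch under stated assumptions: the class name `CachedMetrics`, the `collect` callable, and the 60-second TTL used in the usage note are hypothetical and are not taken from prometheus-openstack-exporter itself.

```python
import time
from typing import Callable, Optional


class CachedMetrics:
    """Serve a cached metrics payload, refreshing it at most once per TTL.

    Hypothetical helper for illustration only; not part of the real exporter.
    """

    def __init__(
        self,
        collect: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._collect = collect          # expensive call that queries the API
        self._ttl = ttl_seconds          # how long a payload stays fresh
        self._clock = clock              # injectable so the cache can be tested
        self._payload: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached payload, collecting again only when it is stale."""
        now = self._clock()
        if self._payload is None or now - self._fetched_at >= self._ttl:
            self._payload = self._collect()
            self._fetched_at = now
        return self._payload
```

With something like `CachedMetrics(collect_all, ttl_seconds=60)`, repeated scrapes within a minute would reuse one payload instead of each triggering a full round of API requests. The first request after expiry still pays the full collection cost, so a slow collector would additionally need a background refresh to stay within the scrape timeout.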

Apr 11 2022

bd808 awarded T305831: Cloud VPS: evaluate if VM name global uniqueness enforcement can be dropped a Like token.
Apr 11 2022, 8:06 PM · Cloud-VPS, cloud-services-team (Kanban)
aborrero closed T301380: Request creation of wmdeanalytics VPS project as Resolved.
Apr 11 2022, 4:01 PM · User-ItamarWMDE, Cloud-VPS (Project-requests)
aborrero added a comment to T301380: Request creation of wmdeanalytics VPS project.

Thank you @aborrero.

@ItamarWMDE @Tobi_WMDE_SW @Manuel

I hope you are aware that changing the instance name in CloudVPS implies a change in its URL, e.g.

@Manuel This is especially relevant for you since you have placed an (understandable, rational) demand to always keep the old URLs alive.
@ItamarWMDE Do you think there is something that we can do about this?

Apr 11 2022, 12:24 PM · User-ItamarWMDE, Cloud-VPS (Project-requests)
aborrero triaged T305829: horizon (or nova) doesn't correctly report duplicated VM names as High priority.
Apr 11 2022, 11:29 AM · cloud-services-team (Kanban), Horizon
aborrero moved T305829: horizon (or nova) doesn't correctly report duplicated VM names from Inbox to Soon! on the cloud-services-team (Kanban) board.
Apr 11 2022, 11:29 AM · cloud-services-team (Kanban), Horizon
aborrero triaged T305831: Cloud VPS: evaluate if VM name global uniqueness enforcement can be dropped as Lowest priority.
Apr 11 2022, 11:29 AM · Cloud-VPS, cloud-services-team (Kanban)
aborrero added a comment to T305780: toolforge-jobs – wikihistory needs a container with both php7 and mono.

There is no short term solution to this.

Apr 11 2022, 11:28 AM · Toolforge Jobs framework
aborrero changed the status of T305834: Cloud VPS: drop wmflabs names from profile::resolving::domain_search from Open to Stalled.

This is blocked on T277653: Toolforge: add Debian Buster to the grid and eliminate Debian Stretch, because the old Debian Stretch grid relies on resolving the .eqiad.wmflabs names.

Apr 11 2022, 11:22 AM · Cloud-VPS, cloud-services-team (Kanban)
aborrero triaged T299121: Job getting killed on k8s as Low priority.
Apr 11 2022, 11:11 AM · Toolforge Jobs framework, Kubernetes
aborrero added a comment to T299121: Job getting killed on k8s.

Perhaps try requesting more resources for the job, see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Job_quotas

Apr 11 2022, 11:10 AM · Toolforge Jobs framework, Kubernetes
aborrero awarded T299039: All started jobs failed on Kubernetes during 24h with no visible error or output a Like token.
Apr 11 2022, 11:08 AM · Toolforge, Kubernetes
aborrero updated the task description for T305829: horizon (or nova) doesn't correctly report duplicated VM names.
Apr 11 2022, 11:04 AM · cloud-services-team (Kanban), Horizon
aborrero updated the task description for T305829: horizon (or nova) doesn't correctly report duplicated VM names.
Apr 11 2022, 11:03 AM · cloud-services-team (Kanban), Horizon
aborrero created T305831: Cloud VPS: evaluate if VM name global uniqueness enforcement can be dropped.
Apr 11 2022, 11:03 AM · Cloud-VPS, cloud-services-team (Kanban)
aborrero created T305829: horizon (or nova) doesn't correctly report duplicated VM names.
Apr 11 2022, 10:58 AM · cloud-services-team (Kanban), Horizon
aborrero updated subscribers of T301380: Request creation of wmdeanalytics VPS project.

Hey, thanks to @Majavah who did some internal inspection, we believe the problem is that there is already a virtual machine with the same name somewhere in Cloud VPS.

Apr 11 2022, 10:54 AM · User-ItamarWMDE, Cloud-VPS (Project-requests)

Apr 7 2022

aborrero added a comment to T302178: prometheus-openstack-exporter No module named 'urlparse'.
Apr 7 2022, 12:53 PM · Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro, Patch-For-Review, Cloud-VPS, cloud-services-team (Kanban)
aborrero updated the task description for T305631: cloudvirt1016: sudden reboot.
Apr 7 2022, 12:50 PM · cloud-services-team (Hardware)
aborrero added a subtask for T165531: rack/setup/install labvirt101[5-8]: T305631: cloudvirt1016: sudden reboot.
Apr 7 2022, 12:46 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-Services, SRE
aborrero added a parent task for T305631: cloudvirt1016: sudden reboot: T165531: rack/setup/install labvirt101[5-8].
Apr 7 2022, 12:46 PM · cloud-services-team (Hardware)
aborrero created T305631: cloudvirt1016: sudden reboot.
Apr 7 2022, 12:45 PM · cloud-services-team (Hardware)

Apr 6 2022

aborrero added a comment to T304716: Cloud services enhancement proposal: Prometheus metrics for Toolforge/Toolsbeta/Paws Kubernetes clusters.

We discussed this in the WMCS team meeting today, and pretty much agreed with this idea.

Apr 6 2022, 3:30 PM · Patch-For-Review, User-dcaro, Cloud Services Proposals
aborrero closed T304598: cloudgw: upgrade servers to Debian 11 Bullseye, a subtask of T300254: Upgrade codfw1dev to bullseye, as Resolved.
Apr 6 2022, 12:24 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero closed T304598: cloudgw: upgrade servers to Debian 11 Bullseye as Resolved.
Apr 6 2022, 12:24 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero closed T305494: interface renaming via systemd .link file can race with sysctl parameters, a subtask of T304598: cloudgw: upgrade servers to Debian 11 Bullseye, as Resolved.
Apr 6 2022, 12:22 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero closed T305494: interface renaming via systemd .link file can race with sysctl parameters as Resolved.
Apr 6 2022, 12:22 PM · cloud-services-team (Kanban), Cloud-VPS
aborrero changed the status of T302178: prometheus-openstack-exporter No module named 'urlparse', a subtask of T281276: Upgrade cloud-vps openstack hosts to Debian 'Bullseye', from Open to In Progress.
Apr 6 2022, 11:36 AM · Cloud-VPS, cloud-services-team (Kanban)
aborrero changed the status of T302178: prometheus-openstack-exporter No module named 'urlparse', a subtask of T302050: prometheus-openstack-exporter in Bullseye, from Open to In Progress.
Apr 6 2022, 11:36 AM · cloud-services-team (Kanban)
aborrero changed the status of T302178: prometheus-openstack-exporter No module named 'urlparse' from Open to In Progress.
Apr 6 2022, 11:36 AM · Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro, Patch-For-Review, Cloud-VPS, cloud-services-team (Kanban)
aborrero added a comment to T302178: prometheus-openstack-exporter No module named 'urlparse'.

I will put the .deb packaging in here: https://gitlab.wikimedia.org/repos/cloud/deb/pkg-prometheus-openstack-exporter

Apr 6 2022, 11:35 AM · Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro, Patch-For-Review, Cloud-VPS, cloud-services-team (Kanban)
aborrero closed T305157: Openstack Wallaby on Debian 11 Bullseye problems because eventlet and dnspython, a subtask of T304694: upgrade codfw1dev to wallaby, as Resolved.
Apr 6 2022, 9:16 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero closed T305157: Openstack Wallaby on Debian 11 Bullseye problems because eventlet and dnspython as Resolved.

all agents are back online:

Apr 6 2022, 9:16 AM · cloud-services-team (Kanban), Cloud-VPS
aborrero added a comment to T305157: Openstack Wallaby on Debian 11 Bullseye problems because eventlet and dnspython.

We now have python3-eventlet version 0.30.2-5~bpo11+1 in the bullseye-wallaby repo, and I am upgrading codfw1dev with it.

Apr 6 2022, 9:06 AM · cloud-services-team (Kanban), Cloud-VPS
aborrero added a comment to T305157: Openstack Wallaby on Debian 11 Bullseye problems because eventlet and dnspython.

Talked to fellow Debian Developers to ask them to put a newer version of python3-eventlet on the bullseye-wallaby repo.

Apr 6 2022, 9:06 AM · cloud-services-team (Kanban), Cloud-VPS
aborrero added a comment to T305157: Openstack Wallaby on Debian 11 Bullseye problems because eventlet and dnspython.

The version of python3-eventlet that contains the mentioned DNS fixes is >= 0.30.2-3 per changelog at https://tracker.debian.org/media/packages/p/python-eventlet/changelog-0.30.2-5

Apr 6 2022, 8:31 AM · cloud-services-team (Kanban), Cloud-VPS

Apr 5 2022

aborrero created T305494: interface renaming via systemd .link file can race with sysctl parameters.
Apr 5 2022, 5:28 PM · cloud-services-team (Kanban), Cloud-VPS
aborrero added a comment to T304598: cloudgw: upgrade servers to Debian 11 Bullseye.
aborrero@cloudgw1001:~ 4 $ sudo systemctl status systemd-sysctl
● systemd-sysctl.service - Apply Kernel Variables
     Loaded: loaded (/lib/systemd/system/systemd-sysctl.service; static)
     Active: active (exited) since Tue 2022-04-05 12:16:43 UTC; 10min ago
       Docs: man:systemd-sysctl.service(8)
             man:sysctl.d(5)
    Process: 434 ExecStart=/lib/systemd/systemd-sysctl (code=exited, status=0/SUCCESS)
   Main PID: 434 (code=exited, status=0/SUCCESS)
        CPU: 15ms
Apr 5 2022, 12:33 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero added a comment to T304598: cloudgw: upgrade servers to Debian 11 Bullseye.

The reimage resulted in new NIC names for cloudgw :-( the newer ones are longer and don't support the vlan tag attached to them.

Apr 5 2022, 9:17 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
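
One plausible reading of the comment above is the Linux limit on interface names: the kernel constant IFNAMSIZ is 16 bytes including the trailing NUL, so a name can be at most 15 characters, and a VLAN suffix such as `.1120` can push a long predictable NIC name over that limit. The check below is a minimal sketch under that assumption. The interface names and the VLAN ID are hypothetical examples, not the actual cloudgw values.

```python
IFNAMSIZ = 16                    # Linux kernel constant, includes the trailing NUL
MAX_IFNAME_LEN = IFNAMSIZ - 1    # longest usable interface name, in bytes


def vlan_ifname(parent: str, vlan_id: int) -> str:
    """Build the conventional '<parent>.<vlan id>' sub-interface name."""
    return f"{parent}.{vlan_id}"


def fits_ifname_limit(name: str) -> bool:
    """Return True if the name is short enough to be a Linux interface name."""
    return len(name.encode()) <= MAX_IFNAME_LEN


if __name__ == "__main__":
    # Hypothetical short and long parent names with a hypothetical VLAN ID.
    for parent in ("eno1", "enp101s0f0np0"):
        name = vlan_ifname(parent, 1120)
        print(name, len(name), fits_ifname_limit(name))
```

A check like this could run before a reimage to flag hosts whose new NIC names leave no room for the VLAN suffix.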
aborrero updated the task description for T304598: cloudgw: upgrade servers to Debian 11 Bullseye.
Apr 5 2022, 9:04 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero committed rCTKF38b6d5f737d4: jobs-framework-cli: only print timestamp and file in debug mode (authored by aborrero).
jobs-framework-cli: only print timestamp and file in debug mode
Apr 5 2022, 7:17 AM
aborrero committed rCTKFc6603cb45a87: d/changelog: generate entry for buster/7 (authored by aborrero).
d/changelog: generate entry for buster/7
Apr 5 2022, 7:17 AM

Apr 4 2022

aborrero committed rCTKF8677d6997eda: images: update API endpoint URL (authored by aborrero).
images: update API endpoint URL
Apr 4 2022, 5:23 PM
aborrero awarded T305391: Disable creation of new web proxies under .wmflabs.org a Burninate token.
Apr 4 2022, 4:48 PM · Horizon, cloud-services-team (Kanban)
aborrero committed rCTKF862e5d0cf8b2: jobs-framework-cli: remove references to Docker (authored by aborrero).
jobs-framework-cli: remove references to Docker
Apr 4 2022, 9:07 AM
aborrero committed rCTKFf21b018da611: d/changelog: regenerate entry for release 6 buster (authored by aborrero).
d/changelog: regenerate entry for release 6 buster
Apr 4 2022, 9:07 AM
aborrero committed rCTKF0a70aa733475: jobs-framework-cli: use timestamp in logs (authored by aborrero).
jobs-framework-cli: use timestamp in logs
Apr 4 2022, 8:51 AM
aborrero committed rCTKF12a61d8233f7: d/changelog: generate entry for release 6 (authored by aborrero).
d/changelog: generate entry for release 6
Apr 4 2022, 8:51 AM
aborrero committed rCTKFd1456a283983: jobs-framework-cli: rename container options to images (authored by aborrero).
jobs-framework-cli: rename container options to images
Apr 4 2022, 8:50 AM
aborrero added a comment to T277653: Toolforge: add Debian Buster to the grid and eliminate Debian Stretch.

One user sent a technical enquiry to the Cloud mailing list but the post is currently being held for moderation because of size.
Can this be reviewed?

Apr 4 2022, 8:40 AM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)

Apr 1 2022

aborrero added a parent task for T237773: Move Wikitech onto the production MW cluster: T305233: consider eliminating labweb/cloudweb hardware servers.
Apr 1 2022, 10:09 AM · cloud-services-team (Kanban), wikitech.wikimedia.org
aborrero added a subtask for T305233: consider eliminating labweb/cloudweb hardware servers: T237773: Move Wikitech onto the production MW cluster.
Apr 1 2022, 10:08 AM · cloud-services-team (Kanban)
aborrero created T305233: consider eliminating labweb/cloudweb hardware servers.
Apr 1 2022, 10:05 AM · cloud-services-team (Kanban)

Mar 31 2022

aborrero updated subscribers of T305157: Openstack Wallaby on Debian 11 Bullseye problems because eventlet and dnspython.
Mar 31 2022, 3:43 PM · cloud-services-team (Kanban), Cloud-VPS
aborrero added a comment to T304694: upgrade codfw1dev to wallaby.

forked the eventlet/dnspython problem into T305157: Openstack Wallaby on Debian 11 Bullseye problems because eventlet and dnspython

Mar 31 2022, 3:01 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero created T305157: Openstack Wallaby on Debian 11 Bullseye problems because eventlet and dnspython.
Mar 31 2022, 3:00 PM · cloud-services-team (Kanban), Cloud-VPS
aborrero updated subscribers of T304694: upgrade codfw1dev to wallaby.

Again, @dcaro pointed to the dnspython/eventlet combination as problematic.

Mar 31 2022, 12:36 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero updated subscribers of T304694: upgrade codfw1dev to wallaby.

Latest theory by @dcaro is name resolution intermixed with IPv6 connectivity issues.

Mar 31 2022, 12:19 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero added a comment to T304694: upgrade codfw1dev to wallaby.

All 3 cloudcontrols show the same mariadb connectivity problem:

Mar 31 2022, 11:44 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero added a comment to T304694: upgrade codfw1dev to wallaby.

Neutron has been detected to be down @ codfw1dev after the upgrade.

Mar 31 2022, 11:38 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS

Mar 30 2022

aborrero awarded T302855: cloudcontrol1005 - Check unit status of backup_cinder_volumes a Barnstar token.
Mar 30 2022, 11:36 AM · Patch-For-Review, Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-Alert, cloud-services-team (Kanban), User-dcaro

Mar 29 2022

aborrero committed rCTKFe023cc530462: jobs-framework-cli: introduce list --long option (authored by aborrero).
jobs-framework-cli: introduce list --long option
Mar 29 2022, 5:05 PM
aborrero committed rCTKF8cd844d26d35: jobs-framework-cli: introduce custom User-Agent (authored by aborrero).
jobs-framework-cli: introduce custom User-Agent
Mar 29 2022, 5:05 PM
aborrero added a comment to T304845: gitlab: consider enabling docker container registry.

The idea to have these gitlab CI-related container images stored in gitlab itself came from this PoC:

Mar 29 2022, 4:15 PM · Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), GitLab (Administration, Settings & Policy), cloud-services-team (Kanban)
aborrero added a comment to T304900: toolforge-jobs should properly process 'out of quota' errors.

I confirm you are out of quota for more deployments.

Mar 29 2022, 1:52 PM · Toolforge Jobs framework
aborrero closed T286135: Toolforge jobs framework: email maintainers on job failure, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Mar 29 2022, 1:48 PM · Toolforge Jobs framework, cloud-services-team (Kanban)
aborrero closed T286135: Toolforge jobs framework: email maintainers on job failure as Resolved.

This is done for now.

Mar 29 2022, 1:48 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero updated the task description for T302863: Decision request - Toolforge kubernetes container images.
Mar 29 2022, 11:00 AM · Cloud Services Proposals
aborrero awarded T304918: cloud: horizon login fails with invalid credentials a Like token.
Mar 29 2022, 10:51 AM · Patch-For-Review, Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-Alert, cloud-services-team (Kanban), User-dcaro

Mar 28 2022

aborrero closed T302863: Decision request - Toolforge kubernetes container images as Resolved.

We had a meeting today.

Mar 28 2022, 4:47 PM · Cloud Services Proposals
aborrero moved T304845: gitlab: consider enabling docker container registry from Inbox to Watching on the cloud-services-team (Kanban) board.
Mar 28 2022, 3:00 PM · Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), GitLab (Administration, Settings & Policy), cloud-services-team (Kanban)
aborrero triaged T304845: gitlab: consider enabling docker container registry as Medium priority.
Mar 28 2022, 2:59 PM · Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), GitLab (Administration, Settings & Policy), cloud-services-team (Kanban)
aborrero created T304845: gitlab: consider enabling docker container registry.
Mar 28 2022, 2:59 PM · Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), GitLab (Administration, Settings & Policy), cloud-services-team (Kanban)
aborrero closed T304816: Toolforge grid queue problem: epilog failed as Resolved.
Mar 28 2022, 11:34 AM · cloud-services-team (Kanban), Toolforge
aborrero created T304816: Toolforge grid queue problem: epilog failed.
Mar 28 2022, 9:25 AM · cloud-services-team (Kanban), Toolforge

Mar 24 2022

aborrero added a comment to T304598: cloudgw: upgrade servers to Debian 11 Bullseye.

Will wait a few more days before upgrading eqiad server, to run a few more tests, merge these patches, etc.

Mar 24 2022, 4:53 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero updated the task description for T304598: cloudgw: upgrade servers to Debian 11 Bullseye.
Mar 24 2022, 4:35 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero changed the status of T304598: cloudgw: upgrade servers to Debian 11 Bullseye, a subtask of T300254: Upgrade codfw1dev to bullseye, from Open to In Progress.
Mar 24 2022, 12:54 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero changed the status of T304598: cloudgw: upgrade servers to Debian 11 Bullseye from Open to In Progress.
Mar 24 2022, 12:54 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero created T304598: cloudgw: upgrade servers to Debian 11 Bullseye.
Mar 24 2022, 12:51 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS

Mar 23 2022

aborrero updated the task description for T291915: toolforge: automate how we deploy custom k8s components.
Mar 23 2022, 4:21 PM · Toolforge, cloud-services-team (Kanban)
aborrero closed T303931: Decision request - WMCS kubernetes standard deployment code pattern as Resolved.

A meeting was held today and we agreed on going with Option 3, a deploy.sh script.

Mar 23 2022, 3:50 PM · Patch-For-Review, User-dcaro, cloud-services-team (Kanban), Cloud Services Proposals
aborrero updated the task description for T304504: cloud: review lldp setup on hypervisors and VMs.
Mar 23 2022, 12:14 PM · Cloud-Services-Worktype-Maintenance, Cloud-Services-Origin-Team, cloud-services-team (Kanban), User-dcaro
aborrero committed rCTJE6e5b11d630d2: emailer: config: increase task_compose_emails_loop_sleep value (authored by aborrero).
emailer: config: increase task_compose_emails_loop_sleep value
Mar 23 2022, 10:44 AM

Mar 22 2022

aborrero updated the task description for T302863: Decision request - Toolforge kubernetes container images.
Mar 22 2022, 1:15 PM · Cloud Services Proposals
aborrero added a comment to T302863: Decision request - Toolforge kubernetes container images.

I'm adding Option 4: Enable BYOC only for a few selected users that request it.

Mar 22 2022, 1:11 PM · Cloud Services Proposals
aborrero added a comment to T304420: upgrade cloudnet servers to Debian 11 Bullseye.

For the record:

Mar 22 2022, 12:43 PM · Cloud-VPS, cloud-services-team (Kanban)
aborrero updated the task description for T304420: upgrade cloudnet servers to Debian 11 Bullseye.
Mar 22 2022, 12:42 PM · Cloud-VPS, cloud-services-team (Kanban)
aborrero updated the task description for T304420: upgrade cloudnet servers to Debian 11 Bullseye.
Mar 22 2022, 12:39 PM · Cloud-VPS, cloud-services-team (Kanban)
aborrero added a subtask for T285944: Toolforge: beta phase for the new jobs framework: T304421: Allow customizing the out/err files with toolforge-jobs.
Mar 22 2022, 12:30 PM · Toolforge Jobs framework, cloud-services-team (Kanban)
aborrero added a parent task for T304421: Allow customizing the out/err files with toolforge-jobs: T285944: Toolforge: beta phase for the new jobs framework.
Mar 22 2022, 12:30 PM · Toolforge Jobs framework, cloud-services-team (Kanban)
aborrero created T304420: upgrade cloudnet servers to Debian 11 Bullseye.
Mar 22 2022, 12:05 PM · Cloud-VPS, cloud-services-team (Kanban)
aborrero added a comment to T286135: Toolforge jobs framework: email maintainers on job failure.

Done, note the 400:

2022-03-22 10:54:38 INFO: new configuration: {'task_compose_emails_loop_sleep': '400', 'task_send_emails_loop_sleep': '10', 'task_send_emails_max': '10', 'task_watch_pods_timeout': '60', 'task_read_configmap_sleep': '10', 'email_to_domain': 'tools.wmflabs.org', 'email_to_prefix': 'tools', 'email_from_addr': 'noreply@toolforge.org', 'smtp_server_fqdn': 'mail.tools.wmflabs.org', 'smtp_server_port': '25', 'send_emails_for_real': 'yes', 'debug': 'yes'}
Mar 22 2022, 10:55 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286135: Toolforge jobs framework: email maintainers on job failure.

As a quick countermeasure, I will try increasing the time we cache events before we send an email. Hopefully this is enough to catch repeated events.

Mar 22 2022, 10:52 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286135: Toolforge jobs framework: email maintainers on job failure.

I think I have a theory of what's happening. The k8s API is really chatty about events going on for pods, which is good, but forces the emailer to do some filtering and caching to avoid flooding you with meaningless emails, which could be tricky.

Mar 22 2022, 10:46 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
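
The filtering and caching described in the comment above could look roughly like the sketch below: events are held for a fixed window, repeats of the same event are collapsed into a counter, and one summary line is emitted per event once the window expires. This is not the emailer's actual code. The class `EventBatcher`, its methods, and the event reasons in the usage note are hypothetical, and treating the hold window as seconds is an assumption.

```python
from typing import Dict, List, Tuple


class EventBatcher:
    """Hold pod events for a while so that repeats collapse into one notice.

    Hypothetical illustration only; not the real Toolforge emailer.
    """

    def __init__(self, hold_seconds: float) -> None:
        self._hold = hold_seconds
        # (job, reason) -> (time first seen, number of occurrences)
        self._pending: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def add(self, job: str, reason: str, now: float) -> None:
        """Record an event; repeats only bump the counter."""
        key = (job, reason)
        first_seen, count = self._pending.get(key, (now, 0))
        self._pending[key] = (first_seen, count + 1)

    def flush(self, now: float) -> List[str]:
        """Return one summary line per event held for the full window."""
        due = [
            key
            for key, (first_seen, _) in self._pending.items()
            if now - first_seen >= self._hold
        ]
        lines = []
        for job, reason in due:
            _, count = self._pending.pop((job, reason))
            lines.append(f"{job}: {reason} (seen {count}x)")
        return lines
```

For example, with `EventBatcher(hold_seconds=400)`, two identical failure events for the same job arriving 10 seconds apart would produce a single line marked `seen 2x` once the window expires. A longer window catches more repeats at the cost of a later notification.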
aborrero added a comment to T286135: Toolforge jobs framework: email maintainers on job failure.

thanks!

Mar 22 2022, 10:17 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge

Mar 21 2022

aborrero added a comment to T286135: Toolforge jobs framework: email maintainers on job failure.

Can you please paste here the full repeated emails, with the complete email source and headers?

Mar 21 2022, 5:22 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero added a comment to T283894: Spike: Research hosts for preview environment.

Change 767249 had a related patch set uploaded (by Brennen Bearnes; author: Brennen Bearnes):

[operations/puppet@production] WIP: gitlab: enable agent server for kubernetes

https://gerrit.wikimedia.org/r/767249

Mar 21 2022, 4:49 PM · User-brennen, GitLab (Integrations), Patch-For-Review, preview-environment, User-jeena