aborrero (arturo)
SRE at Wikimedia Cloud Services Team

Projects (9)

User Details

User Since
Oct 23 2017, 12:19 PM (197 w, 2 d)
Availability
Available
IRC Nick
arturo
LDAP User
Arturo Borrero Gonzalez
MediaWiki User
ABorrero (WMF)

I'm Arturo Borrero Gonzalez from Seville, Spain. I'm a Site Reliability Engineer (SRE) on the Wikimedia Cloud Services Team and a Wikimedia Foundation staff member.

You may find me in some FLOSS projects, like Netfilter and Debian.

Recent Activity

Thu, Jul 22

aborrero added a comment to T287107: CloudVPS: we may need DNS records for neutron port VIP addresses.

Just imagine this situation:

Thu, Jul 22, 4:23 PM · cloud-services-team (Kanban), PAWS
aborrero closed T287077: toolforge-jobs: Indicate which containers are deprecated as Resolved.
Thu, Jul 22, 3:36 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T287077: toolforge-jobs: Indicate which containers are deprecated, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Thu, Jul 22, 3:35 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286601: Stop some services before and healthcheck labstore1004/5 following row C network change.

For completeness, drbd after the operation:

Thu, Jul 22, 3:24 PM · cloud-services-team (Kanban)
aborrero closed T286601: Stop some services before and healthcheck labstore1004/5 following row C network change, a subtask of T286065: Switch buffer re-partition - Eqiad Row C, as Resolved.
Thu, Jul 22, 3:22 PM · Patch-For-Review, DBA, Analytics, Infrastructure-Foundations, SRE, netops
aborrero closed T286601: Stop some services before and healthcheck labstore1004/5 following row C network change as Resolved.
Thu, Jul 22, 3:22 PM · cloud-services-team (Kanban)
aborrero added a comment to T286601: Stop some services before and healthcheck labstore1004/5 following row C network change.

Done. No issues detected with the connection test.

Thu, Jul 22, 3:08 PM · cloud-services-team (Kanban)
aborrero closed T286614: Communicate wikireplicas outage and healthcheck the system after Eqiad Row C network changes, a subtask of T286065: Switch buffer re-partition - Eqiad Row C, as Resolved.
Thu, Jul 22, 3:07 PM · Patch-For-Review, DBA, Analytics, Infrastructure-Foundations, SRE, netops
aborrero closed T286614: Communicate wikireplicas outage and healthcheck the system after Eqiad Row C network changes as Resolved.

Done. No packet drop detected on this test.

Thu, Jul 22, 3:07 PM · cloud-services-team (Kanban)
aborrero added a comment to T286601: Stop some services before and healthcheck labstore1004/5 following row C network change.

I'll leave this test running during the operation:

Thu, Jul 22, 2:53 PM · cloud-services-team (Kanban)
aborrero added a comment to T286614: Communicate wikireplicas outage and healthcheck the system after Eqiad Row C network changes.

I'll leave this ping test running for the time of the operation:

Thu, Jul 22, 2:51 PM · cloud-services-team (Kanban)
aborrero added a comment to T287077: toolforge-jobs: Indicate which containers are deprecated.

I think I'll go the configmap route. Let me write a patch.

Thu, Jul 22, 11:10 AM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286601: Stop some services before and healthcheck labstore1004/5 following row C network change.

For the record, before the operation:

Thu, Jul 22, 10:59 AM · cloud-services-team (Kanban)
aborrero renamed T287107: CloudVPS: we may need DNS records for neutron port VIP addresses from CloudVPS: VMs cannot seem to curl public IPs unless they also have public IPs, even with an open security group to CloudVPS: we may need DNS records for neutron port VIP addresses.
Thu, Jul 22, 10:51 AM · cloud-services-team (Kanban), PAWS
aborrero added a comment to T287107: CloudVPS: we may need DNS records for neutron port VIP addresses.

So a simple solution here is to connect to the neutron port VIP directly, instead of the floating IP address. Didn't we have a trick in the resolver to work around this? Or did we drop it?

We do (labsaliaser), but this address is not directly assigned to a VM (it's managed with keepalived), so the resolver does not find the address when looping through all instances and their addresses from Nova.

Thu, Jul 22, 10:39 AM · cloud-services-team (Kanban), PAWS
aborrero added a comment to T287107: CloudVPS: we may need DNS records for neutron port VIP addresses.

I see at least one thing I can explain.

Thu, Jul 22, 10:17 AM · cloud-services-team (Kanban), PAWS
aborrero triaged T287107: CloudVPS: we may need DNS records for neutron port VIP addresses as Medium priority.
Thu, Jul 22, 9:28 AM · cloud-services-team (Kanban), PAWS
aborrero updated the task description for T286601: Stop some services before and healthcheck labstore1004/5 following row C network change.
Thu, Jul 22, 9:24 AM · cloud-services-team (Kanban)
aborrero added a comment to T286784: toolforge-jobs: figure out default quotas and limits.

@aborrero if all that sounds reasonable, I can try to start implementing it quickly, before you are done with the beta, so we get feedback.

Thu, Jul 22, 9:15 AM · cloud-services-team (Kanban), Toolforge

Wed, Jul 21

aborrero placed T277549: neutron: investigate using IRC conntrack helpers to improve IRC bots connectiviy up for grabs.
Wed, Jul 21, 3:47 PM · cloud-services-team (Kanban)
aborrero placed T278436: Toolforge: clarify usefullness of 'deb-tools.wmflabs.org' and refresh it if so up for grabs.
Wed, Jul 21, 3:47 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
aborrero placed T209011: Change routing to ensure that traffic originating from Cloud VPS is seen as non-private IPs by Wikimedia wikis up for grabs.
Wed, Jul 21, 3:47 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS
aborrero placed T268335: cloud: neutron l3 agent: improve failover handling up for grabs.
Wed, Jul 21, 3:47 PM · Patch-For-Review, cloud-services-team (Kanban)
aborrero placed T273730: potential NAT overflow up for grabs.
Wed, Jul 21, 3:47 PM · cloud-services-team (Kanban), Cloud-VPS
aborrero placed T273734: consider storing information on cloud NAT mappings up for grabs.
Wed, Jul 21, 3:47 PM · cloud-services-team (Kanban), Cloud-VPS
aborrero placed T273739: Get performance team green light for Cloud NAT to wikis change up for grabs.
Wed, Jul 21, 3:46 PM · Performance-Team (Radar), cloud-services-team (Kanban), Cloud-VPS
aborrero placed T273942: sbuild isn't behaving well in tools up for grabs.
Wed, Jul 21, 3:46 PM · Toolforge, cloud-services-team (Kanban)
aborrero placed T275865: Toolforge: migrate bastions to Debian Buster up for grabs.
Wed, Jul 21, 3:46 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
aborrero placed T277653: Toolforge: migrate grid to Debian Buster up for grabs.
Wed, Jul 21, 3:46 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
aborrero placed T278748: Toolforge: introduce support for selecting grid queue release up for grabs.
Wed, Jul 21, 3:45 PM · Patch-For-Review, Toolforge, cloud-services-team (Kanban)
aborrero placed T256881: wmcs: evaluate impact of stretch-backports being archived up for grabs.
Wed, Jul 21, 3:45 PM · Patch-For-Review, cloud-services-team (Kanban)
aborrero added a comment to T287077: toolforge-jobs: Indicate which containers are deprecated.

The current approach, as of this writing, to avoid the hardcoded container list is to fetch the containers from the registry at API startup time.
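
Fetching a container list once at startup could look roughly like the sketch below. It assumes the registry speaks the Docker Registry HTTP API v2 and exposes `/v2/_catalog`; the comment does not say which endpoint the jobs API actually queries, and the function names and the `registry_url` parameter are hypothetical.

```python
import json
from typing import List
from urllib.request import urlopen


def parse_catalog(payload: str) -> List[str]:
    """Extract repository names from a Registry v2 catalog response body."""
    data = json.loads(payload)
    # The v2 catalog endpoint answers with {"repositories": [...]}.
    return sorted(data.get("repositories", []))


def fetch_container_list(registry_url: str, timeout: float = 10.0) -> List[str]:
    """Query the registry once (e.g. at API startup) and return image names.

    registry_url is a hypothetical parameter; the real registry address is
    not given in this feed.
    """
    url = f"{registry_url.rstrip('/')}/v2/_catalog"
    with urlopen(url, timeout=timeout) as response:
        return parse_catalog(response.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes the startup step easy to test without a live registry.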

Wed, Jul 21, 2:31 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286108: toolforge-jobs: Clean up old individual job objects as Resolved.
Wed, Jul 21, 12:11 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286108: toolforge-jobs: Clean up old individual job objects, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Wed, Jul 21, 12:10 PM · cloud-services-team (Kanban), Toolforge
aborrero committed rCTKF687c063badb4: wait: if the job doesn't exists it means it was already pruned by k8s (authored by aborrero).
wait: if the job doesn't exists it means it was already pruned by k8s
Wed, Jul 21, 11:49 AM
aborrero added a comment to T287036: Figure out a patched backport of systemd 241 for stretch.

@aborrero do you think we can expect a patched backport from the official repo eventually?

Wed, Jul 21, 8:41 AM · Toolforge, cloud-services-team (Kanban)

Tue, Jul 20

aborrero closed T286492: toolforge-jobs: load jobs from a file, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Tue, Jul 20, 6:07 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286492: toolforge-jobs: load jobs from a file as Resolved.
Tue, Jul 20, 6:07 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286824: toolforge-jobs cli: commands are missing or truncated in list/show output.

Oh, I actually know what happened here. I changed the way we store the command in the job definition. Now we wrap the command in a /bin/sh exec to support file logging (T286485), and that's preventing the jobs API parser from working properly.
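
A minimal sketch of the kind of wrapping described above, and of how a parser could recover the original command from it. The exact wrapper format and log file names used by the jobs framework are not shown in this feed, so `wrap_command`, `unwrap_command`, and the `job.out`/`job.err` names are all hypothetical.

```python
import shlex
from typing import List


def wrap_command(user_command: str, out_log: str = "job.out",
                 err_log: str = "job.err") -> List[str]:
    """Wrap a user command in /bin/sh so its output is appended to log files.

    The log file names are illustrative defaults, not the framework's own.
    """
    redirect = f"exec 1>>{shlex.quote(out_log)} 2>>{shlex.quote(err_log)}"
    return ["/bin/sh", "-c", f"{redirect}; {user_command}"]


def unwrap_command(argv: List[str]) -> str:
    """Return the user's command, stripping the logging wrapper if present."""
    is_wrapped = (
        len(argv) == 3
        and argv[:2] == ["/bin/sh", "-c"]
        and argv[2].startswith("exec ")
    )
    if is_wrapped:
        # Everything after the first "; " is the original command.
        # This assumes the log file paths themselves contain no "; ".
        _, _, user_command = argv[2].partition("; ")
        return user_command
    # Not wrapped: show the argument vector as a single shell-quoted string.
    return shlex.join(argv)
```

A parser that only expects the bare command would display the whole `/bin/sh -c ...` line, or nothing at all, unless it strips the wrapper in a step like `unwrap_command`.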

Tue, Jul 20, 6:05 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286824: toolforge-jobs cli: commands are missing or truncated in list/show output.

I just uploaded new versions of both the API and the CLI.

Tue, Jul 20, 6:01 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286126: toolforge-jobs: allow setting limits and requests as Resolved.
Tue, Jul 20, 5:49 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286126: toolforge-jobs: allow setting limits and requests, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Tue, Jul 20, 5:48 PM · cloud-services-team (Kanban), Toolforge
aborrero committed rCTKFd5ac84e6b975: d/changelog: generate entry for release 3 (authored by aborrero).
d/changelog: generate entry for release 3
Tue, Jul 20, 5:40 PM
aborrero committed rCTKFfbab6bc079b5: jobs-framework-cli: add basic test suite (authored by aborrero).
jobs-framework-cli: add basic test suite
Tue, Jul 20, 5:20 PM
aborrero committed rCTKF118f5b53d99d: toolforge-jobs: move CLI configuration into a file (authored by aborrero).
toolforge-jobs: move CLI configuration into a file
Tue, Jul 20, 4:49 PM
aborrero committed rCTKF099a554a7b4f: toolforge-jobs: don't show unknown fields (authored by aborrero).
toolforge-jobs: don't show unknown fields
Tue, Jul 20, 4:46 PM
aborrero closed T286600: failover cloud NFS from labstore1007 to labstore1006, a subtask of T286069: Switch buffer re-partition - Eqiad Row D, as Resolved.
Tue, Jul 20, 9:54 AM · Patch-For-Review, DBA, cloud-services-team (Kanban), Analytics, Infrastructure-Foundations, SRE, netops
aborrero closed T286600: failover cloud NFS from labstore1007 to labstore1006 as Resolved.
Tue, Jul 20, 9:54 AM · cloud-services-team (Kanban)
aborrero updated the task description for T286069: Switch buffer re-partition - Eqiad Row D.
Tue, Jul 20, 9:52 AM · Patch-For-Review, DBA, cloud-services-team (Kanban), Analytics, Infrastructure-Foundations, SRE, netops

Mon, Jul 19

aborrero updated the task description for T286614: Communicate wikireplicas outage and healthcheck the system after Eqiad Row C network changes.
Mon, Jul 19, 5:14 PM · cloud-services-team (Kanban)
aborrero claimed T286614: Communicate wikireplicas outage and healthcheck the system after Eqiad Row C network changes.
Mon, Jul 19, 4:30 PM · cloud-services-team (Kanban)
aborrero triaged T286600: failover cloud NFS from labstore1007 to labstore1006 as Medium priority.
Mon, Jul 19, 4:30 PM · cloud-services-team (Kanban)
aborrero claimed T286601: Stop some services before and healthcheck labstore1004/5 following row C network change.
Mon, Jul 19, 4:29 PM · cloud-services-team (Kanban)
aborrero added a comment to T286600: failover cloud NFS from labstore1007 to labstore1006.

I just checked this: labstore1007 is already standby. labstore1006 is active. No actions are required for this server.

Mon, Jul 19, 4:29 PM · cloud-services-team (Kanban)
aborrero updated the task description for T286600: failover cloud NFS from labstore1007 to labstore1006.
Mon, Jul 19, 4:28 PM · cloud-services-team (Kanban)
aborrero claimed T286600: failover cloud NFS from labstore1007 to labstore1006.
Mon, Jul 19, 4:14 PM · cloud-services-team (Kanban)

Fri, Jul 16

aborrero committed rCTKF30dc3e85ad42: toolforge-jobs: add support for job resource limits (authored by aborrero).
toolforge-jobs: add support for job resource limits
Fri, Jul 16, 5:19 PM
aborrero committed rCTKFbd5d3e759f5d: toolforge-jobs: fix parser group for --no-filelog option (authored by aborrero).
toolforge-jobs: fix parser group for --no-filelog option
Fri, Jul 16, 5:16 PM
aborrero committed rCTKF177f1d8c06db: toolforge-jobs: add new option to load jobs from a YAML file (authored by aborrero).
toolforge-jobs: add new option to load jobs from a YAML file
Fri, Jul 16, 5:15 PM
aborrero committed rCTKFeadb68286559: toolforge-jobs: introduce job name hint in wait report (authored by aborrero).
toolforge-jobs: introduce job name hint in wait report
Fri, Jul 16, 5:13 PM
aborrero closed T286132: toolforge-jobs: "Status: Unknown" when job is running as Resolved.
Fri, Jul 16, 2:05 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286132: toolforge-jobs: "Status: Unknown" when job is running, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Fri, Jul 16, 2:04 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286108: toolforge-jobs: Clean up old individual job objects.

I suspect we'll be on k8s 1.21 before we leave the beta phase for this.

How long do you expect the beta phase to last?

Fri, Jul 16, 1:23 PM · cloud-services-team (Kanban), Toolforge
aborrero updated the task description for T286784: toolforge-jobs: figure out default quotas and limits.
Fri, Jul 16, 12:51 PM · cloud-services-team (Kanban), Toolforge
aborrero created T286784: toolforge-jobs: figure out default quotas and limits.
Fri, Jul 16, 12:47 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286108: toolforge-jobs: Clean up old individual job objects.

I wasn't uncomfortable with having to delete each job individually after completion. It allowed me to review execution results and status.
But I totally understand the desire for them to be auto-cleaned up.

Fri, Jul 16, 12:44 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286135: Toolforge jobs framework: email maintainers on job failure.

I'm currently evaluating this: https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/ i.e., simply executing a shell one-liner to send an email from within the job pod.

Fri, Jul 16, 12:37 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286003: NFS backups on aptly aren't set up for tools-services-05.tools.eqiad1.wikimedia.cloud and jessie packages are gone.

I added the package back!

Fri, Jul 16, 11:58 AM · Toolforge, cloud-services-team (Kanban)
aborrero added a comment to T286197: Evaluate nginx-controller as an Ingress.

Sharing a bit of our experience at WMCS with ingress-nginx:

Fri, Jul 16, 11:39 AM · MW-on-K8s, serviceops, SRE

Thu, Jul 15

JJMC89 awarded T286492: toolforge-jobs: load jobs from a file a Like token.
Thu, Jul 15, 5:04 PM · cloud-services-team (Kanban), Toolforge
aborrero committed rCTKFab56766426c8: d/changelog: generate entry for release 2 (authored by aborrero).
d/changelog: generate entry for release 2
Thu, Jul 15, 4:18 PM
aborrero closed T285963: jobs framework should not error out when executed by a non-tool account, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Thu, Jul 15, 4:15 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T285963: jobs framework should not error out when executed by a non-tool account as Resolved.
Thu, Jul 15, 4:15 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T285979: toolforge-jobs with --wait hangs indefinitely if the job fails as Resolved.
Thu, Jul 15, 4:14 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T285979: toolforge-jobs with --wait hangs indefinitely if the job fails, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Thu, Jul 15, 4:14 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286485: toolforge-jobs: figure out logging as Resolved.
Thu, Jul 15, 4:14 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero closed T286485: toolforge-jobs: figure out logging, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Thu, Jul 15, 4:14 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286107: toolforge-jobs: Allow specifying arguments to commands as Resolved.
Thu, Jul 15, 4:13 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T286107: toolforge-jobs: Allow specifying arguments to commands, a subtask of T285944: Toolforge: beta phase for the new jobs framework, as Resolved.
Thu, Jul 15, 4:13 PM · cloud-services-team (Kanban), Toolforge
aborrero committed rCTKFa716aed12674: toolforge-jobs: document command arguments (authored by aborrero).
toolforge-jobs: document command arguments
Thu, Jul 15, 12:22 PM
aborrero committed rCTKF120492263d7c: toolforge-jobs: add support new filelog option (authored by aborrero).
toolforge-jobs: add support new filelog option
Thu, Jul 15, 12:21 PM
aborrero committed rCTKF1cc819620100: toolforge-jobs: make --wait smarter (authored by aborrero).
toolforge-jobs: make --wait smarter
Thu, Jul 15, 12:20 PM
aborrero committed rCTKF50d859ca2633: toolforge-jobs: explicit warning if not running as tool account (authored by aborrero).
toolforge-jobs: explicit warning if not running as tool account
Thu, Jul 15, 12:19 PM
aborrero updated subscribers of T286003: NFS backups on aptly aren't set up for tools-services-05.tools.eqiad1.wikimedia.cloud and jessie packages are gone.

I have more information on this.

Thu, Jul 15, 10:52 AM · Toolforge, cloud-services-team (Kanban)

Mon, Jul 12

aborrero added a comment to T285944: Toolforge: beta phase for the new jobs framework.

Is there a way to dump/export the toolforge-jobs config so it can be committed in Git? And then a way to import that config back. I tend to commit crontabs for documentation and transparency purposes.

Mon, Jul 12, 4:49 PM · cloud-services-team (Kanban), Toolforge
aborrero created T286492: toolforge-jobs: load jobs from a file.
Mon, Jul 12, 4:49 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286126: toolforge-jobs: allow setting limits and requests.

I don't see any other option here but add additional parameters to our REST API. Will do!

Mon, Jul 12, 4:37 PM · cloud-services-team (Kanban), Toolforge
aborrero updated the task description for T286485: toolforge-jobs: figure out logging.
Mon, Jul 12, 3:05 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero created T286485: toolforge-jobs: figure out logging.
Mon, Jul 12, 3:03 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286135: Toolforge jobs framework: email maintainers on job failure.

Is this something the grid supports today?

Mon, Jul 12, 12:06 PM · cloud-services-team (Kanban), Toolforge
aborrero added a comment to T286107: toolforge-jobs: Allow specifying arguments to commands.

For the record, I identified the problem early in the development of the framework. It was a conscious decision to leave it out of the first iteration because I wanted to collect clear use cases and expectations.
I was hoping that the wrapper approach suggested on Wikitech would be enough to cover basically all cases.

Mon, Jul 12, 9:48 AM · cloud-services-team (Kanban), Toolforge
aborrero moved T284823: Delete content from labtestwikitech from Clinic Duty to Needs discussion on the cloud-services-team (Kanban) board.
Mon, Jul 12, 9:33 AM · cloud-services-team (Kanban), wikitech.wikimedia.org
aborrero added a comment to T286065: Switch buffer re-partition - Eqiad Row C.

> @aborrero does cloudgw require manual failover?

Mon, Jul 12, 8:59 AM · Patch-For-Review, DBA, Analytics, Infrastructure-Foundations, SRE, netops

Jul 2 2021

aborrero added a comment to T285944: Toolforge: beta phase for the new jobs framework.

If toolforge-jobs is only available on dev-buster.toolforge.org – is the SSH fingerprint for that host available somewhere? (I don’t see it on Help:SSH Fingerprints yet, nor in the draft announcement.)

Jul 2 2021, 8:54 AM · cloud-services-team (Kanban), Toolforge

Jul 1 2021

Krinkle awarded T285944: Toolforge: beta phase for the new jobs framework a Love token.
Jul 1 2021, 4:09 PM · cloud-services-team (Kanban), Toolforge
aborrero triaged T285944: Toolforge: beta phase for the new jobs framework as Medium priority.
Jul 1 2021, 3:33 PM · cloud-services-team (Kanban), Toolforge
aborrero created T285944: Toolforge: beta phase for the new jobs framework.
Jul 1 2021, 12:38 PM · cloud-services-team (Kanban), Toolforge
aborrero closed T283238: Toolforge: develop jobs-framework-api as Resolved.

The initial build out is completed. Closing the ticket now.

Jul 1 2021, 12:10 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero closed T283238: Toolforge: develop jobs-framework-api, a subtask of T251917: Design the Jobs service in k8s, as Resolved.
Jul 1 2021, 12:09 PM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
aborrero committed rCTKF19d115bfa2b0: debian/: run wrap-and-sort, refresh json dependency (authored by aborrero).
debian/: run wrap-and-sort, refresh json dependency
Jul 1 2021, 11:49 AM