
RLazarus (Reuven Lazarus) (rzl)
User

User Details

User Since
Oct 15 2019, 4:02 PM (140 w, 5 d)
Availability
Available
IRC Nick
rzl
LDAP User
RLazarus
MediaWiki User
Unknown

Recent Activity

Mon, Jun 13

RLazarus triaged T310557: Shellbox resource management as Medium priority.
Mon, Jun 13, 11:03 PM · Shellbox, serviceops, SRE
RLazarus created T310557: Shellbox resource management.
Mon, Jun 13, 11:03 PM · Shellbox, serviceops, SRE

Fri, Jun 10

RLazarus updated the task description for T300324: Upgrade Envoy to supported version.
Fri, Jun 10, 6:34 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy

Mon, Jun 6

RLazarus triaged T310009: Make it easier to create a new requestctl object as Medium priority.
Mon, Jun 6, 5:50 PM · Sustainability (Incident Followup), conftool, SRE

Wed, Jun 1

RLazarus added a comment to T299914: Bring current version of Metrics Platform schema inline with current ERD.

@EChetty Hi from the SLO project! Thanks for this -- Will asked me to have a look at the schema, from the POV of where we're going next with the edit save SLO. Broadly I think this is going to have everything we need, just a couple of clarification questions.

Wed, Jun 1, 2:26 AM · Patch-For-Review, Metrics-Platform

Sat, May 28

RLazarus created T309447: Icinga paged for a host that should have been downtimed.
Sat, May 28, 7:20 PM · SRE-tools, Infrastructure-Foundations, Icinga, observability, SRE

May 16 2022

RLazarus updated the task description for T308294: Grant Access to `wmf` for `Dmantena`.
May 16 2022, 8:58 PM · Data-Engineering, SRE, LDAP-Access-Requests
RLazarus closed T308294: Grant Access to `wmf` for `Dmantena` as Resolved.

Great, thanks!

May 16 2022, 8:57 PM · Data-Engineering, SRE, LDAP-Access-Requests

May 13 2022

RLazarus added a project to T308350: Access to trusted gitlab runners for gitlab-roots (or appropriate similar group): Infrastructure-Foundations.

Hmm, also: As a group access change, this should be reviewed and approved in the Infrastructure-Foundations team meeting.

May 13 2022, 6:36 PM · Infrastructure-Foundations, SRE, Release-Engineering-Team (GitLab-a-thon 🦊), serviceops, GitLab (CI & Job Runners), User-brennen, SRE-Access-Requests
RLazarus triaged T308350: Access to trusted gitlab runners for gitlab-roots (or appropriate similar group) as Medium priority.
May 13 2022, 6:28 PM · Infrastructure-Foundations, SRE, Release-Engineering-Team (GitLab-a-thon 🦊), serviceops, GitLab (CI & Job Runners), User-brennen, SRE-Access-Requests
RLazarus moved T308350: Access to trusted gitlab runners for gitlab-roots (or appropriate similar group) from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.

We're past the European work day, so I don't expect a response from Lukasz (who's OOO) or Alex before Monday. I'm sure next week's SRE clinician will pick this up once it's signed off.

May 13 2022, 6:28 PM · Infrastructure-Foundations, SRE, Release-Engineering-Team (GitLab-a-thon 🦊), serviceops, GitLab (CI & Job Runners), User-brennen, SRE-Access-Requests
RLazarus triaged T308308: Requesting access to the deployment POSIX group for aikochou and kevinbazira as Medium priority.
May 13 2022, 5:45 PM · Machine-Learning-Team (Active Tasks), SRE, SRE-Access-Requests
RLazarus changed the status of T308308: Requesting access to the deployment POSIX group for aikochou and kevinbazira from Open to Stalled.

I may have created this task too soon; some discussion on T305729 is still happening, so let's wait before proceeding.

May 13 2022, 5:35 PM · Machine-Learning-Team (Active Tasks), SRE, SRE-Access-Requests
RLazarus added a comment to T308294: Grant Access to `wmf` for `Dmantena`.

Hi @Dmantena, you should be all set now:

May 13 2022, 3:19 AM · Data-Engineering, SRE, LDAP-Access-Requests
RLazarus moved T308294: Grant Access to `wmf` for `Dmantena` from Backlog to Awaiting User Input on the LDAP-Access-Requests board.
May 13 2022, 3:18 AM · Data-Engineering, SRE, LDAP-Access-Requests
RLazarus added a member for WMF-NDA: Dmantena.
May 13 2022, 3:12 AM

May 11 2022

thcipriani awarded T307351: Add new user identity to Keyholder for scap a Love token.
May 11 2022, 10:35 PM · SRE-Access-Requests, SRE, Scap
RLazarus closed T307351: Add new user identity to Keyholder for scap as Resolved.

This should be all set!

May 11 2022, 7:20 PM · SRE-Access-Requests, SRE, Scap

May 10 2022

RLazarus closed T308053: Requesting access to analytics-privatedata-users for RoccoMo as Resolved.

Added to ldap/nda:

May 10 2022, 8:40 PM · SRE, SRE-Access-Requests
RLazarus closed T305373: +2 for esther-akinloose in Gerrit (mediawiki/extensions/VisualEditor) as Resolved.
rzl@mwmaint1002:~$ ldapsearch -x cn=wmf | grep esther-akinloose
member: uid=esther-akinloose,ou=people,dc=wikimedia,dc=org
May 10 2022, 8:01 PM · SRE, LDAP-Access-Requests, User-zeljkofilipin
RLazarus added a comment to T308053: Requesting access to analytics-privatedata-users for RoccoMo.

Thanks both! Proceeding.

May 10 2022, 8:01 PM · SRE, SRE-Access-Requests
RLazarus updated the task description for T308053: Requesting access to analytics-privatedata-users for RoccoMo.
May 10 2022, 8:01 PM · SRE, SRE-Access-Requests
RLazarus added a member for WMF-NDA: EAkinloose.
May 10 2022, 7:58 PM
RLazarus added a comment to T305373: +2 for esther-akinloose in Gerrit (mediawiki/extensions/VisualEditor).

After consulting with SRE colleagues, I stand corrected -- the email address on the account is fine, and we'll just use the wikimedia.org address in our own records. Going ahead!

May 10 2022, 7:44 PM · SRE, LDAP-Access-Requests, User-zeljkofilipin
RLazarus updated subscribers of T308053: Requesting access to analytics-privatedata-users for RoccoMo.

@RoccoMo Hi from the SRE team! Thanks for the request, we'll get you sorted out shortly.

May 10 2022, 6:31 PM · SRE, SRE-Access-Requests
RLazarus added a comment to T305373: +2 for esther-akinloose in Gerrit (mediawiki/extensions/VisualEditor).

I think she already has a wikitech account: https://wikitech.wikimedia.org/wiki/User:Esther_Akinloose

May 10 2022, 4:55 PM · SRE, LDAP-Access-Requests, User-zeljkofilipin
RLazarus updated subscribers of T307351: Add new user identity to Keyholder for scap.

Thanks both @Joe and @thcipriani for the ping -- agreed clinic duty is as good a route for this as any.

May 10 2022, 12:56 AM · SRE-Access-Requests, SRE, Scap

May 9 2022

RLazarus closed T307737: Grant Access to PII in Superset for HMonroy and Dmaza as Resolved.

You're both in the wmf group already, so nothing to do there:

May 9 2022, 7:16 PM · SRE-Access-Requests, SRE, Community-Tech
RLazarus added a comment to T305373: +2 for esther-akinloose in Gerrit (mediawiki/extensions/VisualEditor).

Just picking up SRE clinic duty for the week -- I'm so sorry this has been sitting for so long! I'll ask around and try to find out what happened here.

May 9 2022, 7:04 PM · SRE, LDAP-Access-Requests, User-zeljkofilipin
RLazarus placed T307737: Grant Access to PII in Superset for HMonroy and Dmaza up for grabs.

Grabbing this from @jhathaway as I've taken over SRE clinic duty for this week. This is actually the right template for the use case, since Superset access is via LDAP, not SSH (even though PII access is controlled with a posix group). We can go right ahead with it.

May 9 2022, 6:14 PM · SRE-Access-Requests, SRE, Community-Tech

Apr 26 2022

RLazarus triaged T306860: Videoscalers fail health checks while CPU is maxed as High priority.
Apr 26 2022, 2:36 AM · Sustainability (Incident Followup), WMF-JobQueue, serviceops, SRE

Apr 18 2022

RLazarus updated the task description for T306397: Service Ops SRE support for iOS notifications update.
Apr 18 2022, 9:45 PM · serviceops, SRE
RLazarus triaged T306397: Service Ops SRE support for iOS notifications update as Medium priority.
Apr 18 2022, 9:44 PM · serviceops, SRE
RLazarus created T306397: Service Ops SRE support for iOS notifications update.
Apr 18 2022, 9:44 PM · serviceops, SRE

Apr 14 2022

RLazarus claimed T305581: ipblocks support for other "entities" (not clouds, not abuse nets).

Yeah -- I can do the implementation but I'm not sure if we've settled on what we want it to look like.

Apr 14 2022, 9:26 PM · Patch-For-Review, SRE, conftool

Apr 12 2022

Dzahn awarded T299705: Debian package for httpbb a Grey Medal token.
Apr 12 2022, 5:07 PM · Patch-For-Review, serviceops, SRE

Apr 8 2022

RLazarus added a comment to T299989: Pairing tool for new SREs using sudo under supervision.

@MoritzMuehlenhoff Checking in -- have you had any time to take a look at this?

Apr 8 2022, 1:55 AM · User-MoritzMuehlenhoff, SRE-tools, Infrastructure-Foundations, SRE

Apr 5 2022

RLazarus closed T299705: Debian package for httpbb, a subtask of T236699: Build a black-box httpd testing framework, as Resolved.
Apr 5 2022, 8:35 PM · Wikimedia-Apache-configuration, SRE, serviceops
RLazarus closed T299705: Debian package for httpbb as Resolved.
Apr 5 2022, 8:35 PM · Patch-For-Review, serviceops, SRE

Mar 31 2022

RLazarus added a project to T305119: Slow query bringing down s7: SRE-OnFire.

Draft incident report: https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-03-31_api_errors

Mar 31 2022, 8:31 PM · SRE-OnFire, Vuln-DoS, SecTeam-Processed, Sustainability (Incident Followup), Data-Persistence (Consultation), Platform Engineering, MediaWiki-extensions-CentralAuth, Security, Security-Team
RLazarus moved T305119: Slow query bringing down s7 from Backlog to Pending Review & Scorecard on the SRE-OnFire board.
Mar 31 2022, 8:31 PM · SRE-OnFire, Vuln-DoS, SecTeam-Processed, Sustainability (Incident Followup), Data-Persistence (Consultation), Platform Engineering, MediaWiki-extensions-CentralAuth, Security, Security-Team

Mar 29 2022

RLazarus added a comment to T289202: Run httpbb periodically.

Another way I'd like to improve this is to deal with Puppet skew on the two hosts.

Mar 29 2022, 2:32 AM · Patch-For-Review, serviceops, SRE

Mar 28 2022

RLazarus added a comment to T228812: Place names get cut off and unreadable on tile boundary.

Found another example of this, in case the extra data helps -- thanks @MSantos and @ECohen_WMDE for pointing me to the right task. (Moved here from T228612.)

Mar 28 2022, 9:15 PM · Maps (Kartotherian)
RLazarus added a comment to T228612: Place names get cut off and unreadable due to placement of neighbouring labels.

@RLazarus - I feel like your screenshots are actually an example of the bug in this task T228812: Place names get cut off and unreadable on tile boundary

Mar 28 2022, 9:15 PM · Maps (Map-Styles)

Mar 27 2022

RLazarus created T304800: Set API server weights.
Mar 27 2022, 8:13 PM · Sustainability (Incident Followup), serviceops, SRE
RLazarus created T304799: Investigate shorter-lived persistent connections for Envoy.
Mar 27 2022, 8:11 PM · Sustainability (Incident Followup), ChangeProp, envoy, serviceops, SRE

Mar 24 2022

RLazarus triaged T304660: Better automated validation of Puppet-generated Envoy configs as Medium priority.
Mar 24 2022, 9:26 PM · Patch-For-Review, serviceops, envoy, SRE
RLazarus awarded T292606: Grafana share button drops duplicate URL params a Like token.
Mar 24 2022, 9:01 PM · Observability-Metrics, SRE
RLazarus reopened T303230: Refactor envoy HTTP protocol options to new version as "In Progress".
Mar 24 2022, 6:08 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus reopened T303230: Refactor envoy HTTP protocol options to new version, a subtask of T300324: Upgrade Envoy to supported version, as In Progress.
Mar 24 2022, 6:08 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus closed T303230: Refactor envoy HTTP protocol options to new version, a subtask of T300324: Upgrade Envoy to supported version, as Resolved.
Mar 24 2022, 5:05 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus closed T303230: Refactor envoy HTTP protocol options to new version as Resolved.
Mar 24 2022, 5:05 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy

Mar 23 2022

RLazarus changed the status of T303230: Refactor envoy HTTP protocol options to new version from Stalled to In Progress.
Mar 23 2022, 11:40 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus changed the status of T303230: Refactor envoy HTTP protocol options to new version, a subtask of T300324: Upgrade Envoy to supported version, from Stalled to In Progress.
Mar 23 2022, 11:40 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus added a comment to T300324: Upgrade Envoy to supported version.

Hmm, the 1.21.1 build didn't work out of the box. Running build-envoy-deb buster future got me this:

Mar 23 2022, 6:49 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus closed T303770: Clean up Puppet support for Envoy v2 config API as Resolved.
Mar 23 2022, 5:26 PM · Beta-Cluster-Infrastructure, SRE, serviceops, envoy
RLazarus closed T303770: Clean up Puppet support for Envoy v2 config API, a subtask of T300324: Upgrade Envoy to supported version, as Resolved.
Mar 23 2022, 5:26 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy

Mar 22 2022

RLazarus updated RLazarus.
Mar 22 2022, 8:48 PM
RLazarus updated RLazarus.
Mar 22 2022, 8:46 PM
RLazarus updated RLazarus.
Mar 22 2022, 8:45 PM
RLazarus changed IRC Nick from rlazarus to rzl on RLazarus.
Mar 22 2022, 8:45 PM
RLazarus updated the task description for T300324: Upgrade Envoy to supported version.
Mar 22 2022, 12:20 AM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus added a comment to T300324: Upgrade Envoy to supported version.

As in T300324#7752134, I've rolled out all the k8s services where Envoy version was the only diff. We're now up to 1.18 everywhere, except for k8s services with other undeployed changes, and I'll follow up with those at the end.

Mar 22 2022, 12:20 AM · Patch-For-Review, SRE, Traffic, serviceops, envoy

Mar 20 2022

RLazarus updated the task description for T304237: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet.
Mar 20 2022, 5:26 PM · Patch-For-Review, Infrastructure-Foundations, serviceops, SRE
RLazarus triaged T304237: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet as High priority.
Mar 20 2022, 5:22 PM · Patch-For-Review, Infrastructure-Foundations, serviceops, SRE

Mar 17 2022

RLazarus updated the task description for T300324: Upgrade Envoy to supported version.
Mar 17 2022, 10:35 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus changed the status of T304124: Refactor envoy max_requests_per_connection from Cluster to HttpProtocolOptions, a subtask of T300324: Upgrade Envoy to supported version, from Open to Stalled.
Mar 17 2022, 10:26 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus changed the status of T304124: Refactor envoy max_requests_per_connection from Cluster to HttpProtocolOptions from Open to Stalled.
Mar 17 2022, 10:26 PM · SRE, serviceops, envoy
RLazarus created T304124: Refactor envoy max_requests_per_connection from Cluster to HttpProtocolOptions.
Mar 17 2022, 10:21 PM · SRE, serviceops, envoy

Mar 16 2022

RLazarus added a project to T304005: Etherpads corrupted: SRE.

From the time sliders it looks like the issue is that all or part of the pad gets deleted and replaced by a character, at these revisions respectively:

Mar 16 2022, 8:13 PM · SRE, serviceops, Wikimedia-Etherpad

Mar 14 2022

RLazarus added a project to T303770: Clean up Puppet support for Envoy v2 config API: Beta-Cluster-Infrastructure.
Mar 14 2022, 8:27 PM · Beta-Cluster-Infrastructure, SRE, serviceops, envoy
RLazarus updated subscribers of T303770: Clean up Puppet support for Envoy v2 config API.
Mar 14 2022, 8:21 PM · Beta-Cluster-Infrastructure, SRE, serviceops, envoy
RLazarus created T303770: Clean up Puppet support for Envoy v2 config API.
Mar 14 2022, 8:21 PM · Beta-Cluster-Infrastructure, SRE, serviceops, envoy

Mar 11 2022

RLazarus updated the task description for T303231: Refactor envoy access_log_path to access loggers.
Mar 11 2022, 6:01 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy

Mar 10 2022

RLazarus added a comment to T300119: Using port in Host header for thanos-swift / thanos-query breaks vhost selection.

Oh, yep, it's strip_matching_host_port in the HTTP connection manager: https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto

Mar 10 2022, 6:09 PM · SRE Observability (FY2021/2022-Q4), Patch-For-Review, envoy, serviceops, User-fgiunchedi
RLazarus added a comment to T300119: Using port in Host header for thanos-swift / thanos-query breaks vhost selection.

I just upgraded thanos-fe to envoy 1.18.3, but out of the box I see the same behavior:

Mar 10 2022, 6:06 PM · SRE Observability (FY2021/2022-Q4), Patch-For-Review, envoy, serviceops, User-fgiunchedi
RLazarus added a parent task for T287983: Raw "upstream connect error or disconnect/reset before headers. reset reason: overflow" error message shown to users during outage: Unknown Object (Task).
Mar 10 2022, 5:10 PM · serviceops, envoy, Sustainability (Incident Followup), SRE, Traffic

Mar 9 2022

RLazarus updated the task description for T300324: Upgrade Envoy to supported version.
Mar 9 2022, 6:15 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy

Mar 7 2022

RLazarus triaged T303231: Refactor envoy access_log_path to access loggers as Medium priority.
Mar 7 2022, 11:05 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus created T303231: Refactor envoy access_log_path to access loggers.
Mar 7 2022, 11:02 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus changed the status of T303230: Refactor envoy HTTP protocol options to new version from Open to Stalled.
Mar 7 2022, 10:55 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus changed the status of T303230: Refactor envoy HTTP protocol options to new version, a subtask of T300324: Upgrade Envoy to supported version, from Open to Stalled.
Mar 7 2022, 10:55 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus created T303230: Refactor envoy HTTP protocol options to new version.
Mar 7 2022, 10:53 PM · Patch-For-Review, SRE, Traffic, serviceops, envoy

Mar 4 2022

RLazarus updated the task description for T300324: Upgrade Envoy to supported version.
Mar 4 2022, 1:49 AM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus added a comment to T300324: Upgrade Envoy to supported version.

1.15.4 is still running in a few places on k8s -- after bumping the default version, I rolled out all services where that was the only diff. Some services had some undeployed changes from who knows how long ago, so I left them untouched (T265979 for that problem in general).

Mar 4 2022, 1:46 AM · Patch-For-Review, SRE, Traffic, serviceops, envoy
RLazarus updated the task description for T300324: Upgrade Envoy to supported version.
Mar 4 2022, 1:40 AM · Patch-For-Review, SRE, Traffic, serviceops, envoy

Mar 3 2022

RLazarus added a comment to T302842: SLO dashboard refinements.

To take a step back, the varnish slo dashboard linked in the description didn't actually originate from a template. Presumably this one was a manual fork of the original etcd slo example dashboard that's been manually adjusted.

Mar 3 2022, 11:05 PM · SRE Observability (FY2021/2022-Q4), SRE

Mar 1 2022

RLazarus created T302842: SLO dashboard refinements.
Mar 1 2022, 10:09 PM · SRE Observability (FY2021/2022-Q4), SRE

Feb 25 2022

RLazarus triaged T302549: Evaluate whether and how to route abuse@ emails to Legal as Low priority.
Feb 25 2022, 12:41 AM · Mail, SRE, Infrastructure-Foundations

Feb 24 2022

RLazarus added a comment to T301505: upstream connect error or disconnect/reset before headers. reset reason: overflow.

Thanks for letting us know! We did indeed have this issue again for a few minutes earlier (intermittently between 02:36 and 03:00 UTC) but things are back to normal now. Sorry for the inconvenience, and more permanent solutions are in progress to keep this from happening again.

Feb 24 2022, 3:41 AM · User-Ladsgroup, SRE, Wikimedia-Incident

Feb 23 2022

RLazarus added a comment to T287983: Raw "upstream connect error or disconnect/reset before headers. reset reason: overflow" error message shown to users during outage.

This came up again in T301507.

Feb 23 2022, 10:42 PM · serviceops, envoy, Sustainability (Incident Followup), SRE, Traffic

Feb 17 2022

RLazarus claimed T301505: upstream connect error or disconnect/reset before headers. reset reason: overflow.
Feb 17 2022, 2:18 AM · User-Ladsgroup, SRE, Wikimedia-Incident

Feb 14 2022

RLazarus closed T301606: Requesting access to analytics-privatedata-users for rzl as Resolved.

Done, thanks!

Feb 14 2022, 5:56 PM · SRE, SRE-Access-Requests
RLazarus updated the task description for T301606: Requesting access to analytics-privatedata-users for rzl.
Feb 14 2022, 5:56 PM · SRE, SRE-Access-Requests

Feb 12 2022

RLazarus created T301606: Requesting access to analytics-privatedata-users for rzl.
Feb 12 2022, 2:28 AM · SRE, SRE-Access-Requests
RLazarus created T301605: Central and South American countries in geo-maps.
Feb 12 2022, 2:08 AM · DNS, Traffic

Feb 9 2022

RLazarus updated subscribers of T228612: Place names get cut off and unreadable due to placement of neighbouring labels.

Found another example of this, in case the extra data helps -- thanks @MSantos for pointing me to the right task.

Feb 9 2022, 11:27 PM · Maps (Map-Styles)
RLazarus claimed T57857: Unit tests for apache config/rewrites.
Feb 9 2022, 10:33 PM · Wikimedia-Apache-configuration
RLazarus placed T57857: Unit tests for apache config/rewrites up for grabs.
Feb 9 2022, 10:33 PM · Wikimedia-Apache-configuration
RLazarus closed T237407: basic prometheus monitoring for PoolCounter as Resolved.
Feb 9 2022, 10:26 PM · SRE, observability, serviceops