
RLazarus (Reuven Lazarus) (rzl)
User

Projects (11)

User Details

User Since
Oct 15 2019, 4:02 PM (179 w, 6 d)
Availability
Available
IRC Nick
rzl
LDAP User
RLazarus
MediaWiki User
Unknown

Recent Activity

Mon, Mar 20

RLazarus awarded T331801: Webrequest Sampled Live on Superset shows data from only upload and not text CDN nodes a Love token.
Mon, Mar 20, 9:07 PM · User-fgiunchedi, SRE Observability, SRE

Sun, Mar 12

RLazarus created T331801: Webrequest Sampled Live on Superset shows data from only upload and not text CDN nodes.
Sun, Mar 12, 3:07 AM · User-fgiunchedi, SRE Observability, SRE

Fri, Mar 10

RLazarus added a comment to T331461: Logstash SLO excursion on 2023-02-11.

Perfect, thank you! I started https://wikitech.wikimedia.org/wiki/Incidents/2023-02-11_logstash_latency and filled in what we know so far (as far as I know, anyway).

Fri, Mar 10, 2:57 AM · Wikimedia-Logstash, Observability-Logging, SRE

Tue, Mar 7

RLazarus assigned T331461: Logstash SLO excursion on 2023-02-11 to lmata.
Tue, Mar 7, 5:46 PM · Wikimedia-Logstash, Observability-Logging, SRE
RLazarus triaged T331461: Logstash SLO excursion on 2023-02-11 as High priority.
Tue, Mar 7, 5:45 PM · Wikimedia-Logstash, Observability-Logging, SRE

Mon, Mar 6

RLazarus updated subscribers of T326363: mw2420-mw2451 service implementation tracking.

@akosiaris and @Clement_Goubert will come up with a cluster layout this week, and @Clement_Goubert wanted to try putting at least one or two into service themselves. Feel free to assign to me afterward to churn through the rest.

Mon, Mar 6, 4:52 PM · SRE, serviceops

Thu, Mar 2

RLazarus updated subscribers of T330973: Add unique error IDs to 4xx responses.

Adding @CDanis as we were just talking about something along these lines.

Thu, Mar 2, 5:07 PM · Patch-For-Review, SRE, Traffic

Feb 25 2023

RLazarus claimed T290989: switchdc cache warmup should include URLs that warmup relevant Wikidata caches.
Feb 25 2023, 12:08 AM · Data-Persistence (work done), Patch-For-Review, Datacenter-Switchover, wdwb-tech, Wikidata

Feb 24 2023

RLazarus closed T288867: Rewrite mw-warmup.js in Python, a subtask of T290989: switchdc cache warmup should include URLs that warmup relevant Wikidata caches, as Resolved.
Feb 24 2023, 6:42 PM · Data-Persistence (work done), Patch-For-Review, Datacenter-Switchover, wdwb-tech, Wikidata
RLazarus closed T288867: Rewrite mw-warmup.js in Python as Resolved.
Feb 24 2023, 6:42 PM · Performance-Team, serviceops

Feb 23 2023

RLazarus added a comment to T288867: Rewrite mw-warmup.js in Python.

Sure, we could look at adding a warmup step to the server repool process. Historically we haven't worried about it, because the impact for one host is much smaller than when the entire cluster is cold, but it's worth looking at. I'd rather make this change in-place first though, and then it would be easy to also install something on each host. (We might choose to wait and do this after mw-on-k8s, but we can take a look at it for sure.)

Feb 23 2023, 8:05 PM · Performance-Team, serviceops

Feb 21 2023

RLazarus added a comment to T326363: mw2420-mw2451 service implementation tracking.

We decided we'll put these into service after the upcoming DC switchover, so we'll make a plan at the March 6 serviceops meeting.

Feb 21 2023, 4:47 PM · SRE, serviceops

Feb 15 2023

RLazarus created T329791: Vopsbot doesn't have channel topic rights.
Feb 15 2023, 8:10 PM · Patch-For-Review, Observability-Alerting, SRE-OnFire, SRE

Feb 14 2023

RLazarus awarded T328623: db2181 crashed a Love token.
Feb 14 2023, 4:28 PM · ops-codfw, DBA

Feb 9 2023

RLazarus added a comment to T288867: Rewrite mw-warmup.js in Python.

Perfect, thanks!

Feb 9 2023, 7:46 PM · Performance-Team, serviceops
RLazarus added a comment to T288867: Rewrite mw-warmup.js in Python.

@Krinkle Are you aware of any current uses of warmup.js besides the DC switchover automation? Anywhere else I need to maintain compatibility, or adapt either humans or software to call the new script?

Feb 9 2023, 7:29 PM · Performance-Team, serviceops
RLazarus added a subtask for T290989: switchdc cache warmup should include URLs that warmup relevant Wikidata caches: T288867: Rewrite mw-warmup.js in Python.
Feb 9 2023, 7:20 PM · Data-Persistence (work done), Patch-For-Review, Datacenter-Switchover, wdwb-tech, Wikidata
RLazarus added a parent task for T288867: Rewrite mw-warmup.js in Python: T290989: switchdc cache warmup should include URLs that warmup relevant Wikidata caches.
Feb 9 2023, 7:20 PM · Performance-Team, serviceops

Feb 7 2023

RLazarus added a comment to T290989: switchdc cache warmup should include URLs that warmup relevant Wikidata caches.

POSTs will probably have to wait for the Python rewrite, but then they'll be easy. Can you recommend specific requests?

Feb 7 2023, 5:15 PM · Data-Persistence (work done), Patch-For-Review, Datacenter-Switchover, wdwb-tech, Wikidata

Feb 6 2023

RLazarus reopened T288867: Rewrite mw-warmup.js in Python as "In Progress".

I'm working on this.

Feb 6 2023, 10:21 PM · Performance-Team, serviceops
RLazarus added a comment to T290989: switchdc cache warmup should include URLs that warmup relevant Wikidata caches.

In that case it sounds like yes, we do need cache warming in eqiad before repooling it -- and we'll need to add URLs to warm up s8, per this task.

Feb 6 2023, 5:56 PM · Data-Persistence (work done), Patch-For-Review, Datacenter-Switchover, wdwb-tech, Wikidata

Feb 3 2023

RLazarus added a comment to T290989: switchdc cache warmup should include URLs that warmup relevant Wikidata caches.

Agree we don't need it in order to switch the RW site to codfw.

Feb 3 2023, 7:25 PM · Data-Persistence (work done), Patch-For-Review, Datacenter-Switchover, wdwb-tech, Wikidata

Feb 2 2023

RLazarus closed T328280: httpbb with HTTP POSTs and json payload as Resolved.

This is deployed! Thanks again for the patch, let me know if you need anything else.

Feb 2 2023, 8:33 PM · SRE-tools, Infrastructure-Foundations, Machine-Learning-Team

Feb 1 2023

RLazarus triaged T328623: db2181 crashed as High priority.
Feb 1 2023, 11:39 PM · ops-codfw, DBA

Jan 31 2023

RLazarus closed T328120: httpbb doesn't support integers in the POST's body as Resolved.
Jan 31 2023, 5:33 PM · Machine-Learning-Team, Infrastructure-Foundations, SRE-tools

Jan 27 2023

RLazarus closed T328162: Release httpbb 0.0.2, a subtask of T323707: httpbb shouldn't alert when large pages are occasionally slow, as Resolved.
Jan 27 2023, 11:11 PM · serviceops
RLazarus closed T328162: Release httpbb 0.0.2, a subtask of T328120: httpbb doesn't support integers in the POST's body, as Resolved.
Jan 27 2023, 11:11 PM · Machine-Learning-Team, Infrastructure-Foundations, SRE-tools
RLazarus closed T328162: Release httpbb 0.0.2 as Resolved.
Jan 27 2023, 11:11 PM · SRE-tools, Infrastructure-Foundations, serviceops
RLazarus added a subtask for T323707: httpbb shouldn't alert when large pages are occasionally slow: T328162: Release httpbb 0.0.2.
Jan 27 2023, 7:34 PM · serviceops
RLazarus added a subtask for T328120: httpbb doesn't support integers in the POST's body: T328162: Release httpbb 0.0.2.
Jan 27 2023, 7:34 PM · Machine-Learning-Team, Infrastructure-Foundations, SRE-tools
RLazarus added parent tasks for T328162: Release httpbb 0.0.2: T323707: httpbb shouldn't alert when large pages are occasionally slow, T328120: httpbb doesn't support integers in the POST's body.
Jan 27 2023, 7:34 PM · SRE-tools, Infrastructure-Foundations, serviceops
RLazarus changed the status of T328162: Release httpbb 0.0.2 from Open to In Progress.
Jan 27 2023, 7:34 PM · SRE-tools, Infrastructure-Foundations, serviceops
RLazarus created T328162: Release httpbb 0.0.2.
Jan 27 2023, 7:33 PM · SRE-tools, Infrastructure-Foundations, serviceops

Jan 18 2023

RLazarus closed T308952: get a legend for haproxy "anomalous session termination states" as Resolved.

(Back from vacation, sorry for the delay.) Yeah, I think we can close this. Thanks!

Jan 18 2023, 10:16 PM · SRE, Sustainability (Incident Followup)
RLazarus awarded T270526: Set CORS headers on error pages? a Love token.
Jan 18 2023, 6:38 PM · Traffic, SRE

Dec 14 2022

RLazarus committed rCCKBcd58bb944227: 08-start-maintenance: Remove cron-specific maintenance implementation details (authored by RLazarus).
Dec 14 2022, 3:28 PM
RLazarus committed rCCKBe51bab27e425: 00-warmup-caches: Repeat until execution time converges. (authored by RLazarus).
Dec 14 2022, 3:27 PM
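
The "repeat until execution time converges" idea in the 00-warmup-caches commit title can be illustrated roughly as follows. This is only a sketch of one possible reading: the feed does not show the cookbook's actual stopping rule, so the helper name `repeat_until_converged`, the relative `tolerance`, and the `max_runs` cap are all assumptions made for illustration.

```python
import time
from typing import Callable, List


def repeat_until_converged(
    run: Callable[[], None],
    tolerance: float = 0.1,      # assumed: stop when runs differ by <= 10%
    max_runs: int = 10,          # assumed safety cap, not from the source
    clock: Callable[[], float] = time.monotonic,
) -> List[float]:
    """Call run() repeatedly until two consecutive durations are close.

    Returns the list of measured durations, one per run.
    """
    durations: List[float] = []
    for _ in range(max_runs):
        start = clock()
        run()
        durations.append(clock() - start)
        if len(durations) >= 2:
            previous, current = durations[-2], durations[-1]
            # Converged: the latest run took about as long as the one before.
            if abs(previous - current) <= tolerance * previous:
                break
    return durations


# Demonstration with a fake clock, so no real waiting is involved.
# The simulated runs take 10 s, 4 s, and 3.8 s.
fake_ticks = iter([0, 10, 10, 14, 14, 17.8])
demo = repeat_until_converged(lambda: None, clock=lambda: next(fake_ticks))
print(len(demo))
```

A cold cache makes the first pass slow; once the cache is warm, consecutive passes take about the same time, which is the signal this sketch uses to stop.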
RLazarus committed rCCKB9eb02aada64c: 08-restart-envoy-on-jobrunners: Add a shorter __title__ (authored by RLazarus).
Dec 14 2022, 3:27 PM
RLazarus committed rCCKBf77ba5db26fb: sre.switchdc.mediawiki: Add a step to restart Envoy on jobrunners. (authored by RLazarus).
Dec 14 2022, 3:27 PM
RLazarus committed rCCKB4aba4a5102d2: 04-switch-mediawiki: Fix a backwards minus sign. (authored by RLazarus).
Dec 14 2022, 3:27 PM
RLazarus committed rCCKB509a36f600a4: 08-run-puppet-on-db-masters: Correct docstring (authored by RLazarus).
Dec 14 2022, 3:27 PM
RLazarus committed rCCKB74fa3fe65174: switchdc: Run Puppet on DB masters after setting read-write (authored by RLazarus).
Dec 14 2022, 3:27 PM
RLazarus committed rCCKB553aa73b2eb1: sre.switchdc.mediawiki: Add -ro targets to the TTL steps also. (authored by RLazarus).
Dec 14 2022, 3:26 PM
RLazarus committed rCCKBb8bdd99b63c2: sre.switchdc.mediawiki: Add -ro services and handle parsoid-php specially. (authored by RLazarus).
Dec 14 2022, 3:26 PM

Dec 2 2022

RLazarus added a comment to T306162: Decommission mw13[07-48].

Thanks @Volans. DC ops, for mw1320, I wasn't able to manually shut it off -- please do just kill the power when you go in to unrack it. Thanks!

Dec 2 2022, 6:15 PM · serviceops-radar, SRE, ops-eqiad, DC-Ops
RLazarus added a comment to T306162: Decommission mw13[07-48].

Over to dcops!

Dec 2 2022, 12:42 AM · serviceops-radar, SRE, ops-eqiad, DC-Ops
RLazarus changed the status of T306162: Decommission mw13[07-48] from Stalled to Open.
Dec 2 2022, 12:42 AM · serviceops-radar, SRE, ops-eqiad, DC-Ops
RLazarus changed the status of T306162: Decommission mw13[07-48], a subtask of T308339: eqiad: move non WMCS servers out of rack C8, from Stalled to Open.
Dec 2 2022, 12:42 AM · SRE, DBA, ops-eqiad

Dec 1 2022

RLazarus added a comment to T303162: Part-time coordinator Job Description.

Oops, yes, that should have been 306162. Thanks for the catch!

Dec 1 2022, 9:44 PM · Wiki Loves Monuments FY 2022-2023
RLazarus created P42203 (An Untitled Masterwork).
Dec 1 2022, 9:37 PM

Nov 29 2022

RLazarus closed T323707: httpbb shouldn't alert when large pages are occasionally slow as Resolved.

Maybe if the page we're trying to fetch is that cumbersome, we should switch to a different, lighter one?

Nov 29 2022, 12:25 AM · serviceops

Nov 24 2022

RLazarus added a comment to T323707: httpbb shouldn't alert when large pages are occasionally slow.

Changed my mind on this -- still going to look into other solutions, but I did bump the deadline to 60s so that it doesn't spuriously alert in the meantime.

Nov 24 2022, 1:55 AM · serviceops
RLazarus renamed T323707: httpbb shouldn't alert when large pages are occasionally slow from httpbb random read timeout on cumin2002 to httpbb shouldn't alert when large pages are occasionally slow.
Nov 24 2022, 1:13 AM · serviceops
RLazarus claimed T323707: httpbb shouldn't alert when large pages are occasionally slow.

Good find, thanks.

Nov 24 2022, 1:02 AM · serviceops

Oct 18 2022

RLazarus added a comment to T320994: Check DIMM A6 on db1131.

Thanks John!

Oct 18 2022, 8:52 PM · DBA, ops-eqiad, Sustainability (Incident Followup), SRE
RLazarus moved T320990: s6 master failure from Backlog to Pending Review & Scorecard on the SRE-OnFire board.

Draft: https://wikitech.wikimedia.org/wiki/Incidents/2022-10-15_s6_master_failure

Oct 18 2022, 3:03 AM · User-notice-archive, SRE-OnFire, Wikimedia-Incident, Data-Persistence, SRE

Oct 17 2022

RLazarus created T320994: Check DIMM A6 on db1131.
Oct 17 2022, 6:17 PM · DBA, ops-eqiad, Sustainability (Incident Followup), SRE
RLazarus added a parent task for T320879: Switchover s6 master (db1131 -> db1173): T320990: s6 master failure.
Oct 17 2022, 5:56 PM · DBA
RLazarus added a subtask for T320990: s6 master failure: T320879: Switchover s6 master (db1131 -> db1173).
Oct 17 2022, 5:56 PM · User-notice-archive, SRE-OnFire, Wikimedia-Incident, Data-Persistence, SRE
RLazarus triaged T320990: s6 master failure as Medium priority.
Oct 17 2022, 5:55 PM · User-notice-archive, SRE-OnFire, Wikimedia-Incident, Data-Persistence, SRE

Oct 14 2022

RLazarus added a comment to T320773: db1143 is lagging with its replica.

I was surprised to see SHOW PROCESSLIST didn't empty out after "very few minutes" like wt:MariaDB/troubleshooting#Depooling_a_replica suggests -- for the record, here's what it looked like, in full, 15-20 minutes after @Joe depooled it: P35486

Oct 14 2022, 4:11 AM · DBA
RLazarus created P35486 (An Untitled Masterwork).
Oct 14 2022, 4:09 AM

Oct 13 2022

RLazarus added a comment to T320749: SLO dashboards with N latency targets.

Oh, one more angle to think about! There are two Maps services we're writing SLOs for. For Kartotherian, we're just planning a request latency target at the 50th and 95th percentiles -- i.e., the Prometheus query is the same except for the percentile. But for Tegola, we're measuring two different types of latency (HTTP request latency vs. cache operation time), each of those at p50 and p95. So we'd end up writing different queries, too.

Oct 13 2022, 6:18 PM · SRE Observability (FY2022/2023-Q3), User-herron, Observability-Metrics, serviceops, observability, SRE, Maps
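
The comment above on T320749 describes one latency metric at two percentiles for Kartotherian, and two latency metrics at two percentiles each for Tegola. A minimal sketch of how such queries could be generated follows. It assumes the latencies are exported as Prometheus histograms, and every metric name below is hypothetical; neither is stated in the task.

```python
from typing import Dict, List

# Hypothetical metric names, for illustration only.
LATENCY_METRICS: Dict[str, List[str]] = {
    "kartotherian": ["kartotherian_request_duration_seconds"],
    "tegola": [
        "tegola_http_request_duration_seconds",
        "tegola_cache_operation_duration_seconds",
    ],
}
QUANTILES = [0.5, 0.95]  # p50 and p95, as in the comment


def quantile_query(quantile: float, histogram: str, window: str = "5m") -> str:
    """Build a PromQL histogram_quantile() expression for one histogram."""
    return (
        f"histogram_quantile({quantile}, "
        f"sum by (le) (rate({histogram}_bucket[{window}])))"
    )


def slo_queries(service: str) -> List[str]:
    """One query per (metric, quantile) pair for the given service."""
    return [
        quantile_query(q, metric)
        for metric in LATENCY_METRICS[service]
        for q in QUANTILES
    ]


for name in LATENCY_METRICS:
    for query in slo_queries(name):
        print(name, query)
```

Under these assumptions Kartotherian yields two queries that differ only in the quantile, while Tegola yields four, because the underlying metric changes as well.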
RLazarus triaged T320749: SLO dashboards with N latency targets as Medium priority.
Oct 13 2022, 5:47 PM · SRE Observability (FY2022/2023-Q3), User-herron, Observability-Metrics, serviceops, observability, SRE, Maps
RLazarus created T320749: SLO dashboards with N latency targets.
Oct 13 2022, 5:47 PM · SRE Observability (FY2022/2023-Q3), User-herron, Observability-Metrics, serviceops, observability, SRE, Maps
RLazarus triaged T320748: Get Kartotherian SLO metrics into Prometheus as Medium priority.
Oct 13 2022, 5:32 PM · Observability-Metrics, serviceops, SRE, Maps (Kartotherian)

Sep 21 2022

RLazarus added a comment to T318281: Systematic PCC error.

I was wondering if https://gerrit.wikimedia.org/r/c/operations/puppet/+/831230 might be related but I couldn't work out exactly what's going on here.

Sep 21 2022, 8:26 PM · SRE, Data Pipelines, puppet-compiler, Infrastructure-Foundations

Sep 14 2022

RLazarus triaged T317799: Rate limiting for hotlinked images as High priority.
Sep 14 2022, 6:30 PM · SRE-Sprint-Week-Sustainability-March2023, Traffic, Patch-For-Review, Sustainability (Incident Followup)
RLazarus added a project to T317794: requestctl can't act on cache hits: Sustainability (Incident Followup).
Sep 14 2022, 6:27 PM · SRE-Sprint-Week-Sustainability-March2023, Patch-For-Review, Traffic, Sustainability (Incident Followup), conftool
RLazarus triaged T317794: requestctl can't act on cache hits as Medium priority.
Sep 14 2022, 6:01 PM · SRE-Sprint-Week-Sustainability-March2023, Patch-For-Review, Traffic, Sustainability (Incident Followup), conftool
RLazarus created T317794: requestctl can't act on cache hits.
Sep 14 2022, 6:00 PM · SRE-Sprint-Week-Sustainability-March2023, Patch-For-Review, Traffic, Sustainability (Incident Followup), conftool

Aug 29 2022

RLazarus closed T316560: Update the videoscaler alert to point at the correct runbook as Resolved.

T312947 already tracks the larger question of how to organize runbooks for ProbeDown effectively.

Aug 29 2022, 5:24 PM · serviceops

Aug 23 2022

RLazarus added a comment to T304800: Set API and appserver weights in eqiad.

That sounds right to me; it would give us the same distribution as codfw, which is probably as much work as we need to do on this. I don't think it's worth investing time into any deliberate benchmarking, but if the cluster happens to saturate unevenly again in a future incident, we can tweak opportunistically.

Aug 23 2022, 4:31 PM · Sustainability (Incident Followup), serviceops, SRE

Aug 21 2022

RLazarus triaged T315742: Replication stopped on db1143 as High priority.
Aug 21 2022, 1:50 AM · DBA, SRE

Aug 5 2022

RLazarus committed rOSCTd9b5357fd356: requestctl: Add a reminder to "requestctl commit" after enable/disable (authored by RLazarus).
Aug 5 2022, 12:09 AM

Aug 2 2022

RLazarus added a comment to T293012: Productionise mc20[38-55].

!log rzl@cumin2002 START - Cookbook sre.hosts.remove-downtime for mc2038.codfw.wmnet
!log rzl@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2038.codfw.wmnet

Aug 2 2022, 6:42 PM · Patch-For-Review, serviceops
RLazarus added a comment to T293012: Productionise mc20[38-55].

Due to T309956 I'm moving ahead with mc2038 early, and using it to replace mc2024 which is currently out of service.

Aug 2 2022, 6:06 PM · Patch-For-Review, serviceops

Jul 31 2022

RLazarus added a comment to T313823: domain name Wikkipedia.be.

Yep.

Jul 31 2022, 8:13 PM · SecTeam-Processed, Security, Traffic, Domains, SRE
RLazarus changed the visibility for T313823: domain name Wikkipedia.be.
Jul 31 2022, 8:13 PM · SecTeam-Processed, Security, Traffic, Domains, SRE

Jul 29 2022

RLazarus closed T313823: domain name Wikkipedia.be as Resolved.

Marking this resolved, thank you very much!

Jul 29 2022, 9:32 PM · SecTeam-Processed, Security, Traffic, Domains, SRE

Jul 28 2022

RLazarus committed rOSCT4cbd1d90fa9f: requestctl: Add a missing f on an f-string (authored by RLazarus).
Jul 28 2022, 1:55 PM
RLazarus added a comment to T313730: decommission mw2251-mw2255, mw2257-mw2258.

@Papaul All yours!

Jul 28 2022, 12:00 AM · SRE, ops-codfw, serviceops
RLazarus reassigned T313730: decommission mw2251-mw2255, mw2257-mw2258 from RLazarus to Papaul.
Jul 28 2022, 12:00 AM · SRE, ops-codfw, serviceops

Jul 27 2022

RLazarus renamed T313730: decommission mw2251-mw2255, mw2257-mw2258 from decomission mw2251-mw2258 to decomission mw2251-mw2255, mw2257-mw2258.
Jul 27 2022, 5:51 PM · SRE, ops-codfw, serviceops
RLazarus added a comment to T313730: decommission mw2251-mw2255, mw2257-mw2258.

N.B. this is only seven hosts, mw225[1-5,7-8] -- mw2256 was already decommed in T263065.

Jul 27 2022, 5:50 PM · SRE, ops-codfw, serviceops
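
The host count in the comment above can be checked by expanding the bracket notation. The helper below, `expand_hosts`, is a hypothetical illustration written for this purpose; it handles a single bracket group only and is not the range parser used by any production tooling.

```python
import re
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand one bracketed numeric range, e.g. 'mw225[1-5,7-8]'."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        return [pattern]  # no bracket group: treat as a single host name
    prefix, body, suffix = match.groups()
    hosts: List[str] = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a lone number such as '7' is a one-element range
        width = len(start)  # preserve zero padding such as '07'
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


hosts = expand_hosts("mw225[1-5,7-8]")
print(len(hosts), hosts)
# 7 ['mw2251', 'mw2252', 'mw2253', 'mw2254', 'mw2255', 'mw2257', 'mw2258']
```

The expansion skips mw2256, matching the note that it was already decommissioned.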
RLazarus updated subscribers of T313823: domain name Wikkipedia.be.

Hi @CRoslof -- adding you here, per my email just now.

Jul 27 2022, 1:12 AM · SecTeam-Processed, Security, Traffic, Domains, SRE
RLazarus closed T310557: Shellbox resource management as Resolved.
Jul 27 2022, 12:46 AM · SRE-OnFire, Sustainability (Incident Followup), Shellbox, serviceops, SRE
RLazarus added a project to T313384: eqiad row C switch fabric recabling: Sustainability (Incident Followup).
Jul 27 2022, 12:23 AM · Sustainability (Incident Followup), SRE, Infrastructure-Foundations, ops-eqiad, netops
RLazarus moved T313382: asw2-c5-eqiad crash from Backlog to Pending Review & Scorecard on the SRE-OnFire board.
Jul 27 2022, 12:22 AM · SRE-OnFire, Sustainability (Incident Followup), cloud-services-team (Kanban), SRE, Infrastructure-Foundations, netops
RLazarus added projects to T313382: asw2-c5-eqiad crash: Sustainability (Incident Followup), SRE-OnFire.
Jul 27 2022, 12:22 AM · SRE-OnFire, Sustainability (Incident Followup), cloud-services-team (Kanban), SRE, Infrastructure-Foundations, netops
RLazarus created T313879: Phabricator: Unable to view tasks in DB read-only mode.
Jul 27 2022, 12:18 AM · SRE-Sprint-Week-Sustainability-March2023, serviceops-collab, Release-Engineering-Team (Radar), serviceops-radar, Sustainability (Incident Followup), Phabricator

Jul 26 2022

RLazarus added a comment to T310557: Shellbox resource management.

With https://gerrit.wikimedia.org/r/813924 we ought to see smaller bursts in utilization, so I'm going to tentatively crank the shellbox replicas back down to 8, where it was before https://gerrit.wikimedia.org/r/803953.

Jul 26 2022, 12:00 AM · SRE-OnFire, Sustainability (Incident Followup), Shellbox, serviceops, SRE

Jul 25 2022

RLazarus closed T312319: Reduce Lilypond shellouts from VisualEditor as Resolved.

Yep, looks much better! Closing.

Jul 25 2022, 11:57 PM · MW-1.39-notes (1.39.0-wmf.21; 2022-07-18), Editing-team, Editing-Team-Request, MediaWiki-extensions-Score, Sustainability (Incident Followup), Shellbox, serviceops, SRE
RLazarus closed T312319: Reduce Lilypond shellouts from VisualEditor, a subtask of T310557: Shellbox resource management, as Resolved.
Jul 25 2022, 11:56 PM · SRE-OnFire, Sustainability (Incident Followup), Shellbox, serviceops, SRE
RLazarus claimed T313730: decommission mw2251-mw2255, mw2257-mw2258.
Jul 25 2022, 10:00 PM · SRE, ops-codfw, serviceops
RLazarus added a comment to T312319: Reduce Lilypond shellouts from VisualEditor.

I'd like to redo @Legoktm's manual test first, and make sure I can't reproduce a spike -- I'll do that later today, and close afterward.

Jul 25 2022, 5:42 PM · MW-1.39-notes (1.39.0-wmf.21; 2022-07-18), Editing-team, Editing-Team-Request, MediaWiki-extensions-Score, Sustainability (Incident Followup), Shellbox, serviceops, SRE

Jul 23 2022

RLazarus triaged T313634: Survey the third-party library market for UA policy compliance as Low priority.
Jul 23 2022, 12:13 AM · SRE

Jul 8 2022

RLazarus added a comment to T255511: mcrouter memcached flapping in gutter pool.

I can't find any incident documentation for an incident on 2020-06-08, and I'm unclear on what problem was caused by mcrouter flapping. Was mc1029 slow, able to serve VERSION probes, but unable to serve a significant amount of real traffic? So while mc1029 was pooled, requests were handled slowly or gave errors?

Jul 8 2022, 11:09 PM · SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup), serviceops

Jul 7 2022

RLazarus created T312319: Reduce Lilypond shellouts from VisualEditor.
Jul 7 2022, 2:37 AM · MW-1.39-notes (1.39.0-wmf.21; 2022-07-18), Editing-team, Editing-Team-Request, MediaWiki-extensions-Score, Sustainability (Incident Followup), Shellbox, serviceops, SRE

Jun 13 2022

RLazarus triaged T310557: Shellbox resource management as Medium priority.
Jun 13 2022, 11:03 PM · SRE-OnFire, Sustainability (Incident Followup), Shellbox, serviceops, SRE