lmata (Leo Mata)
SRE

Projects (13)

User Details

User Since
May 14 2020, 7:26 PM (175 w, 1 d)
Availability
Available
IRC Nick
lmata
LDAP User
LMata
MediaWiki User
LMata (WMF) [ Global Accounts ]

Recent Activity

Thu, Sep 21

lmata awarded T344136: Upgrade LibreNMS to 23.7.0 or higher a Love token.
Thu, Sep 21, 2:23 AM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)

Wed, Sep 20

lmata moved T346438: [Epic] Review alerting strategy for Data Platform SRE from Inbox to Radar on the observability board.

Moving to radar to keep an eye out in case you need our help. Thanks!

Wed, Sep 20, 2:10 PM · Epic, Data-Platform-SRE, observability
lmata moved T346807: Review alerting around Search update pipeline from Inbox to Radar on the observability board.

Moving to radar. As I understand it, you're already in contact with @andrea.denisse, so this seems good to go. Please let me know if we can assist further.

Wed, Sep 20, 2:08 PM · Epic, Data-Platform-SRE, observability
lmata removed a project from T346893: Investigate swagger-exporter failures: observability.
Wed, Sep 20, 2:06 PM · Patch-For-Review, Observability-Alerting, serviceops

Mon, Sep 18

lmata placed T288623: Observability tools are easy to use, docs easy to read, help easy to find. up for grabs.
Mon, Sep 18, 5:41 PM · SRE Observability

Fri, Sep 15

lmata added a project to T343025: Identify path forward for k8s deployment of prometheus-statsd-exporter: serviceops.

@Kappakayala, this is a task for the K8s work.

Fri, Sep 15, 3:23 PM · Patch-For-Review, serviceops, Observability-Metrics, SRE Observability (FY2023/2024-Q1)

Thu, Sep 14

lmata moved T346360: New VictorOps user request from Inbox to Done on the SRE Observability (FY2023/2024-Q1) board.
Thu, Sep 14, 9:30 PM · SRE Observability (FY2023/2024-Q1), observability
lmata closed T346360: New VictorOps user request as Resolved.

Invite sent, lmk if you have any issues!

Thu, Sep 14, 9:29 PM · SRE Observability (FY2023/2024-Q1), observability
lmata awarded T346318: Fix librenms/alertmanager integration a Like token.
Thu, Sep 14, 9:26 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)

Wed, Sep 13

lmata awarded T341488: Split Thanos components from thanos-fe hosts into titan hosts a Like token.
Wed, Sep 13, 2:48 PM · SRE Observability (FY2023/2024-Q1), User-fgiunchedi, SRE-swift-storage, Observability-Metrics
lmata moved T346144: Hardcode the SLO time windows in Grafana dashboards generated via Grizzly from Inbox to Radar on the observability board.
Wed, Sep 13, 2:06 PM · SRE Observability (FY2023/2024-Q1), serviceops, observability
lmata added a project to T346144: Hardcode the SLO time windows in Grafana dashboards generated via Grizzly: SRE Observability (FY2023/2024-Q1).

adding to quarter for tracking

Wed, Sep 13, 2:05 PM · SRE Observability (FY2023/2024-Q1), serviceops, observability

Tue, Sep 12

lmata added a comment to T342179: Q1:rack/setup/install titan100[12].

Thank you!

Tue, Sep 12, 3:47 PM · SRE Observability, SRE, observability, ops-eqiad, DC-Ops

Sat, Sep 9

lmata updated the task description for T343023: Deploy StatsD Exporter to production.
Sat, Sep 9, 1:19 AM · User-herron, Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T345970: Deploy StatsD exporter for Kubernetes.
Sat, Sep 9, 1:18 AM · MW-on-K8s, serviceops, User-herron, Observability-Metrics, SRE Observability (FY2023/2024-Q1)

Wed, Sep 6

lmata added a project to T213902: Implement sensitive logstash access control: SRE Observability (FY2023/2024-Q2).
Wed, Sep 6, 4:53 PM · SRE Observability (FY2023/2024-Q2), Observability-Logging, Patch-For-Review
lmata moved T345202: Implement alerting for Growth-consumed or Growth-managed services/pipelines from Inbox to Radar on the observability board.
Wed, Sep 6, 3:15 PM · observability, Growth-Team
lmata moved T345204: Alert the Growth team when number of available task recommendations drops significantly from Inbox to Radar on the observability board.

Moving to radar, please let me know if there's anything we can help with.

Wed, Sep 6, 3:15 PM · MW-1.41-notes (1.41.0-wmf.27; 2023-09-19), Growth-Team (Current Sprint), Growth-Structured-Tasks, observability
lmata moved T328117: Move performance.w.o to be backed by an active/active discovery record from Inbox to Prioritized on the Observability-Metrics board.
Wed, Sep 6, 3:11 PM · Observability-Metrics
lmata triaged T328117: Move performance.w.o to be backed by an active/active discovery record as Medium priority.
Wed, Sep 6, 3:10 PM · Observability-Metrics

Mon, Sep 4

lmata awarded T344748: MediaWiki Core - Review and merge StatsLib patch a Love token.
Mon, Sep 4, 7:03 PM · MediaWiki-Platform-Team, Observability-Metrics

Thu, Aug 31

lmata added a comment to T344937: Decom dispatch infrastructure.

Sounds good to me. I've also asked the SRE-OnFire team to confirm we are no longer interested in dispatch. Maybe allow another week for us to meet, and then we can move forward.

Thu, Aug 31, 10:46 PM · Patch-For-Review, Incident Tooling, User-herron

Wed, Aug 30

lmata changed the status of T344136: Upgrade LibreNMS to 23.7.0 or higher from Open to In Progress.
Wed, Aug 30, 4:59 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)

Aug 23 2023

lmata added a comment to T313229: Production Dispatch Infrastructure.

@BCornwall I'm good with that. Do you want to do the honors?

Aug 23 2023, 9:00 PM · Incident Tooling, User-herron
lmata archived Observability-Performance.
Aug 23 2023, 2:25 PM
lmata edited projects for T328117: Move performance.w.o to be backed by an active/active discovery record, added: SRE Observability; removed Observability-Performance.
Aug 23 2023, 2:24 PM · Observability-Metrics

Aug 22 2023

lmata added a project to T344748: MediaWiki Core - Review and merge StatsLib patch: MediaWiki-Platform-Team.
Aug 22 2023, 5:49 PM · MediaWiki-Platform-Team, Observability-Metrics

Aug 16 2023

lmata moved T274377: Ingest Cron and Root Alerts Into Logstash from Up next to Inbox on the SRE Observability (FY2023/2024-Q1) board.
Aug 16 2023, 2:44 PM · SRE Observability (FY2023/2024-Q1)
lmata moved T343529: Prometheus doesn't reload or alert on expired client certificates from Inbox to Up next on the SRE Observability (FY2023/2024-Q1) board.
Aug 16 2023, 2:44 PM · Prod-Kubernetes, SRE Observability (FY2023/2024-Q1), Observability-Metrics, User-fgiunchedi, Kubernetes, serviceops-radar
lmata closed T290263: Observability wikitech documentation update, a subtask of T288623: Observability tools are easy to use, docs easy to read, help easy to find., as Resolved.
Aug 16 2023, 2:43 PM · SRE Observability
lmata closed T290263: Observability wikitech documentation update as Resolved.
Aug 16 2023, 2:43 PM · SRE Observability (FY2023/2024-Q1), Documentation, Goal
lmata closed T343812: On-call batphone escalation configuration holidays Aug 2023 as Resolved.
Aug 16 2023, 2:43 PM · SRE Observability (FY2023/2024-Q1)
lmata triaged T343529: Prometheus doesn't reload or alert on expired client certificates as High priority.
Aug 16 2023, 2:43 PM · Prod-Kubernetes, SRE Observability (FY2023/2024-Q1), Observability-Metrics, User-fgiunchedi, Kubernetes, serviceops-radar
lmata moved T344202: Create VictorOps config for new Data Platform SRE team from Inbox to Radar on the observability board.
Aug 16 2023, 2:41 PM · observability, Observability-Alerting, Data-Platform-SRE

Aug 14 2023

lmata added a comment to T343812: On-call batphone escalation configuration holidays Aug 2023.

Leaving open until tomorrow, when I will re-enable EMEA, disable batphone, and resolve.

Aug 14 2023, 11:15 PM · SRE Observability (FY2023/2024-Q1)
lmata changed the status of T343812: On-call batphone escalation configuration holidays Aug 2023 from Open to In Progress.
Aug 14 2023, 11:14 PM · SRE Observability (FY2023/2024-Q1)
lmata added a comment to T343812: On-call batphone escalation configuration holidays Aug 2023.

Removed EMEA from rotation to enable batphone for bank holiday Aug 15th

Aug 14 2023, 11:12 PM · SRE Observability (FY2023/2024-Q1)
lmata added a comment to T278309: Move librenms deployment to Debian package.

@andrea.denisse adding this to Q1 for tracking, but I think this task isn't on the critical path to the LibreNMS upgrade. If we can perform the upgrade without packaging everything, I think that might be a preferred/faster approach. If so, maybe we can decline/close this task.

Aug 14 2023, 1:42 PM · SRE Observability (FY2023/2024-Q1), Patch-For-Review, Observability-Metrics
lmata added a project to T278309: Move librenms deployment to Debian package: SRE Observability (FY2023/2024-Q1).
Aug 14 2023, 1:40 PM · SRE Observability (FY2023/2024-Q1), Patch-For-Review, Observability-Metrics
lmata moved T344136: Upgrade LibreNMS to 23.7.0 or higher from Inbox to Up next on the SRE Observability (FY2023/2024-Q1) board.
Aug 14 2023, 1:29 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata moved T344136: Upgrade LibreNMS to 23.7.0 or higher from Inbox to Prioritized on the Observability-Metrics board.
Aug 14 2023, 1:29 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata edited projects for T344136: Upgrade LibreNMS to 23.7.0 or higher, added: SRE Observability (FY2023/2024-Q1), Observability-Metrics; removed observability.
Aug 14 2023, 1:28 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)

Aug 8 2023

lmata added a project to T343377: Grant slightly broader access to Klaxon: Incident Tooling.
Aug 8 2023, 4:36 PM · Sustainability (Incident Followup), Incident Tooling, SRE-OnFire, SRE
lmata renamed T343812: On-call batphone escalation configuration holidays Aug 2023 from On-call batphone escalation configuration holidays Q1 to On-call batphone escalation configuration holidays Aug 2023.
Aug 8 2023, 2:25 PM · SRE Observability (FY2023/2024-Q1)
lmata created T343812: On-call batphone escalation configuration holidays Aug 2023.
Aug 8 2023, 2:01 PM · SRE Observability (FY2023/2024-Q1)

Aug 4 2023

lmata updated the task description for T343021: Deploy StatsD Exporter to Test Env.
Aug 4 2023, 1:17 PM · SRE Observability (FY2023/2024-Q1)

Aug 2 2023

lmata updated the task description for T228380: Tech debt: sunsetting of Graphite (part 1).
Aug 2 2023, 2:52 PM · Observability-Metrics
lmata closed T311262: swift hosts (thanos-fe1001, ms-be2012) with failed prometheus-ipmi-exporter services as Declined.

Discussed in today's team meeting, boldly declining. Please re-open if you feel differently.

Aug 2 2023, 2:38 PM · SRE-swift-storage, SRE Observability
lmata removed projects from T253810: Alert on ECC warnings in SEL: SRE Observability, observability.

@joanna_borun we're grooming the phab board this week, and we feel this is better suited for I/F. Please retag if you need our assistance. Thanks!

Aug 2 2023, 2:36 PM · SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup), User-MoritzMuehlenhoff
lmata moved T342179: Q1:rack/setup/install titan100[12] from Inbox to Radar on the SRE Observability board.
Aug 2 2023, 2:35 PM · SRE Observability, SRE, observability, ops-eqiad, DC-Ops
lmata removed a project from T330770: Investigate DispatchChanges Normal job backlog time (mean avg, 15min) alert post datacenter switch: observability.
Aug 2 2023, 2:35 PM · Observability-Alerting, Wikidata
lmata removed a project from T329026: Disk-related graphs on WMF Grafana have multiple defects: observability.

keeping under Observability-Metrics only

Aug 2 2023, 2:34 PM · Observability-Metrics, Data-Persistence
lmata moved T329669: Netbox: use the netbox to also sync networks from Inbox to Radar on the observability board.
Aug 2 2023, 2:34 PM · Patch-For-Review, netbox, Infrastructure-Foundations, observability, User-crusnov, User-jbond, Puppet, SRE
lmata removed a project from T111934: Nutcracker stats monitoring should only listen on localhost: observability.

Untagging observability, as there doesn't seem to be anything for us to do; please re-tag if you need us to engage. Thanks!

Aug 2 2023, 2:33 PM · SRE
lmata added a project to T330770: Investigate DispatchChanges Normal job backlog time (mean avg, 15min) alert post datacenter switch: Observability-Alerting.
Aug 2 2023, 2:32 PM · Observability-Alerting, Wikidata
lmata edited projects for T303253: Duplicate monitoring for systemd::timer::job, added: Observability-Alerting; removed observability.
Aug 2 2023, 2:31 PM · Observability-Alerting, Puppet-Core, Patch-For-Review, SRE, Infrastructure-Foundations
lmata edited projects for T337951: check_puppetrun fails to run under certain conditions, added: Infrastructure-Foundations; removed observability.

@joanna_borun we had a chat in the team meeting and thought this was better suited for your team.

Aug 2 2023, 2:30 PM · Puppet-Core, Infrastructure-Foundations
lmata moved T342300: Q1:rack/setup/install titan200[12] from Inbox to Radar on the observability board.
Aug 2 2023, 2:28 PM · SRE, observability, ops-codfw, DC-Ops
lmata moved T343000: HAProxy metrics go down on config reload from Inbox to Radar on the observability board.
Aug 2 2023, 2:27 PM · SRE, observability, Traffic

Jul 28 2023

lmata created T343045: Audit & convert stats for mediawiki modules.
Jul 28 2023, 8:49 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T343029: Audit & convert stats for mediawiki extensions.
Jul 28 2023, 6:11 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata renamed T343028: Audit & convert stats for mediawiki core from Audit & convert stats for mediawiki to Audit & convert stats for mediawiki core.
Jul 28 2023, 6:08 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T343028: Audit & convert stats for mediawiki core.
Jul 28 2023, 6:08 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T343026: Configure Prometheus to scrape MW metrics from statsd-exporter.
Jul 28 2023, 6:04 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T343025: Identify path forward for k8s deployment of prometheus-statsd-exporter.
Jul 28 2023, 6:03 PM · Patch-For-Review, serviceops, Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T343024: Configure MediaWiki to use new StatsLib in production.
Jul 28 2023, 6:00 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata updated the task description for T343023: Deploy StatsD Exporter to production.
Jul 28 2023, 5:59 PM · User-herron, Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T343023: Deploy StatsD Exporter to production.
Jul 28 2023, 5:58 PM · User-herron, Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata added a project to T343020: Converting MediaWiki Metrics to StatsLib: Observability-Metrics.
Jul 28 2023, 5:56 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T343022: Configure MediaWiki to use Stats lib in Test Env.
Jul 28 2023, 5:56 PM · SRE Observability (FY2023/2024-Q1)
lmata created T343021: Deploy StatsD Exporter to Test Env.
Jul 28 2023, 5:54 PM · SRE Observability (FY2023/2024-Q1)
lmata updated subscribers of T343020: Converting MediaWiki Metrics to StatsLib.
Jul 28 2023, 5:45 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata created T343020: Converting MediaWiki Metrics to StatsLib.
Jul 28 2023, 5:45 PM · Observability-Metrics, SRE Observability (FY2023/2024-Q1)
lmata moved T240685: MediaWiki Prometheus support from Inbox to In progress on the SRE Observability (FY2023/2024-Q1) board.
Jul 28 2023, 5:35 PM · MW-1.41-notes (1.41.0-wmf.25; 2023-09-05), SRE Observability (FY2023/2024-Q1), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), MediaWiki-libs-Stats, Platform Team Workboards (External Code Reviews), Patch-For-Review, serviceops, SRE, MediaWiki-General, observability
lmata added a project to T240685: MediaWiki Prometheus support: SRE Observability (FY2023/2024-Q1).
Jul 28 2023, 5:33 PM · MW-1.41-notes (1.41.0-wmf.25; 2023-09-05), SRE Observability (FY2023/2024-Q1), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), MediaWiki-libs-Stats, Platform Team Workboards (External Code Reviews), Patch-For-Review, serviceops, SRE, MediaWiki-General, observability
lmata moved T290263: Observability wikitech documentation update from Inbox to In progress on the SRE Observability (FY2023/2024-Q1) board.
Jul 28 2023, 5:31 PM · SRE Observability (FY2023/2024-Q1), Documentation, Goal
lmata updated the task description for T342998: Splunk not displaying rotations correctly.
Jul 28 2023, 5:19 PM · SRE Observability (FY2023/2024-Q1), Observability-Alerting
lmata added a comment to T342998: Splunk not displaying rotations correctly.

The rotation looks fine, but the calendar doesn't match the start/stop dates, as if its cut-over will be Mon/Tue instead of Sun/Mon. Will monitor and keep open over the weekend to see if the visualization corrects itself.

Jul 28 2023, 5:16 PM · SRE Observability (FY2023/2024-Q1), Observability-Alerting
lmata moved T342998: Splunk not displaying rotations correctly from Inbox to In progress on the SRE Observability (FY2023/2024-Q1) board.
Jul 28 2023, 2:18 PM · SRE Observability (FY2023/2024-Q1), Observability-Alerting
lmata moved T342998: Splunk not displaying rotations correctly from Inbox to Prioritized on the Observability-Alerting board.
Jul 28 2023, 2:18 PM · SRE Observability (FY2023/2024-Q1), Observability-Alerting
lmata changed the status of T342998: Splunk not displaying rotations correctly from Open to In Progress.
Jul 28 2023, 2:18 PM · SRE Observability (FY2023/2024-Q1), Observability-Alerting
lmata created T342998: Splunk not displaying rotations correctly.
Jul 28 2023, 2:16 PM · SRE Observability (FY2023/2024-Q1), Observability-Alerting

Jul 19 2023

lmata renamed T334733: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in all Kafka clusters from Grant IdempotentWrite Kafka Cluster ACL to User:ANONYOUS in all Kafka clusters to Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in all Kafka clusters.
Jul 19 2023, 2:14 PM · Data-Platform-SRE, SRE, Data-Engineering
lmata removed a project from T213777: Consider making a variant of the fatalmonitor CLI tool that ignores appserver timeouts: SRE Observability.

Fatalmonitor no longer actively supported: https://wikitech.wikimedia.org/wiki/Wikimedia_binaries#fatalmonitor
Untagging observability, please re-tag if you need our assistance.

Jul 19 2023, 2:13 PM · SRE
lmata moved T341439: Upgrade prometheus-jmx-exporter from Inbox to Prioritized on the Observability-Metrics board.
Jul 19 2023, 2:10 PM · Observability-Metrics, observability
lmata triaged T341439: Upgrade prometheus-jmx-exporter as Low priority.
Jul 19 2023, 2:10 PM · Observability-Metrics, observability

Jul 18 2023

lmata moved T325143: Logstash dashboard for thumbor from Inbox to Radar on the observability board.
Jul 18 2023, 11:46 PM · Observability-Logging, Thumbor, observability, Platform Team Workboards (Platform Engineering Reliability), Wikimedia-Logstash, Thumbor Migration
lmata removed a project from T339137: Ingest php syslog from Excimer UI (webperf host) into Logstash: observability.
Jul 18 2023, 11:46 PM · Observability-Logging, Patch-For-Review, WikimediaDebug, Performance-Team
lmata moved T336728: Instrument how suggested language pair is chosen from Inbox to Radar on the observability board.
Jul 18 2023, 11:45 PM · Language-Team (Language-2023-July-September), Patch-For-Review, Outreachy (Round 26), observability, ContentTranslation
lmata archived SRE-OnFire (FY2021/2022-Q2).
Jul 18 2023, 11:43 PM
lmata archived SRE-OnFire (FY2021/2022-Q3).
Jul 18 2023, 11:37 PM
lmata moved T293504: non-wikimedia.org domain names for status page from Backlog to Scorecard Done on the SRE-OnFire (FY2021/2022-Q3) board.
Jul 18 2023, 11:37 PM · SRE-OnFire (FY2021/2022-Q3), WMF-NDA
lmata moved T285769: Ensure SRE team has a good understanding of how & when to declare an outage on the status page; & it is easy to do so from Backlog to Scorecard Done on the SRE-OnFire (FY2021/2022-Q3) board.
Jul 18 2023, 11:37 PM · SRE Observability (FY2021/2022-Q3), SRE-OnFire (FY2021/2022-Q3), SRE
lmata moved T307166: Incident: 2022-03-10_MediaWiki_availability from Backlog to Scorecard Done on the SRE-OnFire (FY2021/2022-Q3) board.
Jul 18 2023, 11:37 PM · SRE-OnFire (FY2021/2022-Q3)
lmata archived SRE-OnFire (FY2021/2022-Q4).
Jul 18 2023, 11:27 PM
lmata edited projects for T202061: Implement an accurate and easy to understand status page for all wikis, added: Incident Tooling; removed Observability-Alerting, SRE-OnFire (FY2021/2022-Q4).
Jul 18 2023, 11:25 PM · Incident Tooling, SRE
lmata edited projects for T318804: ncredir redirects for status.wiki* --> status.wikimedia.org, added: Incident Tooling; removed SRE-OnFire (FY2021/2022-Q4).
Jul 18 2023, 11:24 PM · Incident Tooling, Traffic, SRE
lmata moved T202061: Implement an accurate and easy to understand status page for all wikis from In Progress to Backlog on the SRE-OnFire (FY2021/2022-Q4) board.
Jul 18 2023, 11:23 PM · Incident Tooling, SRE
lmata added a project to T342179: Q1:rack/setup/install titan100[12]: SRE Observability.
Jul 18 2023, 11:22 PM · SRE Observability, SRE, observability, ops-eqiad, DC-Ops
lmata moved T335027: Prometheus: ingest SONiC metrics from Inbox to Radar on the observability board.
Jul 18 2023, 11:21 PM · Patch-For-Review, Observability-Metrics, SRE, observability, Infrastructure-Foundations, netops