User Details
- User Since
- May 14 2020, 7:26 PM (291 w, 4 d)
- Roles
- Disabled
- IRC Nick
- lmata
- LDAP User
- LMata
- MediaWiki User
- LMata (WMF) [ Global Accounts ]
Jul 30 2025
• lmata updated subscribers of T398229: FY25-26 SDS2.1.3 Reliability - Production Monitoring.
• lmata edited projects for T400443: Allow wider Alertmanager API RO access, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability.
Jul 23 2025
• lmata moved T399807: Allow team customization for service::catalog probes from Inbox to Prioritized on the Observability-Alerting board.
Jul 16 2025
• lmata moved T399195: Update logging and monitoring for multiple session storage backends from Inbox to Radar on the observability board.
Jul 9 2025
• lmata moved T398605: Prometheus puppettization has a very large directory from Inbox to Radar on the observability board.
Jul 2 2025
• lmata updated the task description for T398302: On-call batphone escalation configuration holidays FY2025/26.
• lmata moved T305223: Clean up stale Prometheus target and rules files from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T228380: Tech debt: sunsetting of Graphite from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T393630: Cookbook downtiming does not work, continues anyway from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T395032: Cookbook sre.hosts.remove_downtime does not remove silences from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T397427: librenms-syslog leaks memory from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T396862: Improve titan hosts stateless-ness from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T397756: Kafka-logging -> Bookworm from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T397757: Kafkamon -> Bookworm from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T392886: Revisit default Istio histogram buckets from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T372242: Alert on unscrapable pods from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T321808: Port all Icinga checks to Prometheus/Alertmanager from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T372845: Migrate all o11y services to nftables from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T390196: Deploy and document a method to dump logs from logstash from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T394069: Rendering Graph's as images times out on Grafana 11 from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T395441: Port all Icinga checks to Prometheus/Alertmanager: preparation from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T395442: Setup reliable migration dashboards from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T396626: Hardware retirement Graphite Infrastructure (ETA June 2026) from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T395448: Discuss about "host down" semantics from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T395447: Prototype / experiment with moving raid checks to alertmanager from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T395553: ircecho (icinga-wm) doesn't automatically restart if not connected from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T395449: Reimage cookbook icinga logic review from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T395916: Reduce Pyrra's default window from 12w to 4w from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata moved T398444: More frequent Puppet runs on the alert hosts? from Inbox to Up next on the SRE Observability (FY2025/2026-Q1) board.
• lmata edited projects for T398073: Ensure DPE SRE can receive alerts for applications hosted in wikikube, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability.
• lmata edited projects for T398444: More frequent Puppet runs on the alert hosts?, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability.
• lmata moved T398313: Add a banner to slo.wikimedia.org explaining rolling vs calendar views from Inbox to Radar on the observability board.
• lmata moved T387350: liftwing SLO performance issues from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata edited projects for T395916: Reduce Pyrra's default window from 12w to 4w, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability (FY2024/2025-Q4).
Jul 1 2025
• lmata updated the task description for T398302: On-call batphone escalation configuration holidays FY2025/26.
Jun 30 2025
• lmata moved T394045: When selecting a DC some Grafana panels show instances for other DC from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T394319: Move thanos cache out of process from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T394318: Revisit thanos queries concurrency and limits from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T395098: Upgrade to Grafana 12.0.1 from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T395130: Migrate prometheus7001 to prometheus7002 from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T392488: kafka-logging2005 is down since six days from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T393439: Graphite data sources broken on grafana-next from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T391661: Weekly indices are force-merged by curator every day for a week from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T390194: Add read-only users capability to logs-api.svc from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T385693: thanos-query overload due to heavy queries from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T383966: Upgrade Thanos to 0.38.0 from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T381665: module to define custom Prometheus alerts directly in Puppet from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T383232: Move k8s Prometheus instances to new Prometheus hw in eqiad/codfw from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T372457: Remove librenms -> graphite integration, replace with gnmi from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T370772: Prometheus eqiad/codfw hw expansion architecture options from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T384841: Upgrade to Grafana 11 from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T359271: (Analytics?) Migrate MediaWiki.TemplateData to statslib from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T397967: scap logs are being dead-lettered from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T395819: Create a tool to validate ECS logs from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T391333: Revisit default envoy histogram buckets from In progress to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata moved T392994: Move Thanos trace sampling to native and off otlp coll from In progress to Done on the SRE Observability (FY2024/2025-Q4) board.
• lmata edited projects for T353912: Observability Bookworm upgrades, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability (FY2024/2025-Q4).
• lmata edited projects for T343020: Converting MediaWiki Metrics to StatsLib, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability (FY2024/2025-Q4).
• lmata edited projects for T288622: All Prometheus based alerts move from Icinga to alert manager exclusively, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability (FY2024/2025-Q4).
• lmata edited projects for T350592: EPIC: migrate in use metrics and dashboards to statslib, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability (FY2024/2025-Q4).
Jun 25 2025
• lmata edited projects for T397427: librenms-syslog leaks memory, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability.
• lmata moved T391516: https://performance.wikimedia.org/php-profiling/ leads to 404 for all listed sources from Inbox to Radar on the observability board.
• lmata edited projects for T397264: create a new place for prometheus/alertmanager checks not tied to physical machines, added: Observability-Alerting, SRE Observability (FY2025/2026-Q1); removed observability.
Jun 23 2025
• lmata moved T368786: Add support for nesting to StatsFactory->getTiming start/stop feature from Inbox to Done on the SRE Observability (FY2024/2025-Q4) board.
Jun 19 2025
• lmata closed T363753: Only select o11y-owned datasources on the Grafana Datasource utilization dashboard as Resolved.
Closing: this is either resolved or no longer necessary, per the current status of T228380: Tech debt: sunsetting of Graphite.
• lmata updated the task description for T369122: On-call batphone escalation configuration holidays FY2024/25.
Jun 18 2025
• lmata closed T379156: Change/fix real user performance alert to only use Prometheus, a subtask of T228380: Tech debt: sunsetting of Graphite, as Resolved.
• lmata closed T379156: Change/fix real user performance alert to only use Prometheus, a subtask of T384459: Web Performance responsibilities, as Resolved.
Hi! I discussed this task with the team today, and they've shared that these have been migrated. I'm resolving this task for now. Please reopen it if this is not the case and there is still work pending.
• lmata edited projects for T305223: Clean up stale Prometheus target and rules files, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability.
• lmata edited projects for T393630: Cookbook downtiming does not work, continues anyway, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability.
• lmata edited projects for T396862: Improve titan hosts stateless-ness, added: SRE Observability (FY2025/2026-Q1); removed SRE Observability.
• lmata moved T382181: Investigate adding toolforge projects to Prometheus from Inbox to Radar on the observability board.