
Jan 26 2024

lmata removed a project from T349521: Prometheus/Pyrra: establish backfill process for recording rules: SRE Observability (FY2023/2024-Q2).
Jan 26 2024, 1:06 AM · Patch-For-Review, User-herron, Observability-Metrics
lmata edited projects for T353691: Reload thanos-rule on new pyrra rules deployed, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 26 2024, 1:06 AM · SRE Observability (FY2023/2024-Q3), User-herron, Observability-Metrics
lmata removed a project from T353716: Pyrra: cleanup output-rules when config is removed: SRE Observability (FY2023/2024-Q2).
Jan 26 2024, 1:05 AM · User-herron, Observability-Metrics
lmata edited projects for T288622: All Prometheus based alerts move from Icinga to alert manager exclusively, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 26 2024, 1:05 AM · SRE Observability (FY2024/2025-Q1)
lmata edited projects for T321808: Port most/all Icinga checks to Prometheus/Alertmanager, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 26 2024, 1:04 AM · SRE Observability (FY2024/2025-Q1), Observability-Alerting
lmata edited projects for T348756: Wikimedia\MWConfig\Profiler::excimerFlushToArclamp(): PHP Warning: RedisException: Connection timed out, added: Observability-Metrics; removed SRE Observability (FY2023/2024-Q2).
Jan 26 2024, 1:03 AM · MediaWiki-Platform-Team, Observability-Metrics, observability, Arc-Lamp, Wikimedia-production-error
lmata removed a project from T329232: kafka-logging: ensure cluster wide failure mode alerting coverage: SRE Observability (FY2023/2024-Q2).
Jan 26 2024, 1:01 AM · Observability-Alerting, Observability-Logging
lmata removed a project from T274372: Improve Automation for Alert Reviews: SRE Observability (FY2023/2024-Q2).
Jan 26 2024, 1:01 AM · Observability-Alerting

Jan 19 2024

lmata awarded T350591: Audit legacy mediawiki stats used in production dashboards a Party Time token.
Jan 19 2024, 8:19 PM · SRE Observability (FY2023/2024-Q3), Patch-For-Review, Observability-Metrics

Jan 17 2024

lmata added a comment to T354904: Benthos cannot join logstash consumer groups.

Based on my understanding of the task, the only viable way to test it appears to be deploying it in a live environment. While this may seem spooky, I don't think we have another environment that can simulate this effectively enough to test the sampling.

Jan 17 2024, 5:29 PM · Patch-For-Review, Observability-Logging

Jan 16 2024

lmata added a comment to T350192: On-call batphone escalation configuration holidays FY2023-24.

We are back to the regular on-call rotation.

Jan 16 2024, 12:18 AM · SRE Observability (FY2023/2024-Q4)

Jan 15 2024

lmata added a comment to T350192: On-call batphone escalation configuration holidays FY2023-24.

Batphone enabled for MLK Day.

Jan 15 2024, 1:00 PM · SRE Observability (FY2023/2024-Q4)
lmata updated the task description for T350192: On-call batphone escalation configuration holidays FY2023-24.
Jan 15 2024, 1:00 PM · SRE Observability (FY2023/2024-Q4)

Jan 12 2024

lmata moved T326419: Expand kafka-logging using hosts kafka-logging[12]00[45] from Inbox to Up next on the SRE Observability (FY2023/2024-Q3) board.
Jan 12 2024, 6:05 PM · SRE Observability (FY2023/2024-Q3), User-herron, Observability-Logging
lmata moved T317887: Upgrade to Grafana 9 from Inbox to Up next on the SRE Observability (FY2023/2024-Q2) board.

Hi @colewhite, a friendly reminder that the silence expires on 2024-02-01.

Jan 12 2024, 5:58 PM · SRE Observability (FY2023/2024-Q3), Observability-Metrics
lmata edited projects for T350591: Audit legacy mediawiki stats used in production dashboards, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:55 PM · SRE Observability (FY2023/2024-Q3), Patch-For-Review, Observability-Metrics
lmata edited projects for T349626: Migrate SRE repositories to GitLab - operations/alerts, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:55 PM · Observability-Alerting, Patch-For-Review, GitLab (Project Migration), collaboration-services
lmata edited projects for T350597: Audit and prioritize metrics for conversion to statslib that are used for graphite-based alerting, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:54 PM · SRE Observability (FY2024/2025-Q1), Discovery-Search (Current work), Data-Platform-SRE, MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), User-fgiunchedi, Observability-Metrics
lmata removed a project from T325775: improve email template for alertmanager notifications: SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:53 PM · Observability-Alerting
lmata lowered the priority of T317240: Improve AlertManager alert titles as sent to VictorOps from High to Medium.

Lowering priority due to lack of activity; we can revisit this if it continues to be a pressing matter.

Jan 12 2024, 5:53 PM · User-fgiunchedi, SRE-OnFire, Observability-Alerting
lmata lowered the priority of T325745: Improve AlertManager notifications from High to Medium.

Since there has been no recent feedback indicating that this issue persists, we have decided to lower its priority. This lets us focus our resources on more pressing concerns that require immediate attention.

Jan 12 2024, 5:51 PM · Observability-Alerting
lmata removed a project from T317240: Improve AlertManager alert titles as sent to VictorOps: SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:50 PM · User-fgiunchedi, SRE-OnFire, Observability-Alerting
lmata edited projects for T333615: Upgrade alert* hosts to Bookworm, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:49 PM · Patch-For-Review, SRE, SRE Observability (FY2023/2024-Q3)
lmata edited projects for T302373: Upgrade prometheus-statsd-exporter, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:48 PM · SRE Observability (FY2023/2024-Q4), User-fgiunchedi, Observability-Metrics
lmata removed a project from T351935: Audit Prometheus metrics size/label values: SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:48 PM · Observability-Metrics
lmata removed a project from T343029: Audit & convert stats for mediawiki extensions : SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:46 PM · Observability-Metrics
lmata removed a project from T343045: Audit & convert stats for mediawiki modules: SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:46 PM · Observability-Metrics
lmata moved T343020: Converting MediaWiki Metrics to StatsLib from Up next to Epics In Progress on the SRE Observability (FY2023/2024-Q2) board.
Jan 12 2024, 5:46 PM · SRE Observability (FY2024/2025-Q1), Observability-Metrics
lmata edited projects for T350192: On-call batphone escalation configuration holidays FY2023-24, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:44 PM · SRE Observability (FY2023/2024-Q4)
lmata merged task T343028: Audit & convert stats for mediawiki core into T350592: EPIC: migrate in use metrics and dashboards to statslib.
Jan 12 2024, 5:44 PM · SRE Observability (FY2023/2024-Q3), Observability-Metrics
lmata merged T343028: Audit & convert stats for mediawiki core into T350592: EPIC: migrate in use metrics and dashboards to statslib.
Jan 12 2024, 5:44 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata edited projects for T343028: Audit & convert stats for mediawiki core, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:42 PM · SRE Observability (FY2023/2024-Q3), Observability-Metrics
lmata closed T343024: Configure MediaWiki to use new StatsLib in production as Resolved.

I'm resolving this on the understanding that it has already been deployed. Please reopen it if that's not the case and there's work pending. Thanks!

Jan 12 2024, 5:42 PM · MW-1.42-notes (1.42.0-wmf.12; 2024-01-02), SRE Observability (FY2023/2024-Q2), Observability-Metrics
lmata closed T343024: Configure MediaWiki to use new StatsLib in production, a subtask of T343020: Converting MediaWiki Metrics to StatsLib, as Resolved.
Jan 12 2024, 5:42 PM · SRE Observability (FY2024/2025-Q1), Observability-Metrics
lmata moved T240685: MediaWiki Prometheus support from Inbox to Epics In Progress on the SRE Observability (FY2023/2024-Q3) board.
Jan 12 2024, 5:39 PM · SRE Observability (FY2023/2024-Q4), MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), MediaWiki-libs-Stats, Platform Team Workboards (External Code Reviews), serviceops, SRE, MediaWiki-General, observability
lmata edited projects for T240685: MediaWiki Prometheus support, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:39 PM · SRE Observability (FY2023/2024-Q4), MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), MediaWiki-libs-Stats, Platform Team Workboards (External Code Reviews), serviceops, SRE, MediaWiki-General, observability
lmata edited projects for T354905: migrate MediaWiki.timing.editResponseTime to statslib, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:36 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), MediaWiki-Platform-Team, Patch-For-Review, SRE Observability (FY2023/2024-Q3), Observability-Metrics
lmata edited projects for T354906: migrate mw.performance.save to statslib, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:36 PM · SRE Observability (FY2023/2024-Q3), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata edited projects for T354907: (mw.track) migrate MediaWiki.RevisionSlider.timing.init to statslib, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:36 PM · MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata edited projects for T354908: evaluate and migrate in-use parsoid metrics to statslib, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:36 PM · Patch-For-Review, Content-Transform-Team, OKR-Work, Content-Transform-Team-WIP, MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata edited projects for T354909: migrate MediaWiki.wikibase.quality.constraints.type.php.success.entities to statslib, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 12 2024, 5:36 PM · MW-1.42-notes (1.42.0-wmf.16; 2024-01-30), SRE Observability (FY2023/2024-Q3), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata updated the task description for T240685: MediaWiki Prometheus support.
Jan 12 2024, 5:13 PM · SRE Observability (FY2023/2024-Q4), MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), MediaWiki-libs-Stats, Platform Team Workboards (External Code Reviews), serviceops, SRE, MediaWiki-General, observability

Jan 11 2024

lmata raised the priority of T350592: EPIC: migrate in use metrics and dashboards to statslib from Medium to High.
Jan 11 2024, 9:51 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata edited projects for T350592: EPIC: migrate in use metrics and dashboards to statslib, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 11 2024, 9:51 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata updated the task description for T350592: EPIC: migrate in use metrics and dashboards to statslib.
Jan 11 2024, 9:44 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata created T354909: migrate MediaWiki.wikibase.quality.constraints.type.php.success.entities to statslib.
Jan 11 2024, 9:43 PM · MW-1.42-notes (1.42.0-wmf.16; 2024-01-30), SRE Observability (FY2023/2024-Q3), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata updated the task description for T354908: evaluate and migrate in-use parsoid metrics to statslib.
Jan 11 2024, 9:42 PM · Patch-For-Review, Content-Transform-Team, OKR-Work, Content-Transform-Team-WIP, MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata renamed T354908: evaluate and migrate in-use parsoid metrics to statslib from migrate MediaWiki.Parsoid.html2wt.setup to statslib to migrate top used parsoid to statslib.
Jan 11 2024, 9:41 PM · Patch-For-Review, Content-Transform-Team, OKR-Work, Content-Transform-Team-WIP, MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata updated the task description for T354908: evaluate and migrate in-use parsoid metrics to statslib.
Jan 11 2024, 9:40 PM · Patch-For-Review, Content-Transform-Team, OKR-Work, Content-Transform-Team-WIP, MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata created T354908: evaluate and migrate in-use parsoid metrics to statslib.
Jan 11 2024, 9:39 PM · Patch-For-Review, Content-Transform-Team, OKR-Work, Content-Transform-Team-WIP, MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata created T354907: (mw.track) migrate MediaWiki.RevisionSlider.timing.init to statslib.
Jan 11 2024, 9:37 PM · MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata created T354906: migrate mw.performance.save to statslib.
Jan 11 2024, 9:36 PM · SRE Observability (FY2023/2024-Q3), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata created T354905: migrate MediaWiki.timing.editResponseTime to statslib.
Jan 11 2024, 9:35 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), MediaWiki-Platform-Team, Patch-For-Review, SRE Observability (FY2023/2024-Q3), Observability-Metrics
lmata updated the task description for T350592: EPIC: migrate in use metrics and dashboards to statslib.
Jan 11 2024, 9:34 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics

Jan 10 2024

lmata moved T354217: Make arc-lamp aware of the new names of request life cycle methods. from Inbox to Up next on the SRE Observability (FY2023/2024-Q3) board.
Jan 10 2024, 3:56 PM · SRE Observability (FY2023/2024-Q3), Patch-For-Review, Arc-Lamp
lmata removed a project from T278309: Move librenms deployment to Debian package: SRE Observability (FY2023/2024-Q4).
Jan 10 2024, 3:55 PM · Patch-For-Review, Observability-Metrics
lmata moved T277816: Improve Logstash's rate-limiting capabilities from Inbox to Up next on the SRE Observability (FY2023/2024-Q3) board.
Jan 10 2024, 3:53 PM · Observability-Logging, Wikimedia-Logstash
lmata edited projects for T277816: Improve Logstash's rate-limiting capabilities, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2), observability.
Jan 10 2024, 3:53 PM · Observability-Logging, Wikimedia-Logstash
lmata moved T353912: Observability Bookworm upgrades from Inbox to Up next on the SRE Observability (FY2023/2024-Q3) board.
Jan 10 2024, 3:47 PM · SRE Observability (FY2024/2025-Q1), Patch-For-Review
lmata moved T354570: Create alert Review for FY2023/2024-Q3 from Inbox to Up next on the SRE Observability (FY2023/2024-Q3) board.
Jan 10 2024, 3:47 PM · SRE Observability (FY2023/2024-Q3)
lmata edited projects for T354255: Alert in need of triage: AlertLintProblem (instance localhost:9123), added: SRE Observability (FY2023/2024-Q3); removed SRE Observability.
Jan 10 2024, 3:13 PM · SRE Observability (FY2024/2025-Q1), sre-alert-triage
lmata edited projects for T354217: Make arc-lamp aware of the new names of request life cycle methods., added: SRE Observability (FY2023/2024-Q3); removed observability.
Jan 10 2024, 3:12 PM · SRE Observability (FY2023/2024-Q3), Patch-For-Review, Arc-Lamp
lmata edited projects for T354762: [pint,karma] Find a way to forward AlertLintProblem to the right team (ex. using the team=wmcs label), added: Observability-Alerting; removed observability.
Jan 10 2024, 3:06 PM · Observability-Alerting

Jan 9 2024

lmata updated the task description for T350592: EPIC: migrate in use metrics and dashboards to statslib.
Jan 9 2024, 1:17 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata renamed T350592: EPIC: migrate in use metrics and dashboards to statslib from Audit and prioritize metrics for conversion to statslib that rely on grafana alerting to migrate in use metrics and dashboards to statslib.
Jan 9 2024, 1:06 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata added a comment to T350592: EPIC: migrate in use metrics and dashboards to statslib.

Keeping this task open for further snapshots as the project evolves.

Jan 9 2024, 12:55 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
lmata raised the priority of T350592: EPIC: migrate in use metrics and dashboards to statslib from Low to Medium.
Jan 9 2024, 12:54 PM · SRE Observability (FY2024/2025-Q1), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics

Jan 8 2024

lmata added a comment to T307958: Reminders for unhandled/unacked alerts.

This is essentially what https://alerts.wikimedia.org/triage/ now displays, using hide_alerts_older_than: '1200h'. The app also offers the user a button to open a task.

Jan 8 2024, 8:47 PM · Observability-Alerting, SRE
lmata edited projects for T353912: Observability Bookworm upgrades, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability.
Jan 8 2024, 4:24 PM · SRE Observability (FY2024/2025-Q1), Patch-For-Review
lmata moved T352665: Upgrade Grafana hosts to Bookworm from Inbox to Up next on the SRE Observability (FY2023/2024-Q3) board.
Jan 8 2024, 4:22 PM · Patch-For-Review, SRE Observability (FY2023/2024-Q3)
lmata assigned T352665: Upgrade Grafana hosts to Bookworm to andrea.denisse.
Jan 8 2024, 4:21 PM · Patch-For-Review, SRE Observability (FY2023/2024-Q3)
lmata edited projects for T352665: Upgrade Grafana hosts to Bookworm, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Jan 8 2024, 4:21 PM · Patch-For-Review, SRE Observability (FY2023/2024-Q3)
lmata added a comment to T350192: On-call batphone escalation configuration holidays FY2023-24.

Batphone has been removed, and the business-hours on-call rota is enabled again in Splunk On-Call.

Jan 8 2024, 12:45 PM · SRE Observability (FY2023/2024-Q4)

Dec 23 2023

lmata added a comment to T350192: On-call batphone escalation configuration holidays FY2023-24.

@MatthewVernon, it does look weird. I think it's just the UI: when I added Batphone to the escalation path in place of the EMEA/Americas rotation, it seems to have expanded the Batphone list as the escalation, showing people as on- or off-call according to their individual schedules set within the Batphone rotation.

Dec 23 2023, 4:52 PM · SRE Observability (FY2023/2024-Q4)
lmata updated the task description for T350192: On-call batphone escalation configuration holidays FY2023-24.
Dec 23 2023, 2:05 PM · SRE Observability (FY2023/2024-Q4)
lmata updated the task description for T350192: On-call batphone escalation configuration holidays FY2023-24.
Dec 23 2023, 2:04 PM · SRE Observability (FY2023/2024-Q4)

Dec 20 2023

fgiunchedi awarded T343024: Configure MediaWiki to use new StatsLib in production a Party Time token.
Dec 20 2023, 11:20 AM · MW-1.42-notes (1.42.0-wmf.12; 2024-01-02), SRE Observability (FY2023/2024-Q2), Observability-Metrics

Dec 14 2023

lmata placed T343024: Configure MediaWiki to use new StatsLib in production up for grabs.
Dec 14 2023, 8:11 PM · MW-1.42-notes (1.42.0-wmf.12; 2024-01-02), SRE Observability (FY2023/2024-Q2), Observability-Metrics

Dec 6 2023

lmata removed a project from T328707: Update arclamp to active/active architecture: SRE Observability (FY2023/2024-Q3).

It seems like this is not a priority, so we'll postpone it for now. We can revisit it when the time comes.

Dec 6 2023, 3:30 PM · Arc-Lamp, Observability-Tracing
lmata edited projects for T343529: Prometheus doesn't reload or alert on expired client certificates, added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Dec 6 2023, 3:24 PM · SRE Observability (FY2024/2025-Q1), Prod-Kubernetes, Observability-Metrics, User-fgiunchedi, Kubernetes, serviceops-radar
lmata edited projects for T326419: Expand kafka-logging using hosts kafka-logging[12]00[45], added: SRE Observability (FY2023/2024-Q3); removed SRE Observability (FY2023/2024-Q2).
Dec 6 2023, 3:16 PM · SRE Observability (FY2023/2024-Q3), User-herron, Observability-Logging
lmata edited projects for T352517: Put logging-hd[12]00[1-3] in service, added: SRE Observability (FY2023/2024-Q3); removed observability.
Dec 6 2023, 3:08 PM · SRE Observability (FY2023/2024-Q4), Patch-For-Review, Observability-Logging
lmata removed a project from T352756: Gap in metrics rendered from Thanos Rules: observability.
Dec 6 2023, 3:07 PM · SRE Observability (FY2024/2025-Q1), Observability-Metrics, Machine-Learning-Team
lmata added a project to T352756: Gap in metrics rendered from Thanos Rules: SRE Observability (FY2023/2024-Q2).
Dec 6 2023, 3:07 PM · SRE Observability (FY2024/2025-Q1), Observability-Metrics, Machine-Learning-Team
lmata moved T352783: Change data platform-related IRC channels to improve communication from Inbox to Radar on the observability board.
Dec 6 2023, 3:05 PM · Data-Platform-SRE, Data-Platform, observability

Dec 5 2023

lmata added a project to T351179: LVM vg0 close to getting full on prometheus eqiad: SRE Observability (FY2023/2024-Q2).
Dec 5 2023, 4:29 PM · SRE Observability (FY2023/2024-Q4), Observability-Metrics
lmata added a project to T351935: Audit Prometheus metrics size/label values: SRE Observability (FY2023/2024-Q2).
Dec 5 2023, 4:25 PM · Observability-Metrics
lmata added a project to T351936: Stop exporting unit state metrics for timers: SRE Observability (FY2023/2024-Q2).
Dec 5 2023, 4:24 PM · SRE Observability (FY2023/2024-Q2), User-fgiunchedi, Observability-Metrics
lmata added a comment to T277816: Improve Logstash's rate-limiting capabilities.

Raising priority based on recent conversations with the team and the intent to address this in the near future as part of risk mitigations to the logging pipeline.

Dec 5 2023, 3:21 PM · Observability-Logging, Wikimedia-Logstash
lmata moved T277816: Improve Logstash's rate-limiting capabilities from Backlog to Prioritized on the Observability-Logging board.
Dec 5 2023, 3:19 PM · Observability-Logging, Wikimedia-Logstash
lmata triaged T277816: Improve Logstash's rate-limiting capabilities as High priority.
Dec 5 2023, 3:19 PM · Observability-Logging, Wikimedia-Logstash
lmata triaged T336701: Bridge wikimediastatus.net to Mastodon as Low priority.
Dec 5 2023, 3:14 PM · Incident Tooling, SRE
lmata closed T352128: No on-call page notification when shift override was set on November 27 as Resolved.

It seems like the issue is resolved.

Dec 5 2023, 3:13 PM · Incident Tooling

Nov 29 2023

lmata awarded T351934: Label value spam in ncredir_requests_total metric a Like token.
Nov 29 2023, 6:57 PM · Traffic, Observability-Metrics
lmata moved T352128: No on-call page notification when shift override was set on November 27 from Inbox to Prioritized on the Incident Tooling board.
Nov 29 2023, 3:22 PM · Incident Tooling
lmata moved T322636: Provide mechanism to join/leave oncall from Inbox to Backlog on the Incident Tooling board.
Nov 29 2023, 3:22 PM · Incident Tooling, SRE-OnFire
lmata moved T288622: All Prometheus based alerts move from Icinga to alert manager exclusively from Inbox to Epics In Progress on the SRE Observability (FY2023/2024-Q2) board.
Nov 29 2023, 3:09 PM · SRE Observability (FY2024/2025-Q1)
lmata moved T288622: All Prometheus based alerts move from Icinga to alert manager exclusively from Epics In Progress to Inbox on the SRE Observability (FY2023/2024-Q2) board.
Nov 29 2023, 3:09 PM · SRE Observability (FY2024/2025-Q1)

Nov 28 2023

lmata added a comment to T349159: Transfer Arc Lamp alerts (Grafana and cron errors_mailto) to SRE O11y.

Arc Lamp metrics are visualised in Grafana but the alerts are defined in AlertManager, not in Grafana.

Nov 28 2023, 2:03 PM · MediaWiki-Platform-Team (Radar), SRE Observability (FY2023/2024-Q2), observability

Nov 27 2023

lmata added a project to T351927: Decide and tweak Thanos retention: SRE Observability (FY2023/2024-Q2).

Adding to ongoing quarter for visibility.

Nov 27 2023, 6:01 PM · User-fgiunchedi, Observability-Metrics