
Thu, May 6

lmata added projects to T274462: Logging for GitLab: observability, Wikimedia-Logstash.
Thu, May 6, 3:11 PM · Wikimedia-Logstash, observability, User-brennen, GitLab (Initialization)

Wed, May 5

lmata added a comment to T279602: reclaim icinga2001.wikimedia.org.

\o/

Wed, May 5, 5:27 PM · observability, decommission-hardware

Mon, May 3

lmata moved T248884: Documentation of client side error logging capabilities on mediawiki from Inbox to Radar on the observability board.
Mon, May 3, 3:42 PM · observability, Instrument-ClientError, Analytics-Radar, Documentation, Performance-Team (Radar), Wikimedia-Logstash, Better Use Of Data
lmata moved T260667: scap on beta fails canary check: KeyError: 'aggregations' from Inbox to Radar on the observability board.
Mon, May 3, 3:38 PM · Release-Engineering-Team (Doing), observability, Discovery-Search, Wikimedia-Logstash, Beta-Cluster-Infrastructure
lmata moved T262741: "Wikidata API format usage" Grafana dashboard is empty from Inbox to Radar on the observability board.
Mon, May 3, 3:37 PM · observability, Wikidata, Graphite
lmata moved T281266: Decommission old ELK5 Logstash cluster from Inbox to In progress on the observability board.
Mon, May 3, 3:36 PM · Patch-For-Review, observability, SRE
lmata moved T281267: various weekly and daily dumps run from systemd timers are broken from Inbox to Radar on the observability board.
Mon, May 3, 3:36 PM · wdwb-tech, Wikidata, SRE, observability, Dumps-Generation
lmata moved T281358: Move Performance Icinga alerts to AlertManager from Inbox to In progress on the observability board.
Mon, May 3, 3:36 PM · Patch-For-Review, Performance-Team, observability, User-fgiunchedi
lmata moved T281359: Onboard teams with Grafana alerts to AM from Inbox to In progress on the observability board.
Mon, May 3, 3:36 PM · User-fgiunchedi, observability
lmata moved T281454: Onboard teams with Prometheus-based alerts to AM from Inbox to In progress on the observability board.
Mon, May 3, 3:36 PM · User-fgiunchedi, observability
lmata moved T281507: KaiOS app client-side errors dashboard stopped working from Inbox to Backlog on the observability board.
Mon, May 3, 3:35 PM · observability, Wikimedia-Logstash, Inuka-Team

Fri, Apr 30

Quiddity awarded T202061: Implement an accurate and easy to understand status page for all wikis a Love token.
Fri, Apr 30, 2:27 AM · observability, SRE

Thu, Apr 29

lmata claimed T202061: Implement an accurate and easy to understand status page for all wikis.
Thu, Apr 29, 9:03 PM · observability, SRE
lmata added a comment to T202061: Implement an accurate and easy to understand status page for all wikis.

Quote opened in T281530

Thu, Apr 29, 9:03 PM · observability, SRE
lmata moved T202061: Implement an accurate and easy to understand status page for all wikis from Backlog to In progress on the observability board.

Technically in progress between @CDanis and me.

Thu, Apr 29, 9:00 PM · observability, SRE
lmata added a comment to T202061: Implement an accurate and easy to understand status page for all wikis.

This needs to live again

Thu, Apr 29, 9:00 PM · observability, SRE

Mon, Apr 26

lmata moved T269768: Investigate how to visualize Out Of Memory and Timeout errors related to Wikidata from Inbox to Radar on the observability board.
Mon, Apr 26, 3:21 PM · observability, Wikimedia-Logstash, Wikidata Infrastructure Reliability Sprint Dec 2020, User-Michael
lmata closed T272016: Update saved / short links with objects in ELK7, a subtask of T234854: Upgrade ELK Stack to version 7, as Resolved.
Mon, Apr 26, 3:21 PM · observability, Patch-For-Review, SRE, Wikimedia-Logstash
lmata closed T272016: Update saved / short links with objects in ELK7 as Resolved.

Closing; please reopen if unresolved.

Mon, Apr 26, 3:21 PM · observability, SRE, Wikimedia-Logstash
lmata moved T277516: Group DBPerformance logs by violated measure from Inbox to Radar on the observability board.
Mon, Apr 26, 3:19 PM · observability, MW-1.36-notes (1.36.0-wmf.38; 2021-04-06), Patch-For-Review, Performance-Team, Wikimedia-Logstash, Developer Productivity, Wikimedia-Rdbms
lmata moved T279046: Broken reportupdater queries: edit count bucket label contains illegal characters from Inbox to Radar on the observability board.
Mon, Apr 26, 3:19 PM · observability, Graphite, WMDE-TechWish-Sprint-2021-04-14, WMDE-TechWish-Sprint-2021-03-31, Analytics-Radar
lmata moved T279807: Unreliable message:"something" search since last Elastic upgrade from Inbox to Backlog on the observability board.
Mon, Apr 26, 3:18 PM · observability, Wikimedia-Logstash
lmata moved T280083: Pontoon: unable to provision role::puppetmaster::pontoon from Inbox to Backlog on the observability board.
Mon, Apr 26, 3:17 PM · observability
lmata moved T281039: Splunk On-Call doing something odd with routing some wmcs alerts from Inbox to Backlog on the observability board.
Mon, Apr 26, 3:16 PM · cloud-services-team (Kanban), observability
lmata moved T281048: mwlog1001 is running out of free space on /srv/mw-log from Inbox to In progress on the observability board.
Mon, Apr 26, 3:16 PM · Performance-Team, MediaWiki-Revision-backend, MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), observability, SRE
lmata moved T281095: Move paging for librenms from icinga to AM from Inbox to Backlog on the observability board.
Mon, Apr 26, 3:15 PM · Patch-For-Review, SRE, User-fgiunchedi, netops, observability

Tue, Apr 20

lmata added a comment to T141038: implement paging for non-ops teams.

Do we want to keep the same scope for Icinga, or consider our other paging tools?

Tue, Apr 20, 4:33 PM · observability, Icinga, SRE

Thu, Apr 15

lmata added a comment to T280242: Requesting access to graphite hosts for awight.

Looks good to me, approved, thanks @MoritzMuehlenhoff !

Thu, Apr 15, 12:44 PM · SRE, SRE-Access-Requests, Graphite, observability

Mar 25 2021

lmata moved T276697: Implement central logging for mailman3 from Radar to Inbox on the observability board.

Sure thing @Legoktm, will discuss with the team and share notes here.

Mar 25 2021, 12:57 PM · Patch-For-Review, observability, SRE, Wikimedia-Mailing-lists

Mar 23 2021

lmata added a comment to T240685: MediaWiki Prometheus support.

@AMooney yes please

Mar 23 2021, 2:49 PM · Platform Team Workboards (External Code Reviews), Patch-For-Review, serviceops, SRE, MediaWiki-General, observability

Mar 22 2021

lmata moved T276468: Unable to exclude "error" field in Logstash from Inbox to Backlog on the observability board.
Mar 22 2021, 3:35 PM · observability, Wikimedia-Logstash
lmata moved T276492: Notifications when prometheus daemons are wedged from Inbox to Radar on the observability board.
Mar 22 2021, 3:34 PM · observability, Discovery-Search
lmata added a comment to T276492: Notifications when prometheus daemons are wedged.

Hello @EBernhardson, moving to radar for now, please let us know how you'd like to proceed and if you need assistance. thanks!

Mar 22 2021, 3:34 PM · observability, Discovery-Search
lmata moved T276623: Convert udp2log init script to use systemd from Backlog to Radar on the observability board.
Mar 22 2021, 3:33 PM · Patch-For-Review, observability, SRE
lmata moved T276623: Convert udp2log init script to use systemd from Radar to Backlog on the observability board.
Mar 22 2021, 3:32 PM · Patch-For-Review, observability, SRE
lmata moved T276623: Convert udp2log init script to use systemd from Inbox to Radar on the observability board.
Mar 22 2021, 3:30 PM · Patch-For-Review, observability, SRE
lmata moved T277445: Hourly log rotation for large MW logs from Inbox to Backlog on the observability board.
Mar 22 2021, 3:29 PM · Developer Productivity, Platform Team Workboards (Clinic Duty Team), observability
lmata assigned T277445: Hourly log rotation for large MW logs to herron.
Mar 22 2021, 3:29 PM · Developer Productivity, Platform Team Workboards (Clinic Duty Team), observability
lmata added a comment to T228838: Consider enabling all MW log channels by default for WMF.

@thcipriani would it be helpful to set a time to chat about this further? I don't know if there is an immediate plan to move MW to ECS, but let's discuss the options available and see if there is a suitable path forward.

Mar 22 2021, 3:28 PM · Release-Engineering-Team (Radar), observability, Platform Engineering (Icebox), Developer Productivity, MediaWiki-Debug-Logger
lmata moved T277739: rsyslog-kubernetes missing in buster-wikimedia from Inbox to Radar on the observability board.
Mar 22 2021, 3:22 PM · SRE, observability
lmata added a comment to T277927: Add monitoring for performance.wikimedia.org.

hi @Legoktm let us (o11y) know if you need some help!

Mar 22 2021, 3:19 PM · observability, SRE, Performance-Team
lmata moved T277927: Add monitoring for performance.wikimedia.org from Inbox to Radar on the observability board.
Mar 22 2021, 3:19 PM · observability, SRE, Performance-Team

Mar 16 2021

lmata added a project to T240685: MediaWiki Prometheus support: Platform Team Workboards (Clinic Duty Team).

Hi @AMooney, I'd like to present this patch as the other of the two I was hoping to bring to your attention for next clinic duty... Please let me know if/how to proceed. thanks!

Mar 16 2021, 6:39 PM · Platform Team Workboards (External Code Reviews), Patch-For-Review, serviceops, SRE, MediaWiki-General, observability
lmata added a project to T269676: Mediawiki logging indexing conflict on 'status' for 'authevents': Platform Team Workboards (Clinic Duty Team).

Greetings @AMooney, this patch is one of the two I was hoping to bring to your attention for next clinic duty... This one is for some changes around logging and trying out the new clinic workflow regarding the "happy path" for these types of patches. Please let me know if/how to proceed. thanks!

Mar 16 2021, 6:35 PM · MW-1.36-notes, MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), Platform Team Workboards (External Code Reviews), Patch-For-Review, MW-1.35-notes, observability, MediaWiki-General

Mar 15 2021

lmata moved T276972: Set up cross DC topic mirroring for Kafka logging clusters from Radar to Backlog on the SRE board.
Mar 15 2021, 4:24 PM · Analytics-Radar, observability, SRE
lmata moved T276972: Set up cross DC topic mirroring for Kafka logging clusters from Inbox to Radar on the observability board.
Mar 15 2021, 4:24 PM · Analytics-Radar, observability, SRE
lmata moved T276972: Set up cross DC topic mirroring for Kafka logging clusters from Backlog to Radar on the SRE board.
Mar 15 2021, 4:23 PM · Analytics-Radar, observability, SRE
lmata triaged T277163: Prometheus PoPs disk space utilization as Medium priority.

Moving to short term backlog

Mar 15 2021, 4:20 PM · User-fgiunchedi, observability
lmata added a comment to T277445: Hourly log rotation for large MW logs.

hi @tstarling we can help, how would you like to proceed?

Mar 15 2021, 4:17 PM · Developer Productivity, Platform Team Workboards (Clinic Duty Team), observability

Mar 8 2021

lmata triaged T276303: logmsgbot auth issues as Medium priority.
Mar 8 2021, 4:32 PM · observability
lmata moved T276501: Pontoon enroll fails to complete from Inbox to In progress on the observability board.
Mar 8 2021, 4:22 PM · observability
lmata moved T276595: Upgrade prometheus-jmx-exporter from Inbox to In progress on the observability board.
Mar 8 2021, 4:22 PM · Analytics-Clusters, wdwb-tech, SRE, Wikidata, Wikidata-Query-Service, CirrusSearch, observability
lmata updated subscribers of T276623: Convert udp2log init script to use systemd.

@herron this might be worth looking into as part of the mwlog buster upgrade

Mar 8 2021, 4:19 PM · Patch-For-Review, observability, SRE
lmata moved T276697: Implement central logging for mailman3 from Inbox to Radar on the observability board.
Mar 8 2021, 4:17 PM · Patch-For-Review, observability, SRE, Wikimedia-Mailing-lists
lmata moved T276749: Flapping Prometheus metrics for netbox_device_statistics from Inbox to Radar on the observability board.
Mar 8 2021, 4:16 PM · observability, netbox
lmata moved T276792: Remove cloud contacts from legacy paging from Inbox to In progress on the observability board.
Mar 8 2021, 4:15 PM · cloud-services-team (Kanban), User-fgiunchedi, observability

Feb 22 2021

lmata created T275405: Logstash collector nodes hang indefinitely on reboot.
Feb 22 2021, 4:34 PM · Patch-For-Review, observability
lmata added a comment to T274987: Review and purge deprecated Graphite metrics for CodeMirror.

Hello @awight, could you let me know the level of assistance you'd like with this task, or if it's just here for information purposes? Thanks!

Feb 22 2021, 4:21 PM · WMDE-TechWish, observability, WMDE-Templates-FocusArea

Feb 16 2021

lmata added a comment to T273450: Purge and migrate deprecated metrics paths.

howdy @awight saw some chatter around this on the #wikimedia-sre-observability channel and am wondering if there is still input you would like from the team on this matter. Thanks!

Feb 16 2021, 4:33 PM · Epic, WMDE-TechWish (Sprint-2021-02-03), WMDE-Templates-FocusArea
lmata updated the image for observability from F34107832: profile to F34107835: profile.
Feb 16 2021, 3:52 PM
lmata updated the image for observability from F34107824: profile to F34107832: profile.
Feb 16 2021, 3:51 PM
lmata updated the image for observability from F34107816: profile to F34107824: profile.
Feb 16 2021, 3:49 PM
lmata updated the image for observability from F8447740: profile to F34107816: profile.
Feb 16 2021, 3:49 PM

Feb 12 2021

lmata moved T274665: Design and implement SLO Dashboard tooling from Inbox to In progress on the observability board.
Feb 12 2021, 6:11 PM · observability
lmata created T274665: Design and implement SLO Dashboard tooling.
Feb 12 2021, 4:25 PM · observability

Feb 2 2021

lmata created T273641: Security Issue Access Request for (lmata).
Feb 2 2021, 4:32 PM · SecTeam-Processed, Security-Team, Security

Feb 1 2021

lmata added a comment to T265876: Logging options for apache httpd in k8s.

noted @Joe! I'll reach out to you to coordinate a time to talk with the team.

Feb 1 2021, 5:44 PM · observability, SRE, serviceops, MW-on-K8s
lmata moved T265876: Logging options for apache httpd in k8s from Backlog to Inbox on the observability board.
Feb 1 2021, 4:16 PM · observability, SRE, serviceops, MW-on-K8s

Jan 25 2021

lmata closed T141520: "MediaWiki exceptions and fatals per minute" alarm is too slow (half an hour delay!) as Resolved.

3M delay seems like a short but acceptable window for alerting. If there is a need to shorten this, we can discuss. Closing this ticket; please reopen if you'd like to revisit the conversation.

Jan 25 2021, 4:50 PM · Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, observability
lmata moved T265876: Logging options for apache httpd in k8s from Inbox to Backlog on the observability board.
Jan 25 2021, 4:23 PM · observability, SRE, serviceops, MW-on-K8s
lmata added a comment to T271138: Some Observability clusters apparently do not support IPv6..

Is there a specific timeline you'd like us to meet with this? Mainly the goal is to understand urgency for prioritization. Thanks!

Jan 25 2021, 4:23 PM · IPv6, User-crusnov, observability, SRE-tools
lmata moved T271298: Add Icinga check for SRX cluster status from Inbox to Radar on the observability board.

Hi Arzhel,

Jan 25 2021, 4:20 PM · netops, observability, SRE
lmata moved T271822: Add support for scraping php applications to the kubernetes prometheus scraper from Inbox to Radar on the observability board.

Hi Joe,

Jan 25 2021, 4:17 PM · observability, MW-on-K8s, serviceops, SRE

Dec 14 2020

lmata added a project to T269937: Investigate how to aggregate Wikibase Timeout errors by their api-action or special page: observability.
Dec 14 2020, 4:23 PM · observability, Wikimedia-Logstash, Wikidata Infrastructure Reliability Sprint Dec 2020
lmata moved T269941: Investigate how to get data from logstash to Grafana for Timeout and Out of Memory errors from Inbox to Radar on the observability board.
Dec 14 2020, 4:21 PM · observability, Wikimedia-Logstash, Wikidata Infrastructure Reliability Sprint Dec 2020
lmata added a project to T269941: Investigate how to get data from logstash to Grafana for Timeout and Out of Memory errors: observability.
Dec 14 2020, 4:21 PM · observability, Wikimedia-Logstash, Wikidata Infrastructure Reliability Sprint Dec 2020

Dec 7 2020

lmata moved T269272: Sign-in links from Grafana dashboards don't work when not signed into SSO from Inbox to Backlog on the observability board.
Dec 7 2020, 4:21 PM · Patch-For-Review, User-fgiunchedi, Performance-Team (Radar), CAS-SSO, observability, SRE
lmata moved T269333: Switch default Grafana datasource to Thanos from Inbox to Backlog on the observability board.
Dec 7 2020, 4:20 PM · observability
lmata assigned T269560: Increased icinga check latency since 05/12 to colewhite.
Dec 7 2020, 4:16 PM · SRE, observability
lmata moved T269563: HP RAID failed on ms-be1054 didn't open a task from Inbox to Radar on the observability board.
Dec 7 2020, 4:13 PM · SRE, SRE-tools, observability

Nov 30 2020

lmata assigned T266570: Two close pages for idle workers api + appserver didn't auto-resolve on recovery to herron.
Nov 30 2020, 4:32 PM · observability, SRE
lmata closed T266800: VictorOps ~5min delay from email received to incident paging as Resolved.

Closing for now; we can reopen if we see another occurrence of this event.

Nov 30 2020, 4:30 PM · observability, SRE
lmata moved T268369: how to deal with cumin alias alerts from Inbox to Radar on the observability board.
Nov 30 2020, 4:27 PM · SRE-tools, observability, SRE
lmata moved T268806: ELK: uniquely identify network syslog from Inbox to Backlog on the observability board.
Nov 30 2020, 4:27 PM · observability
lmata moved T268995: Add alertmanager@ email user/alias or equivalent from Inbox to In progress on the observability board.
Nov 30 2020, 4:24 PM · User-fgiunchedi, observability
lmata moved T269000: thanos: 404 error trying to fetch js library from Inbox to Backlog on the observability board.
Nov 30 2020, 4:22 PM · SRE, observability

Nov 23 2020

lmata moved T268091: Capture usage metrics for Kibana saved objects from Inbox to Backlog on the observability board.
Nov 23 2020, 4:21 PM · observability
lmata moved T268233: thanos u/i gives errors if left idle for a few hours from Inbox to In progress on the observability board.
Nov 23 2020, 4:21 PM · CAS-SSO, observability, SRE
lmata moved T268282: Kibana deprecation warnings on startup from Radar to Backlog on the observability board.
Nov 23 2020, 4:20 PM · observability
lmata moved T268282: Kibana deprecation warnings on startup from Inbox to Radar on the observability board.
Nov 23 2020, 4:20 PM · observability
lmata moved T268355: cronspam from prometheus-directory-size (on labstore1004) from Inbox to Radar on the observability board.
Nov 23 2020, 4:19 PM · cloud-services-team (Kanban), observability, SRE
lmata moved T268355: cronspam from prometheus-directory-size (on labstore1004) from Backlog to Radar on the SRE board.
Nov 23 2020, 4:19 PM · cloud-services-team (Kanban), observability, SRE
lmata added a project to T268369: how to deal with cumin alias alerts: SRE-tools.
Nov 23 2020, 4:18 PM · SRE-tools, observability, SRE

Nov 16 2020

lmata moved T267901: SMART data dump healthy metric can contain None from Inbox to Backlog on the observability board.
Nov 16 2020, 4:21 PM · observability
lmata moved T267664: Enhance smart_data_dump to support gathering metrics from both raid and standalone disks from Inbox to Backlog on the observability board.
Nov 16 2020, 4:21 PM · observability
lmata moved T267660: Add ssacli support to smart_data_dump from Inbox to Backlog on the observability board.
Nov 16 2020, 4:20 PM · observability
lmata moved T267650: LibreNMS supports more than one Alertmanager address from Inbox to Backlog on the observability board.
Nov 16 2020, 4:20 PM · Upstream, User-fgiunchedi, observability
lmata moved T267645: Wrong redirect when logging into grafana-rw from a grafana.w.o dashboard from Inbox to In progress on the observability board.
Nov 16 2020, 4:19 PM · User-fgiunchedi, observability, SRE
lmata moved T265435: codfw: Testing Out Sample PDUs from Inbox to Radar on the observability board.
Nov 16 2020, 4:18 PM · User-fgiunchedi, observability, ops-codfw, DC-Ops, SRE

Nov 9 2020

lmata moved T267019: Alert design guidelines for teams are produced from Inbox to In progress on the observability board.
Nov 9 2020, 4:18 PM · observability