
MatthewVernon (Matthew Vernon)
User

User Details

User Since
Aug 2 2021, 1:52 PM (8 w, 7 h)
Availability
Available
LDAP User
MVernon
MediaWiki User
MVernon (WMF)

Recent Activity

Thu, Sep 23

MatthewVernon added a comment to T257056: Add alert for prometheus-mysql-exporter failing to scrape mysql.

AFAICT from https://wikitech.wikimedia.org/wiki/Alertmanager, alerts are (now?) meant to go into operations/alerts rather than into puppet directly. So I think we want a data-persistence routing for alerts, and then a suitable alert defined in operations/alerts (I've checked, and mysql_exporter_last_scrape_error is in prometheus).

Thu, Sep 23, 2:20 PM · Patch-For-Review, User-Kormat, DBA

Tue, Sep 21

MatthewVernon added a comment to T276961: Support Openstack Swift APIs via the radosgw.

[I was pointed at this task from IRC; I'm new in the data persistence team, and used to do quite a bit of Ceph at the Sanger]

Tue, Sep 21, 10:24 AM · cloud-services-team (Kanban), Data-Services, Cloud-VPS, User-Marostegui

Wed, Sep 15

MatthewVernon closed T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter as Resolved.

Marking this as resolved - as we deploy 10.4.21-2 everywhere, the fix will get rolled out.

Wed, Sep 15, 1:16 PM · Patch-For-Review, DBA
MatthewVernon added a comment to T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter.

I've also checked the stop/start/restart behaviour, which is as expected.
I've also checked that on reboot PME isn't started, but once you start mariadb it then gets started for you.

Wed, Sep 15, 12:29 PM · Patch-For-Review, DBA

Mon, Sep 13

MatthewVernon added a comment to T290881: Spontaneous reboot of ms-be2045.

Hi @Papaul, this system seems to have had one or more hardware faults, and is (just) still within its warranty. Could you get the hardware checked out, please? Thanks :)

Mon, Sep 13, 3:50 PM · SRE, ops-codfw, SRE-swift-storage
MatthewVernon assigned T290881: Spontaneous reboot of ms-be2045 to Papaul.
Mon, Sep 13, 3:49 PM · SRE, ops-codfw, SRE-swift-storage
MatthewVernon added a comment to T290881: Spontaneous reboot of ms-be2045.

On reboot, the disks came back, but many of the filesystems are unhappy:
mvernon@ms-be2045:~$ sudo dmesg | grep 'Shutting down filesystem'
[ 18.244602] XFS (sda3): Corruption of in-memory data detected. Shutting down filesystem
[ 18.724649] XFS (sdf1): Corruption of in-memory data detected. Shutting down filesystem
[ 19.448076] XFS (sdg1): Corruption of in-memory data detected. Shutting down filesystem
[ 20.610420] XFS (sdj1): I/O Error Detected. Shutting down filesystem
[ 20.745769] XFS (sdn1): I/O Error Detected. Shutting down filesystem
[ 20.938081] XFS (sdi1): I/O Error Detected. Shutting down filesystem
[ 23.719222] XFS (sdh1): I/O Error Detected. Shutting down filesystem
[ 24.802161] XFS (sde1): I/O Error Detected. Shutting down filesystem
[ 30.091057] XFS (sdm1): I/O Error Detected. Shutting down filesystem
[ 31.761276] XFS (sdc1): I/O Error Detected. Shutting down filesystem
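
To tally output like the above by failure type, a minimal Python sketch could group the affected devices by the reason XFS reported. The function name summarise_xfs_shutdowns and the regular expression are illustrative assumptions, based only on the line format shown here; other kernel versions may word these messages differently.

```python
import re
from collections import defaultdict

# Matches kernel lines of the form seen above, e.g.
# [ 18.244602] XFS (sda3): Corruption of in-memory data detected. Shutting down filesystem
# (pattern is an assumption derived from this output only)
XFS_SHUTDOWN = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+XFS \((?P<dev>[^)]+)\): "
    r"(?P<reason>.+?)\.\s+Shutting down filesystem"
)


def summarise_xfs_shutdowns(dmesg_text):
    """Group device names by the reason XFS gave for shutting down."""
    by_reason = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = XFS_SHUTDOWN.match(line.strip())
        if match:
            by_reason[match.group("reason")].append(match.group("dev"))
    # Sort device names so the summary is stable regardless of log order.
    return {reason: sorted(devs) for reason, devs in by_reason.items()}


# A few lines copied from the dmesg output above, used as sample input.
SAMPLE = """\
[ 18.244602] XFS (sda3): Corruption of in-memory data detected. Shutting down filesystem
[ 18.724649] XFS (sdf1): Corruption of in-memory data detected. Shutting down filesystem
[ 20.610420] XFS (sdj1): I/O Error Detected. Shutting down filesystem
"""

if __name__ == "__main__":
    for reason, devs in summarise_xfs_shutdowns(SAMPLE).items():
        print(f"{reason}: {len(devs)} filesystem(s): {', '.join(devs)}")
```

Fed the full grep output, this would separate the filesystems reporting in-memory corruption from those reporting I/O errors, which may help when describing the fault to whoever checks the hardware.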

Mon, Sep 13, 3:46 PM · SRE, ops-codfw, SRE-swift-storage
MatthewVernon updated subscribers of T290881: Spontaneous reboot of ms-be2045.
Mon, Sep 13, 3:31 PM · SRE, ops-codfw, SRE-swift-storage
MatthewVernon created T290881: Spontaneous reboot of ms-be2045.
Mon, Sep 13, 3:19 PM · SRE, ops-codfw, SRE-swift-storage

Tue, Sep 7

MatthewVernon added a comment to T289117: decommission pc2010.codfw.wmnet.

This host is ready for DC-Ops to decommission

Tue, Sep 7, 2:07 PM · SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon reassigned T289117: decommission pc2010.codfw.wmnet from MatthewVernon to Papaul.
Tue, Sep 7, 2:07 PM · SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon added a comment to T289116: decommission pc2009.codfw.wmnet.

This host is ready for DC-Ops to decommission

Tue, Sep 7, 1:24 PM · SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon reassigned T289116: decommission pc2009.codfw.wmnet from MatthewVernon to Papaul.
Tue, Sep 7, 1:24 PM · SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon added a comment to T289115: decommission pc2008.codfw.wmnet.

This host is ready for DC-Ops to decommission

Tue, Sep 7, 10:55 AM · Patch-For-Review, SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon reassigned T289115: decommission pc2008.codfw.wmnet from MatthewVernon to Papaul.
Tue, Sep 7, 10:55 AM · Patch-For-Review, SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon claimed T289117: decommission pc2010.codfw.wmnet.
Tue, Sep 7, 10:39 AM · SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon claimed T289116: decommission pc2009.codfw.wmnet.
Tue, Sep 7, 10:39 AM · SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon added a comment to T289122: decommission pc1010.eqiad.wmnet.

This host is ready for DC-Ops to decommission

Tue, Sep 7, 10:29 AM · Patch-For-Review, SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon reassigned T289122: decommission pc1010.eqiad.wmnet from MatthewVernon to wiki_willy.
Tue, Sep 7, 10:28 AM · Patch-For-Review, SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon claimed T289115: decommission pc2008.codfw.wmnet.
Tue, Sep 7, 10:24 AM · Patch-For-Review, SRE, ops-codfw, DC-Ops, decommission-hardware
MatthewVernon claimed T289122: decommission pc1010.eqiad.wmnet.
Tue, Sep 7, 9:55 AM · Patch-For-Review, SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon added a comment to T289120: decommission pc1009.eqiad.wmnet.

This host is ready for DC-Ops to decommission

Tue, Sep 7, 9:48 AM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon assigned T289120: decommission pc1009.eqiad.wmnet to wiki_willy.
Tue, Sep 7, 9:48 AM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon added a comment to T289119: decommission pc1008.eqiad.wmnet.

This host is ready for DC-Ops to decommission

Tue, Sep 7, 8:54 AM · Patch-For-Review, SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon reassigned T289119: decommission pc1008.eqiad.wmnet from MatthewVernon to wiki_willy.
Tue, Sep 7, 8:53 AM · Patch-For-Review, SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon added a comment to T289118: decommission pc1007.eqiad.wmnet..

This host is ready for DC-Ops to decommission

Tue, Sep 7, 8:46 AM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon claimed T289119: decommission pc1008.eqiad.wmnet.
Tue, Sep 7, 8:18 AM · Patch-For-Review, SRE, ops-eqiad, DC-Ops, decommission-hardware

Mon, Sep 6

MatthewVernon edited projects for T289118: decommission pc1007.eqiad.wmnet., added: DC-Ops, ops-eqiad; removed DBA.
Mon, Sep 6, 3:04 PM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon reassigned T289118: decommission pc1007.eqiad.wmnet. from MatthewVernon to wiki_willy.
Mon, Sep 6, 3:03 PM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon created P17227 (An Untitled Masterwork).
Mon, Sep 6, 2:55 PM
MatthewVernon claimed T289118: decommission pc1007.eqiad.wmnet..
Mon, Sep 6, 2:01 PM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon updated the task description for T289118: decommission pc1007.eqiad.wmnet..
Mon, Sep 6, 1:48 PM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MatthewVernon closed T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter as Resolved.
Mon, Sep 6, 1:38 PM · Patch-For-Review, DBA
MatthewVernon added a comment to T257056: Add alert for prometheus-mysql-exporter failing to scrape mysql.

I think this could be achieved by setting a Grafana alert on the MySQL Aggregated dashboard? But I don't really know much about how alerts are set up at WMF, or when one should use a Grafana alert vs an Icinga one, or...

Mon, Sep 6, 1:34 PM · Patch-For-Review, User-Kormat, DBA
MatthewVernon updated subscribers of T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter.

I think we concluded that the mariadb.target idea isn't all that useful (since, as I think @Kormat said, folk mostly don't stop and start more than one instance at once).

Mon, Sep 6, 1:28 PM · Patch-For-Review, DBA
MatthewVernon added a comment to T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter.

I know it is not totally related to this task, but maybe this can also be looked at as part of this? T257056: Add alert for prometheus-mysql-exporter failing to scrape mysql

Mon, Sep 6, 1:25 PM · Patch-For-Review, DBA
MatthewVernon updated the task description for T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter.
Mon, Sep 6, 1:15 PM · Patch-For-Review, DBA

Aug 24 2021

MatthewVernon edited P17066 In which I break puppet-lint.
Aug 24 2021, 10:50 AM
MatthewVernon edited P17066 In which I break puppet-lint.
Aug 24 2021, 10:49 AM
MatthewVernon created P17066 In which I break puppet-lint.
Aug 24 2021, 10:48 AM

Aug 23 2021

MatthewVernon updated the task description for T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter.
Aug 23 2021, 12:50 PM · Patch-For-Review, DBA
MatthewVernon added a comment to T252761: Research performance changes on prometheus-mysqld-exporter after buster/mariadb upgrade.

I've split the systemd bits into a separate task - T289488

Aug 23 2021, 12:38 PM · DBA
MatthewVernon created T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter.
Aug 23 2021, 12:30 PM · Patch-For-Review, DBA
MatthewVernon added a comment to P17060 (An Untitled Masterwork).
-    ensure_packages('prometheus-mysqld-exporter', {'notify' => "Exec['systemctl try-restart prometheus-mysqld-exporter']"})
+    ensure_packages('prometheus-mysqld-exporter', {'notify' => Exec['systemctl try-restart prometheus-mysqld-exporter']})
Aug 23 2021, 10:11 AM
MatthewVernon created P17060 (An Untitled Masterwork).
Aug 23 2021, 9:51 AM

Aug 17 2021

MatthewVernon updated the task description for T288244: Upgrade s7 to Debian Buster and MariaDB 10.4.
Aug 17 2021, 1:30 PM · Patch-For-Review, DBA

Aug 9 2021

MatthewVernon added a comment to T252761: Research performance changes on prometheus-mysqld-exporter after buster/mariadb upgrade.

Another thought here - if we want the exporter to be automatically restarted if mysqld is restarted, then we should be able to get systemd to do this for us.

Aug 9 2021, 4:27 PM · DBA

Aug 6 2021

MatthewVernon created T288350: Add Matthew Vernon (@mcv21) to Wikimedia github.
Aug 6 2021, 2:23 PM · Wikimedia-GitHub

Aug 4 2021

MatthewVernon created T288122: New VictorOps user request.
Aug 4 2021, 4:11 PM · SRE Observability (FY2021/2022-Q1)
MatthewVernon created T288038: Add Matthew Vernon to security@wikimedia.org.
Aug 4 2021, 9:10 AM · SecTeam-Processed, Security-Team