
ABran-WMF (arnaudb)
User

User Details

User Since
Aug 29 2023, 8:30 AM (43 w, 16 h)
Availability
Available
LDAP User
Arnaudb
MediaWiki User
ABran-WMF [ Global Accounts ]

Recent Activity

Yesterday

ABran-WMF updated the task description for T368401: Switchover es6 master (es1037 -> es1038).
Tue, Jun 25, 1:24 PM · Patch-For-Review, DBA
ABran-WMF updated the task description for T365998: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad .
Tue, Jun 25, 9:23 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF updated the task description for T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad .
Tue, Jun 25, 9:21 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF updated the task description for T365996: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f1-eqiad .
Tue, Jun 25, 9:20 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF updated the task description for T365995: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad.
Tue, Jun 25, 9:17 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF closed T368020: Switchover es7 master (es1035 -> es1039) as Resolved.
Tue, Jun 25, 7:05 AM · DBA
ABran-WMF moved T368020: Switchover es7 master (es1035 -> es1039) from In progress to Done on the DBA board.
Tue, Jun 25, 7:04 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Tue, Jun 25, 6:41 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Tue, Jun 25, 6:38 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Tue, Jun 25, 6:37 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Tue, Jun 25, 6:34 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Tue, Jun 25, 6:33 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Tue, Jun 25, 6:27 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Tue, Jun 25, 6:25 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Tue, Jun 25, 6:05 AM · DBA

Mon, Jun 24

ABran-WMF triaged T367279: Migrate mysql icinga alerts to alert manager - seconds_behind_master + threads (replication/io) as Medium priority.
Mon, Jun 24, 2:49 PM · DBA
ABran-WMF triaged T367280: Migrate mysql icinga alerts to alert manager - memory pressure as Medium priority.
Mon, Jun 24, 2:49 PM · Patch-For-Review, DBA
ABran-WMF triaged T367282: Migrate mysql icinga alerts to alert manager - read only status as Medium priority.
Mon, Jun 24, 2:48 PM · DBA
ABran-WMF triaged T367283: Migrate mysql icinga alerts to alert manager - process monitoring as Medium priority.
Mon, Jun 24, 2:48 PM · DBA
ABran-WMF triaged T367284: Migrate mysql icinga alerts to alert manager - mariadb errors as Medium priority.
Mon, Jun 24, 2:48 PM · DBA
ABran-WMF triaged T368020: Switchover es7 master (es1035 -> es1039) as High priority.
Mon, Jun 24, 2:48 PM · DBA
ABran-WMF triaged T367281: Migrate mysql icinga alerts to alert manager - disk pressure as Medium priority.
Mon, Jun 24, 2:48 PM · Patch-For-Review, DBA
ABran-WMF triaged T367781: Drop deprecated abuse filter fields on wmf wikis as Medium priority.
Mon, Jun 24, 2:47 PM · Data-Engineering, Schema-change-in-production, DBA, Data Products
ABran-WMF updated the task description for T367280: Migrate mysql icinga alerts to alert manager - memory pressure.
Mon, Jun 24, 1:55 PM · Patch-For-Review, DBA

Fri, Jun 21

ABran-WMF changed the status of T368098: Dumps generation without prefetch cause disruption to the production environment from Open to In Progress.
Fri, Jun 21, 12:16 PM · Patch-For-Review, Data Products (Data Products Sprint 15), Data-Engineering, Dumps-Generation, SRE
ABran-WMF placed T368066: Prepare and check storage layer for btmwiki up for grabs.

ah indeed, my bad

Fri, Jun 21, 9:56 AM · Data-Services, DBA
ABran-WMF closed T368066: Prepare and check storage layer for btmwiki, a subtask of T368038: Create Wikipedia Mandailing, as Resolved.
Fri, Jun 21, 9:50 AM · Patch-For-Review, MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Wiki-Setup (Create)
ABran-WMF closed T368066: Prepare and check storage layer for btmwiki as Resolved.

private data has been sanitized
view database has been created with the proper accounting

Fri, Jun 21, 9:50 AM · Data-Services, DBA
ABran-WMF added a comment to T368067: Post-creation work for btmwiki.

private data has been trimmed, btmwiki_p database created with labsdb grants

Fri, Jun 21, 9:46 AM · Countervandalism-Network, Content-Transform-Team, Wiki-Setup
ABran-WMF added a comment to T368098: Dumps generation without prefetch cause disruption to the production environment.

I depooled the host by reflex; it's currently repooling.

Fri, Jun 21, 7:15 AM · Patch-For-Review, Data Products (Data Products Sprint 15), Data-Engineering, Dumps-Generation, SRE

Thu, Jun 20

ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

fyi https://gerrit.wikimedia.org/r/1048006 and https://gerrit.wikimedia.org/r/1047983 are bound together, related to pt-heartbeat monitoring

Thu, Jun 20, 2:00 PM · Patch-For-Review, DBA
ABran-WMF added a comment to T367854: db1165 network flapping issues.

server is repooling

Thu, Jun 20, 1:48 PM · SRE, ops-eqiad, DC-Ops, DBA
ABran-WMF updated the task description for T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.
Thu, Jun 20, 1:25 PM · Patch-For-Review, DBA
ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

I completely agree with you; we'll take care to avoid such regressions and approximations during the migration! The good news is that we can iterate on and compose our alert thresholds as much as we need before declaring this migration "done".
The first patch I've sent:

Thu, Jun 20, 12:23 PM · Patch-For-Review, DBA
ABran-WMF updated the task description for T365986: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e5-eqiad.
Thu, Jun 20, 8:30 AM · Data-Platform-SRE, SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF updated the task description for T365987: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e6-eqiad.
Thu, Jun 20, 8:30 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF updated the task description for T348977: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2.
Thu, Jun 20, 8:30 AM · Infrastructure-Foundations, netops, SRE
ABran-WMF added a comment to T365986: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e5-eqiad.

this task's scheduling is swapped with T365987

Thu, Jun 20, 8:30 AM · Data-Platform-SRE, SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF added a comment to T365987: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e6-eqiad.

this task's scheduling is swapped with T365986

Thu, Jun 20, 8:29 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Thu, Jun 20, 8:18 AM · DBA
ABran-WMF changed the status of T368020: Switchover es7 master (es1035 -> es1039) from Open to In Progress.
Thu, Jun 20, 8:16 AM · DBA
ABran-WMF added a comment to T368020: Switchover es7 master (es1035 -> es1039).

This will be run from cumin2002 as 1002 has to be rebooted soon.

Thu, Jun 20, 8:15 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Thu, Jun 20, 8:14 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Thu, Jun 20, 8:08 AM · DBA
ABran-WMF updated the task description for T368020: Switchover es7 master (es1035 -> es1039).
Thu, Jun 20, 8:07 AM · DBA
ABran-WMF updated the task description for T367284: Migrate mysql icinga alerts to alert manager - mariadb errors.
Thu, Jun 20, 7:42 AM · DBA
ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

Thanks @Ladsgroup @jcrespo for those considerations. They help a lot in defining alerting thresholds. I was unaware of T253120 and T252952 in that context. I find it relevant to first test a more vanilla approach then.

Thu, Jun 20, 7:34 AM · Patch-For-Review, DBA

Wed, Jun 19

ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

Here is the implementation then

Wed, Jun 19, 1:14 PM · Patch-For-Review, DBA
ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

@jcrespo Thank you for the clarification, I clearly see the point you were making! Indeed, I was missing the metric aggregation part. I think the best angle will then be to enable this probe on the exporter and also implement the query as we have it in check_mariadb.pl. @Ladsgroup @Marostegui feel free to challenge this idea as well.

Wed, Jun 19, 12:29 PM · Patch-For-Review, DBA
ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

Correct me if I'm wrong, but the heartbeat updates are coming from this script, which is called by that service.
So that would not change at all (at least not in this iteration, nor in that group of tasks 😄), and the pt-heartbeat ←→ mediawiki relationship would stay 100% the same. What I'm aiming at here is the way we alert on those metrics: if you check mysqld-exporter's code, it does not update that ts at all, as that's not its job. I think there was some confusion around my intentions in that comment; I hope I'm a bit clearer now.

Wed, Jun 19, 9:49 AM · Patch-For-Review, DBA
ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

so there were 2 rows being written at the same time from both masters.

If you check that screengrab of the query, afaict we're seeing the same info added to the metric, but from the config standpoint. That's why I was a bit perplexed!

Wed, Jun 19, 9:24 AM · Patch-For-Review, DBA
ABran-WMF changed the status of T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding from Open to In Progress.

@Ladsgroup had a neat suggestion: that we just try the current exporter. I think it'll save a few customizations (if not all) down the line: https://grafana.wikimedia.org/goto/-AH4B_8SR?orgId=1 here is pt-heartbeat as seen from the exporter's point of view. I don't see a clear difference with the current icinga/perl implementation. We could maybe add the section label directly through the monitoring config script to keep the exporter's config as generic as possible.

Wed, Jun 19, 9:14 AM · Patch-For-Review, DBA
ABran-WMF changed the status of T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding, a subtask of T315866: Migrate mysql icinga alerts to alert manager, from Open to In Progress.
Wed, Jun 19, 9:12 AM · Patch-For-Review, DBA

Tue, Jun 18

ABran-WMF updated the task description for T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad.
Tue, Jun 18, 12:14 PM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF changed the status of T367854: db1165 network flapping issues from Open to In Progress.
Tue, Jun 18, 8:58 AM · SRE, ops-eqiad, DC-Ops, DBA
ABran-WMF created T367854: db1165 network flapping issues.
Tue, Jun 18, 8:57 AM · SRE, ops-eqiad, DC-Ops, DBA
ABran-WMF updated the task description for T365994: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad.
Tue, Jun 18, 8:12 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF updated the task description for T365993: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e1-eqiad.
Tue, Jun 18, 8:09 AM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
ABran-WMF added a project to T157702: Followup for TLS MariaDB server roll-out: Data-Persistence-Backup.
Tue, Jun 18, 7:02 AM · Data-Persistence-Backup, observability, DBA
ABran-WMF added projects to T151583: Use tls for dump backup generation: Data-Persistence-Backup, database-backups.
Tue, Jun 18, 7:01 AM · database-backups, Data-Persistence-Backup, DBA
ABran-WMF moved T367833: Update grants for mailman from Triage to Refine on the DBA board.
Tue, Jun 18, 6:43 AM · Patch-For-Review, DBA, collaboration-services, SRE
ABran-WMF moved T367781: Drop deprecated abuse filter fields on wmf wikis from Triage to Ready on the DBA board.
Tue, Jun 18, 6:43 AM · Data-Engineering, Schema-change-in-production, DBA, Data Products

Mon, Jun 17

ABran-WMF assigned T367495: Apply schema change to add type column on GlobalRenameQueue table to the live databases to Ladsgroup.
Mon, Jun 17, 12:55 PM · Schema-change-in-production, DBA, MediaWiki-Platform-Team (Radar), Account-Vanishing, Data-Engineering, Data Products, MediaWiki-extensions-CentralAuth
ABran-WMF added a comment to P65105 Codfw media backup status.

Taking note of that paste, thanks! :)

Mon, Jun 17, 12:35 PM · media-backups
ABran-WMF moved T367632: Drop ipblocks in production from Triage to Ready on the DBA board.
Mon, Jun 17, 6:13 AM · DBA

Fri, Jun 14

ABran-WMF updated the task description for T367496: MySQL_legacy Spicerack - fixes.
Fri, Jun 14, 1:48 PM · Patch-For-Review, DBA
ABran-WMF updated the task description for T367496: MySQL_legacy Spicerack - fixes.
Fri, Jun 14, 12:58 PM · Patch-For-Review, DBA
ABran-WMF updated the task description for T367496: MySQL_legacy Spicerack - fixes.
Fri, Jun 14, 12:33 PM · Patch-For-Review, DBA
ABran-WMF updated the task description for T367496: MySQL_legacy Spicerack - fixes.
Fri, Jun 14, 12:33 PM · Patch-For-Review, DBA
ABran-WMF triaged T367496: MySQL_legacy Spicerack - fixes as Medium priority.
Fri, Jun 14, 12:15 PM · Patch-For-Review, DBA
ABran-WMF updated the task description for T367496: MySQL_legacy Spicerack - fixes.
Fri, Jun 14, 11:43 AM · Patch-For-Review, DBA
ABran-WMF changed the status of T367496: MySQL_legacy Spicerack - fixes from Open to In Progress.
Fri, Jun 14, 9:56 AM · Patch-For-Review, DBA
ABran-WMF moved T367496: MySQL_legacy Spicerack - fixes from Triage to In progress on the DBA board.
Fri, Jun 14, 9:55 AM · Patch-For-Review, DBA
ABran-WMF created T367496: MySQL_legacy Spicerack - fixes.
Fri, Jun 14, 9:54 AM · Patch-For-Review, DBA

Thu, Jun 13

ABran-WMF moved T367059: Error connecting to {db_server} as user {db_user}: {error} from Triage to Refine on the DBA board.
Thu, Jun 13, 11:17 AM · MediaWiki-Platform-Team (Radar), DBA, Wikimedia-production-error
ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

will suggest a hierarchy and ask for validation @jcrespo @Marostegui @Ladsgroup → let's try to keep a good signal/noise ratio

Thu, Jun 13, 8:14 AM · Patch-For-Review, DBA
ABran-WMF added a comment to T367261: Rebuild recentchanges table everywhere.

this error popped up today:

10:05:14 <+icinga-wm_> PROBLEM - MariaDB Replica SQL: s2 on db2125 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table recentchanges is corrupt: try to repair it on query. Default database: cswiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
Thu, Jun 13, 8:09 AM · DBA

Wed, Jun 12

ABran-WMF added a comment to T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.

Scaffolding started here: https://gitlab.wikimedia.org/repos/sre/wmf-mariadb-exporter

Wed, Jun 12, 2:51 PM · Patch-For-Review, DBA
ABran-WMF moved T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding from Triage to In progress on the DBA board.
Wed, Jun 12, 12:24 PM · Patch-For-Review, DBA
ABran-WMF moved T367279: Migrate mysql icinga alerts to alert manager - seconds_behind_master + threads (replication/io) from Triage to Ready on the DBA board.
Wed, Jun 12, 12:24 PM · DBA
ABran-WMF moved T367280: Migrate mysql icinga alerts to alert manager - memory pressure from Triage to Ready on the DBA board.
Wed, Jun 12, 12:24 PM · Patch-For-Review, DBA
ABran-WMF moved T367281: Migrate mysql icinga alerts to alert manager - disk pressure from Triage to Ready on the DBA board.
Wed, Jun 12, 12:24 PM · Patch-For-Review, DBA
ABran-WMF moved T367282: Migrate mysql icinga alerts to alert manager - read only status from Triage to Ready on the DBA board.
Wed, Jun 12, 12:24 PM · DBA
ABran-WMF moved T367283: Migrate mysql icinga alerts to alert manager - process monitoring from Triage to Ready on the DBA board.
Wed, Jun 12, 12:24 PM · DBA
ABran-WMF moved T367284: Migrate mysql icinga alerts to alert manager - mariadb errors from Triage to Ready on the DBA board.
Wed, Jun 12, 12:24 PM · DBA
ABran-WMF updated the task description for T315866: Migrate mysql icinga alerts to alert manager.
Wed, Jun 12, 10:51 AM · Patch-For-Review, DBA
ABran-WMF created T367284: Migrate mysql icinga alerts to alert manager - mariadb errors.
Wed, Jun 12, 10:50 AM · DBA
ABran-WMF created T367283: Migrate mysql icinga alerts to alert manager - process monitoring.
Wed, Jun 12, 10:49 AM · DBA
ABran-WMF created T367282: Migrate mysql icinga alerts to alert manager - read only status.
Wed, Jun 12, 10:49 AM · DBA
ABran-WMF created T367281: Migrate mysql icinga alerts to alert manager - disk pressure.
Wed, Jun 12, 10:48 AM · Patch-For-Review, DBA
ABran-WMF created T367280: Migrate mysql icinga alerts to alert manager - memory pressure.
Wed, Jun 12, 10:48 AM · Patch-For-Review, DBA
ABran-WMF created T367279: Migrate mysql icinga alerts to alert manager - seconds_behind_master + threads (replication/io).
Wed, Jun 12, 10:47 AM · DBA
ABran-WMF created T367278: Migrate mysql icinga alerts to alert manager - pt-heartbeat + scaffolding.
Wed, Jun 12, 10:46 AM · Patch-For-Review, DBA
ABran-WMF updated the task description for T315866: Migrate mysql icinga alerts to alert manager.
Wed, Jun 12, 10:46 AM · Patch-For-Review, DBA
ABran-WMF closed T367277: MariaDB monitoring transition out of icinga as Declined.

dupes T315866

Wed, Jun 12, 10:45 AM · observability, DBA
ABran-WMF moved T367277: MariaDB monitoring transition out of icinga from Triage to In progress on the DBA board.
Wed, Jun 12, 10:42 AM · observability, DBA
ABran-WMF created T367277: MariaDB monitoring transition out of icinga.
Wed, Jun 12, 10:42 AM · observability, DBA
ABran-WMF added a comment to T367162: db1240.s3 index issues.

my pleasure!
As for the time: indeed, but in my timezone, so please adjust to the proper timestamp if you want to take that as a reference. It's a quick copy/paste from IRC to make sure this was not forgotten.

Wed, Jun 12, 8:45 AM · Patch-For-Review, Data-Persistence-Backup
ABran-WMF moved T366146: Create a sanitarium redaction cookbook from Ready to Blocked on the DBA board.
Wed, Jun 12, 8:17 AM · DBA
ABran-WMF moved T363665: Create a cookbook to restart mariadb on all sanitarium hosts from In progress to Blocked on the DBA board.
Wed, Jun 12, 8:17 AM · DBA