ABran-WMF (arnaudb)
SRE

User Details

User Since
Aug 29 2023, 8:30 AM (76 w, 4 d)
Availability
Available
IRC Nick
arnaudb
LDAP User
Arnaudb
MediaWiki User
ABran-WMF

Recent Activity

Tue, Feb 11

ABran-WMF updated the task description for T382159: Test Matrix.
Tue, Feb 11, 7:17 AM · ERC

Thu, Feb 6

Dzahn awarded T385777: moscovium and RT sunset a Like token.
Thu, Feb 6, 5:43 PM · Patch-For-Review, collaboration-services
ABran-WMF lowered the priority of T385777: moscovium and RT sunset from Medium to Low.
Thu, Feb 6, 9:52 AM · Patch-For-Review, collaboration-services
ABran-WMF updated the task description for T384595: Upgrade Collab hosts to Bookworm.
Thu, Feb 6, 9:52 AM · Patch-For-Review, collaboration-services
ABran-WMF added a comment to T384595: Upgrade Collab hosts to Bookworm.

We should have used a subtask for "decom RT", as expected there will be more patches than it might seem and this isn't an upgrade.

Good idea! T385777 has been created

Thu, Feb 6, 9:51 AM · Patch-For-Review, collaboration-services
ABran-WMF changed the status of T385777: moscovium and RT sunset, a subtask of T384595: Upgrade Collab hosts to Bookworm, from Open to In Progress.
Thu, Feb 6, 9:50 AM · Patch-For-Review, collaboration-services
ABran-WMF changed the status of T385777: moscovium and RT sunset from Open to In Progress.
Thu, Feb 6, 9:50 AM · Patch-For-Review, collaboration-services
ABran-WMF created T385777: moscovium and RT sunset.
Thu, Feb 6, 9:50 AM · Patch-For-Review, collaboration-services
ABran-WMF updated the task description for T385042: Generic NAT Handling Solution with nftables.
Thu, Feb 6, 8:08 AM · User-aborrero, collaboration-services

Tue, Feb 4

Jelto awarded T384242: Create a Grafana dashboard for Mailman a Like token.
Tue, Feb 4, 1:43 PM · collaboration-services
ABran-WMF closed T384242: Create a Grafana dashboard for Mailman as Resolved.
  • node_files_total was used to list all lists.* instances available on thanos, but the metric was apparently absent from lists2001. I replaced it with up, which properly fetches the complete list of instances.
  • host overview dashboard link has also been added
Tue, Feb 4, 8:53 AM · collaboration-services

Mon, Feb 3

ABran-WMF changed the status of T384242: Create a Grafana dashboard for Mailman from Open to In Progress.

I've added more metrics to the original dashboard:

Mon, Feb 3, 10:16 AM · collaboration-services
ABran-WMF added a comment to T385395: 503 error when edit large size pages on PHP 8.1.

503 errors are visible here: https://w.wiki/Cvfc

Mon, Feb 3, 9:37 AM · Upstream, serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error
ABran-WMF updated subscribers of T385271: Excempt researcher from hyperkitty monthly export.
Mon, Feb 3, 9:05 AM · collaboration-services, SRE, Wikimedia-Mailing-lists
ABran-WMF moved T385395: 503 error when edit large size pages on PHP 8.1 from Untriaged to Feb 2025 on the Wikimedia-production-error board.
Mon, Feb 3, 9:02 AM · Upstream, serviceops, MediaWiki-Engineering, Traffic, Wikimedia-production-error

Thu, Jan 30

ABran-WMF added a comment to T385042: Generic NAT Handling Solution with nftables.
  • Software like docker usually wants to install its own rules, in particular for NAT. Is the idea to disable this rule generation by docker and rely only on puppet-deployed config?

I think it mostly depends on the use case. For CI, a simple job that only downloads resources from the Internet would only need the masquerade rule. Gitlab runners are creating their own docker network (we can even ask for a per-job network if we want to get up to that isolation level on CI). It also would require a dedicated per-network masquerade rule. We can toggle the feature if needed. Managing containers that need to expose a port would be a bit tedious as the port would need to be mapped via puppet config on the container's IP:port. OTOH I'm not aware of docker containers outside of kubernetes and CI jobs. Another solution could be to wait for our runners to natively support nftables, to avoid the toil of manually working around compatibility while the feature is being developed.

  • currently, the base nftables firewall (abstracted via the firewall puppet profile) installs a base table, which is mostly for filtering traffic. Maybe we can consider having all NAT stuff on a different table, which should be easier to understand, and cleaner to integrate via puppet.
    • This is what we do with cloudgw anyway. We install the base nftables config from the firewall puppet profile AND a separate cloudgw table with the NAT-specific stuff.

I reused part of the way you mentioned preparing the configuration for CI runners in T370677; it duplicates a bit of code that we could merge with a more generic approach.

  • we could even rename or refactor the current cloudgw implementation into a generic nftables-NAT profile, as the basis for further development anyway

I'd be happy to help on that effort!

Thu, Jan 30, 1:47 PM · User-aborrero, collaboration-services

Wed, Jan 29

ABran-WMF updated subscribers of T370677: migrate all sre-collab services to nftables.

Thanks for the review @Jelto ! After discussing with you and @MoritzMuehlenhoff on IRC, it appeared that T385042 was needed to take the most generic approach to the situation.
We will be aiming to avoid using iptables-nft and stick to managing the rules ourselves. It should not be an issue for any CI job that doesn't need to bind its port to the Internet.

Wed, Jan 29, 3:13 PM · Patch-For-Review, collaboration-services
ABran-WMF updated the task description for T385042: Generic NAT Handling Solution with nftables.
Wed, Jan 29, 2:46 PM · User-aborrero, collaboration-services
ABran-WMF updated the task description for T385042: Generic NAT Handling Solution with nftables.
Wed, Jan 29, 2:46 PM · User-aborrero, collaboration-services
ABran-WMF added a subtask for T370677: migrate all sre-collab services to nftables: T385042: Generic NAT Handling Solution with nftables.
Wed, Jan 29, 2:40 PM · Patch-For-Review, collaboration-services
ABran-WMF added a parent task for T385042: Generic NAT Handling Solution with nftables: T370677: migrate all sre-collab services to nftables.
Wed, Jan 29, 2:40 PM · User-aborrero, collaboration-services
ABran-WMF created T385042: Generic NAT Handling Solution with nftables.
Wed, Jan 29, 2:40 PM · User-aborrero, collaboration-services
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

@ABran-WMF Did we get any useful logs?

Wed, Jan 29, 7:35 AM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT

Mon, Jan 27

ABran-WMF moved T384242: Create a Grafana dashboard for Mailman from Incoming to Work in Progress on the collaboration-services board.

Panel was returning NaN because mailman_smtp_duration_seconds and mailman_smtp_total were both always at 0, so the initial query was dividing by 0. Using clamp_min in the query limits the minimum to 1.
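
As a rough illustration of the fix described above (the actual dashboard query is not shown here), the sketch below mimics in Python what PromQL's clamp_min does. It assumes the query divides the duration metric by the total and that the clamp is applied to the divisor; the function names and sample values are hypothetical.

```python
import math


def clamp_min(value: float, minimum: float) -> float:
    """Mimic PromQL's clamp_min: never return less than `minimum`."""
    return max(value, minimum)


def promql_div(numerator: float, denominator: float) -> float:
    """Mimic PromQL float division, where 0/0 yields NaN instead of raising."""
    if denominator == 0:
        if numerator == 0:
            return math.nan
        return math.copysign(math.inf, numerator)
    return numerator / denominator


def average_duration(duration_seconds: float, total: float) -> float:
    """Divide by the total, clamped so the divisor is at least 1 (assumed fix)."""
    return promql_div(duration_seconds, clamp_min(total, 1))


if __name__ == "__main__":
    print(promql_div(0.0, 0.0))        # unclamped: both series at 0
    print(average_duration(0.0, 0.0))  # clamped divisor
```

With both inputs at 0, the unclamped division produces NaN while the clamped version produces 0, which is the behaviour a panel needs in order to render.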

Mon, Jan 27, 1:47 PM · collaboration-services
ABran-WMF claimed T384242: Create a Grafana dashboard for Mailman.

I've started to update Mailman 3

Mon, Jan 27, 1:18 PM · collaboration-services

Fri, Jan 24

ABran-WMF added a watcher for collaboration-services: ABran-WMF.
Fri, Jan 24, 11:36 AM
ABran-WMF removed a member for DBA: ABran-WMF.
Fri, Jan 24, 11:35 AM
ABran-WMF removed a member for Data-Persistence-Automations: ABran-WMF.
Fri, Jan 24, 11:34 AM
ABran-WMF added a member for collaboration-services: ABran-WMF.
Fri, Jan 24, 11:33 AM
ABran-WMF closed T384676: SystemdUnitFailed as Resolved.
Fri, Jan 24, 10:23 AM · collaboration-services

Jan 16 2025

ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

After merging the CR, running puppet, and trying again to download the file, this was not enough to fix the issue. I've asked Traffic about Varnish and they told me it was probably not our culprit.

Jan 16 2025, 4:19 PM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

Also, I deployed MinT successfully on 07 Nov 24, and it didn't appear to log any failure like this time.

It seems that the issue could come from Varnish terminating the transfer after a timeout is reached. Maybe the transfer took less time in that previous deployment? The question has been asked to Traffic on IRC.

Jan 16 2025, 3:14 PM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

I've temporarily configured the vhost to be more "download friendly":

EnableSendfile Off
EnableMMAP Off
Timeout 600
KeepAliveTimeout 15
LimitRequestBody 0

I also tried using only LimitRequestBody 0, which seemed to be the issue from my previous tests, with no more success.
In both cases I had no success retrieving the file, and debug logging told me nothing more than the http/200 that we previously identified. Will keep digging.

Jan 16 2025, 12:30 PM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

I think this is due to the file size; I've tested with a generated empty 2GB file, which has the same issue around the same offset (~1.80GB). This file is dated Mar 6 2023, so I'm guessing that something has changed in the server's configuration since then. I have found nothing yet; I'll keep on digging.
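
For anyone wanting to reproduce this kind of test, a minimal sketch of generating an empty file of a given size follows. It is not the command that was actually used (that is not recorded here); the helper name, file name, and the choice of 2 * 1000**3 bytes for "2GB" are assumptions.

```python
import os
import tempfile


def make_empty_file(path: str, size_bytes: int) -> int:
    """Create a zero-filled file of size_bytes and return the size the filesystem reports."""
    with open(path, "wb") as handle:
        # truncate() extends the file with zeros; many filesystems store this sparsely
        handle.truncate(size_bytes)
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "empty-2g.bin")  # hypothetical file name
        print(make_empty_file(target, 2 * 1000**3))
```

Serving such a file from the same vhost would show whether the failure depends on the size alone rather than on the content of model.bin.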

Jan 16 2025, 12:06 PM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

Oddly, compressing the file fixes the issue, while a duplicate of the file reproduces the issue consistently:

$ wget --verbose --output-document /dev/null https://people.wikimedia.org/~santhosh/nllb/nllb200-600M/model.bin
HTTP response 200  [https://people.wikimedia.org/~santhosh/nllb/nllb200-600M/model.bin]
/dev/null  79% [=========================>      ]  1.81G  28.22MB/s
               [Files: 1  Bytes: 1.81G [27.90MB/s] Redirects: 0  Todo: 0  Errors: 0]
Jan 16 2025, 10:22 AM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

multipart download retrieves the file properly: P72108
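
The contents of P72108 are not reproduced here, so the following is only a sketch of the general idea behind a multipart download: splitting the file into HTTP Range header values that can be fetched separately. The function name and the part size are hypothetical, and it assumes the 2464511302 value in the access-log line for this file is its size in bytes.

```python
def byte_ranges(total_size: int, part_size: int) -> list[str]:
    """Return HTTP Range header values that cover total_size bytes in part_size chunks."""
    if total_size <= 0 or part_size <= 0:
        raise ValueError("total_size and part_size must be positive")
    ranges = []
    for start in range(0, total_size, part_size):
        # Range end offsets are inclusive, hence the -1
        end = min(start + part_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


if __name__ == "__main__":
    # Assumed file size, 512 MiB parts (arbitrary choice)
    for header in byte_ranges(2464511302, 512 * 1024 * 1024):
        print(header)
```

Each part stays well below the ~1.80GB offset where the single-stream transfer was cut, which would be consistent with the multipart download succeeding.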

Jan 16 2025, 9:58 AM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

I tried to downgrade to http/1.1 and added a bit more verbosity: P72105

Jan 16 2025, 9:51 AM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

curl gives a bit more information about the download termination, apache still sends http/200:

curl: (92) HTTP/2 stream 1 was not closed cleanly: CANCEL (err 8)
HTTP Response Code: 200
Jan 16 2025, 9:31 AM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT
ABran-WMF added a comment to T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet.

the download fails consistently indeed, and is logged as a 200:
2025-01-16T09:22:49 65001309 -/200 2464511302 GET http://people.wikimedia.org/~santhosh/nllb/nllb200-600M/model.bin - application/octet-stream - Wget/2.2.0 - - - - d66cd6e8-46cd-4e23-910f-5d8477edefa0

Jan 16 2025, 9:25 AM · LPL Essential (LPL Essential 2025 Feb-Mar), Patch-For-Review, Traffic, collaboration-services, MinT

Jan 13 2025

ABran-WMF placed T374352: Create a replication source candidate swap cookbook up for grabs.
Jan 13 2025, 3:45 PM · Patch-For-Review, Data-Persistence-Automations, DBA
ABran-WMF placed T369045: Migrate mysql icinga alerts to alert manager - haproxy exporter up for grabs.
Jan 13 2025, 3:44 PM · DBA
ABran-WMF placed T367284: Migrate mysql icinga alerts to alert manager - mariadb errors up for grabs.
Jan 13 2025, 3:44 PM · DBA
ABran-WMF placed T367282: Migrate mysql icinga alerts to alert manager - read only status up for grabs.
Jan 13 2025, 3:44 PM · Patch-For-Review, DBA
ABran-WMF placed T366776: Automation helper for parser cache switchovers up for grabs.
Jan 13 2025, 3:44 PM · DBA
ABran-WMF placed T366146: Create a sanitarium redaction cookbook up for grabs.
Jan 13 2025, 3:44 PM · DBA, Patch-For-Review, Data-Persistence-Automations
ABran-WMF placed T363665: Create a cookbook to restart mariadb on all sanitarium hosts up for grabs.
Jan 13 2025, 3:44 PM · Data-Persistence-Automations, Patch-For-Review, DBA
ABran-WMF placed T356053: Test setup for MariaDB up for grabs.
Jan 13 2025, 3:44 PM · Data-Persistence-Automations, DBA
ABran-WMF placed T315866: Migrate mysql icinga alerts to alert manager up for grabs.
Jan 13 2025, 3:44 PM · Patch-For-Review, DBA
ABran-WMF placed T239814: Automate DB upgrades up for grabs.
Jan 13 2025, 3:44 PM · Data-Persistence-Automations, User-Ladsgroup, DBA
ABran-WMF placed T379887: Brief hardware history on server metadata up for grabs.
Jan 13 2025, 3:43 PM · Data-Persistence-Automations
ABran-WMF placed T375589: mariadb monitoring: buffer pool usage up for grabs.
Jan 13 2025, 3:43 PM · Patch-For-Review, Data-Persistence-Automations, DBA
ABran-WMF placed T374551: mariadb - monitoring - predict linear on disk/ram usage up for grabs.
Jan 13 2025, 3:43 PM · Data-Persistence-Automations, DBA
ABran-WMF placed T367283: Migrate mysql icinga alerts to alert manager - process monitoring up for grabs.
Jan 13 2025, 3:43 PM · Patch-For-Review, DBA
ABran-WMF placed T369252: monitoring - MariaDB log parsing and log alerting up for grabs.
Jan 13 2025, 3:39 PM · DBA
ABran-WMF placed T376596: spicerack mysql_legacy: support fetch metrics for instance up for grabs.
Jan 13 2025, 3:39 PM · Patch-For-Review, Infrastructure-Foundations, SRE-tools, Spicerack, Data-Persistence-Automations, DBA
ABran-WMF placed T375263: mariadb replication source health monitoring up for grabs.
Jan 13 2025, 3:39 PM · Data-Persistence-Automations, DBA
ABran-WMF placed T374191: upgrade clone cookbook up for grabs.
Jan 13 2025, 3:39 PM · Patch-For-Review, DBA, Data-Persistence-Automations

Jan 8 2025

ABran-WMF added a comment to T370677: migrate all sre-collab services to nftables.

I stumbled upon a conversion issue on this part of the template.

Jan 8 2025, 3:54 PM · Patch-For-Review, collaboration-services

Dec 16 2024

ABran-WMF moved T381523: Fix AbuseFilter database schema drifts in production from Triage to Ready on the DBA board.
Dec 16 2024, 8:08 AM · AbuseFilter, DBA
ABran-WMF moved T381424: Create DB schema for storing topics of event from Triage to Pending comment on the DBA board.
Dec 16 2024, 8:08 AM · Data-Persistence (work done), MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), Campaigns-Product-Team (Campaign-Tools-Current-Sprint), Campaign-Registration, CampaignEvents

Dec 4 2024

ABran-WMF moved T378143: Q2:rack/setup/install es104[1-6] from In progress to Ready on the DBA board.
Dec 4 2024, 7:26 AM · Data-Persistence-Automations, DBA, SRE, Data-Persistence, ops-eqiad, DC-Ops
ABran-WMF moved T380846: Update $wgNukeMaxAge to 90 days in Nuke from Triage to In progress on the DBA board.
Dec 4 2024, 7:26 AM · User-notice-archive, Data-Persistence (work done), Performance-Team, Moderator-Tools-Team (Kanban), MediaWiki-extensions-Nuke
ABran-WMF moved T381276: replication breakage is not not paging anymore from Triage to In progress on the DBA board.
Dec 4 2024, 7:26 AM · Icinga, observability, Patch-For-Review, DBA

Dec 3 2024

ABran-WMF added a comment to T381086: Prepare and check storage layer for arbcom_zhwiki.

rebased and updated with the latest spicerack release, will run it on the replicas first

Dec 3 2024, 12:54 PM · Patch-For-Review, Chinese-Sites, Data-Services, cloud-services-team, DBA
Marostegui awarded T380194: [conftool] improve dbctl diff API a Party Time token.
Dec 3 2024, 8:51 AM · Data-Persistence-Automations

Dec 2 2024

ABran-WMF added a comment to T378143: Q2:rack/setup/install es104[1-6].

Thanks @Jclark-ctr! puppet patch has been merged

Dec 2 2024, 3:47 PM · Data-Persistence-Automations, DBA, SRE, Data-Persistence, ops-eqiad, DC-Ops
ABran-WMF renamed Data-Persistence-Automations from Data-Persistence-SRE to Data-Persistence-Automations.
Dec 2 2024, 3:35 PM
ABran-WMF created T381276: replication breakage is not not paging anymore.
Dec 2 2024, 2:44 PM · Icinga, observability, Patch-For-Review, DBA
ABran-WMF changed the status of T374352: Create a replication source candidate swap cookbook from Stalled to In Progress.
Dec 2 2024, 12:34 PM · Patch-For-Review, Data-Persistence-Automations, DBA
ABran-WMF awarded T374352: Create a replication source candidate swap cookbook a Party Time token.
Dec 2 2024, 12:28 PM · Patch-For-Review, Data-Persistence-Automations, DBA
ABran-WMF added a comment to T363665: Create a cookbook to restart mariadb on all sanitarium hosts.

This will help and test T381086#10366983

Dec 2 2024, 12:16 PM · Data-Persistence-Automations, Patch-For-Review, DBA
ABran-WMF changed the status of T378068: pc1017 crashed from Open to In Progress.
Dec 2 2024, 10:14 AM · Data-Persistence-Automations, DBA
ABran-WMF moved T381197: Create views for SecurePoll db tables in Toolforge replicas from Triage to Blocked external/Not db team on the DBA board.
Dec 2 2024, 8:02 AM · Data-Engineering, Privacy Engineering, Data-Services, DBA

Nov 29 2024

ABran-WMF set IRC Nick to arnaudb on ABran-WMF.
Nov 29 2024, 1:22 PM
ABran-WMF added a comment to T376370: mariadb: create a synthetic monitoring indicator for dc switchover readiness.

This is a very good idea. There are some of them that can be quick and easy to achieve in cookbooks, examples:

Thanks! I already did some scripting during the dc switch: T375186#10160612 and T375186#10167177. wdyt of the gathered data and the way it's shown?

  • compare weights and groups (API/vslow/dumps/...) and pooling for core hosts
  • compare weights and groups (API/vslow/dumps/...) and pooling for es hosts

This stems from my previous question: comparison should be done, I think, across sections and between comparable hosts (i.e. vslow → vslow, api → api, etc.), but weight distribution must also be taken into account. This is where I think we should spend a bit of thinking time, to be able to figure out a good angle to compare hosts properly. We could adopt a pattern stating that an instance with api has a maximum weight of xxx, vslow reduces that weight to yyy, etc. This approach would introduce a bias, ignoring hardware differences between the datacenters. I think this should be OK as long as the hardware between the DCs is not too different. We could also calculate the total weight of an instance, using a "sum of weights"; this would introduce a bias towards the api etc., imho.

  • monitoring notifications enabled on all relevant hosts
  • validate event/query killer and pt-heartbeat are properly set up everywhere

Having those in cookbooks will already save quite a bunch of time.

Those would be monitoring probes in my mind, aggregated under an umbrella indicator, I've already prepared T375589 in that direction. This would also make that information available for cookbooks, using the thanos module of spicerack, querying for the alerts aggregated under that umbrella.
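
To make the weight comparison discussed above more concrete, here is a minimal sketch that sums weights per group in each datacenter and reports the groups whose totals differ. It does not use dbctl's real data model; the input shape, host names, and weights are all hypothetical.

```python
from collections import defaultdict


def group_weight_totals(instances: dict) -> dict:
    """Sum weights per group for instances given as {name: {group: weight}}."""
    totals = defaultdict(int)
    for groups in instances.values():
        for group, weight in groups.items():
            totals[group] += weight
    return dict(totals)


def weight_drift(primary_dc: dict, secondary_dc: dict) -> dict:
    """Return {group: (primary_total, secondary_total)} for groups whose totals differ."""
    first = group_weight_totals(primary_dc)
    second = group_weight_totals(secondary_dc)
    return {
        group: (first.get(group, 0), second.get(group, 0))
        for group in sorted(set(first) | set(second))
        if first.get(group, 0) != second.get(group, 0)
    }


if __name__ == "__main__":
    # Hypothetical hosts and weights for one section
    eqiad = {"db1001": {"main": 300, "api": 100}, "db1002": {"main": 200, "vslow": 100}}
    codfw = {"db2001": {"main": 300, "api": 100}, "db2002": {"main": 300, "vslow": 100}}
    print(weight_drift(eqiad, codfw))
```

Comparing per-group totals keeps the vslow → vslow and api → api pairing, but, as noted in the comment, it ignores hardware differences between the datacenters.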

Nov 29 2024, 10:54 AM · Data-Persistence-Automations, DBA

Nov 28 2024

ABran-WMF claimed T381079: Prepare and check storage layer for idwikivoyage.
Nov 28 2024, 2:49 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA
ABran-WMF claimed T381086: Prepare and check storage layer for arbcom_zhwiki.

Please let me know when I can act on this wiki. As it is supposed to be private, I would like to handle it quickly once it is ready :-)

Nov 28 2024, 2:48 PM · Patch-For-Review, Chinese-Sites, Data-Services, cloud-services-team, DBA
ABran-WMF added a comment to T379887: Brief hardware history on server metadata.

My concerns were different from the ones listed on this task. As expressed in T377276#10299621, we rarely interact with dbctl with either get or edit, hence those notes are very likely not to be seen. That's why I suggested keeping it in puppet for now.

Attaching metadata to the server would also help tag hosts that are currently running a schema update, for instance to allow for automated exclusion of sections or automated alerting notifications (e.g. host db9999 has been running a schema update for too long, send a warning alert). Those are 2 very basic examples; I think we can only benefit from "more programmatically accessible information".

Nov 28 2024, 11:00 AM · Data-Persistence-Automations

Nov 27 2024

ABran-WMF moved T375589: mariadb monitoring: buffer pool usage from Ready to In progress on the DBA board.
Nov 27 2024, 10:48 AM · Patch-For-Review, Data-Persistence-Automations, DBA
ABran-WMF moved T375589: mariadb monitoring: buffer pool usage from Todo to External dep/In review on the Data-Persistence-Automations board.
Nov 27 2024, 10:48 AM · Patch-For-Review, Data-Persistence-Automations, DBA
ABran-WMF added a parent task for T375589: mariadb monitoring: buffer pool usage: T376370: mariadb: create a synthetic monitoring indicator for dc switchover readiness.
Nov 27 2024, 10:47 AM · Patch-For-Review, Data-Persistence-Automations, DBA
ABran-WMF added a subtask for T376370: mariadb: create a synthetic monitoring indicator for dc switchover readiness: T375589: mariadb monitoring: buffer pool usage.
Nov 27 2024, 10:47 AM · Data-Persistence-Automations, DBA

Nov 26 2024

ABran-WMF created P71166 pc1013.log.
Nov 26 2024, 9:08 AM

Nov 25 2024

ABran-WMF awarded T376892: Expand media backup storage available space to 960 TB per datacenter a Party Time token.
Nov 25 2024, 2:33 PM · Patch-For-Review, media-backups, Data-Persistence-Backup, SRE
ABran-WMF moved T378068: pc1017 crashed from Todo to External dep/In review on the Data-Persistence-Automations board.
Nov 25 2024, 1:14 PM · Data-Persistence-Automations, DBA
ABran-WMF moved T367380: Productionize dbproxy200[5-8] from Blocked to In progress on the DBA board.
Nov 25 2024, 10:37 AM · DBA
ABran-WMF closed T367781: Drop deprecated abuse filter fields on wmf wikis, a subtask of T188180: Read from and write to `actor` table in AbuseFilter, as Resolved.
Nov 25 2024, 8:47 AM · Patch-Needs-Improvement, MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), AbuseFilter (Overhaul-2020), MW-1.33-notes (1.33.0-wmf.17; 2019-02-12), Schema-change
ABran-WMF closed T367781: Drop deprecated abuse filter fields on wmf wikis as Resolved.

s4 is now done:

Result: {"already done in all dbs": ["db1150:3314", "db1190", "db1199", "db1221", "db1238", "db1241", "db1242", "db1243", "db1244", "db1245:3314", "db1247", "db1248", "db1249", "dbstore1007:3314", "db2136", "db2139:3314", "db2140", "db2147", "db2155", "db2172", "db2199:3314", "db2206", "db2210", "db2219", "db2236", "db2237", "db2240"]}
Result: {"already done in all dbs": ["db1160", "db2179"]}
Nov 25 2024, 8:47 AM · Data-Engineering (Q2 2024 October 1st - December 31th), Schema-change-in-production, DBA

Nov 22 2024

ABran-WMF added a comment to T363665: Create a cookbook to restart mariadb on all sanitarium hosts.

ack, good to know thanks!

Nov 22 2024, 2:32 PM · Data-Persistence-Automations, Patch-For-Review, DBA
ABran-WMF moved T363665: Create a cookbook to restart mariadb on all sanitarium hosts from Pending comment to In progress on the DBA board.

cookbooks have been merged, TODO: test & debug

Nov 22 2024, 11:01 AM · Data-Persistence-Automations, Patch-For-Review, DBA
ABran-WMF moved T363665: Create a cookbook to restart mariadb on all sanitarium hosts from External dep/In review to Doing on the Data-Persistence-Automations board.
Nov 22 2024, 10:59 AM · Data-Persistence-Automations, Patch-For-Review, DBA
ABran-WMF triaged T380449: Optimize two echo tables in x1 as Medium priority.
Nov 22 2024, 7:42 AM · DBA

Nov 21 2024

ABran-WMF added a comment to T375263: mariadb replication source health monitoring.

After a bit of thinking around how this script should be done:

Nov 21 2024, 3:26 PM · Data-Persistence-Automations, DBA
ABran-WMF moved T376596: spicerack mysql_legacy: support fetch metrics for instance from Todo to External dep/In review on the Data-Persistence-Automations board.
Nov 21 2024, 2:16 PM · Patch-For-Review, Infrastructure-Foundations, SRE-tools, Spicerack, Data-Persistence-Automations, DBA
ABran-WMF moved T376596: spicerack mysql_legacy: support fetch metrics for instance from Ready to In progress on the DBA board.
Nov 21 2024, 2:16 PM · Patch-For-Review, Infrastructure-Foundations, SRE-tools, Spicerack, Data-Persistence-Automations, DBA
ABran-WMF updated the task description for T367781: Drop deprecated abuse filter fields on wmf wikis.
Nov 21 2024, 9:54 AM · Data-Engineering (Q2 2024 October 1st - December 31th), Schema-change-in-production, DBA
ABran-WMF added a comment to T367781: Drop deprecated abuse filter fields on wmf wikis.

@bvibber: logfile is in your home directory on deploy2002: /home/bvibber/mw-script.codfw.6497ohz1.log
The job has been deleted:

kubectl --kubeconfig=admin-codfw.config  -n mw-script delete job mw-script.codfw.6497ohz1
job.batch "mw-script.codfw.6497ohz1" deleted
Nov 21 2024, 9:43 AM · Data-Engineering (Q2 2024 October 1st - December 31th), Schema-change-in-production, DBA
ABran-WMF added a comment to T367781: Drop deprecated abuse filter fields on wmf wikis.

Will do, thanks for the info @bvibber !

Nov 21 2024, 8:01 AM · Data-Engineering (Q2 2024 October 1st - December 31th), Schema-change-in-production, DBA

Nov 20 2024

ABran-WMF added a comment to T373579: Productionize db22[21-40].

I'm uncertain about db2231 so I commented it out in P71104, @Marostegui is misc the right section?

Nov 20 2024, 1:23 PM · DBA
ABran-WMF triaged T380349: zarcillo: Migrate orchestrator's DB on a ganeti VM as Medium priority.
Nov 20 2024, 10:44 AM · Data-Persistence-Automations, DBA