
FCeratto-WMF (Federico Ceratto)
Site Reliability Engineer

Projects (6)

User Details

User Since
Jan 7 2025, 6:49 PM (48 w, 2 h)
Availability
Available
IRC Nick
federico3
LDAP User
Federico Ceratto
MediaWiki User
FCeratto-WMF

Recent Activity

Today

FCeratto-WMF closed T411805: Clone and restore db1229 as Resolved.
Tue, Dec 9, 3:32 PM · DBA
FCeratto-WMF moved T411573: sre.mysql.parsercache: make it work with msX sections from In progress to Blocked on the DBA board.
Tue, Dec 9, 1:50 PM · Patch-For-Review, DBA
FCeratto-WMF moved T383674: Abstract away different database depooling mechanisms into a cookbook from In progress to Blocked on the DBA board.
Tue, Dec 9, 1:49 PM · Patch-For-Review, DBA

Fri, Dec 5

FCeratto-WMF moved T411573: sre.mysql.parsercache: make it work with msX sections from Ready to In progress on the DBA board.
Fri, Dec 5, 10:46 AM · Patch-For-Review, DBA
FCeratto-WMF moved T411805: Clone and restore db1229 from Ready to In progress on the DBA board.
Fri, Dec 5, 10:46 AM · DBA
FCeratto-WMF claimed T383674: Abstract away different database depooling mechanisms into a cookbook.
Fri, Dec 5, 9:48 AM · Patch-For-Review, DBA
FCeratto-WMF added a comment to T411805: Clone and restore db1229.

Testing https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1215116/1
and https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1214083/3
using test-cookbook -c 1215116 sre.mysql.clone --source db1233 --target db1229 --nopool -t T411805

Fri, Dec 5, 9:19 AM · DBA

Thu, Dec 4

FCeratto-WMF moved T411805: Clone and restore db1229 from Triage to Ready on the DBA board.
Thu, Dec 4, 5:21 PM · DBA
FCeratto-WMF created T411805: Clone and restore db1229.
Thu, Dec 4, 5:21 PM · DBA
FCeratto-WMF closed T399580: Automate read-only es* section OS updates, a subtask of T196366: Implement (or refactor) a script to move slaves when the master is not available, as Resolved.
Thu, Dec 4, 11:40 AM · Data-Persistence-Automations, Patch-For-Review, SRE-Sprint-Week-Sustainability-March2023, User-Ladsgroup, Sustainability (Incident Followup), DBA
FCeratto-WMF closed T399580: Automate read-only es* section OS updates as Resolved.

Added documentation as a new heading: https://wikitech.wikimedia.org/wiki/MariaDB/Rebooting_a_host#Rolling_restarts_for_the_ES_section

Thu, Dec 4, 11:40 AM · DBA
FCeratto-WMF added a comment to T391581: Accept both FQDN and bare hostname in DB cookbooks.

I thought that was done when it was mentioned at: https://phabricator.wikimedia.org/T391581#11319934

Thu, Dec 4, 10:53 AM · Patch-For-Review, DBA
FCeratto-WMF added a comment to T315642: Monitor GTID status.

I've updated the dashboard again to ignore test-s4, and it's now showing zero hosts with misconfigured GTID. Next step: move the check/alarm into Prometheus/Puppet and log alerts on IRC.

Thu, Dec 4, 9:32 AM · DBA
FCeratto-WMF closed T411111: Database Creation request for requestctl.wikimedia.org, a subtask of T409264: Data storage for HP, as Resolved.
Thu, Dec 4, 9:30 AM · Patch-For-Review, Hiddenparma
FCeratto-WMF closed T411111: Database Creation request for requestctl.wikimedia.org as Resolved.

Verified out-of-band on IRC: @Joe received the password and the database is ready. Closing.

Thu, Dec 4, 9:30 AM · DBA, Hiddenparma

Tue, Dec 2

FCeratto-WMF moved T410084: sre.mysql.clone cookbook not using --ignore-existing from Ready to Blocked on the DBA board.
Tue, Dec 2, 4:12 PM · Patch-For-Review, DBA
FCeratto-WMF moved T411111: Database Creation request for requestctl.wikimedia.org from In progress to Blocked on the DBA board.
Tue, Dec 2, 4:11 PM · DBA, Hiddenparma
FCeratto-WMF added a comment to T391581: Accept both FQDN and bare hostname in DB cookbooks.

The clone cookbook is not changed yet. I can update it while doing T410084

Tue, Dec 2, 3:01 PM · Patch-For-Review, DBA
FCeratto-WMF added a comment to T411111: Database Creation request for requestctl.wikimedia.org.

Documented in https://wikitech.wikimedia.org/w/index.php?title=MariaDB/misc&oldid=2367153

Tue, Dec 2, 11:33 AM · DBA, Hiddenparma
FCeratto-WMF added a comment to T411111: Database Creation request for requestctl.wikimedia.org.
Tue, Dec 2, 11:21 AM · DBA, Hiddenparma
FCeratto-WMF moved T411111: Database Creation request for requestctl.wikimedia.org from Ready to In progress on the DBA board.
Tue, Dec 2, 11:09 AM · DBA, Hiddenparma

Mon, Dec 1

FCeratto-WMF moved T411111: Database Creation request for requestctl.wikimedia.org from Refine to Ready on the DBA board.
Mon, Dec 1, 2:33 PM · DBA, Hiddenparma
FCeratto-WMF moved T409706: Create pooling/weight status Grafana dashboard demo from Blocked to In progress on the DBA board.
Mon, Dec 1, 2:18 PM · DBA
FCeratto-WMF moved T391470: Create template/helper for deploying Python webapps on aux k8s from Blocked to Ready on the DBA board.
Mon, Dec 1, 2:18 PM · DBA
FCeratto-WMF moved T410508: Auto_schema broken in HEAD from Ready to Blocked on the DBA board.
Mon, Dec 1, 2:17 PM · DBA
FCeratto-WMF moved T299441: Avoid depooling hosts if the schema change has been applied before from Blocked to Ready on the DBA board.
Mon, Dec 1, 2:17 PM · Patch-For-Review, DBA, Auto schema
FCeratto-WMF added a comment to T410508: Auto_schema broken in HEAD.

I can run schema_change on my side or we can run it together in a shared tmux session.

Mon, Dec 1, 8:11 AM · DBA
FCeratto-WMF added a comment to T410508: Auto_schema broken in HEAD.

To summarize the ongoing investigation:

Mon, Dec 1, 7:59 AM · DBA

Thu, Nov 27

FCeratto-WMF edited P85890 (An Untitled Masterwork).
Thu, Nov 27, 4:35 PM
FCeratto-WMF edited P85890 (An Untitled Masterwork).
Thu, Nov 27, 4:33 PM
FCeratto-WMF updated the language for P85890 (An Untitled Masterwork) from autodetect to bash.
Thu, Nov 27, 4:28 PM
FCeratto-WMF moved T299441: Avoid depooling hosts if the schema change has been applied before from Ready to Blocked on the DBA board.
Thu, Nov 27, 4:13 PM · Patch-For-Review, DBA, Auto schema
FCeratto-WMF moved T400056: Core DB testbed on VMs from Blocked to Ready on the DBA board.
Thu, Nov 27, 4:12 PM · DBA
FCeratto-WMF edited P85890 (An Untitled Masterwork).
Thu, Nov 27, 2:40 PM
FCeratto-WMF created P85890 (An Untitled Masterwork).
Thu, Nov 27, 2:19 PM
FCeratto-WMF created P85854 (An Untitled Masterwork).
Thu, Nov 27, 11:55 AM
FCeratto-WMF moved T410508: Auto_schema broken in HEAD from Ready to Blocked on the DBA board.
Thu, Nov 27, 10:16 AM · DBA
FCeratto-WMF reopened T400056: Core DB testbed on VMs, a subtask of T384810: MariaDB lifetime management system, as In Progress.
Thu, Nov 27, 8:45 AM · Patch-For-Review, DBA
FCeratto-WMF reopened T400056: Core DB testbed on VMs as "In Progress".

Reopening while fixing replication after host reboots

Thu, Nov 27, 8:45 AM · DBA

Wed, Nov 26

FCeratto-WMF added a comment to T411085: db2166 from s8 started lagging, disk latency up, hw issue?.

The host is repooled, so we can close this task. @Marostegui can you please clarify what you meant by "OS errors (they aren't even on the HW logs)"? E.g. are we seeing cases where the Offline_Uncorrectable counts are false positives and the drives are healthy?

Wed, Nov 26, 2:47 PM · DBA
FCeratto-WMF added a comment to T411085: db2166 from s8 started lagging, disk latency up, hw issue?.

As related tasks might want to address:

Wed, Nov 26, 1:28 PM · DBA
FCeratto-WMF added a comment to T411085: db2166 from s8 started lagging, disk latency up, hw issue?.

Repooling, as the RAID is not degraded yet, and monitoring MariaDB performance.

Wed, Nov 26, 1:10 PM · DBA
FCeratto-WMF added a comment to T411085: db2166 from s8 started lagging, disk latency up, hw issue?.

db2166 is not showing any metrics on https://grafana.wikimedia.org/goto/n9QcXmZvg?orgId=1, but other hosts are showing inconsistent disk temperature readings. For example, db1178 shows temperatures between 26 and 28 for the 8 RAID drives, but smartd is logging between 72 and 74 Celsius:

ssh db1178.eqiad.wmnet -t "sudo journalctl --since '1 h ago' --identifier smartd"
Nov 26 11:43:02 db1178 smartd[2802618]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 72
Nov 26 11:43:02 db1178 smartd[2802618]: Device: /dev/bus/0 [megaraid_disk_08] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 74
Nov 26 12:13:01 db1178 smartd[2802618]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 72 to 73
Nov 26 12:13:01 db1178 smartd[2802618]: Device: /dev/bus/0 [megaraid_disk_08] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 74 to 73
Connection to db1178.eqiad.wmnet closed.
Wed, Nov 26, 12:48 PM · DBA
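
The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming the smartd lines are available as text: it extracts the most recent Temperature_Celsius value per disk and compares it with the upper end of the range read off the dashboard. The names `latest_temperatures`, `DASHBOARD_MAX_C` and `TOLERANCE_C` are hypothetical, and the 5-degree tolerance is an arbitrary illustrative threshold, not a value from the task.

```python
import re

# Lines copied from the journalctl output in the comment above.
JOURNAL = """\
Nov 26 11:43:02 db1178 smartd[2802618]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 72
Nov 26 11:43:02 db1178 smartd[2802618]: Device: /dev/bus/0 [megaraid_disk_08] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 74
Nov 26 12:13:01 db1178 smartd[2802618]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 72 to 73
Nov 26 12:13:01 db1178 smartd[2802618]: Device: /dev/bus/0 [megaraid_disk_08] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 74 to 73
"""

TEMP_RE = re.compile(
    r"\[(megaraid_disk_\d+)\].*Temperature_Celsius changed from (\d+) to (\d+)"
)


def latest_temperatures(journal):
    """Return the last temperature smartd logged for each disk."""
    latest = {}
    for line in journal.splitlines():
        match = TEMP_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the newest value wins.
            latest[match.group(1)] = int(match.group(3))
    return latest


DASHBOARD_MAX_C = 28  # upper end of the 26-28 range seen on the dashboard
TOLERANCE_C = 5       # hypothetical threshold, chosen only for illustration

latest = latest_temperatures(JOURNAL)
mismatched = {
    disk: temp
    for disk, temp in latest.items()
    if abs(temp - DASHBOARD_MAX_C) > TOLERANCE_C
}
print(mismatched)
```

Both disks in the sample end up in `mismatched`, which is consistent with the discrepancy reported in the comment.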
FCeratto-WMF added a comment to T411085: db2166 from s8 started lagging, disk latency up, hw issue?.

A summary of disk errors on the host:

for n in {0..10}; do echo $n; sudo smartctl -a /dev/bus/0 -d megaraid,$n | grep Uncorrec; done
0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       352
1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       48
4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       240
5
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       72
6
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2896
7
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
9
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       104
10
Wed, Nov 26, 12:41 PM · DBA
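
To make the summary above easier to compare across hosts, the loop output can be reduced to a per-disk tally. This is a minimal sketch that parses the pasted output as text; the function name `parse_uncorrectable` is hypothetical, and it relies on the raw value being the last column of the attribute row, as in the output above.

```python
# Output pasted from the smartctl loop above: a disk index line,
# followed by the attribute 198 row when the disk reported one.
SAMPLE = """\
0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       352
1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       48
4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       240
5
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       72
6
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2896
7
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
9
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       104
10
"""


def parse_uncorrectable(text):
    """Return {disk_index: raw_value} for disks that printed attribute 198."""
    counts = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():
            # A bare number is the disk index echoed by the loop.
            current = int(line)
        elif line.startswith("198 ") and current is not None:
            # The raw value is the last whitespace-separated column.
            counts[current] = int(line.split()[-1])
    return counts


counts = parse_uncorrectable(SAMPLE)
affected = {disk: n for disk, n in counts.items() if n > 0}
total = sum(counts.values())
print(sorted(affected), total)
```

Index 10 produced no attribute row in the output above, so it does not appear in `counts`.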
FCeratto-WMF added a comment to T411085: db2166 from s8 started lagging, disk latency up, hw issue?.

MariaDB logged multiple slow writes: https://phabricator.wikimedia.org/P85718

Wed, Nov 26, 12:28 PM · DBA
FCeratto-WMF created P85718 (An Untitled Masterwork).
Wed, Nov 26, 12:28 PM
FCeratto-WMF added a comment to T411085: db2166 from s8 started lagging, disk latency up, hw issue?.

The host is showing multiple bad sectors

Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], 352 Offline uncorrectable sectors
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 61 to 60
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_01] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 62
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_02] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 65 to 63
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_03] [SAT], 48 Offline uncorrectable sectors
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_03] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 62
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_04] [SAT], 240 Offline uncorrectable sectors
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_04] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 62 to 61
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], read SMART Attribute Data worked again, warning condition reset after 1 email
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], 72 Offline uncorrectable sectors (changed +16)
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], SMART Usage Attribute: 13 Read_Soft_Error_Rate changed from 100 to 99
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], SMART Usage Attribute: 180 Unused_Rsvd_Blk_Cnt_Tot changed from 100 to 99
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], 2896 Offline uncorrectable sectors
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 62
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 65 to 64
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_08] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 62
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_09] [SAT], 104 Offline uncorrectable sectors
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_09] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 64 to 63
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], 352 Offline uncorrectable sectors
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 60 to 61
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_02] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 64
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_03] [SAT], 48 Offline uncorrectable sectors
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_03] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 62 to 63
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_04] [SAT], 240 Offline uncorrectable sectors
Nov 26 12:19:18 db2166 smartd[932]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Nov 26 12:19:18 db2166 smartd[932]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], 72 Offline uncorrectable sectors
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], 2896 Offline uncorrectable sectors
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 64 to 65
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_08] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 62 to 63
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_09] [SAT], 104 Offline uncorrectable sectors
Wed, Nov 26, 12:25 PM · DBA
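
When reading a log like the one above, the disks whose bad-sector count is still growing matter more than those with a stable count. The following is a minimal sketch, assuming smartd appends "(changed +N)" only when the count grew since its previous poll, as seen for megaraid_disk_05 above. The names `SECTOR_RE` and `scan` are hypothetical, and the sample is a subset of the lines from the comment.

```python
import re

# A subset of the smartd lines from the comment above.
JOURNAL = """\
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], 72 Offline uncorrectable sectors (changed +16)
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], 2896 Offline uncorrectable sectors
Nov 26 11:49:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_07] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 65 to 64
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_05] [SAT], 72 Offline uncorrectable sectors
Nov 26 12:19:18 db2166 smartd[932]: Device: /dev/bus/0 [megaraid_disk_06] [SAT], 2896 Offline uncorrectable sectors
"""

SECTOR_RE = re.compile(
    r"\[megaraid_disk_(\d+)\] \[SAT\], (\d+) Offline uncorrectable sectors"
    r"(?: \(changed \+(\d+)\))?"
)


def scan(journal):
    """Return (latest count per disk, total logged growth per disk)."""
    latest, grew = {}, {}
    for line in journal.splitlines():
        match = SECTOR_RE.search(line)
        if not match:
            continue  # temperature and other attribute lines are ignored
        disk = int(match.group(1))
        latest[disk] = int(match.group(2))
        if match.group(3):
            grew[disk] = grew.get(disk, 0) + int(match.group(3))
    return latest, grew


latest, grew = scan(JOURNAL)
print(latest, grew)
```

In this sample only disk 5 shows growth within the logged window; disk 6 has the largest count but it did not change between the two polls.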
FCeratto-WMF edited P85715 (An Untitled Masterwork).
Wed, Nov 26, 12:17 PM
FCeratto-WMF created P85715 (An Untitled Masterwork).
Wed, Nov 26, 12:14 PM
FCeratto-WMF created P85714 (An Untitled Masterwork).
Wed, Nov 26, 12:12 PM

Tue, Nov 25

FCeratto-WMF closed T391581: Accept both FQDN and bare hostname in DB cookbooks as Resolved.
Tue, Nov 25, 3:13 PM · Patch-For-Review, DBA
FCeratto-WMF added a comment to T391581: Accept both FQDN and bare hostname in DB cookbooks.

The CR was approved and the cookbook was tested with a real repool. Merging and closing.

Tue, Nov 25, 3:12 PM · Patch-For-Review, DBA
FCeratto-WMF created P85625 (An Untitled Masterwork).
Tue, Nov 25, 2:33 PM
FCeratto-WMF created P85622 (An Untitled Masterwork).
Tue, Nov 25, 2:26 PM
FCeratto-WMF added a comment to T399580: Automate read-only es* section OS updates.

Related PR approved and merged.

Tue, Nov 25, 1:18 PM · DBA
FCeratto-WMF moved T399580: Automate read-only es* section OS updates from Ready to Blocked on the DBA board.
Tue, Nov 25, 1:17 PM · DBA
FCeratto-WMF added a comment to T409706: Create pooling/weight status Grafana dashboard demo.

Regarding "Add metrics also to single-db dashboard" I made a "demo" dashboard with the new panel on the left titled "dbctl weight": https://grafana.wikimedia.org/goto/L7_M7RWvR?orgId=1

Tue, Nov 25, 11:39 AM · DBA
FCeratto-WMF changed the status of T196366: Implement (or refactor) a script to move slaves when the master is not available, a subtask of T156461: [META ticket] Automation for our DBs tracking task, from Open to In Progress.
Tue, Nov 25, 11:38 AM · DBA, Epic
FCeratto-WMF changed the status of T196366: Implement (or refactor) a script to move slaves when the master is not available, a subtask of T384810: MariaDB lifetime management system, from Open to In Progress.
Tue, Nov 25, 11:38 AM · Patch-For-Review, DBA
FCeratto-WMF changed the status of T196366: Implement (or refactor) a script to move slaves when the master is not available from Open to In Progress.
Tue, Nov 25, 11:38 AM · Data-Persistence-Automations, Patch-For-Review, SRE-Sprint-Week-Sustainability-March2023, User-Ladsgroup, Sustainability (Incident Followup), DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Tue, Nov 25, 11:34 AM · DBA
FCeratto-WMF closed T400056: Core DB testbed on VMs, a subtask of T384810: MariaDB lifetime management system, as Resolved.
Tue, Nov 25, 11:27 AM · Patch-For-Review, DBA
FCeratto-WMF closed T400056: Core DB testbed on VMs as Resolved.
Tue, Nov 25, 11:27 AM · DBA
FCeratto-WMF moved T410508: Auto_schema broken in HEAD from Blocked to Ready on the DBA board.
Tue, Nov 25, 11:27 AM · DBA
FCeratto-WMF moved T299441: Avoid depooling hosts if the schema change has been applied before from In progress to Ready on the DBA board.
Tue, Nov 25, 11:27 AM · Patch-For-Review, DBA, Auto schema
FCeratto-WMF created P85593 (An Untitled Masterwork).
Tue, Nov 25, 11:01 AM
FCeratto-WMF created P85574 (An Untitled Masterwork).
Tue, Nov 25, 8:16 AM

Mon, Nov 24

FCeratto-WMF moved T410376: clone.py cookbook: restart and repool source host ASAP from Blocked to Done on the DBA board.
Mon, Nov 24, 9:43 AM · DBA
FCeratto-WMF closed T410376: clone.py cookbook: restart and repool source host ASAP as Resolved.
Mon, Nov 24, 9:43 AM · DBA
FCeratto-WMF moved T410376: clone.py cookbook: restart and repool source host ASAP from In progress to Blocked on the DBA board.
Mon, Nov 24, 9:43 AM · DBA
FCeratto-WMF moved T410508: Auto_schema broken in HEAD from In progress to Blocked on the DBA board.
Mon, Nov 24, 9:43 AM · DBA

Fri, Nov 21

FCeratto-WMF moved T410508: Auto_schema broken in HEAD from Ready to In progress on the DBA board.
Fri, Nov 21, 3:56 PM · DBA

Wed, Nov 19

FCeratto-WMF moved T391581: Accept both FQDN and bare hostname in DB cookbooks from In progress to Blocked on the DBA board.
Wed, Nov 19, 9:39 AM · Patch-For-Review, DBA

Tue, Nov 18

FCeratto-WMF moved T400056: Core DB testbed on VMs from In progress to Blocked on the DBA board.
Tue, Nov 18, 10:15 AM · DBA
FCeratto-WMF added a comment to T400056: Core DB testbed on VMs.

The 5 VMs are showing up on zarcillo and replicating: https://zarcillo.wikimedia.org/ui/sections#test-s4. The Prometheus metrics will start working as expected once the old metrics, created before the VMs were redeployed (with a different server_id), disappear.

Tue, Nov 18, 10:15 AM · DBA
FCeratto-WMF changed the status of T410376: clone.py cookbook: restart and repool source host ASAP from Open to In Progress.
Tue, Nov 18, 9:52 AM · DBA
FCeratto-WMF created T410376: clone.py cookbook: restart and repool source host ASAP.
Tue, Nov 18, 9:52 AM · DBA
FCeratto-WMF moved T409706: Create pooling/weight status Grafana dashboard demo from In progress to Blocked on the DBA board.
Tue, Nov 18, 9:33 AM · DBA
FCeratto-WMF moved T391581: Accept both FQDN and bare hostname in DB cookbooks from Ready to In progress on the DBA board.
Tue, Nov 18, 9:33 AM · Patch-For-Review, DBA

Mon, Nov 17

FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 6:36 PM · DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 6:20 PM · DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 5:17 PM · DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 4:51 PM · DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 4:16 PM · DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 3:59 PM · DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 3:25 PM · DBA
FCeratto-WMF updated the task description for T409926: Improve master-replica switchover (flip) automation.
Mon, Nov 17, 12:40 PM · Patch-For-Review, DBA
FCeratto-WMF added a comment to T409926: Improve master-replica switchover (flip) automation.

Moving code from https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1129904 into https://gitlab.wikimedia.org/repos/sre/wmfmariadbpy in order to:

  • initially, provide a CLI tool for the team
  • later on, provide a library that other SREs can use in a switchover cookbook
Mon, Nov 17, 12:38 PM · Patch-For-Review, DBA
FCeratto-WMF changed the status of T409926: Improve master-replica switchover (flip) automation from Open to In Progress.
Mon, Nov 17, 12:35 PM · Patch-For-Review, DBA
FCeratto-WMF triaged T391581: Accept both FQDN and bare hostname in DB cookbooks as Low priority.
Mon, Nov 17, 12:33 PM · Patch-For-Review, DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 12:28 PM · DBA
FCeratto-WMF added a comment to T409706: Create pooling/weight status Grafana dashboard demo.

@Marostegui I added initial documentation at https://phabricator.wikimedia.org/T384212#11378487

Mon, Nov 17, 12:01 PM · DBA
FCeratto-WMF moved T409706: Create pooling/weight status Grafana dashboard demo from Ready to In progress on the DBA board.
Mon, Nov 17, 11:59 AM · DBA
FCeratto-WMF changed the status of T409706: Create pooling/weight status Grafana dashboard demo from Open to In Progress.
Mon, Nov 17, 11:59 AM · DBA
FCeratto-WMF added a comment to T384212: Create a dashboard to show depooled hosts.

I'm adding documentation for the Web UI at https://doc.wikimedia.org/data_persistence/zarcillo/README.html#_web_ui as a way to share progress here. I can also paste the documentation here if desired.

Mon, Nov 17, 11:34 AM · DBA
FCeratto-WMF added a comment to T408774: Add db-test* hosts to zarcillo and test-s4.

Puppet is already configured to place VMs in test-s4. Deployment is tracked in https://phabricator.wikimedia.org/T400056

Mon, Nov 17, 10:38 AM · DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 10:36 AM · DBA
FCeratto-WMF moved T408774: Add db-test* hosts to zarcillo and test-s4 from Ready to In progress on the DBA board.
Mon, Nov 17, 10:23 AM · DBA
FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Mon, Nov 17, 9:41 AM · DBA

Fri, Nov 14

FCeratto-WMF updated the task description for T400056: Core DB testbed on VMs.
Fri, Nov 14, 3:08 PM · DBA