
Jul 25 2019

Marostegui created T228956: decommission db1072.eqiad.wmnet.
Jul 25 2019, 5:48 AM · DC-Ops, ops-eqiad, decommission, Operations
Marostegui updated the task description for T227538: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC).
Jul 25 2019, 5:45 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227538: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC).

As of today, db1072 is no longer a master (T228243#5363931), so this rack is also good to go. db1072 will be decommissioned in a few days.

Jul 25 2019, 5:45 AM · DC-Ops, Operations, ops-eqiad
Marostegui closed T228243: Switchover m3 (phabricator) master db1072 to db1128, a subtask of T217396: Decommission db1061-db1073, as Resolved.
Jul 25 2019, 5:41 AM · Operations, DBA
Marostegui closed T228243: Switchover m3 (phabricator) master db1072 to db1128 as Resolved.

This has been done.
Phabricator read only start: 05:30:44
Phabricator read only stop: 05:31:37

Jul 25 2019, 5:41 AM · User-notice, Phabricator, Operations, DBA
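
The read-only window reported in the comment above can be turned into a duration with a few lines of Python. This is only a minimal sketch: it assumes both timestamps are on the same day and in the same timezone, and the function name is hypothetical.

```python
from datetime import datetime

# Timestamps copied from the comment above (same day and timezone assumed).
READ_ONLY_START = "05:30:44"
READ_ONLY_STOP = "05:31:37"


def read_only_seconds(start: str, stop: str) -> int:
    """Return the length of the read-only window in whole seconds."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


print(read_only_seconds(READ_ONLY_START, READ_ONLY_STOP))  # 53
```

Under those assumptions, Phabricator was read-only for 53 seconds during the switchover.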
mmodell awarded T228955: test a Orange Medal token.
Jul 25 2019, 5:37 AM · Operations
Marostegui added a subtask for T228243: Switchover m3 (phabricator) master db1072 to db1128: T228955: test.
Jul 25 2019, 5:33 AM · User-notice, Phabricator, Operations, DBA
Marostegui added a parent task for T228955: test: T228243: Switchover m3 (phabricator) master db1072 to db1128.
Jul 25 2019, 5:33 AM · Operations
Marostegui closed T228955: test as Resolved.
Jul 25 2019, 5:32 AM · Operations
Marostegui assigned T228955: test to mmodell.
Jul 25 2019, 5:32 AM · Operations
Marostegui created T228955: test.
Jul 25 2019, 5:32 AM · Operations
Marostegui updated the task description for T227113: rack/setup/install db21[21-30].codfw.wmnet.
Jul 25 2019, 5:06 AM · Goal, Operations, DBA, ops-codfw
Marostegui added a comment to T227113: rack/setup/install db21[21-30].codfw.wmnet.

db2121-db2125 looking good! Thanks

Jul 25 2019, 5:04 AM · Goal, Operations, DBA, ops-codfw
Marostegui updated the task description for T227113: rack/setup/install db21[21-30].codfw.wmnet.
Jul 25 2019, 5:03 AM · Goal, Operations, DBA, ops-codfw
Marostegui added a comment to T228732: Upgrade db1100 firmware and BIOS.

@Marostegui This can be done any day...Let's plan 8/6 @1000EDT /1400UTC

Jul 25 2019, 4:44 AM · DBA, ops-eqiad, Operations
Marostegui added a parent task for T228891: dbprov1001 alerting on PS Redundancy: T226778: Install new PDUs in rows A/B (Top level tracking task).
Jul 25 2019, 4:43 AM · Operations, ops-eqiad, DC-Ops
Marostegui added a subtask for T226778: Install new PDUs in rows A/B (Top level tracking task): T228891: dbprov1001 alerting on PS Redundancy.
Jul 25 2019, 4:43 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a subtask for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4]: T228892: dbproxy1012 alerting on PS Redundancy.
Jul 25 2019, 4:43 AM · DBA
Marostegui added a parent task for T228892: dbproxy1012 alerting on PS Redundancy: T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jul 25 2019, 4:43 AM · Operations, ops-eqiad, DC-Ops
Marostegui added a comment to T227552: pc2010 possibly broken memory.

This host crashed again. This time it was totally frozen, and I had to reset it via iDRAC.
These are the HW logs, same issue:

Jul 25 2019, 4:42 AM · Operations, ops-codfw, DBA
Marostegui added a comment to T226778: Install new PDUs in rows A/B (Top level tracking task).

@RobH if you guys don't have any preference on which rack to start with...from the DB side, B3 can be a good option if it can be done before Tuesday 30th.
A month ago we scheduled a failover (T227062) for our s8 (wikidata) primary db master, and the new master (db1104) will be in B3, so if that rack can be done before Tuesday 30th, that's one less master we need to worry about :)

Jul 25 2019, 4:42 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a subtask for T226778: Install new PDUs in rows A/B (Top level tracking task): T228892: dbproxy1012 alerting on PS Redundancy.
Jul 25 2019, 4:38 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a parent task for T228892: dbproxy1012 alerting on PS Redundancy: T226778: Install new PDUs in rows A/B (Top level tracking task).
Jul 25 2019, 4:38 AM · Operations, ops-eqiad, DC-Ops

Jul 24 2019

Marostegui added a comment to T228618: Reallocate dbproxy1020 and dbproxy1021 from row D to row C.

Separate racks if possible. If that is really not possible, then the same rack is also OK.

Jul 24 2019, 8:05 PM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 24 2019, 2:53 PM · AbuseFilter, DBA
Marostegui added a comment to T226851: Drop abuse_filter_log.afl_log_id in production.

s5 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • db1130
  • db1124
  • db1113
  • db1110
  • db1102
  • db1100
  • db1097
  • db1096
  • db1082
  • db1070
Jul 24 2019, 2:44 PM · AbuseFilter, DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 24 2019, 2:41 PM · AbuseFilter, DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 24 2019, 2:33 PM · AbuseFilter, DBA
Marostegui renamed T228859: dbproxy1012 and dbprov1001 alerting on PS Redundancy from dbproxy1012 and dbpro1001 alerting on PS Redundancy to dbproxy1012 and dbprov1001 alerting on PS Redundancy.
Jul 24 2019, 12:26 PM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T228768: Decommission dbproxy1004 and dbproxy1009.

I have stopped haproxy on both hosts, and will leave it like that for 24h, just to be fully sure nothing uses it.

Jul 24 2019, 12:21 PM · Patch-For-Review, ops-eqiad, decommission, Operations, Analytics-EventLogging, Analytics
Marostegui updated the task description for T228768: Decommission dbproxy1004 and dbproxy1009.
Jul 24 2019, 12:21 PM · Patch-For-Review, ops-eqiad, decommission, Operations, Analytics-EventLogging, Analytics
Marostegui moved T228768: Decommission dbproxy1004 and dbproxy1009 from Triage to In progress on the DBA board.
Jul 24 2019, 12:18 PM · Patch-For-Review, ops-eqiad, decommission, Operations, Analytics-EventLogging, Analytics
Marostegui created T228859: dbproxy1012 and dbprov1001 alerting on PS Redundancy.
Jul 24 2019, 12:07 PM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC).

From the DBA side, it is good to go. db1073 is a master for m5 (wikitech, nova...), so cloud-services-team needs to decide if they can afford a downtime there.

Jul 24 2019, 10:21 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC).
Jul 24 2019, 10:20 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a project to T228768: Decommission dbproxy1004 and dbproxy1009: decommission.
Jul 24 2019, 10:15 AM · Patch-For-Review, ops-eqiad, decommission, Operations, Analytics-EventLogging, Analytics
Marostegui renamed T228768: Decommission dbproxy1004 and dbproxy1009 from Decommission m4 proxies (dbproxy1004 and dbproxy1008) to Decommission dbproxy1004 and dbproxy1009.
Jul 24 2019, 10:15 AM · Patch-For-Review, ops-eqiad, decommission, Operations, Analytics-EventLogging, Analytics
Marostegui claimed T228768: Decommission dbproxy1004 and dbproxy1009.

Great - thanks. I will get them decommissioned

Jul 24 2019, 10:12 AM · Patch-For-Review, ops-eqiad, decommission, Operations, Analytics-EventLogging, Analytics
Marostegui reopened T227552: pc2010 possibly broken memory as "Open".

Looks like this happened again and mysql crashed:
@Papaul could this be the memory slot? Should we swap the DIMM with another existing DIMM and see what happens? If the same error appears, it is the slot; if a different DIMM is reported, it is the DIMM itself (even though it is supposed to be new).

-------------------------------------------------------------------------------
Record:      4
Date/Time:   07/24/2019 08:20:41
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      5
Date/Time:   07/24/2019 08:20:42
Source:      system
Severity:    Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_B1.
-------------------------------------------------------------------------------
Record:      6
Date/Time:   07/24/2019 08:20:42
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      7
Date/Time:   07/24/2019 08:20:43
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      8
Date/Time:   07/24/2019 08:20:44
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      9
Date/Time:   07/24/2019 08:20:44
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      10
Date/Time:   07/24/2019 08:20:45
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      11
Date/Time:   07/24/2019 08:20:46
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      12
Date/Time:   07/24/2019 08:20:47
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      13
Date/Time:   07/24/2019 08:20:55
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      14
Date/Time:   07/24/2019 08:20:55
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      15
Date/Time:   07/24/2019 08:21:03
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      16
Date/Time:   07/24/2019 08:21:04
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      17
Date/Time:   07/24/2019 08:21:05
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      18
Date/Time:   07/24/2019 08:21:06
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      19
Date/Time:   07/24/2019 08:21:06
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      20
Date/Time:   07/24/2019 08:21:07
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      21
Date/Time:   07/24/2019 08:21:07
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      22
Date/Time:   07/24/2019 08:21:09
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      23
Date/Time:   07/24/2019 08:21:09
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      24
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Critical
Description: CPU 2 machine check error detected.
-------------------------------------------------------------------------------
Record:      25
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      26
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      27
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      28
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      29
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      30
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      31
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      32
Date/Time:   07/24/2019 08:21:11
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      33
Date/Time:   07/24/2019 08:21:11
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      34
Date/Time:   07/24/2019 08:21:11
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      35
Date/Time:   07/24/2019 08:21:11
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      36
Date/Time:   07/24/2019 08:21:11
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      37
Date/Time:   07/24/2019 08:21:11
Source:      system
Severity:    Critical
Description: Correctable memory error logging disabled for a memory device at location DIMM_B1.
-------------------------------------------------------------------------------
Record:      38
Date/Time:   07/24/2019 08:21:11
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Jul 24 2019, 10:11 AM · Operations, ops-codfw, DBA
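
A log in this layout can be summarised by counting the Critical records per DIMM location. The sketch below is illustrative only: it assumes records are separated by dashed lines and use `Key: value` fields as shown above, and all function names are hypothetical. The last helper encodes the swap test proposed in the comment.

```python
import re
from collections import Counter

# A short excerpt in the same layout as the hardware log above.
SAMPLE_LOG = """\
-------------------------------------------------------------------------------
Record:      23
Date/Time:   07/24/2019 08:21:09
Source:      system
Severity:    Critical
Description: The system memory has faced an uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location DIMM_B1.
-------------------------------------------------------------------------------
Record:      24
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Critical
Description: CPU 2 machine check error detected.
-------------------------------------------------------------------------------
Record:      25
Date/Time:   07/24/2019 08:21:10
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
"""


def parse_records(text):
    """Split the log on dashed separator lines and read 'Key: value' fields."""
    records = []
    for chunk in re.split(r"^-{10,}$", text, flags=re.MULTILINE):
        fields = {}
        for line in chunk.strip().splitlines():
            # Split on the first colon only, so timestamps stay intact.
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def critical_dimm_counts(records):
    """Count Critical records per DIMM location named in the description."""
    counts = Counter()
    for rec in records:
        if rec.get("Severity") != "Critical":
            continue
        counts.update(re.findall(r"DIMM_[A-Z]\d+", rec.get("Description", "")))
    return counts


def suspect_after_swap(slot_before, slot_after):
    """Swap test: errors staying at the same location point to the slot,
    errors moving to a different location point to the DIMM."""
    return "slot" if slot_after == slot_before else "DIMM"


print(critical_dimm_counts(parse_records(SAMPLE_LOG)))
```

Every Critical memory record in the full log above names DIMM_B1, which is why the swap test is a reasonable next step.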
Marostegui triaged T228768: Decommission dbproxy1004 and dbproxy1009 as Normal priority.
Jul 24 2019, 10:06 AM · Patch-For-Review, ops-eqiad, decommission, Operations, Analytics-EventLogging, Analytics
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 24 2019, 10:05 AM · AbuseFilter, DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 24 2019, 9:56 AM · AbuseFilter, DBA
Marostegui added a comment to T226851: Drop abuse_filter_log.afl_log_id in production.

s2 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1129
  • db1125
  • db1122
  • db1105
  • db1103
  • db1095
  • db1090
  • db1076
  • db1074
  • db1066
Jul 24 2019, 9:56 AM · AbuseFilter, DBA
Marostegui added a comment to T222224: Normalize MediaWiki link tables.

If I can get hw to test and give you some numbers, that would be great.

I believe you have access to the test hosts we have? We can import tables and alter them as required if that'd help you with the tests

I don't have access to that node AFAIK, also this needs commons instead of wikidata.

Jul 24 2019, 9:42 AM · DBA, Core Platform Team, MediaWiki-Page-derived-data, Schema-change, Patch-For-Review, TechCom-RFC
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 24 2019, 8:50 AM · AbuseFilter, DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 24 2019, 8:30 AM · AbuseFilter, DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jul 24 2019, 7:27 AM · DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jul 24 2019, 7:27 AM · DBA

Jul 23 2019

Marostegui created T228768: Decommission dbproxy1004 and dbproxy1009.
Jul 23 2019, 3:13 PM · Patch-For-Review, ops-eqiad, decommission, Operations, Analytics-EventLogging, Analytics
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 23 2019, 1:47 PM · AbuseFilter, DBA
Marostegui added a comment to T226851: Drop abuse_filter_log.afl_log_id in production.

s8 eqiad progress

Jul 23 2019, 1:47 PM · AbuseFilter, DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 23 2019, 1:16 PM · AbuseFilter, DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 23 2019, 1:01 PM · AbuseFilter, DBA
Marostegui updated the task description for T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).
Jul 23 2019, 10:24 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 23 2019, 10:23 AM · AbuseFilter, DBA
Marostegui added a comment to T226851: Drop abuse_filter_log.afl_log_id in production.

s6 eqiad progress

Jul 23 2019, 10:23 AM · AbuseFilter, DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 23 2019, 10:00 AM · AbuseFilter, DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 23 2019, 9:52 AM · AbuseFilter, DBA
Marostegui added a comment to T226851: Drop abuse_filter_log.afl_log_id in production.

I am going to start dropping this column on s6 (starting with codfw first).
I have double checked that there are rows where afl_log_id isn't NULL on all the s6 wikis (frwiki, jawiki and ruwiki).

Jul 23 2019, 9:51 AM · AbuseFilter, DBA
Marostegui added a comment to T222224: Normalize MediaWiki link tables.

If I can get hw to test and give you some numbers, that would be great.

Jul 23 2019, 9:36 AM · DBA, Core Platform Team, MediaWiki-Page-derived-data, Schema-change, Patch-For-Review, TechCom-RFC
Marostegui updated the task description for T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC).
Jul 23 2019, 9:17 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T227139: a3-eqiad pdu refresh.
Jul 23 2019, 9:16 AM · DC-Ops, Operations, ops-eqiad
Marostegui removed a project from T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092: ops-codfw.
Jul 23 2019, 9:00 AM · Operations, DBA
Marostegui moved T228732: Upgrade db1100 firmware and BIOS from Triage to Blocked external/Not db team on the DBA board.
Jul 23 2019, 8:57 AM · DBA, ops-eqiad, Operations
Marostegui created T228732: Upgrade db1100 firmware and BIOS.
Jul 23 2019, 8:57 AM · DBA, ops-eqiad, Operations
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jul 23 2019, 8:18 AM · DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jul 23 2019, 8:13 AM · DBA
Marostegui added a comment to T227141: a5-eqiad pdu refresh.

From the DB side of things, this rack should be done before Thursday 25th 05:30AM UTC, as at that time db1128 will become phabricator master (T228243: Switchover m3 (phabricator) master db1072 to db1128).

Jul 23 2019, 7:14 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T227141: a5-eqiad pdu refresh.
Jul 23 2019, 7:14 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227538: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC).

From the DB side, this can be done after Thursday 25th as db1072 will no longer be a master

Jul 23 2019, 7:14 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC).

From the DB side this can be done anytime

Jul 23 2019, 7:11 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC).
Jul 23 2019, 7:09 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).

From the DB side, this rack is good to go

Jul 23 2019, 7:07 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227143: a7-eqiad pdu refresh.

From the DB side this rack is good to go

Jul 23 2019, 7:06 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).
Jul 23 2019, 7:06 AM · Patch-For-Review, DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).

This rack contains an active primary db master, db1066. It would need to be failed over if we are not confident that power will not be lost.

Jul 23 2019, 7:05 AM · Patch-For-Review, DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T227141: a5-eqiad pdu refresh.
Jul 23 2019, 7:04 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

Good to go from the DB side

Jul 23 2019, 7:02 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227141: a5-eqiad pdu refresh.

From the DB side of things, this rack should be done before Thursday 25th 05:30AM UTC, as at that time db1128 will become phabricator master (T228243: Switchover m3 (phabricator) master db1072 to db1128).

Jul 23 2019, 7:01 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T226778: Install new PDUs in rows A/B (Top level tracking task).

db1081 and db1075 are primary masters, so if we are not fully sure that no power will be lost, I would rather do other racks first.
Racks on row A that are good to go:
A3: has one active dbproxy (dbproxy1001); I could fail it over tomorrow and then it should be good to go.
A4: good to go
A5: good to go if done before Thursday 25th, as that day db1128 will become a master (T228243)
A7: good to go
From row B:
B1: good to go
B2: good to go after Thursday 25th, as we are failing over that host that day (T228243)
B3: It has m5 master which is mostly used by wikitech and cloud team, so you might want to ping them. From the DBAs side it is good to go.
B4: good to go
B6: good to go
B7: good to go
B8: it has m2 master which is mostly used by recommendationsapi, otrs, debmonitor, so if those stakeholders are ok, that is fine from a DBA point of view. Tags should be: OTRS Recommendation-API SRE-tools

Jul 23 2019, 4:57 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227139: a3-eqiad pdu refresh.

@RobH I have failed over dbproxy1001 to dbproxy1006 so this rack is good to go from the DB point of view.

Jul 23 2019, 4:56 AM · DC-Ops, Operations, ops-eqiad
Marostegui closed T227552: pc2010 possibly broken memory as Resolved.

MySQL caught up - all good. Thanks again Papaul!

Jul 23 2019, 4:39 AM · Operations, ops-codfw, DBA

Jul 22 2019

Marostegui added a comment to T227552: pc2010 possibly broken memory.

Thanks Papaul.
Memory looking good.

root@pc2010:~# free -m
              total        used        free      shared  buff/cache   available
Mem:         257392         674      256516           9         201      255498
Swap:          7628           0        7628

I have also upgraded the kernel.
Going to start MySQL and all that jazz

Jul 22 2019, 4:20 PM · Operations, ops-codfw, DBA
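
The `free -m` output quoted above can be checked programmatically rather than by eye. The following is a minimal sketch, not the check that was actually run: the parser name is hypothetical and it assumes the column layout shown in the comment.

```python
# Output copied from the comment above.
FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:         257392         674      256516           9         201      255498
Swap:          7628           0        7628
"""


def parse_free(text):
    """Map each row label (Mem, Swap) to a dict of column name -> MiB."""
    lines = text.strip().splitlines()
    headers = lines[0].split()
    result = {}
    for line in lines[1:]:
        label, *values = line.split()
        # zip() stops at the shorter sequence, so the Swap row, which has
        # only three values, maps to total/used/free.
        result[label.rstrip(":")] = dict(zip(headers, map(int, values)))
    return result


mem = parse_free(FREE_OUTPUT)["Mem"]
print(round(mem["total"] / 1024, 1))  # total memory in GiB: 251.4
```

The reported total of 257392 MiB is about 251.4 GiB, and almost all of it is free because MySQL had not yet been started.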
Marostegui updated the task description for T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC).
Jul 22 2019, 3:02 PM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC).
Jul 22 2019, 3:01 PM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC).

db1081 and db1075 are primary masters, so if we are not fully sure that no power will be lost, I would rather do other racks first.
Racks on row A that are good to go:

Jul 22 2019, 2:53 PM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T228618: Reallocate dbproxy1020 and dbproxy1021 from row D to row C.

@Cmjohnson lovely - thank you!

Jul 22 2019, 2:36 PM · DC-Ops, Operations, ops-eqiad
Marostegui moved T228613: Re-build db2097 s1 and s6 with Debian Buster and 10.3 from Backlog to Next on the DBA board.
Jul 22 2019, 1:25 PM · DBA
Marostegui moved T60674: Drop page.page_restrictions column from Wikimedia wikis from Backlog to Next on the DBA board.
Jul 22 2019, 1:25 PM · MediaWiki-Page-protection, Schema-change
Marostegui moved T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4] from Next to In progress on the DBA board.
Jul 22 2019, 1:24 PM · DBA
Marostegui added a comment to T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].

dbproxy1014 is now ready in m1 (as standby) and as a replacement of dbproxy1006 which can be decommissioned.

Jul 22 2019, 1:15 PM · DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jul 22 2019, 1:14 PM · DBA
Marostegui added a subtask for T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment: Unknown Object (Task).
Jul 22 2019, 12:46 PM · Goal, DBA
Marostegui claimed T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jul 22 2019, 9:09 AM · DBA
Marostegui updated the task description for T196055: Remove table `math` from the database.
Jul 22 2019, 8:34 AM · Patch-For-Review, DBA, Math
Marostegui added a comment to T196055: Remove table `math` from the database.

I have renamed the table on db2116 (enwiki):

root@db2116.codfw.wmnet[enwiki]> show tables like 'TO_DRO%';
+----------------------------+
| Tables_in_enwiki (TO_DRO%) |
+----------------------------+
| TO_DROP_math               |
+----------------------------+
1 row in set (0.04 sec)
Jul 22 2019, 8:34 AM · Patch-For-Review, DBA, Math
Marostegui created T228618: Reallocate dbproxy1020 and dbproxy1021 from row D to row C.
Jul 22 2019, 8:18 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jul 22 2019, 8:10 AM · DBA
Marostegui updated the task description for T226851: Drop abuse_filter_log.afl_log_id in production.
Jul 22 2019, 7:52 AM · AbuseFilter, DBA
Marostegui added a comment to T226851: Drop abuse_filter_log.afl_log_id in production.

I have altered db1134, and will monitor the errors to make sure nothing complains about this column being gone.

root@db1134.eqiad.wmnet[enwiki]> show create table abuse_filter_log\G
*************************** 1. row ***************************
       Table: abuse_filter_log
Create Table: CREATE TABLE `abuse_filter_log` (
  `afl_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `afl_filter` varbinary(64) NOT NULL DEFAULT '',
  `afl_user` bigint(20) unsigned NOT NULL DEFAULT '0',
  `afl_user_text` varbinary(255) NOT NULL DEFAULT '',
  `afl_ip` varbinary(255) NOT NULL DEFAULT '',
  `afl_action` varbinary(255) NOT NULL DEFAULT '',
  `afl_actions` varbinary(255) NOT NULL DEFAULT '',
  `afl_var_dump` blob NOT NULL,
  `afl_timestamp` varbinary(14) NOT NULL DEFAULT '',
  `afl_namespace` int(11) NOT NULL,
  `afl_title` varbinary(255) NOT NULL DEFAULT '',
  `afl_wiki` varbinary(64) DEFAULT NULL,
  `afl_deleted` tinyint(1) NOT NULL DEFAULT '0',
  `afl_patrolled_by` int(10) unsigned NOT NULL DEFAULT '0',
  `afl_rev_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`afl_id`),
  KEY `afl_timestamp` (`afl_timestamp`),
  KEY `afl_rev_id` (`afl_rev_id`),
  KEY `user_timestamp` (`afl_user`,`afl_user_text`,`afl_timestamp`),
  KEY `filter_timestamp` (`afl_filter`,`afl_timestamp`),
  KEY `page_timestamp` (`afl_namespace`,`afl_title`,`afl_timestamp`),
  KEY `ip_timestamp` (`afl_ip`,`afl_timestamp`),
  KEY `wiki_timestamp` (`afl_wiki`,`afl_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=24453155 DEFAULT CHARSET=binary
1 row in set (0.00 sec)
Jul 22 2019, 7:52 AM · AbuseFilter, DBA
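
One way to confirm from `SHOW CREATE TABLE` output that a column is gone is to extract the column names and test for membership. This is a hedged sketch, not the procedure used on db1134: the helper name is hypothetical, the sample is an abbreviated excerpt of the output above, and it assumes column definitions are the only lines that begin with a backquoted name.

```python
import re

# Abbreviated excerpt of the SHOW CREATE TABLE output above.
SAMPLE_DDL = """\
CREATE TABLE `abuse_filter_log` (
  `afl_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `afl_patrolled_by` int(10) unsigned NOT NULL DEFAULT '0',
  `afl_rev_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`afl_id`),
  KEY `afl_rev_id` (`afl_rev_id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
"""


def column_names(create_table_sql):
    """Return column names: lines whose first token is a backquoted name.

    Index lines start with PRIMARY KEY or KEY, so they are not matched.
    """
    return re.findall(r"^[ \t]*`([^`]+)`\s", create_table_sql, flags=re.MULTILINE)


columns = column_names(SAMPLE_DDL)
print(columns)
print("afl_log_id" in columns)  # False once the column has been dropped
```

Run against the full output above, the same check would list the fifteen remaining columns, none of which is `afl_log_id`.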