Today

Jclark-ctr added a comment to T241849: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet.

@Cmjohnson Hosts are racked and I have started on the power cables. Will have the cables finished tomorrow.

Wed, Feb 26, 12:29 AM · Patch-For-Review, serviceops, ops-eqiad, Operations

Yesterday

Jclark-ctr closed Unknown Object (Task), a subtask of T241313: cloudvirt1013: server down for no reason (power issue?), as Resolved.
Tue, Feb 25, 10:04 PM · cloud-services-team (Hardware), Operations, ops-eqiad, DC-Ops
Jclark-ctr closed Unknown Object (Task), a subtask of T242885: Expand Eqiad Ganeti row_A capacity, as Resolved.
Tue, Feb 25, 10:04 PM · hardware-requests, Operations
Jclark-ctr added a comment to T244958: db1095 backup source crashed: broken BBU.

Replaced BBU @jcrespo @Marostegui

Tue, Feb 25, 9:59 PM · ops-eqiad, Operations, DBA
Jclark-ctr added a comment to T244958: db1095 backup source crashed: broken BBU.
Tue, Feb 25, 9:49 PM · ops-eqiad, Operations, DBA
Jclark-ctr added a comment to T245647: Replace broken BBU on db1084 (HP host).

@Marostegui Received the BBU. Please message me on IRC to schedule the replacement.

Tue, Feb 25, 9:43 PM · Operations, ops-eqiad, DC-Ops

Fri, Feb 21

Jclark-ctr added a comment to T235685: (Need by: 2020-03-02) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet.

Cabling is finished; still being configured by Chris.

Fri, Feb 21, 10:12 PM · cloud-services-team (Hardware), ops-eqiad, Operations

Wed, Feb 19

Jclark-ctr updated subscribers of T243414: relocate/reimage cloudvirt1013 with 10G interfaces.

@Andrew This is already located in a 10G rack; it just needs DAC cables connected to the 10G NIC. I have talked to @JHedden regarding this. Not sure whether additional switch changes are needed if we use the same ports.

Wed, Feb 19, 9:15 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations, Epic
Jclark-ctr added a comment to T240187: mw1280 crashed logging correctable memory errors.

@wiki_willy Going by Dell support, based on the service tag:
Part number: PR5D1
DIMM,32GB,2133,2RX4,8G,DDR4,R 2

Wed, Feb 19, 9:11 PM · serviceops, Operations, ops-eqiad

Tue, Feb 18

Jclark-ctr added a comment to T241884: Degraded RAID on cloudvirt1024.

Updated Dell ticket with new TSR report.

Tue, Feb 18, 9:58 PM · cloud-services-team (Hardware), Patch-For-Review, ops-eqiad, Operations
Jclark-ctr added a comment to T243536: cloudvirt1022 memory errors causing host to crash.

Replaced Failed Dimm

Tue, Feb 18, 9:26 PM · DC-Ops, ops-eqiad, cloud-services-team (Hardware), Operations

Sun, Feb 16

Jclark-ctr closed T245320: mr1-eqiad.wikimedia.org - Duplicate IP on mgmt network as Resolved.

Updated box, fixed IP address.

Sun, Feb 16, 1:35 AM · Operations, DC-Ops, ops-eqiad

Fri, Feb 14

Jclark-ctr added a comment to T241794: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet.

Configured BIOS and iDRAC.

Fri, Feb 14, 10:36 PM · Dumps-Generation, Operations
Jclark-ctr updated the task description for T241794: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet.
Fri, Feb 14, 10:30 PM · Dumps-Generation, Operations

Thu, Feb 13

Jclark-ctr updated the task description for T244506: (Need by: TBD) rack/setup/install kafka-jumbo100[789].eqiad.wmnet.
Thu, Feb 13, 12:44 AM · Operations, Analytics, ops-eqiad
Jclark-ctr reassigned T241359: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet from Jclark-ctr to Cmjohnson.

Below are the switch ports for the hosts. Hosts are racked and cabled, and Netbox is updated. Handing over to Chris to configure BIOS/RAID.

Thu, Feb 13, 12:34 AM · DBA, Operations
Jclark-ctr added a comment to T244958: db1095 backup source crashed: broken BBU.

@jcrespo Battery replacement delivery date is 02/22/20. Please message me on IRC with the time that works best for you for the replacement. I can accommodate your schedule.

Thu, Feb 13, 12:17 AM · ops-eqiad, Operations, DBA

Fri, Feb 7

Jclark-ctr added a comment to T241882: cloudvirt1016 crash.

Replaced failed DIMM A8.

Fri, Feb 7, 6:23 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations
Jclark-ctr added a comment to T243963: es1019: reseat IPMI.

Performed flea power drain. Powered on host.

Fri, Feb 7, 5:12 PM · DC-Ops, Operations, ops-eqiad, DBA

Thu, Feb 6

Jclark-ctr updated the task description for T208584: Decommission old eqiad caches.
Thu, Feb 6, 11:16 PM · ops-eqiad, decommission, Operations, Traffic
Jclark-ctr updated the task description for T241794: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet.
Thu, Feb 6, 12:12 AM · Dumps-Generation, Operations
Jclark-ctr reassigned T241794: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet from Jclark-ctr to Cmjohnson.
Thu, Feb 6, 12:12 AM · Dumps-Generation, Operations
Jclark-ctr added a comment to T241794: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet.

Host is racked.
Rack B5, U25. Switchport 14.

Thu, Feb 6, 12:11 AM · Dumps-Generation, Operations

Wed, Feb 5

Jclark-ctr updated the task description for T241794: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet.
Wed, Feb 5, 11:48 PM · Dumps-Generation, Operations
Jclark-ctr updated the task description for T233207: Decommission dbproxy1006.eqiad.wmnet.
Wed, Feb 5, 11:32 PM · Patch-For-Review, DC-Ops, decommission, ops-eqiad, Operations
Jclark-ctr updated the task description for T220503: Decommission neodymium.
Wed, Feb 5, 11:27 PM · decommission, Operations, ops-eqiad
Jclark-ctr updated the task description for T222109: decommission frav1001.frack.eqiad.wmnet.
Wed, Feb 5, 11:22 PM · decommission, Operations, fundraising-tech-ops, ops-eqiad, DC-Ops
Jclark-ctr updated the task description for T229586: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099.
Wed, Feb 5, 11:15 PM · ops-eqiad, DC-Ops, decommission, Operations
Jclark-ctr updated the task description for T234909: decommission auth1001.
Wed, Feb 5, 10:49 PM · ops-eqiad, Operations, DC-Ops, decommission
Jclark-ctr updated the task description for T238297: decommission db1067.eqiad.wmnet.
Wed, Feb 5, 10:43 PM · DC-Ops, decommission, ops-eqiad, Operations
Jclark-ctr updated the task description for T233071: Decommission db1066.eqiad.wmnet.
Wed, Feb 5, 10:02 PM · DC-Ops, ops-eqiad, decommission, Operations
Jclark-ctr updated the task description for T239188: Decommission db1062.eqiad.wmnet.
Wed, Feb 5, 9:50 PM · Operations, DC-Ops, ops-eqiad, decommission
Jclark-ctr updated the task description for T238624: Decommission db1061.eqiad.wmnet.
Wed, Feb 5, 9:48 PM · Operations, DC-Ops, ops-eqiad, decommission

Fri, Jan 31

Jclark-ctr added a comment to T241849: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet.

@jijiki will most likely be early March

Fri, Jan 31, 3:53 AM · Patch-For-Review, serviceops, ops-eqiad, Operations

Jan 24 2020

Jclark-ctr added a comment to T241359: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet.

@Marostegui I see that the racking recommendation includes both 10G and 1G racks, and A1 is a network rack.

Jan 24 2020, 10:45 PM · DBA, Operations
Jclark-ctr reassigned T235685: (Need by: 2020-03-02) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet from Jclark-ctr to Cmjohnson.

Hosts racked and Netbox updated. Need IP addresses to continue.

Jan 24 2020, 10:30 PM · cloud-services-team (Hardware), ops-eqiad, Operations
Jclark-ctr updated the task description for T235685: (Need by: 2020-03-02) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet.
Jan 24 2020, 10:24 PM · cloud-services-team (Hardware), ops-eqiad, Operations
Jclark-ctr added a comment to T241794: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet.

These have been received. Working on getting them into Netbox and will start racking shortly. Will not be finished before All Hands. Will be able to turn over by Feb 14th as long as there are no surprises.

Jan 24 2020, 7:27 PM · Dumps-Generation, Operations
Jclark-ctr updated the task description for T241359: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet.
Jan 24 2020, 7:24 PM · DBA, Operations
Jclark-ctr added a comment to T241359: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet.

These hosts are not in racks yet. I can rack them today but do not have IPs yet, so I cannot set them up. @Cmjohnson if you can add IPs to this ticket, I can configure the hosts. Once I get those, turnaround should be quick.

Jan 24 2020, 7:22 PM · DBA, Operations
Jclark-ctr updated subscribers of T241494: Degraded RAID on cloudvirt1014.

@aborrero It is out of warranty, but I do have a spare BBU and can replace it. @JHedden and I spoke briefly about this one last night. I am on site now, and if you would like to change it today, I can. But it is the Friday before All Hands, so I would like to change it on return unless you think this is urgent.

Jan 24 2020, 1:56 PM · Patch-For-Review, cloud-services-team (Hardware), ops-eqiad, Operations

Jan 23 2020

Jclark-ctr closed T241313: cloudvirt1013: server down for no reason (power issue?), a subtask of T138509: rack/setup/install/deploy labvirt1012 labvirt1013 labvirt1014 nodes (cloudvirt1012 cloudvirt1013 cloudvirt1014), as Resolved.
Jan 23 2020, 10:11 PM · Patch-For-Review, ops-eqiad, Operations
Jclark-ctr closed T241313: cloudvirt1013: server down for no reason (power issue?) as Resolved.
Jan 23 2020, 10:11 PM · cloud-services-team (Hardware), Operations, ops-eqiad, DC-Ops
Jclark-ctr added a comment to T243536: cloudvirt1022 memory errors causing host to crash.

Emailed TSR report to Dell.

Jan 23 2020, 9:17 PM · DC-Ops, ops-eqiad, cloud-services-team (Hardware), Operations
Jclark-ctr added a comment to T241313: cloudvirt1013: server down for no reason (power issue?).

Replaced BBU; no errors at this time. Closing procurement task T243547, as it is not needed at this time.

Jan 23 2020, 9:07 PM · cloud-services-team (Hardware), Operations, ops-eqiad, DC-Ops
Jclark-ctr closed T242472: Degraded RAID on cloudvirt1013 as Resolved.

Replaced BBU; no errors at this time.

Jan 23 2020, 9:06 PM · cloud-services-team (Hardware), ops-eqiad, Operations
Jclark-ctr updated subscribers of T241313: cloudvirt1013: server down for no reason (power issue?).

313-hpe smart storage battery 1 Failure - battery shutdown event code: 0x400
action: restart system

Jan 23 2020, 8:33 PM · cloud-services-team (Hardware), Operations, ops-eqiad, DC-Ops
Jclark-ctr added a comment to T243536: cloudvirt1022 memory errors causing host to crash.

Confirmed: Service Request 1011922914 was successfully submitted.

Jan 23 2020, 7:17 PM · DC-Ops, ops-eqiad, cloud-services-team (Hardware), Operations
Jclark-ctr added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Firmware and BIOS updated. @JHedden can your team test to see whether it still fails?

Jan 23 2020, 7:03 PM · cloud-services-team (Hardware), Operations, ops-eqiad, DC-Ops, User-Zppix
Jclark-ctr reassigned T241795: (Need By: Jan 10) rack/setup/install mc-gp100[123].eqiad.wmnet from Jclark-ctr to Cmjohnson.
Jan 23 2020, 4:18 PM · serviceops, Operations
Jclark-ctr reassigned T236437: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet from Jclark-ctr to Cmjohnson.

Hosts racked; BIOS, IP, and password set. Needs DNS.

Jan 23 2020, 8:48 AM · serviceops, Operations
Jclark-ctr updated the task description for T236437: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet.
Jan 23 2020, 8:47 AM · serviceops, Operations

Jan 22 2020

Jclark-ctr closed T243433: cloudclastic1006 malformed asset tag - report error as Resolved.

Updated Netbox with correct asset tag.

Jan 22 2020, 9:16 PM · ops-eqiad, Operations, DC-Ops

Jan 14 2020

Jclark-ctr added a comment to T236327: replace onboard NIC in kafka-jumbo100[1-6].

@Cmjohnson No NIC installed or host moved yet. @RobH had helped with the 10G interfaces.

Jan 14 2020, 7:08 PM · ops-eqiad, Operations, Analytics, User-Elukey
Jclark-ctr reassigned T225121: (Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300 from Jclark-ctr to Cmjohnson.
Jan 14 2020, 3:28 PM · netops, Operations, ops-eqiad
Jclark-ctr added a comment to T241506: Degraded RAID on db1100.

Replaced Disk #0

Jan 14 2020, 2:16 PM · DBA, ops-eqiad, Operations

Jan 10 2020

Jclark-ctr added a comment to T241506: Degraded RAID on db1100.

Slot appears to be 0. As discussed on IRC, we will change it Monday.

Jan 10 2020, 1:53 PM · DBA, ops-eqiad, Operations

Jan 9 2020

Jclark-ctr added a comment to T241506: Degraded RAID on db1100.

@Marostegui Drive has arrived. Please PM me on IRC so we can get this swapped.

Jan 9 2020, 9:48 PM · DBA, ops-eqiad, Operations

Jan 8 2020

Jclark-ctr added a comment to T241882: cloudvirt1016 crash.

Confirmed: Service Request 1009577756 was successfully submitted.

Jan 8 2020, 9:31 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations
Jclark-ctr added a comment to T241882: cloudvirt1016 crash.

Confirmed: Service Request 1009577756 was successfully submitted.

Jan 8 2020, 9:29 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations
Jclark-ctr added a comment to T241506: Degraded RAID on db1100.

Drive was ordered and should arrive shortly. Will update when it arrives.

Jan 8 2020, 9:22 PM · DBA, ops-eqiad, Operations

Jan 3 2020

Jclark-ctr added a comment to T239597: Hardware asset tag Netbox/DNS mgmt inconsistencies.

Verified host asset tags and updated ticket. Corrected labstore1007 in Netbox.

Jan 3 2020, 11:25 PM · Operations, ops-eqiad, DC-Ops
Jclark-ctr updated the task description for T239597: Hardware asset tag Netbox/DNS mgmt inconsistencies.
Jan 3 2020, 11:24 PM · Operations, ops-eqiad, DC-Ops
Jclark-ctr updated the task description for T239597: Hardware asset tag Netbox/DNS mgmt inconsistencies.
Jan 3 2020, 11:21 PM · Operations, ops-eqiad, DC-Ops
Jclark-ctr updated the task description for T239597: Hardware asset tag Netbox/DNS mgmt inconsistencies.
Jan 3 2020, 11:17 PM · Operations, ops-eqiad, DC-Ops

Dec 23 2019

Jclark-ctr added a comment to T240534: Degraded RAID on db1123.

drive changed

Dec 23 2019, 2:40 PM · DBA, ops-eqiad, Operations

Dec 20 2019

Jclark-ctr added a comment to T236437: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet.

@jijiki Small delay: a few tickets became urgent. Being a contractor, I will be in the week after Christmas and will wrap them up at the beginning of the new year.

Dec 20 2019, 11:59 PM · serviceops, Operations
Jclark-ctr closed Unknown Object (Task), a subtask of T221632: Storage capacity upgrade for WDQS, as Resolved.
Dec 20 2019, 11:27 PM · Discovery-Search, Wikidata, Wikidata-Query-Service
Jclark-ctr closed Unknown Object (Task), a subtask of T222098: Replace wdqs1003, as Resolved.
Dec 20 2019, 11:27 PM · Wikidata, Wikidata-Query-Service
Jclark-ctr closed Unknown Object (Task), a subtask of T226704: Setup es4 and es5 replica sets for new read-write external store service, as Resolved.
Dec 20 2019, 10:59 PM · Goal, Epic, DBA
Jclark-ctr added a comment to T240534: Degraded RAID on db1123.

@Marostegui Disk arrived today. Message me on IRC if you are available to change it.

Dec 20 2019, 9:45 PM · DBA, ops-eqiad, Operations

Dec 18 2019

Jclark-ctr added a comment to T240534: Degraded RAID on db1123.

Confirmed: Service Request 1007375142 was successfully submitted

Dec 18 2019, 10:12 PM · DBA, ops-eqiad, Operations
Jclark-ctr added a comment to T220698: Check if a GPU fits in any of the remaining stat or notebook hosts.


@EBernhardson stat1004 and stat1007 are 1U hosts; these will not fit a dual-slot card. Most on this list are 1U hosts and have the same configuration.

Dec 18 2019, 2:28 PM · User-Elukey, Analytics, Operations, ops-eqiad

Dec 17 2019

Jclark-ctr added a comment to T236437: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet.

I recommend we use these racks. We are limited in space in row D and would not have enough SFP-to-RJ45 to accommodate putting these in 10G racks.
D 1: 15 servers
D 3: 3 servers
D 6: 15 servers
D 8: 3 servers

Dec 17 2019, 10:24 PM · serviceops, Operations

Dec 12 2019

Jclark-ctr claimed T205507: Decommission analytics100[1,2].
Dec 12 2019, 11:17 PM · Operations, ops-eqiad, decommission, Analytics
Jclark-ctr claimed T208586: Decommission lvs1007-1012.
Dec 12 2019, 11:15 PM · ops-eqiad, decommission, Operations, Traffic
Jclark-ctr claimed T208584: Decommission old eqiad caches.
Dec 12 2019, 11:15 PM · ops-eqiad, decommission, Operations, Traffic
Jclark-ctr claimed T234909: decommission auth1001.
Dec 12 2019, 11:04 PM · ops-eqiad, Operations, DC-Ops, decommission
Jclark-ctr claimed T233071: Decommission db1066.eqiad.wmnet.
Dec 12 2019, 11:03 PM · DC-Ops, ops-eqiad, decommission, Operations
Jclark-ctr claimed T146455: Decommission labsdb1002.
Dec 12 2019, 11:03 PM · hardware-requests, Patch-For-Review, ops-eqiad, Operations
Jclark-ctr updated the task description for T191362: decom promethium/WMF3571.
Dec 12 2019, 10:40 PM · decommission, Operations, DC-Ops, ops-eqiad
Jclark-ctr reassigned T239250: setup/install cescout1001.eqiad.wmnet from Jclark-ctr to RobH.
Dec 12 2019, 8:47 PM · Patch-For-Review, Operations
Jclark-ctr updated the task description for T239250: setup/install cescout1001.eqiad.wmnet.
Dec 12 2019, 8:46 PM · Patch-For-Review, Operations
Jclark-ctr added a comment to T239250: setup/install cescout1001.eqiad.wmnet.

Labeled host.
Swap out the dual 480GB SSDs for the dual 2TB SATA disks.

Dec 12 2019, 8:46 PM · Patch-For-Review, Operations
Jclark-ctr closed T240545: Circuit down between cr1-eqiad and cr1-codfw as Resolved.
Dec 12 2019, 4:08 PM · ops-eqiad, netops, Operations
Jclark-ctr added a comment to T240545: Circuit down between cr1-eqiad and cr1-codfw.

Replaced failed Fiber

Dec 12 2019, 4:08 PM · ops-eqiad, netops, Operations
Jclark-ctr claimed T240545: Circuit down between cr1-eqiad and cr1-codfw.
Dec 12 2019, 1:39 PM · ops-eqiad, netops, Operations

Dec 10 2019

Jclark-ctr added a comment to T220698: Check if a GPU fits in any of the remaining stat or notebook hosts.

@RobH
Inspected stat1004 with @elukey this morning {F31467984}. 2 available slots. Will fit 1 full-height and 1 half-height card.

Dec 10 2019, 11:27 PM · User-Elukey, Analytics, Operations, ops-eqiad
Jclark-ctr updated the task description for T236327: replace onboard NIC in kafka-jumbo100[1-6].
Dec 10 2019, 10:57 PM · ops-eqiad, Operations, Analytics, User-Elukey
Jclark-ctr added a comment to T236327: replace onboard NIC in kafka-jumbo100[1-6].

@RobH
Server           New Rack  Switchport
kafka-jumbo1001  a4        39
kafka-jumbo1003  b2        35
kafka-jumbo1006  d7        36

Dec 10 2019, 10:44 PM · ops-eqiad, Operations, Analytics, User-Elukey
Jclark-ctr added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Sent TSR report after running onboard diagnostics that showed faults for memory, PSU1, and PSU2. TSR report showed no errors. Running more tests.

Dec 10 2019, 9:39 PM · cloud-services-team (Hardware), Operations, ops-eqiad, DC-Ops, User-Zppix
Jclark-ctr closed T239217: Degraded RAID on dbstore1003 as Resolved.

Replaced Failed Drive

Dec 10 2019, 2:15 PM · Analytics, ops-eqiad, Operations

Dec 6 2019

Jclark-ctr added a comment to T239957: Degraded RAID on cloudelastic1002.

Closing as duplicate.

Dec 6 2019, 1:44 PM · Discovery-Search (Current work), Discovery, ops-eqiad, Operations

Dec 5 2019

Jclark-ctr added a comment to T230088: cloudelastic1002: SMART/disk error.

replaced failed drive

Dec 5 2019, 10:20 PM · ops-eqiad, DC-Ops, Operations, cloud-services-team (Kanban)
Jclark-ctr updated the task description for T235685: (Need by: 2020-03-02) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet.
Dec 5 2019, 8:29 PM · cloud-services-team (Hardware), Operations, ops-eqiad
Jclark-ctr added a comment to T239365: Degraded RAID on an-worker1089.

Replaced failed drive

Dec 5 2019, 8:02 PM · Analytics, ops-eqiad, Operations
Jclark-ctr added a comment to T230088: cloudelastic1002: SMART/disk error.

@mathew.onip Disk arrived

Dec 5 2019, 6:53 PM · ops-eqiad, DC-Ops, Operations, cloud-services-team (Kanban)
Jclark-ctr added a comment to T236437: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet.

@jijiki If there are no surprises, I could have them racked by Dec 20th.

Dec 5 2019, 5:42 PM · serviceops, Operations

Dec 4 2019

Jclark-ctr added a comment to T239569: cloudstore1008 crash - Memory error.

Finished replacement of DIMM_B2

Dec 4 2019, 10:45 PM · Operations, ops-eqiad, cloud-services-team (Kanban), Cloud-Services
Jclark-ctr added a comment to T239569: cloudstore1008 crash - Memory error.

DIMM has arrived.

Dec 4 2019, 10:09 PM · Operations, ops-eqiad, cloud-services-team (Kanban), Cloud-Services