⚓ T172538 rack/setup/install labvirt10(19|20).eqiad.wmnet

	Subject	Repo	Branch	Lines +/-
	install params for labvirt10[19-20]	operations/puppet	production	+20 -1

RobH created this task. Aug 4 2017, 4:25 PM

RobH edited projects, added ops-eqiad; removed procurement.

RobH updated the task description.

Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. Aug 14 2017, 2:31 PM

Cmjohnson moved this task from Up next to High Priority Task on the ops-eqiad board. Aug 16 2017, 4:44 PM

Change 372390 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding dns entries for labvirt1019-20 T172538

https://gerrit.wikimedia.org/r/372390

gerritbot added a project: Patch-For-Review. Aug 17 2017, 2:45 PM

Change 372390 merged by Cmjohnson:
[operations/dns@master] Adding dns entries for labvirt1019-20 T172538

https://gerrit.wikimedia.org/r/372390

Cmjohnson updated the task description. Aug 30 2017, 12:50 AM

BIOS is set up and RAID is configured as RAID 10. Switch ports still need to be set up:

1019 -> b4 ge-4/0/33
1020 -> b7 ge-7/0/13

All the on-site work has been completed for labvirts1019-20. @RobH lmk if you want to take it from here

RobH claimed this task. Sep 11 2017, 6:59 PM

Cmjohnson moved this task from High Priority Task to Blocked on the ops-eqiad board. Sep 12 2017, 2:37 PM

bd808 moved this task from Triage to Database on the Cloud-Services board. Sep 14 2017, 5:15 AM

Change 386266 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] install params for labvirt10[19-20]

https://gerrit.wikimedia.org/r/386266

Change 386266 merged by RobH:
[operations/puppet@production] install params for labvirt10[19-20]

https://gerrit.wikimedia.org/r/386266

labvirt1020 is being annoying: it gave me SSL errors when I tried to pull up HTTPS (negotiation errors, not cert errors) and then finally let me in. Sending reboots to it works, but VSP doesn't work (it times out after working for a short bit).

333-HPE RESTful API Error - Unable to communicate with iLO FW. BIOS
configuration resources may not be up-to-date.
Action: Reset iLO FW and reboot the server. If issue persists, AC power cycle
the server.

also listed:
312-HPE Smart Storage Battery 1 Failure - Communication with the battery

So I need to try uploading new firmware to this system.

labvirt1019 installed the OS, but even though it's set to boot from disk first, it keeps looping back into the installer.

RobH removed a project: Patch-For-Review. Oct 24 2017, 9:25 PM

RobH updated the task description.

RobH removed a subscriber: Pswaby.

Both of these systems are now working and calling into Puppet, ready for service implementation by Cloud-Services.

Thank you @RobH :)

Mentioned in SAL (#wikimedia-operations) [2017-12-19T15:25:54Z] <chasemp> labvirt10[19|20] aptitude install linux-image-4.4.0-81-generic linux-image-extra-4.4.0-81-generic; sudo update-grub; /sbin/reboot T172538

bd808 added a project: cloud-services-team (Kanban). Jan 10 2018, 5:17 PM

The original request was:

Disks: 8T after RAID1 with a hardware raid controller

in T162486, but I see these hosts with:

/dev/mapper/tank-data xfs 5.1T 34M 5.1T 1% /srv

@Andrew any idea why? @Cmjohnson?
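Before looking at the controller, it may be worth ruling out a unit mismatch. A minimal sketch, assuming the line above came from `df -h` (which reports binary units, so `5.1T` would mean 5.1 TiB) and that the requested "8T" was meant as decimal terabytes; neither assumption is confirmed in this task:

```python
# Assumption: df -h style output uses binary units (1T = 2**40 bytes).
TIB = 2 ** 40   # bytes in one tebibyte
TB = 10 ** 12   # bytes in one decimal terabyte


def tib_to_tb(tib: float) -> float:
    """Convert tebibytes to decimal terabytes."""
    return tib * TIB / TB


requested_tb = 8.0            # from the original request (assumed decimal TB)
observed_tb = tib_to_tb(5.1)  # from the df line above (assumed TiB)
shortfall_tb = requested_tb - observed_tb

print(f"observed:  {observed_tb:.2f} TB")   # observed:  5.61 TB
print(f"shortfall: {shortfall_tb:.2f} TB")  # shortfall: 2.39 TB
```

Under those assumptions the unit conversion accounts for only about half a terabyte, so it would not explain the gap on its own.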

Mentioned in SAL (#wikimedia-operations) [2018-02-14T19:16:40Z] <andrewbogott> rebooting labvirt1019 so I can have a look at the raid setup, for T172538

Drive config on the HPs is annoying. The steps are:

-reboot
-during boot, ESC-9
-select System Configuration->Embedded RAID 1 : Smart Array P440ar Controller->Exit and launch HP Smart Storage Administrator(HPSSA)

This will appear to error out, but if you wait a few minutes you eventually get a prompt. The prompt responds to the commands documented here:

https://kallesplayground.wordpress.com/useful-stuff/hp-smart-array-cli-commands-under-esxi/

(Note that those commands begin with /opt/hp/hpssacli/bin/hpssacli which is unneeded in this context.)

It looks like we just need to rebuild the raids on these. That's more-or-less impossible to do remotely so I'll create subtasks for Chris.
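The note about the unneeded prefix can be sketched as a small helper. This is only an illustration: `to_prompt_command` is a hypothetical name, and `ctrl all show config` is a common hpssacli command used here as a sample, not one quoted from this task.

```python
# Prefix used by the linked ESXi examples; not needed at the HPSSA prompt.
HPSSACLI_PATH = "/opt/hp/hpssacli/bin/hpssacli"


def to_prompt_command(documented: str) -> str:
    """Drop the leading hpssacli binary path from a documented command."""
    parts = documented.split()
    if parts and parts[0] == HPSSACLI_PATH:
        parts = parts[1:]
    return " ".join(parts)


# Sample input (assumed, for illustration only).
print(to_prompt_command("/opt/hp/hpssacli/bin/hpssacli ctrl all show config"))
# prints: ctrl all show config
```

Commands that do not start with the prefix pass through unchanged.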

Marostegui unsubscribed. Feb 14 2018, 8:33 PM

in favor of T193264

bd808 moved this task from Inbox to Done on the cloud-services-team (Kanban) board. May 6 2018, 6:49 PM

Bstorm added a subtask: T193264: Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020. Mar 6 2019, 3:37 PM

Bstorm closed subtask T193264: Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020 as Resolved. May 22 2019, 5:11 PM

Status	Assigned	Task
		Unknown Object (Task)
Resolved	chasemp	T172538 rack/setup/install labvirt10(19|20).eqiad.wmnet
Duplicate	Cmjohnson	T187373 Rebuild raids on labvirt1019 and 1020
Resolved	Bstorm	T193264 Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020
		Unknown Object (Task)
Resolved	Cmjohnson	T196507 Degraded RAID on cloudvirt1019
Resolved	Cmjohnson	T194855 Degraded RAID on cloudvirt1020
Resolved	aborrero	T216353 toolsdb: firewalling changes for new setup (temporal mysql replication)
Declined	None	T216373 CloudVPS: run maintain-dbusers inside Toolforge
Declined	None	T208754 rename cloudvirt1019 and cloudvirt1020 to cloudvirtdb1001 and cloudvirtdb1002
Resolved	Jclark-ctr	T216749 Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet
Resolved	Halfak	T217922 Migrate Wikilabels from labsdb1004 to clouddb1002
Resolved	Halfak	T219563 Add a DNS alias for the wikilabels database (wikilabels.db.svc.eqiad.wmflabs)
Resolved	Bstorm	T219652 Final migration of osmdb.eqiad.wmnet into Cloud VPS instances
Resolved	Jclark-ctr	T220144 Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet

rack/setup/install labvirt10(19|20).eqiad.wmnet
Closed, Resolved (Public)