db2044 is currently the m2 codfw master. This host has a broken disk and has had many disk failures in the past, so it will be decommissioned.
Let's replace it with db2063.
Details
| Status | Assigned | Task |
|---|---|---|
| Open | None | T88445 MediaWiki active/active datacenter investigation and work (tracking) |
| Resolved | Marostegui | T220170 Address Database hardware infrastructure blockers on datacenter switchover & multi-dc deployment |
| Declined | Marostegui | T230459 Replace db2044 with db2063 |
Event Timeline
Change 530034 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw,db-eqiad.php: Remove db2063 from config
Change 530034 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw,db-eqiad.php: Remove db2063 from config
Mentioned in SAL (#wikimedia-operations) [2019-08-14T07:08:03Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db2063 from config T230459 (duration: 00m 48s)
Mentioned in SAL (#wikimedia-operations) [2019-08-14T07:09:05Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db2063 from config T230459 (duration: 00m 47s)
Change 530035 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db2063 from s2 to m2
Change 530035 merged by Marostegui:
[operations/puppet@production] mariadb: Move db2063 from s2 to m2
Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:
['db2063.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201908140717_marostegui_75740.log.
Completed auto-reimage of hosts:
['db2063.codfw.wmnet']
Of which those FAILED:
['db2063.codfw.wmnet']
I have been trying to PXE boot this host, but without success.
Even though I have manually set PXE as the boot device with ipmitool locally, it is still not working:
root@db2063:~# ipmitool chassis bootparam get 5
Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: 0004000000
 Boot Flags :
   - Boot Flag Invalid
   - Options apply to only next boot
   - BIOS PC Compatible (legacy) boot
   - Boot Device Selector : Force PXE
   - Console Redirection control : System Default
   - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
   - BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST
After a reboot, it keeps booting from disk despite the above options.
ipmitool also fails when run remotely, with:
Error: Unable to establish IPMI v2 / RMCP+ session
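For comparison, a remote PXE override over IPMI-over-LAN would look roughly like the sketch below. It is a minimal sketch under stated assumptions, not the procedure used on this task: the helper name `pxe_reboot`, the `IPMI_CMD` override, the `root` mgmt user, and passing the password through the `IPMI_PASSWORD` environment variable (which ipmitool's `-E` flag reads) are all assumptions. It can only work once the RMCP+ session error above is resolved.

```shell
#!/bin/sh
# Hypothetical helper: force the next boot to PXE over the network, then power cycle.
# Assumes IPMI-over-LAN is enabled on the iDRAC, the mgmt user is "root",
# and the password is exported in IPMI_PASSWORD (read by ipmitool's -E flag).
pxe_reboot() {
    mgmt_host="$1"
    # Command to run; override it (e.g. IPMI_CMD=echo) for a dry run.
    ipmi="${IPMI_CMD:-ipmitool}"
    # "lanplus" selects the IPMI v2.0 / RMCP+ interface, the one failing above.
    # "bootdev pxe" applies to the next boot only, like the local setting.
    $ipmi -I lanplus -H "$mgmt_host" -U root -E chassis bootdev pxe || return 1
    # Power cycle so the override takes effect.
    $ipmi -I lanplus -H "$mgmt_host" -U root -E chassis power cycle
}
```

Running `IPMI_CMD=echo pxe_reboot db2063.mgmt.codfw.wmnet` prints the two ipmitool invocations instead of executing them, which is a cheap way to check the arguments before touching a real host.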
I have followed all the steps at https://wikitech.wikimedia.org/wiki/Management_Interfaces, including resetting the card from the mgmt interface, without success.
I also tried to enter the boot menu while the host was booting, but it never opens and the host continues to boot from disk.
@Papaul, could you manually reset the iDRAC by powering the host down and doing a power drain (https://wikitech.wikimedia.org/wiki/Management_Interfaces#Power_drain_the_host), and also upgrade the iDRAC firmware, so I can see whether I manage to install it?
Thanks
Forgot to mention: this host is not in use and is downtimed, so this onsite maintenance can be done at any time without a heads-up to the DBAs.
SSH issue:
papaul@papaulpc:~$ ssh root@db2063.mgmt.codfw.wmnet
Unable to negotiate with UNKNOWN port 65535: no matching cipher found. Their offer: aes256-cbc,aes128-cbc,3des-cbc
I had to run the ssh command with -c aes256-cbc to access the mgmt interface:
papaul@papaulpc:~$ ssh -c aes256-cbc root@db2063.mgmt.codfw.wmnet
root@db2063.mgmt.codfw.wmnet's password:
After the firmware upgrade and resetting the iDRAC, I was able to access the mgmt interface without -c aes256-cbc.
I ran the same test on 6 other DB servers of the same generation (db206[124567]) and I am getting the same SSH error on them too.
I'm not sure what the status of this is, considering T228258 exists. MySQL on db2063 is down, but I am not touching it, to avoid breaking something.
This host is still failing: the iDRAC is still not working.
I think I will just decommission this one and pick another host; there is no need to waste more time on it.