User Details
- User Since
- Dec 18 2014, 3:39 PM (482 w, 4 d)
- Availability
- Available
- LDAP User
- Papaul
- MediaWiki User
- Unknown
Today
Yesterday
Zeroize done on asw-a1
Steps:
- Delete the member from the master
- Disconnect both cables going to asw-a2 and asw-a7
- While logged in to the console, run the zeroize command
@JMeybohm hello, is there anything DC-ops needs to do on this task?
We have had this issue with this same server for a long time, so I always close the task when I see the inbound interface error on this server.
The error is not the cable, since the cable was already replaced once, and not the interface, because we also changed the interface once. See https://T330218.
According to the discussion with @jcrespo in the past, this server is getting a lot of traffic; maybe the 1G NIC is not capable of handling that amount of traffic.
Since this error has no impact on the server, I am resolving this task. If you have any questions, please feel free to re-open.
Jan 31 2024
@Marostegui yes, we will put some in rows C and D as well. Only the ones in rows A and B will be connected to 10G, if they have a 10G NIC.
Thanks
Jan 30 2024
@Marostegui if those hosts have a 10G NIC, do you have any problem with connecting the ones going into rows A and B to a 10G interface?
We did the last server move today. Thanks to all.
Jan 24 2024
Today's work is complete. The only node left to relocate is gitlab2002. Service Ops will get back to us with a day sometime next week. All old ports in Netbox and on asw-a1-codfw have been removed.
@klausman thank you
@Marostegui thank you.
Jan 23 2024
Jan 22 2024
@ssingh can we close this task?
@cmooney can we get those 2 hosts back in decom? Thanks
@Marostegui thank you. @cmooney I will take another look at it, thanks.
Jan 18 2024
@Marostegui thank you
linecard removed from cr2 and deleted from netbox
Jan 17 2024
@RobH While creating the RMA for the linecard in FPC0 on cr2-codfw, the Juniper team let me know that the linecard has only technical support and no hardware support, so it is impossible to RMA it.
Jan 16 2024
Hello Papaul
After moving the linecard into cr1, we are now seeing the error in cr1. I emailed support to request a replacement again.
Link removed
Jan 12 2024
I will go for option 2 but I will have to do that next week since today is Friday. Thanks
Jan 11 2024
Hello Papaul
Jan 10 2024
@ayounsi see below email from Juniper support
Case Number: 2024-0110-046148
Case Type: Tech
Priority: P2 - High
Platform: MX480
Status: Dispatch
@cmooney link moved to ssw1-a8
@ayounsi will do
Jan 9 2024
@Jhancock.wm what I did to get the provision cookbook to pass was to reset the iDRAC password and re-run the cookbook.
@Dzahn the host is back up.
Jan 8 2024
Waiting to receive the replacement disk before closing the task.
Mainboard replaced by @Jhancock.wm. She is running the provision cookbook now.
@cmooney xe-0/0/26
ganeti2033 on xe-0/0/8 on lsw1-b7-codfw
ganeti2034 on xe-0/0/12 on lsw1-a4-codfw
@Clement_Goubert thanks will work on it in a minute
Jan 4 2024
Request replacement
Your dispatch shipped on 1/3/2024 4:20 PM
Jan 3 2024
disk replaced
@colewhite disk replaced
Create Dispatch: Success
You have successfully submitted request SR182660280.
After swapping the CPU and DIMM, I am now getting
CPU 2 MEM012 VPP PG voltage is outside of range. Wed 03 Jan 2024 17:43:07 CPU 1 MEM012 VPP PG voltage is outside of range.
and the server is no longer powering up.
I will put in an order for Dell to send us a mainboard.
Multi-bit memory errors detected on a memory device at location(s) DIMM_B1. Sun 31 Dec 2023 19:43:14 Multi-bit memory errors detected on a memory device at location(s) DIMM_B1. Sun 31 Dec 2023 19:43:14 CPU 1 machine check error detected. Sun 31 Dec 2023 19:43:14 CPU 1 machine check error detected.
@colewhite unfortunately this server has been out of warranty since 2023-11-18. You have 2 options:
1- See if we have some 1.92 TB SSDs from decom nodes that we can use
2- Purchase 1.92 TB SSDs
We know about this
Dec 20 2023
Dec 19 2023
This was a false alert; it is a new server that was only halfway installed. I just finished the install, so I am resolving this task for now.
Dec 13 2023
@Vgutierrez I had a meeting with the network and automation teams today. We discussed this issue and we seem not to know its real cause. We decided to let Traffic take this server back and put it in service; we can still track this issue at T350179.
@Vgutierrez please give me until the end of today. Thank you
Dec 12 2023
Dec 9 2023
@Jhancock.wm on 2002, try checking the network and possibly re-running the switch config cookbook.
Dec 8 2023
Servers were missing in site.pp and 2006 was missing in the preseed.yaml file. I sent a patch to fix this. You can try the re-image again.
https://gerrit.wikimedia.org/r/c/operations/puppet/+/981544
@Jhancock.wm I sent a patch to fix it. You can resume the install.
https://gerrit.wikimedia.org/r/c/operations/puppet/+/981413
Dec 7 2023
@Jhancock.wm did you read my comment on Wed, Dec 6, 2:53 PM?
@Volans did the test 4 times. The first 2 times the server did PXE boot, but the last 2 times it didn't.