
codfw pfw* serial connections problem
Closed, Resolved (Public)

Description

Author: ptshibamba

Description:

pfw1 is connected to scs-c1 port 33 and pfw2 is connected to scs-c1 port 34.
In a pmshell session on scs-c1, after selecting port 33 or 34 and power-cycling
the pfw*, you can see it rebooting, but once it reaches the login prompt you
are not able to type anything.
First step:
Used another cable to connect pfw1: same problem.
Second step:
Unplugged the cable from pfw1 and connected it to pdu1-c8: it works, so the
problem was not the cable.
Third step:
Unplugged the cable from asw-c8 and plugged it into pfw1: did not work.
Fourth step:
Reset one of the opengears that came from Tampa and made a 3 ft cable to connect
pfw1 to the opengear within the same rack: it works.
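
The swap tests above can be tabulated to see which component, if any, fails in every combination. The following is only a sketch: the cable labels are informal names that do not appear in the task, and it assumes every test except the last one went through scs-c1.

```python
from collections import defaultdict

# (console server, cable, device, could we type at the login prompt?)
# Cable labels are informal and hypothetical; the task does not name them.
# Assumption: every test except the last one went through scs-c1.
TESTS = [
    ("scs-c1", "original", "pfw1", False),
    ("scs-c1", "spare", "pfw1", False),
    ("scs-c1", "spare", "pdu1-c8", True),
    ("scs-c1", "from asw-c8", "pfw1", False),
    ("tampa-opengear", "3 ft", "pfw1", True),
]


def outcomes_by(tests, key):
    """Map each value of key(test) to the set of outcomes seen for it."""
    seen = defaultdict(set)
    for test in tests:
        seen[key(test)].add(test[3])
    return dict(seen)


def never_worked(outcomes):
    """Return the keys that only ever failed."""
    return sorted(k for k, results in outcomes.items() if results == {False})


servers = outcomes_by(TESTS, lambda t: t[0])
devices = outcomes_by(TESTS, lambda t: t[2])
pairs = outcomes_by(TESTS, lambda t: (t[0], t[2]))

print("servers that never worked:", never_worked(servers))
print("devices that never worked:", never_worked(devices))
print("server/device pairs that never worked:", never_worked(pairs))
```

Under these assumptions neither scs-c1 nor pfw1 fails on its own; only the scs-c1/pfw1 pairing never works, which points at a marginal connection rather than a dead component.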

Event Timeline

rtimport raised the priority of this task to Medium. Dec 18 2014, 2:19 AM
rtimport added a project: ops-codfw.
rtimport set Reference to rt8929.

ptshibamba wrote:

While trying to connect to the pfws yesterday, we were able to get RX but not
TX. After a couple of hours of troubleshooting, I came up with a
temporary solution: put one of the opengears from Tampa in
rack 8 and use a short cable to connect it to the pfws. This was
tested on pfw1 and it seemed to work with no problem. This was just a temporary
solution so that Faidon could get some work done.
After I left the site yesterday, this issue stayed on my mind because it did not
make sense.
After I configured the opengear this morning (scs-c8-codfw 10.193.0.20) with the
management username and password, Faidon was able to access pfw1. While I
was running some tests on pfw2, however, pfw2 did not work with the solution that
works for pfw1. So my first approach, as any technician would do, was to verify
the physical layer. The cable was already verified, but what we had not verified
was the physical console port. So I took the card out and checked the console
port on the card: it seems that the last three pins on the console port are loose.
I will also have to check pfw1 to see if it has the same problem.
I have attached a picture of the card from pfw2.


20141125_105359.jpg (2×4 px, 2 MB)

20141125_105149.jpg (2×4 px, 2 MB)
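
A rough way to check whether loose pins would explain "RX but not TX" is to map the pins to their signals. This sketch assumes the common Cisco-style RJ-45 console pinout seen from the device side, and assumes "the last three pins" means pins 6 to 8; both should be confirmed against the hardware guide for the actual card.

```python
# Assumed RJ-45 console pinout (Cisco-style, device side). This is an
# assumption, not taken from the task; verify it for the actual card.
CONSOLE_PINOUT = {
    1: "RTS",
    2: "DTR",
    3: "TxD",
    4: "GND",
    5: "GND",
    6: "RxD",
    7: "DSR",
    8: "CTS",
}


def affected_signals(loose_pins):
    """List the signals carried by the given pins, in pin order."""
    return [CONSOLE_PINOUT[pin] for pin in sorted(loose_pins)]


def symptom(loose_pins):
    """Predict what an operator on the console server would observe."""
    signals = set(affected_signals(loose_pins))
    return {
        # Device TxD carries boot messages and the login prompt to us.
        "output_visible": "TxD" not in signals,
        # Device RxD carries our keystrokes to the device.
        "input_accepted": "RxD" not in signals,
    }


print(affected_signals({6, 7, 8}))
print(symptom({6, 7, 8}))
```

If the assumed pinout holds, pins 6 to 8 include the device's RxD line, which is consistent with seeing the reboot output while being unable to type at the login prompt.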

I'm really not certain what could have caused that kind of behavior (the port works on other systems but not on Juniper gear, and the Juniper gear works on other opengears but not this one).

Papaul: Can you do a basic check of the firmware on the opengear and compare it to what is available for download? If they are not the same, update/flash the firmware on the opengear consoles that are in use in codfw (not the temp Tampa one) and see if that fixes it.

RobH changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)".
RobH set Security to None.

The firmware version on scs-c1-codfw is 3.9.1.
The current version on the website is 3.12.3.
I updated the firmware to the latest version and tested the connection from scs-c1-codfw to pfw1 and pfw2: same problem, cannot connect.
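
When this kind of firmware check is scripted, the version strings need to be compared numerically, since a plain string comparison would rank 3.9.1 above 3.12.3. A minimal sketch, where `parse_version` and `needs_update` are hypothetical helper names and not part of any Opengear tooling:

```python
def parse_version(text):
    """Turn a dotted version such as '3.12.3' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def needs_update(installed, available):
    """True when the available version is newer than the installed one."""
    return parse_version(installed) < parse_version(available)


installed = "3.9.1"    # reported on scs-c1-codfw
available = "3.12.3"   # reported as current on the website

print(needs_update(installed, available))  # True
print(installed < available)               # False: text comparison misleads
```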

As far as I remember from back then, this was debugged in the end as faulty pins on both of the serial ports of the SRXes. Multiple reports on the web confirmed that this was a hardware design fault affecting a lot of customers.

@Papaul, we should open a case with Juniper and ask for a replacement of those cards. I'm not sure if you know how to do this yet, so please coordinate with @Cmjohnson or @RobH.

OK, I will coordinate with Rob or Chris to do that.

Serials for ease of reference:

pfw2001 = AJ5112AA0049

pfw2002 = AJ5112AA0042

I contacted Juniper support. I will be receiving an email from them with the case number and what needs to be done from there.

@mark, @ Faidon, @ Jeff Green
I have received the two console cards for the SRX600. I need to coordinate with one of you to replace the existing cards with the new ones sometime this week.

Thanks.

Anytime is fine, let me know if there are any problems.

Replacing the serial console card caused the whole system to lose its initial setup, because the routing engine is part of the card. As discussed on IRC with Faidon, we need to plan better how we are going to replace those cards. The old cards are back in place for now.

I am boxing up the two old routing engines to send back to Juniper. I have confirmation on IRC from Faidon that everything is working. I also sent an email to the Juniper tech working with me on this case asking him to close the case.

Change 206157 had a related patch set uploaded (by Dzahn):
Removed scs-c8-codfw from DNS mgmt files

https://gerrit.wikimedia.org/r/206157

Change 206157 merged by Dzahn:
Removed scs-c8-codfw from DNS mgmt files

https://gerrit.wikimedia.org/r/206157