ripe-atlas-codfw is down
Open, Lowest · Public

Description

Hi everybody,

the ripe-atlas-codfw anchor has been down since 2020-11-10 at around 21:00 UTC:

https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-target_site=All&var-ip_version=All&var-country_code=All&var-asn=All&from=1605026694423&to=1605056765545

I can see the following on asw-a-codfw:

Nov 10 20:54:44  asw-a-codfw fpc1 [EX-BCM PIC] ex_bcm_linkscan_handler: Link 18 DOWN
Nov 10 20:54:44  asw-a-codfw rpd[1947]: EVENT <UpDown> ge-1/0/4.0 index 587 <Broadcast Multicast> address #0 dc.38.e1.d4.1b.7
Nov 10 20:54:44  asw-a-codfw rpd[1947]: EVENT <UpDown> ge-1/0/4 index 1125 <Broadcast Multicast> address #0 dc.38.e1.d4.1b.7
Nov 10 20:54:44  asw-a-codfw rpd[1947]: STP handler: IFD =NULL, op=change, state=Discarding, Topo change generation=0
Nov 10 20:54:44  asw-a-codfw rpd[1947]: *STP Change*, notify to other modules
Nov 10 20:54:44  asw-a-codfw fpc1 [EX-BCM PIC] ex_bcm_pic_ifd_config: ge-1/0/4, enable - 1
Nov 10 20:54:44  asw-a-codfw mib2d[15883]: SNMP_TRAP_LINK_DOWN: ifIndex 757, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-1/0/4
Nov 10 20:54:44  asw-a-codfw rpd[1947]: STP handler: IFD =NULL, op=change, state=Discarding, Topo change generation=0
Nov 10 20:54:44  asw-a-codfw rpd[1947]: *STP Change*, notify to other modules
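For anyone triaging a similar excerpt, the link-down trap can be pulled out of the log mechanically. This is only a minimal sketch: the regular expression is derived from the shape of the lines pasted above and is not a general Junos syslog grammar, and the names `LINK_DOWN` and `link_down_events` are made up for illustration.

```python
import re

# Two lines copied from the log excerpt above.
LOG = """\
Nov 10 20:54:44  asw-a-codfw fpc1 [EX-BCM PIC] ex_bcm_linkscan_handler: Link 18 DOWN
Nov 10 20:54:44  asw-a-codfw mib2d[15883]: SNMP_TRAP_LINK_DOWN: ifIndex 757, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-1/0/4
"""

# Assumption: pattern matches only the mib2d line shape seen in this excerpt.
LINK_DOWN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+mib2d\[\d+\]:\s+"
    r"SNMP_TRAP_LINK_DOWN:.*ifAdminStatus\s+(?P<admin>\w+)\(\d+\),\s+"
    r"ifOperStatus\s+(?P<oper>\w+)\(\d+\),\s+ifName\s+(?P<ifname>\S+)"
)


def link_down_events(text):
    """Return one dict per SNMP_TRAP_LINK_DOWN line found in text."""
    events = []
    for line in text.splitlines():
        match = LINK_DOWN.match(line)
        if match:
            events.append(match.groupdict())
    return events


events = link_down_events(LOG)
for event in events:
    print(event["host"], event["ifname"], "admin", event["admin"], "oper", event["oper"])
```

An interface that is administratively up but operationally down, as here, means the switch port is still enabled and the link was lost on the far side or in between, which is consistent with asking for an onsite check of the device and cabling.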

As far as I can see from other tasks, this will probably require @Papaul to check onsite (powercycle, cables, etc.).

Event Timeline

Restricted Application added a subscriber: Aklapper.
elukey triaged this task as High priority. Nov 11 2020, 7:40 AM

Power-cycled the device, checked the cable, and swapped the cable; the device is still showing as down.

ayounsi renamed this task from ripe-atlast-codfw is down to ripe-atlas-codfw is down. Nov 16 2020, 4:12 PM
ayounsi assigned this task to faidon.
ayounsi added a subscriber: ayounsi.

I think Faidon is the person who knows the most about the Atlas :)
Feel free to re-assign as needed.

CDanis added subscribers: faidon, CDanis.

Papaul, could you please attach atlas-codfw to one of the SCS servers so we can take a look via serial console? Thanks!

Connected the device to scs-a1 on port 47; still no connection over serial.

Thanks! Could you try powercycling it while one of us (either you or myself, at your preference) is watching the serial console?

Today we tried powercycling the anchor while I was watching the serial console. It didn't output a thing. As far as I can tell, we need replacement hardware.

Thanks - can you file a procurement request to that effect (& then resolve this task)?

CDanis mentioned this in Unknown Object (Task).

Filed T269046

Can we have a decom task for the faulty device? (switch port is still alerting as being down)

@faidon do we have some documentation on the console configuration for the RIPE?

  • console baud rate
  • Type of cable to use to connect to the console

I tried to use a Cisco console cable and a DB9 to RJ45 adapter; no luck on the new RIPE.

I'm pretty sure the baud rate is 19200

Not sure about the cable type

I tried both 9600 and 19200 on both cables; it didn't work.

I believe the Atlas is a PCEngines APU, so you'll need a null modem cable or adapter (RXD->TXD, TXD->RXD, etc.). If this is a Cisco rollover cable, it would do the trick, but your DB9<->RJ45 adapter should not be a crossover adapter, as that would cross the signals twice end-to-end and the two crossovers would cancel each other out :)
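The "two crossovers cancel out" point can be checked with a tiny model. This is a sketch under a simplifying assumption: only TXD, RXD and GND are modelled (handshake lines are ignored), and the names `STRAIGHT`, `CROSSED`, `chain` and `is_null_modem` are hypothetical, not taken from any tool.

```python
# Each segment maps a signal entering one end to the pin it leaves on at the other end.
STRAIGHT = {"TXD": "TXD", "RXD": "RXD", "GND": "GND"}
CROSSED = {"TXD": "RXD", "RXD": "TXD", "GND": "GND"}


def chain(*segments):
    """Compose cable/adapter segments end to end into a single mapping."""
    result = {signal: signal for signal in STRAIGHT}
    for segment in segments:
        result = {signal: segment[pin] for signal, pin in result.items()}
    return result


def is_null_modem(mapping):
    """True when each side's transmit line lands on the other side's receive line."""
    return mapping["TXD"] == "RXD" and mapping["RXD"] == "TXD"


# One crossover (e.g. a rollover cable) plus a straight-through adapter: works.
print(is_null_modem(chain(CROSSED, STRAIGHT)))
# A crossover cable plus a crossover adapter: the swaps undo each other.
print(is_null_modem(chain(CROSSED, CROSSED)))
```

An odd number of crossovers in the path gives a null modem; an even number gives a straight-through connection.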

Baud rate for the BIOS as the system boots is 115200 8n1. Note that unlike our Dells, its BIOS takes all of 2 seconds to boot or something.
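As a quick illustration of what "115200 8n1" means on the wire, the sketch below works out the framing arithmetic: 8 data bits, no parity and 1 stop bit, plus the start bit that every asynchronous serial frame carries. The helper names `bits_per_frame` and `chars_per_second` are made up for this example.

```python
def bits_per_frame(mode):
    """Bits on the wire per character for a mode string such as '8n1'."""
    data_bits = int(mode[0])
    parity = mode[1].lower()
    stop_bits = int(mode[2])
    parity_bits = 0 if parity == "n" else 1
    start_bits = 1  # always one start bit in asynchronous serial framing
    return start_bits + data_bits + parity_bits + stop_bits


def chars_per_second(baud, mode="8n1"):
    """Maximum character rate at the given baud rate and framing."""
    return baud / bits_per_frame(mode)


# The two rates tried earlier in this task, and the BIOS rate quoted above.
for baud in (9600, 19200, 115200):
    print(baud, chars_per_second(baud))
```

Both ends have to agree on the rate and the framing; a terminal set to 9600 or 19200 will not decode output sent at 115200.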

I don't know the specifics of the Atlas - did it come with software preinstalled? I'd guess not, and that we'll need to flash it, right? In that case nothing would show up on the console past boot/BIOS.

Thank you for the information. I will try to work on it again when I am back on site tomorrow.

@CDanis
The old device is already set to decom in netbox. Let me know when the new device is online so I can offline this device.

Papaul lowered the priority of this task from High to Lowest. Mar 4 2021, 4:18 PM