
mr1-esams i2c syslog flood
Closed, ResolvedPublic

Description

From the logs it looks exactly like T238174.

Logs are flooded with:

ayounsi@mr1-esams> show log messages| last 
Jan  7 11:13:48  mr1-esams /kernel: SMB read failed addr 0x51, off 0x74, group 0x3, flags 0x1, len 0x2
Jan  7 11:13:48  mr1-esams /kernel: SMB read failed addr 0x51, off 0x74, group 0x3, flags 0x1, len 0x2
Jan  7 11:13:48  mr1-esams flowd_octeon_hm: flowd_srx_i2c_read: Reading i2c data, dev 0x51 group 0x3 (ret = -1) [No such process] retry attempt 2
Jan  7 11:13:48  mr1-esams /kernel: SMB read failed addr 0x51, off 0x74, group 0x3, flags 0x1, len 0x2
Jan  7 11:13:48  mr1-esams flowd_octeon_hm: flowd_srx_i2c_read: Reading i2c data, dev 0x51 group 0x3 (ret = -1) [No such process] retry attempt 3
Jan  7 11:13:48  mr1-esams /kernel: SMB read failed addr 0x51, off 0x74, group 0x3, flags 0x1, len 0x2
Jan  7 11:13:48  mr1-esams flowd_octeon_hm: flowd_srx_i2c_read: Reading i2c data, dev 0x51 group 0x3 (ret = -1) [No such process] retry attempt 4
Jan  7 11:13:48  mr1-esams /kernel: SMB read failed addr 0x51, off 0x74, group 0x3, flags 0x1, len 0x2
Jan  7 11:13:48  mr1-esams flowd_octeon_hm: flowd_srx_i2c_read: Reading i2c data, dev 0x51 group 0x3 (ret = -1) [No such process] retry attempt 5
Jan  7 11:13:48  mr1-esams flowd_octeon_hm: flowd_srx_i2c_read: Failed reading i2c data, dev 0x51 group 0x3 (ret = -1) [No such process] retry attempt 6
Jan  7 11:13:48  mr1-esams JBCM(0/0):jbcm_sfp_eeprom_read: read from i2c failed
Jan  7 11:13:48  mr1-esams last message repeated 2 times
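To gauge how heavy the flood is, the kernel lines above can be tallied per device. The following is a minimal sketch, not something that was run as part of this task: the function name `count_smb_failures` and the regular expression are illustrative assumptions based only on the log format quoted above.

```python
import re
from collections import Counter

# Assumed pattern, derived from the "SMB read failed" lines quoted in this task.
SMB_FAIL = re.compile(
    r"/kernel: SMB read failed addr (0x[0-9a-f]+), off (0x[0-9a-f]+), group (0x[0-9a-f]+)"
)


def count_smb_failures(lines):
    """Count kernel SMB read failures per (addr, offset, group) tuple."""
    counts = Counter()
    for line in lines:
        match = SMB_FAIL.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Three lines copied from the log excerpt; only the first is a kernel SMB failure.
sample = [
    "Jan  7 11:13:48  mr1-esams /kernel: SMB read failed addr 0x51, off 0x74, group 0x3, flags 0x1, len 0x2",
    "Jan  7 11:13:48  mr1-esams flowd_octeon_hm: flowd_srx_i2c_read: Reading i2c data, dev 0x51 group 0x3 (ret = -1) [No such process] retry attempt 2",
    "Jan  7 11:13:48  mr1-esams JBCM(0/0):jbcm_sfp_eeprom_read: read from i2c failed",
]

for key, hits in count_smb_failures(sample).items():
    print(key, hits)
```

Running the same function against a saved copy of the messages log before and after a fix would show whether the count for a given address drops to zero.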

Opened JTAC task 2020-0107-0129.

Will move to procurement if a new RMA is needed.

Related Objects

Status: Resolved
Assigned: ayounsi

Event Timeline

ayounsi triaged this task as Medium priority.Jan 7 2020, 11:20 AM
ayounsi created this task.
ayounsi created this object in space Restricted Space.
Restricted Application added a subscriber: Aklapper.
ayounsi shifted this object from the Restricted Space space to the S1 Public space.
ayounsi removed a project: procurement.
ayounsi updated the task description.

JTAC recommends upgrading to the currently recommended Junos release, 18.2R3-S2.9.

I copied it over and validated it:

ayounsi@mr1-esams> request system software validate /var/tmp/junos-srxsme-18.2R3-S2.9.tgz    
Checking compatibility with configuration
Initializing...
Verified manifest signed by PackageProductionEc_2018 method ECDSA256+SHA256
Using /var/tmp/junos-srxsme-18.2R3-S2.9.tgz
Checking junos requirements on /
Available space: 5169908 require: 365914
Saving boot file package in /var/sw/pkg/junos-boot-srxsme-18.2R3-S2.9.tgz
veriexec: cannot update veriexec for /cf/var/validate/c/junos/var/jailetc/php_mod.ini: No such file or directory
veriexec: cannot update veriexec for /cf/var/validate/c/junos/var/jailetc/mime.types: No such file or directory
veriexec: cannot update veriexec for /cf/var/validate/c/junos/usr/lib/libpsu.so.3: Too many links
veriexec: cannot update veriexec for /cf/var/validate/c/junos/usr/lib/libyaml.so.3: Too many links
veriexec: cannot update veriexec for /cf/var/validate/c/junos/usr/lib/libext_db.so.3: Too many links
veriexec: cannot update veriexec for /cf/var/validate/c/junos/usr/telemetry/na-mqttd/na-mqtt.conf: No such file or directory
Verified manifest signed by PackageProductionEc_2019 method ECDSA256+SHA256
Hardware Database regeneration succeeded
Validating against /config/juniper.conf.gz
mgd: commit complete
Validation succeeded
Validating against /config/rescue.conf.gz
mgd: commit complete
Validation succeeded
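The validation output above can be checked mechanically instead of by eye. This is a rough sketch under stated assumptions: `summarize_validation` is a hypothetical helper, and it relies only on the line formats visible in the output quoted in this task ("Validating against ...", "Validation succeeded", "Available space: ... require: ...").

```python
import re


def summarize_validation(output):
    """Summarize a saved 'request system software validate' transcript."""
    # Every configuration file the package was validated against.
    targets = re.findall(r"^Validating against (\S+)$", output, flags=re.M)
    # One "Validation succeeded" line is expected per validated config.
    successes = len(re.findall(r"^Validation succeeded$", output, flags=re.M))
    space = re.search(r"Available space: (\d+) require: (\d+)", output)
    return {
        "configs": targets,
        "all_succeeded": bool(targets) and successes == len(targets),
        "enough_space": bool(space) and int(space.group(1)) >= int(space.group(2)),
    }


# Excerpt copied from the transcript in this task.
sample = """Available space: 5169908 require: 365914
Validating against /config/juniper.conf.gz
mgd: commit complete
Validation succeeded
Validating against /config/rescue.conf.gz
mgd: commit complete
Validation succeeded
"""

print(summarize_validation(sample))
```

The `veriexec: cannot update` lines are ignored here on purpose, since the validation in this task still reported success for both configurations despite them.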

Next step is to do the actual upgrade.

faidon mentioned this in Unknown Object (Task).Jan 17 2020, 12:12 PM

Mentioned in SAL (#wikimedia-operations) [2020-01-21T19:39:48Z] <XioNoX> mr1-esams> request system software add /var/tmp/junos-srxsme-18.2R3-S2... - T242097

Mentioned in SAL (#wikimedia-operations) [2020-01-21T19:55:50Z] <XioNoX> restart mr1-esams for software upgrade - T242097

The errors are still there after the upgrade.

I asked JTAC for the next recommended steps.
I expect it will be either another RMA or trying a new optic: https://www.fs.com/products/13272.html

Edit, reply from JTAC:

I see you are using the same SFP you were using on the previous box reporting similar logs.
Is there a chance to test another SFP to rule out a possible SFP issue?

RobH mentioned this in Unknown Object (Task).Jan 21 2020, 8:37 PM
RobH added a subtask: Unknown Object (Task).

Next steps:

  • @RobH and @wiki_willy to order another optic (or two) via T243335.
  • @RobH to file a remote hands request with Iron Mountain, for when the optics arrive:
    • remote hands to swap the existing optic in mr1-esams with one of the two replacements
    • remote hands to put the spare optic in another switch port (defined by @ayounsi) for storage.
  • @ayounsi to review this task after the optic swap and check whether the errors persist.
ayounsi renamed this task from mr1-esams RMA (2020 edition) to mr1-esams i2c syslog flood.Jan 21 2020, 8:47 PM

Screenshot_2020-03-17 Network overview - Kibana.png (299×488 px, 18 KB)

Problem solved after remote hands replaced the optic.

RobH closed subtask Unknown Object (Task) as Resolved.Mar 30 2020, 10:28 PM