db1062 (s7 master eqiad) in a reboot cycle
Closed, Resolved, Public

Description

2017-04-28T15:32:32-0500  NIC101   The NIC Integrated 1 Port 1 network link is up.
2017-04-28T15:32:32-0500  LOG007   The previous log entry was repeated 1 times.
2017-04-28T15:22:46-0500  SYS1003  System CPU Resetting.
2017-04-28T15:22:17-0500  CPU9000  An OEM diagnostic event occurred.
2017-04-28T15:22:17-0500  PCI1320  A bus fatal error was detected on a component at bus 0 device 2 function 2.
2017-04-28T15:22:16-0500  PCI1320  A bus fatal error was detected on a component at bus 3 device 0 function 0.
2017-04-28T15:17:44-0500  NIC101   The NIC Integrated 1 Port 1 network link is up.
2017-04-28T15:16:13-0500  SYS1003  System CPU Resetting.
2017-04-28T15:15:44-0500  CPU9000  An OEM diagnostic event occurred.
2017-04-28T15:15:44-0500  PCI1320  A bus fatal error was detected on a component at bus 0 device 2 function 2.
2017-04-28T15:15:43-0500  PCI1320  A bus fatal error was detected on a component at bus 3 device 0 function 0.
2017-04-28T15:11:27-0500  NIC101   The NIC Integrated 1 Port 1 network link is up.
2017-04-28T15:09:53-0500  SYS1003  System CPU Resetting.
2017-04-28T15:09:24-0500  CPU9000  An OEM diagnostic event occurred.
2017-04-28T15:09:24-0500  PCI1320  A bus fatal error was detected on a component at bus 0 device 2 function 2.
2017-04-28T15:09:23-0500  PCI1320  A bus fatal error was detected on a component at bus 3 device 0 function 0.
2017-04-28T15:08:06-0500  NIC101   The NIC Integrated 1 Port 1 network link is up.
2017-04-28T15:06:36-0500  SYS1003  System CPU Resetting.
2017-04-28T15:06:07-0500  CPU9000  An OEM diagnostic event occurred.
2017-04-28T15:06:06-0500  PCI1320  A bus fatal error was detected on a component at bus 0 device 2 function 2.
2017-04-28T15:06:05-0500  PCI1320  A bus fatal error was detected on a component at bus 3 device 0 function 0.
2017-04-28T15:04:06-0500  NIC101   The NIC Integrated 1 Port 1 network link is up.
2017-04-28T15:02:34-0500  SYS1003  System CPU Resetting.
2017-04-28T15:02:05-0500  CPU9000  An OEM diagnostic event occurred.
2017-04-28T15:02:05-0500  PCI1320  A bus fatal error was detected on a component at bus 0 device 2 function 2.
2017-04-28T15:02:03-0500  PCI1320  A bus fatal error was detected on a component at bus 3 device 0 function 0.
2017-04-28T14:56:29-0500  NIC101   The NIC Integrated 1 Port 1 network link is up.
2017-04-28T14:54:55-0500  SYS1003  System CPU Resetting.
2017-04-28T14:54:26-0500  CPU9000  An OEM diagnostic event occurred.
2017-04-28T14:54:25-0500  PCI1320  A bus fatal error was detected on a component at bus 0 device 2 function 2.
2017-04-28T14:54:24-0500  PCI1320  A bus fatal error was detected on a component at bus 3 device 0 function 0.
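
For anyone triaging a similar dump, here is a minimal sketch of how the reboot cycle could be quantified from these entries. It assumes the log has been normalized to one "timestamp code message" entry per line; the helper names `parse` and `reset_gaps` are hypothetical and not part of any vendor tooling. Only the SYS1003 entries and one PCI1320 entry from the log above are included as sample data.

```python
from datetime import datetime

# Sample data copied from the log above, one "timestamp code message" per line.
LOG = """\
2017-04-28T15:22:46-0500 SYS1003 System CPU Resetting.
2017-04-28T15:22:17-0500 PCI1320 A bus fatal error was detected on a component at bus 0 device 2 function 2.
2017-04-28T15:16:13-0500 SYS1003 System CPU Resetting.
2017-04-28T15:09:53-0500 SYS1003 System CPU Resetting.
2017-04-28T15:06:36-0500 SYS1003 System CPU Resetting.
2017-04-28T15:02:34-0500 SYS1003 System CPU Resetting.
2017-04-28T14:54:55-0500 SYS1003 System CPU Resetting.
"""


def parse(log):
    """Return (timestamp, code, message) tuples sorted oldest first."""
    entries = []
    for line in log.splitlines():
        if not line.strip():
            continue
        stamp, code, message = line.split(None, 2)
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
        entries.append((when, code, message))
    return sorted(entries, key=lambda entry: entry[0])


def reset_gaps(entries, code="SYS1003"):
    """Return the seconds elapsed between consecutive events with the given code."""
    resets = [when for when, event_code, _ in entries if event_code == code]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(resets, resets[1:])]


entries = parse(LOG)
gaps = reset_gaps(entries)
print(f"{len(gaps) + 1} resets, gaps in seconds: {gaps}")
```

On the entries above this reports six resets within roughly 28 minutes, each preceded by the same pair of PCI1320 bus fatal errors.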

Event Timeline

jcrespo created this task. Apr 28 2017, 4:17 PM
Restricted Application added a project: Operations. Apr 28 2017, 4:17 PM
Restricted Application added a subscriber: Aklapper.

Moving eqiad master service back to db1041.

[18:18:16] <cmjohnson1> jynus: raid battery
[18:23:13] <cmjohnson1> jynus: i have a spare
[18:23:19] <cmjohnson1> swapping it now

Cmjohnson renamed this task from db2062 (s7 master eqiad) in a reboot cycle to db1062 (s7 master eqiad) in a reboot cycle. Apr 28 2017, 4:32 PM
Cmjohnson moved this task from Backlog to High Priority Task on the ops-eqiad board.

No errors on the last boot, but I would like to confirm by restarting it once more. I am doing that.

Mentioned in SAL (#wikimedia-operations) [2017-04-28T16:36:11Z] <jynus> restarting db1062 once more T164092

Mentioned in SAL (#wikimedia-operations) [2017-04-28T17:08:59Z] <jynus> restarting replication on all nodes on s7-eqiad T164092

Moving eqiad master service back to db1041.

This might be confusing. Should we specify that it was never done?

You already did. I was doing it when Chris asked me to wait on IRC.

jcrespo closed this task as Resolved. Apr 29 2017, 1:01 PM
jcrespo claimed this task.

No longer ongoing.