
Investigate Junos vmhost snapshot
Closed, Resolved, Public

Description

The new Junos architecture (vmhost) has a feature to copy the running partition/disk to the backup one (when available).

See https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/request-vmhost-snapshot.html

We need to investigate exactly how it works and where we can use it, then start using it.
It would not have prevented cr3-eqsin from rebooting, but the router might have rebooted into a working state.

Event Timeline

ayounsi triaged this task as High priority. Jul 5 2020, 7:10 PM
ayounsi created this task.
Restricted Application added a project: Operations. Jul 5 2020, 7:10 PM
Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2020-07-06T08:44:38Z] <XioNoX> cr3-ulsfo> request vmhost snapshot - T257153

Mentioned in SAL (#wikimedia-operations) [2020-07-06T08:51:45Z] <XioNoX> cr1-codfw> request vmhost snapshot routing-engine both - T257153

Works as expected, at least on single-RE devices:

cr3-ulsfo> show vmhost snapshot 
UEFI 	Version: CBEP_P_SUM1_00.13.01

Secondary Disk, Snapshot Time: Wed Sep 19 23:22:47 UTC 2018

Version: set p
VMHost Version: 3.1485
VMHost Root: vmhost-x86_64-17.4R1-20171206_1335_builder
VMHost Core: vmhost-core-x86_64-17.4R1-20171213_1840_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.4R1.16

Version: set b
VMHost Version: 3.1539
VMHost Root: vmhost-x86_64-17.4R2-20180803_0115_builder
VMHost Core: vmhost-core-x86_64-17.4R2-20180816_1416_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.4R2.4
cr3-ulsfo> request vmhost snapshot    
warning: Existing data on the target may be lost
Proceed ? [yes,no] (no) yes 

warning: Proceeding with vmhost snapshot
Current root details, 		Device sda, Label: jrootp_P, Partition: sda3
Snapshot admin context from current boot disk to target disk ...
Proceeding with snapshot on secondary disk
Mounting device in preparation for snapshot...
Cleaning up target disk for snapshot ...
Creating snapshot on target disk from current boot disk ...
Snapshot created on secondary disk.
Software snapshot done
cr3-ulsfo> show vmhost snapshot 
UEFI 	Version: CBEP_P_SUM1_00.13.01

Secondary Disk, Snapshot Time: Mon Jul  6 08:46:46 UTC 2020

Version: set p
VMHost Version: 3.1836
VMHost Root: vmhost-x86_64-18.2R3-S3-20200312_0539_builder
VMHost Core: vmhost-core-x86-64-18.2R3-S3.11
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-18.2R3-S3.11

Version: set b
VMHost Version: 3.1834
VMHost Root: vmhost-x86_64-18.2R3-20191107_2101_builder
VMHost Core: vmhost-core-x86-64-18.2R3-S2.9
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-18.2R3-S2.9

ayounsi added a subscriber: akosiaris. Jul 6 2020, 9:06 AM

And on dual-RE devices:

cr1-codfw> show vmhost snapshot invoke-on all-routing-engines 
re0:
--------------------------------------------------------------------------
UEFI 	Version: NGRE_v00.53.00.01

Secondary Disk, Snapshot Time: <fresh install>

Version: set p
VMHost Version: 3.1455
VMHost Root: vmhost-x86_64-17.3R3-S2-20180920_1037_builder
VMHost Core: vmhost-core-x86_64-17.3R3-S2-20181106_1344_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.3R3-S2.2

Version: set b
VMHost Version: 3.1455
VMHost Root: vmhost-x86_64-17.3R3-S2-20180920_1037_builder
VMHost Core: vmhost-core-x86_64-17.3R3-S2-20181106_1344_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.3R3-S2.2

re1:
--------------------------------------------------------------------------
UEFI 	Version: NGRE_v00.53.00.01
                                        
Secondary Disk, Snapshot Time: <fresh install>

Version: set p
VMHost Version: 3.1455
VMHost Root: vmhost-x86_64-17.3R3-S2-20180920_1037_builder
VMHost Core: vmhost-core-x86_64-17.3R3-S2-20181106_1344_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.3R3-S2.2

Version: set b
VMHost Version: 3.1455
VMHost Root: vmhost-x86_64-17.3R3-S2-20180920_1037_builder
VMHost Core: vmhost-core-x86_64-17.3R3-S2-20181106_1344_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.3R3-S2.2
{master}
re0.cr1-codfw> request vmhost snapshot routing-engine both 
re0:
--------------------------------------------------------------------------
warning: Existing data on the target may be lost
warning: Proceeding with vmhost snapshot
Current root details, 		Device sda, Label: jrootb_P, Partition: sda4
Snapshot admin context from current boot disk to target disk ...
Proceeding with snapshot on secondary disk
Mounting device in preparation for snapshot...
Cleaning up target disk for snapshot ...
Creating snapshot on target disk from current boot disk ...
Snapshot created on secondary disk.
Software snapshot done

re1:
--------------------------------------------------------------------------
warning: Existing data on the target may be lost
warning: Proceeding with vmhost snapshot
Current root details, 		Device sda, Label: jrootb_P, Partition: sda4
Snapshot admin context from current boot disk to target disk ...
Proceeding with snapshot on secondary disk
Mounting device in preparation for snapshot...
Cleaning up target disk for snapshot ...
Creating snapshot on target disk from current boot disk ...
Snapshot created on secondary disk.
Software snapshot done
{master}
re0.cr1-codfw> show vmhost snapshot invoke-on all-routing-engines    
re0:
--------------------------------------------------------------------------
UEFI 	Version: NGRE_v00.53.00.01

Secondary Disk, Snapshot Time: Mon Jul  6 08:53:48 UTC 2020

Version: set p
VMHost Version: 3.1455
VMHost Root: vmhost-x86_64-17.3R3-S2-20180920_1037_builder
VMHost Core: vmhost-core-x86_64-17.3R3-S2-20181106_1344_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.3R3-S2.2

Version: set b
VMHost Version: 3.1463
VMHost Root: vmhost-x86_64-17.3R3-S3-20190110_0023_builder
VMHost Core: vmhost-core-x86_64-17.3R3-S7-20191225_0240_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.3R3-S7.2

re1:
--------------------------------------------------------------------------
UEFI 	Version: NGRE_v00.53.00.01

Secondary Disk, Snapshot Time: Mon Jul  6 08:55:58 UTC 2020

Version: set p
VMHost Version: 3.1455
VMHost Root: vmhost-x86_64-17.3R3-S2-20180920_1037_builder
VMHost Core: vmhost-core-x86_64-17.3R3-S2-20181106_1344_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.3R3-S2.2

Version: set b
VMHost Version: 3.1463
VMHost Root: vmhost-x86_64-17.3R3-S3-20190110_0023_builder
VMHost Core: vmhost-core-x86_64-17.3R3-S7-20191225_0240_builder
kernel: 3.10.100-ovp-rt110-WR6.0.0.31_preempt-rt
Junos Disk: junos-install-mx-x86-64-17.3R3-S7.2

Here set b has the same version as the currently running Junos, but set p does not.

The system boots from the current set, while the alternate set contains the previous version of the software boot image.
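
To check which release each set holds without reading the whole output, here is a minimal Python sketch that parses the plain-text output pasted above. It assumes the layout seen in this task (a `Version: set p` / `Version: set b` header followed by a `Junos Disk:` line), and it assumes the `Label: jrootp_*` / `jrootb_*` value printed by `request vmhost snapshot` names the booted set, which matches both routers here (cr3-ulsfo booted from set p, cr1-codfw from set b). `snapshot_sets` and `running_set` are hypothetical helper names, not Juniper tooling.

```python
import re
from typing import Dict, Optional


def snapshot_sets(output: str) -> Dict[str, str]:
    """Map each set label ("p" or "b") to the Junos Disk image listed under it.

    Assumes the single-RE layout of `show vmhost snapshot` seen in this task.
    """
    sets: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.match(r"Version: set (\w+)$", line)
        if header:
            current = header.group(1)
        elif line.startswith("Junos Disk:") and current is not None:
            sets[current] = line.split(":", 1)[1].strip()
    return sets


def running_set(snapshot_log: str) -> Optional[str]:
    """Guess the booted set from the "Current root details" line.

    Assumption: label jrootp_* means set p is booted, jrootb_* means set b.
    """
    match = re.search(r"Label: jroot([pb])_", snapshot_log)
    return match.group(1) if match else None


# cr3-ulsfo output from this task, trimmed to the relevant lines.
SNAPSHOT_LOG = "Current root details, \t\tDevice sda, Label: jrootp_P, Partition: sda3"
SHOW_OUTPUT = """\
Secondary Disk, Snapshot Time: Mon Jul  6 08:46:46 UTC 2020

Version: set p
VMHost Version: 3.1836
Junos Disk: junos-install-mx-x86-64-18.2R3-S3.11

Version: set b
VMHost Version: 3.1834
Junos Disk: junos-install-mx-x86-64-18.2R3-S2.9
"""

if __name__ == "__main__":
    booted = running_set(SNAPSHOT_LOG)
    for label, image in snapshot_sets(SHOW_OUTPUT).items():
        marker = " (booted)" if label == booted else ""
        print(f"set {label}: {image}{marker}")
```

Run against the cr3-ulsfo sample, it marks set p (18.2R3-S3.11) as booted and shows set b holding the previous release (18.2R3-S2.9). For `invoke-on all-routing-engines` output, split on the `re0:` / `re1:` headers first and parse each section separately.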

@akosiaris found https://www.juniper.net/documentation/en_US/junos/topics/topic-map/vm-host-overview.html which explains things in detail.

I also added those new steps to the Junos upgrade doc: https://wikitech.wikimedia.org/wiki/Juniper_router_upgrade#Cleanup

The next step is to run the snapshot command on all devices that use vmhost.
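
As a sketch of that rollout, the hypothetical helpers below pick the command per router from its number of routing engines, using only the commands already run in this task. The inventory is an assumption inferred from the SAL entries (routers snapshotted with `routing-engine both` are treated as dual-RE); it is not read from any real inventory source.

```python
from typing import Dict

# Hypothetical inventory: RE count per router, inferred from the commands
# logged in this task (dual-RE routers were run with "routing-engine both").
ROUTING_ENGINES: Dict[str, int] = {
    "cr3-ulsfo": 1,
    "cr4-ulsfo": 1,
    "cr3-knams": 1,
    "cr2-eqdfw": 1,
    "cr2-eqord": 1,
    "cr2-eqsin": 1,
    "cr1-codfw": 2,
    "cr2-codfw": 2,
    "cr1-eqiad": 2,
    "cr2-eqiad": 2,
}


def snapshot_command(re_count: int) -> str:
    """Return the snapshot command used in this task for a router with re_count REs."""
    if re_count == 1:
        return "request vmhost snapshot"
    if re_count == 2:
        # Snapshots re0 and re1 in one go, as done on cr1-codfw above.
        return "request vmhost snapshot routing-engine both"
    raise ValueError(f"unexpected number of routing engines: {re_count}")


def verify_command(re_count: int) -> str:
    """Return the command used in this task to check the new Snapshot Time."""
    if re_count == 2:
        return "show vmhost snapshot invoke-on all-routing-engines"
    return "show vmhost snapshot"


if __name__ == "__main__":
    for router, count in ROUTING_ENGINES.items():
        print(f"{router}> {snapshot_command(count)}")
        print(f"{router}> {verify_command(count)}")
```

Keeping the RE count explicit avoids snapshotting only one RE on a dual-RE chassis by running the single-RE form of the command.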

Mentioned in SAL (#wikimedia-operations) [2020-07-07T08:12:17Z] <XioNoX> cr4-ulsfo> request vmhost snapshot - T257153

Mentioned in SAL (#wikimedia-operations) [2020-07-07T08:15:22Z] <XioNoX> cr3-knams> request vmhost snapshot - T257153

Mentioned in SAL (#wikimedia-operations) [2020-07-07T08:17:26Z] <XioNoX> cr2-eqdfw> request vmhost snapshot - T257153

Mentioned in SAL (#wikimedia-operations) [2020-07-07T08:19:52Z] <XioNoX> cr2-eqord> request vmhost snapshot - T257153

Mentioned in SAL (#wikimedia-operations) [2020-07-07T08:22:46Z] <XioNoX> cr2-eqsin> request vmhost snapshot - T257153

Mentioned in SAL (#wikimedia-operations) [2020-07-07T08:26:13Z] <XioNoX> cr2-codfw> request vmhost snapshot routing-engine both - T257153

Left to do: cr1/2-eqiad.

Mentioned in SAL (#wikimedia-operations) [2020-07-07T13:24:09Z] <XioNoX> cr1-eqiad> request vmhost snapshot routing-engine both - T257153

Mentioned in SAL (#wikimedia-operations) [2020-07-07T13:29:07Z] <XioNoX> cr2-eqiad> request vmhost snapshot routing-engine both - T257153

ayounsi closed this task as Resolved. Jul 7 2020, 1:35 PM

All done!