VMware Shutdown - vSAN Cluster

In the following examples, a VMware vSAN Cluster is protected by a Single or Advanced UPS Configuration. vCenter Server is running on a Virtual Machine.

Recommended Deployment

PowerChute can be located on a physical Windows machine outside the vSAN Cluster or deployed as a VM inside the vSAN Cluster. The vCenter Server account configured in PowerChute Network Shutdown must have Administrator permissions on vCenter Server and on each of the ESXi hosts being managed by PowerChute. This can be an Active Directory account or a local user account. For more information, see Active Directory VMware Configuration. SSH access must be enabled on the vSAN hosts for the vSAN cluster preparation script to run successfully.
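Because the cluster preparation script depends on SSH, it can be useful to confirm that each vSAN host accepts connections on the SSH port before relying on the shutdown sequence. The following is a minimal sketch, not part of PowerChute: the function names and host names are hypothetical, and port 22 is assumed to be the SSH port. It only tests that the TCP port is reachable; it does not verify credentials or that the script itself will run.

```python
import socket


def ssh_reachable(host, port=22, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable so the check can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Refused, timed out, or unresolvable: treat as not reachable.
        return False
    conn.close()
    return True


def hosts_missing_ssh(hosts, **kwargs):
    """Return the hosts (in input order) whose SSH port could not be reached."""
    return [h for h in hosts if not ssh_reachable(h, **kwargs)]
```

For example, `hosts_missing_ssh(["esxi-a.example", "esxi-b.example"])` would return the subset of those (hypothetical) hosts on which the SSH port did not answer.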

Example 1: Single UPS, 2-Node Stretch Cluster with Witness Appliance and Management Host

 

Setup

When a critical UPS event, such as UPS On Battery, occurs, the following sequence is triggered:

Shutdown Sequence

  1. PowerChute reports that the UPS is on battery.

  2. Shutdown delay for the On Battery event elapses. PowerChute sends a command to turn off the UPS or Outlet Group.

  3. PowerChute starts VM and vApp shutdown on each Host.

  4. PowerChute gracefully shuts down the vCenter Server VM in the Medium priority group followed by the Active Directory controller VM in the High priority group during VM Shutdown.

  5. VM/vApp Shutdown durations elapse.

  6. PowerChute starts executing the shutdown command file.

  7. Shutdown command file duration elapses. PowerChute starts a Maintenance Mode task on the first vSAN host and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  8. Once the Delay Host Maintenance Mode Duration has elapsed, PowerChute starts a Maintenance Mode task on the next vSAN host and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  9. PowerChute starts a Maintenance Mode task on the Witness Host, waits the Delay Host Maintenance Mode Duration and shuts down the host.

  10. PowerChute starts a Maintenance Mode task on the Management Host, waits the Delay Host Maintenance Mode Duration of X seconds and shuts down the host.

  11. OS shutdown sequence starts on the PowerChute physical machine. After a 70-second delay, the OS starts to shut down.

  12. The UPS waits for the greater of the Low Battery Duration/Maximum Required Delay (non-outlet-aware UPSs) or the Outlet Group Power Off Delay.

  13. After this delay, a further non-configurable two-minute delay is counted down.

  14. The UPS turns off after the user-configurable Shutdown Delay has elapsed, or the Outlet Group turns off after the Power Off Delay elapses.

NOTE: In a vSAN configuration, Witness and Management hosts are placed into Maintenance Mode and shut down after Cluster hosts if Delay Maintenance Mode is enabled. vSAN Hosts are placed into Maintenance Mode using “No data migration” for the vSAN data evacuation mode.
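The host ordering in the sequence above (vSAN hosts first, then the Witness Host, then the Management Host) can be sketched as a simple sort by role. This is an illustrative model only, not PowerChute's implementation: the `Host` type, the role labels, and the host names are hypothetical, and it assumes Delay Maintenance Mode is enabled as stated in the NOTE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    role: str  # hypothetical labels: "vsan", "witness", or "management"


# Lower number = handled earlier in the shutdown sequence.
ROLE_ORDER = {"vsan": 0, "witness": 1, "management": 2}


def shutdown_order(hosts):
    """Order hosts as vSAN hosts, then witness, then management.

    sorted() is stable, so hosts with the same role keep their input order.
    """
    return sorted(hosts, key=lambda h: ROLE_ORDER[h.role])


def shutdown_plan(hosts):
    """Return (host, action, evacuation_mode) steps, one host at a time."""
    steps = []
    for h in shutdown_order(hosts):
        # Only vSAN hosts have a vSAN data evacuation mode.
        mode = "No data migration" if h.role == "vsan" else None
        steps.append((h.name, "enter maintenance mode", mode))
        steps.append((h.name, "shut down", None))
    return steps
```

Each host is fully handled (Maintenance Mode, then shutdown) before the next one starts, which mirrors steps 7 to 10.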

Example 2: Single UPS, 2-Node Stretch Cluster with Witness Appliance and Management Host, vCLS VMs in cluster, Disable HA on Shutdown enabled

 

Setup

When a critical UPS event, such as UPS On Battery, occurs, the following sequence is triggered:

Shutdown Sequence

  1. PowerChute reports that the UPS is on battery.

  2. Shutdown delay for the On Battery event elapses. PowerChute sends a command to turn off the UPS or Outlet Group.

  3. PowerChute disables vSphere Cluster Services (vCLS) and High Availability (HA) on the cluster.

  4. After 3 minutes (Disable vSphere Cluster Services (vCLS) Duration = 180, Disable HA Duration = 20), PowerChute starts VM and vApp shutdown on each Host.

  5. PowerChute gracefully shuts down the vCenter Server VM in the Medium priority group followed by the Active Directory controller VM in the High priority group during VM Shutdown.

  6. PowerChute performs cluster stop operations:

    1. PowerChute disables cluster member updates by executing the esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates SSH command on all protected vSAN nodes.

    2. PowerChute prepares the vSAN cluster for shutdown by executing the python /usr/lib/vmware/vsan/bin/reboot_helper.py prepare command on one of the vSAN nodes.

  7. PowerChute starts executing the shutdown command file.

  8. Shutdown command file duration elapses. PowerChute starts a Maintenance Mode task on the first vSAN host and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  9. Once the Delay Host Maintenance Mode Duration has elapsed, PowerChute starts a Maintenance Mode task on the next vSAN host and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  10. PowerChute starts a Maintenance Mode task on the Witness Host, waits the Delay Host Maintenance Mode Duration and shuts down the host.

  11. PowerChute starts a Maintenance Mode task on the Management Host, waits the Delay Host Maintenance Mode Duration of X seconds and shuts down the host.

  12. OS shutdown sequence starts on the PowerChute physical machine. After a 70-second delay, the OS starts to shut down.

  13. The UPS waits for the greater of the Low Battery Duration/Maximum Required Delay (non-outlet-aware UPSs) or the Outlet Group Power Off Delay.

  14. After this delay, a further non-configurable two-minute delay is counted down.

  15. The UPS turns off after the user-configurable Shutdown Delay has elapsed, or the Outlet Group turns off after the Power Off Delay elapses.

NOTE: In a vSAN configuration, Witness and Management hosts are placed into Maintenance Mode and shut down after Cluster hosts if Delay Maintenance Mode is enabled. vSAN Hosts are placed into Maintenance Mode using “No data migration” for the vSAN data evacuation mode.
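The cluster stop operations in step 6 run one SSH command on every protected vSAN node and a second command on a single node. The sketch below only builds that list of (node, command) pairs; it does not open any SSH session. The two command strings are taken from the steps above, while the function name, the node names, and the choice of the first node for the prepare command are assumptions made for illustration.

```python
# Command strings as quoted in step 6 of the shutdown sequence.
IGNORE_UPDATES_CMD = "esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates"
PREPARE_CMD = "python /usr/lib/vmware/vsan/bin/reboot_helper.py prepare"


def cluster_stop_plan(vsan_nodes):
    """Return the ordered (node, command) pairs for the cluster stop step.

    Assumption: the prepare command is sent to the first node in the list;
    the source only says it runs on one of the vSAN nodes.
    """
    if not vsan_nodes:
        raise ValueError("at least one vSAN node is required")
    # Sub-step 1: disable cluster member updates on all protected nodes.
    plan = [(node, IGNORE_UPDATES_CMD) for node in vsan_nodes]
    # Sub-step 2: prepare the cluster for shutdown on a single node.
    plan.append((vsan_nodes[0], PREPARE_CMD))
    return plan
```

For a two-node cluster this yields three entries: the member-update command for each node, followed by one prepare command.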

Example 3: Advanced UPS Configuration, 2-Node vSAN Stretch Cluster with Witness Appliance and Management Host

 

NOTE: PowerChute could be installed on a VM on the Management Host.

Setup

When a critical UPS event, such as UPS On Battery, occurs on UPS #1 on the primary site, the following sequence is triggered:

Shutdown Sequence

  1. PowerChute reports that the UPS #1 protecting the vSAN Cluster is On Battery.

  2. Shutdown delay for the On Battery event elapses. PowerChute sends a command to turn off the UPS or Outlet Group.

  3. PowerChute disables vSphere Cluster Services (vCLS) and High Availability.

  4. PowerChute determines that Fault Tolerance Threshold (FTT) has not been exceeded, since the number of critical groups = 1, which is not greater than the FTT Level of 1. PowerChute starts Virtualization shutdown tasks on the primary site only.

  5. PowerChute migrates the vCenter Server VM in the High priority group during VM Migration. Other VMs are also migrated to Hosts on the secondary site during this step.

  6. Any VMs or vApps that could not be migrated are shut down.

  7. After 3 minutes (Disable vSphere Cluster Services (vCLS) Duration = 180, Disable HA Duration = 20), PowerChute starts VM and vApp shutdown on each Host.

  8. After 120 seconds (VM/vApp Shutdown Duration), PowerChute performs cluster stop operations.

  9. After 120 seconds (Cluster stop operations delay), PowerChute starts executing the shutdown command file.

  10. Shutdown command file duration elapses and all vSAN Cluster Hosts enter Maintenance Mode using No Action as FTT is disabled and the entire cluster is protected by 1 UPS.

  11. Shutdown command file duration elapses. PowerChute starts a Maintenance Mode task on the first vSAN host on the primary site with "Ensure Data Accessibility", as FTT is not exceeded, and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  12. PowerChute starts a Maintenance Mode task on the next vSAN host on the primary site with "Ensure Data Accessibility" and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  13. PowerChute starts a Maintenance Mode task on the next vSAN host on the primary site with "Ensure Data Accessibility" and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  14. The UPS waits for the greater of the Low Battery Duration/Maximum Required Delay (non-outlet-aware UPSs) or the Outlet Group Power Off Delay.

  15. After this delay, a further non-configurable two-minute delay is counted down.

  16. The UPS turns off after the user-configurable Shutdown Delay has elapsed, or the Outlet Group turns off after the Power Off Delay elapses.

NOTE: In a vSAN configuration, Witness and Management hosts are placed into Maintenance Mode and shut down after Cluster hosts if Delay Maintenance Mode is enabled.

NOTE: If Fault Tolerance Threshold (FTT) is enabled, vSAN hosts are placed into Maintenance Mode using "Ensure Data Accessibility" if the number of critical groups is less than or equal to the FTT Level.

Putting a host into Maintenance Mode with "Ensure Data Accessibility" can trigger data re-synchronization on the host. In this event, PowerChute will wait until the data re-synchronization is complete (with retry limit) before placing the host into maintenance mode and shutting it down. See Host Maintenance Mode for more information.
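The wait for data re-synchronization described above can be modelled as an initial wait followed by a bounded retry loop. This is a sketch under stated assumptions, not PowerChute's actual logic: it treats "vsan_synch_retry_time" as the number of seconds between checks, and `is_resync_active`, `retry_limit`, and the other parameter names are hypothetical.

```python
import time


def wait_for_resync(is_resync_active, sync_duration, retry_time, retry_limit,
                    sleep=time.sleep):
    """Wait for vSAN re-synchronization to finish, with a retry limit.

    Returns True if re-synchronization is no longer active, or False if the
    retry limit was reached while it was still active.
    """
    if not is_resync_active():
        return True  # nothing to wait for
    # Initial wait: the vSAN Synchronization Duration.
    sleep(sync_duration)
    retries = 0
    while is_resync_active():
        if retries >= retry_limit:
            return False  # give up; the caller proceeds with host shutdown
        # Assumption: retry_time is the pause between successive checks.
        sleep(retry_time)
        retries += 1
    return True
```

In either outcome the sequence continues to the host shutdown, so the return value is mainly useful for logging whether the retry limit was hit.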

Example 4: Advanced UPS Configuration, 3-Node vSAN Standard Cluster

Setup

Shutdown Sequence

  1. PowerChute reports that the UPS #2 protecting the vSAN Cluster is On Battery.

  2. Shutdown delay for the On Battery event elapses. PowerChute sends a command to turn off the UPS or Outlet Group.

  3. PowerChute disables vSphere Cluster Services (vCLS) and High Availability.

  4. After 3 minutes (Disable vSphere Cluster Services (vCLS) Duration = 180, Disable HA Duration = 20), PowerChute starts VM and vApp shutdown on each Host.

  5. After 120 seconds (VM/vApp Shutdown Duration), PowerChute gracefully shuts down the vCenter Server VM in the High priority group.

  6. After 60 seconds (vCenter Server VM Duration), PowerChute performs cluster stop operations.

  7. After 120 seconds (Cluster stop operations delay), PowerChute starts executing the shutdown command file.

  8. Shutdown command file duration elapses and all vSAN Cluster Hosts enter Maintenance Mode using No Action as FTT is disabled and the entire cluster is protected by 1 UPS.

  9. The OS Shutdown Command is issued and an additional 70-second delay is counted down before the operating system on the physical machine running PowerChute starts to shut down.

  10. The UPS waits for the greater of the Low Battery Duration/Maximum Required Delay (non-outlet-aware UPSs) or the Outlet Group Power Off Delay.

  11. After this delay, a further non-configurable two-minute delay is counted down.

  12. The UPS turns off after the user-configurable Shutdown Delay has elapsed, or the Outlet Group turns off after the Power Off Delay elapses.
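The fixed durations quoted in steps 4 to 7 can be added up to see when each phase begins relative to the start of VM and vApp shutdown. The following sketch does only that arithmetic, using the 120, 60, and 120 second values from the steps above; the step labels and the function name are illustrative, and durations that the example does not quantify (such as the shutdown command file duration) are left out.

```python
from itertools import accumulate

# (phase, seconds to wait after the previous phase started), from steps 4-7.
STEPS = [
    ("VM and vApp shutdown starts", 0),
    ("vCenter Server VM shutdown starts", 120),  # VM/vApp Shutdown Duration
    ("Cluster stop operations start", 60),       # vCenter Server VM Duration
    ("Shutdown command file starts", 120),       # Cluster stop operations delay
]


def timeline(steps):
    """Return (phase, offset_seconds) with offsets accumulated from the first phase."""
    offsets = accumulate(delay for _, delay in steps)
    return [(name, t) for (name, _), t in zip(steps, offsets)]


for name, offset in timeline(STEPS):
    print(f"t+{offset:>3}s  {name}")
```

With these values the shutdown command file starts 300 seconds after VM and vApp shutdown begins, which is the minimum runtime the UPS must still provide at that point before the remaining steps are counted.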

Critical Event 1

When a critical UPS event, such as UPS On Battery, occurs on a host in the vSAN Cluster, for example, ESXi Host C, the following sequence is triggered:

Shutdown Sequence

  1. PowerChute reports that the UPS Setup 3 (ESXi Host C) is On Battery.

  2. Shutdown delay for the On Battery event elapses. PowerChute sends a command to turn off the UPS or Outlet Group.

  3. PowerChute determines that the Fault Tolerance Threshold has not been exceeded, since the number of critical groups = 1, which is not greater than the FTT Level of 1. PowerChute starts Virtualization Shutdown tasks on ESXi Host C. PowerChute migrates VMs to non-critical Hosts (ESXi Host A, ESXi Host B) in the vSAN Cluster.

  4. VM Migration durations elapse. PowerChute starts VM/vApp Shutdown.

  5. VM/vApp Shutdown durations elapse. PowerChute starts executing the shutdown command file.

  6. Shutdown command file duration elapses. PowerChute starts a Maintenance Mode task on ESXi Host C with "Ensure Data Accessibility", as the FTT Level is not exceeded, and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  7. The UPS waits for the greater of the Low Battery Duration/Maximum Required Delay (non-outlet-aware UPSs) or the Outlet Group Power Off Delay.

  8. After this delay, a further non-configurable two-minute delay is counted down.

  9. The UPS turns off after the user-configurable Shutdown Delay has elapsed, or the Outlet Group turns off after the Power Off Delay elapses.

NOTE: If Fault Tolerance Threshold (FTT) is enabled, vSAN hosts are placed into Maintenance Mode using "Ensure Data Accessibility" if the number of critical groups is less than or equal to the FTT Level.

Putting a host into Maintenance Mode with "Ensure Data Accessibility" can trigger data re-synchronization on the host. In this event, PowerChute will wait until the data re-synchronization is complete (with retry limit) before placing the host into maintenance mode and shutting it down. See Host Maintenance Mode for more information.
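In Critical Event 1, VMs on the critical host are migrated to the hosts that are not critical. A minimal sketch of that selection follows; the function name is hypothetical and the logic is only a set difference that keeps the cluster's host order, not a model of how vSphere chooses a migration target.

```python
def migration_targets(cluster_hosts, critical_hosts):
    """Return the cluster hosts that are not critical, in cluster order."""
    critical = set(critical_hosts)
    return [h for h in cluster_hosts if h not in critical]


# Critical Event 1: only ESXi Host C is critical.
print(migration_targets(
    ["ESXi Host A", "ESXi Host B", "ESXi Host C"],
    ["ESXi Host C"],
))
```

Once ESXi Host B also becomes critical, the same call returns only ESXi Host A, which is the situation Critical Event 2 starts from.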

Critical Event 2

When a critical UPS event, such as UPS On Battery, occurs on a host in the vSAN Cluster, for example, ESXi Host B, the following sequence is triggered:

NOTE: ESXi Host C is already critical.

Shutdown Sequence

  1. PowerChute reports that the UPS Setup 2 (ESXi Host B) is On Battery.

  2. Shutdown delay for the On Battery event elapses. PowerChute sends a command to turn off the UPS or Outlet Group.

  3. PowerChute determines that the Fault Tolerance Threshold has been exceeded, since the number of critical groups = 2, which is greater than the FTT Level of 1. PowerChute starts Virtualization Shutdown on critical host ESXi Host B and non-critical host ESXi Host A because "Shut down all Cluster VMs" is enabled.

  4. VM Migration durations elapse. PowerChute starts VM/vApp Shutdown.

  5. PowerChute gracefully shuts down the vCenter Server VM in the High priority group during VM Shutdown of ESXi Host A.

  6. VM/vApp Shutdown durations elapse. PowerChute starts executing the shutdown command file.

  7. Shutdown command file duration elapses. PowerChute starts a Maintenance Mode task on ESXi Host B with the "No Data Migration" action, as the FTT Level has been exceeded, and waits the Delay Host Maintenance Mode Duration. If a vSAN synchronization is active, PowerChute waits the specified vSAN Synchronization Duration and then checks whether it has completed, according to the value set for "vsan_synch_retry_time" in the pcnsconfig.ini file, until data re-synchronization is no longer active or the retry limit has been reached. PowerChute then shuts down the host.

  8. The UPS waits for the greater of the Low Battery Duration/Maximum Required Delay (non-outlet-aware UPSs) or the Outlet Group Power Off Delay.

  9. After this delay, a further non-configurable two-minute delay is counted down.

  10. The UPS turns off after the user-configurable Shutdown Delay has elapsed, or the Outlet Group turns off after the Power Off Delay elapses.
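The difference between Critical Event 1 and Critical Event 2 comes down to comparing the number of critical groups with the FTT Level. The sketch below restates that rule, assuming Fault Tolerance Threshold is enabled; the function names are hypothetical, while the two mode strings are the ones used in the sequences above.

```python
def ftt_exceeded(critical_groups, ftt_level):
    """The threshold is exceeded only when critical groups outnumber the FTT Level."""
    return critical_groups > ftt_level


def evacuation_mode(critical_groups, ftt_level):
    """Pick the Maintenance Mode data evacuation option for a critical host."""
    if ftt_exceeded(critical_groups, ftt_level):
        return "No Data Migration"
    return "Ensure Data Accessibility"


# Critical Event 1: one critical group, FTT Level 1.
print(evacuation_mode(1, 1))  # Ensure Data Accessibility
# Critical Event 2: two critical groups, FTT Level 1.
print(evacuation_mode(2, 1))  # No Data Migration
```

The comparison is strict, so a number of critical groups equal to the FTT Level still uses "Ensure Data Accessibility", matching the "less than or equal to" wording in the NOTE for Critical Event 1.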