[JDSS] JovianDSS failover mechanism technologies explained

JovianDSS uses STONITH ("Shoot The Other Node In The Head" or "Shoot The Offending Node In The Head"), a cluster fencing technique that prevents split-brain and reduces the risk of cluster instability.
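
The core STONITH rule is an ordering constraint: a surviving node may take over shared storage only after the other node is confirmed fenced (powered off or reset). The minimal Python sketch below illustrates that rule under simplifying assumptions; `Node`, `take_over`, `power_off`, `broken_fence`, and the pool name are hypothetical and not part of JovianDSS.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    name: str
    powered_on: bool = True
    pools: set[str] = field(default_factory=set)


def take_over(survivor: Node, peer: Node, pool: str,
              fence: Callable[[Node], bool]) -> bool:
    """Move `pool` to `survivor`, but only after `peer` has been fenced."""
    if not fence(peer):
        # Peer state is unknown: importing now could cause split-brain.
        return False
    peer.pools.discard(pool)
    survivor.pools.add(pool)
    return True


def power_off(node: Node) -> bool:
    # Hypothetical fencing agent; a real one would act through IPMI or a PDU.
    node.powered_on = False
    return True


def broken_fence(node: Node) -> bool:
    # Simulates an unreachable fencing device.
    return False


a = Node("node-a", pools={"Pool-0"})
b = Node("node-b")
print(take_over(b, a, "Pool-0", broken_fence))  # False: Pool-0 stays on node-a
print(take_over(b, a, "Pool-0", power_off))     # True: node-a is off, node-b owns Pool-0
```

If fencing cannot be confirmed, the sketch refuses the takeover rather than risk two nodes writing to the same pool.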

JovianDSS provides several mechanisms that protect the cluster against split-brain and instability. Together they cover STONITH functionality and go beyond it:

  1. Network-based ring-ping (heartbeat and ping nodes), controlled by the Cluster Resource Manager, which can decide to reboot a node or to export/import a pool. A reboot can be a soft reboot, an immediate kernel reboot, or an IPMI-based reboot. Network-based split-brain protection works well as long as the cluster is properly configured and the hardware works as expected.

  2. In case of a misconfiguration or an unexpected hardware malfunction, JovianDSS falls back on pool-based split-brain protection provided by the OpenZFS Multi-Modifier Protection (MMP) feature. It is described in the document:

    http://open-zfs.org/w/images/d/d9/05-MMP-openzfs-2017.4.pdf

    Overview of the MMP functionality:

    "MMP prevents ZFS from importing a pool that is active on another host, under most circumstances"

    MMP prevents a pool from being imported on one node while it is active on another, even if the Cluster Resource Manager malfunctions. It also refuses a forced import of a pool that is in use by another cluster node.

  3. JovianDSS has a built-in “Critical system error response policy” (see the attached screenshot) that prevents cluster instability and triggers failover in case of unexpected hardware malfunctions.

  4. JovianDSS has a built-in Cluster watchdog that monitors user volumes for availability (see the attached configuration screenshot). If the volumes become unavailable, the malfunctioning system is rebooted to start failover. If the kernel-triggered reboot does not work, JovianDSS uses the IPMI hardware watchdog to force the reboot and guarantee a clean failover.
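
The network-based decision described in item 1 can be sketched as follows, assuming a two-node cluster in which each node checks its heartbeat rings to the peer and a set of external ping nodes. The decision table, the names `Action`, `decide`, and `reboot`, and the escalation order (soft reboot, then kernel reboot, then IPMI reboot) are simplifying assumptions for illustration, not the Cluster Resource Manager's actual logic.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Action(Enum):
    NONE = "no action"
    TAKE_OVER = "fence the peer and import its pools"
    STEP_DOWN = "export own pools and reboot self"


def decide(ring_heartbeats: Sequence[bool], ping_nodes: Sequence[bool]) -> Action:
    """Hypothetical ring-ping decision for one node of a two-node cluster."""
    if any(ring_heartbeats):
        return Action.NONE       # peer still answers on at least one ring
    if any(ping_nodes):
        return Action.TAKE_OVER  # this node sees the network, the peer is silent
    return Action.STEP_DOWN      # this node is the isolated one


def reboot(methods: Sequence[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Try reboot methods in order and return the name of the first that works."""
    for name, attempt in methods:
        if attempt():
            return name
    return None


print(decide([False, False], [True, True]).value)    # fence the peer and import its pools
print(decide([False, False], [False, False]).value)  # export own pools and reboot self
print(reboot([
    ("soft reboot", lambda: False),    # e.g. a hung service blocks shutdown
    ("kernel reboot", lambda: False),
    ("IPMI reboot", lambda: True),
]))                                    # IPMI reboot
```

Checking the ping nodes lets a node tell "the peer is gone" apart from "I am cut off". If all rings fail while both nodes can still reach the ping nodes, both would choose TAKE_OVER in this model; that is the case where fencing and the pool-level MMP check from item 2 have to break the tie.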

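The MMP behavior summarized in item 2 can be modeled in a few lines. In OpenZFS, MMP is enabled per pool with the `multihost` property and requires every node to have a unique `hostid`; the host that has the pool imported keeps writing uberblocks, and a host that wants to import the pool performs an activity check by watching for those writes. The sketch below is a simplified model of that check, not OpenZFS code: `Uberblock`, its `mmp_seq` field, `try_import`, and `SimulatedDisk` are hypothetical, and real MMP also takes timestamps and write intervals into account.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Uberblock:
    txg: int      # last committed transaction group
    mmp_seq: int  # bumped by each MMP write from the host that owns the pool


def pool_is_active(read: Callable[[], Uberblock], wait: Callable[[], None]) -> bool:
    """Activity check: does the newest uberblock change while we wait?"""
    before = read()
    wait()  # in OpenZFS this spans several multihost write intervals
    return read() != before


def try_import(pool_hostid: int, my_hostid: int, force: bool,
               read: Callable[[], Uberblock], wait: Callable[[], None]) -> bool:
    if pool_hostid != my_hostid and not force:
        return False  # pool was last used by another host: needs a forced import
    if pool_is_active(read, wait):
        return False  # MMP: another host is writing, so even a forced import is refused
    return True


class SimulatedDisk:
    """Shared disk whose owner may or may not still be writing MMP uberblocks."""

    def __init__(self, owner_alive: bool) -> None:
        self.ub = Uberblock(txg=100, mmp_seq=0)
        self.owner_alive = owner_alive

    def read(self) -> Uberblock:
        return self.ub

    def wait(self) -> None:
        if self.owner_alive:
            self.ub = Uberblock(self.ub.txg, self.ub.mmp_seq + 1)


live, dead = SimulatedDisk(owner_alive=True), SimulatedDisk(owner_alive=False)
print(try_import(0xA, 0xB, True, live.read, live.wait))  # False
print(try_import(0xA, 0xB, True, dead.read, dead.wait))  # True
```

The point of the model is that the forced import only bypasses the hostid check; it cannot bypass the activity check, which is what protects the pool when the Cluster Resource Manager makes a wrong decision.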

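The actual options of the response policy in item 3 are shown in the referenced screenshot. The sketch below only illustrates the general idea, assuming the policy maps classes of critical errors to a configured response; the error names, the `Response` values, and `respond` are hypothetical and do not reflect the real JovianDSS settings.

```python
from enum import Enum
from typing import Dict


class Response(Enum):
    LOG_ONLY = "log the event"
    FAILOVER = "reboot this node so the partner takes over"


# Hypothetical policy table: critical error class -> configured response.
POLICY: Dict[str, Response] = {
    "disk_io_error": Response.FAILOVER,
    "hung_task": Response.FAILOVER,
    "fan_warning": Response.LOG_ONLY,
}


def respond(error: str, policy: Dict[str, Response] = POLICY) -> Response:
    # Unlisted events are only logged, so an unexpected but harmless event
    # cannot start a failover on its own.
    return policy.get(error, Response.LOG_ONLY)


print(respond("disk_io_error").value)  # reboot this node so the partner takes over
print(respond("unknown_event").value)  # log the event
```

Defaulting unknown events to logging is one possible design choice; a stricter policy could default to failover instead, at the cost of more unnecessary switchovers.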

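The escalation in item 4 relies on the standard hardware-watchdog pattern: software periodically resets ("kicks") a timer, and if the kicks stop, the hardware resets the node on its own. The simulation below assumes a discrete clock; `HardwareWatchdog`, `run`, and the three-tick timeout are hypothetical stand-ins for the real Cluster watchdog and IPMI watchdog settings.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HardwareWatchdog:
    """Simulated IPMI watchdog timer: resets the node unless kicked in time."""
    timeout: int
    remaining: int = field(init=False)
    fired: bool = field(init=False, default=False)

    def __post_init__(self) -> None:
        self.remaining = self.timeout

    def kick(self) -> None:
        self.remaining = self.timeout

    def tick(self) -> None:
        self.remaining -= 1
        if self.remaining <= 0:
            self.fired = True  # hardware resets the node, no OS cooperation needed


def run(volumes_ok: Callable[[int], bool], kernel_reboot_works: bool,
        timeout: int = 3, steps: int = 20) -> str:
    """Simulate the Cluster watchdog loop over `steps` time units."""
    wd = HardwareWatchdog(timeout)
    for t in range(steps):
        if volumes_ok(t):
            wd.kick()  # volumes answer: keep the hardware timer fed
        elif kernel_reboot_works:
            return f"kernel reboot at t={t}"
        # Otherwise the reboot request hung and nothing kicks the timer any more.
        wd.tick()
        if wd.fired:
            return f"IPMI watchdog reset at t={t}"
    return "healthy"


print(run(lambda t: t < 5, kernel_reboot_works=True))   # kernel reboot at t=5
print(run(lambda t: t < 5, kernel_reboot_works=False))  # IPMI watchdog reset at t=6
```

Because the hardware timer does not depend on the operating system, it still fires when the kernel is too damaged to complete its own reboot, which is the case the IPMI watchdog is meant to cover.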


Article ID: 3161
Last updated: 03 Apr, 2020
Revision: 1
https://kb.open-e.com/jdss-joviandss-failover-mechanism-technologies-explained_3161.html