USG High Availability - Active/Standby failover

Submitted by -
Status: Duplicate

The purpose of this idea is to add enterprise high availability to the USG lineup with minimal development effort.  Basically, modify the Unifi controller to allow two gateways to be adopted to a site: one is the Active gateway, the other is the Backup/HA device.  All traffic is processed by the Active gateway; the Backup sits idle until an HA event occurs.

 

Requirements: 

1. Active and Backup USG must be the exact same hardware model.

2. Both USGs must be connected to the exact same Layer2 environment on LAN and WAN interfaces.  For example, if the Active USG has an 802.1Q trunk on the LAN interface and a DHCP WAN connection on the WAN interface, the Backup USG must be cabled in the same way.

3. Connection state information is not maintained between the Active and Backup USG device.  This means all NAT sessions will be lost during a failover event and must be re-established.

 

Theory of operation: 

[ Failover ]

1. The Active USG carries all Layer3 site traffic and houses all active Layer3 interfaces for the site, just like in a traditional single USG deployment. 

2. The Backup USG has no Layer3 interfaces up, with the exception of the HA interface, which is defined as a special network or interface in the Unifi controller and enforced during adoption (more on this later).

3. A designated HA interface/network on each USG is used to send HA heartbeats between the two USGs, as well as to sync the committed configuration from the Active to the Backup USG.  When the Backup USG misses several consecutive HA heartbeats from the Active USG, the Active USG is considered to be down, which triggers the HA failover.

4. The designated interface could be a dedicated hardware interface on the USG -or- could be a Unifi network configured for HA.  For example, a network could be configured in Unifi as type "HA", instead of "Corporate", which would instruct the USGs to use a VLAN trunk sub-interface to communicate HA status.

5. When an HA failover event occurs, the Backup USG brings all of the Layer3 interfaces into the up/up state using all of the same interface IPs as the Active USG.
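
The failover steps above can be sketched as a small state machine running on the Backup USG. This is only an illustration of the idea, not existing USG or controller code: the class name, the `miss_threshold` of 3 (the proposal only says "several"), and the `interfaces_up` flag are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BackupMonitor:
    """Backup-side logic: promote after N consecutive missed heartbeats."""
    miss_threshold: int = 3       # hypothetical value; the idea only says "several"
    missed: int = 0               # consecutive heartbeat intervals with no heartbeat
    role: str = "backup"
    interfaces_up: bool = False   # Layer3 interfaces other than the HA interface

    def tick(self, heartbeat_received: bool) -> str:
        """Process one heartbeat interval and return the resulting role."""
        if self.role == "active":
            return self.role      # already promoted, nothing left to monitor
        if heartbeat_received:
            self.missed = 0       # any heartbeat resets the counter
        else:
            self.missed += 1
            if self.missed >= self.miss_threshold:
                # HA failover: take over the Active USG's interfaces and IPs
                self.role = "active"
                self.interfaces_up = True
        return self.role
```

Requiring consecutive misses, rather than a total count, keeps a single dropped heartbeat from triggering a failover.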

 

[ Failback ]

1. When the failed USG is repaired and returned to service, the first interface initialized after boot up is the designated HA interface.  This interface will be used to check if an Active USG is present by listening for HA heartbeat messages.  If not, this unit becomes the Active USG.  If one is present, this USG goes into Backup USG mode.

2. Once the Active USG has been found, a config sync is initiated from the Active USG to the backup USG.  

3. The freshly booted Backup USG stays in this dormant state until the Active USG fails.
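
As a rough sketch of the boot-time decision in the failback steps, the function below listens on the HA interface for a fixed number of heartbeat intervals and picks a role. The function name, the `heard_heartbeat` callback, and the default of 5 listen intervals are hypothetical; the proposal does not specify how long a booting unit should listen.

```python
from typing import Callable


def boot_role(heard_heartbeat: Callable[[], bool], listen_intervals: int = 5) -> str:
    """Decide the role of a freshly booted USG.

    heard_heartbeat is called once per heartbeat interval and returns True
    if a heartbeat from an Active USG was seen on the HA interface.
    """
    for _ in range(listen_intervals):
        if heard_heartbeat():
            # An Active USG is present: stay dormant and wait for config sync
            return "backup"
    # No Active USG heard during the listen window: take over as Active
    return "active"
```

For example, `boot_role(lambda: False)` returns `"active"`, which matches the case where the repaired unit comes up alone.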

 

[ HA device onboarding ]

Requirements:  

* The site already has an Active USG which is adopted by the Unifi controller.

* The Unifi controller has an interface/network defined as "HA" that is already provisioned on the Active USG.

* The HA device is cabled up exactly the same as the Active USG.

 

 

1. The new HA device boots up like a normal USG in an unadopted state, grabs a DHCP address on the management LAN and then advertises itself to the Unifi controller.

2. The Unifi controller presents the new USG device as an adoptable device.  However, instead of the normal "Adopt" link, the link reads "Adopt HA device to EXISTING_USG_DEVICE_NAME".

3. Adopting the HA device will provision the HA device with the exact same configuration as the Active USG, but with all interfaces other than the HA interface/network in the down/down state.
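
To make step 3 concrete, here is a minimal sketch of how the controller could derive the HA device's configuration from the Active USG's. The dictionary layout (`interfaces`, `enabled`, `address`) and the function name are invented for illustration and do not reflect the real Unifi controller schema.

```python
import copy


def backup_config(active_config: dict, ha_interface: str) -> dict:
    """Return a copy of the Active USG config with only the HA interface enabled."""
    cfg = copy.deepcopy(active_config)   # never mutate the Active USG's config
    for name, iface in cfg["interfaces"].items():
        # Same addresses as the Active USG, but held down until a failover
        iface["enabled"] = (name == ha_interface)
    return cfg


# Hypothetical example: eth1.900 is the VLAN sub-interface of the "HA" network
active = {
    "interfaces": {
        "eth0": {"address": "dhcp", "enabled": True},
        "eth1": {"address": "192.168.1.1/24", "enabled": True},
        "eth1.900": {"address": "10.255.255.1/30", "enabled": True},
    }
}
backup = backup_config(active, ha_interface="eth1.900")
```

Because the interface addresses are copied unchanged, the Backup USG can bring them up as-is when a failover occurs.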

Comments
by Ubiquiti Employee
on ‎09-25-2018 04:08 AM