Failover, High Availability, Clustering, RAID and Redundancy Tutorial Guide
What is High Availability and Failover?
High Availability
High availability is a feature that provides redundancy and fault tolerance. A number of connected devices jointly process and provide a service, and the goal is to keep that service available even when one of them fails. For example, a company such as Amazon.com, which sells products through its website, needs that website to be available to the public at all times. To achieve this, it would run a number of servers in a cluster, so that if one server fails the others continue processing and take on the failed server's load. It would also have backup internet connections from different ISPs, so that if one ISP goes down the others keep the website reachable on the internet. It would have a separate power line into its data centre, so that after a power failure a backup power source keeps critical resources running while the original problem is resolved. It would have a backup data centre at another location (a disaster recovery site), so that if a disaster such as an earthquake strikes the primary site, the backup site can be used. Together, these preventative measures keep the website available and so provide a high degree of availability.
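The benefit of adding redundant servers can be put into rough numbers. The sketch below assumes that each server fails independently of the others, which is an idealisation (real failures are often correlated), and the function name and the 99% figure are illustrative only.

```python
def combined_availability(single: float, count: int) -> float:
    """Availability of `count` redundant components when any one can serve.

    Assumption: failures are independent of each other.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    # The service is down only when every component is down at the same time.
    return 1.0 - (1.0 - single) ** count


# Compare one, two and three servers that are each available 99% of the time.
for servers in (1, 2, 3):
    print(servers, f"{combined_availability(0.99, servers):.6f}")
```

Under this assumption, two servers that are each available 99% of the time give a combined availability of roughly 99.99%.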
In firewalls and similar devices, the high availability feature is a mechanism that keeps the state of the devices synchronized and detects failures, so that when a device fails the remaining active devices know about it and take on its processing load.
High availability effectively means deploying two or more firewalls so that each one acts as a backup for the others. The high availability feature in each firewall can detect failures in a number of ways, so that failover can occur as soon as a failure is detected.
In a high availability pair, the two firewalls are usually connected by a mirrored link. This link enables both firewall appliances to maintain an identical state. Because the surviving firewall already holds the current state, nothing has to be rebuilt when its peer fails; it simply takes on all of the processing load.
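The idea of a mirrored link can be illustrated with a minimal sketch in which every connection state recorded on one firewall is copied to its peer. The `Firewall` class and its methods are hypothetical; a real appliance would replicate state over a dedicated synchronization interface using a vendor-specific protocol.

```python
class Firewall:
    """Toy firewall that mirrors its session table to a paired peer."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # connection id -> connection state
        self.peer = None

    def pair_with(self, other):
        """Join two firewalls, standing in for the mirrored link."""
        self.peer = other
        other.peer = self

    def track(self, conn_id, state):
        """Record a connection state locally and copy it to the peer."""
        self.sessions[conn_id] = state
        if self.peer is not None:
            self.peer.sessions[conn_id] = state


fw_a = Firewall("fw-a")
fw_b = Firewall("fw-b")
fw_a.pair_with(fw_b)
fw_a.track(("10.0.0.5", 443), "ESTABLISHED")

# Both firewalls now hold the same state, so fw_b could take over at once.
print(fw_a.sessions == fw_b.sessions)
```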
The two modes for achieving high availability are as follows:
Active/Passive mode
In this mode, one firewall or device is active and processing packets, while the other is passive: it is on standby and becomes active only if the primary firewall fails.
Active/Active mode
In this mode both firewalls are active, so traffic is load balanced between the two devices, both of which process and filter packets. If one firewall fails, the other takes the full processing load until the failed firewall becomes active again.
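The difference between the two modes can be sketched as a function that decides which firewall handles each packet. The function name is hypothetical, and the round-robin distribution in active/active mode is an assumption made for simplicity; real devices typically balance per session or per flow.

```python
from itertools import cycle


def assign_traffic(mode, healthy, packets):
    """Return {firewall: [packets]} for the firewalls that are currently up.

    `healthy` lists the firewalls that are up, primary first.
    """
    if not healthy:
        raise RuntimeError("no firewall available")
    result = {name: [] for name in healthy}
    if mode == "active/passive":
        # Only the first healthy firewall processes traffic; the rest stand by.
        result[healthy[0]].extend(packets)
    elif mode == "active/active":
        # Every healthy firewall takes a share (round-robin is an assumption).
        for packet, name in zip(packets, cycle(healthy)):
            result[name].append(packet)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return result


print(assign_traffic("active/passive", ["fw1", "fw2"], [1, 2, 3, 4]))
print(assign_traffic("active/active", ["fw1", "fw2"], [1, 2, 3, 4]))
# After fw1 fails, fw2 carries the full load in either mode.
print(assign_traffic("active/active", ["fw2"], [1, 2, 3, 4]))
```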
Today all the big vendors, such as Cisco, Check Point, Juniper and Fortinet, support high availability, not just in their firewalls but across their product ranges.
Device Failover
Automatic failover is the process of moving active services from the primary device to the backup device when the primary device fails. The backup device continues these services from where the primary device left off, usually until the primary device is back up and running.
The failover feature gives hardware firewalls a degree of redundancy. Two or more hardware firewalls are configured together, and if the primary firewall fails, the backup firewall or firewalls take over. Failover is usually implemented on high-end hardware firewalls for networks that require redundancy.
Link Failover
With link failover, one link processes traffic while a second link becomes active only if the primary link fails. The same set-up allows a company to connect its firewall to more than one internet connection: if one connection goes down, all traffic fails over to the other. This removes the link as a single point of failure and improves availability and reliability.
Failover can be triggered in a number of ways, depending on the vendor. Two common triggers are described below:
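Link failover amounts to choosing the highest-priority link that is still up. The following is a minimal sketch under that assumption; the function and the link names `isp_a` and `isp_b` are hypothetical.

```python
def select_link(links):
    """Return the name of the first link that is up, or None if all are down.

    `links` is a list of (name, is_up) pairs in priority order, primary first.
    """
    for name, is_up in links:
        if is_up:
            return name
    return None


# Normal operation: the primary connection carries the traffic.
print(select_link([("isp_a", True), ("isp_b", True)]))
# The primary connection is down: traffic fails over to the backup.
print(select_link([("isp_a", False), ("isp_b", True)]))
```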
Interface link down -
When the backup device senses that its link to the primary device is down, a failover occurs. In other words, once the interface link can no longer communicate back to the backup device, the backup device takes over.
Heartbeat loss -
A heartbeat is a sensing mechanism in which a signal is exchanged with the primary device at regular intervals. When the backup device does not receive a heartbeat from the primary device for a predefined number of seconds, a failover occurs and the backup device takes over.
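The two triggers above can be sketched as a small monitor running on the backup device. The `PeerMonitor` class, its field names and the three-second timeout are hypothetical; timestamps are passed in explicitly so that the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class PeerMonitor:
    """Backup-side view of the primary device (illustrative only)."""

    timeout: float               # seconds without a heartbeat before failover
    last_heartbeat: float = 0.0  # time the last heartbeat was received
    link_up: bool = True         # state of the interface link to the primary

    def heartbeat(self, now: float) -> None:
        """Record that a heartbeat arrived from the primary at time `now`."""
        self.last_heartbeat = now

    def should_fail_over(self, now: float) -> bool:
        """Fail over on interface link down or on heartbeat loss."""
        if not self.link_up:
            return True
        return now - self.last_heartbeat > self.timeout


monitor = PeerMonitor(timeout=3.0)
monitor.heartbeat(now=10.0)
print(monitor.should_fail_over(now=12.0))  # heartbeat is recent
print(monitor.should_fail_over(now=13.5))  # heartbeat missed for over 3 s
```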
Finally, when the primary unit is back up, it fails back: it resumes and takes charge of the services from the backup unit. Failover capabilities therefore also eliminate a single point of failure.
If no downtime can be tolerated, fully redundant failover servers should be configured at separate sites connected by separate communication links. The downside is that this can be an expensive option to implement.
What is RAID and Clustering?
RAID
RAID (Redundant Array of Independent Disks) is a fault tolerance solution for hard drives, usually implemented in servers. As well as providing redundancy and fault tolerance, RAID can improve performance, depending on which RAID level is used. For example, in RAID level 5, data and parity are written across three or more drives, and if any one of these drives fails, the remaining drives use the parity information to rebuild the data from the failed drive.
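RAID 5 parity is computed with XOR, which is what makes the rebuild possible: XOR-ing the surviving blocks of a stripe with the parity block gives back the missing block. The sketch below shows a single stripe across three drives; the helper name and sample data are illustrative, and a real array also rotates the parity block across the drives from stripe to stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: two data blocks plus one parity block, each on its own drive.
data = [b"AAAA", b"BBBB"]
parity = xor_blocks(data)

# The drive holding data[0] fails: rebuild it from the survivor and parity.
rebuilt = xor_blocks([data[1], parity])
print(rebuilt == data[0])
```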
Clustering
Clustering is very similar to using redundant servers and provides fault tolerance. A group of servers is logically combined into a cluster and seen as one device that provides a given service, with all servers taking part in processing that service simultaneously. If a device within the cluster fails, the service continues because the other devices keep processing the same service. The impact is reduced processing power, as the cluster is one device short, but high availability is maintained. The cluster therefore provides load balancing, because every member takes some of the processing load, and a form of failover, because the remaining devices continue processing when one device fails.
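As a minimal sketch of this behaviour, the cluster below spreads requests across whichever nodes are healthy, so a node failure reduces capacity without stopping the service. The `Cluster` class and the modulo-based distribution are assumptions chosen for illustration, not how any particular clustering product works.

```python
class Cluster:
    """Toy cluster that presents several nodes as one service."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.failed = set()

    def healthy(self):
        """Nodes that are currently able to process requests."""
        return [node for node in self.nodes if node not in self.failed]

    def fail(self, node):
        self.failed.add(node)

    def recover(self, node):
        self.failed.discard(node)

    def route(self, request_id: int):
        """Pick a healthy node for the request (simple modulo distribution)."""
        healthy = self.healthy()
        if not healthy:
            raise RuntimeError("service unavailable: every node has failed")
        return healthy[request_id % len(healthy)]


cluster = Cluster(["node1", "node2", "node3"])
print([cluster.route(i) for i in range(6)])  # load shared by three nodes
cluster.fail("node2")
print([cluster.route(i) for i in range(6)])  # two nodes carry the whole load
```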
What is Redundancy?
Redundancy is extra hardware or software that can be used as a backup if the main hardware or software fails. It can be achieved in an automated fashion through clustering, failover, RAID, load balancing and high availability. A higher level of redundancy is achieved when the backup is completely separate from the primary: for example, a backup internet line from a different ISP, which gives a completely separate physical link and connection from the primary internet connection, or a redundant piece of hardware that resides in another building.