Designing an HA Cluster Solution

To create a properly functioning HA cluster solution, you should first understand how this type of cluster works. In this section, I'll answer the following questions:

• What exactly can you fail over?

• How many hosts can be configured in an HA cluster?

• How do these hosts monitor each other?

• What happens when a node fails?

The Linux-HA suite works with software resources. Basically, a resource is a service that is managed by the Heartbeat daemons. To configure a resource, the first element you need is a load script that controls the resource. This script plays a role similar to that of the scripts in /etc/init.d, which are used to start Linux services; typically, these load scripts live in /etc/init.d or in /etc/ha.d/resource.d. Next, you need a generic file that refers to these resource load scripts. This file is /etc/ha.d/haresources in Heartbeat 1 and the Cluster Information Base (CIB) in Heartbeat 2. In it, you first list the primary node where the resource should run, followed by the name of the resource and, if required, some options. In a two-node cluster, the other node automatically becomes the backup server. Heartbeat 1 supported only two cluster nodes, whereas Heartbeat 2 supports a maximum of 16 nodes.
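
For example, a Heartbeat 1 haresources entry could look like the following; the node name and IP address are hypothetical, and the IPaddr and apache load scripts are assumed to be available in /etc/ha.d/resource.d and /etc/init.d, respectively:

    node1 IPaddr::192.168.1.100 apache

Here node1 is the primary node, IPaddr::192.168.1.100 assigns the cluster IP address, and apache starts the web server through its load script. If node1 fails, the other node in the two-node cluster takes over these resources.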

In addition to the cluster resources, you need a configuration file in which the Heartbeat protocol settings are configured. These settings determine how often Heartbeat packets are sent over the network; the cluster nodes depend on these packets to determine whether the other nodes are still alive. Heartbeat packets typically are not sent over the network that users employ to connect to the offered services, because in that scenario a misconfigured switch could cause a node to be cast out of the cluster simply because it is no longer reachable. In a two-node cluster you can use a dedicated serial connection, whereas in the multinode clusters that Heartbeat 2 supports, you should configure a dedicated network for higher fault tolerance. A dedicated Ethernet network serves multiple purposes, because you can use it to synchronize shared file systems at the same time.
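
The following is a minimal sketch of such a configuration file, /etc/ha.d/ha.cf; the interface name, node names, and timer values are illustrative and should be adapted to your environment:

    # send a heartbeat every 2 seconds
    keepalive 2
    # consider a node dead after 30 seconds of silence
    deadtime 30
    # issue a late-heartbeat warning after 10 seconds
    warntime 10
    # allow extra time for the first heartbeat after booting
    initdead 120
    # dedicated heartbeat interface and an optional serial link
    bcast eth1
    serial /dev/ttyS0
    node node1
    node node2

With these settings, a node that stays silent on all heartbeat media for 30 seconds is considered dead, and its resources are taken over by the surviving node.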

You must also be careful to prevent what's known as a split-brain condition. A split brain arises when more than one node in the cluster thinks it owns the clustered resources. Such a situation can become critical if, for example, two nodes try to modify the same database at the same time, because this may eventually corrupt the database. To prevent this, you must provide a solution that forcibly shuts down a failing node when a split brain is detected. In Heartbeat, a STONITH ("shoot the other node in the head") device is used for this purpose.
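
STONITH devices are also declared in ha.cf, using the stonith_host directive. The following is a sketch that assumes the ssh STONITH plugin and hypothetical node names; ssh-based STONITH is suitable for testing only, because a node that has really died can no longer be reached over ssh:

    # any node may shoot node1 or node2 over ssh
    stonith_host * ssh node1 node2

In a production cluster, you would instead use a power switch or management board supported by one of the hardware STONITH plugins.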

In summary, you should dedicate at least two physical network paths to the cluster. First, there is the network used to access the cluster. Second, there is the network used to send Heartbeat messages and to synchronize shared storage. Optionally, a third physical path can be configured for administrative access to the nodes in your cluster. Whatever solution you use for the Heartbeat network, make sure no routing occurs between the user access network and the Heartbeat network, to prevent unwanted traffic paths.
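
One straightforward way to keep the Heartbeat network separate is to put the dedicated interface on a private, non-routed subnet. The interface names and addresses below are hypothetical:

    # on node1: eth0 carries user traffic, eth1 carries heartbeat and storage sync
    ip addr add 10.0.0.1/24 dev eth1
    # on node2
    ip addr add 10.0.0.2/24 dev eth1

As long as no gateway is configured for this subnet, Heartbeat traffic cannot leak onto the user access network.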
