Introducing Linux Clustering

Definitions of the term cluster abound, so before discussing the Linux-HA cluster solution, it makes sense to define exactly what I'm talking about. In its broadest sense, a cluster is a group of computers working together. More specifically, clusters of computers are installed for three reasons:

• To increase computing power

• To distribute workload between computers

• To increase the availability of applications

The first reason corresponds to a high-performance cluster, which is also referred to as a computational cluster. In such a cluster solution, different computers work together on the same task. The essence of such a cluster is that nodes are capable of sharing resources with each other: nodes in a high-performance cluster need to know about each other so that work can be handed to a node that has idle CPU cycles. In such a clustered environment, memory, CPUs, and storage can be shared by the most complex applications. Alternatively, you can use application-level high-performance clustering. In such a solution, not the entire node is part of the cluster, just an application that runs on the node. Such a solution can work well across the Internet. An example of an application-level high-performance cluster is the SETI@home project (http://setiathome.berkeley.edu/), where an application on a computer is activated whenever idle CPU cycles are available.
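To make the idea of many nodes working on one task more concrete, here is a minimal single-machine sketch, assuming the job can be split into independent work units. Worker threads stand in for cluster nodes, and the function names and the sum-of-squares "computation" are hypothetical placeholders, not how SETI@home or any cluster middleware actually works.

```python
from concurrent.futures import ThreadPoolExecutor


def make_work_units(start: int, stop: int, unit_size: int) -> list[range]:
    """Split the range [start, stop) into consecutive work units."""
    return [range(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]


def process_unit(unit: range) -> int:
    """Placeholder for the real computation: sum of squares over one unit."""
    return sum(n * n for n in unit)


def run_cluster_job(start: int, stop: int, unit_size: int, nodes: int) -> int:
    units = make_work_units(start, stop, unit_size)
    # Each worker thread plays the role of a node: whenever it is idle,
    # it takes the next pending unit from the executor's queue.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial_results = pool.map(process_unit, units)
        # Combine the partial results into the answer for the whole task.
        return sum(partial_results)


if __name__ == "__main__":
    print(run_cluster_job(0, 1_000, unit_size=100, nodes=4))  # prints 332833500
```

In a real computational cluster the units would travel over the network to other machines, but the pattern of splitting, distributing to idle workers, and combining stays the same.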

The second type of cluster, a load-balancing cluster, has as its goal distributing the workload between computers. In a load-balanced solution, different nodes host the same application, such as a web server. Some intelligence needs to be added to these cooperating nodes to distribute incoming requests among all hosting nodes; this intelligence is the load balancing. In its simplest form, a DNS server can act as a load balancer and distribute requests between different servers by using a technique known as round-robin. The idea is that two or more address (A) resource records are created in the DNS database, all for the same name but each pointing to a different IP address:

webserver.somewhere.com.   IN   A   192.168.0.10
webserver.somewhere.com.   IN   A   192.168.0.20
webserver.somewhere.com.   IN   A   192.168.0.30
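A toy model can show how round-robin spreads clients across these records. This sketch assumes that the DNS server rotates the order of the A records on every query and that each client simply connects to the first address in the answer. The `RoundRobinDNS` class is hypothetical and is not how any particular DNS server is implemented.

```python
from collections import Counter, deque


class RoundRobinDNS:
    """Toy name server that rotates the A records for a name on each query."""

    def __init__(self, records: dict[str, list[str]]) -> None:
        self._records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, name: str) -> list[str]:
        addrs = self._records[name]
        answer = list(addrs)
        addrs.rotate(-1)  # the next query sees the list starting one entry later
        return answer


if __name__ == "__main__":
    dns = RoundRobinDNS({"webserver.somewhere.com": ["192.168.0.10", "192.168.0.20", "192.168.0.30"]})
    # Assumption: every client uses the first address it receives.
    hits = Counter(dns.resolve("webserver.somewhere.com")[0] for _ in range(9))
    print(hits)
```

With nine queries, each of the three addresses is handed out first exactly three times. Note that nothing in the rotation looks at whether a server is up or how busy it is.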

The round-robin technique spreads requests fairly evenly across all nodes, but it has some disadvantages. The most important one is known as black holing: when one of the servers goes offline, the DNS server ordinarily has no way of knowing that the server is down and keeps handing out its address, so clients continue sending requests to the downed server. Another disadvantage is that the DNS server has no way of knowing how many spare CPU cycles each server can offer; in other words, it cannot differentiate between a heavily loaded server and one with relatively little to do. Also, if the clients at a site resolve names through a caching-only DNS server, round-robin may not work well for that site, because the caching server can hand the same cached answer to every client, so requests from that site tend to end up at the same IP address.
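The black-holing problem can be shown with a small simulation. This is a minimal sketch, assuming plain rotation over the three servers from the records above and a hypothetical failure of 192.168.0.20. The `send_requests` function is illustrative only.

```python
from itertools import cycle


def send_requests(servers: list[str], down: set[str], count: int) -> tuple[int, int]:
    """Return (served, lost) for round-robin distribution with no health checks."""
    served = lost = 0
    targets = cycle(servers)  # plain rotation, unaware of server state
    for _ in range(count):
        if next(targets) in down:
            lost += 1  # the request disappears into the "black hole"
        else:
            served += 1
    return served, lost


if __name__ == "__main__":
    servers = ["192.168.0.10", "192.168.0.20", "192.168.0.30"]
    # Hypothetical failure: the second web server has crashed.
    print(send_requests(servers, down={"192.168.0.20"}, count=30))  # prints (20, 10)
```

Because the rotation never consults server state, one request in three is lost for as long as the failed server stays in the rotation.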

Because of the drawbacks of DNS round-robin as a load-balancing solution, more specialized software is available. The most widely used open source package is the Linux Virtual Server (LVS) software; see http://www.linux-vs.org for more information. Many companies, however, prefer not to use an open source solution for load balancing and use specialized hardware instead. The advantage is that such hardware can perform tasks that are rather difficult to do in software alone, such as terminating SSL connections or maximizing throughput by using chips built specifically for that purpose.
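A dedicated load balancer can address both weaknesses of DNS round-robin: it can skip servers that fail health checks and prefer less busy ones. LVS offers several scheduling algorithms, among them round-robin and least-connection. The following is a minimal sketch of the least-connection idea, assuming that the number of active connections is a reasonable proxy for load. The `RealServer` and `LeastConnectionBalancer` classes are hypothetical illustrations, not LVS code.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    address: str
    healthy: bool = True  # assumed to be updated by a separate health checker
    active_connections: int = 0


class LeastConnectionBalancer:
    """Send each new connection to the healthy server with the fewest active ones."""

    def __init__(self, servers: list[RealServer]) -> None:
        self.servers = servers

    def pick(self) -> RealServer:
        candidates = [s for s in self.servers if s.healthy]  # avoid black holing
        if not candidates:
            raise RuntimeError("no healthy real servers available")
        chosen = min(candidates, key=lambda s: s.active_connections)
        chosen.active_connections += 1
        return chosen

    def release(self, server: RealServer) -> None:
        """Call when a connection to `server` closes."""
        server.active_connections -= 1
```

For example, with 192.168.0.10 holding five connections, 192.168.0.20 marked unhealthy, and 192.168.0.30 holding two, `pick()` returns 192.168.0.30. Unlike the DNS server, the balancer never sends traffic to the failed node.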

The third type of cluster is the high-availability cluster. As the name implies, its most important goal is to increase the availability of important services. High availability has become important in today's marketplace, where people want to purchase goods or services on a 24/7 basis. A high-availability cluster makes sure that when an important server in the cluster goes down, another server can take over its services in a matter of seconds. This is exactly what the Linux Heartbeat project addresses. In a well-configured Heartbeat cluster, the nodes monitor each other, and when one of them goes down, a signal is generated so that another node can take over its work. As a result, users notice no significant downtime and can simply continue working with the application involved.
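The monitoring-and-takeover cycle can be sketched in a few lines. This assumes that each node periodically sends a heartbeat message and that a peer is declared dead when nothing has been heard from it for longer than a configured timeout. The `PeerMonitor` class, the `plan_takeover` function, and the resource names are hypothetical and do not reflect Heartbeat's actual implementation or configuration.

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Declare a peer dead when no heartbeat arrives within `deadtime` seconds."""
    deadtime: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now  # record when we last heard from `node`

    def dead_nodes(self, now: float) -> list[str]:
        return [n for n, t in self.last_seen.items() if now - t > self.deadtime]


def plan_takeover(monitor: PeerMonitor, resources: dict[str, str],
                  survivor: str, now: float) -> dict[str, str]:
    """Reassign every resource owned by a dead node to the surviving node."""
    dead = set(monitor.dead_nodes(now))
    return {res: (survivor if owner in dead else owner) for res, owner in resources.items()}


if __name__ == "__main__":
    monitor = PeerMonitor(deadtime=10.0)
    monitor.heartbeat("node1", now=0.0)
    monitor.heartbeat("node2", now=0.0)
    monitor.heartbeat("node2", now=8.0)  # node1 has gone silent
    resources = {"service_ip": "node1", "httpd": "node1"}
    print(plan_takeover(monitor, resources, survivor="node2", now=12.0))
    # prints {'service_ip': 'node2', 'httpd': 'node2'}
```

Timestamps are passed in explicitly to keep the example deterministic; a running monitor would read a monotonic clock such as `time.monotonic()` instead. The timeout is a trade-off: a short one gives faster takeover but risks declaring a merely slow node dead.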
