
Chapter 8: Heartbeat Support

This chapter describes how to use the high-availability resources file, haresources, to control resources on a pair of servers[1] running Heartbeat. We will also explore some of the common maintenance tasks required to keep the Heartbeat high-availability system functioning properly.

The Haresources File Syntax

The /etc/ha.d/haresources file must be the same on both the primary and the backup Heartbeat servers.

Each line in the haresources file usually contains the following:

  • The name of the server where the resource should normally run (the primary server), followed by a space or tab.

  • An (optional) IP alias that Heartbeat should add to the system before launching the resource, followed by a space. (The IP alias definition may include a network subnet mask and a broadcast address separated from each other by the forward slash (/) character.)

  • A resource script (the script used to start and stop the resource) located in either the /etc/init.d or the /etc/ha.d/resource.d directory.[2] If arguments need to be passed to the resource script, they are added after two colons and are separated from each other by two colons.

Additional resource scripts can be added to the line using the space character as a separator.

Note 

If you need to create a haresources line that is longer than the line of text that fits on your screen, you can add the backslash character (\) to indicate that the haresources entry continues on the next line.
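
For example, a long entry could be split across two lines like this (the resource names here are only placeholders):

 primary.mydomain.com 209.100.100.3 resource1::arg1 \
      resource2 resource3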

A simplified summary of this syntax, for a single line with two resources, each with two arguments, looks like this:

 primary-server [IPaddress] resource1[::arg1::arg2] [resource2[::arg1::arg2]]
 

In practice, this line might look like the following on a server called primary.mydomain.com running sendmail and httpd on IP address 209.100.100.3:

 primary.mydomain.com 209.100.100.3 sendmail httpd
 

Let's examine each element of the haresources file in more detail.

Haresources File Syntax: Primary-Server Name

The primary-server name you enter at the start of the haresources line should match one of the server names you've already specified in the /etc/ha.d/ha.cf file. It should also match the name returned by the uname -n command on the primary server.

Haresources File Syntax: IP Alias

Although it is not required, an IP alias[3] is usually specified in the haresources file. This IP alias can then be offered from either the primary or the backup server, depending upon which system is healthy. For example:

 primary.mydomain.com 209.100.100.3
 
 

Heartbeat will add 209.100.100.3 as an IP alias to one of the existing NICs connected to the system and send Gratuitous ARP[4] broadcasts out of this NIC to the locally connected computers when it first starts up. It will only do this on the backup server if the primary server goes down.

Actually, when Heartbeat sees an IP address in the haresources file, it runs the resource script in the /etc/ha.d/resource.d directory called IPaddr and passes it the requested IP address as an argument. The /etc/ha.d/resource.d/IPaddr script then calls the program included with the Heartbeat package, findif (find interface), and passes it the IP alias you want to add. This program then automatically selects the physical NIC that this IP alias should be added to, based on the kernel's network routing table.[5] If the findif program cannot locate an interface to add the IP alias to, it will complain in the /var/log/messages file with a message such as the following:

 heartbeat: ERROR: unable to find an interface for 209.100.100.3
 
 
Note 

You cannot place the primary IP address for the interface in the haresources file. Heartbeat can add an IP alias to an existing interface, but it cannot be used to bring up the primary IP address on an interface.

Heartbeat's Automated Network Interface Card Selection Process

Heartbeat uses the findif program to select the NIC to which the IP alias you specify in the haresources file will be added, by comparing the IP alias to each of the destination addresses listed in your kernel's network routing table. As described in Chapter 2, you can view this routing table with the route -n command. For example:

 #route -n
 Kernel IP routing table
 Destination      Gateway        Genmask          Flags    Metric    Ref    Use   Iface
 209.100.100.2    0.0.0.0        255.255.255.0    U        0         0      0     eth1
 10.1.1.0         0.0.0.0        255.255.255.0    U        0         0      0     eth2
 127.0.0.0        0.0.0.0        255.0.0.0        U        0         0      0     lo
 0.0.0.0          209.100.100.1  0.0.0.0          UG       0         0      0     eth1
 
 
Note 

The output of this command is based on the entries the kernel stores in the /proc/net/route file; the routing table is built each time the system boots by the route commands in the /etc/init.d/network script on a Red Hat system. See Chapter 2 for an introduction to the kernel network routing table.

When findif is able to match the network portion of the destination address in the routing table with the network portion of the IP alias from the haresources file, it returns the interface name (eth1, for example) associated with the destination address to the IPaddr script. The IPaddr script then adds the IP alias to this local interface and adds another entry to the routing table to route packets destined for the IP alias to this local interface. So the routing table, after the 209.100.100.3 IP alias is added, would look like this:

 #route -n
 Kernel IP routing table
 Destination      Gateway        Genmask          Flags    Metric    Ref    Use   Iface
 209.100.100.2    0.0.0.0        255.255.255.0    U        0         0      0     eth1
 209.100.100.3    0.0.0.0        255.255.255.0    U        0         0      0     eth1
 10.1.1.0         0.0.0.0        255.255.255.0    U        0         0      0     eth2
 127.0.0.0        0.0.0.0        255.0.0.0        U        0         0      0     lo
 0.0.0.0          209.100.100.1  0.0.0.0          UG       0         0      0     eth1
 

The IPaddr script has executed the command route add -host 209.100.100.3 dev eth1.

Finally, to complete the process of adding the IP alias to the system, the IPaddr script sends out five gratuitous ARP broadcasts to inform locally connected computers that this IP alias is now associated with this interface.
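
After the takeover you can confirm the result with ordinary networking commands (the interface name eth1 comes from the earlier example; depending on how the alias was added, it may also show up as a labeled alias such as eth1:0):

 #ip addr show eth1     # the 209.100.100.3 alias should now be listed
 #route -n              # the new route for 209.100.100.3 should appear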

Note 

If more than one entry in the routing table matches the IP alias being added, the findif program will use the metric column in the routing table to select the interface with the fewest hops (the lowest metric).

Finding the Right Network

To compare the network portion of the IP alias to the network portion of the "destination" address entry in the routing table, findif needs to know which portion of the address represents a network and which portion represents a node. In other words, it needs to know which network mask to apply to both the destination address in the routing table and the IP alias from the haresources file before it can determine whether the network portions of the two addresses match.

The findif program will use the network mask for each entry in the routing table and apply it to both the routing table entry and the IP alias being requested to see if they match. If the two network portions of the addresses match, they are on the same network, and findif knows to add the IP alias being requested to the interface associated with this routing table entry.
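
As an illustration of this comparison, here is how it works out for the 209.100.100.3 alias and the first routing table entry shown above:

 Routing table entry:  209.100.100.2  AND 255.255.255.0  =  209.100.100.0
 Requested IP alias:   209.100.100.3  AND 255.255.255.0  =  209.100.100.0

The network portions match, so findif selects the interface associated with this entry (eth1).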

The findif program will not use the default route specified in your routing table (the entry with a destination address of 0.0.0.0) unless you specify a network mask in your haresources file. For example, the entry:

 primary.mydomain.com 209.100.100.3/24 myresource
 

says that Heartbeat should use a network mask of 255.255.255.0. It also says that if no other routing table entry matches this address (after each entry's network mask is applied), the default route in the routing table should be used.

However, under normal circumstances your routing table has an entry that will match your IP alias correctly without the need to consult the default route, so you will probably never need to enter the network mask in the haresources file. In the above routing table, for example, before Heartbeat added the IP alias, the first entry looked like this:

 Destination       Gateway      Genmask         Flags     Metric    Ref    Use    Iface
 209.100.100.2     0.0.0.0      255.255.255.0   U         0         0      0      eth1
 

This entry matches the 209.100.100.3 IP alias once the 255.255.255.0 netmask from this routing table entry is applied to both addresses for the comparison (both addresses are on the 209.100.100 network). So the correct interface (eth1 in this case) is selected even though the default route was not used in the interface selection process.

Specifying a Network Interface Card

You can avoid this auto-selection process by specifying the interface Heartbeat should use in the haresources file with an entry like this:

 primary.mydomain.com 209.100.100.3/24/eth0/209.100.100.255
 

This entry uses the following syntax:

 primary-server IPalias/number-of-netmask-bits/interface-name/broadcast-address
 

Thus, in this example, the IP alias is 209.100.100.3 with a 24-bit netmask (equivalent to a network mask of 255.255.255.0) on network interface card eth0, using a broadcast address of 209.100.100.255.

To specify this as the IP alias and interface to be used for the httpd daemon, enter the following line:

 primary.mydomain.com 209.100.100.3/24/eth0/209.100.100.255 httpd
 

With this entry in the haresources file, Heartbeat will always use the eth0 interface for the 209.100.100.3 IP alias when adding it to the system for the httpd daemon to use.[6]

Customizing IP Address Takeover with the iptakeover Script

If you need to modify the routing table as part of the process of taking over an IP address, then you may want to use the iptakeover script described in Chapter 7 to perform Gratuitous ARP broadcasts yourself. To use this script, add a line like the following to the haresources file.

 primary.mydomain.com iptakeover myresource
 

Heartbeat will then run /etc/ha.d/resource.d/iptakeover status followed by /etc/ha.d/resource.d/myresource start. This makes it possible for you to decide exactly which interface you want your IP alias to appear on, to modify the routing table if need be, and to perform the Gratuitous ARP broadcasts from a single script. However, under most circumstances this is not required and should be avoided if possible—specify an IP address in the haresources file and let Heartbeat do all of this work for you.[7]

The Haresources File Syntax: Resources

Each line in the haresources file can contain one or more resource script names. The resources are separated by a space. Arguments can be passed to the resource scripts using two colons between the arguments. For example, if you need Heartbeat to send a special argument (let's say FILE1) to your resource script (before the word start, status or stop), you would use the syntax:

 primary.mydomain.com myresource::FILE1
 

Assuming you add this line to the haresources file on both the primary and the backup server, Heartbeat will run[8] /etc/ha.d/resource.d/myresource FILE1 start when it first starts on the primary server, and then again on the backup server when the primary server fails. When the resource needs to be "released" or stopped, Heartbeat will run the script with the command /etc/ha.d/resource.d/myresource FILE1 stop.
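
A minimal sketch of what such a resource script might look like follows (the script name myresource and the FILE1 argument are just the placeholders used above; a real script would start and stop an actual daemon):

 #!/bin/sh
 # /etc/ha.d/resource.d/myresource -- illustrative sketch only
 # Heartbeat invokes it as: myresource FILE1 start|stop|status
 FILE=$1      # the argument supplied with the :: syntax in haresources
 ACTION=$2    # start, stop, or status, appended by Heartbeat

 case "$ACTION" in
     start)  echo "starting myresource using $FILE" ;;
     stop)   echo "stopping myresource using $FILE" ;;
     status) echo "myresource is running" ;;
     *)      echo "usage: $0 <file> {start|stop|status}" ; exit 1 ;;
 esac
 exit 0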

If we wanted to combine our iptakeover script with the myresource script and its FILE1 argument, we would use the line:

 primary.mydomain.com iptakeover myresource::FILE1
 

To send your resource script several arguments, enter them all on the same line after the script name with each argument separated by a pair of colons. For example, to send the myresource script the arguments FILE1, UNAME=JOHN, and L3, your haresources entry would look like this:

 primary.mydomain.com iptakeover myresource::FILE1::UNAME=JOHN::L3
 
 
Note 

The haresources syntax is also documented online at http://wiki.trick.ca/linuxha/HeartbeatResourceAgent.

Resource Groups

Until now, we have only described resources as independent entries in the haresources file. In fact, Heartbeat considers all the resources specified on a single line as one resource group. When Heartbeat wants to know if a resource group is running, it asks only the first resource specified on the line. Multiple resource groups can be added by adding additional lines to the haresources file.

For example, if the haresources file contained an entry like this:

 primary.mydomain.com iptakeover myresource::FILE1::UNAME=JOHN::L3
 

only the iptakeover script would be called to ask for a status when Heartbeat was determining the status of this resource group. If the line looked like this instead:

 primary.mydomain.com myresource::FILE1::UNAME=JOHN::L3 iptakeover
 

Heartbeat would run the following command to determine the status of this resource group (assuming the myresource script was in the /etc/ha.d/resource.d directory):

 /etc/ha.d/resource.d/myresource FILE1 UNAME=JOHN L3 status
 
 
Note 

If you need your daemon to start before the IP alias is added to your system, enter the IP address after the resource script name with an entry like this:

     primary.mydomain.com myresource IPaddr::200.100.100.3
 

Resource Script Arguments and Resource Groups

To combine Heartbeat's ability to send arguments to a script with its ability to create multiple resource groups, you could, for example, write one script that started both SERVICE-A and SERVICE-B based upon the argument it was passed. For example, let's call this combined resource script resAB and assume it can handle the argument SERVICE-A or SERVICE-B followed by the word start, stop, or status to control both daemons. You could then create haresources entries like this:

 primary.mydomain.com resAB::SERVICE-A
 primary.mydomain.com resAB::SERVICE-B
 

Using these haresources entries, when Heartbeat wanted to know if these resource groups were active, it would run the commands:

 /etc/ha.d/resource.d/resAB SERVICE-A status
 /etc/ha.d/resource.d/resAB SERVICE-B status
 
 

and it would start the resource group by executing:

 /etc/ha.d/resource.d/resAB SERVICE-A start
 /etc/ha.d/resource.d/resAB SERVICE-B start
 

[1]In this book, we are only concerned with Heartbeat's ability to failover a resource from a primary server to a backup server.

[2]Recall from Chapter 6 that this is the system init directory (/etc/init.d, /etc/rc.d/init.d/, /sbin/init.d, /usr/local/etc/rc.d or /etc/rc.d) and the Heartbeat resource directory (/etc/ha.d/resource.d).

[3]IP aliases were introduced in Chapter 6.

[4]GARP broadcasts were also introduced in Chapter 6.

[5]See "Routing Packets with the Linux Kernel" in Chapter 2.

[6]Note that the resource daemon (httpd in this case) may also need to be configured to use this interface or IP address.

[7]Also, as we'll see in Part III, you'll want to leave IP alias assignment under Heartbeat's control so Ldirectord can failover to the backup load balancer.

[8]This assumes the script myresource is located in the /etc/ha.d/resource.d directory and not in the /etc/init.d directory—in which case the command would be /etc/init.d/myresource FILE1 start.

Load Sharing with Heartbeat

Using Heartbeat and two computers, we can offer one daemon, or service, from the primary server and then offer a different service from the backup server. If either server fails, the other one will start offering both services.

This is a form of load sharing called an active-active server configuration, but to use it you will have to ensure that each system has the processing power and network capacity to offer both services in the event of a failure. This configuration, however, is much more difficult to administer and support in a production environment: neither server can go down for maintenance or upgrades without causing a failover of at least one service. Figure 8-1 shows a sample diagram of this type of Heartbeat configuration.

Figure 8-1: Heartbeat active-active configuration

The two-line haresources entry used to create the configuration shown in Figure 8-1 (again, the primary and the backup server should always have identical haresources files) looks like this:

 primary.mydomain.com 209.100.100.3 sendmail
 backup.mydomain.com 209.100.100.4 httpd
 

Once Heartbeat is running on both servers, the primary.mydomain.com computer will offer sendmail at IP address 209.100.100.3, and the backup.mydomain.com computer will offer httpd at IP address 209.100.100.4. If the backup server fails, the primary server will perform Gratuitous ARP broadcasts for the IP address 209.100.100.4 and run httpd with the start argument. (The names "primary" and "backup" are really meaningless in this configuration because both computers are acting as both a primary and a backup server.) Client computers would always look for the sendmail resource at 209.100.100.3 and the httpd resource at 209.100.100.4, and even if one of these servers went down, the resource would still be available once Heartbeat started the resource and moved the IP alias over to the other computer.

Note 

This configuration is more difficult to administer than an active-standby configuration, because you will always be making changes on a "live" system unless you failover all resources so they run on a single server before you do your maintenance.

This configuration is also not recommended when using local data (data stored on a locally attached disk drive of either server) because complex data replication and synchronization methods must be used.[9]

Load Sharing with Heartbeat: Round-Robin DNS

But what if you wanted to offer just one resource from both computers and have them share the work? This is the goal of the cluster described in this book, and it will be possible to attain that goal using load balancing software described in Part III. However, for now we can achieve a simple form of load balancing using round-robin DNS.

A feature of the Domain Name System called round-robin DNS allows you to offer one service on two (or more) IP addresses. For example, the host name (or web URL) is first resolved to IP address 209.100.100.4, then to 209.100.100.3, then back to 209.100.100.4 again, and so on in round-robin fashion. The DNS (BIND version 4.9.3 or later) entries for your server might look like this:

 ;Round-robin entry for www.mydomain.com
 www.mydomain.com    IN  A  209.100.100.3
 www.mydomain.com    IN  A  209.100.100.4
 

with reverse address entries for the 209.100.100.3 and 209.100.100.4 IP addresses that look like this:

 3         IN PTR www.mydomain.com
 4         IN PTR www.mydomain.com
 
 

The entry in your haresources file for this type of configuration would then look like this:

 primary.mydomain.com 209.100.100.3 httpd
 backup.mydomain.com 209.100.100.4 httpd
 

Problems with Round-Robin DNS Load Balancing

When using round-robin DNS, the two servers would, in theory, each get half of the client requests, and Heartbeat would ensure that both IP addresses are available even if one of the servers goes down. Most client computers, however, have a name service caching daemon (nscd) that will cause them to remember (at least for a while) an IP address once they learn it. Storing this IP address on the client computer reduces the need for the client computer to repeatedly ask, "What is the IP address for this host name?" and helps improve the chances that client computers will not enter into a dialog (such as an HTTPS secure transaction) with one web server only to end up improperly sending a response (such as a credit card number) to another web server.

Caching of IP address-to-host name mappings can also cause a cache-only DNS server on the Internet to respond to client requests for an IP address with a nonauthoritative reply that uses only one of the IP addresses. This intervening, cache-only DNS server effectively blocks the round-robin DNS replies from your authoritative DNS server.

You can try to stop this behavior by setting a very low time to live (TTL) value for your DNS replies. Once the amount of time specified in your time to live entry elapses, the intervening DNS server should drop the IP address-to-host name mapping it has stored in its memory and ask your authoritative DNS server once again for the proper IP address.
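
For example, the round-robin zone entries shown earlier could be given a low per-record TTL like this (the 60-second value is only illustrative):

 ;Round-robin entries with a low (60-second) TTL
 www.mydomain.com    60  IN  A  209.100.100.3
 www.mydomain.com    60  IN  A  209.100.100.4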

But there's a problem: If everyone on the Internet started using a one-second DNS TTL value, every client computer on the Internet would effectively end up asking only the authoritative DNS server for the correct IP address when doing a host name-to-IP address resolution. This would circumvent the design of the DNS system whereby intervening DNS servers cache this information and thus reduce both the time it takes to resolve a host name and the amount of DNS traffic that has to be passed around on the Internet. (A very low TTL value for DNS entries means a more heavily loaded authoritative DNS server.[10])

Using round-robin DNS and this type of Heartbeat configuration for load sharing, however, lets you locate your two servers at two different physical locations. In the event of a true disaster, one of the servers would be able to take over both (or all) IP addresses, and once the routers on the Internet learned where the IP addresses had moved, client computers would eventually be able to connect to your web server at its new location.

Note 

This configuration is very susceptible to a split-brain condition. If exclusive access to resources is required, automatic failover mechanisms that require heartbeats to traverse a WAN or a public network should not be used.

Wide-Area Load Balancing

Now, what if the client computer on the Internet is much closer to one of the two web servers offering the same web page and it accidentally ends up with the IP address that happens to be on a server located on the other side of the world? It would make sense to offer your resources from the servers closest to the Internet client and only force the client to route to another server in the event of a failure or system crash.

This is called wide-area load balancing or globally distributed content and can be accomplished with an open source program for Linux called Super Sparrow. A server running Super Sparrow will examine the client's source address and determine whether the client computer would be better off talking to a different server with synchronized content that is closer to the client computer. If so, the client computer's request is redirected to the closest available server. (For more information, see the Super Sparrow website at http://www.supersparrow.org.)

[9]See Chapter 7, "Failover Configurations and Issues," in Blueprints for High Availability by Evan Marcus and Hal Stern.

[10]To prevent this problem from ever happening, many DNS clients and servers ignore small TTL values and use cached information anyway.

Operator Alerts: Audible Alarm

To cause an alarm to sound when the backup server has to take over for the primary server, use a haresources entry like this:

 primarynode AudibleAlarm::primarynode
 

This entry says that the host named primarynode should normally "own" the resource AudibleAlarm, but that the AudibleAlarm should never sound on the primary server. The AudibleAlarm resource, or script, allows you to specify a list of host names that should never sound an alarm. In this case, we are telling the AudibleAlarm script not to run or sound on primarynode. When a failover occurs, Heartbeat running on the backup server will sound the alarm (an audible beep every second).

Note 

These haresources entries rely on the scripts /etc/ha.d/resource.d/AudibleAlarm and /etc/ha.d/resource.d/MailTo. These scripts are located in the chapter8 subdirectory on the CD-ROM. You can also download them from the Linux-ha CVS repository (see the heartbeat/resource.d directory for additional scripts).

The AudibleAlarm script can also be modified to flash the floppy drive light if you have installed the fdutils package. The fdutils package contains a utility called floppycontrol. You can easily download and compile the fdutils package (download the tar file from http://fdutils.linux.lu, then run ./configure, then make) and uncomment the lines in the /etc/ha.d/resource.d/AudibleAlarm script to make the floppy drive light flash.
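
The build steps look roughly like this (the exact tar file name depends on the version you download):

 tar xzf fdutils-<version>.tar.gz
 cd fdutils-<version>
 ./configure
 make
 make install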

Operator Alerts: Email Alerts

To send an email alert, use the MailTo resource script with a haresources entry like this:

 primarynode MailTo::operator@mailhost.com,root@mailhost.com
 

or to specify a subject line for an email alert, use:

 primarynode MailTo::operator@mailhost.com,root@mailhost.com::Mysubject
 

Heartbeat Maintenance

Thanks to the fact that Heartbeat resource scripts are called by the heartbeat daemon with start, stop, or status requests, you can restart a resource without causing a cluster transition event. For example, say your Apache web server daemon is running on the primary.mydomain.com web server, and the backup.mydomain.com server is not running anything; it is waiting to offer the web server resource in the event of a failure of the primary computer. If you needed to make a change to your httpd.conf file (on both servers!) and you wanted to stop and restart the Apache daemon on the primary computer, you would not want this to cause Heartbeat to start offering the service on the backup computer. Fortunately, you can run the /etc/init.d/httpd restart command (or /etc/init.d/httpd stop followed by the /etc/init.d/httpd start command) without causing any change to the cluster status as far as Heartbeat is concerned.

Thus, you can safely stop and restart all of the cluster resources Heartbeat has been asked to manage, with perhaps the exception of filesystems, without causing any change in resource ownership or causing a cluster transition event. Of course, many daemons will also recognize the SIGHUP (or kill -HUP <process-ID-number>) command as well, so you can force a resource daemon to reload its configuration files after making a change without stopping and restarting it.

Again, in the case of the Apache httpd daemon, if you change the httpd.conf file and want to notify the running daemons of the change, you would send them the SIGHUP signal with the following command:

 #kill -HUP `cat /var/run/httpd.pid`
 
 
Note 

The file containing the httpd parent process ID number is controlled by the PidFile entry in the httpd.conf file (this file is located in the /etc/httpd/conf directory on Red Hat Linux systems).
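
A typical PidFile entry (the exact path may differ on your distribution) looks like this:

 PidFile /var/run/httpd.pid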

Changing Heartbeat Configuration Files

If you need to make a change to the Heartbeat configuration files /etc/ha.d/authkeys or /etc/ha.d/ha.cf, you can force the running heartbeat daemon to reload these configuration files with the following command:

 #/etc/init.d/heartbeat reload
 

or

 #service heartbeat reload
 

When you change a haresources file, you need to restart Heartbeat on both the primary and the backup server to make your changes take effect (the reload option will not work).

Server Maintenance and the Heartbeat auto_failback Option

Normally, when the primary server crashes and the backup server takes ownership of a resource, the backup server will only hold this resource until the primary server comes back up. Once the primary server is up and running again, the backup server will release the resource and the primary server will assume ownership once again; it will start the resource script and start offering the service to client computers. This is the default heartbeat failback configuration.

To modify this Heartbeat behavior, add the following line before the node entries in your /etc/ha.d/ha.cf file (prior to version 1.1.2 of Heartbeat):

 nice_failback on
 

For Heartbeat versions 1.1.2 and later, the syntax is more intuitive:

 auto_failback off
 

These options tell Heartbeat to leave the resource on the backup server even after the primary server comes back on line. Make this change to the ha.cf file on both heartbeat servers and then issue the following command on both servers to tell Heartbeat to re-read its configuration files:

 #/etc/init.d/heartbeat reload
 

You should see a message like the following in the /var/log/messages file:

 heartbeat[1032]: info: nice_failback is in effect.
 

This configuration is useful when you want to perform system maintenance tasks that require you to reboot the primary server. When you take down the primary server and the resources are moved to the backup server, they will not automatically move back to the primary server.

Once you are happy with your changes and want to move the resource back to the primary server, you would remove the nice_failback option from the ha.cf file and again run the following command on the backup server.

 #/etc/init.d/heartbeat reload
 

(You do not need to run this command on the primary server because we are about to restart the heartbeat daemon on the primary server anyway.) Now, force the resource back over to the primary server by entering the following command on the primary server:

 #/etc/init.d/heartbeat restart
 
 
Note 

If you want auto_failback turned off as the default, or normal, behavior of your Heartbeat configuration, be sure to place the auto_failback option in the ha.cf file on both servers. If you neglect to do so, you may end up with a split-brain condition; both servers will think they should own the cluster resources. The setting you use for the auto_failback (or the deprecated nice_failback) option has subtle ramifications and will be discussed in more detail in Chapter 9.

Forcing the Primary Server into Standby Mode

In a two-node Heartbeat configuration, you can force the primary server to relinquish its resources, without stopping Heartbeat, by forcing the primary server into standby mode. This causes the backup server to start the resource scripts and take ownership of the Heartbeat resources. Run this command as root on the primary server:

 #/usr/lib/heartbeat/hb_standby
 

The primary server will not go into standby mode if it cannot talk to the heartbeat daemon on the backup server. In Heartbeat versions prior to 1.1.2, the hb_standby command required nice_failback to be turned on. With the change in syntax from nice_failback to auto_failback, Heartbeat no longer requires auto_failback to be turned off. However, if you are using an older version of Heartbeat that still supports nice_failback, it must still be turned on to use the hb_standby command.

In this example, the primary server (where we ran the hb_standby command) is requesting that the backup server take over the resources. When the backup server receives this request, it asks the primary server to release its resources. If the primary server does not release the resources,[11] the backup server will not start them.

Note 

The hb_standby command allows you to specify an argument of local, foreign, or all to specify which resources should go into standby (or failover to the backup server).
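
For example, to ask the peer server to take over all of the resources currently held by this node, you would run something like:

 #/usr/lib/heartbeat/hb_standby all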

Tuning Heartbeat's Deadtime Value

Sometimes Heartbeat will report that it cannot hear its own heartbeat, or that heartbeat times are too long. If the heartbeat logs indicate that a heartbeat was not received within the deadtime timeout period and the backup server tried to take over for the primary server when you did not want it to, you need to properly tune your deadtime value to account for system and network environmental conditions that may be causing heartbeats to get lost or to not be heard. This can occur on systems that are heavily loaded with network processing tasks, or even with heavy CPU utilization.

To tune the heartbeat deadtime value for these conditions, set the deadtime value to a large value such as 60 seconds or higher, and set the warntime value to the number of seconds you would like to use for your deadtime value.

Now run the system for a few weeks and carefully watch the /var/log/messages file and the logfile /var/log/ha-log for warntime messages indicating the longest period of time your system went without hearing a heartbeat. Armed with that information, set your warntime to this amount, and multiply this warntime value by 1.5 to 2 to arrive at the smallest possible value you should use for your deadtime. Leave logging enabled and continue to monitor your logs to make sure you have not set the value too low.
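
While you are tuning, the relevant timing entries in ha.cf might look something like this (the values shown are only starting points, not recommendations):

 keepalive 2      # seconds between heartbeats
 warntime 10      # log a warning if no heartbeat is heard for this many seconds
 deadtime 60      # deliberately high while tuning; declare the peer dead after this long
 initdead 120     # deadtime used right after startup, while the network comes up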

Informational Messages in Heartbeat's Log

You may see messages such as the following in your message log file:

 heartbeat: info: RealMalloc stats: 976 total malloc bytes. pid [369/HBREAD
 heartbeat: info: MSG stats: 0/441708 age 0 [pid370/MST_STATUS]
 heartbeat: info: ha_malloc stats: 0/9035987 0/0
 

These messages will appear every 24 hours, beginning when Heartbeat was started. After a few days of operation, the total number of bytes used should not grow. These informational messages from Heartbeat can be ignored when Heartbeat is operating normally.

Failover and Respawn (Automatically Restarting Failed Resources)

On a normal Unix/Linux server, the init daemon will start daemons (usually serial line communication services or tty related services) based on entries in the /etc/inittab file. If the entry contains the word respawn, init will monitor the daemon and restart it if it dies for any reason.

If you need to run a service that needs to be restarted or respawned automatically when it fails, you have a few options:

  • If the application should run on both the primary and the backup server all of the time, create an entry for the application in /etc/inittab (see the man page for inittab for syntax details).

  • If the application should only run when Heartbeat is running, you can create a respawn entry for the service in the /etc/ha.d/ha.cf file that looks like this:

     respawn root /usr/sbin/faxgetty ttyQ01e0
     

    This line tells Heartbeat to run the /usr/sbin/faxgetty program and pass it the argument ttyQ01e0. Heartbeat will do this when it first starts up on both the primary and the backup Heartbeat servers (recall that the Heartbeat configuration files should always be the same on both servers).

  • If the application should run on only one of the Heartbeat servers at a time, you will have to implement a service under Heartbeat's control that knows how to restart failed daemons, such as the Daemontools package (http://cr.yp.to/daemontools.html), or that can use the cl_respawn utility included with the Heartbeat package. To use the cl_respawn utility, create a line such as the following in the start section of your resource script:

     cl_respawn /usr/sbin/faxgetty ttyQ01e0
     

    When Heartbeat calls your resource script and passes it the start argument, it will then run the utility cl_respawn with the arguments /usr/sbin/faxgetty and ttyQ01e0. The cl_respawn utility is a small program that is unlikely to crash, because it doesn't do much—it just hangs around in the background and watches the daemon it started (faxgetty in this example) and restarts the daemon if it dies. (Run cl_respawn -h from a shell prompt for more information on the capabilities of this utility.)

License Manager Failover

A license manager daemon such as lmgrd from GlobeTrotter Software can be configured to failover in conjunction with an IP address, just like any other daemon. However, before you can use a license manager, you will need a second set of licenses from your software vendor for the backup server's hostid. Some software vendors will allow you to have two sets of licenses if you agree to use the second set only when the primary license server goes down.
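
Assuming you have written an init-style script for the license manager (called lmgrd here only for illustration) and placed it in one of the resource script directories, the haresources entry might pair it with an IP alias like this:

 primary.mydomain.com 209.100.100.3 lmgrd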

[11]There is a 20-minute timeout period (it was a 10-second timeout period prior to version 1.0.2).

In Conclusion

To make a resource highly available, you need to eliminate single points of failure and understand how to properly administer a highly available server pair. To administer a highly available server pair, you need to know how to failover a resource from the primary server to the backup server manually so that you can do maintenance tasks on the primary server without affecting the resource's availability. Once you know how to failover a resource manually, you can place it under Heartbeat's control by creating the proper entry or entries in the haresources file to automate the failover process.

Chapter 9: Stonith and Ipfail

Overview

Recall from Chapter 6 that a split-brain condition occurs when more than one server thinks it has exclusive ownership of a resource. The consequences of a split-brain condition can include the inability to answer client computer requests for service, or perhaps even worse, the ability to answer requests inaccurately. For example, if two servers have the ability to offer client computers access to a warehouse inventory database that is stored locally on each server's disk drive in a high-availability configuration, only one server should be able to modify the database at a time. If each server allows users to modify its copy of the inventory database, then neither database will be accurate, and correcting the errors in the data will be difficult, if not impossible.

Note 

The situation is even worse if both servers share access to a single disk drive (using a shared SCSI bus, for example). In the scenario just described, the local copy of the database will be out of date on both servers, but if a split-brain condition occurs when the primary and backup server both try to write to the same database (stored on the shared disk drive), the entire database may become corrupted.

This chapter explains how to avoid a split-brain condition using a component in the Heartbeat package called Stonith. We'll discuss how to use a feature of Heartbeat called ipfail, which allows Heartbeat servers to detect which server should own the resources based on the Heartbeat servers' ability to communicate on the network. We'll also show you how to use the kernel capability called Watchdog, which allows the system to reboot itself if the Heartbeat program hangs, and we'll conclude with the basic tests you should perform on your high-availability servers before they go into production.

Stonith

Stonith, or "shoot the other node in the head,"[1] is a component of the Heartbeat package that allows the system to automatically reset the power of a failing server using a remote or "smart" power device connected to a healthy server. A Stonith device is a device that can turn power off and on in response to software commands. A serial or network cable allows a server running Heartbeat to send commands to this device, which controls the electrical power supply to the other server in a high-availability pair of servers. The primary server, in other words, can reset the power to the backup server, and the backup server can reset the power to the primary server.

Note 

Although there is no theoretical limitation on the number of servers that can be connected to a remote or "smart" power device capable of cycling system power, the majority of Stonith implementations use only two servers. Because a two-server Stonith configuration is the simplest and easiest to understand, it is likely in the long run to contribute to—rather than detract from—system reliability and high availability.[2]

This section will describe how to get started with Stonith using a two-server (primary and backup server) configuration with Heartbeat. When the backup server in this two-server configuration no longer hears the heartbeat of the primary server, it will power cycle the primary server before taking ownership of the resources. There is no need to configure sophisticated cluster quorum election algorithms in this simple two-server configuration;[3] the backup server can be sure it will be the exclusive owner of the resource while the primary server is booting. If the primary server cannot boot and reclaim its resources, the backup server will continue to maintain ownership of them indefinitely. The backup server may also keep control of the resources if you disable the auto_failback option in the Heartbeat ha.cf configuration file (as discussed in Chapter 8).

Forcing the primary server to reboot with a power reset is the crudest and surest way to avoid a split-brain condition. As mentioned in Chapter 6, a split-brain condition can have dire consequences when two servers share access to an external storage device such as a single SCSI bus connection to one disk drive. If the server with write permission, the primary server, malfunctions, the backup server must take great precautions to ensure that it will have exclusive access to the storage device before it modifies data.[4]

Stonith also ensures that the primary server is not trying to claim ownership of an IP address after it fails over to a backup server. This is more important than it sounds, because many times a failover can occur when the primary server is simply not behaving properly and the lower-level networking protocols that allow the primary server to respond to ARP requests ("Who owns this IP address?") are still working. The backup server has no reliable way of knowing that the primary server is engaging in this sort of improper behavior once communication with the daemons on the primary server, especially with the Heartbeat daemon, is lost.

[1]Other high-availability solutions sometimes call this Stomith, or "shoot the other machine in the head."

[2]"Complexity is the enemy of reliability," writes Alan Robertson, the lead Heartbeat developer.

[3]With three or more servers competing for cluster resources, a quorum, or majority-wins, election process is possible (quorum election will be included in Release 2 of Heartbeat).

[4]More sophisticated methods are possible through advanced SCSI commands, which are not implemented in all SCSI devices and are not currently a part of Heartbeat.

An Unconventional Approach: Using a Single Stonith Device

In this chapter, I will use an unconventional approach to deploying a high-availability configuration. I will describe how to make resources highly available using only one Stonith device. Normally, high-availability configurations are built using two Stonith devices (one connected to the primary server and one connected to the backup server). In this chapter, however, I will describe how to deploy Heartbeat using only one Stonith device, which introduces two important limitations into your high-availability configuration:

  1. All resources must run on the primary server (no resources are allowed on the backup server as long as the primary server is up).

  2. A failover event can only occur one time and in one direction. In other words, the resources running on the primary server can only failover to the backup server once. When the backup server takes ownership of the resources, the primary server is shut down, and operator intervention is required to restore the primary server to normal operation.

When you use only one Stonith device, you must run all of the highly available resources on the primary server, because the primary server will not be able to reset the power to the backup server (the primary server needs to reset the power to the backup server when the primary server wants to take back ownership of its resources after it has recovered from a crash). Resources running on the backup server, in other words, are not highly available without a second Stonith device. Operator intervention is also required after a failover event when you use only one Stonith device, because the primary server will go down and stay down—you will no longer have a highly available server pair.

With a clear understanding of these two limitations, we can continue our discussion of Heartbeat and Stonith using a sample configuration that only uses one Stonith device.

Sample Heartbeat with Stonith Configuration

Figure 9-1 shows a two-server, high-availability configuration using Heartbeat and a single Stonith device with three client computers connecting to resources on the primary server.

Figure 9-1: Two-server Heartbeat with Stonith—normal operation

Normally, the primary server broadcasts its heartbeats and the backup server hears them and knows that all is well. But when the backup server no longer hears heartbeats coming from the primary server, it sends the proper software commands to the Stonith device to cycle the power on the primary server, as shown in Figure 9-2.

Figure 9-2: Stonith sequence of events

Stonith Sequence of Events

As shown in Figure 9-2, the Stonith operation proceeds as follows:

  1. The Stonith event begins when heartbeats are no longer heard on the backup server.

    Note 

    This does not necessarily mean that the primary server is not sending them. The heartbeats may fail to reach the backup server for a variety of reasons. This is why at least two physical paths for heartbeats are recommended in order to avoid false positives.

  2. The backup server issues a Stonith reset command to the Stonith device.

  3. The Stonith device turns off the power supply to the primary server and then turns it back on.

  4. As soon as the power is cut off from the primary server, it is no longer able to access cluster resources and is no longer able to offer resources to client computers over the network. This guarantees that client computers are unable to access the resources on the primary server and eliminates the possibility of a split-brain condition.

  5. The backup server then acquires the primary server's resources. Heartbeat runs the proper resource script(s) with the start argument and performs gratuitous ARP broadcasts so that client computers will begin sending their requests to its network interface card.

Once the primary server finishes rebooting it will attempt to reclaim ownership of its resources again by asking the backup server to relinquish them, unless both servers have the auto_failback turned off.[5]

Note 

Client computers should always access cluster resources on an IP address that is under Heartbeat's control. This should be an IP alias and not an IP address that is automatically added to the system at boot time.

Stonith Devices

Here is a partial listing of the Stonith devices supported by Heartbeat:

 Stonith Name    Device                                         Company Website
 apcmaster       APC Master Switch AP9211                       www.apcc.com
 apcmastersnmp   APC Masterswitch (SNMP)                        www.apcc.com
 apcsmart        APC Smart-UPS [a] (tested with 900XLI)         www.apcc.com
 baytech         Baytech (tested with RPC-5) [b]                www.baytechdcd.com
 nw_rpc100s      Micro Energetics Night/Ware RPC100S            microenergeticscorp.com
 rps10           Western Telematics RPS-10M                     www.wti.com
 wti_nps         Western Telematics Network Power Switches      www.wti.com
                 (NPS-xxx) and Telnet Power Switches (TPS-xxx)

[a]Comments in the code state that no configuration file is used with this device. Apparently it will always use the /dev/ups device name to send commands, so you need to create a link to /dev/ups from the proper /dev/tty device.

[b]The code implies other RPC devices should work as well.

Stonith supports two additional methods of resetting a system:

  • Meatware: Operator alert; this tells a human being to do the power reset. It is used in conjunction with the meatclient program.

  • SSH: Usually used only for testing. This causes Stonith to attempt to use the secure shell SSH [a] to connect to the failing server and perform a reboot command.

[a]See Chapter 5 for a description of how to set up and configure SSH between cluster, or Heartbeat, servers.

You may also still see a null device. This "device" was originally used by the Heartbeat developers for coding purposes before the SSH device was developed.

Viewing the Current List of Supported Stonith Devices

If you download the source RPM file, you can view the list of currently supported Stonith devices in the file:

 /usr/src/redhat/SOURCES/heartbeat-<version>/STONITH/README
 

Or you can list the Stonith device names with the command:

 #/usr/sbin/stonith -L
 

The Stonith Meatware "Device"

Before purchasing one of the supported Stonith devices, you can use a meatware "device" to experiment with Stonith. A meatware device is a whimsical reference to a human being. When you use a meatware device, Heartbeat simply raises an operator alert (instead of resetting the power using software commands to a hardware device connected to a serial or network cable). After following the recipe in Chapter 7 that told you how to download and install the Heartbeat and Stonith RPMs, enter the following command to create a meatware device for a nonexistent host we will call chilly:

 #/usr/sbin/stonith -t meatware -p "" chilly
 

This command tells Stonith to create a device of type (-t) meatware with no parameters (-p "") for host chilly.

The Stonith program normally runs as a daemon in the background, but we're running it in the foreground for testing purposes. Log on to the same machine again (if you are at the console you can press CTRL+ALT+F2), and take a look at the messages log with the command:

 #tail /var/log/messages
 STONITH: OPERATOR INTERVENTION REQUIRED to reset chilly.
 STONITH: Run "meatclient -c chilly" AFTER power-cycling the machine.
 

Stonith has created a special file in the /tmp directory that you can examine with the command:

 #file /tmp/.meatware.chilly
 /tmp/.meatware.chilly: fifo (named pipe)
 
 

Now, because we don't really have a machine named chilly we will just pretend that we have obeyed the instructions of the Heartbeat program when it told us to reset the power to host chilly. If this were a real server, we would have walked over and flipped the power switch off and back on before entering the following command to clear this Stonith event:

 #meatclient -c chilly
 

The meatclient program should respond:

 WARNING!
 If server "chilly" has not been manually power-cycled or disconnected from all
 shared resources and networks, data on shared disks may become corrupted and
 migrated services might not work as expected.
 Please verify that the name or address above corresponds to the server you
 just rebooted.
 PROCEED? [yN]
 

After entering y you should see:

 Meatware_client: reset confirmed.
 

Stonith should also report in the /var/log/messages file:

 STONITH: server chilly Meatware-reset.
 

Using the Stonith Meatware Device with Heartbeat

Now that we have used the Stonith software commands and the meatware device manually, let's configure Heartbeat to do the same thing for us automatically at the time of a failover. On both of your Heartbeat servers add the following entry to the /etc/ha.d/ha.cf file:

 stonith_host * meatware
 

Normally the first parameter after the word stonith_host is the name of the Heartbeat server that has a physical connection to the Stonith device. If both (or all) Heartbeat servers can connect to the same Stonith device, you can use the wildcard character (*) to indicate that any Heartbeat server can perform a power reset using this device. (Normally this would be used with a smart or remote power device that is connected via an Ethernet network, which allows both Heartbeat servers to connect to it.)

Because meatware is an operator alert message sent to the /var/log/messages file and not a real device, we do not need to worry about any additional cabling and can safely assume that both the primary and backup servers will have "access" to this Stonith "device." They need only be able to send messages to their log files.

Note 

When following this recipe, be sure that the auto_failback option is turned on in your ha.cf file. With Heartbeat version 1.1.2, the auto_failback option can be set to either on or off and ipfail will work. Prior to version 1.1.2 (when nice_failback changed to auto_failback), the nice_failback option had to be set to on for ipfail to work.

  1. Use a simple haresources entry like this:

     #vi /etc/ha.d/haresources
     primary.mydomain.com sendmail
     

    The second line says that the primary server should normally own the sendmail resource (it should run the sendmail daemon).

  2. Now start Heartbeat on both the primary and backup servers with the commands:

     primaryserver> service heartbeat start
     backupserver> service heartbeat start
     

    or

     primaryserver> /etc/init.d/heartbeat start
     backupserver> /etc/init.d/heartbeat start
     
     
    Note 

    The examples above and below use the name of the server followed by the > character to indicate a shell prompt on either the primary or the backup Heartbeat server.

  3. Now kill the Heartbeat daemons on the primary server with the following command:

     primaryserver> killall -9 heartbeat
     
     
    Note 

    Stopping Heartbeat on the primary server using the Heartbeat init script (service heartbeat stop) will cause Heartbeat to release its resources. Thus the backup server will not need to reset the power of the primary server. To test a Stonith device, you'll need to kill Heartbeat on the primary server without allowing it to release its resources. You can also test the operation of your Stonith configuration by disconnecting all physical paths for heartbeats between the servers.

  4. Watch the /var/log/messages file on the backup server with the command:

     backupserver> tail -f /var/log/messages
     

    You should see Heartbeat issue the Meatware Stonith warning in the log and then wait before taking over the resource (sendmail in this case):

     backupserver heartbeat[835]: info: **************************
     backupserver heartbeat[835]: info: Configuration validated. Starting
     heartbeat <version>
     backupserver heartbeat[836]: info: heartbeat: version <version>
     backupserver heartbeat[836]: info: Heartbeat generation: 3
     backupserver heartbeat[836]: info: UDP Broadcast heartbeat started on port
     694 (694) interface eth0
     backupserver heartbeat[841]: info: Status update for server backupserver:
     status up
     backupserver heartbeat: info: Running /etc/ha.d/rc.d/status status
     backupserver heartbeat[841]: info: Link backupserver:eth0 up.
     backupserver heartbeat[841]: WARN: server primaryserver: is dead
     backupserver heartbeat[841]: info: Status update for server backupserver:
     status active
     backupserver heartbeat[847]: info: Resetting server primaryserver with
     [Meatware Stonith device]
     backupserver heartbeat[847]: OPERATOR INTERVENTION REQUIRED to reset
     primaryserver.
     backupserver heartbeat[847]: Run "meatclient -c primaryserver" AFTER
     power-cycling the machine.
     backupserver heartbeat: info: Running /usr/local/etc/ha.d/rc.d/status
     status
     backupserver heartbeat[852]: info: No local resources [/usr/local/lib/
     heartbeat/ResourceManager listkeys backupserver]
     backupserver heartbeat[852]: info: Resource acquisition completed.
     

    Notice that Heartbeat did not start the sendmail resource; it is waiting for you to clear the Stonith Meatware event.

  5. Clear that event by entering the command:

     backupserver> meatclient -c primaryserver
     

    The /var/log/messages file should now indicate that Heartbeat has started the sendmail resource on the backup server:

     backupserver heartbeat[847]: server primaryserver Meatware-reset.
     backupserver heartbeat[847]: info: server primaryserver now reset.
     backupserver heartbeat[841]: info: Resources being acquired from
     primaryserver.
     backupserver heartbeat: info: Running /usr/local/etc/ha.d/rc.d/stonith
     STONITH
     backupserver heartbeat: info: Running /usr/local/etc/ha.d/rc.d/status
     status
     backupserver heartbeat: stonith complete
     backupserver heartbeat: info: Taking over resource group sendmail
     backupserver heartbeat: info: Acquiring resource group: primaryserver
     sendmail
     backupserver heartbeat: info: Running /etc/init.d/sendmail start
     

Heartbeat on the backup server is now satisfied: it owns the primary server's resources and will listen for Heartbeats to find out if the primary server is revived.

To complete this simulation, you can actually reset the power on the primary server, or simply restart the Heartbeat daemon and watch the /var/log/messages file on both systems. The primary server should ask the backup server to give up its resources and should then start them up again on the primary server where they belong. (In this example, the sendmail daemon should start running on the primary server and stop running on the backup server.)

Note 

Notice in this example how much easier system maintenance becomes when you place all of your resources on the primary server and leave the backup server idle. However, you may want to run active services on both the primary and the backup Heartbeat server and use the hb_standby command to fail over resources when you need to do maintenance on one server or the other.
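For example, a planned failover for maintenance might look like the commands below. The location of the hb_standby script varies between Heartbeat versions and distributions (often /usr/lib/heartbeat/ or /usr/share/heartbeat/), so treat these paths as assumptions and adjust them for your installation:

 primaryserver> /usr/lib/heartbeat/hb_standby
 (perform maintenance on the primary server)
 backupserver> /usr/lib/heartbeat/hb_standby

The first command asks the primary server to give up its resources so the backup server can take them over; the second, run after maintenance is complete, hands the resources back.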

Using a "Real" Stonith Device

Each Stonith device has unique configuration options. To view the configuration options for any Stonith device, enter the command:

 #stonith -help
 

You can also obtain syntax configuration information for a particular Stonith device using the stonith command from a shell prompt as in the following example:

 # /usr/sbin/stonith -l -t rps10 test
 

This causes Stonith to report:

 STONITH: Cannot open /etc/ha.d/rpc.cfg
 STONITH: Invalid config file for rps10 device.
 STONITH: Config file syntax: <serial_device> <server> <outlet> [ <server>
 <outlet> [...] ]
 All tokens are white-space delimited.
 Blank lines and lines beginning with # are ignored
 

This output provides the syntax documentation for the rps10 Stonith device. The parameters you can pass to this Stonith device (also called the configuration file syntax) consist of a /dev serial device name (usually /dev/ttyS0 or /dev/ttyS1), the name of the server whose power is supplied by the device, and the physical outlet number on the rps10 unit to which that server is connected. Thus, the following line can be used in the /etc/ha.d/ha.cf configuration file on both the primary and the backup server:

 STONITH_host backupserver rps10 /dev/ttyS0 primaryserver.mydomain.com 0
 
 

This line tells Heartbeat that the backup server controls the power supply to the primary server using the serial cable connected to the /dev/ttyS0 port on the backup server, and the primary server's power cable is plugged in to outlet 0 on the rps10 device.
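Before relying on Heartbeat to fire the device, you may want to trigger a test reset by hand from the backup server. The stonith command-line flags have changed between Heartbeat releases, so the -t/-p form shown below is an assumption; confirm the exact syntax with stonith -h on your system before using it:

 backupserver> /usr/sbin/stonith -t rps10 -p "/dev/ttyS0 primaryserver.mydomain.com 0" primaryserver.mydomain.com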

Note 

For Stonith devices that connect to the network, you will also need to specify a login and password that allow access to the Stonith device over the network before Heartbeat can issue the power reset command(s). If you use this type of Stonith device, be sure to carefully consider the security risk of potentially allowing a remote attacker to reset the power to your server, and be sure to secure the ha.cf file so that only the root account can access it.
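A minimal way to restrict access to the file, run on both servers, is shown below; the main Heartbeat process reads ha.cf as root, so root-only permissions should not interfere with normal operation:

 #chown root:root /etc/ha.d/ha.cf
 #chmod 600 /etc/ha.d/ha.cf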

Avoiding Multiple Stonith Events

One potential advantage of offering all of the resources from the primary server and not running any resources on the backup server (which, you will recall, is a limitation of the single-Stonith configuration I have been describing in this chapter) is that you can simply power off the primary server when the backup server takes ownership of the resources. This prevents the situation in which confused Heartbeat servers repeatedly reset each other's power while resources move back and forth between the primary and backup servers.

To cause Heartbeat to power down, rather than power reset the primary server during a Stonith event, you can do one of two things:

  1. Look at the driver code in the Heartbeat package for your Stonith device and see whether it supports powering off (rather than a power reset); most do. If it does, you can change the line in the Heartbeat source code that issues the Stonith command from this:

     rc = s->s_ops->reset_req(s, ST_GENERIC_RESET, nodename);
     

    to this:

     rc = s->s_ops->reset_req(s, ST_POWEROFF, nodename);
     

    This allows you to power off rather than power reset any Stonith device that supports it.

  2. If you don't want to compile the Heartbeat source code, you may be able to simply configure the system BIOS on your primary server so that a power reset event will not cause the system to reboot. (Many systems support this; check your system documentation to see whether yours is one of them.)

[5]Formerly (prior to version 1.1.2), this was accomplished by turning the nice_failback option on.

Network Failures

Even with these preparations, we still have not eliminated all single points of failure in our two-server Heartbeat failover design. For example, what happens if the primary server simply loses the ability to communicate with the client computers over the normal, or production, network?

In such a case, if Heartbeat is properly configured, the heartbeats will continue to reach the backup server thanks to the redundancy you built into your heartbeat paths (as described in Chapter 8), so no failover will occur. The client computers, however, will no longer be able to access the resource daemons (the cluster resources) on the primary server.

We can solve this problem in at least two ways:

  • Run an external monitoring package, such as the Perl program Mon, on the primary server and watch for the failure of the public NIC. When Mon detects that this NIC has failed, it should shut down the Heartbeat daemon (or force it into standby mode) on the primary server. The backup server will then take over the resources and, assuming it is healthy and can communicate on its public network interface, the client computers will once again have access to the resources. (See Chapter 17 for more information about Mon.)

  • Use the ipfail API plug-in, which allows you to specify one or more ping servers in the Heartbeat configuration file. If the primary server suddenly fails to see one of the ping servers, it asks the backup server, "Did you see that ping server go down too?" If the backup server can still talk to the ping server, it knows that the primary server is not communicating on the network properly and it should now take ownership of the resources.

ipfail

Beginning with Heartbeat version 0.4.9d, the ipfail plug-in is included with the Heartbeat RPM package as part of the standard Heartbeat distribution. To use ipfail, decide which network device (IP address) both Heartbeat servers should be able to ping at all times (such as a shared router, a network switch that is never supposed to go offline, and so on). Next, enter this IP address in your /etc/ha.d/ha.cf file and tell Heartbeat to start the ipfail plug-in each time it starts up:

 #vi /etc/ha.d/ha.cf
 
 

Add three lines before the final server lines at the end of the file like so:

 respawn hacluster /usr/lib/heartbeat/ipfail
 ping 10.1.1.254 10.1.1.253
 auto_failback off
 

The first line above tells Heartbeat to start the ipfail program on both the primary and backup server,[6] and to restart or respawn it if it stops, using the hacluster user created during installation of the Heartbeat RPM package. The second line specifies one or more ping servers or network nodes that Heartbeat should ping at heartbeat intervals to be sure that its network connections are working properly. (If you are building a firewall machine, for example, you will probably want to use ping servers on both interfaces, or networks.[7])
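On a two-interface firewall, for example, the ping directive might name one reliable node on each network; the addresses below are placeholders only, so substitute routers or switches that are never supposed to go offline on your own networks:

 ping 10.1.1.254 192.168.1.1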

Note 

If you are using a version of Heartbeat prior to version 1.1.2, you must turn nice_failback on. Version 1.1.2 and later allow auto_failback (the replacement for nice_failback but with the opposite meaning) to be either on or off.

Now start Heartbeat on both servers and test your configuration. You should see a message in /var/log/messages indicating that Heartbeat started the ipfail child client process. Try removing the network cable on the primary server to break the primary server's ability to ping one of the ping servers, and watch as ipfail forces the primary server into standby mode. The backup server should then take over the resources listed in haresources.

[6]The /etc/ha.d/ha.cf configuration file should be the same on the primary and backup server.

[7]And, of course, be sure to configure your iptables or ipchains rules to accept ICMP traffic (see Chapter 2).

Watchdog and Softdog

The kernel has its own method to handle a hung system, called watchdog. Watchdog is simply a kernel module that checks a timer to make sure the system is healthy. If watchdog thinks the kernel is hung, it can take drastic action such as a system reboot. If you want to protect your high-availability server configuration from a server hang that causes an interruption in services even when the server hang is not detected by Heartbeat, you should enable watchdog in your kernel.

Note 

We are talking about a server hang here and not an application problem. Heartbeat (prior to Heartbeat release 2, which is not yet available as of this writing) does not monitor resources or the applications under its control to see if they are healthy—to do this you need to use another package such as the Mon monitoring system discussed in Part IV.

A watchdog device is normally connected to a system to allow the kernel to determine whether the system has hung (when the kernel no longer sees the external timer device updating properly, it knows that something has gone wrong).

The watchdog code also supports a software replacement for external hardware timers called softdog. Softdog maintains an internal timer that is reset each time a process on the system writes to the /dev/watchdog device file. If softdog does not see a write to /dev/watchdog within its timeout period, it assumes that the system is malfunctioning and initiates a kernel panic. Normally a kernel panic causes the system to halt, but you can modify this default behavior and, instead, cause the system to reboot.

Enable Watchdog in the Kernel

To enable watchdog in the kernel, you first need to make sure that the softdog module is compiled for your kernel.

Note 

On a normal Red Hat or SuSE distribution you will not need to add watchdog support to your kernel, because the modular kernel shipped with these distributions already contains a compiled copy of the softdog module.

If you have compiled your own kernel from source code, run make menuconfig from the /usr/src/linux directory, and check or enable the "Software Watchdog" option on the following submenu:

  • Character Devices

    • Watchdog Cards --->

      • [*] Watchdog Timer Support

        • [M] Software Watchdog (NEW)

If this option was not already selected in the kernel, follow the steps in Chapter 3 to recompile and install your new kernel. If you are using the standard modular version of the kernel included with Red Hat (or if you have just finished compiling your own kernel with modular support for the Software Watchdog), enter the following commands to make sure that the module loads into your running kernel:

 #insmod softdog
 #lsmod
 

You should see softdog listed. Normally the Heartbeat init script will insert this module for you if you have enabled watchdog support in /etc/ha.d/ha.cf as described later in this section. Once you have confirmed that the module loads, remove it from the kernel and allow Heartbeat to add it for you when it starts. Remove softdog from the kernel with the command:

 #modprobe -r softdog
 
 
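If you ever need to load softdog by hand with a different timeout, the module accepts a soft_margin parameter (in seconds). The parameter name here is taken from the mainline kernel softdog driver, so verify it with modinfo before relying on it, and keep the margin comfortably longer than your Heartbeat deadtime setting so that the watchdog does not fire between Heartbeat's normal writes to /dev/watchdog:

 #modinfo softdog
 #modprobe softdog soft_margin=60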

Kernel Panic—Hang or Reboot?

To force the system to reboot instead of halt when the kernel panics, modify the boot arguments passed to the kernel. To do this on a system using the Lilo[8] boot loader, for example, edit /etc/lilo.conf and add the following line near the top of the file, before the image= lines, so that it takes effect for every kernel version you have configured:

 append="panic=60"
 

Then be sure to run the command:

 #lilo -v
 
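For context, the relevant portion of a hypothetical /etc/lilo.conf would then look something like the following; everything except the append line is a placeholder for whatever your file already contains:

 boot=/dev/hda
 prompt
 timeout=50
 append="panic=60"

 image=/boot/vmlinuz
         label=linux
         root=/dev/hda2
         read-only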

Alternatively, you could also use the command:

 #echo 60 > /proc/sys/kernel/panic
 

But you would need to add this command to an init script so that it would be executed each time the system boots.
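On most distributions the simplest way to make the setting persistent is /etc/sysctl.conf (or, on Red Hat, an extra line in /etc/rc.d/rc.local). A minimal sketch of the sysctl approach: add the kernel.panic entry to the file and then reload it:

 #vi /etc/sysctl.conf
 kernel.panic = 60

 #sysctl -p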

Configure Heartbeat to Support Watchdog

In addition to using the softdog timer as we've just described (as part of the normal configuration of your server to improve its reliability when the system hangs), you can tell Heartbeat to update the softdog timer. This lets watchdog know that Heartbeat is running and healthy. If the timer doesn't get updated, watchdog will notice and force a kernel panic. In effect, we are telling watchdog to watch Heartbeat.

Note 

With Heartbeat release 1.2.3, you can have apphbd watch Heartbeat and then let watchdog watch apphbd instead.

When you enable the watchdog option in your /etc/ha.d/ha.cf file, Heartbeat will write to the /dev/watchdog file (or device) at an interval equal to the deadtime timer, rounded up to the next whole second. Thus, should anything cause Heartbeat to stop updating the watchdog device, watchdog will initiate a kernel panic once the watchdog timeout period has expired (one minute, by default).

 #vi /etc/ha.d/ha.cf
 

Uncomment the line:

 watchdog /dev/watchdog
 
 
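For context, once the line is uncommented the relevant ha.cf entries look something like this; the deadtime value shown is only an example, so keep whatever value your configuration already uses:

 # Heartbeat writes to this device roughly once per deadtime interval
 watchdog /dev/watchdog
 deadtime 10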

Now restart Heartbeat to give the Heartbeat init script the chance to properly configure the watchdog device, with the command:

 #service heartbeat restart
 

You should see softdog listed when you run:

 #lsmod
 
 
Note 

You should do this on all of your Heartbeat servers to maintain a consistent Heartbeat configuration.

To test the watchdog behavior, kill all of the running Heartbeat daemons on the primary server with the following command:

 #killall -9 heartbeat
 

You will see the following warning on the system console and in the /var/log/messages file:

 Softdog: WDT device closed unexpectedly. WDT will not stop!
 

This error warns you that the kernel will panic. Your system should reboot instead of halting if you have modified the /proc/sys/kernel/panic value as described previously.
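You can confirm the current panic timeout at any time with:

 #cat /proc/sys/kernel/panic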

[8]See Chapter 3 for a discussion of the Lilo boot loader.

Testing Your Heartbeat Configuration

Before you put your Heartbeat high-availability server pair into production, here are a few things to try:

Unplug the power cord on the primary server

  • Heartbeat on the backup server should detect the loss of heartbeat packets from the primary server and initiate a failover. Using Stonith, the backup server should turn off or reset the power to the primary server. Heartbeat on the backup server should then run the proper resource scripts (when the Stonith event has "cleared" or completed) to take ownership of the resources. Heartbeat on the backup server should also send gratuitous ARP broadcasts to notify clients and/or network equipment that the MAC addresses for the resource IP addresses have changed.

    Client computers and network equipment, such as routers and switches, should update their ARP caches to reflect the new MAC address of the backup server. Check the ARP cache on your Cisco IOS network equipment, for example, with the command:

     show ip arp
     
     
    Or use the command:

     show ip arp 209.100.100.3
     

    where 209.100.100.3 is the IP alias that fails over to the backup server. The MAC address should change automatically when the backup server sends out gratuitous ARP broadcasts.[9]

    Check all of the client computers that share the same network broadcast address with this command, which works on Windows PCs and Linux hosts:

     arp -a
     

Test the behavior of the hb_standby command

  • Use the hb_standby command on the primary server to force resources to fail over to the backup server. Then use the command again on the backup server to force the resources back to the primary server. ipfail will not work properly if the hb_standby command is not working properly.

Unplug the production network cable on the primary server

  • Using ipfail (or Mon,[10] or a similar monitoring tool), the network connection failure should be detected, and the resources and IP aliases should fail over to the backup server.

Remove one of the heartbeat paths between the two servers

  • Use more than one heartbeat path between the servers to avoid false positives (the backup server incorrectly assumes the primary server has died). When you remove only one of these paths, such as the crossover network cable or serial cable connecting them, nothing should happen.

Remove all of the heartbeat paths between the two servers

  • What happens when you remove all heartbeat paths between the two servers? If you are using Stonith, the backup server should assume that the primary server has died, initiate a Stonith event, and take over the resources. What happens next depends upon how you have Stonith configured and whether or not you are using the auto_failback option.

    With two Stonith devices (each server controlling the other server's power supply) and the auto_failback option turned on, the two servers may start repeatedly cycling each other's power or Stonithing each other. To avoid this, you can disable auto_failback or use the method described earlier in this chapter to power off rather than power cycle the primary server.

Kill the heartbeat daemon on the primary server (killall -9 heartbeat)

  • Stonith is especially important when you are using IP aliases to offer resources to client computers. The backup server must Stonith or power off/reset the primary server before trying to assume ownership of the resources to avoid a split-brain condition.

Kill the resource daemon(s) on the primary server

  • This case was not addressed by the Heartbeat configuration used in this chapter. Depending on your needs, you can run cl_status or cl_respawn (both included with the Heartbeat package) to monitor services or to automatically restart them when they fail. You can also use the Mon application (described in detail in Chapter 17) to monitor daemons and then take a specific action (such as sending an alert to your cell phone or pager) when a service daemon fails.
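    For a quick manual check from a shell prompt, you may be able to use cl_status; the subcommand names below are taken from later Heartbeat 1.2.x releases and may differ in other versions, so confirm them by running cl_status without arguments on your own system:

     primaryserver> cl_status hbstatus
     primaryserver> cl_status listnodes
     primaryserver> cl_status nodestatus backupserver

    The first command reports whether Heartbeat is running locally, the second lists the nodes Heartbeat knows about, and the third reports the status of the named node.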

Power reset (or reboot) both servers

  • Do the servers boot properly and leave the resources on the primary server where they belong once both systems have finished booting? You may need to adjust the initdead time in the /etc/ha.d/ha.cf file if the backup server tries to grab the resources away from the primary server before it finishes booting.

[9]You may need to make sure the Cisco equipment accepts Gratuitous ARP broadcasts with the Cisco IOS command ip gratuitous-arps.

[10]See Chapter 17.

In Conclusion

Preventing the possibility of a split-brain condition while eliminating single points of failure in a high-availability server configuration is tricky. The goal of your high-availability server configuration should be to accomplish this with the least amount of complexity required. When you finish building your high-availability server pair, test it thoroughly, and do not consider it ready for production until you know how it will behave when something goes wrong (or, to use the terminology sometimes used in the high-availability field, until you have tested your configuration for its ability to properly handle a fault and you know that you have good fault isolation).

This ends Part II. I have spent the last several chapters describing how to offer client computers access to a service or daemon on a single IP address without relying on a single server. This capability—the ability to move a resource from one server to another—is an important building block of the high-availability cluster configuration that will be the subject of Part III.
