Part 8: Heartbeat Support

This chapter describes how to use the high-availability resources file, haresources, to control resources on a pair of servers[1] running Heartbeat. We will also explore some of the common maintenance tasks required to keep the Heartbeat high-availability system functioning properly.

The Haresources File Syntax

The /etc/ha.d/haresources file must be the same on both the primary and the backup Heartbeat servers.

Each line in the haresources file usually contains the following:

  • The name of the server where the resource should normally run (the primary server), followed by a space or tab.

  • An (optional) IP alias that Heartbeat should add to the system before launching the resource, followed by a space. (The IP alias definition may include a network subnet mask and a broadcast address separated from each other by the forward slash (/) character.)

  • A resource script (the script used to start and stop the resource) located in either the /etc/init.d or the /etc/ha.d/resource.d directory.[2] If arguments need to be passed to the resource script, they are added after two colons and are separated from each other by two colons.

Additional resource scripts can be added to the line using the space character as a separator.

Note 

If you need to create a haresources line that is longer than the line of text that fits on your screen, you can add the backslash character (\) to indicate that the haresources entry continues on the next line.

A simplified summary of this syntax, for a single line with two resources, each with two arguments, looks like this:

 primary-server [IPaddress] resource1[::arg1::arg2] [resource2[::arg1::arg2]]
 

In practice, this line might look like the following on a server called primary.mydomain.com running sendmail and httpd on IP address 209.100.100.3:

 primary.mydomain.com 209.100.100.3 sendmail httpd
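
The line format described above can be sketched in Python as follows. This is only an illustration, not Heartbeat's own parser: the function name parse_haresources_line is hypothetical, the sketch assumes that a second field beginning with a dotted-quad address is the IP alias, and it does not handle comments or backslash continuation lines.

```python
import re

# Assumption: a second field that starts with a dotted-quad address is the
# optional IP alias; anything else is treated as a resource script.
_IP_FIELD = re.compile(r"^\d{1,3}(\.\d{1,3}){3}(/|$)")


def parse_haresources_line(line):
    """Split one haresources entry into (node, ip_spec, resources)."""
    fields = line.split()
    node, rest = fields[0], fields[1:]
    ip_spec = None
    if rest and _IP_FIELD.match(rest[0]):
        ip_spec = rest.pop(0)
    resources = []
    for field in rest:
        # The script name comes first; arguments follow, separated by "::".
        name, *args = field.split("::")
        resources.append((name, args))
    return node, ip_spec, resources
```

Called on the sendmail and httpd example line, the function returns the server name, the IP alias, and two resources with empty argument lists.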
 

Let's examine each element of the haresources file in more detail.

Haresources File Syntax: Primary-Server Name

The primary-server name you enter at the start of the haresources line should match one of the server names you've already specified in the /etc/ha.d/ha.cf file. It should also match the name returned by the uname -n command on the primary server.
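
The two naming checks above can be automated with a short Python sketch. The helper names ha_cf_nodes and check_primary_name are hypothetical, and the sketch assumes that ha.cf lists cluster members on lines beginning with the word node; run it on the primary server so that the local name is the one uname -n would report there.

```python
import os


def ha_cf_nodes(ha_cf_text):
    """Collect the server names listed on 'node' lines of an ha.cf file."""
    names = []
    for line in ha_cf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if fields and fields[0] == "node":
            names.extend(fields[1:])
    return names


def check_primary_name(primary, ha_cf_text, local_name=None):
    """Return a list of problems with the primary-server name (empty if none)."""
    if local_name is None:
        local_name = os.uname().nodename  # the value printed by `uname -n`
    problems = []
    if primary not in ha_cf_nodes(ha_cf_text):
        problems.append("not listed as a node in ha.cf")
    if primary != local_name:
        problems.append("does not match uname -n on this host")
    return problems
```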

Haresources File Syntax: IP Alias

Although it is not required, an IP alias[3] is usually specified in the haresources file. This IP alias can then be offered from either the primary or the backup server, depending upon which system is healthy. For example:

 primary.mydomain.com 209.100.100.3
 
 

Heartbeat will add 209.100.100.3 as an IP alias to one of the existing NICs connected to the system and send Gratuitous ARP[4] broadcasts out of this NIC to the locally connected computers when it first starts up. It will only do this on the backup server if the primary server goes down.

Actually, when Heartbeat sees an IP address in the haresources file, it runs the resource script in the /etc/ha.d/resource.d directory called IPaddr and passes it the requested IP address as an argument. The /etc/ha.d/resource.d/IPaddr script then calls the program included with the Heartbeat package, findif (find interface), and passes it the IP alias you want to add. This program then automatically selects the physical NIC that this IP alias should be added to, based on the kernel's network routing table.[5] If the findif program cannot locate an interface to add the IP alias to, it will complain in the /var/log/messages file with a message such as the following:

 heartbeat: ERROR: unable to find an interface for 200.100.100.3
 
 
Note 

You cannot place the primary IP address for the interface in the haresources file. Heartbeat can add an IP alias to an existing interface, but it cannot be used to bring up the primary IP address on an interface.

Heartbeat's Automated Network Interface Card Selection Process

Heartbeat uses the findif program to select the NIC to which the IP alias you specify in the haresources file will be added. It does this by comparing the IP alias to each of the destination addresses listed in your kernel's network routing table. As described in Chapter 2, you can view this routing table with the route -n command. For example:

 #route -n
 Kernel IP routing table
 Destination      Gateway        Genmask          Flags    Metric    Ref    Use   Iface
 209.100.100.2    0.0.0.0        255.255.255.0    U        0         0      0     eth1
 10.1.1.0         0.0.0.0        255.255.255.0    U        0         0      0     eth2
 127.0.0.0        0.0.0.0        255.0.0.0        U        0         0      0     lo
 0.0.0.0          209.100.100.1  0.0.0.0          UG       0         0      0     eth1
 
 
Note 

The output of this command is based on entries the kernel exposes in the /proc/net/route file. On a Red Hat system, these entries are created each time the system boots by the route commands in the /etc/init.d/network script. See Chapter 2 for an introduction to the kernel network routing table.

When findif is able to match the network portion of the destination address in the routing table with the network portion of the IP alias from the haresources file, it returns the interface name (eth1, for example) associated with the destination address to the IPaddr script. The IPaddr script then adds the IP alias to this local interface and adds another entry to the routing table to route packets destined for the IP alias to this local interface. So the routing table, after the 209.100.100.3 IP alias is added, would look like this:

 #route -n
 Kernel IP routing table
 Destination      Gateway        Genmask          Flags    Metric    Ref    Use   Iface
 209.100.100.2    0.0.0.0        255.255.255.0    U        0         0      0     eth1
 209.100.100.3    0.0.0.0        255.255.255.0    U        0         0      0     eth1
 10.1.1.0         0.0.0.0        255.255.255.0    U        0         0      0     eth2
 127.0.0.0        0.0.0.0        255.0.0.0        U        0         0      0     lo
 0.0.0.0          209.100.100.1  0.0.0.0          UG       0         0      0     eth1
 

The IPaddr script has executed the command route add -host 209.100.100.3 dev eth1.

Finally, to complete the process of adding the IP alias to the system, the IPaddr script sends out five gratuitous ARP broadcasts to inform locally connected computers that this IP alias is now associated with this interface.

Note 

If more than one entry in the routing table matches the IP alias being added, the findif program will use the metric entry in the routing table to select the interface with the fewest hops (the lowest metric).

Finding the Right Network

To compare the network portion of the IP alias to the network portion of the "destination" address entry in the routing table, findif needs to know which portion of the address represents a network and which portion represents a node. In other words, it needs to know which network mask to apply to both the destination address in the routing table and the IP alias from the haresources file before it can determine whether the network portions of the two addresses match.

The findif program will use the network mask for each entry in the routing table and apply it to both the routing table entry and the IP alias being requested to see if they match. If the two network portions of the addresses match, they are on the same network, and findif knows to add the IP alias being requested to the interface associated with this routing table entry.

The findif program will not use the default route specified in your routing table (the entry with a destination address of 0.0.0.0) unless you specify a network mask in your haresources file. For example, the entry:

 primary.mydomain.com 209.100.100.3/24 myresource
 

says that Heartbeat should use a network mask of 255.255.255.0. It also says that if no other entry in the routing table with its associated network mask applied to it matches this address, the default route in the routing table should be used.

However, under normal circumstances your routing table has an entry that will match your IP alias correctly without the need to consult the default route, so you will probably never need to enter the network mask in the haresources file. In the above routing table, for example, before Heartbeat added the IP alias, the first entry looked like this:

 Destination       Gateway      Genmask         Flags     Metric    Ref    Use    Iface
 209.100.100.2     0.0.0.0      255.255.255.0   U         0         0      0      eth1
 

This entry matches the 209.100.100.3 IP alias once the 255.255.255.0 netmask from this routing table entry is applied to both addresses for the comparison (both addresses are on the 209.100.100 network). So the correct interface (eth1 in this case) is selected even though the default route was not used in the interface selection process.
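
The matching rules described in this section can be sketched in Python. This is not findif's source code, only an illustration of the behavior described above: the function names and the SAMPLE_ROUTES data are hypothetical, and the default route is consulted only when the caller allows it (standing in for a netmask being given in the haresources file) and no other entry matches.

```python
import ipaddress


def same_network(address, destination, genmask):
    """Apply the route's netmask to both addresses and compare the results."""
    mask = int(ipaddress.IPv4Address(genmask))
    left = int(ipaddress.IPv4Address(address)) & mask
    right = int(ipaddress.IPv4Address(destination)) & mask
    return left == right


def select_interface(alias, routes, allow_default=False):
    """Pick the interface for an IP alias from (dest, genmask, metric, iface) rows."""
    matches, default_iface = [], None
    for destination, genmask, metric, iface in routes:
        if destination == "0.0.0.0" and genmask == "0.0.0.0":
            default_iface = iface  # remember the default route, but skip it for now
            continue
        if same_network(alias, destination, genmask):
            matches.append((metric, iface))
    if matches:
        # Several matches: prefer the lowest metric.
        return min(matches, key=lambda match: match[0])[1]
    return default_iface if allow_default else None


# Illustrative data only, loosely modeled on a `route -n` listing.
SAMPLE_ROUTES = [
    ("209.100.100.0", "255.255.255.0", 0, "eth1"),
    ("10.1.1.0", "255.255.255.0", 0, "eth2"),
    ("127.0.0.0", "255.0.0.0", 0, "lo"),
    ("0.0.0.0", "0.0.0.0", 0, "eth1"),
]
```

With this data, select_interface("209.100.100.3", SAMPLE_ROUTES) returns eth1 without consulting the default route, while an address on an unlisted network returns None unless allow_default is set.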

Specifying a Network Interface Card

You can avoid this auto-selection process by specifying the interface Heartbeat should use in the haresources file with an entry like this:

 primary.mydomain.com 209.100.100.3/24/eth0/209.100.100.255
 

This entry uses the following syntax:

 primary-server IPalias/number-of-netmask-bits/interface-name/broadcast-address
 

Thus, in this example, the IP alias is 209.100.100.3 with a 24-bit netmask (equivalent to a network mask of 255.255.255.0) on network interface card eth0, using a broadcast address of 209.100.100.255.

To specify this as the IP alias and interface to be used for the httpd daemon, enter the following line:

 primary.mydomain.com 209.100.100.3/24/eth0/209.100.100.255 httpd
 

With this entry in the haresources file, Heartbeat will always use the eth0 interface for the 209.100.100.3 IP alias when adding it to the system for the httpd daemon to use.[6]
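
To make the slash-separated syntax above concrete, here is a small Python sketch that splits such an entry into its parts and converts the number of netmask bits to dotted form. The function name parse_ip_spec and the dictionary keys are hypothetical, and the sketch handles only the positional form shown above.

```python
import ipaddress


def parse_ip_spec(spec):
    """Split 'IPalias/bits/interface/broadcast' into named parts."""
    parts = spec.split("/")
    parts += [None] * (4 - len(parts))  # trailing fields may be omitted
    address, bits, interface, broadcast = parts[:4]
    netmask = None
    if bits is not None:
        # Convert a prefix length such as 24 into 255.255.255.0.
        netmask = str(ipaddress.IPv4Network(f"0.0.0.0/{bits}").netmask)
    return {
        "address": address,
        "netmask": netmask,
        "interface": interface,
        "broadcast": broadcast,
    }
```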

Customizing IP Address Takeover with the iptakeover Script

If you need to modify the routing table as part of the process of taking over an IP address, then you may want to use the iptakeover script described in Chapter 7 to perform Gratuitous ARP broadcasts yourself. To use this script, add a line like the following to the haresources file.

 primary.mydomain.com iptakeover myresource
 

Heartbeat will then run /etc/ha.d/resource.d/iptakeover status followed by /etc/ha.d/resource.d/myresource start. This makes it possible for you to decide exactly which interface you want your IP alias to appear on, to modify the routing table if need be, and to perform the Gratuitous ARP broadcasts from a single script. However, under most circumstances this is not required and should be avoided if possible—specify an IP address in the haresources file and let Heartbeat do all of this work for you.[7]

The Haresources File Syntax: Resources

Each line in the haresources file can contain one or more resource script names. The resources are separated by a space. Arguments can be passed to the resource scripts using two colons between the arguments. For example, if you need Heartbeat to send a special argument (let's say FILE1) to your resource script (before the word start, status or stop), you would use the syntax:

 primary.mydomain.com myresource::FILE1
 

Assuming you add this line to the haresources file on both the primary and the backup server, Heartbeat will run[8] /etc/ha.d/resource.d/myresource FILE1 start when it first starts on the primary server, and then again on the backup server when the primary server fails. When the resource needs to be "released" or stopped, Heartbeat will run the script with the command /etc/ha.d/resource.d/myresource FILE1 stop.

If we wanted to combine our iptakeover script with the myresource script and its FILE1 argument, we would use the line:

 primary.mydomain.com iptakeover myresource::FILE1
 

To send your resource script several arguments, enter them all on the same line after the script name with each argument separated by a pair of colons. For example, to send the myresource script the arguments FILE1, UNAME=JOHN, and L3, your haresources entry would look like this:

 primary.mydomain.com iptakeover myresource::FILE1::UNAME=JOHN::L3
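
The way a resource field turns into a command line can be sketched as follows. The helper name resource_command is hypothetical, and the sketch assumes the script lives in /etc/ha.d/resource.d rather than in the system init directory.

```python
RESOURCE_DIR = "/etc/ha.d/resource.d"


def resource_command(resource_field, action, script_dir=RESOURCE_DIR):
    """Build the argument list for one resource: script, arguments, then action."""
    name, *args = resource_field.split("::")
    # Arguments are placed before the word start, stop, or status.
    return [f"{script_dir}/{name}", *args, action]
```

For the entry above, resource_command("myresource::FILE1::UNAME=JOHN::L3", "start") yields the script path followed by FILE1, UNAME=JOHN, L3, and start.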
 
 
Note 

The haresources syntax is also documented online at http://wiki.trick.ca/linuxha/HeartbeatResourceAgent.

Resource Groups

Until now, we have only described resources as independent entries in the haresources file. In fact, Heartbeat considers all the resources specified on a single line as one resource group. When Heartbeat wants to know if a resource group is running, it asks only the first resource specified on the line. Multiple resource groups can be added by adding additional lines to the haresources file.

For example, if the haresources file contained an entry like this:

 primary.mydomain.com iptakeover myresource::FILE1::UNAME=JOHN::L3
 

only the iptakeover script would be called to ask for a status when Heartbeat was determining the status of this resource group. If the line looked like this instead:

 primary.mydomain.com myresource::FILE1::UNAME=JOHN::L3 iptakeover
 

Heartbeat would run the following command to determine the status of this resource group (assuming the myresource script was in the /etc/ha.d/resource.d directory):

 /etc/ha.d/resource.d/myresource FILE1 UNAME=JOHN L3 status
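
The rule that only the first resource on a line is asked for the group's status can be illustrated with a short Python sketch. The function name group_status_command is hypothetical, and the sketch assumes the first resource is a script name in /etc/ha.d/resource.d rather than a bare IP alias.

```python
def group_status_command(line, script_dir="/etc/ha.d/resource.d"):
    """Return the status command for a resource group (one haresources line)."""
    fields = line.split()
    first_resource = fields[1]  # fields[0] is the primary-server name
    name, *args = first_resource.split("::")
    return " ".join([f"{script_dir}/{name}", *args, "status"])
```

Swapping the order of the resources on the line changes which script is queried, which is why the ordering of a resource group matters.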
 
 
Note 

If you need your daemon to start before the IP alias is added to your system, enter the IP address after the resource script name with an entry like this:

     primary.mydomain.com myresource IPaddr::200.100.100.3
 

Resource Script Arguments and Resource Groups

To combine Heartbeat's ability to send arguments to a script with its ability to create multiple resource groups, you could, for example, write one script that started both SERVICE-A and SERVICE-B based upon the argument it was passed. For example, let's call this combined resource script resAB and assume it can handle the argument SERVICE-A or SERVICE-B followed by the word start, stop, or status to control both daemons. You could then create haresources entries like this:

 primary.mydomain.com resAB::SERVICE-A
 primary.mydomain.com resAB::SERVICE-B
 

With these haresources entries, when Heartbeat wanted to know if these resource groups were active, it would run the commands:

 /etc/ha.d/resource.d/resAB SERVICE-A status
 /etc/ha.d/resource.d/resAB SERVICE-B status
 
 

and it would start the resource group by executing:

 /etc/ha.d/resource.d/resAB SERVICE-A start
 /etc/ha.d/resource.d/resAB SERVICE-B start
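
A combined script such as resAB could be organized along the following lines. This is a sketch under stated assumptions, not a script from the Heartbeat package: the init script paths in SERVICES are placeholders, and the sketch simply forwards the action to whichever script matches the first argument.

```python
import subprocess

# Hypothetical mapping from the argument Heartbeat passes to the init script
# that controls each daemon; these paths are placeholders.
SERVICES = {
    "SERVICE-A": "/etc/init.d/service-a",
    "SERVICE-B": "/etc/init.d/service-b",
}
ACTIONS = ("start", "stop", "status")


def build_command(argv):
    """Translate 'resAB SERVICE-X action' into the command to execute."""
    if len(argv) != 3 or argv[1] not in SERVICES or argv[2] not in ACTIONS:
        raise ValueError("usage: resAB SERVICE-A|SERVICE-B start|stop|status")
    return [SERVICES[argv[1]], argv[2]]


def main(argv):
    """Run the selected init script and return its exit status."""
    return subprocess.call(build_command(argv))
```

Keeping the argument handling in build_command makes it easy to check the dispatch logic without starting or stopping any daemon.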
 

[1]In this book, we are only concerned with Heartbeat's ability to failover a resource from a primary server to a backup server.

[2]Recall from Chapter 6 that this is the system init directory (/etc/init.d, /etc/rc.d/init.d/, /sbin/init.d, /usr/local/etc/rc.d or /etc/rc.d) and the Heartbeat resource directory (/etc/ha.d/resource.d).

[3]IP aliases were introduced in Chapter 6.

[4]GARP broadcasts were also introduced in Chapter 6.

[5]See "Routing Packets with the Linux Kernel" in Chapter 2.

[6]Note that the resource daemon (httpd in this case) may also need to be configured to use this interface or this IP address as well.

[7]Also, as we'll see in Part III, you'll want to leave IP alias assignment under Heartbeat's control so Ldirectord can failover to the backup load balancer.

[8]This assumes the script myresource is located in the /etc/ha.d/resource.d directory and not in the /etc/init.d directory, in which case the command would be /etc/init.d/myresource FILE1 start.

Load Sharing with Heartbeat

Using Heartbeat and two computers, we can offer one daemon, or service, from the primary server and then offer a different service from the backup server. If either server fails, the other one will start offering both services.

This is a form of load sharing called an active-active server configuration. To use it, you must ensure that each system has the processing power and network capacity to offer both services in the event of a failure. This configuration is also much more difficult to administer and support in a production environment: neither server can go down for maintenance or upgrades without causing a failover of at least one service. Figure 8-1 shows a sample diagram of this type of Heartbeat configuration.

Figure 8-1: Heartbeat active-active configuration

The two-line haresources entry used to create the configuration shown in Figure 8-1 (again, the primary and the backup server should always have identical haresources files) looks like this:

 primary.mydomain.com 209.100.100.3 sendmail
 backup.mydomain.com 209.100.100.4 httpd
 

Once Heartbeat is running on both servers, the primary.mydomain.com computer will offer sendmail at IP address 209.100.100.3, and the backup.mydomain.com computer will offer httpd at IP address 209.100.100.4. If the backup server fails, the primary server will perform Gratuitous ARP broadcasts for the IP address 209.100.100.4 and run httpd with the start argument. (The names "primary" and "backup" are really meaningless in this configuration because both computers are acting as both a primary and a backup server.) Client computers would always look for the sendmail resource at 209.100.100.3 and the httpd resource at 209.100.100.4, and even if one of these servers went down, the resource would still be available once Heartbeat started the resource and moved the IP alias over to the other computer.

Note 

This configuration is more difficult to administer than an active-standby configuration, because you will always be making changes on a "live" system unless you failover all resources so they run on a single server before you do your maintenance.

This configuration is also not recommended when using local data (data stored on a locally attached disk drive of either server) because complex data replication and synchronization methods must be used.[9]
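
The ownership rules of the active-active configuration described above can be modeled in a few lines of Python. This is a simplified model, not Heartbeat's logic: the function name resource_owners is hypothetical, each haresources line is treated as one resource group, and a group moves to any surviving node named in the file when its preferred node is down.

```python
def resource_owners(haresources_text, alive_nodes):
    """Map each resource group (one per line) to the node that should run it."""
    entries = [line.split() for line in haresources_text.splitlines() if line.strip()]
    nodes = [fields[0] for fields in entries]
    owners = {}
    for fields in entries:
        preferred, group = fields[0], " ".join(fields[1:])
        if preferred in alive_nodes:
            owners[group] = preferred
        else:
            # Fail over to another surviving node named in the file, if any.
            survivors = [node for node in nodes if node in alive_nodes]
            owners[group] = survivors[0] if survivors else None
    return owners
```

With both servers alive, each group stays on its preferred server; with only one alive, that server owns both groups, which is why each machine must be sized to carry the full load.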

Load Sharing with Heartbeat: Round-Robin DNS

But what if you wanted to offer just one resource from both computers and have them share the work? This is the goal of the cluster described in this book, and it will be possible to attain that goal using load balancing software described in Part III. However, for now we can achieve a simple form of load balancing using round-robin DNS.

A feature of the Domain Name System called round-robin DNS allows you to offer one service on two (or more) IP addresses. For example, the host name (or web URL) is first resolved to IP address 209.100.100.4, then to 209.100.100.3, then to 209.100.100.4 again, and so on in round-robin fashion. The DNS (BIND version 4.9.3 or later) entry for your server might look like this:

 ;Round-robin entry for www.mydomain.com (note the trailing dots on fully qualified names)
 www.mydomain.com.   IN  A  209.100.100.3
 www.mydomain.com.   IN  A  209.100.100.4
 

with reverse address entries for the 209.100.100.3 and 209.100.100.4 IP addresses that look like this:

 3         IN PTR www.mydomain.com.
 4         IN PTR www.mydomain.com.
 
 

The entry in your haresources file for this type of configuration would then look like this:

 primary.mydomain.com 209.100.100.3 httpd
 backup.mydomain.com 209.100.100.4 httpd
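
The rotation that round-robin DNS performs can be illustrated with a small Python simulation. The function name rotated_answers is hypothetical, and the sketch assumes a simple cyclic ordering in which each reply leads with the next address; a real name server's ordering policy may differ.

```python
from collections import deque


def rotated_answers(addresses, queries):
    """Return the address lists that successive DNS replies would carry."""
    records = deque(addresses)
    answers = []
    for _ in range(queries):
        answers.append(list(records))
        records.rotate(-1)  # the next reply leads with the next address
    return answers
```

A client that always uses the first address in each reply would therefore alternate between the two web servers.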
 

Problems with Round Robin DNS Load Balancing

When using round-robin DNS, the two servers would, in theory, each get half of the client requests, and Heartbeat would ensure that both IP addresses are available even if one of the servers goes down. Most client computers, however, have a name services caching daemon or NSCD that will cause them to remember (at least for a while) an IP address once they learn it. Storing this IP address on the client computer reduces the need for the client computer to repeatedly ask, "What is the IP address for this host name?" and helps improve the chances that client computers will not enter into a dialog (such as an HTTPS secure transaction) with one web server only to end up improperly sending a response (such as a credit card number) to another web server.

Caching of host name-to-IP address mappings can also cause a caching-only DNS server on the Internet to answer client requests with a nonauthoritative reply that uses only one of the IP addresses. This intervening, caching-only DNS server effectively blocks the round-robin DNS replies from your authoritative DNS server.

You can try to stop this behavior by setting a very low time to live (TTL) value for your DNS replies. Once the amount of time specified in your time to live entry elapses, the intervening DNS server should drop the IP address-to-host name mapping it has stored in its memory and ask your authoritative DNS server once again for the proper IP address.
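
The effect of caching and TTL expiry described above can be modeled with a toy resolver in Python. The class name CachingResolver and its parameters are hypothetical; the upstream callable stands in for the authoritative DNS server, and the clock is injected so the behavior can be observed without waiting.

```python
class CachingResolver:
    """Toy model of a client-side or intervening DNS cache with a TTL."""

    def __init__(self, upstream, ttl, clock):
        self.upstream = upstream  # callable: host name -> IP address
        self.ttl = ttl            # seconds an answer may be reused
        self.clock = clock        # callable returning the current time in seconds
        self.cache = {}

    def resolve(self, name):
        now = self.clock()
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0]  # still fresh: the upstream server is not asked
        address = self.upstream(name)
        self.cache[name] = (address, now + self.ttl)
        return address
```

While the cached entry is fresh, every lookup returns the same address, so the rotation offered by the authoritative server never reaches the client; only after the TTL elapses is the next address seen.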

But there's a problem: If everyone on the Internet started using a one-second DNS TTL value, every client computer on the Internet would effectively end up asking only the authoritative DNS server for the correct IP address when doing a host name-to-IP address resolution. This would circumvent the design of the DNS system whereby intervening DNS servers cache this information and thus reduce both the time it takes to resolve a host name and the amount of DNS traffic that has to be passed around on the Internet. (A very low TTL value for DNS entries means a more heavily loaded authoritative DNS server.[10])

Using round-robin DNS and this type of Heartbeat configuration for load sharing does, however, let you locate your two servers at two different physical locations. In the event of a true disaster, one of the servers would be able to take over both (or all) IP addresses and, once the routers on the Internet figured out where the IP addresses had moved to, client computers would eventually be able to connect to your web server at its new location.

Note 

This configuration is very susceptible to a split-brain condition. If exclusive access to resources is required, automatic failover mechanisms that require heartbeats to traverse a WAN or a public network should not be used.

Wide-Area Load Balancing

Now, what if the client computer on the Internet is much closer to one of the two web servers offering the same web page and it accidentally ends up with the IP address that happens to be on a server located on the other side of the world? It would make sense to offer your resources from the servers closest to the Internet client and only force the client to route to another server in the event of a failure or system crash.

This is called wide-area load balancing or globally distributed content and can be accomplished with an open source program for Linux called Super Sparrow. A server running Super Sparrow will examine the client's source address and determine whether the client computer would be better off talking to a different server with synchronized content that is closer to the client computer. If so, the client computer's request is redirected to the closest available server. (For more information, see the Super Sparrow website at http://www.supersparrow.org.)

[9]See Chapter 7, "Failover Configurations and Issues," in Blueprints for High Availability by Evan Marcus and Hal Stern.

[10]To prevent this problem from ever happening, many DNS clients and servers ignore small TTL values and use cached information anyway.

Operator Alerts: Audible Alarm

To cause an alarm to sound when the backup server has to take over for the primary server, use a haresources entry like this:

 primarynode AudibleAlarm::primarynode
 

This entry says that the host named primarynode should normally "own" the resource AudibleAlarm, but that the AudibleAlarm should never sound on the primary server. The AudibleAlarm resource, or script, allows you to specify a list of host names that should never sound an alarm. In this case, we are telling the AudibleAlarm script not to run or sound on the primarynode. When a failover occurs, Heartbeat running on the backup server will sound the alarm (an audible beep once every second).

Note 

These haresources entries rely on the scripts /etc/ha.d/resource.d/AudibleAlarm and /etc/ha.d/resource.d/MailTo. These scripts are located in the chapter8 subdirectory on the CD-ROM. You can also download them from the Linux-ha CVS repository (see the heartbeat/resource.d directory for additional scripts).

The AudibleAlarm script can also be modified to flash the floppy drive light if you have installed the fdutils package, which contains a utility called floppycontrol. You can easily download and compile the fdutils package (download the tar file from http://fdutils.linux.lu, then run ./configure, then make) and uncomment the lines in the /etc/ha.d/resource.d/AudibleAlarm script to make the floppy drive light flash.

Operator Alerts: Email Alerts

To send an email alert, use the MailTo resource script with a haresources entry like this:

 primarynode MailTo::operator@mailhost.com,root@mailhost.com
 

or to specify a subject line for an email alert, use:

 primarynode MailTo::operator@mailhost.com,root@mailhost.com::Mysubject
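
How the MailTo arguments map onto an email message can be sketched in Python. This is not the MailTo script itself: the function name mailto_message, the default subject, and the wording of the message are all assumptions made for illustration.

```python
from email.message import EmailMessage


def mailto_message(resource_field, action, node):
    """Build an alert email from a 'MailTo::addr1,addr2[::subject]' field."""
    parts = resource_field.split("::")
    recipients = parts[1].split(",")
    # Hypothetical default subject, used when none is given.
    subject = parts[2] if len(parts) > 2 else "Heartbeat alert"
    message = EmailMessage()
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"{subject}: {action} on {node}"
    message.set_content(f"Resource action '{action}' was requested on {node}.")
    return message
```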
 

Heartbeat Maintenance

Thanks to the fact that Heartbeat resource scripts are called by the heartbeat daemon with start, stop, or status requests, you can restart a resource without causing a cluster transition event. For example, say your Apache web server daemon is running on the primary.mydomain.com web server, and the backup.mydomain.com server is not running anything; it is waiting to offer the web server resource in the event of a failure of the primary computer. If you needed to make a change to your httpd.conf file (on both servers!) and you wanted to stop and restart the Apache daemon on the primary computer, you would not want this to cause Heartbeat to start offering the service on the backup computer. Fortunately, you can run the /etc/init.d/httpd restart command (or /etc/init.d/httpd stop followed by the /etc/init.d/httpd start command) without causing any change to the cluster status as far as Heartbeat is concerned.

Thus, you can safely stop and restart all of the cluster resources Heartbeat has been asked to manage, with perhaps the exception of filesystems, without causing any change in resource ownership or causing a cluster transition event. Of course, many daemons also recognize the SIGHUP signal (sent with kill -HUP <process-ID-number>), so you can force a resource daemon to reload its configuration files after making a change without stopping and restarting it.

Again, in the case of the Apache httpd daemon, if you change the httpd.conf file and want to notify the running daemons of the change, you would send them the SIGHUP signal with the following command:

 #kill -HUP `cat /var/run/httpd.pid`
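
The same operation can be scripted in Python, which is convenient when the reload is part of a larger maintenance script. The function names read_pid and reload_daemon are hypothetical, and the default path assumes the PID file location used in the command above.

```python
import os
import signal


def read_pid(pid_file="/var/run/httpd.pid"):
    """Read the parent process ID that the daemon wrote to its PID file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def reload_daemon(pid_file="/var/run/httpd.pid"):
    """Send SIGHUP to the daemon, the equivalent of kill -HUP `cat pid_file`."""
    os.kill(read_pid(pid_file), signal.SIGHUP)
```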
 
 
Note 

The file containing the httpd parent process ID number is controlled by the PidFile entry in the httpd.conf file (this file is located in the /etc/httpd/conf directory on Red Hat Linux systems).

Changing Heartbeat Configuration Files

If you need to make a change to the Heartbeat configuration files /etc/ha.d/authkeys or /etc/ha.d/ha.cf, you can force the running heartbeat daemon to reload these configuration files with the following command:

 #/etc/init.d/heartbeat reload
 

or

 #service heartbeat reload
 

When you change a haresources file, you need to restart Heartbeat on both the primary and the backup server to make your changes take effect (the reload option will not work).
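
The rule above, reload for ha.cf and authkeys but restart for haresources, is easy to encode in a deployment script. The following Python sketch uses a hypothetical function name and return values and simply maps a changed file to the action this section calls for.

```python
import os

RELOADABLE = {"ha.cf", "authkeys"}


def action_for_change(path):
    """Return the Heartbeat action needed after editing a configuration file."""
    name = os.path.basename(path)
    if name in RELOADABLE:
        return "reload"
    if name == "haresources":
        # The reload option does not pick up haresources changes.
        return "restart on both servers"
    raise ValueError(f"not a Heartbeat configuration file: {path}")
```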

Server Maintenance and the Heartbeat auto_failback Option

Normally, when the primary server crashes and the backup server takes ownership of a resource, the backup server will only hold this resource until the primary server comes back up. Once the primary server is up and running again, the backup server will release the resource and the primary server will assume ownership once again; it will start the resource script and start offering the service to client computers. This is the default heartbeat failback configuration.

To modify this Heartbeat behavior, add the following line before the node entries in your /etc/ha.d/ha.cf file (through version 1.1.2 of Heartbeat):

 nice_failback on
 

For Heartbeat versions 1.1.2 and later, the syntax is more intuitively obvious:

 auto_failback off
 

These options tell Heartbeat to leave the resource on the backup server even after the primary server comes back on line. Make this change to the ha.cf file on both heartbeat servers and then issue the following command on both servers to tell Heartbeat to re-read its configuration files:

 #/etc/init.d/heartbeat reload
 

You should see a message like the following in the /var/log/messages file:

 heartbeat[1032]: info: nice_failback is in effect.
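
The difference between the default failback behavior and the behavior with failback disabled can be seen in a small Python simulation. This is a simplified two-node model with hypothetical event names, not Heartbeat's state machine.

```python
def simulate_ownership(events, primary, backup, auto_failback=True):
    """Track which node owns the resource after each event."""
    owner = primary
    history = []
    for event in events:
        if event == "primary_down":
            owner = backup
        elif event == "primary_up" and auto_failback:
            owner = primary  # default behavior: the primary reclaims the resource
        history.append(owner)
    return history
```

With auto_failback left at its default, the resource returns to the primary as soon as it recovers; with failback disabled, it stays on the backup server.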
 

This configuration is useful when you want to perform system maintenance tasks that require you to reboot the primary server. When you take