Термин cluster в основном используется для описания распределенной вычислительной системы. Gregory Pfister в своей книге от 1997 года, В поисках кластера,^[1] около 500 страниц посвятил описанию этой проблемы. Он дал следующее определение :

Кластер - это такой тип параллельной вычислительной системы,который :

В дальнейшем я буду использовать термин Linux Enterprise Cluster для описания его свойств и архитектуры.

^[1]This book (and the artwork on its cover) inspired the Microsoft Wolfpack product name.

^[2]Such as the Distributed Lock Manager discussed in Chapter 16.

^[3]MPI is a library specification that allows programmers to develop applications that can share messages even when the applications sharing the messages are running on different nodes. A carefully written application can thus exploit multiple computer nodes to improve performance.

^[4]This is assuming that they were already written for a multiuser operating system environment where multiple instances of the same application can run at the same time.

Architecture of the Linux Enterprise Cluster

Let's use Pfister's statement that all clusters should act like "a single unified computing resource" to describe the architecture of an enterprise cluster. One example of a unified computing resource is a single computer, as shown in Figure 1.

Figure 1: Simplified architecture of a single computer

If we replace the CPU in Figure 1 with "a collection of interconnected whole computers," the diagram could be redrawn as shown in Figure 2.

Figure 2: Simplified architecture of an enterprise cluster

The load balancer replaces the input devices. The shared storage device replaces a disk drive, and the print server is just one example of an output device. Let's briefly examine each of these before we describe how to build them.

The Load Balancer

The load balancer sits between the users of the cluster and the "whole computers," which are the nodes that make up the cluster. The load balancer decides how best to distribute the incoming workload across all of the nodes.

In an enterprise cluster, user-based transactions make up the incoming workload. As users make connections to the cluster, such as HTTP web requests or telnet connections, they are assigned to cluster nodes based upon a load-balancing scheme (see Chapter 11).

The Shared Storage Device

The shared storage device acts as the single repository for the enterprise cluster's data, just as a disk drive does inside a single computer. Like a single disk drive in a computer, one of the most important features of this storage device is its ability to arbitrate access to the data so that two programs cannot modify the same piece of data at the same time. This feature is especially important in an enterprise cluster because two or more programs may be running on different cluster nodes at the same time. An enterprise cluster must therefore use a filesystem on all cluster nodes that can perform lock arbitration. That is, the filesystem must be able to protect the data stored on the shared storage device in order to prevent two applications from thinking they each have exclusive access to the same data at the same time (see Chapter 16).

The Print Server

The output device in this example is just one of many possible cluster output devices—a shared print server. A shared print server and other servers that offer services to all cluster nodes are located outside the cluster to arbitrate contention for things like printers, fax lines, and so on. Servers that arbitrate contention for output devices do not know they are servicing cluster nodes and can continue to use the classic client-server model for network communication. As we'll see in the next section, all of the server functions required to service the cluster nodes can be consolidated on to a single highly available server pair.

The Cluster Node Manager

Servers outside the cluster may do a variety of other things as well, such as:

Act as license servers for applications running inside the cluster.
Act as the central user account database.
Provide other services such as DNS and email to all cluster nodes.
Monitor the health of the cluster.

Again, the applications running on these servers outside the cluster should not need to be modified to support the nodes inside the cluster. In other words, the applications offering services to clients that are cluster nodes need not be cluster aware.

These functions can be combined on a high-availability server pair that we will call the cluster node manager. The cluster node manager holds the master user account database (and uses NIS or LDAP to distribute user accounts to all nodes^[5]). The cluster node manager monitors the health of the cluster (using the Mon and Ganglia packages that will be discussed in Part 4); it is also a central print spooler (using LPRng). The cluster node manager can also offer a variety of other services to the cluster using the classic client-server model, such as the ability to send faxes.^[6]

^[5]The Webmin and OSCAR packages both have methods of distributing user accounts to all cluster nodes that do not rely on NIS or LDAP, but these methods can still use the cluster node manager as the central user account database.

^[6]A brief discussion of Hylfax is included near the end of Chapter 17.

No Single Point of Failure

We can further describe the architecture of the enterprise cluster by discussing a basic requirement of any mission-critical system: it must have no single point of failure.

An enterprise cluster should always have the following characteristic: Any computer within the cluster, or any computer the cluster depends upon for normal operation, can be rebooted without rebooting the entire cluster.

One way to be able to reboot servers the cluster depends upon without affecting the cluster is to build high-availability server pairs for all of the servers the cluster depends upon for normal operation. Our simplified cluster drawing could be redrawn as shown in Figure 3.

Figure 3: Enterprise cluster with no single point of failure

This figure shows two load balancers, two print servers, and two shared storage devices servicing four cluster nodes.

In Part II of this book, we will learn how to build high-availability server pairs using the Heartbeat package. Part III describes how to build a highly available load balancer and a highly available cluster node manager. (Recall from the previous discussion that the cluster node manager can be a print server for the cluster.)

Note

Of course, one of the cluster nodes may also go down unexpectedly or may no longer be able to perform its job. If this happens, the load balancer should be smart enough to remove the failed cluster node from the cluster and alert the system administrator. The cluster nodes should be independent of each other, so that aside from needing to do more work, they will not be affected by the failure of a node.

In Conclusion

I've introduced a definition of the term cluster that I will use throughout this book as a logical model for understanding how to build one. The definition I've provided in this introduction (the four basic properties of a Linux Enterprise Cluster) is the foundation upon which the ideas in this book are built. After I've introduced the methods for implementing these ideas, I'll conclude the book with a final chapter that provides an introduction to the physical model that implements these ideas. You can skip ahead now and read this final chapter if you want to learn about this physical model, or, if you just one to get a cluster up and running you can start with the instructions in Part II. If you don't have any experience with Linux you will probably want to start with Part I, and read about the basic components of the GNU/Linux operating system that make building a Linux Enterprise Cluster possible.

Primer

Overview

This book could have been called The GNU/Linux Enterprise Cluster. Here is what Richard Stallman, founder of the GNU Project, has to say on this topic:

Mostly, when people speak of "Linux clusters" they mean one in which the GNU/Linux system is running. They think the whole system is Linux, so they call it a "Linux cluster." However, the right term for this would be "GNU/Linux cluster."

The accurate meaning for "Linux cluster" would be one in which the kernel is Linux. The identity of the kernel is relevant for some technical development and system administration issues. However, what users and user programs see is not the kernel, but rather the rest of the system. The only reason that Linux clusters are similar from the user's point of view is that they are really all GNU/Linux clusters.

A "free software cluster" (I think that term is clearer than "free cluster") would be one on which only free software is running. A GNU/Linux cluster can be a free software cluster, but isn't necessarily one.

The basic GNU/Linux system is free software; we launched the development of the GNU system in 1984 specifically to have a free software operating system. However, most of the distributions of GNU/Linux add some non-free software to the system. (This software usually is not open-source either.) The distributors describe these programs as a sort of bonus, and since most of the users have never encountered any other way to look at the matter, they usually go along with that view. As a result, many of the machines that run GNU/ Linux (whether clusters or not) have non-free software installed as well. Thus, we fail to achieve our goal of giving computer users freedom.

Before I describe how to build a GNU/Linux Enterprise Cluster, let me define a few basic terms that I'll use throughout the book to describe one.

High Availability Terminology

I'll start with the terminology used to describe high availability. I'm not going to split hairs at this point—I'll do that later, when I explore each concept in depth. The important thing here is that I introduce a vocabulary that allows us to talk about multiple computers working together to accomplish a task.

When a program runs, it is called a process. A process running on a Linux system is called a daemon. A daemon and the effects it produces is called a service. A service is called a resource when it is combined with its operating environment (configuration files, data, network mechanism used to access it, and so forth). A failover occurs when a resource moves from one computer to another. A proper failover or high-availability configuration has no single point of failure.^[1]

^[1]I'm departing from the Red Hat cluster terminology here, but I am using these terms as they are defined by Linux-ha advocates (see http://www.linux-ha.org).

Linux Enterprise Cluster Terminology

Now that I've introduced the terminology used to describe high availability, I'd like to introduce a few terms that allow us to describe a GNU/Linux Enterprise Cluster.

In a cluster, all computers (called nodes) offer the same services. The cluster load balancer intercepts all incoming requests for services.^[2] It then distributes the requests across all of the cluster nodes as evenly as possible. A high-availability cluster is able to failover the cluster load balancing resource from one computer to another.

This book describes how to build a high-availability, load-balancing cluster using the GNU/Linux operating system and several free software packages. All of the recipes in this book describe how to build configurations that have been used by the author in production. (Some projects and recipes considered for use in this book did not make it into the final version because they proved too unreliable or too immature for use in production.)

^[2]To be technically correct here, I need to point out that you can build a cluster without one machine acting as a load balancer for the cluster—see http://www.vitaesoft.com for one method of accomplishing this.

Раздел 1: Запуск сервисов

In this chapter we will examine the basic boot procedure used on a Linux server and how to disable or enable a service. This chapter includes a list of services normally found on a Linux server and describes how these services are used on the nodes inside a cluster.

How do Cluster Services Get Started?

In a cluster environment, a daemon or service is started:

By the init daemon when the system boots or enters a runlevel.
By Heartbeat when it first starts or needs to take ownership of a resource.
By xinetd when a network request comes in.
Using some other non-standard method (such as the Daemontools^[1] package).

Note

For the moment, we are leaving batch jobs out of this discussion (because batch jobs are normally not considered "cluster services" that are offered to client computers). To learn more about batch jobs in a cluster environment, see Chapter 18.

Which of these methods should be used depends on the answers to the following questions:

Should the service run at all times or only when network requests come in?
Should the service failover to a backup server when the primary server crashes?
Should the service be restarted automatically if it stops running or ends abnormally?

If the service needs to run only as a result of an incoming network request (telnet and pop are good examples of this type of service), then it should be started by xinetd. However, the majority of services running on a Linux server, and on the nodes inside a cluster, will run all the time and should therefore be started by init when the system first boots (hence the name "init").

If a service should continue to run even when the system it is running on crashes (the LPRng lpd printing daemon running on a print server, for example), then the Heartbeat program should start the service (described in Part II of this book).

In this chapter I'll focus on how init works and later, in Part II, I'll describe how to start services using Heartbeat. Experienced Linux and Unix system administrators will find the first part of this chapter to be a repeat of concepts they are already familiar with and may want to skip to the section titled "Using the Red Hat init Scripts on Cluster Nodes."

^[1]The Daemontools package is described in Chapter 8.

Starting Services with init

Once your boot loader has finished the initial steps to get your kernel up and running, the init daemon—the parent of all processes—is started. Then init launches the daemons necessary for normal system operation based on entries in its /etc/inittab configuration file.

The /etc/inittab File

The heart of the /etc/inittab configuration file is the following seven lines that define seven system runlevels:

 l0:0:wait:/etc/rc.d/rc 0
 l1:1:wait:/etc/rc.d/rc 1
 l2:2:wait:/etc/rc.d/rc 2
 l3:3:wait:/etc/rc.d/rc 3
 l4:4:wait:/etc/rc.d/rc 4
 l5:5:wait:/etc/rc.d/rc 5
 l6:6:wait:/etc/rc.d/rc 6

Note

The system can be at only one runlevel at a time.

These lines indicate that the rc program (the /etc/rc.d/rc script) should run each time the runlevel of the system changes and that init should pass a single character argument consisting of the runlevel number (0 through 6) to the rc program. The rc program then launches all of the scripts that start with the letter S in the corresponding runlevel subdirectory underneath the /etc/rc.d directory.

The default runlevel, the runlevel for normal server operation, is defined in the /etc/inittab file on the initdefault line:

 id:3:initdefault:

Note

Use the runlevel command to see the current runlevel.

The default runlevel for a Red Hat Linux server that is not running a graphical user interface is runlevel 3. This means the init daemon runs the rc program and passes it an argument of 3 when the system boots. The rc program then runs each script that starts with the letter S in the /etc/rc.d/rc3.d, passing them the argument start.

Later, when the system administrator issues the shutdown command, the init daemon runs the rc program again, but this time the rc program runs all of the scripts in the /etc/rc.d/rc3.d directory that start with the letter K (for"kill") and passes them the argument stop.

On Red Hat, you can use the following command to list the contents of all of the runlevel subdirectories:

 #ls -l /etc/rc.d/rc?.d | less

Note

The question mark in this command says to allow any character to match. So this command will match on each of the subdirectories for each of the runlevels: rc0.d, rc1.drc2.d, and so forth.

The start and kill scripts stored in these subdirectories are, however,not real files. They are just symbolic links that point to the real script files stored in the /etc/rc.d/init.d directory. All of the start and stop init scripts are, therefore, stored in one directory (/etc/rc.d/init.d), and you control which services run at each runlevel by creating or removing a symbolic link within one of the runlevel subdirectories.

Fortunately, software tools are available to help manage these symbolic links. We will explain how to use these tools in a moment, but first, let's examine one way to automatically restart a daemon when it dies by using the respawn option in the /etc/inittab file.

Respawning Services with init

You can cause the operating system to automatically restart a daemon when it dies by placing the name of the executable program file in the /etc/inittab file and adding the respawn option. init will start the daemon as the system enters the runlevel and then watch to make sure the daemon stays running (restarting the daemon, if it dies), as long as the system remains at the same runlevel.

A sample /etc/inittab entry using the respawn option looks like this:

 sn:2345:respawn:/usr/local/scripts/start_snmpd > /dev/null

This entry tells the init daemon to run the script /usr/local/scripts/start_snmpd at runlevels 2, 3, 4, and 5, and to send any output produced to the /dev/null "device" (the output is discarded).

Notice, however, that we have entered a script file here (/usr/local/scripts/start_snmpd) and not the actual snmpd executable. How will init know which running process to monitor if this script simply runs the snmpd daemon and then ends? init will assume the program has died each time the script /usr/local/scripts/start_snmpd finishes running, and it will dutifully run the script again.

Note

When this happens, you are likely to see an error message from init saying the sn identifier (referring to the first field in the /etc/inittab file) is respawning too fast and init as decided to give up trying for a few minutes.

So to make sure the respawn option works correctly, we need to make sure the script used to start the snmpd daemon replaces itself with the snmpd daemon. This is easy to do with the bash programming statement exec. So the simple bash script /usr/local/scripts/start_snmpd looks like this:

 #!/bin/bash
 exec /usr/sbin/snmpd -s -P /var/run/snmpd -l /dev/null

The first line of this script starts the bash shell, and the second line causes whatever process identification number, or PID, is associated with this bash shell to be replaced with the snmpd daemon (the bash shell disappears when this happens). This makes it possible for init to run the script, find out what PID it is assigned, and then monitor this PID. If init sees the PID go away (meaning the snmpd daemon died), it will run the script again and monitor the new PID thanks to the respawn entry in the /etc/inittab file.^[2]

Managing the init Script Symbolic Links with chkconfig

The chkconfig program creates and removes the symbolic links used by init's servant, the /etc/rc.d/rc script. These symbolic links are the real way to enable or prevent a service from automatically running at a particular runlevel. As previously noted, all of the symbolic links for all system runlevels point back to the real scripts stored in the /etc/rc.d/init.d directory (actually, according to the Linux Standards Base,^[3] this directory is the /etc/init.d directory, but on Red Hat, these two directories point to the same files^[4]).

If you want a service (called myservice) to run at runlevels 2, 3, and 4, this means you want chkconfig to create three symbolic links in each of the following runlevel subdirectories:

     /etc/rc.d/rc2.d
     /etc/rc.d/rc3.d
     /etc/rc.d/rc4.d

These symbolic links should start with the letter S to indicate you want to start your script at each of these runlevels. chkconfig should then create symbolic links that start with the letter K (for "kill") in each of the remaining /etc/rc.d/rc<runlevel>.d directories (for runlevels 0, 1, 5, and 6).

To tell chkconfig you want it to do this for your script, you would add the following lines to the script file in the /etc/rc.d/init.d directory:

 # chkconfig: 234 99 90
 # description: myscript runs my daemon at runlevel 2, 3, or 4

You can then simply run the chkconfig program, which interprets this line to mean "create the symbolic links necessary to run this script at runlevel 2, 3, and 4 with the start option, and create the symbolic links needed to run this script with the kill option at all other runlevels."

The numbers 99 and 90 in this example are the start and stop priority of this script. When a symbolic link is created in one of the runlevel subdirectories, it is also given a priority number used by the rc script to determine the order it will use to launch (or kill) the scripts it finds in each of the runlevel subdirectories. A lower priority number after the S in the symbolic link name means the script should run before other scripts in the directory. For example, the script to bring up networking, /etc/rc.d/rc3.d/S10network, runs before the script to start the Heartbeat program /etc/rc.d/rc3.d/S34heartbeat.

The second commented line in this example contains a (required) description for this script. The chkconfig commands to first remove any old symbolic links that represent an old chkconfig entry and then add this script (it was saved in the file /etc/rc.d/init.d/myscript) are:

 chkconfig --del myscript
 chkconfig --add myscript

Note

Always use the --del option before using the --add option when using chkconfig to ensure you have removed any old links.

The chkconfig program will then create the following symbolic links to your file:

 /etc/rc.d/rc0.d/K90myscript
 /etc/rc.d/rc1.d/K90myscript
 /etc/rc.d/rc2.d/S99myscript
 /etc/rc.d/rc3.d/S99myscript
 /etc/rc.d/rc4.d/S99myscript
 /etc/rc.d/rc5.d/K90myscript
 /etc/rc.d/rc6.d/K90myscript

Again, these symbolic links just point back to the real file located in the /etc/init.d directory (from this point forward, I will use the LSB directory name /etc/init.d instead of the /etc/rc.d/init.d directory originally used on Red Hat systems) throughout this book. If you need to modify the script to start or stop your service, you need only modify it in one place: the /etc/ init.d directory.

You can see all of these symbolic links (after you run the chkconfig command to add them) with the following single command:

 #find /etc -name '*myscript' -print

You can also see a report of all scripts with the command:

 #chkconfig --list

Managing the init Script Symbolic Links with ntsysv

The ntsysv command relies on these same commented chkconfig lines in each of the scripts in the /etc/rc.d/init.d directory to manage the symbolic links to start and stop scripts at each runlevel.

Normally, ntsysv only manages the currently running level of the system (not very useful), so you will probably want to tell it to manage all of the runlevels by entering the command:

 #ntsysv --level 0123456

This will bring up a list of all of the scripts in the /etc/rc.d/init.d directory. By adding an asterisk next to the script name in this listing (and selecting OK), you tell the ntsysv program to create all of the symbolic links just as if you had run the chkconfig --add <scriptname> command. If you remove the asterisk next to the script name, the symbolic links are removed just as if you had entered the chkconfig --del <scriptname> command.

Note

chkconfig and ntsysv commands do not start or stop daemons (neither command actually runs the scripts). They only create or remove the symbolic links that affect the system the next time it enters or leaves a runlevel (or is rebooted). To start or stop scripts, you can enter the command service <script-name> start or service <script-name> stop, where <script-name> is the name of the file in the /etc/init.d directory.

Removing Services You do Not Need

For each script you do not need (see the next section for a description), you can disable or remove the services by doing the following:

Kill the currently running daemon (you can skip this step and simply reboot the system when you are finished):

 #/etc/init.d/<script-name> stop

Or on Red Hat systems, the following command works just as well:

 #service <script-name> stop

Remove symbolic links to the script that cause it to start or stop at the predefined chkconfig runlevels with the command:

 #chkconfig --del <script-name>

 #ntsysv --level 0123456

Then, inside of the ntsysv utility, remove the * next to each script name to cause it not to run at boot time.

^[2]Note that the snmpd daemon also supports a -f argument that accomplishes the same thing as the exec statement in this bash script. See the snmpd man page for more information.

^[3]See http://www.linuxbase.org/spec for the latest revision to the LSB.

^[4]Because /etc/init.d is a symbolic link to /etc/rc.d/init.d on current Red Hat distributions.

Using the Red Hat init Scripts on Cluster Nodes

This section contains a description of the init scripts used on a normal Red Hat system along with a discussion of how the services started by the init script are used, or can be used, in a Linux Enterprise Cluster. Note that you may not have all of these scripts on your system due to a difference in the Red Hat release you are using or because you do not have the service installed.

aep1000

Supports the Accelerated Encryption Processing card to speed Secure Socket Layer (SSL) encryption. Not used to build the cluster in this book.

anacron

Used like cron and at to start commands at a particular time, even on systems that are not running continuously. On Red Hat, the /etc/anacrontab file lists the normal log rotation cron job that is executed daily. If you disable this, you will no longer perform log rotation, and your system disk will eventually fill up. In Chapters 18 and 19, we will describe two methods of running cron jobs on all cluster nodes without running the crond daemon on all nodes (the cron daemon runs on a single node and remotely initiates or starts the anacron program once a day on each cluster node).

apmd

Used for advanced power management. You only need this if you have an uninterruptible power supply (UPS) system connected to your system and want it to automatically shut down in the event of a power failure before the battery in your UPS system runs out. In Chapter 9, we'll see how the Heartbeat program can also control the power supply to a device through the use of a technology called STONITH (for "shoot the other node in the head").

arpwatch

Used to keep track of IP address-to-MAC address pairings. Normally, you do not need this daemon. As a side note, the Linux Virtual Server Direct Routing clusters (as described in Chapter 13) must contend with potential Address Resolution Protocol (ARP) problems introduced through the use of the cluster load-balancing technology, but arpwatch does not help with this problem and is normally not used on cluster nodes.

atd

Used like anacron and cron to schedule jobs for a particular time with the at command. This method of scheduling jobs is used infrequently, if at all. In Chapter 18, we'll describe how to build a no-single-point-of-failure batch scheduling mechanism for the cluster using Heartbeat, the cron daemon, and the clustersh script.

autofs

Used to automatically mount NFS directories from an NFS server. This script only needs to be enabled on an NFS client, and only if you want to automate mounting and unmounting NFS drives. In the cluster configuration described in this book, you will not need to mount NFS file systems on the fly and will therefore not need to use this service (though using it gives you a powerful and flexible means of mounting several NFS mount points from each cluster node on an as-needed basis.) If possible, avoid the complexity of the autofs mounting scheme and use only one NFS-mounted directory on your cluster nodes.

bcm5820

Support for the Broadcom Cryponet BCM5820 chip for speeding SSL communication. Not used in this book.

crond

Used like anacron and atd to schedule jobs. On a Red Hat system, the crond daemon starts anacron once a day. In Chapter 18, we'll see how you can build a no-single-point-of-failure batch job scheduling system using cron and the open source Ganglia package (or the clustersh script provided on the CD-ROM included with this book). To control cron job execution in the cluster, you may not want to start the cron daemon on all nodes. Instead, you may want to only run the cron daemon on one node and make it a high-availability service using the techniques described in Part II of this book. (On cluster nodes described in this book, the crond daemon is not run at system start up by init. The Heartbeat program launches the crond daemon based on an entry in the /etc/ ha.d/haresources file. Cron jobs can still run on all cluster nodes through remote shell capabilities provided by SSH.)

cups

The common Unix printing system. In this book, we will use LPRng instead of cups as the printing system for the cluster. See the description of the lpd script later in this chapter.

firstboot

Used as part of the system configuration process the first time the system boots.

functions

Contains library routines used by the other scripts in the /etc/init.d directory. Do not modify this script, or you will risk corrupting all of the init scripts on the system.

gpm

Makes it possible to use a mouse to cut and paste at the text-based console. You should not disable this service. Incidentally, cluster nodes will normally be connected to a KVM (keyboard video mouse) device that allows several servers to share one keyboard and one mouse. Due to the cost of these KVM devices, you may want to build your cluster nodes without connecting them to a KVM device (so that they boot without a keyboard and mouse connected).^[5]

halt

Used to shut the system down cleanly.

httpd

Used to start the apache daemon(s) for a web server. You will probably want to use httpd even if you are not building a web cluster to help the cluster load balancer (the ldirectord program) decide when a node should be removed from the cluster. This is the concept of monitoring a parallel service discussed in Chapter 15.

identd

Used to help identify who is remotely connecting to your system. The theory sounds good, but you probably will never be saved from an attack by identd. In fact, hackers will attack the identd daemon on your system. Servers inside an LVS-NAT cluster also have problems sending identd requests back to client computers. Disable identd on your cluster nodes if possible.

ipchains or iptables

The ability to manipulate packets with the Linux kernel is provided by the ipchains (for kernels 2.2 and earlier) or iptables (for kernels 2.4 and later). This script on a Red Hat system runs the iptables or ipchains commands you have entered previously (and saved into a file in the /etc/ sysconfig directory). For more information, see Chapter 2.

irda

Used for wireless infrared communication. Not used to build the cluster described in this book.

isdn

Used for Integrated Services Digital Networks communication. Not used in this book.

kdcrotate

Used to administer Kerberos security on the system. See Chapter 19 for a discussion of methods for distributing user account information.

keytable

Used to load the keyboard table.

killall

Used to help shut down the system.

kudzu

Probes your system for new hardware at boot time—very useful when you change the hardware configuration on your system, but increases the amount of time required to boot a cluster node, especially when it is disconnected from the KVM device.

lisa

The LAN Information Server service. Normally, users on a Linux machine must know the name or address of a remote host before they can connect to it and exchange information with it. Using lisa, a user can perform a network discovery of other hosts on the network similar to the way Windows clients use the Network Neighborhood. This technology is not used in this book, and in fact, may confuse users if you are building the cluster described in this book (because the LVS-DR cluster nodes will be discovered by this software).

lpd

This is the script that starts the LPRng printing system. If you do not see the lpd script, you need to install the LPRng printing package. The latest copy of LPRng can be found at http://www.lprng.com. On a standard Red Hat installation, the LPRng printing system will use the /etc/printcap.local file to create an /etc/printcap file containing the list of printers. Cluster nodes (as described in this book) should run the lpd daemon. Cluster nodes will first spool print jobs locally to their local hard drives and then try (essentially, forever) to send the print jobs to a central print spooler that is also running the lpd daemon. See Chapter 19 for a discussion of the LPRng printing system.

netfs

Cluster nodes, as described in this book, will need to connect to a central file server (or NAS device) for lock arbitration. See Chapter 16 for a more detailed discussion of NFS. Cluster nodes will normally need this script to run at boot time to gain access to data files that are shared with the other cluster nodes.

network

Required to bring up the Ethernet interfaces and connect your system to the cluster network and NAS device. The Red Hat network configuration files are stored in the /etc/sysconfig directory. In Chapter 5, we will describe the files you need to look at to ensure a proper network configuration for each cluster node after you have finished cloning. This script uses these configuration files at boot time to configure your network interface cards and the network routing table on each cluster node.^[6]

nfs

Cluster nodes will normally not act as NFS servers and will therefore not need to run this script.^[7]

nfslock

The Linux kernel will start the proper in-kernel NFS locking mechanism (rpc.lockd is a kernel thread called klockds) and rpc.statd to ensure proper NFS file locking. However, cluster nodes that are NFS clients can run this script at boot time without harming anything (the kernel will always run the lock daemon when it needs it, whether or not this script was run at boot time). See Chapter 16 for more information about NFS lock arbitration.

nscd

Helps to speed name service lookups (for host names, for example) by caching this information. To build the cluster described in this book, using nscd is not required.

ntpd

This script starts the Network Time Protocol daemon. When you configure the name or IP address of a Network Time Protocol server in the /etc/ntp.conf file, you can run this service on all cluster nodes to keep their clocks synchronized. As the clock on the system drifts out of alignment, cluster nodes begin to disagree on the time, but running the ntpd service will help to prevent this problem. The clock on the system must be reasonably accurate before the ntpd daemon can begin to slew, or adjust, it and keep it synchronized with the time found on the NTP server. When using a distributed filesystem such as NFS, the time stored on the NFS server and the time kept on each cluster node should be kept synchronized. (See Chapter 16 for more information.) Also note that the value of the hardware clock and the system time may disagree. To set the hardware clock on your system, use the hwclock command. (Problems can also occur if the hardware clock and the system time diverge too much.)

pcmcia

Used to recognize and configure pcmcia devices that are normally only used on laptops (not used in this book).

portmap

Used by NFS and NIS to manage RPC connections. Required for normal operation of the nfslocking mechanism and covered in detail in Chapter 16.

pppoe

For Asymmetric Digital Subscriber Line (ADSL) connections. If you are not using ADSL, disable this script.

pxe

Some diskless clusters (or diskless workstations) will boot using the PXE protocol to locate and run an operating system (PXE stands for Preboot eXection Environment) based on the information provided by a PXE server. Running this service will enable your Linux machine to act as a PXE server; however, building a cluster of diskless nodes is outside the scope of this book.^[8]

random

Helps your system generate better random numbers that are required for many encryption routines. The secure shell (SSH) method of encrypting data in a cluster environment is described in Chapter 4.

rawdevices

Used by the kernel for device management.

rhnsd

Connects to the Red Hat server to see if new software is available. How you want to upgrade software on your cluster nodes will dictate whether or not you use this script. Some system administrators shudder at the thought of automated software upgrades on production servers. Using a cluster, you have the advantage of upgrading the software on a single cluster node, testing it, and then deciding if all cluster nodes should be upgraded. If the upgrade on this single node (called aGolden Client in SystemImager terminology) is successful, it can be copied to all of the cluster nodes using the cloning process described in Chapter 5. (See the updateclient command in this chapter for how to update a node after it has been put into production.)

rstatd

Starts rpc.rstatd, which allows remote users to use the rup command to monitor system activity. System monitoring as described in this book will use the Ganglia package and Mon software monitoring packages that do not require this service.

rusersd

Lets other people see who is logged on to this machine. May be useful in a cluster environment, but will not be described in this book.

rwall

Lets remote users display messages on this machine with the rwall command. This may or may not be a useful feature in your cluster. Both this daemon and the next should only be started on systems running behind a firewall on a trusted network.

rwhod

Lets remote users see who is logged on.

saslauthd

Used to provide Simple Authentication and Security Layer (SASL) authentication (normally used with protocols like SMTP and POP). Building services that use SASL authentication is outside the scope of this book. To build the cluster described in this book, this service is not required.

sendmail

Will you allow remote users to send email messages directly to cluster nodes? For example, an email order-processing system requires each cluster node to receive the email order in order to balance the load of incoming email orders. If so, you will need to enable (and configure) this service on all cluster nodes. sendmail configuration for cluster nodes is discussed in Chapter 19.

single

Used to administer runlevels (by the init process).

smb

Provides a means of offering services such as file sharing and printer sharing to Windows clients using the package called Samba. Configuring a cluster to support PC clients in this manner is outside the scope of this book.

snmpd

Used for Simple Network Management Protocol (SNMP) administration. This daemon will be used in conjunction with the Mon monitoring package in Chapter 17 to monitor the health of the cluster nodes. You will almost certainly want to use SNMP on your cluster. If you are building a public web server, you may want to disable this daemon for security reasons (when you run snmpd on a server, you add an additional way for a hacker to try to break into your system). In practice, you may find that placing this script as a respawn service in the /etc/inittab file provides greater reliability than simply starting the daemon once at system boot time. (See the example earlier in this chapter—if you use this method, you will need to disable this init script so that the system won't try to launch snmpd twice.)

snmptrapd

SNMP can be configured on almost all network-connected devices to send traps or alerts to an SNMP Trap host. The SNMP Trap host runs monitoring software that logs these traps and then provides some mechanism to alert a human being to the fact that something has gone wrong. Running this daemon will turn your Linux server into an SNMP Trap server, which may be desirable for a system sitting outside the cluster, such as the cluster node manager.^[9] A limitation of SNMP Traps, however, is the fact that they may be trying to indicate a serious problem with a single trap, or alert, message and this message may get lost. (The SNMP Trap server may miss its one and only opportunity to hear the alert.) The approach taken in this book is to use a server sitting outside the cluster (running the Mon software package) to poll the SNMP information stored on each cluster node and raise an alert when a threshold is violated or when the node does not respond. (See Chapter 17.)

squid

Provides a proxy server for caching web requests (among other things). squid is not used in the cluster described in this book.

sshd

Open SSH daemon. We will use this service to synchronize the files on the servers inside the cluster and for some of the system cloning recipes in this book (though using SSH for system cloning is not required).

syslog

The syslog daemon logs error messages from running daemons and programs based on configuration entries in the /etc/syslog.conf file. The log files are kept from growing indefinitely by the anacron daemon, which is started by an entry in the crontab file (by cron). See the logrotate man page. Note that you can cause all cluster nodes to send their syslog entries to a single host by creating configuration entries in the /etc/syslog.conf file using the @hostname syntax. (See the syslog man page for examples.) This method of error logging may, however, cause the entire cluster to slow down if the network becomes overloaded with error messages. In this book, we will use the default syslog method of sending error log entries to a locally attached disk drive to avoid this potential problem. (We'll proactively monitor for serious problems on the cluster using the Mon monitoring package in Part IV of this book.)^[10]

tux

Instead of using the Apache httpd daemon, you may choose to run the Tux web server. This web server attempts to introduce performance improvements over the Apache web server daemon. This book will only describe how to install and configure Apache.

winbindd

Provides a means of authenticating users using the accounts found on a Windows server. This method of authentication is not used to build the cluster in this book. See Chapter 19 for a description of alternative methods available, and also see the discussion of the ypbind init script later in this chapter.

xfs

The X font server. If you are using only the text-based terminal (which is the assumption throughout this book) to administer your server (or telnet/ssh sessions from a remote machine), you will not need this service on your cluster nodes, and you can reduce system boot time and disk space by not installing any X applications on your cluster nodes. See Chapter 20 for an example of how to use X applications running on Thin Clients to access services inside the cluster.

xinetd^[11]

Starts services such as FTP and telnet upon receipt of a remote connection request. Many daemons are started by xinetd as a result of an incoming TCP or UDP network request. Note that in a cluster configuration you may need to allow an unlimited number of connections for services started by xinetd (so that xinetd won't refuse a client computer's request for a service) by placing the instances = UNLIMITED line in the /etc/ xinetd.conf configuration file. (You have to restart xinetd or send it a SIGHUP signal with the kill command for it to see the changes you make to its configuration files.)

ypbind

Only used on NIS client machines. If you do not have an NIS server,^[12] you do not need this service. One way to distribute cluster password and account information to all cluster nodes is to run an LDAP server and then send account information to all cluster nodes using the NIS system. This is made possible by a licensed commercial program from PADL software (http://www.padl.com) called the NIS/LDAP gateway (the daemon is called yplapd). Using the NIS/LDAP gateway on your LDAP server allows you to create simple cron scripts that copy all of the user accounts out of the LDAP database and install them into the local /etc/ passwd file on each cluster node. You can then use the /etc/nsswitch.conf file to point user authentication programs at the local /etc/passwd file, thus avoiding the need to send passwords over the network and reducing the impact of a failure of the LDAP (or NIS) server on the cluster. See the /etc/nsswitch.conf file for examples and more information. Changes to the /etc/xinetd.conf file are recognized by the system immediately (they do not require a reboot).

Note
These entries will not rely on the /etc/shadow file but will instead contain the encrypted password in the /etc/passwd file.

The cron job that creates the local passwd entries only needs to contain a line such as the following:
```
 ypcat passwd > /etc/passwd
 
```
This command will overwrite all of the /etc/passwd entries on the system with the NIS (or LDAP) account entries, so you will need to be sure to create all of the normal system accounts (especially the root user account) on the LDAP server.

An even better method of applying the changes to each server is available through the use of the patch and diff commands. For example, a shell script could do the following:
```
    ypcat passwd > /tmp/passwd
    diff -e /etc/passwd /tmp/passwd > /tmp/passwd.diff
    patch -be /etc/passwd /tmp/passwd.diff
 
 
```
These commands use the ed editor (the -e option) to modify only the lines in the passwd file that have changed. Additionally, this script should check to make sure the NIS server is operating normally (if no entries are returned by the ypcat command, the local /etc/passwd file will be erased unless your script checks for this condition and aborts). This makes the program safe to run in the middle of the day without affecting normal system operation. (Again, note that this method does not use the /etc/shadow file to store password information.) In addition to the passwd database, similar commands should be used on the group and hosts files.

If you use this method to distribute user accounts, the LDAP server running yplapd (or the NIS server) can crash, and users will still be able to log on to the cluster. If accounts are added or changed, however, all cluster nodes will not see the change until the script to update the local /etc/passwd, group, and hosts files runs.

When using this configuration, you will need to leave the ypbind daemon running on all cluster nodes. You will also need to set the domain name for the client (on Red Hat systems, this is done in the /etc/sysconfig/network file).

yppasswd

This is required only on an NIS master machine (it is not required on the LDAP server running the NIS/LDAP gateway software as described in the discussion of ypbind in the previous list item). Cluster nodes will normally always be configured with yppasswd disabled.

ypserv

Same as yppasswd.

^[5]Most server hardware now supports this.

^[6]In the LVS-DR cluster described in Chapter 13, we will modify the routing table on each node to provide a special mechanism for load balancing incoming requests for cluster resources.

^[7]An exception to this might be very large clusters that use a few cluster nodes to mount filesystems and then re-export these filesystems to other cluster nodes. Building this type of large cluster (probably for scientific applications) is beyond the scope of this book.

^[8]See Chapter 10 for the reasons why. A more common application of the PXE service is to use this protocol when building Linux Terminal Server Project or LTSP Thin Clients.

^[9]See Part IV of this book for more information.

^[10]See also the mod_log_spread project at http://www.backhand.org/mod_log_spread.

^[11]On older versions of Linux, and some versions of Unix, this is still called inetd.

^[12]Usually used to distribute user accounts, user groups, host names, and similar information within a trusted network.

In Conclusion

This chapter has introduced you to the Linux Enterprise Cluster from the perspective of the services that will run inside it, on the cluster nodes. The normal method of starting daemons when a system boots on a Linux system using init scripts^[13] is also used to start the daemons that will run inside the cluster. The init scripts that come with your distribution can be used on the cluster nodes to start the services you would like to offer to your users. A key feature of a Linux Enterprise Cluster is that all of the cluster nodes run the same services; all nodes inside it are the same (except for their network IP address).

In Chapter 2 I lay the foundation for understanding the Linux Enterprise Cluster from the perspective of the network packets that will be used in the cluster, and in the Linux kernel that will sit in front of the cluster.

^[13]Or rc scripts.

Часть 2: Управление пакетами

Overview

When a packet from a client computer arrives on a Linux server, it must pass through the kernel before it is delivered to a daemon or service. If a service needs to failover from one server to another, the kernel's packet-handling methods must^[1] be the same on both servers. Resources can therefore be said to include kernel packet-handling methods and rules.

This chapter explains the techniques for changing the fate of network packets inside the Linux kernel. These techniques will provide you with the foundation for building a cluster load balancer in Part II of this book. The cluster load balancer receives all packets for the cluster and forwards them to the cluster nodes thanks to the (optional) code inside the kernel, called Netfilter.

This chapter also provides sample commands that demonstrate how you tell Netfilter to block packets before they reach the services on a Linux server; these commands can be used on the cluster load balancer and on all of the cluster nodes. These commands are normally not used when the cluster and its client computers sit behind a firewall, but they should be used on publicly accessible clusters (such as a web cluster) even when the cluster is protected by a firewall.

The Linux kernel can also affect the fate of a network packet by its ability to route packets. This ability is normally used on a server that has two network interface cards. Inbound packets from an untrusted network arrive on one interface card and are examined before they are allowed out the second interface card that is connected to the internal, or trusted, network.

To simplify this complex topic, we will use one network hardware configuration in all of the examples in this chapter. This hypothetical hardware configuration has two Ethernet network interface cards. The first is called eth0 and is on an untrusted, public network. The second is called eth1 and is connected to a private, trusted network. Later, in Part II of this book, we'll see how the private, trusted network can become a cluster.

^[1]Technically, they do not have to be exactly the same if the backup server is more permissive. They do, however, have to use the same packet marking rules if they rely on Netfilter's ability to mark packets. This is an advanced topic not discussed until Part III of this book.

Netfilter

Every packet coming into the system off the network cable is passed up through the protocol stack, which converts the electronic signals hitting the network interface card into messages the services or daemons can understand. When a daemon sends a reply, the kernel creates a packet and sends it out through a network interface card. As the packets are being processed up (inbound packets) and down (outbound packets) the protocol stack, Netfilter hooks into the Linux kernel and allows you to take control of the packet's fate.

A packet can be said to traverse the Netfilter system, because these hooks provide the ability to modify packet information at several points within the inbound and outbound packet processing. Figure 2-1 shows the five Linux kernel Netfilter hooks on a server with two Ethernet network interfaces called eth0 and eth1.

Figure 2-1: The five Netfilter hooks in the Linux kernel

The routing box at the center of this diagram is used to indicate the routing decision the kernel must make every time it receives or sends a packet. For the moment, however, we'll skip the discussion of how routing decisions are made. (See "Routing Packets with the Linux Kernel" later in this chapter for more information.)

For now, we are more concerned with the Netfilter hooks. Fortunately, we can greatly simplify things, because we only need to know about the three basic, or default, sets of rules that are applied by the Netfilter hooks (we will not directly interact with the Netfilter hooks). These three sets of rules, or chains, are called INPUT, FORWARD, and OUTPUT (written in lowercase on Linux 2.2 series kernels).

Note

This diagram does not show packets going out eth0.

Netfilter uses the INPUT, FORWARD, and OUTPUT lists of rules and attempts to match each packet passing through the kernel. If a match is found, the packet is immediately handled in the manner described by the rule without any attempt to match further rules in the chain. If none of the rules in the chain match, the fate of the packet is determined by the default policy you specify for the chain. On publicly accessible servers, it is common to set the default policy for the input chain to DROP^[2] and then to create rules in the input chain for the packets you want to allow into the system.

^[2]This used to be DENY in ipchains.

A Brief History of Netfilter

The Netfilter code changed dramatically from the 2.2 to the 2.4 series kernel (but, at least in terms of the iptables syntax, it did not change very much from the 2.4 to the 2.6 series). This means there is a big difference between ipchains and iptables. For historical purposes, we will briefly examine both in this section.

Note

You can use ipchains on 2.4 kernels, but not in conjunction with the Linux Virtual Server, so this book will describe ipchains only in connection with 2.2 series kernels.

Figure 2-2 shows the ipchains packet matching for the input, forward, and output chains in Linux 2.2 series kernels.

Figure 2-2: ipchains in the Linux kernel

Notice how a packet arriving on eth0 and going out eth1 will have to traverse the input, forward, and output chains. In Linux 2.4 and later series kernels,^[3] however, the same type of packet would only traverse the FORWARD chain. When using iptables, each chain only applies to one type of packet: INPUT rules are only applied to packets destined for locally running daemons, FORWARD rules are only applied to packets that arrived from a remote host and need to be sent back out on the network, and OUTPUT rules are only applied to packets that were created locally.^[4] Figure 2-3 shows the iptables INPUT, FORWARD, and OUTPUT rules in Linux 2.4 series kernels.

Figure 2-3: iptables in the Linux kernel

This change (a packet passing through only one chain depending upon its source and destination) from the 2.2 series kernel to the 2.4 series kernel reflects a trend toward simplifying the sets of rules, or chains, to make the kernel more stable and the Netfilter code more sensible. With ipchains or iptables commands (these are command-line utilities), we can use the three chains to control which packets get into the system, which packets are forwarded, and which packets are sent out without worrying about the specific Netfilter hooks involved—we only need to remember when these rules are applied. We can summarize the chains as follows:

Summary of ipchains for Linux 2.2 series kernels

Packets destined for locally running daemons:

       input

Packets from a remote host destined for a remote host:

       input
       forward
       output

Packets originating from locally running daemons:

       output

Summary of iptables for Linux 2.4 and later series kernels

Packets destined for locally running daemons:

       INPUT

Packets from a remote host destined for a remote host:

       FORWARD

Packets originating from locally running daemons:

       OUTPUT

In this book, we will use the term iptables to refer to the program that is normally located in the /sbin directory. On a Red Hat system, iptables is:

A program in the /sbin directory
A boot script in the /etc/init.d^[5] directory
A configuration file in the /etc/sysconfig directory

The same could be said for ipchains, but again, you will only use one method: iptables or ipchains.

Note

For more information about ipchains and iptables, see http://www.netfilter.org. Also, look for Rusty Russell's Remarkably Unreliable Guides to the Netfilter Code or Linux Firewalls by Robert L. Ziegler, from New Riders Press. Also see the section "Netfilter Marked Packets" in Chapter 14.

^[3]Up to the kernel 2.6 series at the time of this writing.

^[4]This is true for the default "filter" table. However, iptables can also use the "mangle" table that uses the PREROUTING and OUTPUT chains, or the "nat" table, which uses the PREROUTING, OUTPUT, and POSTROUTING chains. The "mangle" table is discussed in Chapter 14.

^[5]Recall from Chapter 1 that /etc/init.d is a symbolic link to the /etc/rc.d/init.d directory on Red Hat systems.

Setting the Default Chain Policy

You can use the ipchains or the iptables utility to set the default policy for each chain. These chains normally have a default policy that allows anything to pass through (ACCEPT) for packets that are not explicitly matched by a rule in the chain. You can change the default policy to either DROP or REJECT. DROP means to ignore the packet, and REJECT means the sender should be told the packet was not accepted.

This control over network packets allows you to turn any Linux computer into a firewall for your network. In the next section, we'll describe how to apply firewall rules to any Linux machine and how to make these rules permanent.

In Chapter 14, we will see how the Linux Virtual Server uses the Netfilter hooks to gain access to network packets and manipulate them based on a different set of criteria—namely, the criteria you establish for balancing incoming packets across the cluster nodes.

Using iptables and ipchains

In this section, I will provide you with the iptables (and ipchains) commands you can use to control the fate of packets as they arrive on each cluster node. These commands can also be used to build a firewall outside the cluster. This is done by adding a corresponding FORWARD rule for each of the INPUT rules provided here and installing them on the cluster load balancer. Whether you install these rules on each cluster node, the cluster load balancer, or both is up to you.

These commands can be entered from a shell prompt; however, they will be lost the next time the system boots. To make the rules permanent, you must place them in a script (or a configuration file used by your distribution) that runs each time the system boots. Later in this chapter, you will see a sample script.

Note

If you are working with a new kernel (series 2.4 or later), use the iptables rules provided (the ipchains rules are for historical purposes only).

Clear Existing Rules

Before adding any rules to the kernel, we want to start with a clean, rule-free configuration. ipchains and iptables are useful commands to use when troubleshooting and building a new configuration if you need to start over and don't want to reboot the system.

Use one of these commands (ipchains on 2.2 series kernels and iptables on 2.4 and later series kernels) to erase all rules in all three chains (INPUT, FORWARD, and OUTPUT):

 #ipchains -F
 #iptables -F

Set the Default INPUT Chain Policy

Use one of these commands to set the default INPUT policy to not allow incoming packets:

 #ipchains -P input DENY
 #iptables -P INPUT DROP

Remember, if you create routing rules to route packets back out of the system (described later in this chapter), you may also want to set the default FORWARD policy to DROP and explicitly specify the criteria for packets that are allowed to pass through. This is how you build a secure load balancer (in conjunction with the Linux Virtual Server software described in Part III of this book).

Recall that the 2.2 series kernel ipchains input policy affects both locally destined packets and packets that need to be sent back out on the network, whereas the 2.4 and later series kernel iptables INPUT policy affects only locally destined packets.

Note

If you need to add iptables (or ipchains) rules to a system through a remote telnet or SSH connection, you should first enter the ACCEPT rules (shown later in this section) to allow your telnet or SSH to continue to work even after you have set the default policy to DROP (or DENY).

FTP

Use these rules to allow inbound FTP connections from anywhere:

 #ipchains -A input -i eth0 -p tcp -s any/0 1024:65535 -d MY.NET.IP.ADDR 21 -j ACCEPT
 #ipchains -A input -i eth0 -p tcp -s any/0 1024:65535 -d MY.NET.IP.ADDR 20 -j ACCEPT
 #iptables -A INPUT -i eth0 -p tcp -s any/0 --sport 1024:65535 -d MY.NET.IP.ADDR
   --dport 21 -j ACCEPT
 #iptables -A INPUT -i eth0 -p tcp -s any/0 --sport 1024:65535 -d MY.NET.IP.ADDR
   --dport 20 -j ACCEPT

Depending on your version of the kernel, you will use either ipchains or iptables. Use only the two rules that match your utility.

Let's examine the syntax of the two iptables commands more closely:

-A INPUT

Says that we want to add a new rule to the input filter.

-i eth0

Applies only to the eth0 interface. You can leave this off if you want an input rule to apply to all interfaces connected to the system. (Using ipchains, you can specify the input interface when you create a rule for the output chain, but this is no longer possible under iptables because the output chain is for locally created packets only.)

-p tcp

This rule applies to the TCP protocol (required if you plan to block or accept packets based on a port number).

-s any/0 and --sport 1024:65535

These arguments say the source address can be anything, and the source port (the TCP/IP port we will send our reply to) can be anything between 1024 and 65535. The /0 after any means we are not applying a subnet mask to the IP address. Usually, when you specify a particular IP address, you will need to enter something like:
```
 ALLOWED.NET.IP.ADDR/255.255.255.0
 
```
or
```
 ALLOWED.NET.IP.ADDR/24
 
```
where ALLOWED.NET.IP.ADDR is an IP address such as 209.100.100.3.

In both of these examples, the iptables command masks off the first 24 bits of the IP address as the network portion of the address.

Note

For the moment, we will skip over the details of a TCP/IP conversation. What is important here is that a normal inbound TCP/IP request will come in on a privileged port (a port number defined in /etc/services and in the range of 1-1023) and ask you to send a reply back on an unprivileged port (a port in the range of 1024 to 65535).

-d MY.NET.IP.ADDR and --dport 21

These two arguments specify the destination IP address and destination port required in the packet header. Replace MY.NET.IP.ADDR with the IP address of your eth0 interface, or enter 0.0.0.0 if you are leaving off the -i option and do not care which interface the packet comes in on. (FTP requires both ports 20 and 21; hence the two iptables commands above.)

-j ACCEPT

When a packet matches a rule, it drops out of the chain (the INPUT chain in this case). At that point, whatever you have entered here for the -j option specifies what happens next to the packet. If a packet does not match any of the rules, the default filter policy, DENY or DROP in this case, is applied.

Passive FTP

The rules just provided for FTP do not allow for passive FTP connections to the FTP service running on the server. FTP is an old protocol that dates from a time when university computers connected to each other without intervening firewalls to get in the way. In those days, the FTP server would connect back to the FTP client to transmit the requested data. Today, most clients sit behind a firewall, and this is no longer possible. Therefore, the FTP protocol has evolved a new feature called passive FTP that allows the client to request a data connection on port 21, which causes the server to listen for an incoming data connection from the client (rather than connect back to the client as in the old days). To use passive FTP, your FTP server must have an iptables rule that allows it to listen on the unprivileged ports for one of these passive FTP data connections. The rule to do this using iptables looks like this:

 #iptables -A INPUT -i eth0 -p tcp -s any/0 --sport 1024:65535 -d MY.NET.IP.ADDR
   --dport 1024:65535 -j ACCEPT

DNS

Use this rule to allow inbound DNS requests to a DNS server (the "named" or BIND service):

 #ipchains -A input -i eth0 -p udp -s any/0 1024:65535 -d MY.NET.IP.ADDR 53 -j ACCEPT
 #ipchains -A input -i eth0 -p tcp -s any/0 1024:65535 -d MY.NET.IP.ADDR 53 -j ACCEPT
 #iptables -A INPUT -i eth0 -p udp -s any/0 --sport 1024:65535 -d MY.NET.IP.ADDR
   --dport 53 -j ACCEPT
 #iptables -A INPUT -i eth0 -p tcp -s any/0 --sport 1024:65535 -d MY.NET.IP.ADDR
   --dport 53 -j ACCEPT

Telnet

Use this rule to allow inbound telnet connections from a specific IP address:

 #ipchains -A input -i eth0 -p tcp -s 209.100.100.10 1024:65535
   -d MY.NETWORK.IP.ADDR 23 -j ACCEPT
 #iptables -A INPUT -i eth0 -p tcp -s 209.100.100.10 --sport 1024:65535
   -d MY.NETWORK.IP.ADDR --dport 23 -j ACCEPT

In this example, we have only allowed the client computer using IP address 209.100.100.10 telnet access into the system. You can change the IP address of the client computer and enter this command repeatedly to grant telnet access to additional client computers.

SSH

Use this rule to allow SSH connections:

 ipchains -A input -i eth0 -p tcp -s 209.200.200.10 1024:65535
   -d MY.NETWORK.IP.ADDR 22 -j ACCEPT
 iptables -A INPUT -i eth0 -p tcp -s 209.200.200.10 --sport 1024:65535
   -d MY.NETWORK.IP.ADDR --dport 22 -j ACCEPT

Again, replace the example 209.200.200.10 address with the IP address you want to allow in, and repeat the command for each IP address that needs SSH access. The SSH daemon (sshd) listens for incoming requests on port 22.

Email

Use this rule to allow email messages to be sent out:

 ipchains -A input -i eth0 -p tcp ! -y -s EMAIL.NET.IP.ADDR 25
   -d MY.NETWORK.IP.ADDR 1024:65535 -j ACCEPT
 iptables -A INPUT -i eth0 -p tcp ! --syn -s EMAIL.NET.IP.ADDR --sport 25
   -d MY.NETWORK.IP.ADDR --dport 1024:65535 -j ACCEPT

This rule says to allow Simple Mail Transport Protocol (SMTP) replies to our request to connect to a remote SMTP server for outbound mail delivery. This command introduces the ! --syn syntax, an added level of security that confirms that the packet coming in is really a reply packet to one we've sent out.

The ! -y syntax says that the acknowledgement flag must be set in the packet header—indicating the packet is in acknowledgement to a packet we've sent out. (We are sending the email out, so we started the SMTP conversation.)

This rule will not allow inbound email messages to come in to the sendmail service on the server. If you are building an email server or a server capable of processing inbound email messages (cluster nodes that do email order entry processing, for example), you must allow inbound packets to port 25 (without using the acknowledgement flag).

HTTP

Use this rule to allow inbound HTTP connections from anywhere:

 #ipchains -A input -i eth0 -p tcp -d MY.NETWORK.IP.ADDR 80 -j ACCEPT
 #iptables -A INPUT -i eth0 -p tcp -d MY.NETWORK.IP.ADDR --dport 80 -j ACCEPT

To allow HTTPS (secure) connections, you will also need the following rule:

 #ipchains -A input -i eth0 -p tcp -d MY.NETWORK.IP.ADDR 443 -j ACCEPT
 #iptables -A INPUT -i eth0 -p tcp -d MY.NETWORK.IP.ADDR --dport 443 -j ACCEPT

ICMP

Use this rule to allow Internet Control Message Protocol (ICMP) responses:

 ipchains -A input -i eth0 -p icmp -d MY.NETWORK.IP.ADDR -j ACCEPT
 iptables -A INPUT -i eth0 -p icmp -d MY.NETWORK.IP.ADDR -j ACCEPT

This command will allow all ICMP messages, which is not something you should do unless you have a firewall between the Linux machine and the Internet with properly defined ICMP filtering rules. Look at the output from the command iptables -p icmp -h (or ipchains icmp -help) to determine which ICMP messages to enable. Start with only the ones you know you need. (You probably should allow MTU discovery, for example.)

Review and Save Your Rules

To view the rules you have created, enter:

 ipchains -L -n
 iptables -L -n

To save your iptables rules to the /etc/sysconfig/iptables file on Red Hat so they will take effect each time the system boots, enter:

 #/etc/init.d/iptables save

Note

The command service iptables save would work just as well here.

However, if you save your rules using this method as provided by Red Hat, you will not be able to use them as a high-availability resource.^[6] If the cluster load balancer uses sophisticated iptables rules to filter network packets, or, what is more likely, to mark certain packet headers for special handling,^[7] then these rules may need to be on only one machine (the load balancer) at a time.^[8] If this is the case, the iptables rules should be placed in a script such as the one shown at the end of this chapter and configured as a high-availability resource, as described in Part II of this book.

^[6]This is because the standard Red Hat iptables init script does not return something the Heartbeat program will understand when it is run with the status argument. For further discussion of this topic, see the example script at the end of this chapter.

^[7]Marking packets with Netfilter is covered in Chapter 14.

^[8]This would be true in an LVS-DR cluster that is capable of failing over the load-balancing service to one of the real servers inside the cluster. See Chapter 15.

Routing Packets with the Linux Kernel

As shown in Figures 2-2 and 2-3, all inbound and outbound network packets must pass through the kernel's routing rules. And, like the list of rules in an iptables chain, the list of routing rules is contained in an ordered list that is examined only until a packet matches one of the rules.

Like the rules in an iptables chain, each routing rule includes:

The criteria used to match packets
Instructions telling the kernel what to do with packets that match this criteria

The order of the routing rules is important because the kernel stops searching the routing table when a packet matches one of the routing rules. Because the last rule in the routing table will normally match any packet, it is called the default route (or default gateway).

When you assign an IP address to one of the network interface cards, the kernel automatically adds a routing rule for the interface. The kernel constructs the routing table entry so it will match all packets containing a destination address that falls within the range of IP addresses allowed on the network or subnet that the interface is connected to.

In a typical configuration, routing table entries are constructed that will match packets based on the packet's destination IP address. This type of packet is typically matched for:

A particular destination network
A particular destination host
Any destination or host

Note

In the following sections, the add argument to the route command adds a routing rule. To delete or remove a routing rule after it has been added, use the same command but change the add argument to del.

Matching Packets for a Particular Destination Network

If we had to type in the command to add a routing table entry for a NIC connected to the 209.100.100.0 network, the command would look like this (we do not, because the kernel will do this for us automatically):

 #/sbin/route add --net 209.100.100.0 netmask 255.255.0.0 dev eth0

This command adds a routing table entry to match all packets destined for the 209.100.100.0 network and sends them out the eth0 interface.

If we also had a NIC called eth1 connected to the 172.24.150.0 network, we would use the following command as well:

 #/sbin/route add --net 172.24.150.0. netmask 255.255.255.0 dev eth1

With these two routing rules in place, the kernel can send and receive packets to any host connected to either of these two networks. If this machine needs to also forward packets that originate on one of these networks to the other network (the machine is acting like a router), you must also enable the kernel's packet forwarding feature with the following command:

 #/bin/echo 1 > /proc/sys/net/ipv4/ip_forward

As we will see in Part III of this book, this kernel capability is also a key technology that enables a Linux machine to become a cluster load balancer.

Matching Packets for a Particular Destination Host

Now let's assume we have a Linux host called Linux-Host-1 with two NICs connected to two different networks. We want to be able to send packets to Linux-Host-2, which is not physically connected to either of these two networks (see Figure 2-4). If we can only have one default gateway, how will the kernel know which NIC and which router to use?

If we only need to be able to send packets to one host on the 192.168.150.0 network, we could enter a routing command on Linux-Host-1 that will match all packets destined for this one host:

 #/sbin/route add -host 192.168.150.33 gw 172.24.150.1

This command will force all packets destined for 192.168.150.33 to the internal gateway at IP address 172.24.150.1. Assuming that the gateway is configured properly, it will know how to forward the packets it receives from Linux-Host-1 to Linux-Host-2 (and vice versa).

If, instead, we wanted to match all packets destined for the 192.168.150.0 network, and not just the packets destined for Linux-Host-2, we would enter a routing command like this:

 #/sbin/route add --net 192.168.150.0 netmask 255.255.255.0 gw 172.24.150.1

Note

In both of these examples, the kernel knows which physical NIC to use thanks to the rules described in the previous section that match packets for a particular destination network. (The 172.24.150.1 address will match the routing rule added for the 172.24.150.0 network that sends packets out the eth1 interface.)

Figure 2-4: Linux firewall and router example network

Matching Packets for any Destination or Host

As described previously, the last rule in the routing table, the default route, is normally used to match all packets regardless of their destination address and force them to a router or internal gateway device. The command to create this type of routing rule looks like this:

 #/sbin/route add --default gw 172.24.150.1

This routing rule causes the kernel to send packets that do not match any of the previous routing rules to the internal gateway at IP address 172.24.150.1.

If Linux-Host-1 needs to send packets to the Internet through the router (209.100.100.1), and it also needs to send packets to any host connected to the 192.168.150.0 network, use the following commands:

 #/sbin/ifconfig eth0 209.100.100.3 netmask 255.255.255.0
 #/sbin/ifconfig eth1 172.24.150.3 netmask 255.255.255.0
 #/sbin/route add -net 192.168.150.0 netmask 255.255.255.0 gw 172.24.150.1
 #/sbin/route add default gw 209.100.100.1

The first two commands assign the IP addresses and network masks to the eth0 and eth1 interfaces and cause the kernel to automatically add routing table entries for the 209.100.100.0 and 172.24.150.0 networks.

The third command causes the kernel to send packets for the 192.168.150.0 network out the eth1 network interface to the 172.24.150.1 gateway device. Finally, the last command matches all other packets, and in this example, sends them to the Internet router.

To View Your Routing Rules

To view your list of routing rules, use any of the following three commands:

 #netstat -rn
 #route -n
 #ip route list

The output of the first two commands (which may be slightly more readable than the list produced by the ip command) looks like this:

 Kernel IP routing table
 Destination     Gateway        Genmask         Flags  MSS  Window  irtt  Iface
 209.100.100.0   0.0.0.0        255.255.255.0   U      0    0       0     eth0
 192.168.150.0   172.24.150.1   255.255.255.0   UG     0    0       0     eth1
 172.24.150.0    0.0.0.0        255.255.255.0   U      0    0       0     eth1
 127.0.0.0       0.0.0.0        255.0.0.0       U      0    0       0     lo
 0.0.0.0         209.100.100.1  0.0.0.0         UG     0    0       0     eth0

Notice that the destination address for the final routing rule is 0.0.0.0, which means that any destination address will match this rule. (You'll find more information for this report on the netstat and route man pages.)

Making Your Routing Rules Permanent

Like the Netfilter iptables rules, the routing rules do not survive a system reboot. To make your routing rules permanent, place them into a script (and add them to the boot process with chkconfig or ntsysv, as described in Chapter 1) or use the method provided by your distribution.

iptables and Routing Resource Script

As with iptables, the standard Red Hat method of saving the routing rules is not the best one to use when you need your rules to failover from one server to another.^[9] In a cluster configuration, we want to enable and disable packet handling methods just like any other service. Let's therefore expand our definition of a service to include capabilities offered inside the kernel. (Chapter 3 describes how to configure capabilities within the Linux kernel and recompile it.)

To failover kernel capabilities such as the iptables and routing rules, we need to use a script that can start and stop them on demand. This can be done for the kernel's packet-handling service, as we'll call it, with a script like this:

 #!/bin/bash
 #
 # Packet Handling Service
 #
 # chkconfig 2345 55 45
 # description: Starts or stops iptables rules and routing
 
 case "$1" in
 start)
     # Flush (or erase) the current iptables rules
     /sbin/iptables -F
     /sbin/iptables --table nat -flush
     /sbin/iptables --table nat --delete-chain
 
     # Enable the loopback device for all types of packets
     # (Normally for packets created by local daemons for delivery
     # to local daemons)
     /sbin/iptables -A INPUT -i lo -p all -j ACCEPT
     /sbin/iptables -A OUTPUT -o lo -p all -j ACCEPT
     /sbin/iptables -A FORWARD -o lo -p all -j ACCEPT
 
     # Set the default policies
     /sbin/iptables -P INPUT DROP
     /sbin/iptables -P FORWARD DROP
     /sbin/iptables -P OUTPUT ACCEPT
 
     # NAT
     /sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
 
     # Allow inbound packets from our private network
     /sbin/iptables -A INPUT -i eth1 -j ACCEPT
     /sbin/iptables -A FORWARD -i eth1 -j ACCEPT
 
     # Allow packets back in from conversations we initiated
     # from the private network.
     /sbin/iptables -A FORWARD -i eth0 --match state --state ESTABLISHED,RELATED -j ACCEPT
     /sbin/iptables -A INPUT --match state --state ESTABLISHED,RELATED -j ACCEPT
 
     # Allow Sendmail and POP (from anywhere, but really what we
     # are allowing here is inbound connections on the eth0 interface).
     # (Sendmail and POP are running locally on this machine).
     /sbin/iptables -A INPUT --protocol tcp --destination-port 25 -j ACCEPT
     /sbin/iptables -A INPUT --protocol tcp --destination-port 110 -j ACCEPT
 
     # Routing Rules --
     # Route packets destined for the 192.168.150.0 network using the internal
     # gateway machine 172.24.150.1
     /sbin/route add -net 192.168.150.0 netmask 255.255.255.0 gw 172.24.150.1
 
     # By default, if we don't know where a packet should be sent we
     # assume it should be sent to the Internet router.
     /sbin/route add default gw 209.100.100.1
 
     # Now that everything is in place we allow packet forwarding.
     echo 1 > /proc/sys/net/ipv4/ip_forward
 
   ;;
 stop)
     # Flush (or erase) the current iptables rules
     /sbin/iptables -F
 
     # Set the default policies back to ACCEPT
     # (This is not a secure configuration.)
     /sbin/iptables -P INPUT ACCEPT
     /sbin/iptables -P FORWARD ACCEPT
     /sbin/iptables -P OUTPUT ACCEPT
 
     # Remove our routing rules.
     /sbin/route del -net 192.168.150.0 netmask 255.255.255.0 gw 172.24.150.1
     /sbin/route del default gw 209.100.100.1
 
     # Disable packet forwarding
     echo 0 > /proc/sys/net/ipv4/ip_forward
 ;;
 status)
     enabled=`/bin/cat /proc/sys/net/ipv4/ip_forward`
     if [ "$enabled" -eq 1 ]; then
         echo "Running"
     else
         echo "Down"
     fi
 ;;
 *)
         echo "Requires start, stop or status"
 ;;
 esac

This script demonstrates another powerful iptables capability: the ability to do network address translation or NAT to masquerade internal hosts. In this example, all hosts sending requests out to the Internet through Linux-Host-1 will have their IP addresses (the return or source address in the packet header) converted to the source IP address of the Linux-Host-1 machine. This is a common and effective way to prevent crackers from attacking internal hosts—the internal hosts use an IP address that cannot be routed on the Internet^[10] and the NAT machine, using one public IP address, appears to send all requests to the Internet. (This rule is placed in the POSTROUTING Netfilter chain, or hook, so it can apply the masquerade for outbound packets going out eth1 and then de-masquerade the packets from Internet hosts as their replies are returned back through eth1. See the NF_IP_POSTROUTING hook in Figure 2-1 and imagine what the drawing would look like if it were depicting packets going in the other direction—from eth1 to eth0.)

The iptables' ability to perform network address translation is another key technology that enables a cluster load balancer to forward packets to cluster nodes and reply to client computers as if all of the cluster nodes were one server (see Chapter 12).

This script also introduces a simple way to determine whether the packet handling service is active, or running, by checking to see whether the kernel's flag to enable packet forwarding is turned on (this flag is stored in the kernel special file /proc/sys/net/ipv4/ip_forward).

To use this script, you can place it in the /etc/init.d directory^[11] and enter the command:

 #/etc/init.d/routing start

Note

The command service routing start would work on a Red Hat system.

And to check to see whether the system has routing or packet handling enabled, you would enter the command:

 #/etc/init.d/routing status

Note

In Part II of this book, we will see why the word Running was used to indicate packet handling is enabled.^[12]

The ip Command

Advanced routing rules can be built using the ip command. Advanced routing rules allow you to do things like match packets based on their source IP address, match packets based on a number (or mark) inserted into the packet header by the iptables utility, or even specify what source IP address to use when sending packets out a particular interface. (We'll cover packet marking with iptables in Chapter 14. If you find that you are reaching limitations with the route utility, read about the ip utility in the "Advanced Routing HOWTO."^[13])

Note

To build the cluster described in this book, you do not need to use the ip command.

^[9]Red Hat uses the network init script to enable routing rules. The routing rules cannot be disabled or stopped without bringing down all network communication when using the standard Red Hat init script.

^[10]RFC 1918 IP address ranges will be discussed in more detail in Chapter 12.

^[11]This script will also need to have execute permissions (which can be enabled with the command: #chmod 755 /etc/init.d/routing).

^[12]So the Heartbeat program can manage this resource. Heartbeat expects one of the following: Running, running, or OK to indicate the service for the resource is running.

^[13]See http://lartc.org.

In Conclusion

One of the distinguishing characteristics of a Linux Enterprise Cluster is that it is built using packet routing and manipulation techniques. As I'll describe in Part III, Linux machines become cluster load balancers or cluster nodes because they have special techniques for handling packets. In this chapter I've provided you with a brief overview of the foundation you need for understanding the sophisticated cluster packet-handling techniques that will be introduced in Part III.

Часть 3: Компиляция ядра

Overview

In this chapter I'll describe how to compile a Linux kernel. For many system administrators not familiar with Linux, this may sound like a risky thing to do on a server that will be used in production, but once you have done it a few times you'll understand one of the reasons why Linux is so powerful: you have complete control over the operating system.

I could say a lot more about the benefits of having the source code of your operating system, but we'll save that for later. For now, we'll focus on what you need to be able to put your own Linux kernel into production.

To install a Linux kernel you need to do five things:

Get the source code.
Set the options you want.
Compile the code.
Install the object code.
Configure the boot loader.

What You Will Need

Before you compile the kernel, you'll need to install the gcc compiler and its dependencies. (This software is not included with this book because your distribution should already contain it.) Here is a list of packages that are required to compile the kernel (this list is taken from the Red Hat distribution):

binutils
cpp
gcc (check the README file included with the kernel source code to find out which version of the gcc compiler is required^[1]).
glibc-debug
glibc-devel
glibc-kernelheaders
modutils (for kernel versions in the 2.4 series) or module-init-utils (for kernel versions in the 2.6 series)

If you use a text-based terminal, you will also need to have the ncurses package installed so that the make menuconfig command will work. Two RPM packages are required for this on Red Hat systems:

ncurses
ncurses-devel

Once you've installed these packages, you're ready to begin. (If you have dependency problems when installing these package, see Appendix D for some tips.)

^[1]Each kernel release should only be compiled with the version of gcc specified in the README file.

Step 1: Get the Source Code

You'll have to decide whether to use the "stock" version of the Linux source code or the kernel source code included with your distribution. Here are a few of the pros and cons of each method.

Using the Stock Kernel

Pros of using the stock kernel:

You can apply your own patches to customize the kernel.^[2]
It is supported by the entire kernel-development community.
It allows more control over which patches are applied to your system.
You can decide which kernel version you want to use.

Con of using the stock kernel: It may invalidate support from your distribution vendor.

Using the Kernel Supplied with Your Distribution

Pros of using the distribution kernel:

It is usually supported by your distribution vendor (though the vendor may not support it if you recompile it with changes).
It is known to compile.
It is already running on your system.
It is easy to install.
It has been tested by your distribution vendor and is likely to contain bug fixes that are not yet a part of the stock version of the kernel.

Cons of using the distribution kernel:

You may not be able to apply kernel patches.
It may have bugs introduced by your distribution vendor that are not found in the stock version of the kernel, and you may depend on your distribution vendor to fix them.

Decide Which Kernel Version to Use

If you use the kernel source code supplied by your vendor, you won't have many decisions to make when it comes to picking a kernel version to use; your distribution vendor will do that for you. However, if you decide to use the stock version of the kernel, you'll have to do a little more research to determine which kernel version to use.

You can use the following command to see which kernel version you are currently running:

 #uname -a

Note

If you decide to download your own kernel, select an even-numbered one (2.2, 2.4, 2.6, and so on). Even-numbered kernels are considered the stable ones.

Here are sample commands to copy the kernel source code from the CD-ROM included with this book onto your system's hard drive:

 #mount /mnt/cdrom
 #cd /mnt/cdrom/chapter3
 #cp linux-*.tar.gz /usr/src
 #cd /usr/src
 #tar xzvf kernel-*

Note

Leave the CD-ROM mounted until you complete the next step.

Now you need to create a symbolic link so that your kernel source code will appear to be in the /usr/src/linux directory. But first you must remove the existing symbolic link. The commands to complete this step look like this:

 #rm /usr/src/linux
 #ln -s /usr/src/linux-<version> /usr/src/linux
 #cd /usr/src/linux

Note

The preceding rm command above will not remove /usr/src/linux if it is a directory instead of a file. (If /usr/src/linux is a directory on your system, move it instead with the mv command.)

You should now have your shell's current working directory set to the top level directory of the kernel source tree, and you're ready to go on to the next step.

^[2]For example, you can apply the hidden loopback interface kernel patch that we'll use to build the real servers inside the cluster in Part III of this book.

Step 2: Set the Options You Want

Your actions here will depend upon whether you decide to install a new kernel, upgrade your existing kernel source code, or patch your existing kernel source code.

Installing a New Kernel

If you decide to install a new kernel (rather than patch or upgrade your existing one), you must first clean the source code tree of any files left behind from previous kernel compilations with the make clean command. Then use the make mrproper command to erase all .config files from the top level of the source code directory tree so that you can start with a clean slate. The make oldconfig command is then used to build a fresh .config file from scratch. Thus, the commands you would enter in the top level of the kernel source tree to prepare it for a new compilation are as follows:

 #make clean
 #make mrproper
 make oldconfig

Upgrading or Patching the Kernel

If you decide to upgrade or patch your existing kernel, save a copy of your existing /usr/src/linux/.config file first. Select an archival directory that you'll remember, and use it to store your old .config files in case you ever need to go back to an old kernel configuration. For example, to save the current .config file from the top level of your kernel source tree to the /boot directory, you would use a command that looks like this:

 #cp .config /boot/config-<version>

Note

When upgrading to a newer version of the kernel, you can copy your old .config file to the top level of the new kernel source tree and then issue the make oldconfig command to upgrade your .config file to support the newer version of the kernel.

Upgrading a Kernel From a Distribution Vendor

If you are upgrading your existing kernel source code, and this kernel source code was supplied by your distribution vendor, you should locate the .config file that was used to originally build the kernel you are currently running on your system. The distribution vendor may have left this configuration file in the top level of their kernel source distribution tree, or they may have a separate directory where they place a variety of .config files for different hardware architectures.

When you find this configuration file, be sure to place it in the top-level directory of your kernel source tree into the file .config. If you are upgrading to a newer version of the kernel, you should then issue the command:

 #make oldconfig

This command will prompt you for any kernel options that have changed since the .config file was originally created by your distribution vendor, so it never hurts to run this command (even when you are not upgrading your kernel) to make sure you are not missing any of the options required to compile your kernel.

Note

In the top of the kernel source tree you'll also find a Makefile. You can edit this file and change the value of the EXTRAVERSION number that is specified near the top of the file. If you do this, you will be assigning a string of characters that will be added to the end of the kernel version number. Specifying an EXTRAVERSION number is optional, but it may help you avoid problems when your boot drive is a SCSI device (see the "SCSI Gotchas" section near the end of this chapter).

Set Your Kernel Options

You now need to set your kernel options regardless of whether you installed a new kernel, are upgrading your existing kernel source code, or are patching your existing kernel source code.

You select the options you want to enable in your kernel from a text console screen by running the command:

 #make menuconfig

One of the most important options you can select from within the make menuconfig utility is whether or not your kernel compiles as a monolithic or a modular kernel. When you specify the option to compile the kernel as a modular kernel (see "A Note About Modular Kernels" for details), a large portion of the compiled kernel code is stored into files in the /lib/modules directory on the hard drive, and it is only used when it is needed. If you select the option to compile the kernel as a monolithic kernel (you disable modular support in the kernel), then all of the kernel code will be compiled into a single file.

Knowing which of the other kernel options to enable or disable could be the subject of another book as thick as this one, so we won't go into much detail here. However, if you are using a 2.4 kernel, you may want to start by selecting nearly all of the kernel options available and building a modular kernel. This older kernel code is likely to compile (thanks to the efforts of thousands of people finding and fixing problems), and this modular kernel will only load the kernel code (the modules) necessary to make the system run. Newer kernels, on the other hand, may require a more judicious selection of options to avoid kernel compilation problems.

Note

The best place to find information about the various kernel options is to look at the inline kernel help screens available with the make menuconfig command. These help screens usually also contain the name of the module that will be built when you compile a modular kernel.

Modular kernels require more effort to implement when using a SCSI boot disk (see "SCSI Gotchas" later in this chapter). However, they are more efficient because they load or run only the kernel modules required, as they are needed. For example, the CD-ROM driver will be unloaded from memory when the CD-ROM drive has not been used for a while. Modular kernels allow you to compile a large number of modules for a variety of hardware configurations (several different NIC drivers, for example) and build a one-size-fits-all, generic kernel.

To build a modular kernel, you simply select Enable Loadable Module Support on the Loadable Module Support menu when you run the make menuconfig utility. As a general rule, you should always enable loadable module support^[3] so you can then decide as you select each kernel option whether it will be compiled into the kernel, by placing an asterisk (*) next to it, or compiled as a module, by placing an M next to it.

To continue our example of using the kernel source code supplied on the CD-ROM, enter the following lines from the top of the kernel source tree:

 #cp /mnt/cdrom/chapter3/sample-config .config
 #umount /mnt/cdrom
 #make oldconfig
 #make menuconfig^[4]

Note

The .config file placed into the top of the kernel source tree is hidden from the normal ls command. You must use ls -a to see it.

The sample-config file used in the previous command will build a modular kernel, but you can change it to create a monolithic kernel by disabling loadable module support. To do so, deselect it on the Loadable Module Support submenu. Be sure to enter the Processor Type and Features submenu and select the correct type of processor for your system, because even a modular kernel only compiles for one type of processor (or family of processors). See the contents of the file /proc/cpuinfo to find out which CPU the kernel thinks your system is using. Also, you normally should always select SMP support even if your system has only one CPU.

Note

When building a monolithic kernel, you can reduce the size of the kernel by deselecting the kernel options not required for your hardware configuration. If you are currently running a modular kernel, you can see which modules are currently loaded on your system by using the lsmod command. Also see the output of the following commands to learn more about your current configuration:

 #lspci -v | less
 #dmesg | less
 #cat /proc/cpuinfo
 #uname -m

Once you have saved your kernel configuration file changes, you are ready to begin compiling your new kernel.

^[3]Packages such as VMware require loadable module support.

^[4]If the make menuconfig command complains about a clock skew problem, check to make sure your system time is correct, and use the hwclock command to force your hardware clock to match your system time. See the hwclock manual page for more information.

Step 3: Compile the Code

To compile your kernel, you first need to convert the kernel source code files into object code with the gcc compiler, using the make utility.

If you are using the 2.4 kernel supplied with this book, start with the make dep command to build the kernel module dependencies. However, you can skip this step if you are compiling a kernel from the 2.6 series.

Also, you must clean the kernel source tree regardless of which version of the kernel you are using, whether you are building a modular or monolithic kernel, or whether you are patching, upgrading, or compiling a new kernel. Use the make clean command, which will clean the kernel source tree and remove any temporary files that were left behind by a previous kernel compilation.

Once the kernel source tree is clean, you are ready to tell the gcc compiler to build a new kernel with the make bzImage command; then, if you are building a modular kernel, you'll need to add the make modules and make modules_install commands to build and install the kernel modules into the /lib/modules directory.

To store the lengthy output of the compilation commands in a file, precede the make commands with the nohup command, which automatically redirects output to a file named nohup.out. If you have difficulty compiling the kernel, you can enter each make command separately and examine the output of the nohup.out file after each command completes to try and locate the source of the problem. In the following example, all of the make commands are combined into one command to save keystrokes.

 #cd /usr/src/linux
 #nohup make dep clean bzImage modules modules_install

Note

If you are building a monolithic kernel, simply leave off the modules modules_install arguments, and if you are building a 2.6 kernel, leave off the dep argument

Make sure that the kernel compiles successfully by looking at the end of the nohup.out file with this command:

 #tail nohup.out

If you see error messages at the end of this file, examine earlier lines in the file until you locate the start of the problem. If the nohup.out file does not end with an error message, you should now have a file called bzImage located in the /usr/src/linux/arch/*/boot (where * is the name of your hardware architecture). For example, if you are using Intel hardware the compiled kernel is now in the file /usr/src/linux/arch/i386/boot/bzImage. You are now ready to install your new kernel.

Step 4: Install the Object Code and Configuration File

The make modules_install command has already done part of this installation step, placing all of the modular kernel executables into a /lib/modules subdirectory based on the version number of the kernel. Now you just need to install the main part of the kernel by copying the kernel executable from the kernel source tree to a directory where your boot loader can find it at system boot time. Because boot loaders normally look for the kernel in the /boot directory, we'll place a copy of the kernel executable there.

Enter this command to copy the kernel executable file you just compiled, named bzImage, to where your boot loader expects to find it when the system first starts:

 #cp /usr/src/linux/arch/i386/boot/bzImage /boot/vmlinuz-<version>

Replace <version> in this example command with the version number of your kernel, and i386 with the proper name of the hardware architecture you are using. (Remember to also add the EXTRAVERSION number if you specified one in the Makefile in Step 2.)

If the nohup.out file ends with an error, or if the bzImage file is not created, or if the /lib/modules subdirectory does not exist,^[5] try copying different versions of the .config file into place or use the make mrproper and make oldconfig commands to build a new .config file from scratch.^[6] Then rerun the make commands to compile the kernel again to see if the kernel will compile with a different .config file.

You should also try turning on SMP support for your kernel through the make menuconfig command (if you are not already using one of the SMP .config files from your distribution).

If the nohup.out file contains just a few lines of output, and you see error messages about the .hdepend file, try removing this temporary file (if you are using a 2.4 or earlier kernel), and then enter the nohup make dep clean bzImage modules modules_install command again. Or, if you cannot get a newer kernel to compile, try an older, more stable version from the 2.4 series.

If none of this helps, try entering each of the make commands separately: make dep, then make clean, then make bzImage, and so forth, and send the output of each command to the screen and to a separate log file:

 #make dep 2>&1 | tee /tmp/dep.log
 #make clean 2>&1 |tee /tmp/clean.log
 #make bzImage 2>&1 | tee /tmp/bzimage.log
 #make modules 2>&1 | tee /tmp/modules.log
 #make modules_install _2>&1 | tee /tmp/modules_install.log

This should help you isolate the problem, at which point you can go online to find a solution. Try the comp.os.linux newsgroups (or comp.os.linux.redhat, dedicated to problems with Red Hat Linux). Also, see the Linux section at http://marc.theaimsgroup.com.

Before posting a kernel compilation question to one of these newsgroups, read the Linux FAQ (http://www.tldp.org), the Linux-kernel mailing list FAQ (http://www.tux.org/lkml), and the Linux-kernel HOWTO.

Install the System.map

Your system may need a file called System.map when it loads a module to resolve any module dependencies (a module may require other modules for normal operation).^[7]

Copy the System.map file into place with this command:

 #cp /usr/src/linux/System.map /boot/System.map

Save the Kernel Configuration File

For documentation purposes, you can now save the configuration file used to build this kernel. Use the following command:

 #cp /usr/src/linux/.config /boot/config-<version>

Again, <version> in this command should be replaced with the kernel version you have just built.

^[5]The /lib/modules subdirectory should be created when building a modular kernel.

^[6]This method should at least get the kernel to compile even if it doesn't have the options you want. You can then go back and use the make menuconfig command to set the options you need.

^[7]The System.map file is only consulted by the modutils package when the currently running kernel's symbols (in /proc/ksyms) don't match what is stored in the /lib/modules directory.

Step 5: Configure Your Boot Loader

The previous four steps are used to install any program, but the Linux kernel executable is part of the operating system, so we have to do a few additional things to cause the computer to recognize the kernel at boot time.

To complete this step, you must know what type of boot loader your system is using. Newer Linux distributions use GRUB as the boot loader of choice, but many older distributions use the LILO boot loader. (Reboot your system and watch it boot if you are not sure—the type of boot loader you are using will be displayed during bootup.)

Note

If you are using a SCSI boot disk, see the "SCSI Gotchas" section later in the chapter.

Both LILO and GRUB give you the ability at boot time to select which kernel to boot from, selecting from those listed in their configuration files. This is useful if your new kernel doesn't work properly. Therefore, you should always add a reference to your new kernel in the boot loader configuration file and leave the references to the previous kernels as a precaution.

If you are using LILO, you'll modify the /etc/lilo.conf file; if you are using GRUB, you'll modify the /boot/grub/grub.conf file (or the /boot/grub/ menu.lst if you installed GRUB from source).

Add your new kernel to lilo.conf with an entry that looks like this:

 image=/boot/vmlinuz-<version>
     label=linux.orig
     initrd=/boot/initrd-<version>.img
     read-only
     root=/dev/hda2

Add your new kernel to grub.conf with an entry that looks like this:

 title My New Kernel (<version>)
     root (hd0,0)
     kernel /vmlinuz-<version> ro root=/dev/hda2
     initrd /initrd-<version>.img

You do not need to do anything to inform the GRUB boot loader about the changes you make to the grub.conf configuration file, because GRUB knows how to locate files even before the kernel has loaded. It can find its configuration file at boot time. However, when using LILO, you'll need to enter the command lilo -v to install a new boot sector that tells LILO where the configuration file resides on the disk (because LILO doesn't know how to use the normal Linux filesystem at boot time).

The name / is given to the Linux root filesystem mount point. This mount point can be used to mount an entire disk drive or just a partition on a disk drive, though normally the root filesystem is a partition. (On Red Hat systems, Linux uses the second partition of the first disk device at boot time; the first partition is usually the /boot partition).

To find out which root partition or device your system is using, use the df command. For example, the following df command shows that the root filesystem is using the /dev/hda2 partition:

 # df
 Filesystem     1K-blocks   Used Available Use% Mounted on
 /dev/hda2      8484408  1195652  6857764 15% /
 /dev/hda1       99043   9324   84605 10% /boot

When you modify the boot loader configuration file, you must tell the Linux kernel which device, or device and partition, to use for the root filesystem. When using LILO as your boot loader, you specify the root filesystem in the lilo.conf file with a line that looks like this:

 root=/dev/hda2

The grub.conf configuration file entry looks like this:

 kernel /vmlinuz-<version> ro root=/dev/hda2

Note, however, that the Red Hat kernel allows you to use a symbolic name that looks like this:

 kernel /vmlinuz-<version> ro root=LABEL=/

Using the syntax LABEL=/ to specify the root filesystem is only possible when using Red Hat kernels. If you compile the stock kernel, you must specify the root partition device name, such as /dev/hda2.

One final concern when specifying the root filesystem using GRUB is that the GRUB boot loader also has its own root filesystem. The GRUB root filesystem has nothing to do with the Linux root filesystem and is usually on a different partition (it is the /boot partition on Red Hat systems).

To tell GRUB which disk partition to use at boot time as its root filesystem, create an entry such as the following in the grub.conf configuration file:

 root (hd0,0)

This tells GRUB to use the first partition of the first hard drive as the GRUB root filesystem. GRUB calls the first hard drive hd0 (whether it is SCSI or IDE), and partitions are numbered starting with 0. So to use the second partition of the first drive, the syntax would be as follows:

 root (hd0,1)

Note

see the /boot/grub/device.map file for the list of devices on your system.^[8]

Again, this is the root filesystem used by GRUB when the system first boots, and it is not related to the root filesystem used by the Linux kernel after the system is up and running. To list all of the partitions on a device, you can use the command sfdisk -l.

Now you are ready to reboot and test the new kernel.

Enter reboot at the command line, and watch closely as the system boots. If the system hangs, you may have selected the wrong processor type. If the system fails to mount the root drive, you may have installed the wrong filesystem support in your kernel. (If you are using a SCSI system, see the "SCSI Gotchas" section for more information.)

If the system boots but does not have access to all of its devices, review the kernel options you specified in step 2 of this chapter.

Once you can access all of the devices connected to the system, and you can perform normal network tasks, you are ready to test your application or applications.

If you are using a SCSI disk to boot your system, you may want to avoid some complication and compile support for your SCSI driver directly into the kernel, instead of loading it as a module. If you do this on a kernel version prior to version 2.6, you will save yourself the trouble of building the initial RAM disk. However, if you are building a modular kernel from version 2.4 or an earlier series, you will still need to build an initial RAM disk so the Linux kernel can boot up far enough to load the proper module to communicate with your SCSI disk drive. (See the man page for the mkinitrd Red Hat utility for the easiest way to do this on Red Hat systems. The mkinitrd command automates the process of building an initial RAM disk.) The initial RAM disk, or initrd file, must then be referenced in your boot loader configuration file so that the system knows how to mount your SCSI boot device before the modular kernel runs.

Also, if you recompile your 2.4 modular kernel without changing the EXTRAVERSION number in the Makefile, you may accidentally overwrite your existing /lib/modules libraries. If this happens, you may see the error "All of your loopback devices are in use!" when you run the mkinitrd command. If you see this error message, you must rebuild the kernel (before rebooting the system!) and add the SCSI drivers into the kernel. Do not attempt to build the drivers as modules and skip the mkinitrd process altogether.

^[8]The /boot/grub/device.map file is created by the installation program anaconda on Red Hat systems.

In Conclusion

You gain more control over your operating system when you know how to configure and compile your own kernel. You can selectively enable and disable features of the kernel to improve security, enhance performance, support hardware devices, or use the Linux operating system for special purposes. The Linux Virtual Server (LVS) kernel feature, for example, is required to build a Linux Enterprise Cluster. While you can use the precompiled kernel included with your Linux distribution, you'll gain confidence in your skills as a system administrator and a more in-depth knowledge of the operating system when you learn how to configure and compile your own kernel. This confidence and knowledge will help to increase system availability when problems arise after the kernel goes into production.

Оставьте свой комментарий !

Ваше имя:

Комментарий:

Оба поля являются обязательными

Автор	Комментарий к данной статье

Введение

Properties of a Linux Enterprise Cluster

Architecture of the Linux Enterprise Cluster

The Load Balancer

The Shared Storage Device

The Print Server

The Cluster Node Manager

No Single Point of Failure

In Conclusion

Primer

Overview

High Availability Terminology

Linux Enterprise Cluster Terminology

Раздел 1: Запуск сервисов

How do Cluster Services Get Started?

Starting Services with init

The /etc/inittab File

Respawning Services with init

Managing the init Script Symbolic Links with chkconfig

Managing the init Script Symbolic Links with ntsysv

Removing Services You do Not Need

Using the Red Hat init Scripts on Cluster Nodes

In Conclusion

Часть 2: Управление пакетами

Overview

Netfilter

A Brief History of Netfilter

Setting the Default Chain Policy

Using iptables and ipchains

Clear Existing Rules

Set the Default INPUT Chain Policy

FTP

Passive FTP

DNS

Telnet

SSH

Email

HTTP

ICMP

Review and Save Your Rules

Routing Packets with the Linux Kernel

Matching Packets for a Particular Destination Network

Matching Packets for a Particular Destination Host

Matching Packets for any Destination or Host

To View Your Routing Rules

Making Your Routing Rules Permanent

iptables and Routing Resource Script

The ip Command

In Conclusion

Часть 3: Компиляция ядра

Overview

What You Will Need

Step 1: Get the Source Code

Using the Stock Kernel

Using the Kernel Supplied with Your Distribution

Decide Which Kernel Version to Use

Step 2: Set the Options You Want

Installing a New Kernel

Upgrading or Patching the Kernel

Upgrading a Kernel From a Distribution Vendor

Set Your Kernel Options

Step 3: Compile the Code

Step 4: Install the Object Code and Configuration File

Install the System.map

Save the Kernel Configuration File

Step 5: Configure Your Boot Loader

In Conclusion