Search     or:     and:
 LINUX 
 Language 
 Kernel 
 Package 
 Book 
 Test 
 OS 
 Forum 
iakovlev.org

Часть 12

Использование сокетов


Содержание


Прочитав эту главу,вы сможете написать клиент/серверное приложение с помощью сокетов.

Краткое введение в сокеты

Перл предлагает набор функций для доступа к юниксовым сокетам.

Сокет-библией является труд UNIX Network Programming W. Richard Stevens (Prentice Hall, ISBN 0-13-949876-1), в которой есть си-шные примеры,затрагивающие все аспекты сетевого программирования. Примеры в этой главе основы на примерах Стивенса из его книги. Перл хорош тем,что си-шный код можно оди-в -один перевести с минимальными изменениями и искажениями.

Программирование сокетов основано на клиент-серверной модели. Есть 2 программы - клиент и сервер - которые могут быть физически расположены как на одном компьютере,так и разбросаны по сети.

Основным протоколом сокетов является TCP/IP. Протокол Internet Protocol (IP) является управляющим протоколом. Транспортный протокол Transport Control Protocol (TCP) предлагает свой набор функций. Протокол User Datagram Protocol (UDP) похож на TCP. UDP менее надежен чем TCP,потому что это протокол без гарантии. Протокол TCP имеет встроенную проверку на гарантированную передачу пакетов, основанную на подтверждающих пакетах.

Сокеты,использующие TCP или UDP,работают через определенный порт. Для каждого приложения имеется свой уникальный порт. В юниксе есть файл /etc/services, который показывает,какие сервисы распределены по каким портам. Порты в диапазоне 1 - 255 зарезервированы для наиболее стандартных приложений. Например, поре 80-это World Wide Web,для nameserver используется порт 42, для sendmail - 25, и т.д.

Есть 2 способа задания адреса: если 2 процесса разговаривают друг с другом на одной машине, это адрес AF_UNIX. Если с разных машин,то это тип адреса семейства AF_INET. Для AF_UNIX сокет ассоциируется с путем в дереве каталогов. Для AF_INET сокету ставится в соответствие порт и номер приложения..

Есть 2 типа сокетов :

  • SOCK_STREAM. Используется протокол TCP для надежной передачи данных. Для SOCK_STREAM необходима первоначальная установка коннекта. SOCK_STREAM-протокол,ориентированный на коннект.
  • SOCK_DGRAM. Используется протокол UDP,работает быстрее предыдущего. Коннект не требуется.

Уникальность сокета гарантируется 3 параметрами: уникальный ip-шник компьютера,уникальный номер порта плюс уникальный номер самого сокета.

Перловые функции для работы с протоколами

The protocols available on your UNIX system are located in the /etc/protocols file. You have to use three functions to read this file. The function setprotoent() starts the listing process. The Perl function getprotoent() reads one line from the /etc/protocols file and lists it for you. Successive calls to the function read successive lines. Finally, a call to endprotoent() stops the listening process. A simple way to have all the protocols available to your Perl script is to use the script shown in Listing 12.1.


Listing 12.1. Showing available protocols.
1 #!/usr/bin/perl
2 #
3 # List all the protocols in this machine
4 #
5 setprotoent(1);
6 while (($name, $aliases, $protonum) = getprotoent) {
7      print " Name=$name, Aliases=$aliases, Protocol=$protonum \n";
8 }
9 endprotoent;

The output should be similar to what is shown here:

Name=ip, Aliases=IP, Protocol=0
Name=icmp, Aliases=ICMP, Protocol=1
Name=igmp, Aliases=IGMP, Protocol=2
Name=ggp, Aliases=GGP, Protocol=3
Name=tcp, Aliases=TCP, Protocol=6
Name=pup, Aliases=PUP, Protocol=12
Name=udp, Aliases=UDP, Protocol=17
Name=idp, Aliases=IDP, Protocol=22
Name=raw, Aliases=RAW, Protocol=255

To keep the file open between successive calls to the getprotoent() call, you should call the setprotoent() function with a nonzero parameter. To stop querying the file, use the endprotoent() call.

To determine whether you have a specific protocol present, you can use the system call getprotobyname or getprotobynumber. A return value of NULL indicates that the protocol is not there. The name passed to the function is not case-sensitive. Therefore, to list the names, aliases, and the protocol number for TCP, you can use this:

if  (($name, $aliases, $protonum) = getprotobyname('tcp')) {
     print "\n Name=$name, Aliases=$aliases, Protocol=$protonum";
}
print "\n"

A comparable set of calls is available for determining what services are available for your machine. This call queries the /etc/services file. Listing 12.2 illustrates how to use these calls. The setservent call with a nonzero file rewinds the index into the services file for you, the getservent gets the four items in the service entry, and the endservent call terminates the lookup. The output from this file can be a bit lengthy and is shown in Listing 12.2 starting at line 15.

In Listing 12.2, lines 1 and 2 clear the screen and show the output of the showme.pl file with the script in it. At line 13, we execute this script. Your output may be different than the one shown in Listing 12.2 depending on what services you have installed on your system.


Listing 12.2. Listing server services.
  1 $ clear
  2 $ cat showme.pl
  3 #!/usr/bin/perl
  4 setservent(1);
  5 printf "%15s %15s %4s %15s\n",
  6     "Name","Aliases","Port","Protocol";
  7 while(($nm,$al,$pt,$pr) = getservent) {
  8     # print "Name=$nm, Aliases=$al, Port=$pt, $Protocol=$pr\n";
  9     printf "%15s %15s %4d %15s\n", $nm,$al,$pt,$pr;
 10 }
 11 endservent;
 12 $
 13 $
 14 $ showme.pl
 15            Name         Aliases          Port         ;Protocol
 16          tcpmux                             1             tcp
 17            echo                             7             tcp
 18            echo                             7             udp
 19         discard       sink null             9             tcp
 20         discard       sink null             9             udp
 21          systat           users            11             tcp
 22         daytime                            13    ;          tcp
 23         daytime                            13    ;          udp
 24         netstat                            15    ;          tcp
 25            qotd           quote            17             tcp
 26             msp                            18             tcp
 27             msp                            18             udp
 28         chargen   ttytst source            19             tcp
 29         chargen   ttytst source            19             udp
 30             ftp                            21             tcp
 31          telnet                            23              tcp
 32            smtp            mail            25   &nbs p;         tcp
 33            time       timserver            37             tcp
 34            time       timserver            37             udp
 35             rlp        resource            39             udp
 36      nameserver            name            42        &nbs p;    tcp
 37           whois         nicname            43             tcp
 38          domain      nameserver            53          &n bsp;  tcp
 39          domain      nameserver            53          &n bsp;  udp
 40             mtp                            57             tcp
 41          bootps                            67              tcp
 42          bootps                            67              udp
 43          bootpc                            68              tcp
 44          bootpc                            68              udp
 45            tftp                            69&n bsp;            udp
 46          gopher                            70              tcp
 47          gopher                            70              udp
 48             rje          netrjs            77             tcp
 49          finger                            79              tcp
 50             www            http            80             tcp
 51             www                            80             udp
 52            link         ttylink            87      & nbsp;      tcp
 53        kerberos            krb5            88              tcp
 54        kerberos                            88   &nbs p;         udp
 55          supdup                            95              tcp
 56       hostnames        hostname           101             tcp
 57        iso-tsap            tsap           102             tcp
 58        csnet-ns          cso-ns           105             tcp
 59        csnet-ns          cso-ns           105             udp
 60         rtelnet                           107   &nbs p;         tcp
 61         rtelnet                           107   &nbs p;         udp
 62            pop2      postoffice           109             tcp
 63            pop2                           109 & nbsp;           udp
 64            pop3                           110 & nbsp;           tcp
 65            pop3                           110 & nbsp;           udp
 66          sunrpc                           111    ;          tcp
 67          sunrpc                           111    ;          udp
 68            auth tap ident authentication  113             tcp
 69            sftp                           115 & nbsp;           tcp
 70       uucp-path                           117             tcp
 71            nntp   readnews untp           119             tcp
 72             ntp                           123             tcp
 73             ntp                           123             udp
 74      netbios-ns                           137      & nbsp;      tcp
 75      netbios-ns                           137      & nbsp;      udp
 76     netbios-dgm                           138             tcp
 77     netbios-dgm                           138             udp
 78     netbios-ssn                           139             tcp
 79     netbios-ssn                           139             udp
 80           imap2                           143             tcp
 81           imap2                           143             udp
 82            snmp                           161 & nbsp;           udp
 83       snmp-trap        snmptrap           162             udp
 84        cmip-man                           163    &nb sp;        tcp
 85        cmip-man                           163    &nb sp;        udp
 86      cmip-agent                           164      & nbsp;      tcp
 87      cmip-agent                           164      & nbsp;      udp
 88           xdmcp                           177             tcp
 89           xdmcp                           177             udp
 90        nextstep NeXTStep NextStep         178             tcp
 91        nextstep NeXTStep NextStep         178             udp
 92             bgp                           179             tcp
 93             bgp                           179             udp
 94        prospero                           191    &nb sp;        tcp
 95        prospero                           191    &nb sp;        udp
 96             irc                           194             tcp
 97             irc                           194             udp
 98            smux                           199 & nbsp;           tcp
 99            smux                           199 & nbsp;           udp
100         at-rtmp                           201    &nb sp;        tcp
101         at-rtmp                           201    &nb sp;        udp
102          at-nbp                           202   &nbs p;         tcp
103          at-nbp                           202   &nbs p;         udp
104         at-echo                           204    &nb sp;        tcp
105         at-echo                           204    &nb sp;        udp
106          at-zis                           206   &nbs p;         tcp
107          at-zis                           206   &nbs p;         udp
108           z3950            wais           210             tcp
109           z3950            wais           210             udp
110             ipx                           213             tcp
111             ipx                           213             udp
112           imap3                           220             tcp
113           imap3                           220             udp
114       ulistserv                           372             tcp
115       ulistserv                           372             udp
116            exec                           512              tcp
117            biff          comsat           512             udp
118           login                           513             tcp
119             who            whod           513             udp
120           shell             cmd           514             tcp
121          syslog                           514   &nbs p;         udp
122         printer         spooler           515             tcp
123            talk                           517              udp
124           ntalk                           518             udp
125           route   router routed           520             udp
126           timed      timeserver           525             udp
127           tempo         newdate           526             tcp
128         courier             rpc           530             tcp
129      conference            chat           531             tcp
130         netnews        readnews           532             tcp
131         netwall                           533    &nb sp;        udp
132            uucp           uucpd           540             tcp
133        remotefs  rfs_server rfs           556             tcp
134          klogin                           543   &nbs p;         tcp
135          kshell                           544   &nbs p;         tcp
136    kerberos-adm                           749        &nbs p;    tcp
137         webster                           765    &nb sp;        tcp
138         webster                           765    &nb sp;        udp
139      ingreslock                          1524         ;     tcp
140      ingreslock                          1524         ;     udp
141     prospero-np                          1525             tcp
142     prospero-np                          1525             udp
143             rfe                          5002             tcp
144             rfe                          5002             udp
145       krbupdate            kreg           760             tcp
146         kpasswd            kpwd           761             tcp
147         eklogin                          2105     &n bsp;       tcp
148      supfilesrv                           871              tcp
149      supfiledbg                          1127         ;     tcp

Tip
The gethostent function was not implemented as of Perl 5.002.

Perl also lets you look at the host name by address in your /etc/hosts file with the gethostbyaddr call. This function takes two parameters, the address to look up and the value of AF_INET. On most systems, this value is set to 2 but can be looked up in the /usr/include/sys/socket.h file. The gethostbyname("hostname") function returns the same values as the gethostbyaddr() call. The parameter passed into the function is the name of the host being looked up. Listing 12.3 illustrates how to do this.

In the program shown in Listing 12.3, the code in Line 4 gets the host name and alias given the address 204.251.103.2. You would use a different address, of course, because the address shown here is specific to my machine. Lines 6 through 10 print the components of the information you get back from the gethostbyaddr function call. Also, in lines 12 and 13, you can get the same information back using the node name instead of an IP address. Lines 14 through 19 print these values.


Listing 12.3. Sample listing to show usage of gethostbyname and gethostbyaddr.
 1 #!/usr/bin/perl
 2
 3 $addr = pack('C4',204,251,103,2);
 4 ($name,$alias,$atype,$len,@addrs) = gethostbyaddr($addr,2);
 5 ($a,$b,$c,$d) = unpack('C4',$addrs[0]);
 6 print "Name :  $name \n";
 7 print "Alias:  $alias \n";
 8 print "Type :  $atype \n";
 9 print "Len  :  $len \n";
10 print "Node :  $a.$b.$c.$d \n";
11
12 $name = "www.ikra.com";
13 ($name,$alias,$atype,$len,@addrs) = gethostbyname($name);
14 ($a,$b,$c,$d) = unpack('C4',$addrs[0]);
15 print "Name :  $name \n";
16 print "Alias:  $alias \n";
17 print "Type :  $atype \n";
18 print "Len  :  $len \n";
19 print "Node :  $a.$b.$c.$d \n";

Note
Note the use of the number 2 in the call to gethostbyaddr in Listing 12.3. This should be the constant $AF_INET.

Socket Primitives

Enough already about getting information on your system. Let's see what socket functions are available to you. Depending on your site and what extensions you have for Perl, you may have more functions available. Check the man pages for socket for more information. Here are the most common ones you'll use:

  • socket()-Creates a socket
  • bind()-Binds a process to a socket
  • accept()-Accepts an incoming request for connection
  • listen()-Use for servers to begin listening for connections to a socket
  • connect()-Use for clients to connect to a server
  • read()-Reads binary data from a socket
  • write()-Writes binary data to a socket

I cover these functions in the following sections. However, there are some constants that must be defined before I continue. These constants are used in all function calls and scripts in this chapter. Feel free to change them to reflect your own system's peculiarities. Here's a list of the constants:

  • $AF_UNIX = 1. You'll be working with the AF_UNIX family of protocols.
  • $SOCK_STR = 1. When you work with UDP, set this variable to 2. If you will work with TCP/IP only, set it to 1.
  • $PROTOCOL = 0. The default protocol for all our examples is the one you will most probably wind up using anyway-IP. See man pages for protocols for more information about what other types are available.

socket()

The socket() system call creates a socket for the client or the server. The socket function is defined as this:

socket(SOCKET_HANDLE, $FAMILY, $TYPE, $PROTOCOL);

The return value from this function is NULL, and if there was an error, you should check the $! for the type of error message. The call to open a socket looks like this:

socket(MY_HANDLE,  $AF_UNIX, $STREAMS, $PROTOCOL) ||
     die "\nCannot open socket: $!\n";
print "\nSocket successfully opened\n";

It's a good idea to unlink any existing file names for previously opened sockets with the unlink call:

unlink "$my_tst_srvr" || die "\n$O: No permissions";

You'll use the socket descriptor MY_HANDLE to refer to this socket in all subsequent network function calls in your program. Sockets are created without a name. Clients use the name of the socket in order to read or write to it. This is where the bind function comes in.

The bind() System Call

The bind() system call assigns a name to an unnamed socket:

bind(SOCKET_HANDLE, $nameAsAString);
bind(SOCKET_HANDLE, $sockaddr);

The first item is the socket descriptor you just created. The second parameter is the name that refers to this socket if you are using AF_UNIX or its address if you are using AF_INET.

The call to bind using AF_UNIX looks like this:

bind(MY_HANDLE,"./mysocket") || die "Cannot bind $!\n";

In AF_INET, it looks like this:

$port = 6666
$AF_INET=2;     # Use AF_INET instead of AF_UNIX
($name,$alias,$atype,$len,$addr)=gethostbyname(`uname -n`);
$my_ip_addr = pack(S n C4 x8,$AF_INET,$STREAMS,$port,$addr);
bind(MY_HANDLE,$my_ip_addr) || die "Cannot bind $!\n";

The parameters' pack() function probably needs some explanation. The pack() function takes two parameters: a list of formats to use and a list of values to pack into one continuous stream of bytes. In our case, the bind() call expects a sockaddr structure of the following form in a C structure:

{
unsigned short family;
int network;
long     address;
char nullbytes[8];
     }

The first parameter to the pack instruction can take the values listed in Table 12.1. Check the man pages for the pack instruction for more details. You had the pack instruction create the socket address structure for you. Therefore, the script uses S n C4 x8 to pack a signed short, followed by an integer in network order, four unsigned characters, and eight NULL characters to get this call:

pack(S n C4 x8,$AF_INET,$STREAMS,$port,$addr);

Table 12.1. The types of packing supported by pack().
Character
Type of Packing
@
Null fill to absolute position
a
An ASCII string, padded with NULLs
A
An ASCII string, padded with spaces
b
A bit string in ascending bit order
B
A bit string in descending bit order
c
A signed char value
C
An unsigned char value
d
A double-precision float in the native format
f
A single-precision float in the native format
H
A hex string, high nibble first
h
A hex string, low nibble first
i
A signed integer value
I
An unsigned integer value
l
A signed long value
L
An unsigned long value
N
A long in network order
n
A short in network order
p
A pointer to a null-terminated string
P
A pointer to a structure (fixed-length string)
s
A signed short value
S
An unsigned short value
u
A uuencoded string
V
A long in little-endian order
v
A short in little-endian order
x
A null byte
X
Back up a byte

Now that you have bound an address for your server or client, you can connect to it or listen for connections with it. If your program is a server, it will set itself up to listen and accept connections.

Tip
Do not use S n A4 x8 instead of S n C4 x8 for packing into the sockaddr structure. The C4 specifies four unsigned char data values and is safe to use. The A4 implies a string, which may confuse the pack() function if there are nulls (zeroes) in the address.

Now let's look at the functions available for use in a server.

The listen() and accept() System Calls

The listen() system call is used by the server to listen for connections. Once it is ready to listen, the server is able to honor any requests for connections with the accept system call. The listen call is defined as this:

listen(SOCKET_HANDLE, $queueSize);

The SOCKET_HANDLE is the descriptor of the socket you created. The queueSize is the number of waiting connections allowed at one time before any are rejected. Use the standard value of 5 for queue size. A returned value of NULL indicates an error. The call to listen normally looks like this:

listen(MY_HANDLE,5) || die "Cannot listen $!\n";

If this call is successful, you can accept connections with the accept function, which looks like this:

accept(NEWSOCKET, ORIGINAL_SOCKET);

The accept() system call is used by the server to accept any incoming messages from a client's connect() calls. Be aware that this function will not return if no connections are received. As requests come off the queue and set up in the listen() call, the accept function handles them by assigning them to a new socket. NEWSOCKET is created by the accept function as ORIGINAL_SOCKET, but now NEWSOCKET is going to be used to communicate with the client. At this point, most servers fork off (fork()) a child process to handle the client and go back to wait for more connections. Before I get into that, let's see how connections are originated.

Let's look at the connect() call that you'll use to connect to a server.

The connect() System Call

The connect() system call is used by clients to connect to a server in a connection-oriented system. This connect() call should be made after the bind() call. There are two ways you can call the connect() call: one for AF_UNIX using the pathname of the socket and the other using an address as the AF_INET requirement for a socket handle.

connect(SOCKET_HANDLE,"pathname" ); # for AF_UNIX
connect(SOCKET_HANDLE,$Address); # for AF_INET

Connection-Oriented Servers in Perl

Given this background information about socket information gathering, creation, and so on, you are now ready to write your own server using Perl. Listing 12.4 presents a sample server.


Listing 12.4. Server side for connection-oriented protocol.
 1 #!/usr/bin/perl
 2 #
 3 # Sample connection oriented server using Perl
 4 #
 5 $AF_UNIX = 1;
 6 $AF_INET=2;     # Use AF_INET instead of AF_UNIX.
 7 $SOCK_STR = 1;  # Use STREAMS.
 8 $PROTOCOL = 0;  # stick to the default protocols (IP).
 9
10 $port = 6668 unless $port;
11
12 #
13 # The pattern for packing into a sockaddr structure
14 #
15 $PACKIT='S n C4 x8';
16
17 #
18 # Disable any buffering on any newly created sockets.
19 #
20 select(NEWSOCKET);
21 $| = 1;
22 select(stdout);
23
24 #
25 # Create the socket.
26 #
27 socket(MY_SOCKET, $AF_INET, $SOCK_STR, $PROTOCOL) ||
28                        die "\n $0: Cannot open socket: $!";
29 print "Socket successfully opened\n";
30
31 #
32 # Get the host address for this node
33 #
34
35 ($name, $aliases, $addrtype, $len, @addrs) = gethostbyname("www.ikra.com");
36 ($a,$b,$c,$d) = unpack('C4',$addrs[0]);
37 print "Server Name=$name, Server Address= $a.$b.$c.$d\n";
38 $my_ip_addr = pack($PACKIT,$AF_INET,$port,$addrs[0]);
39
40 #
41 # If you just want to test with the localhost, try this line
42 # instead of the above.
43 # $my_ip_addr = pack($PACKIT,$AF_INET,$port,127,0,0,1);
44
45 #
46 # Bind to the socket and listen on this port
47 #
48 bind(MY_SOCKET, $my_ip_addr) || die "$0: Cannot bind .. $!\n";
49 print  "\n Bound to socket";
50 listen(MY_SOCKET,5)  || die "$0: Cannot listen: $!\n";
51 print  "\n Listening \n";
52
53 while(1) {
54      $remote = accept(NEWSOCKET, MY_SOCKET) || die "$0: Unacceptable: $!\n";
55
56 #
57 # In case you have to display incoming connection
58 # information, you can uncomment the next three lines of code:
59
60 #         @remoteInfo = unpack($PACKIT,$remote);
61 #         $, = ' ';
62 #         print @remoteInfo; print "\n";
63
64      # $pid = fork || &cleanup;
65
66           if ($pid == fork)  {  # child
67           sleep 3;
68           print NEWSOCKET "Welcome to this server\n";
69           # in child,.. you can do other stuff here.
70           close NEWSOCKET;
71           # I chose to just print this message and terminate
72           close MY_SOCKET;
73           exit;
74      }
75      else {  # parent
76           sleep 10;
77           close MY_SOCKET;
78           close NEWSOCKET;  # in parent
79           exit;
80      }
81
82 }
83 sub cleanup { close NEWSOCKET; close(MY_SOCKET); die "$0: Cannot fork : $!\n"; }

Note
Of course, instead of the IP addresses shown above in Listing 12.4, you would have to modify the Perl script to use your own machine name and IP address. Do not use the addresses shown in this example because they are coded to work with a specific machine with a specific name.

In the case of connection-oriented protocols, the server does the following functions:

  • Creates a socket with a call to the socket() function.
  • Binds itself to an address with the bind() function call.
  • Listens for connections with the listen() function call.
  • Accepts any incoming requests with the accept() function call.

Once a connection to the server has been accepted, the client and server can exchange data with the read() and write() function calls. To read from the socket, use the function call

read(MY_SOCKET, $buffer, $length);

where SOCKET_HANDLE is the socket you are reading from and $buffer is where you will be putting in data of size $length. To write to the socket, you can use the function call

write(MY_SOCKET, $buffer, $length);

For sending just text data, you can use the print call instead. For example, the following code will write text to the socket:

print MY_SOCKET, "Hello, ..";

Once a connection has served its time, it has to be closed so that other clients are able to use the system resources. To close the socket, your server and clients should call the close() function:

close(MY_SOCKET);

The shutdown() function allows you to selectively shut down sends and receives on a socket. Here's the function call:

shutdown(MY_SOCKET,HOW);

When the HOW parameter is 0, no more data is received on this socket. If HOW is set to 1, no more data will be sent out on this socket. If set to 2, no data is received or sent on this socket. (You still have to close the socket, even if you shut it down for sending and receiving.)

Listing 12.5 presents a sample of the client side of things.


Listing 12.5. The client side.
 1 #!/usr/bin/perl
 2 #
 3 # define constants for talking to server
 4 #
 5 $PACKIT = 'S n C4 x8';
 6 $AF_INET = 2;
 7 $SOCK_STR = 1;     # STREAMS
 8 $DEF_PROTO = 0;    #default protocol of IP
 9 $port = 6668 unless $port;
10
11 #
12 # Get the name of the server
13 #
14 ($name, $aliases, $addrtype, $len, @addrs) = gethostbyname("www.ikra.com");
15 ($a,$b,$c,$d) = unpack('C4',$addrs[0]);
16 print "Server Name=$name, Server Address= $a.$b.$c.$d\n";
17 $that = pack($PACKIT,$AF_INET,$port,$addrs[0]);
18
19 #
20 # Confine yourself to the local host machine address of 127.0.0.1
21 #
22 $this = pack($PACKIT,$AF_INET,$port,127,0,0,1);
23
24 #
25 # Disable buffering on this socket.
26 #
27 select(CLIENT_SOCKET);
28 $| = 1;
29 select(stdout);
30
31
32 socket(CLIENT_SOCKET,$AF_INET,$SOCK_STR,$DEF_PROTO) ||
33      die "$0: Cannot open socket\n";
34 print "Created socket\n";
35
36 #
37 # Attempt to connect to server.
38 #
39 do
40 {
41  sleep(1);
42  $result = connect(CLIENT_SOCKET,$that);
43  if (result != 1) {
44      print "Sleeping\n";
45      }
46
47 } while ($result != 1);
48
49 sleep(5);
50 print "After connection\n";
51
52 #
53 # send data to server
54 #
55 # write(CLIENT_SOCKET,"hello",5);
56
57 read(CLIENT_SOCKET,$buf,100);
58
59 print "[" . $buf . "]\n";
60 close(CLIENT_SOCKET);

The client for connection-oriented communication also takes the following steps:

  • Creates a socket with a call to the socket() function.
  • Gets the packed structures for identifying itself and the server to which it will connect.
  • Attempts to connect to the server with a connect() call.
  • If a connection was made, requests for data with the write() call and reads incoming replies with the read() function.

Tip
After the connection is used up by the client, it's a good idea to close the socket so that others may use the system resources.

That's about it for a client program. Any processing that has to be done is done while the connection is open. Client programs can be written to keep a connection open for a long time while large amounts of data are transferred. If there is too long of a delay between successive messages, clients would then open a socket connection, send the message or messages, and close the connection immediately after the acknowledgment, if any, arrives. This way, all sockets are opened only on an as needed basis and do not use up socket services when both the server and client are idle.

The h2ph Script

If you read more documentation on Perl and sockets, you'll see references to the socket.ph file. If you cannot find this file anywhere on your system, it's because you have not run the h2ph file on your include directories. This h2ph program converts C header files to Perl header files. The safest way to ensure that you have all the files converted to Perl headers is to issue the following statements while logged in as root:

$ cd /usr/include
$ h2ph *
$ h2ph sys/*

You may run into some problems while running this script. For instance, it will say that it's creating a .ph file from a .h file, but after execution, the *.ph file may not exist! Check the script in the h2ph file to see where $perlincl is pointing and if you have read/write permissions there. A common repository is the /usr/local/lib/perl5 or the /usr/lib/perl5 directory. Another thing to remember is that the @Inc variable in your Perl scripts should point to the same location where the *.ph files are placed.

Using Socket.pm

The standard Perl 5 distribution comes with the Socket.pm module, which greatly speeds up Perl code development work. Look at the documentation in the /usr/lib/Perl5/Socket.pm file for more information. This module requires dynamic loading, so ensure that your system supports it.

Summary

Perl offers very powerful features for using sockets on UNIX machines. The system calls available offer enough power and flexibility to create client/server applications in Perl without having to write code in a compiled language.

The functions available in Perl include those for querying available system services and protocols, IP addresses, and other host name information. Here are a few key points to remember when working with Perl and sockets:

  • Perl uses the existing system calls for using sockets. These system calls expect structures to be packed in a sockaddr structure. This packing and unpacking is done with the pack() and unpack() functions prior to use with system calls.
  • You use read() and write() functions to do direct data transfer. You can use print SOCKET, … if you plan on writing directly to the port.
  • Disable buffering on the socket with the $|=1 assignment on a socket. This forwards any reads or writes immediately instead of being buffered internally, preventing unnecessary timeouts in case of long operations.
  • Always call the close() function of the socket when your application is done with communication.
  • Servers generally fork off copies of themselves to handle individual connections. Be prepared to write code this way.

Chapter 13

Messaging Facilities: The System V Ipc Functions


CONTENTS


This chapter introduces you to the Interprocess Communications (Ipc) functionality of message queues, shared memory, and semaphores. The Ipc facilities provide a clean, consistent solution to passing data between processes on the same machine. (Sockets can extend across platforms and a network.)

An Introduction to the System V Ipc

The UNIX system V Ipc enables you to perform the following tasks:

  • Send messages from one process to another via a message queue. Processes can add messages to, check the length of, and remove messages from a queue.
  • Create and handle shared memory. This includes the capability to read from and write into areas of shared memory.
  • Create and handle semaphores. This includes the capability to read, set, and reset semaphore values.

Each Ipc function is available to calling processes as a system resource. These resources are available for all processes on a system-level basis and can be shared by many processes on the same system. Ipc resources are limited to the system they reside on and do not offer networking functionality. Because there are only a limited number of Ipc resources on any UNIX system, it's important to free up each resource after using it. This is because each Ipc resource can exist for a long time after the process that created it has finished executing.

Each Ipc resource is referred to as an object in the operating system. For working with Ipc resources, you either have to create an object or use an existing one. Ipc objects are created via a get() function for that object. Each get() function call requires a unique positive Ipc key as the identifier for that object. Keys are converted by the kernel into an ID number and returned by the get() function. Then the ID is used by other related functions to refer to that object for all other operations.

An Ipc key is a long integer and is used to name the Ipc resource. A key is assigned by the programmer but could also be assigned by the system. The keys for shared memory, message queues, and semaphores are unique in the sense that the same key number can be assigned to Ipc objects of different types. That is, a semaphore with a key of 11 can coexist on the same system with a message queue with a key of 11. However, another semaphore cannot coexist with a key of 11 on the same system. Programmers can force the underlying operating system to assign a key by specifying the &Ipc_PRIVATE flag (this is explained shortly).

When you pass in a key number to a get() function, an ID is returned. Once an object is created and its ID is returned, the object must then be referred to by its ID. You can draw the analogy that a file handle is to a file as an ID is to an Ipc resource. The returned IDs are positive if there are no errors. (A negative ID is returned if there is an error.)

You can create a unique key by using the &Ipc_PRIVATE flags if you are not imaginative enough. The kernel then creates the ID and the key for you.

Ipc objects are global. Once created, the object is available to all the processes in the system. In this respect, you have to be careful how you access the available resources because any process can overwrite your shared memory, message queue, or semaphore. Also, your Ipc object remains in memory long after your process has gone. You, not the kernel, are responsible for cleanup.

When you create the object, you also have to specify permissions. The format of the permissions is very similar to that of files: three groups of read/write for owner, group, and other. The execute permission bits for the permissions are ignored by the Ipc calls. To get access to an existing object, you have to specify 0 for permissions.

The following flags are permitted for creating objects:

FlagDescription
&Ipc_CREAT This flag creates an Ipc object given a key, or it attaches to an existing object with the same key.
&Ipc_EXCL This flag creates an Ipc object given a key. It returns an error if the object with this key already exists. This prevents two unrelated processes from creating two objects with the same key.
&Ipc_RMID Given an ID, this flag removes the object from the system. You must have the permissions on the object to be able to delete it.
&Ipc_STAT Given an ID, this flag returns the values of each member of an Ipc object.
&Ipc_SET Given an ID and a data structure for an object, this flag sets the values of each member of the corresponding Ipc object.

Using the UNIX System V Ipc Functions

With Perl you can access all of the Ipc functions via a standard set of library functions. The information required for the functions is consistent with a UNIX system interface; therefore, the information in a UNIX man page will provide enough information about the facilities available on your system.

System V Ipc functions are defined in Perl header files. For a Perl installation on a UNIX system, the required information is in the *.ph files. (The ph stands for Perl header.) The following files will be required by most of the Perl scripts you write to utilize the Ipc facilities:

require "ipc.ph";
require "msg.ph";
require "sem.ph";
require "shm.ph";

Keep in mind that this might not work as shown here. Here are the primary reasons an error occurs when you try to include these files with the require statement:

  • The *.ph files do not exist.
  • The *.ph files do exist but are not in the path(s) specified by the @Inc include variable.

To cure these problems, you'll have to run the h2ph script in the /usr/lib/perl directory. The h2ph script contains a line (around line 9) that has the variable $perlincl set to a directory. On my machine, this variable is set to /usr/lib/perl5/i486-linux/5.002. On your machine, this value might be different. In any event, the value of $perlincl is the directory where the *.ph files are stored by the h2ph script.

Now go to the /usr/include directory and, as root, run the h2ph command as shown here:

h2ph * sys/*

The include files on your system may require that more subdirectories be included in the paths specified to this program. For example, on a Linux 3.0 system, the command to get most of the required files is this:

h2ph * sys/* asm/* linux/*

The only clear way to know which files are required is to include the Perl header files in a sample script. If everything goes well, you should be able to get the script to run. The sample script shown in Listing 13.1 gives two ignorable warnings on all three different Linux versions. The script does manage to create the message queue as expected. I cover the topic of message queues in the section "Shared Memory," later in this chapter.


Listing 13.1. A sample script to test Perl header file inclusion.
 1 #!/usr/bin/perl
 2
 3 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/sys");
 4 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/linux");
 5 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/asm");
 6
 7 print "@Inc";
 8
 9 require "ipc.ph";
10 require "msg.ph";
11 require "shm.ph";
12 require "sem.ph";
13
14 $PERMISSIONS=0666;
15 $ipckey = &Ipc_PRIVATE;
16
17 $msgid = msgget($ipckey,&Ipc_CREAT | &Ipc_EXCL | $PERMISSIONS);
18
19 printf "\n  Message Id = $msgid";

The three required files in the sample script are included for message queues, shared memory, and semaphores, respectively. Only those that are required have to be included. That is, you do not have to include shm.ph if you aren't going to be using shared memory. The ipc.ph file is required for any of these three features.

PERMISSIONS is set to 0666, meaning that any process can work with or even delete the Ipc object in question. For a more secure system, you might consider using 0600 to give permissions to the owner process only.

The msgget() Function

In Listing 13.1 an Ipc message queue was created. To use the System V message-passing facility, you first create a message queue ID for a given message queue. Here's the syntax of the msgget() function:

$msgid = msgget ($key, $flag);

$key is set to either &Ipc_PRIVATE or an arbitrary constant. If $key is &Ipc_PRIVATE or $flag has &Ipc_CREAT set, the message queue is created and its queue ID is returned in $msgid. For &Ipc_EXCL, the object must not already exist. If msgget() cannot create the message queue, $msgid is set to undef.

The ipcs Command

After running the test script, you can see what the object created looks like by using the ipcs command. The ipcs command lists the status of any Ipc objects in the system. Here is the output from the ipcs command after creating the message queue.

------ Shared Memory Segments --------
shmid     owner     perms     bytes     nattch    status

------ Semaphore Arrays --------
semid     owner     perms     nsems     status

------ Message Queues --------
msqid     owner     perms     used-bytes  messages
128       khusain   666       0           0

The output from the ipcs command on your machine may be different than the one shown here. However, most of the information should be the same. For instance, in this example one message queue is shown as being created. The ID of this queue is 128; it is owned by khusain and has permissions of 0666, thereby allowing any process to manipulate it. The message queue has no messages in it and is not using any memory for queuing messages.

The msgsnd() and msgrcv() Functions

Use the msgsnd() function to send a message to a message queue. The syntax of the msgsnd function is this:

$err = msgsnd ($msgid, $message, $flags);

$msgid is the message queue ID returned by msgget(); $message is the content of what you are sending (the content does not have to be text). The $flags specifies options to use when sending the message. The msgsnd() function returns a non-zero value if the send operation succeeds and 0 if an error occurs. You can check $! for the errno code if you get a value of 0 back from this call.

Call the msgrcv() function to read messages from a message queue. The syntax of the msgrcv function is this:

$err = msgrcv ($msgid, $rcvd, $size, $mesgtype, $flags);

$msgid is the ID of the message queue. $rcvd is a scalar variable in which the incoming data is stored. $size is set to the number of bytes of the incoming message plus the size of the message type.

The message type is specified in $mesgtype by the caller. If $mesgtype is 0, any message on the queue is pulled off. A positive non-zero value implies that the first message of the type equal to the value in $mesgtype will be pulled. A negative non-zero value of $mesgtype requests to pull any message whose ID is greater than the absolute value of $mesgtype.

$flags specifies options that affect the message. If &Ipc_NOWAIT is specified, the function returns immediately with the appropriate error code. If the &Ipc_WAIT flag is set, the function waits until there is a message on the queue. The msgrcv() function returns a non-zero value if a message has arrived; otherwise it returns 0. You can check $! for the errno code if you get a value of 0 back from this call.

Let's see how to send a message. A message has a long integer as the first four bytes followed by the body of the message. The first bytes are used as identifiers for each message. It's up to the receiver to know how many bytes to expect from the type of the message. First, Listing 13.2 presents a script that creates a message queue using a unique key and then sends a message on it.


Listing 13.2. Creating a message queue and sending a message on it.
 1 #!/usr/bin/perl
 2 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/sys");
 3 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/linux");
 4 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/asm");
 5 require "ipc.ph";
 6 require "msg.ph";
 7 $PERMISSIONS=0666;
 8 $ipckey = 42;
 9 $msgid = msgget($ipckey,&Ipc_CREAT | $PERMISSIONS);
10 printf "\n  Message Id = $msgid \n";
11 $msg_type = 1;
12 $msg = pack("L a*", $msg_type, "Elvis Lives!");
13 msgsnd($msgid, "$msg", &Ipc_NOWAIT);

Don't forget to replace lines 2 through 4 with your machine's specific path!

Lines 5 and 6 include the header files for the message queue facility. Line 7 sets the permissions to be globally vulnerable; that is, anyone can attach to or even destroy an object created with these permissions. The $ipckey value is set to 42 because it's a unique number. Had this value been left as Ipc_PRIVATE, a new message queue would be created every time this script is run. Too many queues will eat up system resources, so use these scripts judiciously.

The message itself is created in lines 11 and 12 using the pack statement. The L parameter to pack sets up the message type, and the a* parameter specifies a null-terminated string. The message will be 12 bytes long, including the null terminator for the string, but not including the four-byte message type.

Line 13 is where the message is actually sent. The &Ipc_NOWAIT flag requests that the message returns immediately even if it could not be sent. If you want to wait, use &Ipc_WAIT instead. Be warned, however, that the script making the call is suspended until the message is sent.

To see if the message made it to the message queue, check the output from the ipcs command:

------ Shared Memory Segments --------
shmid     owner     perms     bytes     nattch    status

------ Semaphore Arrays --------
semid     owner     perms     nsems     status

------ Message Queues --------
msqid     owner     perms     used-bytes  messages
512       khusain   666       12          1

Note that there isn't a receiver to receive the message just yet. If we do not create a receiving process, the messages in the queue will just sit there until the queue is destroyed. Queues have to be destroyed manually; the system will not destroy them for you automatically.

There is one message with an ID of 512 and a length of 12 bytes in the queue. The message stays in the queue until it's retrieved by something else. That something else is the script shown in Listing 13.3.


Listing 13.3. Receiving messages on the message queue.
 1 #!/usr/bin/perl
 2 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/sys");
 3 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/linux");
 4 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/asm");
 5 require "ipc.ph";
 6 require "msg.ph";
 7 $PERMISSIONS=0666;
 8 $ipckey = 42;
 9 $msgid = msgget($ipckey,&Ipc_CREAT | $PERMISSIONS);
10 printf "\n  Message Id = $msgid";
11 $msg_type = 0;
12 #
13 # Keep in mind that the message packed was:
14 # $msg = pack("L a*", $msg_type, "Elvis Lives!");
15 msgrcv($msgid, $msg, 80, $mgt_type, 0);
16 printf "\n Message Recvd = [%s]\n", $msg;

Again, don't forget to modify lines 2 through 4 for your machine. Line 15 in Listing 13.3 is of importance to us. Note how the message type is set to 0, even though the message type sent was 1. The message type in the msgrcv() function can take three sets of values:

  • If the message type is zero (0), to pull off the next available message on the queue.
  • If the message type is greater than zero, to pull off only messages of the type explicitly specified in the message type.
  • If the message type is less than zero, the absolute value of the message type is used. The first message type greater than or equal to this type of message will be pulled off the message queue.

Run the receiver script. The message queue should be empty now. Let's confirm that the message queue is empty by examining the output of the ipcs command. In the following output, look at the information for the message queues. You should see zero for the number of messages and zero bytes by the queue.

------ Shared Memory Segments --------
shmid     owner     perms     bytes     nattch    status

------ Semaphore Arrays --------
semid     owner     perms     nsems     status

------ Message Queues --------
msqid     owner     perms     used-bytes  messages
512       khusain   666       0           0

It's not a good idea to leave Ipc objects around in the system. The msgctl() function is used to set options for message queues and send commands that affect them. Generally, this function is used to delete message queues. Here's the syntax of the msgctl function:

$err =msgctl ($msgid, $msgcmd, $msgarg);

$msgid is the message queue ID. The argument $msgcmd is the command to be sent to the message queue. The list of available commands is defined in the file ipc.ph. Some of the commands that can be specified by msgcmd set the values of message queue options. If one of these commands is specified, the new value of the option is specified in msgarg. If an error occurs, msgctl returns the undefined value. msgctl() also can return 0 or a non-zero value and will set errno in $!.

To delete a queue, use the following command:

$ret = msgctl($msgid, &Ipc_RMID, $NULL);

The value of the returned parameter will be -1 if there is an error; otherwise, the value is 0. Sometimes the message queue can be deleted in a signal handler, like this:

sub cleanup {
     local($signalName) = @_;
     print "\n Caught signal, removing message queue";
     $ret= msgctl($msgid,&Ipc_RMID,$NULL) ;
     print "\n System returned $ret from kill message queue";
     exit(0);
}

Shared Memory

Message queues are great for sending messages in a LIFO order. The major problem with message queues is that they can overflow if no one is there to receive the messages.

Shared memory areas have to be explicitly created before you can use them. To do this, call the shmget function with a key as you did with message queues. Here's the syntax of the shmget function:

$shmid = shmget (key, msize, flag);

As with message queues, $key is either &Ipc_PRIVATE or an arbitrary constant. If the key is &Ipc_PRIVATE or the flag has &Ipc_CREAT set, the shared memory segment is created, and its ID is returned in $shmid. The msize is the size of the created shared memory in bytes. If shmget() cannot create the shared memory area, the returned value in $shmid is set to undef. The $flags are the same as with message queues.

Here are the actions you can perform on a shared memory segment:

  • Write to it with the shmwrite() function
  • Read from it with the shmread() function
  • Delete or modify its parameters with the shmctl() function

The shmwrite() and shmread() Functions

To write data to an area of shared memory, call the shmwrite() function. Here's the syntax of the shmwrite function:

shmwrite ($shmid, $text, $pos, $size);

$shmid is the shared memory ID returned by shmget. $text is the character string to write to the shared memory, $pos is the number of bytes to skip over in the shared memory before writing to it, and $size is the number of bytes to write.

This function returns a value that is the number of bytes actually written or, in the case of an error, a value of 0.

If the data specified by $text is longer than the value specified by size, only the first $size bytes of text are written to the shared memory. If the data specified by $text is shorter than the value specified by $size, shmwrite generally will fill the leftover space with null characters. An error also occurs if you attempt to write too many bytes to the shared memory area (that is, if the value of $pos plus $size is greater than the number of bytes in the shared memory segment).

To read data from a segment of shared memory, call the shmread function. Here's the syntax of the shmread function:

shmread ($shmid, $retval, $pos, $size);

Here, $shmid is the shared memory ID returned by shmget. The $retval variable is a scalar variable (or array element) in which the returned data is to be stored. The data is read from $pos number of bytes from the start of the shared memory segment, and $size is the number of bytes to copy. This function returns a non-zero value if the read operation succeeds, or it returns 0 in the case of an error.

Only the number of bytes requested are returned in $retval. An error occurs if you attempt to read too many bytes from the shared memory area. In other words, if the value of $pos plus $size is greater than the number of bytes in the shared memory segment, you'll get an error. On errors, the values in the $retval scalar are undefined.

See Listing 13.4 for a simple Perl script that creates a memory segment and then puts some data in it.


Listing 13.4. The use of shmget() and shmwrite().
 1 #!/usr/bin/perl
 2
 3 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/sys");
 4 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/linux");
 5 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/asm");
 6
 7 require "ipc.ph";
 8 require "shm.ph";
 9
10 $PERMISSIONS=0666;
11 $ipckey = 42;
12 $size = 1024;
13
14 $msgid = shmget($ipckey, $size, &Ipc_CREAT | $PERMISSIONS);
15
16 printf "\n Shared Memory Id = $msgid";
17
18 $message = "Segment #1";
19 print "\n Message = " . $message;
20
21 shmwrite($msgid, $message, 0, 80);

Note that in Listing 13.5, the shared memory segment is 1,024 bytes long. The shared memory segment is not automatically destroyed when the creating process is killed. The values and space for these values in the shared memory area remain there even after the process that created the segment is long gone.

A second application can now come in and read from the shared memory segment. This second application is shown in Listing 13.5.


Listing 13.5. The use of shmread().
 1 #!/usr/bin/perl
 2
 3 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/sys");
 4 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/linux");
 5 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/asm");
 6
 7 require "ipc.ph";
 8 require "shm.ph";
 9
10 $PERMISSIONS=0666;
11 $ipckey = 42;
12 $size = 1024;
13
14 $msgid = shmget($ipckey, $size, &Ipc_CREAT | $PERMISSIONS);
15
16 printf "\n Shared Memory Id = $msgid";
17
18 $retval = shmread($msgid, $message, 0, 80);
19
20 print "\n Read data:". $message. "ret value= $retval" ;

This little example brings up a very possible and potentially dangerous scenario concerning the use of shared memory to pass data between two applications. Take two processes, A and B, which share data through shared memory. What if process A is in the middle of writing some data that B is reading? There is a high probability that the data read by B could be mangled by A. There is nothing that prevents B from reading from the same offset to which A is writing.

To prevent such potentially erroneous read/write situations, you have to lock the resource from multiple use. This is where semaphores come into play. A semaphore allows multiple processes to synchronize access on a resource.

Semaphores

A semaphore is simply a counter in the kernel. It can have values of 0, -1, -2, and so on, depending on how many processes are using it. A value of 0 indicates that the resource is unavailable. When a resource is locked by a process, the value of the semaphore is decremented. When the resource is freed, the value of the semaphore is incremented. A semaphore value of less than 0 indicates that the process must block (that is, wait until some other process zeroes it).

A semaphore is a data structure in the kernel that contains the process ID of the last process to perform a semaphore operation and the number of processes waiting on the semaphore to be 0. A binary semaphore uses a value of either 1 or 0.

To use a semaphore, you must first create it. To do this, call the semget function. Here's the syntax of the semget function:

$semid = semget ($key, $num, $flag);

The key and flag here are the same as those for shared memory or message queues. If the key is &Ipc_PRIVATE or the flag has &Ipc_CREAT set, the semaphore is created and its ID is returned in semid. The $num variable is the number of semaphores created and is an index into an array of semaphores. The first element of the array is at index 0.

If semget is unable to create the semaphore, $semid is set to the null string.

To perform a semaphore operation, call the semop() function. Here's the syntax of the semop() function:

semop ($semid, $semstructs);

Here, $semid is the semaphore ID returned by semget, and $semstructs is a character string consisting of an array of semaphore structures.

Each semaphore structure consists of the following components, each of which is a short integer (as created by the s format character in pack):

  • The number of semaphores
  • The semaphore operation
  • The semaphore flags, if any

This function returns a non-zero value if the semaphore operation is successful; otherwise, 0 if an error occurs.

There are three actions you can take with a semaphore. Each of these actions happens on the elements of the array you created in the semget() function. These actions add or subtract a value to the semaphore:

  • Adjust the value by adding 0 to the semaphore. This action causes the resource controlled by the semaphore to be acquired.
  • Increment the value of the semaphore. This causes the resource to be marked as "released." The calling process can now wait until the value is 0 again. Generally, you would increment by 1, but you can use an arbitrary positive constant.
  • Decrement the value of the semaphore. A decrement marks the use of a resource. This decrement may cause the value of the semaphore to be less than 0. If the value of the semaphore is less than 0, the calling process will block until some other process that is using or controlling the resource resets the value to 0.

The semctl function enables you to set options for semaphores and issue commands that affect them. Here's the syntax of the semctl function:

$err = semctl ($semid, $semcmd, $semarg);

$semid is the semaphore ID returned by semget. $semcmd is the command that affects the semaphore; the list of available commands includes the Ipc_RMID for removing the resource. Check the ipc.ph file for more commands on your system. Some of the commands that can be specified by semcmd set the values of semaphore options. If one of these commands is specified, the new value of the option is specified in $semarg.

If an error occurs, semctl returns the undefined value; otherwise, it returns 0.

Listing 13.6 shows an example of a parent and child process sharing a shared memory resource using semaphores.


Listing 13.6. Using semaphores and shared memory together.
 1 #!/usr/bin/perl
 2 $|=1;
 3 #
 4 #  Get the required files.
 5 #
 6 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/sys");
 7 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/linux");
 8 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/asm");
 9 require "ipc.ph";
10 require "shm.ph";
11 require "sem.ph";
12 #
13 # Identify yourself and the resource parameters.
14 #
15 $ppid = $$;         # for the parent
16 $ipckey = 34;
17 $PERMISSIONS=0666;
18 $semid = semget($ipckey,1,&Ipc_CREAT|$PERMISSIONS);
19 $semnum = 0;
20 $semflags = 0;
21 #
22 # Now create the shared memory segment. Note that we use the same key.
23 #
24 $size = 1024;
25 $msgid = shmget($ipckey, $size, &Ipc_CREAT | $PERMISSIONS);
26 # printf "\n Shared Memory Id = $msgid";
27 $pid = fork;
28 if ($pid == -1) {
29     die "\nFork failed. This is not UNIX";
30 }
31 if ($pid)  {    # child process
32     for ($i=0;$i<10;$i++) {
33     $semop = 0;         # Wait for resource
34     $semopstr = pack("sss",$semnum,$semop,$semflags);
35     die "Cannot get semaphore" unless semop($semid,$semopstr);
36     #
37     #
38     printf "\n Child: read from shared memory ";
39     $retval = shmread($msgid, $message, 0, 80);
40
41     $semop = 2;
42     $semopstr = pack("sss",$semnum,$semop,$semflags);
43     die "Cannot get semaphore" unless semop($semid,$semopstr);
44     }
45 }
46 else { # in parent
47     for ($i=0;$i<10;$i++) {
48     $semop = -1;
49     $semopstr = pack("sss",$semnum,$semop,$semflags);
50     die "Cannot get semaphore" unless semop($semid,$semopstr);
51     printf "\n Parent: write to shared memory";
52     shmwrite($msgid, $message, 0, 80);
53
54     $semop = -1;
55     $semopstr = pack("sss",$semnum,$semop,$semflags);
56     die "Cannot get semaphore" unless semop($semid,$semopstr);
57     }
58     printf "\n In parent, removing the semaphore";
59     semctl($semid,0,&Ipc_RMID,0);
60     printf "\n In parent, removing the shared memory segment";
61     shmctl($msgid,&Ipc_RMID,0);
62 }

Line 2 specifies that the buffers be flushed immediately on write. This is a good idea when you are working with forked processes.

Lines 5 through 11 set up the include paths for the required header files. In lines 15 through 20 the semaphore is set up and created for parent and child processes to use. In line 24 the shared memory segment is created.

The process forks off into two processes in line 27. The child process waits for the semaphore by first explicitly setting the local counter to 0 (see line 33). Then it checks for the value of the semaphore in line 35 after packing the parameters into the semaphore structure in line 34. When it breaks out of the semaphore (that is, when the value of the semaphore is 0), the child reads data from the shared memory segment. It then sets the value of the semaphore to 2.

The parent, on the other hand, decrements the semaphore by 1. The value is 2 if the child runs first, and thus becomes 1, giving the parent control. If the child is running and adding to shared memory, then the value of the semaphore is 0; therefore, decrementing by the parent forces it to -1, thus blocking the parent. Now, when the child increments by 2, the semaphore is set to 1 (-1 + 2 = 1) and the parent is started. The child then waits until the semaphore becomes 0, which happens when the parent decrements the semaphore one more time.

The shmctl and semctl functions are used to obliterate the Ipc resources once you are done with the application.

The SysV::Ipc Module

The SysV Ipc code in Listing 13.6 was written long ago and has been documented to death in almost all UNIX texts. Some kind folks have tried to make the interface easier to use by providing modules to make the interface cleaner. For the latest version of this module, please check the CPAN sites listed in appendix B, "Perl Module Archives."

It looks simple enough already, but it can always be tweaked a little bit more. Check out the Ipc::SysV modules written by Jack Shirazi. Unfortunately, I could not get this module to compile and work for Perl 5.002. There was not any documentation or follow up address for me to get more information about the module. You can try to get it to work on your system, but with the application interface the way it is now, you should ask yourself this question: Will the module make it simpler? If the answer is yes, by all means go for it!

Applications of Ipc

There are several ways that you can apply the Ipc tools you have available. Generally, shared memory and semaphores have to be used together. When working with large blocks of data, use shared memory to pass the data between two processes. Synchronize the transfer between processes via the use of a semaphore.

What if you have a situation in which more than one process is required to process the data? Semaphores can get clunky at this stage if you are not careful.

In a typical scenario, you could have one process collect data from external devices and then have the data available in shared memory for all other processes. The shared memory area will be divided into partitions. Each partition is used only by one process and only written to by the data collector. The data collector updates all the sections of shared memory and then updates a semaphore with the number of processes that are currently waiting to work with this data. Then it sends a message to each of the processes via a message queue. After sending all the messages, the data collector process waits for the semaphore to be 0 again, thereby getting the signal to proceed.

Each data-handling process (client) can wait for messages forever on its message queue. As soon as it receives a message on its queue, the client can guarantee that it will have exclusive access to its partition. After it has processed the data, the client can decrement the semaphore. As each client increments the semaphore, it will go back to the top of the loop and wait on the input message queue again.

Once all the clients have incremented the semaphore, it becomes 0 again. This causes the data collector to wake up and begin the process of collecting and updating the shared memory area.

Listings 13.7 and 13.8 show a partial application for such a system. These listings are by no means complete because this would require a full-blown application well beyond the scope of this chapter. The gist of the program is to illustrate how all three types of Ipc objects can be used with each other to create relatively complex applications.

The server application decrements the semaphore to block (while the clients do what they have to do) and then increments the value of the semaphore. The processes here have to run concurrently in the background. There are three clients for the one server. Obviously, this example is contrived for the book-you might have more clients to handle your tasks.


Listing 13.7. The server of a dummy application.
 1 #!/usr/bin/perl
 2 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/sys");
 3 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/linux");
 4 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/asm");
 5 require "ipc.ph";
 6 require "shm.ph";
 7 $PERMISSIONS=0666;
 8 $ipckey = 42;
 9 $size = 4096;
10 #
11 # Create the shared memory segment
12 #
13 $| = 1;
14 @messages =  (
15     "Process 0 ",
16     'Process 1 ',
17     'Process 2 ',
18 );
19 $shmid = shmget($ipckey, $size, &Ipc_CREAT | $PERMISSIONS);
20 $count = $#messages + 1;
21 $semid = semget($ipckey,$count,&Ipc_CREAT|$PERMISSIONS);
22 $semflags = 0;
23 for ($offset = 0; $offset < $count; $offset++) {
24     $msg = $messages[$offset];
25     $msgid[$offset] = msgget($ipckey
   $offset,&Ipc_CREAT | $PERMISSIONS);
26     print "\n Server: Creating Message Queue: "
   . $msgid[$offset] . "\n";
27     }
28 printf "\n Shared Memory Created Id = $shmid";
29 while(1)
30      {
31      $semop = $count;        # Stop the clients
32      $semopstr = pack("sss",$semnum,$semop,$semflags);
33      die "Cannot get semaphore" unless semop($semid,$semopstr);
34
35      for ($offset = 0; $offset < $count; $offset++) {
36           $semnum = $offset;
37           $msg = $messages[$offset];
38           print "\n Server: Writing " . $msg .
   " at " . $offset * 40 . "\n";
39           shmwrite($shmid, $msg, $offset * 40, 12);
40           $msg = pack("L a*", $offset + 1, " Go for it!");
41           print "\n Server: Sending to" . $msgid[$offset] . "\n";
42           msgsnd($msgid[$offset], "$msg", &Ipc_NOWAIT);
43      }
44      $semop = -$count;     # Block till semaphore is 0
45      $semopstr = pack("sss",$semnum,$semop,$semflags);
46      die "Cannot get semaphore to wait" unless semop($semid,$semopstr);
47      sleep(10);
48 }

Listing 13.8 is the client application to pick up the messages from the server. The messages sent can contain additional information for the client; that is, they don't have to be just triggers for the client to proceed with reading. The contents of the messages can contain information about how and where to pick up data from shared memory.


Listing 13.8. The client of a dummy application.
 1 #!/usr/bin/perl
 2 #
 3 # Usage =  client -p ProcessIndex
 4 #
 5 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/sys");
 6 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/linux");
 7 unshift (@Inc,"/usr/lib/perl5/i486-linux/5.002/asm");
 8 use Getopt::Long;
 9 $result = GetOptions ('p=i');
10 $| = 1;
11 if (($opt_p < 0) || ($opt_p > 3)) {
12      die "Usage: $0 -p ProcessIndex";
13      exit(0);
14 }
15 require "ipc.ph";
16 require "shm.ph";
17 require "msg.ph";
18 require "sem.ph";
19 $PERMISSIONS=0666;
20 $msg_type = 0;
21 $msg_offset = $opt_p * 40;
22 $ipckey = 42;
23 $msg_key = $ipckey +  $opt_p;
24 $size = 0;
25 $shmid = shmget($ipckey, $size, &Ipc_CREAT | $PERMISSIONS);
26 printf "\n $$ = Shared Memory Id = $shmid";
27 $msgid = msgget($msg_key,&Ipc_CREAT | $PERMISSIONS);
28 printf "\n $$ =  Message Id
   = $msgid, Will read from $msg_offset, $msg_type\n";
29 $semid = semget($ipckey,3,&Ipc_CREAT|$PERMISSIONS);
30 $semnum = 0;
31 $semflags = 0;
32 while (1)
33      {
34      print "\n Read message of type :". $msg_type;
35      msgrcv($msgid, $msg, 80, $msg_type, 0);
36      $retval = shmread($msgid, $message, 0, 80);
37      print "\n Read message:". $message. "ret value= $retval" ;
38      $semop = 1;        # Clear yourself for server
39      $semopstr = pack("sss",$semnum,$semop,$semflags);
40      die "Cannot get semaphore" unless semop($semid,$semopstr);
41      print "\n After semaphore";
42      }

The sample programs shown in Listings 13.7 and 13.8 provided the basis for a prototype of a seismic data collection system. The actual system was written in C for efficiency because of some pretty lengthy mathematical calculations. However, with Perl, we were able to use this code to get a proof-of-concept working model up and running in just one afternoon. The prototype provided us with enough information to consider using the Ipc model for the application. In later models of the same application, I was able to extend the processing to remote machines by replacing the message queues with sockets and sending the requisite portions of data along the sockets.

The final application was tested further by adding new Perl scripts that share messages and simulate data using shared memory. Listings 13.7 and 13.8 are very similar to the working application and have been created from scratch. It should be relatively painless for you to take this code and extend it into your own prototype.

Summary

Perl is a very powerful tool for prototyping applications. With the capability to access the system facilities, Perl can provide the necessary tools for rapid prototyping. Hopefully, this chapter has provided you with enough information to set up your own applications.

This chapter has introduced you to the System V Ipc facilities available through Perl. The Ipc objects are global and remain in memory long after the processes that created them are gone. With Ipc objects, you are limited to one machine. If your application requires network access, consider using sockets instead.

Using message queues, you can send messages between processes. For large data items, message queues might not be very efficient. Consider using shared memory instead. To synchronize the access to the shared memory, you can use semaphores to prevent one process from writing to an area where another process might be reading.

Chapter 14

Signals, Pipes, FIFOs, and Perl


CONTENTS


The basic interprocess communication facilities available via Perl are built on the old UNIX facilities: signals, named pipes, pipe opens, the Berkeley socket routines, and SysV Ipc calls. I have already covered sockets, shared memory, semaphores, and message queues in the previous sections. I cover the use of signals and pipes in this chapter.

Signals

A signal is a message sent to a process that indicates an event has occurred. The event can be something unexpected and, in general, cause the process to terminate. Types of such signal events include division by zero, a bus error, a segmentation fault, or sometimes even imminent power failure. All signals are not bad news. UNIX kernels use signals for timing. Users can send signals by hitting keys such as Ctrl+C, Break, or Delete.

The types of signals recognized by the kernel are listed in the standard UNIX header file /usr/include/signal.h. The names of these signals are listed in Table 14.1. Not all of these signals may be implemented in your UNIX system, and even fewer are usable in Perl. However, I will cover all the signals that you definitely need to know.

Table 14.1 Signal types.
Signal
Number
Description
SIGHUP
1
On hangup
SIGINT
2
On interrupt
SIGQUIT
3
On Quit key
SIGILL
4
Illegal instruction
SIGTRAP
5
Trap instruction
SIGABRT
6
Abort message
SIGIOT
6
Input/output transfer
SIGBUS
7
Bus error
SIGFPE
8
Floating-point error
SIGKILL
9
Kill signal from system
SIGUSR1
10
User defined
SIGSEGV
11
Segmentation violation
SIGUSR2
12
User defined
SIGPIPE
13
Pipe fault (broken pipe)
SIGALRM
14
Alarm
SIGTERM
15
Termination
SIGSTKFLT
16
Stack fault
SIGchLD
17
Signal from child
SIGCONT
18
Continuing a stopped process
SIGSTOP
19
Stopping a process
SIGTSTP
20
Stopping a process from terminal
SIGTTIN
21
Stopping a process reading from controlling terminal
SIGTTOU
22
Stopping a process writing to controlling terminal
SIGURG
23
Urgent condition
SIGXCPU
24
Excessive CPU limits reached
SIGXFSZ
25
Excessive file size limits reached
SIGVTALRM
26
Virtual interval timer expired
SIGPROF
27
Profiling interval timer expired
SIGWINch
28
Window size changed by background process
SIGIO
29
Asynchronous I/O
SIGPWR
30
Power failure
SIGUNUSED
31
Unused

The names of the signals available to you on your system are the ones listed by kill -l on your system. You can also retrieve them from the Config module. The following example shows how to obtain a list of available signals on your system:

$ kill -l
 1) SIGHUP        2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP       6) SIGIOT       7) SIGBUS       8) SIGFPE
 9) SIGKILL      10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE      14) SIGALRM     15) SIGTERM     17) SIGchLD
18) SIGCONT      19) SIGSTOP     20) SIGTSTP     21) SIGTTIN
22) SIGTTOU      23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM    27) SIGPROF     28) SIGWINch    29) SIGIO
30) SIGPWR

Using the Config Module

Another way of getting a list of signals on your system is by using the supplied Config module, as shown in Listing 14.1. The output from this simple script is shown following the listing. The list of signal names in the Config module is accessed via the $Config{sig_name} variable. (The signal names are separated by white space.) By the way, the program shown in Listing 14.1 requires Perl 5 and will not work in Perl 4.


Listing 14.1. Using the Config module to get a list of signals.
 1 #!/usr/bin/perl
 2 #
 3 # Use the Config module to get list of signals
 4
 5 use Config;
 6
 7 defined $Config{sig_name} || die "No Config Module?";
 8
 9 foreach $name (split(' ', $Config{sig_name})) {
10     $i++;
11     printf "%3d) %s \t", $i, $name;
12     if (($i % 5) == 0) { print "\n";  }
13     }
14 print "\n";

Line 5 uses the Config module to include its definitions. The existence of the Config signal names array is confirmed in line 7. Lines 9 through 14 print out a columnar output of all the signal names defined in the Config array. Here is the output generated by the code in Listing 14.1.

$ listsigs.pl

  1) ZERO       2) HUP       3) INT        4) QUIT       5) ILL
  6) TRAP       7) IOT       8) BUS        9) FPE       10) KILL
 11) USR1      12) SEGV     13) USR2      14) PIPE      15) ALRM
 16) TERM      17) STKFLT   18) chLD      19) CONT      20) STOP
 21) TSTP      22) TTIN     23) TTOU      24) URG       25) XCPU
 26) XFSZ      27) VTALRM   28) PROF      29) WINch     30) LOST
 31) PWR       32) UNUSED

A process does not know when a signal might occur because signals are asynchronous events. Programmatically, you can either ignore almost all signal messages or handle them yourself with a signal handler. This process of handling the signal is referred to as trapping a signal. A handler is simply a function that is called when the signal arrives.

In Perl, a special hash called %SIG contains either names or references to all the signal handlers. You have to install your own signal handler to trap a signal. If you do install a handler, Perl uses the default handling mechanism for that signal. Signal handlers are considered by the Perl interpreter to be in the main module unless otherwise specified. To specify a different module, you have to specify the package name. Therefore, you would use something like this:

$SIG{'QUIT'} = 'myModule::babysitter';

Signal handlers are basically just Perl functions. However, some restrictions apply while you are in the handler. First of all, signal handlers are called with an argument that is the name of the signal that triggered it. Therefore, your handler can be written to trap more than one signal if you want. Simply use the passed argument to determine what to do in the handler. Next, if your signal handler is being called when something out of the ordinary happens, you should not attempt to do lengthy operations while in the handler. For example, doing lengthy disk writes within a handler is not a good idea. Finally, try to use references instead of names because the code is cleaner and faster.

Another important point to keep in mind is that there is only one %SIG{} array in the Perl script you are running. Setting an entry in %SIG{} in one subroutine sets it for all objects in your program. There is no "my" %SIG{} array with which you can set your own handler functions.

To set a signal handler, you use the following statement:

$SIG{INT} = \&meHandleINT;

In this statement, meHandleINT is a reference to a subroutine defined as something similar to this:

sub meHandleSignals {
   my $signame =  shift;   #Grab signal name from passed input
   die "Caught Signal: SIG$signame";
}

You can use the signal-handling function's name instead of the reference; however, this is no longer the preferred, cool way of doing things because references give a faster lookup. Therefore, avoid setting handlers with statements like this one:

$SIG{INT} = 'meHandleINTs';

You can use anonymous functions to install signal handlers. However, be warned that some systems will not allow you to handle this signal more than once, especially on System V machines where the signal handler must be reinstalled every time it's hit. Use references instead of trying a statement like this one:

$SIG{INT} = sub { die "\nBye\n" };    #  Not portable

By modifying the value of the reference in %SIG{} entries, you can add or remove signal handlers. Listing 14.2 presents an example. While the program is asleep, the signal handler vocalHandler can be called if the Ctrl+C key combination is pressed. However, after the sleep session is over, the default value of $SIG{INT} is restored to allow the script to be killed via the default Ctrl+C combination. Notice also how the hangup signal is set to be ignored while the program is running with $SIG{HUP}='IGNORE' assignment.


Listing 14.2. Signal handler usage.
 1 #!/usr/bin/perl
 2
 3 sub vocalHandler {
 4     local($sig) = @_;
 5     print "Hey! Stop that! SIG$sig hurts! \n";
 6     exit(1);
 7 }
 8
 9 $SIG{INT} = \&vocalHandler;
Listing 14.2. continued
10
11 $SIG{HUP} ='IGNORE';
12
13 printf "\n I am about to sleep";
14
15 sleep (10);
16
17 print "\n Restoring. default signal handler. ";
18
19 $SIG{INT} ='DEFAULT';

The string "Hey!…" will be printed out every time you hit the Ctrl+C key combination during the 10-second interval that your Perl script is asleep.

There are two signal names that are reserved: IGNORE and DEFAULT. When a signal handler is set to IGNORE, Perl tries to discard the signal. When a signal handler is set to DEFAULT, Perl takes the default action taken by functions called from within that block. Some signals cannot be trapped or ignored (for example, the KILL and STOP signals).

All modules and inherited objects affect the %SIG{} hash. However, the %SIG hash can be redeclared in a function as a local variable to completely mask out the default %SIG{} hash. The original hash still remains, but due to scoping rules in Perl, the local %SIG{} is used.

The %SIG hash is not local to each module unless you declare it as such. This is used to disable signals temporarily within certain function calls. Here's a sample fragment of code:

sub warpper {
    local $SIG{INT} = 'IGNORE';   # declare a new SIG hash.
        &localFunction;  #Then call this function.
    }

    sub localFunction {
        # interrupts are still ignored in this function.
}

Upon entry into the wrapper function, a local %SIG hash is created and the INT interrupt is set to be ignored. The localFunction is then undisturbed by SIGINT. On exiting from the wrapper function, the %SIG hash in the main module is the default array. Therefore, the values in the %SIG hash are inherited by the lower modules.

The kill() Function

The kill() function is used to send the KILL signal to a process or a set of processes. The first parameter of the kill() function is the signal number, followed by a list of process IDs. The usual UNIX permissions apply. Your script must own the process it's about to blow out of the water. That is, the effective and real UID (user ID) must be the same for the process sending the signal and the process receiving the signal. Here's the syntax for the kill() function:

kill signalNumber,processID

You can specify more than one process on the line by the command:

kill signalNumber,processID, processID, ....

The kill function can be used to terminate a lot of processes at once. All processes in a process group can be terminated by another process in the group by sending a negative processID. Such measures are taken by daemons that terminate all child processes. The use of the blanket kill keeps you from keeping a table of process IDs of all the child processes for an application program. The process making the call with the negative ID is exempt from this pillage. That is, all the rest of the processes, except the one making the call, will be killed. The syntax for this call would be of the following form:

kill(9, -$$);

To see whether a particular task is alive, you can use signal 0 with the kill function. A non-zero return value tells you that the task is alive; a zero value tells you that the task does not exist.

if  (kill(0,$id)) {
    print "Task  $id is alive;
    }    else {
    print "Task  $id is dead";
}

The warn() Function

Sometimes in Perl scripts you have to warn the user about certain conditions in the system by writing to the console. Normally, you use the die() function to print the cause of death of a Perl script just before bailing out. If you did not want to stop right at the cause of the error but rather limp along with an error message only, a print statement may be redirected to some file. In this case, you would use the warn function. The warn function in Perl is the same as the die function, except that the program keeps going with the warn function instead of stopping as it would with die.

Timeouts

You can use signals to handle timeouts in UNIX. The alarm function comes in handy for these types of signals. Here's the syntax for the alarm() function:

alarm numberOfseconds;

For example, the following stub of code sets up SIGALRM for a program:

sub aclk {
my  $is = shift;
print "something";
}

$SIG{ALRM} = \&clk;

You can wait for a child process, too, with the wait command. The wait command returns after a child process dies and returns the process ID of the dead child process. Calling the wait function may cause you to hang forever if the child runs away and never dies. If there are no children, this function returns -1. To wait explicitly for a process, you can specify its process ID in the call to waitpid. The syntax for the call is waitpid $PID 0. The zero is required.

Pipes and FIFOs

A UNIX system call to pipe() can be used to create a communications channel between two processes. Generally you call this function just before a fork call and then use the pipe to communicate between a parent process and its child process. Contrast this usage of pipes with message queues, which can be used to communication between unrelated processes. Here's the syntax for this call:

pipe ( READHANDLE, WRITEHANDLE);

Listing 14.3 illustrates a sample run of a pipe communication.


Listing 14.3. A sample run when using pipes.
 1 #!/usr/bin/perl
 2 $|=1;
 3 #
 4 # Create a pipe with the system call
 5 #
 6 $smoke = pipe(PIPE_R, PIPE_W);
 7 #
 8 # Fork off (almost) immediately.
 9 #
10 if ($pid = fork) {
11     printf "\n Parent:";
12     close PIPE_R;          # <-- the handle you do not need.
13     #
14     # now write to the child.
15     #
16     select PIPE_W;
17     for ($i=0;$i<5;$i++) {
18        sleep 1;
19        printf PIPE_W "[Sending: %d]\n", $i;
20        }
21     }
22     else
23     {
24     printf "\n Child:";
25     close PIPE_W;   # <- won't be writing to it.
26     #
27     # now read from parent
28     #
29     for ($i=0;$i<5;$i++) {
30         $buf = <PIPE_R>;
31         # For fixed length records you would use:
32         # read PIPE_R,$buf,20;
33         print "Child: Received $buf \n";
34         }
35     }

Line 2 of this listing sets the output buffers to be flushed as soon as they are written to. Line 6 creates the pipe for you to make PIPE_R and PIPE_W valid handles. We immediately do a fork in line 10.

After the fork, there are actually four handles: two PIPE_R handles and two PIPE_W handles. Because the parent does not want to read its own echo on PIPE_R, it closes PIPE_R. Similarly, the child has no reason to write to itself, so it closes PIPE_W. Then the parent writes to the PIPE_W handle while the child reads from the PIPE_R handle.

The following lines are the output from this program. Note how the output from the child and parent process is intermixed. This is because both processes are running at the same priority and accessing the output device (stdout) at the same time.

hmm.. okParent:
 Child:Child: Received [Sending: 0]

Child: Received [Sending: 1]

Child: Received [Sending: 2]

Child: Received [Sending: 3]

Child: Received [Sending: 4]

Pipes are great when used between related processes. However, if you want to communicate between two different processes, you have to use FIFOs. A FIFO is also referred to as a named pipe. Unrelated processes use named pipes to talk to each other. The FIFO appears like a normal filename.

In Listing 14.4, line 6 names the FIFO pathname as it would appear in the output from an ls command. In line 7, the FIFO is created with a system call to the command mkfifo using the -m option and pathname used in line 6. The unless clause checks to see whether such a FIFO already exists before attempting to recreate it. The code in line 8 actually opens the FIFO after it is created. Then, in lines 9 through 11, you create a string with the current date and send it to the FIFO. In line 12, you close the FIFO. The FIFO is not destroyed when it is closed at line 12. You have to unlink the pathname-that is, remove the FIFO by name to get rid of it, which is shown in line 14.

You can use an ls command on the FIFO name to see whether it exists. A way to test whether a filename is a FIFO is to use the option -p on the filename from within a Perl script.


Listing 14.4. Using FIFOs.
 1 #!/usr/bin/perl
 2 #
 3 # Create a fifo and then return the date
 4 # back to the caller when FIFO is read.
 5 #
 6 $path = "./ch14_fifo";
 7     unless (-p $path) { system("mkfifo -m 0666 $path"); }
 8     open(FIFO,"> $path") || die "Cannot open $! \n";
 9     $date = `date`;
10     chop($date);
11     print FIFO "[$date]";
12     close FIFO;
13 # Remove when done.
14 unlink $path;

The FIFO is opened in line 8. Now the program blocks until there's something on the other end trying to read from it. Using a command like cat ch14_fifo triggers line 9. The program then gets the system date and prints it out to the FIFO. Line 14 cleans up after itself.

You would probably want to have a signal handler to clean up if a terminating signal arrives before input is read from the FIFO and the program exits before destroying the FIFO. This is the reason for the unless clause, which checks for any existing FIFOs before creating a new one. The signal to catch is SIGPIPE for broken pipes. To trap SIGPIPE, you have to add this segment to your code:

sub pipeHandler {
    my $sig = shift @_;
    print " Caught SIGPIPE: $sig $1 \n";
    exit(1);
}

$SIG{PIPE} = \&pipeHandler;

Using open() for Ipc

The open() statement can be used to start communication pipes. The catch is that these pipes are unidirectional. To read the results from a process, put | at the end of the command being executed. To write your results to the standard input of a process, put | at the start of the command.

For example, here's how to write your results to the sendmail mailing program:

open(LOG, "| /usr/apps/formatData | /usr/bin/sendmail ");

To read the results back from a process, use a line like this:

open(READINGS, "/usr/apps/commProgram |");

As an example, consider the program in Listing 14.5, which prints the names of the files with the string passed to them at the command-line argument. It then lists all the files within the associative array fname.


Listing 14.5. Storing results of a command in an array.
 1 #!/usr/bin/perl
 2 #      Storing results from a program in an array.
 3 my %fname = ();
 4 #  Open all files with names ending in .pl
 5 open(IncOMING,"grep ARGV[1] *.pl |");
 6 while (<IncOMING>) {
 7     ($name,@line) =  split(':',$_);
 8     if (!$fname{$name}) {
 9          $fname{$name} = $name;
10     }
11 }
12 close IncOMING;
13
14 while (($key,$value) = each (%fname)) {
15     print "File: " . $key . "\n";
16 }

It's probably tempting to use the back ticks to run a program for you and collect the results. For example, $a = `who` returns the entire results of the who command in variable $a. It's impractical to use this method to collect results from a verbose command. Using a command like $a=`ls -lr` is a lot slower than actually opening a file handle to the output of this command and then processing it one line at a time. The problem is that the entire result of the command is stored in variable $a, chewing up memory and time while $a is appended to. It's easier and far more efficient to simply read from the output in manageable chunks.

Use two pipes to do bidirectional communications. Do not use a statement with a | at end of the command. Perl does not allow commands using this syntax:

open(HANDLE, "| sort |");

Perl gives an error message about not being able to perform bidirectional pipes.

The open() function can accept a file argument of either -| or |-. Accepting either of these parameters forks a child connected to the file handle you've just opened. The child is then running the same program as the parent. The return value from the call to open() is the process ID of the child or zero for the parent. The function dies if it cannot fork within the open() call. Here's a sample usage of -| to create a receiving child:

$| = 1;   # Always do unbuffered IO here.

$pid = open(MYHANDLE, "|-");
if ($pid == 0) {
    #parent acts as server
    print MYHANDLE, " something";
} else {
    #child acts as receiver.
    getSomething = <MYHANDLE>;
}

An implicit fork() is possible with the open2() command. Basically, with this call you start off two processes running the same code. The child process ID is returned to the client. Errors cause the function to bail out. The Ipc::Open2 module is required for this to work. Also, you have to make sure you are reading and writing continuously to this buffer. Be careful to run only programs for which you know the input and output sequences.

$|  = 1;
$pid = open('BI_R', 'BI_W', "myPerlScript");
if ($pid == 0) {
    #parent acts as server
    print B_R, " something";
} else {
    #child acts as receiver.
    getSomething = <MYHANDLE>;
}

There is an open3() function call, too, that adds STDERR to the list of opened files with which you may work.

Summary

This is a very quick introduction to using UNIX pipes and signals with Perl. Signals are a way of asynchronously telling a process about an event. Generally, signals can only be caught and an error message displayed about the type of signal through the use of a subroutine, called a handler. The types of signals vary depending on the type of UNIX system you use.

Using pipes is a method of communication between two processes that is old, yet still heavily used in UNIX. If a pipe (|) is placed at the beginning of a filename in an open() call, you'll be writing to a pipe with a UNIX command, file, or device at the other end. If command | is placed at the end of a filename to the open() call, you'll be reading the output of the command. Bi-directional and even tridirectional pipes are possible using the open2() and open3() calls.

Chapter 15

Direct Access to System Facilities


CONTENTS


Perl can be used to access system facilities not directly available in shell or awk programs. This chapter discusses some of the ways in which Perl programs can access system files and resources. The bulk of this chapter provides information on accessing data structures within system files on a UNIX system. In Chapter 9, "Portability Issues with Windows and Perl," and Chapter 10, "Win32 Modules on Windows NT," I discussed ways of accessing files on an NT system when covering portability issues.

Introduction

Accessing system facilities enables you to add that extra spice to your Perl programs. You have already seen how to call UNIX programs with the back quote (`) operators or the system() call. Both these mechanisms are costly in terms of using system resources because they start a subshell to execute another process. Perl offers many utilities to access system files and affects process parameters (such as process priorities) without resorting to firing up another process to perform a simple command.

Let's start with an example that shows you how to get more information about the user running your Perl program. Sometimes it might be nice to personalize your Perl scripts by sending messages back to the user running your script. It's easy to derive the user name from the /etc/passwd file or from calling the getlogin() function. Here's the syntax for the getlogin() function:

$userName = getlogin();

$userName is the returned user ID. For example, you could write a script to get the name and print out an appropriate greeting based on the user name:

$logname = getlogin();
if ($logname == "khusain") {
        print ("Hello, Kamran! How are you?\n");
} else {
if ($logname == "mgr2") {
        print ("Oh no, it's you again !! !!!! !!\n");
}

I am sure that you can come up with more practical applications rather than spewing greetings and salutations based on a user name. Practical examples include getting files, mail, and so on based on the user name. A number of utilities are available that do such things as answer your mail while you're on vacation and manage your mail from multiple sources. Check out the archives at the various Perl archive sites listed in appendix B, "Perl Module Archives," for these files. Most of these utilities are copyrighted by their authors, so they cannot be printed here, but you will get a good idea of what they can do.

Working with UNIX Users and Groups

Perl is great for working with user and group information files in the /etc directory. You can get user and group names, IDs, and other useful information from within your Perl programs to create powerful system administration applications.

Information from /etc/passwd with getpwent()

The getpwent() function enables you to access sequentially entries in the /etc/passwd file. Here's the syntax for the getpwent function:

($username,
$password,
$userid,
$groupid,
$quota,
$comment,
$userInfo,
$userHome,
$loginShell) = getpwent();

Here are the returned values from the getpwent call:

$username Contains the login name of the user
$password Contains the user's encrypted password
$userid The user ID
$groupid The group ID
$quota System dependent and may not exist
$comment System dependent and may not exist
$userInfo Personal information about the user
$userHome The user's home directory
$loginShell The user's startup command shell

To access each entry of the password file in turn, you can use the getpwent() function. Calling the getpwent() function for the first time in a program returns the first item in the /etc/passwd file. Every subsequent call to getpwent()returns the next entry in the file. The function returns an empty list after reaching the last entry in the file. Calling the getpwent() function after an empty list is received returns the first item in /etc/passwd, and you can start over. The getpwent() function has two related functions: setpwent() to rewind the file and endpwent() to close the /etc/passwd file. Here's the syntax for the setpwent function:

setpwent (keepopen);

If keepopen is non-zero, the /etc/passwd file is left open for reading and any previously cached information about the file is kept in memory. If keepopen is set to zero, any cached information in memory is flushed and the file is read again with the first entry available for a call to getpwent(). The endpwent function simply closes the /etc/password file.

Listing 15.1 uses getpwent to list the user names known by the machine as well as their user IDs.


Listing 15.1. Listing password entries.
 1 #!/usr/bin/perl
 2
 3  while (1) {
 4          last unless (($username, $password, $userid)
 5                        = getpwent());
 6          $users{$username} = $userid;
 7  }
 8  print ("Users on this machine:\n");
 9
10  foreach $user (sort keys (%users)) {
11          printf ("%-20s %d\n", $user, $users{$user});
12  }

The while loop in this listing calls getpwent() to read every entry in the /etc/password file. For this script, we are using only the first three elements of the returned list: the users' names, their encrypted passwords, and their IDs. The values are stored in the %users associative array and displayed in the for() loop. The output should look like this:

Users on this machine:
adm                  3
bin                  1
daemon               2
ftp                  404
games                12
guest                405
halt                 7
khusain              501
lp                   4
mail                 8
man                  13
news                 9
nobody               65535
operator             11
postmaster           14
ppp                  504
root                 0
shutdown             6
sync                 5
tparker              503
uucp                 10
uzma                 505
walter               502

Two sister functions exist for getting the same nine-item list about a user in the /etc/passwd file: getpwnam() and getpwuid(). The items in the returned list are in the same type as those returned by a call to getpwent(). The getpwnam() function returns the list given a user name, whereas the getpwuid() function returns the list given a user ID.

The getpwnam() and getpwuid() functions have the following syntax:

($username, $password, $userid, $groupid,
$quota, $comment, $userInfo, $userHome, $loginShell
)
    = getpwnam ($name);

($username, $password, $userid, $groupid,
$quota, $comment, $userInfo, $userHome, $loginShell)
    = getpwuid ($id);

An empty list is returned if no matching entry is found in /etc/passwd. A common use for getpwnam is to get a user ID for a user in a Perl script and use that value for creating temporary files.

Getting Group-Related Information with getgrent() and getgrnam()

The getgrent function is used to list the contents of an entry in the /etc/group file. The following information is provided for each entry:

  • The user group name
  • The user group password, if any
  • The group ID
  • A list of the user IDs in this group

Here's the syntax for the getgrent function:

($gname, $gpasswd, $gid, $gmembers) = getgrent;

This function returns four items corresponding to the previous list. The name, password, and ID fields are all scalar values. The value of gmembers is a list of user IDs separated by spaces.

The setgrent function sets the pointer in the /etc/group file back to the top. After setgrent is called, the next call to getgrent retrieves the first element of the /etc/group file. The endgrent function stops further access to the elements in the /etc/group file and frees up the memory used to store group information.

Here's the syntax for these functions:

setgrent();
endgrent();

Each call to getgrent returns one line from the /etc/group file. A NULL value (that is, an empty list) is returned when the last item is read. To print the contents of the group file, use a while loop like the one shown in Listing 15.2.


Listing 15.2. Getting group information.
1 while (($gname, $gpasswd, $gid, $gmembers) = getgrent) {
2         $groupsFound{$gname} = $gmembers;
3 }
4 foreach $i (sort keys (%groupsFound)) {
5     print "\n User IDs for group:", $groupsFound{$i} ;
6 }

The getgrnam function returns an /etc/group file entry when given a group name. Here's the syntax for the getgrnam function:

($gname, $gpasswd, $gid, $gmembers) = getgrnam ($name);

The variable $gname is the group name to search for. The $getgrnam returns the same four-element list that getgrent returns. Here is the output:

adm                  4
bin                  1
daemon               2
disk                 6
floppy               11
kmem                 9
lp                   7
mail                 12
man                  15
mem                  8
news                 13
nogroup              65535
root                 0
sys                  3
tty                  5
users                100
uucp                 14
wheel                10

Here's another sample of how to list users given a group name. Listing 15.3 shows a simple script that prints the users in a group.


Listing 15.3. A program that uses getgrnam.
 1 #!/usr/bin/perl
 2 print ("Please enter name of the group:\n");
 3 $name = <STDIN>;
 4 chop($name); #
 5
 6 if (!(($gname, $gpasswd, $gid, $gmembers) = getgrnam ($name))) {
 7            die ("There is no  $name group!. \n");
 8 }
 9
10 $count = 0;
11 while (1) {
12         last if ($gmembers eq "");
13         ($uid, $gmembers) = split (/\s+/, $gmembers, 2);
14         printf ("  %-15s", $uid);
15         $count++;
16         if (($count % 3) == 0) {
17                 print ("\n");
18         }
19 }
20 if ($count % 4) {   # finish it off.
21         print ("\n");
22 }

Lines 16 and 20 print the output four items per line. getgrid() retrieves the user information as returned by getgrnam(), except that it retrieves it by group ID. Here's the syntax for the getgrid function:

($gname, $gpasswd, $gid, $gmembers) = getgrid ($gid);

Generally, the call is just used to get the group name given a group ID:

($gname) = getgrid (3);

Be careful, though, to parenthesize the $gname variable to indicate that the $gname variable is an element in a list and not a list itself ! If you make the call like this:

$gname = getgrid (3);

the value of $gname is the returned list, not the first element of the array.

Getting Information in Network Files

Perl offers several functions to get information about networking files and items in the files on your system. By using these functions, you can create very powerful networking applications.

The getnetent Function

The getnetent function enables you to read entries in the /etc/networks file for all the names and addresses recognized as valid names by the domain name server for your machine. Here's the syntax for getnetent():

($name, $aliases, $addrType, $inet) = getnetent();

Four items are returned by this function:

  • The $name variable is the name of a network.
  • The $aliases variable is a list that contains alternate names for the $name network, and each item is separated from the other with spaces.
  • The $addrType value is set to the defined constant &AF_INET for your machine. (You'll need the file socket.ph somewhere in your @Inc path.) $addrtype is the address type; at present, this is always whatever value is defined for the system constant &AF_INET, which indicates that the address is an Internet address. Usually, the subroutine &AF_INET is set to return the integer 2 on UNIX systems.
  • The $inet address is set to the address of this network in the A.B.C.D form for Internet addresses. The A is the class A portion of the network address; B is for the class B portion; and C for the class C portion of the address.

Listing 15.4 shows how you can use getnetent to list the machine names and addresses at your site.


Listing 15.4. A program that uses getnetent.
1 #!/usr/bin/perl
2 print ("Networks on this machine:\n");
3 while (($name, $aliases, $atype, $inet) = getnetent()) {
4         ($a,$b,$c,$d) = unpack ("cccc", $inet);
5         print "$name = ";
6      printf " $ %d %d %d %d \n",$a,$b,$c,$d;
7 }

Each iteration in the while reads one entry in the /etc/networks file. If the last entry in the /etc/networks file has been read, the getnetent function returns an empty list and the while loop terminates. Each non-empty entry read is assigned to the variables $name, $aliases, $atype, and $inet.

The getnetbyaddr function returns the next available entry from /etc/networks with a given network number. Here's the syntax for the getnetbyaddr function:

($name, $aliases, $atype, $inet) = getnetbyaddr ($inaddr, $itype);

The getnetbyaddr() function returns the same types of values as the getnetent() function. The input parameters to getnetbyaddr() differ from the getnetent() function. The $inaddr is the network number that you are looking for. The $inaddr value must be a packed four-byte integer whose four bytes are the A, B, C, and D components of an Internet address. Use the pack command to create the $inet word:

@bytes = (204,251,103,2);
$inaddr = pack ("C4", @bytes);

The itype variable is almost always set to &AF_INET for Perl scripts on UNIX systems.

The getnetbyname() function is just like the getnetbyaddr() function except that it takes a network name (or alias) instead of an address. The returned values for an entry in the /etc/networks file are the same, too. Here's the syntax for the getnetbyname function:

($name, $aliases, $atype, $inet) = getnetbyname ($networkName);

The setnetent and endnetent functions in Perl rewind and close the /etc/networks file for access. The setnetent function rewinds the /etc/networks file. After a call to setnetent(), the next getnetent() call returns the first item in the /etc/networks file. Here's the syntax for the setnetent function:

setnetent (keepopen);

If keepopen is non-zero, the /etc/networks file is left open for reading, and any previously cached information about the file is kept in memory. If keepopen is set to zero, any cached information in memory is flushed and the file is read again with the first entry available for a call to getnetent(). The endnetent() function accepts no parameters and simply closes the /etc/networks file.

Working with Host Names Using gethostbyaddr() Functions

The gethostbyaddr() function accesses the /etc/hosts file for the host name given a particular Internet address. Here's the syntax for the gethostbyaddr function:

($name, $aliases, $addrtype, $len, $addr)
    = gethostbyaddr ($inaddr, $atype);

This function needs two arguments:

  • The Internet address of the host
  • The address type (which is usually set to AF_INET)

The Internet address is in the packed form as in the getnetaddr() call. The $inaddr value must be a packed four-byte integer whose four bytes are the A, B, C, and D components of an Internet address. Use the pack command to create the $inet word:

@bytes = (204,251,103,2);
$inaddr = pack ("C4", @bytes);

The gethostbyaddr function returns a list with five items in it:

  • The first item is the host name corresponding to the Internet address specified by $inaddr.
  • The $aliases variable is assigned the list of aliases or alternative names by which the host can be referred.
  • The addrtype, like inaddrtype, is always &AF_INET.
  • The $addrs is a list of addresses (the main address and alternatives) corresponding to the host node named. Each address is stored as a four-byte integer.
  • The variable len is the length of the addrs field; this length is always four multiplied by the number of addresses returned in addrs.

Listing 15.5 shows how you can use gethostbyaddr to retrieve the Internet address corresponding to a particular machine name.


Listing 15.5. A program that uses gethostbyaddr.
 1 #!/usr/bin/perl
 2
 3 print ("Enter the Internet address to look for :\n");
 4 $machine = <STDIN>;
 5 $machine =~ s/^\s+|\s+$//g; # remove whitespaces around it
 6
 7 @bytes = split (/\./, $machine);
 8
 9 $packaddr = pack ("C4", @bytes);
10
11 if (!(($host, $aliases, $addrtype, $len, @addrlist) =
12         gethostbyaddr ($packaddr, &AF_INET))) {
13         die ("No such $machine was found.\n");
14 }
15
16 if ($aliases ne "") {  # i.e. you have more than one alias
17         print ("$host: Aliases for $host are :\n");
18         @alternates = split (/\s+/, $aliases);
19         for ($i = 0; $i < @alternates; $i++) {
20                 printf "%d:  %s \n",$i, $alternates[$i];
21         }
22 }
23 else {
24 print " This $host has no aliases ";
25 }

The following is sample output for a machine using the script called 15_6.pl:

$ 15_6.pl
Enter the  Internet address to look for :
204.222.245.10
pop.ikra.comAliases for pop.ikra.com are :
0:  pop
1:  www.ikra.com

You can get the host information by specifying the name to the gethostbyname function. The gethostbyname function is like gethostbyaddr, except it uses a name instead of an address. Here's the syntax for the gethostbyname function:

($name, $aliases, $addrtype, $len, $addr)
    = gethostbyname ($nameString);

Here, $nameString is the machine name to look for. The returned values from the gethostbyname function are the same as those for gethostbyaddr. Look at Listing 15.6. The host name entered by the user may have leading or trailing blanks. These are removed by the statement on line 5.


Listing 15.6. Using gethostbyname.
1 #!/usr/bin/perl
2
3  print ("Enter a machine name or Internet site name:\n");
4  $machine = <STDIN>;
5  $machine =~ s/^\s+|\s+$//g;
6  if (!(($name, $altnames, $addrtype, $len, @addrlist) =
7          gethostbyname ($machine))) {
8          die ("Machine name $machine not found.\n");
9  }
10 print ("Equivalent addresses:\n");
11 for ($i = 0; $i < @addrlist; $i++) {
12         @addrbytes = unpack("C4", $addrlist[$i]);
13         $realaddr = join (".", @addrbytes);
14         print ("\t$realaddr\n");
15 }

The gethostent, sethostent, and endhostent functions enable you to get one entry at a time from the /etc/hosts file. The sethostent() function call rewinds the /etc/host file access to ensure that a call to the gethostent() function returns the record. The endhostnet() call closes further accesses to the /etc/hosts file.

Here's the syntax for the gethostent function:

($name, $aliases, $atype, $alen, $addrs) = gethostent();

The first call to gethostent returns the first element and each subsequent call returns the next element. The five-item list returned by the gethostent() call has the same content as the list returned by gethostbyaddr() or gethostbyname(). The list contains the following:

  • The first item is the host name corresponding to the Internet address specified by $inaddr.
  • The $aliases variable is assigned the list of aliases or alternative names by which the host can be referred.
  • The addrtype, like inaddrtype, is always &AF_INET.
  • The $addrs is a list of addresses (the main address and alternatives) corresponding to the host node named. Each address is stored as a four-byte integer.
  • The variable len is the length of the addrs field; this length is always four multiplied by the number of addresses returned in addrs.

Caution
Be careful when cycling through /etc/hosts because the file may be "faked out" with the use of the name server. Every query into the /etc/hosts entry may wind up being a request for a host name to a site on the Internet. Calling gethostent repeatedly might access and overload valuable resources unnecessarily. Use this function with a bit of restraint.

Here's the syntax for the sethostent function:

sethostent (keepopen);

If keepopen is non-zero, the /etc/hosts file is left open for reading and any previously cached information about the file is kept in memory. If keepopen is set to zero, any cached information in memory is flushed and the file is read again with the first entry available for a call to gethostent(). The endhostent() function accepts no parameters and simply closes the /etc/hosts file after flushing any buffers.

Working with Process Groups Using the getpgrp() Function

A process group in the UNIX environment is a set of processes having the same process group ID. Several process groups can exist at one time. Each process group is identified by a unique integer, known as the process group ID. The getpgrp function retrieves the process group ID for a particular process.

Here's the syntax for the getpgrp function:

$pgroup = getpgrp ($pid);

$pid is the process ID whose group you want to retrieve, and $pgroup is the returned process group ID, which is a scalar value. A call with no parameters (or $pid set to 0) returns the process group ID of the current process.

You can change the process group of a process, provided you have the correct permissions, by using the setpgrp function. Here's the syntax for the setpgrp function:

setpgrp ($pid, $groupid);

$pid is the ID of the process whose group you will change, and $groupid is the process group ID you want to assign this process to.

The getppid Function

Each process in the UNIX environment has its own unique process ID and parent. (There are some exceptions to this rule, such as the init process, but that discussion is beyond the scope of this book.) The process ID is always available as the special variable, $$. To retrieve the process ID for the parent process for itself, the script can call the getppid() function.

Here's the syntax for the getppid function:

$parentid = getppid();

$parentid is the parent process ID of your program.

The most common use of the getppid function is to pass the process IDs of the parent to a child after a fork. You can use getppid with fork to ensure that each of the two processes produced by the fork knows the process ID of the other. Listing 15.7 illustrates a sample call.


Listing 15.7. Getting the parent process ID.
1 #!/usr/bin/perl
2
3 $otherid = fork();
4 if ($otherid == 0) {
5         $parentID = getppid();
6     printf "I am junior $$ child of $$\n";
7 } else {
8         printf "I am the parent with an ID of $$ \n";
9 }

The output from a sample run would be

I am junior 5423 child of 5422
I am the parent with an ID of 5422

Getting and Changing the Priority of a Process

Processes in the UNIX environments run at a priority level. Processes with the highest priority are run by the UNIX scheduler before processes with a lower priority. A Perl script can set its priority within limits by calling the setpriority function. A process can get information on its priority values by calling the getpriority function. Priority level numbers are system dependent.

The setpriority Function

Here's the syntax for the setpriority function:

setpriority ($category, $id, $priority);

The $category variable indicates what processes are to have their priorities altered. The values that $category can take are listed in the resources.ph file. You can use one of the three following values based on the action you want the setpriority function to take:

PRIO_PROCESS This call affects only one process whose process ID is specified in $id. A value of 0 for $id indicates the current process.
PRIO_PGRP This call affects the entire group whose group ID is specified in $id. A value of 0 for $id indicates the group of the current process.
PRIO_USER This call affects all the processes belonging to the user whose uid is specified in $id. A value of 0 for $id indicates the current user with his or her real (not effective) user ID.

The $priority variable is the new priority for the process, group, or user that you selected. The numbers can range from 0 to 31, or 99 for a UNIX machine, though this value is a very system-dependent issue. For example, the priority levels range from -19 to 20 on a Solaris machine where -8 runs at a higher priority than a process running at 9.

The getpriority Function

The function getpriority() gets the current priority for a process, process group, or user. You can set the priority relative to the value returned by getpriority(). A lower priority causes the affected processes to be set to run while allowing other higher priority tasks to run. A higher priority allows the process to hog more system resources.

Caution
The setpriority() function is a bit dangerous to use and is therefore only available to the root user on most UNIX systems.

Here's the syntax for the getpriority function:

$priority = getpriority ($category, $id);

$category and $id are specified in the same manner as they are for the setpriority() call. For example, look at the following fragment of code, which is used to raise the priority of a Perl program:

require "resource.ph";
$currentpriority = getpriority(&PRIO_PROCESS,$$);
setpriority (&PRIO_PROCESS, $userid, $currentpriority + 1);

Working with Protocol Files Using the getprotoent() Function

The getprotoent function is used to get entries in the /etc/protocols file for the protocols database. Here's the syntax for the getprotoent function:

($protoName, $aliases, $number) = getprotoent();

$protoName is the name of a particular system protocol. $aliases is a scalar list of alternative names for this system protocol, with each name being separated from the other by a space. The $number is the ID for the particular system protocol.

The first call to getprotoent returns the first element in /etc/protocols. Each subsequent call then returns the next entry in the /etc/protocols file. getprotoent returns the empty list when the last entry is read.

The getprotobyname() and getprotobynumber() functions are used to search for entries the /etc/protocols file. The getprotobyname function searches for a particular protocol entry by using a name, whereas getprotobynumber() uses the protocol ID. Here is the syntax for the two functions:

($protoName, $aliases, $number) = getprotobyname ($name);
($protoName, $aliases, $number) = getprotobynumber ($number);

Both functions return an empty list if no matching protocol database entries are found.

The setprotoent() and endprotoent() functions are used to access the entries in the /etc/protocols file. The setprotoent function rewinds the /etc/protocols file.

Here's the syntax for the setprotoent function:

setprotoent (keepopen);

If keepopen is non-zero, the /etc/protocols file is left open for reading and any previously cached information about the file is kept in memory. If keepopen is set to zero, any cached information in memory is flushed, and the file is read again with the first entry available for a call to getprotoent(). The endprotoent() function accepts no parameters and simply closes the /etc/protocols file after flushing any buffers.

The getservent Function

The getservent() function is used to search the /etc/services file for entries in the system services database. Here's the syntax for the getservent() function:

($name, $aliases, $portnum, $protoname) = getservent();

$name is the identifier of a particular system service. $aliases is a scalar list of alias names for the system service specified in $name. The names in $aliases are separated from each other by a white space. $portnum is the port number assigned to the system protocol and indicates the location of the port at which the service is residing. The value in $portnum is a packed array of integers, which must be unpacked using unpack with a C* format specifier. $protoname is a protocol name. The first call to getservent returns the first element in /etc/services. Further calls return subsequent entries; when /etc/services is exhausted, getservent returns the empty list.

The setservent() and endservent() functions are used to access entries in the /etc/services file. The setservent() function rewinds the /etc/services file. Here's the syntax for the function:

setservent ($keepopen);

If keepopen is non-zero, the /etc/services file is left open for reading and any previously cached information about the file is kept in memory. If keepopen is set to zero, any cached information in memory is flushed and the file is read again with the first entry available for a call to getservent(). The endservent() function accepts no parameters and simply closes the /etc/services file after flushing any buffers.

The getservbyname and getservbyport functions are used to search the /etc/services file. The getservbyname() function looks for an entry given a name, whereas the getservbyport() function looks for an entry given a port number. Here's the syntax for the getservbyname function:

($name, $aliases, $portnum, $protoName)
   = getservbyname ($searchname, $searchproto);

Here's the syntax for the getservbyport function:

($name, $aliases, $portnum, $protoName)
   = getservbyport ($searchportnum, $searchproto);

$searchportnum and $searchname are the port number and name of the protocol, respectively. $searchproto is the port number and protocol type to search for.

Both functions return the same type of values as the four-element list returned by getservent(). (The empty list is returned if the name and type are not matched.) Similarly, the getservbyport function searches for a service name that matches a particular service port number.

System-Level Functions

This section lists those system-level functions that you're not likely to use but should know about to perform that one special function.

The chroot Function

The chroot function enables you to change the root directory for your program. The root directory is passed on to any child processes created by the application calling the chroot function.

Here's the syntax for the chroot function:

chroot ($dirname);

The $dirname is the pathname of the directory to use as the root directory. The value of $dirname name specified by dirname is appended to every pathname specified by your program and its subprocesses. For example, use a statement like this one to force all further access to files in the /pub directory:

chroot ("/pub");

This forces even absolute pathnames in a program to be prepended with /pub. The chroot function is helpful when writing applications for the World Wide Web because you can limit user access to a known directory tree. Thus, if a user types /etc/passwd, the request is turned into /pub/etc/passwd.

The dump Function

The dump function enables you to generate a UNIX core dump from within your Perl program. It is meant to be used with the undump command. Here's the syntax for this command:

dump[(label)];

label is optional and specifies the place to start for the UNIX undump command.

Caution
Be careful when working with dump and undump. Only the execution state of a program is kept, not the state of the environment. For example, if the code being undumped was accessing a file when the core was dumped, the file will not be open and is therefore not available to the undumped code.

Using the ioctl Function

In UNIX, the ioctl function has been the traditional catch-all for all input/output operations that can fit in the open, read, write, and close functions. The ioctl function is sometimes not portable across some UNIX systems and almost certainly not for Windows NT or Windows 95 systems. However, ioctl is too useful to discard because it may be the only route to get you the extra functionality you need to access terminal and system facilities.

A very good text to read to learn more about ioctl is W. Richard Stevens's Advanced Programming in the UNIX Environment, Addison Wesley, 1992. This book tells you more about ioctl than you'll want to know at one reading. It's a great source of UNIX information.

Here's an example of how to emulate the getkey() function prevalent in DOS machines. The getkey()function returns one character read back from the keyboard. When reading from the keyboard using <STDIN>, the program has to wait until the user hits the Return key. Waiting for the Return key lets the user back up and correct mistakes. This editing feature is available because the terminal is in a "cooked" mode. That is, the terminal driver is smart enough to recognize a Backspace key and take a character off the input queue.

To process each character at one time, you have to place the terminal in "raw" mode. The raw mode passes all typed keystrokes into the reading application without processing. Note that the following lines may be different on your operating system. The following lines are meant to serve only as an example and are adapted from an example in the Perl FAQ by Don Carson (dbc@tc.fluke.com):

$saveioctl = ioctl(STDIN,0,0);     # Gets device info
$saveioctl &= 0xff;               # reset right most bits
ioctl(STDIN,1,$saveioctl | 32);    # Set raw mode

Here's how to read a single character:

sysread(STDIN,$c,1);               # Read a single character

And here's how to put the terminal back in "cooked" mode:

ioctl(STDIN,1,$saveioctl);         # Restore back to original mode

Note that special keys return two-byte codes in the pc world. Check the ordinal value of $c to see if it's 0. If the value is zero, you have to read the next byte to get the special key code. On Linux machines, you have to check to see if the value is 1 instead of 0:

if(ord($c) == 0) {
    sysread(STDIN,$c,1);
    }

The returned value from the second call is the returned code. The values of the returned key codes depend on your operating system. For most UNIX machines, the codes are listed in the keyboard.ph file, which is about 400 lines long. The file will be in the /usr/include/sys directory.

Using the select Call

The select call in Perl can be used in more than one way. If used with one or no parameters, the select call in Perl refers to the default file handle being selected. However, in UNIX systems, the select(2) function has another very useful purpose of selecting which input to receive data from. To get more information about how to use select, refer to the UNIX man page.

By using the select call, an application can literally wait on more than one source of incoming data. For example, you can wait on both input from the keyboard (STDIN) and from other handles at one point in your code. As input arrives, your program can selectively process each input. Handling more input for an application at more than one location using signals, semaphores, or even other child processes is a lot more complicated than using select().

Using select has the side effect that you cannot use buffered input with the <HANDLE> commands. When using select for input, you have to use the sysread() command to get input into your application. The syntax for the sysread command is

sysread(HANDLE, $variable, $len)

where incoming values are placed in $variable. See the man page read(2) for more details.

Also, do not confuse using the select call for getting input from multiple file handles with using the select call for choosing a file handle sending output. When you make the select HANDLE call, you are redirecting all print and printf output to go to the file associated with HANDLE. The select call in this section lets you collect input from more than one source.

The syntax for the select call is defined as this:

($handle,$timeLeft) =select $rbits,$wbits,$ebits,$timeout;

The values of $rbits, $wbits, and $ebits are bitmapped fields, each for read, write, and "execute" attributes of a file handle. The location of each bit corresponds to the file handle number. The way to work with the bits in these fields is to use the vec() and fileno() calls. The $handle returned is the one that caused the select call to return. The $timeLeft variable is non-zero if the timeout is not reached. To wait indefinitely for some action at a handle, use the undef value for $timeout.

The $rbit, $wbit, and $ebit values have to be set for the file handles they represent in select. You can hard code the values, but it's much safer to use the vec() function. The vec() function sets bits in a vector. Here's the syntax for the vec() function:

vec($id, $index, $value);

The $index bit in the $Id is set to $value. The $value is either 0 or 1. You have to set the bits for reading, writing, and executing in three separate vectors. The index of each bit is determined by the file handle. To get the file handle as an integer, make a call to fileno(HANDLE). In the current implementation you are limited to 32 file handles. On AIX machines, you can use Ipc message queue identifiers as well as file handles.

For example, to set the bits for reading and writing to a socket as well as to standard input, as well as a socket, MYSOCKET, you would make the calls shown in Listing 15.8.


Listing 15.8. Using the select call.
 1 #!/usr/bin/perl
 2 $wbits =0 ; $rbits = 0;
 3 $ebits = 0;
 4 vec($rbits,fileno(MYSOCKET),1) = 1; # read from socket.
 5 vec($wbits,fileno(MYSOCKET),1) = 1;  # enable write vector
 6 vec($rbits,fileno(STDIN),1) = 1;  # enable read vector
 7 $ein = $rin | $win;  # for both reading & writing on all handles
 8
 9 while(1) {
10 ($theFile,$timeleft) = select($rbits, $wbits, $ebits, $timeout);
11 if ($timeleft == 0 ) {
12     &doIdleTasksHere();
13     }
14 &process($theFile);
15 }

The $timeout value can be used to set up a timer by using a call with the bit masks set to undef. The select call actually provides a better timer resolution using milliseconds than the sleep function, which works in one-second granularity. The time "slept" is never exactly what you ask it to be, but it's close enough to what you ask on most systems. To get a timer for 300 milliseconds, use this call:

select(undef, undef, undef, 0.3);

The syscall Function

The syscall() function is a system-dependent function that enables you to call the operating system directly. Most scripts that use the syscall function are nonportable even across platforms with different types of UNIX operating systems.

You need the file syscall.ph in order to use the syscall function. Here's the syntax for the syscall function:

syscall arguments, ... ;

The first item in the arguments list is a call to a subroutine that returns a token ID of the system function being called. The subroutines are defined in the syscall.ph file. Listing 15.9 shows a sample syscall.ph file.


Listing 15.9. The syscall.ph file.
  1 if (!defined & SYS_SYSCALL_H) {
  2 eval 'sub _SYS_SYSCALL_H {1;}';
  3     eval 'sub SYS_setup {0;}';
  4     eval 'sub SYS_exit {1;}';
  5     eval 'sub SYS_fork {2;}';
  6     eval 'sub SYS_read {3;}';
  7     eval 'sub SYS_write {4;}';
  8     eval 'sub SYS_open {5;}';
  9     eval 'sub SYS_close {6;}';
 10     eval 'sub SYS_waitpid {7;}';
 11     eval 'sub SYS_creat {8;}';
 12     eval 'sub SYS_link {9;}';
 13     eval 'sub SYS_unlink {10;}';
 14     eval 'sub SYS_execve {11;}';
 15     eval 'sub SYS_chdir {12;}';
 16     eval 'sub SYS_time {13;}';
 17     eval 'sub SYS_prev_mknod {14;}';
 18     eval 'sub SYS_chmod {15;}';
 19     eval 'sub SYS_chown {16;}';
 20     eval 'sub SYS_break {17;}';
 21     eval 'sub SYS_oldstat {18;}';
 22     eval 'sub SYS_lseek {19;}';
 23     eval 'sub SYS_getpid {20;}';
 24     eval 'sub SYS_mount {21;}';
 25     eval 'sub SYS_umount {22;}';
 26     eval 'sub SYS_setuid {23;}';
 27     eval 'sub SYS_getuid {24;}';
 28     eval 'sub SYS_stime {25;}';
 29     eval 'sub SYS_ptrace {26;}';
 30     eval 'sub SYS_alarm {27;}';
 31     eval 'sub SYS_oldfstat {28;}';
 32     eval 'sub SYS_pause {29;}';
 33     eval 'sub SYS_utime {30;}';
 34     eval 'sub SYS_stty {31;}';
 35     eval 'sub SYS_gtty {32;}';
 36     eval 'sub SYS_access {33;}';
 37     eval 'sub SYS_nice {34;}';
 38     eval 'sub SYS_ftime {35;}';
 39     eval 'sub SYS_sync {36;}';
 40     eval 'sub SYS_kill {37;}';
 41     eval 'sub SYS_rename {38;}';
 42     eval 'sub SYS_mkdir {39;}';
 43     eval 'sub SYS_rmdir {40;}';
 44     eval 'sub SYS_dup {41;}';
 45     eval 'sub SYS_pipe {42;}';
 46     eval 'sub SYS_times {43;}';
 47     eval 'sub SYS_prof {44;}';
 48     eval 'sub SYS_brk {45;}';
 49     eval 'sub SYS_setgid {46;}';
 50     eval 'sub SYS_getgid {47;}';
 51     eval 'sub SYS_signal {48;}';
 52     eval 'sub SYS_geteuid {49;}';
 53     eval 'sub SYS_getegid {50;}';
 54     eval 'sub SYS_acct {51;}';
 55     eval 'sub SYS_phys {52;}';
 56     eval 'sub SYS_lock {53;}';
 57     eval 'sub SYS_ioctl {54;}';
 58     eval 'sub SYS_fcntl {55;}';
 59     eval 'sub SYS_mpx {56;}';
 60     eval 'sub SYS_setpgid {57;}';
 61     eval 'sub SYS_ulimit {58;}';
 62     eval 'sub SYS_oldolduname {59;}';
 63     eval 'sub SYS_umask {60;}';
 64     eval 'sub SYS_chroot {61;}';
 65     eval 'sub SYS_prev_ustat {62;}';
 66     eval 'sub SYS_dup2 {63;}';
 67     eval 'sub SYS_getppid {64;}';
 68     eval 'sub SYS_getpgrp {65;}';
 69     eval 'sub SYS_setsid {66;}';
 70     eval 'sub SYS_sigaction {67;}';
 71     eval 'sub SYS_siggetmask {68;}';
 72     eval 'sub SYS_sigsetmask {69;}';
 73     eval 'sub SYS_setreuid {70;}';
 74     eval 'sub SYS_setregid {71;}';
 75     eval 'sub SYS_sigsuspend {72;}';
 76     eval 'sub SYS_sigpending {73;}';
 77     eval 'sub SYS_sethostname {74;}';
 78     eval 'sub SYS_setrlimit {75;}';
 79     eval 'sub SYS_getrlimit {76;}';
 80     eval 'sub SYS_getrusage {77;}';
 81     eval 'sub SYS_gettimeofday {78;}';
 82     eval 'sub SYS_settimeofday {79;}';
 83     eval 'sub SYS_getgroups {80;}';
 84     eval 'sub SYS_setgroups {81;}';
 85     eval 'sub SYS_select {82;}';
 86     eval 'sub SYS_symlink {83;}';
 87     eval 'sub SYS_oldlstat {84;}';
 88     eval 'sub SYS_readlink {85;}';
 89     eval 'sub SYS_uselib {86;}';
 90     eval 'sub SYS_swapon {87;}';
 91     eval 'sub SYS_reboot {88;}';
 92     eval 'sub SYS_readdir {89;}';
 93     eval 'sub SYS_mmap {90;}';
 94     eval 'sub SYS_munmap {91;}';
 95     eval 'sub SYS_truncate {92;}';
 96     eval 'sub SYS_ftruncate {93;}';
 97     eval 'sub SYS_fchmod {94;}';
 98     eval 'sub SYS_fchown {95;}';
 99     eval 'sub SYS_getpriority {96;}';
100     eval 'sub SYS_setpriority {97;}';
101     eval 'sub SYS_profil {98;}';
102     eval 'sub SYS_statfs {99;}';
103     eval 'sub SYS_fstatfs {100;}';
104     eval 'sub SYS_ioperm {101;}';
105     eval 'sub SYS_socketcall {102;}';
106     eval 'sub SYS_klog {103;}';
107     eval 'sub SYS_setitimer {104;}';
108     eval 'sub SYS_getitimer {105;}';
109     eval 'sub SYS_prev_stat {106;}';
110     eval 'sub SYS_prev_lstat {107;}';
111     eval 'sub SYS_prev_fstat {108;}';
112     eval 'sub SYS_olduname {109;}';
113     eval 'sub SYS_iopl {110;}';
114     eval 'sub SYS_vhangup {111;}';
115     eval 'sub SYS_idle {112;}';
116     eval 'sub SYS_vm86 {113;}';
117     eval 'sub SYS_wait4 {114;}';
118     eval 'sub SYS_swapoff {115;}';
119     eval 'sub SYS_sysinfo {116;}';
120     eval 'sub SYS_ipc {117;}';
121     eval 'sub SYS_fsync {118;}';
122     eval 'sub SYS_sigreturn {119;}';
123     eval 'sub SYS_clone {120;}';
124     eval 'sub SYS_setdomainname {121;}';
125     eval 'sub SYS_uname {122;}';
126     eval 'sub SYS_modify_ldt {123;}';
127     eval 'sub SYS_adjtimex {124;}';
128     eval 'sub SYS_mprotect {125;}';
129     eval 'sub SYS_sigprocmask {126;}';
130     eval 'sub SYS_create_module {127;}';
131     eval 'sub SYS_init_module {128;}';
132     eval 'sub SYS_delete_module {129;}';
133     eval 'sub SYS_get_kernel_syms {130;}';
134     eval 'sub SYS_quotactl {131;}';
135     eval 'sub SYS_getpgid {132;}';
136     eval 'sub SYS_fchdir {133;}';
137     eval 'sub SYS_bdflush {134;}';
138     eval 'sub SYS_sysfs {135;}';
139     eval 'sub SYS_personality {136;}';
140     eval 'sub SYS_afs_syscall {137;}';
141     eval 'sub SYS_setfsuid {138;}';
142     eval 'sub SYS_setfsgid {139;}';
143     eval 'sub SYS__llseek {140;}';
144 }
145 1;

The remainder of the arguments must be precisely what the system call requires, or the calling application will fail. A non-root application will also fail if it makes a non-system call that requires root privilege. Passed arguments are translated into system call equivalents as follows: Numbers are sent as integers. Floating-point numbers are scaled to the nearest integer. Strings are passed by reference. Any strings expecting a response back from the call have to be long enough to store the returned results. A call for writing directly to a file handle is shown in Listing 15.10.


Listing 15.10. Using the syscall function.
#!/usr/bin/perl
require 'syscall.ph';

$hnd  =  syscall(&SYS_open,"myfile",$flgs);
syscall(&SYS_write, $hnd, "Hello!!!\n", 9);
syscall(&SYS_close,hndl);

You can force literals to be interpreted as numbers if you add a zero to the variable. For example, a filename of x would be interpreted as the string "x", not the letter x. However, "x"+0 will cause a number to be passed to the called function. See Listing 15.9 for a list of available system calls on a typical UNIX system.

Summary

This chapter has introduced you to Perl functions for accessing system facilities. Using system('function') or using back quotes is a costly way of executing commands in the system. Various utilities exist in Perl for manipulating the hosts, networks, systems services, and protocols on a UNIX system.

Chapter 16

Command-line Interface with Perl


CONTENTS


This chapter introduces you to handling the options available from the command-line interface to the Perl interpreter, handling user input, and writing interactive Perl scripts.

By using the command-line options in Perl, you can determine how to best use the Perl interpreter to take care of details such as handling loops on the input, creating and destroying temporary files, and handling multiple files.

Of course, you would want to be able to process the incoming options to your own programs as well. Writing scripts to handle user responses takes an inordinate amount of time and effort given the infinite number of responses you can receive. When passing installation scripts for a software package, it would be nice if the scripts were intelligent enough to filter out most of the incorrect responses. In this chapter, you work with Perl modules that eliminate some of the grunt work.

The Command-line Options to Perl

Perl's command-line options provide many features, such as checking syntax, printing warnings, using the C preprocessor, and modifying the way output is printed in a Perl document. There are two ways to provide options to a Perl program: either by passing them in the command line along with the command you enter to start the Perl program or in the comment header of your Perl program script.

Sending Options via the Command Line

You can always enter options for a Perl program on the command line. The syntax for specifying options on the command line is

perl options programName

where programName is the name of the Perl program to run, and options is the list of options to provide to the program being run. For example, the command

perl -d -w test1

runs the Perl program named test1 and passes it the options -d and -w. You'll learn about the actions of these options in the following sections. Some options require an additional value. For example, the -I option requires a pathname for include files.

perl -I /usr/local/include/special something

The /usr/local/include/special path is also searched for a file if it is not found via the @Inc path. It is not necessary to put a space between the option and its argument.

perl -I/usr/local/include/special something

In all cases, any value associated with an option must always immediately follow the option.

Options that do not require an associated value can be grouped without the use of an additional dash (-) character or space. For example, the following two commands do the same thing:

perl -d -w test1
perl -dw test1

The last option in a group can have additional values. For example, the following two commands do the same thing:

perl -w -I/usr/local/include/special something
perl -wI/usr/local/include/special something

Specifying an Option within the Program

The command line at the start of a program that includes a header comment (a comment beginning with the #! characters) can be used to pass options to Perl. For example, the following line:

#!perl -w

will pass the -w option to Perl. Historically, only one argument could be passed to Perl this way, but now you can pass several options. A word of caution is necessary here: Options specified on the command line will override options specified in the header comment. For example, if your header comment is

#!perl -d

and you start your program with the following command, the program will run with the -w option specified but not the -d option:

perl -w test1

Table 16.1 lists some of the command-line options to Perl.

Table 16.1. Command-line options to Perl.
Option
Meaning
-c
Do syntax checking only.
-d
Start the debugger.
-e
Execute a program from the command line.
-i
Insert line back into the input file.
-I
Specify the paths to search for included files.
-p
Echo each line of input.
-P
Use the C preprocessor.
-s
Parse very basic command-line switches to the program.
-T
Used for writing secure programs. Using this option forces data obtained from outside the program to not be used in any command that affects your file system. This feature lets you write secure programs for system administration tasks.
-u
Generate a core dump.
-U
Run in unprotected mode (full access to file system).
-v
Print the version number.
-w
Print warning labels.
-W
Print warnings.

The following sections cover each option in more detail. The options are presented in the order in which they are most likely to be found in Perl scripts rather than in alphabetical order.

The -c and -w Syntax Checking and Warning Options

The -c option asks the Perl interpreter to check the syntax of your Perl program without actually running it. All other options except for -v and -w are ignored by the Perl interpreter when it sees the -c option. The -w option prints warnings instead of errors. An error will certainly crash your program. A warning is issued when attempting to parse an ambiguous operation. Both the -c and the -w options can be used together with the flag, like this: -cw.

If the program you provide is syntactically correct, the Perl interpreter will print the message

filename syntax OK

where filename is the name of your program. If any errors are detected, you'll see the following message where filename is the name of your program:

filename had compilation errors

The -w option prints a warning every time the Perl interpreter sees something that might cause a problem. Here are some of the potential problems:

  • Having more than one subroutine with the same name. Both functions will be called, and the program won't crash. Use the -w option to warn about this problem.
  • Using the value of a variable that has not been defined.
  • Using the == operator to compare strings instead of eq operators.

Note
A number is converted to a string when compared with a string using the eq operator. However, a string when used with the == operator is always converted to the numeric value of 0.

The -e Option: Executing a Single-line Program

You can execute Perl statements at the command line with the -e option. Here is an example of a command that prints a string:

$ perl -e 'print ("Kamran Wuz Here\n");'
Kamran Wuz Here

Don't forget to type the semicolon (;) at the end of each statement. You can specify more than one statement by using either semicolons to separate them or using multiple -e options. For example, the following two statements both print the string Howdy folks:

$ perl -e 'print ("Howdy\n");' -e 'print (" folks\n");'
Howdy folks
$ perl -e 'print ("Howdy\n"); print (" folks\n");'
Howdy folks

In the case of multiple -e options, the Perl interpreter executes them from left to right. Here's an example:

$ perl -e 'print ("Donald");' -e 'print (" Duck");'
Donald Duck

The -s Option to Supply Custom Command-line Options

Generally, you'll specify the command line in a Perl script with execute permissions in the first line of a script file, as follows:

#!/usr/bin/perl

The first line is the complete pathname to the Perl interpreter.

You can run the same script using the following command at the command line, as follows:

perl scriptFile

where scriptFile is the name of the script file. Any command-line options specified before the script file's name will be passed to the Perl interpreter and not to your script file.

To pass options to the script that you run, you have to use the -s option.

perl -s scriptFile -w

This command starts the Perl program scriptFile and passes it the -w option. If you do not specify the -s option, your -w will be sent as part of the @ARGV array to the program being run. For programs that are run from the command line with the Perl command, it's best to include -s as part of your header comment:

#!perl -s

This way you are guaranteed that the program always will check for options provided that no other Perl options were specified on the command line when you invoked the program.

A scalar variable with the same name as the name of any specified option is created and automatically set to 1 before the Perl interpreter executes a program. For example, if a Perl program named scriptFile is called with the -x option, as in

perl -s scriptFile -x

the scalar variable $x is automatically set to 1. This lets you test the $x variable in a conditional expression to see whether the option has been set. The named variable will not appear in the @ARGV array. Options do not have to be a single character. For example, the following command sets the value of the scalar variable $surge to 1:

perl -s scriptFile -surge

Options can be set to a value other than 1 by simply assigning a value at the command line. For example:

perl -s scriptFile -surge="power"

This command sets the value of $surge to power in the program specified in scriptFile.

The -s option lets you supply both options and command-line arguments based on these rules:

  • All arguments that start with a dash (-) and immediately follow the program name are assumed to be options.
  • Any argument not starting with a dash (-) is assumed to be an ordinary argument. All subsequent arguments, even if they start with a - , are then assumed to be ordinary arguments and not options.
  • A double dash (--) will end Perl's parsing of command-line switches.

This means, for example, that the command

perl -s scriptFile -arg1 -arg2 -arg3

treats -arg1 as an option to scriptFile, and -arg2 and -arg3 are ordinary arguments that are placed in @ARGV.

The -I Option to Include Other Files

The -I option is used with the -P option (which is described in the next section). The -I option lets you specify the pathnames to search for include files to be processed by the C preprocessor. For example:

perl -P -I /usr/local/other scriptFile

This command tells the Perl interpreter to search the directory /usr/local/other for include files if the file is not found in its default paths. The default path is the current directory and, if the file is not found, the /usr/local/lib/perl directory. The -I option can be repeated on the same command line to specify more than one include-file path.

Using the -I option also adds the path or paths to the @Inc variable. The paths are then made available to the Perl interpreter when it uses the require function to find its modules.

The -P Option for Using the C Preprocessor

The -P option is helpful only if you have a C compiler on your system. Although all UNIX systems come with a C compiler, DOS and Windows NT systems don't; you have to purchase your own. The cpp preprocessor is the default C preprocessor in UNIX. The C preprocessor is a program that takes code written in C which does basic string substitution based on the values of variables. To enable the use of cpp with the -P option, use the following statement to start your Perl program:

perl -P scriptFile

The Perl program scriptFile is first run through the C preprocessor, and then the resulting output is executed by the Perl interpreter. You can also specify the use of the C preprocessor in the header comment like this:

#!perl -P

All C preprocessor statements have the following syntax:

#command value

The hash (#) is interpreted by Perl as a comment, and so any statements intended for the C preprocessor are ignored by Perl even if the -P option is not used. The command is the preprocessor operation to perform, and value, which is optional, is associated with this operation.

The #define operator is the most common preprocessor statement. It tells the preprocessor to replace every occurrence of a particular character string with a specified value. The syntax for #define is

#define item value

This statement replaces all occurrences of the character string item with value. This substitution operation is known as macro substitution. The item being substituted can contain a combination of letters, digits, and underscores. The value specified in a #define statement can be any character string or number. For example, the following statement will replace all occurrences of DOCTOR with quack, and Donald with "Duck" including the quotation marks:

#define DOCTOR QUACK
#define Donald "Duck"

Any expressions are treated as strings. For example, the following statement:

#define AREA (3.141 * 2)

replaces AREA with the string (3.141 * 2), including the parentheses and not the value of 6.282.

When using #define with expressions, don't forget to enclose the value in parentheses. For example, consider the following Perl statement:

$result = ITEM * 10;

If the statement you use in your preprocessor command is this:

#define ITEM 1 + 2

the resulting Perl statement is this:

$result = 1 + 2 * 10;

This statement assigns 21 to $result instead of 30, which would be the result if you used this expression instead:

#define ITEM (1 + 2)

to get this statement as a result:

$result = (1 + 2) * 10;

You can even specify multiple parameters with a #define statement thus enabling you to use a preprocessor command like a simple function that also accepts arguments. For example, this preprocessor statement:

#define SQUARE(val) ((val) * (val))

will return the square of a passed value. This statement:

$result = SQUARE(4)

will generate the following statement:

$result = ((4) * (4));

Multiple parameters are specified using a syntax similar to a Perl program. For example, consider the following statement:

#define POW(base, power) ((base) ** (power))
$result = POW(2,3);

It produces this result:

$result = ((2) ** (3));

Macros can be reused. For example,

#define PI 3.141
#define AREA(rad) (2* (rad) * PI)
$result = 43 + AREA($radius);

Here, the macro PI is defined first, the macro AREA uses PI to return an area for a given radius in $.

Using the -n and -p Options for Handling Multiple Files

When processing input from multiple files, it's often convenient to put the processing function in a while(<>) loop so that each line in each file is sequentially processed. For example, you'll see code of the following form:

while ($line = <>) {
  &processMe($line)
}

Use the -n option to not specify the while loop. This option forces Perl to take your program and execute it once for each line of input in each of the files specified on the command line. Here's an example:

#!perl -n
$line = $_;
chop ($line);
printf ("%d %-52s *\n", $ctr++, $line);

The -n option encloses this program in an invisible while loop. Each line of input is stored in the system variable $_ by the Perl interpreter, which then calls this program. The same program could be rewritten as follows:

#!perl
while (<>) {
$line = $_;
chop ($line);
printf ("%d %-52s *\n", $ctr++, $line);
}

The -n and -e options can be used together to perform a function on each line of input of all input files. For example, the following statements both search for the word param in all files whose names end with .pl:

perl -n -e "print $_ if (/param/);" *.pl

grep "param" *.pl

The print $_ if (/param/); argument supplied with the -e option is a one-line Perl program that prints the current line if the word param is found in it. The -n option executes the one-line program once for each input line that is set into the system variable $_.

The -p option is like the -n option except that it prints each line as it reads each line. The -p option is designed for use with the -i option, which is described in the following section. If both the -p and the -n options are specified, the -n option is ignored.

The -i Option to Edit Files

Both the -n and -p options read lines from the files whose names are listed on the command line. When the -i option is used with the -p option, the Perl interpreter takes the input lines being read and writes them back out to the files from which they came. For example, consider the following command:

perl -p -i -e "s/Costa/Rica/g;" *.txt

It replaces every instance of Costa with Rica in all the files whose names end with .txt.

Caution
Do not use the -i option with the -n option. The following command:
perl -n -i -e "s/Stock/Option/g;" *.txt
also changes all occurrences of Stock to Option. However, it does not write out the input lines after it changes them! Because the -i option forces the input files to be written to and nothing is printed, you'll erase the contents of all the files with .txt extensions!

The -i option does not have to work in conjunction with the -p option if the program that uses the option contains the <> operator inside a loop. For example, consider the following command:

perl -i *.txt

It will copy the content of each input file to a temporary file and then open it for reading. The input file is closed and then reopened for writing. This process is repeated for all input files.

Listing 16.1 presents a simple example of a program using both the -i option and the <> operator. This program replaces all occurrences of Wall with Brick.


Listing 16.1. A program that edits files using the -i option.
1 #!perl -I
2 while ($line = <>) {
3 $line =~ s/Wall/Brick/g;
4        print ($line);
5 }

No output is sent to the screen because the output is redirected to each input file.

The -i option can be used to back up input files, too. By specifying a new file extension to the -i option, you can ask that the new extension be appended to the filename being written to. For example, the following command:

perl -i .bak dog mouse

will result in two extra files, dog.bak and mouse.bak, being written to disk. The .bak file extension specified with -i will force the Perl interpreter to copy each file to file.bak before overwriting it.

Using the -a Option

The -a option is used for extracting words from files. The -a option is designed to be used with the -n or -p option to split incoming lines into a list of items in the @F array. Each item in the @F array is a word derived by applying the split(' ',$_) function to each input line. For example, if your input file contains the following line:

My name is     Kamran

the result of the -a option that reads this line sets the contents of the array @F to be the following list:

("My", "name", "is", "Kamran")

Note that extraneous spaces and tabs from the input line have not been added to the @F array.

Listing 16.2 shows a sample program of how to use the -a option to extract all numeric values that are the first word of an input line.


Listing 16.2. Sample use of the -a option.
1  #!perl -a -n
2  while ($F[0] =~ /[^\d.]/) {
3          shift (@F);
4          next if (!defined($F[0]));
5  }
6  print ("$F[0] \n");

Note that this program prints every line and prints only the first word that does not contain a digit or a . character.

Using the -F Option

The -F option is designed to be used along with the -a option. It is used to specify the pattern to use when splitting input lines into words. For example, if the input fields on each line that is input to a program are separated by a colon, you would use the following statement:

perl -a -n -F: textfile

In this case, the words in the input file are assumed to be separated by a colon. You can use opening and closing slashes as pattern delimiters. This means that both the following programs do the same thing:

prog -a -n -F: *.txt

prog -a -n -F/:/ *.txt

Using the -0 Option

The default end-of-input for one line of text in Perl is the newline. That is, the Perl interpreter reads a line from an input file or from the keyboard until it sees a newline character. You can specify an end-of-line input character other than the newline character by using the -0 OOO option. The 0 here is the digit zero for the option, and the letter O is the octal number to replace the newline character. For example, the following command:

perl -0 07 program *.bin

will let the named program use the bell character (7 octal) as the end-of-line character when it reads the input files that have a .bin extension.

For example, the following header comment line will set the end-of-line character to a space, (octal 40):

#!perl -0 040

To read one paragraph at a time, specify 00 as the input to the -0 option. This will let the Perl interpreter read input until it sees two newlines together, and thus you will be able to read in one paragraph at a time. If you do not specify a value with the -0 option, the Perl interpreter assumes the null character (ASCII 0).

Using the -l Option

The -l option lets you use a new output end-of-line character for printing statements. Like the -0 option, the -l option takes an octal number instead of an ASCII character for use in place of the newline. This is a one, not the letter "el." When the -l option is specified, the Perl interpreter always replaces the end-of-line character in print statements with the newer version. Also, in the case of -n or -p options, the end-of-line character is removed after reading the input.

The Perl interpreter uses the character specified by the -0 option, if it is defined, in case you do not specify the -I option. If -0 also has not been specified, the end-of-line character is set to the newline character.

When using both the -l and the -0 option, specify the -1 option first, then -0 option. Recall that options are processed from left to right. If the -l option appears first, the output end-of-line character is set to the newline character. If the -0 option appears first, the output end-of-line character (set by -l) becomes the same as the input end-of-line character (set by -0).

Note
It's probably easier to control the input and output end-of-line characters also by using the system variables $/ and $\, respectively.

Using the -x Option to Get a Perl Program from Another File

The -x option enables you to process a Perl program that appears in the middle of a file. When the -x option is specified, the Perl interpreter ignores every line in the program until it sees a header comment. The Perl interpreter then processes the program as usual until the bottom of the program file is reached or the __END__ statement is reached. Everything after the __END__ statement is ignored by the Perl interpreter.

Using the -S Option

You need to use -S only if you run your Perl program using the Perl command. If you run a program directly using a script, the -S option is meaningless because the shell will hunt for your program in the directories specified in your PATH environment variable. The -S option simply tells the Perl interpreter that your program might be contained in any of the directories specified by your PATH environment variable.

The -v Option: Printing the Perl Version Number

You might be curious as to which version of Perl you are running. The -v option prints a string with the version information for the Perl interpreter you are running. The Perl interpreter will not run any scripts, nor will it honor any other options when this -v option is specified. Here is sample output from the -v command:

$ perl -v
This is perl, version 5.002

Copyright 1987-1996, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5.0 source kit.

Now that you've learned the command-line options for the Perl interpreter, you're ready to learn how to process input in your Perl applications.

Using Conditional Code with the C Preprocessor

The C preprocessor also provided five statements, #ifdef, #ifndef, #if, #else, and #endif, for conditional statements to include or exclude parts of your Perl program. The syntax for the #ifdef and #endif statements is

#ifdef cond
...code if cond is defined...
#else
...code if cond is NOT defined...
#endif

The cond is a character string that can be used in a #define statement. If the character string has been defined to a value, the first set of code (above the #else clause) is inserted in your program; otherwise, the second part of code (after the #else and before the #endif clause) is inserted in your program. Because the #else clause is optional, you can also have statements of the form

#ifdef cond
...code if cond is defined...
#endif

The #ifndef lets you define code that is to be executed when a particular string is not defined. Thus, #ifndef takes the opposite action of the #ifdef statement. For example:

#ifdef SOMBER
print ("Hello, Cruel world!\n");
#else
print ("Hello, Beautiful world!\n");
#endif

This code prints a sad message (Hello, Cruel world!) if SOMBER was defined earlier, or a happy message (Hello, Beautiful world!) if SOMBER was not defined earlier.

Code enclosed by #ifdef and #endif does not have to be a complete Perl statement. For example, the following code will set the value of $result to different settings based on the whether or not METRIC was defined:

$area =  $radius * PI * 2
#ifdef METRIC
* 2.54
#endif
;

Here, $result is assigned a value in centimeters if METRIC is defined or in inches if it's not.

Tip
Don't overuse the C preprocessor because it might make your program hard to read, especially by people who are not familiar with the C programming language.

The #if statement in the C preprocessor is similar to the #ifdef statement. The #if statement uses the value of a variable, whereas the #ifdef statement simply checks to see whether a variable is defined. The syntax for the #if statement is as follows:

#if expr
...code...
#endif

The expr is the expression that is evaluated by the C preprocessor, and code is the code to be executed if expr is nonzero. For example, the following statements will set the value of $result to "hello" if the sum of S1 and S2 is nonzero:

#if S1 + S2
$result = "hello";
#endif

If you want to set the value of $result if either S1 or S2 is set to a nonzero value, you can use the following statement:

#if S1 || S2
$result = "hello";
#endif

By specifying 0 to the #if statement, you can easily prevent lines of code from being interpreted without having to put a hash (#) in front of each line:

#if 0
$result = "hello";
print ("I will not be printed if the -P option is used.\n");
#endif

You can also use #else with the #if operator:

#if S1 || S2
$result = "hello";
#else
$result = "goodbye";
#endif

In this case, the value of $result will be "hello" if either S1 or S2 has a nonzero value; otherwise, the value will be "goodbye".

Note
The C preprocessor does not support the exponent operator, so you cannot evaluate
(x ** y) with the #if statement.

You can embed #ifdef/#else/#endif constructs inside one another. Just make sure that you match all the #ifdef and #endif statements so that there is one #endif for each #ifdef and #ifndef statement. For example, here is a snippet of code that illustrates how the nesting is done with two #ifdef blocks:

#ifdef S1
#ifdef S2
print ("Both S1 and S2 are defined \n");
#else
print ("S1 yes but not S2\n");
#endif
#else
#ifdef S2
print ("S2 yes but not S1\n");
#else
print ("neither S1 nor S2\n");
#endif
#endif

Normally, you would include other Perl programs and modules with the require and use statements. You can also use the #include directive of the C preprocessor to include the contents of another file. The syntax for the #include command is

#include filename

where filename is the name of the file to be included.

For example, the following command includes the contents of math.h as part of the program:

#include <math.h>

The contents of math.h will also be run through the C preprocessor before it's included. The C preprocessor searches for the included file in the current directory and, if not found, in the
/usr/local/lib/perl directory. You can use the -I option to search in other directories for source and include files.

Reading Input from STDIN

In a Perl script, you can easily read the standard input for responses with the <STDIN> file handle. The following three lines of code show you how to get a number from a user and return the square root of the number:

print "\nEnter a number:";
$answer = <STDIN>;
print "Sq. root of $answer = ", sqrt($answer), "\n";

This little gem of code works great as long you are careful enough to enter only positive numbers. Enter a negative number, and the script bombs. Therefore, before taking the square root, you have to check to see if the number is greater than or equal to zero; otherwise, you have to bail out with an error message.

Another annoying fact is that reading $answer=<STDIN> also brings along the \n end-of-line character. Therefore, to remove this appendage from $answer, you have to call the function chop($answer).

The <STDIN> operation is used to read from the STDIN file handle for reading from standard input. To read each line one at a time from the standard input <STDIN>, you use a program like this one:

while ($_ = <STDIN>) {
chop($_);
print $_;
}

Because $_ is the default storage area for the last line read in a Perl script, any references to $_ can be removed when implicitly implied. For example, the previous excerpt of code could be written as this:

while (<STDIN>) {
chop;
print;
}

To read complete files by specifying the filename from the command line, you can use the <> operator. For example, the following code reads and prints the contents of the files specified on the command line:

while (<>) {
            print $_;
}

In this way, you are reading all the files specified on the command line and then processing the contents of the files one line at time by simply printing the contents of each. Think of this as an equivalent to saying cat file1 file2 file3 … and so on. The <> is equivalent to <ARGV> where ARGV is either STDIN if no files were specified or the contents of all the files in the order they were specified at the command line.

The Term::Query Module

The previous example for getting the square root of a number is a very simple example of what you normally run into when getting user responses to questions. Your query expects a response of Y for Yes and N for No, but the user's response might be a firm M for Maybe. If you have twenty questions, the last thing you want to do is to have to verify the responses. This is when it's nice to have modules that do the work for you.

Term::Query is a Perl 5 module written by Alan K. Stebbens. The module is used to provide a set of questions, a default response, a set of expected responses per question, and a help string to assist the end user. Not all of these items have to be specified; only the query is required.

If you do not specify a set of expected return values to a query, the module will accept anything as input. On the other hand, if you do specify a set of parameters, the module will validate the responses for you.

The default response to a query can also be set. The default response is displayed between square brackets. If no default is specified, there will be no such response displayed for the user.

Finally, you can specify a help string for the input question. This string is displayed if the user types ? at the prompt. You can disable the display of the help string if you want ? to be an acceptable response to a query. The help messages can also be based on expected input types. There are built-in help messages for some types of input that are displayed even if you do not explicitly specify a help message. The built-in help strings are quite verbose and may be enough for most general cases.

If at any time during the entry and validation process a bizarre response is given, the module can stop and ask the same question again. This capability to ask the same query again until a correct response is received (or the user types the Ctrl+C key combination) is great for ensuring that the right user responses get into your Perl script.

The module itself contains more details about its internal operations. The documenation is located in the module in the Perl 5 "pod" format. You can convert a pod file into a man page with the following command:

pod2man Query.pm | nroff -man - | less

The pod2man code was developed in version 5.001m and requires at least Perl5.001m. This is because the pod2man code uses references in the Carp.pm module to diagnose itself and in the PrintArray.pm module. (Both modules are written by Alan Stebbens.)

Installing the module is easy. First check to see whether you have the module already in your distribution. Go to your /usr/lib/perl5, /usr/local/lib/perl5/site_perl, or /usr/local/lib/perl5 directory (or wherever you have installed Perl) and look for the file Query.pm. The file will most likely be in the directory /usr/lib/perl5/Term.

If you cannot find the file, you can get it from the ftp sites at hubs.ucsb.edu/pub and ikra.com:/pub/perl/modules. Here's a list of the modules you need:

  • Term-Query-1.15.tar.gz for the Term module
  • PrintArray-1.1.tar.gz, a required module for Term

Unzip and untar these files in a place away from the PERLLIBDIR directory.

You have to set the environment variable PERLLIBDIR to either /usr/lib/perl5 or /usr/local/lib/perl5.

Copy the Query.pm file into the $PERLLIBDIR/Term directory. You have to be superuser to do this. Create the directory if you do not already have it. Copy the PrintArray.pm file into the location PERLLIBDIR. You can use the Makefiles that come with the modules, but the copying method has proved to work without having to edit any pathnames in the Makefiles. It's worth taking a look at the test target in the Makefile to see how the regression tests are done in the test directory.

There is one primary subroutine, called query, which is called to process one interaction with the user. The subroutine query() is passed a prompt and some flags, optionally followed by additional arguments, depending on the particular flags. Each flag is a single character and indicates the following values:

  • The input type: integer, real, string, yes/no, keyword, or non-keyword
  • What default input to use in the absence of user input
  • An optional help string to be displayed for errors or input of a question mark (?)
  • Any required input validation, such as regular expression or pattern matching, maximum length, and so on
  • Any use of chop() or white space removal

I'll cover these options with some samples. The following sections describe how you can use the module.

Using the Term::Query Module

Here's the syntax for the call to the query function:

$result = query($prompt, $flags, [optional fields]);

The $prompt string is displayed, and the response entered is interpreted on the value in $flags. The optional fields may be NULL but must be at least as large as required by the flags.

What are these flags and how did they get interpreted by query()? The flags indicate the type or attribute of the value. Each flag may have parameters associated with it. The order in which the flags are listed must be the same order in which the parameters are listed. Therefore, if you list flags rdh, then you'll have two more strings in the argument list in the order of a default string and a help message string.

There are several flags you can use with the Query package. Some of these you have already seen, some are described in Table 16.2. There is more documentation on other esoteric flags included in the module.

Table 16.2. Flags for the interpretation of input variables.
Flag
Interpretation
d
The default response to use if you get no input from the question.
H
Ignores the question mark as a request for help. Treats it as a response to a question.
h
The help string.
i
Accepts Integer input only.
I
Specifies a reference to a function to use instead of read <STDIN>.
k
Specifies a table reference of allowed responses to the question.
K
Specifies a table reference of disallowed responses to the question.
l
Limits the length of the input.
m
Uses the argument as a regular expression for processing responses.
n
Accepts Real or Integer.
N
Requires a Yes/No response only. The default is N.
r
An answer is required at the prompt.
Y
Requires a Yes/No response only. The default is Y.

Now, let's see how the flag ridh is interpreted by the module. The first two flags, r and i, translate to "required, integer value." No extra parameters are needed. The d flag specifies that the next argument ($_[1]) is used as the default value. The h flag specifies that the next argument ($_[2]) is used as the help string for the prompt.

The best way to start is to use an example. A sample script using the query() subroutine is shown in Listing 16.3.


Listing 16.3. Using the query subroutine.
 1    #!perl
 2    #
 3    # A sample usage of the query subroutine.
 4    #
 5    use Term::Query qw( query query_table query_table_set_defaults );
 6    #
 7    # Tell him what happened.
 8    #
 9    sub processReply {
10        my $reply = query @_; # <<<< The call to the query function >>>
11        #
12        # Bail out?
13        #
14        exit if $reply =~ /^\s*(later|bye)\s*$/;
15        printf "You said = [%s]\n",$reply;
16        return $reply;
17    }
18    printf "\n ------------------------------------------ ";
19    printf "\n  Application to join da rest of da boys -- ";
20    printf "\n ------------------------------------------ ";
21    #
22    # This will require a response.
23    #
24    $nameh = &processReply("\nWhazza u name-h?",'rh',
25            'Whazza matah, u too stoopid to fouget yo name-h?');
26    printf "\n Okay $nameh, lemme talk to u about it... \n";
27    #
28    # This subroutine will NOT require a response before proceeding.
29    #
30    $liveh = &processReply("\nWheh you live-h");
31    #
32    # This will only accept a response of Y or N, the default being Y
33    #
34    $wannbe = &processReply("\nU wanna be amobstah?",'Y');
35    #
36    # This will only accept a response of Y or N, the default being N
37    #
38    $house = &processReply("\nU bin to da Big Haus?",'N');
39    #
40    #  This one requires an integer, with a default reply and
41    #  has help text for the question mark.
42    #
43    $iq = &processReply ("\nEnter your IQ:",
44          'ridh',
45           5,           # the default IQ
46          'Whazza matah? Give you shoe size-h');
47    #
48    #  Use a list of keyowrds
49    #
50    $gunnh = &processReply("\nWhat weapon you like?",
51         'rkd', ['GUN','38','lugah','mace','BO'],'GUN');
52    printf "\n Okay $nameh, lemme think about it... \n";

The imports in line 5 are a bit extraneous for this simple example. The line could easily have been rewritten because none of the other three functions in the Query.pm module are being used. Here's the line that would work:

5     use Term::Query qw( query );

Here's the function that processes each reply:

 9     sub processReply {
10        my $reply = query @_;
11                #
12                # Bail out?
13                #
14        exit if $reply =~ /^\s*(later|bye)\s*$/;
15        printf "You said = [%s]\n",$reply;
16        return $reply;

Lines 9 through 16 are a subroutine to process the reply from the first parameter passed in. This subroutine simply calls the query function, checks to see whether the user wants to exit and, if the user is not exiting, prints the reply. The reply is returned back to the caller.

Lines 24 and 25 process replies to the name question. In case the applicant does not know how to answer this one correctly, a help string is provided. The processed reply is echoed in the following print statement:

24    $nameh = &processReply("\nWhazza u name-h?",'rh',
25                            'Whazza matah, u fouget yo name-h?');
26    printf "\n Okay $nameh, lemme talk to u about it... \n";

Line 30 requests input and even accepts a carriage return. No help string is given, nor is there any default response. If only a carriage return is entered, the reply back is set to undef.

30    $liveh = &processReply("\nWheh you live-h");

In Line 34, the $wannabe variable is something like, y, Y, n, or N. The responses in this module are not case-sensitive.

34  $wannbe = &processReply("\nU wanna be amobstah?",'Y');

Line 38 does the reverse of Line 34 in that the default response is No instead of Yes. This is also not case-sensitive. Thus, nO is the same as No is the same as NO.

For case-sensitive comparisions between the response and a known string, set the variable $Query::Case_sensitive to 1. By default, this value is set to 0 for case-insensitive comparisons.

Lines 43 through 46 require an input integer with the ridh flag.

43    $iq = &processReply ("\nEnter your IQ:",
44                  'ridh',
45                   5,           # the default IQ
46                  'Whazza matah? Give you shoe size-h');

In line 50, a list of keywords to use is specified. Only the responses listed in the table are allowed:

50    $gunnh = &processReply("\nWhat weapon you like?",
51         'rkd', ['GUN','38','lugah','mace','BO'],'GUN');
52    printf "\n Okay $nameh, lemme think about it... \n";

The order of flags specified in the input to query is also important because this also sets the order in which input validation is done. All input is validated in the order of the flags. When the first test fails, an error message is displayed and the testing stops:

query "Really format the disk? (yn)", 'NV', \$ans;

To add a long help message, you can use the following example:

$ans = &query("Are you sure?? (yn)",'Nh',<<'MESSAGE');
This is the time to back out. If you answer "y", I will format the
partition in $partitionName, any existing data on the partition will
be lost. If you answer 'no' now, you can back out of the routine to
specify another partition.
MESSAGE

Note the use of the variable $partitionName in the string to print a value.

You can even use regular expressions to specify an input variable collection. Consider Listing 16.4, which uses regular expressions to match incoming words with the m flag and does not allow certain words.


Listing 16.4. Using regular expressions.
 1 #!perl
 2 #
 3 # A sample usage of the query subroutine.
 4 #
 5 use Term::Query qw( query query_table query_table_set_defaults );
 6
 7 $names = "itsy bitsy bambi";
 8
 9 @fields = split(' ',$names); # existing fields
10 $newNode = &query('New node name:','rKmh',\@fields,'^\w+$',<<MSG);
11 Enter a node name to add to the existing list:
12 $names
13 MSG
14
15 $names .= " " . $newNode;
16
17 print "The names are now:\n";
18 print $names . "\n";

Extending the Query.pm Module

There are two other routines in the Query.pm module that allow easier processing of the question/answer sequences:

query_table() This function can be passed an array of arguments that are interactively passed to query(). This is an easy way to get all your answers up front if you do not have to do any processing between responses.
query_table_set_defaults() This can be used on a query table array to cause any mentioned variables to be initialized with any mentioned default values. This is handy for having a single table define variables, default values, and validation criteria for setting new values.

Using query_table()

The query_table() function is useful when you want to collect all the user input at one time without having to do any processing in between inputs. Basically, you pass in list of prompts, flags, and optional arguments to the query_table function. The query_table() function calls query() on the list, collects the responses, and returns them in an array.

Here's the way to use query_table:

@array = query_table( $prompt1, $flags, [ $arguments, ... ],
            $prompt2, $flags, [ $arguments, ... ],
    
            ...
                $promptN, $flags, [ $arguments, ... ] );

There are three items per query: a prompt string, a flags string, and an array of arguments. Note that the syntax specifies the use of the square brackets to show that the arguments array is a variable length array. The array can be empty if no arguments are needed for a set of flags for the entry.

A query-table can be created with a set of variables, their default values, input validation parameters, and help strings. The query_table_set_defaults() subroutine sets the default values in the table. The subroutine query_table() processes each entry in this table to get the responses from the user.

Listing 16.5 contains a sample script using query-table.


Listing 16.5. Using query-table.
 1  #!perl
 2  #
 3  # A sample usage of the query subroutine.
 4  #
 5  use Term::Query qw( query query_table query_table_set_defaults );
 6  #
 7  # Snagged straight out of the test module with this package.
 8  #
 9  sub qa {
10      $ans = query @_;
11      exit if $ans =~ /^\s*(exit|quit|abort)\s*$/;
12      printf "Your Response = \"%s\"\n",(length($ans) ? $ans :
13            defined($ans) ? 'NULL' : 'undef');
14 }
15 @interrogator = ( "What is your name?", 'Vrh',
16       [ 'name', 'Who are you?' ]  ,
17       "What is your age?", 'Vrih',
18       [ 'age', 32 , 'Please be honest are you?' ]  ,
19       "Do you have carrots?", 'VY',
20        [ 'carrots', 'Y'  ]  );
21 #------------------------------------------------------------------
22         # The variables $name, $age and $carrots will be set to default
23         # values (if any).
24         #------------------------------------------
25           query_table_set_defaults  \@interrogator;
26         #
27           foreach $var ( qw( name age carrots ) ) {
28           $val = $$var;
29           print "  \$$var = \"$val\"\n";
30           }
31         #
32         # The variables $name, $age and $carrots will
33         # be set to default (if any) or the response values
34         # from the processing of the query table.
35           $ret = query_table \@interrogator;
36           print "queryTable returned $ret\n";
37         # Echo them out.
38           foreach $var ( qw( name age carrots ) ) {
39           $val = $$var;
40           print "  \$$var = \"$val\"\n";
41           }

With typical usage, given $prompt and $flags, query() prints $prompt and then waits for input from the user. The handling of the response depends on the flag characters given in the $flags string.

In Listing 16.5 the table has three prompt strings and three variables to which the received responses are assigned. See Lines 15 through 20. You can just as easily have 50 entries in the table. This modular procedure makes it easy to set up a series of questions when no processing is required between responses. That is, you can collect your information all at once and then parse the collected information.

Note the use of the V flag on all of the flags in the table in Listing 16.5, lines 17, 19, and 20. The V flag forces the reference to the name of the variable to the level above its current execution level. Therefore, $name, $age, and $carrot are defined in the calling module once the query_table_set_defaults or query_table call is made. Not setting the V flag forces the variable to be local to the Query module itself, and any responses in the named variables are lost.

The Term::Query module is an excellent tool for prompting and collecting user responses to commands. If you would like more detailed information, please read the documentation in the Query.pm module. The author Alan K. Stebbens can be reached via e-mail at aks@hub.ucsb.edu at the College of Engineering, University of California, Santa Barbara.

The Getopts Package

The Getopts package is designed to help you parse the input options into your shell scripts. This package comes standard with the Perl 5 distributions, so you do not have to get it from anywhere.

Options to Perl scripts you write can be sent in one at a time or can be clustered. For example, options x,v, and t can be sent in as -x -v -t, -xvt, -xt -v, and so on. Your script should be able to recognize these options. The type of work involved in this type of option recognition is common enough that more than one module is available for you to work with: for example, the Std.pm module, which is a simple module that recognizes only certain options, and the Long.pm module, which also recognizes the states and default values of options.

Using Std.pm

You use the Std.pm module by including the following line in your shell script:

use Getopt::Std.pm;

The options that you want the module to list are passed in a string of the form xyz. The call to the getopts function then attempts to look for -x, -y, or -z, or a combination of these options in the command-line string. For each option found, it sets the variable $opt_x, $opt_y, or $opt_z with either the value of 1 (for found) or undef (for not found). For example, the following two lines of code set up the command-line options for x, y, or z:

use Getopt::Std.pm;
getopt("xyz");

The returned values for x, y, and z do not have to be 1 or 0. Assigned values can be collected by appending a colon to each option with which you expect to pass a parameter. For example, the following line takes arguments for -f and -c:

getopt("vf:c:");

A sample usage of this module is shown in Listing 16.6.


Listing 16.6. Using the Std.pm module.
 1 #!perl
 2
 3 use Getopt::Std;
 4
 5 $result = getopt('wx:yz');
 6 print "\n Options:\n";
 7 printf " w :-->  $opt_w  \n";
 8 printf " x :-->  $opt_x  \n";
 9 printf " y :-->  $opt_y  \n";
10 printf " z :-->  $opt_z  \n";

Note
The use Getopt::Std; getopt(); call completely replaces the original Perl 4, "require 'getopts.pl'; &Getopts();" statements. The old library is still included for compatibility reasons.

The Long.pm module

The Long.pm module is a bit bigger than the Std.pm module, both in size and in functionality. The primary functional interface to this module is via the GetOptions() function, which is basically a souped-up version of the getopts() function found in the C library. Each description of the options your script is looking for should designate a valid Perl identifier, optionally followed by a specification designating the type of option.

Here's the syntax to use when calling GetOptions():

use Getopt::Long;
$result = GetOptions (name=opt1, name=opt2, .. name=optN);

You should specify the option name because this name is used by Perl to set the variable $opt_name to the value specified by the option. Here are the values for the opt1, opt2,…optN specifiers:

<none>
This option does not take an argument.
!
This option does not take an argument and may be negated.
=s
This option takes a mandatory string argument.
:s
This option takes an optional (:) string argument.
=i
This option takes a mandatory (=) integer argument.
:i
This option takes an optional (:) integer argument.
=f
This option takes a mandatory (=) real number argument.
:f
This option takes an optional (:) real number argument.

Please read the Long.pm file in the subroutine GetOptions() header documentation for details on these options.

For options that do not take an argument, their value will be set to 1 or nothing. Options that do take an optional argument will cause the corresponding variable to be defined in the name space of the module they are being called from. If no value is specified at the command prompt, the value of the variable will be set to the empty string.

Boolean options are also possible. Use ! after the option name to indicate that an option can also be negated. Then, you can negate options of the form html and nohtml. Thus, -html causes the variable $opt_html to be set to 1, and -nohtml causes the variable $opt_html to be set to 0.

Dashes in option names are allowed (ice-cream) but are translated to underscores in the corresponding Perl variable ($ice_cream). A lone dash is translated to the Perl identifier of $opt_. Double dashes (--) by themselves signal the end of the options list to the package. Options that start with -- can have an assignment after them. For example, --topping=nuts.

Examples of Options Settings

Listing 16.7 provides a small example of how to use options. In this example, you can specify a string for the value of the variable $opt_flavor. The variable $opt_vanilla is either 1 or 0, depending on how the -vanilla or -novanilla options are set. The value of -cost, if specified, will be set in $opt_cost. The variable $opt_cost will not be set, because it does not have to be set to a value.


Listing 16.7. Using the Getopts::Long package.
 1 #!perl
 2 use Getopt::Long;
 3
 4 #
 5 # This allows you to specify a string for the flavor,
 6 # either -vanilla or -novanilla
 7 # The value of -cost is a required integer value, but
 8 # -topping value is optional
 9 #
10 $result = GetOptions ('flavor=s','vanilla!', 'cost=i','topping:s');
11
12 printf " flavor :-->  $opt_flavor  \n";
13 printf " vanilla :-->  $opt_vanilla  \n";
14 printf " cost :-->  $opt_cost  \n";
15 printf " topping :-->  $opt_topping  \n";

Here is sample input and output for this listing.

$ test.pl -flavor weird -novanilla -cost 2 -topping nuts
 flavor :-->  weird
 vanilla :-->  0
 cost :-->  2
 topping :--> nuts

$ test.pl -flavor marmalade -vanilla -cost 1 -topping bugs
 flavor :-->  marmalade
 vanilla :-->  1
 cost :-->  1
 topping :-->  bugs

Some important variables, to keep in mind when working with the 29 Getopts:: modules, appear in the following list. Check the Long.pm file itself for other variables not listed here if you need more functionality. You can set one or more of the following variables in your script to get the desired result:

$autoabbrev This allows option names to be uniquely abbreviated. So, if no other variable begins with the first three letters che, the variable chewable would be referred to as che. The default value of this variable is 1. The value of this variable can be overridden by setting the environment variable POSIXLY_CORRECT.
$option_start This is the regular expression of the start of the option identifier character strings. The default value is (--|-|\+); that is, it allows the use of -, --, and even +. If the environment variable POSIXLY_CORRECT is set, the value is set to (--|-), thereby not allowing the + option.
$ignorecase When set, this variable ignores case when matching options. The default value of this variable is 1.
$debug This variable enables debugging output. The default is 0. By setting this value, you can view debugging information.

Summary

By passing command-line options to the Perl interpreter you can control how input is read into a program, whether the code internally can be considered a loop, how input is parsed by changing the end-of-line character, where to look for files, and so on. Perl has several options to control execution. Options can be specified on the command line or in the header comment of the program you are running. An option is simply a dash (-) followed by one or more characters. An option can have parameters too, such as pathnames for where to look for included files, or what to use as the end-of-line character, or what to use in place of a space character, and so on. Options that do not require any parameters can be grouped together behind one dash. The command-line options override values set in the header comment.

Your Perl programs can have their own options as well. For simple options into programs, it's possible to parse incoming arguments manually. In order to get arguments manually, you'd have to start with the skeletal code shown below and then work yourself up adding recognized switches as you went:

foreach (@ARGV) {
last if $_ eq "--";
            $opt_x = 1 if /^-x/;
            $opt_y = 1 if /^-y/;
            $opt_z = 1 if /^-z/;
}

Okay, now how about handling options that are clustered, such as -xyz? This is where these packages help you out. Obviously, using the Getopts package is far more efficient and easier to use when it comes to parsing the command-line arguments. Then, after you have read the options, the Query.pm module can help with the user interaction.

This chapter has also been a very quick introduction to using Perl modules to handle interactive user input and command-line arguments. Using the Term::Query module, you can set up one or more query/response prompt and reply strings. You can use tables to automate the interrogations. The code to handle responses from the user can be set to verify or accept the user response. The Getopt::Std and Getopt::Long packages can be used to pick up arguments specified on the command line.

Оставьте свой комментарий !

Ваше имя:
Комментарий:
Оба поля являются обязательными

 Автор  Комментарий к данной статье