Serg Iakovlev

 Chapter 4. Content Handlers
  Первый хэндлер , который начинает обрабатывать внешний запрос - mod_alias .
 Если он почему-то не проинсталлирован , что вряд ли , тогда  http_core хэндлер.
 Затем апач разбивает полученный оттранслированный путь на 2 части :
 "filename" и "additional path information" .
 Пусть рутовый каталог есть /home/www и получен запрос на URI /abc/def/ghi.
 Процесс трансляции можно отследить по следующей таблице :
 
 
 
 Physical Directory
 
 
 Translated Filename
 
 
 Additional Path Information
 
 
 
 
 /home/www
 
 
 /home/www/abc
 
 
 /def/ghi 
 
 
 
 
 /home/www/abc
 
 
 /home/www/abc/def
 
 
 /ghi 
 
 
 
 
 /home/www/abc/def
 
 
 /home/www/abc/def/ghi
 
 
 empty
 
 
 
 
 /home/www/abc/def/ghi
 
 
 /home/www/abc/def/ghi
 
 
 empty
 
 
 
 Затем начинает работать handler mod_mime , который определяет , является ли 
 расширение файла типом MIME .
 
 В конфиге с помощью директивы AddHandler можно хэндлер привязать 
 к обработке конкретного расширения :
 	AddHandler server-parsed .shtml
 Сам хэндлер определен в mod_include .
 Директива SetHandler используется внутри директив <Directory>, <Location>, <Files>
 для привязки хэндлера к конкретному виртуальному каталогу :
 	<Location /shtml> 
 	  SetHandler server-parsed 
 	</Location>
 	 
 	<Files map-*> 
 	  SetHandler imap-file 
 	</Files>
 Кстати , директивы  AddHandler и SetHandler реализованы не в самом ядре апача ,
 а в модуле mod_actions.
 
 При использовании mod_perl нужно использовать комбинацию из 2-х директив - 
 SetHandler и PerlHandler :
 	<Location /graph> 
 	   SetHandler  perl-script 
 	   PerlHandler Apache::Graph 
 	</Location>	
 
 Если нужно привязать Perl content handler к расширению , используется директива <Files>: 
 	<Files ~ "\.graph$"> 
 	   SetHandler  perl-script 
 	   PerlHandler Apache::Graph 
 	</Files>
 
 Для отмены действия SetHandler можно написать в конфиге :
 	<Location /graph/tutorial>
 	   SetHandler default-handler
 	</Location>	
 
 Добавление футера на веб-странице 
  Для демонстрации работы контент-хэндлера разработаем модуль , который добавляет футер
 на все страницы данного каталога . 
 Пример 4-1 включает код Apache::Footer . Рассмотрим его подробнее :
 	package Apache::Footer; 
 	use strict; 
 	use Apache::Constants qw(:common); 
 	use Apache::File ();
 Прагма use strict защищает модуль от таких ошибок , как конфликт имен переменных ,
 неправильный синтаксис написания функций без круглых скобок и т.д.
 Собственно функция handler и генерит контент :
 	sub handler {
 	    my $r = shift; 
 	    return DECLINED unless $r->content_type() eq 'text/html';
 	    my $file = $r->filename;
 	  
 	    unless (-e $r->finfo) { 
 	       $r->log_error("File does not exist: $file");     
 	       return NOT_FOUND; 
 	    } 
 	    unless (-r _) { 
 	       $r->log_error("File permissions deny access: $file");    
 	       return FORBIDDEN; 
 	    }
 Сначала мы получаем имя файла , потом проверяем его наличие и право на обработку .
 Получим время изменения файла :
 	    my $modtime = localtime((stat _)[9]);
 Пытаемся открыть файл , и если что , возвращаем 500-ю ошибку :
 	    my $fh; 
 	    unless ($fh = Apache::File->new($file)) { 
 	       $r->log_error("Couldn't open $file for reading: $!"); 
 	       return SERVER_ERROR; 
 	    }
 Строим футер :
 	    my $footer = <<END; 
 	<hr> 
 	&copy; 2001 <a href="http://www.ora.com/">O'Reilly &amp; Associates</a><br> 
 	<em>Last Modified: $modtime</em> 
 	END
 Переписываем документ :
 	    $r->send_http_header; 
 	     
 	    while (<$fh>) { 
 	       s!(</BODY>)!$footer$1!oi; 
 	    } continue { 
 	       $r->print($_); 
 	    }	
 В конце перлу нужно послать сигнал о том , что все прошло корректно :
 	    return OK; 
 	} 
 	  
 	1;
 
 Полный вариант :
 	package Apache::Footer; 
 	# file: Apache/Footer.pm 
 	 
 	use strict; 
 	use Apache::Constants qw(:common); 
 	use Apache::File (); 
 	 
 	sub handler { 
 		my $r = shift; 
 		return DECLINED unless $r->content_type() eq 'text/html';
 	 
 		my $file = $r->filename;
 	 
 		unless (-e $r->finfo) { 
 		$r->log_error("File does not exist: $file");     
 		return NOT_FOUND; 
 		} 
 		unless (-r _) { 
 		$r->log_error("File permissions deny access: $file");    
 		return FORBIDDEN; 
 		} 
 	 
 		my $modtime = localtime((stat _)[9]);
 	 
 		my $fh; 
 		unless ($fh = Apache::File->new($file)) { 
 		$r->log_error("Couldn't open $file for reading: $!"); 
 		return SERVER_ERROR; 
 		} 
 		my $footer = <<END; 
 	<hr> 
 	&copy; 2001 <a href=">http://www.ora.com/">O'Reilly &amp; Associates</a><br> 
 	<em>Last Modified: $modtime</em> 
 	END 
 	 
 		$r->send_http_header; 
 	     
 		while (<$fh>) { 
 		s!(</BODY>)!$footer$1!oi; 
 		} continue { 
 		$r->print($_); 
 		} 
 	 
 		return OK; 
 	} 
 	 
 	1; 
 	_    _END_    _
 
 Если все файлы лежат в одном каталоге , то футер можно прописать в конфиге так :
 	<Location /footer> 
 	  SetHandler perl-script 
 	  PerlHandler Apache::Footer 
 	</Location>	
 
 Если файлы раскиданы по дереву , прикрутить Apache::Footer можно с помощью 
 уникального расширения .footer :
 	AddType text/html .footer 
 	<Files ~ "\.footer$"> 
 	   SetHandler  perl-script 
 	   PerlHandler Apache::Footer 
 	</Files>
 
 
 Реализовать футер можно также с помощью .htaccess . 
 При этом не нужно перезагружать сервер :   
 	SetHandler perl-script 
 	PerlHandler Apache::Footer
 
 SSI
 
  Предыдущий вариант футера имеет один недостаок - для того , чтобы изменить футер ,
 нужно изменить код , что ни есть хорошо . Подобная технология еще называется hardcode .
 Более гибким методом является использование модуля Apache::Sandwich.
 Еще более мощным инструментом является SSI. SSI - это фрагмент кода внутри HTML ,
 обрамленный комментарием . Например , текущие время и дата с помощью SSI :  
 	Today is <!--#echo var="DATE_LOCAL"-->.
 Мы будем использовать mod_perl для разработки наших собственных SSI . 
 Базовая идея :  
 	<!--#DIRECTIVE PARAM1 PARAM2 PARAM3 PARAM4...-->
 Имеется директива и набор параметров . Можно использовать одинарные и двойные кавычки .
 Директива грузится не из модуля , а из конфига .
 Директива :  
 	sub HELLO { "Hello World!"; }
 SSI:  
 	I said <!--#HELLO-->  
 Следующая более сложная подпрограмма возвращает хидер из внешнего запроса :
 	sub HTTP_HEADER { 
 	  my ($r,$field) = @_; 
 	  $r->header_in($field); 
 	}
 SSI :
 	You are using the browser <!-- #HTTP_HEADER User-Agent -->.
 
 Пример HTML-файла с SSI :
 	<html> <head> <title>Server-Side Includes</title></head> 
 	<body bgcolor=white> 
 	<h1>Server-Side Includes</h2> 
 	This is some straight text.<p>
 	This is a "<!-- #HELLO -->" include.<p>
 	The file size is <strong><!-- #FSIZE --></strong>, and it was 
 	last modified on <!-- #MODTIME %x --><p>
 	Today is <!-- #DATE "%A, in <em>anno domini</em> %Y"-->.<p>
 	The user agent is <em><!--#HTTP_HEADER User-Agent--></em>.<p>
 	Oops: <!--#OOPS 0--><p>
 	Here is an included file: 
 	<pre> 
 	<!--#INCLUDE /include.txt 1--> 
 	</pre>
 	<!--#FOOTER--> 
 	</body> </html>
 
 
 Реализация SSI выполняется с помощью модуля Apache::ESSI :
 	package Apache::ESSI; 
 	use strict; 
 	use Apache::Constants qw(:common); 
 	use Apache::File (); 
 	use Text::ParseWords qw(quotewords); 
 	my (%MODIFIED, %SUBSTITUTION);
 	sub handler { 
 	    my $r = shift; 
 	    $r->content_type() eq 'text/html' || return DECLINED; 
 	    my $fh = Apache::File->new($r->filename) || return DECLINED; 
 	    my $sub = read_definitions($r)    || return SERVER_ERROR; 
 	    $r->send_http_header; 
 	    $r->print($sub->($r, $fh)); 
 	    return OK; 
 	}
 	sub read_definitions {
 	    my $r = shift; 
 	    my $def = $r->dir_config('ESSIDefs'); 
 	    return unless $def; 
 	    return unless -e ($def = $r->server_root_relative($def));
  Вначале read_definitions( ) возвращает путь к файлу , в котором находятся 
  определения SSI .
 В данном случае это переменная ESSIDefs , определенная в конфиге 
 с помощью PerlSetVar-директивы.
 Путь к этому файлу может быть относительным или абсолютным .
 Путь передается методу server_root_relative( ).
 Затем мы проверяем кэш и время изменения файла . Используются 2 хэша :
 %SUBSTITUTION хранит откомпилированный код и %MODIFIED хранит дату .
 Компиляция SSI-функции :
 	    my $package = join "::", _    _PACKAGE_    _, $def; 
 	    $package =~ tr/a-zA-Z0-9_/_/c;
 	    eval "package $package; do '$def'"; 
 	    if($@) { 
 	       $r->log_error("Eval of $def did not return true: $@"); 
 	       return; 
 	    }
 	    $SUBSTITUTION{$def} = sub {  
 	        do_substitutions($package, @_); 
 	    }; 
 	    $MODIFIED{$def} = -M $def;  # store modification date 
 	    return $SUBSTITUTION{$def}; 
 	}
 Результат компиляции сохраняем в %SUBSTITUTION , изменяем %MODIFIED.
 Добавляем содержимое переменной $package в do_substitutions( )
 	sub do_substitutions { 
 	    my $package = shift; 
 	    my($r, $fh) = @_; 
 	    # Make sure that eval() errors aren't trapped. 
 	    local $SIG{_    _WARN_    _}; 
 	    local $SIG{_    _DIE_    _}; 
 	    local $/; #slurp $fh 
 	    my $data = <$fh>; 
 	    $data =~ s/<!--\s*\#(\w+)  # start of a function name 
 	               \s*(.*?)        # optional parameters 
 	               \s*-->          # end of comment 
 	              /call_sub($package, $1, $r, $2)/xseg; 
 	    $data; 
 	}
 
 Полный пример :
 	package Apache::ESSI; 
 	# file: Apache/ESSI.pm
 	use strict; 
 	use Apache::Constants qw(:common); 
 	use Apache::File (); 
 	use Text::ParseWords qw(quotewords); 
 	my (%MODIFIED, %SUBSTITUTION); 
 	sub handler { 
 		my $r = shift; 
 		$r->content_type() eq 'text/html' || return DECLINED; 
 		my $fh = Apache::File->new($r->filename) || return DECLINED; 
 		my $sub = read_definitions($r)    || return SERVER_ERROR; 
 		$r->send_http_header; 
 		$r->print($sub->($r, $fh)); 
 		return OK; 
 	} 
 	sub read_definitions { 
 my $r = shift; 
 my $def = $r->dir_config('ESSIDefs'); 
 return unless $def; 
 return unless -e ($def = $r->server_root_relative($def)); 
 return $SUBSTITUTION{$def} if $MODIFIED{$def} && $MODIFIED{$def} <= -M _; 
 my $package = join "::", _    _PACKAGE_    _, $def; 
 $package =~ tr/a-zA-Z0-9_/_/c; 
 
 eval "package $package; do '$def'"; 
 if($@) { 
 $r->log_error("Eval of $def did not return true: $@"); 
 return; 
 } 
 $SUBSTITUTION{$def} = sub {  
         do_substitutions($package, @_); 
 }; 
 $MODIFIED{$def} = -M $def;  # store modification date 
 return $SUBSTITUTION{$def}; 
 	}
 	sub do_substitutions { 
 		my $package = shift; 
 		my($r, $fh) = @_; 
 		# Make sure that eval() errors aren't trapped. 
 		local $SIG{_    _WARN_    _}; 
 		local $SIG{_    _DIE_    _}; 
 		local $/; #slurp $fh 
 		my $data = <$fh>; 
 		$data =~ s/<!--\s*\#(\w+) # start of a function name 
 				\s*(.*?)         # optional parameters 
 				\s*-->           # end of comment 
 				/call_sub($package, $1, $r, $2)/xseg; 
 		$data; 
 	} 
 	sub call_sub { 
 		my($package, $name, $r, $args) = @_;  
 		my $sub = \&{join '::', $package, $name}; 
 		$r->chdir_file; 
 		my $res = eval { $sub->($r, quotewords('[ ,]',0,$args)) }; 
 		return "<em>[$@]</em>" if $@; 
 		return $res; 
 	} 
 	1; 
 	_    _END_    _
 
 
  Файл определения SSI-функций :
 	# This file is require'd, and therefore must end with 
 	# a true value. 
 	use Apache::File (); 
 	use Apache::Util qw(ht_time size_string); 
 	# insert the string "Hello World!" 
 	sub HELLO { 
 		my $r = shift; 
 		"Hello World!"; 
 	} 
 	# insert today's date possibly modified by a strftime() format 
 	# string 
 	sub DATE { 
 		my ($r,$format) = @_; 
 		return scalar(localtime) unless $format; 
 		return ht_time(time, $format, 0); 
 	} 
 	# insert the modification time of the document, possibly modified 
 	# by a strftime() format string. 
 	sub MODTIME { 
 		my ($r,$format) = @_; 
 		my $mtime = (stat $r->finfo)[9]; 
 		return localtime($mtime) unless $format; 
 		return ht_time($mtime, $format, 0); 
 	} 
 	# insert the size of the current document 
 	sub FSIZE { 
 		my $r = shift; 
 		return size_string -s $r->finfo; 
 	} 
 	# divide 10 by the argument (used to test runtime error trapping) 
 	sub OOPS { 10/$_[1]; } 
 	# insert a canned footer 
 	sub FOOTER { 
 		my $r = shift; 
 		my $modtime = MODTIME($r); 
 		return <<END; 
 	<hr> 
 &copy; 2001 <a href="http://www.ora.com/">O'Reilly &amp; Associates</a><br> 
 	<em>Last Modified: $modtime</em> 
 	END 
 	} 
 	# insert the named field from the incoming request 
 	sub HTTP_HEADER { 
 		my ($r,$h) = @_; 
 		$r->header_in($h); 
 	} 
 	#ensure that path is relative, and does not contain ".."  
 	sub is_below_only { $_[0] !~ m:(^/|(^|/)\.\.(/|$)): } 
 	# Insert the contents of a file.  If the $virtual flag is set 
 	# does a document-root lookup, otherwise treats filename as a 
 	# physical path. 
 	sub INCLUDE { 
 		my ($r,$path,$virtual) = @_; 
 		my $file; 
 		if($virtual) { 
 		$file = $r->lookup_uri($path)->filename; 
 		} 
 		else { 
 		unless(is_below_only($path)) { 
 			die "Can't include $path\n"; 
 		} 
 		$file = $path; 
 		} 
 		my $fh = Apache::File->new($file) || die "Couldn't open $file: $!\n"; 
 		local $/; 
 		return <$fh>; 
 	} 
 	1;
 
 Для нормальной работы SSI необходимы также Apache Embperl  и ePerl пакеты .


 Динамический бар навигации
 
 
 
 Многие сайты используют навигационный бар для помощи пользователям.
 Обычно там присутствует ссылки и кнопки в форме картинок.
 На некоторых сайтах используются скрипты для придания кнопочного эффекта
  "rollover" , когда кнопка меняет цвет при наведении мыши.
 Если содержимое сайта динамически меняется , то соответственное добавление бара 
 ко вновь добавленной странице может превратиться в проблему,
 поскольку прийдется постоянно апгрэйдить ее содержимое.
  
 
 
 
 Мы используем Apache content handlers.
 Мы разработаем бар-модуль под названием Apache::NavBar.
 При активации , этот модуль автоматом добавит бар ко всем страницам на сайте.
 На нем будет размещаться линк.
 (Figure 4-3).  
 
 
 Бар будет строиться динамически с помощью конфига.
 Например , как это делается на сайте http://stein.cshl.org:
 
 # Конфиг 
 /index.html             Home 
 /jade/                  Jade 
 /AcePerl/               AcePerl 
 /software/boulder/      BoulderIO 
 /software/WWW/          WWW 
 /linux/                 Linux
 
 
 
 Справа в конфиге - 6 значений : "Home," "Jade," "AcePerl," "BoulderIO," "WWW," , "Linux"
 (имена пакетов). Слева - соответствующие урлы.
 
 
 
Пример 4-6 
 дает полный код для Apache::NavBar. 
 В конце примера - perl.conf 
 (или httpd.conf если хотите), 
 который и активирует этот бар. 
 
 package Apache::NavBar; 
 # file Apache/NavBar.pm 
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Apache::File (); 
   
 my %BARS = (); 
 my $TABLEATTS   = 'WIDTH="100%" BORDER=1'; 
 my $TABLECOLOR  = '#C8FFFF'; 
 my $ACTIVECOLOR = '#FF0000'; 
   
 
 
 The preamble brings in the usual modules and defines
 some constants that will be used later in the code. Among the constants
 are ones that control the color and size of the navigation bar.
 
  sub handler { 
     my $r = shift; 
     my $bar = read_configuration($r) || return DECLINED;
 
 
 The handler( ) function starts by calling an internal function named read_configuration( ),
 which, as its name implies, parses the navigation bar configuration
 file. If successful, the function returns a custom-designed NavBar
 object that implements the methods we need to build the navigation bar
 on the fly. As in the server-side includes example, we cache NavBar objects in the package global %BARS and only re-create them when the configuration file changes. The cache logic is all handled internally by read_configuration( ).   
 
 
 If, for some reason, read_configuration( ) returns an undefined value, we decline the transaction by returning DECLINED. Apache will display the page, but the navigation bar will be missing. 
 
     $r->content_type eq 'text/html'  || return DECLINED; 
     my $fh = Apache::File->new($r->filename) || return DECLINED;
 
 
 As in the server-side include example, we check the MIME type of the requested file. If it isn't of type text/html, then we can't add a navigation bar to it and we return DECLINED to let Apache take its default actions. Otherwise, we attempt to open the file by calling Apache::File 's new( ) method. If this fails, we again return DECLINED to let Apache generate the appropriate error message.  
 
     my $navbar = make_bar($r, $bar);
 
 
 Having successfully processed the configuration file and opened the requested file, we call an internal subroutine named make_bar( )
 to create the HTML text for the navigation bar. We'll look at this
 subroutine momentarily. This fragment of HTML is stored in a variable
 named $navbar.  
 
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     local $/ = ""; 
     while (<$fh>) { 
        s:(</BODY>):$navbar$1:i; 
        s:(<BODY.*?>):$1$navbar:si; 
     } continue {  
        $r->print($_);  
     } 
   
     return OK; 
 }
 
 
 The remaining code should look familiar. We send the
 HTTP header and loop through the text in paragraph-style chunks looking
 for all instances of the <BODY> and </BODY> tags. When we
 find either tag we insert the navigation bar just below or above it. We
 use paragraph mode (by setting $/ to the empty string) in order to catch documents that have spread the initial <BODY> tag among multiple lines. 
 
 sub make_bar { 
     my($r, $bar) = @_; 
     # create the navigation bar 
     my $current_url = $r->uri; 
     my @cells;
 
 
 The make_bar( ) function is
 responsible for generating the navigation bar HTML code. First, it
 recovers the current document's URI by calling the Apache request
 object's uri( ) method. Next, it calls $bar->urls( ) to fetch the list of partial URIs for the site's major areas and iterates over the areas in a for( ) loop: 
 
    for my $url ($bar->urls) { 
       my $label = $bar->label($url); 
       my $is_current = $current_url =~ /^$url/; 
       my $cell = $is_current ? 
           qq(<FONT COLOR="$ACTIVECOLOR">$label</FONT>) 
               : qq(<A HREF="$url">$label</A>); 
       push @cells,  
       qq(<TD CLASS="navbar" ALIGN=CENTER BGCOLOR="$TABLECOLOR">$cell</TD>\n); 
    }
 
 
 For each URI, the code fetches its human-readable label by calling $bar->label(  )
 and determines whether the current document is part of the area using a
 pattern match. What happens next depends on whether the current
 document is part of the area or not. In the former case, the code
 generates a label enclosed within a <FONT> tag with the COLOR
 attribute set to red. In the latter case, the code generates a
 hypertext link. The label or link is then pushed onto a growing array
 of HTML table cells. 
 
     return qq(<TABLE $TABLEATTS><TR>@cells</TR></TABLE>\n); 
 }
 
 
 At the end of the loop, the code incorporates the table cells into a one-row table and returns the HTML to the caller. 
 
 
 We next look at the read_configuration( ) function: 
 
 sub read_configuration { 
     my $r = shift; 
     my $conf_file; 
     return unless $conf_file = $r->dir_config('NavConf'); 
     return unless -e ($conf_file = $r->server_root_relative($conf_file));
 
 
 Potentially there can be several configuration files,
 each one for a different part of the site. The path to the
 configuration file is specified by a per-directory Perl configuration
 variable named NavConf. We retrieve the path to the configuration file with dir_config( ), convert it into an absolute path name with server_root_relative( ), and test that the file exists with the -e operator. 
 
     my $mod_time = (stat _)[9]; 
     return $BARS{$conf_file} if $BARS{$conf_file}  
       && $BARS{$conf_file}->modified >= $mod_time; 
     return $BARS{$conf_file} = NavBar->new($conf_file); 
 }
 
 
 Because we don't want to reparse the configuration each time we need it, we cache the NavBar object in much the same way we did with the server-side include example. Each NavBar object has a modified( ) method that returns the time that its configuration file was modified. The NavBar objects are held in a global cache named %BARS and indexed by the name of the configuration files. The next bit of code calls stat( ) to return the configuration file's modification time--notice that we can stat( ) the _ filehandle because the foregoing -e operation will have cached its results. We then check whether there is already a ready-made NavBar
 object in the cache, and if so, whether its modification date is not
 older than the configuration file. If both tests are true, we return
 the cached object; otherwise, we create a new one by calling the NavBar new( ) method.  
 
 
 You'll notice that we use a different technique for finding the modification date here than we did in Apache::ESSI (Example 4-3). In the previous example, we used the -M
 file test flag, which returns the relative age of the file in days
 since the Perl interpreter was launched. In this example, we use stat( )
 to determine the absolute age of the file from the filesystem
 timestamp. The reason for this will become clear later, when we modify
 the module to handle If-Modified-Since caching.
 
 
 Toward the bottom of the example is the definition for the NavBar class. It defines three methods named new( ), urls( ), and label( ) : 
 
 package NavBar;
  
 # create a new NavBar object 
 sub new { 
     my ($class,$conf_file) = @_; 
     my (@c,%c); 
     my $fh = Apache::File->new($conf_file) || return; 
     while (<$fh>) { 
        chomp; 
        s/^\s+//; s/\s+$//;   # fold leading and trailing whitespace 
        next if /^#/ || /^$/; # skip comments and empty lines 
        next unless my($url, $label) = /^(\S+)\s+(.+)/; 
        push @c, $url;     # keep the url in an ordered array 
        $c{$url} = $label; # keep its label in a hash 
     } 
     return bless {'urls'  => \@c, 
                  'labels' => \%c, 
                  'modified' => (stat $conf_file)[9]}, $class; 
 }
 
 
 The new( ) method is called to parse a configuration file and return a new NavBar
 object. It opens up the indicated configuration file, splits each row
 into the URI and label parts, and stores the two parts into a hash.
 Since the order in which the various areas appear in the navigation bar
 is significant, this method also saves the URIs to an ordered array. 
 
 # return ordered list of all the URIs in the navigation bar 
 sub urls  { return @{shift->{'urls'}}; } 
   
 # return the label for a particular URI in the navigation bar 
 sub label { return $_[0]->{'labels'}->{$_[1]} || $_[1]; } 
   
 # return the modification date of the configuration file 
 sub modified { return $_[0]->{'modified'}; } 
   
 1;
 
 
 The urls( ) method returns the ordered list of areas, and the label( ) method uses the NavBar object's hash to return the human-readable label for the given URI. If none is defined, it just returns the URL. modified( ) returns the modification time of the configuration file. 
 
 
 
 Example 4-6: 
 A Dynamic Navigation Bar  
 
 
 package Apache::NavBar; 
 # file Apache/NavBar.pm 
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Apache::File (); 
   
 my %BARS = (); 
 my $TABLEATTS   = 'WIDTH="100%" BORDER=1'; 
 my $TABLECOLOR  = '#C8FFFF'; 
 my $ACTIVECOLOR = '#FF0000'; 
   
 sub handler { 
     my $r = shift; 
     my $bar = read_configuration($r) || return DECLINED; 
     $r->content_type eq 'text/html'  || return DECLINED; 
     my $fh = Apache::File->new($r->filename) || return DECLINED; 
     my $navbar = make_bar($r, $bar); 
      
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     local $/ = ""; 
     while (<$fh>) { 
        s:(</BODY>):$navbar$1:oi; 
        s:(<BODY.*?>):$1$navbar:osi; 
     } continue {  
        $r->print($_);  
     } 
   
     return OK; 
 } 
   
 sub make_bar { 
     my($r, $bar) = @_; 
     # create the navigation bar 
     my $current_url = $r->uri; 
     my @cells; 
     for my $url ($bar->urls) { 
        my $label = $bar->label($url); 
        my $is_current = $current_url =~ /^$url/; 
        my $cell = $is_current ? 
            qq(<FONT COLOR="$ACTIVECOLOR">$label</FONT>) 
                : qq(<A HREF="$url">$label</A>); 
        push @cells,  
        qq(<TD CLASS="navbar" ALIGN=CENTER BGCOLOR="$TABLECOLOR">$cell</TD>\n); 
     } 
     return qq(<TABLE $TABLEATTS><TR>@cells</TR></TABLE>\n); 
 } 
   
 # read the navigation bar configuration file and return it as a hash.
 sub read_configuration { 
     my $r = shift; 
     my $conf_file; 
     return unless $conf_file = $r->dir_config('NavConf'); 
     return unless -e ($conf_file = $r->server_root_relative($conf_file)); 
     my $mod_time = (stat _)[9]; 
     return $BARS{$conf_file} if $BARS{$conf_file}  
       && $BARS{$conf_file}->modified >= $mod_time; 
     return $BARS{$conf_file} = NavBar->new($conf_file); 
 } 
   
 package NavBar;
   
 # create a new NavBar object 
 sub new { 
     my ($class,$conf_file) = @_; 
     my (@c,%c); 
     my $fh = Apache::File->new($conf_file) || return; 
     while (<$fh>) { 
        chomp; 
        s/^\s+//; s/\s+$//;   # fold leading and trailing whitespace 
        next if /^#/ || /^$/; # skip comments and empty lines 
        next unless my($url, $label) = /^(\S+)\s+(.+)/; 
        push @c, $url;     # keep the url in an ordered array 
        $c{$url} = $label; # keep its label in a hash 
     } 
     return bless {'urls' => \@c, 
                  'labels' => \%c, 
                  'modified' => (stat $conf_file)[9]}, $class; 
 } 
   
 # return ordered list of all the URIs in the navigation bar 
 sub urls  { return @{shift->{'urls'}}; } 
   
 # return the label for a particular URI in the navigation bar 
 sub label { return $_[0]->{'labels'}->{$_[1]} || $_[1]; } 
   
 # return the modification date of the configuration file 
 sub modified { return $_[0]->{'modified'}; } 
   
 1; 
 _    _END_    _
 
 
 
 
 
 A configuration file section to go with Apache::NavBar might read:
 
 <Location /> 
   SetHandler  perl-script 
   PerlHandler Apache::NavBar 
   PerlSetVar  NavConf conf/navigation.conf 
 </Location>
 
 
 Because so much of what Apache::NavBar and Apache:ESSI
 do is similar, you might want to merge the navigation bar and
 server-side include examples. This is just a matter of cutting and
 pasting the navigation bar code into the server-side function
 definitions file and then writing a small stub function named NAVBAR( ).
 This stub function will call the subroutines that read the
 configuration file and generate the navigation bar table. You can then
 incorporate the appropriate navigation bar into your pages anywhere you
 like with an include like this one: 
 
 <!--#NAVBAR-->
 
 
 Handling If-Modified-Since 
 
 
 One of us (Lincoln) thought the virtual navigation bar
 was so neat that he immediately ran out and used it for all documents
 on his site. Unfortunately, he had some pretty large (>400 MB) files
 there, and he soon noticed something interesting. Before installing the
 navigation bar handler, browsers would cache the large HTML files
 locally and only download them again when they had changed. After
 installing the handler, however, the files were always downloaded. What
 happened? 
 
 
 When a browser is asked to display a document that it
 has cached locally, it sends the remote server a GET request with an
 additional header field named If-Modified-Since. The request looks something like this: 
 
 GET /index.html HTTP/1.0 
 If-Modified-Since: Tue, 24 Feb 1998 11:19:03 GMT 
 User-Agent: (etc. etc. etc.)
 
 The server will compare the document's current
 modification date to the time given in the request. If the document is
 more recent than that, it will return the whole document. Otherwise,
 the server will respond with a 304 "not modified" message and the
 browser will display its cached copy. This reduces network bandwidth
 usage dramatically. 
 
 
 When you install a custom content handler, the If-Modified-Since mechanism no longer works unless you implement it. In fact, you can generally ignore If-Modified-Since
 because content handlers usually generate dynamic documents that change
 from access to access. However, in some cases the content you provide
 is sufficiently static that it pays to cache the documents. The
 navigation bar is one such case because even though the bar is
 generated dynamically, it rarely changes from day to day. 
 
 
 In order to handle If-Modified-Since
 caching, you have to settle on a definition for the document's most
 recent modification date. In the case of a static document, this is
 simply the modification time of the file. In the case of composite
 documents that consist equally of static file content and a dynamically
 generated navigation bar, the modification date is either the time that
 the HTML file was last changed or the time that the navigation bar
 configuration file was changed, whichever happens to be more recent.
 Fortunately for us, we're already storing the configuration file's
 modification date in the NavBar object, so finding this aggregate modification time is relatively simple. 
 
 
 To use these routines, simply add the following just before the call to $r->send_http_header in the handler( ) subroutine:
 
 $r->update_mtime($bar->modified);
 $r-.set_last_modified;
 my $rc = $r-> meets_conditions
 return $rc unless $rc == OK;
 
 
 We first call the update_mtime( )
 function with the navigation bar's modification date. This function
 will compare the specified date with the modification date of the
 request document and update the request's internal mtime field to the most recent of the two. We then call set_last_modified( ) to copy the mtime field into the outgoing Last-Modified header. If a synthesized document depends on several configuration files, you should call update_mtime( ) once for each configuration file, followed by set_last_modified( ) at the very end.
 
 
 The complete code for the new and improved Apache::NavBar, with the If-Modified-Since improvements, can be found at this book's companion web site. 
 
 
 If you think carefully about this module, you'll see
 that it still isn't strictly correct. There's a third modification date
 that we should take into account, that of the module source code
 itself. Changes to the source code may affect the appearance of the
 document without changing the modification date of either the
 configuration file or the HTML file. We could add a new update_mtime( ) with the modification time of the Apache::NavBar module, but then we'd have to worry about modification times of libraries that Apache::NavBar depends on, such as Apache::File.
 This gets hairy very quickly, which is why caching becomes a moot issue
 for any dynamic document much more complicated than this one. See "The
 Apache::File Class" in Chapter 9, Perl API Reference Guide, for a complete rundown of the methods that are available to you for controlling HTTP/1.1 caching.
 
 
 Sending Static Files 
 
 
 If you want your content handler to send a file through
 without modifying it, the easiest way is to let Apache do all the work
 for you. Simply return DECLINED from your
 handler (before you send the HTTP header or the body) and the request
 will fall through to Apache's default handler. This is a lot easier,
 not to mention faster, than opening up the file, reading it line by
 line, and transmitting it unchanged. In addition, Apache will
 automatically handle a lot of the details for you, first and foremost
 of which is handling the If-Modified-Since header and other aspects of client-side caching. 
 
 
 If you have a compelling reason to send static files manually, see Using Apache::File to Send Static Files in Chapter 9 for a full description of the technique. Also see "Redirection,"
 later in this chapter, for details on how to direct the browser to
 request a different URI or to make Apache send the browser a different
 document from the one that was specifically requested. 
 
 
 Virtual Documents 
 
 
 The previous sections of this chapter have been
 concerned with transforming existing files. Now we turn our attention
 to spinning documents out of thin air. Despite the fact that these two
 operations seem very different, Apache content handlers are responsible
 for them both. A content handler is free to ignore the translation of
 the URI that is passed to it. Apache neither knows nor cares that the
 document produced by the content handler has no correspondence to a
 physical file. 
 
 
 We've already seen an Apache content handler that produces a virtual document. Chapter 2, A First Module, gave the code for Apache::Hello, an Apache Perl module that produces a short HTML document. For convenience, we show it again in Example 4-7.
 This content handler is essentially identical to the previous content
 handlers we've seen. The main difference is that the content handler
 sets the MIME content type itself, calling the request object's content_type( ) method to set the MIME type to type text/html.
 This is in contrast to the idiom we used earlier, where the handler
 allowed Apache to choose the content type for it. After this, the
 process of emitting the HTTP header and the document itself is the same
 as we've seen before. 
 
 
 After setting the content type, the handler calls send_http_header( ) to send the HTTP header to the browser, and immediately exits with an OK status code if header_only( ) returns true (this is a slight improvement over the original Chapter 2 version of the program). We call get_remote_host( )
 to get the DNS name of the remote host machine, and incorporate the
 name into a short HTML document that we transmit using the request
 object's print( ) method. At the end of the handler, we return OK. 
 
 
 There's no reason to be limited to producing virtual
 HTML documents. You can just as easily produce images, sounds, and
 other types of multimedia, provided of course that you know how to
 produce the file format that goes along with the MIME type. 
 
 
 
 Example 4-7: 
 "Hello World" Redux  
 
 
 package Apache::Hello; 
 # file: Apache/Hello.pm
   
 use strict; 
 use Apache::Constants qw(:common);
   
 sub handler { 
     my $r = shift; 
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK unless $r->header_only; 
     my $host = $r->get_remote_host; 
     $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>Hello There</TITLE> 
 </HEAD> 
 <BODY> 
 <H1>Hello $host</h2> 
 "Hello world" is a terribly overused phrase in programming books, 
 don't you think? 
 </BODY> 
 </HTML> 
 END 
     return OK; 
 }
   
 1;
 
 
 
 
 
 Redirection 
 
 
 Instead of synthesizing a document, a content handler
 has the option of redirecting the browser to fetch a different URI
 using the HTTP redirect mechanism. You can use this facility to
 randomly select a page or picture to display in response to a URI
 request (many banner ad generators work this way) or to implement a
 custom navigation system. 
 
 
 Redirection is extremely simple with the Apache API. You need only add a Location field to the HTTP header containing the full or partial URI of the desired destination, and return a REDIRECT result code. A complete functional example using mod_perl is only a few lines (Example 4-8). This module, named Apache::GoHome, redirects users to the hardcoded URI http://www.ora.com/.
 When the user selects a document or a portion of the document tree that
 this content handler has been attached to, the browser will immediately
 jump to that URI. 
 
 
 The module begins by importing the REDIRECT error code from Apache::Constants (REDIRECT isn't among the standard set of result codes imported with :common). The handler( ) method then adds the desired location to the outgoing headers by calling Apache::header_out( ). header_out( )
 can take one or two arguments. Called with one argument, it returns the
 current value of the indicated HTTP header field. Called with two
 arguments, it sets the field indicated by the first argument to the
 value indicated by the second argument. In this case, we use the
 two-argument form to set the HTTP Location field to the desired URI. 
 
 
 The final step is to return the REDIRECT result code. There's no need to generate an HTML body, since most HTTP-compliant browsers will take you directly to the Location
 URI. However, Apache adds an appropriate body automatically in order to
 be HTTP-compliant. You can see the header and body message using
 telnet: 
 
 % telnet localhost 80 
 Trying 127.0.0.1... 
 Connected to localhost. 
 Escape character is '^]'. 
 GET /gohome HTTP/1.0  
 HTTP/1.1 302 Moved Temporarily 
 Date: Mon, 05 Oct 1998 22:15:17 GMT 
 Server: Apache/1.3.3-dev (Unix) mod_perl/1.16 
 Location: http://www.ora.com/ 
 Connection: close 
 Content-Type: text/html 
   
 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 
 <HTML><HEAD> 
 <TITLE>302 Moved Temporarily</TITLE> 
 </HEAD><BODY> 
 <H1>Moved Temporarily</h2> 
 The document has moved <A HREF="http://www.ora.com/">here</A>.<P> 
 </BODY></HTML> 
 Connection closed by foreign host.
 
 
 You'll notice from this example that the REDIRECT
 status causes a "Moved Temporarily" message to be issued. This is
 appropriate in most cases because it makes no warrants to the browser
 that it will be redirected to the same location the next time it tries
 to fetch the desired URI. If you wish to redirect permanently, you
 should use the MOVED status code instead,
 which results in a "301 Moved Permanently" message. A smart browser
 might remember the redirected URI and fetch it directly from its new
 location the next time it's needed. 
 
 
 
 Example 4-8: 
 Generating a Redirect from a Content Handler  
 
 
 package Apache::GoHome; 
 # file: Apache/GoHome.pm
   
 use strict; 
 use Apache::Constants qw(REDIRECT);
   
 sub handler { 
   my $r = shift; 
   $r->content_type('text/html'); 
   $r->header_out(Location => 'http://www.ora.com/'); 
   return REDIRECT; 
 }
   
 1; 
 _    _END_    _
 
 
 
 
 
 As a more substantial example of redirection in action, consider Apache::RandPicture (Example 4-9)
 which randomly selects a different image file to display each time it's
 called. It works by selecting an image file from among the contents of
 a designated directory, then redirecting the browser to that file's
 URI. In addition to demonstrating a useful application of redirection,
 it again shows off the idiom for interconverting physical file names
 and URIs. 
 
 
 The handler begins by fetching the name of a directory
 to fetch the images from, which is specified in the server
 configuration file by the Perl variable PictureDir.
 Because the selected image has to be directly fetchable by the browser,
 the image directory must be given as a URI rather than as a physical
 path. 
 
 
 The next task is to convert the directory URI into a physical directory path. The subroutine adds a /
 to the end of the URI if there isn't one there already (ensuring that
 Apache treats the URI as a directory), then calls the request object's lookup_uri( ) and filename( ) methods in order to perform the URI translation steps. The code looks like this:
 
     my $subr = $r->lookup_uri($dir_uri); 
     my $dir = $subr->filename; 
 
 
 Now we need to obtain a listing of image files in the
 directory. The simple way to do this would be to use the Perl glob
 operator, for instance: 
 
 chdir $dir; 
 @files = <*.{jpg,gif}>;
 
 
 However, this technique is flawed. First off, on many
 systems the glob operation launches a C subshell, which sends
 performance plummeting and won't even work on systems without the C
 shell (like Win32 platforms). Second, it makes assumptions about the
 extension types of image files. Your site may have defined an alternate
 extension for image files (or may be using a completely different
 system for keeping track of image types, such as the Apache MIME magic
 module), in which case this operation will miss some images. 
 
 
 Instead, we create a DirHandle object using Perl's directory handle object wrapper. We call the directory handle's read( )
 method repeatedly to iterate through the contents of the directory. For
 each item we ask Apache what it thinks the file's MIME type should be,
 by calling the lookup_uri( ) method to turn the filename into a subrequest and content_type( )
 to fetch the MIME type information from the subrequest. We perform a
 pattern match on the returned type and, if the file is one of the MIME
 image types, add it to a growing list of image URIs. The subrequest
 object's uri( ) method is called to return the absolute URI for the image. The whole process looks like this: 
 
     my @files; 
     for my $entry ($dh->read) { 
        # get the file's MIME type 
        my $rr = $subr->lookup_uri($entry); 
        my $type = $rr->content_type; 
        next unless $type =~ m:^image/:; 
        push @files, $rr->uri; 
     }
 
 
 Note that we look up the directory entry's filename by calling the subrequest object's lookup_uri( ) method rather than using the main request object stored in $r. This takes advantage of the fact that subrequests will look up relative paths relative to their own URI. 
 
 
 The next step is to select a member of this list randomly, which we do using this time-tested Perl idiom: 
 
  my $lucky_one = $files[rand @files];
 
 
 The last step is to set the Location header to point at this file (being sure to express the location as a URI) and to return a REDIRECT
 result code. If you install the module using the sample configuration
 file and <IMG> tag shown at the bottom of the listing, a
 different picture will be displayed every time you load the page. 
 
 
 
 Example 4-9: 
 Redirecting the Browser to a Randomly Chosen Picture  
 
 
 package Apache::RandPicture; 
 # file: Apache/RandPicture.pm 
   
 use strict; 
 use Apache::Constants qw(:common REDIRECT); 
 use DirHandle (); 
   
 sub handler { 
     my $r = shift; 
     my $dir_uri = $r->dir_config('PictureDir'); 
     unless ($dir_uri) { 
        $r->log_reason("No PictureDir configured"); 
        return SERVER_ERROR; 
     } 
     $dir_uri .= "/" unless $dir_uri =~ m:/$:; 
   
     my $subr = $r->lookup_uri($dir_uri); 
     my $dir = $subr->filename; 
     # Get list of images in the directory. 
     my $dh = DirHandle->new($dir); 
     unless ($dh) { 
        $r->log_error("Can't read directory $dir: $!"); 
        return SERVER_ERROR; 
     } 
   
     my @files; 
     for my $entry ($dh->read) { 
        # get the file's MIME type 
        my $rr = $subr->lookup_uri($entry); 
        my $type = $rr->content_type; 
        next unless $type =~ m:^image/:; 
        push @files, $rr->uri; 
     } 
     $dh->close; 
     unless (@files) { 
        $r->log_error("No image files in directory"); 
        return SERVER_ERROR; 
     } 
   
     my $lucky_one = $files[rand @files]; 
     $r->header_out(Location => $lucky_one); 
     return REDIRECT; 
 } 
   
 1; 
 _    _END_    _
 
 
 
 
 
 A configuration section to go with Apache::RandPicture might be:
 
 <Location /random/picture> 
    SetHandler  perl-script 
    PerlHandler Apache::RandPicture 
    PerlSetVar  PictureDir   /banners 
 </Location>
 
 
 And you'd use it in an HTML document like this:
 
 <image src="/random/picture" alt="[Our Sponsor]">
 
 
 Although elegant, this technique for selecting a random
 image file suffers from a bad performance bottleneck. Instead of
 requiring only a single network operation to get the picture from the
 server to the browser, it needs two round-trips across the network: one
 for the browser's initial request and redirect and one to fetch the
 image itself. 
 
 
 You can eliminate this overhead in several different
 ways. The more obvious technique is to get rid of the redirection
 entirely and simply send the image file directly. After selecting the
 random image and placing it in the variable $lucky_one, we replace the last two lines of the handler( ) subroutine with code like this: 
 
     $subr = $r->lookup_uri($lucky_one); 
     $r->content_type($subr->content_type); 
     $r->send_http_header; 
     return OK unless $r->header_only; 
     my $fh = Apache::File->new($subr->filename) || return FORBIDDEN; 
     $r->send_fd($fh);
 
 
 We create yet another subrequest, this one for the
 selected image file, then use information from the subrequest to set
 the outgoing content type. We then open up the file and send it with
 the send_fd( ) method. 
 
 
 However, this is still a little wasteful because it
 requires you to open up the file yourself. A more subtle solution would
 be to let Apache do the work of sending the file by invoking the
 subrequest's run( ) method. run( )
 invokes the subrequest's content handler to send the body of the
 document, just as if the browser had made the request itself. The code
 now looks like this: 
 
    my $subr = $r->lookup_uri($lucky_one); 
    unless ($subr->status == DOCUMENT_FOLLOWS) { 
        $r->log_error("Can't lookup file $lucky_one}: $!");      
        return SERVER_ERROR; 
    } 
    $r->content_type($subr->content_type); 
    $r->send_http_header; 
    return OK if $r->header_only; 
    $subr->run; 
    return OK;
 
 
 We call lookup_uri( ) and check the value returned by its status( ) method in order to make sure that it is DOCUMENT_FOLLOWS (status code 200, the same as HTTP_OK). This constant is not exported by Apache::Constants
 by default but has to be imported explicitly. We then set the main
 request's content type to the same as that of the subrequest, and send
 off the appropriate HTTP header. Finally, we call the subrequest's run( ) method to invoke its content handler and send the contents of the image to the browser. 
 
 
 Internal Redirection 
 
 
 The two Apache::RandPicture
 optimizations that we showed in the previous section involve a lot of
 typing, and the resulting code is a bit obscure. A far more elegant
 solution is to let Apache do all the work for you with its internal
 redirection mechanism. In this scheme, Apache handles the entire
 redirection internally. It pretends that the web browser made the
 request for the new URI and sends the contents of the file, without
 letting the browser in on the secret. It is functionally equivalent to
 the solution that we showed at the end of the preceding section. 
 
 
 To invoke the Apache internal redirection system, modify the last two lines of Apache::RandPicture 's handler( ) subroutine to read like this: 
 
     $r->internal_redirect($lucky_one); 
     return OK;
 
 
 The request object's internal_redirect( ) method takes a single argument consisting of an absolute local URI (one starting with a / ).
 The method does all the work of translating the URI, invoking its
 content handler, and returning the file contents, if any. Unfortunately
 internal_redirect( ) returns no result code,
 so there's no way of knowing whether the redirect was successful (you
 can't do this from a conventional redirect either). However, the call
 will return in any case, allowing you to do whatever cleanup is needed.
 You should exit the handler with a result code of OK. 
 
 
 In informal benchmarks, replacing the basic Apache::RandPicture
 with a version that uses internal redirection increased the throughput
 by a factor of two, exactly what we'd expect from halving the number of
 trips through the network. In contrast, replacing all the MIME type
 lookups with a simpler direct grep for image file extensions had
 negligible effect on the speed of the module. Apache's subrequest
 mechanism is very efficient. 
 
 If you have very many images in the random pictures
 directory (more than a few hundred), iterating through the directory
 listing each time you need to fetch an image will result in a
 noticeable performance hit. In this case, you'll want to cache the
 directory listing in a package variable the first time you generate it
 and only rebuild the listing when the directory's modification time
 changes (or just wait for a server restart, if the directory doesn't
 change often). You could adapt the Apache::ESSI caching system for this purpose. 
 
 
 Internal redirection is a win for most cases when you
 want to redirect the browser to a different URI on your own site. Be
 careful not to use it for external URIs, however. For these, you must
 either use standard redirection or invoke Apache's proxy API (Chapter
 7). 
 
 
 When you use internal redirection to pass control from
 one module to another, the second module in the chain can retrieve the
 original query string, the document URI, and other information about
 the original request by calling the request object's prev( ) method or, in Apache::Registry scripts only, by examining certain environment variables. There is also a way, using Apache::err_header_out( )
 for the original module to set various HTTP header fields, such as
 cookies, that will be transferred to the second across the internal
 redirect. Because internal redirects are most commonly used in error
 handlers, these techniques are discussed in the section "Handling Errors" later in this chapter.
 
 
 Processing Input 
 
 
 You can make the virtual documents generated by the
 Apache API interactive in exactly the way that you would documents
 generated by CGI scripts. Your module will generate an HTML form for
 the user to fill out. When the user completes and submits the form,
 your module will process the parameters and generate a new document,
 which may contain another fill-out form that prompts the user for
 additional information. In addition, you can store information inside
 the URI itself by placing it in the additional path information part. 
 
 
 CGI Parameters 
 
 
 When a fill-out form is submitted, the contents of its
 fields are turned into a series of name=value parameter pairs that are
 available for your module's use. Unfortunately, correctly processing
 these parameter pairs is annoying because, for a number of historical
 reasons, there are a variety of formats that you must know about and
 deal with. The first complication is that the form may be submitted
 using either the HTTP GET or POST method. If the GET method is used,
 the URI encoded parameter pairs can be found separated by ampersands in
 the "query string," the part of the URI that follows the ? character: 
 
 http://your.site/uri/path?name1=val1&name2=val2&name3=val3...
 
 
 To recover the parameters from a GET request, mod_perl users should use the request object's args( )
 method. In a scalar context this method returns the entire query
 string, ampersands and all. In an array context, this method returns
 the parsed name=value pairs; however, you will still have to do further
 processing in order to correctly handle multivalued parameters. This
 feature is only found in the Perl API. Programmers who use the C API
 must recover the query string from the request object's args field and do all the parsing manually. 
 
 
 If the client uses the POST method to submit the
 fill-out form, the parameter pairs can be found in something called the
 "client block." C API users must call three functions named setup_client_block( ), should_client_block( ), and get_client_block( ) in order to retrieve the information.   
 
 
 While these methods are also available in the Perl API, mod_perl users have an easier way: they need only call the request object's content( ) method to retrieve the preparsed list of name=value pairs. However, there's a catch: this only works for the older application/x-www-form-urlencoded style of parameter encoding. If the browser uses the newer multipart/form-data encoding (which is used for file uploads, among other things), then mod_perl users will have to read and parse the content information themselves. read( )
 will fetch the unparsed content information by looping until the
 requested number of bytes have been read (or a predetermined timeout
 has occurred). Fortunately, there are a number of helpful modules that
 allow mod_perl programmers to accept file uploads without parsing the data themselves, including CGI.pm and Apache::Request, both of which we describe later. 
 
 
 To show you the general technique for prompting and processing user input, Example 4-10 gives a new version of Apache::Hello. It looks for a parameter named user_name
 and displays a customized welcome page, if present. Otherwise, it
 creates a more generic message. In both cases, it also displays a
 fill-out form that prompts the user to enter a new value for user_name.
 When the user presses the submission button labeled "Set Name," the
 information is POSTed to the module and the page is redisplayed (Figure 4-4).  
 
 
 
 Figure 4-4. 
 The Apache::Hello2 module can process user input.  
 
 
 
  
 
 
 
 
 
 The code is very simple. On entry to handler( ) the module calls the request object's method( )
 method to determine whether the handler was invoked using a POST
 request, or by some other means (usually GET). If the POST method was
 used, the handler calls the request object's content( ) method to retrieve the posted parameters. Otherwise, it attempts to retrieve the information from the query string by calling args( ). The parsed name=value pairs are now stuffed into a hash named %params for convenient access. 
 
 
 Having processed the user input, if any, the handler retrieves the value of the user_name parameter from the hash and stores it in a variable. If the parameter is empty, we default to "Unknown User." 
 
 
 The next step is to generate the document. We set the content type to text/html as before and emit the HTTP header. We again call the request object's header_only( ) to determine whether the client has requested the entire document or just the HTTP header information. 
 
 
 This is followed by a single long Apache::print( )
 statement. We create the HTML header and body, along with a suitable
 fill-out form. Notice that we use the current value of the user name
 variable to initialize the appropriate text field. This is a frill that
 we have always thought was kind of neat. 
 
 
 
 Example 4-10: 
 Processing User Input with the Apache Perl API  
 
 
 package Apache::Hello2; 
 # file: Apache/Hello2.pm 
 use strict; 
 use Apache::Constants qw(:common); 
   
 sub handler { 
     my $r = shift; 
     my %params = $r->method eq 'POST' ? $r->content : $r->args; 
     my $user_name = $params{'user_name'} || 'Unknown User';
   
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>Hello There</TITLE> 
 </HEAD> 
 <BODY> 
 <H1>Hello $user_name</h2> 
 Who would take this book seriously if the first example didn\'t 
 say "hello $user_name"? 
 <HR> 
 <FORM METHOD="POST"> 
 Enter your name: <INPUT TYPE="text" NAME="user_name" VALUE="$user_name"> 
 <INPUT TYPE="submit" VALUE="Set Name"> 
 </FORM> 
 </BODY> 
 </HTML> 
 END
   
     return OK; 
 } 
   
 1; 
 _    _END_    _
 
 
 
 
 
 A perl.conf entry to go with it might read: 
 
 <Location /hello/friend> 
  SetHandler  perl-script 
  PerlHandler Apache::Hello2 
 </Location>
 
 
 This method of processing user input is only one of
 several equally valid alternatives. For example, you might want to work
 with query string and POSTed parameters simultaneously, to accommodate
 this type of fill-out form: 
 
 <FORM ACTION="/hello/friend?day=saturday" METHOD="POST"> 
    <INPUT TYPE="text" NAME="user_name"> 
    <INPUT TYPE="submit"> 
 </FORM>
 
 
 In this case, you could recover the values of both the day and user_name parameters using a code fragment like this one: 
 
 my %params = ($r->args, $r->content);
 
 
 If the same parameter is present in both the query
 string and the POSTed values, then the latter will override the former.
 Depending on your application's logic, you might like this behavior.
 Alternatively, you could store the two types of parameter in different
 places or take different actions depending on whether the parameters
 were submitted via GET or POST. For example, you might want to use
 query string parameters to initialize the default values of the
 fill-out form and enter the information into a database when a POST
 request is received. 
 
 
 When you store the parsed parameters into a hash, you
 lose information about parameters that are present more than once. This
 can be bad if you are expecting multivalued parameters, such as those
 generated by a selection list or a series of checkboxes linked by the
 same name. To keep multivalued information, you need to do something
 like this: 
 
 my %params; 
 my @args = ($r->args, $r->content); 
 while (my($name,$value) = splice @args,0,2) { 
   push @{$params{$name}}, $value; 
 }
 
 
 This bit of code aggregates the GET and POST parameters into a single array named @args.
 It then loops through each name=value pair, building up a hash in which
 the key is the parameter name and the value is an array reference
 containing all the values for that parameter. This way, if you have a
 selection list that generates query strings of the form: 
 
 vegetable=kale&vegetable=broccoli&vegetable=carrots
 
 
 you can recover the complete vegetable list in this manner: 
 
 @vegetables = @{$params{'vegetable'}};
 
 
 An alternative is to use a module that was still in development at the time this chapter was written. This module, named Apache::Request,
 uses the CGI.pm-style method calls to process user input but does so
 efficiently by going directly to the request object. With this module,
 the user input parameters are retrieved by calling param( ). Call param( ) without any arguments to retrieve a list of all the parameter names. Call param( )
 with a parameter name to return a list of the values for that parameter
 in an array context, and the first member of the list in a scalar
 context. Unlike the vanilla request object, input of type multipart/form-data is handled correctly, and uploaded files can be recovered too (using the same API as CGI.pm). 
 
 
 To take advantage of Apache::Request in our "Hello World" module, we modify the top part of the module to read as follows: 
 
 package Apache::Hello3; 
 # file: Apache/Hello3.pm
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Apache::Request;
   
 sub handler { 
    my $r = Apache::Request->new(shift); 
    my $user_name = $r->param('user_name') || 'Unknown User'; 
    $r->content_type('text/html'); 
    $r->print(<<END); 
 Who cares if every single example 
 says "Hello World"???! 
 END 
 ; 
 ...
 
 
 The main detail here is that instead of retrieving the request object directly, we wrap it inside an Apache::Request object. Apache::Request adds param( ) and a few other useful methods and inherits all other method calls from the Apache class. More information will be found in the Apache::Request manual page when that package is officially released. 
 
 
 Like CGI.pm, Apache::Request
 allows you to handle browser file uploading, although it is somewhat
 different in detail from the interface provided in CGI.pm versions 2.46
 and lower (the two libraries have been brought into harnony in Version
 2.47). As in ordinary CGI, you create a file upload field by defining
 an <INPUT> element of type "file" within a <FORM> section
 of type "multipart/form-data". After the form is POSTed, you retrieve
 the file contents by reading from a filehandle returned by the Apache::Request upload( ) method. This code fragment illustrates the technique:
 
 my $r = Apache::Request->new(shift);
 my $moose = 0;
 my $uploaded_file = $r->upload('uploaded-file');
 my $uploaded_name = $r->param('uploaded-file');
 while (<$uploaded_file>) {
     $moose++ if /moose/;
 }
 print "$moose moose(s) found in $uploaded_name\n";
 
 
 Additional Path Information 
 
 
 Recall that after Apache parses an incoming URI to
 figure out what module to invoke, there may be some extra bits left
 over. This extra stuff becomes the "additional path information" and is
 available for your module to use in any way it wishes. Because it is
 hierarchical, the additional path information part of the URI follows
 all the same relative path rules as the rest of the URI. For example, ..
 means to move up one level. For this reason, additional path
 information is often used to navigate through a virtual document tree
 that is dynamically created and maintained by a CGI script or module.
 However, you don't have to take advantage of the hierarchical nature of
 path information. You can just use it as a handy place to store
 variables. In the next chapter, we'll use additional path information
 to stash a session identifier for a long-running web application. 
 
 
 Apache modules fetch additional path information by calling the request object's path_info( ) method. If desired, they can then turn the path information into a physical filename by calling lookup_uri( ). 
 
 
 An example of how additional path information can be used as a virtual document tree is shown in Example 4-11, which contains the code for Apache::TreeBrowser.
 This module generates a series of documents that are organized in a
 browseable tree hierarchy that is indistinguishable to the user from a
 conventional HTML file hierarchy. However, there are no physical files.
 Instead, the documents are generated from a large treelike Perl data
 structure that specifies how each "document" should be displayed. Here
 is an excerpt: 
 
 'bark'=>{ 
     -title=>'The Wrong Tree', 
     -contents=>'His bark was worse than his bite.', 
     'smooth'=>{ 
        -title=>'Like Butter', 
        -contents=>'As smooth as silk.' 
        }, 
     'rough'=>{ 
        -title=>'Ruffled', 
        -contents=>"Don't get rough with me." 
        }, 
  }...
 
 
 In this bit of the tree, a document named "bark" has
 the title "The Wrong Tree" and the contents "His bark was worse than
 his bite." Beneath this document are two subdocuments named "smooth"
 and "rough." The "smooth" document has the title "Like Butter" and the
 contents "As smooth as silk." The "rough" document is similarly silly.
 These subdocuments can be addressed with the additional path
 information /bark/smooth and /bark/rough, respectively. The parent document, naturally enough, is addressed by /bark. Within the module, we call each chunk of this data structure a "node." 
 
 
 Using the information contained in the data structure, Apache::TreeBrowser
 constructs the document and displays its information along with a
 browseable set of links organized in hierarchical fashion (see Figure 4-5).
 As the user moves from document to document, the currently displayed
 document is highlighted--sort of a hierarchical navigation bar! 
 
 
 
 Figure 4-5. 
 Apache::TreeBrowser creates a hierarchical navigation tree. 
 
 
 
  
 
 
 
 
 
 The source code listing is long, so we'll run through it a chunk at a time: 
 
 package Apache::TreeBrowser; 
   
 use strict; 
 use Apache::Constants qw(:common REDIRECT); 
   
 my $TREE = make_tree();
   
 sub handler { 
     my $r = shift;
 
 
 The module starts by importing the usual Apache constants and the REDIRECT result code. It then creates the browseable tree by calling an internal subroutine named make_tree( ) and stores the information in a package global named $TREE.
 In a real-life application, this data structure would be created in
 some interesting way, for example, using a query on a database, but in
 this case make_tree( ) just returns the hardcoded data structure that follows the _    _DATA_    _ token at the end of the code. 
 
     my $path_info = $r->path_info; 
     my $path_translated = $r->lookup_uri($path_info)->filename; 
     my $current_uri = $r->uri;
 
 
 Now's the time to process the additional path
 information. The handler fetches the path information by calling the
 request object's path_info( ) method and fetches the module's base URI by calling uri( ). Even though we won't be using it, we transform the additional path information into a physical pathname by calling lookup_uri( ) and filename( ). This is useful for seeing how Apache does URI translation. 
 
    unless ($path_info) { 
        $r->header_out(Location => "$current_uri/"); 
        return REDIRECT; 
    }
 
 
 For this module to work correctly, some additional path information has to be provided, even if it's only a /
 character. If we find that the additional path information is empty, we
 rectify the situation by redirecting the browser to our URI with an
 additional / appended to the end. This is
 similar to the way that Apache redirects browsers to directories when
 the terminal slash is omitted. 
 
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
     my($junk, @components) = split "/", $path_info;
   
     # follow the components down 
     my($node, $name) = ($TREE, ''); 
     foreach (@components) { 
        last unless $node->{$_}; 
        $name = $_; 
        $node = $node->{$_}; 
     }
 
 
 At this point we begin to construct the document. We set the content type to text/html, send out the HTTP header, and exit if header_only( )
 returns true. Otherwise, we split the path information into its
 components and then traverse the tree, following each component name
 until we either reach the last component on the list or come to a
 component that doesn't have a corresponding entry in the tree (which
 sometimes happens when users type in the URI themselves). By the time
 we reach the end of the tree traversal, the variable $node
 points to the part of the tree that is referred to by the additional
 path information or, if the path information wasn't entirely correct,
 to the part of the tree corresponding to the last valid path component.
 
 
     $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>$node->{-title}</TITLE> 
 </HEAD> 
 <BODY BGCOLOR="white"> 
 <H1>$node->{-title}</h2> 
   
 Contents = <b>$node->{-contents}</b> 
   
 <H2>Navigation Tree</H2> 
 END 
     my $prefix = "../" x @components; 
     print $prefix ?  
        qq(<H3><A HREF="$prefix">Tree Root</A></H3>\n) :  
        qq(<H3><FONT COLOR="red">Tree Root</FONT></H3>);
 
 
 We now call print( ) to print
 out the HTML document. We first display the current document's title
 and contents. We then print a hyperlink that points back to the "root"
 (really the top level) of the tree. Notice how we construct this link
 by creating a relative URI based on the number of components in the
 additional path information. If the additional path information is
 currently /bark/rough/cork, we construct a link whose HREF is ../../../. Through the magic of relative addressing, this will take us back to the root / document. 
 
     print_node('', $TREE, $node, $prefix); 
     print qq(<A HREF="../">Go up one level</A><P>) if $name;
 
 
 The next task is to construct the hierarchical navigation system shown in Figure 4-5. We do this by calling print_node( ), an internal function. This is followed by a link to the next-higher document, which is simply the relative path ../.  
 
     $r->print(<<END); 
 Node = <EM>$name</EM><br> 
 URI = <EM>$current_uri</EM><br> 
 Path information =<EM>$path_info</EM><br> 
 Translated path = <EM>$path_translated</EM> 
 </BODY> 
 </HTML> 
 END 
   
     return OK; 
 }
 
 
 Last, we print out some more information about the
 current document, including the internal name of the document, the
 current URI, the additional path information, and the translated path
 information. 
 
 
 Let's now look at the print_node( ) subroutine: 
 
 sub print_node { 
     my ($name, $node, $current, $prefix) = @_; 
     my (@branches) = grep !/^-/, sort keys %$node; 
     if ($name) { 
        # print the node itself 
        print $node != $current ? 
            qq(<LI><A HREF="$prefix$name/">$name</A></LI>\n) :
               qq(<LI><FONT COLOR="red">$name</FONT></LI>\n); 
        # print branches underneath it 
        $prefix .= "$name/"; 
     } 
     return unless @branches; 
     print "<UL>\n"; 
     foreach (@branches) { 
        print_node($_, $node->{$_}, $current, $prefix); 
     } 
     print "</UL>\n"; 
 }
 
 
 This subroutine is responsible for displaying a tree
 node as a nested list. It starts by finding all the branches beneath
 the requested node, which just happens to be all the hash keys that
 don't begin with a hyphen. It then prints out the name of the node. If
 the node being displayed corresponds to the current document, the name
 is surrounded by <FONT> tags to display it in red. Otherwise, the
 node name is turned into a hyperlink that points to the appropriate
 document. Then, for each subdocument beneath the current node, it
 invokes itself recursively to display the subdocument. The most obscure
 part of this subroutine is the need to append a $prefix variable to each URI the routine generates. $prefix contains just the right number of ../ sequences to make the URIs point to the root of the virtual document tree. This simplifies the program logic. 
 
 
 The last function in this module is make_tree( ). It simply reads in the text following the _    _DATA_    _ token and eval( ) s it, turning it into a Perl data structure: 
 
 sub make_tree { 
     local $/; 
     my $data = <DATA>; 
     eval $data; 
 }
   
 1; 
  _    _DATA_    _  
 
 
 Example 4-11: 
 Using Path Information to Browse a Tree  
 
 
 package Apache::TreeBrowser; 
 # file: Apache/TreeBrowser.pm 
   
 use strict; 
 use Apache::Constants qw(:common REDIRECT); 
   
 my $TREE = make_tree(); 
   
 sub handler { 
     my $r = shift; 
     my $path_info = $r->path_info; 
     my $path_translated = $r->lookup_uri($path_info)->filename; 
     my $current_uri = $r->uri; 
     unless ($path_info) { 
        $r->header_out(Location => "$current_uri/"); 
        return REDIRECT; 
     } 
      
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
     my($junk, @components) = split "/", $path_info; 
      
     # follow the components down 
     my($node, $name) = ($TREE, ''); 
     foreach (@components) { 
        last unless $node->{$_}; 
        $name = $_; 
        $node = $node->{$_}; 
     } 
      
     $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>$node->{-title}</TITLE> 
 </HEAD> 
 <BODY BGCOLOR="white"> 
 <H1>$node->{-title}</h2> 
   
 Contents = <b>$node->{-contents}</b> 
   
 <H2>Navigation Tree</H2> 
 END 
                                         
     my $prefix = "../" x @components; 
     print $prefix ?  
        qq(<H3><A HREF="$prefix">Tree Root</A></H3>\n) :  
        qq(<H3><FONT COLOR="red">Tree Root</FONT></H3>); 
      
     print_node('', $TREE, $node, $prefix); 
     print qq(<A HREF="../">Go up one level</A><P>) if $name; 
      
     $r->print(<<END); 
 Node = <EM>$name</EM><br> 
 URI = <EM>$current_uri</EM><br> 
 Path information =<EM>$path_info</EM><br> 
 Translated path = <EM>$path_translated</EM> 
 </BODY> 
 </HTML> 
 END 
   
     return OK; 
 } 
   
 sub print_node { 
     my ($name, $node, $current, $prefix) = @_; 
     my (@branches) = grep !/^-/, sort keys %$node; 
     if ($name) { 
        # print the node itself 
        print $node != $current ? 
            qq(<LI><A HREF="$prefix$name/">$name</A></LI>\n) : 
                qq(<LI><FONT COLOR="red">$name</FONT></LI>\n); 
        # print branches underneath it 
        $prefix .= "$name/"; 
     } 
     return unless @branches; 
     print "<UL>\n"; 
     foreach (@branches) { 
        print_node($_, $node->{$_}, $current, $prefix); 
     } 
     print "</UL>\n"; 
 } 
   
 # create a sample tree to browse 
 sub make_tree { 
     local $/; 
     my $data = <DATA>; 
     eval $data; 
 } 
   
 _    _DATA_    _ 
 return { 
     -title => 'The Root of All Evil', 
     -contents => 'And so it begins...', 
     'bark' => { 
        -title => 'The Wrong Tree', 
        -contents => 'His bark was worse than his bite.', 
        'smooth' => { 
            -title => 'Like Butter', 
            -contents => 'As smooth as silk.', 
        }, 
        'rough' => { 
            -title => 'Ruffled', 
            -contents => "Don't get rough with me.", 
            'cork' => { 
                -title => 'Corked', 
                -contents => "Corks don't grow on trees...or do they?", 
            }, 
            'cinnamon' => { 
                -title => 'The Cinnamon Tree', 
                -contents => 'Little bird, little bird in the cinnamon tree...', 
            }, 
        } 
     }, 
     'bough' => { 
        -title => 'Stealing a Bough', 
        -contents => "I've taken a bough of silence.", 
        'forked' => { 
            -title => 'Forked Boughs', 
            -contents => 'What lightning and snakes\' tongues have in common.', 
        }, 
        'straight' => { 
            -title => 'Single Boughs', 
            -contents => 'Straight, but not narrow.', 
        }, 
        'extra' => { 
            -title => 'Take a Bough', 
            -contents => 'Nothing beats that special feeling,  
                               when you are stealing that extra bough!', 
        }, 
     }, 
     'branch' => { 
        -title => 'The Branch Not Taken', 
        -contents => 'Branch or be branched.', 
        'twig' => { 
            -title => 'Twiggy', 
            -contents => 'Anorexia returns!', 
            'twiglet' => { 
                -title => 'The Leastest Node', 
                -contents => 'Winnie the Pooh, Eeyore, and Twiglet.', 
            }, 
        }, 
        'leaf' => { 
            -title => 'Leaf me Alone!', 
            -contents => 'Look back, Leaf Ericksonn.', 
        } 
     }, 
 }
 
 
 
 
 
 
 Here is a sample configuration file entry to go with Apache::TreeBrowser :
 
 <Location /virtual> 
 SetHandler perl-script 
 PerlHandler Apache::TreeBrowser 
 </Location>
 
 
 Apache::Registry 
 
 
 If you are using mod_perl to write Apache modules, then you probably want to take advantage of Apache::Registry. Apache::Registry
 is a prewritten Apache Perl module that is a content handler for files
 containing Perl code. In addition to making it unnecessary to restart
 the server every time you revise a source file, Apache::Registry
 sets up a simulated CGI environment, so that programs that expect to
 get information about the transaction from environment variables can
 continue to do so. This allows legacy CGI applications to run under the
 Apache Perl API, and lets you use server-side code libraries (such as
 the original CGI.pm) that assume the script is running in a CGI
 environment. 
 
 
 Apache::Registry is similar
 in concept to the content filters we created earlier in this chapter,
 but instead of performing simple string substitutions on the contents
 of the requested file, Apache::Registry compiles and executes the code contained within it. In order to avoid recompiling the script each time it's requested, Apache::Registry
 caches the compiled code and checks the file modification time each
 time it's requested in order to determine whether it can safely use the
 cached code or whether it must recompile the file. Should you ever wish
 to look at its source code, Apache::Registry is a good example of a well-written Apache content handler that exercises much of the Perl API. 
 
 
 We created a typical configuration file entry for Apache::Registry in Chapter 2. Let's examine it in more detail now. 
 
 Alias /perl/ /usr/local/apache/perl/ 
 <Location /perl> 
    SetHandler     perl-script 
    PerlHandler    Apache::Registry 
    PerlSendHeader On 
    Options        +ExecCGI 
 </Location>
 
 
 The Alias directive simply maps the physical directory /usr/local/apache/perl/ to a virtual directory named /perl. The <Location> section is more interesting. It uses SetHandler to make perl-script the content handler for this directory and sets Apache::Registry to be the module to handle requests for files within this part of the document tree. 
 
 
 The PerlSendHeader On line tells mod_perl to intercept anything that looks like a header line (such as Content-Type: text/html )
 and to automatically turn it into a correctly formatted HTTP/1.0 header
 the way that Apache does with CGI scripts. This allows you to write
 scripts without bothering to call the request object's send_http_header( ) method. Like other Apache::Registry features, this option makes it easier to port CGI scripts to the Apache API. If you use CGI.pm's header( ) function to generate HTTP headers, you do not need to activate this directive because CGI.pm detects mod_perl and calls send_http_header( ) for you. However, it does not hurt to use this directive anyway. 
 
 
 Option +ExecCGI ordinarily tells Apache that it's all right for the directory to contain CGI scripts. In this case the flag is required by Apache::Registry to confirm that you really know what you're doing. In addition, all scripts located in directories handled by Apache::Registry must be executable--another check against accidentally leaving wayward nonscript files in the directory. 
 
 
 When you use Apache::Registry,
 you can program in either of two distinct styles. You can choose to
 ignore the Apache Perl API entirely and act as if your script were
 executed within a CGI environment, or you can ignore the CGI
 compatibility features and make Apache API calls. You can also combine
 both programming styles in a single script, although you run the risk
 of confusing yourself and anyone else who needs to maintain your code! 
 
 
 A typical example of the first style is the hello.pl script (Example 4-12),
 which you also saw in Chapter 2. The interesting thing about this
 script is that there's nothing Apache-specific about it. The same
 script will run as a standard CGI script under Apache or any other web
 server. Any library modules that rely on the CGI environment will work
 as well. 
 
 
 
 Example 4-12: 
 An Apache::Registry Script That Uses CGI-Compatibility Mode  
 
 
 #!/usr/local/bin/perl 
 # file: hello.pl
   
 print "Content-Type: text/html\n\n";
   
 print <<END; 
 <HTML> 
 <HEAD> 
 <TITLE>Hello There</TITLE> 
 </HEAD> 
 <BODY> 
 <H1>Hello $ENV{REMOTE_HOST}</h2> 
 Who would take this book seriously if the examples 
 didn't say "hello world" in at least four different ways? 
 </BODY> 
 </HTML> 
 END
 
 
 
 
 
 Example 4-13 shows the same script rewritten more compactly by taking advantage of the various shortcuts provided by the CGI.pm module. 
 
 
 
 Example 4-13: 
 An Apache::Registry Script That Uses CGI.pm  
 
 
 #!/usr/local/bin/perl 
 # file: hello2.pl
   
 use CGI qw(:standard); 
 print header, 
    start_html('Hello There'), 
    h1('Hello',remote_host()), 
    'Who would take this book seriously if the examples', 
    'didn\'t say "hello world" in at least four different ways?', 
    end_html;
 
 
 
 
 
 In contrast, Example 4-14 shows the script written in the Apache Perl API style. If you compare the script to Example 4-7,
 which used the vanilla API to define its own content handler, you'll
 see that the contents of this script (with the exception of the #! line at the top) are almost identical to the body of the handler( )
 subroutine defined there. The main difference is that instead of
 retrieving the Apache request object from the subroutine argument list,
 we get it by calling Apache->request(). request( ) is a static (class) method in the Apache package where the current request object can always be found. 
 
 
 There are also some subtle differences between Apache::Registry scripts that make Apache API calls and plain content handlers. One thing to notice is that there is no return value from Apache::Registry scripts. Apache::Registry normally assumes an HTTP status code of 200 (OK). However, you can change the status code manually by calling the request object's status( ) method to change the status code before sending out the header: 
 
 $r->status(404);  # forbidden
 
 
 Strictly speaking, it isn't necessary to call send_http_header( ) if you have PerlSendHeader On. However, it is good practice to do so, and it won't lead to redundant headers being printed. 
 
 
 Alternatively, you can use the CGI compatibility mode to set the status by printing out an HTTP header that contains a Status: field: 
 
 print "Status: 404 Forbidden\n\n";
 
 
 Another subtle difference is that at least one of the command-line switches that may be found on the topmost #! line is significant. The -w switch, if present, will signal Apache::Registry to turn on Perl warnings by setting the $^W global to a true value. Another common switch used with CGI scripts is -T,
 which turns on taint checking. Currently, taint checking can be
 activated for the Perl interpreter as a whole only at server startup
 time by setting the configuration directive PerlTaintCheck On. However, if Apache::Registry notices -T on the #! line and taint checks are not activated, it will print a warning in the server error log. 
 
 
 Since Apache::Registry scripts can do double duty as normal CGI scripts and as mod_perl
 scripts, it's sometimes useful for them to check the environment and
 behave differently in the two situations. They can do this by checking
 for the existence of the environment variable MOD_PERL or for the value of GATEWAY_INTERFACE. When running under mod_perl, GATEWAY_INTERFACE will be equal to CGI-Perl/1.1. Under the normal CGI interface, it will be CGI/1.1. 
 
 
 
 Example 4-14: 
 An Apache::Registry Script That Uses the Apache API  
 
 
 #!/usr/local/bin/perl 
 # file: hello3.pl
   
 use strict;
   
 my $r = Apache->request; 
 $r->content_type('text/html'); 
 $r->send_http_header;
 return OK unless $r->header_only; 
  
 my $host = $r->get_remote_host; 
 $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>Hello There</TITLE> 
 </HEAD> 
 <BODY> 
 <H1>Hello $host</h2> 
 Enough with the "Hello worlds" already! 
 </BODY> 
 </HTML> 
 END
 
 
 
 
 
 A Useful Apache::Registry Application 
 
 
 All the Apache::Registry
 examples that we've seen so far have been short and, frankly, silly.
 Now let's look at an example of a real-world script that actually does
 something useful. The guestbook script (Example 4-15),
 as its name implies, manages a typical site guestbook, where visitors
 can enter their names, email addresses, and comments. It works well as
 both a standalone CGI script and a mod_perl Apache::Registry script, automatically detecting when it is running under the Apache Perl API in order to take advantage of mod_perl 's
 features. In addition to showing you how to generate a series of
 fill-out forms to handle a moderately complex user interaction, this
 script demonstrates how to read and update a file without the risk of
 several instances of the script trying to do so simultaneously. 
 
 
 Unlike some other guestbook programs, this one doesn't
 append users' names to a growing HTML document. Instead, it maintains a
 flat file in which each user's entry is represented as a single line in
 the file. Tabs separate the five fields, which are the date of the
 entry, the user's name, the user's email address, the user's location
 (e.g., city of residence), and comments. Nonalphanumeric characters are
 URL-escaped to prevent the format from getting messed up if the user
 enters newlines or tabs in the fields, giving records that look like: 
 
 05/07/98  JR  jr_ewing%40dallas.com  Dallas,%20TX  Like%20the%20hat
 
 
 When the script is first called, it presents the user
 with the option of signing the guestbook file or looking at previous
 entries (Figure 4-6).   
 
 
 
 Figure 4-6. 
 The Apache::Registry guestbook script generates its own fill-out form.  
 
 
 
  
 
 
 
 
 
 If the user presses the button labeled "Sign
 Guestbook," a confirmation page appears, which echoes the entry and
 prompts the user to edit or confirm it (Figure 4-7).   
 
 
 
 Figure 4-7. 
 The confirmation page generated by guestbook  
 
 
 
  
 
 
 
 
 
 Pressing the "Change Entry" button takes the user back
 to the previous page with the fields filled in and waiting for the
 user's changes. Pressing "Confirm Entry" appends the user's entry to
 the guestbook file and displays the whole file (Figure 4-8).   
 
 
 
 Figure 4-8. 
 The listing of previous guestbook entries generated by guestbook  
 
 
 
  
 
 
 
 
 
 Turning to the source code, the script begins by importing functions from a variety of modules, including CGI.pm, IO::File, Fcntl, and POSIX : 
 
 use strict; 
 use CGI qw(:standard :html3 escape unescape escapeHTML); 
 use IO::File (); 
 use Fcntl qw(:flock); 
 use POSIX qw(strftime); 
 use vars qw(@FIELDS %REQUIRED %BIG $GUESTBOOKFILE);
   
 @FIELDS = qw(name e-mail location comments); 
 %REQUIRED = ('name' => 1, 'e-mail' => 1); 
 %BIG = ('comments' => 1);
 
 
 The script then defines some constants. @FIELDS
 is an array of all the fields known to the guestbook. By changing the
 contents of this array you can generate different fill-out forms. %REQUIRED is a hash that designates certain fields as required, in this case name and e-mail.
 The script will refuse to add an entry to the guestbook until these
 fields are filled out (however, no error checking on the contents of
 the fields is done). %BIG is a hash containing the names of fields that are displayed as large text areas, in this case comments. Other fields are displayed as one-line text entries. 
 
 if ($ENV{MOD_PERL}) { 
     $GUESTBOOKFILE = Apache->request->dir_config('GuestbookFile'); 
 } 
 $GUESTBOOKFILE ||= "/usr/tmp/guestbookfile.txt";
 
 
 Next the script checks if it is running under mod_perl by checking for the MOD_PERL environment variable. If the script finds that it is running under mod_perl, it fetches the Apache request object and queries the object for a per-directory configuration variable named GuestbookFile.
 This contains the physical pathname of the file where the guestbook
 entries are stored. If the script is a standalone CGI script, or if no GuestbookFile configuration variable is defined, the script defaults to a hardcoded file path. In the case of Apache::Registry scripts, the PerlSetVar directive used to set per-directory configuration variables must be located in a .htaccess file in the same directory as the script. 
 
 print header, 
     start_html(-title => 'Guestbook', -bgcolor => 'silver'), 
     h1("Guestbook");
 
 
 The script now begins to generate the document by
 calling shortcut functions defined in the CGI module to generate the
 HTTP header, the HTML header and title, and a level 1 heading of
 "Guestbook." 
 
  CASE: { 
      $_ = param('action'); 
      /^sign/i and do    { sign_guestbook(); last CASE; }; 
      /^confirm/i and do { write_guestbook() and view_guestbook(); last CASE; }; 
      /^view/i and do    { view_guestbook(1); last CASE; }; 
      generate_form(); 
  }
 
 
 We now enter the variable part of the script. Depending
 on what phase of the transaction the user is in, we either want to
 prompt the user to fill out the guestbook form, confirm an entered
 entry, or view the entire guestbook. We distinguish between the phases
 by looking at the contents of a script parameter named action. If action equals sign,
 we know that the user has just completed the fill-out form and pressed
 the "Sign Guestbook" button, so we jump to the routine responsible for
 this part of the transaction. Similarly, we look for action values of confirm and view, and jump to the appropriate routines for these actions. If action is missing, or if it has some value we don't expect, we take the default action of generating the fill-out form. 
 
 print end_html; 
 exit 0;
 
 
 Having done its work, the script prints out the </HTML> tag and exits. 
 
 sub generate_form { 
     my @rows; 
     for my $field (@FIELDS) { 
        my $title = "Your $field"; 
        $title .= " (optional)" if !$REQUIRED{$field}; 
        my $element = $BIG{$field} ?  
            textarea(-name => $field, 
                     -rows => 4, 
                     -columns => 50, 
                     -wrap => 1) 
                : textfield(-name => $field, -size => 50); 
        push @rows, th($title) . td($element); 
     } 
     print start_form, 
     table(TR{-align => 'LEFT'}, \@rows), 
     br, 
     submit(-name => 'action', -value => 'View Guestbook'), 
     submit(-name => 'action', -value => 'Sign Guestbook'), 
     end_form; 
 }
 
 
 The subroutine responsible for generating the form is named, appropriately enough, generate_form( ). It iterates over @FIELDS
 and dynamically generates a text label and a form element for each
 field, modifying the format somewhat based on whether the field is
 marked optional or big. Each label/field pair is pushed onto a list
 named @rows. When the loop is finished, @rows
 is turned into a nicely formatted table using CGI.pm's table-generation
 shortcuts. The "View Guestbook" and "Sign Guestbook" buttons are added
 to the form, and the routine finishes. 
 
 sub sign_guestbook { 
     my @missing = check_missing(@FIELDS); 
     if (@missing) { 
        print_warning(@missing); 
        generate_form(); 
        return; 
     }
 
 
 sign_guestbook( ) has a
 slightly more complex job. Its first task is to check the submitted
 form for missing required fields by calling the internal subroutine check_missing( ). If any are missing, it displays the missing fields by calling another internal subroutine, print_warning( ), and then invokes generate_form( )
 to redisplay the form with its current values. No particular
 hocus-pocus is required to display the partially completed form
 correctly; this is just one of the beneficial side effects of CGI.pm's
 "sticky forms" feature. 
 
     my @rows; 
     foreach (@FIELDS) { 
        push @rows, TR(th({-align=>'LEFT'},$_),  
                       td(escapeHTML(param($_)))); 
     } 
     print "Here is your guestbook entry.  Press ", 
     em('Confirm')," to save it, or ",em('Change'), 
     " to change it.", 
     hr, 
     table(@rows), 
     hr;
 
 
 If all the required fields are filled in, sign_guestbook( )
 generates an HTML table to display the user's entries. The technique
 for generating the form is similar to that used in the previous
 subroutine, except that no special cases are needed for different types
 of fields. We do, however, have to be careful to call escapeHTML( )
 (a function imported from CGI.pm) in order to prevent HTML entities and
 other funny characters that the user might have entered from messing up
 the page. 
 
     print start_form; 
     foreach (@FIELDS) { 
        print hidden(-name => $_); 
     } 
     print submit(-name => 'action', 
                 -value => 'Change Entry'), 
     submit(-name => 'action', 
           -value => 'Confirm Entry'), 
     end_form; 
 }
 
 
 We end the routine by creating a short fill-out form.
 This form contains the contents of the user's guestbook entry stashed
 into a series of hidden fields, and push buttons labeled "Change Entry"
 and "Confirm Entry." We hide the guestbook entry information in this
 way in order to carry the information forward to the next set of pages.
 
 
 sub check_missing { 
     my %p; 
     for (@_) { ++$p{$_} if param($_) } 
     return grep !$p{$_}, keys %REQUIRED; 
 } 
   
 sub print_warning { 
     print font({-color => 'red'}, 
               'Please fill in the following fields: ', 
               em(join ', ', @_), 
               '.'); 
 }
 
 
 The check_missing( ) and print_warning( ) subroutines are short and sweet. The first routine uses the Perl grep( )
 function to check the list of provided fields against the list of
 required fields and returns a list of the truants, if any. The second
 routine accepts a list of missing fields and turns it into a warning of
 the form, "Please fill in the following fields: e-mail." For emphasis, the message is rendered in a red font (under browsers that understand the <FONT> extension). 
 
 
 The write_guestbook( ) and view_guestbook( )
 subroutines are the most complex of the bunch. The main complication is
 that, on an active site, there's a pretty good chance that a second
 instance of the script may be invoked by another user before the first
 instance has completed opening and updating the guestbook file. If the
 writes overlap, the file could be corrupted and a guestbook entry lost
 or scrambled. For this reason, it's important for the script to lock
 the file before working with it. 
 
 
 POSIX-compliant systems (which include both Unix and Windows systems) offer a simple form of advisory file locking through the flock( ) system call. When a process opens a file and flock( ) s it, no other process can flock( )
 it until the first process either closes the file or manually
 relinquishes the lock. There are actually two types of lock. A "shared"
 lock can be held by many processes simultaneously. An "exclusive" lock
 can only be held by one process at a time and prevents any other
 program from locking the file. Typically, a program that wants to read
 from a file will obtain a shared lock, while a program that wants to
 write to the file asks the system for an exclusive lock. A shared lock
 allows multiple programs to read from a file without worrying that some
 other process will change the file while they are still reading it. A
 program that wants to write to a file will call flock( )
 to obtain an exclusive lock; the call will then block until all other
 processes have released their locks. After an exclusive lock is
 granted, no other program can lock the file until the writing process
 has finished its work and released the lock. 
 
 
 It's important to realize that the flock( ) locking mechanism is advisory. Nothing prevents a program from ignoring the flock( )
 call and reading from or writing to a file without seeking to obtain a
 lock first. However, as long as only the programs you've written
 yourself attempt to access the file and you're always careful to call flock( ) before working with it, the system works just fine. 
 
 sub lock { 
     my $path = shift; 
     my $for_writing = shift; 
   
     my ($lock_type, $path_name, $description); 
     if ($for_writing) { 
        $lock_type = LOCK_EX;  
        $path_name = ">>$path";  
        $description = 'writing'; 
     }  
     else { 
        $lock_type = LOCK_SH;  
        $path_name = $path;  
        $description = 'reading'; 
     } 
   
     my $fh = IO::File->new($path_name) or 
        warn "Couldn't open $path for $description: $!", return; 
   
 # now try to lock it 
     my $success; 
     my $tries = 0; 
     while ($tries++ < 10) { 
        last if $success = flock($fh, $lock_type|LOCK_NB); 
        print p("Waiting for $description lock on guestbook file..."); 
        sleep(1);               # wait a second 
     } 
     unless ($success) { 
        warn("Couldn't get lock for $description");  
        return; 
     } 
     return $fh; 
 }
 
 
 To make life a little simpler, the guestbook script defines a utility function named lock( ) that takes care of opening and locking the guestbook file (you'll find the definition at the bottom of the source listing). lock( )
 takes two arguments: the name of the file to open and a flag indicating
 whether the file should be opened for writing. If the write flag is
 true, the function opens the file in append mode and then attempts to
 obtain an exclusive lock. Otherwise, it opens the file read only and
 tries to obtain a shared lock. If successful, the opened filehandle is
 returned to the caller. 
 
 
 The flock( ) function is used
 to obtain both types of lock. The first argument is the opened
 filehandle; the second is a constant indicating the type of lock to
 obtain. The constants for exclusive and shared locks are LOCK_EX and LOCK_SH, respectively. Both constants are imported from the Fcntl module using the :flock tag. We combine these constants with the LOCK_NB (nonblocking) constant, also obtained from Fcntl, in order to tell flock( ) to return if a lock cannot be obtained immediately. Otherwise, flock( )
 will block indefinitely until the file is available. In order to avoid
 a long wait in which the script appears to be hung, we call flock( )
 in a polling loop. If a lock cannot immediately be obtained, we print a
 warning message to the browser screen and sleep for 1 second. After 10
 consecutive failed tries, we give up and exit the script. If the lock
 is successful, we return the filehandle. 
 
 sub write_guestbook { 
     my $fh = lock($GUESTBOOKFILE, 1); 
     unless ($fh) { 
        print strong('An error occurred: unable to open guestbook file.'),p(); 
        Delete('action'); 
        print a({-href => self_url}, 'Try again'); 
        return; 
     } 
     seek ($fh,0,2);  # paranoia: seek to end of file 
     my $date = strftime('%D',localtime); 
     print $fh join("\t", $date, map {escape(param($_))} (@FIELDS)),"\n"; 
     print "Thank you, ", param('name'),", for signing the guestbook.\n"; 
     $fh->close;         
     1; 
 }
 
 To write a new entry into the guestbook, the write_guestbook( ) function calls lock( )
 with the path to the guestbook file and a flag indicating we want write
 access. If the call fails, we display an appropriate error message and
 return. Otherwise, we seek to the end of the file, just in case someone
 else wrote to the file while we were waiting for the lock. We then join
 together the current date (obtained from the POSIX strftime( )
 function) with the current values of the guestbook fields and write
 them out to the guestbook filehandle. To avoid the possibility of the
 user messing up our tab-delimited field scheme by entering tabs or
 newlines in the fill-out form, we're careful to escape the fields
 before writing them to the file. To do this, we use the map operator to pass the fields through CGI.pm's escape( ) function. This function is ordinarily used to make text safe for use in URIs, but it works just as well here. 
 
 
 After writing to the file, we're careful to close the
 filehandle. This releases the lock on the file and gives other
 processes access to it. 
 
 sub view_guestbook { 
     my $show_sign_button = shift; 
     print start_form, 
     submit(-name => 'Sign Guestbook'), 
     end_form 
        if $show_sign_button; 
     my $fh = lock($GUESTBOOKFILE, 0); 
   
     my @rows; 
     unless ($fh) { 
        print strong('An error occurred: unable to open guestbook file.'),br; 
        Delete('action'); 
        print a({-href => self_url},'Try again'); 
        return; 
     }
 
 
 The view_guestbook( )
 subroutine looks a lot like the one we just looked at but in reverse.
 It starts by creating a tiny fill-out form containing a single button
 labeled "Sign Guestbook." This button is only displayed when someone
 views the guestbook without signing it first and is controlled by the $show_sign_button flag. Next we obtain a read-only filehandle on the guestbook file by calling lock( ) with a false second argument. If lock( )
 returns an undefined result, we print an error message and exit.
 Otherwise, we read the contents of the guestbook file line by line and
 split out the fields. 
 
     while (<$fh>) { 
        chomp; 
        my @data = map {escapeHTML($_)} map {unescape($_)} split("\t"); 
        unshift @rows, td(\@data); 
     } 
     unshift @rows, th(['Date',@FIELDS]); 
     print p( 
            table({-border => ''}, 
                  caption(strong('Previous Guests')), 
                  TR(\@rows))); 
     $fh->close; 
     print a({-href => '/'}, 'Home');  
     1; 
 }
 
 
 The fields are then processed through map( ) twice: once to unescape the URL escape characters using the CGI.pm unescape( ) function and once again to make them safe to display on an HTML page using CGI.pm's escapeHTML( ) function. The second round of escaping is to avoid problems with values that contain the <, >, and & symbols. The processed lines are turned into HTML table cells, and unshifted onto a list named @rows. The purpose of the unshift
 is to reverse the order of the lines, so that more recent guestbook
 entries appear at the top of the list. We add the headings for the
 table and turn the whole thing into an HTML table using the appropriate
 CGI.pm shortcuts. We close the filehandle and exit. 
 
 
 If we were not interested in running this script under
 standard CGI, we could increase performance slightly and reduce memory
 consumption substantially by replacing a few functions with their Apache:: equivalents: 
 
 IO::File        --> Apache::File 
 CGI::escape     --> Apache::Util::escape_uri 
 CGI::unescape   --> Apache::Util::unescape_uri 
 CGI::escapeHTML --> Apache::Util::escape_html 
 POSIX::strftime --> Apache::Util::ht_time
 
 
 See the reference listings in Chapter 9 for the proper
 syntax for these replacements. You'll also find a version of the
 guestbook script that uses these lightweight replacements on this
 book's companion web site, http://www.modperl.com. 
 
 
 
 Example 4-15: 
 A Guestbook Script  
 
 
 #!/usr/local/bin/perl 
 # guestbook.cgi 
 use strict; 
 use CGI qw(:standard :html3 escape unescape escapeHTML); 
 use IO::File (); 
 use Fcntl qw(:flock); 
 use POSIX qw(strftime); 
 use vars qw(@FIELDS %REQUIRED %BIG $GUESTBOOKFILE); 
   
 @FIELDS = qw(name e-mail location comments); 
 %REQUIRED = ('name' => 1, 'e-mail' => 1); 
 %BIG = ('comments' => 1); 
   
 if ($ENV{MOD_PERL}) { 
     $GUESTBOOKFILE = Apache->request->dir_config('GuestbookFile'); 
 } 
 $GUESTBOOKFILE ||= "/usr/tmp/guestbookfile.txt"; 
   
 print header, 
     start_html(-title => 'Guestbook', -bgcolor => 'silver'), 
     h1("Guestbook"); 
   
  CASE: { 
      $_ = param('action'); 
      /^sign/i and do    { sign_guestbook(); last CASE; }; 
      /^confirm/i and do { write_guestbook() and view_guestbook(); last CASE; }; 
      /^view/i and do    { view_guestbook(1); last CASE; }; 
      generate_form(); 
  } 
   
 print end_html; 
 exit 0; 
   
 sub generate_form { 
     my @rows; 
     for my $field (@FIELDS) { 
        my $title = "Your $field"; 
        $title .= " (optional)" if !$REQUIRED{$field}; 
        my $element = $BIG{$field} ?  
            textarea(-name => $field, 
                     -rows => 4, 
                     -columns => 50, 
                     -wrap => 1) 
                : textfield(-name => $field, -size => 50); 
        push @rows, th($title) . td($element); 
     } 
     print start_form, 
     table(TR{-align => 'LEFT'}, \@rows), 
     br, 
     submit(-name => 'action', -value => 'View Guestbook'), 
     submit(-name => 'action', -value => 'Sign Guestbook'), 
     end_form; 
 } 
   
 sub sign_guestbook { 
     my @missing = check_missing(@FIELDS); 
     if (@missing) { 
        print_warning(@missing); 
        generate_form(); 
        return; 
     } 
     my @rows; 
     foreach (@FIELDS) { 
        push @rows, TR(th({-align=>'LEFT'},$_),  
                       td(escapeHTML(param($_)))); 
     } 
     print "Here is your guestbook entry.  Press ", 
     em('Confirm')," to save it, or ",em('Change'), 
     " to change it.", 
     hr, 
     table(@rows), 
     hr; 
   
     print start_form; 
     foreach (@FIELDS) { 
        print hidden(-name => $_); 
     } 
     print submit(-name => 'action', 
                 -value => 'Change Entry'), 
     submit(-name => 'action', 
           -value => 'Confirm Entry'), 
     end_form; 
 } 
   
 sub check_missing { 
     my %p; 
     for (@_) { ++$p{$_} if param($_) } 
     return grep !$p{$_}, keys %REQUIRED; 
 } 
   
 sub print_warning { 
     print font({-color => 'red'}, 
               'Please fill in the following fields: ', 
               em(join ', ', @_), 
               '.'); 
 } 
   
 sub write_guestbook { 
     my $fh = lock($GUESTBOOKFILE, 1); 
     unless ($fh) { 
        print strong('An error occurred: unable to open guestbook file.'),p(); 
        Delete('action'); 
        print a({-href => self_url}, 'Try again'); 
        return; 
     } 
     seek ($fh,0,2);  # paranoia: seek to end of file 
     my $date = strftime('%D',localtime); 
     print $fh join("\t", $date, map {escape(param($_))} (@FIELDS)),"\n"; 
     print "Thank you, ", param('name'),", for signing the guestbook.\n"; 
     $fh->close;         
     1; 
 } 
   
 sub view_guestbook { 
     my $show_sign_button = shift; 
     print start_form, 
     submit(-name => 'Sign Guestbook'), 
     end_form 
        if $show_sign_button; 
     my $fh = lock($GUESTBOOKFILE, 0); 
   
     my @rows; 
     unless ($fh) { 
        print strong('An error occurred: unable to open guestbook file.'),br; 
        Delete('action'); 
        print a({-href => self_url},'Try again'); 
        return; 
     } 
     while (<$fh>) { 
        chomp; 
        my @data = map {escapeHTML($_)} map {unescape($_)} split("\t"); 
        unshift @rows, td(\@data); 
     } 
     unshift @rows, th(['Date',@FIELDS]); 
     print p( 
            table({-border => ''}, 
                  caption(strong('Previous Guests')), 
                  TR(\@rows))); 
     $fh->close; 
     print a({-href => '/'}, 'Home');  
     1; 
 } 
   
 sub lock { 
     my $path = shift; 
     my $for_writing = shift; 
   
     my ($lock_type, $path_name, $description); 
     if ($for_writing) { 
        $lock_type = LOCK_EX;  
        $path_name = ">>$path";  
        $description = 'writing'; 
     }  
     else { 
        $lock_type = LOCK_SH;  
        $path_name = $path;  
        $description = 'reading'; 
     } 
   
     my $fh = IO::File->new($path_name) or 
        warn "Couldn't open $path for $description: $!", return; 
   
 # now try to lock it 
     my $success; 
     my $tries = 0; 
     while ($tries++ < 10) { 
        last if $success = flock($fh, $lock_type|LOCK_NB); 
        print p("Waiting for $description lock on guestbook file..."); 
        sleep(1);               # wait a second 
     } 
     unless ($success) { 
        warn("Couldn't get lock for $description");  
        return; 
     } 
     return $fh; 
 }
 
 
 
 
 
 A .htaccess file to go with the guestbook script might be: 
 
 PerlSetVar GuestbookFile /home/www/etc/guests.txt
 
 Apache::Registry Traps 
 
 
 There are a number of traps and pitfalls that you can fall into when using Apache::Registry. This section warns you about them. 
 
 
 It helps to know how Apache::Registry works in order to understand why the traps are there. When the server is asked to return a file that is handled by the Apache::Registry content handler (in other words, a script!), Apache::Registry
 first looks in an internal cache of compiled subroutines that it
 maintains. If it doesn't find a subroutine that corresponds to the
 script file, it reads the contents of the file and repackages it into a
 block of code that looks something like this: 
 
  package $mangled_package_name; 
  use Apache qw(exit); 
  sub handler { 
    #line 1 $original_filename 
    contents of the file 
  }
 
 
 $mangled_package_name is a
 version of the script's URI which has been modified in such a way as to
 turn it into a legal Perl package name while keeping it distinct from
 all other compiled Apache::Registry scripts. For example, the guestbook.cgi script shown in the last section would be turned into a cached subroutine in the package Apache::ROOT::perl::guestbook_2ecgi. The compiled code is then cached for later use.  
 
 
 Before Apache::Registry even comes into play, mod_perl fiddles with the environment to make it appear as if the script were being called under the CGI protocol. For example, the $ENV{QUERY_STRING} environment variable is initialized with the contents of Apache::args( ), and $ENV{SERVER_NAME} is filled in from the value returned by Apache::server_hostname( ). This behavior is controlled by the PerlSetupEnv directive, which is On by default. If your scripts do not need to use CGI %ENV variables, turning this directive Off will reduce memory overhead slightly. 
 
 
 In addition to caching the compiled script, Apache::Registry
 also stores the script's last modification time. It checks the stored
 time against the current modification time before executing the cached
 code. If it detects that the script has been modified more recently
 than the last time it was compiled, it discards the cached code and
 recompiles the script. 
 
 
 The first and most common pitfall when using Apache::Registry
 is to forget that the code will be persistent across many sessions.
 Perl CGI programmers commonly make profligate use of globals, allocate
 mammoth memory structures without disposing of them, and open
 filehandles and never close them. They get away with this because CGI
 scripts are short-lived. When the CGI transaction is done, the script
 exits, and everything is cleaned up automatically. 
 
 
 Not so with Apache::Registry
 scripts (or any other Apache Perl module, for that matter). Globals
 persist from invocation to invocation, big data structures will remain
 in memory, and open files will remain open until the Apache child
 process has exited or the server itself it shut down. 
 
 
 Therefore, it is vital to code cleanly. You should
 never depend on a global variable being uninitialized in order to
 determine when a subroutine is being called for the first time. In
 fact, you should reduce your dependency on globals in general. Close
 filehandles when you are finished with them, and make sure to kill (or
 at least wait on) any child processes you may have launched. 
 
 
 Perl provides two useful tools for writing clean code. use strict turns on checks that make it harder to use global variables unintentionally. Variables must either be lexically scoped (with my
 ) or qualified with their complete package names. The only way around
 these restrictions is to declare variables you intend to use as globals
 at the top of the script with use vars. This code snippet shows how: 
 
 use strict; 
 use vars qw{$INIT $DEBUG @NAMES %HANDLES};
 
 
 We have used strict in many of the examples in the preceding sections, and we strongly recommend it for any Perl script you write. 
 
 
 The other tool is Perl runtime warnings, which can be turned on in Apache::Registry scripts by including a -w switch on the #! line, or within other modules by setting the magic $^W variable to true. You can even enable warnings globally by setting $^W to true inside the server's Perl startup script, if there is one (see Chapter 2). 
 
 
 -w will catch a variety of
 errors, dubious programming constructs, typos, and other sins. Among
 other things, it will warn when a bareword (a string without
 surrounding quotation marks) conflicts with a subroutine name, when a
 variable is used only once, and when a lexical variable is
 inappropriately shared between an outer and an inner scope (a horrible
 problem which we expose in all its gory details a few paragraphs
 later). 
 
 
 -w may also generate hundreds
 of "Use of uninitialized value" messages at runtime, which will fill up
 your server error log. Many of these warnings can be hard to track
 down. If there is no line number reported with the warning, or if the
 reported line number is incorrect,[[2]] try using Perl's #line token described in the perlsyn manual page and in Chapter 9 under "Special Global Variables, Subroutines, and Literals." 
 
 
 It may also be helpful to see a full stack trace of the code which triggered the warning. The cluck( ) function found in the standard Carp module will give you this functionality. Here is an example: 
 
 use Carp (); 
 local $SIG{_    _WARN_    _} = \&Carp::cluck;
 
 
 Note that -w checks are done
 at runtime, which may slow down script execution time. In production
 mode, you may wish to turn warnings off altogether or localize warnings
 using the $^W global variable described in the perlvar manpage. 
 
 
 Another subtle mod_perl trap
 that lies in wait for even experienced programmers involves the sharing
 of lexical variables between outer and inner named subroutines. To
 understand this problem, consider the following innocent-looking code: 
 
 #!/usr/local/bin/perl -w
   
 for (0..3) { 
     bump_and_print(); 
 }
   
 sub bump_and_print { 
     my $a = 1; 
     sub bump { 
        $a++; 
        print "In the inner scope, \$a is $a\n"; 
     } 
     print "In the outer scope, \$a is $a\n"; 
     bump(); 
 }
 
 
 When you run this script, it generates the following inexplicable output: 
 
 Variable "$a" will not stay shared at ./test.pl line 12. 
 In the outer scope, $a is 1 
 In the inner scope, $a is 2 
 In the outer scope, $a is 1 
 In the inner scope, $a is 3 
 In the outer scope, $a is 1 
 In the inner scope, $a is 4 
 In the outer scope, $a is 1 
 In the inner scope, $a is 5
 
 
 For some reason the variable $a has become "unstuck" from its my( ) declaration in bump_and_print( ) and has taken on a life of its own in the inner subroutine bump( ). Because of the -w
 switch, Perl complains about this problem during the compilation phase,
 with the terse warning that the variable "will not stay shared." This
 behavior does not happen if the inner subroutine is made into an
 anonymous subroutine. It only affects named inner subroutines. 
 
 
 The rationale for the peculiar behavior of lexical variables and ways to avoid it in conventional scripts are explained in the perldiag manual page. When using Apache::Registry this bug can bite you when you least expect it. Because Apache::Registry works by wrapping the contents of a script inside a handler( ) function, inner named subroutines are created whether you want them or not. Hence, this piece of code will not do what you expect: 
 
  #!/usr/local/bin/perl 
  use CGI qw/param header/;
   
  my $name = param('name'); 
  print header('text/plain'); 
  print_body(); 
  exit 0;
   
  sub print_body { 
     print "The contents of \$name is $name.\n"; 
  }
 
 
 The first time you run it, it will run correctly, printing the value of the name
 CGI parameter. However, on subsequent invocations the script will
 appear to get "stuck" and remember the values of previous invocations.
 This is because the lexically scoped $name variable is being referenced from within print_body( ), which, when running under Apache::Registry, is a named inner subroutine. Because multiple Apache processes are running, each process will remember a different value of $name, resulting in bizarre and arbitrary behavior. 
 
 
 Perl may be fixed someday to do the right thing with
 inner subroutines. In the meantime, there are several ways to avoid
 this problem. Instead of making the outer variable lexically scoped,
 you can declare it to be a package global, as this snippet shows: 
 
 use strict; 
 use vars '$name'; 
 $name = param('name');
 
 
 Because globals are global, they aren't subject to weird scoping rules.
 
 
 Alternatively, you can pass the variable to the
 subroutine as an argument and avoid sharing variables between scopes
 altogether. This example shows that variant:
 
 my $name = param('name'); 
 print_body($name);
  
 sub print_body { 
    my $name = shift; 
    print "The contents of \$name is $name.\n";      
 }
 
 
 Finally, you can put the guts of your application into a library and use or require it. The Apache::Registry then becomes only a hook that invokes the library: 
 
 #!/usr/local/bin/perl 
 require "my_application_guts"; 
 do_everything();
 
 
 The shared lexical variable problem is a good reason to use the -w switch during Apache::Registry
 script development and debugging. If you see warnings about a variable
 not remaining shared, you have a problem, even if the ill effects don't
 immediately manifest themselves. 
 
 
 Another problem that you will certainly run into involves the use of custom libraries by Apache::Registry scripts. When you make an editing change to a script, the Apache::Registry
 notices the recent modification time and reloads the script. However,
 the same isn't true of any library file that you load into the script
 with use or require. If you make a change to a require
 d file, the script will continue to run the old version of the file
 until the script itself is recompiled for some reason. This can lead to
 confusion and much hair-tearing during development! 
 
 
 You can avoid going bald by using Apache::StatINC, a standard part of the mod_perl distribution. It watches over the contents of the internal Perl %INC array and reloads any files that have changed since the last time it was invoked. Installing Apache::StatINC is easy. Simply install it as the PerlInitHandler for any directory that is managed by Apache::Registry. For example, here is an access.conf entry that installs both Apache::Registry and Apache::StatINC : 
 
 Alias /perl/ /usr/local/apache/perl/ 
 <Location /perl> 
   SetHandler      perl-script 
   PerlHandler     Apache::Registry 
   PerlInitHandler Apache::StatINC 
   PerlSendHeader  On 
   Options         +ExecCGI 
 </Location>
 
 
 Because Apache::StatINC operates at a level above the level of individual scripts, any nonstandard library locations added by the script with use lib or by directly manipulating the contents of @INC will be ignored. If you want these locations to be monitored by Apache::StatINC,
 you should make sure that they are added to the library search path
 before invoking the script. You can do this either by setting the PERL5LIB environment variable before starting up the Apache server (for instance, in the server startup script), or by placing a use lib line in your Perl startup file, as described in Chapter 2. 
 
 
 When you use Apache::StatINC, there is a slight overhead for performing a stat
 on each included file every time a script is run. This overhead is
 usually immeasurable, but it will become noticeable on a heavily loaded
 server. In this case, you may want to forego it and instead manually
 force the embedded Perl interpreter to reload all its compiled scripts by restarting the server with apachectl. In order for this to work, the PerlFreshRestart directive must be turned on in the Apache configuration file. If you haven't done so already, add this line to perl.conf or one of the other configuration files: 
 
 PerlFreshRestart On
 
 
 You can try reloading compiled scripts in this way
 whenever things seem to have gotten themselves into a weird state. This
 will reset all scripts to known initial settings and allow you to
 investigate problems systematically. You might also want to stop the
 server completely and restart it using the -X
 switch. This forces the server to run as a single process in the
 foreground. Interacting with a single process rather than multiple ones
 makes it easier to debug misbehaving scripts. In a production
 environment, you'll want to do this on a test server in order to avoid
 disrupting web services.
 
 
 Handling Errors 
 
 
 Errors in Apache modules do occur, and tracking them
 down is significantly trickier than in standalone Perl or C programs.
 Some errors are due to bugs in your code, while others are due to the
 unavoidable hazards of running in a networked environment. The remote
 user might cancel a form submission before it is entirely done, the
 connection might drop while you're updating a database, or a file that
 you're trying to access might not exist. 
 
 
 A virtuous Apache module must let at least two people
 know when a problem has occurred: you, the module's author, and the
 remote user. You can communicate errors and other exception conditions
 to yourself by writing out entries to the server log. For alerting the
 user when a problem has occurred, you can take advantage of the simple
 but flexible Apache ErrorDocument system, use CGI::Carp, or roll your own error handler. 
 
 
 Error Logging 
 
 
 We talked about tracking down code bugs in Chapter 2
 and will talk more about C-language specific debugging in Chapter 10.
 This section focuses on defensive coding techniques for intercepting
 and handling other types of runtime errors. 
 
 
 The most important rule is to log everything.
 Log anything unexpected, whether it is a fatal error or a condition
 that you can work around. Log expected but unusual conditions too, and
 generate routine logging messages that can help you trace the execution
 of your module under normal conditions. 
 
 
 Apache versions 1.3 and higher offer syslog-like log levels ranging in severity from debug, for low-priority messages, through warn, for noncritical errors, to emerg, for fatal errors that make the module unusable. By setting the LogLevel
 directive in the server configuration file, you can adjust the level of
 messages that are written to the server error log. For example, by
 setting LogLevel to warn, messages with a priority level of warn and higher are displayed in the log; lower-priority messages are ignored. 
 
 
 To use this adjustable logging API, you must load the standard Apache::Log module. This adds a log( ) method to the Apache request object, which will return an Apache::Log
 object. You can then invoke this object's methods in order to write
 nicely formatted log entries to the server's error log at the priority
 level you desire. Here's a short example: 
 
 use Apache::Log ();
  
 my $log = $r->log; 
 $log->debug("Trying to lock guestbook file now"); 
 unless (lock($GUESTBOOKFILE,1)) { 
    $log->emerg("Can't get lock!"); 
    return SERVER_ERROR; 
 } 
 $log->debug("Got lock");
 
 
 In this example, we first obtain a log object by calling the request object's log( ) method. We call the log object's debug( )
 method to send a debug message to the error log and then try to perform
 a locking operation. If the operation fails, we log an error message at
 the emerg priority level using the log object's emerg( ) method and exit. Otherwise, we log another debugging message. 
 
 
 You'll find the full list of method calls made available by Apache::Log in Chapter 9, in the subsection "Logging Methods" under "The Apache Request Object."
 In addition, the Apache Perl API offers three simpler methods for
 entering messages into the log file. You don't have to import the Apache::Log module to use these methods, and they're appropriate for smaller projects (such as most of the examples in this book). 
 
 
 $r->log_error($message) 
 log_error( ) writes out a time-stamped message into the server error log using a facility of error.
 Use it for critical errors that make further normal execution of the
 module impossible. This method predates the 1.3 LogLevel API but still
 exists for backward compatibility and as a shortcut to $r->log->error.  
 $r->warn($message) 
 warn( ) will log an error message with a severity level of warn.
 You can use this for noncritical errors or unexpected conditions that
 you can work around. This method predates the 1.3 LogLevel API but
 still exists for backward compatibility and as a shortcut to $r->log->warn.  
 $r->log_reason($message,$file) 
 This is a special-purpose log message used for
 errors that occur when a content handler tries to process a file. It
 results in a message that looks something like this: 
 access to /usr/local/apache/htdocs/index.html failed for ppp12.yahoo.com,  
     reason: user phyllis not authorized
 
 
 You might also choose to include a $DEBUG
 global in your modules, either hard-coding it directly into the source,
 or by pulling its value out of the configuration file with Apache::dir_config( ).
 Your module can then check this global every time it does something
 significant. If set to a true value, your script should send verbose
 informational messages to the Apache error log (or to an alternative
 log file of your choice). 
 
 
 The ErrorDocument System 
 
 
 Apache provides a handy ErrorDocument
 directive that can be used to display a custom page when a handler
 returns a non-OK status code. The custom page can be any URI, including
 a remote web page, a local static page, a local server-side include
 document, or a CGI script or module. In the last three cases, the
 server generates an internal redirect, making the redirection very
 efficient. 
 
 
 For example, the configuration file for Lincoln's laboratory site contains this directive:  
 
 ErrorDocument 404 /perl/missing.cgi
 
 
 When the server encounters a 404 "Not Found" status
 code, whether generated by a custom module or by the default content
 handler, it will generate an internal redirect to a mod_perl script named missing.cgi. Before calling the script, Apache sets some useful environment variables including the following:  
 
 
 REDIRECT_URL 
 The URL of the document that the user was originally trying to fetch.  
 REDIRECT_STATUS 
 The status code that caused the redirection to occur.  
 REDIRECT_REQUEST_METHOD 
 The method (GET or POST) that caused the redirection.  
 REDIRECT_QUERY_STRING 
 The original query string, if any. 
 REDIRECT_ERROR_NOTES 
 The logged error message, if any. 
 
 
 A slightly simplified version of missing.cgi that works with Apache::Registry (as well as a standalone CGI script) is shown in Example 4-16. For a screenshot of what the user gets when requesting a nonexistent URI, see Figure 4-9.  
 
 
 
 Figure 4-9. 
 The missing.cgi script generates a custom page to display when a URI is not found.  
 
 
 
  
 
 
 
 
 
 
 Example 4-16: 
 A Simple Apache::Registry ErrorDocument Handler  
 
 
 #!/usr/local/bin/perl 
 # file: missing.cgi 
 use CGI qw(:standard); 
 use strict; 
   
 print header, 
       start_html(-title => 'Missing Document', -bgcolor => 'white'), 
       h1(img({-src => '/icons/unknown.gif'}), 
       'Document Not Found'), 
       p("I'm sorry, but the document you requested,", 
         strong($ENV{REDIRECT_URL}), 
         "is not available.  Please try the", 
         a({-href => "/search.html"}, "search page"), 
         "for help locating the document."), 
       hr, 
       address(a({-href => "mailto:$ENV{SERVER_ADMIN}"}, 'webmaster')), 
       end_html;
 
 
 
 
 
 If you want to implement the ErrorDocument handler as a vanilla Apache Perl API script, the various REDIRECT_ environment variables will not be available to you. However, you can get the same information by calling the request object's prev( )
 method. This returns the request object from the original request. You
 can then query this object to recover the requested URI, the request
 method, and so forth. 
 
 
 Example 4-17 shows a rewritten version of missing.cgi that uses prev( ) to recover the URI of the missing document. The feature to note in this code is the call to $r->prev on the fifth line of the handler( )
 subroutine. If the handler was invoked as the result of an internal
 redirection, this call will return the original request object, which
 we then query for the requested document by calling its uri( )
 method. If the handler was invoked directly (perhaps by the user
 requesting its URI), the original request will be undefined and we use
 an empty string for the document URI. 
 
 
 
 Example 4-17: 
 An ErrorDocument Handler Using the Vanilla Apache API  
 
 
 package Apache::Missing; 
 # File: Apache/Missing.pm
   
 use strict; 
 use Apache::Constants qw(:common); 
 use CGI qw(:html); 
   
 sub handler { 
     my $r = shift; 
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     my $original_request = $r->prev; 
     my $original_uri = $original_request ? $original_request->uri : ''; 
     my $admin = $r->server->server_admin; 
   
     $r->print( 
              start_html(-title => 'Missing Document', 
                         -bgcolor => 'white'), 
              h1(img({-src => '/icons/unknown.gif'}), 
                 'Document Not Found'), 
              p( 
                "I'm sorry, but the document you requested,", 
                strong($original_uri), 
                ", is not available.  Please try the", 
                a({-href => "/search.html"}, "search page"), 
                "for help locating the document." 
                ), 
              hr, 
              address(a({-href => "mailto:$admin"}, 'webmaster')), 
              end_html 
              ); 
   
     return OK; 
 } 
   
 1; 
 _    _END_    _ 
 
 
 
 
 
 Here's an example using Apache::Missing in the configuration file:
 
 <Location /Missing> 
    SetHandler  perl-script 
    PerlHandler Apache::Missing 
 </Location>
 
 
 If the static nature of the Apache ErrorDocument
 directive is inadequate for your needs, you can set the error document
 dynamically from within a handler by calling the request object's custom_response( )
 method. This method takes two arguments: the status code of the
 response you want to handle and the URI of the document or module that
 you want to pass control to. This error document setting will persist
 for the lifetime of the current request only. After the handler exits,
 the setting returns to its default. 
 
 
 For example, the following code snippet sets up a custom error handler for the SERVER_ERROR error code (a generic error that covers a variety of sins). If the things_are_ok( )
 subroutine (not implemented here) returns a true value, we do our work
 and return an OK status. Otherwise, we set the error document to point
 to a URI named /Carp and return a SERVER_ERROR status. 
 
 package Apache::GoFish; 
 # file: Apache/GoFish.pm
   
 use strict; 
 use Apache::Constants qw(:common);
   
 sub handler { 
    my $r = shift; 
    if (things_are_ok($r)) { 
      do_something(); 
      return OK; 
    } 
    $r->custom_response(SERVER_ERROR, "/Carp"); 
    return SERVER_ERROR; 
 }
   
 1; 
 _    _END_    _
 
 HTTP Headers and Error Handling 
 
 
 You already know about using header_out( ) to set HTTP header fields. A properly formatted HTTP header is sent to the browser when your module explicitly calls send_http_header( ), or it is sent for you automatically if you are using Apache::Registry, the PerlSendHeader directive is set to On, and your script prints some text that looks like an HTTP header. 
 
 
 You have to be careful, however, if your module ever
 returns non-OK status codes. Apache wants to assume control over the
 header generation process in the case of errors; if your module has
 already sent the header, then Apache will send a redundant set of
 headers with unattractive results. This applies both to real HTTP
 errors, like BAD_REQUEST and NOT_FOUND, as well as to nonfatal conditions like REDIRECT and AUTH_REQUIRED. 
 
 
 Consider the following fishy example: 
 
 package Apache::Crash; 
 # File: Apache/Crash.pm
   
 use strict; 
 use Apache::Constants qw(:common); 
 use constant CRASH => 1; 
   
 sub handler { 
     my $r = shift; 
     $r->content_type('text/plain'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
     return SERVER_ERROR if CRASH; 
     $r->print('Half a haddock is better than none.'); 
     return OK; 
 } 
   
 1; 
 _    _END_    _
 
 
 After setting the document MIME type, this module sends off the HTTP header. It then checks a constant named CRASH and if true, which it always is, returns a status code of SERVER_ERROR.
 Apache would ordinarily send a custom HTTP header in response to this
 status code, but because the module has already emitted a header, it's
 too late. Confusion results. If we map this module to the URI /Crash, we can telnet directly to the server to demonstrate the problem: 
 
 % telnet www.modperl.com 80 
 Trying 192.168.2.5... 
 Connected to modperl.com. 
 Escape character is '^]'. 
 GET /Crash HTTP/1.0  
 HTTP/1.1 200 OK 
 Date: Thu, 21 May 1998 11:31:40 GMT 
 Server: Apache/1.3b6 
 Connection: close 
 Content-Type: text/plain
   
 HTTP/1.1 200 OK 
 Date: Thu, 21 May 1998 11:31:40 GMT 
 Server: Apache/1.3b6 
 Connection: close 
 Content-Type: text/html
   
 <HTML><HEAD> 
 <TITLE>500 Internal Server Error</TITLE> 
 </HEAD><BODY> 
 <H1>Internal Server Error</h2> 
 The server encountered an internal error or 
 misconfiguration and was unable to complete 
 your request.<P> 
 </BODY></HTML> 
 Connection closed by foreign host.
 
 
 Not only are there two HTTP headers here, but both of them indicate a status code of 200 OK,
 which is definitely not right. When displayed in the browser, the page
 will be marred by extraneous header lines at the top of the screen. 
 
 
 The cardinal rule is that you should never call Apache::send_http_header( ) until your module has completed all its error checking and has decided to return an OK status code. Here's a better version of Apache::Crash that avoids the problem: 
 package Apache::Crash; 
 # File: Apache/Crash.pm
   
 use strict; 
 use Apache::Constants qw(:common); 
 use constant CRASH => 1; 
   
 sub handler { 
     my $r = shift; 
     return SERVER_ERROR if CRASH; 
     $r->content_type('text/plain'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
     $r->print('Half a haddock is better than none.'); 
     return OK; 
 } 
   
 1; 
 _    _END_    _
 
 
 Now when we telnet to the server, the server response looks the way it should: 
 
 (~) 103% telnet www.modperl.com 80 Trying 192.168.2.5... 
 Connected to modperl.com. 
 Escape character is '^]'. 
 GET /Crash HTTP/1.0  
  
 HTTP/1.1 500 Internal Server Error 
 Date: Thu, 21 May 1998 11:40:56 GMT 
 Server: Apache/1.3b6 
 Connection: close 
 Content-Type: text/html
  
 <HTML><HEAD> 
 <TITLE>500 Internal Server Error</TITLE> 
 </HEAD><BODY> 
 <H1>Internal Server Error</h2> 
 The server encountered an internal error or 
 misconfiguration and was unable to complete 
 your request.<P> 
 </BODY></HTML>
 
 
 Another important detail about error handling is that Apache ignores the fields that you set with header_out( )
 when your module generates an error status or invokes an internal
 redirect. This is usually not a problem, but there are some cases in
 which this restriction can be problematic. The most typical case is the
 one in which you want a module to give the browser a cookie and
 immediately redirect to a different URI. Or you might want to assign an
 error document to the UNAUTHORIZED status
 code so that a custom login screen appears when the user tries to
 access a restricted page. In both cases you need to manipulate the HTTP
 header fields prior to the redirect. 
 
 
 For these cases, call the request object's err_header_out( ) method. It has identical syntax to header_out( ),
 but the fields that you set with it are sent to the browser only when
 an error has occurred. Unlike ordinary headers, the fields set with err_header_out( ) persist across internal redirections, and so they are passed to Apache ErrorDocument handlers and other local URIs. 
 
 
 This provides you with a simple way to pass information
 between modules across internal redirects. Combining the example from
 this section with the example from the previous section gives the
 modules shown in Example 4-18. Apache::GoFish generates a SERVER_ERROR, which is intercepted and handled by the custom ErrorDocument handler named Apache::Carp (Example 4-19). Before relinquishing control, however, Apache::GoFish creates a custom HTTP field named X-Odor which gives the error handler something substantial to complain about. The end result is shown in Figure 4-10. 
 
 
 
 Figure 4-10. 
 When Apache::GoFish generates a custom error document, it displays the contents of the custom X-Odor header.  
 
 
 
  
 
 
 
 
 
 The code should be fairly self-explanatory. The main point to notice is Apache::GoFish 's use of err_header_out( ) to set the value of the X-Odor field, and Apache::Carp 's use of the same function to retrieve it. Like header_out( ), when you call err_header_out( )
 with a single argument, it returns the current value of the field and
 does not otherwise alter the header. When you call it with two
 arguments, it sets the indicated field. 
 
 
 An interesting side effect of this technique is that the X-Odor
 field is also returned to the browser in the HTTP header. This could be
 construed as a feature. If you wished to pass information between the
 content handler and the error handler without leaving tracks in the
 HTTP header, you could instead use the request object's "notes" table
 to pass messages from one module to another. Chapter 9 covers how to
 use this facility (see the description of the notes( ) method under "Server Core Functions"). 
 
 
 
 Example 4-18: 
 Invoking a Custom Error Handler Document  
 
 
 package Apache::GoFish; 
 # File: Apache/GoFish.pm
   
 use Apache::Constants qw(:common :response); 
 use constant CRASH=>1;
   
 sub handler { 
    my $r = shift; 
    $r->err_header_out('X-Odor'=>"something's rotten in Denmark"); 
    $r->custom_response(SERVER_ERROR, "/Carp"); 
    return SERVER_ERROR if CRASH; 
    $r->content_type('text/plain'); 
    $r->send_http_header; 
    return OK if $r->header_only; 
    $r->print('Half a haddock is better than none.'); 
    return OK; 
 } 
 1; 
 _    _END_    _ 
 
 
 
 
 
 Here is a sample configuration entry:
 
 <Location /GoFish> 
    SetHandler perl-script 
    PerlHandler Apache::GoFish 
 </Location> 
 
 
 Example 4-19: 
 An Error Handler to Complement the Previous Example 
 
 
 package Apache::Carp; 
 # File: Apache/Carp.pm 
 use strict; 
 use Apache::Constants qw(:common); 
 use CGI qw(:html); 
   
 sub handler { 
     my $r = shift; 
     my $odor = $r->err_header_out('X-Odor'); 
     $odor ||= 'unspecified odor'; 
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     my $original_request = $r->prev; 
     my $original_uri = $original_request ? $original_request->uri : ''; 
     my $admin = $r->server->server_admin; 
   
     $r->print( 
              start_html(-title => 'Phew!!', -bgcolor => 'white'), 
              h1('Phew!!'), 
              p("Something fishy happened while processing this request."), 
              p("The odor was ", strong($odor), '.'), 
              hr, 
              address(a({-href => "mailto:$admin"}, 'webmaster')), 
              end_html 
              ); 
   
     return OK; 
 } 
   
 1; 
 _    _END_    _ 
 
 
 
 
 
 
 Here is a sample configuration entry:
 
 <Location /Carp> 
    SetHandler  perl-script    PerlHandler Apache::Carp 
 </Location>
 
 
 Chaining Content Handlers 
 
 
 The C-language Apache API only allows a single content
 handler to completely process a request. Several handlers may be given
 a shot at it, but the first one to return an OK status will terminate
 the content handling phase of the transaction. 
 
 
 There are times when it would be nice to chain handlers
 into a pipeline. For example, one handler could add canned headers and
 footers to the page, another could correct spelling errors, while a
 third could add trademark symbols to all proprietary names. Although
 the native C API can't do this yet,[[3]] the Perl API can, using a technique called "stacked handlers."   
 
 
 It is actually quite simple to stack handlers. Instead of declaring a single module or subroutine in the PerlHandler
 directive, you declare several. Each handler will be called in turn in
 the order in which it was declared. The exception to this rule is if
 one of the handlers in the series returns an error code (anything other
 than OK, DECLINED, or DONE). Handlers can adjust the stacking order themselves, or even arrange to process each other's output. 
 
 
 Simple Case of Stacked Handlers 
 
 
 Example 4-20 gives a very simple example of a stack of three content handlers. It's adapted slightly from the mod_perl manual page. For simplicity, all three handlers are defined in the same file, and are subroutines named header( ), body( ), and footer( ).
 As the names imply, the first handler is responsible for the top of the
 page (including the HTTP header), the second is responsible for the
 middle, and the third for the bottom. 
 
 
 A suitable configuration section looks like this: 
 
 PerlModule My 
 <Location /My> 
   SetHandler perl-script 
   PerlHandler My::header My::body My::footer 
 </Location>
 
 
 We first load the whole module into memory using the PerlModule directive. We then declare a URI location /My and assign the perl-script handler to it. Perl in turn is configured to run the My::header, My::body, and My::footer subroutines by passing them as arguments to a PerlHandler directive. In this case, the /My location has no corresponding physical directory, but there's no reason that it couldn't. 
 
 
 After bringing in the OK constant from Apache::Constants, we define the subroutines header( ), body( ), and footer( ). header( ) sets the document's content type to plain text, sends the HTTP header, and prints out a line at the top of the document. body( ) and footer( ) both print out a line of text to identify themselves. The resulting page looks like this: 
 
 header text 
 body text 
 footer text
 
 
 Example 4-20: 
 A Simple Stacked Handler  
 
 
 package My;
   
 use strict; 
 use Apache::Constants 'OK';
   
 sub header { 
    my $r = shift; 
    $r->content_type('text/plain'); 
    $r->send_http_header; 
    $r->print("header text\n"); 
    OK; 
 } 
 sub body { 
    my $r = shift; 
    $r->print("body text\n"); 
    OK; 
 } 
 sub footer { 
    my $r = shift; 
    $r->print("footer text\n"); 
    OK; 
 } 
 1;
 
 
 
 
 
 
 Coordinating Stacked Handlers 
 
 
 Stacked handlers often have to coordinate their activities. In the example of the previous section, the header( )
 handler must be run before either of the other two in order for the
 HTTP header to come out correctly. Sometimes it's useful to make the
 first handler responsible for coordinating the other routines rather
 than relying on the configuration file. The request object's push_handlers( ) method will help you do this. 
 
 
 push_handlers( ) takes two
 arguments: a string representing the phase to handle, and a reference
 to a subroutine to handle that phase. For example, this code fragment
 will arrange for the footer( ) subroutine to be the next content handler invoked: 
 
 $r->push_handlers(PerlHandler => \&footer);
 
 
 With this technique, we can rewrite the previous example along the lines shown in Example 4-21. In the revised module, we declare a subroutine named handler( ) that calls push_handlers( ) three times, once each for the header, body, and footer of the document. It then exits. The other routines are unchanged. 
 
 
 The revised configuration file entry looks like this: 
 
 <Location /MyChain> 
   SetHandler perl-script 
   PerlHandler My::Chain 
 </Location>
 
 
 Because we followed the mod_perl convention of naming the first handler subroutine handler( ), there's now no need for a PerlModule statement to load the module into memory. 
 
 
 
 Example 4-21: 
 Coordinated Stacked Handlers  
 
 
 package My::Chain; 
   
 use strict; 
 use Apache::Constants 'OK'; 
   
 sub handler { 
     my $r = shift; 
     for my $cv (\&header, \&body, \&footer) { 
        $r->push_handlers(PerlHandler => $cv); 
     } 
     OK; 
 } 
   
 sub header { 
     my $r = shift; 
     $r->content_type('text/plain'); 
     $r->send_http_header; 
     $r->print("header text\n"); 
     OK; 
 } 
   
 sub body { 
     my $r = shift; 
     $r->print("body text\n"); 
     OK; 
 } 
   
 sub footer { 
     my $r = shift; 
     $r->print("footer text\n"); 
     OK; 
 } 
   
 1; 
 _    _END_    _
 
 
 
 
 
 Stacked Handler Pipelining 
 
 
 The stacked handlers we looked at in the previous
 example didn't interact. When one was finished processing, the next
 took over. A more sophisticated set of handlers might want to pipeline
 their results in such a way that the output of one handler becomes the
 input to the next. This would allow the handlers to modify each other's
 output in classic Unix filter fashion. This sounds difficult, but in
 fact it's pretty simple. This section will show you how to set up a
 filter pipeline. As an aside, it will also introduce you to the concept
 of Apache Perl API method handlers. 
 
 
 The trick to achieving a handler pipeline is to use
 "tied" filehandles to connect the neighbors together. In the event that
 you've never worked with a tied filehandle before, it's a way of giving
 a filehandle seemingly magic behavior. When you print( )
 to a tied filehandle, the data is redirected to a method in a
 user-defined class rather than going through the usual filesystem
 routines. To create a tied filehandle, you simply declare a class that
 defines a method named TIEHANDLE( ) and various methods to handle the sorts of things one does with a filehandle, such as PRINT( ) and READ( ). 
 
 
 Here's a concrete example of a tied filehandle class that interfaces to an antique daisywheel printer of some sort: 
 
 package DaisyWheel;
  
 sub TIEHANDLE { 
   my($class, $printer_name) = @_; 
   open_daisywheel($printer_name); 
   bless { 'printer' => $printer_name }, $class; 
 }
  
 sub PRINT { 
   my $self = shift; 
   send_to_daisywheel($self->{'printer'}, @_); 
 }
  
 sub DESTROY { 
   my $self = shift; 
   close_daisywheel($self->{'printer'}); 
 }
  
 1; 
 _    _END_    _
 
 
 The TIEHANDLE( ) method gets
 called first. It is responsible for opening the daisywheel printer
 driver (routine not shown here!) and returning a blessed object
 containing its instance variables. The PRINT( )
 method is called whenever the main program prints to the tied
 filehandle. Its arguments are the blessed object and a list containing
 the arguments to print( ). It recovers the
 printer name from its instance variables and then passes it, and the
 items to print, to an internal routine that does the actual work. DESTROY( ) is called when the filehandle is untie( ) d or closed. It calls an internal routine that closes the printer driver. 
 
 To use this class, a program just has to call tie( ) with the name of an appropriate printer: 
 
 use DaisyWheel (); 
 tie *DAISY, 'DaisyWheel', 'dwj002'; 
 print DAISY "Daisy... Daisy... Daisy the Kangaroo.\n"; 
 print DAISY "She wanted to live in a private home,\n"; 
 print DAISY "So she ran away from the zoo!\n"; 
 close DAISY;
 
 
 A more complete tied filehandle class might include a PRINTF( ) method, a READ( ) method, a READLINE( ) method, and a GETC( ) method, but for output-only filehandles PRINT( ) is usually enough. 
 
 
 Now back to Apache. The strategy will be for each
 filter in the pipeline, including the very first and last ones, to
 print to STDOUT, rather than directly invoking the Apache::print( ) method via the request object. We will arrange for STDOUT to be tied( ) in each case to a PRINT( ) method defined in the next filter down the chain. The whole scheme looks something like this: 
 
 filter1 -> filter2::PRINT()   [STDOUT tied to filter2] 
 filter2 -> filter3::PRINT()   [STDOUT tied to filter3] 
 filter3 -> filter4::PRINT()   [STDOUT tied to filter4] 
            .
            .
            .
 filterN -> Apache::PRINT()    [STDOUT tied to Apache]
 
 
 Interestingly enough, the last filter in the chain
 doesn't have to get special treatment. Internally, the Apache request
 ties STDOUT to Apache::PRINT( ), which in turn calls Apache::print( ). This is why handlers can use $r->print('something') and print('something') interchangeably. 
 
 
 To simplify setting up these pipelines, we'll define a utility class called Apache::Forward.[[4]] Apache::Forward
 is a null filter that passes its input through to the next filter in
 the chain unmodified. Modules that inherit from this class override its
 PRINT( ) method to do something interesting with the data. 
 
 
 Example 4-22 gives the source code for Apache::Forward. We'll discuss the code one section at a time.  
 
 package Apache::Forward; 
   
 use strict; 
 use Apache::Constants qw(OK SERVER_ERROR); 
 use vars qw($VERSION); 
 $VERSION = '1.00'; 
   
 sub handler ($$) { 
     my($class, $r) = @_; 
     my $next = tied *STDOUT || return SERVER_ERROR; 
     tie *STDOUT, $class, $r, $next or return SERVER_ERROR; 
     $r->register_cleanup(sub { untie *STDOUT }); 
     OK; 
 }
 
 
 Most of the work is done in the handler( ) subroutine, which is responsible for correctly tying the STDOUT filehandle. Notice that the function prototype for handler( ) is ($$), or two scalar arguments. This is a special signal to Apache to activate its method handler behavior. Instead of calling handler( ) like an ordinary subroutine, Apache calls handler( ) like this: 
 
 Apache::Forward->handler($r);
 
 
 The result is that the handler( )
 receives the class name as its first argument, and the request object
 as the second argument. This object-oriented calling style allows Apache::Forward to be subclassed. 
 
 
 The handler( ) subroutine begins by recovering the identity of the next handler in the pipeline. It does this by calling tied( ) on the STDOUT filehandle. tied( )
 returns a reference to whatever object a filehandle is tied to. It will
 always return a valid object, even when the current package is the last
 filter in the pipeline. This is because Apache ties STDOUT to itself,
 so the last filter will get a reference to the Apache object.
 Nevertheless, we do check that tied( ) returns an object and error out if not--just in case.  
 
 
 Next the subroutine reties STDOUT to itself, passing tie( )
 the request object and the reference to the next filter in the
 pipeline. This call shouldn't fail, but if it does, we return a server
 error at this point. 
 
 
 Before finishing up, the handler( )
 method needs to ensure that the filehandle will be untied before the
 transaction terminates. We do this by registering a handler for the
 cleanup phase. This is the last handler to be called before a
 transaction terminates and is traditionally reserved for this kind of
 garbage collection. We use register_cleanup( )
 to push an anonymous subroutine that unties STDOUT. When the time
 comes, the filehandle will be untied, automatically invoking the
 class's DESTROY( ) method. This gives the
 object a chance to clean up, if it needs to. Note that the client
 connection will be closed before registered cleanups are run, so class DESTROY( ) methods should not attempt to send any data to the client. 
 
 sub TIEHANDLE { 
     my($class, $r, $next) = @_; 
     bless { 'r' => $r,        # request object 
             'next' => $next   # next in the chain 
           }, $class; 
 }
 
 
 The next routine to consider is TIEHANDLE( ), whose job is to return a new blessed object. It creates a blessed hash containing the keys r and next. r points to the request object, and next points to the next filter in the pipeline. Both of these arguments were passed to us by handler( ). 
 
 sub PRINT { 
     my $self = shift; 
     # Subclasses should do something interesting here 
     $self->forward(@_); 
 }
 
 
 The PRINT( ) method is
 invoked whenever the caller wants to print something to the tied
 filehandle. The arguments consist of the blessed object and a list of
 data items to be processed. Subclasses will want to modify the data
 items in some way, but we just forward them unmodified to the next
 filter in line by calling an internal routine named forward( ). 
 
 #sub DESTROY { 
 #    my $self = shift; 
 #    # maybe clean up here 
 #}
 
 
 DESTROY( ) is normally
 responsible for cleaning up. There's nothing to do in the general case,
 so we comment out the definition to avoid being called, saving a bit of
 overhead. 
 
 sub forward { 
     shift()->{'next'}->PRINT(@_); 
 }
 
 
 forward( ) is called by PRINT( )
 to forward the modified data items to the next filter in line. We shift
 the blessed object off the argument stack, find the next filter in
 line, and invoke its PRINT( ) method. 
 
 
 
 Example 4-22: 
 A Chained Content Handler  
 
 
 package Apache::Forward; 
   
 use strict; 
 use Apache::Constants qw(OK SERVER_ERROR); 
 use vars qw($VERSION); 
 $VERSION = '1.00'; 
   
 sub handler ($$) { 
     my($class, $r) = @_; 
     my $next = tied *STDOUT || return SERVER_ERROR; 
     tie *STDOUT, $class, $r, $next or return SERVER_ERROR; 
     $r->register_cleanup(sub { untie *STDOUT }); 
     OK; 
 } 
   
 sub TIEHANDLE { 
     my($class, $r, $next) = @_; 
     bless { 'r' => $r,          # request object 
             'next' => $next     # next in the chain 
           }, $class; 
 } 
   
 sub PRINT { 
     my $self = shift; 
     # Subclasses should do something interesting here 
     $self->forward(@_); 
 } 
   
 #sub DESTROY { 
 #    my $self = shift; 
 #    # maybe clean up here 
 #} 
   
 sub forward { 
     shift()->{'next'}->PRINT(@_); 
 } 
   
 1; 
 _    _END_    _
 
 
 
 
 Having defined the filter base class, we can now define
 filters that actually do something. We'll show a couple of simple ones
 to give you the idea first, then create a larger module that does
 something useful. 
 
 
 Apache::Upcase (Example 4-23) transforms everything it receives into uppercase letters. It inherits from Apache::Forward and then overrides the PRINT( ) method. PRINT( ) loops through the list of data items, calling uc( ) on each. It then forwards the modified data to the next filter in line by calling its forward( ) method (which we do not need to override). 
 
 
 
 Example 4-23: 
 Apache::Upcase Transforms Its Input into Uppercase  
 
 
 package Apache::Upcase; 
   
 use strict; 
 use Apache::Forward (); 
 use vars qw(@ISA $VERSION); 
 @ISA = qw(Apache::Forward); 
 $VERSION = '1.00'; 
   
 sub PRINT { 
     my $self = shift; 
     $self->forward(map { uc $_ } @_); 
 } 
   
 1; 
 _    _END_    _
 
 
 
 
 
 Along the same lines, Apache::Censor (Example 4-24)
 filters its input data to replace four-letter words with starred
 versions. It takes the definition of "four-letter word" a little
 liberally, transforming "sent" into "s**t." It is identical in every
 way to Apache::Upcase, except that PRINT( )
 performs a global regular expression substitution on the input data.
 The transformed data is then forwarded to the next filter as before. 
 
 
 
 Example 4-24: 
 A Handler that Removes Four-Letter Words  
 
 
 package Apache::Censor;
   
 use strict; 
 use Apache::Forward (); 
 use vars qw(@ISA $VERSION); 
 @ISA = qw(Apache::Forward); 
 $VERSION = '1.00'; 
   
 sub PRINT { 
     my($self, @data) = @_; 
     foreach (@data) { s/\b(\w)\w{2}(\w)\b/$1**$2/g; } 
     $self->forward(@data); 
 } 
   
 1; 
 _    _END_    _
 
 
 
 
 
 To watch these filters in action, we need a data
 source. Here's a very simple content handler that emits a constant
 string. It is very important that the content be sent with a regular print( ) statement rather than the specialized $r->print( ) method. If you call Apache::print( ) directly, rather than through the tied STDOUT filehandle, you short-circuit the whole chain! 
 
 package Apache::TestFilter; 
   
 use strict; 
 use Apache::Constants 'OK'; 
   
 sub handler { 
     my $r = shift; 
     $r->content_type('text/plain'); 
     $r->send_http_header; 
     print(<<END); 
 This is some text that is being sent out with a print() 
 statement to STDOUT.  We do not know whether STDOUT is tied 
 to Apache or to some other source, and in fact it does not 
 really matter.  We are just the content source.  The filters 
 come later. 
 END 
     OK; 
 } 
   
 1; 
 _    _END_    _
 
 
 The last step is to provide a suitable entry in the configuration file. The PerlHandler directive should declare the components of the pipeline in reverse
 order. As Apache works its way forward from the last handler in the
 pipeline to the first, each of the handlers unties and reties STDOUT.
 The last handler in the series is the one that creates the actual
 content. It emits its data using print( ) and the chained handlers do all the rest. Here's a sample entry: 
 
 <Location /Filter> 
   SetHandler  perl-script 
   PerlHandler Apache::Upcase Apache::Censor Apache::TestFilter 
 </Location>
 
 
 Figure 4-11 shows the page that appears when the pipeline runs. 
 
 
 
 Figure 4-11. 
 The final output from three chained content handlers  
 
 
 
  
 
 
 
 
 
 The last filter we'll show you is actually useful in
 its own right. When inserted into a filter pipeline, it compresses the
 data stream using the GZip protocol, and flags the browser that the
 data has been GZip-encoded by adding a Content-Encoding
 field to the HTTP header. Browsers that support on-the-fly
 decompression of GZip data will display the original document without
 any user intervention.[[5]] 
 
 
 This filter requires the zlib compression library and its Perl interface, Paul Marquess' Compress::Zlib. zlib, along with instructions on installing it, can be found at ftp://ftp.uu.net/pub/archiving/zip/zlib*. As usual, you can find Compress::Zlib
 at CPAN. Together these libraries provide both stream-based and
 in-memory compression/decompression services, as well as a high-level
 interface for creating and reading gzip files. 
 
 
 The filter is a little more complicated than the
 previous ones because GZip works best when the entire document is
 compressed in a single large segment. However, the filter will be
 processing a series of print( ) statements
 on data that is often as short as a single line. Although we could
 compress each line as a single segment, compression efficiency suffers
 dramatically. So instead we buffer the output, using zlib 's stream-oriented compression routines to emit the encoded data whenever zlib
 thinks enough data has been received to compress efficiently. We also
 have to take care of the details of creating a valid GZip header and
 footer. The header consists of the current date, information about the
 operating system, and some flags. The footer contains a CRC redundancy
 check and the size of the uncompressed file. 
 
 
 Example 4-25 gives the complete code for Apache::GZip. Although it inherits its core functionality from Apache::Forward, each subroutine has to be tweaked a bit to support the unique requirements of GZip compression.   
 
 package Apache::GZip; 
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Compress::Zlib qw(deflateInit crc32 MAX_WBITS Z_DEFLATED); 
 use Apache::Forward (); 
 use vars qw($VERSION @ISA); 
   
 use constant GZIP_MAGIC => 0x1f8b; 
 use constant OS_MAGIC => 0x03; 
   
 $VERSION = '1.00'; 
 @ISA = qw(Apache::Forward);
 
 
 After the usual preamble, we import the compression routines from Compress::Zlib, and bring in the Apache::Forward
 class. We then define a couple of constants needed for the GZip header
 (in case you're wondering, we got these constants by looking at the zlib C code). 
 
 sub handler ($$) { 
     my($class, $r) = @_; 
     #return DECLINED unless $r->header_in("Accept-Encoding") =~ /gzip/; 
     $r->content_encoding('gzip'); 
     $class->SUPER::handler($r); 
 }
 
 
 In order for the browser to automatically decompress the data, it needs to see a Content-Encoding field with the value gzip in the HTTP header. In order to insert this field, we override the parent class's handler( ) subroutine and set the field using the request object's content_encoding( ) method. We then call our superclass's handler( ) method to do the rest of the work. 
 
 
 The commented line that comes before the call to content_encoding( ) is an attempt to "do the right thing." Browsers are supposed to send a header named Accept-Encoding
 if they can accept compressed or otherwise encoded data formats. This
 line tests whether the browser can accept the GZip format and declines
 the transaction if it can't. Unfortunately, it turns out that many
 Netscape browsers don't transmit this essential header, so we skip the
 test.[[6]] 
 
 sub TIEHANDLE { 
     my $class = shift; 
     my $r = shift;  
     my $self = $class->SUPER::TIEHANDLE($r, @_); 
     my $d = deflateInit(-WindowBits => -MAX_WBITS()) || return; 
     @{$self}{'crc','d','l','h'} = (crc32(undef),$d,0,0); 
     $r->push_handlers(PerlHandler => sub { $self->flush }); 
     return $self; 
 }
 
 
 All the compression work is done in TIEHANDLE( ), PRINT( ), and flush( ). TIEHANDLE( ) begins by invoking the superclass's handler( ) method to create an object blessed into the current class. The method then creates a new Compress::Zlib deflation object by calling deflateInit( ), using an argument of -WindowBits that is appropriate for GZip files (again, we got this by reading the zlib
 C source code). Finally we add a few new instance variables to the
 object and return it to the caller. The instance variables include crc, for the cyclic redundancy check, d for the deflation object, l for the total length of the uncompressed data, and h for a flag that indicates whether the header has been printed.[[7]] Finally, TIEHANDLE( ) will call the push_handlers( ) method, installing our flush( ) method at the end of the output chain.  
 
 sub gzheader { 
     pack("nccVcc", GZIP_MAGIC, Z_DEFLATED, 0,time,0, OS_MAGIC) 
 } 
   
 sub PRINT { 
     my $self = shift; 
     $self->forward(gzheader()) unless $self->{'h'}++; 
     foreach (@_) { 
        my $data = $self->{d}->deflate($_); 
        $self->{l} += length($_); 
        $self->{crc} = crc32($_, $self->{crc}); 
        $self->forward($data); 
     } 
 } 
 
 
 The PRINT( ) method is called once each time the previous filter in the pipeline calls print( ). It first checks whether the GZip header has already been sent, and sends it if not. The GZip header is created by the gzheader( )
 routine and consists of a number of constants packed into a 10-byte
 string. It then passes each of its arguments to the deflation object's deflate( )
 method to compress the information, then forwards whatever compressed
 data is returned to the next filter in the chain (or Apache, if this is
 the last filter). The subroutine also updates the running total of
 bytes compressed and calculates the CRC, using Compress::Zlib 's crc32( ) subroutine. 
 
 sub flush { 
     my $self = shift; 
     my $data = $self->{d}->flush; 
     return unless $self->{'h'}; 
     $self->forward($data); 
     $self->forward(pack("V V", $self->{'crc'}, $self->{'l'})); 
 } 
   
 
 
 The flush( ) routine is called when the last of our chained handlers is run. Because zlib
 buffers its compressed data, there is usually some data left in its
 internal buffers that hasn't yet been printed. We call the deflation
 object's flush( ) method to obtain whatever
 is left and forward it onward. Lastly we forward the CRC and the total
 length of the uncompressed file, creating the obligatory GZip footer. 
 
 
 Apache::GZip will usually go last in the filter chain, like this: 
 
 <Location /Compressed> 
    SetHandler  perl-script 
    PerlHandler Apache::GZip OneFilter AnotherFilter 
 </Location>
 
 
 You can use Apache::GZip with any content handler that prints directly to STDOUT. Most of the modules given in this chapter send data via $r->print( ). Simply delete the $r-> part to make them compatible with Apache::GZip and other chained content handlers. 
 
 
 
 Example 4-25: 
 A Handler That Compresses Its Input Before Forwarding It  
 
 
 package Apache::GZip; 
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Compress::Zlib qw(deflateInit crc32 MAX_WBITS Z_DEFLATED); 
 use Apache::Forward (); 
 use vars qw($VERSION @ISA); 
   
 use constant GZIP_MAGIC => 0x1f8b; 
 use constant OS_MAGIC => 0x03; 
   
 $VERSION = '1.00'; 
 @ISA = qw(Apache::Forward); 
   
 sub handler ($$) { 
     my($class, $r) = @_; 
     #return DECLINED unless $r->header_in("Accept-Encoding") =~ /gzip/; 
     $r->content_encoding('gzip'); 
     $class->SUPER::handler($r); 
 } 
   
 sub TIEHANDLE { 
     my $class = shift; 
     my $r = shift;  
     my $self = $class->SUPER::TIEHANDLE($r, @_); 
     my $d = deflateInit(-WindowBits => -MAX_WBITS()) || return; 
     @{$self}{'crc','d','l','h'} = (crc32(undef),$d,0,0); 
     $r->push_handlers(PerlHandler => sub { $self->flush }); 
     return $self; 
 } 
   
 sub gzheader { 
     pack("nccVcc", GZIP_MAGIC, Z_DEFLATED, 0,time,0, OS_MAGIC) 
 } 
   
 sub PRINT { 
     my $self = shift; 
     $self->forward(gzheader()) unless $self->{'h'}++; 
     foreach (@_) { 
        my $data = $self->{d}->deflate($_); 
        $self->{l} += length($_); 
        $self->{crc} = crc32($_, $self->{crc}); 
        $self->forward($data); 
     } 
 } 
   
 sub flush { 
     my $self = shift; 
     my $data = $self->{d}->flush; 
     return unless $self->{'h'}; 
     $self->forward($data); 
     $self->forward(pack("V V", $self->{'crc'}, $self->{'l'})); 
 } 
   
 1; 
 _    _END_    _
 
 
 
 
 
 Readers who are interested in content handler pipelines should be aware of Jan Pazdziora's Apache::OutputChain module. It accomplishes the same thing as Apache::Forward but uses an object model that is less transparent than this one (among other things, the Apache::OutputChain module must always appear first on the PerlHandler list). You should also have a look at Andreas Koenig's Apache::PassFile and Apache::GZipChain
 modules. The former injects a file into an OutputChain and is an
 excellent way of providing the input to a set of filters. The latter
 implements compression just as Apache::GZip does but doesn't buffer the compression stream, losing efficiency when print( ) is called for multiple small data segments. 
 
 
 Just as this book was going to press, Ken Williams announced Apache::Filter,
 a chained content handler system that uses a more devious scheme than
 that described here. Among the advantages of this system is that you do
 not have to list the components of the pipeline in reverse order.
 
 
 Other Types of Stacked Handlers 
 
 
 Content handlers aren't the only type of Apache Perl
 API handler that can be stacked. Translation handlers, type handlers,
 authorization handlers, and in fact all types of handlers can be
 chained using exactly the same techniques we used for the content
 phase. 
 
 
 A particularly useful phase for stacking is the cleanup
 handler. Your code can use this to register any subroutines that should
 be called at the very end of the transaction. You can deallocate
 resources, unlock files, decrement reference counts, or clear globals.
 For example, the CGI.pm module maintains a number of package globals
 controlling various programmer preferences. In order to continue to
 work correctly in the persistent environment of mod_perl, CGI.pm has to clear these globals after each transaction. It does this by arranging for an internal routine named _reset_globals( ) to be called at the end of each transaction using this line of code: 
 
 $r->push_handlers('PerlCleanupHandler',\&CGI::_reset_globals);
 
 
 Your program can push as many handlers as it likes, but
 you should remember that despite its name, the handler stack doesn't
 act like the classic LIFO (last-in/first-out) stack. Instead it acts
 like a FIFO (first-in/first-out) queue. Also remember that if the same
 handler is pushed twice, it will be invoked twice.
 
 
 Method Handlers 
 
 
 It should come as no surprise that between the Apache
 distribution and third-party modules, there exist dozens of
 authentication modules, several directory indexing modules, and a
 couple of extended server-side include modules. All of these modules
 contain code that was copied and pasted from each other. In some cases
 all but a minuscule portion of the module consists of duplicated code. 
 
 
 Code duplication is not bad in and of itself, but it is
 wasteful of memory resources and, more important, of developers' time.
 It would be much better if code could be reused
 rather than duplicated, by using a form of object-oriented subclassing.
 For the C-language API there's not much hope of this. Vanilla C doesn't
 provide object-oriented features, while C++ would require both the
 Apache core and every extension module to adopt the same class
 hierarchy--and it's a little late in the game for this to happen. 
 
 
 Fortunately, the Perl language does support a simple
 object-oriented model that doesn't require that everyone buy into the
 same class hierarchy. This section describes how these object-oriented
 features can be used by Perl API modules to reuse code instead of
 duplicating it. 
 
 
 We've already looked at piecing together documents in
 various ways. Here we will explore an implementation using method
 handlers. There are two classes involved with this example: My::PageBase and My::Page.   
 
 
 Example 4-26 shows the My::PageBase class, which provides the base functionality for the family of documents derived from this class. My::PageBase stitches together a document by calling four methods: the header( ) method sends the HTTP headers, the top( ) method emits the beginning of an HTML document, including the title, the body( ) method emits the main contents of the page, and the bottom( ) method adds a common footer. My::PageBase includes generic definitions for header( ), top( ), body( ), and bottom( ), each of which can be overridden by its subclasses. These are all very simple methods. See Example 4-26 for the definitions. 
 
 
 The My::PageBase handler( ) method looks like this: 
 
 sub handler ($$) { 
     my($self, $r) = @_; 
     unless (ref($self)) { 
        $self = $self->new; 
     } 
     for my $meth (qw(header top body bottom)) { 
        $self->$meth($r); 
     } 
     return OK; 
 }
 
 
 The key to using My::PageBase in an object-oriented way is the handler( ) subroutine's use of the ($$) function prototype. This tells mod_perl
 that the handler wants two arguments: the static class name or object,
 followed by the Apache request object that is normally passed to
 handlers. When the handler is called, it retrieves its class name or
 object reference and stores it in the lexical variable $self. It checks whether $self is an object reference, and if not, it calls its own new( ) method to create a new object. It then invokes the header( ), top( ), body( ), and bottom( ) methods in turn. 
 
 
 The My::PageBase new( ) method turn the arguments passed to it into a blessed hash in the My::PageBase package. Each key in the hash is an attribute that can be used to construct the page. We do not define any default attributes: 
 
 sub new { 
     my $class = shift; 
     bless {@_}, $class; 
 }
 
 
 We will see later why this method is useful. 
 
 
 As we saw in the section on the Apache::Forward module, method handlers are configured just like any other: 
 
 <Location /my> 
   PerlHandler My::PageBase 
   SetHandler perl-script 
 </Location>
 
 
 However, for clarity's sake, or if you use a handler method named something other than handler( ), you can use Perl's standard -> method-calling notation. You will have to load the module first with the PerlModule directive: 
 
 PerlModule My::PageBase 
 <Location /my> 
   PerlHandler My::PageBase->handler 
   SetHandler perl-script 
 </Location>
 
 
 When My::PageBase is installed in this way and you request URI /my, you will see the exciting screen shown in Figure 4-12. 
 
 
 
 Figure 4-12. 
 The generic document produced by My::PageBase  
 
 
 
  
 
 
 
 
 
 Naturally, we'll want to add a bit more spice to this
 page. Because the page is modularized, we can do so one step at a time
 by subclassing Apache::PageBase 's methods. The My::Page class does so by inheriting from the My::PageBase class and simply overriding the body( ) method. 
 
 package My::Page; 
 # file: My/Page.pm
   
 use strict; 
 use vars qw(@ISA); 
 use My::PageBase (); 
 @ISA = qw(My::PageBase);
   
 sub body { 
     my($self, $r) = @_; 
     $r->print(<<END); 
 <p><img src="/icons/cover.gif" align=CENTER> 
 This is My homepage</p>  
 <br clear=all> 
 END 
 }
   
 1; 
 _    _END_    _
 
 
 Then change the configuration to invoke the handler( ) method via My::Page rather than My::PageBase : 
 
 PerlModule My::Page 
 <Location /my> 
   PerlHandler My::Page->handler 
   SetHandler perl-script 
 </Location>
 
 
 Things look almost the same, but the body text has changed (Figure 4-13). 
 
 
 
 Figure 4-13. 
 My::Page overrides the body( ) method of My::PageBase, creating a more interesting document.  
 
 
 
  
 
 
 
 
 
 Now we need a better title for our document. We could override the top( ) method as we did for body( ), but that would involve cutting and pasting a significant amount of HTML (see Example 4-26). Instead, we can make use of the object's title attribute, which is used by the top( ) method in this way: 
 
 my $title = $self->{title} || "untitled document";
 
 
 So how do we set the title attribute? This is where the My::PageBase new( ) method comes in. When it is called with a set of attribute=value
 pairs, it blesses them into a hash reference and returns the new
 object. To set the title attribute, we just have to call the new( ) method like this: 
 
 use My::Page (); 
 $My::Homepage = My::Page->new(title => 'My Homepage');
 
 
 This will create a global scalar variable in the My namespace named $My::Homepage. It's most convenient to do this during server startup--for instance, in the Perl startup file. 
 
 
 Now we just change the configuration section to use the object as the handler rather than the class name: 
 
 <Location /my> 
   PerlHandler $My::Homepage->handler 
   SetHandler perl-script 
 </Location>
 
 
 The object will be retrieved by mod_perl and used to invoke the handler, which will lead to the creation of the page shown in Figure 4-14.  
 
 
 
 Figure 4-14. 
 After creating a My::Page object with a title attribute defined, the page displays a custom title and level 1 header.  
 
 
 
  
 
 
 
 
 
 
 Example 4-26: 
 Using a Method Handler for Object-Oriented Programming 
 

 
 Techniques  
 
 
 package My::PageBase; 
 # file: My/PageBase.pm 
   
 use strict; 
 use Apache::Constants qw(:common); 
   
 sub new { 
     my $class = shift; 
     bless {@_}, $class; 
 } 
   
 sub handler ($$) { 
     my($self, $r) = @_; 
     unless (ref($self)) { 
        $self = $self->new; 
     } 
     for my $meth (qw(header top body bottom)) { 
        $self->$meth($r); 
     } 
     return OK; 
 } 
   
 sub header { 
     my($self, $r) = @_; 
     $r->content_type($self->{type} || "text/html"); 
     $r->send_http_header; 
 } 
   
 sub top { 
     my($self, $r) = @_; 
     my $title = $self->{title} || "untitled document"; 
     $r->print(<<EOF); 
 <html> 
 <head> 
 <title>$title</title> 
 </head> 
 <body> 
 <h1>$title</h2> 
 <hr> 
 EOF 
 } 
   
 sub bottom { 
     my($self, $r) = @_; 
     my $admin = $r->server->server_admin; 
     $r->print(<<EOF); 
 <hr> 
 <i><a href="mailto:$admin">$admin</a></i> 
 </body> 
 </html> 
 EOF 
 } 
   
 sub body { 
     my($self, $r) = @_; 
     $r->print("<p>This is the document body<p>"); 
 } 
   
 1; 
 _    _END_    _
 
 
 
 
 
 This wraps up our discussion of the basic techniques
 for generating page content, filtering files, and processing user
 input. The next chapter ventures into the perilous domain of imposing
 state on the stateless HTTP protocol. You'll learn techniques for
 setting up user sessions, interacting with databases, and managing
 long-term relationships with users. 
 
 
 
 1.
 At least in theory, you can divine what MIME types a browser prefers by examining the contents of the Accept header with $r->header_in('Accept').
 According to the HTTP protocol, this should return a list of MIME types
 that the browser can handle along with a numeric preference score. The
 CGI.pm module even has an accept( ) function
 that leverages this information to choose the best format for a given
 document type. Unfortunately, this part of the HTTP protocol has
 atrophied, and neither Netscape's nor Microsoft's browsers give enough
 information in the Accept header to make it useful for content negotiation.
 
 
 2.
 Certain uses of the eval operator and "here" documents are known to throw off Perl's line numbering.
 
 
 3.
 At the time this was written, the Apache developers were discussing a
 layered I/O system which will be part of the Apache 2.0 API.
 
 
 4.
 The more obvious name, Apache::Filter, is already taken by a third-party module that does output chaining in a slightly different manner.
 
 
 5.
 For historical reasons this facility is limited to Unix versions of
 Netscape Navigator, to PowerPC versions of Navigator on the Macintosh,
 and to some other Unix-based browsers such as W3-Emacs. However, now
 that Navigator's source code has been released to the developer
 community, we hope to see a more widespread implementation of this
 useful feature.
 
 
 6.
 Andreas Koenig's Apache::GzipChain module, which does much the same thing as this one, contains a hardcoded pattern match for the browser type contained in the User-Agent field. You can add this sort of test yourself if you wish, or wait for the browser developers to implement Accept-Encoding correctly.
 
 
 7.
 At the time this chapter was being prepared, the author of Compress::Zlib, Paul Marquess, was enhancing his library to make this manual manipulation of the compressed output stream unnecessary.
 
 
 
 
 
 Back to: Writing Apache Modules with Perl and C
 
 

 
 
 
 oreilly.com Home  | 
 
 O'Reilly Bookstores  | 
 
 How to Order  | 
 
 O'Reilly Contacts

 
 International  | 
 
 About O'Reilly  | 
 
 Affiliated Companies  | 
 
 Privacy Policy
 © 2001, O'Reilly & Associates, Inc.
 
 
 
 
 Оставьте свой комментарий !

Ваше имя:
Комментарий:
Оба поля являются обязательными 

 Автор   Комментарий к данной статье