Manipulating Strings

Bash supports a surprising number of string manipulation operations. Unfortunately, these tools lack a unified focus. Some are a subset of parameter substitution, and others fall under the functionality of the UNIX expr command. This results in inconsistent command syntax and overlap of functionality, not to mention confusion.

String Length

${#string}

expr length $string

expr "$string" : '.*'

stringZ=abcABC123ABCabc
 
 echo ${#stringZ}                 # 15
 echo `expr length $stringZ`      # 15
 echo `expr "$stringZ" : '.*'`    # 15

Example 9-10. Inserting a blank line between paragraphs in a text file

#!/bin/bash
 # paragraph-space.sh
 
 # Inserts a blank line between paragraphs of a single-spaced text file.
 # Usage: $0 <FILENAME
 
 MINLEN=45        # May need to change this value.
 #  Assume lines shorter than $MINLEN characters
 #+ terminate a paragraph.
 
 while read line  # For as many lines as the input file has...
 do
   echo "$line"   # Output the line itself.
 
   len=${#line}
   if [ "$len" -lt "$MINLEN" ]
     then echo    # Add a blank line after short line.
   fi  
 done
 
 exit 0

Length of Matching Substring at Beginning of String

expr match "$string" '$substring'

$substring is a regular expression.

expr "$string" : '$substring'

$substring is a regular expression.

stringZ=abcABC123ABCabc
 #       |------|
 
 echo `expr match "$stringZ" 'abc[A-Z]*.2'`   # 8
 echo `expr "$stringZ" : 'abc[A-Z]*.2'`       # 8

Index

expr index $string $substring

Numerical position in $string of first character in $substring that matches.

stringZ=abcABC123ABCabc
 echo `expr index "$stringZ" C12`             # 6
                                              # C position.
 
 echo `expr index "$stringZ" 1c`              # 3
 # 'c' (in #3 position) matches before '1'.

This is the near equivalent of strchr() in C.
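
For the common case of locating a single character, the same 1-based position can probably be had without calling expr at all, by stripping everything from that character onward and measuring what is left. The following is only a sketch: index_of is a hypothetical helper name, and unlike expr index it looks for one character rather than for any character of a set.

```shell
#!/bin/bash
# index_of is a hypothetical helper, not a Bash builtin.
# Prints the 1-based position of the first occurrence of a character,
# or 0 if the character does not occur (the convention expr index uses).

index_of () {
  local string=$1 char=$2
  local prefix=${string%%"$char"*}   # Everything before the first $char.
  if [ "$prefix" = "$string" ]       # Nothing was stripped: $char is absent.
  then
    echo 0
  else
    echo $(( ${#prefix} + 1 ))
  fi
}

stringZ=abcABC123ABCabc
pos_C=$(index_of "$stringZ" C)
pos_z=$(index_of "$stringZ" z)

echo "$pos_C"    # 6
echo "$pos_z"    # 0
```

Quoting "$char" inside the pattern keeps a character such as * from being treated as a wild card.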

Substring Extraction

${string:position}

Extracts substring from $string at $position.

If the $string parameter is "*" or "@", then this extracts the positional parameters, [1] starting at $position.

${string:position:length}

Extracts $length characters of substring from $string at $position.

stringZ=abcABC123ABCabc
 #       0123456789.....
 #       0-based indexing.
 
 echo ${stringZ:0}                            # abcABC123ABCabc
 echo ${stringZ:1}                            # bcABC123ABCabc
 echo ${stringZ:7}                            # 23ABCabc
 
 echo ${stringZ:7:3}                          # 23A
                                              # Three characters of substring.
 
 
 
 # Is it possible to index from the right end of the string?
     
 echo ${stringZ:-4}                           # abcABC123ABCabc
 # Defaults to full string, as in ${parameter:-default}.
 # However . . .
 
 echo ${stringZ:(-4)}                         # Cabc 
 echo ${stringZ: -4}                          # Cabc
 # Now, it works.
 # Parentheses or added space "escape" the position parameter.
 
 # Thank you, Dan Jacobson, for pointing this out.

If the $string parameter is "*" or "@", then this extracts a maximum of $length positional parameters, starting at $position.

echo ${*:2}          # Echoes second and following positional parameters.
 echo ${@:2}          # Same as above.
 
 echo ${*:2:3}        # Echoes three positional parameters, starting at second.
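
To try these forms without passing arguments on the command line, the positional parameters can be assigned with set --. A minimal sketch follows; the five sample words are arbitrary.

```shell
#!/bin/bash
# Slicing the positional parameters. The sample words are made up.

set -- alpha beta gamma delta epsilon   # Assigns $1 through $5.

rest="${*:2}"         # Second parameter onward, joined by spaces.
middle="${*:2:3}"     # Three parameters, starting at the second.

echo "$rest"          # beta gamma delta epsilon
echo "$middle"        # beta gamma delta

# "${@:2}" keeps each parameter a separate word, which matters in loops.
count=0
for arg in "${@:2}"
do
  count=$(( count + 1 ))
done
echo "$count"         # 4
```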

expr substr $string $position $length

Extracts $length characters from $string starting at $position.

stringZ=abcABC123ABCabc
 #       123456789......
 #       1-based indexing.
 
 echo `expr substr $stringZ 1 2`              # ab
 echo `expr substr $stringZ 4 3`              # ABC

expr match "$string" '\($substring\)'

Extracts $substring at beginning of $string, where $substring is a regular expression.

expr "$string" : '\($substring\)'

Extracts $substring at beginning of $string, where $substring is a regular expression.

stringZ=abcABC123ABCabc
 #       =======	    
 
 echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'`   # abcABC1
 echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'`       # abcABC1
 echo `expr "$stringZ" : '\(.......\)'`                   # abcABC1
 # All of the above forms give an identical result.

expr match "$string" '.*\($substring\)'

Extracts $substring at end of $string, where $substring is a regular expression.

expr "$string" : '.*\($substring\)'

Extracts $substring at end of $string, where $substring is a regular expression.

stringZ=abcABC123ABCabc
 #                ======
 
 echo `expr match "$stringZ" '.*\([A-C][A-C][A-C][a-c]*\)'`    # ABCabc
 echo `expr "$stringZ" : '.*\(......\)'`                       # ABCabc

Substring Removal

${string#substring}

Strips shortest match of $substring from front of $string.

${string##substring}

Strips longest match of $substring from front of $string.

stringZ=abcABC123ABCabc
 #       |----|
 #       |----------|
 
 echo ${stringZ#a*C}      # 123ABCabc
 # Strip out shortest match between 'a' and 'C'.
 
 echo ${stringZ##a*C}     # abc
 # Strip out longest match between 'a' and 'C'.

${string%substring}

Strips shortest match of $substring from back of $string.

${string%%substring}

Strips longest match of $substring from back of $string.

stringZ=abcABC123ABCabc
 #                    ||
 #        |------------|
 
 echo ${stringZ%b*c}      # abcABC123ABCa
 # Strip out shortest match between 'b' and 'c', from back of $stringZ.
 
 echo ${stringZ%%b*c}     # a
 # Strip out longest match between 'b' and 'c', from back of $stringZ.

Example 9-11. Converting graphic file formats, with filename change

#!/bin/bash
 #  cvt.sh:
 #  Converts all the MacPaint image files in a directory to "pbm" format.
 
 #  Uses the "macptopbm" binary from the "netpbm" package,
 #+ which is maintained by Bryan Henderson (bryanh@giraffe-data.com).
 #  Netpbm is a standard part of most Linux distros.
 
 OPERATION=macptopbm
 SUFFIX=pbm          # New filename suffix. 
 
 if [ -n "$1" ]
 then
   directory=$1      # If directory name given as a script argument...
 else
   directory=$PWD    # Otherwise use current working directory.
 fi  
   
 #  Assumes all files in the target directory are MacPaint image files,
 #+ with a ".mac" filename suffix.
 
 for file in "$directory"/*  # Filename globbing.
 do
   filename=${file%.*c}      #  Strip ".mac" suffix off filename
                             #+ ('.*c' matches everything
                             #+ between '.' and 'c', inclusive).
   $OPERATION "$file" > "$filename.$SUFFIX"
                             # Redirect conversion to new filename.
   rm -f "$file"             # Delete original files after converting.
   echo "$filename.$SUFFIX"  # Log what is happening to stdout.
 done
 
 exit 0
 
 # Exercise:
 # --------
 #  As it stands, this script converts *all* the files in the current
 #+ working directory.
 #  Modify it to work *only* on files with a ".mac" suffix.

A simple emulation of getopt using substring extraction constructs.

Example 9-12. Emulating getopt

#!/bin/bash
 # getopt-simple.sh
 # Author: Chris Morgan
 # Used in the ABS Guide with permission.
 
 
 getopt_simple()
 {
     echo "getopt_simple()"
     echo "Parameters are '$*'"
     until [ -z "$1" ]
     do
       echo "Processing parameter of: '$1'"
       if [ ${1:0:1} = '/' ]
       then
           tmp=${1:1}               # Strip off leading '/' . . .
           parameter=${tmp%%=*}     # Extract name.
           value=${tmp##*=}         # Extract value.
           echo "Parameter: '$parameter', value: '$value'"
           eval $parameter=$value
       fi
       shift
     done
 }
 
 # Pass all options to getopt_simple().
 getopt_simple $*
 
 echo "test is '$test'"
 echo "test2 is '$test2'"
 
 exit 0
 
 ---
 
 bash getopt-simple.sh /test=value1 /test2=value2
 
 getopt_simple()
 Parameters are '/test=value1 /test2=value2'
 Processing parameter of: '/test=value1'
 Parameter: 'test', value: 'value1'
 Processing parameter of: '/test2=value2'
 Parameter: 'test2', value: 'value2'
 test is 'value1'
 test2 is 'value2'

Substring Replacement

${string/substring/replacement}

Replace first match of $substring with $replacement.

${string//substring/replacement}

Replace all matches of $substring with $replacement.

stringZ=abcABC123ABCabc
 
 echo ${stringZ/abc/xyz}           # xyzABC123ABCabc
                                   # Replaces first match of 'abc' with 'xyz'.
 
 echo ${stringZ//abc/xyz}          # xyzABC123ABCxyz
                                   # Replaces all matches of 'abc' with 'xyz'.

${string/#substring/replacement}

If $substring matches front end of $string, substitute $replacement for $substring.

${string/%substring/replacement}

If $substring matches back end of $string, substitute $replacement for $substring.

stringZ=abcABC123ABCabc
 
 echo ${stringZ/#abc/XYZ}          # XYZABC123ABCabc
                                   # Replaces front-end match of 'abc' with 'XYZ'.
 
 echo ${stringZ/%abc/XYZ}          # abcABC123ABCXYZ
                                   # Replaces back-end match of 'abc' with 'XYZ'.
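
The examples above use literal text, but the part to be replaced is matched as a shell pattern, so character classes and * are available as well. A short sketch, reusing stringZ from above; the result variable names are arbitrary.

```shell
#!/bin/bash
# The "substring" in ${string/substring/replacement} is a glob pattern.

stringZ=abcABC123ABCabc

digits_masked=${stringZ//[[:digit:]]/_}
echo "$digits_masked"     # abcABC___ABCabc
                          # Each digit is replaced separately.

span_replaced=${stringZ/[[:upper:]]*[[:digit:]]/-}
echo "$span_replaced"     # abc-ABCabc
                          # Longest match starting at the first uppercase
                          #+ letter and ending at a digit: 'ABC123'.
```

The POSIX classes [[:digit:]] and [[:upper:]] are used here rather than ranges such as [A-Z], since range matching can depend on the locale.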

9.2.1. Manipulating strings using awk

A Bash script may invoke the string manipulation facilities of awk as an alternative to using its built-in operations.

Example 9-13. Alternate ways of extracting substrings

#!/bin/bash
 # substring-extraction.sh
 
 String=23skidoo1
 #      012345678    Bash
 #      123456789    awk
 # Note different string indexing system:
 # Bash numbers first character of string as '0'.
 # Awk  numbers first character of string as '1'.
 
 echo ${String:2:4} # position 3 (0-1-2), 4 characters long
                                          # skid
 
 # The awk equivalent of ${string:pos:length} is substr(string,pos,length).
 echo | awk '
 { print substr("'"${String}"'",3,4)      # skid
 }
 '
 #  Piping an empty "echo" to awk gives it dummy input,
 #+ and thus makes it unnecessary to supply a filename.
 
 exit 0

Parameter Substitution

Manipulating and/or expanding variables

${parameter}

Same as $parameter, i.e., value of the variable parameter. In certain contexts, only the less ambiguous ${parameter} form works.

May be used for concatenating variables with strings.

your_id=${USER}-on-${HOSTNAME}
 echo "$your_id"
 #
 echo "Old \$PATH = $PATH"
 PATH=${PATH}:/opt/bin  #Add /opt/bin to $PATH for duration of script.
 echo "New \$PATH = $PATH"
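
One context where only the braced form works is when the variable is followed directly by characters that are themselves legal in a variable name. A small sketch, with made-up names:

```shell
#!/bin/bash
# Braces mark where the variable name ends.

unset file_v2                  # Make sure no variable of this name exists.
file=report

with_braces="${file}_v2"       # Value of $file, followed by "_v2".
without_braces="$file_v2"      # Reads the (unset) variable "file_v2".

echo "with braces:    '$with_braces'"      # with braces:    'report_v2'
echo "without braces: '$without_braces'"   # without braces: ''
```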

${parameter-default}, ${parameter:-default}

If parameter not set, use default.

echo ${username-`whoami`}
 # Echoes the result of `whoami`, if variable $username is still unset.

Note

${parameter-default} and ${parameter:-default} are almost equivalent. The extra : makes a difference only when parameter has been declared, but is null.

#!/bin/bash
 # param-sub.sh
 
 #  Whether a variable has been declared
 #+ affects triggering of the default option
 #+ even if the variable is null.
 
 username0=
 echo "username0 has been declared, but is set to null."
 echo "username0 = ${username0-`whoami`}"
 # Will not echo.
 
 echo
 
 echo username1 has not been declared.
 echo "username1 = ${username1-`whoami`}"
 # Will echo.
 
 username2=
 echo "username2 has been declared, but is set to null."
 echo "username2 = ${username2:-`whoami`}"
 #                            ^
 # Will echo because of :- rather than just - in condition test.
 # Compare to first instance, above.
 
 
 #
 
 # Once again:
 
 variable=
 # variable has been declared, but is set to null.
 
 echo "${variable-0}"    # (no output)
 echo "${variable:-1}"   # 1
 #               ^
 
 unset variable
 
 echo "${variable-2}"    # 2
 echo "${variable:-3}"   # 3
 
 exit 0

The default parameter construct finds use in providing "missing" command-line arguments in scripts.

DEFAULT_FILENAME=generic.data
 filename=${1:-$DEFAULT_FILENAME}
 #  If not otherwise specified, the following command block operates
 #+ on the file "generic.data".
 #
 #  Commands follow.

See also Example 3-4, Example 28-2, and Example A-6.

Compare this method with using an and list to supply a default command-line argument.

${parameter=default}, ${parameter:=default}

If parameter not set, set it to default.

Both forms are nearly equivalent. The : makes a difference only when $parameter has been declared and is null, [1] as above.

echo ${username=`whoami`}
 # Variable "username" is now set to `whoami`.
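
The difference from the ${parameter-default} family is that the variable itself is changed. A sketch of the contrast follows; the variable names and paths are arbitrary. The leading : is the shell's no-op command, used so that the expansion is evaluated only for its side effect.

```shell
#!/bin/bash
# ${parameter:-default} only substitutes; ${parameter:=default} also assigns.

unset logdir cachedir

echo "${cachedir:-/tmp/cache}"     # /tmp/cache
echo "cachedir = '$cachedir'"      # cachedir = ''   (still unset)

: "${logdir:=/tmp/logs}"           # Assigns the default to logdir.
echo "logdir = '$logdir'"          # logdir = '/tmp/logs'

logdir=/var/log/myapp
: "${logdir:=/tmp/logs}"           # Already set and non-null: no change.
echo "logdir = '$logdir'"          # logdir = '/var/log/myapp'
```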

${parameter+alt_value}, ${parameter:+alt_value}

If parameter set, use alt_value, else use null string.

Both forms are nearly equivalent. The : makes a difference only when parameter has been declared and is null, see below.

echo "###### \${parameter+alt_value} ########"
 echo
 
 a=${param1+xyz}
 echo "a = $a"      # a =
 
 param2=
 a=${param2+xyz}
 echo "a = $a"      # a = xyz
 
 param3=123
 a=${param3+xyz}
 echo "a = $a"      # a = xyz
 
 echo
 echo "###### \${parameter:+alt_value} ########"
 echo
 
 a=${param4:+xyz}
 echo "a = $a"      # a =
 
 param5=
 a=${param5:+xyz}
 echo "a = $a"      # a =
 # Different result from   a=${param5+xyz}
 
 param6=123
 a=${param6:+xyz}
 echo "a = $a"      # a = xyz
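
One plausible use of the :+ form is adding an option to a command line only when a setting is non-null. This is a sketch under assumptions: the variable verbose, the -v flag and the helper build_opts are all hypothetical.

```shell
#!/bin/bash
# ${parameter:+alt_value} contributes text only if parameter is set and non-null.

build_opts () {
  echo "run${verbose:+ -v}"    # Appends " -v" only when $verbose is non-null.
}

verbose=
quiet_cmd=$(build_opts)
echo "'$quiet_cmd'"            # 'run'

verbose=yes
loud_cmd=$(build_opts)
echo "'$loud_cmd'"             # 'run -v'
```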

${parameter?err_msg}, ${parameter:?err_msg}

If parameter set, use it, else print err_msg and abort the script with an exit status of 1.

Both forms are nearly equivalent. The : makes a difference only when parameter has been declared and is null, as above.

Example 9-14. Using parameter substitution and error messages

#!/bin/bash
 
 #  Check some of the system's environmental variables.
 #  This is good preventative maintenance.
 #  If, for example, $USER, the name of the person at the console, is not set,
 #+ the machine will not recognize you.
 
 : ${HOSTNAME?} ${USER?} ${HOME?} ${MAIL?}
   echo
   echo "Name of the machine is $HOSTNAME."
   echo "You are $USER."
   echo "Your home directory is $HOME."
   echo "Your mail INBOX is located in $MAIL."
   echo
   echo "If you are reading this message,"
   echo "critical environmental variables have been set."
   echo
   echo
 
 # ------------------------------------------------------
 
 #  The ${variablename?} construction can also check
 #+ for variables set within the script.
 
 ThisVariable=Value-of-ThisVariable
 #  Note, by the way, that string variables may be set
 #+ to characters disallowed in their names.
 : ${ThisVariable?}
 echo "Value of ThisVariable is $ThisVariable".
 echo
 echo
 
 
 : ${ZZXy23AB?"ZZXy23AB has not been set."}
 #  If ZZXy23AB has not been set,
 #+ then the script terminates with an error message.
 
 # You can specify the error message.
 # : ${variablename?"ERROR MESSAGE"}
 
 
 # Same result with:    dummy_variable=${ZZXy23AB?}
 #                      dummy_variable=${ZZXy23AB?"ZZXy23AB has not been set."}
 #
 #                      echo ${ZZXy23AB?} >/dev/null
 
 #  Compare these methods of checking whether a variable has been set
 #+ with "set -u" . . .
 
 
 
 echo "You will not see this message, because script already terminated."
 
 HERE=0
 exit $HERE   # Will NOT exit here.
 
 # In fact, this script will return an exit status (echo $?) of 1.

Example 9-15. Parameter substitution and "usage" messages

#!/bin/bash
 # usage-message.sh
 
 : ${1?"Usage: $0 ARGUMENT"}
 #  Script exits here if command-line parameter absent,
 #+ with following error message.
 #    usage-message.sh: 1: Usage: usage-message.sh ARGUMENT
 
 echo "These two lines echo only if command-line parameter given."
 echo "command line parameter = \"$1\""
 
 exit 0  # Will exit here only if command-line parameter present.
 
 # Check the exit status, both with and without command-line parameter.
 # If command-line parameter present, then "$?" is 0.
 # If not, then "$?" is 1.

Parameter substitution and/or expansion. The following expressions are the complement to the match in expr string operations (see Example 12-9). These particular ones are used mostly in parsing file path names.

Variable length / Substring removal

${#var}

String length (number of characters in $var). For an array, ${#array} is the length of the first element in the array.

Note

Exceptions:

  • ${#*} and ${#@} give the number of positional parameters.

  • For an array, ${#array[*]} and ${#array[@]} give the number of elements in the array.
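
The array cases are easy to confuse, so here is a brief sketch with an arbitrary two-element array.

```shell
#!/bin/bash
# ${#...} applied to an array and to the positional parameters.

colors=(crimson green)

first_len=${#colors}         # 7   Length of colors[0], "crimson".
second_len=${#colors[1]}     # 5   Length of colors[1], "green".
count=${#colors[@]}          # 2   Number of elements.

set -- one two three four
nparams=${#@}                # 4   Number of positional parameters.

echo "$first_len $second_len $count $nparams"   # 7 5 2 4
```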

Example 9-16. Length of a variable

#!/bin/bash
 # length.sh
 
 E_NO_ARGS=65
 
 if [ $# -eq 0 ]  # Must have command-line args to demo script.
 then
   echo "Please invoke this script with one or more command-line arguments."
   exit $E_NO_ARGS
 fi  
 
 var01=abcdEFGH28ij
 echo "var01 = ${var01}"
 echo "Length of var01 = ${#var01}"
 # Now, let's try embedding a space.
 var02="abcd EFGH28ij"
 echo "var02 = ${var02}"
 echo "Length of var02 = ${#var02}"
 
 echo "Number of command-line arguments passed to script = ${#@}"
 echo "Number of command-line arguments passed to script = ${#*}"
 
 exit 0

${var#Pattern}, ${var##Pattern}

Remove from $var the shortest/longest part of $Pattern that matches the front end of $var.

A usage illustration from Example A-7:
# Function from "days-between.sh" example.
 # Strips a leading zero from argument passed.
 
 strip_leading_zero () #  Strip possible leading zero
 {                     #+ from argument passed.
   return=${1#0}       #  The "1" refers to "$1" -- passed arg.
 }                     #  The "0" is what to remove from "$1" -- at most one zero.

Manfred Schwarb's more elaborate variation of the above:
strip_leading_zero2 () # Strip possible leading zero(s), since otherwise
 {                      # Bash will interpret such numbers as octal values.
   shopt -s extglob     # Turn on extended globbing.
   local val=${1##+(0)} # Use local variable, longest matching series of 0's.
   shopt -u extglob     # Turn off extended globbing.
   _strip_leading_zero2=${val:-0}
                        # If input was 0, return 0 instead of "".
 }

Another usage illustration:
echo `basename $PWD`        # Basename of current working directory.
 echo "${PWD##*/}"           # Basename of current working directory.
 echo
 echo `basename $0`          # Name of script.
 echo $0                     # Name of script.
 echo "${0##*/}"             # Name of script.
 echo
 filename=test.data
 echo "${filename##*.}"      # data
                             # Extension of filename.

${var%Pattern}, ${var%%Pattern}

Remove from $var the shortest/longest part of $Pattern that matches the back end of $var.
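
The shortest/longest distinction shows up clearly on a filename with two dots. A minimal sketch; the filename is arbitrary.

```shell
#!/bin/bash
# Shortest versus longest match at the back end of a string.

archive=backup.tar.gz

short=${archive%.*}      # Shortest match of '.*' removed: '.gz'
long=${archive%%.*}      # Longest match of '.*' removed:  '.tar.gz'

echo "$short"            # backup.tar
echo "$long"             # backup
```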

Version 2 of Bash added additional options.

Example 9-17. Pattern matching in parameter substitution

#!/bin/bash
 # patt-matching.sh
 
 # Pattern matching  using the # ## % %% parameter substitution operators.
 
 var1=abcd12345abc6789
 pattern1=a*c  # * (wild card) matches everything between a - c.
 
 echo
 echo "var1 = $var1"           # abcd12345abc6789
 echo "var1 = ${var1}"         # abcd12345abc6789
                               # (alternate form)
 echo "Number of characters in ${var1} = ${#var1}"
 echo
 
 echo "pattern1 = $pattern1"   # a*c  (everything between 'a' and 'c')
 echo "--------------"
 echo '${var1#$pattern1}  =' "${var1#$pattern1}"    #         d12345abc6789
 # Shortest possible match, strips out first 3 characters  abcd12345abc6789
 #                                     ^^^^^               |-|
 echo '${var1##$pattern1} =' "${var1##$pattern1}"   #                  6789      
 # Longest possible match, strips out first 12 characters  abcd12345abc6789
 #                                    ^^^^^                |----------|
 
 echo; echo; echo
 
 pattern2=b*9            # everything between 'b' and '9'
 echo "var1 = $var1"     # Still  abcd12345abc6789
 echo
 echo "pattern2 = $pattern2"
 echo "--------------"
 echo '${var1%$pattern2}  =' "${var1%$pattern2}"    #     abcd12345a
 # Shortest possible match, strips out last 6 characters  abcd12345abc6789
 #                                     ^^^^                         |----|
 echo '${var1%%$pattern2} =' "${var1%%$pattern2}"   #     a
 # Longest possible match, strips out last 15 characters  abcd12345abc6789
 #                                    ^^^^                 |-------------|
 
 # Remember, # and ## work from the left end (beginning) of string,
 #           % and %% work from the right end.
 
 echo
 
 exit 0

Example 9-18. Renaming file extensions:

#!/bin/bash
 # rfe.sh: Renaming file extensions.
 #
 #         rfe old_extension new_extension
 #
 # Example:
 # To rename all *.gif files in working directory to *.jpg,
 #          rfe gif jpg
 
 
 E_BADARGS=65
 
 case $# in
   0|1)             # The vertical bar means "or" in this context.
   echo "Usage: `basename $0` old_file_suffix new_file_suffix"
   exit $E_BADARGS  # If 0 or 1 arg, then bail out.
   ;;
 esac
 
 
 for filename in *.$1
 # Traverse list of files ending with 1st argument.
 do
   mv "$filename" "${filename%$1}$2"
   #  Strip off part of filename matching 1st argument,
   #+ then append 2nd argument.
 done
 
 exit 0

Variable expansion / Substring replacement

These constructs have been adopted from ksh.

${var:pos}

Variable var expanded, starting from offset pos.

${var:pos:len}

Expansion to a max of len characters of variable var, from offset pos. See Example A-14 for an example of the creative use of this operator.

${var/Pattern/Replacement}

First match of Pattern within var is replaced with Replacement.

If Replacement is omitted, then the first match of Pattern is replaced by nothing, that is, deleted.

${var//Pattern/Replacement}

Global replacement. All matches of Pattern within var are replaced with Replacement.

As above, if Replacement is omitted, then all occurrences of Pattern are replaced by nothing, that is, deleted.
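
A quick sketch of deletion by omitted Replacement, using a made-up phone number:

```shell
#!/bin/bash
# Omitting the replacement deletes the matched text.

phone=555-123-4567

first_removed=${phone/-/}     # Only the first '-' is deleted.
all_removed=${phone//-/}      # Every '-' is deleted.

echo "$first_removed"         # 555123-4567
echo "$all_removed"           # 5551234567
```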

Example 9-19. Using pattern matching to parse arbitrary strings

#!/bin/bash
 
 var1=abcd-1234-defg
 echo "var1 = $var1"
 
 t=${var1#*-*}
 echo "var1 (with everything, up to and including first - stripped out) = $t"
 #  t=${var1#*-}  works just the same,
 #+ since # matches the shortest string,
 #+ and * matches everything preceding, including an empty string.
 # (Thanks, Stephane Chazelas, for pointing this out.)
 
 t=${var1##*-*}
 echo "If var1 contains a \"-\", returns empty string...   var1 = $t"
 
 
 t=${var1%*-*}
 echo "var1 (with everything from the last - on stripped out) = $t"
 
 echo
 
 # -------------------------------------------
 path_name=/home/bozo/ideas/thoughts.for.today
 # -------------------------------------------
 echo "path_name = $path_name"
 t=${path_name##/*/}
 echo "path_name, stripped of prefixes = $t"
 # Same effect as   t=`basename $path_name` in this particular case.
 #  t=${path_name%/}; t=${t##*/}   is a more general solution,
 #+ but still fails sometimes.
 #  If $path_name ends with a newline, then `basename $path_name` will not work,
 #+ but the above expression will.
 # (Thanks, S.C.)
 
 t=${path_name%/*.*}
 # Same effect as   t=`dirname $path_name`
 echo "path_name, stripped of suffixes = $t"
 # These will fail in some cases, such as "../", "/foo////", "foo/", "/".
 #  Removing suffixes, especially when the basename has no suffix,
 #+ but the dirname does, also complicates matters.
 # (Thanks, S.C.)
 
 echo
 
 t=${path_name:11}
 echo "$path_name, with first 11 chars stripped off = $t"
 t=${path_name:11:5}
 echo "$path_name, with first 11 chars stripped off, length 5 = $t"
 
 echo
 
 t=${path_name/bozo/clown}
 echo "$path_name with \"bozo\" replaced  by \"clown\" = $t"
 t=${path_name/today/}
 echo "$path_name with \"today\" deleted = $t"
 t=${path_name//o/O}
 echo "$path_name with all o's capitalized = $t"
 t=${path_name//o/}
 echo "$path_name with all o's deleted = $t"
 
 exit 0

${var/#Pattern/Replacement}

If prefix of var matches Pattern, then substitute Replacement for Pattern.

${var/%Pattern/Replacement}

If suffix of var matches Pattern, then substitute Replacement for Pattern.

Example 9-20. Matching patterns at prefix or suffix of string

#!/bin/bash
 # var-match.sh:
 # Demo of pattern replacement at prefix / suffix of string.
 
 v0=abc1234zip1234abc    # Original variable.
 echo "v0 = $v0"         # abc1234zip1234abc
 echo
 
 # Match at prefix (beginning) of string.
 v1=${v0/#abc/ABCDEF}    # abc1234zip1234abc
                         # |-|
 echo "v1 = $v1"         # ABCDEF1234zip1234abc
                         # |----|
 
 # Match at suffix (end) of string.
 v2=${v0/%abc/ABCDEF}    # abc1234zip1234abc
                         #               |-|
 echo "v2 = $v2"         # abc1234zip1234ABCDEF
                         #               |----|
 
 echo
 
 #  ----------------------------------------------------
 #  Must match at beginning / end of string,
 #+ otherwise no replacement results.
 #  ----------------------------------------------------
 v3=${v0/#123/000}       # Matches, but not at beginning.
 echo "v3 = $v3"         # abc1234zip1234abc
                         # NO REPLACEMENT.
 v4=${v0/%123/000}       # Matches, but not at end.
 echo "v4 = $v4"         # abc1234zip1234abc
                         # NO REPLACEMENT.
 
 exit 0			

Typing variables: declare or typeset

The declare or typeset builtins (they are exact synonyms) permit restricting the properties of variables. This is a very weak form of the typing available in certain programming languages. The declare command is specific to version 2 or later of Bash. The typeset command also works in ksh scripts.

declare/typeset options

-r readonly

declare -r var1

(declare -r var1 works the same as readonly var1)

This is the rough equivalent of the C const type qualifier. An attempt to change the value of a readonly variable fails with an error message.
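
A way to watch the failure without risking the rest of a script is to attempt the change inside a subshell. This is only a sketch; the variable name limit is arbitrary, and the exact wording of the error message (discarded here) varies between Bash versions.

```shell
#!/bin/bash
# Attempting to modify a readonly variable.

declare -r limit=10

#  The assignment is tried in a subshell,
#+ so its failure cannot terminate this script.
( limit=20 ) 2>/dev/null
status=$?

echo "exit status of the attempt: $status"   # Non-zero.
echo "limit = $limit"                        # limit = 10
```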

-i integer

declare -i number
 # The script will treat subsequent occurrences of "number" as an integer.		
 
 number=3
 echo "Number = $number"     # Number = 3
 
 number=three
 echo "Number = $number"     # Number = 0
 # Tries to evaluate the string "three" as an integer.

Certain arithmetic operations are permitted for declared integer variables without the need for expr or let.

n=6/3
 echo "n = $n"       # n = 6/3
 
 declare -i n
 n=6/3
 echo "n = $n"       # n = 2

-a array

declare -a indices

The variable indices will be treated as an array.

-f functions

declare -f

A declare -f line with no arguments in a script causes a listing of all the functions previously defined in that script.

declare -f function_name

A declare -f function_name in a script lists just the function named.

-x export

declare -x var3

This declares a variable as available for exporting outside the environment of the script itself.

-x var=$value

declare -x var3=373

The declare command permits assigning a value to a variable in the same statement as setting its properties.
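
What exporting means in practice can be checked by asking a child process what it sees. A sketch, with hypothetical variable names; sh -c simply starts a child shell.

```shell
#!/bin/bash
# Only exported variables reach a child process.

unset plain_var
declare -x exported_var=373     # Placed in the environment of child processes.
plain_var=42                    # Visible in this shell only.

child_view=$(sh -c 'echo "${exported_var:-unset} ${plain_var:-unset}"')
echo "$child_view"              # 373 unset
```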

Example 9-21. Using declare to type variables

#!/bin/bash
 
 func1 ()
 {
 echo This is a function.
 }
 
 declare -f        # Lists the function above.
 
 echo
 
 declare -i var1   # var1 is an integer.
 var1=2367
 echo "var1 declared as $var1"
 var1=var1+1       # Integer declaration eliminates the need for 'let'.
 echo "var1 incremented by 1 is $var1."
 # Attempt to change variable declared as integer.
 echo "Attempting to change var1 to floating point value, 2367.1."
 var1=2367.1       # Results in error message, with no change to variable.
 echo "var1 is still $var1"
 
 echo
 
 declare -r var2=13.36         # 'declare' permits setting a variable property
                               #+ and simultaneously assigning it a value.
 echo "var2 declared as $var2" # Attempt to change readonly variable.
 var2=13.37                    # Generates error message, and exit from script.
 
 echo "var2 is still $var2"    # This line will not execute.
 
 exit 0                        # Script will not exit here.

Indirect References to Variables

Assume that the value of a variable is the name of a second variable. Is it somehow possible to retrieve the value of this second variable from the first one? For example, if a=letter_of_alphabet and letter_of_alphabet=z, can a reference to a return z? This can indeed be done, and it is called an indirect reference. It uses the unusual eval var1=\$$var2 notation.

Example 9-22. Indirect References

#!/bin/bash
 # ind-ref.sh: Indirect variable referencing.
 # Accessing the contents of the contents of a variable.
 
 a=letter_of_alphabet   # Variable "a" holds the name of another variable.
 letter_of_alphabet=z
 
 echo
 
 # Direct reference.
 echo "a = $a"          # a = letter_of_alphabet
 
 # Indirect reference.
 eval a=\$$a
 echo "Now a = $a"      # Now a = z
 
 echo
 
 
 # Now, let's try changing the second-order reference.
 
 t=table_cell_3
 table_cell_3=24
 echo "\"table_cell_3\" = $table_cell_3"            # "table_cell_3" = 24
 echo -n "dereferenced \"t\" = "; eval echo \$$t    # dereferenced "t" = 24
 # In this simple case, the following also works (why?).
 #         eval t=\$$t; echo "\"t\" = $t"
 
 echo
 
 t=table_cell_3
 NEW_VAL=387
 table_cell_3=$NEW_VAL
 echo "Changing value of \"table_cell_3\" to $NEW_VAL."
 echo "\"table_cell_3\" now $table_cell_3"
 echo -n "dereferenced \"t\" now "; eval echo \$$t
 # "eval" takes the two arguments "echo" and "\$$t" (set equal to $table_cell_3)
 
 echo
 
 # (Thanks, Stephane Chazelas, for clearing up the above behavior.)
 
 
 # Another method is the ${!t} notation, discussed in "Bash, version 2" section.
 # See also ex78.sh.
 
 exit 0

Of what practical use is indirect referencing of variables? It gives Bash a little of the functionality of pointers in C, for instance, in table lookup. And, it also has some other very interesting applications. . . .
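
As a rough illustration of the table-lookup idea, here is a sketch that uses the ${!name} form of indirect reference (Bash version 2 or later) rather than eval. The cell names and values are arbitrary.

```shell
#!/bin/bash
# Indirect reference with ${!name} instead of eval.

a=letter_of_alphabet
letter_of_alphabet=z
echo "${!a}"                  # z

# A small table lookup: the variable names are built at run time.
table_cell_1=10
table_cell_2=24
table_cell_3=387

total=0
for i in 1 2 3
do
  cell=table_cell_$i          # Name of the variable to read.
  total=$(( total + ${!cell} ))
done
echo "total = $total"         # total = 421
```

Where it is available, this form avoids having eval re-parse the variable's contents as shell code.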

Nils Radtke shows how to build "dynamic" variable names and evaluate their contents. This can be useful when sourcing configuration files.
#!/bin/bash
 
 
 # ---------------------------------------------
 # This could be "sourced" from a separate file.
 isdnMyProviderRemoteNet=172.16.0.100
 isdnYourProviderRemoteNet=10.0.0.10
 isdnOnlineService="MyProvider"
 # ---------------------------------------------
       
 
 remoteNet=$(eval "echo \$$(echo isdn${isdnOnlineService}RemoteNet)")
 remoteNet=$(eval "echo \$$(echo isdnMyProviderRemoteNet)")
 remoteNet=$(eval "echo \$isdnMyProviderRemoteNet")
 remoteNet=$(eval "echo $isdnMyProviderRemoteNet")
 
 echo "$remoteNet"    # 172.16.0.100
 
 # ================================================================
 
 #  And, it gets even better.
 
 #  Consider the following snippet given a variable named getSparc,
 #+ but no such variable getIa64:
 
 chkMirrorArchs () { 
   arch="$1";
   if [ "$(eval "echo \${$(echo get$(echo -ne $arch |
        sed 's/^\(.\).*/\1/g' | tr 'a-z' 'A-Z'; echo $arch |
        sed 's/^.\(.*\)/\1/g')):-false}")" = true ]
   then
      return 0;
   else
      return 1;
   fi;
 }
 
 getSparc="true"
 unset getIa64
 chkMirrorArchs sparc
 echo $?        # 0
                # True
 
 chkMirrorArchs Ia64
 echo $?        # 1
                # False
 
 # Notes:
 # -----
 # Even the to-be-substituted variable name part is built explicitly.
 # The parameters to the chkMirrorArchs calls are all lower case.
 # The variable name is composed of two parts: "get" and "Sparc" . . .

Example 9-23. Passing an indirect reference to awk

#!/bin/bash
 
 #  Another version of the "column totaler" script
 #+ that adds up a specified column (of numbers) in the target file.
 #  This one uses indirect references.
 
 ARGS=2
 E_WRONGARGS=65
 
 if [ $# -ne "$ARGS" ] # Check for proper no. of command line args.
 then
    echo "Usage: `basename $0` filename column-number"
    exit $E_WRONGARGS
 fi
 
 filename=$1
 column_number=$2
 
 #===== Same as original script, up to this point =====#
 
 
 # A multi-line awk script is invoked by   awk ' ..... '
 
 
 # Begin awk script.
 # ------------------------------------------------
 awk "
 
 { total += \$${column_number} # indirect reference
 }
 END {
      print total
      }
 
      " "$filename"
 # ------------------------------------------------
 # End awk script.
 
 #  Indirect variable reference avoids the hassles
 #+ of referencing a shell variable within the embedded awk script.
 #  Thanks, Stephane Chazelas.
 
 
 exit 0
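
A different technique, not the one used in the script above, is to hand the column number to awk with its -v option, so the awk program can stay inside single quotes. A sketch with made-up sample data:

```shell
#!/bin/bash
# Passing a shell value to awk with -v instead of embedding it in the program.

column_number=2

total=$(printf '%s\n' "1 10" "2 20" "3 30" |
        awk -v col="$column_number" '
          { total += $col }    # Field whose number is stored in col.
          END { print total }
        ')

echo "$total"     # 60
```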

$RANDOM: generate random integer

$RANDOM is an internal Bash function (not a constant) that returns a pseudorandom [1] integer in the range 0 - 32767. It should not be used to generate an encryption key.
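
A value in that range can be mapped onto an arbitrary inclusive interval with a single arithmetic expression. This is a sketch, with FLOOR and CEILING chosen arbitrarily. Since 32768 is generally not an exact multiple of the span, some values come up very slightly more often than others.

```shell
#!/bin/bash
# Random integer in an inclusive range.

FLOOR=200
CEILING=500

span=$(( CEILING - FLOOR + 1 ))        # Number of possible values.
number=$(( RANDOM % span + FLOOR ))    # FLOOR <= number <= CEILING

echo "Random number between $FLOOR and $CEILING ---  $number"

die=$(( RANDOM % 6 + 1 ))              # Modulo 6 gives 0 - 5; adding 1 gives 1 - 6.
echo "Toss of one die ---  $die"
```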

Example 9-24. Generating random numbers

#!/bin/bash
 
 # $RANDOM returns a different random integer at each invocation.
 # Nominal range: 0 - 32767 (signed 16-bit integer).
 
 MAXCOUNT=10
 count=1
 
 echo
 echo "$MAXCOUNT random numbers:"
 echo "-----------------"
 while [ "$count" -le $MAXCOUNT ]      # Generate 10 ($MAXCOUNT) random integers.
 do
   number=$RANDOM
   echo $number
   let "count += 1"  # Increment count.
 done
 echo "-----------------"
 
 # If you need a random int within a certain range, use the 'modulo' operator.
 # This returns the remainder of a division operation.
 
 RANGE=500
 
 echo
 
 number=$RANDOM
 let "number %= $RANGE"
 #           ^^
 echo "Random number less than $RANGE  ---  $number"
 
 echo
 
 #  If you need a random integer greater than a lower bound,
 #+ then set up a test to discard all numbers below that.
 
 FLOOR=200
 
 number=0   #initialize
 while [ "$number" -le $FLOOR ]
 do
   number=$RANDOM
 done
 echo "Random number greater than $FLOOR ---  $number"
 echo
 
 
 # Combine above two techniques to retrieve random number between two limits.
 number=0   #initialize
 while [ "$number" -le $FLOOR ]
 do
   number=$RANDOM
   let "number %= $RANGE"  # Scales $number down within $RANGE.
 done
 echo "Random number between $FLOOR and $RANGE ---  $number"
 echo
 
 
 
 # Generate binary choice, that is, "true" or "false" value.
 BINARY=2
 T=1
 number=$RANDOM
 
 let "number %= $BINARY"
 #  Note that    let "number >>= 14"    gives a better random distribution
 #+ (right shifts out everything except the highest of the 15 bits).
 if [ "$number" -eq $T ]
 then
   echo "TRUE"
 else
   echo "FALSE"
 fi  
 
 echo
 
 
 # Generate a toss of the dice.
 SPOTS=6   # Modulo 6 gives range 0 - 5.
           # Incrementing by 1 gives desired range of 1 - 6.
           # Thanks, Paulo Marcel Coelho Aragao, for the simplification.
 die1=0
 die2=0
 # Would it be better to just set SPOTS=7 and not add 1? Why or why not?
 
 # Tosses each die separately, and so gives correct odds.
 
     let "die1 = $RANDOM % $SPOTS +1" # Roll first one.
     let "die2 = $RANDOM % $SPOTS +1" # Roll second one.
     #  Which arithmetic operation, above, has greater precedence --
     #+ modulo (%) or addition (+)?
 
 
 let "throw = $die1 + $die2"
 echo "Throw of the dice = $throw"
 echo
 
 
 exit 0

Example 9-25. Picking a random card from a deck

#!/bin/bash
 # pick-card.sh
 
 # This is an example of choosing random elements of an array.
 
 
 # Pick a card, any card.
 
 Suites="Clubs
 Diamonds
 Hearts
 Spades"
 
 Denominations="2
 3
 4
 5
 6
 7
 8
 9
 10
 Jack
 Queen
 King
 Ace"
 
 # Note variables spread over multiple lines.
 
 
 suite=($Suites)                # Read into array variable.
 denomination=($Denominations)
 
 num_suites=${#suite[*]}        # Count how many elements.
 num_denominations=${#denomination[*]}
 
 echo -n "${denomination[$((RANDOM%num_denominations))]} of "
 echo ${suite[$((RANDOM%num_suites))]}
 
 
 # $bozo sh pick-card.sh
 # Jack of Clubs
 
 
 # Thank you, "jipe," for pointing out this use of $RANDOM.
 exit 0

Jipe points out a set of techniques for generating random numbers within a range.

#  Generate random number between 6 and 30.
 rnumber=$((RANDOM%25+6))
 
 #  Generate random number in the same 6 - 30 range,
 #+ but the number must be evenly divisible by 3.
 rnumber=$(((RANDOM%30/3+1)*3))
 
 # Note that this will not work all the time.
 # It fails if $RANDOM%30 returns 0, 1, or 2: the result is then 3,
 #+ which falls below the intended minimum of 6.
 
 #  Exercise: Try to figure out the pattern here.

Bill Gradwohl came up with an improved formula that works for positive numbers.

rnumber=$(((RANDOM%(max-min+divisibleBy))/divisibleBy*divisibleBy+min))
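
As a quick check, the formula can be exercised with the earlier 6 to 30 range and divisibleBy=3. The sketch below assumes that min and max are themselves multiples of divisibleBy; the helper name pick and the loop count of 1000 are arbitrary choices for the demonstration.

```shell
#!/bin/bash
# Sketch: exercising the formula with min=6, max=30, divisibleBy=3.
# Assumes min and max are themselves multiples of divisibleBy.

min=6
max=30
divisibleBy=3

pick () {   # Hypothetical helper; leaves the result in the global $rnumber.
  rnumber=$(((RANDOM%(max-min+divisibleBy))/divisibleBy*divisibleBy+min))
}

#  RANDOM%27 is 0 - 26; dividing and re-multiplying by 3 rounds that down
#+ to 0, 3, ... 24; adding min gives 6, 9, ... 30.
bad=0
for ((i=0; i<1000; i++))
do
  pick
  if (( rnumber < min || rnumber > max || rnumber % divisibleBy != 0 ))
  then
    bad=$((bad+1))
  fi
done

echo "Out-of-range or non-multiple results: $bad"   # 0
```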

Here Bill presents a versatile function that returns a random number between two specified values.

Example 9-26. Random between values

#!/bin/bash
 # random-between.sh
 # Random number between two specified values. 
 # Script by Bill Gradwohl, with minor modifications by the document author.
 # Used with permission.
 
 
 randomBetween() {
    #  Generates a positive or negative random number
    #+ between $min and $max
    #+ and divisible by $divisibleBy.
    #  Gives a "reasonably random" distribution of return values.
    #
    #  Bill Gradwohl - Oct 1, 2003
 
    syntax() {
    # Function embedded within function.
       echo
       echo    "Syntax: randomBetween [min] [max] [multiple]"
       echo
       echo    "Expects up to 3 passed parameters, but all are completely optional."
       echo    "min is the minimum value"
       echo    "max is the maximum value"
       echo    "multiple specifies that the answer must be a multiple of this value."
       echo    "    i.e. answer must be evenly divisible by this number."
       echo    
       echo    "If any value is missing, defaults are supplied as: 0 32767 1"
       echo    "Successful completion returns 0, unsuccessful completion returns"
       echo    "function syntax and 1."
       echo    "The answer is returned in the global variable randomBetweenAnswer"
       echo    "Negative values for any passed parameter are handled correctly."
    }
 
    local min=${1:-0}
    local max=${2:-32767}
    local divisibleBy=${3:-1}
    # Default values assigned, in case parameters not passed to function.
 
    local x
    local spread
 
    # Let's make sure the divisibleBy value is positive.
    [ ${divisibleBy} -lt 0 ] && divisibleBy=$((0-divisibleBy))
 
    # Sanity check.
    if [ $# -gt 3 -o ${divisibleBy} -eq 0 -o  ${min} -eq ${max} ]; then 
       syntax
       return 1
    fi
 
    # See if the min and max are reversed.
    if [ ${min} -gt ${max} ]; then
       # Swap them.
       x=${min}
       min=${max}
       max=${x}
    fi
 
    #  If min is itself not evenly divisible by $divisibleBy,
    #+ then fix the min to be within range.
    if [ $((min/divisibleBy*divisibleBy)) -ne ${min} ]; then 
       if [ ${min} -lt 0 ]; then
          min=$((min/divisibleBy*divisibleBy))
       else
          min=$((((min/divisibleBy)+1)*divisibleBy))
       fi
    fi
 
    #  If max is itself not evenly divisible by $divisibleBy,
    #+ then fix the max to be within range.
    if [ $((max/divisibleBy*divisibleBy)) -ne ${max} ]; then 
       if [ ${max} -lt 0 ]; then
          max=$((((max/divisibleBy)-1)*divisibleBy))
       else
          max=$((max/divisibleBy*divisibleBy))
       fi
    fi
 
    #  ---------------------------------------------------------------------
    #  Now, to do the real work.
 
    #  Note that to get a proper distribution for the end points,
    #+ the range of random values has to be allowed to go between
    #+ 0 and abs(max-min)+divisibleBy, not just abs(max-min)+1.
 
    #  The slight increase will produce the proper distribution for the
    #+ end points.
 
    #  Changing the formula to use abs(max-min)+1 will still produce
    #+ correct answers, but the randomness of those answers is faulty in
    #+ that the number of times the end points ($min and $max) are returned
    #+ is considerably lower than when the correct formula is used.
    #  ---------------------------------------------------------------------
 
    spread=$((max-min))
    [ ${spread} -lt 0 ] && spread=$((0-spread))
    let spread+=divisibleBy
    randomBetweenAnswer=$(((RANDOM%spread)/divisibleBy*divisibleBy+min))   
 
    return 0
 
    #  However, Paulo Marcel Coelho Aragao points out that
    #+ when $max and $min are not divisible by $divisibleBy,
    #+ the formula fails.
    #
    #  He suggests instead the following formula:
    #    rnumber = $(((RANDOM%(max-min+1)+min)/divisibleBy*divisibleBy))
 
 }
 
 # Let's test the function.
 min=-14
 max=20
 divisibleBy=3
 
 
 #  Generate an array of expected answers and check to make sure we get
 #+ at least one of each answer if we loop long enough.
 
 declare -a answer
 minimum=${min}
 maximum=${max}
    if [ $((minimum/divisibleBy*divisibleBy)) -ne ${minimum} ]; then 
       if [ ${minimum} -lt 0 ]; then
          minimum=$((minimum/divisibleBy*divisibleBy))
       else
          minimum=$((((minimum/divisibleBy)+1)*divisibleBy))
       fi
    fi
 
 
    #  If max is itself not evenly divisible by $divisibleBy,
    #+ then fix the max to be within range.
 
    if [ $((maximum/divisibleBy*divisibleBy)) -ne ${maximum} ]; then 
       if [ ${maximum} -lt 0 ]; then
          maximum=$((((maximum/divisibleBy)-1)*divisibleBy))
       else
          maximum=$((maximum/divisibleBy*divisibleBy))
       fi
    fi
 
 
 #  We need to generate only positive array subscripts,
 #+ so we need a displacement that will guarantee
 #+ positive results.
 
 displacement=$((0-minimum))
 for ((i=${minimum}; i<=${maximum}; i+=divisibleBy)); do
    answer[i+displacement]=0
 done
 
 
 # Now loop a large number of times to see what we get.
 loopIt=1000   #  The script author suggests 100000,
               #+ but that takes a good long while.
 
 for ((i=0; i<${loopIt}; ++i)); do
 
    #  Note that we are specifying min and max in reversed order here to
    #+ make the function correct for this case.
 
    randomBetween ${max} ${min} ${divisibleBy}
 
    # Report an error if an answer is unexpected.
    [ ${randomBetweenAnswer} -lt ${min} -o ${randomBetweenAnswer} -gt ${max} ] && echo MIN or MAX error - ${randomBetweenAnswer}!
    [ $((randomBetweenAnswer%${divisibleBy})) -ne 0 ] && echo DIVISIBLE BY error - ${randomBetweenAnswer}!
 
    # Store the answer away statistically.
    answer[randomBetweenAnswer+displacement]=$((answer[randomBetweenAnswer+displacement]+1))
 done
 
 
 
 # Let's check the results
 
 for ((i=${minimum}; i<=${maximum}; i+=divisibleBy)); do
    [ ${answer[i+displacement]} -eq 0 ] && echo "We never got an answer of $i." || echo "${i} occurred ${answer[i+displacement]} times."
 done
 
 
 exit 0

Just how random is $RANDOM? The best way to test this is to write a script that tracks the distribution of "random" numbers generated by $RANDOM. Let's roll a $RANDOM die a few times . . .

Example 9-27. Rolling a single die with RANDOM

#!/bin/bash
 # How random is RANDOM?
 
 RANDOM=$$       # Reseed the random number generator using script process ID.
 
 PIPS=6          # A die has 6 pips.
 MAXTHROWS=600   # Increase this if you have nothing better to do with your time.
 throw=0         # Throw count.
 
 ones=0          #  Must initialize counts to zero,
 twos=0          #+ since an uninitialized variable is null, not zero.
 threes=0
 fours=0
 fives=0
 sixes=0
 
 print_result ()
 {
 echo
 echo "ones =   $ones"
 echo "twos =   $twos"
 echo "threes = $threes"
 echo "fours =  $fours"
 echo "fives =  $fives"
 echo "sixes =  $sixes"
 echo
 }
 
 update_count()
 {
 case "$1" in
   0) let "ones += 1";;   # Since die has no "zero", this corresponds to 1.
   1) let "twos += 1";;   # And this to 2, etc.
   2) let "threes += 1";;
   3) let "fours += 1";;
   4) let "fives += 1";;
   5) let "sixes += 1";;
 esac
 }
 
 echo
 
 
 while [ "$throw" -lt "$MAXTHROWS" ]
 do
   let "die1 = RANDOM % $PIPS"
   update_count $die1
   let "throw += 1"
 done  
 
 print_result
 
 exit 0
 
 #  The scores should distribute fairly evenly, assuming RANDOM is fairly random.
 #  With $MAXTHROWS at 600, all should cluster around 100, plus-or-minus 20 or so.
 #
 #  Keep in mind that RANDOM is a pseudorandom generator,
 #+ and not a spectacularly good one at that.
 
 #  Randomness is a deep and complex subject.
 #  Sufficiently long "random" sequences may exhibit
 #+ chaotic and other "non-random" behavior.
 
 # Exercise (easy):
 # ---------------
 # Rewrite this script to flip a coin 1000 times.
 # Choices are "HEADS" and "TAILS".

As we have seen in the last example, it is best to "reseed" the RANDOM generator each time the script is run. Using the same seed for RANDOM repeats the same series of numbers. [2] (This mirrors the behavior of the random() function in C.)

Example 9-28. Reseeding RANDOM

#!/bin/bash
 # seeding-random.sh: Seeding the RANDOM variable.
 
 MAXCOUNT=25       # How many numbers to generate.
 
 random_numbers ()
 {
 count=0
 while [ "$count" -lt "$MAXCOUNT" ]
 do
   number=$RANDOM
   echo -n "$number "
   let "count += 1"
 done  
 }
 
 echo; echo
 
 RANDOM=1          # Setting RANDOM seeds the random number generator.
 random_numbers
 
 echo; echo
 
 RANDOM=1          # Same seed for RANDOM...
 random_numbers    # ...reproduces the exact same number series.
                   #
                   # When is it useful to duplicate a "random" number series?
 
 echo; echo
 
 RANDOM=2          # Trying again, but with a different seed...
 random_numbers    # gives a different number series.
 
 echo; echo
 
 # RANDOM=$$  seeds RANDOM from process id of script.
 # It is also possible to seed RANDOM from 'time' or 'date' commands.
 
 # Getting fancy...
 SEED=$(head -1 /dev/urandom | od -N 1 | awk '{ print $2 }')
 #  Pseudo-random output fetched
 #+ from /dev/urandom (system pseudo-random device-file),
 #+ then converted to line of printable (octal) numbers by "od",
 #+ finally "awk" retrieves just one number for SEED.
 RANDOM=$SEED
 random_numbers
 
 echo; echo
 
 exit 0

Note

The /dev/urandom device-file provides a method of generating much more "random" pseudorandom numbers than the $RANDOM variable. dd if=/dev/urandom of=targetfile bs=1 count=XX creates a file of well-scattered pseudorandom numbers. However, assigning these numbers to a variable in a script requires a workaround, such as filtering through od (as in above example and Example 12-13), or using dd (see Example 12-54), or even piping to md5sum (see Example 33-14).
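
The od workaround mentioned above might look something like the following sketch. It assumes the system has /dev/urandom and an od that accepts the -A, -N, and -t options; the function name urandom_u16 is hypothetical.

```shell
#!/bin/bash
# Sketch: read one unsigned 16-bit integer (0 - 65535) from /dev/urandom.
# Assumes /dev/urandom exists and "od" accepts -A, -N and -t.
# urandom_u16 is a hypothetical helper name.

urandom_u16 () {
  # -An: no address column; -N2: read 2 bytes; -tu2: one unsigned 2-byte int.
  # tr keeps only the digits, dropping the padding that od adds.
  od -An -N2 -tu2 /dev/urandom | tr -dc '0-9'
}

n=$(urandom_u16)
echo "Random 16-bit value: $n"
echo "Scaled to 1 - 100:   $(( n % 100 + 1 ))"
```

Unlike $RANDOM, the value here is not tied to a seed that the script itself sets, so the series cannot be replayed by reseeding.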

There are also other ways to generate pseudorandom numbers in a script. Awk provides a convenient means of doing this.

Example 9-29. Pseudorandom numbers, using awk

#!/bin/bash
 # random2.sh: Returns a pseudorandom number in the range 0 - 1.
 # Uses the awk rand() function.
 
 AWKSCRIPT=' { srand(); print rand() } '
 #            Command(s) / parameters passed to awk
 # Note that srand() reseeds awk's random number generator.
 
 
 echo -n "Random number between 0 and 1 = "
 
 echo | awk "$AWKSCRIPT"
 # What happens if you leave out the 'echo'?
 
 exit 0
 
 
 # Exercises:
 # ---------
 
 # 1) Using a loop construct, print out 10 different random numbers.
 #      (Hint: you must reseed the "srand()" function with a different seed
 #+     in each pass through the loop. What happens if you fail to do this?)
 
 # 2) Using an integer multiplier as a scaling factor, generate random numbers 
 #+   in the range between 10 and 100.
 
 # 3) Same as exercise #2, above, but generate random integers this time.

The date command also lends itself to generating pseudorandom integer sequences.
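
One way this might be done is to take the fast-changing digits of the clock and reduce them with the modulo operator. The sketch assumes GNU date for the %N (nanoseconds) format and falls back to %s (seconds since the epoch) otherwise; date_random is a hypothetical helper name. Clock-derived values are predictable, so this suits games and test data at best.

```shell
#!/bin/bash
# Sketch: using the clock as a source of "random-looking" integers.
# Assumes GNU date, whose %N format gives nanoseconds; falls back to
#+ %s (seconds since the epoch) if %N is not supported.

date_random () {            # Hypothetical helper name.
  local range=$1 stamp
  stamp=$(date +%N)
  case "$stamp" in
    ''|*[!0-9]*) stamp=$(date +%s) ;;   # %N unsupported: use seconds instead.
  esac
  # 10# forces base 10, since a leading zero would otherwise mean octal.
  echo $(( 10#$stamp % range ))
}

echo "Value in the range 0 - 99: $(date_random 100)"
```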

Notes

[1]

True "randomness," insofar as it exists at all, can only be found in certain incompletely understood natural phenomena such as radioactive decay. Computers can only simulate randomness, and computer-generated sequences of "random" numbers are therefore referred to as pseudorandom.

[2]

The seed of a computer-generated pseudorandom number series can be considered an identification label. For example, think of the pseudorandom series with a seed of 23 as series #23.

A property of a pseudorandom number series is the length of the cycle before it starts repeating itself. A good pseudorandom generator will produce series with very long cycles.

Loops

A loop is a block of code that iterates (repeats) a list of commands as long as the loop control condition is true.

for loops

for arg in [list]

This is the basic looping construct. It differs significantly from its C counterpart.

for arg in [list]
do
command(s)...
done

Note

During each pass through the loop, arg takes on the value of each successive variable in the list.

for arg in "$var1" "$var2" "$var3" ... "$varN"  
 # In pass 1 of the loop, arg = $var1	    
 # In pass 2 of the loop, arg = $var2	    
 # In pass 3 of the loop, arg = $var3	    
 # ...
 # In pass N of the loop, arg = $varN
 
 # Arguments in [list] quoted to prevent possible word splitting.

The argument list may contain wild cards.

If do is on same line as for, there needs to be a semicolon after list.

for arg in [list] ; do

Example 10-1. Simple for loops

#!/bin/bash
 # Listing the planets.
 
 for planet in Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune Pluto
 do
   echo $planet  # Each planet on a separate line.
 done
 
 echo
 
 for planet in "Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune Pluto"
 # All planets on same line.
 # Entire 'list' enclosed in quotes creates a single variable.
 do
   echo $planet
 done
 
 exit 0

Note

Each [list] element may contain multiple parameters. This is useful when processing parameters in groups. In such cases, use the set command (see Example 11-15) to force parsing of each [list] element and assignment of each component to the positional parameters.

Example 10-2. for loop with two parameters in each [list] element

#!/bin/bash
 # Planets revisited.
 
 # Associate the name of each planet with its distance from the sun.
 
 for planet in "Mercury 36" "Venus 67" "Earth 93"  "Mars 142" "Jupiter 483"
 do
   set -- $planet  # Parses variable "planet" and sets positional parameters.
   # the "--" prevents nasty surprises if $planet is null or begins with a dash.
 
   # May need to save original positional parameters, since they get overwritten.
   # One way of doing this is to use an array,
   #        original_params=("$@")
 
   echo "$1		$2,000,000 miles from the sun"
   #-------two  tabs---concatenate zeroes onto parameter $2
 done
 
 # (Thanks, S.C., for additional clarification.)
 
 exit 0

A variable may supply the [list] in a for loop.

Example 10-3. Fileinfo: operating on a file list contained in a variable

#!/bin/bash
 # fileinfo.sh
 
 FILES="/usr/sbin/accept
 /usr/sbin/pwck
 /usr/sbin/chroot
 /usr/bin/fakefile
 /sbin/badblocks
 /sbin/ypbind"     # List of files you are curious about.
                   # Threw in a dummy file, /usr/bin/fakefile.
 
 echo
 
 for file in $FILES
 do
 
   if [ ! -e "$file" ]       # Check if file exists.
   then
     echo "$file does not exist."; echo
     continue                # On to next.
   fi
 
   ls -l $file | awk '{ print $9 "         file size: " $5 }'  # Print 2 fields.
   whatis `basename $file`   # File info.
   # Note that the whatis database needs to have been set up for this to work.
   # To do this, as root run /usr/bin/makewhatis.
   echo
 done  
 
 exit 0

The [list] in a for loop may contain filename globbing, that is, wildcards used in filename expansion.

Example 10-4. Operating on files with a for loop

#!/bin/bash
 # list-glob.sh: Generating [list] in a for-loop using "globbing".
 
 echo
 
 for file in *
 do
   ls -l "$file"  # Lists all files in $PWD (current directory).
   # Recall that the wild card character "*" matches every filename,
   # however, in "globbing", it doesn't match dot-files.
 
   # If the pattern matches no file, it is expanded to itself.
   # To prevent this, set the nullglob option
   # (shopt -s nullglob).
   # Thanks, S.C.
 done
 
 echo; echo
 
 for file in [jx]*
 do
   rm -f $file    # Removes only files beginning with "j" or "x" in $PWD.
   echo "Removed file \"$file\"".
 done
 
 echo
 
 exit 0

Omitting the in [list] part of a for loop causes the loop to operate on $@ -- the list of arguments given on the command line to the script. A particularly clever illustration of this is Example A-16.

Example 10-5. Missing in [list] in a for loop

#!/bin/bash
 
 #  Invoke this script both with and without arguments,
 #+ and see what happens.
 
 for a
 do
  echo -n "$a "
 done
 
 #  The 'in list' missing, therefore the loop operates on '$@'
 #+ (command-line argument list, including whitespace).
 
 echo
 
 exit 0

It is possible to use command substitution to generate the [list] in a for loop. See also Example 12-48, Example 10-10 and Example 12-42.

Example 10-6. Generating the [list] in a for loop with command substitution

#!/bin/bash
 #  for-loopcmd.sh: for-loop with [list]
 #+ generated by command substitution.
 
 NUMBERS="9 7 3 8 37.53"
 
 for number in `echo $NUMBERS`  # for number in 9 7 3 8 37.53
 do
   echo -n "$number "
 done
 
 echo 
 exit 0

This is a somewhat more complex example of using command substitution to create the [list].

Example 10-7. A grep replacement for binary files

#!/bin/bash
 # bin-grep.sh: Locates matching strings in a binary file.
 
 # A "grep" replacement for binary files.
 # Similar effect to "grep -a"
 
 E_BADARGS=65
 E_NOFILE=66
 
 if [ $# -ne 2 ]
 then
   echo "Usage: `basename $0` search_string filename"
   exit $E_BADARGS
 fi
 
 if [ ! -f "$2" ]
 then
   echo "File \"$2\" does not exist."
   exit $E_NOFILE
 fi  
 
 
 IFS=$'\n'        # Newline only; per suggestion of Paulo Marcel Coelho Aragao.
 for word in $( strings "$2" | grep "$1" )
 # The "strings" command lists strings in binary files.
 # Output then piped to "grep", which tests for desired string.
 do
   echo $word
 done
 
 # As S.C. points out, lines 23 - 29 could be replaced with the simpler
 #    strings "$2" | grep "$1" | tr -s "$IFS" '[\n*]'
 
 
 # Try something like  "./bin-grep.sh mem /bin/ls"  to exercise this script.
 
 exit 0

More of the same.

Example 10-8. Listing all users on the system

#!/bin/bash
 # userlist.sh
 
 PASSWORD_FILE=/etc/passwd
 n=1           # User number
 
 for name in $(awk 'BEGIN{FS=":"}{print $1}' < "$PASSWORD_FILE" )
 # Field separator = :    ^^^^^^
 # Print first field              ^^^^^^^^
 # Get input from password file               ^^^^^^^^^^^^^^^^^
 do
   echo "USER #$n = $name"
   let "n += 1"
 done  
 
 
 # USER #1 = root
 # USER #2 = bin
 # USER #3 = daemon
 # ...
 # USER #30 = bozo
 
 exit 0
 
 #  Exercise:
 #  --------
 #  How is it that an ordinary user (or a script run by same)
 #+ can read /etc/passwd?
 #  Isn't this a security hole? Why or why not?

A final example of the [list] resulting from command substitution.

Example 10-9. Checking all the binaries in a directory for authorship

#!/bin/bash
 # findstring.sh:
 # Find a particular string in binaries in a specified directory.
 
 directory=/usr/bin/
 fstring="Free Software Foundation"  # See which files come from the FSF.
 
 for file in $( find $directory -type f -name '*' | sort )
 do
   strings -f $file | grep "$fstring" | sed -e "s%$directory%%"
   #  In the "sed" expression,
   #+ it is necessary to substitute for the normal "/" delimiter
   #+ because "/" happens to be one of the characters filtered out.
   #  Failure to do so gives an error message (try it).
 done  
 
 exit 0
 
 #  Exercise (easy):
 #  ---------------
 #  Convert this script to taking command-line parameters
 #+ for $directory and $fstring.

The output of a for loop may be piped to a command or commands.

Example 10-10. Listing the symbolic links in a directory

#!/bin/bash
 # symlinks.sh: Lists symbolic links in a directory.
 
 
 directory=${1-`pwd`}
 #  Defaults to current working directory,
 #+ if not otherwise specified.
 #  Equivalent to code block below.
 # ----------------------------------------------------------
 # ARGS=1                 # Expect one command-line argument.
 #
 # if [ $# -ne "$ARGS" ]  # If not 1 arg...
 # then
 #   directory=`pwd`      # current working directory
 # else
 #   directory=$1
 # fi
 # ----------------------------------------------------------
 
 echo "symbolic links in directory \"$directory\""
 
 for file in "$( find $directory -type l )"   # -type l = symbolic links
 do
   echo "$file"
 done | sort                                  # Otherwise file list is unsorted.
 #  Strictly speaking, a loop isn't really necessary here,
 #+ since the output of the "find" command is expanded into a single word.
 #  However, it's easy to understand and illustrative this way.
 
 #  As Dominik 'Aeneas' Schnitzer points out,
 #+ failing to quote  $( find $directory -type l )
 #+ will choke on filenames with embedded whitespace.
 #  Even this will only pick up the first field of each argument.
 
 exit 0
 
 
 # Jean Helou proposes the following alternative:
 
 echo "symbolic links in directory \"$directory\""
 # Backup of the current IFS. One can never be too cautious.
 OLDIFS=$IFS
 IFS=:
 
 for file in $(find $directory -type l -printf "%p$IFS")
 do     #                              ^^^^^^^^^^^^^^^^
   echo "$file"
 done | sort
 
 IFS=$OLDIFS    # Restore the original IFS.
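
Yet another approach to the embedded-whitespace problem is to have find separate names with NUL bytes and read them back one at a time. This is a sketch under assumptions: it needs a find that supports -print0 (GNU and BSD find do) and the Bash read -d option, and list_symlinks is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list symbolic links safely, even with whitespace in the names.
# Assumes a find that supports -print0 and Bash's "read -d".
# list_symlinks is a hypothetical helper name.

list_symlinks () {
  local directory=${1:-.} file
  # -print0 ends each name with a NUL byte; read -d '' splits on NUL.
  # IFS= and -r keep leading blanks and backslashes intact.
  find "$directory" -type l -print0 |
  while IFS= read -r -d '' file
  do
    echo "$file"
  done | sort
}

# Usage sketch, in a throwaway directory:
demo=$(mktemp -d)
touch "$demo/target"
ln -s target "$demo/link one"
ln -s target "$demo/link two"
list_symlinks "$demo"
rm -rf "$demo"
```

Since IFS is changed only for the read command itself, there is nothing to save and restore afterwards.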

The stdout of a loop may be redirected to a file, as this slight modification to the previous example shows.

Example 10-11. Symbolic links in a directory, saved to a file

#!/bin/bash
 # symlinks.sh: Lists symbolic links in a directory.
 
 OUTFILE=symlinks.list                         # save file
 
 directory=${1-`pwd`}
 #  Defaults to current working directory,
 #+ if not otherwise specified.
 
 
 echo "symbolic links in directory \"$directory\"" > "$OUTFILE"
 echo "---------------------------" >> "$OUTFILE"
 
 for file in "$( find $directory -type l )"    # -type l = symbolic links
 do
   echo "$file"
 done | sort >> "$OUTFILE"                     # stdout of loop
 #           ^^^^^^^^^^^^^                       redirected to save file.
 
 exit 0

There is an alternative syntax to a for loop that will look very familiar to C programmers. This requires double parentheses.

Example 10-12. A C-like for loop

#!/bin/bash
 # Two ways to count up to 10.
 
 echo
 
 # Standard syntax.
 for a in 1 2 3 4 5 6 7 8 9 10
 do
   echo -n "$a "
 done  
 
 echo; echo
 
 # +==========================================+
 
 # Now, let's do the same, using C-like syntax.
 
 LIMIT=10
 
 for ((a=1; a <= LIMIT ; a++))  # Double parentheses, and "LIMIT" with no "$".
 do
   echo -n "$a "
 done                           # A construct borrowed from 'ksh93'.
 
 echo; echo
 
 # +=========================================================================+
 
 # Let's use the C "comma operator" to increment two variables simultaneously.
 
 for ((a=1, b=1; a <= LIMIT ; a++, b++))  # The comma chains together operations.
 do
   echo -n "$a-$b "
 done
 
 echo; echo
 
 exit 0

See also Example 26-15, Example 26-16, and Example A-6.

---

Now, a for loop used in a "real-life" context.

Example 10-13. Using efax in batch mode

#!/bin/bash
 # Faxing (must have 'fax' installed).
 
 EXPECTED_ARGS=2
 E_BADARGS=65
 
 if [ $# -ne $EXPECTED_ARGS ]
 # Check for proper no. of command line args.
 then
    echo "Usage: `basename $0` phone# text-file"
    exit $E_BADARGS
 fi
 
 
 if [ ! -f "$2" ]
 then
   echo "File $2 is not a text file"
   exit $E_BADARGS
 fi
   
 
 fax make $2              # Create fax formatted files from text files.
 
 for file in $(ls $2.0*)  # Concatenate the converted files.
                          # Uses wild card in variable list.
 do
   fil="$fil $file"
 done  
 
 efax -d /dev/ttyS3 -o1 -t "T$1" $fil   # Do the work.
 
 
 # As S.C. points out, the for-loop can be eliminated with
 #    efax -d /dev/ttyS3 -o1 -t "T$1" $2.0*
 # but it's not quite as instructive [grin].
 
 exit 0

while

This construct tests for a condition at the top of a loop, and keeps looping as long as that condition is true (returns a 0 exit status). In contrast to a for loop, a while loop finds use in situations where the number of loop repetitions is not known beforehand.

while [condition]
do
command...
done

As is the case with for loops, placing the do on the same line as the condition test requires a semicolon.

while [condition] ; do

Note that certain specialized while loops, as, for example, a getopts construct, deviate somewhat from the standard template given here.
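
To illustrate the getopts case, here is a minimal sketch in which the loop condition is the getopts builtin itself rather than a bracketed test. The function parse_opts and its -v and -o options are invented for the example.

```shell
#!/bin/bash
# Sketch: a "while getopts" loop.
# parse_opts and its -v / -o options are hypothetical, for illustration only.

parse_opts () {
  local opt OPTIND=1        # Reset OPTIND so the function can be called again.
  verbose=0
  outfile=""
  while getopts "vo:" opt   # "o:" means -o takes an argument (in $OPTARG).
  do
    case "$opt" in
      v) verbose=1 ;;
      o) outfile=$OPTARG ;;
      *) echo "Usage: parse_opts [-v] [-o file]" >&2; return 1 ;;
    esac
  done
  return 0
}

parse_opts -v -o result.txt
echo "verbose=$verbose outfile=$outfile"   # verbose=1 outfile=result.txt
```

The loop ends when getopts runs out of options and returns a non-zero exit status, which is the same rule that governs any other while loop.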

Example 10-14. Simple while loop

#!/bin/bash
 
 var0=0
 LIMIT=10
 
 while [ "$var0" -lt "$LIMIT" ]
 do
   echo -n "$var0 "        # -n suppresses newline.
   #             ^           Space, to separate printed out numbers.
 
   var0=`expr $var0 + 1`   # var0=$(($var0+1))  also works.
                           # var0=$((var0 + 1)) also works.
                           # let "var0 += 1"    also works.
 done                      # Various other methods also work.
 
 echo
 
 exit 0

Example 10-15. Another while loop

#!/bin/bash
 
 echo
                                # Equivalent to:
 while [ "$var1" != "end" ]     # while test "$var1" != "end"
 do
   echo "Input variable #1 (end to exit) "
   read var1                    # Not 'read $var1' (why?).
   echo "variable #1 = $var1"   # Need quotes because of "#" . . .
   # If input is 'end', echoes it here.
   # Does not test for termination condition until top of loop.
   echo
 done  
 
 exit 0

A while loop may have multiple conditions. Only the final condition determines when the loop terminates. This necessitates a slightly different loop syntax, however.

Example 10-16. while loop with multiple conditions

#!/bin/bash
 
 var1=unset
 previous=$var1
 
 while echo "previous-variable = $previous"
       echo
       previous=$var1
       [ "$var1" != end ] # Keeps track of what $var1 was previously.
       # Four conditions on "while", but only last one controls loop.
       # The *last* exit status is the one that counts.
 do
 echo "Input variable #1 (end to exit) "
   read var1
   echo "variable #1 = $var1"
 done  
 
 # Try to figure out how this all works.
 # It's a wee bit tricky.
 
 exit 0

As with a for loop, a while loop may employ C-like syntax by using the double parentheses construct (see also Example 9-30).

Example 10-17. C-like syntax in a while loop

#!/bin/bash
 # wh-loopc.sh: Count to 10 in a "while" loop.
 
 LIMIT=10
 a=1
 
 while [ "$a" -le $LIMIT ]
 do
   echo -n "$a "
   let "a+=1"
 done           # No surprises, so far.
 
 echo; echo
 
 # +=================================================================+
 
 # Now, repeat with C-like syntax.
 
 ((a = 1))      # a=1
 # Double parentheses permit space when setting a variable, as in C.
 
 while (( a <= LIMIT ))   # Double parentheses, and no "$" preceding variables.
 do
   echo -n "$a "
   ((a += 1))   # let "a+=1"
   # Yes, indeed.
   # Double parentheses permit incrementing a variable with C-like syntax.
 done
 
 echo
 
 # Now, C programmers can feel right at home in Bash.
 
 exit 0

Note

A while loop may have its stdin redirected to a file by a < at its end.

A while loop may have its stdin supplied by a pipe.
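
Both forms can be sketched as follows. The data file is a throwaway created just for the demonstration.

```shell
#!/bin/bash
# Sketch: feeding a while loop from a file and from a pipe.
# The data file is a throwaway created just for this demonstration.

datafile=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$datafile"

# 1) stdin redirected from a file by a "<" after "done".
count=0
while read -r line
do
  count=$((count + 1))
done < "$datafile"
echo "Lines read from file: $count"      # 3

# 2) stdin supplied by a pipe.
#    Caution: Bash normally runs each part of a pipeline in a subshell,
#+   so variables set inside this loop do not survive past "done".
printf 'alpha\nbeta\ngamma\n' | while read -r line
do
  echo "piped: $line"
done

rm -f "$datafile"
```

When the loop has to update variables that are needed afterwards, the redirected form is usually the safer choice.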

until

This construct tests for a condition at the top of a loop, and keeps looping as long as that condition is false (opposite of while loop).

until [condition-is-true]
do
command...
done

Note that an until loop tests for the terminating condition at the top of the loop, differing from a similar construct in some programming languages.

As is the case with for loops, placing the do on the same line as the condition test requires a semicolon.

until [condition-is-true] ; do

Example 10-18. until loop

#!/bin/bash
 
 END_CONDITION=end
 
 until [ "$var1" = "$END_CONDITION" ]
 # Tests condition here, at top of loop.
 do
   echo "Input variable #1 "
   echo "($END_CONDITION to exit)"
   read var1
   echo "variable #1 = $var1"
   echo
 done  
 
 exit 0

Text Processing Commands

Commands affecting text and text files

sort

File sorter, often used as a filter in a pipe. This command sorts a text stream or file forwards or backwards, or according to various keys or character positions. Using the -m option, it merges presorted input files. The info page lists its many capabilities and options. See Example 10-9, Example 10-10, and Example A-8.
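
A small sketch of the -m option, using two made-up files that are already in sorted order:

```shell
#!/bin/bash
# Sketch: merging two presorted files with "sort -m".
# The file contents are made up for this demonstration.

a=$(mktemp)
b=$(mktemp)
printf 'apple\ncherry\n' > "$a"
printf 'banana\ndate\n'  > "$b"

# -m merges inputs that are already sorted, without re-sorting them.
# LC_ALL=C pins the collation order so the result does not depend on locale.
merged=$(LC_ALL=C sort -m "$a" "$b")
echo "$merged"
# apple
# banana
# cherry
# date

rm -f "$a" "$b"
```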

tsort

Topological sort, reading in pairs of whitespace-separated strings and sorting according to input patterns.
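
In practice, each input pair "A B" tells tsort that A must come before B, and the output is one ordering that respects every pair. A minimal sketch, with task names invented for the example:

```shell
#!/bin/bash
# Sketch: tsort reads pairs "A B", meaning "A comes before B",
#+ and prints one ordering consistent with all of the pairs.
# The task names are invented for the example.

order=$(printf '%s\n' \
  "configure compile" \
  "compile   test" \
  "test      install" | tsort)

echo "$order"
# configure
# compile
# test
# install
```

The pairs here form a single chain, so only one ordering is possible. With looser constraints, tsort is free to pick any valid order.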

uniq

This filter removes duplicate lines from a sorted file. It is often seen in a pipe coupled with sort.

cat list-1 list-2 list-3 | sort | uniq > final.list
 # Concatenates the list files,
 # sorts them,
 # removes duplicate lines,
 # and finally writes the result to an output file.

The useful -c option prefixes each line of the input file with its number of occurrences.

bash$ cat testfile
 This line occurs only once.
 This line occurs twice.
 This line occurs twice.
 This line occurs three times.
 This line occurs three times.
 This line occurs three times.
 
 
 bash$ uniq -c testfile
       1 This line occurs only once.
       2 This line occurs twice.
       3 This line occurs three times.
 
 
 bash$ sort testfile | uniq -c | sort -nr
       3 This line occurs three times.
       2 This line occurs twice.
       1 This line occurs only once.

The sort INPUTFILE | uniq -c | sort -nr command string produces a frequency of occurrence listing on the INPUTFILE file (the -nr options to sort cause a reverse numerical sort). This template finds use in analysis of log files and dictionary lists, and wherever the lexical structure of a document needs to be examined.
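
The leading sort in that template matters, because uniq only compares adjacent lines. A brief sketch with made-up data shows the difference:

```shell
words='apple
banana
apple
banana
apple'

# Without sort: no two identical lines are adjacent, so nothing is merged.
unsorted_lines=$(printf '%s\n' "$words" | uniq | wc -l)

# With sort: duplicates become adjacent and collapse to two distinct lines.
sorted_lines=$(printf '%s\n' "$words" | sort | uniq | wc -l)

# Frequency listing, most frequent first; keep only the top entry.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 1)

echo "uniq alone: $unsorted_lines lines;  sort | uniq: $sorted_lines lines"
echo "most frequent: $top"
```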

Example 12-11. Word Frequency Analysis

#!/bin/bash
 # wf.sh: Crude word frequency analysis on a text file.
 # This is a more efficient version of the "wf2.sh" script.
 
 
 # Check for input file on command line.
 ARGS=1
 E_BADARGS=65
 E_NOFILE=66
 
 if [ $# -ne "$ARGS" ]  # Correct number of arguments passed to script?
 then
   echo "Usage: `basename $0` filename"
   exit $E_BADARGS
 fi
 
 if [ ! -f "$1" ]       # Check if file exists.
 then
   echo "File \"$1\" does not exist."
   exit $E_NOFILE
 fi
 
 
 
 ########################################################
 # main ()
 sed -e 's/\.//g'  -e 's/\,//g' -e 's/ /\
 /g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
 #                           =========================
 #                            Frequency of occurrence
 
 #  Filter out periods and commas, and
 #+ change space between words to linefeed,
 #+ then shift characters to lowercase, and
 #+ finally prefix occurrence count and sort numerically.
 
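 #  Arun Giridhar suggests modifying the above to:
 #  . . . | sort | uniq -c | sort +1 [-f] | sort +0 -nr
 #  (Current versions of "sort" no longer accept the "+N" key syntax.
 #+ With GNU sort, a comparable spelling is
 #+ . . . | sort -k2 [-f] | sort -s -k1,1nr )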
 #  This adds a secondary sort key, so instances of
 #+ equal occurrence are sorted alphabetically.
 #  As he explains it:
 #  "This is effectively a radix sort, first on the
 #+ least significant column
 #+ (word or string, optionally case-insensitive)
 #+ and last on the most significant column (frequency)."
 ########################################################
 
 exit 0
 
 # Exercises:
 # ---------
 # 1) Add 'sed' commands to filter out other punctuation,
 #+   such as semicolons.
 # 2) Modify the script to also filter out multiple spaces and
 #    other whitespace.

bash$ cat testfile
 This line occurs only once.
  This line occurs twice.
  This line occurs twice.
  This line occurs three times.
  This line occurs three times.
  This line occurs three times.
 
 
 bash$ ./wf.sh testfile
       6 this
        6 occurs
        6 line
        3 times
        3 three
        2 twice
        1 only
        1 once
 	       

expand, unexpand

The expand filter converts tabs to spaces. It is often used in a pipe.

The unexpand filter converts spaces to tabs. This reverses the effect of expand.
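
A quick sketch of both filters on made-up one-line inputs. The -t option to expand sets the tab-stop spacing. By default, unexpand converts only leading blanks, using tab stops every 8 columns.

```shell
# expand: the tab after "a" advances to the next tab stop (every 4 columns here).
expanded=$(printf 'a\tb\n' | expand -t 4)
echo "[$expanded]"

# unexpand: eight leading spaces become a single tab.
unexpanded=$(printf '        x\n' | unexpand)
printf '%s\n' "$unexpanded" | od -c | head -n 1   # od -c makes the tab visible.
```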

cut

A tool for extracting fields from files. It is similar to the print $N command set in awk, but more limited. It may be simpler to use cut in a script than awk. Particularly important are the -d (delimiter) and -f (field specifier) options.

Using cut to obtain a listing of the mounted filesystems:
cat /etc/mtab | cut -d ' ' -f1,2

Using cut to list the OS and kernel version:
uname -a | cut -d" " -f1,3,11,12

Using cut to extract message headers from an e-mail folder:
bash$ grep '^Subject:' read-messages | cut -c10-80
 Re: Linux suitable for mission-critical apps?
  MAKE MILLIONS WORKING AT HOME!!!
  Spam complaint
  Re: Spam complaint

Using cut to parse a file:
# List all the users in /etc/passwd.
 
 FILENAME=/etc/passwd
 
 for user in $(cut -d: -f1 $FILENAME)
 do
   echo $user
 done
 
 # Thanks, Oleg Philon for suggesting this.

cut -d ' ' -f2,3 filename is equivalent to awk -F'[ ]' '{ print $2, $3 }' filename
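
The equivalence holds because both commands treat every single space as a separator. The sketch below, on a made-up input line with a doubled space, shows how this differs from awk with its default field splitting.

```shell
line='alpha  beta gamma'      # Two spaces between "alpha" and "beta".

# cut: every single space is a delimiter, so field 2 is the empty string
# between the two adjacent spaces.
cut_field=$(printf '%s\n' "$line" | cut -d ' ' -f2)

# awk with -F'[ ]' splits the same way as cut.
awk_single=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print $2 }')

# awk with its default separator collapses runs of whitespace.
awk_default=$(printf '%s\n' "$line" | awk '{ print $2 }')

echo "cut: [$cut_field]  awk -F'[ ]': [$awk_single]  awk default: [$awk_default]"
```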

See also Example 12-42.

paste

Tool for merging together different files into a single, multi-column file. In combination with cut, useful for creating system log files.
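
A minimal sketch of paste on two throwaway files. The file contents and variable names are invented for the example.

```shell
names=$(mktemp)
sizes=$(mktemp)
printf 'alpha\nbeta\n' > "$names"
printf '10\n20\n'      > "$sizes"

# Line N of the output is line N of each file, side by side.
# The default column delimiter is a tab; -d chooses another one.
merged=$(paste -d, "$names" "$sizes")
echo "$merged"

rm -f "$names" "$sizes"
```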

join

Consider this a special-purpose cousin of paste. This powerful utility allows merging two files in a meaningful fashion, which essentially creates a simple version of a relational database.

The join command operates on exactly two files, but pastes together only those lines with a common tagged field (usually a numerical label), and writes the result to stdout. The files to be joined should be sorted according to the tagged field for the matchups to work properly.

File: 1.data
 
 100 Shoes
 200 Laces
 300 Socks

File: 2.data
 
 100 $40.00
 200 $1.00
 300 $2.00

bash$ join 1.data 2.data
 File: 1.data 2.data
 
  100 Shoes $40.00
  200 Laces $1.00
  300 Socks $2.00
 	      

Note

The tagged field appears only once in the output.
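
Since join expects sorted input, a script usually sorts both files first. The sketch below does so with temporary files. All file names and contents are illustrative.

```shell
items=$(mktemp);        prices=$(mktemp)
items_sorted=$(mktemp); prices_sorted=$(mktemp)

printf '300 Socks\n100 Shoes\n200 Laces\n'  > "$items"    # Deliberately unsorted.
printf '200 $1.00\n300 $2.00\n100 $40.00\n' > "$prices"

# join matches on field 1 by default, and both inputs must be sorted on it.
sort "$items"  > "$items_sorted"
sort "$prices" > "$prices_sorted"

joined=$(join "$items_sorted" "$prices_sorted")
echo "$joined"

rm -f "$items" "$prices" "$items_sorted" "$prices_sorted"
```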

head

Lists the beginning of a file to stdout (the default is the first 10 lines, but this can be changed). It has a number of interesting options.
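
Two of those options in a short sketch, fed from stdin rather than a file: -n sets the number of lines, and -c (supported by GNU, BSD and BusyBox head, though not required by POSIX) counts bytes instead.

```shell
# First two lines of a four-line stream.
first_two=$(printf '%s\n' a b c d | head -n 2)

# First three bytes of a stream.
first_bytes=$(printf 'abcdef' | head -c 3)

echo "$first_two"
echo "$first_bytes"
```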

Example 12-12. Which files are scripts?

#!/bin/bash
 # script-detector.sh: Detects scripts within a directory.
 
 TESTCHARS=2    # Test first 2 characters.
 SHABANG='#!'   # Scripts begin with a "sha-bang."
 
 for file in *  # Traverse all the files in current directory.
 do
   if [[ `head -c$TESTCHARS "$file"` = "$SHABANG" ]]
   #      head -c2                      #!
   #  The '-c' option to "head" outputs a specified
   #+ number of characters, rather than lines (the default).
   then
     echo "File \"$file\" is a script."
   else
     echo "File \"$file\" is *not* a script."
   fi
 done
   
 exit 0
 
 #  Exercises:
 #  ---------
 #  1) Modify this script to take as an optional argument
 #+    the directory to scan for scripts
 #+    (rather than just the current working directory).
 #
 #  2) As it stands, this script gives "false positives" for
 #+    Perl, awk, and other scripting language scripts.
 #     Correct this.

Example 12-13. Generating 10-digit random numbers

#!/bin/bash
 # rnd.sh: Outputs a 10-digit random number
 
 # Script by Stephane Chazelas.
 
 head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
 
 
 # =================================================================== #
 
 # Analysis
 # --------
 
 # head:
 # -c4 option takes first 4 bytes.
 
 # od:
 # -N4 option limits output to 4 bytes.
 # -tu4 option selects unsigned decimal format for output.
 
 # sed: 
 # -n option, in combination with "p" flag to the "s" command,
 # outputs only matched lines.
 
 
 
 # The author of this script explains the action of 'sed', as follows.
 
 # head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
 # ----------------------------------> |
 
 # Assume output up to "sed" --------> |
 # is 0000000 1198195154\n
 
 #  sed begins reading characters: 0000000 1198195154\n.
 #  Here it finds a newline character,
 #+ so it is ready to process the first line (0000000 1198195154).
 #  It looks at its <range><action>s. The first and only one is
 
 #   range     action
 #   1         s/.* //p
 
 #  The line number is in the range, so it executes the action:
 #+ tries to substitute the longest string ending with a space in the line
 #  ("0000000 ") with nothing (//), and if it succeeds, prints the result
 #  ("p" is a flag to the "s" command here, this is different from the "p" command).
 
 #  sed is now ready to continue reading its input. (Note that before
 #+ continuing, if -n option had not been passed, sed would have printed
 #+ the line once again).
 
 # Now, sed reads the remainder of the characters, and finds the end of the file.
 # It is now ready to process its 2nd line (which is also numbered '$' as
 # it's the last one).
 # It sees it is not matched by any <range>, so its job is done.
 
 #  In a few words, this sed command means:
 #  "On the first line only, remove any character up to the right-most space,
 #+ then print it."
 
 # A better way to do this would have been:
 #           sed -e 's/.* //;q'
 
 # Here, two <range><action>s (could have been written
 #           sed -e 's/.* //' -e q):
 
 #   range                    action
 #   nothing (matches line)   s/.* //
 #   nothing (matches line)   q (quit)
 
 #  Here, sed only reads its first line of input.
 #  It performs both actions, and prints the line (substituted) before quitting
 #+ (because of the "q" action) since the "-n" option is not passed.
 
 # =================================================================== #
 
 # An even simpler alternative to the above one-line script would be:
 #           head -c4 /dev/urandom| od -An -tu4
 
 exit 0

See also Example 12-35.

tail

Lists the end of a file to stdout (the default is the last 10 lines). It is commonly used to keep track of changes to a system logfile by means of the -f option, which outputs lines as they are appended to the file.

Example 12-14. Using tail to monitor the system log

#!/bin/bash
 
 filename=sys.log
 
 cat /dev/null > $filename; echo "Creating / cleaning out file."
 #  Creates file if it does not already exist,
 #+ and truncates it to zero length if it does.
 #  : > filename   and   > filename also work.
 
 tail /var/log/messages > $filename  
 # /var/log/messages must have world read permission for this to work.
 
 echo "$filename contains tail end of system log."
 
 exit 0

Tip

To list a specific line of a text file, pipe the output of head to tail -1. For example head -8 database.txt | tail -1 lists the 8th line of the file database.txt.

To set a variable to a given block of a text file:
var=$(head -$m $filename | tail -$n)
 
 # filename = name of file
 # m = from beginning of file, number of lines to end of block
 # n = number of lines to set variable to (trim from end of block)
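
A worked sketch of both tips on a ten-line throwaway file. The helper name print_block is hypothetical, and -n is the POSIX spelling of the "-8" / "-1" shorthand used above.

```shell
# print_block FILE M N: print the N lines that end at line M of FILE.
# (print_block is an illustrative helper, not a standard command.)
print_block () {
  head -n "$2" "$1" | tail -n "$3"
}

numbers=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$numbers"   # Lines "1" through "10".

eighth=$(head -n 8 "$numbers" | tail -n 1)    # Line 8 only.
block=$(print_block "$numbers" 6 3)           # Lines 4, 5 and 6.

echo "line 8: $eighth"
echo "block:"
echo "$block"

rm -f "$numbers"
```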

See also Example 12-5, Example 12-35 and Example 29-6.

grep

A multi-purpose file search tool that uses Regular Expressions. It was originally a command/filter in the venerable ed line editor: g/re/p -- global - regular expression - print.

grep pattern [file...]

Search the target file(s) for occurrences of pattern, where pattern may be literal text or a Regular Expression.

bash$ grep '[rst]ystem.$' osinfo.txt
 The GPL governs the distribution of the Linux operating system.
 	      

If no target file is specified, grep works as a filter on stdin, as in a pipe.

bash$ ps ax | grep clock
 765 tty1     S      0:00 xclock
  901 pts/1    S      0:00 grep clock
 	      

The -i option causes a case-insensitive search.

The -w option matches only whole words.

The -l option lists only the files in which matches were found, but not the matching lines.

The -r (recursive) option searches files in the current working directory and all subdirectories below it.

The -n option lists the matching lines, together with line numbers.

bash$ grep -n Linux osinfo.txt
 2:This is a file containing information about Linux.
  6:The GPL governs the distribution of the Linux operating system.
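
The effect of these options is easiest to see side by side. The sketch below runs them against a made-up three-line file and uses the count option (-c) only to keep the results short.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
Linux is a kernel.
The linuxconf tool is long gone.
GNU is not Unix.
EOF

case_sensitive=$(grep -c 'linux' "$sample")      # Only the "linuxconf" line.
case_insensitive=$(grep -ci 'linux' "$sample")   # "Linux" and "linuxconf" lines.
whole_word=$(grep -ciw 'linux' "$sample")        # "linuxconf" is not a whole word.
numbered=$(grep -n 'GNU' "$sample")              # Match prefixed by its line number.

echo "$case_sensitive $case_insensitive $whole_word"
echo "$numbered"

rm -f "$sample"
```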
 	      

The -v (or --invert-match) option filters out matches.
grep pattern1 *.txt | grep -v pattern2
 
 # Matches all lines in "*.txt" files containing "pattern1",
 # but ***not*** "pattern2".	      

The -c (--count) option gives a numerical count of matching lines, rather than actually listing them.
grep -c txt *.sgml   # (number of lines containing "txt" in each "*.sgml" file)
 
 
 #   grep -cz .
 #            ^ dot
 # means count (-c) zero-separated (-z) items matching "."
 # that is, non-empty ones (containing at least 1 character).
 # 
 printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz .     # 4
 printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '$'   # 5
 printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '^'   # 5
 #
 printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -c '$'    # 9
 # By default, newline chars (\n) separate items to match. 
 
 # Note that the -z option is GNU "grep" specific.
 
 
 # Thanks, S.C.

When invoked with more than one target file given, grep specifies which file contains matches.

bash$ grep Linux osinfo.txt misc.txt
 osinfo.txt:This is a file containing information about Linux.
  osinfo.txt:The GPL governs the distribution of the Linux operating system.
  misc.txt:The Linux operating system is steadily gaining in popularity.
 	      

Tip

To force grep to show the filename when searching only one target file, simply give /dev/null as the second file.

bash$ grep Linux osinfo.txt /dev/null
 osinfo.txt:This is a file containing information about Linux.
  osinfo.txt:The GPL governs the distribution of the Linux operating system.
 	      

If there is a successful match, grep returns an exit status of 0, which makes it useful in a condition test in a script, especially in combination with the -q option to suppress output.
SUCCESS=0                      # if grep lookup succeeds
 word=Linux
 filename=data.file
 
 grep -q "$word" "$filename"    # The "-q" option causes nothing to echo to stdout.
 
 if [ $? -eq $SUCCESS ]
 # if grep -q "$word" "$filename"   can replace lines 5 - 7.
 then
   echo "$word found in $filename"
 else
   echo "$word not found in $filename"
 fi

Example 29-6 demonstrates how to use grep to search for a word pattern in a system logfile.

Example 12-15. Emulating "grep" in a script

#!/bin/bash
 # grp.sh: Very crude reimplementation of 'grep'.
 
 E_BADARGS=65
 
 if [ -z "$1" ]    # Check for argument to script.
 then
   echo "Usage: `basename $0` pattern"
   exit $E_BADARGS
 fi  
 
 echo
 
 for file in *     # Traverse all files in $PWD.
 do
   output=$(sed -n /"$1"/p "$file")  # Command substitution.
 
   if [ ! -z "$output" ]           # What happens if "$output" is not quoted?
   then
     echo -n "$file: "
     echo $output
   fi              #  sed -ne "/$1/s|^|${file}: |p"  is equivalent to above.
 
   echo
 done  
 
 echo
 
 exit 0
 
 # Exercises:
 # ---------
 # 1) Add newlines to output, if more than one match in any given file.
 # 2) Add features.

How can grep search for two (or more) separate patterns? What if you want grep to display all lines in a file or files that contain both "pattern1" and "pattern2"?

One method is to pipe the result of grep pattern1 to grep pattern2.

For example, given the following file:

# Filename: tstfile
 
 This is a sample file.
 This is an ordinary text file.
 This file does not contain any unusual text.
 This file is not unusual.
 Here is some text.

Now, let's search this file for lines containing both "file" and "text" . . .

bash$ grep file tstfile
 # Filename: tstfile
  This is a sample file.
  This is an ordinary text file.
  This file does not contain any unusual text.
  This file is not unusual.
 
 bash$ grep file tstfile | grep text
 This is an ordinary text file.
  This file does not contain any unusual text.
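
The pipe is not the only way to require both patterns. As a sketch under the same sample data, the awk and grep -E variants below are alternatives offered for comparison, not part of the original example.

```shell
tstfile=$(mktemp)
cat > "$tstfile" <<'EOF'
This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.
Here is some text.
EOF

# Method from the text: filter twice.
piped=$(grep file "$tstfile" | grep text)

# awk: both patterns must match the same line.
with_awk=$(awk '/file/ && /text/' "$tstfile")

# Single extended regex: either order of the two words.
with_ere=$(grep -E 'file.*text|text.*file' "$tstfile")

echo "$piped"
rm -f "$tstfile"
```

The piped form scales most easily to three or more patterns, since each extra pattern is just one more grep in the pipe.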

--

egrep - extended grep - is the same as grep -E. This uses a somewhat different, extended set of Regular Expressions, which can make the search a bit more flexible.

fgrep - fixed-string grep, often glossed as "fast grep" - is the same as grep -F. It does a literal string search (no Regular Expressions), which usually speeds things up a bit.

Note

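On some Linux distros, egrep and fgrep are symbolic links to, or aliases for grep, but invoked with the -E and -F options, respectively. Recent GNU grep releases treat both names as obsolescent and print a warning, so new scripts are better off calling grep -E and grep -F directly.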
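
A short sketch of what the two modes change, using grep -E and grep -F on made-up input. The counts come from the -c option.

```shell
lines='version 1.5
version 105
release 2.0'

# -E: extended regex, so "|" means alternation without a backslash.
ere_count=$(printf '%s\n' "$lines" | grep -cE 'version|release')

# Plain grep: the unescaped dot matches any character, so "105" matches too.
bre_count=$(printf '%s\n' "$lines" | grep -c '1.5')

# -F: fixed string, so the dot is taken literally.
fixed_count=$(printf '%s\n' "$lines" | grep -cF '1.5')

echo "ERE: $ere_count   basic: $bre_count   fixed: $fixed_count"
```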

Example 12-16. Looking up definitions in Webster's 1913 Dictionary

#!/bin/bash
 # dict-lookup.sh
 
 #  This script looks up definitions in the 1913 Webster's Dictionary.
 #  This Public Domain dictionary is available for download
 #+ from various sites, including
 #+ Project Gutenberg (http://www.gutenberg.org/etext/247).
 #
 #  Convert it from DOS to UNIX format (only LF at end of line)
 #+ before using it with this script.
 #  Store the file in plain, uncompressed ASCII.
 #  Set DEFAULT_DICTFILE variable below to path/filename.
 
 
 E_BADARGS=65
 MAXCONTEXTLINES=50                        # Maximum number of lines to show.
 DEFAULT_DICTFILE="/usr/share/dict/webster1913-dict.txt"
                                           # Default dictionary file pathname.
                                           # Change this as necessary.
 #  Note:
 #  ----
 #  This particular edition of the 1913 Webster's
 #+ begins each entry with an uppercase letter
 #+ (lowercase for the remaining characters).
 #  Only the *very first line* of an entry begins this way,
 #+ and that's why the search algorithm below works.
 
 
 
 if [[ -z $(echo "$1" | sed -n '/^[A-Z]/p') ]]
 #  Must at least specify word to look up, and
 #+ it must start with an uppercase letter.
 then
   echo "Usage: `basename $0` Word-to-define [dictionary-file]"
   echo
   echo "Note: Word to look up must start with capital letter,"
   echo "with the rest of the word in lowercase."
   echo "--------------------------------------------"
   echo "Examples: Abandon, Dictionary, Marking, etc."
   exit $E_BADARGS
 fi
 
 
 if [ -z "$2" ]                            #  May specify different dictionary
                                           #+ as an argument to this script.
 then
   dictfile=$DEFAULT_DICTFILE
 else
   dictfile="$2"
 fi
 
 # ---------------------------------------------------------
 Definition=$(fgrep -A $MAXCONTEXTLINES "$1 \\" "$dictfile")
 #                  Definitions in form "Word \..."
 #
 #  And, yes, "fgrep" is fast enough
 #+ to search even a very large text file.
 
 
 # Now, snip out just the definition block.
 
 echo "$Definition" |
 sed -n '1,/^[A-Z]/p' |
 #  Print from first line of output
 #+ to the first line of the next entry.
 sed '$d' | sed '$d'
 #  Delete last two lines of output
 #+ (blank line and first line of next entry).
 # ---------------------------------------------------------
 
 exit 0
 
 # Exercises:
 # ---------
 # 1)  Modify the script to accept any type of alphabetic input
 #   + (uppercase, lowercase, mixed case), and convert it
 #   + to an acceptable format for processing.
 #
 # 2)  Convert the script to a GUI application,
 #   + using something like "gdialog" . . .
 #     The script will then no longer take its argument(s)
 #   + from the command line.
 #
 # 3)  Modify the script to parse one of the other available
 #   + Public Domain Dictionaries, such as the U.S. Census Bureau Gazetteer.

agrep (approximate grep) extends the capabilities of grep to approximate matching. The search string may differ by a specified number of characters from the resulting matches. This utility is not part of the core Linux distribution.

Tip

To search compressed files, use zgrep, zegrep, or zfgrep. These also work on non-compressed files, though slower than plain grep, egrep, fgrep. They are handy for searching through a mixed set of files, some compressed, some not.

To search bzipped files, use bzgrep.

look

The command look works like grep, but does a lookup on a "dictionary", a sorted word list. By default, look searches for a match in /usr/dict/words (on many current Linux systems the word list is /usr/share/dict/words instead), but a different dictionary file may be specified.

Example 12-17. Checking words in a list for validity

#!/bin/bash
 # lookup: Does a dictionary lookup on each word in a data file.
 
 file=words.data  # Data file from which to read words to test.
 
 echo
 
 while [ "$word" != end ]  # Last word in data file.
 do
   read word      # From data file, because of redirection at end of loop.
   look $word > /dev/null  # Don't want to display lines in dictionary file.
   lookup=$?      # Exit status of 'look' command.
 
   if [ "$lookup" -eq 0 ]
   then
     echo "\"$word\" is valid."
   else
     echo "\"$word\" is invalid."
   fi  
 
 done <"$file"    # Redirects stdin to $file, so "reads" come from there.
 
 echo
 
 exit 0
 
 # ----------------------------------------------------------------
 # Code below line will not execute because of "exit" command above.
 
 
 # Stephane Chazelas proposes the following, more concise alternative:
 
 while read word && [[ $word != end ]]
 do if look "$word" > /dev/null
    then echo "\"$word\" is valid."
    else echo "\"$word\" is invalid."
    fi
 done <"$file"
 
 exit 0

sed, awk

Scripting languages especially suited for parsing text files and command output. May be embedded singly or in combination in pipes and shell scripts.

sed

Non-interactive "stream editor", permits using many ex commands in batch mode. It finds many uses in shell scripts.

awk

Programmable file extractor and formatter, good for manipulating and/or extracting fields (columns) in structured text files. Its syntax is similar to C.
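
To give a feel for the division of labor, here is a minimal sketch on a made-up two-column report: sed edits the text of each line, while awk works on fields.

```shell
report='alice 10
bob 20
carol 30'

# sed: apply a substitution to every line of the stream.
renamed=$(printf '%s\n' "$report" | sed 's/bob/robert/')

# awk: treat each line as fields ($1, $2, ...) and total the second column.
total=$(printf '%s\n' "$report" | awk '{ sum += $2 } END { print sum }')

echo "$renamed"
echo "total: $total"
```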

wc

wc gives a "word count" on a file or I/O stream:
bash$ wc /usr/share/doc/sed-4.1.2/README
 13  70  447 README
 [13 lines  70 words  447 characters]

wc -w gives only the word count.

wc -l gives only the line count.

wc -c gives only the byte count.

wc -m gives only the character count.

wc -L gives only the length of the longest line.
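
A small sketch of the first three options on a made-up two-line text. Feeding wc from stdin keeps the filename out of the output, which makes the result easier to use in a variable.

```shell
text='one two three
four five'

lines=$(printf '%s\n' "$text" | wc -l)   # Newline characters.
words=$(printf '%s\n' "$text" | wc -w)   # Whitespace-separated words.
bytes=$(printf '%s\n' "$text" | wc -c)   # Bytes, newlines included.

echo "lines: $lines   words: $words   bytes: $bytes"
```

Some versions of wc pad their output with leading spaces, so numeric comparisons (-eq) are safer than string comparisons on these values.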

Using wc to count how many .txt files are in current working directory:
$ ls *.txt | wc -l
 # Will work as long as none of the "*.txt" files have a linefeed in their name.
 
 # Alternative ways of doing this are:
 #      find . -maxdepth 1 -name \*.txt -print0 | grep -cz .
 #      (shopt -s nullglob; set -- *.txt; echo $#)
 
 # Thanks, S.C.

Using wc to total up the size of all the files whose names begin with letters in the range d - h
bash$ wc [d-h]* | grep total | awk '{print $3}'
 71832
 	      

Using wc to count the lines containing the word "Linux" in the main source file for this book.
bash$ grep Linux abs-book.sgml | wc -l
 50
 	      

See also Example 12-35 and Example 16-8.

Certain commands include some of the functionality of wc as options.
... | grep foo | wc -l
 # This frequently used construct can be more concisely rendered.
 
 ... | grep -c foo
 # Just use the "-c" (or "--count") option of grep.
 
 # Thanks, S.C.

tr

Character translation filter.

Caution

Must use quoting and/or brackets, as appropriate. Quotes prevent the shell from reinterpreting the special characters in tr command sequences. Brackets should be quoted to prevent expansion by the shell.

Either tr "A-Z" "*" <filename or tr A-Z \* <filename changes all the uppercase letters in filename to asterisks (writes to stdout). On some systems this may not work, but tr A-Z '[**]' will.

The -d option deletes a range of characters.
echo "abcdef"                 # abcdef
 echo "abcdef" | tr -d b-d     # aef
 
 
 tr -d 0-9 <filename
 # Deletes all digits from the file "filename".

The --squeeze-repeats (or -s) option deletes all but the first instance of a string of consecutive characters. This option is useful for removing excess whitespace.
bash$ echo "XXXXX" | tr --squeeze-repeats 'X'
 X

The -c "complement" option inverts the character set to match. With this option, tr acts only upon those characters not matching the specified set.

bash$ echo "acfdeb123" | tr -c b-d +
 +c+d+b++++

Note that tr recognizes POSIX character classes. [1]

bash$ echo "abcd2ef1" | tr '[:alpha:]' -
 ----2--1
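
These options can be combined. The sketch below squeezes runs of spaces, then uses -c and -d together to keep only digits. The input strings are invented, and the quoted set relies on tr accepting a character class alongside the \n escape.

```shell
# -s: squeeze each run of spaces down to a single space.
squeezed=$(echo "too    many     spaces" | tr -s ' ')

# -c with -d: delete everything that is NOT a digit or a newline.
digits_only=$(echo "tel: 555-0100" | tr -cd '[:digit:]\n')

echo "$squeezed"
echo "$digits_only"
```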
 	      

Example 12-18. toupper: Transforms a file to all uppercase.

#!/bin/bash
 # Changes a file to all uppercase.
 
 E_BADARGS=65
 
 if [ -z "$1" ]  # Standard check for command line arg.
 then
   echo "Usage: `basename $0` filename"
   exit $E_BADARGS
 fi  
 
 tr a-z A-Z <"$1"
 
 # Same effect as above, but using POSIX character set notation:
 #        tr '[:lower:]' '[:upper:]' <"$1"
 # Thanks, S.C.
 
 exit 0
 
 #  Exercise:
 #  Rewrite this script to give the option of changing a file
 #+ to *either* upper or lowercase.

Example 12-19. lowercase: Changes all filenames in working directory to lowercase.

#!/bin/bash
 #
 #  Changes every filename in working directory to all lowercase.
 #
 #  Inspired by a script of John Dubois,
 #+ which was translated into Bash by Chet Ramey,
 #+ and considerably simplified by the author of the ABS Guide.
 
 
 for filename in *                # Traverse all files in directory.
 do
    fname=`basename $filename`
    n=`echo $fname | tr A-Z a-z`  # Change name to lowercase.
    if [ "$fname" != "$n" ]       # Rename only files not already lowercase.
    then
      mv $fname $n
    fi  
 done   
 
 exit $?
 
 
 # Code below this line will not execute because of "exit".
 #--------------------------------------------------------#
 # To run it, delete script above line.
 
 # The above script will not work on filenames containing blanks or newlines.
 # Stephane Chazelas therefore suggests the following alternative:
 
 
 for filename in *    # Not necessary to use basename,
                      # since "*" won't return any file containing "/".
 do n=`echo "$filename/" | tr '[:upper:]' '[:lower:]'`
 #                             POSIX char set notation.
 #                    Slash added so that trailing newlines are not
 #                    removed by command substitution.
    # Variable substitution:
    n=${n%/}          # Removes trailing slash, added above, from filename.
    [[ $filename == $n ]] || mv "$filename" "$n"
                      # Checks if filename already lowercase.
 done
 
 exit $?

Example 12-20. Du: DOS to UNIX text file conversion.

#!/bin/bash
 # Du.sh: DOS to UNIX text file converter.
 
 E_WRONGARGS=65
 
 if [ -z "$1" ]
 then
   echo "Usage: `basename $0` filename-to-convert"
   exit $E_WRONGARGS
 fi
 
 NEWFILENAME=$1.unx
 
 CR='\015'  # Carriage return.
            # 015 is octal ASCII code for CR.
            # Lines in a DOS text file end in CR-LF.
            # Lines in a UNIX text file end in LF only.
 
 tr -d "$CR" < "$1" > "$NEWFILENAME"
 # Delete CR's and write to new file.
 
 echo "Original DOS text file is \"$1\"."
 echo "Converted UNIX text file is \"$NEWFILENAME\"."
 
 exit 0
 
 # Exercise:
 # --------
 # Change the above script to convert from UNIX to DOS.

Example 12-21. rot13: rot13, ultra-weak encryption.

#!/bin/bash
 # rot13.sh: Classic rot13 algorithm,
 #           encryption that might fool a 3-year old.
 
 # Usage: ./rot13.sh filename
 # or     ./rot13.sh <filename
 # or     ./rot13.sh and supply keyboard input (stdin)
 
 cat "$@" | tr 'a-zA-Z' 'n-za-mN-ZA-M'   # "a" goes to "n", "b" to "o", etc.
 #  The 'cat "$@"' construction
 #+ permits getting input either from stdin or from files.
 
 exit 0

Example 12-22. Generating "Crypto-Quote" Puzzles

#!/bin/bash
 # crypto-quote.sh: Encrypt quotes
 
 #  Will encrypt famous quotes in a simple monoalphabetic substitution.
 #  The result is similar to the "Crypto Quote" puzzles
 #+ seen in the Op Ed pages of the Sunday paper.
 
 
 key=ETAOINSHRDLUBCFGJMQPVWZYXK
 # The "key" is nothing more than a scrambled alphabet.
 # Changing the "key" changes the encryption.
 
 # The 'cat "$@"' construction gets input either from stdin or from files.
 # If using stdin, terminate input with a Control-D.
 # Otherwise, specify filename as command-line parameter.
 
 cat "$@" | tr "a-z" "A-Z" | tr "A-Z" "$key"
 #        |  to uppercase  |     encrypt       
 # Will work on lowercase, uppercase, or mixed-case quotes.
 # Passes non-alphabetic characters through unchanged.
 
 
 # Try this script with something like:
 # "Nothing so needs reforming as other people's habits."
 # --Mark Twain
 #
 # Output is:
 # "CFPHRCS QF CIIOQ MINFMBRCS EQ FPHIM GIFGUI'Q HETRPQ."
 # --BEML PZERC
 
 # To reverse the encryption:
 # cat "$@" | tr "$key" "A-Z"
 
 
 #  This simple-minded cipher can be broken by an average 12-year old
 #+ using only pencil and paper.
 
 exit 0
 
 #  Exercise:
 #  --------
 #  Modify the script so that it will either encrypt or decrypt,
 #+ depending on command-line argument(s).

fold

A filter that wraps lines of input to a specified width. This is especially useful with the -s option, which breaks lines at word spaces (see Example 12-23 and Example A-1).
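
A brief sketch of fold -s on a made-up sentence. The awk line is only there to confirm that no output line is longer than the requested width.

```shell
text='The quick brown fox jumps over the lazy dog and keeps on running.'
WIDTH=20

# -w sets the maximum line width; -s breaks at blanks rather than mid-word.
wrapped=$(printf '%s\n' "$text" | fold -s -w "$WIDTH")
echo "$wrapped"

# Length of the longest output line.
longest=$(printf '%s\n' "$wrapped" |
          awk '{ if (length($0) > max) max = length($0) } END { print max }')
echo "longest line: $longest"
```

Note that fold only inserts line breaks. Deleting the newlines from the output gives back the original text.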

fmt

Simple-minded file formatter, used as a filter in a pipe to "wrap" long lines of text output.

Example 12-23. Formatted file listing.

#!/bin/bash
 
 WIDTH=40                    # 40 columns wide.
 
 b=`ls /usr/local/bin`       # Get a file listing...
 
 echo $b | fmt -w $WIDTH
 
 # Could also have been done by
 #    echo $b | fold - -s -w $WIDTH
  
 exit 0

See also Example 12-5.

Tip

A powerful alternative to fmt is Kamil Toman's par utility, available from http://www.cs.berkeley.edu/~amc/Par/.

col

This deceptively named filter removes reverse line feeds from an input stream. It also attempts to replace whitespace with equivalent tabs. The chief use of col is in filtering the output from certain text processing utilities, such as groff and tbl.

column

Column formatter. This filter transforms list-type text output into a "pretty-printed" table by inserting tabs at appropriate places.

Example 12-24. Using column to format a directory listing

#!/bin/bash
 # This is a slight modification of the example file in the "column" man page.
 
 
 (printf "PERMISSIONS LINKS OWNER GROUP SIZE MONTH DAY HH:MM PROG-NAME\n" \
 ; ls -l | sed 1d) | column -t
 
 #  The "sed 1d" in the pipe deletes the first line of output,
 #+ which would be "total        N",
 #+ where "N" is the total number of files found by "ls -l".
 
 # The -t option to "column" pretty-prints a table.
 
 exit 0

colrm

Column removal filter. This removes columns (characters) from a file and writes the file, lacking the range of specified columns, back to stdout. colrm 2 4 <filename removes the second through fourth characters from each line of the text file filename.

Warning

If the file contains tabs or nonprintable characters, this may cause unpredictable behavior. In such cases, consider using expand and unexpand in a pipe preceding colrm.

nl

Line numbering filter. nl filename lists filename to stdout, but inserts consecutive numbers at the beginning of each non-blank line. If filename is omitted, nl operates on stdin.

The output of nl is very similar to that of cat -n; however, by default nl does not number blank lines.
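
The difference shows up in a quick sketch on a made-up three-line input whose middle line is blank. Since the sample text contains no digits, counting lines that contain a digit counts the numbered lines.

```shell
sample='first

third'

nl_total=$(printf '%s\n' "$sample" | nl | wc -l)                      # Lines printed.
nl_numbered=$(printf '%s\n' "$sample" | nl | grep -c '[0-9]')         # Lines numbered.
nl_ba_numbered=$(printf '%s\n' "$sample" | nl -ba | grep -c '[0-9]')  # -ba: number all.
cat_numbered=$(printf '%s\n' "$sample" | cat -n | grep -c '[0-9]')

echo "nl prints $nl_total lines and numbers $nl_numbered of them"
echo "nl -ba numbers $nl_ba_numbered; cat -n numbers $cat_numbered"
```

The blank line is still passed through by nl. It is only left without a number.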

Example 12-25. nl: A self-numbering script.

#!/bin/bash
 # line-number.sh
 
 # This script echoes itself twice to stdout with its lines numbered.
 
 # 'nl' sees this as line 4 since it does not number blank lines.
 # 'cat -n' sees the above line as number 6.
 
 nl `basename $0`
 
 echo; echo  # Now, let's try it with 'cat -n'
 
 cat -n `basename $0`
 # The difference is that 'cat -n' numbers the blank lines.
 # Note that 'nl -ba' will also do so.
 
 exit 0
 # -----------------------------------------------------------------

pr

Print formatting filter. This will paginate files (or stdout) into sections suitable for hard copy printing or viewing on screen. Various options permit row and column manipulation, joining lines, setting margins, numbering lines, adding page headers, and merging files, among other things. The pr command combines much of the functionality of nl, paste, fold, column, and expand.

pr -o 5 --width=65 fileZZZ | more gives a nice paginated listing to screen of fileZZZ with margins set at 5 and 65.

A particularly useful option is -d, forcing double-spacing (same effect as sed -G).

gettext

The GNU gettext package is a set of utilities for localizing and translating the text output of programs into foreign languages. While originally intended for C programs, it now supports quite a number of programming and scripting languages.

The gettext program works on shell scripts. See the info page.

msgfmt

A program for generating binary message catalogs. It is used for localization.

iconv

A utility for converting file(s) to a different encoding (character set). Its chief use is for localization.

recode

Consider this a fancier version of iconv, above. This very versatile utility for converting a file to a different encoding is not part of the standard Linux installation.

TeX, gs

TeX and PostScript are text markup languages used for preparing copy for printing or formatted video display.

TeX is Donald Knuth's elaborate typesetting system. It is often convenient to write a shell script encapsulating all the options and arguments passed to one of these markup languages.

Ghostscript (gs) is a GPL-ed PostScript interpreter.

enscript

Utility for converting a plain text file to PostScript.

For example, enscript filename.txt -p filename.ps produces the PostScript output file filename.ps.

groff, tbl, eqn

Yet another text markup and display formatting language is groff. This is the enhanced GNU version of the venerable UNIX roff/troff display and typesetting package. Manpages use groff.

The tbl table processing utility is considered part of groff, as its function is to convert table markup into groff commands.

The eqn equation processing utility is likewise part of groff, and its function is to convert equation markup into groff commands.

Example 12-26. manview: Viewing formatted manpages

#!/bin/bash
 # manview.sh: Formats the source of a man page for viewing.
 
 #  This script is useful when writing man page source.
 #  It lets you look at the intermediate results on the fly
 #+ while working on it.
 
 E_WRONGARGS=65
 
 if [ -z "$1" ]
 then
   echo "Usage: `basename $0` filename"
   exit $E_WRONGARGS
 fi
 
 # ---------------------------
 groff -Tascii -man $1 | less
 # From the man page for groff.
 # ---------------------------
 
 #  If the man page includes tables and/or equations,
 #+ then the above code will barf.
 #  The following line can handle such cases.
 #
 #   gtbl < "$1" | geqn -Tlatin1 | groff -Tlatin1 -mtty-char -man
 #
 #   Thanks, S.C.
 
 exit 0

lex, yacc

The lex lexical analyzer produces programs for pattern matching. This has been replaced by the nonproprietary flex on Linux systems.

The yacc utility creates a parser based on a set of specifications. This has been replaced by the nonproprietary bison on Linux systems.

Notes

[1]

This is only true of the GNU version of tr, not the generic version often found on commercial UNIX systems.
