The book starts out with a misleading statement: “The \( \) and \{ \}, however, are not allowed.” You can use them; just use them without the backslashes.
Looking at the lines found in Example 4.31 might make you think that
egrep '3+' datafile
means “one or more 3s anywhere on the line”; it really
means “one or more consecutive 3s anywhere on the line.”
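To see the "consecutive" part for yourself, here is a small sketch. It assumes a grep that supports the -o option (GNU and BSD grep both do), which prints each matched piece on its own line. The sample text is made up for illustration, and grep -E is the same thing as egrep.

```shell
# Made-up sample line; -o prints every match on a line of its own.
runs=$(printf '13 333 3a3\n' | grep -oE '3+')
printf '%s\n' "$runs"
# prints:
# 3
# 333
# 3
# 3
```

Each run of adjacent 3s is one match; the two 3s separated by the letter a are reported as two separate matches.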
Similarly, the output from Example 4.33 doesn’t give you a good idea of what the plus sign does. As a slightly more-to-the-point example, type this into a file named temp.txt:
met
meat
mitt
mutt
maitre d'
muumuu
Then run this to find all words that contain an m, followed by one or more of the vowels a, e, i, or o, followed by a t:
egrep 'm[aeio]+t' temp.txt
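If you would rather not create the file by hand, the following sketch does the whole experiment in one go. It assumes one word per line (grep reports matching lines, not words) and uses a throwaway file from mktemp in place of temp.txt; grep -E is equivalent to egrep.

```shell
# Throwaway stand-in for temp.txt, one word per line (an assumption).
words=$(mktemp)
printf '%s\n' met meat mitt mutt "maitre d'" muumuu > "$words"

# m, then one or more of a/e/i/o, then t
matches=$(grep -E 'm[aeio]+t' "$words")
printf '%s\n' "$matches"
# prints:
# met
# meat
# mitt
# maitre d'

rm -f "$words"
```

mutt and muumuu are left out because u is not in the bracket list.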
Why would you ever want to use fgrep? If you are doing
a search for something that contains a lot of metacharacters (for example,
a line like a[3]=b[0]+5 in a program), you can use
fgrep to avoid having to put backslashes everywhere.
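Here is a short sketch of that situation. The two sample "program" lines are invented for illustration, and grep -F is the same as fgrep. The last command shows what happens when the same text is treated as a regular expression without any backslashes.

```shell
# Invented sample source lines.
src=$(mktemp)
printf '%s\n' 'a[3]=b[0]+5;' 'a3=b0+5;' > "$src"

# Fixed-string search: brackets and plus sign are taken literally.
fixed=$(grep -F 'a[3]=b[0]+5' "$src")
printf 'fixed:   %s\n' "$fixed"      # prints: fixed:   a[3]=b[0]+5;

# Same text as an extended regex: [3] and [0] become bracket
# expressions and + becomes a repetition, so nothing matches here.
unescaped=$(grep -E 'a[3]=b[0]+5' "$src" || true)
printf 'regex:   %s\n' "$unescaped"  # prints: regex:   (nothing after it)

rm -f "$src"
```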
Everything up to this point has been valid for grep as found
on generic UNIX systems. This section tells you what has been added to the
GNU version of grep, which is what you will find on Linux
systems.
There is an interaction between using ranges,
character encoding, and egrep.
Presume the following file, letters:
å
a
A
The default character encoding on most Linux systems is Unicode (UTF-8), and ranges are interpreted in the locale's dictionary (collation) order, in which uppercase and lowercase letters sort next to each other. Thus, you will get these results from these commands:
egrep '[abcdefghijklmnopqrstuvwxyz]' letters # finds only line two
egrep '[a-z]' letters # finds all three lines
egrep '[[:lower:]]' letters # finds the first two lines
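Because the results depend on the locale, one way to take the guesswork out of a range is to force the plain C locale for a single command. The sketch below is a minimal illustration under that assumption. It uses only the two ASCII lines, since how the å line is treated in the C locale can vary between grep versions, and the file name is a throwaway one from mktemp.

```shell
# ASCII-only stand-in for the letters file (the å line is left out on purpose).
ascii=$(mktemp)
printf 'a\nA\n' > "$ascii"

# In the C locale, [a-z] is the plain ASCII range a through z.
c_range=$(LC_ALL=C grep -E '[a-z]' "$ascii")
c_class=$(LC_ALL=C grep -E '[[:lower:]]' "$ascii")

printf 'range: %s\n' "$c_range"   # prints: range: a
printf 'class: %s\n' "$c_class"   # prints: class: a

rm -f "$ascii"
```

Setting LC_ALL on the command line affects only that one command, so the rest of your session keeps its normal locale.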
Unless you are searching for literal text that contains a ton of
metacharacters, I recommend that you use egrep, because it doesn’t
require as many backslashes.
Recursive grep (egrep -r) will
indiscriminately search all files in the subdirectories.
Presume you only wish to find things in files ending with
.html. In that case, you would add the
--include option, which lets you specify a shell pattern
for files to include in the search.
egrep -r --include='*.html' thingToFind *
The *.html has to be placed in quote marks to prevent the
shell from expanding the asterisk. The last * means to search
all files in the current directory (the --include option will filter
out anything other than .html files).
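To try this out safely, here is a sketch that builds a small scratch directory and searches it. The file names are made up, the search starts from the scratch directory rather than from *, and I have added -l (list file names only) to keep the output short. It assumes a grep that supports --include, as GNU grep does; grep -E would behave the same way here.

```shell
# Scratch directory with made-up files, all containing the search text.
dir=$(mktemp -d)
mkdir "$dir/sub"
echo 'thingToFind' > "$dir/index.html"
echo 'thingToFind' > "$dir/notes.txt"
echo 'thingToFind' > "$dir/sub/page.html"

# Recursive search restricted to names matching *.html
hits=$(grep -rl --include='*.html' thingToFind "$dir")
printf '%s\n' "$hits"   # lists index.html and sub/page.html, not notes.txt

rm -rf "$dir"
```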
On page 114, there should not be two dashes after --help.
On page 115, the long name for -v should be
--invert-match.
Example 4.67 may not work as advertised.