Start and end of line
# matches all lines starting with "cat"
egrep '^cat' regex.txt
# matches all lines ending with "cat"
egrep 'cat$' regex.txt
# matches all lines that have "cat" anywhere on the line
egrep 'cat' regex.txt
# matches a line that contains only "cat"
egrep '^cat$' regex.txt
# matches empty lines
egrep '^$' regex.txt
# matches non-empty lines (-v is for negating the output)
egrep -v '^$' regex.txt
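The anchor patterns above can be tried on a throwaway file. This is only a sketch: the sample lines and variable names are made up for illustration, and `grep -E` is used as the equivalent of `egrep`.

```shell
# Hypothetical sample data: four words/phrases and one empty line.
sample=$(mktemp)
printf 'cat\ncatalog\nbobcat\nthe cat sat\n\n' > "$sample"

starts=$(grep -E -c '^cat' "$sample")      # lines starting with "cat"
ends=$(grep -E -c 'cat$' "$sample")        # lines ending with "cat"
anywhere=$(grep -E -c 'cat' "$sample")     # "cat" anywhere on the line
only=$(grep -E -c '^cat$' "$sample")       # the line is exactly "cat"
empty=$(grep -E -c '^$' "$sample")         # empty lines
nonempty=$(grep -E -v -c '^$' "$sample")   # -v inverts the selection

echo "starts=$starts ends=$ends anywhere=$anywhere only=$only empty=$empty nonempty=$nonempty"
rm -f "$sample"
```

The -c option prints the number of matching lines instead of the lines themselves, which makes the patterns easy to compare.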
Single character match
# there must be exactly one character (any character) between "a" and "c"
egrep 'a.c' regex.txt
Character class
# matches "gray" or "grey"
egrep 'gr[ae]y' regex.txt
# combining several character classes
egrep 'sep[ea]r[ea]te' regex.txt
# matches "<H1>", "<H2>", and "<H3>"
egrep '<H[123]>' regex.txt
# same as above ^ (provides a range)
egrep '<H[1-3]>' regex.txt
# matches "<H->" (a lone "-" is a literal dash, not a range)
egrep '<H[-]>' regex.txt
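# matches "<H", one hexadecimal digit, then ">"
# (every allowed character listed one by one)
egrep '<H[0123456789abcdefABCDEF]>' regex.txt
# simplified version of the above ^ expression
# (multiple ranges in one class are fine)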
egrep '<H[0-9a-fA-F]>' regex.txt
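# matches "<H!>", "<H.>", "<H_>", or "<H?>"
# ("." and "?" are literal inside a class)
egrep '<H[!._?]>' regex.txt
# matches "<H", then one character that is not "x", then ">".
# a negated class still requires a character to be there,
# so "<H>" does not match (remember this concept)
egrep '<H[^x]>' regex.txt
# matches "<H", then one character that is not "1", "2",
# or "3", then ">" (e.g. "<H4>" or "<Hx>")
egrep '<H[^1-3]>' regex.txt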
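A short sketch of how character classes behave, including the point that a negated class still has to match one character. The sample lines and variable names are invented for this example; `grep -E` stands in for `egrep`.

```shell
# Hypothetical sample data covering ranges, a literal dash, and a missing character.
sample=$(mktemp)
printf '<H1>\n<H2>\n<H4>\n<Hx>\n<H>\n<H->\ngray\ngrey\ngriy\n' > "$sample"

spelled=$(grep -E -c 'gr[ae]y' "$sample")       # "gray" and "grey", not "griy"
in_range=$(grep -E -c '<H[1-3]>' "$sample")     # "<H1>" and "<H2>"
dash=$(grep -E -c '<H[-]>' "$sample")           # only "<H->"
negated=$(grep -E -c '<H[^1-3]>' "$sample")     # "<H4>", "<Hx>", "<H->", but not "<H>"

echo "spelled=$spelled in_range=$in_range dash=$dash negated=$negated"
rm -f "$sample"
```

Note that `<H>` is not counted by the negated class: `[^1-3]` means "one character that is not 1, 2, or 3", not "no character".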
Alternatives
# matches "gray" or "grey"
egrep 'gray|grey' regex.txt
# same as above ^
egrep 'gr(a|e)y' regex.txt
# matches any line that begins with 'From: ',
# 'To: ', or 'Subject: '
egrep '^(From|To|Subject): ' regex.txt
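The following sketch shows why the parentheses matter in the header pattern: without them, `^` and the trailing `: ` each attach to only one alternative. The mail-header lines and variable names are made up; `grep -E` is used in place of `egrep`.

```shell
# Hypothetical sample data: mail headers plus the two spellings of "gray".
sample=$(mktemp)
printf 'From: alice\nTo: bob\nSubject: hi\nReply-To: carol\ngray\ngrey\n' > "$sample"

plain=$(grep -E 'gray|grey' "$sample")       # alternation over whole words
grouped=$(grep -E 'gr(a|e)y' "$sample")      # alternation limited to one letter

# With parentheses: "^" and ": " apply to every alternative.
headers=$(grep -E -c '^(From|To|Subject): ' "$sample")
# Without parentheses the alternatives are "^From", "To", and "Subject: ",
# so "Reply-To: carol" is matched as well.
loose=$(grep -E -c '^From|To|Subject: ' "$sample")

echo "headers=$headers loose=$loose"
rm -f "$sample"
```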
Word boundaries
# matches all lines that have a word which starts with "cat"
egrep '\<cat' regex.txt
# matches all lines that have a word which ends with "cat"
egrep 'cat\>' regex.txt
# matches only lines that have the word "cat" on its own, not
# embedded within another word, e.g. this will match the
# line `the cat is furry` but not `concatenate this file`
egrep '\<cat\>' regex.txt
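A minimal sketch of the word-boundary patterns, assuming a grep that supports `\<` and `\>` (GNU grep does; other versions may not). The sample lines and variable names are illustrative only. The `-w` option, available in GNU and BSD grep, is shown as an alternative for the whole-word case.

```shell
# Hypothetical sample data: "cat" alone, inside, at the start, and at the end of a word.
sample=$(mktemp)
printf 'the cat is furry\nconcatenate this file\ncatalog\nbobcat\n' > "$sample"

word_start=$(grep -E -c '\<cat' "$sample")   # "the cat is furry", "catalog"
word_end=$(grep -E -c 'cat\>' "$sample")     # "the cat is furry", "bobcat"
whole=$(grep -E '\<cat\>' "$sample")         # only the standalone word
whole_w=$(grep -E -w 'cat' "$sample")        # -w: match whole words only

echo "word_start=$word_start word_end=$word_end whole=$whole"
rm -f "$sample"
```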
Optional items
# matches lines with string "color" or "colour" ("u" is optional)
egrep 'colou?r' regex.txt
# same as `egrep '(July|Jul) (4th|four|4)' regex.txt`
egrep 'July? (four|4(th)?)' regex.txt
# allows one optional space
egrep '<H1 ?>' regex.txt
Quantifiers: repetition
# matches "<H1>", "<H1 >", "<H1  >", "<H1   >", and
# so on (no space, one space, or more than
# one space after H1)
egrep '<H1 *>' regex.txt
# matches "<H1 >", "<H1  >", "<H1   >", and so on
# (at least one space after H1 is required)
egrep '<H1 +>' regex.txt
# matches "<H>", "<H1>", "<H2>", ... "<H9>", and also
# "<H42>" (zero or more digits after "H", so none is required)
egrep '<H[0-9]*>' regex.txt
# matches "<H0>", "<H1>", ... "<H9>", and also "<H42>"
# (one or more digits after "H", so at least one is required)
egrep '<H[0-9]+>' regex.txt
# matches "o" at least once and at most 3
# times ({min,max})
egrep 'co{1,3}l' regex.txt
# matches "o" exactly 3 times (min equals max,
# same as 'co{3}l')
egrep 'co{3,3}l' regex.txt
# see p 75/780 of "O'Reilly - Mastering Regular Expressions" book
# (the pattern must be quoted, otherwise the shell treats
# "<" and ">" as redirections)
egrep '<HR +SIZE *= *[0-9]+ *>' regex.txt
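The quantifiers can be compared side by side on a small file. This is a sketch with invented sample lines and variable names, using `grep -E` as the equivalent of `egrep`. Note that the `<HR ...>` pattern is quoted so the shell does not treat `<` and `>` as redirections.

```shell
# Hypothetical sample data for *, +, {min,max}, and the <HR SIZE=n> pattern.
sample=$(mktemp)
printf '<H1>\n<H1 >\n<H1   >\n<H>\n<H12>\ncol\ncool\ncoool\ncooool\n<HR SIZE=14>\n<HR  SIZE = 3 >\n<HR SIZE=>\n' > "$sample"

star=$(grep -E -c '<H1 *>' "$sample")            # zero or more spaces
plus=$(grep -E -c '<H1 +>' "$sample")            # at least one space
digits_star=$(grep -E -c '<H[0-9]*>' "$sample")  # "<H>", "<H1>", "<H12>"
digits_plus=$(grep -E -c '<H[0-9]+>' "$sample")  # "<H1>", "<H12>"
interval=$(grep -E -c 'co{1,3}l' "$sample")      # "col", "cool", "coool"
exact=$(grep -E -c 'co{3}l' "$sample")           # "coool" only
hr=$(grep -E -c '<HR +SIZE *= *[0-9]+ *>' "$sample")  # a size value is required

echo "star=$star plus=$plus digits_star=$digits_star digits_plus=$digits_plus"
echo "interval=$interval exact=$exact hr=$hr"
rm -f "$sample"
```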
Parentheses and backreferences
# matches all words that are repeated at least
# twice (with spaces between repetitions) like
# "the the", "apple apple apple", etc
# - not all versions of `egrep` support backreferences
#   and `\< .. \>`
egrep '\<([a-zA-Z]+) +\1\>' regex.txt
# same as above, but this version also matches
# doubled words with different capitalization,
# like "The the". the pattern alone cannot do this,
# so the -i option is added to ignore case
egrep -i '\<([a-zA-Z]+) +\1\>' regex.txt
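A sketch of the doubled-word search, assuming a grep that supports backreferences and `\<`, `\>` in extended patterns (GNU grep does; this is an extension, not something every `egrep` offers). The sample lines and variable names are made up for illustration.

```shell
# Hypothetical sample data: same-case repeat, mixed-case repeat, and near misses.
sample=$(mktemp)
printf 'the the cat\nThe the dog\napple apple apple\nthe theory\nno repeats here\n' > "$sample"

# \1 must repeat the captured word exactly, so "The the" is not matched here.
same_case=$(grep -E -c '\<([a-zA-Z]+) +\1\>' "$sample")
# With -i the comparison ignores case, so "The the" is matched as well.
any_case=$(grep -E -i -c '\<([a-zA-Z]+) +\1\>' "$sample")

echo "same_case=$same_case any_case=$any_case"
rm -f "$sample"
```

The line `the theory` is rejected in both cases because `\>` requires the repeated text to end at a word boundary.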
Escape sequence
# removes the special function of "." w/c
# is to match any single character
egrep 'www\.facebook\.com' regex.txt
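To see what the backslashes change, compare the escaped and unescaped patterns on two lines. This is an illustrative sketch with made-up sample data and variable names. The `-F` option (fixed strings) is shown as another way to search for literal text.

```shell
# Hypothetical sample data: the real host name and a look-alike without dots.
sample=$(mktemp)
printf 'www.facebook.com\nwwwxfacebookxcom\n' > "$sample"

unescaped=$(grep -E -c 'www.facebook.com' "$sample")    # "." matches any character
escaped=$(grep -E -c 'www\.facebook\.com' "$sample")    # "\." matches only a dot
fixed=$(grep -F -c 'www.facebook.com' "$sample")        # -F: no metacharacters at all

echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"
rm -f "$sample"
```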
Miscellaneous
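# moves every non-hidden entry in the current directory
# whose name contains a "." to the target directory
# (this is a shell glob, not a regular expression: "*" means
# "any run of characters" and "." is literal, so names
# without a dot, such as README, are left behind)
mv *.* Archive/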
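The glob can be tried safely in a scratch directory. This is only a sketch: the file names are hypothetical, and the command runs in a subshell so the caller's working directory is untouched.

```shell
# Hypothetical scratch directory with dotted, undotted, and hidden names.
dir=$(mktemp -d)
mkdir "$dir/Archive"
touch "$dir/notes.txt" "$dir/report.pdf" "$dir/README" "$dir/.hidden.cfg"

# "*.*" expands to names that contain a dot and do not start with one.
(cd "$dir" && mv *.* Archive/)

moved=$(ls "$dir/Archive" | grep -c '')   # number of entries that were moved
if [ -e "$dir/README" ]; then readme_stayed=yes; else readme_stayed=no; fi
if [ -e "$dir/.hidden.cfg" ]; then hidden_stayed=yes; else hidden_stayed=no; fi

echo "moved=$moved readme_stayed=$readme_stayed hidden_stayed=$hidden_stayed"
rm -rf "$dir"
```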
Some examples
# matches a variable name that is allowed to contain only
# alphanumeric characters and underscores, but which may
# not begin with a number
egrep '[a-zA-Z_][a-zA-Z_0-9]*' regex.txt
# a string within double quotes (see book for explanation)
egrep '"[^"]*"' regex.txt
# dollar amount (with optional cents)
egrep '\$[0-9]+(\.[0-9][0-9])?' regex.txt
# time of day, such as "9:17 am" or "12:30 pm"
egrep '(1[012]|[1-9]):[0-5][0-9] (am|pm)' regex.txt
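To check what each of these patterns actually picks out, the `-o` option (supported by GNU and BSD grep) prints only the matched text instead of the whole line. The sample lines and variable names below are invented for this sketch.

```shell
# Hypothetical sample data: one line per example pattern.
sample=$(mktemp)
printf 'total_count = 42\nshe said "hello there" twice\nlunch cost $12.50 today\nmeet at 9:17 am or 12:30 pm\n' > "$sample"

# First identifier-like match in the file.
first_name=$(grep -E -o '[a-zA-Z_][a-zA-Z_0-9]*' "$sample" | head -n 1)
# Double-quoted string, quotes included.
quoted=$(grep -E -o '"[^"]*"' "$sample")
# Dollar amount with optional cents.
amount=$(grep -E -o '\$[0-9]+(\.[0-9][0-9])?' "$sample")
# Every time of day, joined with commas.
times=$(grep -E -o '(1[012]|[1-9]):[0-5][0-9] (am|pm)' "$sample" | paste -s -d, -)

echo "first_name=$first_name"
echo "quoted=$quoted"
echo "amount=$amount"
echo "times=$times"
rm -f "$sample"
```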
Metacharacters
- special characters that are used to match and manipulate patterns
^ : matches start of line
$ : matches end of line
| : provides alternatives
. : matches any single character
() : groups a subexpression; you can put alternatives inside (separated by |)
? : quantifier - optional item (must be placed after the optional item); matches zero or one occurrence
* : quantifier - similar to ?, matches zero, one, or more of the immediately preceding item (it can never fail on its own, because matching nothing counts as success)
+ : quantifier - similar to *, but MUST match one or more of the immediately preceding item (it fails if there is not at least one)
{min,max} : interval quantifier - matches the immediately preceding item at least "min" times and at most "max" times
Character Class
[] : matches a single character, which must be one of the characters listed inside
Character Class Metacharacter
- these are special characters put inside character classes
- they have different meanings inside a character class compared to when placed outside a character class
- : provides a range of characters (not considered a metacharacter if it is the first character in the class)
^ : negates the list (only when it is the first character in the class)
Metasequences
- \< and \> are used for word boundaries
- use them if you want to search for a particular string that is not embedded in a larger word
- let's say you want to look only for the word "cat" and disregard lines with "cattleya", "concatenate", etc
- they are not supported on all versions of egrep
\< : the position at the start of a word
\> : the position at the end of a word
\1 : matches the same text that the first parenthesized group matched (backreference; \2, \3, ... refer to later groups)
EGREP
egrep "^(From|Subject): " --> same as egrep "^From: |^Subject: "
- Not all egrep programs are the same. The supported set of metacharacters, as well as their meanings, are often different; see your local documentation
- The useful -i option ignores capitalization during a match
grep -i 'regular_expression' text_file ## searches the contents of text_file for lines matching the regular expression, ignoring case
grep -i '^$' text_file ## searches for blank lines
grep -i '^$' text_file | wc -l ## returns the number of blank lines
grep . text_file ## prints all non-blank lines (the file itself is not modified)
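A small sketch of these options on a scratch file, with made-up contents and variable names. It also shows `-c`, which counts matching lines without needing `wc -l`, and confirms that grep only reads the file.

```shell
# Hypothetical sample data: three words in mixed case and three blank lines.
sample=$(mktemp)
printf 'Alpha\n\nbeta\n\n\nGAMMA\n' > "$sample"

blank=$(grep -c '^$' "$sample")                    # number of blank lines
nonblank=$(grep -c . "$sample")                    # lines with at least one character
ignore_case=$(grep -E -i -c 'alpha|gamma' "$sample")  # -i ignores capitalization
total=$(grep -c '' "$sample")                      # the file still has all its lines

echo "blank=$blank nonblank=$nonblank ignore_case=$ignore_case total=$total"
rm -f "$sample"
```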