grep

grep is a utility that finds fixed strings or patterns of text in the given files or in standard input. Used with error logs or debugger output, grep can help a programmer track down errors in an application code base or filesystem. To help with this, grep understands basic regular expressions, what it calls “extended” regular expressions, Perl-style regular expressions, and POSIX character-class definitions such as [[:digit:]].

Most Linux and Unix distributions include grep, and Windows implementations of it are available via utilities like wingrep or the Cygwin Bash shell. All examples here represent the GNU version included with most Linux distros, as run in the Bash shell.

A full listing of grep’s command line switches is available in the grep man page. Usage of grep is either grep options regexp filename(s), or STDIN | grep options regexp.

The options I use most frequently are:

Switch    Purpose
-E        Use extended regular expressions (fewer backslashes to type than with default grep)
-f file   Read the patterns to match from the specified file, one per line
-i        Ignore case when matching
-l        Print file names instead of individual matches; opposite of this is -L
-n        Output matches showing their line numbers
-P        Use Perl-style regular expressions, provided by the PCRE (Perl-Compatible Regular Expressions) library on your system
-r        Search recursively, that is, current directory and child directories
-v        Invert matches; i.e., select the non-matching lines
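
To make a few of these switches concrete, here is a minimal sketch that runs them against two throwaway files. The file names (app.log, other.log) and their contents are made up for the demonstration; only the switches come from the list above.

```shell
# Scratch directory and sample files (hypothetical names and contents).
tmp=$(mktemp -d)
printf 'Error: disk full\nall good\nerror: retrying\n' > "$tmp/app.log"
printf 'nothing to see\n' > "$tmp/other.log"

# -i ignores case; -n prefixes each matching line with its line number.
numbered=$(grep -in 'error' "$tmp/app.log")
echo "$numbered"

# -v selects the lines that do NOT match.
inverted=$(grep -iv 'error' "$tmp/app.log")
echo "$inverted"

# -l prints only the names of files that contain a match;
# -L prints only the names of files that contain none.
with_match=$(cd "$tmp" && grep -il 'error' app.log other.log)
without_match=$(cd "$tmp" && grep -iL 'error' app.log other.log)
echo "with: $with_match, without: $without_match"

rm -rf "$tmp"
```

Switches can be combined behind a single dash, as -in and -iv are here.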

Some grep Examples

A simple use of grep: this finds every occurrence of ‘string’ in the files of the current directory and its subdirectories:

$ grep -r 'string' *

You can pipe the results of another command to grep, or pipe the results of grep to another utility or file:

# find all .py files, then scan them for 'string' and output the findings to disk: 
$ find . -name '*.py' -exec grep 'string' {} \; > found.txt
# cat a file, search for 'string', add the line numbers where 'string' is found, and output to disk: 
$ cat file.txt | grep -n 'string' > found.txt

The shell expands command substitutions and variables before grep runs, so you can use either one to supply the pattern:

# Use the results of the whoami shell command to search file.py, then output results to disk:  
$ grep `whoami` file.py > found.txt
# Use the value of the $HOME environment variable to search file.py, then output the results to disk:
$ grep "$HOME" file.py > found.txt
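
One thing to watch when the pattern comes from a variable: grep still treats the expanded value as a regular expression. The sketch below assumes a made-up variable named needle and a made-up sample file, and uses the -F switch (fixed-string matching) to search for the value literally instead.

```shell
# Scratch file with two similar lines (hypothetical contents).
tmp=$(mktemp -d)
printf 'price=a.c\nprice=abc\n' > "$tmp/sample.txt"

# Hypothetical search term that happens to contain a regex metacharacter.
needle='a.c'

# Treated as a regex, the dot matches any character, so both lines match.
as_regex=$(grep "$needle" "$tmp/sample.txt")
echo "$as_regex"

# With -F the value is a fixed string, so only the literal 'a.c' matches.
as_fixed=$(grep -F "$needle" "$tmp/sample.txt")
echo "$as_fixed"

rm -rf "$tmp"
```

Double-quoting the variable, as in the $HOME example, keeps the shell from splitting a value that contains spaces into several arguments.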

Monitor the system log, looking for ‘ERROR’ in the last 10 lines and in any new lines as they are appended:

$ tail -f /var/log/messages | grep ERROR

Show IP addresses from a file:

$ grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' maillog
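
To see what this kind of pattern picks up, here is a small sketch run against invented log lines. The file contents are made up, and the -o switch (print only the matched text, not the whole line) is added for the demonstration.

```shell
# Scratch copy of a mail log with invented entries.
tmp=$(mktemp -d)
printf 'connect from 192.168.1.10 port 25\nversion 1.2.3 released\nrelay=10.0.0.7 ok\n' > "$tmp/maillog"

# Four dot-separated groups of 1-3 digits, bounded by word boundaries (\b).
# -o prints just the matching text, one match per line.
ips=$(grep -Eo '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' "$tmp/maillog")
echo "$ips"

rm -rf "$tmp"
```

The version string 1.2.3 is skipped because it has only three groups. The pattern checks shape only, so it would also accept an out-of-range value such as 999.999.999.999.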

Show email addresses:

$ grep -Ei '\b[a-z0-9._-]+@[a-z0-9.-]+\.(com|net|org|uk|mil|gov|edu)\b' maillog

Show Social Security numbers:

$ grep -E '\b[0-9]{3}( |-|)[0-9]{2}( |-|)[0-9]{4}\b' employees.csv

grep can match strings based on what an earlier part of the pattern matched; to do this, use backreferences. The \1 backreference stands for the text matched by the first parenthesized group, so appending it to a single group returns lines where that text appears two or more times. Likewise, the \2 backreference stands for the text matched by the second parenthesized group, so in a pattern with two groups it requires a repeat of the second group's match rather than the first's. Say you want to find lines containing multiple instances of the word “History” in the text file classes.txt:

History is best learned in Algebra.
History is best learned in English.
History is best learned in History. 
History is best learned in Calculus.

You could do:

$ grep -Ein '(history).*\1' classes.txt
3:History is best learned in History.
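
The difference between \1 and \2 is easier to see with two groups in one pattern. This is a minimal sketch with a made-up three-line file; it relies on GNU grep accepting backreferences together with -E.

```shell
# Scratch file: each line starts with "ab cd" and ends with a different word.
tmp=$(mktemp -d)
printf 'ab cd ab\nab cd cd\nab cd ef\n' > "$tmp/groups.txt"

# \1 must repeat whatever the first group (ab) matched.
first_repeated=$(grep -E '(ab) (cd) \1' "$tmp/groups.txt")
echo "$first_repeated"

# \2 must repeat whatever the second group (cd) matched.
second_repeated=$(grep -E '(ab) (cd) \2' "$tmp/groups.txt")
echo "$second_repeated"

rm -rf "$tmp"
```

Groups are numbered by the position of their opening parenthesis, counting from the left.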

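Match strings across multiple lines, using Perl-style grep for the newline (\n) character. Because grep normally examines one line at a time, GNU grep also needs -z to treat the whole file as a single record, and -o to print only the matched text instead of that entire record:

$ grep -Pzo '(?m)phone\nnumber' employees.csv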
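
Here is a minimal sketch of a match that spans a line break. It assumes GNU grep built with PCRE support, and a made-up two-line file in which "phone" ends one line and "number" starts the next. The -z and -o switches are additions for the demonstration: -z makes grep read the file as one NUL-terminated record so that \n can match inside it, and -o prints only the matched text.

```shell
# Scratch file where the text of interest straddles a newline (invented contents).
tmp=$(mktemp -d)
printf 'name,phone\nnumber,email\n' > "$tmp/employees.csv"

# -P: Perl-style regex, -z: whole file as one record, -o: matched text only.
# With -z each match is terminated by a NUL byte, so tr turns it into a newline.
spanning=$(grep -Pzo 'phone\nnumber' "$tmp/employees.csv" | tr '\0' '\n')
echo "$spanning"

rm -rf "$tmp"
```

Without -o, grep would print the entire record, which with -z is the whole file.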
