sed and awk

sed and awk are two powerful but often overlooked data processing tools, and if you know how to use them effectively you’ll be well ahead of many sysadmins and developers.

sed

sed is a stream editor; that is, it accepts input from a file or STDIN, manipulates the data stream in some way, and then passes the results to STDOUT or another file. It does not modify its input file unless you ask it to (for example, with the -i switch). It is another product of Bell Labs, and Linux users are most likely to encounter the GNU version. TLDP is also an excellent resource.

More specifically, sed matches data using line numbers or regular expressions and then acts on these matches as specified by the supplied commands. sed thinks in terms of its pattern space, a data buffer where sed stores each line after removing its trailing newline (\n) character. sed then performs the specified commands on that line, and when the script finishes the entire pattern space is printed out unless you limit sed’s output (for example, with the -n option). sed’s hold space is a second buffer where you can store data for later use.
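To make the two buffers concrete, here is a small sketch using invented three-line input (the letters a, b, c are placeholders, not from any real file). The first command shows -n and p limiting output to one line; the second is a well-known one-liner that uses the hold space to print the input in reverse order.

```shell
# Sample input is invented for illustration: three lines, a, b, c.
input=$(printf 'a\nb\nc\n')

# -n suppresses the automatic print; 2p prints the pattern space for line 2 only.
second_line=$(printf '%s\n' "$input" | sed -n '2p')
echo "$second_line"

# Reverse the line order using the hold space:
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the pattern space into the hold space
#   $p   on the last line, print the accumulated pattern space
reversed=$(printf '%s\n' "$input" | sed -n '1!G;h;$p')
echo "$reversed"
```

The first echo prints b, and the second prints c, b, a on separate lines.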

Some sed Commands and Switches

Here are the commands and switches I use most frequently. General usage is sed [switches] 'command' [file].

Command          Purpose
s/old/new/       The master substitute command; changes the first instance of ‘old’ to ‘new’ in the pattern space
/!               Inverter, i.e., only perform actions on lines that do not match the specified address
/1, /2, etc.     Specify which occurrence of a match is changed
/d               Delete the pattern space
/g               Perform substitution on all matches in the pattern space, not just the first
/I               Ignore case (GNU extension)
/p               Print the pattern space to STDOUT; usually used with -n
/w file          Write the pattern space to file

Switch                   Purpose
-e script                Add the commands in script to those to be run; can be given more than once
-f file                  Run the commands from a specified file
-i.optional-extension    Edit files in-place, that is, overwrite the input file(s); if extension is supplied, copy the original to file.extension before overwriting
-l number                Specify the line-wrap length for the l command (default is 70 characters)
-n                       Don’t print anything unless requested (usually with /p)
-r                       Use extended regular expressions (GNU sed; also spelled -E)
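A short sketch of a few of these in combination, run against a throwaway file created with mktemp so nothing real is overwritten. The file contents are invented for illustration, and the -i.bak form assumes a sed that accepts an attached backup suffix (GNU and BSD sed both do).

```shell
# Work in a temporary file; its contents are made up for this example.
tmp=$(mktemp)
printf 'one two one two\n' > "$tmp"

# A number after the final slash picks which occurrence to change.
second_only=$(sed 's/one/ONE/2' "$tmp")
echo "$second_only"    # one two ONE two

# Several -e scripts run in order on each line.
both=$(sed -e 's/one/1/g' -e 's/two/2/g' "$tmp")
echo "$both"           # 1 2 1 2

# -i.bak edits the file in place and keeps the original as "$tmp.bak".
sed -i.bak 's/two/TWO/g' "$tmp"
edited=$(cat "$tmp")
backup=$(cat "$tmp.bak")
echo "$edited"         # one TWO one TWO
echo "$backup"         # one two one two

rm -f "$tmp" "$tmp.bak"
```

Supplying a backup extension is a cheap safety net while you are still testing an expression.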

A Few sed Examples

Here are a few sed examples. Additional examples can be found on my search and replace page. The sed one-liners page provides many more.

Match “text” at any point in a line (text), only at the beginning (^text), or only at the end (text$). For example, delete “text” where it starts a line:

$ sed 's/^text//' file.txt

Replace all instances of “text”, case-insensitive first letter, with “blar” in file.txt:

$ sed 's/[Tt]ext/blar/g' file.txt

Delete all lines that do or don’t contain “blar”:

$ sed '/blar/d' file.txt
$ sed '/blar/!d' file.txt

Replace all instances of “text” with “blar” from line 1-100, or from line 101 to EOF:

$ sed '1,100 s/text/blar/g' file.txt
$ sed '101,$ s/text/blar/g' file.txt

Change “text” to “blar” on all lines except between START and END:

$ sed '/START/,/END/!s/text/blar/g' file.txt
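To see what the range-plus-inverter form does, here is a quick sketch on invented input in which “text” appears before, inside, and after a START/END block. Note that the START and END lines themselves belong to the range, so they are left alone too.

```shell
# Invented sample: "text" appears before, inside, and after a START/END block.
result=$(printf 'text 1\nSTART\ntext 2\nEND\ntext 3\n' |
  sed '/START/,/END/!s/text/blar/g')
echo "$result"
# blar 1
# START
# text 2
# END
# blar 3
```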

Wrap every line in file.txt in parentheses (& stands for the matched text):

$ sed 's/.*/( & )/' file.txt

Delete blank lines in file.txt:

$ sed '/^$/d' file.txt

Lowercase or uppercase an entire file (\L and \U are GNU sed extensions):

$ sed 's/.*/\L&/' file.txt
$ sed 's/.*/\U&/' file.txt

Uppercase the first letter of each word on every line (GNU sed):

$ sed 's/\<./\u&/g' file.txt
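A quick check of the case-conversion escapes on an invented sample string. This sketch assumes GNU sed; \U, \u, and \< are GNU extensions and other sed implementations may not support them.

```shell
# Assumes GNU sed: \U, \u and \< are extensions, not POSIX.
upper=$(printf 'hello world\n' | sed 's/.*/\U&/')
echo "$upper"    # HELLO WORLD

# \< matches at the start of a word; \u uppercases the next character only.
title=$(printf 'hello world\n' | sed 's/\<./\u&/g')
echo "$title"    # Hello World
```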

awk

awk is an interpreted language used for text processing and reporting. The original is yet another product of Bell Labs and various other implementations are widely encountered: mawk is the default interpreter for many current Linux distros, and the GNU implementation (gawk) is common as well. The awk Manual is a good resource for the awk language itself, as is the awk Primer.

Basic awk is pretty straightforward: it reads input either from STDIN or from specified files, and splits this input into records (the default being one line, much like sed). The current record is denoted by $0, and each record is split into fields denoted by $1, $2, etc. The specified patterns are then tested against each record, and the matching actions are run.
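As a rough illustration of records and fields, this sketch feeds awk two invented lines and prints the record number, the field count, and the first and last field of each. The second line uses several spaces between fields to show that runs of whitespace count as one separator by default.

```shell
# Invented two-line sample; fields are separated by runs of spaces.
fields=$(printf 'alpha beta gamma\none   two\n' |
  awk '{ print NR ": " NF " fields, first=" $1 ", last=" $NF }')
echo "$fields"
# 1: 3 fields, first=alpha, last=gamma
# 2: 2 fields, first=one, last=two
```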

awk also understands arithmetic operators (+, -, /, *, %), numeric functions (sin, exp, sqrt, rand), arrays, C-like string formatting with printf (%f, %s, %d), if-else conditions, for and while loops, and some string and system functions like substr(), length(), and systime(). The available functions vary between awk implementations, so check your documentation. On the command line, general usage is awk [options] 'search-pattern { program actions }' file.
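Several of these features fit in one short program. The sketch below uses invented name/score pairs and only constructs that should be available in any common awk: a running sum, an associative array, an if-else, and printf in an END block.

```shell
# Invented sample: a name and a score per line.
report=$(printf 'ann 90\nbob 72\nann 60\n' | awk '
  {
    total += $2            # running sum of the second field
    count[$1]++            # associative array keyed by the first field
    if ($2 >= 75) high++; else low++
  }
  END {
    printf "avg=%.1f high=%d low=%d ann=%d\n", total / NR, high, low, count["ann"]
  }')
echo "$report"    # avg=74.0 high=1 low=2 ann=2
```

Uninitialized awk variables start out as zero (or the empty string), which is why total, high, and low need no setup.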

Some awk Options and Built-in Variables

These are some commonly available awk command-line switches and variables. More are available in gawk, mawk, etc. but these are the basics.

Option          Purpose
-F fs           Specify the field separator (default is whitespace)
-f file         Specify a file from which to load awk commands or programs

Variable        Purpose
$0              The full record; equivalent to “print all fields”
$1, $2, etc.    Given field in a record
FILENAME        The name of the file being read
FS              The field separator; default is whitespace; same as using -F on the command line
NF              Number of fields in the current record
NR              Number of records read so far
OFS             The output field separator; default is the space character
ORS             Output record separator; default is the newline character
RS              The record separator; default is the newline character
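The separator variables are easiest to understand by changing them. This sketch uses invented passwd-like input; the first command re-joins colon-separated fields with dashes, and the second treats semicolons as record boundaries and ends each output record with a pipe instead of a newline.

```shell
# -F: sets FS; assigning $1 = $1 forces awk to rebuild $0 using OFS.
rejoined=$(printf 'root:x:0\nbin:x:1\n' |
  awk -F: 'BEGIN { OFS = "-" } { $1 = $1; print }')
echo "$rejoined"
# root-x-0
# bin-x-1

# RS changes what counts as a record on input; ORS changes what ends one on output.
records=$(printf 'a;b;c' | awk 'BEGIN { RS = ";"; ORS = "|" } { print NR "=" $0 }')
echo "$records"    # 1=a|2=b|3=c|
```

Without the $1 = $1 assignment, print would emit the record unchanged, because OFS only applies when awk rebuilds $0 or prints comma-separated arguments.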

Some awk Examples and One-Liners

Print the fifth field from /etc/passwd, delimited by ‘:’, from any record (line) containing ‘admin’, then print the record number and its number of fields:

$ awk -F: '/admin/ { print $5, "Records: "NR, "Fields: "NF }' /etc/passwd
Gnats Bug-Reporting System (admin) Records: 17 Fields: 7

Double-space or triple-space a file. (Note: In awk, ‘1’ always evaluates to true, making awk perform the default operation {print $0}, so it’s a shorthand way of printing a line):

$ awk '1;{print ""}' file.txt  # This is the same as $ awk '{print $0 "\n"}' file.txt
$ awk '1;{print "\n"}' file.txt

Number each line with its line number, followed by tab:

$ awk '{print NR "\t" $0}' file.txt

Count the lines in a file:

$ awk 'END{print NR}' file.txt

Print every line with more than 4 fields, or every line where the value of the last field is > 4:

$ awk 'NF > 4' file.txt
$ awk '$NF > 4' file.txt
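The difference between NF and $NF is easy to miss, so here is a small sketch on invented numeric input: NF is the count of fields, while $NF is the value of the last field.

```shell
# Invented sample: three lines with differing field counts and last values.
sample=$(printf '1 2 3 4 5\n9 9\n7 1\n')

# NF > 4: only the first line has more than four fields.
many_fields=$(printf '%s\n' "$sample" | awk 'NF > 4')
echo "$many_fields"    # 1 2 3 4 5

# $NF > 4: the last field is 5 on the first line and 9 on the second.
big_last=$(printf '%s\n' "$sample" | awk '$NF > 4')
echo "$big_last"
# 1 2 3 4 5
# 9 9
```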

Delete leading whitespace (align text left), delete trailing whitespace:

$ awk '{sub(/^[ \t]+/, "")};1' file.txt
$ awk '{sub(/[ \t]+$/, "")};1' file.txt

Match/inverted match of a field against a regular expression:

$ awk '$1  ~ /^[a-f]/' file.txt # print if match
$ awk '$1 !~ /^[a-f]/' file.txt # print if doesn't match

Delete all blank lines from a file:

$ awk NF file.txt # this works because 'NF' = 0 for a blank record, thus nothing is printed
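One practical difference from the sed version shown earlier: awk NF also drops lines that contain only whitespace, because such lines have zero fields, whereas sed '/^$/d' removes only truly empty lines. A quick sketch on invented input (one empty line, one line of two spaces):

```shell
# Invented sample: "a", an empty line, a line of two spaces, "b".
sample=$(printf 'a\n\n  \nb\n')

# sed removes only the empty line; the whitespace-only line survives.
sed_out=$(printf '%s\n' "$sample" | sed '/^$/d')
printf '%s\n' "$sed_out"

# awk NF removes both, since neither line has any fields.
awk_out=$(printf '%s\n' "$sample" | awk NF)
printf '%s\n' "$awk_out"
```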
