Mastering sed and awk - The Ultimate Guide to Text Processing in Linux
Powerful text manipulation tools for Linux administrators and developers

Introduction
sed (stream editor) and awk (named after its creators Aho, Weinberger, and Kernighan) are powerful text processing utilities available in Unix/Linux environments. Both tools excel at manipulating text data but approach the task differently. While sed operates on a line-by-line basis and is designed for simple text transformations, awk treats text as records and fields, making it more suitable for complex data processing tasks.
This comprehensive guide will explore both utilities, their unique capabilities, and how they can be combined to solve complex text processing challenges. Whether you’re a system administrator managing log files, a data analyst processing CSV files, or a developer automating text transformations, mastering sed and awk will significantly enhance your command-line productivity.
What are sed and awk?
These tools allow you to transform, extract, and report on text data with remarkable flexibility, making them essential for system administrators, data analysts, and developers working in terminal environments.
Comparing sed and awk
| | sed | awk |
|---|---|---|
| Primary Purpose | Stream editing and text transformation | Text pattern scanning and processing |
| Designed For | Simple text substitutions and filtering | Structured data processing and reporting |
| Processing Model | Line-by-line text processing | Record-based data processing with fields |
| Complexity | Simpler syntax, focused on editing | Fuller programming language with variables and functions |
| Best For | Find and replace, text filtering, basic transformations | Data extraction, report generation, complex transformations |
Both tools emerged from Bell Labs during the development of Unix:
- sed was created by Lee E. McMahon in 1973 as part of the Unix text processing pipeline.
- awk was developed in 1977, named after its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
Their enduring popularity for over four decades speaks to their utility and power in command-line environments.
The sed Command in Depth
How sed Works
sed processes text through the following sequence:
- Read: Read a line from the input stream
- Execute: Apply all commands to the line (stored in “pattern space”)
- Display: Print the result (unless suppressed)
- Repeat: Move to the next line and repeat
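This cycle explains a behavior that often surprises newcomers: with `-n` the automatic print in the Display step is suppressed, so only explicit `p` commands produce output, while without `-n` a matched line is printed twice (once by `p`, once automatically). A quick sketch:

```shell
# Two-line input; 'a' matches the pattern below
printf 'a\nb\n' > cycle.txt

# Auto-print suppressed (-n): only the explicit 'p' output appears
sed -n '/a/p' cycle.txt    # prints: a

# Auto-print active: the matching line appears twice
sed '/a/p' cycle.txt       # prints: a, a, b
```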
Basic sed Syntax
sed [options] 'command' file(s)
sed [options] -e 'command1' -e 'command2' file(s)
sed [options] -f script-file file(s)
Common sed Commands
- s: substitute text matching a pattern
- d: delete the pattern space (the current line)
- p: print the pattern space
- y: transliterate one set of characters to another
- i, a, c: insert before, append after, or change a line
- q: quit after processing the current line
Essential sed Examples
Basic Substitution
Replace the first occurrence of a pattern on each line:
# Create example.txt
cat <<EOF > example.txt
Hello World
Hello Somaz
Hello promi
EOF
# Replace 'Hello' with 'Hi' (first occurrence on each line)
sed 's/Hello/Hi/' example.txt
Output:
Hi World
Hi Somaz
Hi promi
Global Substitution
Replace all occurrences of a pattern on each line:
# Create example with multiple occurrences
cat <<EOF > multiple.txt
Hello World Hello Again
Hello Somaz Hello Again
EOF
# Replace all occurrences of 'Hello' with 'Hi'
sed 's/Hello/Hi/g' multiple.txt
Output:
Hi World Hi Again
Hi Somaz Hi Again
In-place Editing
Modify files directly rather than printing to standard output:
# First let's check the original file
cat example.txt
Hello World
Hello Somaz
Hello promi
# Now edit in place
sed -i 's/Hello/Hi/' example.txt
# Verify the changes
cat example.txt
Hi World
Hi Somaz
Hi promi
Creating Backups During In-place Editing
# Create backup with .bak extension
sed -i.bak 's/Hi/Hello/' example.txt
# Original file was changed
cat example.txt
Hello World
Hello Somaz
Hello promi
# Backup contains the previous version
cat example.txt.bak
Hi World
Hi Somaz
Hi promi
Deleting Lines
Remove lines matching a pattern:
# Create a file with numbered lines
cat <<EOF > numbers.txt
Line 1
Line 2
Line 3
Line 4
Line 5
EOF
# Delete lines containing 'Line 3'
sed '/Line 3/d' numbers.txt
Output:
Line 1
Line 2
Line 4
Line 5
Working with Line Numbers
Process specific lines by number:
# Delete line 2
sed '2d' numbers.txt
Line 1
Line 3
Line 4
Line 5
# Delete lines 2 through 4
sed '2,4d' numbers.txt
Line 1
Line 5
# Delete from line 3 to the end
sed '3,$d' numbers.txt
Line 1
Line 2
Multiple Commands with -e
Apply several editing commands in sequence:
# Replace 'Line' with 'Entry' and remove lines containing '3'
sed -e 's/Line/Entry/g' -e '/3/d' numbers.txt
Output:
Entry 1
Entry 2
Entry 4
Entry 5
Using Address Ranges
Apply commands only to specific line ranges:
# Replace 'Line' with 'Entry' only on lines 2-4
sed '2,4s/Line/Entry/' numbers.txt
Output:
Line 1
Entry 2
Entry 3
Entry 4
Line 5
Advanced sed Techniques
Using the Hold Buffer
sed has a special “hold buffer” that can store text for later use:
# Create example file
cat <<EOF > verse.txt
Roses are red
Violets are blue
Sugar is sweet
And so are you
EOF
# Reverse the order of lines
sed -n '1!G;h;$p' verse.txt
Output:
And so are you
Sugar is sweet
Violets are blue
Roses are red
Explanation:
- 1!G: for every line except the first, append the hold buffer to the pattern space
- h: copy the pattern space to the hold buffer
- $p: on the last line, print the pattern space
Extended Regular Expressions with -E
Use extended regular expressions for more powerful pattern matching:
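A minimal sketch: with `-E`, groups, `+`, and bracket classes work without backslash escaping, which makes patterns like email extraction much more readable. The emails.txt input here is an assumed sample, not from the original:

```shell
# Sample input with email addresses embedded in free text (assumed)
cat <<EOF > emails.txt
Contact us at contact@example.com for details
Reach user.name@company.co.jp in Tokyo
Admin: test_123@test-server.io
Or just.another@example.com works too
EOF

# -E enables extended regexes; capture the email and print only it
sed -E -n 's/.*[[:space:]]([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+).*/\1/p' emails.txt
```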
Output:
contact@example.com
user.name@company.co.jp
test_123@test-server.io
just.another@example.com
Multiline Processing
Process text across multiple lines:
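A sketch using the `N` command, which appends the next input line to the pattern space so both lines can be edited together (the HTML input is an assumed sample; the `{...}` grouping with semicolons is GNU sed syntax):

```shell
# Sample input with the title split across two lines (assumed)
cat <<EOF > page.html
<html>
<head>
<title>Sample
Page</title>
</head>
EOF

# N pulls the next line into the pattern space; the embedded newline is then replaced
sed -n '/<title>/{N;s/\n/ /;p}' page.html
```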
Output:
<title>Sample Page</title>
Conditional Processing with sed
Process lines based on conditions:
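In sed, an address (a pattern before the command) acts as the condition and the substitution as the action. A sketch, with the users.txt input assumed for illustration:

```shell
# Sample input (assumed): name, age, status
cat <<EOF > users.txt
John 25 active
Jane 28 inactive
Bob 32 active
EOF

# Only lines ending in " active" are rewritten and printed
sed -nE '/ active$/s/^([A-Za-z]+) ([0-9]+).*/Name: \1, Age: \2/p' users.txt
```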
Output:
Name: John, Age: 25
Name: Bob, Age: 32
Character Translations
Translate (replace) characters systematically:
# Create a sample file
echo "Hello, World! 123" > translate.txt
# Convert all lowercase to uppercase and digits to 'X'
sed 'y/abcdefghijklmnopqrstuvwxyz0123456789/ABCDEFGHIJKLMNOPQRSTUVWXYZXXXXXXXXXX/' translate.txt
Output:
HELLO, WORLD! XXX
Practical sed Use Cases
Batch File Processing
Process multiple files with a single command:
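A sketch, assuming several config files that all point at an old server name (the file names and contents are illustrative):

```shell
# Create three sample config files (assumed layout)
for i in 1 2 3; do
  echo "server=old-server.com" > "config$i.conf"
done

# Apply the same in-place substitution to every matching file
sed -i 's/old-server\.com/new-server.com/' config*.conf

# Spot-check one file
cat config1.conf
```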
Output:
server=new-server.com
Log File Analysis
Extract specific information from log files:
# Create a sample log file
cat <<EOF > sample.log
2023-01-01 12:00:01 INFO Server started
2023-01-01 12:05:32 ERROR Database connection failed
2023-01-01 12:06:15 WARN Retry attempt 1
2023-01-01 12:06:45 ERROR Login failed for user 'admin'
2023-01-01 12:07:23 INFO Configuration reloaded
EOF
# Extract all ERROR messages
sed -n '/ERROR/p' sample.log
Output:
2023-01-01 12:05:32 ERROR Database connection failed
2023-01-01 12:06:45 ERROR Login failed for user 'admin'
Comment/Uncomment Configuration Lines
Easily comment or uncomment configuration file lines:
# Create a configuration file
cat <<EOF > app.conf
# Main settings
port=8080
debug=false
# log_level=debug
max_connections=100
EOF
# Uncomment the log_level line
sed 's/^# log_level/log_level/' app.conf
Output:
# Main settings
port=8080
debug=false
log_level=debug
max_connections=100
CSV Data Processing
Clean and transform CSV data:
# Create a CSV file with inconsistent formatting
cat <<EOF > data.csv
Name, Age, Location
John Doe, 28,New York
Jane Smith,31 , Boston
"Bob Johnson", 45, "San Francisco"
EOF
# Clean up formatting (remove extra spaces, standardize quotes)
sed -E 's/[[:space:]]*,[[:space:]]*/,/g; s/"//g' data.csv
Output:
Name,Age,Location
John Doe,28,New York
Jane Smith,31,Boston
Bob Johnson,45,San Francisco
The awk Command in Depth
How awk Works
awk processes text through the following sequence:
- Begin Phase: Execute the BEGIN block (if any)
- Main Processing Phase: For each input line:
- Split the line into fields
- Test against all patterns
- Execute associated actions for matching patterns
- End Phase: Execute the END block (if any)
Basic awk Syntax
awk [options] 'pattern { action }' file(s)
awk [options] 'BEGIN { actions } pattern { actions } END { actions }' file(s)
Understanding awk Fields
awk automatically splits each input line into fields:
- By default, fields are separated by whitespace
- Fields are accessed using $1, $2, etc.; $0 represents the entire line
- The field separator can be changed with the -F option
# Create a sample file
cat <<EOF > employees.txt
John Smith IT 75000
Jane Doe HR 65000
Mike Johnson Finance 82000
Sarah Williams IT 78000
EOF
# Print specific fields
awk '{print $1, $2, $4}' employees.txt
Output:
John Smith 75000
Jane Doe 65000
Mike Johnson 82000
Sarah Williams 78000
Key awk Variables
- $0: the entire current record
- $1, $2, ...: individual fields
- NF: number of fields in the current record
- NR: current record number (across all input files)
- FNR: record number within the current file
- FS / OFS: input / output field separators
- RS / ORS: input / output record separators
- FILENAME: name of the current input file
Essential awk Examples
Basic Field Processing
Extract specific fields from structured data:
# Create data.txt with comma-separated values
cat <<EOF > data.csv
John,Smith,35,New York
Jane,Doe,28,San Francisco
Bob,Johnson,42,Chicago
Alice,Williams,31,Boston
EOF
# Print fields in a different order with a custom separator
awk -F, '{print $2 " - " $1 ", Age: " $3}' data.csv
Output:
Smith - John, Age: 35
Doe - Jane, Age: 28
Johnson - Bob, Age: 42
Williams - Alice, Age: 31
Pattern Matching
Process only lines that match specific patterns:
# Only print lines where age > 30
awk -F, '$3 > 30 {print $1 " " $2 " is " $3 " years old"}' data.csv
Output:
John Smith is 35 years old
Bob Johnson is 42 years old
Alice Williams is 31 years old
Calculating Totals and Averages
Perform calculations on numeric data:
# Calculate average age
awk -F, '{ sum += $3; count++ } END { print "Average age: " sum/count }' data.csv
Output:
Average age: 34
Using BEGIN and END Blocks
Initialize variables and print summaries:
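A sketch that prints a header in BEGIN and a record count in END. It recreates the data.csv sample so the block is self-contained:

```shell
# Recreate the sample data (same contents as data.csv above)
cat <<EOF > data.csv
John,Smith,35,New York
Jane,Doe,28,San Francisco
Bob,Johnson,42,Chicago
Alice,Williams,31,Boston
EOF

awk -F, '
BEGIN { print "Name Age City"; print "--------------------" }
      { print $1 " " $2 " " $3 " " $4 }
END   { print "--------------------"; print "Total records: " NR }
' data.csv
```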
Output:
Name Age City
--------------------
John Smith 35 New York
Jane Doe 28 San Francisco
Bob Johnson 42 Chicago
Alice Williams 31 Boston
--------------------
Total records: 4
Custom Field and Record Separators
Process data with non-standard formats:
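A sketch using blank-line-separated records: setting RS="" puts awk in paragraph mode, and FS="\n" makes each line of a paragraph one field. The records.txt input is an assumed sample:

```shell
# Sample input: one record per paragraph, one field per line (assumed)
cat <<EOF > records.txt
John
35
New York

Jane
28
San Francisco
EOF

# RS="" = records separated by blank lines; FS="\n" = newline-separated fields
awk 'BEGIN { FS = "\n"; RS = "" }
     { print $1 " is " $2 " years old and lives in " $3 }' records.txt
```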
Output:
John is 35 years old and lives in New York
Jane is 28 years old and lives in San Francisco
Conditional Logic in awk
Implement if-else statements for complex decision making:
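A sketch of if/else grading; the grades.txt scores are assumed, chosen so the averages work out to the values shown below:

```shell
# Sample input (assumed): name and three test scores
cat <<EOF > grades.txt
John 85 90 80
Mary 95 88 91
Peter 70 65 68
Sarah 92 89 92
EOF

awk '{
    avg = ($2 + $3 + $4) / 3
    if (avg >= 90)      grade = "A"
    else if (avg >= 80) grade = "B"
    else if (avg >= 70) grade = "C"
    else                grade = "D"
    printf "%s: Average = %.1f, Grade = %s\n", $1, avg, grade
}' grades.txt
```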
Output:
John: Average = 85.0, Grade = B
Mary: Average = 91.3, Grade = A
Peter: Average = 67.7, Grade = D
Sarah: Average = 91.0, Grade = A
Arrays in awk
Use arrays for more complex data processing:
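A sketch counting word frequencies with an associative array. The text.txt input is assumed, and the iteration order of `for (word in count)` is unspecified, so the lines may appear in any order:

```shell
# Sample input text (assumed)
cat <<EOF > text.txt
the quick brown fox jumps over the lazy dog
a quick brown fox runs faster than dog
EOF

awk '{
    for (i = 1; i <= NF; i++) count[$i]++   # associative array keyed by word
}
END {
    print "Word Frequency:"
    for (word in count) print word, count[word]   # iteration order is unspecified
}' text.txt
```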
Output:
Word Frequency:
quick 2
than 1
runs 1
dog 2
jumps 1
brown 2
lazy 1
over 1
a 1
the 2
fox 2
faster 1
Built-in Functions
awk includes numerous built-in functions for string and mathematical operations:
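A sketch exercising toupper, tolower, length, int, and sqrt; the values.txt input is assumed. Note that int() truncates toward zero rather than rounding:

```shell
# Sample input (assumed): word, word, number
cat <<EOF > values.txt
string1 UPPERCASE 3.14159
string2 lowercase 2.71828
string3 MixedCase 1.61803
EOF

awk '{
    print "Original: " $0
    printf "Modified: %s %s (length: %d)\n", toupper($1), tolower($2), length($1)
    printf "Math: rounded = %d, sqrt = %.4f\n", int($3), sqrt($3)
}' values.txt
```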
Output:
Original: string1 UPPERCASE 3.14159
Modified: STRING1 uppercase (length: 7)
Math: rounded = 3, sqrt = 1.7725
Original: string2 lowercase 2.71828
Modified: STRING2 lowercase (length: 7)
Math: rounded = 2, sqrt = 1.6487
Original: string3 MixedCase 1.61803
Modified: STRING3 mixedcase (length: 7)
Math: rounded = 1, sqrt = 1.2720
Formatted Output with printf
Create precisely formatted output with printf:
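A sketch of a quarterly sales report; the sales.txt figures are assumed (chosen to reproduce the totals below), and the exact column widths are illustrative:

```shell
# Sample input (assumed): region and three quarterly figures
cat <<EOF > sales.txt
North 10500 8500 12000
South 9500 7200 8300
East 12300 10500 11200
West 8200 9200 10100
EOF

awk '
BEGIN {
    printf "%-8s %10s %10s %10s %10s\n", "Region", "Q1", "Q2", "Q3", "Total"
    printf "%-8s %10s %10s %10s %10s\n", "------", "--", "--", "--", "-----"
}
{
    total = $2 + $3 + $4
    q1 += $2; q2 += $3; q3 += $4; grand += total
    printf "%-8s %10.2f %10.2f %10.2f %10.2f\n", $1, $2, $3, $4, total
}
END {
    printf "%-8s %10s %10s %10s %10s\n", "------", "--", "--", "--", "-----"
    printf "%-8s %10.2f %10.2f %10.2f %10.2f\n", "Total", q1, q2, q3, grand
}' sales.txt
```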
Output:
Region Q1 Q2 Q3 Total
------ -- -- -- -----
North 10500.00 8500.00 12000.00 31000.00
South 9500.00 7200.00 8300.00 25000.00
East 12300.00 10500.00 11200.00 34000.00
West 8200.00 9200.00 10100.00 27500.00
------ -- -- -- -----
Total 40500.00 35400.00 41600.00 117500.00
🚄 Combined Usage of sed and awk
The two tools are often chained in a pipeline, with sed normalizing the text and awk handling the field-based logic:
sed 's/, */,/g' file.csv | awk -F, '$3 > 30 {print $2}'
→ sed replaces each comma followed by optional spaces with a single comma ('s' indicates substitution; the 'g' flag applies the replacement to all occurrences on each line)
→ awk processes the output from sed: '-F,' sets comma as the field separator
→ '$3 > 30 {print $2}' prints the second field whenever the third field (age) is greater than 30
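A runnable sketch of this kind of pipeline, assuming a people.csv with inconsistent spacing after the commas (the file and its contents are hypothetical):

```shell
# Sample input (assumed): first name, last name, age, city
cat <<EOF > people.csv
Alice, Johnson, 34, Boston
Bob, Smith, 27, Chicago
Carol, Davis, 41, Denver
EOF

# sed normalizes ", " to "," so awk sees clean comma-separated fields
sed 's/, */,/g' people.csv | awk -F, '$3 > 30 { print $2 }'
```

This prints the last names Johnson and Davis, since only those rows have an age greater than 30.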
sed and awk in Modern DevOps Workflows
In today’s DevOps environments, sed and awk remain invaluable for their ability to quickly transform and analyze configuration files, logs, and deployment artifacts. Here are some modern applications:
Infrastructure as Code (IaC)
For example, sed can patch environment-specific values (regions, instance sizes, endpoints) into Terraform or Ansible templates during deployment.
Container Orchestration
For example, awk can summarize `kubectl get pods` output, counting pods by status or namespace.
GitOps Workflows
For example, a CI pipeline can use sed to bump image tags in Kubernetes manifests before committing the change back to Git.
Common Options
sed Options:
- -i: Edit files in place
- -i.bak: Edit files in place but create a backup with the .bak extension
- -n: Suppress automatic printing of the pattern space
- -e: Add the script to the commands to be executed
- -f: Add the contents of a script file to the commands to be executed
- s/pattern/replacement/: Substitute pattern with replacement
- g: Global replacement (all occurrences on each line)
- p: Print the pattern space
- d: Delete the pattern space and start the next cycle
- w filename: Write the pattern space to a file

awk Options:
- -F: Specify the field separator
- -f: Read the program from a file
- -v var=value: Assign a value to variable var
- -W: Set the warning level
- $n: Reference the nth field
- $0: Reference the entire line
- NF: Number of fields in the current record
- NR: Current record number
- FS: Input field separator
- OFS: Output field separator
- RS: Input record separator
- ORS: Output record separator
- print: Output the specified fields
- printf: Formatted output
- pattern {action}: Perform the action when the pattern matches
- BEGIN {action}: Execute the action before processing any input
- END {action}: Execute the action after processing all input