Mastering sed and awk - The Ultimate Guide to Text Processing in Linux

Powerful text manipulation tools for Linux administrators and developers


Introduction

sed (stream editor) and awk (named after its creators Aho, Weinberger, and Kernighan) are powerful text processing utilities available in Unix/Linux environments. Both tools excel at manipulating text data but approach the task differently. While sed operates on a line-by-line basis and is designed for simple text transformations, awk treats text as records and fields, making it more suitable for complex data processing tasks.

This comprehensive guide will explore both utilities, their unique capabilities, and how they can be combined to solve complex text processing challenges. Whether you’re a system administrator managing log files, a data analyst processing CSV files, or a developer automating text transformations, mastering sed and awk will significantly enhance your command-line productivity.



What are sed and awk?

sed and awk are powerful text processing utilities in Unix/Linux environments that form the cornerstone of command-line text manipulation.

These tools allow you to transform, extract, and report on text data with remarkable flexibility, making them essential for system administrators, data analysts, and developers working in terminal environments.
graph LR
  A(Text Input) --> B(sed)
  A --> C(awk)
  B --> D(Transformed Text)
  C --> E(Structured Data)
  B --> F(Filtered Content)
  C --> G(Reports & Analysis)


Comparing sed and awk

|  | sed | awk |
|---|-----|-----|
| Primary Purpose | Stream editing and text transformation | Text pattern scanning and processing |
| Designed For | Simple text substitutions and filtering | Structured data processing and reporting |
| Processing Model | Line-by-line text processing | Record-based data processing with fields |
| Complexity | Simpler syntax, focused on editing | A full programming language with variables and functions |
| Best For | Find and replace, text filtering, basic transformations | Data extraction, report generation, complex transformations |


Historical Context

Both tools emerged from Bell Labs during the development of Unix:

  • sed was created by Lee E. McMahon in 1973 as part of the Unix text processing pipeline.
  • awk was developed in 1977, named after its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Their enduring popularity for over four decades speaks to their utility and power in command-line environments.



The sed Command in Depth

sed (Stream Editor) is a non-interactive text editor that processes text line by line from standard input or files, applying specified transformations and outputting the result to standard output.


How sed Works

sed processes text through the following sequence:

  1. Read: Read a line from the input stream
  2. Execute: Apply all commands to the line (stored in “pattern space”)
  3. Display: Print the result (unless suppressed)
  4. Repeat: Move to the next line and repeat
graph LR
  A(Input Stream) --> B(Read Line)
  B --> C(Apply Commands)
  C --> D(Print Result)
  D --> E(Next Line)
  E --> B
  E --> F(End)


Basic sed Syntax

sed [options] 'command' file(s)
sed [options] -e 'command1' -e 'command2' file(s)
sed [options] -f script-file file(s)


Common sed Commands

| Command | Description |
|---------|-------------|
| s/pattern/replacement/ | Substitute the first match on each line (add g for all matches) |
| d | Delete the selected lines |
| p | Print the selected lines (usually with -n) |
| i\text | Insert text before the selected line |
| a\text | Append text after the selected line |
| c\text | Replace the selected lines with text |
| y/src/dst/ | Transliterate each character in src to the corresponding character in dst |
| q | Quit after processing the current line |



Essential sed Examples


Basic Substitution

Replace the first occurrence of a pattern on each line:

# Create example.txt
cat <<EOF > example.txt
Hello World
Hello Somaz
Hello promi
EOF

# Replace 'Hello' with 'Hi' (first occurrence on each line)
sed 's/Hello/Hi/' example.txt

Output:

Hi World
Hi Somaz
Hi promi


Global Substitution

Replace all occurrences of a pattern on each line:

# Create example with multiple occurrences
cat <<EOF > multiple.txt
Hello World Hello Again
Hello Somaz Hello Again
EOF

# Replace all occurrences of 'Hello' with 'Hi'
sed 's/Hello/Hi/g' multiple.txt

Output:

Hi World Hi Again
Hi Somaz Hi Again


In-place Editing

Modify files directly rather than printing to standard output:

# First let's check the original file
cat example.txt
Hello World
Hello Somaz
Hello promi

# Now edit in place
sed -i 's/Hello/Hi/' example.txt

# Verify the changes
cat example.txt
Hi World
Hi Somaz
Hi promi


Creating Backups During In-place Editing

# Create backup with .bak extension
sed -i.bak 's/Hi/Hello/' example.txt

# Original file was changed
cat example.txt
Hello World
Hello Somaz
Hello promi

# Backup contains the previous version
cat example.txt.bak
Hi World
Hi Somaz
Hi promi


Deleting Lines

Remove lines matching a pattern:

# Create a file with numbered lines
cat <<EOF > numbers.txt
Line 1
Line 2
Line 3
Line 4
Line 5
EOF

# Delete lines containing 'Line 3'
sed '/Line 3/d' numbers.txt

Output:

Line 1
Line 2
Line 4
Line 5


Working with Line Numbers

Process specific lines by number:

# Delete line 2
sed '2d' numbers.txt
Line 1
Line 3
Line 4
Line 5

# Delete lines 2 through 4
sed '2,4d' numbers.txt
Line 1
Line 5

# Delete from line 3 to the end
sed '3,$d' numbers.txt
Line 1
Line 2


Multiple Commands with -e

Apply several editing commands in sequence:

# Replace 'Line' with 'Entry' and remove lines containing '3'
sed -e 's/Line/Entry/g' -e '/3/d' numbers.txt

Output:

Entry 1
Entry 2
Entry 4
Entry 5


Using Address Ranges

Apply commands only to specific line ranges:

# Replace 'Line' with 'Entry' only on lines 2-4
sed '2,4s/Line/Entry/' numbers.txt

Output:

Line 1
Entry 2
Entry 3
Entry 4
Line 5



Advanced sed Techniques


Using the Hold Buffer

sed has a special “hold buffer” that can store text for later use:

# Create example file
cat <<EOF > verse.txt
Roses are red
Violets are blue
Sugar is sweet
And so are you
EOF

# Reverse the order of lines
sed -n '1!G;h;$p' verse.txt

Output:

And so are you
Sugar is sweet
Violets are blue
Roses are red

Explanation:

  • -n suppresses automatic printing of the pattern space.
  • 1!G: for every line except the first, append the hold space to the pattern space.
  • h: copy the (growing) pattern space back into the hold space.
  • $p: on the last line, print the pattern space, which now holds all lines in reverse order.

Extended Regular Expressions with -E

Use extended regular expressions for more powerful pattern matching:
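For instance, with a mixed input file (the file name and pattern here are illustrative), -E lets you write an email-matching pattern using +, {2,}, and groups without backslash-escaping:

```shell
# Input file (illustrative) mixing email addresses with other text
cat <<EOF > contacts.txt
contact@example.com
not an email
user.name@company.co.jp
test_123@test-server.io
some other line
just.another@example.com
EOF

# With -E, print only lines that look like email addresses
sed -E -n '/^[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}$/p' contacts.txt
```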


Output:

contact@example.com
user.name@company.co.jp
test_123@test-server.io
just.another@example.com


Multiline Processing

Process text across multiple lines:
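A common case is an element split across two lines. The N command pulls the next input line into the pattern space so both can be edited together (GNU sed; the HTML file below is illustrative):

```shell
# Sample page where the <title> element is split across two lines
cat <<EOF > page.html
<html>
<head>
  <title>Sample
 Page</title>
</head>
</html>
EOF

# On the line matching <title>, append the next line (N),
# delete the embedded newline, and print the joined result
sed -n '/<title>/{N;s/\n//;p}' page.html
```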



Output:

  <title>Sample Page</title>


Conditional Processing with sed

Process lines based on conditions:
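sed can restrict a substitution to lines that satisfy a pattern. Here, with an illustrative input file, only records whose age field falls in the 20-39 range are reformatted and printed:

```shell
# Illustrative input: one "name age" pair per line
cat <<EOF > people.txt
John 25
Alice 17
Bob 32
Carol 15
EOF

# Reformat only lines whose age is 20-39;
# -n plus the p flag prints just the lines that matched
sed -n -E 's/^([A-Za-z]+) ([23][0-9])$/Name: \1, Age: \2/p' people.txt
```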


Output:

Name: John, Age: 25
Name: Bob, Age: 32


Character Translations

Translate (replace) characters systematically:

# Create a sample file
echo "Hello, World! 123" > translate.txt

# Convert all lowercase to uppercase and digits to 'X'
sed 'y/abcdefghijklmnopqrstuvwxyz0123456789/ABCDEFGHIJKLMNOPQRSTUVWXYZXXXXXXXXXX/' translate.txt

Output:

HELLO, WORLD! XXX



Practical sed Use Cases


Batch File Processing

Process multiple files with a single command:
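For example, pointing several configuration files at a new server in one pass (file names are illustrative; -i is GNU sed's in-place flag):

```shell
# Create two config files that reference the old server
for f in app1.conf app2.conf; do
  echo "server=old-server.com" > "$f"
done

# Rewrite both files in place with a single sed invocation
sed -i 's/old-server\.com/new-server.com/' app1.conf app2.conf

cat app1.conf
```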


Output:

server=new-server.com


Log File Analysis

Extract specific information from log files:

# Create a sample log file
cat <<EOF > sample.log
2023-01-01 12:00:01 INFO  Server started
2023-01-01 12:05:32 ERROR Database connection failed
2023-01-01 12:06:15 WARN  Retry attempt 1
2023-01-01 12:06:45 ERROR Login failed for user 'admin'
2023-01-01 12:07:23 INFO  Configuration reloaded
EOF

# Extract all ERROR messages
sed -n '/ERROR/p' sample.log

Output:

2023-01-01 12:05:32 ERROR Database connection failed
2023-01-01 12:06:45 ERROR Login failed for user 'admin'


Comment/Uncomment Configuration Lines

Easily comment or uncomment configuration file lines:

# Create a configuration file
cat <<EOF > app.conf
# Main settings
port=8080
debug=false
# log_level=debug
max_connections=100
EOF

# Uncomment the log_level line
sed 's/^# log_level/log_level/' app.conf

Output:

# Main settings
port=8080
debug=false
log_level=debug
max_connections=100


CSV Data Processing

Clean and transform CSV data:

# Create a CSV file with inconsistent formatting
cat <<EOF > data.csv
Name, Age, Location
John Doe,  28,New York
Jane Smith,31 , Boston
"Bob Johnson", 45, "San Francisco"
EOF

# Clean up formatting (remove extra spaces, standardize quotes)
sed -E 's/[[:space:]]*,[[:space:]]*/,/g; s/"//g' data.csv

Output:

Name,Age,Location
John Doe,28,New York
Jane Smith,31,Boston
Bob Johnson,45,San Francisco



The awk Command in Depth

awk is a powerful data processing language designed for extracting and reporting on data. It treats each input line as a record and divides it into fields, making it ideal for structured data processing.


How awk Works

awk processes text through the following sequence:

  1. Begin Phase: Execute the BEGIN block (if any)
  2. Main Processing Phase: For each input line:
    • Split the line into fields
    • Test against all patterns
    • Execute associated actions for matching patterns
  3. End Phase: Execute the END block (if any)


graph TD
  A(BEGIN Block) --> B(Read Input Line)
  B --> C(Split Line into Fields)
  C --> D(Apply Pattern-Action Rules)
  D --> E(Print Output)
  E --> B
  E --> F(END Block)


Basic awk Syntax

awk [options] 'pattern { action }' file(s)
awk [options] 'BEGIN { actions } pattern { actions } END { actions }' file(s)


Understanding awk Fields

awk automatically splits each input line into fields:

# Create a sample file
cat <<EOF > employees.txt
John Smith IT 75000
Jane Doe HR 65000
Mike Johnson Finance 82000
Sarah Williams IT 78000
EOF

# Print specific fields
awk '{print $1, $2, $4}' employees.txt

Output:

John Smith 75000
Jane Doe 65000
Mike Johnson 82000
Sarah Williams 78000


Key awk Variables

| Variable | Meaning |
|----------|---------|
| $0 | The entire current record (line) |
| $1, $2, … | The individual fields of the record |
| NF | Number of fields in the current record |
| NR | Number of records read so far (across all files) |
| FNR | Record number within the current file |
| FS / OFS | Input / output field separator |
| RS / ORS | Input / output record separator |
| FILENAME | Name of the current input file |



Essential awk Examples


Basic Field Processing

Extract specific fields from structured data:

# Create data.txt with comma-separated values
cat <<EOF > data.csv
John,Smith,35,New York
Jane,Doe,28,San Francisco
Bob,Johnson,42,Chicago
Alice,Williams,31,Boston
EOF

# Print fields in a different order with a custom separator
awk -F, '{print $2 " - " $1 ", Age: " $3}' data.csv

Output:

Smith - John, Age: 35
Doe - Jane, Age: 28
Johnson - Bob, Age: 42
Williams - Alice, Age: 31


Pattern Matching

Process only lines that match specific patterns:

# Only print lines where age > 30
awk -F, '$3 > 30 {print $1 " " $2 " is " $3 " years old"}' data.csv

Output:

John Smith is 35 years old
Bob Johnson is 42 years old
Alice Williams is 31 years old


Calculating Totals and Averages

Perform calculations on numeric data:

# Calculate average age
awk -F, '{ sum += $3; count++ } END { print "Average age: " sum/count }' data.csv


Output:

Average age: 34

Using BEGIN and END Blocks

Initialize variables and print summaries:
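One script that does this for the data.csv file created earlier (re-created here so the example stands alone): the header prints once in BEGIN, the body formats each record and counts it, and END prints the summary.

```shell
# Re-create the input used earlier in this guide
cat <<EOF > data.csv
John,Smith,35,New York
Jane,Doe,28,San Francisco
Bob,Johnson,42,Chicago
Alice,Williams,31,Boston
EOF

awk -F, '
BEGIN {
  print "Name\t\tAge\tCity"        # header, printed before any input
  print "--------------------"
}
{
  printf "%-15s %s\t%s\n", $1 " " $2, $3, $4
  count++                          # runs once per record
}
END {
  print "--------------------"
  print "Total records: " count    # summary, printed after the last record
}' data.csv
```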


Output:

Name		Age	City
--------------------
John Smith      35	New York
Jane Doe        28	San Francisco
Bob Johnson     42	Chicago
Alice Williams  31	Boston
--------------------
Total records: 4


Custom Field and Record Separators

Process data with non-standard formats:
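With RS set to the empty string, awk reads blank-line-separated "paragraph" records, and FS="\n" makes each line within a record a field (the input file is illustrative):

```shell
# One record per paragraph: name, age, city on separate lines
cat <<EOF > records.txt
John
35
New York

Jane
28
San Francisco
EOF

# RS="" enables paragraph mode; FS="\n" splits fields on newlines
awk 'BEGIN { RS = ""; FS = "\n" } { print $1 " is " $2 " years old and lives in " $3 }' records.txt
```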


Output:

John is 35 years old and lives in New York
Jane is 28 years old and lives in San Francisco

Conditional Logic in awk

Implement if-else statements for complex decision making:
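With an illustrative grades file (a name followed by three scores), an if-else chain maps each average to a letter grade:

```shell
# Name and three test scores per line
cat <<EOF > grades.txt
John 90 80 85
Mary 95 89 90
Peter 70 65 68
Sarah 92 90 91
EOF

awk '{
  avg = ($2 + $3 + $4) / 3
  if (avg >= 90)      grade = "A"
  else if (avg >= 80) grade = "B"
  else if (avg >= 70) grade = "C"
  else if (avg >= 60) grade = "D"
  else                grade = "F"
  printf "%s: Average = %.1f, Grade = %s\n", $1, avg, grade
}' grades.txt
```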



Output:

John: Average = 85.0, Grade = B
Mary: Average = 91.3, Grade = A
Peter: Average = 67.7, Grade = D
Sarah: Average = 91.0, Grade = A

Arrays in awk

Use arrays for more complex data processing:
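A classic use is counting word frequencies with an associative array keyed by word (the input text is illustrative; note that for (w in count) visits keys in an unspecified order, so your output order may differ):

```shell
cat <<EOF > words.txt
the quick brown fox jumps over a lazy dog
the quick brown fox runs faster than dog
EOF

awk '{
  for (i = 1; i <= NF; i++)
    count[$i]++                # associative array: word -> occurrences
}
END {
  print "Word Frequency:"
  for (w in count)             # iteration order is unspecified
    printf "%-10s %d\n", w, count[w]
}' words.txt
```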



Output:

Word Frequency:
quick       2
than        1
runs        1
dog         2
jumps       1
brown       2
lazy        1
over        1
a           1
the         2
fox         2
faster      1

Built-in Functions

awk includes numerous built-in functions for string and mathematical operations:
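For instance (with an illustrative input file), toupper, tolower, length, int, and sqrt can be combined in a single pass:

```shell
cat <<EOF > values.txt
string1 UPPERCASE 3.14159
string2 lowercase 2.71828
string3 MixedCase 1.61803
EOF

awk '{
  print "Original: " $0
  printf "Modified: %s %s (length: %d)\n", toupper($1), tolower($2), length($1)
  printf "Math: rounded = %d, sqrt = %.4f\n\n", int($3), sqrt($3)
}' values.txt
```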



Output:

Original: string1 UPPERCASE 3.14159
Modified: STRING1 uppercase (length: 7)
Math: rounded = 3, sqrt = 1.7725

Original: string2 lowercase 2.71828
Modified: STRING2 lowercase (length: 7)
Math: rounded = 2, sqrt = 1.6487

Original: string3 MixedCase 1.61803
Modified: STRING3 mixedcase (length: 7)
Math: rounded = 1, sqrt = 1.2720

Formatted Output with printf

Create precisely formatted output with printf:
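An illustrative sales file (region plus three quarterly figures) can be turned into an aligned report: %-8s left-justifies the region name and %10.2f right-justifies each amount to two decimals.

```shell
cat <<EOF > sales.txt
North 10500 8500 12000
South 9500 7200 8300
East 12300 10500 11200
West 8200 9200 10100
EOF

awk '
BEGIN {
  printf "%-8s %10s %10s %10s %10s\n", "Region", "Q1", "Q2", "Q3", "Total"
  printf "%-8s %10s %10s %10s %10s\n", "------", "--", "--", "--", "-----"
}
{
  total = $2 + $3 + $4                 # row total
  q1 += $2; q2 += $3; q3 += $4         # running column totals
  grand += total
  printf "%-8s %10.2f %10.2f %10.2f %10.2f\n", $1, $2, $3, $4, total
}
END {
  printf "%-8s %10s %10s %10s %10s\n", "------", "--", "--", "--", "-----"
  printf "%-8s %10.2f %10.2f %10.2f %10.2f\n", "Total", q1, q2, q3, grand
}' sales.txt
```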



Output:

Region          Q1         Q2         Q3      Total
------          --         --         --      -----
North      10500.00    8500.00   12000.00   31000.00
South       9500.00    7200.00    8300.00   25000.00
East       12300.00   10500.00   11200.00   34000.00
West        8200.00    9200.00   10100.00   27500.00
------          --         --         --      -----
Total      40500.00   35400.00   41600.00  117500.00



🚄 Combined Usage of sed and awk



🔍 Understanding sed and awk Commands

sed 's/, */,/g' example.txt
→ Replaces comma-space patterns with just commas
→ 's' indicates substitution
→ 'g' flag means global replacement (applies to all occurrences in each line)

awk -F, '$3 > 30 {print $2}'
→ Processes the output from sed
→ '-F' sets comma as the field separator
→ '$3 > 30 {print $2}' instructs awk to print the second field when the third field (age) is greater than 30
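Putting the two commands together in a pipeline, with an illustrative CSV of first name, last name, and age:

```shell
# Illustrative input with inconsistent spacing after commas
cat <<EOF > people.csv
John, Smith, 35
Jane, Doe, 28
Bob, Johnson, 42
EOF

# sed normalizes the separators, then awk filters on age and prints last names
sed 's/, */,/g' people.csv | awk -F, '$3 > 30 {print $2}'
```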


sed and awk in Modern DevOps Workflows

In today’s DevOps environments, sed and awk remain invaluable for their ability to quickly transform and analyze configuration files, logs, and deployment artifacts. Here are some modern applications:


Infrastructure as Code (IaC)
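For example, promoting an instance type across Terraform variable files (the file name and values are hypothetical; -i is GNU sed's in-place flag):

```shell
# Hypothetical Terraform variables file
cat <<EOF > terraform.tfvars
instance_type = "t2.micro"
region        = "us-east-1"
EOF

# Bump the instance type in place before running terraform plan
sed -i 's/t2\.micro/t3.medium/' terraform.tfvars

cat terraform.tfvars
```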



Container Orchestration
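For instance, retagging a container image in a Kubernetes manifest before applying it (the manifest and registry below are hypothetical):

```shell
# Hypothetical deployment fragment
cat <<EOF > deployment.yaml
    spec:
      containers:
      - name: web
        image: registry.example.com/web:v1.0.0
EOF

# Replace the image tag while leaving the repository path untouched
sed -E 's|(image: [^:]+):v[0-9.]+|\1:v1.1.0|' deployment.yaml
```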



GitOps Workflows
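For example, reading the current image tag out of a Helm values file with awk, then writing the next release tag back with sed before committing (file and tags are hypothetical; -i is GNU sed):

```shell
# Hypothetical Helm values file tracked in Git
cat <<EOF > values.yaml
image:
  repository: registry.example.com/api
  tag: v2.3.1
replicas: 3
EOF

# awk extracts the current tag; sed writes the new one in place
current=$(awk '/^  tag:/ { print $2 }' values.yaml)
echo "current tag: $current"
sed -i 's/^  tag: .*/  tag: v2.3.2/' values.yaml
```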




Common Options


sed Options:

| Option | Description |
|--------|-------------|
| -n | Suppress automatic printing of the pattern space |
| -e script | Add a script of commands to execute |
| -f file | Read editing commands from a script file |
| -i[suffix] | Edit files in place (optional backup with suffix) |
| -E | Use extended regular expressions |


awk Options:

| Option | Description |
|--------|-------------|
| -F fs | Set the input field separator |
| -v var=value | Assign a variable before the program runs |
| -f file | Read the awk program from a file |


