Load Average in top Command - What It Really Means

Understanding system load metrics beyond CPU usage


Overview

Load Average is one of the most commonly monitored metrics in Linux systems administration and performance analysis. It provides a high-level view of system resource utilization over time, serving as an early warning system for potential performance bottlenecks. When you run commands like top, uptime, or htop, you’ll see three numbers representing the 1, 5, and 15-minute load averages.

However, load average is also one of the most misunderstood metrics. Many system administrators erroneously equate it directly with CPU usage, leading to incorrect interpretations like “a load of 2.0 means the CPU is 200% utilized” or “any load above 1.0 indicates an overloaded system.”

The reality is more nuanced – load average combines several factors into a single metric that reflects the “pressure” on your system. Understanding this metric correctly is essential for effective system monitoring, capacity planning, and troubleshooting in production environments.

Historical Context

The concept of load average originated in the early Unix systems of the 1970s. It was designed to provide a simple measure of system busyness that could be understood at a glance and compared over time.

The original Unix implementation only counted processes in the run queue (either running or waiting for CPU time). However, Linux later expanded this definition in the early 1990s to include processes in uninterruptible sleep states (D state), primarily those waiting for I/O operations to complete.

This difference in implementation explains why the interpretation of load average varies between Unix-like systems, and why Linux load averages tend to be higher than those on systems like FreeBSD or macOS for equivalent workloads. The change is generally attributed to kernel developer Matthias Urlichs, who argued that processes blocked on disk or swap consume resources just as runnable processes do. The decision remains debated: critics say it blurs what the metric actually measures, while others argue it gives a more complete picture of overall resource contention.



What is Load Average?

Load Average is a measurement of system demand over time, expressed as the average number of processes that are either:

  1. Running (using CPU)
  2. Runnable (waiting for CPU time)
  3. Uninterruptible Sleep (waiting for I/O operations, typically disk I/O)

Think of it as a measure of the “queue length” of processes demanding system resources. If you have a load average of 1.0, it means that over the measured time period, there was, on average, exactly one process active or waiting for resources at any given time.

The three load average numbers you typically see represent averages over different time periods:

  • 1 minute – the most recent, most volatile view of demand
  • 5 minutes – a medium-term view that smooths out brief spikes
  • 15 minutes – the long-term trend
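
A quick way to see these values is the uptime command or the /proc/loadavg file. The output below is illustrative sample output, not values your system will reproduce:

$ uptime
 14:02:11 up 12 days,  3:41,  2 users,  load average: 0.42, 0.57, 0.61

$ cat /proc/loadavg
0.42 0.57 0.61 1/253 48231

In /proc/loadavg, the first three fields are the 1-, 5-, and 15-minute averages; the fourth is currently runnable / total scheduling entities, and the fifth is the most recently created PID.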

Visualization: Understanding Load Average

Imagine a single checkout lane at a grocery store:

  • A load of 1.0 means there's exactly one customer being served at all times.
  • A load of 0.5 means the checkout lane is idle half the time.
  • A load of 2.0 means there's one customer being served and one waiting in line, on average.
  • A load of 10.0 means there's one customer being served and nine waiting in line, on average.

Now imagine you have a store with 4 checkout lanes (like a 4-core CPU):

  • A load of 4.0 means all checkout lanes are busy, but no one is waiting.
  • A load below 4.0 means some lanes are occasionally idle.
  • A load above 4.0 means customers are waiting (indicating potential performance issues).



How Load Average is Calculated

Load Average isn’t a simple arithmetic mean but an Exponential Moving Average. This means recent data carries more weight in the calculation, making it more responsive to changes.

This is also why comparing the three values against each other is useful: the 1-minute figure reacts quickly to sudden spikes, while the 15-minute figure reflects sustained load.
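
As a simplified sketch of the recurrence (the kernel samples the number of active tasks roughly every 5 seconds and applies a fixed per-interval decay factor; the real implementation uses fixed-point integer arithmetic):

# n = number of runnable + uninterruptible tasks at the sample point
load_1min  = load_1min  * e^(-5/60)  + n * (1 - e^(-5/60))
load_5min  = load_5min  * e^(-5/300) + n * (1 - e^(-5/300))
load_15min = load_15min * e^(-5/900) + n * (1 - e^(-5/900))

Because old samples decay exponentially, the 1-minute value reacts to (and recovers from) a spike quickly, while the 15-minute value changes slowly in both directions.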



Components of Load Average

Load Average is not simply CPU usage, but is calculated based on the number of processes in the following states:

State                       Counted?          Description
Running (R)                 Yes               Processes currently executing on a CPU
Runnable (R)                Yes               Processes in the run queue waiting for CPU time
Uninterruptible sleep (D)   Yes (Linux only)  Processes blocked on I/O (e.g., disk or NFS)
Sleeping / Idle (S)         No                Processes not currently demanding CPU or I/O

In essence, on Linux the Load Average represents how many processes are currently using the CPU, waiting for the CPU, or blocked in uninterruptible I/O.
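
One way to see how many processes are currently in each state (R and D are the states that feed into the Linux load average) is a quick ps one-liner such as this sketch:

# Count processes by state: R = running/runnable, D = uninterruptible sleep, S = sleeping
ps -eo state= | sort | uniq -c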


Load Average vs CPU Usage

Aspect       Load Average                                     CPU Usage
Meaning      Overall demand on the system                     Percentage of time the CPU is actually busy
Components   Running + runnable + uninterruptible processes   CPU time (%)
Unit         Number of processes                              Percentage (%)
Example      2.00 = two processes active or waiting, on average   80% = the CPU was busy 80% of the time
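
To see the two metrics side by side on a live system, something like the following works (top -bn1 runs top once in batch mode; the exact label of the CPU summary line varies slightly between top versions):

# Load average (demand) and CPU utilization (busy time) from the same moment
uptime
top -bn1 | grep '%Cpu'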

Common Misconceptions

  • "A load of 2.0 means the CPU is 200% utilized" – load counts processes, not CPU time; on a multi-core machine a load of 2.0 may be entirely healthy.
  • "Any load above 1.0 means the system is overloaded" – the threshold depends on the number of CPU cores; 1.0 per core is the more useful reference point.
  • "High load always means a CPU problem" – on Linux, processes stuck waiting on disk I/O also raise the load average, so a high value can reflect an I/O bottleneck instead.

Interpreting Load Average Values

How you interpret load average depends on your system's CPU count. The common rule of thumb is to divide the load average by the number of CPU cores: a per-core value well below 1.0 means there is headroom, a value around 1.0 means the system is fully utilized, and a sustained value above 1.0 means processes are queuing for resources.
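
A quick sketch for checking this from the shell, using nproc for the core count and the first field of /proc/loadavg for the 1-minute average:

# 1-minute load average divided by the number of CPU cores
awk -v cores="$(nproc)" '{printf "1-min load per core: %.2f\n", $1 / cores}' /proc/loadavg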

Practical Example: Analyzing System Load

Consider a 4-core server with these load averages: 2.34, 3.45, 4.56

  • 1-minute (2.34): The system is at ~58% capacity (2.34/4). There's still headroom.
  • 5-minute (3.45): The system is at ~86% capacity (3.45/4). Getting close to full utilization.
  • 15-minute (4.56): The system is at ~114% capacity (4.56/4). The server has been overloaded for some time.

This pattern suggests:

  • The system was overloaded (15-min average > 4)
  • The situation is improving (1-min average < 5-min average < 15-min average)
  • Immediate action may not be needed, but monitoring should continue

If the pattern were reversed (1-min > 5-min > 15-min), it would indicate a worsening situation requiring prompt investigation.

Is High Load Always Bad?

Not necessarily. Consider these factors:

  1. Duration: Brief spikes are normal during batch operations
  2. Composition: CPU-bound load differs from I/O-bound load (see the quick check after this list)
  3. System Response: If the system remains responsive despite high load, it may be acceptable
  4. Expected Patterns: Some applications naturally create periodic high load
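
One quick way to tell CPU-bound load from I/O-bound load is to watch the CPU columns of vmstat for a few seconds; a high "wa" (I/O wait) value with modest "us"/"sy" values suggests the load is driven by disk rather than computation:

# Sample CPU usage once per second, three times; check the us/sy/id/wa columns
vmstat 1 3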



Interpretation Tips for Production

Checkpoints

  • Is the load average consistently higher than the number of CPU cores, or only spiking briefly?
  • Is the trend across the 1-, 5-, and 15-minute values rising or falling?
  • Is the load CPU-bound or I/O-bound? (Check %iowait and processes in the D state.)
  • Is the system still responsive to users despite the high number?

Monitoring Examples
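
As a minimal sketch, the load trend can be watched live with watch, and historical values can be pulled from sar (assuming the sysstat package is installed and collecting data):

# Refresh the load averages every 5 seconds to watch the trend
watch -n 5 uptime

# Run-queue length and load averages, sampled every 5 seconds, 12 times (sysstat)
sar -q 5 12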

How is Load Average Used in Kubernetes?

In Kubernetes node autoscaling, custom metrics based on Load Average can be used alongside CPU usage criteria. For example, with a Prometheus + KEDA combination, node_load1 divided by the node's core count can serve as a scale-out criterion.



Practical Examples

$ top
load average: 1.5, 1.2, 1.0

Load is modest and has only been creeping up slightly; on a multi-core machine this is comfortably within capacity.

$ top
load average: 5.0, 4.9, 5.1

Load has been holding steady around 5; on a 4-core machine this means processes are continuously queuing for resources.



Troubleshooting High Load Average

When load average exceeds your CPU count for extended periods, here are some commands to help diagnose the issue:

# View top processes by CPU usage
top

# Sort by memory usage in top
# Press Shift+M while top is running

# View processes in tree format
pstree -p

# Check I/O operations
iostat -x 1

# View memory usage
free -h

# Check for processes in uninterruptible sleep (D state)
ps -eo state,pid,cmd | awk '$1 ~ /^D/'

# Check system activity reports over time
sar -q

The most common causes of high load average include:

  1. CPU-bound processes: Computationally intensive tasks consuming CPU cycles
  2. Memory pressure: Excessive swapping due to memory shortage
  3. I/O bottlenecks: Slow disk operations causing processes to wait
  4. Resource contention: Multiple processes competing for the same resources
  5. Runaway processes: Processes in infinite loops or with memory leaks

Pro Tip: For AWS EC2 instances, check if you're hitting CPU credit limits on burstable instance types (T2/T3/T4g). These instances can temporarily exceed their baseline performance but will throttle once credits are exhausted.



Conclusion: “Not CPU Usage, But Length of the System’s Queue”

Load Average is not simply CPU usage, but an indirect indicator of system load showing ‘how many processes were waiting for CPU or resource allocation.’

Therefore, rather than simply judging it as "high" or "low," interpret it relative to the number of CPU cores and watch how it trends over time. In practice, it's best to identify bottlenecks by looking at CPU usage, Load Average, and iowait together.

Practical Tips

  • Always divide the load average by the number of CPU cores before judging whether it is high.
  • Watch the trend: 1-min > 5-min > 15-min means load is building; the reverse means it is easing.
  • If load is high but CPU usage is low, look for I/O bottlenecks and D-state processes.
  • Alert on sustained per-core load rather than on a single absolute number.

In conclusion, Load Average is not just a number but closer to the “breathing of your system.” Being able to read this metric correctly gives you the ability to detect problem signs early and prevent failures.


