High I/O wait (iowait) is a common Linux performance problem. Today we will look at what iowait means and what contributes to it. Hopefully this gives you more ideas about how to fix a high iowait issue.
- What is Linux IO wait?
- Determine if an IO problem is causing the system to slow
- Find which disk is being written
- Find processes that cause high IO wait
- Find which file caused the iowait
What is Linux IO wait?
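The wa (iowait) value in the top command output shows the percentage of time the CPU was idle while at least one process was waiting for I/O to complete. It mainly reflects waiting on block devices such as disks; a process waiting on a network socket is not counted as iowait. A high value means that work is stalled on storage rather than on the CPU: the processor has spare cycles, but the processes that need them are blocked waiting for I/O.
IO wait is therefore reported as part of the server's CPU statistics, even though the bottleneck it points to is usually the storage subsystem.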
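The kernel exposes the raw counters behind this figure in /proc/stat. The aggregate line starting with cpu lists cumulative time, in clock ticks, spent in user, nice, system, idle, iowait, irq, softirq and steal, followed by guest counters. A tool can compute iowait as the change in the iowait counter divided by the change in total time between two samples. Below is a minimal Python sketch of that calculation, assuming the standard Linux /proc/stat layout; the one-second default interval is an arbitrary choice, not something top requires.

```python
import time


def read_cpu_times(stat_text: str) -> list[int]:
    """Return user, nice, system, idle, iowait, irq, softirq, steal
    from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # guest and guest_nice (fields 9 and 10) are already included in
            # user and nice, so they are left out to avoid double counting.
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open("/proc/stat") as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_times(f.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() on a busy server gives the same kind of number as top's wa field. Because iowait is measured as idle time, it can fall when CPU-bound work runs alongside the I/O even though the disk is just as busy, and the proc(5) manual page warns that the value is not reliable; treat it as a hint that points you toward the disks, not as a disk metric.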
Determine if an IO problem is causing the system to slow
To verify that the system is slow because of I/O, we can use several commands, but the simplest is top:
[dt_code]top - 15:19:26 up 6:10, 4 users, load average: 0.00, 0.01, 0.05
Tasks: 147 total, 1 running, 146 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 96.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 999936 total, 121588 free, 328672 used, 549676 buff/cache
KiB Swap: 2097148 total, 2095792 free, 1356 used. 450460 avail Mem[/dt_code]
The wa field on the %Cpu(s) line shows the percentage of CPU time spent in I/O wait; the higher the number, the more time the CPU sits idle while processes wait for I/O to complete.
[dt_code]wa (iowait)
Amount of time the CPU has been waiting for I/O to complete.[/dt_code]
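When checking several servers, it can be convenient to read the same line from top in batch mode and pull out the wa field. The Python sketch below assumes the procps-ng %Cpu(s) format shown above; other top implementations (for example BusyBox) format this line differently. It asks for two frames one second apart and uses the last one, because the first frame is commonly an average since boot.

```python
import re
import subprocess

# Matches "<number> <label>" pairs such as "96.0 wa" on top's %Cpu(s) line.
FIELD = re.compile(r"([\d.]+)\s*(us|sy|ni|id|wa|hi|si|st)\b")


def parse_cpu_line(line: str) -> dict[str, float]:
    """Turn a top '%Cpu(s):' line into {'us': ..., 'wa': ..., ...}."""
    return {label: float(value) for value, label in FIELD.findall(line)}


def current_iowait() -> float:
    """Run top in batch mode and return the wa value of its last frame."""
    out = subprocess.run(
        ["top", "-b", "-n", "2", "-d", "1"],  # two frames, one second apart
        capture_output=True, text=True, check=True,
    ).stdout
    cpu_lines = [l for l in out.splitlines() if l.startswith("%Cpu")]
    if not cpu_lines:
        raise ValueError("no %Cpu line found; this top uses a different format")
    # Use the last frame, which covers the one-second interval.
    return parse_cpu_line(cpu_lines[-1])["wa"]
```

A monitoring script could call current_iowait() and alert when the value stays above a threshold you choose; there is no single cut-off that counts as "high" for every workload.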
Find which disk is being written
The top command shows I/O wait for the system as a whole, but it does not tell us which disk is affected. To find out which disk is causing the problem, we use another command, iostat:
[dt_code][root@localhost ~]# iostat -x 2 5
Linux 3.10.0-514.el7.x86_64 (localhost.localdomain) 03/03/2017 _x86_64_ (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.34    0.00    0.31    0.01    0.00   99.33

Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
sda          0.00     0.05     1.16     0.17    39.00    17.38    84.60     0.00     2.17     0.87    11.14     0.65   111.41
scd0         0.00     0.00     0.00     0.00     0.00     0.00     8.00     0.00     0.64     0.64     0.00     0.64     0.00
dm-0         0.00     0.00     1.10     0.20    37.85    17.21    84.71     0.00     2.43     0.90    10.88     0.66     0.09
dm-1         0.00     0.00     0.01     0.02     0.07     0.08     9.70     0.00     1.42     0.27     2.05     0.09     0.00[/dt_code]
In the example above, iostat prints a report every 2 seconds, 5 times in total, and the -x option prints extended statistics.
The first report shows statistics accumulated since the system booted, so in most cases it should be ignored; each of the remaining reports covers only the interval since the previous one.
For example, this command prints 5 reports: the second covers the 2 seconds after the first, the third covers the 2 seconds after the second, and so on.
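The reason the first report differs is that iostat works from cumulative kernel counters in /proc/diskstats: each later report is the difference between two samples divided by the interval, while the first one has only the totals since boot. The Python sketch below illustrates that delta calculation for r/s, w/s, rkB/s, wkB/s and %util, assuming the standard /proc/diskstats field layout, in which sectors are always 512 bytes. It shows the idea only and is not iostat's actual code; the device name sda in the helper is just an example.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def read_diskstats(text: str) -> dict[str, list[int]]:
    """Map device name -> the 11 classic counters that follow it."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14:  # newer kernels append extra fields; ignore them
            stats[parts[2]] = [int(v) for v in parts[3:14]]
    return stats


def interval_stats(before: list[int], after: list[int], seconds: float) -> dict[str, float]:
    """Per-second rates between two samples of one device's counters."""
    d = [a - b for a, b in zip(after, before)]
    # Counter order: reads, reads merged, sectors read, ms reading,
    # writes, writes merged, sectors written, ms writing,
    # I/Os in progress, ms doing I/O, weighted ms doing I/O.
    return {
        "r/s": d[0] / seconds,
        "w/s": d[4] / seconds,
        "rkB/s": d[2] * SECTOR_BYTES / 1024 / seconds,
        "wkB/s": d[6] * SECTOR_BYTES / 1024 / seconds,
        # %util: share of wall-clock time the device had I/O in flight.
        "%util": 100.0 * d[9] / (seconds * 1000),
    }


def sample_device(device: str = "sda", seconds: float = 2.0) -> dict[str, float]:
    """Sample /proc/diskstats twice and return rates for one device."""
    with open("/proc/diskstats") as f:
        before = read_diskstats(f.read())[device]
    time.sleep(seconds)
    with open("/proc/diskstats") as f:
        after = read_diskstats(f.read())[device]
    return interval_stats(before, after, seconds)
```

Dividing the raw totals by the uptime instead of by an interval is what produces the since-boot averages of the first report, which is why they can hide a problem that started only a few minutes ago.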
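In the example above, the %util of sda is 111.41%, which is a strong indication that some process is keeping the sda disk busy. (%util is the percentage of elapsed time during which the device had I/O requests in flight.)
In addition to %util, iostat reports many other useful figures, such as the number of read and write requests merged per second (rrqm/s and wrqm/s), the number of reads and writes completed per second (r/s and w/s), the kilobytes read and written per second (rkB/s and wkB/s), and the average time in milliseconds each request took, including time spent queued (await). In the example above, sda shows both read and write activity. These figures tell us how heavily each disk is being used, which helps us find the responsible process.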
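To pick out the busiest device automatically, one option is to parse iostat -x output and sort by %util. The Python sketch below assumes the layout shown above, with a "Device" header line and one row per device; because column names and order differ between sysstat versions, it looks up %util by its header name rather than by position. It forces the C locale, on the assumption that some locales would otherwise print decimal commas.

```python
import os
import subprocess


def busiest_devices(iostat_text: str) -> list[tuple[str, float]]:
    """Return (device, %util) pairs from the last report, busiest first."""
    tables: list[list[tuple[str, float]]] = []
    util_col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None                   # a blank line ends a device table
        elif fields[0].rstrip(":") == "Device":
            util_col = fields.index("%util")  # find %util by name, not position
            tables.append([])                 # start the table for a new report
        elif util_col is not None and len(fields) > util_col:
            tables[-1].append((fields[0], float(fields[util_col])))
    # Use the last report; with only one report, that is the since-boot summary.
    latest = tables[-1] if tables else []
    return sorted(latest, key=lambda row: row[1], reverse=True)


def run_iostat(interval: int = 2, count: int = 2) -> str:
    """Capture `iostat -x interval count` output (needs the sysstat package)."""
    env = {**os.environ, "LC_ALL": "C"}  # keep '.' as the decimal separator
    return subprocess.run(
        ["iostat", "-x", str(interval), str(count)],
        capture_output=True, text=True, check=True, env=env,
    ).stdout
```

For example, busiest_devices(run_iostat()) skips the since-boot report by asking for two reports and keeping only the second.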
Find processes that cause high IO wait
The simplest way to find the culprit is to use iotop to check disk I/O usage per process. By looking at the iotop statistics, we can identify sshd (PID 1028) as the culprit.
Although iotop is a powerful tool and is easy to use, it is not installed by default on all Linux distributions, and it generally has to be run as root.
[dt_code][root@localhost ~]# iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 1028 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % sshd[/dt_code]
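Where iotop is not installed, a rough substitute is to read each process's I/O accounting from /proc/PID/io, sample it twice, and rank processes by how much their write_bytes (or read_bytes) counter grew in between; these counters track bytes the process caused to be sent to, or fetched from, the storage layer. The Python sketch below illustrates that idea under a few assumptions: it needs root to see other users' processes, it only compares processes present in both samples, and it is not how iotop itself collects data (iotop uses the kernel's taskstats interface).

```python
import os


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/PID/io into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = int(value)
    return result


def snapshot() -> dict[int, tuple[str, dict[str, int]]]:
    """Read (command name, I/O counters) for every process we may inspect."""
    procs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as f:
                io = parse_proc_io(f.read())
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:  # process exited, or permission denied
            continue
        procs[int(entry)] = (comm, io)
    return procs


def top_io(before: dict, after: dict, key: str = "write_bytes",
           limit: int = 5) -> list[tuple[int, str, int]]:
    """Return (pid, comm, bytes) for the largest increases in `key`."""
    rows = []
    for pid, (comm, io_after) in after.items():
        if pid in before:
            delta = io_after.get(key, 0) - before[pid][1].get(key, 0)
            rows.append((pid, comm, delta))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:limit]
```

A typical use is to take before = snapshot(), wait a couple of seconds with time.sleep, take a second snapshot, and pass both to top_io; passing key="read_bytes" ranks readers instead of writers.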
Find which file caused the iowait
The lsof command can list all the files a process has open, or all the processes that have a given file open. From this list we can work out which files are being written, based on the file sizes and on the per-process I/O statistics the kernel exposes under /proc.
We can use the -p option to limit the output to a single process, where the argument is the process ID:
[dt_code][root@localhost ~]# lsof -p 1028
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sshd 1028 root cwd DIR 253,0 233 64 /
sshd 1028 root rtd DIR 253,0 233 64 /
sshd 1028 root txt REG 253,0 819640 2393730 /usr/sbin/sshd
sshd 1028 root mem REG 253,0 61752 180464 /usr/lib64/libnss_files-2.17.so
sshd 1028 root mem REG 253,0 43928 180476 /usr/lib64/librt-2.17.so
sshd 1028 root mem REG 253,0 15688 269136 /usr/lib64/libkeyutils.so.1.5
sshd 1028 root mem REG 253,0 62744 482870 /usr/lib64/libkrb5support.so.0.1
sshd 1028 root mem REG 253,0 11384 180425 /usr/lib64/libfreebl3.so
sshd 1028 root mem REG 253,0 143352 180472 /usr/lib64/libpthread-2.17.so[/dt_code]
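On systems without lsof, similar information about a process's open file descriptors can be read from /proc/PID/fd, where each entry is a symbolic link to the open file. The Python sketch below assumes Linux and permission to inspect the target process (usually root). Unlike lsof -p, it lists only file descriptors, not the cwd, rtd, txt or memory-mapped (mem) entries, and it sorts the results by size to match the size hint mentioned above.

```python
import os
import stat


def open_files(pid: int) -> list[tuple[int, str, int]]:
    """Return (fd, path, size_in_bytes) for each regular file a process has open."""
    fd_dir = f"/proc/{pid}/fd"
    files = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)  # e.g. "/var/log/messages" or "socket:[1234]"
            st = os.stat(link)          # follows the link to the open file itself
        except OSError:                 # descriptor was closed in the meantime
            continue
        if stat.S_ISREG(st.st_mode):    # keep regular files; skip sockets, pipes, ttys
            files.append((int(name), target, st.st_size))
    # Largest files first.
    return sorted(files, key=lambda f: f[2], reverse=True)
```

For the example above, open_files(1028) would list the regular files sshd holds open through descriptors. Because os.stat follows the link to the open file, this also works for files that have been deleted but are still held open, which lsof marks as "(deleted)".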
To find out which filesystem, and therefore which disk, a given path lives on, we can use df. Here we check /tmp:
[dt_code][root@localhost ~]# df /tmp
Filesystem          1K-blocks    Used Available Use% Mounted on
/dev/mapper/cl-root  17811456 3981928  13829528  23% /[/dt_code]
From this output we can see that /tmp is not a separate mount: it is part of the root filesystem (/), which lives on the logical volume /dev/mapper/cl-root.
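The same df figures can be reproduced from os.statvfs, which is handy inside scripts. In the Python sketch below, the mount point is found by walking up the path until os.path.ismount is true, the device comes from /proc/self/mounts, and Use% follows GNU df's rule of used / (used + available) rounded up. Checking that rule against the output above: 3981928 / (3981928 + 13829528) is about 0.2236, which rounds up to 23%. The helper names are illustrative, not part of any existing tool.

```python
import os


def use_percent(used: int, avail: int) -> int:
    """df-style Use%: used / (used + avail), rounded up to a whole percent."""
    total = used + avail
    return -((-used * 100) // total) if total else 0


def mount_point(path: str) -> str:
    """Walk up from `path` until we reach the directory it is mounted on."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def device_for(mount: str) -> str:
    """Return the device mounted at `mount`, according to /proc/self/mounts."""
    device = ""
    with open("/proc/self/mounts") as f:
        for line in f:
            fields = line.split()
            # Later entries win, because a later mount hides an earlier one.
            # (Spaces in paths appear as the escape \040 and are not decoded here.)
            if len(fields) >= 2 and fields[1] == mount:
                device = fields[0]
    return device


def df_report(path: str) -> dict[str, object]:
    """Return df-like figures (in 1K blocks) for the filesystem holding `path`."""
    st = os.statvfs(path)
    size_kb = st.f_blocks * st.f_frsize // 1024
    used_kb = (st.f_blocks - st.f_bfree) * st.f_frsize // 1024
    avail_kb = st.f_bavail * st.f_frsize // 1024
    mount = mount_point(path)
    return {
        "filesystem": device_for(mount),
        "1K-blocks": size_kb,
        "used": used_kb,
        "available": avail_kb,
        "use%": use_percent(used_kb, avail_kb),
        "mounted_on": mount,
    }
```

Calling df_report("/tmp") returns the same columns as df /tmp, so a script can confirm which device a busy path sits on before correlating it with the iostat output.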