Process
After logging in one day, I found that creating a file failed with a "no space left on device" error. I checked with the df command, and sure enough, utilization had hit 100%.
My first reaction was that the nginx logs had blown up the disk. Since this was a test server, I deleted the access log without a second thought.
Then I ran df again. What??! Still 100% utilization?!
Talk about frustrating: the moves were as fierce as a tiger, but the result was a whole lot of nothing.
This was very strange, so I started investigating step by step. First, I used the du -h --max-depth=1 command to check how much storage the top-level directories and files under the root path were using:
$ du -h --max-depth=1 /
872K /run
5.6G /var
2.1G /usr
1.3G /mnt
...
9.3G /
Strangely, the total usage under the root directory was only about 9.3 G! What on earth was going on?
The key question now became: why did the results of df and du disagree?
Some googling suggested that deleted files might still be held open by processes, which can be confirmed with the following command:
$ lsof -a +L1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
openresty 864 nobody 4w REG 253,1 909429464760 0 659511 /mnt/fastmock/logs/access.log (deleted)
openresty 865 nobody 4w REG 253,1 909429464760 0 659511 /mnt/fastmock/logs/access.log (deleted)
mysqld 2417 polkitd 4u REG 0,38 0 0 659108 /tmp/ibAQqtrD (deleted)
mysqld 2417 polkitd 5u REG 0,38 0 0 659454 /tmp/ib3mPnlj (deleted)
mysqld 2417 polkitd 6u REG 0,38 0 0 659482 /tmp/ibiFOifZ (deleted)
...
Note: the SIZE/OFF column shows the size of each deleted file, and the COMMAND and PID columns point us to the corresponding process. The +L1 option makes lsof list only open files with a link count below 1, i.e. files that have been unlinked but are still open.
Having confirmed that it was the nginx (OpenResty) process, we need to make it release the deleted files it still holds open. Since the log file is actually held by nginx's worker processes, there is no need to restart the whole nginx service; gracefully restarting the workers with the reload command is enough:
$ openresty -s reload
Running lsof again confirms that the deleted files previously held open by openresty have been released:
$ lsof -a +L1
mysqld 2417 polkitd 4u REG 0,38 0 0 659108 /tmp/ibAQqtrD (deleted)
mysqld 2417 polkitd 5u REG 0,38 0 0 659454 /tmp/ib3mPnlj (deleted)
mysqld 2417 polkitd 6u REG 0,38 0 0 659482 /tmp/ibiFOifZ (deleted)
...
Then df confirms that the storage space has been fully reclaimed:
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 99G 6.9G 87G 8% /
So far, the problem is solved!
Root cause
The nginx access log was deleted with the rm command while nginx was still running. Although the file no longer shows up in ls or du, it had not really been deleted yet. On Linux, rm removes a file by unlink(2)ing it, that is, by removing its directory entry.
However, if the file is still open (for example, in use by a process), that process can keep reading and writing it. This is why deleting a file on Linux never produces a Windows-style "this file is in use by another program and cannot be deleted" prompt. The file's disk blocks are only freed once the last process holding it open closes it or exits.
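This behavior is easy to reproduce in miniature. The sketch below (plain POSIX shell plus Linux's /proc; all file names come from mktemp, so nothing in it refers to the real server) creates a file, holds it open, deletes it, and shows that the blocks are only freed on close:

```shell
#!/bin/sh
# Reproduce the df/du mismatch in miniature
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=50 2>/dev/null
exec 3< "$f"        # this shell now holds the file open on fd 3
rm "$f"             # unlink: ls/du no longer see the file...
df -h "$(dirname "$f")"          # ...but df still counts the 50 MB
stat -Lc %s "/proc/$$/fd/3"      # the open fd still sees all the bytes
exec 3<&-           # close the last fd: only now is the space freed
df -h "$(dirname "$f")"
```

Running this, the first df shows usage 50 MB higher than the second, even though the file name is already gone after the rm.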
Takeaway
Why write a whole post about such an ordinary troubleshooting session? Looking back at it carefully, I think there are quite a few takeaways worth sharing:
- In a production environment, make log rotation a habit. (I will analyze Linux's log rotation mechanism in detail in a later post.) Here is an nginx logrotate configuration for reference:
$ cat /etc/logrotate.d/fastmock
/mnt/fastmock/logs/*.log {
daily
size 4k
rotate 5
compress
copytruncate
dateext
sharedscripts
postrotate
/bin/kill -HUP `cat /usr/local/openresty/nginx/logs/nginx.pid 2> /dev/null` 2> /dev/null || true
endscript
}
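The copytruncate directive in the config above deserves a note: in essence it copies the live log aside and then truncates the original in place, so the writing process keeps its file descriptor and never needs to reopen anything. A sketch of what it boils down to (paths here are hypothetical):

```shell
#!/bin/sh
# What copytruncate does, in essence: copy the live log aside,
# then truncate the original in place. The writer keeps its fd
# and simply continues writing into the now-empty file.
cp /var/log/app.log /var/log/app.log.1
: > /var/log/app.log
```

The trade-off is a small window between the copy and the truncate in which freshly written lines can be lost, which is why rotating by signaling the process (as the postrotate kill -HUP above does) is generally preferred when the daemon supports reopening its logs.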
- Truncate log files instead of deleting them outright, for example:
> logs/access.log
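The redirection above empties the file in place while nginx keeps writing to the same inode; coreutils' truncate command does the same thing explicitly:

```shell
# truncate to zero length without removing the directory entry
truncate -s 0 logs/access.log
```

Either way, the file keeps its name and inode, so no process is left holding a deleted file and the space is freed immediately.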
- Asking the right question when searching gets you twice the result for half the effort. The key symptom here was that the results of du and df disagreed, so I described it as:
df shows disk full but du can't find the files
- The difference between du and df:
According to the manual, df reports disk usage at the file-system level:
df - report file system disk space usage
while du estimates usage at the file level:
du - estimate file space usage
du works at the file level: it recursively walks the given path and sums up the sizes of the files it can see.
df works at the file-system level: its numbers come straight from the kernel (via the statfs family of system calls), so they also include blocks still held by deleted-but-open files.
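On a healthy filesystem the two views should roughly agree, so comparing them directly is a quick sanity check. A sketch using GNU coreutils options (assumed available; /var is just an example path):

```shell
# file-system-level accounting (statfs): bytes used on the fs containing /var
df -B1 --output=used /var

# file-level accounting: bytes used by files reachable under /var
# (-x keeps du from crossing into other mounted filesystems)
du -sxB1 /var
```

A large, persistent gap between the two numbers, as in this incident, is a strong hint that deleted-but-open files are involved.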
- To inspect files that have been marked as deleted but are still open, this is the first choice:
lsof -a +L1
If you can't remember the option, filter the full output with grep instead:
lsof | grep deleted
Another approach is to search the open file descriptors under /proc directly with find:
find /proc/*/fd -ls | grep '(deleted)'
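One last trick that builds on those /proc/<pid>/fd entries: as long as some process still has the deleted file open, its contents can be copied back out before they are lost. A minimal sketch, where this shell itself plays the role of the process and all paths are made up for the demo:

```shell
#!/bin/sh
# Simulate a deleted-but-open file, then recover its contents via /proc
tmp=$(mktemp)
echo "precious log line" > "$tmp"
exec 3< "$tmp"              # some process (here: this shell) holds it open
rm "$tmp"                   # lsof would now show it as (deleted)
cp "/proc/$$/fd/3" /tmp/recovered.log   # copy the data back out
cat /tmp/recovered.log                  # prints "precious log line"
exec 3<&-
```

In the real incident you would take the PID and FD columns from the lsof output and copy /proc/<PID>/fd/<FD> instead; once the process closes the descriptor, the data is gone for good.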