Disk space cleanup

Process

After logging in one day, I found that creating a file failed with a complaint about insufficient space. So I checked with the df command, and sure enough, utilization had reached 100%.

My first reaction was that the nginx logs had filled up the disk. Since it was a test server, I deleted the access log without a second thought.

Then I ran df again. What?! Still 100% utilization?!

Well, that stung. An operation as fierce as a tiger, and nothing to show for it.

This was very strange, so I started checking things one by one. First, I used the du -h --max-depth=1 command to check the storage used by each top-level directory and file under the root path:

$ du -h --max-depth=1 /
872K    /run
5.6G    /var
2.1G    /usr
1.3G    /mnt
...
9.3G    /

Strangely, the total usage under the root directory came to only about 9.3 G! What on earth was going on?

So the key question became: why were the results of the df and du commands inconsistent?

After some googling, I suspected that the deleted file was still being held open by a process, and confirmed it with the command suggested online:

$ lsof -a +L1 
COMMAND     PID    USER   FD   TYPE DEVICE SIZE/OFF NLINK    NODE NAME
openresty   864  nobody    4w   REG  253,1 909429464760     0  659511 /mnt/fastmock/logs/access.log (deleted)
openresty   865  nobody    4w   REG  253,1 909429464760     0  659511 /mnt/fastmock/logs/access.log (deleted)
mysqld     2417 polkitd    4u   REG   0,38        0     0  659108 /tmp/ibAQqtrD (deleted)
mysqld     2417 polkitd    5u   REG   0,38        0     0  659454 /tmp/ib3mPnlj (deleted)
mysqld     2417 polkitd    6u   REG   0,38        0     0  659482 /tmp/ibiFOifZ (deleted)
...

Note: the SIZE/OFF column shows the size of each deleted file, and the COMMAND and PID columns help us find the corresponding process. (In lsof, -a ANDs the selection options together, and +L1 selects open files with a link count of less than 1, i.e. files that have been unlinked but are still open.)

After confirming that it was the nginx (OpenResty) process, the occupied files have to be released by restarting the corresponding processes. Since the log file is actually held open by nginx's worker processes, there is no need to restart the whole nginx service; reloading the workers with the reload command is enough:

$ openresty -s reload
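
To double-check that the reload actually replaced the worker processes (and therefore closed the stale file descriptors), you can verify that the worker PIDs changed while the master PID stayed the same:

$ ps -ef | grep 'nginx: worker'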

Running lsof again confirms that the deleted files previously held open by openresty have been released:

$ lsof -a +L1 
mysqld     2417 polkitd    4u   REG   0,38        0     0  659108 /tmp/ibAQqtrD (deleted)
mysqld     2417 polkitd    5u   REG   0,38        0     0  659454 /tmp/ib3mPnlj (deleted)
mysqld     2417 polkitd    6u   REG   0,38        0     0  659482 /tmp/ibiFOifZ (deleted)
...

Then use the df command to confirm that the storage space has been completely released:

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G  6.9G   87G   8% /

And with that, the problem was solved!

Root cause

Because the rm command was used to delete nginx's access log while nginx was still running, the log file no longer showed up in ls or du, but it had not really been deleted. On Linux, rm deletes a file by unlinking it from the directory structure: the name is removed, but the underlying data blocks stay allocated until the last reference to the file is gone.

However, if the file is still open (for example, in use by a process), that process can continue to access it. This is why deleting a file on Linux never produces a Windows-style "file is in use by another program and cannot be deleted" prompt: the file's space is only truly released once every process holding it open has closed it or exited.
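
The whole phenomenon is easy to reproduce. A minimal sketch (paths and sizes are illustrative; pick a directory on a real disk partition, since /tmp is tmpfs on some systems):

$ dd if=/dev/zero of=/var/tmp/big.log bs=1M count=100   # create a 100 MB file
$ tail -f /var/tmp/big.log &                            # keep it open in a background process
$ rm /var/tmp/big.log                                   # unlink it while it is still open
$ du -sh /var/tmp                                       # du no longer counts it
$ df -h /var/tmp                                        # df still shows the space as used
$ lsof +L1 | grep big.log                               # the open-but-deleted file shows up here
$ kill %1                                               # stop the process; the space is released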

Takeaways

Why write an article about such an ordinary troubleshooting session? Looking back at it carefully, I think there are quite a few takeaways worth sharing:

  • In production, keep the habit of configuring log rotation. (Linux's log rotation mechanism deserves a detailed write-up of its own.) Here is an nginx logrotate configuration for reference, with a quick way to test it shown after the config:
    $ cat /etc/logrotate.d/fastmock 
/mnt/fastmock/logs/*.log {
    daily
    size 4k
    rotate 5
    compress
    copytruncate
    dateext
    sharedscripts
    postrotate
        /bin/kill -HUP `cat /usr/local/openresty/nginx/logs/nginx.pid 2> /dev/null` 2> /dev/null || true
    endscript
}
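
If you adapt this config, logrotate's debug mode is handy for checking it before relying on it: the -d flag parses the config and prints what would be done without changing anything, and -f forces an immediate rotation for an end-to-end test:

    $ logrotate -d /etc/logrotate.d/fastmock
    $ logrotate -f /etc/logrotate.d/fastmock
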
  • Truncate log files to clean them up instead of deleting them outright, for example (an equivalent explicit command is shown below):
    > logs/access.log
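
The > redirection truncates the file in place via the shell; the coreutils truncate command does the same thing explicitly, and it also works under sudo, where a plain redirection would still run as your own user:

    $ truncate -s 0 logs/access.log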
  • Asking the right question in your search gets you twice the result for half the effort. The key symptom here was that the results of du and df were inconsistent, so I described it as:
    df shows disk full but du can't find
  • The difference between du and df:

According to the man pages, df reports disk usage at the filesystem level:

df - report file system disk space usage

while du estimates usage at the file level:

du - estimate file space usage

du works at the file level: it recursively walks the given path and sums up the sizes of the files it can see.

df works at the filesystem level: its numbers come straight from the kernel's filesystem statistics, so it still counts blocks allocated to deleted-but-open files that du can no longer see.
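
You can watch this difference with strace (assuming strace is installed; syscall names here are for Linux): df issues a single filesystem-statistics syscall against the mount point, while du walks the whole tree, stat-ing files as it goes:

    $ strace -e trace=statfs df -h /    # one statfs() call for the mount point
    $ strace -c du -sh /var             # syscall summary dominated by stat and getdents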

  • To view files that have been marked as deleted, reach for this first:
    lsof -a +L1 

If you can't remember that, you can filter lsof's output with grep instead:

    lsof | grep deleted

Another option is to search the per-process fd entries under /proc directly with find:

    find /proc/*/fd -ls | grep  '(deleted)'
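
One more trick that follows from the same mechanism: while a process still holds a deleted file open, its contents remain readable through that process's fd entry under /proc, so you can copy the data out before reloading anything. Using the PID and FD columns from the earlier lsof output (864 and 4 here):

    $ cp /proc/864/fd/4 /tmp/recovered-access.log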
