A problem where deleted files on Linux still occupy disk space

When using a Linux system, you sometimes find that a large number of files (especially log files) have been deleted, yet the free space on the disk does not come back. I ran into this rather strange problem recently. This article describes how it came about and how it was resolved, and offers some personal views.

Cause

Recently, as the project was about to go live, my colleagues checked the server's disk usage and found that free disk space was shrinking by several hundred megabytes every few days. They asked me to look into it.
The environment is described as follows:
1. A Node.js application service, managed by pm2, which generates a large number of logs every day that must be kept. (This is necessary so we have evidence when talking to the vendor.)
2. The logs are compressed automatically in the middle of the night. (Also necessary, otherwise there are far too many log files.)
3. There is also a legacy product on the server, including its source code and logs, which is no longer in use.

Investigation

First, use du -h --max-depth=1 to check how much space each top-level directory uses. The output is as follows:

0       ./dev
0       ./proc
16K     ./lost+found
961M    ./root
337M    ./var
2.5G    ./usr
33M     ./etc
8.0K    ./opt
147G    ./home
9.4M    ./tmp
4.0K    ./media
226M    ./run
103M    ./boot
4.0K    ./srv
298G 

Note: if you add up the sizes of the individual directories, the total is much smaller than the figure shown on the last line. I only realized this at the very end.
The /home directory is an external hard disk and is not discussed here. From the output above, several directories look suspicious:

961M    ./root
337M    ./var
2.5G    ./usr
226M    ./run

It turned out that /usr/local/share/.cache/yarn/v2 and /var/cache/yum contained cached data that could be deleted. In addition, /root and /root/path held some old code, log files and SQL files that are no longer used, and /root/.pm2/ held the logs generated by pm2.
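
For the caches, here is a minimal cleanup sketch, assuming yarn and yum are installed and the paths are the ones found above:

# Check how large the caches actually are before removing anything
du -sh /usr/local/share/.cache/yarn /var/cache/yum

# Clear the yarn package cache
yarn cache clean

# Clear the yum caches (metadata and downloaded packages)
yum clean all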

Next, look for files that were modified within the last day. The command is as follows:

find ./* -mtime -1

The files it returned were not large, so this was probably not the cause.
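
This kind of check is more useful if the recently modified files are ranked by size. A small sketch, assuming GNU find and sort, run from the directory being investigated:

# Files modified within the last day, largest first (top 20).
# -xdev keeps find on the current filesystem; 2>/dev/null hides permission errors.
find . -xdev -type f -mtime -1 -exec du -h {} + 2>/dev/null | sort -rh | head -n 20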

Searching the Internet turned up articles on exactly this topic: use lsof | grep delete to list files that have been deleted but are still held open. The result was a very long list.

lsof |grep delete | wc -l
7344

Examples are as follows:

node  786  root 15w   REG  202,64     1941641 5354776 /root/logs/log39.2019-07-26.txt (deleted)
node  786  root 16w   REG  202,64   437345392 5354830 /root/logs/logtf.2019-07-26.txt (deleted)
node  786  root 17w   REG  202,64      471811 5354836 /root/logs/logtt.2019-07-26.txt (deleted)
node  786  root 18w   REG  202,64  1189231954 5354838 /root/logs/logt3.2019-07-26.txt (deleted)
node  786  root 19w   REG  202,64     2003838 5354840 /root/logs/logit.2019-07-26.txt (deleted)

The columns are COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE, NODE and NAME; the ones to focus on are the command, the PID, the size and the file name.
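
To get a feel for how much space these deleted-but-open files are holding, the SIZE column (the 7th field) can be summed. This is only a rough, upper-bound estimate, since lsof may report the same file more than once (for example per thread or per process):

# Sum the SIZE column of all "(deleted)" entries and print the total in MB
lsof 2>/dev/null | grep -w deleted | awk '{sum += $7} END {printf "%.1f MB\n", sum / 1024 / 1024}'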

Pick one of the processes and check its open file descriptors under /proc:

ls -al /proc/74408/fd
total 0
dr-x------ 2 root root  0 Jul 31 12:19 .
dr-xr-xr-x 9 root root  0 Jun 27 12:27 ..
lrwx------ 1 root root 64 Jul 31 12:19 0 -> socket:[1673886202]
l-wx------ 1 root root 64 Jul 31 12:19 1 -> /root/.pm2/pm2.log
lr-x------ 1 root root 64 Jul 31 12:19 10 -> /dev/null
lrwx------ 1 root root 64 Jul 31 12:19 100 -> socket:[3240337185]
l-wx------ 1 root root 64 Jul 31 12:19 13 -> /root/.pm2/logs/1024-error-28.log (deleted)
lrwx------ 1 root root 64 Jul 31 12:19 130 -> socket:[3207913087]
l-wx------ 1 root root 64 Jul 31 12:19 29 -> /root/logs/log34.2019-06-27.txt (deleted)
lrwx------ 1 root root 64 Jul 31 12:19 3 -> socket:[2024014357]
l-wx------ 1 root root 64 Jul 31 12:19 30 -> /root/logs/log51.2019-06-27.txt (deleted)
l-wx------ 1 root root 64 Jul 31 12:19 31 -> /root/logs/logfa.2019-06-27.txt
l-wx------ 1 root root 64 Jul 31 12:19 32 -> /root/logs/log3f.2019-06-27.txt
l-wx------ 1 root root 64 Jul 31 12:19 33 -> /root/logs/log35.2019-06-27.txt (deleted)
lrwx------ 1 root root 64 Jul 31 12:19 41 -> socket:[2024031696]
lrwx------ 1 root root 64 Jul 31 12:19 42 -> socket:[2024031697]

At this point the cause is clear: the Node.js service uses log4js as its logging module. Every day it rotates the logs, compressing the old log files into new archives and deleting the originals. But the program keeps running and never exits, so the file handles it holds on those deleted logs are never released, and the space used by the accumulated deleted files is never given back.
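
A quick way to confirm this kind of leak is to compare what df reports as used with what du can actually reach through the directory tree; the gap is roughly the space held by deleted-but-open files. A minimal sketch, run as root against the root filesystem:

# Used space according to the filesystem metadata (counts deleted-but-open files)
df -k / | awk 'NR==2 {print $3, "KB used according to df"}'

# Used space reachable from the directory tree (does not count deleted files)
du -sxk / 2>/dev/null | awk '{print $1, "KB used according to du"}'

# lsof can also list the culprits directly: +L1 selects open files with a link count below 1
lsof +L1 2>/dev/null | head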

Solution

Restarting the Node.js service manually freed up a large amount of space. Here is the disk usage before the restart:

Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/xvda2      37024320  27152180   7984756  78% /
devtmpfs         3983648         0   3983648   0% /dev
tmpfs            3981888         0   3981888   0% /dev/shm
tmpfs            3981888     25672   3956216   1% /run
tmpfs            3981888         0   3981888   0% /sys/fs/cgroup
tmpfs             747024         0    747024   0% /run/user/0
/dev/xvde      515930552 189452476 300263676  39% /home

After the restart, the disk usage looks like this:

Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/xvda2      37024320   9699008  25437928  28% /
devtmpfs         3983648         0   3983648   0% /dev
tmpfs            3981888         0   3981888   0% /dev/shm
tmpfs            3981888     25616   3956272   1% /run
tmpfs            3981888         0   3981888   0% /sys/fs/cgroup
tmpfs             747024         0    747024   0% /run/user/0
/dev/xvde      515930552 181515448 308200704  38% /home

In fact, I still suspect that log4js ought to release the file handle automatically after rotating a log, but I could not find any material confirming this. Based on what I observed, restarting the Node.js service clears the problem, so it seems all the Node.js processes will need to be restarted at some regular interval.
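
If a restart is awkward to schedule, a commonly used workaround is to truncate the deleted file through its /proc entry, which frees the space without stopping the process. A minimal sketch; the PID 786 and fd 18 below are taken from the lsof listing above purely as an illustration and must be replaced with real values:

# Truncate a deleted-but-still-open log to zero length via its /proc fd path.
# The process keeps its handle and keeps writing, but the disk blocks are released.
: > /proc/786/fd/18

# Alternatively, since the service runs under pm2, a controlled restart of just that
# app also works (the app name is a placeholder; check "pm2 list" for the real one):
pm2 restart <app-name>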

Lessons

1. Old material that can be deleted should be deleted, not kept; it serves no purpose other than taking up space. In this investigation the old data added up to at least 1 GB. Limited by my permissions, I did not dare to clean it up this time.
2. System caches (npm, yarn, apt, yum and so on) should be cleared at regular intervals.
3. The modules a program depends on deserve deeper study; the details should not be neglected. (In this fast-paced era I have not managed to do this yet.)
4. Standardize logging: write the information that needs to be written, and leave out what does not.
5. Try not to install programs into system directories such as /bin, /sbin, /usr/bin or /usr/sbin. If you must, put only the necessary binaries there, and keep configuration files, log files and data files on other partitions. During this investigation I found that the MySQL and Redis data directories sit inside their installation directories, which is not reasonable. (A quick way to check where they point is sketched after this list.)
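
A quick way to see where those data directories actually point, assuming redis-cli and a mysql client are available (both commands only read the current settings; the mysql call may require credentials):

# Where Redis keeps its dump/AOF files
redis-cli CONFIG GET dir

# Where MySQL keeps its data files
mysql -e "SELECT @@datadir;"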

Extras

yarn-related commands:
Show the cache directory: yarn cache dir
List the cached packages: yarn cache list
Clear the cache: yarn cache clean
Set the cache directory: yarn config set cache-folder <directory>

Commands to find files modified within a given time:
Modified within the last N days: find ./* -mtime -<N>
Modified more than N days ago: find ./* -mtime +<N>
Modified exactly N days ago: find ./* -mtime <N>
-mtime is measured in days; similarly, -mmin is measured in minutes.
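
A few concrete examples, using /root/logs from the listings above purely as an illustration:

# Log files modified within the last 7 days
find /root/logs -type f -mtime -7

# Log files untouched for more than 30 days (candidates for archiving or deletion)
find /root/logs -type f -mtime +30

# Files modified within the last 90 minutes
find /root/logs -type f -mmin -90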

References

Disk space occupied by unknown resources
A bizarre survey of disk space occupancy
Weird Linux disk space is inexplicably occupied

PS: This article draws on material found online, combined with notes from my own experiments. It is not authoritative; these are only personal views.

Tags: socket Linux yum lsof

Posted on Thu, 01 Aug 2019 22:31:28 -0700 by bleh