Linux: using grep to search large log files by condition, and other common operations

1. Preface

Recently I have needed to query large log files. Opening them with vim, cat, and so on gets stuck every time, yet I still need to know how many lines of data meet certain conditions, which has been a headache. Below are some common matching and counting commands.

2. Common search commands

1. grep search

        grep "pattern" filename | head      # show the first few matching lines, from the beginning of the file
        grep "pattern" filename | wc -l     # count how many lines match
        cat filename | grep "pattern$"      # output lines that end with the pattern
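
Incidentally, grep can count matching lines on its own with -c, which gives the same number as piping to wc -l (a minimal sketch; "pattern" and filename are placeholders, not from the examples below):

        grep -c "pattern" filename      # equivalent to: grep "pattern" filename | wc -l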

2. Examples

(1) Count lines that contain a specific string

cat /data/weblogs/xxx.access.log | grep "GET /pixel.jpg?" | wc -l
        4102386

(2) Query with a partial regular expression

cat /data/weblogs/em.evony.com.access.log | grep "25/Nov/2019:15:[00-59]" | wc -l
        120

This queries all entries in the hour 25/Nov/2019:15. Note that [00-59] is a bracket expression matching a single character (the digits 0-5, plus 9); it happens to cover the first digit of the minutes, so every minute from 00 to 59 in that hour is matched.
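
If you prefer the pattern to spell out the two minute digits explicitly, match them as [0-5][0-9] (a sketch against the same log):

        cat /data/weblogs/em.evony.com.access.log | grep "25/Nov/2019:15:[0-5][0-9]" | wc -l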

(3) Multiple conditions can be chained with pipes to count the lines that satisfy all of them at the same time (AND)

cat /data/weblogs/xxx.log | grep "25/Nov/2019:15:[00-59]" | grep "GET /pixel.jpg?" | wc -l
        120

To count the lines that match condition 1 OR condition 2, use grep -E with alternation:

cat /data/weblogs/xxx.log | grep -E "25/Nov/2019:15:[00-59]|GET /pixel.jpg?" | wc -l
        4098135

Shorthand: grep -E "expr1|expr2|expr3" | wc -l
Reference: https://blog.csdn.net/lijing742180/article/details/84959963
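
One caveat: the plain grep commands above treat ? as a literal character, but with -E it becomes a metacharacter ("zero or one of the preceding item"). To be exact about matching a literal dot and question mark, escape them (a sketch of the same query):

        cat /data/weblogs/xxx.log | grep -E "25/Nov/2019:15:[00-59]|GET /pixel\.jpg\?" | wc -l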

3. grep matching is fuzzy

When grep is used to look up a port number, the results are unsatisfactory: all kinds of unrelated lines that merely contain "80" are matched as well. For example:

netstat -anp |grep -i '80'
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 127.0.0.1:80                0.0.0.0:*                   LISTEN      -                   
tcp        0      0 10.17.2.50:80               0.0.0.0:*                   LISTEN      -                   
tcp        0      0 216.66.17.189:80            0.0.0.0:*                   LISTEN      -                   
tcp        0      0 10.17.2.50:10050            10.17.13.2:33801            TIME_WAIT   -              

A better way to check what is actually using port 80 is this command:

 netstat -apn | awk '{split($4,arr,":"); if(arr[2] == "80") print $0}'

This does it in one step: only the entries whose local port is 80 are printed, which is very convenient.
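
For what it is worth, the same filter can be written a little more compactly by matching the local-address field (the 4th column above) against a regex; a sketch, assuming the same netstat output format:

        netstat -apn | awk '$4 ~ /:80$/'     # print lines whose local address ends in :80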

3. Search IP addresses in a file

1. Match IPs

Here grep -Eo extracts each IP address together with one adjacent non-digit character (or a word boundary), and the sed step then strips that away so only the bare IP is left before counting:

grep -Eo '([^0-9]|\b)((1[0-9]{2}|2[0-4][0-9]|25[0-5]|[1-9][0-9]|[0-9])\.){3}(1[0-9][0-9]|2[0-4][0-9]|25[0-5]|[1-9][0-9]|[0-9])([^0-9]|\b)' xxx.log | sed -nr 's/([^0-9]|\b)(([0-9]{1,3}\.){3}[0-9]{1,3})([^0-9]|\b)/\2/p'|wc -l

31116275

2. Count the number of times each IP occurs

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"  xxx.log |sort|uniq -c

      2 99.203.87.103
      2 99.203.87.142
      4 99.203.87.145
      8 99.203.87.153

The first column is the number of occurrences and the second is the IP address.
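
To see the most frequent IPs first, sort the counts numerically in reverse and take the head of the list (a sketch built on the same command):

        grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" xxx.log | sort | uniq -c | sort -rn | head -10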

3. More accurate IP matching

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"  xxx.log|wc -l

32929372

4. Fuzzy IP matching

grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" xxx.log|wc -l

32930309
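
The gap between the two counts (32929372 vs. 32930309) presumably comes from tokens that merely look like IPs: an out-of-range "address" passes the fuzzy pattern but not the strict one. A quick check you can run anywhere:

        echo "999.999.999.999" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}"      # matches
        echo "999.999.999.999" | grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"      # no match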

5. To query IPs under multiple conditions, first narrow the lines down with the conditions, then extract and count the IPs

cat xxx.log | grep "25/Nov/2019:15:[00-59]" | grep "GET /pixel.jpg?" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" | wc -l
1110
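
If what you want is the number of distinct IPs rather than total occurrences, deduplicate before counting (a sketch on the same pipeline):

        cat xxx.log | grep "25/Nov/2019:15:[00-59]" | grep "GET /pixel.jpg?" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" | sort -u | wc -l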

Honestly, these IP-counting approaches all feel a bit rough: the log file keeps growing, so the results differ from run to run, and the queries are fairly slow, probably because the file is so large. Still, they are worth recording here.
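
A few things that usually help with speed on large files (a hedged tip, not benchmarked here): give the file to grep directly instead of going through cat, use -F when the pattern is a fixed string rather than a regex, and force the C locale so grep can do plain byte comparisons:

        LC_ALL=C grep -cF "GET /pixel.jpg?" /data/weblogs/xxx.access.log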


Source: https://blog.csdn.net/LJFPHP/article/details/103378223
