Shell scripts -- regular expressions

The Concept of Regular Expressions

Regular expression: a single string that describes and matches a set of strings conforming to a certain syntactic rule.
A regular expression is composed of ordinary characters and special characters (metacharacters). Regular expressions are commonly used in scripting languages and text editors such as PHP, Python, and the shell, and are abbreviated as regex or regexp. They are used to retrieve and replace text that matches a pattern, and their powerful matching ability makes them well suited to text searching.
They let you process large amounts of text quickly and efficiently.

Basic regular expressions

Depending on their strictness and features, regular expressions can be divided into basic regular expressions (BRE) and extended regular expressions (ERE). Basic regular expressions are the most fundamental and most commonly used part. Among the common text processing tools on Linux, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions.

Metacharacter Summary

$ Matches the end of the input string. If the Multiline property of the RegExp object is set, "$" also matches the position before "\n" or "\r". To match the "$" character itself, use "\$".
. Matches any single character except the newline "\n".
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character; the sequence "\\" matches "\", and "\(" matches "(".
* Matches the preceding subexpression zero or more times. To match the "*" character itself, use "\*".
[] Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^] Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches any of the letters "p", "l", "i", and "n" in "plain".
[n1-n2] Character range. Matches any character within the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
Note: a hyphen (-) denotes a range only when it appears inside a character set between two characters. If it appears at the beginning of a character set, it represents the hyphen itself.
{n} n is a non-negative integer; matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o"s in "food".
{n,} n is a non-negative integer; matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the "o"s in "foood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m} m and n are non-negative integers with n <= m; matches at least n and at most m times.
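To see several of these metacharacters in action, here is a minimal sketch using GNU grep. It feeds a small hypothetical word list (printed by a helper function named sample, invented for this demo) to grep. Because grep uses basic regular expressions, the braces of "{n}" are written "\{n\}" here.

```shell
# Hypothetical sample words, one per line (not the article's test.txt).
sample() { printf '%s\n' Bob food foood plain 'a-b' 'price$5'; }

# o\{2\}: two consecutive o's (braces are escaped in basic regex).
sample | grep 'o\{2\}'                # prints food and foood
# o\{3,\}: three or more consecutive o's.
sample | grep 'o\{3,\}'               # prints foood
# [^abc]: any character except a, b, c; -o prints each match on its own line.
printf 'plain\n' | grep -o '[^abc]'   # prints p, l, i, n
# A hyphen at the start of a set is a literal hyphen.
sample | grep '[-]'                   # prints a-b
# \$ matches a literal dollar sign instead of the end of the line.
sample | grep '\$'                    # prints price$5
```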

Linux Text Processing Tools

grep (filtering; supports basic regular expressions)
egrep (filtering; supports extended regular expressions)
sed (line-oriented editing)
awk (column-oriented processing)

Practical Demonstrations

First, create a test file named test.txt to use in the following exercises.

1) Find specific characters
Finding specific characters is simple. The following commands find the string "god" in test.txt, where "-n" displays line numbers and "-i" ignores case. When color output is enabled, the matching characters are highlighted in red.

[root@localhost ~]# grep -n 'god' test.txt    # Lines containing "god", with line numbers
8:god
16:abcgod
[root@localhost ~]# 
[root@localhost ~]# grep -in 'god' test.txt   # Same search, ignoring case, so "God" also matches
8:god
15:God
16:abcgod

2) Find a set of characters with square brackets "[]". For example, 'go[bc]l' matches both "gobl" and "gocl".

[root@localhost ~]# grep -n 'go[bc]l' test.txt 
12:gobl
13:gocl

Find repeated characters (lines containing "oo")

[root@localhost ~]# grep -n 'oo' test.txt 
9:good
10:goooood

To exclude characters, use the negated character set "[^]".

[root@localhost ~]# grep -n '[^g]oo' test.txt    # Lines where "oo" is preceded by a character other than "g"
2:loood
3:lood

If you do not want a lowercase letter before "oo", use the command "grep -n '[^a-z]oo' test.txt", where "a-z" means lowercase letters and "A-Z" means uppercase letters.

[root@localhost ~]# grep -n '[^a-z]oo' test.txt 
5:Goood
6:Good
[root@localhost ~]# grep -n '[^A-Z]oo' test.txt 
2:loood
3:lood
4:good

To find lines that contain digits, use the "grep -n '[0-9]' test.txt" command.

[root@localhost ~]# grep -n '[0-9]' test.txt 
10:abc12345
12:12345

3) Find line-start "^" and line-end "$" anchors

[root@localhost ~]# grep -n '^g' test.txt    # Lines beginning with "g"
1:gd
4:good
7:gola
8:gobl
9:gocl
[root@localhost ~]# grep -n 'l$' test.txt    # Lines ending with "l"
8:gobl
9:gocl
To find lines starting with a lowercase letter, filter with the '^[a-z]' rule; for lines starting with an uppercase letter, use '^[A-Z]'; for lines that do not start with a letter, use '^[^a-zA-Z]'. Note that "^" outside brackets anchors the start of a line, while "^" as the first character inside brackets negates the set.
[root@localhost ~]# grep -n '^[a-z]' test.txt    # Lines beginning with a lowercase letter
1:gd
2:loood
3:lood
4:good
7:gola
8:gobl
9:gocl
10:abc12345
[root@localhost ~]# grep -n '^[A-Z]' test.txt    # Lines beginning with an uppercase letter
5:Goood
6:Good
11:God
[root@localhost ~]# grep -n '^[^a-zA-Z]' test.txt    # Lines that do not begin with a letter
12:12345

The following command finds lines that end with a period (.). Because "." is also a metacharacter in regular expressions, it must be escaped with "\" to turn it into an ordinary character.

[root@localhost ~]# grep -n '\.$' test.txt 
abc12345.
God.
12345.

To find blank lines, run the command "grep -n '^$' test.txt".

[root@localhost ~]# grep -n '^$' test.txt 
15:
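The anchor and escaping rules above can be checked with a short sketch. The sample lines come from a hypothetical helper function named sample (they are not the article's test.txt), and LC_ALL=C is set so that ranges such as [a-z] follow plain byte order regardless of the locale.

```shell
export LC_ALL=C   # byte-order character ranges, independent of the system locale

# Hypothetical sample lines; line 4 is intentionally empty.
sample() { printf '%s\n' good 'God.' '12345.' '' abc; }

sample | grep -n '^g'           # 1:good      (starts with g)
sample | grep -n '^[A-Z]'       # 2:God.      (starts with an uppercase letter)
sample | grep -n '^[^a-zA-Z]'   # 3:12345.    (first character is not a letter)
sample | grep -n '\.$'          # 2:God. and 3:12345.  (literal dot at the end)
sample | grep -c '.$'           # 4: an unescaped "." matches any character, so only the empty line fails
sample | grep -n '^$'           # 4:          (the empty line)
```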

4) Find any character "." and repeated characters "*"
For example, the following command finds strings of the form "g??l": four characters in total, beginning with g and ending with l.

[root@localhost ~]# grep -n 'g..l' test.txt 
8:gobl
9:gocl

To match "oo", "ooo", "oooo", and so on, use the asterisk (*) metacharacter. Note that "*" means zero or more repetitions of the single character before it. "o*" therefore means zero (an empty string) or more "o" characters; because the empty string is allowed, running "grep -n 'o*' test.txt" prints every line of the file. With "oo*", the first o must be present and the second o may appear zero or more times, so o, oo, ooo, and so on all match. Similarly, to find lines containing at least two consecutive o's, run "grep -n 'ooo*' test.txt".

Example

[root@localhost ~]# grep -n 'ooo*' test.txt 
2:loood
3:lood
4:good
5:Goood
6:Good

To find strings that begin with g, end with d, and contain at least one o in between, run the following command

[root@localhost ~]# grep -n 'goo*d' test.txt 
4:good
5:gooood
6:goood

To find strings that begin with g and end with d, with any (possibly empty) sequence of characters in between:

[root@localhost ~]# grep -n 'g.*d' test.txt 
1:gd
4:good
5:gooood
6:goood

To find lines containing at least one digit:

[root@localhost ~]# grep -n '[0-9][0-9]*' test.txt 
12:abc12345.
14:12345.
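The behavior of "*" described above, especially the surprise that "o*" matches every line, is easy to verify on a throwaway list. This sketch uses a hypothetical helper function named sample; the words are not taken from test.txt.

```shell
# Hypothetical sample words, one per line.
sample() { printf '%s\n' gd god good goood xyz; }

# 'o*' also matches the empty string, so every line matches, even "xyz".
sample | grep -c 'o*'      # 5
# 'oo*': one o followed by zero or more o's, i.e. at least one o.
sample | grep -c 'oo*'     # 3 (god, good, goood)
# 'ooo*': at least two consecutive o's.
sample | grep 'ooo*'       # good, goood
# 'goo*d' needs at least one o between g and d, so "gd" is skipped.
sample | grep 'goo*d'      # god, good, goood
# '.*' matches any run of characters (including none) between g and d.
sample | grep -c 'g.*d'    # 4
```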

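5) Limit the number of repetitions with braces "{}"
Because "." and "*" allow zero to an unlimited number of repetitions, use "{}" to set a specific range instead. In basic regular expressions (as used by grep and sed), the braces must be escaped with "\" (written "\{" and "\}") to act as an interval; unescaped braces are treated as ordinary characters. Quoting the pattern in single quotes also keeps the shell from treating the braces specially.
(1) Find lines containing two consecutive o's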

[root@localhost ~]# grep -n 'o\{2\}' test.txt 
2:loood
3:lood
4:good
5:gooood
6:goood
7:Goood
8:Good

(2) Find strings that begin with g, end with d, and contain 2 to 5 o's in between

[root@localhost ~]# grep -n 'go\{2,5\}d' test.txt 
4:good
5:goooood
6:goood

(3) Find strings that begin with g, end with d, and contain 2 or more o's in between

[root@localhost ~]# grep -n 'go\{2,\}d' test.txt 
4:good
5:goooood
6:goood
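The difference between basic and extended interval syntax can be seen side by side in this sketch. It assumes GNU grep, where "grep -E" behaves like egrep, and uses a hypothetical helper function named sample that includes one line containing literal braces.

```shell
# Hypothetical sample lines; the last one contains literal braces.
sample() { printf '%s\n' god good goood gooooood 'go{2,3}d'; }

# Basic regex: \{2,3\} is an interval, so only 2 or 3 o's between g and d match.
sample | grep 'go\{2,3\}d'      # prints good and goood
# Extended regex (grep -E, i.e. egrep): plain braces form the interval.
sample | grep -E 'go{2,3}d'     # prints good and goood
# Basic regex without backslashes: the braces are ordinary characters.
sample | grep 'go{2,3}d'        # prints go{2,3}d
```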

Extended regular expressions

The egrep command searches one or more files for lines that match an extended regular expression; it is equivalent to "grep -E". The search pattern can be a single character, a string, a word, or a sentence.

+ Function: matches one or more occurrences of the preceding character.
Example: running "egrep -n 'wo+d' test.txt" finds strings such as "wood", "woood", and "woooood".

? Function: matches zero or one occurrence of the preceding character.
Example: running "egrep -n 'bes?t' test.txt" finds the two strings "bet" and "best".

| Function: alternation; finds any one of several strings.
Example: running "egrep -n 'of|is|on' test.txt" finds the strings "of", "is", or "on".

() Function: groups part of a pattern.
Example: "egrep -n 't(a|e)st' test.txt". Because "tast" and "test" share the "t" and the "st", "a" and "e" are listed inside "()" and separated by "|", so the command finds either "tast" or "test".

()+ Function: matches one or more repetitions of a group.
Example: "egrep -n 'A(xyz)+C' test.txt" finds strings that start with "A", end with "C", and contain one or more "xyz" in between.

Example

One or more occurrences of the preceding character

[root@localhost ~]# egrep -n 'go+d' test.txt 
4:good
5:goooood
6:goood

Zero or one occurrence of the preceding character

[root@localhost ~]# egrep -n 'go?d' test.txt 
1:gd

Find any of several strings with "|" (or)

[root@localhost ~]# egrep -n 'ol|ob' test.txt 
9:gola
10:gobl

Find a grouped string

[root@localhost ~]# egrep -n 'go(b|c)l' test.txt 
10:gobl
11:gocl

Match a group repeated one or more times

[root@localhost ~]# egrep -n 'g(abc)+d' test.txt 
2:gabcd
3:gabcabcd
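The five extended operators can also be exercised together on a throwaway word list. This is a sketch assuming GNU grep, where "grep -E" is equivalent to egrep; the words come from a hypothetical helper function named sample.

```shell
# Hypothetical sample words, one per line.
sample() { printf '%s\n' gd god good gabcd gabcabcd test tast best bet; }

sample | grep -E 'go+d'       # god, good           (one or more o)
sample | grep -E 'go?d'       # gd, god             (zero or one o)
sample | grep -E 'best|bet'   # best, bet           (alternation)
sample | grep -E 't(a|e)st'   # test, tast          (grouped alternation)
sample | grep -E 'g(abc)+d'   # gabcd, gabcabcd     (repeated group)
```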

sed tools

sed is a powerful yet simple text parsing and transformation tool. It reads text, edits it (deleting, replacing, adding, moving lines, and so on) according to specified conditions, and finally outputs either all lines or only the processed ones. sed can also perform fairly complex text processing without user interaction, and it is widely used in shell scripts for all kinds of automated processing tasks.
sed's workflow consists of three steps: read, execute, and display.
Read: sed reads one line from the input stream (a file, a pipe, or standard input) and stores it in a temporary buffer called the pattern space.
Execute: by default, all sed commands are executed in order on the pattern space. Unless a line address is specified, the sed commands are executed on every line in turn.
Display: the modified content is sent to the output stream, and the pattern space is then emptied.
This process repeats until all of the input has been processed.
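The read/execute/display cycle explains why "-n" matters when printing. In this sketch (hypothetical three-line input from a helper function named sample; GNU sed assumed), the automatic print at the end of each cycle doubles every line when "p" is used without "-n".

```shell
# Hypothetical three-line input.
sample() { printf '%s\n' one two three; }

# Without -n, sed prints the pattern space at the end of every cycle,
# so an explicit 'p' makes each line appear twice.
sample | sed 'p'          # one one two two three three
# -n suppresses the automatic print; only 'p' produces output.
sample | sed -n 'p'       # one two three
# 'n' replaces the pattern space with the next line (no print under -n),
# so 'p;n' prints lines 1, 3, 5, ...
sample | sed -n 'p;n'     # one three
```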

1. Common usage of the sed command
There are generally two formats for calling sed, as shown below. Here "parameter" refers to the target file(s) to operate on; when there are several, the file names are separated by spaces. "scriptfile" is a script file, specified with the "-f" option; the target files are then processed with the commands in that script file.
sed format
sed [options] 'operations' parameter
sed [options] -f scriptfile parameter

Common sed options include the following.
-e or --expression=: processes the input text with the specified command or script.
-f or --file=: processes the input text with the specified script file.
-h or --help: displays help.
-n, --quiet, or --silent: outputs only the processed results (suppresses automatic printing).
-i: edits the text file directly (in place).

Common editing operations include the following.
a: Append; add the specified line(s) below the current line.
c: Change; replace the selected lines with the specified content.
d: Delete; delete the selected lines.
i: Insert; insert the specified line(s) above the selected line.
p: Print; if lines are specified, print those lines, otherwise print all content. It is usually used together with the "-n" option.
s: Substitute; replace the specified characters.
y: Transliterate; convert characters one-for-one.
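Here is a small sketch of the line operations a, i, c, d, and y on a hypothetical three-line input. It assumes GNU sed, which accepts one-line forms such as "2i text"; other sed implementations may require a backslash and a newline after a, i, and c.

```shell
# Hypothetical three-line input.
sample() { printf '%s\n' one two three; }

sample | sed '2i before-two'   # one, before-two, two, three
sample | sed '2a after-two'    # one, two, after-two, three
sample | sed '2c replaced'     # one, replaced, three
sample | sed '2d'              # one, three
# y maps each character in the first list to the character at the same position in the second.
sample | sed 'y/otw/OTW/'      # One, TWO, Three
```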

1) Output matching text (p)

[root@localhost ~]# sed -n 'p' test.txt    # Output all content
godg
gabcd
gabcabcd
..... (part of the output omitted)
abc12345.
God.
12345.
[root@localhost ~]# sed -n '3p' test.txt    # Output line 3
gabcabcd
[root@localhost ~]# sed -n '3,5p' test.txt    # Output lines 3-5
gabcabcd
good
goooood
[root@localhost ~]# sed -n 'p;n' test.txt    # Output odd-numbered lines
godg
gabcabcd
goooood
Goood
gola
gocl
God.
[root@localhost ~]# sed -n 'n;p' test.txt    # Output even-numbered lines
gabcd
good
goood
Good
gobl
abc12345.
12345.
[root@localhost ~]# sed -n '1,5{p;n}' test.txt    # Output odd-numbered lines within lines 1-5
godg
gabcabcd
goooood
[root@localhost ~]# sed -n '10,${n;p}' test.txt    # Output even-numbered lines from line 10 to the end
gocl
God.

These are the basic uses of the sed command. When it is combined with regular expressions, the format is slightly different: the regular expression is enclosed in "/". The following examples use sed together with regular expressions.

[root@localhost ~]# sed -n '/goo/p' test.txt    # Output lines containing "goo"
good
goooood
goood
[root@localhost ~]# sed -n '4,/go/p' test.txt    # Output from line 4 through the next line containing "go"
good
goooood
[root@localhost ~]# sed -n '/go/=' test.txt    # Output the line numbers of lines containing "go"
1
4
5
6
9
10
11
[root@localhost ~]# sed -n '/^G/=' test.txt    # Output the line numbers of lines starting with "G"
7
8
13
[root@localhost ~]# sed -n '/\<good\>/p' test.txt    # Output lines containing the whole word "good"
good
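Word anchors and the "=" command can be compared on a few hypothetical lines (printed by an invented helper function named sample). The "\<" and "\>" word boundaries are a GNU extension, so this sketch assumes GNU sed.

```shell
# Hypothetical sample lines.
sample() { printf '%s\n' good goodness 'a good day' god; }

# /good/ matches the substring anywhere, including inside "goodness".
sample | sed -n '/good/p'        # good, goodness, a good day
# \< and \> anchor the start and end of a word.
sample | sed -n '/\<good\>/p'    # good, a good day
# '=' prints the line number of each matching line.
sample | sed -n '/^go/='         # 1, 2, 4
```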

2) Delete matching text (d)

[root@localhost ~]# nl test.txt | sed '3d'    # Delete line 3
     1  godg
     2  gabcd
     4  good
     5  goooood
     6  goood
     7  Goood
[root@localhost ~]# nl test.txt | sed '3,5d'    # Delete lines 3 through 5
     1  godg
     2  gabcd
     6  goood
     7  Goood
     8  Good
     9  gola
[root@localhost ~]# nl test.txt | sed '/good/d'    # Delete lines containing "good"
     1  godg
     2  gabcd
     3  gabcabcd
     5  goooood
     6  goood
     7  Goood
[root@localhost ~]# sed '/^[a-z]/d' test.txt    # Delete lines beginning with a lowercase letter
Goood
Good
God.
12345.
[root@localhost ~]# sed '/\.$/d' test.txt    # Delete lines ending with "."
godg
gabcd
gabcabcd
good
goooood
goood
Goood
Good
gola
gobl
gocl
[root@localhost ~]# sed '/^$/d' test.txt    # Delete blank lines
godg
gabcd
gabcabcd
good
goooood
goood
Goood
Good
gola
gobl
gocl
abc12345.
God.
12345.

3) Replace matching text (s)

sed 's/the/THE/' test.txt        # Replace the first "the" in each line with "THE"
sed 's/l/L/2' test.txt           # Replace the second "l" in each line with "L"
sed 's/the/THE/g' test.txt       # Replace every "the" in the file with "THE"
sed 's/o//g' test.txt            # Delete every "o" in the file (replace it with an empty string)
sed 's/^/#/' test.txt            # Insert "#" at the beginning of each line
sed '/the/s/^/#/' test.txt       # Insert "#" at the beginning of each line containing "the"
sed 's/$/EOF/' test.txt          # Append the string "EOF" to the end of each line
sed '3,5s/the/THE/g' test.txt    # Replace every "the" in lines 3-5 with "THE"
sed '/the/s/o/O/g' test.txt      # Replace every "o" with "O" in lines containing "the"
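The substitution flags can be compared on a few hypothetical lines (printed by an invented helper function named sample). In this sketch, a number after the final "/" selects which occurrence in each line is replaced, "g" replaces all of them, and a leading address restricts which lines are edited.

```shell
# Hypothetical sample lines; only the first one contains "the".
sample() { printf '%s\n' 'the cat and the hat' 'no match here' 'hello world'; }

sample | sed 's/the/THE/'      # line 1 becomes: THE cat and the hat
sample | sed 's/the/THE/2'     # line 1 becomes: the cat and THE hat
sample | sed 's/the/THE/g'     # line 1 becomes: THE cat and THE hat
sample | sed '/the/s/^/#/'     # only line 1 gets a leading "#"
sample | sed 's/o//g'          # every "o" is deleted, e.g. "hell wrld"
```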

4) Move matching text

sed '/the/{H;d};$G' test.txt          # Move lines containing "the" to the end of the file; {;} groups multiple commands
sed '1,5{H;d};17G' test.txt           # Move lines 1-5 to after line 17
sed '/the/w out.file' test.txt        # Write lines containing "the" to the file out.file
sed '/the/r /etc/hostname' test.txt   # Insert the contents of /etc/hostname after each line containing "the"
sed '3aNew' test.txt                  # Insert a new line "New" after line 3
sed '/the/aNew' test.txt              # Insert a new line "New" after each line containing "the"
sed '3aNew1\nNew2' test.txt           # Insert multiple lines after line 3; "\n" starts a new line
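The hold-space idiom in "/the/{H;d};$G" is easier to follow on a tiny input. In this sketch (hypothetical lines from an invented helper function named sample; GNU sed assumed), H appends each matching line to the hold space and d removes it from the output; at the last line ($), G appends a newline plus the hold space.

```shell
# Hypothetical input: two lines contain "the", and the last line does not.
sample() { printf '%s\n' a the1 b the2 c; }

sample | sed '/the/{H;d};$G'
# Output:
# a
# b
# c
#          (blank line: the hold space started empty, so H added a leading newline)
# the1
# the2
```

Two consequences follow from how the commands run. Because the hold space starts out empty, the moved block is preceded by a blank line. And if the last line itself matches, d ends that cycle before $G runs, so the saved lines are never printed.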

5) Edit files with scripts

[root@localhost ~]# sed '1,5{H;d};7G' test.txt    # Move lines 1-5 to after line 7
goood
Goood

godg
gabcd
gabcabcd
good
goooood
Good
gola
gobl
gocl
abc12345.
God.
12345.
[root@localhost ~]# vim local_only_ftp.sh
#!/bin/bash
# Specify the sample file path and the configuration file path
SAMPLE="/usr/share/doc/vsftpd-3.0.2/EXAMPLE/INTERNET_SITE/vsftpd.conf"
CONFIG="/etc/vsftpd/vsftpd.conf"
# Back up the original configuration file: if /etc/vsftpd/vsftpd.conf.bak does not exist, create it with cp
[ ! -e "$CONFIG.bak" ] && cp "$CONFIG" "$CONFIG.bak"
# Build the configuration from the sample file, overwriting the existing file
sed -e '/^anonymous_enable/s/YES/NO/g' "$SAMPLE" > "$CONFIG"
sed -i -e '/^local_enable/s/NO/YES/g' -e '/^write_enable/s/NO/YES/g' "$CONFIG"
# Append listen=YES only if no "listen" setting is present
grep -q "listen" "$CONFIG" || sed -i '$alisten=YES' "$CONFIG"
# Start the vsftpd service and enable it at boot
systemctl restart vsftpd
systemctl enable vsftpd

awk tools

In Linux/UNIX systems, awk is a powerful editing tool. It reads input text line by line, searches for content that matches a specified pattern, and formats and outputs or filters the matching content. It can perform fairly complex text operations without interaction and is widely used in shell scripts for all kinds of automated configuration tasks.
1. Common usage of awk
awk is usually invoked in one of the formats below, where single quotes plus braces "{}" enclose the processing actions for the data. awk can process the target files directly, or read its instructions from a script file with "-f".

awk options 'pattern or condition {edit instructions}' file1 file2   # Filter and output the content that meets the condition
awk -f scriptfile file1 file2                                        # Read the edit instructions from a script file, then filter and output the content

For example, to print the user name, user ID (UID), and user description (GECOS) columns of /etc/passwd, run the following awk command

[root@localhost ~]# awk -F ':' '{print $1,$3,$5}' /etc/passwd
root 0 root
bin 1 bin
daemon 2 daemon
.... (part of the output omitted)
awk has several special built-in variables that can be used directly:
    FS: the field separator for each line of text; the default is whitespace (spaces or tabs).
    NF: the number of fields in the line currently being processed.
    NR: the line number (ordinal number) of the line being processed.
    $0: the entire content of the line being processed.
    $n: the nth field (column n) of the line being processed.
    FILENAME: the name of the file being processed.
    RS: the record separator; the default is "\n", so each line is one record.
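The variables above can be inspected with a one-line program. This sketch feeds two hypothetical colon-separated records (from an invented helper function named sample) to awk, sets FS with -F, and prints NR, NF, the first field, and the last field ($NF).

```shell
# Hypothetical colon-separated records with four fields each.
sample() { printf '%s\n' 'alice:x:1000:/bin/bash' 'bob:x:1001:/sbin/nologin'; }

# -F ':' sets FS; NR is the record number, NF the field count, $NF the last field.
sample | awk -F ':' '{ print NR, NF, $1, $NF }'
# 1 4 alice /bin/bash
# 2 4 bob /sbin/nologin
```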
2. Usage examples
1) Output text by line
awk '{print}' test.txt                   # Output all content, equivalent to cat test.txt
awk '{print $0}' test.txt                # Output all content, equivalent to cat test.txt
awk 'NR==1,NR==3{print}' test.txt        # Output lines 1-3
awk '(NR>=1)&&(NR<=3){print}' test.txt   # Output lines 1-3
awk 'NR==1||NR==3{print}' test.txt       # Output lines 1 and 3
awk '(NR%2)==1{print}' test.txt          # Output all odd-numbered lines
awk '(NR%2)==0{print}' test.txt          # Output all even-numbered lines
awk '/^root/{print}' /etc/passwd         # Output lines starting with root
awk '/nologin$/{print}' /etc/passwd      # Output lines ending with nologin
awk 'BEGIN {x=0} ; /\/bin\/bash$/{x++};END {print x}' /etc/passwd
# Count the lines ending with /bin/bash; equivalent to grep -c "/bin/bash$" /etc/passwd
awk 'BEGIN{RS=""};END{print NR}' /etc/squid/squid.conf
# Count the text paragraphs separated by blank lines
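The line-selection and counting patterns above can be tried without touching the real /etc/passwd or squid.conf. The records below are hypothetical (printed by an invented helper function named sample), and the sketch assumes a POSIX-compatible awk such as gawk.

```shell
# Hypothetical passwd-style records.
sample() {
  printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'bin:x:1:1:bin:/bin:/sbin/nologin' \
    'alice:x:1000:1000::/home/alice:/bin/bash'
}

# Odd-numbered records: NR % 2 == 1 selects lines 1, 3, ...
sample | awk 'NR % 2 == 1 { print }'
# Count lines ending in /bin/bash: x starts at 0, grows per match, is printed at END.
sample | awk 'BEGIN { x = 0 } /\/bin\/bash$/ { x++ } END { print x }'   # 2
# The same count with grep.
sample | grep -c '/bin/bash$'                                           # 2
# Paragraph mode: with RS="" each blank-line-separated block is one record.
printf 'a\nb\n\nc\n\n\nd\n' | awk 'BEGIN { RS = "" } END { print NR }'  # 3
```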

2) Output text by field

awk '{print $3}' test.txt                      # Output the third field of each line (fields separated by spaces or tabs)
awk '{print $1,$3}' test.txt                   # Output the first and third fields of each line
awk -F ":" '$2==""{print}' /etc/shadow         # Output the shadow records of users whose password is empty
awk 'BEGIN {FS=":"}; $2==""{print}' /etc/shadow
# Same as above, but FS is set in a BEGIN block instead of with -F
awk -F ":" '$7~"/bash"{print $1}' /etc/passwd  # Output the user names of accounts whose login shell contains "/bash"
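Field-based filtering works the same way on any colon-separated data. This sketch uses hypothetical passwd-style and shadow-style lines (from invented helper functions named passwd_sample and shadow_sample) instead of the real system files; reading the real /etc/shadow normally requires root.

```shell
# Hypothetical passwd-style and shadow-style records.
passwd_sample() { printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'bin:x:1:1:bin:/bin:/sbin/nologin'; }
shadow_sample() { printf '%s\n' 'alice:!!:19000' 'carol::19000'; }

# Print the user name and UID fields.
passwd_sample | awk -F ':' '{ print $1, $3 }'                    # root 0, bin 1
# Match field 7 (the login shell) against "/bash".
passwd_sample | awk -F ':' '$7 ~ "/bash" { print $1 }'           # root
# Users whose password field is empty; FS set in BEGIN instead of -F.
shadow_sample | awk 'BEGIN { FS = ":" } $2 == "" { print $1 }'   # carol
```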

Tags: Linux vsftpd shell Programming

Posted on Thu, 10 Oct 2019 22:24:58 -0700 by keyboard