Regular expression

Chapter 1 understanding regular expressions

A so-called regular expression is really a rule expression for strings: a pattern that describes what a string should look like. The familiar wildcard "*", which stands for any run of characters, is an everyday example of the idea. "Rule expression" would arguably be the better name, because the main job is to find what you want by stating rules.

Using one takes two steps:

1. Describe the pattern of the string you are looking for.
2. Call a function to execute the regular expression.


// Find every 'hi' in the string
$str = 'hi,this is his history';
$patt = '/hi/';
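
The two steps above can be sketched as follows. This is a minimal example that reuses the string and pattern from the snippet; the variable names $found, $first, $count and $all are only illustrative.

```php
<?php
// Step 1: describe the pattern.
$str  = 'hi,this is his history';
$patt = '/hi/';

// Step 2: hand the pattern to a preg_* function.
// preg_match stops at the first hit and returns 1 (found) or 0 (not found).
$found = preg_match($patt, $str, $first);

// preg_match_all keeps going and returns the number of hits.
$count = preg_match_all($patt, $str, $all);

echo $found, ' ', $first[0], ' ', $count, "\n"; // prints "1 hi 4"
```

The count is 4 because "hi" also occurs inside "this", "his" and "history"; Chapter 3 shows how to restrict the match to whole words.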

Most programmers can use regular expressions, but rarely do, so the details are easy to forget.

Start with three questions: what to look for, how to find it, and how many?

  • Specific characters (literals) --> for example a, b, hi
  • Character boundary (see "Character boundary" below) --> from where to where
  • Character set [ace], [0123456789] --> matches any one of the listed characters
  • Character complement [^qxz] --> matches any one character that is not q, x or z
  • Character range [a-zA-Z0-9] --> note: a range must be continuous, so you cannot write a-Z
  • Character cluster (see Chapter 2) --> common sets predefined by the system
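
The set, complement and range forms in the list above can be tried out with a short sketch. The input strings here are illustrative and not from the article. The last call assumes PHP's usual behavior for a pattern that fails to compile (a warning, suppressed here with @, and a return value of false).

```php
<?php
// Character set: any one of a, c, e.
preg_match_all('/[ace]/', 'cable', $set);

// Character complement: any one character except q, x, z.
preg_match_all('/[^qxz]/', 'quiz', $complement);

// Character range: lowercase letters and digits, one or more in a row.
preg_match_all('/[a-z0-9]+/', 'o2o, b2b', $range);

// a-Z is not a valid range ('a' sorts after 'Z'), so the pattern does not compile.
$bad = @preg_match('/[a-Z]/', 'x');

echo implode(',', $set[0]), "\n";        // c,a,e
echo implode(',', $complement[0]), "\n"; // u,i
echo implode(',', $range[0]), "\n";      // o2o,b2b
var_dump($bad);                          // bool(false)
```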

Character boundary

  • ^ matches the start of the string
  • $ matches the end of the string
  • \b matches a word boundary (the beginning or end of a word)
  • \B matches any position that is not a word boundary
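
As a rough illustration of the four boundaries, the sketch below runs five patterns over a few made-up strings and prints 1 or 0 for each. The array keys (starts, ends, exact, word, inner) are just labels chosen for this example.

```php
<?php
$candidates = ['hi', 'hi there', 'say hi', 'this']; // illustrative inputs
$rows = [];
foreach ($candidates as $s) {
    $rows[$s] = [
        'starts' => preg_match('/^hi/', $s),    // string begins with hi
        'ends'   => preg_match('/hi$/', $s),    // string ends with hi
        'exact'  => preg_match('/^hi$/', $s),   // string is exactly hi
        'word'   => preg_match('/\bhi\b/', $s), // hi as a whole word
        'inner'  => preg_match('/\Bhi\B/', $s), // hi strictly inside a word
    ];
    echo str_pad($s, 10), implode(' ', $rows[$s]), "\n";
}
// hi        1 1 1 1 0
// hi there  1 0 0 1 0
// say hi    0 1 0 1 0
// this      0 0 0 0 1
```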

Chapter 2 common character clusters

Cluster   Meaning
.         Any character except a line break
\w        [a-zA-Z0-9_]
\W        Complement of \w
\s        Whitespace characters, including \n \r \t \v
\S        Non-whitespace character
\d        [0-9]
\D        Non-digit character
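
A quick way to get a feel for these clusters is to run each one over a small string. The sketch below uses made-up inputs; the variable names are illustrative only.

```php
<?php
// \d: runs of digits.
preg_match_all('/\d+/', 'room 42, floor 7', $digits);

// \w: runs of letters, digits and underscore.
preg_match_all('/\w+/', 'user_name = o2o', $words);

// \s: collapse any run of whitespace into one space.
$squeezed = preg_replace('/\s+/', ' ', "a \t b\n c");

// \D: strip everything that is not a digit.
$digitsOnly = preg_replace('/\D/', '', '186-1101-5252');

// . matches every character except the line break.
$dots = preg_match_all('/./', "ab\ncd");

echo implode(',', $digits[0]), "\n"; // 42,7
echo implode(',', $words[0]), "\n";  // user_name,o2o
echo $squeezed, "\n";                // a b c
echo $digitsOnly, "\n";              // 18611015252
echo $dots, "\n";                    // 4
```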

Chapter 3 word matching

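// Find the word "hi" in the string
// Rule: word boundary \b, then hi, then word boundary \b
$str = 'hi , this is some history book';
$patt = '/\bhi\b/';
preg_match_all($patt, $str, $res); // only the standalone "hi"

// Find a "hi" that sits inside a word
$patt = '/\Bhi\B/';
preg_match_all($patt, $str, $res); // only the "hi" in "this"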

Chapter 4 examples of sets and complements

Given a list of mobile phone numbers, pick out those made up only of the digits [01235689], that is, containing no 4 or 7.
Where to look? From the start of the string ^ to the end of the string $
What to look for? [01235689]
How many? 11
$arr = array('13800138000','13487656887','434456','45454353434543');
//$patt = '/^[^47]{11}$/'; // complement approach (note: [^47] also accepts non-digits)
$patt = '/^[01235689]{11}$/'; // set approach
foreach ($arr as $v) {
    echo $v, ': ', preg_match($patt, $v) ? 'match' : 'no match', "\n";
}
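
The set and complement approaches are not fully equivalent, which a small comparison makes visible. This is a sketch; the third input ('abcdefghijk') is an illustrative value added here, and $results is a hypothetical name.

```php
<?php
$setPatt  = '/^[01235689]{11}$/'; // only the listed digits
$compPatt = '/^[^47]{11}$/';      // anything at all except 4 and 7

$inputs  = ['13800138000', '13487656887', 'abcdefghijk'];
$results = [];
foreach ($inputs as $v) {
    $results[$v] = [preg_match($setPatt, $v), preg_match($compPatt, $v)];
    echo $v, ' set=', $results[$v][0], ' complement=', $results[$v][1], "\n";
}
// 13800138000 set=1 complement=1
// 13487656887 set=0 complement=0
// abcdefghijk set=0 complement=1
```

The complement form accepts eleven letters because it only excludes 4 and 7, so the set form is the safer choice when the input may contain non-digits.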

Chapter 5 character range

// Find the words made up purely of letters
$str = 'o2o, b2b,hello,wordl, that';
//$patt = '/\b[a-zA-Z]{1,}\b/'; // {1,} means at least 1 letter
$patt = '/\b[a-zA-Z]+\b/';      // + is shorthand for {1,}
preg_match_all($patt, $str, $res); // hello, wordl, that

Chapter 6 character cluster

These are the shorthand sets predefined by the system (listed in Chapter 2).

$str = 'tommorw is another day,o2o ,you dont bird me i dont bird you';
$patt = '/\W{1,}/'; // \W is the complement of \w, i.e. [^a-zA-Z0-9_]
// preg_split splits a string by a regular expression
print_r(preg_split($patt, $str));

// Replace runs of spaces or tabs with a single space
$str = 'a     b        hello          world'; // wanted: 'a b hello world'
$patt = '/\s{1,}/'; // \s is whitespace, including \n \r \t \v
// preg_replace performs a regular expression search and replace
echo preg_replace($patt, ' ', $str);
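
One detail worth knowing when splitting on \W: a separator at the very start or end of the string produces an empty element. The sketch below, on an illustrative input, shows this together with the PREG_SPLIT_NO_EMPTY flag that drops such elements.

```php
<?php
$line = 'a, b;c '; // illustrative input ending in a separator

// Plain split: the trailing space yields a final empty string.
$raw = preg_split('/\W+/', $line);

// Same split, but empty pieces are discarded (-1 means "no limit").
$clean = preg_split('/\W+/', $line, -1, PREG_SPLIT_NO_EMPTY);

echo count($raw), ' ', count($clean), "\n"; // prints "4 3"
```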

Chapter 7 how many to find

  • * matches the preceding subexpression zero or more times.
  • + matches the preceding subexpression one or more times.
  • ? matches the preceding subexpression zero or one time.
  • {n} n is a non-negative integer. Matches exactly n times.
  • {n,m} n and m are non-negative integers, where n <= m. Matches at least n times and at most m times.
  • {n,} n is a non-negative integer. Matches at least n times.
$str = 'longren lao wang meng ge bi ';
// Five-letter words
//$patt = '/\b[a-zA-Z]{5}\b/';

// Words of 3 to 5 letters
//$patt = '/\b[a-zA-Z]{3,5}\b/';

// Words of 5 or more letters
$patt = '/\b[a-zA-Z]{5,}\b/';

preg_match_all($patt, $str, $res);
print_r($res);

An editorial office has a broken keyboard: the o key sticks, so it often types several o's in a row,
and the word "good" comes out as "goood" or "gooooood". Replace all such words with "good".

$s = 'goooood,goood,goooooooooood';
$p = '/go+d/';
echo preg_replace($p, 'good', $s); // good,good,good
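
To compare the quantifiers side by side, the sketch below filters one small word list through several patterns. The word list and the names $words, $patterns and $table are illustrative; preg_grep returns the array entries that match a pattern.

```php
<?php
$words    = ['gd', 'god', 'good', 'goood'];
$patterns = ['/^go*d$/', '/^go+d$/', '/^go?d$/', '/^go{2}d$/', '/^go{2,}d$/'];

$table = [];
foreach ($patterns as $p) {
    // array_values re-indexes the result, since preg_grep keeps the original keys.
    $table[$p] = array_values(preg_grep($p, $words));
    echo $p, ' => ', implode(',', $table[$p]), "\n";
}
// /^go*d$/ => gd,god,good,goood
// /^go+d$/ => god,good,goood
// /^go?d$/ => gd,god
// /^go{2}d$/ => good
// /^go{2,}d$/ => good,goood
```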

Chapter 8 use of or

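// Find words that consist purely of letters or purely of digits
$str = 'hello o2o 2b9 250';
$patt = '/\b[a-zA-Z]+\b|\b[0-9]+\b/'; // at least one character on either side of |
preg_match_all($patt, $str, $res); // hello, 250

// Find Apple product names
$str = 'ipad,iphone,imac,ipod,iamsorry';
$patt = '/\bi(pad|phone|mac|pod)\b/';
preg_match_all($patt, $str, $res); // ipad, iphone, imac, ipod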
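
Two points about | are easy to miss: the parentheses also capture what they matched, and without parentheses | splits the whole pattern, anchors included. A small sketch, where the cat/dog inputs are illustrative:

```php
<?php
// The group captures the part after the leading i.
preg_match_all('/\bi(pad|phone|mac|pod)\b/', 'ipad,iphone,imac,ipod,iamsorry', $m);
echo implode(',', $m[0]), "\n"; // ipad,iphone,imac,ipod
echo implode(',', $m[1]), "\n"; // pad,phone,mac,pod

// Without a group this reads as (^cat) OR (dog$), so "catalog" matches.
$loose  = preg_match('/^cat|dog$/', 'catalog');
// With a group the anchors apply to both alternatives.
$strict = preg_match('/^(cat|dog)$/', 'catalog');
echo $loose, ' ', $strict, "\n"; // prints "1 0"
```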

Chapter 9 greed and non greed

$str = 'ksda good goooood good kl s ja dfs dk ';
// Replace every string of the form g(any content)d with god
$patt = '/g.+d/'; // default greedy mode (matches as much as possible)
$res = preg_replace($patt, 'god', $str);
print_r($res); // 'ksda godk ' (one long match from the first g to the last d)

$patt = '/g.+?d/'; // adding ? after a quantifier (+ * {n,}) makes it non-greedy
$res = preg_replace($patt, 'god', $str);
print_r($res); // 'ksda god god god kl s ja dfs dk '
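
The same difference shows up whenever a pattern has an opening and a closing delimiter. Here is a sketch with quoted text, using an illustrative input string:

```php
<?php
$text = 'say "a" and "b" now';

// Greedy: .+ runs to the last quote in the string.
preg_match_all('/".+"/', $text, $greedy);

// Non-greedy: .+? stops at the nearest closing quote.
preg_match_all('/".+?"/', $text, $lazy);

echo implode(' | ', $greedy[0]), "\n"; // "a" and "b"
echo implode(' | ', $lazy[0]), "\n";   // "a" | "b"
```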

Chapter 10 collection of mobile phone number

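$str = 'Miss Tang, contact mobile number:18611015252,Standby telephone:18828821111,QQ:381413622,,,ID number:430426199901013478';
// Collect the mobile phone numbers: 11 digits, starting with 13, 15 or 18
$patt = '/\b1[358]\d{9}\b/';
preg_match_all($patt, $str, $res); // 18611015252, 18828821111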

Chapter 11 back-references

Find words whose first and last letters are the same

$str = 'txt hello,high,bom,mum';
// To simplify, first find words whose first and last letters are both t
$patt = '/\bt\w+t\b/';

// Repeating this pattern for all 26 letters would also work, but it is clumsy

// "The content matched by the nth parenthesized subexpression can be referred to later as \n"
// This is a back-reference
$patt = '/\b([a-z])\w+\1\b/';
// 1. A word starts and ends at a boundary: \b ... \b
// 2. It may start with any letter: \b[a-z]
// 3. Anything may follow, in any quantity: \b[a-z]\w+\b
// 4. The last letter must equal the first. Parenthesize the first letter to make it
//    a subexpression, \b([a-z])\w+\b, then reference its match at the end: \b([a-z])\w+\1\b
preg_match_all($patt, $str, $res); // txt, high, mum
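
For reference, this is roughly what the match array holds for the pattern above: index 0 is the whole match and index 1 is the captured first letter. The second pattern, which finds doubled letters, is an extra illustration that is not in the original text.

```php
<?php
$str = 'txt hello,high,bom,mum';
preg_match_all('/\b([a-z])\w+\1\b/', $str, $res);
echo implode(',', $res[0]), "\n"; // txt,high,mum
echo implode(',', $res[1]), "\n"; // t,h,m

// A back-reference right after its group finds a repeated letter.
preg_match_all('/([a-z])\1/', 'hello, bookkeeper', $doubles);
echo implode(',', $doubles[0]), "\n"; // ll,oo,kk,ee
```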

Replace the middle 4 digits of each mobile phone number with *

$str = '13800138000 , 13426060134 ';
// Put the first three and the last four digits in subexpressions so they can be kept;
// the middle four digits can be anything and are replaced
$patt = '/(\d{3})\d{4}(\d{4})/';
//preg_match_all($patt, $str, $res);
echo preg_replace($patt, '\1****\2', $str); // 138****8000 , 134****0134
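
In the replacement string, PHP also accepts $1 and ${1} in place of \1. The braces matter when a digit follows the reference directly, as in this sketch that masks with zeros instead of stars (the zero mask is an illustrative variation, not from the original).

```php
<?php
$patt = '/(\d{3})\d{4}(\d{4})/';
$str  = '13800138000 , 13426060134 ';

// $1 and $2 behave like \1 and \2.
$stars = preg_replace($patt, '$1****$2', $str);

// ${1} keeps the reference separate from the digits that follow it.
$zeros = preg_replace($patt, '${1}0000${2}', '13800138000');

echo $stars, "\n"; // 138****8000 , 134****0134
echo $zeros, "\n"; // 13800008000
```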

Chapter 12 pattern modifiers

Pattern modifiers change, to some extent, how the regular expression is interpreted.
For example, i makes matching case-insensitive: /[a-zA-Z]+/ --> /[a-z]+/i
For example, s (single-line mode) treats the whole text as a "single line": . also matches line breaks

$str = 'hello WORLD  ChINa';
//$patt = '/\b[a-z]+\b/'; // only hello
$patt = '/\b[a-z]+\b/i'; // ignore case
preg_match_all($patt, $str, $matches);

$str = "abc haha
abc dgh";
$patt = '/.+/s'; // single-line mode: treats all the content as one line
preg_match_all($patt, $str, $matches);
// With the u modifier, the pattern and the subject are treated as UTF-8, so Chinese characters can be tested
// Matching Chinese in PHP: u modifier with the range \x{4e00}-\x{9fa5}

$str = 'bob Plum';
$patt = '/^[\x{4e00}-\x{9fa5}]+$/u';
echo preg_match($patt,$str)?'Homegrown products':'Groceries'; // 'Homegrown products' only if $str is entirely Chinese characters
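
The effect of each modifier can be checked by counting matches with and without it. A minimal sketch, assuming PHP 7 or later for the "\u{...}" string escape, which is used here so that the example does not depend on the file's encoding:

```php
<?php
// i: case-insensitive, so all three words match.
$withoutI = preg_match_all('/\b[a-z]+\b/', 'hello WORLD  ChINa');
$withI    = preg_match_all('/\b[a-z]+\b/i', 'hello WORLD  ChINa');

// s: without it . stops at the line break (one match per line).
$text    = "abc haha\nabc dgh";
$perLine = preg_match_all('/.+/', $text);
$single  = preg_match_all('/.+/s', $text);

// u: treat pattern and subject as UTF-8.
$zh      = "\u{4e2d}\u{6587}"; // two Chinese characters
$chinese = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $zh);
$latin   = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', 'bob Plum');

echo "$withoutI $withI $perLine $single $chinese $latin\n"; // prints "1 3 2 1 1 0"
```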


Posted on Sun, 10 Nov 2019 04:38:00 -0800 by Lord Sauron