Proficiency in JS regular expressions

by Aaron: http://www.cnblogs.com/aaronjs/archive/2012/06/30/2570970.html

Regular expressions can:
Test a string for a pattern. For example, you can test an input string to see whether it contains a phone number pattern or a credit card number pattern. This is called data validation.
Replace text. You can use a regular expression to identify particular text in a document, and then delete it entirely or replace it with other text.
Extract a substring from a string based on a pattern match. This can be used to find specific text in a document or an input field.
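
As a rough sketch of these three uses: the phone-number pattern and the sample text below are made up for illustration, and console.log stands in for alert so the snippet also runs outside a browser.

```javascript
// Hypothetical pattern: three digits, a hyphen, four digits
var phone = /\d{3}-\d{4}/;
var text = "Call 555-1234 or 555-9876.";

// 1. Test: does the text contain a phone number?
var hasPhone = phone.test(text);

// 2. Replace: mask every phone number (the g flag replaces all matches)
var masked = text.replace(/\d{3}-\d{4}/g, "XXX-XXXX");

// 3. Extract: pull out the first phone number
var firstPhone = text.match(phone)[0];

console.log(hasPhone);   // true
console.log(masked);     // Call XXX-XXXX or XXX-XXXX.
console.log(firstPhone); // 555-1234
```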

Regular expression syntax
A regular expression is a text pattern consisting of ordinary characters (such as the letters a to z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. A regular expression acts as a template that matches a character pattern against the string being searched.

Create regular expressions

    var re = new RegExp(); // RegExp is an object, just like Array
    // An empty pattern matches everything, so it is of little use. Pass the content of the regular expression in as a string.
    re = new RegExp("a"); // The simplest regular expression: matches the letter a
    re = new RegExp("a", "i"); // The second parameter requests case-insensitive matching



The first parameter of the RegExp constructor is the text of the regular expression, and the second parameter is an optional set of flags. Flags can be combined.

* g (global search)
* i (ignore case)
* m (multi-line search)

    var re = new RegExp("a", "gi"); // Matches every a or A


Regular expressions can also be declared with a literal syntax.

    var re = /a/gi;



Methods and properties related to regular expressions


Methods of the regular expression object

test: returns a Boolean indicating whether the pattern exists in the string being searched. It returns true if it does and false otherwise.
exec: runs a search on a string using the regular expression pattern and returns an array containing the results of the search.
compile: compiles the regular expression into an internal format for faster execution.

Properties of the regular expression object

source: returns a copy of the text of the regular expression pattern. Read-only.
lastIndex: returns the character position at which the next match starts in the searched string.
$1...$9: return the nine most recently saved parts found during pattern matching. Read-only.
input ($_): returns the string that the regular expression search was run against. Read-only.
lastMatch ($&): returns the last matched characters from any regular expression search. Read-only.
lastParen ($+): returns the last sub-match, if any, from any regular expression search. Read-only.
leftContext ($`): returns the characters in the searched string from the beginning of the string up to the position before the last match. Read-only.
rightContext ($'): returns the characters in the searched string from the position after the last match to the end of the string. Read-only.

Some methods of String objects related to regular expressions

match: finds one or more matches for a regular expression.
replace: replaces the substrings that match a regular expression.
search: retrieves the position of the first value that matches a regular expression.
split: splits the string into an array of strings.


Test how regular expressions work!

    // The test method tests a string: it returns true when the string matches the pattern and false otherwise
    var re = /he/; // The simplest regular expression: matches the text he
    var str = "he";
    alert(re.test(str)); // true
    str = "we";
    alert(re.test(str)); // false
    str = "HE";
    alert(re.test(str)); // false: uppercase. To match both cases, specify the i flag (i stands for ignoreCase)
    re = /he/i;
    alert(re.test(str)); // true
    str = "Certainly!He loves her!";
    alert(re.test(str)); // true: the string only needs to contain he (or HE). To allow only he or HE and nothing else, use ^ and $
    re = /^he/i; // The caret (^) represents the start of the string
    alert(re.test(str)); // false, because he is not at the start of str
    str = "He is a good boy!";
    alert(re.test(str)); // true: He is at the start of the string. To anchor the end as well, use $
    re = /^he$/i; // $ represents the end of the string
    alert(re.test(str)); // false
    str = "He";
    alert(re.test(str)); // true
    // So far this does not show how powerful regular expressions are, because == or indexOf would do for the examples above
    re = /\s/; // \s matches any whitespace character, including spaces, tabs, form feeds, and so on
    str = "user Name"; // The user name contains a space
    alert(re.test(str)); // true
    str = "user\tName"; // The user name contains a tab
    alert(re.test(str)); // true
    re = /^[a-z]/i; // [] matches any character in the specified range; here it matches English letters, case-insensitively
    str = "variableName"; // Variable names must begin with a letter
    alert(re.test(str)); // true
    str = "123abc";
    alert(re.test(str)); // false



Of course, it's not enough to know whether a string matches a pattern. We also need to know which characters match a pattern.

    var osVersion = "Ubuntu 8"; // The 8 is the major version number of the system
    var re = /^[a-z]+\s+\d+$/i; // + means the preceding item must appear at least once, \s is a whitespace character, \d is a digit
    alert(re.test(osVersion)); // true, but we want to know the major version number
    // Another method, exec, returns an array whose first element is the complete match
    re = /^[a-z]+\s+\d+$/i;
    var arr = re.exec(osVersion);
    alert(arr[0]); // Outputs osVersion in full, because the entire string matches re
    // We only need to take out the number
    re = /\d+/;
    arr = re.exec(osVersion);
    alert(arr[0]); // 8



More complex usage, using sub-matching

    // Elements 1 to n of the array returned by exec contain the sub-matches found in the match
    re = /^[a-z]+\s+(\d+)$/i; // Use () to create a sub-match
    arr = re.exec(osVersion);
    alert(arr[0]); // The whole osVersion, that is, the complete match of the regular expression
    alert(arr[1]); // 8, the first sub-match; the major version number can also be taken out this way
    alert(arr.length); // 2
    osVersion = "Ubuntu 8.10"; // Take out both the major and the minor version number
    re = /^[a-z]+\s+(\d+)\.(\d+)$/i; // . is one of the regular expression metacharacters; to use it literally, escape it
    arr = re.exec(osVersion);
    alert(arr[0]); // The complete osVersion
    alert(arr[1]); // 8
    alert(arr[2]); // 10



Note that when the string does not match re, the exec method returns null.
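
Because of that null result, it is usually safer to check the return value before indexing into it. A minimal sketch, assuming a hypothetical helper named parseVersion (not part of the original article) and using console.log in place of alert:

```javascript
// Hypothetical helper: returns {major, minor}, or null when the input does not match
function parseVersion(osVersion) {
  var arr = /^[a-z]+\s+(\d+)\.(\d+)$/i.exec(osVersion);
  if (arr === null) {
    return null; // no match: reading arr[1] would throw, so bail out first
  }
  return { major: Number(arr[1]), minor: Number(arr[2]) };
}

console.log(parseVersion("Ubuntu 8.10")); // { major: 8, minor: 10 }
console.log(parseVersion("Ubuntu"));      // null
```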

Some methods of String objects related to regular expressions

    // The replace method replaces text in a string
    var str = "some money";
    alert(str.replace("some", "much")); // much money
    // The first parameter of replace can be a regular expression
    var re = /\s/; // A whitespace character
    alert(str.replace(re, "%")); // some%money
    // Regular expressions are extremely convenient when you don't know how many whitespace characters a string contains
    str = "some some             \tsome\t\f";
    re = /\s+/;
    alert(str.replace(re, "#")); // Only the first run of whitespace is replaced
    // Without the g flag, replace stops after the first match, so \s+ only matches the first space
    re = /\s+/g; // g, the global flag, applies the regular expression to the entire string
    alert(str.replace(re, "@")); // some@some@some@
    // split is similar
    str = "a-bd-c";
    var arr = str.split("-"); // Returns ["a","bd","c"]
    // If str is entered by the user, they may type a-bd-c or a_bd_c, but presumably not abdc
    str = "a_bd-c"; // Users pick whatever delimiters they like
    re = /[^a-z]/i; // Earlier, ^ denoted the start of the string, but inside [] it denotes a negated character set:
    // it matches any character not in the specified range, so here it matches every character except letters
    arr = str.split(re); // Returns ["a","bd","c"]
    // indexOf is the usual way to search within a string; the corresponding method for regular expressions is search
    str = "My age is 18.Golden age!"; // The age is not fixed, so we can't locate it with indexOf
    re = /\d+/;
    alert(str.search(re)); // 10, the start index of the match
    // Because search returns right after the first occurrence, there is no need for the g flag with search
    // The following code is error-free, but the g flag is redundant
    re = /\d+/g;
    alert(str.search(re)); // Still 10


Note that the search method returns -1 when it finds no match.

Similar to the exec method, the match method of String objects also matches a string against a regular expression and returns an array of results.


    var str = "My name is CJ.Hello everyone!";
    var re = /[A-Z]/; // Matches a capital letter
    var arr = str.match(re); // Returns an array
    alert(arr); // The array contains only M, because we are not matching globally
    re = /[A-Z]/g;
    arr = str.match(re);
    alert(arr); // M,C,J,H
    // Extract the words from a string
    re = /\b[a-z]+\b/gi; // \b denotes a word boundary; + (rather than *) avoids empty matches between words
    str = "one two three four";
    alert(str.match(re)); // one,two,three,four



Some properties of RegExp object instances

    var re = /[a-z]/i;
    alert(re.source); // Outputs the string [a-z]
    // Note that alert(re) outputs the regular expression with its forward slashes and flags, as defined by re.toString



Each RegExp instance has a lastIndex property: the position in the searched string at which the next match starts. Its default value is 0. When the g flag is set, the exec and test methods of the RegExp object update lastIndex, and the property is also writable.

    var re = /[A-Z]/;
    // With the g flag, exec updates the lastIndex property of re
    var str = "Hello,World!!!";
    var arr = re.exec(str);
    alert(re.lastIndex); // 0, because the global flag is not set
    re = /[A-Z]/g;
    arr = re.exec(str);
    alert(re.lastIndex); // 1
    arr = re.exec(str);
    alert(re.lastIndex); // 7
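
Since each exec call on a global regular expression resumes from lastIndex, calling it in a loop walks through every match. The sketch below assumes a hypothetical helper, findCapitals, and also reads the index property of the result array (the position where the match starts), which the text above does not mention. console.log replaces alert so it runs outside a browser.

```javascript
// Collect every capital letter together with the position where it was found
function findCapitals(str) {
  var re = /[A-Z]/g; // g is required; without it exec would return the first match forever
  var found = [];
  var arr;
  while ((arr = re.exec(str)) !== null) {
    // arr.index is where this match starts; re.lastIndex is where the next search begins
    found.push({ letter: arr[0], index: arr.index, next: re.lastIndex });
  }
  return found;
}

console.log(findCapitals("Hello,World!!!"));
// [ { letter: 'H', index: 0, next: 1 }, { letter: 'W', index: 6, next: 7 } ]
```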



When a match fails (there are no further matches), or when lastIndex is larger than the string length, exec and test reset lastIndex to 0 (the start position). This applies to regular expressions that have the g flag set.

    var re = /[A-Z]/g; // lastIndex is only consulted when the g flag is set
    var str = "Hello,World!!!";
    re.lastIndex = 120;
    var arr = re.exec(str); // null, because lastIndex is past the end of the string
    alert(re.lastIndex); // 0
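
One practical consequence is that a global regular expression reused across test calls can give alternating answers, because lastIndex carries over between calls. The following sketch shows the effect and one possible workaround; the helper containsA is a made-up name for illustration.

```javascript
var re = /a/g;

// The same input gives different answers because lastIndex carries over between calls
var results = [re.test("a"), re.test("a"), re.test("a")];
console.log(results); // [ true, false, true ]

// One workaround: reset lastIndex before each independent test
function containsA(str) {
  re.lastIndex = 0;
  return re.test(str);
}
var fixed = [containsA("a"), containsA("a"), containsA("a")];
console.log(fixed); // [ true, true, true ]
```

Dropping the g flag when only a yes/no answer is needed avoids the problem altogether.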



Static properties of RegExp objects

    // input: the last string that was matched against (the string passed to test or exec)
    var re = /[A-Z]/;
    var str = "Hello,World!!!";
    var arr = re.exec(str);
    alert(RegExp.input); // Hello,World!!!
    re.exec("tempstr");
    alert(RegExp.input); // Still Hello,World!!!, because tempstr doesn't match
    // lastMatch: the last matched characters
    re = /[a-z]/g;
    str = "hi";
    re.test(str);
    alert(RegExp.lastMatch); // h
    re.test(str);
    alert(RegExp["$&"]); // i. $& is the short name of lastMatch, but it is not a legal identifier, so bracket notation is required
    // lastParen: the last matched group
    re = /[a-z](\d+)/gi;
    str = "Class1 Class2 Class3";
    re.test(str);
    alert(RegExp.lastParen); // 1
    re.test(str);
    alert(RegExp["$+"]); // 2
    // leftContext: the characters from the start of the searched string up to the position before the last match
    // rightContext: the characters from the position after the last match to the end of the searched string
    re = /[A-Z]/g;
    str = "123ABC456";
    re.test(str);
    alert(RegExp.leftContext); // 123
    alert(RegExp.rightContext); // BC456
    re.test(str);
    alert(RegExp["$`"]); // 123A
    alert(RegExp["$'"]); // C456
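
These static properties are shared by every regular expression on the page, so any later match overwrites them. As an alternative sketch, the same pieces can be derived from the array that exec returns, using its index and input properties. The helper name describeMatch is an assumption for illustration, not something from the original article.

```javascript
// Derive lastMatch / leftContext / rightContext from a single exec result
function describeMatch(re, str) {
  var arr = re.exec(str);
  if (arr === null) {
    return null;
  }
  var end = arr.index + arr[0].length; // position just after the match
  return {
    lastMatch: arr[0],
    leftContext: arr.input.slice(0, arr.index),
    rightContext: arr.input.slice(end)
  };
}

console.log(describeMatch(/[A-Z]/, "123ABC456"));
// { lastMatch: 'A', leftContext: '123', rightContext: 'BC456' }
```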



The static multiline property reports whether regular expressions use multi-line mode. It applies not to a single regular expression instance but to all regular expressions, and it is writable. (IE and Opera do not support this property.)

    alert(RegExp.multiline);
    // Because IE and Opera do not support this property, it is better to set the flag on each instance
    var re = /\w+/m;
    alert(re.multiline);
    alert(RegExp["$*"]); // Setting the m flag on an instance does not change the static property of RegExp
    RegExp.multiline = true; // This turns on multi-line matching for all regular expression instances
    alert(RegExp.multiline);



Notes on metacharacters: metacharacters are part of regular expression syntax. To match one of these characters literally, you must escape it. Below are all the metacharacters used in regular expressions.
( [ { \ ^ $ | ) ? * + .

    var str = "?";
    // var re = /?/; // Syntax error, because ? is a metacharacter and must be escaped
    var re = /\?/;
    alert(re.test(str)); // true
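
When the text to match comes from somewhere else, such as user input, the metacharacters in it can be escaped programmatically. The sketch below assumes a hypothetical helper called escapeRegExp (it is not a built-in) that puts a backslash in front of each metacharacter listed above, plus ] and }.

```javascript
// Hypothetical helper: prefix every metacharacter with a backslash
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& inserts the matched character
}

var userInput = "1+1=2?";
var re = new RegExp(escapeRegExp(userInput));

console.log(escapeRegExp(userInput));   // 1\+1=2\?
console.log(re.test("Is 1+1=2? Yes.")); // true
```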



Notes on creating regular expressions with the RegExp constructor versus regular expression literals

    var str = "\?";
    alert(str); // Outputs only ?
    var re = /\?/; // Matches ?
    alert(re.test(str)); // true
    // re = new RegExp("\?"); // Error, because the string "\?" is just "?", so this is equivalent to re = /?/
    re = new RegExp("\\?"); // Correct: matches ?
    alert(re.test(str)); // true



Because double escaping is so unfriendly, it is better to declare regular expressions with the literal syntax.
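
The constructor remains useful when part of the pattern is only known at run time. The sketch below first confirms that the double-escaped string and the literal produce the same pattern, then builds a pattern from a variable. The helper startsWithWord is hypothetical and assumes its argument contains only letters.

```javascript
var literal = /\?/;
var built = new RegExp("\\?"); // the string literal "\\?" holds the two characters \?

console.log("\\?".length);                    // 2
console.log(built.source === literal.source); // true

// Hypothetical helper: builds /^<word>\b/i from a variable
function startsWithWord(word) {
  return new RegExp("^" + word + "\\b", "i"); // \\b in the string becomes \b in the pattern
}

console.log(startsWithWord("he").test("He is a good boy!")); // true
console.log(startsWithWord("he").test("Hello"));             // false
```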

How to use special characters in regular expressions?  

    // ASCII characters can be written as hexadecimal escapes
    var re = /^\x43\x4A$/; // Matches CJ
    alert(re.test("CJ")); // true
    // Octal escapes can also be used
    re = /^\103\112$/; // Matches CJ
    alert(re.test("CJ")); // true
    // Unicode escapes can also be used
    re = /^\u0043\u004A$/; // A Unicode escape starts with \u, followed by the four-digit hexadecimal character code
    alert(re.test("CJ")); // true



In addition, there are other predefined special characters, as shown in the following table:

Character   Description
\n          Line feed
\r          Carriage return
\t          Tab
\f          Form feed
\cX         The control character corresponding to X
\b          Backspace (inside a character class, as in [\b]; elsewhere \b is a word boundary)
\v          Vertical tab
\0          Null character

Character classes: simple class, negated class, range class, combined class, predefined class

    // Simple class
    var re = /[abc123]/; // Matches any one of the six characters a, b, c, 1, 2, 3
    // Negated class
    re = /[^abc]/; // Matches any one character other than a, b, or c
    // Range class
    re = /[a-z]/; // Matches any one of the 26 lowercase letters a-z
    re = /[^0-9]/; // Matches any one character other than the ten digits 0-9
    // Combined class
    re = /[a-z0-9A-Z_]/; // Matches letters, digits, and the underscore



Below are predefined classes in regular expressions


Code   Equivalent to                        Matches
.      [^\n] in IE, [^\n\r] elsewhere       Any character other than a newline character
\d     [0-9]                                A digit
\D     [^0-9]                               A non-digit character
\s     [ \n\r\t\f\x0B]                      A whitespace character
\S     [^ \n\r\t\f\x0B]                     A non-whitespace character
\w     [a-zA-Z0-9_]                         A letter, digit, or underscore
\W     [^a-zA-Z0-9_]                        A character other than a letter, digit, or underscore



Quantifiers (each of the following is greedy when it appears alone)

Code    Description
*       Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+       Matches the preceding subexpression one or more times. For example, zo+ matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?       Matches the preceding subexpression zero or one time. For example, do(es)? matches "do" or "does". ? is equivalent to {0,1}.
{n}     n is a non-negative integer. Matches exactly n times. For example, o{2} does not match the "o" in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, o{2,} does not match the "o" in "Bob", but matches all the o's in "foood". o{1,} is equivalent to o+. o{0,} is equivalent to o*.
{n,m}   m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, o{1,3} matches the first three o's in "fooooood". o{0,1} is equivalent to o?. Note that there must be no spaces between the comma and the two numbers.
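
The examples in the table can be checked directly. This is only a sketch: firstMatch is a hypothetical helper that returns the matched text or null, and console.log is used in place of alert.

```javascript
// Hypothetical helper: the text of the first match, or null when there is none
function firstMatch(re, str) {
  var arr = re.exec(str);
  return arr === null ? null : arr[0];
}

console.log(firstMatch(/zo*/, "z"));            // z
console.log(firstMatch(/zo*/, "zoo"));          // zoo
console.log(firstMatch(/zo+/, "z"));            // null
console.log(firstMatch(/do(es)?/, "does"));     // does
console.log(firstMatch(/o{2}/, "Bob"));         // null
console.log(firstMatch(/o{2}/, "food"));        // oo
console.log(firstMatch(/o{2,}/, "foood"));      // ooo
console.log(firstMatch(/o{1,3}/, "fooooood"));  // ooo
// With a space after the comma (and no u flag) the braces are read as literal text, not as a quantifier
console.log(firstMatch(/o{1, 3}/, "fooooood")); // null
```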


Greedy and lazy quantifiers

When matching with a greedy quantifier, the engine first tries to match as much of the string as possible; if that fails, it gives back the last character and tries again, and keeps giving back one character at a time until a match is found. All the quantifiers we have met so far are greedy.
When matching with a lazy quantifier, the engine first tries to match as little as possible; if that fails, it takes one more character and tries again, growing the match until it succeeds.

A lazy quantifier is a greedy quantifier followed by a "?". For example, "a+" is greedy and "a+?" is lazy.

    var str = "abc";
    var re = /\w+/; // Matches abc
    re = /\w+?/; // Matches a
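
The difference matters most when a pattern has a closing delimiter. A small sketch with a made-up sample string containing two quoted words:

```javascript
var str = 'say "one" and "two"';

// Greedy: .* runs to the end of the string, then backs up to the last quote
var greedy = str.match(/".*"/)[0];

// Lazy: .*? stops at the first closing quote it can reach
var lazy = str.match(/".*?"/)[0];

// Lazy plus the g flag picks out each quoted part separately
var allLazy = str.match(/".*?"/g);

console.log(greedy);  // "one" and "two"
console.log(lazy);    // "one"
console.log(allLazy); // [ '"one"', '"two"' ]
```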


Multi-line mode

    var re = /[a-z]$/;
    var str = "ab\ncdef";
    alert(str.replace(re, "#")); // ab\ncde#
    re = /[a-z]$/mg; // m lets $ match at the end of each line; g replaces every match
    alert(str.replace(re, "#")); // a#\ncde#
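
The m flag affects ^ in the same way as $. As a sketch with a made-up three-line string, this collects the first word of each line:

```javascript
var text = "first line\nsecond line\nthird line";

// Without m, ^ only matches at the very start of the string
var withoutM = text.match(/^\w+/g);

// With m, ^ also matches right after each line break
var withM = text.match(/^\w+/gm);

console.log(withoutM); // [ 'first' ]
console.log(withM);    // [ 'first', 'second', 'third' ]
```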



Grouping and non-capturing grouping

    re = /abc{2}/; // Matches abcc
    re = /(abc){2}/; // Matches abcabc
    // The grouping above is a capturing group
    str = "abcabc ###";
    arr = re.exec(str);
    alert(arr[1]); // abc
    // Non-capturing group (?:)
    re = /(?:abc){2}/;
    arr = re.exec(str);
    alert(arr[1]); // undefined



Alternation (or)

    re = /^a|bc$/; // Matches a at the start or bc at the end
    str = "add";
    alert(re.test(str)); // true
    re = /^(a|bc)$/; // Matches exactly a or exactly bc
    str = "bc";
    alert(re.test(str)); // true



After a regular expression containing groups has been used with test, match, or search, the text matched by each group is stored in a special place for later use. These stored values are called backreferences.

    var re = /(A?(B?(C?)))/;
    /* The regular expression above produces three groups, in this order:
       (A?(B?(C?))) the outermost
       (B?(C?))
       (C?) */
    str = "ABC";
    re.test(str); // Backreferences are stored in the static properties $1...$9 of the RegExp object
    alert(RegExp.$1 + "\n" + RegExp.$2 + "\n" + RegExp.$3); // ABC, BC, and C on separate lines
    // Backreferences can also be used inside the regular expression itself, in the form \1, \2, ...
    re = /\d+(\D)\d+\1\d+/;
    str = "2008-1-1";
    alert(re.test(str)); // true
    str = "2008-4_3";
    alert(re.test(str)); // false



Backreferences can require that the characters at several positions in a string be identical. In addition, special character sequences such as $1 and $2 can refer to the groups in methods such as replace.

    re = /(\d+)\s(\d+)/;
    str = "1234 5678";
    alert(str.replace(re, "$2 $1")); // 5678 1234: $1 stands for the first group, 1234, and $2 for the second, 5678
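
The same $n sequences can rearrange more than two groups. A minimal sketch, assuming a hypothetical helper toDayMonthYear that reorders a year-month-day date:

```javascript
// Hypothetical helper: "2008-1-15" -> "15/1/2008"; other input is returned unchanged
function toDayMonthYear(isoDate) {
  // $1 = year, $2 = month, $3 = day
  return isoDate.replace(/^(\d{4})-(\d{1,2})-(\d{1,2})$/, "$3/$2/$1");
}

console.log(toDayMonthYear("2008-1-15"));  // 15/1/2008
console.log(toDayMonthYear("not a date")); // not a date
```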



Lookahead: a positive lookahead matches characters only when they are followed by a particular pattern. Its counterpart, the negative lookahead, matches characters only when they are not followed by a particular pattern. While evaluating a lookahead or negative lookahead, the regular expression engine inspects the part of the string that follows, but does not advance the match position.

    // Positive lookahead (?=)
    re = /([a-z]+(?=\d))/gi;
    // We want to match a word that is followed by a digit, and return the word without the digit
    str = "abc every1 abc";
    alert(re.test(str)); // true
    alert(RegExp.$1); // every
    alert(re.lastIndex); // 9: the lookahead content (?=\d) is not part of the match, so the next match starts at the digit
    // Negative lookahead (?!)
    re = /\b([a-z]+)(?![a-z\d])/i;
    // Matches a word that is not followed by a digit; the content of the lookahead is not returned
    str = "abc1 one";
    alert(re.test(str)); // true
    alert(RegExp.$1); // one
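
Because a lookahead consumes nothing, it can also be used to find positions rather than text. The sketch below is a well-known idiom rather than something from the original article, and the helper name addThousandsSeparators is made up. It inserts a comma at every position that is followed by a whole number of three-digit groups.

```javascript
// Insert a comma before each group of three digits, counted from the right.
// \B keeps a comma from being added at the very start of the number.
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

console.log(addThousandsSeparators("1234567")); // 1,234,567
console.log(addThousandsSeparators("1000"));    // 1,000
console.log(addThousandsSeparators("123"));     // 123
```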



Construct a regular expression that validates an e-mail address. The validity requirements (let's simply define them this way) are: the user name may contain only letters, digits, and underscores, with at least 1 and at most 15 characters; it is followed by @ and then the domain name; the domain name may contain only letters, digits, and the minus sign (-), and may not start or end with a minus sign; it is followed by the domain suffix (there may be more than one level), and the final suffix must be a dot followed by 2-4 English letters.

    var re = /^\w{1,15}(?:@(?!-))(?:(?:[a-z0-9-]*)(?:[a-z0-9](?!-))(?:\.(?!-)))+[a-z]{2,4}$/;
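
To see how the expression behaves, it can be run against a few sample addresses. The addresses and the wrapper function isValidEmail below are made up for illustration. Since the pattern has no i flag, the domain part is matched in lowercase only.

```javascript
var emailRe = /^\w{1,15}(?:@(?!-))(?:(?:[a-z0-9-]*)(?:[a-z0-9](?!-))(?:\.(?!-)))+[a-z]{2,4}$/;

// Hypothetical wrapper around the expression above
function isValidEmail(address) {
  return emailRe.test(address);
}

console.log(isValidEmail("user_1@example.com"));    // true
console.log(isValidEmail("a@b.co.uk"));             // true: several suffix levels
console.log(isValidEmail("user@-example.com"));     // false: domain starts with a minus sign
console.log(isValidEmail("user@example-.com"));     // false: domain ends with a minus sign
console.log(isValidEmail("user@example.c"));        // false: suffix is shorter than 2 letters
console.log(isValidEmail("user.name@example.com")); // false: dot is not allowed in the user name
```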

Added by demon_athens on Sun, 26 May 2019 20:45:20 +0300