Using Regular Expressions in JavaScript

Regular expressions provide a powerful way to perform pattern matching on certain characters within strings of text. They offer a concise syntax to carry out complex tasks that otherwise would require lengthy code. Here is an example of a regular expression:

var regex = /^\d{2}$/;

A regular expression literal is specified in the form /pattern/modifiers, where "pattern" is the regular expression itself and the optional "modifiers" section specifies various options. The pattern above starts with a ^, which anchors the match to the beginning of the string. The \d matches a digit, and the {2} that follows requires exactly 2 consecutive digits. The $ anchors the match to the end of the string. So, this pattern matches only strings that consist of exactly 2 digits and nothing else.

In the next example we apply the pattern to the string "29". As expected it will find a match, whereas if we try this pattern against the string "3g8" it will not (please note that the test() method will be explained later in this section).

var regex = /^\d{2}$/;
var str1 = "29";
alert(regex.test(str1));     // => true   
var str2 = "3g8";
alert(regex.test(str2));     // => false  

The three most commonly used modifiers are case-insensitivity (i), global searches (g), and multiline mode (m). These modifiers influence how the string is searched. You can combine them by stringing them together, like so: /pattern/gim.
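
As a rough illustration of how each modifier changes the result, the sketch below runs several variants of one pattern over the same text. The sample text and variable names are made up for this example.

```javascript
// Three lines of text that differ only in letter case.
var text = "Cat\ncat\nCAT";

// i: ignore case, so the "Cat" at the start matches /cat/i
var ignoreCase = /cat/i.test(text);        // true

// g: return every match instead of stopping at the first one
var allLower = text.match(/cat/g);         // ["cat"]
var allAnyCase = text.match(/cat/gi);      // ["Cat", "cat", "CAT"]

// m: ^ and $ match at the start and end of each line,
// not only at the start and end of the whole string
var perLine = text.match(/^cat$/gim);      // ["Cat", "cat", "CAT"]
var wholeString = text.match(/^cat$/gi);   // null
```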

Let's look at the actual pattern in some more detail. A pattern can be simple or very complex. For example matching the string "Facebook" is as simple as /Facebook/, but matching emails, XML, or HTML tags can be rather complex. For example, an email pattern in which you test whether an input string is a valid email may look like this: /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/.
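
To see how the email pattern above behaves, here is a small sketch that wraps it in a helper function. The name looksLikeEmail and the sample inputs are assumptions made for this example. Note that the {2,4} at the end rejects longer top-level domains, so this pattern is best treated as a rough format check rather than a complete validator.

```javascript
// The email pattern from the text, unchanged.
var emailPattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/;

// Hypothetical helper: true if the whole input has the expected shape.
function looksLikeEmail(input) {
    return emailPattern.test(input);
}

var ok = looksLikeEmail("annie@js.org");              // true
var noAtSign = looksLikeEmail("no-at-sign.com");      // false
var noDomainDot = looksLikeEmail("annie@js");         // false
var longTld = looksLikeEmail("user@example.museum");  // false, "museum" has 6 letters
```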

Regular expressions can be used for parsing, substitution, and format checking. They are often used in form validation, for example to validate an email or social security number. JavaScript's String type has 4 built-in methods: search(), replace(), split(), and match(), all of which accept regular expressions as an argument to help them perform their respective operations.

When searching in a given string, you can specify the pattern you're looking for. Below we search for the first instance of a lower-case character [a-z] followed by one or more digits [0-9]+.

var str = "James Bond007";
var regex = /[a-z][0-9]+/;
alert(str.search(regex));   // => 9. the position of 'd' 
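
One detail worth knowing: when search() finds nothing it returns -1 rather than null. A minimal sketch, reusing the string from above with a second, made-up pattern that does not occur in it:

```javascript
var str = "James Bond007";

// A lower-case letter followed by digits occurs at position 9.
var found = str.search(/[a-z][0-9]+/);     // 9

// No upper-case letter is directly followed by a digit, so nothing is found.
var missing = str.search(/[A-Z][0-9]+/);   // -1
```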

The next expression parses the string, removes all the digits [\d] from it, and returns the remaining string. Note that the g is a global modifier which means find all matches and don't stop after the first match is found.

var str = "James Bond007";
var regex = /[\d]/g;
alert(str.replace(regex, ""));   // => James Bond
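
The sketch below shows a few further uses of replace(): what happens without the g modifier, how captured groups can be referenced as $1 and $2 in the replacement string, and how a function can compute each replacement. The sample inputs are made up for illustration.

```javascript
var str = "James Bond007";

// Without g, only the first digit is removed.
var firstOnly = str.replace(/\d/, "");                 // "James Bond07"

// $1 and $2 refer to the first and second captured groups.
var swapped = str.replace(/(\w+) (\w+)/, "$2, $1");    // "Bond007, James"

// A function receives each match and returns its replacement.
var doubled = "a1b2".replace(/\d/g, function (digit) {
    return String(Number(digit) * 2);
});                                                    // "a2b4"
```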

The method match() searches a string for matches of a regular expression and extracts them. If nothing is found it returns null. With the global modifier (g) it returns an array of all matches; without it, it returns an array holding the first match followed by any captured groups.

var regExp = /\d+/g;
var str = "Mary: 36, Tim: 38";
alert(str.match(regExp));      // => [36,38]
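
For comparison, here is a sketch of match() without the g modifier, where the result holds the first match plus its captured groups, and of the null result when nothing matches. The patterns are assumptions chosen for this example.

```javascript
var str = "Mary: 36, Tim: 38";

// Without g: first match only, followed by the captured groups.
var first = str.match(/(\w+): (\d+)/);
var whole = first[0];       // "Mary: 36"
var name = first[1];        // "Mary"
var age = first[2];         // "36"
var position = first.index; // 0

// No run of three digits exists, so the result is null.
var none = str.match(/\d{3}/);

// A common guard: fall back to an empty array so the result can be looped over.
var safe = str.match(/\d{3}/g) || [];
```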

The method split() splits a string into an array of strings based on the regular expression. It uses the regular expression as a delimiter to split the string. The delimiters themselves are not included in the resulting array.

var regex = /[,:.]/;       // delimiters are comma, colon, and period
var str = 'Tim:20,Henry:30.Linda:35';
alert(str.split(regex));   // => [Tim,20,Henry,30,Linda,35]
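
Two variations on split() may be useful here. The first pattern also swallows whitespace around each delimiter. The second wraps the delimiter in a capturing group, which is the one case where the delimiters do end up in the resulting array. Both inputs are made up for this sketch.

```javascript
// Optional whitespace around each comma or colon is part of the delimiter.
var parts = "Tim : 20 , Henry : 30".split(/\s*[,:]\s*/);
// parts is ["Tim", "20", "Henry", "30"]

// With a capturing group, the matched delimiters are kept in the result.
var withDelimiters = "a,b:c".split(/([,:])/);
// withDelimiters is ["a", ",", "b", ":", "c"]
```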

In JavaScript, there are actually two ways to create a regular expression: using a regular expression literal, which we have discussed above, and using the RegExp() constructor. Both produce the same kind of RegExp object. When defining a pattern with RegExp(), the pattern is passed as a string, so it must be enclosed in quotes and every backslash in it must itself be escaped with a second backslash (for example, \d becomes "\\d"). In the example below we see that this makes the pattern considerably harder to read, which is why the constructor is generally recommended only when the pattern has to be built at runtime, for example from user input.

var regex = new RegExp("^\\s*(\\+|-)?\\d+\\s*$");
var str = "-1";
alert(regex.test(str));     // => true

The test() method on RegExp checks whether the string argument contains a match of the pattern specified by the regular expression. Here we are testing if the string is an integer, with an optional sign (+/-) character. The test() returns true.
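
The sketch below writes the same integer pattern as a literal to show that both forms describe the same expression, then tries test() on a few more inputs. It also includes a case where the constructor is genuinely needed, because the pattern is built from a variable. The helper names isInteger and containsWord are assumptions made for this example.

```javascript
// The same pattern, once as a literal and once through the constructor.
var fromLiteral = /^\s*(\+|-)?\d+\s*$/;
var fromConstructor = new RegExp("^\\s*(\\+|-)?\\d+\\s*$");
var samePattern = fromLiteral.source === fromConstructor.source;  // true

// Hypothetical helper built on test().
function isInteger(text) {
    return fromLiteral.test(text);
}
var negative = isInteger("-1");      // true
var padded = isInteger(" +42 ");     // true
var decimal = isInteger("4.2");      // false

// Hypothetical helper: the word is only known at runtime, so a literal cannot be used.
function containsWord(text, word) {
    // Escape characters that have a special meaning in a pattern.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp("\\b" + escaped + "\\b", "i").test(text);
}
var hasJames = containsWord("James Bond007", "james");  // true
var hasBond = containsWord("James Bond007", "bond");    // false, "Bond007" is one word
```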

Another method on RegExp called exec() checks whether the string argument contains one or more occurrences of the pattern specified by the regular expression. If no match is found, it returns null. If a match is found, it returns an array whose first element is the string that matches the entire pattern. In the next example we demonstrate the use of exec().

Suppose you have a web page and you wish to extract valid email addresses from it (this is called screen-scraping). This is how it can be done in JavaScript:

var page = "annie@js.org, jim@email.com, some-invalid-email-address";
var regex = 
    /[0-9a-zA-Z]+@[0-9a-zA-Z]+[\.]{1}[0-9a-zA-Z]+[\.]?[0-9a-zA-Z]+/g;
var address;
// exec() returns null once no further match is found, which ends the loop
while ((address = regex.exec(page)) !== null) {
     alert(address[0]);       // => annie@js.org, then jim@email.com 
}

Without the global modifier (g), exec() would return the first email address on every call, because the regular expression's lastIndex property would never advance, resulting in an infinite loop. With the global flag set, each call resumes searching where the previous match ended and returns null once no matches remain.
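
To make the role of lastIndex visible, the following sketch records it after every successful exec() call. It uses a simplified email pattern, which is an assumption made for brevity and is not meant as a complete validator.

```javascript
var page = "annie@js.org, jim@email.com, some-invalid-email-address";
var regex = /[0-9a-zA-Z]+@[0-9a-zA-Z]+\.[0-9a-zA-Z]+/g;  // simplified pattern

var found = [];       // matched addresses
var positions = [];   // value of lastIndex after each match
var match;

// With g set, each exec() call starts at regex.lastIndex.
while ((match = regex.exec(page)) !== null) {
    found.push(match[0]);
    positions.push(regex.lastIndex);
}

// found is ["annie@js.org", "jim@email.com"]
// positions is [12, 27]
// After the final, failed call, lastIndex is reset to 0.
var afterLoop = regex.lastIndex;   // 0
```

Because lastIndex is stored on the regular expression object itself, reusing one global regex on a different string right after a partial search can skip matches. Resetting regex.lastIndex to 0 avoids that.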