Regular Expressions: Set or Range of Characters


If you want to match against a set of characters, you can place the set inside [ ]. For example, [abc] matches any one of the characters a, b, or c. Here’s what it looks like in JavaScript:

var pattern = /[abc]/;
console.log(pattern.test('a')); //true
console.log(pattern.test('d')); //false

You can match any character that is not in the set by adding a ^ (caret) at the beginning of the set, immediately after the opening bracket:

var pattern = /[^abc]/;
console.log(pattern.test('a')); //false
console.log(pattern.test('d')); //true
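
One subtlety is worth a quick sketch: test() succeeds as soon as any single character in the input falls outside the set, so a negated set does not mean "the string contains none of these characters". The variable name notAbc below is chosen only for illustration:

```javascript
var notAbc = /[^abc]/;

// test() succeeds if any one character falls outside the set.
console.log(notAbc.test('abcd')); // true  ('d' is outside the set)
console.log(notAbc.test('cab'));  // false (every character is a, b, or c)
console.log(notAbc.test(''));     // false (there is no character to match)
```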

One important variation of this pattern is a range of values. If we want to match against a sequential range of letters or digits in JavaScript, we can write the first and last character separated by a hyphen inside the brackets, for example [0-5]:

var pattern = /[0-5]/;
console.log(pattern.test(3)); //true
console.log(pattern.test(12345)); //true
console.log(pattern.test(9)); //false
console.log(pattern.test(6789)); //false
console.log(/[0123456789]/.test("This is year 2015")); //true
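
Several ranges can also share one pair of brackets. The following is a minimal sketch of that idea; the containsHexDigit helper is a hypothetical name used only for illustration:

```javascript
// Hypothetical helper: true when the string contains a hexadecimal digit.
// Three ranges are combined in a single character class.
function containsHexDigit(text) {
  return /[0-9a-fA-F]/.test(text);
}

console.log(containsHexDigit('x7'));   // true  ('7' is in 0-9)
console.log(containsHexDigit('Beta')); // true  ('B' is in A-F)
console.log(containsHexDigit('xyz'));  // false (no character in any range)
```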

Within a RegEx, the backslash character escapes whatever character follows it, making it a literal match term. So \[ specifies a literal match to the [ character rather than the opening of a character class expression. A double backslash (\\) matches a single backslash.
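
As a short sketch of escaping in practice (the variable names are illustrative only), note that a backslash must also be doubled inside a JavaScript string literal:

```javascript
// \[ and \] match literal square brackets instead of opening a character class.
var bracketed = /\[abc\]/;
console.log(bracketed.test('[abc]')); // true
console.log(bracketed.test('abc'));   // false

// In a regex literal, \\ matches one backslash. In a JavaScript string
// literal, a single backslash is likewise written as '\\'.
var backslash = /\\/;
console.log(backslash.test('C:\\temp')); // true (the string is C:\temp)
console.log(backslash.test('C:/temp'));  // false
```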

The information in this post is adapted from the book Mastering JavaScript by Ved Antani, published by Packt Publishing, Chapter 3.

In the preceding examples, we saw the test() method that returns true or false based on the pattern matched.

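There are other methods as well: exec() belongs to the regular expression object, like test(), while match(), replace(), and split() are string methods that accept a regular expression as an argument.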
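
The following sketch shows one call to each of these methods, using only the character sets introduced so far. The sample strings are made up for illustration, and the g (global) flag, which is not covered in the text above, is used so that every match is considered rather than only the first:

```javascript
var text = 'Order 66 shipped on day 12';

// RegExp.prototype.exec returns a match array, or null when nothing matches.
var first = /[0-9][0-9]/.exec(text);
console.log(first[0]);    // '66'
console.log(first.index); // 6

// String.prototype.match with the g flag returns every match.
var allMatches = text.match(/[0-9][0-9]/g);
console.log(allMatches); // ['66', '12']

// String.prototype.replace substitutes matches; g replaces all of them.
var masked = text.replace(/[0-9]/g, '#');
console.log(masked); // 'Order ## shipped on day ##'

// String.prototype.split breaks the string wherever the pattern matches.
var parts = 'a,b;c'.split(/[,;]/);
console.log(parts); // ['a', 'b', 'c']
```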

Several character groups have shortcut notations. For example, the shortcut \d means the same thing as [0-9]:

Notation Meaning
\d Any digit character
\w A word character (letter, digit, or underscore)
\s Any whitespace character (space, tab, newline, and similar)
\D A character that is not a digit
\W A character that is not a word character
\S A non-whitespace character
. Any character except for newline
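
A few quick checks, offered as an illustrative sketch rather than an exhaustive list, show how the shortcuts and their uppercase complements behave:

```javascript
// \d and [0-9] accept and reject the same inputs.
console.log(/\d/.test('year 2015'));    // true
console.log(/[0-9]/.test('year 2015')); // true
console.log(/\d/.test('no digits'));    // false

// \w also matches the underscore; \W is its complement.
console.log(/\w/.test('_')); // true
console.log(/\W/.test('_')); // false

// \s matches whitespace such as a tab; \S matches anything else.
console.log(/\s/.test('a\tb')); // true
console.log(/\S/.test(' \t'));  // false

// . matches any character except a line terminator.
console.log(/./.test('\n')); // false
```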

Repeating Occurrences

If I want to match four a’s, I can write /aaaa/, but what if I want to specify a pattern that can match any number of a’s? Regular expressions provide a variety of repetition quantifiers, which let us specify how many times a particular pattern can occur. We can specify fixed values (the pattern must appear exactly n times) and variable values (the pattern must appear at least n times and at most m times). The following table lists the various repetition quantifiers:

? Zero or one occurrence (marks the occurrence as optional)
* Zero or more occurrences
+ One or more occurrences
{n} Exactly n occurrences
{n,m} At least n and at most m occurrences
{n,} At least n occurrences
{0,n} Zero to n occurrences (JavaScript does not support the {,n} shorthand and treats it as literal text)
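
The quantifiers can be tried out with a short sketch. Note one assumption: the ^ and $ anchors (start and end of input) are not covered in the text above and are used here only so that the whole string has to match. The variable names are illustrative.

```javascript
var exactlyFour = /^\d{4}$/;  // {n}
var twoToFour = /^\d{2,4}$/;  // {n,m}
var atLeastTwo = /^\d{2,}$/;  // {n,}
var zeroOrMoreA = /^ba*$/;    // *

console.log(exactlyFour.test('2015'));  // true
console.log(exactlyFour.test('20155')); // false
console.log(twoToFour.test('7'));       // false
console.log(twoToFour.test('123'));     // true
console.log(atLeastTwo.test('123456')); // true
console.log(zeroOrMoreA.test('b'));     // true (zero a's)
console.log(zeroOrMoreA.test('baaa'));  // true

// Without a lower bound, JavaScript (without the u flag) treats the
// braces as literal characters, so write {0,n} for "zero to n".
console.log(/^a{,2}$/.test('aa'));    // false
console.log(/^a{,2}$/.test('a{,2}')); // true
console.log(/^a{0,2}$/.test('aa'));   // true
```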

Read the /behaviou?r/ expression as 0 or 1 occurrences of the character u, so it matches both behavior and behaviour.

Here is another JavaScript example, this time using the + quantifier:

console.log(/'\d+'/.test("'123'")); // true

Read the /'\d+'/ expression as follows: ' is a literal character match, \d matches the characters [0-9], the + quantifier allows one or more occurrences, and the final ' is again a literal character match.

You can also group character expressions using ( ). Observe the following example:

var heartyLaugh = /Ha+(Ha+)+/i;
console.log(heartyLaugh.test("HaHaHaHaHaHaHaaaaaaaaaaa"));  //true
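
The i flag after the closing slash makes the match case-insensitive. The rest of the pattern reads as follows:

H literal character match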
a+ 1 or more occurrences of character a
( start of the expression group
H literal character match
a+ 1 or more occurrences of character a
) end of expression group
+ 1 or more occurrences of expression group (Ha+)
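
To see what the parentheses change, compare a quantifier applied to a single character with one applied to a group. This is a small sketch; the ^ and $ anchors (start and end of input) are not covered in the text above and are used only so that the whole string has to match:

```javascript
var stretchedA = /^Ha+$/;    // + applies only to the character a
var repeatedHa = /^(Ha)+$/;  // + applies to the whole group Ha

console.log(stretchedA.test('Haaa')); // true
console.log(stretchedA.test('HaHa')); // false
console.log(repeatedHa.test('HaHa')); // true
console.log(repeatedHa.test('Haaa')); // false
```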

Word Boundaries

Often, you want to match a sequence of letters or numbers on its own and not just as a substring. This is a fairly common use case when you are looking for whole words rather than fragments of longer words. We can specify word boundaries by using the \b pattern. \b matches a position where one side is a word character (letter, digit, or underscore) and the other side is not, which includes the start and end of the string. Consider the following JavaScript examples:

console.log(/cat/.test('a black cat')); //true
console.log(/\bcat/.test('a black cat')); //true
console.log(/\bcat/.test('tomcat')); //false
console.log(/cat\b/.test('tomcat')); //true
console.log(/\bcat\b/.test('a black cat')); //true
console.log(/\bcat\b/.test("concatenate")); //false
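
Word boundaries are especially handy when replacing text. The sketch below uses a hypothetical replaceCat helper, together with the g (global) flag, to replace only the standalone word:

```javascript
// Hypothetical helper: replace the standalone word "cat" but leave
// "tomcat" and "concatenate" untouched. The g flag replaces every match.
function replaceCat(sentence) {
  return sentence.replace(/\bcat\b/g, 'dog');
}

console.log(replaceCat('the cat sat near a tomcat'));
// 'the dog sat near a tomcat'
console.log(replaceCat('concatenate cat, cat!'));
// 'concatenate dog, dog!'
```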