An Introduction to Regular Expressions

A look at the notation used to describe patterns in text.

A regular expression — regex for short — is a sequence of characters that defines a search pattern. Rather than hunting for one exact string, you describe the shape of what you're looking for and let the pattern find every piece of text that fits. Here we'll walk through how that works and the syntax you'll run into most often. There's a regex testing tool on this site if you want to experiment alongside the explanation.

What a pattern describes

The key shift is this: instead of matching one fixed piece of text, a regular expression describes a whole set of strings that share a common structure. A pattern might stand for a run of digits, a word followed by a number, or text in some particular format. Apply it to a body of text and it pulls out the parts that fit the description. That's what makes regex so handy whenever you're dealing with text that follows a predictable shape.
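As a rough illustration of a pattern standing for a set of strings, here is a minimal sketch using Python's built-in `re` module. The choice of Python, the sample sentence, and the variable names are assumptions made for this example and are not part of the article.

```python
import re

# Made-up sample text for illustration.
text = "Order 66 shipped on day 12, order 7 is pending."

# \d+ describes "one or more digits", so it fits "66", "12" and "7" alike.
numbers = re.findall(r"\d+", text)

# A word followed by a space and a number.
word_then_number = re.findall(r"[A-Za-z]+ \d+", text)

print(numbers)           # ['66', '12', '7']
print(word_then_number)  # ['Order 66', 'day 12', 'order 7']
```

Neither pattern names any of the matched strings directly; each describes only their shape.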

Common syntax elements

Patterns are built from special characters, each with its own job. A period matches any single character (in most flavors, any character except a newline by default). Character classes, written in square brackets, match any one character from a set. Quantifiers say how many times the preceding element may repeat: a plus sign means one or more, an asterisk means zero or more, and a question mark means zero or one. Anchors pin a match to a position, such as the start or end of a line. And parentheses group parts of a pattern together, so that a quantifier can apply to the whole group. Combine these and you can describe a surprising amount with very little.
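The elements just described can be tried one at a time. The sketch below assumes Python's `re` module; the sample strings and variable names are invented for illustration, and other regex flavors may behave slightly differently.

```python
import re

# Period: any single character (by default in Python, anything except a newline).
dot_hits = re.findall(r"c.t", "cat cot cut ct")           # ['cat', 'cot', 'cut']

# Character class: exactly one character from the set in brackets.
class_hits = re.findall(r"gr[ae]y", "gray grey groy")     # ['gray', 'grey']

# Quantifiers apply to the element just before them.
plus_hits = re.findall(r"ab+", "a ab abbb")               # ['ab', 'abbb']
star_hits = re.findall(r"ab*", "a ab abbb")               # ['a', 'ab', 'abbb']
optional_hits = re.findall(r"colou?r", "color colour")    # ['color', 'colour']

# Anchors: ^ pins a match to the start, $ to the end.
starts_with_hello = re.search(r"^Hello", "Hello world") is not None  # True
ends_with_hello = re.search(r"Hello$", "Hello world") is not None    # False

# Parentheses group, so the + repeats the whole "ab" unit.
grouped = re.fullmatch(r"(ab)+", "ababab") is not None    # True
```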

Character classes and shorthands

Most regex systems give you shorthand notations for the groups of characters you reach for again and again: one that matches any digit (commonly written \d), another for any letter, digit, or underscore (\w), another for any whitespace (\s). They keep patterns short and readable. You can also match characters that aren't in a given set, so a pattern can spell out what to exclude just as easily as what to include.
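A short sketch of the shorthands and of a negated set, again assuming Python's `re` module. The notations `\d`, `\w`, `\s` and `[^...]` are the ones Python uses (most flavors share them), and the sample strings are made up.

```python
import re

# Made-up sample containing spaces, a tab, digits and an underscore.
sample = "Room 42, floor_3\tWing B"

digits = re.findall(r"\d", sample)    # any digit
words = re.findall(r"\w+", sample)    # runs of letters, digits or underscores
spaces = re.findall(r"\s", sample)    # any whitespace, including the tab

# A caret at the start of a class negates it: "anything that is NOT 0-9".
digits_only = re.sub(r"[^0-9]", "", "(555) 010-9999")

print(digits)       # ['4', '2', '3']
print(words)        # ['Room', '42', 'floor_3', 'Wing', 'B']
print(spaces)       # [' ', ' ', '\t', ' ']
print(digits_only)  # 5550109999
```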

Common uses

In practice, regular expressions show up wherever text needs checking or reshaping. They're used to confirm that an entry is in the expected format, like sanity-checking a field in a form. They search through and pull information out of larger bodies of text. And they replace whatever matches a pattern with something else. Text editors, programming languages, and command-line tools all tend to support them, so the skill carries across a lot of everyday work.
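The three uses named above (validating, extracting, replacing) might look like this in Python's `re` module. This is a sketch under assumptions: the five-digit field format, the hashtag-style input, and the function name `looks_like_five_digits` are all hypothetical examples, not formats the article prescribes.

```python
import re

def looks_like_five_digits(value: str) -> bool:
    """Validate: True only if the whole value is exactly five digits 0-9."""
    # {5} is a quantifier meaning "exactly five of the preceding element".
    return re.fullmatch(r"[0-9]{5}", value) is not None

# Extract: the parentheses capture just the word after each '#'.
tags = re.findall(r"#(\w+)", "Fix #bug in #parser")

# Replace: collapse every run of whitespace into a single space.
tidy = re.sub(r"\s+", " ", "too    many   spaces")

print(looks_like_five_digits("90210"))  # True
print(looks_like_five_digits("9021"))   # False
print(tags)                             # ['bug', 'parser']
print(tidy)                             # too many spaces
```

Using `re.fullmatch` for validation means the entire input has to fit the pattern, rather than merely containing a match somewhere inside it.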

A few things to keep in mind

Regular expressions are powerful, but a complicated pattern can turn hard to read and harder to maintain — including for the person who wrote it. Tools and languages also support slightly different flavors of the syntax, so a pattern that works in one place may need tweaking in another. Testing against sample text before you rely on it is a simple habit that saves a lot of trouble.
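One way to act on both points, sketched in Python: the `re.VERBOSE` flag lets a pattern be spread over several lines with comments, and a small table of samples checks it before use. The time format, the name `TIME`, and the sample lists are hypothetical, and `re.VERBOSE` is a feature of Python's flavor that other tools may lack or spell differently.

```python
import re

# Hypothetical pattern for 24-hour times such as 09:30 or 23:59.
# With re.VERBOSE, whitespace and '#' comments inside the pattern are ignored.
TIME = re.compile(
    r"""
    ([01]\d | 2[0-3])   # hours: 00-19 or 20-23
    :                   # literal colon
    ([0-5]\d)           # minutes: 00-59
    """,
    re.VERBOSE,
)

# Sample text the pattern should and should not accept.
should_match = ["00:00", "09:30", "23:59"]
should_not_match = ["24:00", "9:30", "12:60", "noon"]

accepted = [s for s in should_match if TIME.fullmatch(s)]
rejected = [s for s in should_not_match if not TIME.fullmatch(s)]

print(accepted == should_match)      # True
print(rejected == should_not_match)  # True
```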

Summary

A regular expression is a pattern that describes a set of strings, useful for finding, validating, extracting, or replacing text. Its syntax leans on special characters for matching, repetition, position, and grouping. Support is broad and the payoff is real, though complex patterns can get unwieldy and the exact syntax varies from one tool to the next.
