Regular Expressions

What Are Regular Expressions?

Regular expressions (commonly abbreviated as regex or regexp) are sequences of characters that define search patterns. They act as a compact pattern language for matching, finding, and manipulating text.

In simpler terms, they allow you to:

  • Search for specific patterns of text.
  • Extract data from strings.
  • Validate input formats (e.g., email addresses, phone numbers).
  • Perform search-and-replace operations.

Examples:

  • A regex like \d+ finds sequences of digits.
  • A regex like ^[a-z]+$ checks if a string contains only lowercase letters.
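The two patterns above can be tried out with a short script. This is a minimal sketch that assumes Python's standard re module; the sample strings and variable names are made up for illustration.

```python
import re

# \d+ finds every run of one or more digits.
digit_runs = re.findall(r"\d+", "Order 66 shipped 3 items in 2024")

# ^[a-z]+$ accepts a string only if it consists entirely of lowercase letters a-z.
lowercase_only = re.compile(r"^[a-z]+$")
is_lower_hello = lowercase_only.match("hello") is not None
is_lower_mixed = lowercase_only.match("Hello") is not None

print(digit_runs)      # ['66', '3', '2024']
print(is_lower_hello)  # True
print(is_lower_mixed)  # False
```

In Python specifically, $ also matches just before a trailing newline, so re.fullmatch may be a safer choice when validating whole strings.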

Who Invented Regular Expressions?

The concept of regular expressions was developed in the 1950s by American mathematician Stephen Cole Kleene. He introduced regular sets and the Kleene star (*), a mathematical notation used to describe patterns in strings, as part of his work on automata theory and formal languages.

Ken Thompson, a computer scientist, implemented regular expressions in the 1960s in the QED text editor and later carried them into early Unix tools such as ed and grep, making regex practical and accessible in computing.

List of Regex Syntax

Character Classes

Syntax    Description
[abc]     Matches any single character in the set (a, b, or c).
[^abc]    Matches any single character not in the set.
[a-z]     Matches any character in the range a to z.
\d        Matches any digit (equivalent to [0-9]).
\D        Matches any non-digit.
\w        Matches any word character (letters, digits, or _).
\W        Matches any non-word character.
\s        Matches any whitespace character (spaces, tabs, etc.).
\S        Matches any non-whitespace character.
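As a rough illustration of the character classes above, the following sketch uses Python's re module; the sample text and variable names are invented for the example.

```python
import re

sample = "Room 42b, floor_3\t(east wing)"

vowels = re.findall(r"[aeiou]", "regex")       # characters in the set
non_vowels = re.findall(r"[^aeiou]", "regex")  # characters not in the set
digits = re.findall(r"\d", sample)             # single digits
words = re.findall(r"\w+", sample)             # runs of word characters
spaces = re.findall(r"\s", sample)             # spaces and the tab

print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(digits)      # ['4', '2', '3']
print(words)       # ['Room', '42b', 'floor_3', 'east', 'wing']
print(spaces)      # [' ', ' ', '\t', ' ']
```

Note that in Python 3, \d, \w, and \s are Unicode-aware on text strings by default; passing re.ASCII restricts them to the ASCII ranges described in the table.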

Anchors

Syntax    Description
^         Matches the start of a string.
$         Matches the end of a string.
\b        Matches a word boundary.
\B        Matches a non-word boundary.
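The anchors can be seen in action with a small sketch, again assuming Python's re module and a made-up sample string. Anchors match positions rather than characters, so they never appear in the matched text themselves.

```python
import re

text = "cat concat catalog"

# \bcat\b matches "cat" only when it stands alone as a whole word.
whole_word = re.findall(r"\bcat\b", text)

# \Bcat matches "cat" only when it does not begin a word (here: inside "concat").
inside_word_start = re.search(r"\Bcat", text).start()

# ^ and $ pin the match to the start and end of the string.
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_log = re.search(r"log$", text) is not None

print(whole_word)         # ['cat']
print(inside_word_start)  # 7
print(starts_with_cat)    # True
print(ends_with_log)      # True
```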

Quantifiers

Syntax    Description
*         Matches 0 or more occurrences of the preceding element.
+         Matches 1 or more occurrences of the preceding element.
?         Matches 0 or 1 occurrence of the preceding element. Placed after another quantifier (*?, +?, ??, {n,m}?), it makes that quantifier non-greedy.
{n}       Matches exactly n occurrences of the preceding element.
{n,}      Matches n or more occurrences of the preceding element.
{n,m}     Matches between n and m occurrences of the preceding element.
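The following sketch, assuming Python's re module and invented sample strings, shows each quantifier in turn. It also contrasts greedy matching (the default) with the non-greedy form obtained by appending ? to another quantifier.

```python
import re

star = re.findall(r"ab*", "a ab abbb")             # b zero or more times
plus = re.findall(r"ab+", "a ab abbb")             # b one or more times
optional = re.findall(r"colou?r", "color colour")  # u zero or one time

numbers = "7 42 2024 123456"
exactly_four = re.findall(r"\b\d{4}\b", numbers)   # exactly 4 digits
two_or_more = re.findall(r"\b\d{2,}\b", numbers)   # 2 or more digits
one_or_two = re.findall(r"\b\d{1,2}\b", numbers)   # 1 to 2 digits

html = "<b>bold</b>"
greedy = re.findall(r"<.+>", html)   # .+ grabs as much as possible
lazy = re.findall(r"<.+?>", html)    # .+? grabs as little as possible

print(star)          # ['a', 'ab', 'abbb']
print(plus)          # ['ab', 'abbb']
print(optional)      # ['color', 'colour']
print(exactly_four)  # ['2024']
print(two_or_more)   # ['42', '2024', '123456']
print(one_or_two)    # ['7', '42']
print(greedy)        # ['<b>bold</b>']
print(lazy)          # ['<b>', '</b>']
```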

Grouping and Capturing

Syntax      Description
(abc)       Matches and captures abc as a group.
(?:abc)     Matches abc without capturing it (non-capturing group).
(?=abc)     Positive lookahead (asserts that abc follows without consuming it).
(?!abc)     Negative lookahead (asserts that abc does not follow).
(?<=abc)    Positive lookbehind (asserts that abc precedes).
(?<!abc)    Negative lookbehind (asserts that abc does not precede).
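A short sketch of groups and lookarounds, assuming Python's re module; the sample strings are hypothetical. Lookarounds check the surrounding text without including it in the match, which is why "px" and "$" are absent from the results below.

```python
import re

# Capturing groups: each pair of parentheses becomes a numbered group.
date_parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released 2024-03-15").groups()

# Non-capturing group: (?:Mr|Ms) groups the alternatives but is not captured.
surname = re.search(r"(?:Mr|Ms)\. (\w+)", "Ms. Lovelace").groups()

# Positive lookahead: digits that are followed by "px".
pixel_values = re.findall(r"\d+(?=px)", "12px 5em 300px")

# Negative lookahead: words starting with "foo" where "bar" does not follow.
not_foobar = re.findall(r"foo(?!bar)\w*", "foobar foobaz")

prices = "cost $40 or 35 EUR"
# Positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)
# Negative lookbehind: whole numbers not preceded by a dollar sign.
no_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(date_parts)    # ('2024', '03', '15')
print(surname)       # ('Lovelace',)
print(pixel_values)  # ['12', '300']
print(not_foobar)    # ['foobaz']
print(after_dollar)  # ['40']
print(no_dollar)     # ['35']
```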

Flags

Flag    Description
i       Case-insensitive matching.
g       Global search (matches all occurrences).
m       Multiline mode (^ and $ match the start/end of each line).
s       Dotall mode (. also matches newlines).
x       Extended mode (ignores whitespace in the pattern and allows comments, for readability).
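How flags are written depends on the regex flavor. As one possible mapping, the sketch below assumes Python's re module, where i, m, s, and x correspond to re.IGNORECASE, re.MULTILINE, re.DOTALL, and re.VERBOSE. Python has no g flag; calling re.findall instead of re.search plays a similar role. The sample strings are invented.

```python
import re

# i: case-insensitive matching
any_case = re.findall(r"python", "Python PYTHON python", flags=re.IGNORECASE)

lines = "alpha\nbeta\ngamma"
# m: without MULTILINE, ^ and $ only anchor to the whole string
single_line = re.findall(r"^\w+$", lines)
multi_line = re.findall(r"^\w+$", lines, flags=re.MULTILINE)

# s: with DOTALL, the dot also matches the newline between the words
without_dotall = re.findall(r"alpha.beta", lines)
with_dotall = re.findall(r"alpha.beta", lines, flags=re.DOTALL)

# x: VERBOSE ignores pattern whitespace and allows comments
phone = re.compile(r"""
    \d{3}   # first three digits
    -       # literal hyphen
    \d{4}   # last four digits
""", re.VERBOSE)
verbose_match = phone.search("call 555-0199").group()

# g: first match only versus all matches
first_only = re.search(r"\d+", "1 22 333").group()
all_matches = re.findall(r"\d+", "1 22 333")

print(any_case)        # ['Python', 'PYTHON', 'python']
print(single_line)     # []
print(multi_line)      # ['alpha', 'beta', 'gamma']
print(without_dotall)  # []
print(with_dotall)     # ['alpha\nbeta']
print(verbose_match)   # 555-0199
print(first_only)      # 1
print(all_matches)     # ['1', '22', '333']
```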

Example

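Example Regex (PCRE-style syntax; support for named groups, inline flags, and inline comments varies between regex flavors):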

(?<name>[A-Za-z]+)\s(?<age>\d{1,3})\syears\sold,\semail:\s(?i)([a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4})(?-i)\s\((?#flags)\d{3}-\d{3}-\d{4}\)

Sample Input:

John 28 years old, email: john.doe123@example.com (123-456-7890)

Explanation

The regex starts by capturing a name using [A-Za-z]+ in a named group called name.

It matches a single whitespace character (\s) and then captures an age of one to three digits using \d{1,3} in the age group.

It continues by matching the literal text " years old, email: ", with each space written as \s.

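Next, it captures an email address in an unnamed capturing group:

  • The (?i) flag turns on case-insensitive matching for the email address.
  • [a-z\d._%+-]+ matches one or more characters allowed in the part before the @.
  • @ matches the literal @ symbol, [a-z\d.-]+ matches the domain, and \.[a-z]{2,4} matches a dot followed by a top-level domain of 2 to 4 letters.
  • The (?-i) flag turns case-insensitivity back off.

Finally, it matches (but does not capture) a phone number in the format (123-456-7890). Here \( and \) match literal parentheses, (?#flags) is an inline comment that the engine ignores, and \d{3}-\d{3}-\d{4} matches the digits and hyphens.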
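To run the example, the pattern needs small adjustments in some regex flavors. The sketch below is one possible adaptation for Python's re module, not the original pattern verbatim: Python writes named groups as (?P<name>...), and the (?i) ... (?-i) pair is replaced by the scoped form (?i:...), since Python does not allow switching a flag off mid-pattern. The constant name PATTERN and the comment text (?#phone) are choices made for this illustration.

```python
import re

# Python adaptation of the example regex (see the assumptions noted above).
PATTERN = re.compile(
    r"(?P<name>[A-Za-z]+)\s(?P<age>\d{1,3})\syears\sold,\semail:\s"
    r"((?i:[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}))"  # email, case-insensitive
    r"\s\((?#phone)\d{3}-\d{3}-\d{4}\)"             # phone, matched but not captured
)

sample = "John 28 years old, email: john.doe123@example.com (123-456-7890)"
match = PATTERN.search(sample)

name = match.group("name")
age = match.group("age")
email = match.group(3)   # third capturing group: the unnamed email group
whole = match.group(0)   # the entire matched text

print(name)   # John
print(age)    # 28
print(email)  # john.doe123@example.com
```

Because the phone number sits outside any capturing group, it is only available as part of the full match (group 0); wrapping \d{3}-\d{3}-\d{4} in its own parentheses would capture it.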