Regular Expressions

What Are Regular Expressions?

A regular expression (commonly abbreviated as regex or regexp) is a sequence of characters that defines a search pattern. A regex engine uses the pattern to match, find, and manipulate text.

In simpler terms, they allow you to:

Search for specific patterns of text.
Extract data from strings.
Validate input formats (e.g., email addresses, phone numbers).
Perform search-and-replace operations.

Examples:

A regex like \d+ matches any run of one or more consecutive digits.
A regex like ^[a-z]+$ checks whether an entire string consists of one or more lowercase letters (a to z) and nothing else.
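
As a rough illustration of these uses, here is a minimal sketch with Python's built-in re module. The sample text and variable names are made up for the example, and other languages expose the same operations under different function names.

```python
import re

text = "Order 66 shipped 12 items on day 7"

# Extract: \d+ matches each run of one or more digits.
numbers = re.findall(r"\d+", text)

# Validate: ^[a-z]+$ succeeds only if the whole string is lowercase ASCII letters.
lowercase = re.compile(r"^[a-z]+$")
hello_ok = lowercase.search("hello") is not None
mixed_ok = lowercase.search("Hello42") is not None

# Search and replace: swap every digit run for a placeholder.
masked = re.sub(r"\d+", "#", text)

print(numbers)             # ['66', '12', '7']
print(hello_ok, mixed_ok)  # True False
print(masked)              # Order # shipped # items on day #
```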

Who Invented Regular Expressions?

The concept of regular expressions was developed in the 1950s by American mathematician Stephen Cole Kleene. As part of his work on automata theory and formal languages, he introduced regular sets and the Kleene star (*), an operation that denotes zero or more repetitions of a pattern.

Ken Thompson, a computer scientist, implemented regular expressions in the 1960s in the QED text editor and later carried them into Unix tools such as ed and grep, making regex practical and accessible in everyday computing.

List of Regex Syntax

Character Classes

Syntax	Description
`[abc]`	Matches any single character in the set (a, b, or c).
`[^abc]`	Matches any single character not in the set.
`[a-z]`	Matches any single character in the range a to z.
`\d`	Matches any digit (equivalent to `[0-9]` in ASCII mode; Unicode-aware engines may also match other digit characters).
`\D`	Matches any non-digit.
`\w`	Matches any word character (letters, digits, or `_`).
`\W`	Matches any non-word character.
`\s`	Matches any whitespace character (spaces, tabs, etc.).
`\S`	Matches any non-whitespace character.
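
The classes in the table can be tried out with a short Python sketch. The sample string is invented for illustration. Note that with ordinary Python strings, \d and \w are Unicode-aware unless the re.ASCII flag is passed.

```python
import re

sample = "Room 42_b: open\t9am"

vowels = re.findall(r"[aeiou]", sample)           # characters in a set
not_lower_or_ws = re.findall(r"[^a-z\s]", sample) # negated set
digits = re.findall(r"\d", sample)                # digits
words = re.findall(r"\w+", sample)                # runs of word characters
non_word = re.findall(r"\W", sample)              # anything that is not a word character
spaces = re.findall(r"\s", sample)                # whitespace, including the tab

print(vowels)           # ['o', 'o', 'o', 'e', 'a']
print(not_lower_or_ws)  # ['R', '4', '2', '_', ':', '9']
print(digits)           # ['4', '2', '9']
print(words)            # ['Room', '42_b', 'open', '9am']
print(non_word)         # [' ', ':', ' ', '\t']
print(spaces)           # [' ', ' ', '\t']
```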

Anchors

Syntax	Description
`^`	Matches the start of a string.
`$`	Matches the end of a string.
`\b`	Matches a word boundary.
`\B`	Matches a non-word boundary.
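
Anchors match positions rather than characters, which is easiest to see by counting matches. The following Python sketch uses a made-up sentence and assumes the default (non-multiline) behavior of ^ and $.

```python
import re

line = "cat concatenate cat's bobcat"

# \b requires a word boundary on each side, so only standalone "cat" matches.
whole_words = re.findall(r"\bcat\b", line)

# \B requires the absence of a boundary, so "cat" must sit inside a longer word.
inside_words = re.findall(r"\Bcat\B", line)

# ^ and $ pin the match to the start and end of the string.
starts_with_cat = re.search(r"^cat", line) is not None
ends_with_bobcat = re.search(r"bobcat$", line) is not None
starts_with_bobcat = re.search(r"^bobcat", line) is not None

print(len(whole_words))    # 2
print(len(inside_words))   # 1
print(starts_with_cat, ends_with_bobcat, starts_with_bobcat)  # True True False
```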

Quantifiers

Syntax	Description
`*`	Matches 0 or more occurrences of the preceding element.
`+`	Matches 1 or more occurrences of the preceding element.
`?`	Matches 0 or 1 occurrence of the preceding element. Placed after another quantifier (`*?`, `+?`, `??`), it makes that quantifier lazy (non-greedy).
`{n}`	Matches exactly n occurrences of the preceding element.
`{n,}`	Matches n or more occurrences of the preceding element.
`{n,m}`	Matches between n and m occurrences of the preceding element.
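
The quantifiers above, including the difference between greedy and lazy matching, can be sketched in Python as follows. The input strings are invented for the example.

```python
import re

# * allows zero repetitions of "b"; + requires at least one.
star = re.findall(r"ab*", "a ab abb")
plus = re.findall(r"ab+", "a ab abb")

# ? makes the preceding element optional.
optional_u = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colouur")]

# Counted repetition.
exactly_3 = re.findall(r"\d{3}", "12 345 6789")
at_least_2 = re.findall(r"\d{2,}", "12 345 6789")
two_to_three = re.findall(r"\d{2,3}", "12 345 6789")

# Greedy .+ grabs as much as possible; lazy .+? stops at the first closing tag.
html = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.+</b>", html)
lazy = re.findall(r"<b>.+?</b>", html)

print(star, plus)     # ['a', 'ab', 'abb'] ['ab', 'abb']
print(optional_u)     # [True, True, False]
print(exactly_3)      # ['345', '678']
print(at_least_2)     # ['12', '345', '6789']
print(two_to_three)   # ['12', '345', '678']
print(greedy)         # ['<b>one</b><b>two</b>']
print(lazy)           # ['<b>one</b>', '<b>two</b>']
```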

Groups and Capturing

Syntax	Description
`(abc)`	Matches and captures `abc` as a group.
`(?:abc)`	Matches `abc` without capturing it (non-capturing group).
`(?=abc)`	Positive lookahead (asserts that `abc` immediately follows the current position, without consuming it).
`(?!abc)`	Negative lookahead (asserts that `abc` does not immediately follow).
`(?<=abc)`	Positive lookbehind (asserts that `abc` immediately precedes the current position, without consuming it).
`(?<!abc)`	Negative lookbehind (asserts that `abc` does not immediately precede).
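
To show how groups and lookarounds affect what a match returns, here is a small Python sketch. The strings are invented for the example. Python's re module requires lookbehind patterns to have a fixed width, which the patterns below satisfy.

```python
import re

# Capturing groups are returned separately; non-capturing groups are not.
captured = re.search(r"(\d{4})-(\d{2})", "released 2024-05").groups()
partly = re.search(r"(?:\d{4})-(\d{2})", "released 2024-05").groups()

# Positive lookahead: digits that are followed by "px" (the "px" is not consumed).
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: words that do not start with "un".
not_un = re.findall(r"\b(?!un)\w+", "happy unhappy kind unkind")

# Positive lookbehind: digits preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $40, was 55 or $60")

# Negative lookbehind: whole numbers not preceded by a dollar sign.
plain = re.findall(r"(?<!\$)\b\d+", "cost $40, was 55 or $60")

print(captured)   # ('2024', '05')
print(partly)     # ('05',)
print(px_values)  # ['10', '30']
print(not_un)     # ['happy', 'kind']
print(prices)     # ['40', '60']
print(plain)      # ['55']
```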

Flags

Flag	Description
`i`	Case-insensitive matching.
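`g`	Global search (finds all matches instead of stopping at the first). Available as a flag in engines such as JavaScript and Perl; others expose the same behavior through separate functions.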
`m`	Multiline mode (`^` and `$` match the start/end of lines).
`s`	Dotall mode (`.` matches newlines).
`x`	Extended mode (ignores unescaped whitespace in the pattern and allows comments, for readability).
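
How flags are written depends on the engine. As one possible illustration, Python passes them as constants from the re module and has no g flag; re.findall and re.finditer return all matches instead. The sample text below is made up.

```python
import re

text = "Alpha\nbeta\nALPHA"

# i: case-insensitive matching.
any_case = re.findall(r"alpha", text, flags=re.IGNORECASE)

# m: without it, ^ and $ only anchor to the whole string; with it, to each line.
whole_string_only = re.findall(r"^\w+$", text)
per_line = re.findall(r"^\w+$", text, flags=re.MULTILINE)

# s: by default "." does not match a newline; DOTALL lets it.
dot_default = re.search(r"Alpha.beta", text) is not None
dot_all = re.search(r"Alpha.beta", text, flags=re.DOTALL) is not None

# x: whitespace and comments inside the pattern are ignored.
phone = re.compile(r"""
    \d{3}   # first three digits
    -       # separator
    \d{4}   # last four digits
""", re.VERBOSE)
found = phone.search("call 555-0199").group()

print(any_case)              # ['Alpha', 'ALPHA']
print(whole_string_only)     # []
print(per_line)              # ['Alpha', 'beta', 'ALPHA']
print(dot_default, dot_all)  # False True
print(found)                 # 555-0199
```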

Example

Example Regex (PCRE-style syntax; support for named groups and inline flags varies between engines):

(?<name>[A-Za-z]+)\s(?<age>\d{1,3})\syears\sold,\semail:\s(?i)([a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4})(?-i)\s\((?#flags)\d{3}-\d{3}-\d{4}\)

Sample Input:

John 28 years old, email: john.doe123@example.com (123-456-7890)

Explanation

The regex starts by capturing a name using [A-Za-z]+ in a named group called name.

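It matches a single whitespace character (\s) and then captures an age of one to three digits using \d{1,3} in the age group.

It continues by matching the literal text "years old, email:", with \s standing in for each space.

Next, it captures an email address in an unnamed group (group 3):

The (?i) flag turns on case-insensitive matching for the email address.
[a-z\d._%+-]+ matches one or more characters allowed in the part before the @.
@ matches the literal @ symbol, [a-z\d.-]+ matches the domain, and \.[a-z]{2,4} matches a dot followed by a top-level domain of two to four letters.
The (?-i) flag turns case-insensitivity back off.

Finally, it matches (but does not capture) a phone number in the format (123-456-7890): \( and \) match the literal parentheses, (?#flags) is an inline comment that matches nothing, and \d{3}-\d{3}-\d{4} matches the digits and hyphens.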
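
The example regex does not run unchanged in every engine. The sketch below is one possible adaptation to Python's re module, under two assumptions that are not part of the original pattern: named groups are written (?P<name>...) instead of (?<name>...), and the (?i)...(?-i) pair is replaced by the scoped group (?i:...), since Python does not allow turning a flag on or off in the middle of a pattern.

```python
import re

# Python adaptation of the example regex (assumed rewrite, see the notes above):
#   (?<name>...)      -> (?P<name>...)
#   (?i)(...)(?-i)    -> (?i:(...))
PATTERN = re.compile(
    r"(?P<name>[A-Za-z]+)\s(?P<age>\d{1,3})\syears\sold,\semail:\s"
    r"(?i:([a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}))"
    r"\s\((?#flags)\d{3}-\d{3}-\d{4}\)"
)

sample = "John 28 years old, email: john.doe123@example.com (123-456-7890)"
match = PATTERN.search(sample)

name = match.group("name")   # named group
age = int(match.group("age"))
email = match.group(3)       # the email is the third (unnamed) capturing group

# Case-insensitivity applies only inside the email group.
upper_email = PATTERN.search(
    "John 28 years old, email: John.Doe123@Example.COM (123-456-7890)"
)
upper_years = PATTERN.search(
    "John 28 YEARS old, email: john.doe123@example.com (123-456-7890)"
)

print(name, age, email)         # John 28 john.doe123@example.com
print(upper_email is not None)  # True
print(upper_years is not None)  # False
```

The phone number is matched but not returned as a group; wrapping \d{3}-\d{3}-\d{4} in its own parentheses would be needed to extract it.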
