""
Syntax Element | Description | Example |
---|---|---|
Literals | Match exact characters | `a`, `Z`, `9` |
Metacharacters | Special characters with regex meaning | `.` `^` `$` `*` `+` `?` `{` `}` `[` `]` `\` `\|` `(` `)` |
Escaping | Use backslash `\` to treat metacharacters as literals | `\.` matches a dot `.` |
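
Escaping has one extra wrinkle in Java source code: the backslash is also the escape character of string literals, so the regex `\.` is written `"\\."`. The following is a minimal sketch of that rule, plus `Pattern.quote` for text that should be matched literally. The class and method names are illustrative and not part of the original text.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // The regex \w+\.\w+ needs doubled backslashes inside a Java string literal.
    static boolean isDottedPair(String s) {
        return Pattern.matches("\\w+\\.\\w+", s);
    }

    // Pattern.quote turns arbitrary text into a pattern that matches it literally,
    // so metacharacters such as $ and ( lose their special meaning.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isDottedPair("file.txt"));                      // true
        System.out.println(isDottedPair("fileXtxt"));                      // false
        System.out.println(containsLiteral("cost: $5 (approx.)", "$5 (")); // true
    }
}
```

`Pattern.quote` is usually the safer choice when the literal text comes from user input, since hand-escaping is easy to get wrong.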

Syntax | Description | Example |
---|---|---|
`*` | 0 or more | `a*` matches the empty string, `a`, `aa` |
`+` | 1 or more | `a+` matches `a`, `aa` |
`?` | 0 or 1 (optional) | `a?` matches the empty string or `a` |
`{n}` | Exactly n times | `a{3}` matches `aaa` |
`{n,}` | At least n times | `a{2,}` matches `aa`, `aaa` |
`{n,m}` | Between n and m times | `a{2,4}` matches `aa`, `aaa`, `aaaa` |
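
The quantifier rows can be checked directly with `Pattern.matches`, which succeeds only when the whole input fits the pattern. This is a small sketch; the helper name `fits` is an assumption made for illustration.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only if the entire input matches the regex.
    static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fits("a*", ""));          // true: zero repetitions are allowed
        System.out.println(fits("a+", ""));          // false: at least one a is required
        System.out.println(fits("a?", "aa"));        // false: at most one a is allowed
        System.out.println(fits("a{3}", "aaa"));     // true
        System.out.println(fits("a{2,}", "a"));      // false: fewer than two
        System.out.println(fits("a{2,4}", "aaaaa")); // false: more than four
    }
}
```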

Syntax | Description | Example |
---|---|---|
`( ... )` | Capturing group | `(abc)` matches `abc` |
`(?: ... )` | Non-capturing group | `(?:abc)` groups without capturing |
`(?<name> ... )` | Named capturing group (Java 7+) | `(?<year>\d{4})` |
`\n` | Backreference to nth group | `\1` refers to first group |
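
As a rough illustration of the grouping constructs, the sketch below reads a named group with `Matcher.group(String)` and uses the backreference `\1` to detect a repeated word. The class name, method names, and sample inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Two named capturing groups and one non-capturing group for the day.
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?:\\d{2})");

    // \1 must repeat exactly the text that group 1 captured.
    private static final Pattern DOUBLED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    static String yearOf(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group("year") : null;
    }

    static boolean hasDoubledWord(String text) {
        return DOUBLED_WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(yearOf("Released 2024-03-15"));    // 2024
        System.out.println(hasDoubledWord("it is is fine"));  // true
        System.out.println(hasDoubledWord("this is fine"));   // false
    }
}
```

The non-capturing group keeps the day out of the numbered groups, so `DATE` exposes only two groups.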

Syntax | Description | Example |
---|---|---|
`^` | Start of input (or line in multiline mode) | `^abc` matches `abc` at start |
`$` | End of input (or line in multiline mode) | `xyz$` matches `xyz` at end |
`\b` | Word boundary | `\bword\b` matches `word` as whole word |
`\B` | Non-word boundary | `\Bend\B` matches `end` within a word |
`(?= ... )` | Positive lookahead | `a(?=b)` matches `a` if followed by `b` |
`(?! ... )` | Negative lookahead | `a(?!b)` matches `a` if not followed by `b` |
`(?<= ... )` | Positive lookbehind | `(?<=a)b` matches `b` if preceded by `a` |
`(?<! ... )` | Negative lookbehind | `(?<!a)b` matches `b` if not preceded by `a` |
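
Boundaries and lookarounds constrain where a match may occur without becoming part of the matched text. The sketch below shows that for a lookbehind, a word boundary, and a negative lookahead. All names and inputs are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // (?<=\$)\d+ : digits preceded by a dollar sign; the sign itself is not returned.
    static String priceDigits(String text) {
        Matcher m = Pattern.compile("(?<=\\$)\\d+").matcher(text);
        return m.find() ? m.group() : null;
    }

    // \b on both sides keeps the word from matching inside a longer word.
    static int countWholeWord(String text, String word) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // a(?!b) : replaces each "a" that is not directly followed by "b".
    static String maskUnlessFollowedByB(String text) {
        return text.replaceAll("a(?!b)", "_");
    }

    public static void main(String[] args) {
        System.out.println(priceDigits("Total: $42 for 3 items"));       // 42
        System.out.println(countWholeWord("cat concatenate cat.", "cat")); // 2
        System.out.println(maskUnlessFollowedByB("abacab"));             // ab_cab
    }
}
```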

Syntax | Description | Example |
---|---|---|
`[abc]` | Any character a, b, or c | `[aeiou]` vowels |
`[a-z]` | Any character in the range a to z | `[0-9]` digits |
`[^abc]` | Negated class, any char except a, b, or c | `[^0-9]` non-digit |
`\d` | Digit (equivalent to `[0-9]`) | `\d{3}` matches three digits |
`\D` | Non-digit | `\D+` matches non-digit chars |
`\w` | Word character (`[a-zA-Z0-9_]`: letters, digits, underscore) | `\w+` matches words |
`\W` | Non-word character | `\W+` matches punctuation, spaces |
`\s` | Whitespace (spaces, tabs, line breaks) | `\s*` matches optional spaces |
`\S` | Non-whitespace | `\S+` matches non-space chars |
`\p{Lower}` | Lowercase letter; ASCII only (`[a-z]`) unless `UNICODE_CHARACTER_CLASS` is set | Matches `a`; use `\p{Ll}` to also match `β` |
`\p{Upper}` | Uppercase letter; ASCII only (`[A-Z]`) unless `UNICODE_CHARACTER_CLASS` is set | Matches `A`; use `\p{Lu}` to also match `Γ` |
`\p{IsGreek}` | Unicode Greek script characters | Matches Greek letters |
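
The POSIX-style classes `\p{Lower}` and `\p{Upper}` are ASCII-only in Java unless the `Pattern.UNICODE_CHARACTER_CLASS` flag is set, which is easy to overlook. The sketch below contrasts them with the Unicode category `\p{Ll}` and the script class `\p{IsGreek}`. The helper names are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Whole-input match with default flags.
    static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Whole-input match with Unicode versions of the predefined and POSIX classes.
    static boolean fitsUnicode(String regex, String input) {
        return Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        String beta = "\u03B2"; // Greek small letter beta

        System.out.println(fits("\\p{Lower}", "a"));         // true
        System.out.println(fits("\\p{Lower}", beta));        // false: ASCII-only by default
        System.out.println(fitsUnicode("\\p{Lower}", beta)); // true
        System.out.println(fits("\\p{Ll}", beta));           // true: Unicode lowercase letter
        System.out.println(fits("\\p{IsGreek}", beta));      // true: Greek script
    }
}
```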

Flag | Meaning | Usage example |
---|---|---|
`(?i)` | Case-insensitive matching | `(?i)abc` matches `ABC` |
`(?m)` | Multiline mode (`^` and `$` match line start/end) | `(?m)^abc` |
`(?s)` | Dotall mode (dot `.` matches line breaks) | `(?s).+` |
`(?x)` | Ignore whitespace and allow comments | `(?x) a \s+ b` |
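
Each inline flag has an equivalent constant that can be passed to `Pattern.compile`. The following sketch exercises the four flags from the table; the class name, method names, and inputs are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Inline flag and compile-time constant behave the same way.
    static boolean inlineIgnoreCase(String s) {
        return Pattern.matches("(?i)abc", s);
    }

    static boolean constantIgnoreCase(String s) {
        return Pattern.compile("abc", Pattern.CASE_INSENSITIVE).matcher(s).matches();
    }

    // (?m) lets ^ match at the start of every line, not just the start of input.
    static int countLinesStartingWith(String prefix, String text) {
        Matcher m = Pattern.compile("(?m)^" + Pattern.quote(prefix)).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // (?s) lets the dot cross line terminators.
    static boolean dotallSpansLines(String text) {
        return Pattern.matches("(?s)start.+end", text);
    }

    // (?x) ignores whitespace in the pattern and allows # comments.
    static boolean commentedPattern(String s) {
        return Pattern.matches("(?x) a \\s+ b  # a, then whitespace, then b", s);
    }

    public static void main(String[] args) {
        System.out.println(inlineIgnoreCase("ABC"));                          // true
        System.out.println(constantIgnoreCase("AbC"));                        // true
        System.out.println(countLinesStartingWith("abc", "abc\nxyz\nabcdef")); // 2
        System.out.println(dotallSpansLines("start\nmiddle\nend"));           // true
        System.out.println(commentedPattern("a   b"));                        // true
    }
}
```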

This cheat sheet summarizes the core regex elements for Java pattern matching. Complex patterns are combinations of these elements, and keeping each combination as simple as possible helps the regex stay clear, maintainable, and efficient.

Java’s regex functionality is primarily provided by two core classes in the `java.util.regex` package: `Pattern` and `Matcher`. Here’s a concise overview of these classes and their most important methods to help you work efficiently with regex in Java.

`Pattern` represents a compiled regular expression.

- Created using the static factory method: `Pattern pattern = Pattern.compile(String regex);`
- Supports optional flags to modify matching behavior:
  - `Pattern.CASE_INSENSITIVE`: Case-insensitive matching.
  - `Pattern.MULTILINE`: Changes `^` and `$` to match start/end of lines.
  - `Pattern.DOTALL`: Makes `.` match line terminators.
  - `Pattern.UNICODE_CASE`: Enables Unicode-aware case folding when combined with `CASE_INSENSITIVE`.
- Common methods:
  - `matcher(CharSequence input)`: Creates a `Matcher` to apply the pattern to the input.
  - `split(CharSequence input)`: Splits the input around matches.
  - `pattern()`: Returns the regex string.

`Matcher` applies a compiled `Pattern` to a specific input sequence.

- Created via the `Pattern.matcher()` method.
- Core methods:
  - `find()`: Searches for the next subsequence matching the pattern.
  - `matches()`: Attempts to match the entire input against the pattern.
  - `lookingAt()`: Attempts to match the input’s beginning.
  - `group()`: Returns the entire matched substring.
  - `group(int group)`: Returns a specific capturing group.
  - `start()` and `end()`: Indicate start and end positions of the last match.
  - `replaceAll(String replacement)`: Replaces all matches with the replacement string.
  - `replaceFirst(String replacement)`: Replaces the first match.
  - `reset()`: Resets the matcher state so the same input can be scanned again; `reset(CharSequence input)` switches to a different input.

Simple matching:
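
    // Find the first run of digits in the input.
    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher("Order 1234");
    if (m.find()) {
        System.out.println("Found number: " + m.group()); // prints "Found number: 1234"
    }

Replacing all occurrences:

    // String.replaceAll compiles the regex on each call; input is any String.
    String cleaned = input.replaceAll("\\s+", " ");

Splitting with regex:

    // pattern is a compiled Pattern; the input is split around each match.
    String[] parts = pattern.split(input);
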
Using `Pattern` and `Matcher` correctly, such as compiling a pattern once and reusing it, improves performance and readability. Flags let you tailor matching to your needs, while the rich set of methods helps perform extraction, validation, and transformation tasks smoothly in Java applications.
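
To tie the methods together, here is a self-contained sketch that compiles two patterns once as constants and reuses them for extraction, splitting, replacement, and validation. The class name, method names, and sample inputs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern NUMBER = Pattern.compile("\\d+");
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Collects every match together with its start offset.
    static List<String> allNumbers(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            found.add(m.group() + "@" + m.start());
        }
        return found;
    }

    static String[] fields(String csvLine) {
        return COMMA.split(csvLine);
    }

    static String maskNumbers(String input) {
        return NUMBER.matcher(input).replaceAll("#");
    }

    // lookingAt() anchors only at the start; matches() requires the whole input.
    static boolean startsWithNumber(String s) {
        return NUMBER.matcher(s).lookingAt();
    }

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(allNumbers("Order 1234, item 56"));  // [1234@6, 56@17]
        System.out.println(String.join("|", fields("a , b,c"))); // a|b|c
        System.out.println(maskNumbers("Order 1234, item 56")); // Order #, item #
        System.out.println(startsWithNumber("12ab"));           // true
        System.out.println(isNumber("12ab"));                   // false
    }
}
```

Unlike `Pattern`, a `Matcher` holds mutable state, so each thread should create its own from the shared `Pattern`.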
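
Here is a curated collection of frequently used regex patterns to help you quickly handle common validation and extraction tasks in Java. Each pattern is followed by a brief note on what it accepts.

- Email address: `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`
  - Requires a local part, `@`, a domain, and a top-level domain of at least two letters.
- Phone number (10 digits): `^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$`
  - Allows optional parentheses around the first three digits; separators may be a hyphen `-`, dot `.`, or space.
  - Matches `(123) 456-7890`, `123-456-7890`, `123.456.7890`.
- IPv4 address: `\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b`
  - Four dot-separated octets, each in the range 0 to 255; the word boundaries allow it to be used for searching inside longer text.
- Date (YYYY-MM-DD): `^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$`
  - Checks the month (01 to 12) and day (01 to 31) ranges, but not the number of days in a given month.
- URL: `^(https?://)?([\w.-]+)\.([a-z]{2,6})([/\w .-]*)$`
  - The scheme is optional and may be `http` or `https`. The path part uses a single `*`, because nesting one quantifier inside another invites excessive backtracking.
- Strong password: `^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$`
  - Requires at least eight characters, including a lowercase letter, an uppercase letter, a digit, and one of `@$!%*?&`.
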
This library provides solid starting points for many common regex tasks in Java projects.
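
One possible way to use such patterns is to compile them once as constants and expose small validation helpers. The sketch below does this for the email, phone, date, and password patterns from the list; the class and method names are hypothetical, and the regexes are copied from the list with backslashes doubled for Java string literals.

```java
import java.util.regex.Pattern;

class ValidatorDemo {
    private static final Pattern EMAIL =
            Pattern.compile("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
    private static final Pattern PHONE =
            Pattern.compile("^\\(?\\d{3}\\)?[-.\\s]?\\d{3}[-.\\s]?\\d{4}$");
    private static final Pattern ISO_DATE =
            Pattern.compile("^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])$");
    private static final Pattern PASSWORD = Pattern.compile(
            "^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&])[A-Za-z\\d@$!%*?&]{8,}$");

    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    // Range check only: the regex cannot tell that February has no 30th day.
    static boolean isIsoDate(String s) {
        return ISO_DATE.matcher(s).matches();
    }

    static boolean isStrongPassword(String s) {
        return PASSWORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmail("user@example.com"));  // true
        System.out.println(isEmail("user@example"));      // false
        System.out.println(isPhone("(123) 456-7890"));    // true
        System.out.println(isIsoDate("2024-13-01"));      // false
        System.out.println(isIsoDate("2024-02-30"));      // true, despite being an invalid date
        System.out.println(isStrongPassword("Passw0rd!")); // true
    }
}
```

For dates in particular, a regex check is best treated as a format filter, with a real date parser such as `java.time.LocalDate.parse` doing the final validation.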

Quantifiers: Symbols that specify how many times a pattern should repeat. Examples: `*` (0 or more), `+` (1 or more), `?` (0 or 1), `{n,m}` (between n and m times).

Capturing Groups: Parentheses `()` that group part of a regex and save the matched text for reuse or extraction. For example, `(abc)` matches "abc" and stores it as group 1.

Named Capturing Groups: Groups given names for clearer access, like `(?<name>pattern)`, accessed by name instead of number.

Lookahead Assertions: Zero-width checks that assert what follows the current position without consuming characters.

- `(?=...)` requires the pattern to follow.
- `(?!...)` requires the pattern not to follow.

Lookbehind Assertions: Similar to lookahead but check the text before the current position.

- `(?<=...)` asserts what precedes.
- `(?<!...)` asserts what does not precede.

Backtracking: The process where the regex engine returns to an earlier decision point, such as how many characters a quantifier consumed, and tries an alternative path when a match fails. Excessive backtracking can cause performance issues.

Greedy vs. Reluctant Matching:

- Greedy quantifiers (`*`, `+`, `?`) match as much as possible.
- Reluctant quantifiers (`*?`, `+?`, `??`) match as little as possible.

Possessive Quantifiers: Quantifiers like `*+` or `++` that match as much as possible without backtracking, improving performance but potentially missing some matches.

Atomic Groups: Subpatterns marked `(?>...)` that prevent backtracking into the group once it has matched, which can speed up complex regexes.
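
The greedy, reluctant, possessive, and atomic behaviours described in the entries above can be compared side by side. This is a minimal sketch with made-up inputs; `firstMatch` is a hypothetical helper that returns the first match or `null`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedinessDemo {
    // Returns the first substring matching the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b><b>two</b>";

        // Greedy: .+ runs to the end, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", html));   // <b>one</b><b>two</b>

        // Reluctant: .+? stops at the first '>' that completes a match.
        System.out.println(firstMatch("<.+?>", html));  // <b>

        // Possessive: .*+ swallows the closing quote and never gives it back.
        System.out.println(firstMatch("\".*+\"", "\"abc\"")); // null

        // Atomic group: a+ keeps all the a's, so the following "ab" cannot match.
        System.out.println(firstMatch("a+ab", "aaab"));      // aaab
        System.out.println(firstMatch("(?>a+)ab", "aaab"));  // null
    }
}
```

The last two cases illustrate the "potentially missing some matches" caveat: possessive quantifiers and atomic groups trade flexibility for predictable running time.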

Unicode Categories: Predefined character classes in regex that match Unicode character types, e.g., `\p{L}` for any letter, `\p{Nd}` for decimal digits, supporting international text.

Word Boundaries: Zero-width assertions `\b` that match positions between word (`\w`) and non-word (`\W`) characters, useful for matching whole words.

Flags (Modifiers): Settings that change regex behavior, such as `CASE_INSENSITIVE` or `MULTILINE`, usually passed when compiling patterns or embedded inline, e.g., `(?i)`.
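
Escape Sequences: Character sequences starting with `\` that either give a character a special meaning, e.g., `\d` for digits and `\s` for whitespace, or turn a metacharacter into a literal, e.g., `\.` for a dot. In Java string literals the backslash itself must be doubled, as in `"\\d"`.
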
This glossary covers foundational terms essential for understanding and writing effective Java regex patterns.