Appendices

Java Regex

17.1 Regex Syntax Quick Reference

| Syntax Element | Description | Example |
| --- | --- | --- |
| Literals | Match exact characters | `a`, `Z`, `9` |
| Metacharacters | Special characters with regex meaning | `. ^ $ * + ? { } [ ] \ \| ( )` |
| Escaping | Use backslash `\` to treat metacharacters as literals | `\.` matches a dot `.` |
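In Java source code the backslash is also the string-literal escape character, so the regex `\.` has to be written as `"\\."`. The following is a minimal sketch of that rule and of `Pattern.quote`, which escapes an arbitrary string for literal matching. The class and method names (`EscapingDemo`, `isLiteralDot`, `containsLiteral`) are illustrative only.

```java
import java.util.regex.Pattern;

final class EscapingDemo {
    // "\\." in Java source is the two-character regex \. (a literal dot).
    static boolean isLiteralDot(String s) {
        return Pattern.matches("\\.", s);
    }

    // Pattern.quote makes every metacharacter in `literal` match itself.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isLiteralDot("."));                    // true
        System.out.println(isLiteralDot("a"));                    // false
        System.out.println(Pattern.matches(".", "a"));            // true: an unescaped dot matches any character
        System.out.println(containsLiteral("price: 1+1", "1+1")); // true
    }
}
```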

Quantifiers

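| Syntax | Description | Example |
| --- | --- | --- |
| `*` | 0 or more | `a*` matches the empty string, `a`, `aa` |
| `+` | 1 or more | `a+` matches `a`, `aa` |
| `?` | 0 or 1 (optional) | `a?` matches the empty string or `a` |
| `{n}` | Exactly n times | `a{3}` matches `aaa` |
| `{n,}` | At least n times | `a{2,}` matches `aa`, `aaa` |
| `{n,m}` | Between n and m times | `a{2,4}` matches `aa`, `aaa`, `aaaa` |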
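The quantifier rows can be checked directly with `Pattern.matches`, which succeeds only when the whole input matches. This is a small sketch; `QuantifierDemo` and `fullMatch` are hypothetical names chosen for the example.

```java
import java.util.regex.Pattern;

final class QuantifierDemo {
    // Pattern.matches requires the entire input to match the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a*", ""));          // true: zero repetitions are allowed
        System.out.println(fullMatch("a+", ""));          // false: at least one 'a' is required
        System.out.println(fullMatch("a?", "aa"));        // false: at most one 'a'
        System.out.println(fullMatch("a{3}", "aaa"));     // true
        System.out.println(fullMatch("a{2,}", "a"));      // false
        System.out.println(fullMatch("a{2,4}", "aaaaa")); // false: five exceeds the upper bound
    }
}
```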

Groups and Capturing

| Syntax | Description | Example |
| --- | --- | --- |
| `( ... )` | Capturing group | `(abc)` matches `abc` |
| `(?: ... )` | Non-capturing group | `(?:abc)` groups without capturing |
| `(?<name> ... )` | Named capturing group (Java 7+) | `(?<year>\d{4})` |
| `\n` | Backreference to nth group | `\1` refers to first group |
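As an illustration of named groups and backreferences, the sketch below reads a named group with `Matcher.group(String)` and uses `\1` to find a doubled word. The patterns and the helper names (`GroupDemo`, `yearOf`, `firstDoubledWord`) are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    private static final Pattern ISO_DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // \1 must repeat exactly the text captured by group 1.
    private static final Pattern DOUBLED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the named group "year" of the first date found, or null.
    static String yearOf(String text) {
        Matcher m = ISO_DATE.matcher(text);
        return m.find() ? m.group("year") : null;
    }

    // Returns the first word that is immediately repeated, or null.
    static String firstDoubledWord(String text) {
        Matcher m = DOUBLED_WORD.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(yearOf("Released on 2024-03-15."));    // 2024
        System.out.println(firstDoubledWord("this is is a test")); // is
    }
}
```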

Assertions

| Syntax | Description | Example |
| --- | --- | --- |
| `^` | Start of input (or line in multiline mode) | `^abc` matches `abc` at start |
| `$` | End of input (or line in multiline mode) | `xyz$` matches `xyz` at end |
| `\b` | Word boundary | `\bword\b` matches `word` as whole word |
| `\B` | Non-word boundary | `\Bend\B` matches `end` within a word |
| `(?= ... )` | Positive lookahead | `a(?=b)` matches `a` if followed by `b` |
| `(?! ... )` | Negative lookahead | `a(?!b)` matches `a` if not followed by `b` |
| `(?<= ... )` | Positive lookbehind | `(?<=a)b` matches `b` if preceded by `a` |
| `(?<! ... )` | Negative lookbehind | `(?<!a)b` matches `b` if not preceded by `a` |
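Because assertions are zero-width, the text they inspect is not part of the match. The sketch below collects every match of a pattern so this is visible in the results; `AssertionDemo` and `findAll` are hypothetical helpers, not part of `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AssertionDemo {
    // Collects the text of every match of `regex` in `input`.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Lookbehind: the '$' is required but not included in the match.
        System.out.println(findAll("(?<=\\$)\\d+", "cost $42 or 17 euros")); // [42]
        // Word boundaries: "concat" and "cats" are skipped.
        System.out.println(findAll("\\bcat\\b", "cat concat cats cat."));    // [cat, cat]
        // Negative lookahead: only the 'a' that is not followed by 'b'.
        System.out.println(findAll("a(?!b)", "ab ac"));                      // [a]
        // Multiline mode: ^ matches at the start of each line.
        System.out.println(findAll("(?m)^\\w+", "one two\nthree four"));     // [one, three]
    }
}
```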

Character Classes

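| Syntax | Description | Example |
| --- | --- | --- |
| `[abc]` | Any character a, b, or c | `[aeiou]` vowels |
| `[a-z]` | Any character in the range a to z | `[0-9]` digits |
| `[^abc]` | Negated class, any char except a, b, or c | `[^0-9]` non-digit |
| `\d` | Digit (equivalent to `[0-9]` by default) | `\d{3}` matches three digits |
| `\D` | Non-digit | `\D+` matches non-digit chars |
| `\w` | Word character (ASCII letters, digits, underscore by default) | `\w+` matches words |
| `\W` | Non-word character | `\W+` matches punctuation, spaces |
| `\s` | Whitespace (spaces, tabs, line breaks) | `\s*` matches optional spaces |
| `\S` | Non-whitespace | `\S+` matches non-space chars |
| `\p{Lower}` | Lowercase letter: ASCII `[a-z]` by default, Unicode-aware only with `UNICODE_CHARACTER_CLASS` (`(?U)`) | Matches `a`; use `\p{Ll}` to also match `β` |
| `\p{Upper}` | Uppercase letter: ASCII `[A-Z]` by default, Unicode-aware only with `UNICODE_CHARACTER_CLASS` (`(?U)`) | Matches `A`; use `\p{Lu}` to also match `Γ` |
| `\p{IsGreek}` | Unicode Greek script characters | Matches Greek letters |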
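In Java, `\p{Lower}` and `\p{Upper}` are POSIX classes that cover only ASCII unless the `UNICODE_CHARACTER_CLASS` flag is enabled. The Unicode general categories `\p{Ll}` and `\p{Lu}` are not restricted this way. The sketch below shows the difference for the Greek letter β, written as the escape `\u03B2` so the source stays ASCII. `CharClassDemo` is an illustrative name.

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String beta = "\u03B2"; // Greek small letter beta

        System.out.println(matches("\\p{Lower}", "a"));       // true
        System.out.println(matches("\\p{Lower}", beta));      // false: POSIX class is ASCII-only by default
        System.out.println(matches("\\p{Ll}", beta));         // true: Unicode lowercase-letter category
        System.out.println(matches("(?U)\\p{Lower}", beta));  // true: (?U) enables UNICODE_CHARACTER_CLASS
        System.out.println(matches("\\p{IsGreek}", beta));    // true: Greek script
        System.out.println(matches("\\p{IsGreek}", "b"));     // false: Latin script
    }
}
```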

Flags (Java Pattern Flags)

| Flag | Meaning | Usage example |
| --- | --- | --- |
| `(?i)` | Case-insensitive matching | `(?i)abc` matches `ABC` |
| `(?m)` | Multiline mode (`^` and `$` match line start/end) | `(?m)^abc` |
| `(?s)` | Dotall mode (dot `.` matches line breaks) | `(?s).+` |
| `(?x)` | Ignore whitespace and allow comments | `(?x) a \s+ b` |
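Each inline flag has a matching constant on `Pattern` (`CASE_INSENSITIVE`, `MULTILINE`, `DOTALL`, `COMMENTS`) that can be passed to `Pattern.compile`, and constants combine with bitwise OR. The sketch below shows both forms. The key/value pattern and the names `FlagDemo`, `valueOf`, and `dotMatchesNewline` are assumptions for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlagDemo {
    // The same pattern written with an inline flag and with a compile-time constant.
    static final Pattern INLINE = Pattern.compile("(?i)abc");
    static final Pattern CONSTANT = Pattern.compile("abc", Pattern.CASE_INSENSITIVE);

    // In COMMENTS mode, whitespace in the pattern is ignored and # starts a comment
    // that runs to the end of the line.
    static final Pattern KEY_VALUE = Pattern.compile(
            "(\\w+)     # key\n"
          + "\\s*=\\s*  # separator\n"
          + "(\\w+)     # value",
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // Returns the value part of "key = value", or null if the line has another shape.
    static String valueOf(String line) {
        Matcher m = KEY_VALUE.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    // With DOTALL the dot also matches line terminators.
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        System.out.println(INLINE.matcher("ABC").matches());    // true
        System.out.println(CONSTANT.matcher("aBc").matches());  // true
        System.out.println(valueOf("Host = example"));          // example
        System.out.println(dotMatchesNewline(0));               // false
        System.out.println(dotMatchesNewline(Pattern.DOTALL));  // true
    }
}
```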

This cheat sheet summarizes the core regex elements used in Java pattern matching. Complex patterns are built by combining these elements; keeping each combination small and commented helps the regex stay readable, maintainable, and efficient.

17.2 Java Regex API Summary

Java’s regex functionality is primarily provided by two core classes in the java.util.regex package: Pattern and Matcher. Here’s a concise overview of these classes and their most important methods to help you work efficiently with regex in Java.

Pattern

Matcher

Common Usage Patterns

Compiling a pattern once and reusing it avoids repeated compilation and keeps the code readable. Flags let you tailor matching to your needs, and the methods on Pattern and Matcher cover extraction, validation, and transformation tasks in Java applications.
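A minimal sketch of the compile-once approach, assuming two simple patterns chosen for illustration. It uses `Pattern.compile`, `Pattern.matcher`, `Pattern.split`, and the `Matcher` methods `find`, `matches`, `group`, and `replaceAll`. `Pattern` instances are immutable and safe to share between threads, while a `Matcher` is not, so each call creates its own matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ApiSummaryDemo {
    // Compiled once and reused by every method below.
    private static final Pattern NUMBER = Pattern.compile("\\d+");
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Extraction: find() advances to the next match, group() returns its text.
    static List<Integer> extractNumbers(String text) {
        List<Integer> out = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            out.add(Integer.parseInt(m.group()));
        }
        return out;
    }

    // Validation: matches() succeeds only if the whole input matches.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    // Transformation: replaceAll() substitutes every match.
    static String maskNumbers(String text) {
        return NUMBER.matcher(text).replaceAll("#");
    }

    // Splitting around matches of a pattern.
    static String[] splitCsv(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("a1 b22 c333"));     // [1, 22, 333]
        System.out.println(isNumber("12a"));                   // false
        System.out.println(maskNumbers("room 12, floor 3"));   // room #, floor #
        System.out.println(splitCsv("a , b,c").length);        // 3
    }
}
```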

17.3 Common Regex Patterns Library

Here is a curated collection of frequently used regex patterns to help you quickly handle common validation and extraction tasks in Java. Each pattern includes a brief explanation and usage notes.

Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Phone Number (US Format)

^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

IPv4 Address

\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b
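One way to keep an IPv4 pattern readable in Java is to define the octet alternation once and assemble the full pattern from it: three octets each followed by a dot, then a final octet. This is a sketch under that assumption. `Ipv4Demo`, `OCTET`, and `isIpv4` are illustrative names, and `matches()` is used so that the whole input must be an address.

```java
import java.util.regex.Pattern;

final class Ipv4Demo {
    // One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)";

    // Three "octet." prefixes followed by a final octet.
    private static final Pattern IPV4 =
            Pattern.compile("(?:" + OCTET + "\\.){3}" + OCTET);

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.1.1")); // true
        System.out.println(isIpv4("256.1.1.1"));   // false: 256 is out of range
        System.out.println(isIpv4("1.2.3"));       // false: only three octets
        System.out.println(isIpv4("1.2.3.4."));    // false: trailing dot
    }
}
```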

Date (YYYY-MM-DD)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
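The date pattern checks the shape of the string only, so an impossible date such as 2023-02-31 still matches. A possible approach, sketched below, is to use the regex as a cheap first filter and let `java.time.LocalDate.parse` decide whether the date exists. The names `DateCheckDemo`, `looksLikeDate`, and `isRealDate` are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

final class DateCheckDemo {
    private static final Pattern ISO_DATE =
            Pattern.compile("^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])$");

    // Shape check only: four digits, a month 01-12, a day 01-31.
    static boolean looksLikeDate(String s) {
        return ISO_DATE.matcher(s).matches();
    }

    // Shape check followed by calendar validation.
    static boolean isRealDate(String s) {
        if (!looksLikeDate(s)) {
            return false;
        }
        try {
            LocalDate.parse(s); // rejects dates that do not exist, such as February 31
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("2023-02-31")); // true
        System.out.println(isRealDate("2023-02-31"));    // false
        System.out.println(isRealDate("2024-02-29"));    // true: 2024 is a leap year
    }
}
```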

URL (Basic)

^(https?://)?([\w.-]+)\.([a-z]{2,6})([/\w .-]*)/?$

Password (At Least 8 chars, 1 Upper, 1 Lower, 1 Digit, 1 Special)

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Usage Notes

This library provides solid starting points for many common regex tasks in Java projects.
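In a Java project, patterns like these are typically compiled once as constants and applied through a small helper. The sketch below does this for the email, phone, and password patterns; `CommonPatterns` and `isValid` are illustrative names, not an existing library. Note the doubled backslashes required by Java string literals.

```java
import java.util.regex.Pattern;

final class CommonPatterns {
    static final Pattern EMAIL =
            Pattern.compile("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");

    static final Pattern US_PHONE =
            Pattern.compile("^\\(?\\d{3}\\)?[-.\\s]?\\d{3}[-.\\s]?\\d{4}$");

    static final Pattern PASSWORD = Pattern.compile(
            "^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&])[A-Za-z\\d@$!%*?&]{8,}$");

    // Null-safe full-input validation against a precompiled pattern.
    static boolean isValid(Pattern pattern, String input) {
        return input != null && pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(EMAIL, "user.name+tag@example.com")); // true
        System.out.println(isValid(EMAIL, "user@localhost"));            // false: no top-level domain
        System.out.println(isValid(US_PHONE, "(555) 123-4567"));         // true
        System.out.println(isValid(PASSWORD, "Passw0rd!"));              // true
        System.out.println(isValid(PASSWORD, "password"));               // false
    }
}
```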

17.4 Glossary of Terms

Quantifiers: Symbols that specify how many times the preceding element may repeat. Examples: * (0 or more), + (1 or more), ? (0 or 1), {n,m} (between n and m times).

Capturing Groups: Parentheses () that group part of a regex and save the matched text for reuse or extraction. For example, (abc) matches "abc" and stores it as group 1.

Named Capturing Groups: Groups given names for clearer access, like (?<name>pattern), accessed by name instead of number.

Lookahead Assertions: Zero-width checks that assert what follows the current position without consuming characters.

Lookbehind Assertions: Zero-width checks that assert what precedes the current position without consuming characters.

Backtracking: The process by which the regex engine returns to an earlier choice point (a quantifier or an alternation) and tries a different path after a later part of the pattern fails to match. Excessive backtracking can cause severe slowdowns.

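Greedy vs. Reluctant Matching: Greedy quantifiers (*, +, ?, {n,m}) match as much as possible and give characters back only when the rest of the pattern fails. Reluctant quantifiers (*?, +?, ??, {n,m}?) match as little as possible and expand only when needed.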

Possessive Quantifiers: Quantifiers such as *+, ++, and ?+ that match as much as possible and never give characters back. They can improve performance, but a pattern that relies on backtracking may no longer match.
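The three quantifier modes can be compared on one input. A greedy `.+` takes as much as it can and backtracks to the last `>`. A reluctant `.+?` stops at the first `>`. A possessive `.++` consumes the rest of the input and never gives the final `>` back, so the pattern fails. The sketch below uses a hypothetical helper `firstMatch` that returns the first match or null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierModesDemo {
    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        System.out.println(firstMatch("<.+>", html));  // <b>bold</b>  (greedy)
        System.out.println(firstMatch("<.+?>", html)); // <b>          (reluctant)
        System.out.println(firstMatch("<.++>", html)); // null         (possessive: no '>' left to match)
    }
}
```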

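Atomic Groups: Subpatterns written (?>...) that discard their backtracking positions once the group has matched, so the engine cannot re-enter the group to try alternatives. They are used to limit backtracking in complex regexes.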
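The effect of an atomic group is easiest to see next to the equivalent non-capturing group. In the sketch below, `(?:foo|foot)s` can go back and try `foot` after `foo` fails to be followed by `s`, while `(?>foo|foot)s` cannot. `AtomicGroupDemo` and `found` are illustrative names.

```java
import java.util.regex.Pattern;

final class AtomicGroupDemo {
    // True if `regex` matches anywhere in `input`.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Ordinary group: after "foo" + 's' fails, the engine retries with "foot".
        System.out.println(found("(?:foo|foot)s", "foots")); // true
        // Atomic group: once "foo" has matched, the alternative "foot" is never tried.
        System.out.println(found("(?>foo|foot)s", "foots")); // false
        // Same idea with a quantifier: a+ may give back one 'a', (?>a+) may not.
        System.out.println(found("a+ab", "aaab"));           // true
        System.out.println(found("(?>a+)ab", "aaab"));       // false
    }
}
```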

Unicode Categories: Character classes that match Unicode general categories, e.g., \p{L} for any letter and \p{Nd} for decimal digits, which lets patterns handle international text.
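As a small illustration, the sketch below extracts words with `\p{L}+` and checks digit strings with `\p{Nd}+`. Non-ASCII characters are written as `\u` escapes so the source file stays ASCII. The sample inputs and the names `UnicodeCategoryDemo`, `words`, and `isDecimalNumber` are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeCategoryDemo {
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");
    private static final Pattern DIGITS = Pattern.compile("\\p{Nd}+");

    // Returns every maximal run of Unicode letters in `text`.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = LETTERS.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // True if `s` consists only of Unicode decimal digits.
    static boolean isDecimalNumber(String s) {
        return DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "caf\u00E9 M\u00FCnchen 42" contains two words with accented letters and one number.
        System.out.println(words("caf\u00E9 M\u00FCnchen 42").size());   // 2
        // \u0663\u0664 are Arabic-Indic digits: matched by \p{Nd}, not by the default \d.
        System.out.println(isDecimalNumber("\u0663\u0664"));             // true
        System.out.println(Pattern.matches("\\d+", "\u0663\u0664"));     // false
    }
}
```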

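Word Boundaries: Zero-width assertions, written \b, that match a position where a word character (\w) is adjacent to a non-word character (\W) or to the start or end of the input. Useful for matching whole words.

Flags (Modifiers): Settings that change regex behavior, such as CASE_INSENSITIVE or MULTILINE, passed to Pattern.compile or embedded in the pattern as inline flags such as (?i) and (?m).

Escape Sequences: Sequences that begin with a backslash. Some give an ordinary character a special meaning, e.g., \d for digits and \s for whitespace; others make a metacharacter literal, e.g., \. for a dot.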

This glossary covers foundational terms essential for understanding and writing effective Java regex patterns.
