Pattern and Matcher classes
In Java’s regex framework, two core classes are central to working with regular expressions: Pattern and Matcher. Together, they form the foundation for defining regex patterns and applying them to text.
Pattern
Think of the Pattern class as a compiled blueprint of a regular expression. When you write a regex as a string (e.g., "\\d+" to match digits), Java first compiles this string into a Pattern object. This compilation step transforms the regex from a raw sequence of characters into an optimized internal representation that can be efficiently reused.
By compiling the regex once, you save time and resources when matching it multiple times against different input strings. The Pattern object itself is immutable: once created, its regex cannot be changed.
Matcher
Once you have a compiled Pattern, you need a way to apply it to actual text. That’s where the Matcher class comes in. A Matcher is created by invoking the matcher() method on a Pattern object, passing the input string you want to examine.
You can think of the Matcher as the search engine that scans through your input string to find matches based on the Pattern blueprint. It provides methods such as matches(), find(), and lookingAt() to perform different types of matching operations.
Each Matcher instance is tied to a specific input string. If you need to match the same pattern against a different string, you create a new Matcher (or call reset(newInput) on the existing one).
Here’s a simple analogy:
- Pattern is like a recipe: a fixed set of instructions for making a dish (the regex).
- Matcher is the chef who uses the recipe (pattern) to prepare the dish (search for matches) in a particular kitchen (input string).
The typical lifecycle in code is:
1. Compile the regex into a Pattern object.
2. Create a Matcher by calling pattern.matcher(inputString).
3. Use Matcher methods to search or extract matches.
This separation between pattern definition and matching lets one compiled pattern serve many inputs, which keeps Java’s regex API both flexible and efficient.
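The three lifecycle steps can be sketched as follows. This is a minimal illustration, not a definitive implementation; the class name LifecycleSketch and the helper findNumbers are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LifecycleSketch {
    // Step 1: compile the regex once; the Pattern is immutable and reusable.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns every run of digits found in the input.
    static List<String> findNumbers(String input) {
        // Step 2: create a Matcher bound to this particular input.
        Matcher matcher = DIGITS.matcher(input);
        List<String> found = new ArrayList<>();
        // Step 3: use Matcher methods to search and extract matches.
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("order 12, item 345")); // [12, 345]
        System.out.println(findNumbers("no digits here"));     // []
    }
}
```

The same DIGITS pattern serves both calls; only the Matcher is created per input.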
In Java’s regex API, before you can use a regular expression to find matches in text, you must first compile it into a Pattern object. This is done using the static method Pattern.compile().
Pattern Instance
The simplest way to compile a regex pattern is:
Pattern pattern = Pattern.compile("your-regex-here");
For example, to match one or more digits, you write:
Pattern digitPattern = Pattern.compile("\\d+");
Remember that backslashes (\) must be escaped in Java strings, so \d becomes "\\d".
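A small sketch can make the escaping rule concrete: the Java literal "\\d+" holds only three characters, which are exactly what the regex engine receives. The class name EscapeSketch and the constant DIGITS are hypothetical names for this illustration.

```java
class EscapeSketch {
    // In source code the backslash is doubled; the resulting string is \d+
    static final String DIGITS = "\\d+";

    public static void main(String[] args) {
        System.out.println(DIGITS);                 // \d+
        System.out.println(DIGITS.length());        // 3
        System.out.println("2024".matches(DIGITS)); // true
    }
}
```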
Compiling a regex pattern can be a relatively expensive operation because the regex engine parses and prepares the pattern for matching. By compiling a pattern once and reusing the resulting Pattern object for multiple inputs, you avoid repeated compilation costs, improving performance especially in loops or large-scale text processing.
For example:
Pattern wordPattern = Pattern.compile("\\w+"); // Compile once
String[] inputs = {"apple", "banana123", "cherry"};
for (String input : inputs) {
    Matcher matcher = wordPattern.matcher(input);
    // \w also matches digits and underscores, so "banana123" passes too
    if (matcher.matches()) {
        System.out.println(input + " is a word.");
    }
}
You can compile more complex patterns involving grouping, quantifiers, or character classes:
Pattern emailPattern = Pattern.compile("[\\w.%+-]+@[\\w.-]+\\.\\w{2,}");
Additionally, the compile() method accepts optional flags to modify behavior. For example, Pattern.CASE_INSENSITIVE makes matching ignore letter case:
Pattern caseInsensitive = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);
In summary, using Pattern.compile() efficiently prepares your regex for repeated use and gives you options to customize matching behavior.
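To see what a flag changes, the sketch below compiles the same regex with and without Pattern.CASE_INSENSITIVE, and then combines two flags with bitwise OR. The class name FlagSketch and the helper matchesHello are hypothetical; Pattern.MULTILINE is a standard flag that the text above does not cover, used here only to illustrate combining flags.

```java
import java.util.regex.Pattern;

class FlagSketch {
    // Hypothetical helper: full-match "hello" against input using the given flags.
    static boolean matchesHello(String input, int flags) {
        return Pattern.compile("hello", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesHello("HELLO", 0));                        // false
        System.out.println(matchesHello("HELLO", Pattern.CASE_INSENSITIVE)); // true

        // Flags are int bit masks, so several can be combined with |.
        // MULTILINE makes ^ and $ match at line boundaries as well.
        Pattern perLine = Pattern.compile("^hello$",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        System.out.println(perLine.matcher("Hi\nHELLO\nbye").find());        // true
    }
}
```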
matches(), find(), lookingAt()
The Matcher class provides several methods to check for regex matches in input strings. Among the most commonly used are matches(), find(), and lookingAt(). While they all perform pattern matching, their behaviors differ in important ways.
matches()
What it does: matches() attempts to match the entire input string against the regex pattern. The match must span from start to finish; otherwise, it returns false.
When to use: Use matches() when you want to verify that the whole input conforms exactly to the pattern, for example when validating formats like email addresses, phone numbers, or IDs.
Example:
String input = "12345";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(input);
System.out.println(matcher.matches()); // true, entire input is digits
input = "123abc";
matcher = pattern.matcher(input);
System.out.println(matcher.matches()); // false, contains letters
find()
What it does: find() searches the input for the next substring that matches the pattern. It can be called repeatedly to find multiple matches within the input.
When to use: Use find() when you want to locate one or more occurrences of a pattern anywhere inside a longer string.
Example:
String input = "abc123xyz456";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println("Found number: " + matcher.group());
}
// Output:
// Found number: 123
// Found number: 456
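Besides the matched text, each successful find() also records where the match sits in the input. The sketch below reports those positions with Matcher.start() and Matcher.end(); the class name FindPositions and the helper describe are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindPositions {
    // Hypothetical helper: lists each digit run as text@start-end.
    static String describe(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // start() is the index of the first matched character (inclusive);
            // end() is the index just past the last matched character (exclusive).
            out.append(matcher.group())
               .append('@').append(matcher.start())
               .append('-').append(matcher.end())
               .append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("abc123xyz456")); // 123@3-6 456@9-12
    }
}
```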
lookingAt()
What it does: lookingAt() checks if the beginning of the input matches the pattern. Unlike matches(), it does not require the whole string to match, only the start.
When to use: Use lookingAt() to verify that a string starts with a particular pattern, regardless of what follows.
Example:
String input = "123abc";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(input);
System.out.println(matcher.lookingAt()); // true, input starts with digits
input = "abc123";
matcher = pattern.matcher(input);
System.out.println(matcher.lookingAt()); // false, input does not start with digits
Method | Matches | Use Case |
---|---|---|
matches() | Entire input | Exact validation |
find() | Any matching substring(s) | Searching multiple matches |
lookingAt() | Start of input | Checking prefix patterns |
Understanding these differences helps you choose the right method for your matching needs and ensures your regex works as intended.
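Running all three methods against the same inputs makes the differences easy to compare side by side. This is a sketch only; the class name MethodComparison and the helper compare are hypothetical names for this example.

```java
import java.util.regex.Pattern;

class MethodComparison {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: reports the result of each method for one input.
    static String compare(String input) {
        // A fresh Matcher per check keeps the three results independent.
        boolean whole = DIGITS.matcher(input).matches();
        boolean prefix = DIGITS.matcher(input).lookingAt();
        boolean anywhere = DIGITS.matcher(input).find();
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(compare("123"));    // matches=true lookingAt=true find=true
        System.out.println(compare("123abc")); // matches=false lookingAt=true find=true
        System.out.println(compare("abc123")); // matches=false lookingAt=false find=true
        System.out.println(compare("abc"));    // matches=false lookingAt=false find=false
    }
}
```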
One of the most powerful features of regular expressions is the ability to capture parts of the matched text for further use. This is done through capturing groups, which are sections of a regex pattern enclosed in parentheses ( ). These groups allow you to extract specific substrings from a match, such as words, numbers, or components of a date.
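For example, in the pattern (\\d{4})-(\\d{2})-(\\d{2}), which matches a date in the format YYYY-MM-DD:
- the first group captures the year (e.g., 2023),
- the second group captures the month (e.g., 06),
- the third group captures the day (e.g., 22).
Matcher.group()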
After a successful match, you use the group() method of the Matcher class to retrieve captured substrings:
- group() or group(0) returns the entire match.
- group(1), group(2), etc., return the corresponding capturing groups.
When a pattern matches multiple times in an input, you can use a loop with find() to process each match. Inside the loop, you can access all groups for that match.
Example: Extracting date components from text
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class DateExtractor {
    public static void main(String[] args) {
        String text = "Important dates are 2023-06-22 and 2024-01-15.";
        String regex = "(\\d{4})-(\\d{2})-(\\d{2})";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.println("Full date: " + matcher.group(0)); // entire match
            System.out.println("Year: " + matcher.group(1));
            System.out.println("Month: " + matcher.group(2));
            System.out.println("Day: " + matcher.group(3));
            System.out.println("---");
        }
    }
}
Output:
Full date: 2023-06-22
Year: 2023
Month: 06
Day: 22
---
Full date: 2024-01-15
Year: 2024
Month: 01
Day: 15
---
Capturing groups are essential when you want to extract structured data from unstructured text, such as dates, email components, phone numbers, or words. They let you break down complex matches into meaningful parts for further processing or validation.
By mastering groups and the Matcher.group() method, you can write regex patterns that not only find matches but also retrieve useful data cleanly and efficiently.
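When the number of groups is not hard-coded, Matcher.groupCount() reports how many capturing groups the pattern defines (group 0 is not counted). The sketch below uses it to collect every group of the first match; the class name GroupSketch and the helper groupsOf are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSketch {
    // Hypothetical helper: returns groups 1..n of the first match, or an empty list.
    static List<String> groupsOf(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        List<String> groups = new ArrayList<>();
        // group(...) may only be called after a successful match attempt.
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        Pattern date = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
        System.out.println(groupsOf(date, "Due 2023-06-22")); // [2023, 06, 22]
        System.out.println(groupsOf(date, "no date"));        // []
    }
}
```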
Validating email addresses is a common task that demonstrates the power and practicality of regex in Java. Let’s walk through a complete example that compiles a regex pattern for emails, matches input strings, and explains the pattern’s components.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class EmailValidator {
    public static void main(String[] args) {
        // Sample emails to test
        String[] emails = {
            "user@example.com",
            "user.name+tag+sorting@example.co.uk",
            "user@localhost",
            "invalid-email@.com",
            "user@domain..com"
        };
        // Regex pattern to validate email addresses
        String emailRegex = "^[\\w.+-]+@[\\w.-]+\\.[a-zA-Z]{2,}$";
        // Compile the regex pattern
        Pattern pattern = Pattern.compile(emailRegex);
        for (String email : emails) {
            Matcher matcher = pattern.matcher(email);
            boolean isValid = matcher.matches();
            System.out.println(email + " is valid? " + isValid);
        }
    }
}
The regex pattern used here is (shown without Java string escaping):
^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$
Let's break it down:
- ^ and $: These are anchors that ensure the entire string matches the pattern from start to end, preventing partial matches.
- [\w.+-]+: This matches the local part of the email (before the @).
  - \w matches any word character (letters, digits, underscore).
  - . (dot), +, and - are also allowed characters in the local part.
  - The + quantifier means one or more characters from this set.
- @: A literal @ symbol separates the local part from the domain.
- [\w.-]+: This matches the domain name part.
  - It allows word characters (\w), dots (.), and hyphens (-).
  - + means one or more of these characters.
- \.: A literal dot before the top-level domain (TLD). In the Java string it is written \\. because the backslash itself must be escaped.
- [a-zA-Z]{2,}: This matches the TLD (like com, org, or the uk in co.uk) and requires at least two letters.
The program loops through several example email strings and prints whether each is valid according to the regex.
Expected output:
user@example.com is valid? true
user.name+tag+sorting@example.co.uk is valid? true
user@localhost is valid? false
invalid-email@.com is valid? false
user@domain..com is valid? true
This example demonstrates how Java’s regex API can validate text patterns like email addresses. Note that user@domain..com is accepted: [\w.-]+ allows consecutive dots, so it can match "domain." before the final literal dot. The regex pattern balances simplicity with common email rules, but fully RFC-compliant email validation requires more intricate patterns or libraries.
By understanding and customizing patterns like this, you can effectively perform input validation in your Java applications.
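As one possible customization, the domain part can be written as dot-separated labels so that consecutive dots are rejected. The sketch below is an assumption-laden variant, not an RFC-compliant validator: the class name StricterEmailSketch, the helper isValid, and the rule that each label consists of letters, digits, and hyphens are all choices made for this illustration.

```java
import java.util.regex.Pattern;

class StricterEmailSketch {
    // Domain = one label, then zero or more ".label" groups, then ".tld".
    // Every dot must be followed by at least one label character,
    // so "domain..com" and ".com" no longer match.
    private static final Pattern EMAIL = Pattern.compile(
            "^[\\w.+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[a-zA-Z]{2,}$");

    // Hypothetical helper: true if the whole string fits the pattern.
    static boolean isValid(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user@example.com"));                    // true
        System.out.println(isValid("user.name+tag+sorting@example.co.uk")); // true
        System.out.println(isValid("user@localhost"));                      // false
        System.out.println(isValid("invalid-email@.com"));                  // false
        System.out.println(isValid("user@domain..com"));                    // false
    }
}
```

The local part is left unchanged here, so it still accepts strings such as consecutive dots before the @; tightening it would follow the same label-based approach.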