Index

Pattern and Matcher Classes

Java Regex

3.1 Understanding Pattern and Matcher classes

In Java’s regex framework, two core classes are central to working with regular expressions: Pattern and Matcher. Together, they form the foundation for defining regex patterns and applying them to text.

The Role of Pattern

Think of the Pattern class as a compiled blueprint of a regular expression. When you write a regex as a string (e.g., "\\d+" to match digits), Java first compiles this string into a Pattern object. This compilation step transforms the regex from a raw sequence of characters into an optimized internal representation that can be efficiently reused.

By compiling the regex once, you save time and resources when matching it multiple times against different input strings. The Pattern object itself is immutable—once created, its regex cannot be changed.

The Role of Matcher

Once you have a compiled Pattern, you need a way to apply it to actual text. That’s where the Matcher class comes in. The Matcher is created by invoking the matcher() method on a Pattern object, passing the input string you want to examine.

You can think of the Matcher as the search engine that scans through your input string to find matches based on the Pattern blueprint. It provides various methods like matches(), find(), and lookingAt() to perform different types of matching operations.

Each Matcher instance is tied to a specific input string. To match the same pattern against a different string, create a new Matcher with pattern.matcher(newInput), or rebind the existing one with its reset(newInput) method.

Relationship and Lifecycle

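Here’s a simple analogy: the Pattern is the blueprint that describes what to look for, and the Matcher is the search engine that applies that blueprint to one particular input string.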

The typical lifecycle in code is:

  1. Compile your regex into a Pattern object.
  2. Create a Matcher by calling pattern.matcher(inputString).
  3. Use the Matcher methods to search or extract matches.

This separation between pattern definition and matching provides flexibility and efficiency, making Java’s regex API powerful and easy to use.
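The three lifecycle steps can be sketched as follows. This is a minimal illustration, not a canonical implementation; the class name LifecycleSketch and the helper findNumbers are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LifecycleSketch {
    // Step 1: compile the regex once; the Pattern is immutable and reusable.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: collects every run of digits found in the input.
    public static List<String> findNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        // Step 2: create a Matcher bound to this particular input.
        Matcher matcher = DIGITS.matcher(input);
        // Step 3: use Matcher methods to search for and extract matches.
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("order 66, aisle 7")); // prints [66, 7]
        System.out.println(findNumbers("no digits here"));    // prints []
    }
}
```

Keeping the Pattern in a static final field is one common way to make sure compilation happens only once, however often the helper is called.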


3.2 Compiling a regex pattern

In Java’s regex API, before you can use a regular expression to find matches in text, you must first compile it into a Pattern object. This is done using the static method Pattern.compile().

Creating a Pattern Instance

The simplest way to compile a regex pattern is:

Pattern pattern = Pattern.compile("your-regex-here");

For example, to match one or more digits, you write:

Pattern digitPattern = Pattern.compile("\\d+");

Remember that backslashes (\) must be escaped in Java strings, so \d becomes "\\d".
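The escaping rule can be checked with a short sketch. It uses the static convenience method Pattern.matches() and Pattern.quote(), both part of java.util.regex; the class name EscapingSketch and the helper matchesLiterally are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

public class EscapingSketch {
    // Hypothetical helper: treats 'literal' as plain text rather than as a regex.
    public static boolean matchesLiterally(String literal, String input) {
        // Pattern.quote() escapes the whole string so it is matched character for character.
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // The Java literal "\\d+" is the three-character regex \d+
        String regex = "\\d+";
        System.out.println(regex.length());                  // prints 3
        System.out.println(Pattern.matches(regex, "2024"));  // prints true

        // An unescaped dot matches any character; "\\." matches only a literal dot.
        System.out.println(Pattern.matches("a.c", "abc"));   // prints true
        System.out.println(Pattern.matches("a\\.c", "abc")); // prints false
        System.out.println(Pattern.matches("a\\.c", "a.c")); // prints true

        // Quoting avoids hand-escaping metacharacters such as '+'.
        System.out.println(matchesLiterally("1+1", "1+1"));  // prints true
    }
}
```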

Benefits of Compiling Once

Compiling a regex pattern can be a relatively expensive operation because the regex engine parses and prepares the pattern for matching. By compiling a pattern once and reusing the resulting Pattern object for multiple inputs, you avoid repeated compilation costs, improving performance especially in loops or large-scale text processing.

For example:

Pattern wordPattern = Pattern.compile("\\w+"); // Compile once

String[] inputs = {"apple", "banana123", "cherry"};
for (String input : inputs) {
    Matcher matcher = wordPattern.matcher(input);
    // \w also matches digits and underscores, so all three inputs match.
    if (matcher.matches()) {
        System.out.println(input + " is a word.");
    }
}

Compiling Complex Patterns and Flags

You can compile more complex patterns involving grouping, quantifiers, or character classes:

Pattern emailPattern = Pattern.compile("[\\w.%+-]+@[\\w.-]+\\.\\w{2,}");

Additionally, the compile() method accepts optional flags to modify behavior. For example, Pattern.CASE_INSENSITIVE makes matching ignore letter case:

Pattern caseInsensitive = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);
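Flags are int constants, so several can be combined with the bitwise OR operator. The sketch below pairs Pattern.CASE_INSENSITIVE with Pattern.MULTILINE (a standard flag not covered in the text above, used here only to illustrate combining) and also shows the equivalent inline form (?i). The class name FlagsSketch and the helper containsHelloLine are hypothetical.

```java
import java.util.regex.Pattern;

public class FlagsSketch {
    // CASE_INSENSITIVE ignores letter case; MULTILINE lets ^ and $ match at line boundaries.
    private static final Pattern HELLO_LINE =
            Pattern.compile("^hello$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Hypothetical helper: true if some line of the text consists solely of "hello" in any case.
    public static boolean containsHelloLine(String text) {
        return HELLO_LINE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsHelloLine("first line\nHELLO\nlast line")); // prints true
        System.out.println(containsHelloLine("say hello world"));              // prints false

        // The embedded flag (?i) has the same effect as Pattern.CASE_INSENSITIVE.
        System.out.println(Pattern.matches("(?i)hello", "HeLLo"));             // prints true
    }
}
```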

In summary, using Pattern.compile() efficiently prepares your regex for repeated use and gives you options to customize matching behavior.


3.3 Matcher methods: matches(), find(), lookingAt()

The Matcher class provides several methods to check for regex matches in input strings. Among the most commonly used are matches(), find(), and lookingAt(). While they all perform pattern matching, their behaviors differ in important ways.

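matches()

The matches() method returns true only when the entire input matches the pattern; a partial match is not enough.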

String input = "12345";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(input);

System.out.println(matcher.matches()); // true, entire input is digits

input = "123abc";
matcher = pattern.matcher(input);
System.out.println(matcher.matches()); // false, contains letters

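find()

The find() method scans the input for the next substring that matches the pattern. Each call resumes where the previous match ended, so calling it in a loop visits every match in turn.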

String input = "abc123xyz456";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(input);

while (matcher.find()) {
    System.out.println("Found number: " + matcher.group());
}
// Output:
// Found number: 123
// Found number: 456

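lookingAt()

The lookingAt() method checks whether the input starts with a match. Like matches(), it is anchored at the beginning of the input, but the match does not have to extend to the end.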

String input = "123abc";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(input);

System.out.println(matcher.lookingAt()); // true, input starts with digits

input = "abc123";
matcher = pattern.matcher(input);
System.out.println(matcher.lookingAt()); // false, input does not start with digits

Summary

Method        What must match              Use case
matches()     Entire input                 Exact validation
find()        Any matching substring(s)    Searching for multiple matches
lookingAt()   Start of input               Checking prefix patterns

Understanding these differences helps you choose the right method for your matching needs and ensures your regex works as intended.
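To see the three methods side by side, the following sketch runs each of them against the same inputs. The class name MatchModesSketch and the helper describe are hypothetical names used only for this comparison.

```java
import java.util.regex.Pattern;

public class MatchModesSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: reports the result of all three methods for one input.
    public static String describe(String input) {
        // A fresh Matcher per check keeps the three results independent of each other.
        boolean whole = DIGITS.matcher(input).matches();
        boolean prefix = DIGITS.matcher(input).lookingAt();
        boolean anywhere = DIGITS.matcher(input).find();
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("12345"));  // prints matches=true lookingAt=true find=true
        System.out.println(describe("123abc")); // prints matches=false lookingAt=true find=true
        System.out.println(describe("abc123")); // prints matches=false lookingAt=false find=true
        System.out.println(describe("abc"));    // prints matches=false lookingAt=false find=false
    }
}
```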


3.4 Extracting matches and groups

One of the most powerful features of regular expressions is the ability to capture parts of the matched text for further use. This is done through capturing groups, which are sections of a regex pattern enclosed in parentheses ( ). These groups allow you to extract specific substrings from a match, such as words, numbers, or components of a date.

Capturing Groups and Group Numbering

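Groups are numbered from left to right by the position of their opening parenthesis, starting at 1; group 0 always refers to the entire match. For example, in the pattern (\\d{4})-(\\d{2})-(\\d{2}), which matches a date in the format YYYY-MM-DD, group 1 captures the year, group 2 the month, and group 3 the day.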

Retrieving Groups with Matcher.group()

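After a successful match, you use the group() method of the Matcher class to retrieve captured substrings: group() or group(0) returns the entire match, and group(n) returns the text captured by the nth group.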
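A minimal sketch of group retrieval on a single input follows. It relies on groupCount(), which reports the number of capturing groups in the pattern (group 0 is not counted). The class name GroupSketch and the helper splitDate are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSketch {
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Hypothetical helper: returns {year, month, day}, or null if the input is not a date.
    public static String[] splitDate(String input) {
        Matcher m = DATE.matcher(input);
        // group() is only valid after a successful match; calling it earlier
        // throws IllegalStateException, so check the result of matches() first.
        if (!m.matches()) {
            return null;
        }
        String[] parts = new String[m.groupCount()]; // 3 groups; group 0 is excluded
        for (int i = 1; i <= m.groupCount(); i++) {
            parts[i - 1] = m.group(i);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", splitDate("2023-06-22"))); // prints 2023/06/22
        System.out.println(splitDate("June 22") == null);              // prints true
    }
}
```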

Iterating Over Multiple Matches and Groups

When a pattern matches multiple times in an input, you can use a loop with find() to process each match. Inside the loop, you can access all groups for that match.

Example: Extracting date components from text

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateExtractor {
    public static void main(String[] args) {
        String text = "Important dates are 2023-06-22 and 2024-01-15.";
        String regex = "(\\d{4})-(\\d{2})-(\\d{2})";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Full date: " + matcher.group(0));    // entire match
            System.out.println("Year: " + matcher.group(1));
            System.out.println("Month: " + matcher.group(2));
            System.out.println("Day: " + matcher.group(3));
            System.out.println("---");
        }
    }
}

Output:

Full date: 2023-06-22
Year: 2023
Month: 06
Day: 22
---
Full date: 2024-01-15
Year: 2024
Month: 01
Day: 15
---

Practical Use

Capturing groups are essential when you want to extract structured data from unstructured text, such as dates, email components, phone numbers, or words. They let you break down complex matches into meaningful parts for further processing or validation.

By mastering groups and the Matcher.group() method, you can write regex patterns that not only find matches but also retrieve useful data cleanly and efficiently.
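As one illustration of extracting email components, the sketch below uses named groups, written (?<name>...) and read with group("name"), which Java supports as an alternative to numbered groups. The pattern is a simplified assumption rather than a complete email grammar, and the class name EmailPartsSketch and the helper domainOf are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailPartsSketch {
    // Simplified pattern (an assumption for this sketch): user part, '@', domain with a final label.
    private static final Pattern EMAIL =
            Pattern.compile("(?<user>[\\w.+-]+)@(?<domain>[\\w.-]+\\.[a-zA-Z]{2,})");

    // Hypothetical helper: returns the domain of the first address found, or null if none.
    public static String domainOf(String text) {
        Matcher m = EMAIL.matcher(text);
        return m.find() ? m.group("domain") : null;
    }

    public static void main(String[] args) {
        Matcher m = EMAIL.matcher("Contact alice@example.com or bob@test.org today.");
        while (m.find()) {
            System.out.println(m.group("user") + " at " + m.group("domain"));
        }
        // prints:
        // alice at example.com
        // bob at test.org
    }
}
```

Named groups keep the extraction code readable when a pattern grows, because adding a group does not shift the numbering of the others.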


3.5 Example: Validate email addresses

Validating email addresses is a common task that demonstrates the power and practicality of regex in Java. Let’s walk through a complete example that compiles a regex pattern for emails, matches input strings, and explains the pattern’s components.

Java Example: Email Validation

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailValidator {
    public static void main(String[] args) {
        // Sample emails to test
        String[] emails = {
            "user@example.com",
            "user.name+tag+sorting@example.co.uk",
            "user@localhost",
            "invalid-email@.com",
            "user@domain..com"
        };

        // Regex pattern to validate email addresses
        String emailRegex = "^[\\w.+-]+@[\\w.-]+\\.[a-zA-Z]{2,}$";

        // Compile the regex pattern
        Pattern pattern = Pattern.compile(emailRegex);

        for (String email : emails) {
            Matcher matcher = pattern.matcher(email);
            boolean isValid = matcher.matches();
            System.out.println(email + " is valid? " + isValid);
        }
    }
}

Explanation of the Regex Pattern

The regex pattern used here is:

^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$

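Let's break it down:

  ^               anchors the match at the start of the input.
  [\w.+-]+        the local part: one or more word characters, dots, plus signs, or hyphens.
  @               a literal at sign.
  [\w.-]+         the domain: one or more word characters, dots, or hyphens.
  \.              a literal dot before the final label.
  [a-zA-Z]{2,}    the top-level domain: at least two letters.
  $               anchors the match at the end of the input.

In Java source code each backslash is doubled, which is why the string literal reads \\w and \\. Because matches() already requires the whole input to match, the ^ and $ anchors are redundant here, but they make the intent explicit and keep the pattern correct if it is later used with find().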

Running the Program

The program loops through several example email strings and prints whether each is valid according to the regex.

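Output:

user@example.com is valid? true
user.name+tag+sorting@example.co.uk is valid? true
user@localhost is valid? false
invalid-email@.com is valid? false
user@domain..com is valid? true

Note that user@domain..com is accepted: the domain class [\w.-]+ includes the dot, so "domain." matches it and the second dot satisfies \. before "com". Rejecting consecutive dots requires a stricter pattern.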

Summary

This example demonstrates how Java’s regex API can validate complex text patterns like email addresses. The regex pattern balances simplicity with common email rules, but note that fully RFC-compliant email validation requires more intricate patterns or libraries.

By understanding and customizing patterns like this, you can effectively perform input validation in your Java applications.
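One possible customization is sketched here. Because the domain part [\w.-]+ of the pattern above includes the dot, that pattern accepts consecutive dots such as user@domain..com. The variant below instead requires the domain to be a sequence of non-empty labels separated by single dots. It is an assumption-laden sketch, still far from RFC-compliant, and the class name StricterEmailSketch and the method isPlausibleEmail are hypothetical.

```java
import java.util.regex.Pattern;

public class StricterEmailSketch {
    // Domain = label(.label)* followed by a final label of at least two letters.
    // Each label is one or more letters, digits, or hyphens, so empty labels are impossible.
    private static final Pattern EMAIL = Pattern.compile(
            "^[\\w.+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[a-zA-Z]{2,}$");

    // Hypothetical helper: true if the whole input fits the simplified pattern.
    public static boolean isPlausibleEmail(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] emails = {
            "user@example.com",
            "user.name+tag+sorting@example.co.uk",
            "user@localhost",
            "invalid-email@.com",
            "user@domain..com"
        };
        for (String email : emails) {
            System.out.println(email + " -> " + isPlausibleEmail(email));
        }
        // prints:
        // user@example.com -> true
        // user.name+tag+sorting@example.co.uk -> true
        // user@localhost -> false
        // invalid-email@.com -> false
        // user@domain..com -> false
    }
}
```

The local part is left unchanged, so inputs such as a..b@example.com would still pass; tightening it further follows the same label-based idea.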
