Grouping and Capturing

Java Regex

4.1 Capturing groups and backreferences

Capturing groups are one of the most fundamental and powerful features of regular expressions. By placing part of a regex pattern inside parentheses ( ), you create a capturing group. This group not only groups the pattern elements logically but also captures the matched substring for later use.

What Are Capturing Groups?

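When your regex matches a string, each capturing group remembers the exact substring that matched inside its parentheses. This lets you retrieve specific parts of the match in your code and refer back to them later in the same pattern with backreferences.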

For example, in the regex:

(\d{3})-(\d{2})-(\d{4})

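which matches a pattern like a Social Security Number (123-45-6789). Matching 123-45-6789 stores 123 in group 1, 45 in group 2, and 6789 in group 3. Groups are numbered by the position of their opening parenthesis, counting from the left, and group 0 always refers to the entire match.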

Using Backreferences Inside the Regex

Backreferences allow the regex to refer back to a previously captured group. They are written as \1, \2, etc., where the number corresponds to the group number.

For instance, the regex

(\w)\1

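matches two identical consecutive word characters, such as "ee" in "see" or "ss" in "class". Because \w covers letters, digits, and the underscore, "11" matches as well.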

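Backreferences let you match repeated text without knowing in advance what that text will be. A backreference matches the exact text the group captured, not the group's pattern again: (\w)\1 matches "ee" but not "ea", whereas (\w)\w matches both.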
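To see a backreference at work in Java, here is a minimal sketch that scans a string with Matcher.find() and collects every doubled character matched by (\w)\1. The class name DoubledChars and the helper findDoubled are hypothetical, chosen only for this illustration. Note the doubled backslashes: in a Java string literal, "\\1" becomes the regex backreference \1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubledChars {
    // (\w)\1: group 1 captures one word character; \1 must then match that same character again
    private static final Pattern DOUBLED = Pattern.compile("(\\w)\\1");

    // Hypothetical helper: returns every doubled pair found, scanning left to right
    static List<String> findDoubled(String text) {
        List<String> pairs = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {
            pairs.add(m.group()); // group() is the whole match, e.g. "ee"
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(findDoubled("see the class of 2011")); // [ee, ss, 11]
        // The backreference demands the same text; a second \w accepts any character
        System.out.println(Pattern.matches("(\\w)\\1", "ea")); // false
        System.out.println(Pattern.matches("(\\w)\\w", "ea")); // true
    }
}
```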

Accessing Captured Groups in Java Code

After a successful match, you can retrieve captured groups using the Matcher.group(int groupNumber) method:

Example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "123-45-6789";
Pattern pattern = Pattern.compile("(\\d{3})-(\\d{2})-(\\d{4})");
Matcher matcher = pattern.matcher(input);

if (matcher.matches()) {
    System.out.println("Full match: " + matcher.group(0));
    System.out.println("Group 1 (area): " + matcher.group(1));
    System.out.println("Group 2 (group): " + matcher.group(2));
    System.out.println("Group 3 (serial): " + matcher.group(3));
}

Why Grouping and Backreferencing Matter

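Mastering capturing groups and backreferences lets you write patterns that do more than test whether text matches: they can pull specific pieces out of the text, detect repeated content, and supply the building blocks for search-and-replace operations.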

4.2 Non-capturing groups (?:...)

In regular expressions, parentheses ( ) typically create capturing groups that store the matched substring for later use. However, sometimes you need to group parts of a pattern without capturing or storing what was matched. This is where non-capturing groups come in.

What Are Non-capturing Groups?

Non-capturing groups have the syntax:

(?:pattern)

The ?: immediately after the opening parenthesis tells the regex engine not to capture the content matched by this group.

Why Use Non-capturing Groups?

  1. Performance: Because the engine does not need to record what a non-capturing group matched, it can be slightly faster and use slightly less memory, although the difference is usually negligible. The benefit matters most when you need grouping only to control the pattern’s structure, not to extract data.

  2. Clarity: Every capturing group receives a number, so an unneeded capturing group shifts the numbers of all groups after it and makes the code that reads them harder to follow. Non-capturing groups keep the numbering focused on the captures you actually use.
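The numbering effect in point 2 can be checked with a small sketch. The URL-like pattern, the class name GroupNumbering, and the helper host are hypothetical, used only to show how one extra capturing group changes which number the host name ends up in.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Capturing version: (https?) is group 1, (www\.)? is group 2, so the host is group 3
    static final Pattern CAPTURING = Pattern.compile("(https?)://(www\\.)?([^/]+)");
    // Non-capturing version: (?:www\.)? gets no number, so the host becomes group 2
    static final Pattern NON_CAPTURING = Pattern.compile("(https?)://(?:www\\.)?([^/]+)");

    // Hypothetical helper: matches at the start of the URL and returns the requested group
    static String host(Pattern p, int groupNumber, String url) {
        Matcher m = p.matcher(url);
        return m.lookingAt() ? m.group(groupNumber) : null;
    }

    public static void main(String[] args) {
        String url = "https://www.example.com/docs";
        System.out.println(CAPTURING.matcher(url).groupCount());     // 3
        System.out.println(NON_CAPTURING.matcher(url).groupCount()); // 2
        System.out.println(host(CAPTURING, 3, url));                 // example.com
        System.out.println(host(NON_CAPTURING, 2, url));             // example.com
    }
}
```

With the non-capturing version, the only numbered groups are the two you actually read, which keeps the calling code simpler.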

When to Use Non-capturing Groups

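Non-capturing groups are especially helpful when you need parentheses only to apply a quantifier to a sequence of elements, or to limit the scope of an alternation such as (?:jpg|png), and you have no use for the matched text itself.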

Example: Capturing vs. Non-capturing Group

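Capturing group:

(ab)+

This matches "ababab" and stores group 1. When a capturing group is repeated, it keeps only the text of the last repetition, so group 1 holds "ab".

Non-capturing group:

(?:ab)+

This matches exactly the same text, but the pattern has no group 1 at all.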
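The difference between the two patterns can be verified with a short sketch. The class RepeatedGroup and its helpers are hypothetical; groupCount() and group(int) are standard Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroup {
    // Number of capturing groups a regex defines (group 0, the whole match, is not counted)
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // With (ab)+, group 1 holds only the text of the LAST repetition
    static String lastRepetition(String input) {
        Matcher m = Pattern.compile("(ab)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(ab)+"));      // 1
        System.out.println(groupCount("(?:ab)+"));    // 0
        System.out.println(lastRepetition("ababab")); // ab
        // Both patterns match the same text; only the capturing one stores a group
        System.out.println(Pattern.matches("(?:ab)+", "ababab")); // true
    }
}
```

Calling group(1) on a matcher for (?:ab)+ would throw IndexOutOfBoundsException, since that pattern defines no group 1.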

Summary

Use capturing groups when you need to extract or reference matched substrings later. Use non-capturing groups when you only need grouping for structure or repetition, but do not want to store the match, helping improve regex clarity and performance.

4.3 Named capturing groups

Named capturing groups provide a clearer and more maintainable way to extract matched substrings from regular expressions. Instead of relying on numbered groups like group(1) or group(2), you assign meaningful names to groups, making your regex and code easier to read and understand.

Syntax of Named Capturing Groups in Java

Java supports named capturing groups using the syntax:

(?<name>pattern)

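Here, name is an identifier you choose for the group, and pattern is the regex portion that you want to capture. In Java, a group name must begin with a letter and may contain only ASCII letters and digits, so names such as year or day2 are valid but start_date is not.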

For example:

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

This pattern captures a date in YYYY-MM-DD format and names each part of the date.

Retrieving Named Groups in Java

Once you compile your regex with named groups and perform a match, you can retrieve the captured values by their group names instead of numbers. Use the Matcher.group(String name) method:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "2025-06-22";
Pattern pattern = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher matcher = pattern.matcher(input);

if (matcher.matches()) {
    String year = matcher.group("year");
    String month = matcher.group("month");
    String day = matcher.group("day");

    System.out.println("Year: " + year);
    System.out.println("Month: " + month);
    System.out.println("Day: " + day);
}

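Benefits of Named Groups

Named groups make a pattern self-documenting: group("month") says what it returns, whereas group(2) does not. They also make code more robust, because adding or removing a capturing group elsewhere in the pattern shifts group numbers but leaves names unchanged. Named groups are still numbered from left to right as well, so numeric access such as group(1) continues to work.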
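Names are also usable outside group(String). As a minimal sketch, assuming the same YYYY-MM-DD pattern as above, the following shows that a named group is reachable by number too, and that Matcher.replaceAll accepts ${name} references in the replacement string. The class NamedGroupExtras and the helper toDayMonthYear are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupExtras {
    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Hypothetical helper: rewrites each date as DD/MM/YYYY using ${name} in the replacement
    static String toDayMonthYear(String text) {
        return DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        Matcher m = DATE.matcher("2025-06-22");
        if (m.matches()) {
            // Named groups are numbered left to right as well, so both lookups return "2025"
            System.out.println(m.group("year") + " " + m.group(1)); // 2025 2025
        }
        System.out.println(toDayMonthYear("Due 2025-06-22, paid 2025-07-01"));
        // Due 22/06/2025, paid 01/07/2025
    }
}
```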

Java Version Requirements

Named capturing groups were introduced in Java 7, so they are available in any Java 7 or newer release.

Named capturing groups enhance both the clarity and usability of regex in Java, especially in complex patterns where many groups are involved.

4.4 Example: Extracting date components from strings

Extracting specific parts of a date—such as the day, month, and year—from text is a common task that showcases the power of capturing groups in regex. In this example, we use named capturing groups to clearly identify each date component, making the code easier to read and maintain.

Regex Pattern for Dates

We’ll work with dates written as YYYY-MM-DD or YYYY/MM/DD. The regex pattern below accepts either a dash (-) or a slash (/) as the separator and captures the year, month, and day with named groups:

(?<year>\d{4})[-/](?<month>0[1-9]|1[0-2])[-/](?<day>0[1-9]|[12]\d|3[01])

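Let’s break it down:

  1. (?<year>\d{4}) captures four digits as the year.
  2. [-/] matches one separator, either a dash or a slash.
  3. (?<month>0[1-9]|1[0-2]) captures a month from 01 to 12.
  4. [-/] matches the second separator.
  5. (?<day>0[1-9]|[12]\d|3[01]) captures a day from 01 to 31.

The pattern checks the format, not the calendar. The two separator classes are independent, so mixed input such as 2023-06/22 also matches, and a day of 31 is accepted for every month, so 2023-02-31 matches too.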
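Because the two [-/] classes in this pattern are independent, a string such as 2023-06/22 with mixed separators is accepted. If you want both separators to be the same, one option is to capture the first separator in a named group and repeat it with the named backreference \k<sep>, which Java has supported alongside named groups since Java 7. The following is a minimal sketch under that assumption; the class ConsistentSeparator and the group name sep are hypothetical.

```java
import java.util.regex.Pattern;

public class ConsistentSeparator {
    // (?<sep>[-/]) captures the first separator; \k<sep> requires the second to be identical
    static final Pattern STRICT = Pattern.compile(
            "(?<year>\\d{4})(?<sep>[-/])(?<month>0[1-9]|1[0-2])\\k<sep>(?<day>0[1-9]|[12]\\d|3[01])");

    // The pattern from this section: the two [-/] classes are independent
    static final Pattern LOOSE = Pattern.compile(
            "(?<year>\\d{4})[-/](?<month>0[1-9]|1[0-2])[-/](?<day>0[1-9]|[12]\\d|3[01])");

    public static void main(String[] args) {
        for (String s : new String[] {"2023-06-22", "2024/01/15", "2023-06/22"}) {
            System.out.printf("%s loose=%b strict=%b%n",
                    s, LOOSE.matcher(s).matches(), STRICT.matcher(s).matches());
        }
        // 2023-06-22 loose=true strict=true
        // 2024/01/15 loose=true strict=true
        // 2023-06/22 loose=true strict=false
    }
}
```

Adding the sep group shifts the numbers of the month and day groups, but code that reads them by name is unaffected, which is one of the practical advantages of named groups.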

Java Code Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateExtractor {
    public static void main(String[] args) {
        String[] dates = {
            "2023-06-22",
            "2024/01/15",
            "1999-12-31",
            "2021/07/04"
        };

        // Regex with named capturing groups for year, month, and day
        String datePattern = "(?<year>\\d{4})[-/](?<month>0[1-9]|1[0-2])[-/](?<day>0[1-9]|[12]\\d|3[01])";

        Pattern pattern = Pattern.compile(datePattern);

        for (String date : dates) {
            Matcher matcher = pattern.matcher(date);

            if (matcher.matches()) {
                String year = matcher.group("year");
                String month = matcher.group("month");
                String day = matcher.group("day");

                System.out.printf("Date: %s -> Year: %s, Month: %s, Day: %s%n",
                                  date, year, month, day);
            } else {
                System.out.println("Invalid date format: " + date);
            }
        }
    }
}

Output

Date: 2023-06-22 -> Year: 2023, Month: 06, Day: 22
Date: 2024/01/15 -> Year: 2024, Month: 01, Day: 15
Date: 1999-12-31 -> Year: 1999, Month: 12, Day: 31
Date: 2021/07/04 -> Year: 2021, Month: 07, Day: 04

Explanation

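The pattern is compiled once, outside the loop, and a new Matcher is created for each input string. Because matches() requires the entire string to fit the pattern, input with any extra text around the date would be reported as invalid. On success, group(name) returns each component as a String, which you can convert with Integer.parseInt if you need numeric values. This example illustrates how capturing groups, especially named groups, simplify extracting structured data from text, improving code readability and maintainability.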
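To extract dates embedded in longer text rather than validate standalone strings, you can swap matches() for a find() loop. A minimal sketch, assuming the same date pattern wrapped in \b word boundaries so that digits glued to other word characters (as in 12023-06-221) are skipped; the class DateFinder, the helper findDates, and the year|month|day output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateFinder {
    // Same date pattern as above, wrapped in \b so partial numbers inside longer digit runs are ignored
    private static final Pattern DATE = Pattern.compile(
            "\\b(?<year>\\d{4})[-/](?<month>0[1-9]|1[0-2])[-/](?<day>0[1-9]|[12]\\d|3[01])\\b");

    // Hypothetical helper: returns each date found as "year|month|day"
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) { // find() scans for the next match anywhere in the text
            found.add(m.group("year") + "|" + m.group("month") + "|" + m.group("day"));
        }
        return found;
    }

    public static void main(String[] args) {
        String log = "Released 2023-06-22, patched 2024/01/15; ticket 12023-06-221 ignored.";
        System.out.println(findDates(log)); // [2023|06|22, 2024|01|15]
    }
}
```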
