Capturing groups are one of the most fundamental and powerful features of regular expressions. By placing part of a regex pattern inside parentheses ( ), you create a capturing group. This group not only groups the pattern elements logically but also captures the matched substring for later use.
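When your regex matches a string, each capturing group remembers the exact substring that matched inside its parentheses. This allows you to extract parts of a match, refer back to them later in the same pattern, and apply a quantifier to the group as a whole.
For example, in the regex: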
(\d{3})-(\d{2})-(\d{4})
which matches a pattern like a Social Security Number (123-45-6789):
- Group 1 captures the first three digits (123),
- Group 2 captures the middle two digits (45),
- Group 3 captures the last four digits (6789).
Backreferences allow the regex to refer back to a previously captured group. They are written as \1, \2, etc., where the number corresponds to the group number.
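For instance, the regex
(\w)\1
matches two identical consecutive word characters, such as "ee" or "ss":
- (\w) captures a single word character (a letter, digit, or underscore),
- \1 matches the same character again immediately after.
A backreference matches the exact text the group captured, not the group's pattern again, so it can detect repeated substrings without spelling them out in the pattern.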
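The backreference described above can be tried out with a short program. This is a minimal sketch: the class name DoubledCharFinder and the method findDoubled are hypothetical names chosen for illustration, and the sample input is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DoubledCharFinder {
    // (\w) captures one word character; \1 requires that same character again.
    private static final Pattern DOUBLED = Pattern.compile("(\\w)\\1");

    // Returns every doubled pair found in the input, in order of appearance.
    static List<String> findDoubled(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = DOUBLED.matcher(input);
        while (matcher.find()) {
            pairs.add(matcher.group(0)); // the whole two-character match
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(findDoubled("see the glass")); // prints [ee, ss]
    }
}
```

Because \w also covers digits and underscores, an input such as "a11b" would yield "11" as well.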
After a successful match, you can retrieve captured groups using the Matcher.group(int groupNumber) method:
- group(0) returns the entire matched substring,
- group(1) returns the first capturing group,
- group(2), group(3), and so on return subsequent groups.
Example:
String input = "123-45-6789";
Pattern pattern = Pattern.compile("(\\d{3})-(\\d{2})-(\\d{4})");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    System.out.println("Full match: " + matcher.group(0));
    System.out.println("Group 1 (area): " + matcher.group(1));
    System.out.println("Group 2 (group): " + matcher.group(2));
    System.out.println("Group 3 (serial): " + matcher.group(3));
}
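The snippet above uses matches(), which succeeds only when the entire input fits the pattern. When the pattern may occur several times inside longer text, Matcher.find() can be called in a loop instead, and group(n) then refers to the current match. The following is a sketch under that assumption; the class name SerialExtractor, the method lastFourDigits, and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class SerialExtractor {
    private static final Pattern SSN = Pattern.compile("(\\d{3})-(\\d{2})-(\\d{4})");

    // Collects group 3 (the last four digits) of every match in the text.
    static List<String> lastFourDigits(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SSN.matcher(text);
        while (matcher.find()) {          // advance to the next match, if any
            result.add(matcher.group(3)); // groups belong to the current match
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [6789, 4321]
        System.out.println(lastFourDigits("A: 123-45-6789, B: 987-65-4321"));
    }
}
```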
A quantifier placed after a group applies to the group as a whole: (ab)+ matches ab, abab, ababab, etc.
Mastering capturing groups and backreferences is key to writing efficient and effective regex patterns that can both match and manipulate text flexibly.
Non-Capturing Groups (?:...)
In regular expressions, parentheses ( ) typically create capturing groups that store the matched substring for later use. However, sometimes you need to group parts of a pattern without capturing or storing what was matched. This is where non-capturing groups come in.
Non-capturing groups have the syntax:
(?:pattern)
The ?: immediately after the opening parenthesis tells the regex engine not to capture the content matched by this group.
Performance: Since the regex engine does not need to save the matched substring, non-capturing groups can be slightly faster and use slightly less memory, although the difference is usually small. This is useful when you only need grouping to control the pattern’s structure, not to extract data.
Clarity: Non-capturing groups prevent unnecessary clutter in the group numbering. Capturing groups increase group numbers, which can complicate accessing groups in code. Using non-capturing groups keeps your group numbers focused only on meaningful captures.
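The effect on group numbering can be seen in a small sketch. The title-and-surname pattern, the class name TitleNameDemo, and its methods are hypothetical examples chosen for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class TitleNameDemo {
    // The titles are grouped with (?:...), so the surname is group 1.
    private static final Pattern NON_CAPTURING = Pattern.compile("(?:Mr|Ms|Dr)\\. (\\w+)");

    // Returns the surname, or null if the input does not match.
    static String surname(String input) {
        Matcher m = NON_CAPTURING.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Number of capturing groups declared in a pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(surname("Dr. Smith"));                   // prints Smith
        System.out.println(groupCount("(?:Mr|Ms|Dr)\\. (\\w+)"));   // prints 1
        // With a capturing group around the titles, the surname moves to group 2.
        System.out.println(groupCount("(Mr|Ms|Dr)\\. (\\w+)"));     // prints 2
    }
}
```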
Non-capturing groups are especially helpful when:
- You want to apply quantifiers (*, +, {n,m}) to multiple elements as a group, but don't need to extract the matched substring.
- You use the alternation | operator to define multiple options without capturing each alternative.
Capturing group:
(ab)+
- Matches one or more repetitions of "ab".
- Captures the last matched "ab" sequence (group 1).
Non-capturing group:
(?:ab)+
- Matches one or more repetitions of "ab".
- Does not capture anything (no stored "ab" sequences).
Use capturing groups when you need to extract or reference matched substrings later. Use non-capturing groups when you only need grouping for structure or repetition, but do not want to store the match, helping improve regex clarity and performance.
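The difference between the two forms can be checked with a brief sketch. Since every repetition of (ab) captures the same text, the example also uses the variant (a\d)+, an assumption added here only to make the "last repetition wins" behaviour visible. The class name RepeatedGroupDemo and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class RepeatedGroupDemo {
    // Returns what group 1 holds after a full match, or null if there is no match.
    static String lastIteration(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Number of capturing groups declared in a pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(ab)+"));                    // prints 1
        System.out.println(groupCount("(?:ab)+"));                  // prints 0
        System.out.println(lastIteration("(ab)+", "ababab"));       // prints ab
        // A quantified capturing group keeps only its final repetition.
        System.out.println(lastIteration("(a\\d)+", "a1a2a3"));     // prints a3
    }
}
```

Both (ab)+ and (?:ab)+ accept the same inputs; only the stored groups differ.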
Named capturing groups provide a clearer and more maintainable way to extract matched substrings from regular expressions. Instead of relying on numbered groups like group(1) or group(2), you assign meaningful names to groups, making your regex and code easier to read and understand.
Java supports named capturing groups using the syntax:
(?<name>pattern)
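Here, name is an identifier you choose for the group, and pattern is the regex portion that you want to capture. In Java, a group name must start with an ASCII letter and may contain only ASCII letters and digits.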
For example:
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
This pattern captures a date in YYYY-MM-DD format and names each part of the date.
Once you compile your regex with named groups and perform a match, you can retrieve the captured values by their group names instead of numbers. Use the Matcher.group(String name) method:
String input = "2025-06-22";
Pattern pattern = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    String year = matcher.group("year");
    String month = matcher.group("month");
    String day = matcher.group("day");
    System.out.println("Year: " + year);
    System.out.println("Month: " + month);
    System.out.println("Day: " + day);
}
Named capturing groups were introduced in Java 7, so they are available in Java 7 and every later version.
Named capturing groups enhance both the clarity and usability of regex in Java, especially in complex patterns where many groups are involved.
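Named groups in Java are still numbered from left to right, so a value can be read either by name or by number, and asking for a name that the pattern does not define throws IllegalArgumentException. The sketch below illustrates both points; the class name NamedAndNumbered and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class NamedAndNumbered {
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // True if each named group returns the same text as its positional number.
    static boolean sameByNameAndNumber(String input) {
        Matcher m = DATE.matcher(input);
        return m.matches()
                && m.group("year").equals(m.group(1))
                && m.group("month").equals(m.group(2))
                && m.group("day").equals(m.group(3));
    }

    // True if looking up the given group name is rejected by the matcher.
    static boolean isUnknownGroup(String input, String name) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {
            return false;
        }
        try {
            m.group(name);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // the pattern defines no group with this name
        }
    }

    public static void main(String[] args) {
        System.out.println(sameByNameAndNumber("2025-06-22"));      // prints true
        System.out.println(isUnknownGroup("2025-06-22", "hour"));   // prints true
    }
}
```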
Extracting specific parts of a date—such as the day, month, and year—from text is a common task that showcases the power of capturing groups in regex. In this example, we use named capturing groups to clearly identify each date component, making the code easier to read and maintain.
We’ll work with a common date format: YYYY-MM-DD or YYYY/MM/DD. The regex pattern below handles both dash - and slash / as separators and captures the year, month, and day with named groups:
(?<year>\d{4})[-/](?<month>0[1-9]|1[0-2])[-/](?<day>0[1-9]|[12]\d|3[01])
Let’s break it down:
- (?<year>\d{4}) captures exactly four digits as the year.
- [-/] matches either a dash - or a slash / as the separator.
- (?<month>0[1-9]|1[0-2]) captures the month, allowing values from 01 to 12.
- A second [-/] matches another separator.
- (?<day>0[1-9]|[12]\d|3[01]) captures the day, allowing values from 01 to 31.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateExtractor {
    public static void main(String[] args) {
        String[] dates = {
            "2023-06-22",
            "2024/01/15",
            "1999-12-31",
            "2021/07/04"
        };
        // Regex with named capturing groups for year, month, and day
        String datePattern = "(?<year>\\d{4})[-/](?<month>0[1-9]|1[0-2])[-/](?<day>0[1-9]|[12]\\d|3[01])";
        Pattern pattern = Pattern.compile(datePattern);
        for (String date : dates) {
            Matcher matcher = pattern.matcher(date);
            if (matcher.matches()) {
                String year = matcher.group("year");
                String month = matcher.group("month");
                String day = matcher.group("day");
                System.out.printf("Date: %s -> Year: %s, Month: %s, Day: %s%n",
                        date, year, month, day);
            } else {
                System.out.println("Invalid date format: " + date);
            }
        }
    }
}
Date: 2023-06-22 -> Year: 2023, Month: 06, Day: 22
Date: 2024/01/15 -> Year: 2024, Month: 01, Day: 15
Date: 1999-12-31 -> Year: 1999, Month: 12, Day: 31
Date: 2021/07/04 -> Year: 2021, Month: 07, Day: 04
- The regex matches dates that use - or / as separators.
- Using matcher.matches(), we check if the entire string matches the pattern.
- Named groups are retrieved with matcher.group("name").
This example illustrates how capturing groups, especially named groups, simplify extracting structured data from text, improving code readability and maintainability.
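One detail of the date pattern is that its two [-/] separators are matched independently, so a mixed form such as 2023-06/22 is accepted. If that is not wanted, a backreference can require the second separator to repeat the first. The sketch below is one possible variant, not part of the original example: it captures the separator in a named group (sep, a name introduced here) and refers back to it with \k<sep>. The class name StrictDateMatcher and its methods are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class StrictDateMatcher {
    // Original pattern: each separator may be - or / independently.
    private static final Pattern LENIENT = Pattern.compile(
            "(?<year>\\d{4})[-/](?<month>0[1-9]|1[0-2])[-/](?<day>0[1-9]|[12]\\d|3[01])");

    // Variant: \k<sep> must match the same text that the sep group captured.
    private static final Pattern STRICT = Pattern.compile(
            "(?<year>\\d{4})(?<sep>[-/])(?<month>0[1-9]|1[0-2])\\k<sep>(?<day>0[1-9]|[12]\\d|3[01])");

    static boolean matchesLenient(String date) {
        return LENIENT.matcher(date).matches();
    }

    static boolean matchesStrict(String date) {
        return STRICT.matcher(date).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesLenient("2023-06/22")); // prints true
        System.out.println(matchesStrict("2023-06/22"));  // prints false
        System.out.println(matchesStrict("2023-06-22"));  // prints true
        System.out.println(matchesStrict("2024/01/15"));  // prints true
    }
}
```

Neither pattern checks calendar validity: a string such as 2023-02-31 still matches, so a real date check would need an additional step beyond the regex.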