In Java regular expressions, quantifiers define how many times a pattern element may repeat. By default, these quantifiers are greedy, meaning they match as much text as possible. However, when this behavior causes overmatching, reluctant quantifiers offer a solution by matching as little text as necessary.
A greedy quantifier tries to consume as many characters as possible while still allowing the overall pattern to match. Common greedy quantifiers include:
- * matches zero or more times
- + matches one or more times
- ? matches zero or one time
- {n,m} matches between n and m times

For example:
String input = "<b>Hello</b><b>World</b>";
String regex = "<b>.*</b>";
This greedy pattern will match:
<b>Hello</b><b>World</b>
because .* consumes everything between the first <b> and the last </b>, leading to overmatching.
Reluctant quantifiers do the opposite of greedy ones: they match as little as possible, expanding only when needed to satisfy the rest of the pattern. You can make a quantifier reluctant by appending ? to it:
- *? matches zero or more times (reluctant)
- +? matches one or more times (reluctant)
- ?? matches zero or one time (reluctant)
- {n,m}? matches between n and m times (reluctant)

Using the same input:
String regex = "<b>.*?</b>";
Now there are two separate matches:
<b>Hello</b>
<b>World</b>
This happens because .*? matches the smallest possible substring between <b> and </b>, avoiding overmatching.
Pattern | Match Result |
---|---|
<b>.*</b> | <b>Hello</b><b>World</b> |
<b>.*?</b> | <b>Hello</b> and <b>World</b> |
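The comparison above can be checked with a short program. This is only a minimal sketch: the class name GreedyVsReluctant and the helper findAll are made up for illustration and are not part of java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<b>Hello</b><b>World</b>";
        // Greedy: a single match that spans both elements.
        System.out.println(findAll("<b>.*</b>", input));
        // Reluctant: one match per element.
        System.out.println(findAll("<b>.*?</b>", input));
    }
}
```

The greedy pattern should yield a list with one element, and the reluctant pattern a list with two.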
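Reluctant quantifiers are useful when a greedy quantifier would overmatch, for example when a delimiter such as a closing tag appears more than once in the input. In short:
- Greedy quantifiers match as much as possible (*, +, etc.).
- Reluctant quantifiers match as little as necessary (*?, +?, etc.).

Understanding this distinction helps you write more precise regex patterns and avoid subtle bugs in text parsing.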
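The same greedy/reluctant contrast applies to every quantifier form, not only to the star. The sketch below uses a hypothetical helper, firstMatch, and small inputs chosen purely for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierForms {
    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a+", "aaa"));        // aaa (greedy takes all)
        System.out.println(firstMatch("a+?", "aaa"));       // a   (reluctant takes one)
        System.out.println(firstMatch("ab?", "ab"));        // ab  (greedy takes the optional b)
        System.out.println(firstMatch("ab??", "ab"));       // a   (reluctant skips the optional b)
        System.out.println(firstMatch("a{2,4}", "aaaaa"));  // aaaa (greedy takes the maximum)
        System.out.println(firstMatch("a{2,4}?", "aaaaa")); // aa   (reluctant takes the minimum)
    }
}
```

In each pair nothing follows the quantifier in the pattern, so the reluctant form stops at its minimum count.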
Possessive quantifiers are a more advanced type of quantifier in regular expressions that instruct the regex engine to match as much as possible without allowing any backtracking. This behavior makes them useful in performance-critical scenarios but can also lead to unexpected failed matches if not used carefully.
Possessive quantifiers are created by appending a + to the end of a standard greedy quantifier:
- *+ matches zero or more times (possessive)
- ++ matches one or more times (possessive)
- ?+ matches zero or one time (possessive)
- {n,m}+ matches between n and m times (possessive)

Unlike greedy quantifiers (which backtrack if a later part of the pattern fails), possessive quantifiers never backtrack. Once they consume characters, they keep them, no matter what.
Possessive quantifiers can reduce backtracking, but they can also make a pattern fail where its greedy counterpart succeeds:
String input = "aaab";
String greedy = "a+ab"; // Matches
String possessive = "a++ab"; // Fails
In the greedy version, a+ first matches "aaa", which leaves no a for the literal a that follows. The engine backtracks: a+ releases one a, and the remaining ab then matches the rest of the input.
In the possessive version, a++ consumes all three a characters and refuses to give any back, so the literal a that follows has nothing to match and the whole pattern fails.
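A possessive quantifier only causes a failure when the rest of the pattern needs characters it has already taken. The following sketch illustrates this with three small patterns chosen for illustration; matchesWhole is a hypothetical wrapper around Pattern.matches.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Hypothetical wrapper: true if the entire input matches regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String input = "aaab";
        // Greedy a+ gives one 'a' back so the literal 'a' can match.
        System.out.println(matchesWhole("a+ab", input));   // true
        // Possessive a++ keeps all three, so the literal 'a' cannot match.
        System.out.println(matchesWhole("a++ab", input));  // false
        // No backtracking is needed here, so possessive works fine.
        System.out.println(matchesWhole("a++b", input));   // true
    }
}
```

The third case shows why possessive quantifiers are safe when the next pattern element can never match what the quantifier consumed.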
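Consider parsing large strings or logs with patterns that might otherwise cause performance issues:
String regex = "[^@\\s]++@example\\.com";
Because [^@\s] can never match @, giving characters back could not help the rest of the pattern succeed, so making the quantifier possessive skips that futile backtracking without changing what the pattern matches. A possessive .*+ would not work in this position: it consumes the rest of the line, including the @example.com suffix, and never releases it.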
Possessive quantifiers are powerful, but can easily cause false negatives (no match found when one should be). Avoid using them when the pattern depends on backtracking to succeed.
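Possessive quantifiers match as much as possible, never backtrack, and can speed up matching at the risk of missed matches. Use them thoughtfully, especially when optimizing complex or repetitive patterns.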
Understanding the behavior of greedy, reluctant, and possessive quantifiers is crucial for building correct and efficient regular expressions. This section demonstrates how each quantifier behaves differently—even when used with the same pattern and input.
We’ll use the following input string:
String input = "<tag>first</tag><tag>second</tag>";
Our goal is to match each <tag>...</tag> block.
Pattern pattern = Pattern.compile("<tag>.*</tag>");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println("Greedy match: " + matcher.group());
}
Output:
Greedy match: <tag>first</tag><tag>second</tag>
Explanation: The greedy .* consumes as much as possible while still allowing the pattern to match. It starts at the first <tag> and captures everything until the last </tag>. This is overmatching.
Pattern pattern = Pattern.compile("<tag>.*?</tag>");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println("Reluctant match: " + matcher.group());
}
Output:
Reluctant match: <tag>first</tag>
Reluctant match: <tag>second</tag>
Explanation: The .*? matches as little as possible to satisfy the full pattern. It captures each <tag>...</tag> block individually. This is the desired behavior when extracting multiple elements.
Pattern pattern = Pattern.compile("<tag>.*+</tag>");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println("Possessive match: " + matcher.group());
}
Output:
(No match; the loop body never runs)
Explanation: The possessive .*+ consumes all characters after the first <tag>, up to the end of the input, and refuses to give any back. When the engine then tries to match </tag>, nothing is left to match against, so the attempt fails. Retrying from the second <tag> fails the same way, so the pattern finds nothing.
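A possessive quantifier can still extract each block if it is applied to something that cannot run past the closing tag. The sketch below assumes the tag contents never contain a < character, which is an assumption added here and not something the example input guarantees in general; findAll is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveTagDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<tag>first</tag><tag>second</tag>";
        // .*+ runs to the end of the input and never gives anything back.
        System.out.println(findAll("<tag>.*+</tag>", input));     // []
        // [^<]*+ stops at the next '<', so </tag> can still match.
        System.out.println(findAll("<tag>[^<]*+</tag>", input));
    }
}
```

With the negated character class, the second call should return both blocks, because the quantifier never consumes the < that starts the closing tag.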
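In large inputs, possessive quantifiers can improve performance by preventing excessive backtracking. For example:
Pattern pattern = Pattern.compile("[^@\\s]++@example\\.com");
Because [^@\s] cannot match @, the possessive quantifier never needs to give characters back, so the engine does not waste time retrying shorter candidates while scanning large text bodies for email addresses. A possessive .*+ would not work here: it would swallow the @example.com suffix along with everything else on the line and never release it.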
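As a rough illustration of a possessive quantifier in front of a known suffix, the sketch below scans a sample sentence for addresses at example.com. The sample text, the helper findAll, and the choice of [^@\s] as the character class for the part before the @ are assumptions made for this example, not a complete email grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveSuffixDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Contact alice@example.com or bob@example.org, then carol@example.com.";
        // The class excludes '@' and whitespace, so the possessive quantifier
        // stops right before the '@' and the suffix can still match.
        System.out.println(findAll("[^@\\s]++@example\\.com", text));
        // A possessive .*+ consumes the suffix as well, so nothing matches.
        System.out.println(findAll(".*+@example\\.com", text));   // []
    }
}
```

The first call should return the two example.com addresses and skip the example.org one.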
Quantifier | Behavior | Use When |
---|---|---|
.* (greedy) | Matches as much as possible | General use, but can overmatch |
.*? (reluctant) | Matches as little as needed | Precise extraction of segments |
.*+ (possessive) | Matches as much as possible, no backtracking | Prevent backtracking/performance |
Choosing the right quantifier depends on your intent: whether you want all data, minimal matches, or performance optimization without flexibility.
One common challenge in text processing is extracting repeated structures like HTML tags. If you use a greedy quantifier, your pattern may unintentionally match everything from the first opening tag to the last closing tag. Reluctant quantifiers can solve this problem by matching as little as possible—just enough to satisfy the pattern.
Suppose you have the following HTML fragment:
<div>Hello</div><div>World</div>
You want to extract each <div>...</div> pair individually.
Let’s see what happens if we use a greedy quantifier (.*):
Pattern pattern = Pattern.compile("<div>.*</div>");
Matcher matcher = pattern.matcher("<div>Hello</div><div>World</div>");
while (matcher.find()) {
    System.out.println("Match: " + matcher.group());
}
Output:
Match: <div>Hello</div><div>World</div>
Explanation: The .* greedily matches everything between the first <div> and the last </div>, resulting in a single match that swallows both elements. This is known as overmatching.
We can fix this with a reluctant quantifier (.*?):
import java.util.regex.*;

public class ExtractDivTags {
    public static void main(String[] args) {
        String input = "<div>Hello</div><div>World</div>";
        Pattern pattern = Pattern.compile("<div>.*?</div>");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.println("Extracted: " + matcher.group());
        }
    }
}
Output:
Extracted: <div>Hello</div>
Extracted: <div>World</div>
Explanation: Here, .*? matches the smallest possible string that still fits the <div>...</div> pattern. It matches up to the nearest closing tag, giving us the correct, separate results.
Now let’s try a possessive quantifier (.*+):
Pattern pattern = Pattern.compile("<div>.*+</div>");
This will fail completely, producing no matches. The possessive quantifier grabs everything after the first <div> and won’t backtrack, so the closing </div> cannot be matched. Possessive quantifiers are useful for performance but unsuitable when backtracking is required for correctness.
Quantifier Type | Result |
---|---|
Greedy (.*) | Overmatches across multiple tags |
Reluctant (.*?) | Matches each tag pair precisely |
Possessive (.*+) | Fails to match due to no backtracking |
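Use reluctant quantifiers when extracting repeated, non-nested structures such as sibling HTML tags. They help prevent overmatching and ensure your pattern behaves as intended. Nested elements of the same type are a different problem: a reluctant match stops at the first closing tag it finds, which may belong to an inner element.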
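In practice you often want only the text between the tags, not the tags themselves. One possible approach, sketched here with a hypothetical class DivContents and method innerTexts, is to wrap the reluctant quantifier in a capturing group and read group(1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivContents {
    // Group 1 captures only the text between <div> and the nearest </div>.
    private static final Pattern DIV = Pattern.compile("<div>(.*?)</div>");

    // Hypothetical helper: returns the inner text of each <div> element.
    static List<String> innerTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = DIV.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(innerTexts("<div>Hello</div><div>World</div>")); // [Hello, World]
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call, which is a common choice when the same pattern is reused.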