\b and \B
In regular expressions, word boundaries are zero-width assertions—they do not consume characters, but rather match a position within the input string. They are especially useful when you want to match whole words without accidentally matching parts of longer words.
\b Word Boundary
The \b assertion matches a position where a word character (typically [a-zA-Z0-9_]) is adjacent to a non-word character (such as whitespace or punctuation) or the start/end of the string.
Example:
Pattern pattern = Pattern.compile("\\bcat\\b");
Matcher matcher = pattern.matcher("A cat sat on the cathedral.");
while (matcher.find()) {
    System.out.println("Match: " + matcher.group());
}
Output:
Match: cat
Explanation: Here, \\bcat\\b matches only the whole word "cat", not the "cat" in "cathedral".
\B Not a Word Boundary
The \B assertion is the inverse of \b. It matches any position that is not a word boundary. This is useful when you want to ensure that a substring occurs within a word, rather than at the start or end.
Example:
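Pattern pattern = Pattern.compile("\\Bcat\\B");
Matcher matcher = pattern.matcher("A cat scattered the leaves near the cathedral.");
while (matcher.find()) {
    System.out.println("Match: " + matcher.group());
}
Output:
Match: cat
Explanation: This pattern matches only the "cat" inside "scattered", where it has word characters on both sides. It matches neither the standalone word "cat" nor the "cat" in "cathedral", because each of those begins at a word boundary.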
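To make the position rule concrete, the following sketch checks \Bcat\B against a few individual words. The class name NonBoundaryDemo and the helper insideWord are illustrative names, not part of any library, and the word list is an assumption chosen to cover "cat" at the start, end, and middle of a word.

```java
import java.util.regex.Pattern;

class NonBoundaryDemo {
    // \B on both sides: "cat" must have a word character before AND after it.
    private static final Pattern INSIDE = Pattern.compile("\\Bcat\\B");

    // Hypothetical helper: true if "cat" occurs strictly inside a word in the text.
    static boolean insideWord(String text) {
        return INSIDE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(insideWord("cat"));        // false: boundaries on both sides
        System.out.println(insideWord("cathedral"));  // false: "cat" starts the word
        System.out.println(insideWord("bobcat"));     // false: "cat" ends the word
        System.out.println(insideWord("scattered"));  // true: letters on both sides
    }
}
```

If only one side needs to be inside a word, \B can be used on that side alone; for example, \Bcat also matches the "cat" in "bobcat".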
Escaping \b in Java Strings: Because \b is also the backspace character in Java strings, you must escape it as \\b in your regex pattern.
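Using \b with non-word characters: If you try to use \b around a symbol or punctuation (e.g., \b$100\b), it won't match as expected. Since $ is not a word character, the \b in front of it succeeds only when a word character comes immediately before the $. In addition, an unescaped $ is itself an end-of-line anchor, so a literal dollar sign must be written as \$. In such cases, consider using anchors or lookarounds instead.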
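Both pitfalls can be reproduced in a few lines. This is a minimal sketch: the class BoundaryPitfallsDemo, the helper found, and the sample sentence are illustrative assumptions, and the lookaround pattern is one possible replacement for \b around a price, not the only one.

```java
import java.util.regex.Pattern;

class BoundaryPitfallsDemo {
    // Hypothetical helper: true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Pitfall 1: "\b" in a Java string literal is the backspace character (U+0008).
        System.out.println("\bcat\b".length());           // 5: backspace, c, a, t, backspace
        System.out.println("\\bcat\\b".length());         // 7: the regex text \bcat\b
        System.out.println(found("\bcat\b", "a cat"));    // false: searches for literal backspaces
        System.out.println(found("\\bcat\\b", "a cat"));  // true

        // Pitfall 2: there is no word boundary between a space and '$'.
        String text = "It costs $100 today.";
        System.out.println(found("\\b\\$100\\b", text));           // false
        // Lookarounds instead: no non-whitespace before, no word character after.
        System.out.println(found("(?<!\\S)\\$100(?!\\w)", text));  // true
    }
}
```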
- Use \b when validating or searching for standalone keywords (e.g., "cat", "dog", "yes").
- Use \B when you want to exclude standalone matches and target substrings within words.

Assertion | Description | Use Case |
---|---|---|
\b | Matches at word boundaries | Find whole words only |
\B | Matches not at word boundaries | Match substrings within longer words |
Word boundaries provide a powerful, efficient way to precisely target words in larger text without false positives from partial matches.
^, $, \A, \Z
In regular expressions, anchors are special assertions that match a position rather than a character. Java provides two categories of anchors for marking the start and end of input: line boundaries and input boundaries.
^ and $
- ^ matches the start of a line
- $ matches the end of a line

These anchors are affected by multiline mode (Pattern.MULTILINE). When enabled, ^ and $ match the start and end of each line within a string, not just the start and end of the entire string.
Example:
String input = "apple\nbanana\ncherry";
Pattern pattern = Pattern.compile("^banana$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
    System.out.println("Found: " + matcher.group());
}
Output:
Found: banana
Explanation: With Pattern.MULTILINE, ^banana$ matches the exact line "banana", not the entire input.
Without multiline mode, ^ matches only at the start of the whole input and $ only at its end (or just before a final line terminator), so the pattern would not find a match in the above example.
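The effect of the flag is easiest to see by running the same patterns with and without it. The sketch below counts matches; the class MultilineDemo and the helper countMatches are illustrative names, and the second pattern (^\w+$) is an added assumption used only to show per-line matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Hypothetical helper: counts how many times the regex matches under the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "apple\nbanana\ncherry";

        // Default mode: ^ and $ refer to the whole input, so "banana" alone cannot match.
        System.out.println(countMatches("^banana$", 0, input));                  // 0
        // MULTILINE: ^ and $ also match at each line break.
        System.out.println(countMatches("^banana$", Pattern.MULTILINE, input));  // 1
        // Every line consisting only of word characters matches once.
        System.out.println(countMatches("^\\w+$", Pattern.MULTILINE, input));    // 3
    }
}
```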
\A and \Z
- \A matches the beginning of the entire input
- \Z matches the end of the entire input (before the final newline, if any)

These are not affected by multiline mode and always refer to the absolute boundaries of the input string.
Example:
String input = "start\nmiddle\nend";
Pattern pattern = Pattern.compile("\\Astart");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
    System.out.println("Found: " + matcher.group());
}
Output:
Found: start
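Now using \Z:
Pattern pattern = Pattern.compile("end\\Z");
This matches "end" only if it appears at the end of the input, optionally followed by a single final line terminator. To require the absolute end with no trailing line terminator, use \z instead.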
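The following sketch contrasts the input anchors with the line anchors on an input that ends in a newline. The class InputAnchorDemo and the helper found are illustrative names. The lowercase \z used in the last line is a standard java.util.regex construct for the absolute end of input that is not covered elsewhere in this section; it is included here only for comparison.

```java
import java.util.regex.Pattern;

class InputAnchorDemo {
    // Hypothetical helper: true if the regex is found in the input under the given flags.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "start\nmiddle\nend\n";  // note the trailing newline

        // ^ follows MULTILINE and matches at the start of the second line.
        System.out.println(found("^middle", Pattern.MULTILINE, input));    // true
        // \A ignores MULTILINE and only matches at the start of the input.
        System.out.println(found("\\Amiddle", Pattern.MULTILINE, input));  // false
        // \Z tolerates one final line terminator after the match.
        System.out.println(found("end\\Z", 0, input));                     // true
        // \z requires the match to end at the very last character.
        System.out.println(found("end\\z", 0, input));                     // false
    }
}
```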
Anchor | Meaning | Affected by Multiline Mode |
---|---|---|
^ | Start of a line | Yes |
$ | End of a line | Yes |
\A | Start of the input | No |
\Z | End of the input | No |
- Use ^ and $ when processing multi-line inputs and you want to match line-by-line.
- Use \A and \Z for absolute start/end checks, such as validating entire strings.

Understanding these anchors and when to use them ensures your regex behaves predictably in both single-line and multi-line scenarios.
When searching for specific words in text, it's important to avoid partial matches. For example, if you need to find the word "cat", you should not match "catalog" or "scatter". This is where the word boundary anchor (\b) becomes useful. It ensures that the match occurs only when the word is not part of a larger word.
Example: matching the whole word "cat"
import java.util.regex.*;

public class WordBoundaryExample {
    public static void main(String[] args) {
        String input = "The cat sat on the catalog beside the catfish.";
        String word = "cat";
        // Pattern to match the whole word "cat"
        Pattern pattern = Pattern.compile("\\b" + word + "\\b");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.println("Found whole word: \"" + matcher.group() +
                "\" at position " + matcher.start());
        }
    }
}
Output:
Found whole word: "cat" at position 4
- \\bcat\\b: The \b anchors on both sides ensure that "cat" is matched only when it's a standalone word.
- "catalog" and "catfish" are ignored because they have additional word characters (a, f) next to "cat", thus not satisfying the word boundary condition.
- The matcher.find() loop finds all matches, and matcher.start() returns the starting index of each match.

Now let's add punctuation to the sentence:
String input = "Cat! A wild cat, not a catalog-catfish hybrid.";
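The same pattern still works next to punctuation:
Output:
Found whole word: "cat" at position 12
Punctuation marks like ! and , are non-word characters, so \b correctly identifies word boundaries near them. The capitalized "Cat" at position 0 is not reported because matching is case-sensitive by default; compiling the pattern with Pattern.CASE_INSENSITIVE would also find it. Neither "catalog" nor "catfish" matches, since each continues with a word character after "cat".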
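To check which occurrences a whole-word search actually reports, it helps to collect the start positions with and without case-insensitive matching. This is a sketch under stated assumptions: the class WholeWordPositions and the helper positions are illustrative names, and wrapping the word in Pattern.quote is an added precaution (not part of the example above) so that a search word containing regex metacharacters is treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordPositions {
    // Hypothetical helper: start indexes of every whole-word occurrence of 'word'.
    static List<Integer> positions(String word, String input, int flags) {
        // Pattern.quote keeps any metacharacters in 'word' literal.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags);
        Matcher matcher = pattern.matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "Cat! A wild cat, not a catalog-catfish hybrid.";

        // Case-sensitive (default): only the lowercase word before the comma.
        System.out.println(positions("cat", input, 0));                         // [12]
        // Case-insensitive: the leading "Cat" is found as well.
        System.out.println(positions("cat", input, Pattern.CASE_INSENSITIVE));  // [0, 12]
    }
}
```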
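Using \b in Java regex allows you to match whole words while ignoring longer words that merely contain them, including when the word sits next to punctuation.
For best results, always escape \b as \\b in Java string literals, and test your patterns with various sentence structures.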