Boundary Matchers and Word Boundaries

Java Regex

7.1 Word boundaries `\b` and `\B`

In regular expressions, word boundaries are zero-width assertions: they do not consume characters but match a position within the input string. They are especially useful when you want to match whole words without accidentally matching parts of longer words.

`\b` Word Boundary

The \b assertion matches a position where a word character (typically [a-zA-Z0-9_]) is adjacent to a non-word character (such as whitespace or punctuation) or the start/end of the string.

Examples:

Pattern pattern = Pattern.compile("\\bcat\\b");
Matcher matcher = pattern.matcher("A cat sat on the cathedral.");
while (matcher.find()) {
    System.out.println("Match: " + matcher.group());
}

Output:

Match: cat

Explanation: Here, \\bcat\\b matches only the whole word "cat", not the "cat" in "cathedral".
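To make "zero-width" concrete, the following sketch lists every index at which \b matches in a short string. The class and the helper name boundaryPositions are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryPositions {
    // Hypothetical helper: returns every index at which \b matches in the input.
    static List<Integer> boundaryPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(input);
        while (m.find()) {
            // For a zero-width match, start() and end() are the same index.
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Indices: A=0, ' '=1, c=2, a=3, t=4, .=5
        System.out.println(boundaryPositions("A cat.")); // prints [0, 1, 2, 5]
    }
}
```

No boundary is reported at index 6 (the end of the input), because the last character is a period, which is not a word character.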

`\B` Not a Word Boundary

The \B assertion is the inverse of \b. It matches a position not at a word boundary. This is useful when you want to ensure that a substring occurs within a word, rather than at the start or end.

Example:

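Pattern pattern = Pattern.compile("\\Bcat\\B");
Matcher matcher = pattern.matcher("A cat sat near the scattered toys.");
while (matcher.find()) {
    System.out.println("Match: " + matcher.group());
}

Output:

Match: cat

Explanation: This pattern matches the "cat" inside "scattered", where a letter sits on both sides of it. It matches neither the standalone word "cat" nor a "cat" that starts a word such as "cathedral", because in both cases a word boundary directly precedes the c.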

Common Pitfalls

Escaping \b in Java strings: In a Java string literal, "\b" is the backspace character, so the regex token must be written as "\\b".
Using \b next to non-word characters: A pattern such as \b\$100\b does not match text like "costs $100", because the space and the $ are both non-word characters and so no word boundary lies between them. (The $ must also be escaped as \$, since an unescaped $ is the end-of-line anchor.) In such cases, consider lookarounds such as (?<!\w) and (?!\w) instead.
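Both pitfalls can be checked with a short sketch. The class name, the constant names, and the sample sentences are invented for the illustration; the lookaround pattern is one possible replacement, not the only one.

```java
import java.util.regex.Pattern;

public class DollarBoundary {
    // \b before \$ only matches if a word character directly precedes the dollar sign.
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\b\\$100\\b");
    // Lookarounds instead assert "no word character on either side".
    static final Pattern LOOKAROUND = Pattern.compile("(?<!\\w)\\$100(?!\\w)");

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("\b".length());   // 1: a single backspace character
        System.out.println("\\b".length());  // 2: backslash + b, the regex token
        System.out.println(found(WORD_BOUNDARY, "It costs $100 today.")); // false
        System.out.println(found(LOOKAROUND, "It costs $100 today."));    // true
        System.out.println(found(LOOKAROUND, "It costs $1000 today."));   // false
    }
}
```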

When to Use

Use \b when validating or searching for standalone keywords (e.g., "cat", "dog", "yes").
Use \B when you want to exclude standalone matches and target substrings within words.

Summary

Assertion	Description	Use Case
`\b`	Matches at word boundaries	Find whole words only
`\B`	Matches not at word boundaries	Match substrings within longer words

Word boundaries provide a powerful, efficient way to precisely target words in larger text without false positives from partial matches.

7.2 Start/end of input vs line boundaries `^`, `$`, `\A`, `\Z`

In regular expressions, anchors are special assertions that match a position rather than a character. Java provides two categories of anchors for marking the start and end of input: line boundaries and input boundaries.

Line Boundaries: `^` and `$`

^ matches the start of a line
$ matches the end of a line

These anchors are affected by multiline mode (Pattern.MULTILINE). When enabled, ^ and $ will match the start and end of each line within a string, not just the entire string.

Example:

String input = "apple\nbanana\ncherry";
Pattern pattern = Pattern.compile("^banana$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
    System.out.println("Found: " + matcher.group());
}

Output:

Found: banana

Explanation: With Pattern.MULTILINE, ^banana$ matches the exact line "banana", not the entire input.

Without multiline mode, ^ matches only at the start of the whole input and $ only at its end (or just before a final line terminator), so the pattern wouldn't find a match in the above example.
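The effect of the flag can be verified side by side. This is a small sketch using the same input as above; the class and method names are hypothetical. It also shows the embedded flag (?m), which Java accepts as an inline equivalent of Pattern.MULTILINE.

```java
import java.util.regex.Pattern;

public class MultilineToggle {
    static final String INPUT = "apple\nbanana\ncherry";

    // Hypothetical helper: compiles the regex with the given flags and searches INPUT.
    static boolean findsBanana(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        System.out.println(findsBanana("^banana$", 0));                 // false
        System.out.println(findsBanana("^banana$", Pattern.MULTILINE)); // true
        System.out.println(findsBanana("(?m)^banana$", 0));             // true
    }
}
```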

Input Boundaries: `\A` and `\Z`

\A matches the beginning of the entire input
\Z matches the end of the entire input (before the final newline, if any)

These are not affected by multiline mode and always refer to the absolute boundaries of the input string.

Example:

String input = "start\nmiddle\nend";
Pattern pattern = Pattern.compile("\\Astart");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
    System.out.println("Found: " + matcher.group());
}

Output:

Found: start

Now using \Z:

Pattern pattern = Pattern.compile("end\\Z");

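This matches "end" only at the end of the input, or just before a final line terminator. To require the absolute end of the input with no trailing newline, use the lowercase \z instead.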
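The difference between the two end anchors shows up only when the input ends with a line terminator. A minimal sketch, with a made-up class and helper name:

```java
import java.util.regex.Pattern;

public class EndAnchors {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String withNewline = "start\nmiddle\nend\n";
        String withoutNewline = "start\nmiddle\nend";

        System.out.println(found("end\\Z", withNewline));    // true: \Z tolerates the final \n
        System.out.println(found("end\\z", withNewline));    // false: \z demands the very end
        System.out.println(found("end\\z", withoutNewline)); // true
    }
}
```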

Choosing the Right Anchor

Anchor	Meaning	Affected by Multiline Mode
`^`	Start of a line	Yes
`$`	End of a line	Yes
`\A`	Start of the input	No
`\Z`	End of the input	No

Use ^ and $ when processing multi-line inputs and you want to match line-by-line.
Use \A and \Z for absolute start/end checks, such as validating entire strings. If a trailing line terminator must also be rejected, use \z in place of \Z.
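The guidance above matters most for validation. The sketch below assumes a hypothetical "digits only" check and compares line anchors in multiline mode with input anchors. It uses \z, the strict end-of-input anchor, so that a trailing newline is rejected as well.

```java
import java.util.regex.Pattern;

public class WholeInputValidation {
    // Line anchors in multiline mode accept any single line made of digits.
    static final Pattern LINE_ANCHORED = Pattern.compile("^\\d+$", Pattern.MULTILINE);
    // Input anchors require the entire input to be digits.
    static final Pattern INPUT_ANCHORED = Pattern.compile("\\A\\d+\\z");

    // Hypothetical validator built on the input-anchored pattern.
    static boolean isAllDigits(String s) {
        return INPUT_ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String sneaky = "abc\n123\nxyz";
        System.out.println(LINE_ANCHORED.matcher(sneaky).find()); // true: the middle line matches
        System.out.println(isAllDigits(sneaky));                  // false
        System.out.println(isAllDigits("123"));                   // true
        System.out.println(isAllDigits("123\n"));                 // false
    }
}
```

For pure validation, Matcher.matches() is an alternative, since it requires the pattern to cover the whole input without any explicit anchors.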

Understanding these anchors and when to use them ensures your regex behaves predictably in both single-line and multi-line scenarios.

7.3 Example: Find whole words only

When searching for specific words in text, it's important to avoid partial matches. For example, if you need to find the word "cat", you should not match "catalog" or "scatter". This is where the word boundary anchor (\b) becomes useful. It ensures that the match occurs only when the word is not part of a larger word.

Java Example: Match Whole Word `"cat"`

import java.util.regex.*;

public class WordBoundaryExample {
    public static void main(String[] args) {
        String input = "The cat sat on the catalog beside the catfish.";
        String word = "cat";
        
        // Pattern to match the whole word "cat"
        Pattern pattern = Pattern.compile("\\b" + word + "\\b");
        Matcher matcher = pattern.matcher(input);

        while (matcher.find()) {
            System.out.println("Found whole word: \"" + matcher.group() +
                               "\" at position " + matcher.start());
        }
    }
}

Output:

Found whole word: "cat" at position 4

Explanation

\\bcat\\b: The \b anchors on both sides ensure that "cat" is matched only when it's a standalone word.
"catalog" and "catfish" are ignored because a word character (a and f, respectively) directly follows "cat", so there is no word boundary after the t.
The matcher.find() loop finds all matches, and matcher.start() returns the starting index of each match.
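The example concatenates the search word straight into the pattern, which is fine for "cat" but not for words that contain regex metacharacters. One possible safeguard, sketched here with a hypothetical helper named wholeWord, is to wrap the word in Pattern.quote. The helper still assumes the word starts and ends with a word character, since \b needs one on each edge.

```java
import java.util.regex.Pattern;

public class QuotedWholeWord {
    // Hypothetical helper: builds a whole-word pattern that treats the word literally.
    static Pattern wholeWord(String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
    }

    public static void main(String[] args) {
        // Unquoted, the dot in "v1.0" matches any character.
        Pattern unquoted = Pattern.compile("\\b" + "v1.0" + "\\b");
        System.out.println(unquoted.matcher("upgrade to v1x0 soon").find());          // true
        System.out.println(wholeWord("v1.0").matcher("upgrade to v1x0 soon").find()); // false
        System.out.println(wholeWord("v1.0").matcher("upgrade to v1.0 soon").find()); // true
    }
}
```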

Edge Case: Punctuation and Boundaries

Now let's add punctuation to the sentence:

String input = "Cat! A wild cat, not a catalog-catfish hybrid.";

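The same pattern still works, but note that it is case-sensitive:

Output:

Found whole word: "cat" at position 12

Punctuation marks like ! and , are non-word characters, so \b correctly identifies word boundaries next to them. The capitalized "Cat" at position 0 is skipped only because the pattern is case-sensitive; compile it with Pattern.CASE_INSENSITIVE to match it as well. "catalog-catfish" yields no match: the hyphen creates a boundary before "catfish", but the f that follows "cat" rules out a boundary after it.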
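For this sentence, a case-insensitive search is a natural variation. The sketch below collects the start index of every whole-word match, with and without Pattern.CASE_INSENSITIVE. The class and the helper wordStarts are hypothetical names introduced for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseInsensitiveWholeWord {
    // Hypothetical helper: start indices of all whole-word matches under the given flags.
    static List<Integer> wordStarts(String word, String input, int flags) {
        List<Integer> starts = new ArrayList<>();
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags);
        Matcher m = p.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "Cat! A wild cat, not a catalog-catfish hybrid.";
        System.out.println(wordStarts("cat", input, 0));                        // prints [12]
        System.out.println(wordStarts("cat", input, Pattern.CASE_INSENSITIVE)); // prints [0, 12]
    }
}
```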

Summary

Using \b in Java regex allows you to:

Match words accurately, avoiding substrings inside other words.
Handle word boundaries around whitespace, punctuation, and line breaks.
Write cleaner, more reliable pattern matching for tasks like keyword detection, search tools, and token filtering.

For best results, always escape \b as \\b in Java string literals, and test your patterns with various sentence structures.
