Lookahead and Lookbehind Assertions

Java Regex

8.1 Positive lookahead `(?=...)`

Positive lookahead is a powerful tool in regular expressions that allows you to assert that a certain pattern must follow a given position in the input—without actually including that pattern in the match result. It is a zero-width assertion, meaning it checks for a condition ahead in the text but doesn’t consume any characters.

Syntax and Behavior

The syntax for positive lookahead is:

X(?=Y)

This matches X only if it is immediately followed by Y. Importantly, Y is not included in the final match.

Simple Example: Match "foo" followed by "bar"

import java.util.regex.*;

public class LookaheadExample {
    public static void main(String[] args) {
        String input = "foobar food fool";
        Pattern pattern = Pattern.compile("foo(?=bar)");
        Matcher matcher = pattern.matcher(input);

        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group());
        }
    }
}

Output:

Match found: foo

Explanation: Only the "foo" that is followed by "bar" is matched. "food" and "fool" do not satisfy the lookahead condition and are ignored.
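The sketch below makes "does not consume characters" concrete. It compares the match bounds of foo(?=bar) with those of the plain pattern foobar on the same input. The class and helper names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // Returns "group [start,end)" for the first match of regex in input, or "no match".
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() + " [" + m.start() + "," + m.end() + ")" : "no match";
    }

    public static void main(String[] args) {
        String input = "foobar";
        // The lookahead checks for "bar" but the match stops after "foo".
        System.out.println(firstMatch("foo(?=bar)", input));    // foo [0,3)
        // An ordinary pattern consumes "bar" as well.
        System.out.println(firstMatch("foobar", input));        // foobar [0,6)
        // The lookahead consumed nothing, so "bar" can still be matched right after it.
        System.out.println(firstMatch("foo(?=bar)bar", input)); // foobar [0,6)
    }
}
```

The third pattern matches because the engine is still at position 3 after the lookahead succeeds.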

Use Case: Enforcing Suffix Requirement

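Suppose you want to find usernames in a log, but only where the prefix user is immediately followed by a number:

user(?=\d)

This pattern matches "user" only when it is immediately followed by a digit, as in "user123", so a word such as "username" is skipped. The match itself is just "user". The digit is checked but not consumed, so it is not part of the result.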
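Here is a small sketch of this on a hypothetical log line. The first method uses the lookahead and records where each qualifying "user" starts. The second pattern, user\d+, is included only to show the alternative when you want the whole username returned. In that case the digits must be consumed by the main pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserPrefixDemo {
    // Hypothetical log line used only for illustration.
    static final String LOG = "login user42 ok; login username ok; login user7 fail";

    // Start offsets of every "user" that is immediately followed by a digit.
    static List<Integer> prefixStarts(String text) {
        Matcher m = Pattern.compile("user(?=\\d)").matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start()); // m.group() is always just "user"
        }
        return starts;
    }

    // Full usernames: here \d+ consumes the digits instead of only checking them.
    static List<String> fullNames(String text) {
        Matcher m = Pattern.compile("user\\d+").matcher(text);
        List<String> names = new ArrayList<>();
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(prefixStarts(LOG)); // [6, 42]; "username" is skipped
        System.out.println(fullNames(LOG));    // [user42, user7]
    }
}
```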

Why Use Lookahead?

Positive lookahead is especially useful when:

You need to check for a requirement without capturing it.
You want to combine multiple rules into one pattern without consuming parts of the input.
You are validating input formats with multiple constraints (e.g., passwords, filenames, dates).
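The sketch below shows how a rule can be attached to an ordinary pattern without consuming input. It finds whole words that contain at least one digit. The lookahead (?=\w*\d) checks the rule from the start of the word, and \w+ then consumes the word itself. The pattern and sample text are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsWithDigit {
    // \b         - start at a word boundary
    // (?=\w*\d)  - rule: this word contains a digit somewhere (nothing consumed)
    // \w+\b      - then consume the whole word
    private static final Pattern WORD_WITH_DIGIT = Pattern.compile("\\b(?=\\w*\\d)\\w+\\b");

    static List<String> find(String text) {
        Matcher m = WORD_WITH_DIGIT.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("abc a1b 42 x9 none")); // [a1b, 42, x9]
    }
}
```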

Summary

Feature	Description
Syntax	`(?=...)`
Type	Zero-width assertion (does not consume characters)
Use	Match only if a pattern follows, but don't include it
Common use cases	Validation, format checking, pattern sequencing

Lookahead allows you to express conditional logic in regex without complicating match extraction. In the next section, we’ll look at the opposite: negative lookahead, which asserts that a certain pattern must not follow.

8.2 Negative lookahead `(?!...)`

Negative lookahead is a zero-width assertion in regular expressions that allows you to specify what must not follow a certain position in the input. Like positive lookahead, it checks the upcoming text without consuming any characters. This feature is especially useful for excluding specific patterns while still allowing others.

Syntax and Meaning

The syntax for negative lookahead is:

X(?!Y)

This means: match X only if it is not followed by Y.

Since it is a zero-width assertion, the lookahead itself doesn’t become part of the match—it only determines whether the match should occur based on what follows X.

Example: Match foo not followed by bar

import java.util.regex.*;

public class NegativeLookaheadExample {
    public static void main(String[] args) {
        String input = "foobar foofoo food";
        Pattern pattern = Pattern.compile("foo(?!bar)");
        Matcher matcher = pattern.matcher(input);

        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group() +
                               " at position " + matcher.start());
        }
    }
}

Output:

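Match found: foo at position 7
Match found: foo at position 10
Match found: foo at position 14

Explanation: The "foo" at position 0 is excluded because it is followed by "bar". Both halves of "foofoo" match, because the first is followed by "foo" and the second by a space. The "foo" in "food" also matches.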

Practical Uses

Exclude Specific Keywords

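You might want to match URLs that don’t end in .jpg:

https?://(?!\S*\.jpg(?:\s|$))\S+

The lookahead sits right after the scheme, so it can scan the rest of the URL (\S*) and reject it if it ends in .jpg. Placing the lookahead at the end instead does not work, as in https?://\S+(?!\.jpg). By the time the lookahead runs, the greedy \S+ has already consumed the extension, so nothing is left for the lookahead to reject.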
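Here is a runnable sketch of this filter. It places the lookahead immediately after the scheme so that it inspects the whole URL before anything is consumed. The sample URLs are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonJpgUrls {
    // https?://              - scheme
    // (?!\S*\.jpg(?:\s|$))   - reject if the rest of this URL ends in ".jpg"
    // \S+                    - consume the rest of the URL
    private static final Pattern NON_JPG_URL =
            Pattern.compile("https?://(?!\\S*\\.jpg(?:\\s|$))\\S+");

    static List<String> find(String text) {
        Matcher m = NON_JPG_URL.matcher(text);
        List<String> urls = new ArrayList<>();
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "see https://example.com/cat.jpg and http://example.com/page.html "
                + "or https://example.com/photo.png";
        System.out.println(find(text));
        // [http://example.com/page.html, https://example.com/photo.png]
    }
}
```

This is a sketch and does not handle every edge case. For example, trailing punctuation such as a comma after a URL would be treated as part of it.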

Prevent Forbidden Sequences

When validating passwords, you can disallow certain patterns (e.g., the word "admin" anywhere):

^(?!.*admin).*

This pattern matches any input as long as "admin" does not appear anywhere in the string.
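As a sketch, the check can be applied with Matcher.matches(), which requires the entire input to match. The class name and sample strings are illustrative.

```java
import java.util.regex.Pattern;

public class ForbiddenWordCheck {
    // ^(?!.*admin) - fail immediately if "admin" appears anywhere after the start
    // .*           - otherwise accept the whole input
    private static final Pattern NO_ADMIN = Pattern.compile("^(?!.*admin).*");

    static boolean isAllowed(String password) {
        // matches() requires the entire string to match the pattern
        return NO_ADMIN.matcher(password).matches();
    }

    public static void main(String[] args) {
        for (String pwd : new String[] {"s3cret!", "myadmin99", "admin"}) {
            System.out.println(pwd + " -> " + (isAllowed(pwd) ? "allowed" : "rejected"));
        }
    }
}
```

The check is case-sensitive, so "Admin1" is allowed. To reject every capitalization, compile the pattern with Pattern.CASE_INSENSITIVE.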

Summary

Feature	Description
Syntax	`(?!...)`
Type	Zero-width assertion
Purpose	Exclude matches based on following content
Common uses	Blacklist patterns, conditional exclusions, input filtering

Negative lookaheads help you tighten matching rules by ruling out unwanted patterns. In the next section, we’ll look at positive lookbehind, which performs similar checks but in reverse.

8.3 Positive lookbehind `(?<=...)`

Positive lookbehind is a zero-width assertion that matches a position in the input string only if it is immediately preceded by a specific pattern. Unlike regular matching that consumes characters, lookbehind checks the context before the current position without including it in the match.

Syntax and Behavior

The syntax for positive lookbehind is:

(?<=pattern)

This asserts that the current position in the input is preceded by pattern, but the matched result does not include this preceding pattern. The lookbehind itself does not consume any characters—it only confirms the presence of the pattern behind the current position.

Important Limitation in Java Regex

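Java’s regex engine requires the pattern inside a lookbehind to have a bounded maximum length. The following forms are accepted:

- fixed strings such as (?<=abc)
- alternatives of different lengths such as (?<=ab|xyz)
- bounded quantifiers such as (?<=colou?r) or (?<=a{1,3})

Unbounded quantifiers such as * and + are not reliably supported. Depending on the pattern and the JDK version, a lookbehind such as (?<=a+) is either rejected with a PatternSyntaxException ("Look-behind group does not have an obvious maximum length") or accepted with no guarantee of correct results. Avoid them.

The engine needs this upper bound to know how far back to look when it evaluates the lookbehind. As a result, you cannot use arbitrary-length lookbehinds in Java’s standard regex.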
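The sketch below contrasts two approaches. The first is a bounded lookbehind, which Java accepts. The second is a common workaround when the prefix has no fixed upper length: match the prefix as part of the pattern and read the wanted part from a capturing group. The sample text and helper name are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // Collects group `group` (0 = whole match) of every match of regex in text.
    static List<String> findAll(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Paid $42, then $ 7, then $   100, cost 9";

        // Bounded lookbehind: "$" optionally followed by one space (maximum length 2).
        System.out.println(findAll("(?<=\\$ ?)\\d+", text, 0)); // [42, 7]

        // Any number of spaces: consume the prefix normally, capture only the digits.
        System.out.println(findAll("\\$\\s*(\\d+)", text, 1));  // [42, 7, 100]
    }
}
```

The capturing-group version consumes the "$" and the spaces, so group(0) would include them. Only group(1) holds the number.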

Example: Match world only if preceded by hello

import java.util.regex.*;

public class PositiveLookbehindExample {
    public static void main(String[] args) {
        String input = "hello world, hi world";
        Pattern pattern = Pattern.compile("(?<=hello )world");
        Matcher matcher = pattern.matcher(input);

        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group() +
                               " at position " + matcher.start());
        }
    }
}

Output:

Match found: world at position 6

Explanation: Only the "world" preceded by "hello " is matched. The "world" after "hi " does not satisfy the lookbehind condition and is ignored.

Use Cases

Contextual Matching: Find words only when preceded by certain prefixes or phrases.
Parsing: Extract data that follows fixed markers or labels.
Validation: Ensure a pattern appears only after a required substring.
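For the parsing case, here is a minimal sketch. The log format and the status= label are made-up assumptions. The lookbehind anchors on the fixed label and returns only the value after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelValueExtractor {
    // Digits directly after the fixed label "status=" (the label is not part of the match).
    private static final Pattern STATUS = Pattern.compile("(?<=status=)\\d+");

    static List<String> statuses(String log) {
        Matcher m = STATUS.matcher(log);
        List<String> codes = new ArrayList<>();
        while (m.find()) {
            codes.add(m.group());
        }
        return codes;
    }

    public static void main(String[] args) {
        String log = "GET /a status=200 time=15ms; GET /b status=404 time=3ms";
        System.out.println(statuses(log)); // [200, 404]
    }
}
```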

Summary

Feature	Description
Syntax	`(?<=pattern)`
Type	Zero-width assertion, checks preceding context
Java limitation	Pattern inside must have a bounded maximum length
Typical use cases	Contextual matches, parsing, validation

Positive lookbehind complements lookahead by letting you apply conditions on what comes before a match. In the next section, we will explore negative lookbehind, which asserts that a pattern does not precede the current position.

8.4 Negative lookbehind `(?<!...)`

Negative lookbehind is a zero-width assertion that matches a position only if it is not immediately preceded by a specified pattern. Similar to positive lookbehind, it checks the text behind the current position without consuming any characters, but instead of requiring a match, it asserts that the preceding pattern is absent.

Syntax and Usage

The syntax for negative lookbehind is:

(?<!pattern)

This means: match at the current position only if pattern does not appear immediately before it.

As a zero-width assertion, it influences whether a match occurs based on the preceding text but does not become part of the match result.

Java Regex Constraints

Like positive lookbehind, Java requires the pattern inside negative lookbehind to have a bounded maximum length. Do not use unbounded quantifiers such as * and + inside the lookbehind. Use bounded forms such as ?, {3}, or {1,3} instead.

This limitation is important to remember. A negative lookbehind that must cover an arbitrary amount of preceding text cannot be expressed reliably with Java’s built-in regex engine.

Practical Examples

Example 1: Match cat only if not preceded by wild

import java.util.regex.*;

public class NegativeLookbehindExample {
    public static void main(String[] args) {
        String input = "wild cat and house cat";
        Pattern pattern = Pattern.compile("(?<!wild )cat");
        Matcher matcher = pattern.matcher(input);

        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group() +
                               " at position " + matcher.start());
        }
    }
}

Output:

Match found: cat at position 19

Explanation: The "cat" preceded by "wild " is not matched because of the negative lookbehind. Only the "cat" after "house " matches.

Example 2: Prevent matching numbers preceded by a dollar sign

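Pattern pattern = Pattern.compile("(?<![$\\d])\\d+");

This pattern matches numbers only if they are not preceded by $. That is useful for excluding monetary amounts while extracting other numbers. The lookbehind also rejects a preceding digit. Without that, (?<!\$)\d+ could start matching at the second digit of "$100" and return "00".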
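The sketch below compares (?<!\$)\d+ with (?<![$\d])\d+, which also rejects a preceding digit, on the same sample text. The text is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonDollarNumbers {
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Paid $100 for 3 items";
        // Only "$" is rejected, so a match can start at the second digit of "$100".
        System.out.println(findAll("(?<!\\$)\\d+", text));    // [00, 3]
        // "$" and digits are both rejected, so "$100" is skipped entirely.
        System.out.println(findAll("(?<![$\\d])\\d+", text)); // [3]
    }
}
```

Inside a character class, $ is an ordinary character, so [$\d] needs no extra escaping for the dollar sign.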

When to Use Negative Lookbehind

Excluding matches that follow certain prefixes or markers.
Preventing matches in specific contexts without consuming the preceding text.
Writing complex validations that depend on what does not appear before a pattern.

Summary

Feature	Description
Syntax	`(?<!pattern)`
Function	Zero-width assertion that asserts absence of preceding pattern
Java limitation	Pattern must have a bounded maximum length
Use cases	Contextual exclusions, conditional matching

Negative lookbehind is a powerful tool for fine-grained control over matches based on what comes before. It complements other lookaround assertions, enabling expressive and precise regex patterns.

8.5 Practical examples: Validate complex password rules, extract context-sensitive patterns

Lookahead and lookbehind assertions in Java regex provide powerful ways to enforce complex matching rules without consuming characters. This means you can check for required or forbidden patterns in your input, and extract data based on surrounding context, all while keeping your matches precise and efficient.

Example 1: Password Validation Using Lookaheads

Suppose you want to validate a password with the following rules:

At least 8 characters long
Contains at least one uppercase letter
Contains at least one digit
Contains no whitespace characters

Using lookaheads, you can check each condition independently and combine them into a single regex:

import java.util.regex.*;

public class PasswordValidator {
    public static void main(String[] args) {
        String[] passwords = {
            "Pass1234",
            "password",
            "PASS1234",
            "Pass 123",
            "Pass12"
        };

        // Regex explanation:
        // (?=.*[A-Z])      - Positive lookahead for at least one uppercase letter
        // (?=.*\\d)        - Positive lookahead for at least one digit
        // (?!.*\\s)        - Negative lookahead to ensure no whitespace
        // .{8,}            - Match at least 8 characters (any characters)
        String regex = "^(?=.*[A-Z])(?=.*\\d)(?!.*\\s).{8,}$";

        Pattern pattern = Pattern.compile(regex);

        for (String pwd : passwords) {
            Matcher matcher = pattern.matcher(pwd);
            System.out.println(pwd + ": " + (matcher.matches() ? "Valid" : "Invalid"));
        }
    }
}

Output:

Pass1234: Valid
password: Invalid
PASS1234: Valid
Pass 123: Invalid
Pass12: Invalid

Explanation:

(?=.*[A-Z]) ensures there is at least one uppercase letter anywhere in the string.
(?=.*\d) requires at least one digit.
(?!.*\s) forbids whitespace anywhere in the string.
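.{8,} ensures the total length is at least 8 characters. The anchors ^ and $ tie the whole check to the entire input. Since matches() already requires a full match, they are redundant here, but they keep the regex correct if it is reused with find().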
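Each lookahead is an independent check anchored at the start of the input, so adding a rule only means adding another lookahead. As a sketch, suppose the policy also required a lowercase letter (a hypothetical extra rule, not part of the rules above):

```java
import java.util.regex.Pattern;

public class PasswordRulesExtended {
    // Same rules as PasswordValidator, plus (?=.*[a-z]) requiring a lowercase letter.
    private static final Pattern STRICT =
            Pattern.compile("^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?!.*\\s).{8,}$");

    static boolean isValid(String pwd) {
        return STRICT.matcher(pwd).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Pass1234")); // true
        System.out.println(isValid("PASS1234")); // false: no lowercase letter
    }
}
```

The order of the lookaheads does not affect which inputs are accepted, because each one starts from the same position.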

Example 2: Extract Context-Sensitive Data Using Lookbehind

Imagine you want to extract prices from text but only if they are preceded by the currency symbol $:

import java.util.regex.*;

public class PriceExtractor {
    public static void main(String[] args) {
        String text = "Prices are $100, $250 and 300 without dollar sign.";

        // Pattern to match digits preceded by $ using positive lookbehind
        Pattern pattern = Pattern.compile("(?<=\\$)\\d+");
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Price found: " + matcher.group());
        }
    }
}

Output:

Price found: 100
Price found: 250

Here, (?<=\$) asserts that the digits are preceded by a dollar sign without including it in the match. Numbers without $ are ignored.

Example 3: Exclude Data Based on Lookbehind

Alternatively, if you need to find all numbers not preceded by a $, you can use negative lookbehind:

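Pattern pattern = Pattern.compile("(?<![$\\d])\\d+");

This matches digits only if they are not immediately preceded by $ or by another digit. The digit check keeps the match from starting inside an amount such as $100. On the text from Example 2, it finds only 300.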

Summary

Lookaheads and lookbehinds enable complex validations and precise extractions by enforcing conditions on what surrounds a match without capturing those characters. One pattern can combine several independent conditions. The match result contains only the text you actually want, with no post-processing needed to strip prefixes or suffixes.

Assertion	Purpose	Example Use Case
Positive lookahead	Require something ahead	Password must include digit
Negative lookahead	Disallow something ahead	No whitespace in password
Positive lookbehind	Require something behind	Extract numbers after `$`
Negative lookbehind	Disallow something behind	Extract numbers not after `$`

Using these assertions, you can build sophisticated pattern checks and data extraction logic in your Java applications with confidence and clarity.
