(?=...)
Positive lookahead is a powerful tool in regular expressions that allows you to assert that a certain pattern must follow a given position in the input—without actually including that pattern in the match result. It is a zero-width assertion, meaning it checks for a condition ahead in the text but doesn’t consume any characters.
The syntax for positive lookahead is:
X(?=Y)
This matches X only if it is immediately followed by Y. Importantly, Y is not included in the final match.
import java.util.regex.*;

public class LookaheadExample {
    public static void main(String[] args) {
        String input = "foobar food fool";
        Pattern pattern = Pattern.compile("foo(?=bar)");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group());
        }
    }
}
Output:
Match found: foo
Explanation: Only the "foo" that is followed by "bar" is matched. "food" and "fool" do not satisfy the lookahead condition and are ignored.
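To make the zero-width behavior concrete, the following minimal sketch compares where a match ends with and without the lookahead. The class name ZeroWidthDemo and the helper span are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // Hypothetical helper: returns "start-end" of the first match, or "none".
    public static String span(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + "-" + m.end() : "none";
    }

    public static void main(String[] args) {
        String input = "foobar";
        // The lookahead checks "bar", but the match itself stops after "foo".
        System.out.println(span("foo(?=bar)", input)); // 0-3
        // Without the lookahead, "bar" is consumed as part of the match.
        System.out.println(span("foobar", input));     // 0-6
        // The lookahead fails here, so there is no match at all.
        System.out.println(span("foo(?=bar)", "food")); // none
    }
}
```

Because the match ends at index 3, the next call to find() would resume right after "foo", and "bar" stays available to later parts of a larger pattern.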
Suppose you want to find every occurrence of user in a log, but only when it is followed by a digit:
user(?=\d)
This pattern matches "user" only when it is immediately followed by a digit, such as "user123".
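As a rough sketch of how this pattern might be applied, the code below collects the start index of each qualifying "user" in a line. The class name, the method findStarts, and the sample log line are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserPrefixFinder {
    private static final Pattern USER_BEFORE_DIGIT = Pattern.compile("user(?=\\d)");

    // Returns the start index of each "user" that is directly followed by a digit.
    public static List<Integer> findStarts(String line) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = USER_BEFORE_DIGIT.matcher(line);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical log line; "username" is skipped because "n" is not a digit.
        String line = "login user123, logout username, login user7";
        System.out.println(findStarts(line)); // [6, 38]
    }
}
```

The digits themselves are not part of the match, so matcher.group() would return only "user" in each case.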
Positive lookahead is especially useful for validation, format checking, and pattern sequencing. In summary:
Feature | Description |
---|---|
Syntax | (?=...) |
Type | Zero-width assertion (does not consume characters) |
Use | Match only if a pattern follows, but don't include it |
Common use cases | Validation, format checking, pattern sequencing |
Lookahead allows you to express conditional logic in regex without complicating match extraction. In the next section, we’ll look at the opposite: negative lookahead, which asserts that a certain pattern must not follow.
(?!...)
Negative lookahead is a zero-width assertion in regular expressions that allows you to specify what must not follow a certain position in the input. Like positive lookahead, it checks the upcoming text without consuming any characters. This feature is especially useful for excluding specific patterns while still allowing others.
The syntax for negative lookahead is:
X(?!Y)
This means: match X only if it is not followed by Y.
Since it is a zero-width assertion, the lookahead itself doesn’t become part of the match; it only determines whether the match should occur based on what follows X.
import java.util.regex.*;

public class NegativeLookaheadExample {
    public static void main(String[] args) {
        String input = "foobar foofoo food";
        Pattern pattern = Pattern.compile("foo(?!bar)");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group() +
                    " at position " + matcher.start());
        }
    }
}
Output:
Match found: foo at position 7
Match found: foo at position 10
Match found: foo at position 14
Explanation: Only "foo" strings that are not followed by "bar" are matched: both occurrences in "foofoo" and the one in "food". The "foo" in "foobar" is excluded due to the lookahead condition.
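You might want to match URLs that don’t end in .jpg. The lookahead belongs right after the scheme: if it is placed after a greedy [^\s]+, it is first tested at the end of the URL, where nothing follows, so it succeeds and the .jpg URL is matched anyway:
https?://(?!\S*\.jpg(?:\s|$))\S+
This excludes URLs with .jpg extensions.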
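One way to sketch this filter in Java is shown below. It places the negative lookahead directly after the scheme, because a lookahead placed after a greedy run of non-whitespace characters is only checked once the whole URL has been consumed. The class name, the method name, and the sample text are hypothetical, and the pattern treats a URL as any run of non-whitespace characters, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonJpgUrlFinder {
    // After the scheme, refuse to continue if the rest of the token ends in ".jpg".
    private static final Pattern NON_JPG_URL =
            Pattern.compile("https?://(?!\\S*\\.jpg(?:\\s|$))\\S+");

    public static List<String> find(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = NON_JPG_URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample text for illustration only.
        String text = "see http://a.com/pic.jpg and https://b.org/page.html "
                + "plus http://c.net/x.jpg?size=2";
        // Prints [https://b.org/page.html, http://c.net/x.jpg?size=2]
        System.out.println(find(text));
    }
}
```

The third URL is kept because ".jpg" appears in the middle of it rather than at the end. Trailing punctuation such as a comma after the URL is not handled in this sketch.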
When validating passwords, you can disallow certain patterns (e.g., the word "admin" anywhere):
^(?!.*admin).*
This pattern matches any input as long as "admin" does not appear anywhere in the string.
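A minimal sketch of this check in Java might look like the following. The class and method names are hypothetical, and the check is case-sensitive as written.

```java
import java.util.regex.Pattern;

public class ForbiddenWordCheck {
    // The lookahead fails at the start of the input if "admin" occurs anywhere after it.
    private static final Pattern NO_ADMIN = Pattern.compile("^(?!.*admin).*");

    public static boolean isAllowed(String candidate) {
        return NO_ADMIN.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("secret42"));    // true
        System.out.println(isAllowed("myadminpass")); // false
        // Case-sensitive: "Admin" with a capital A is not caught by this pattern.
        System.out.println(isAllowed("Admin1"));      // true
    }
}
```

If the check should ignore case, compiling the pattern with Pattern.CASE_INSENSITIVE is one option.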
Feature | Description |
---|---|
Syntax | (?!...) |
Type | Zero-width assertion |
Purpose | Exclude matches based on following content |
Common uses | Blacklist patterns, conditional exclusions, input filtering |
Negative lookaheads help you tighten matching rules by ruling out unwanted patterns. In the next section, we’ll look at positive lookbehind, which performs similar checks but in reverse.
(?<=...)
Positive lookbehind is a zero-width assertion that matches a position in the input string only if it is immediately preceded by a specific pattern. Unlike regular matching that consumes characters, lookbehind checks the context before the current position without including it in the match.
The syntax for positive lookbehind is:
(?<=pattern)
This asserts that the current position in the input is preceded by pattern, but the matched result does not include this preceding pattern. The lookbehind itself does not consume any characters; it only confirms the presence of the pattern behind the current position.
Java’s regex engine requires that the pattern inside a lookbehind have a bounded maximum length. Fixed-length patterns such as (?<=abc) and bounded quantifiers such as (?<=a{1,3}) are valid. Unbounded quantifiers like * or + are the problem case: (?<=a+) has no maximum length, and older Java versions reject it with a PatternSyntaxException. Newer versions may accept some unbounded forms, but it is safer not to rely on that.
This limitation keeps lookbehind matching tractable, but it means you should not count on arbitrary-length lookbehinds in Java’s standard regex.
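The sketch below illustrates the length rule, assuming nothing beyond java.util.regex. The class name and the helper firstMatch are hypothetical. Whether the unbounded pattern compiles depends on the Java version in use, so the code catches the exception instead of assuming either outcome.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookbehindLengthDemo {
    // Hypothetical helper: returns the first match of regex in input, or null.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Fixed length: exactly three characters behind the match.
        System.out.println(firstMatch("(?<=abc)d", "abcd"));    // d

        // Bounded quantifier: the lookbehind is at most three characters long.
        System.out.println(firstMatch("(?<=a{1,3})b", "xaab")); // b

        // Unbounded quantifier: may be rejected, depending on the Java version.
        try {
            Pattern.compile("(?<=a+)b");
            System.out.println("(?<=a+)b compiled on this JVM");
        } catch (PatternSyntaxException e) {
            System.out.println("(?<=a+)b rejected: " + e.getDescription());
        }
    }
}
```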
import java.util.regex.*;

public class PositiveLookbehindExample {
    public static void main(String[] args) {
        String input = "hello world, hi world";
        Pattern pattern = Pattern.compile("(?<=hello )world");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group() +
                    " at position " + matcher.start());
        }
    }
}
Output:
Match found: world at position 6
Explanation: Only the "world" preceded by "hello " is matched. The "world" after "hi " does not satisfy the lookbehind condition and is ignored.
Feature | Description |
---|---|
Syntax | (?<=pattern) |
Type | Zero-width assertion, checks preceding context |
Java limitation | Pattern inside must have a bounded maximum length |
Typical use cases | Contextual matches, parsing, validation |
Positive lookbehind complements lookahead by letting you apply conditions on what comes before a match. In the next section, we will explore negative lookbehind, which asserts that a pattern does not precede the current position.
(?<!...)
Negative lookbehind is a zero-width assertion that matches a position only if it is not immediately preceded by a specified pattern. Similar to positive lookbehind, it checks the text behind the current position without consuming any characters, but instead of requiring a match, it asserts that the preceding pattern is absent.
The syntax for negative lookbehind is:
(?<!pattern)
This means: match at the current position only if pattern does not appear immediately before it.
As a zero-width assertion, it influences whether a match occurs based on the preceding text but does not become part of the match result.
Like positive lookbehind, Java requires the pattern inside a negative lookbehind to have a bounded maximum length. Unbounded quantifiers such as * or + should be avoided inside the lookbehind; use explicit limits instead (e.g., {3} or {1,3}).
This limitation is important to remember because unbounded, variable-length negative lookbehinds are not reliably supported in Java’s built-in regex engine.
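As an illustration of a negative lookbehind that is bounded without being fixed-length, the sketch below uses two alternatives of different lengths. The class name, the method findStarts, and the sample text are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedNegativeLookbehind {
    // The alternatives are 2 and 4 characters long, so the maximum length is known.
    private static final Pattern PLAIN_STABLE = Pattern.compile("(?<!un|non-)stable");

    // Returns the start index of each "stable" not preceded by "un" or "non-".
    public static List<Integer> findStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = PLAIN_STABLE.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Only the first word qualifies; the other two carry a negating prefix.
        System.out.println(findStarts("stable unstable non-stable")); // [0]
    }
}
```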
import java.util.regex.*;

public class NegativeLookbehindExample {
    public static void main(String[] args) {
        String input = "wild cat and house cat";
        Pattern pattern = Pattern.compile("(?<!wild )cat");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group() +
                    " at position " + matcher.start());
        }
    }
}
Output:
Match found: cat at position 19
Explanation: The "cat" preceded by "wild " is not matched because of the negative lookbehind. Only the "cat" after "house " matches.
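Pattern pattern = Pattern.compile("(?<![$\\d])\\d+");
This pattern matches numbers only if they are not preceded by $, which could be useful to exclude monetary amounts while extracting other numbers. The \d inside the lookbehind matters: with (?<!\$) alone, the engine would skip the 1 in $100 and then match the remaining 00, because that 0 is preceded by a digit rather than by $.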
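The sketch below compares a lookbehind on $ alone with one that also rejects a preceding digit, to show why the difference matters when the goal is to skip monetary amounts entirely. The class name, the helper find, and the sample sentence are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonPriceNumbers {
    // Hypothetical helper: collects every match of regex in text.
    public static List<String> find(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Order 42 costs $100";

        // Lookbehind on "$" only: the match restarts inside "$100" and picks up "00".
        System.out.println(find("(?<!\\$)\\d+", text));    // [42, 00]

        // Also rejecting a preceding digit prevents starting in the middle of a number.
        System.out.println(find("(?<![$\\d])\\d+", text)); // [42]
    }
}
```

Amounts with decimals, such as $1.50, would need further handling, since the digits after the decimal point are preceded by "." and still match.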
Feature | Description |
---|---|
Syntax | (?<!pattern) |
Function | Zero-width assertion that asserts absence of preceding pattern |
Java limitation | Pattern must have a bounded maximum length |
Use cases | Contextual exclusions, conditional matching |
Negative lookbehind is a powerful tool for fine-grained control over matches based on what comes before. It complements other lookaround assertions, enabling expressive and precise regex patterns.
Lookahead and lookbehind assertions in Java regex provide powerful ways to enforce complex matching rules without consuming characters. This means you can check for required or forbidden patterns in your input, and extract data based on surrounding context, all while keeping your matches precise and efficient.
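Suppose you want to validate a password with the following rules:
- At least one uppercase letter
- At least one digit
- No whitespace
- At least 8 characters in total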
Using lookaheads, you can check each condition independently and combine them into a single regex:
import java.util.regex.*;

public class PasswordValidator {
    public static void main(String[] args) {
        String[] passwords = {
            "Pass1234",
            "password",
            "PASS1234",
            "Pass 123",
            "Pass12"
        };
        // Regex explanation:
        // (?=.*[A-Z]) - Positive lookahead for at least one uppercase letter
        // (?=.*\\d)   - Positive lookahead for at least one digit
        // (?!.*\\s)   - Negative lookahead to ensure no whitespace
        // .{8,}       - Match at least 8 characters (any characters)
        String regex = "^(?=.*[A-Z])(?=.*\\d)(?!.*\\s).{8,}$";
        Pattern pattern = Pattern.compile(regex);
        for (String pwd : passwords) {
            Matcher matcher = pattern.matcher(pwd);
            System.out.println(pwd + ": " + (matcher.matches() ? "Valid" : "Invalid"));
        }
    }
}
Output:
Pass1234: Valid
password: Invalid
PASS1234: Valid
Pass 123: Invalid
Pass12: Invalid
Explanation:
- (?=.*[A-Z]) ensures there is at least one uppercase letter anywhere in the string.
- (?=.*\d) requires at least one digit.
- (?!.*\s) forbids whitespace anywhere in the string.
- .{8,} ensures the total length is at least 8 characters.

These are combined with the anchors ^ and $ to match the entire input.

Imagine you want to extract prices from text, but only if they are preceded by the currency symbol $:
import java.util.regex.*;

public class PriceExtractor {
    public static void main(String[] args) {
        String text = "Prices are $100, $250 and 300 without dollar sign.";
        // Pattern to match digits preceded by $ using positive lookbehind
        Pattern pattern = Pattern.compile("(?<=\\$)\\d+");
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.println("Price found: " + matcher.group());
        }
    }
}
Output:
Price found: 100
Price found: 250
Here, (?<=\$) asserts that the digits are preceded by a dollar sign without including it in the match. Numbers without $ are ignored.
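Alternatively, if you need to find all numbers not preceded by a $, you can use negative lookbehind:
Pattern pattern = Pattern.compile("(?<![$\\d])\\d+");
This matches a run of digits only if it is not immediately preceded by $. Including \d in the lookbehind stops the match from starting in the middle of a number such as $100; with (?<!\$) alone, the trailing 00 would still match.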
Lookaheads and lookbehinds enable complex validations and precise extractions by enforcing conditions on what surrounds a match without capturing those characters. This often keeps patterns shorter and match extraction simpler, although lookarounds that scan with .* add work for the engine on long inputs.
Assertion | Purpose | Example Use Case |
---|---|---|
Positive lookahead | Require something ahead | Password must include digit |
Negative lookahead | Disallow something ahead | No whitespace in password |
Positive lookbehind | Require something behind | Extract numbers after $ |
Negative lookbehind | Disallow something behind | Extract numbers not after $ |
Using these assertions, you can build sophisticated pattern checks and data extraction logic in your Java applications with confidence and clarity.