[abc]
and ranges [a-z]
Custom character classes are a fundamental feature in regular expressions that let you match any one character from a specific set or range. These classes are enclosed in square brackets [ ]
and give you the flexibility to specify exactly which characters you want to allow at a particular position in your pattern.
When you write a regex like [abc]
, it means match exactly one character that is either a
, b
, or c
. The regex engine checks the input string at that position and accepts the match if it finds any one of those characters.
You can also specify ranges inside the brackets. For example, [a-z]
matches any lowercase letter from a
through z
. Similarly, [0-9]
matches any digit from 0
to 9
.
You can combine multiple ranges and individual characters inside one set. For example:
[a-f0-3xZ]
matches any character that is:
a
and f
,0
and 3
,x
, orZ
.Matching vowels: To match any lowercase vowel, you can use:
[aeiou]
This matches any one of the vowels a
, e
, i
, o
, or u
.
Matching hexadecimal digits: Hex digits include digits and letters from a
to f
or A
to F
. You can specify:
[0-9a-fA-F]
This matches any single hex digit.
Matching specific letters: Suppose you want to match either x
, y
, or z
:
[xyz]
Custom character classes are especially useful when you want to restrict input to a specific set of characters or allow multiple options in a concise way. For example:
By mastering custom character classes and ranges, you gain powerful control over what your regex matches, making your patterns both precise and adaptable.
[^...]
Negated character classes are a useful extension of custom character classes that allow you to match any character except those specified. Instead of listing characters you want to match, you specify which characters to exclude.
A negated character class starts with a caret (^
) immediately following the opening square bracket:
[^abc]
This pattern matches any single character except a
, b
, or c
.
When the regex engine encounters a negated class, it checks if the character at the current position is not in the specified set. If it isn’t, the match succeeds.
For example, [^\d]
matches any character that is not a digit, because \d
represents digits, and the caret negates the class.
Exclude digits: To match any character except digits, use:
[^\d]
This matches letters, symbols, whitespace, and other non-digit characters.
Exclude whitespace: To match any character that is not whitespace, use:
[^\s]
Exclude specific letters: If you need to match any character except x
, y
, or z
:
[^xyz]
Negated classes simplify patterns when you want to filter out unwanted characters rather than explicitly listing what to accept. This is especially helpful when:
Negated character classes provide a straightforward way to express exclusions in regex, making your patterns more concise and easier to maintain.
\p{Lower}
, \p{Upper}
, etc.Java’s regex engine supports a powerful set of POSIX character classes that allow you to match characters based on their Unicode categories. These classes provide an excellent way to create patterns that work across different languages and scripts, making your regexes locale-independent and more flexible.
POSIX character classes use the syntax:
\p{Category}
where Category
specifies a Unicode character class or property. These categories represent broad sets of characters, such as lowercase letters, uppercase letters, digits, punctuation marks, and whitespace.
Some common POSIX classes include:
\p{Lower}
— Matches any lowercase letter (e.g., a
, b
, c
, including accented letters like á
).\p{Upper}
— Matches any uppercase letter (e.g., A
, B
, C
, including accented uppercase letters).\p{Digit}
— Matches any Unicode digit (similar to \d
but broader).\p{Alpha}
— Matches any alphabetic character.\p{Punct}
— Matches any punctuation character.\p{Space}
— Matches any whitespace character (spaces, tabs, newlines, etc.).To match a lowercase letter:
\p{Lower}
This matches any lowercase letter, including those beyond the ASCII range, such as ñ
, ü
, or ç
.
To match any uppercase letter:
\p{Upper}
For digits, you can use:
\p{Digit}
which covers more than just 0-9
by including digits from other scripts.
Java also provides predefined shorthand character classes like:
\w
— word characters (letters, digits, underscore)\d
— digits (0-9)\s
— whitespaceHowever, these shorthands are limited mostly to ASCII ranges and do not fully support the diversity of Unicode characters.
POSIX classes, on the other hand, are based on Unicode properties and thus better support internationalization. For example, \p{Lower}
matches lowercase letters in all languages, not just a-z
.
Using POSIX classes ensures your regex patterns work consistently with multilingual text, supporting characters from different alphabets and scripts. This is crucial when dealing with global applications that handle names, addresses, or other inputs containing non-ASCII characters.
By mastering POSIX character classes, you can create regex patterns that are robust, clear, and ready for the diverse text your Java applications may encounter.
Validating password strength is a common and practical use case for regular expressions. A good password policy typically enforces multiple rules, such as requiring uppercase letters, lowercase letters, digits, and special characters. In this example, we'll use both basic and POSIX character classes to construct a regex that checks for strong passwords.
Let’s define a password as valid if it satisfies all of the following:
\p{Lower}
)\p{Upper}
)\p{Digit}
)We’ll use positive lookahead assertions in combination with POSIX character classes to enforce the rules:
^(?=.*\p{Lower})(?=.*\p{Upper})(?=.*\p{Digit})(?=.*[^\\p{Alnum}]).{8,}$
^
and $
anchor the pattern to the start and end of the string.(?=.*\p{Lower})
ensures at least one lowercase letter.(?=.*\p{Upper})
ensures at least one uppercase letter.(?=.*\p{Digit})
ensures at least one digit.(?=.*[^\\p{Alnum}])
ensures at least one non-alphanumeric (i.e., special) character..{8,}
ensures the total length is at least 8 characters.import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class PasswordValidator {
public static void main(String[] args) {
String[] passwords = {
"Password1!", // valid
"weakpass", // no digit or uppercase or special char
"StrongPass123", // missing special character
"Short1!", // too short
"Valid#2023" // valid
};
String regex = "^(?=.*\\p{Lower})(?=.*\\p{Upper})(?=.*\\p{Digit})(?=.*[^\\p{Alnum}]).{8,}$";
Pattern pattern = Pattern.compile(regex);
for (String password : passwords) {
Matcher matcher = pattern.matcher(password);
if (matcher.matches()) {
System.out.println("Valid password: " + password);
} else {
System.out.println("Invalid password: " + password);
}
}
}
}
Valid password: Password1!
Invalid password: weakpass
Invalid password: StrongPass123
Invalid password: Short1!
Valid password: Valid#2023
This example demonstrates how powerful regex can be for enforcing complex string constraints. By combining POSIX character classes and lookaheads, we keep the pattern readable, internationalized, and robust. This technique can be adapted for login systems, form validations, and any scenario requiring secure password handling.