Index

Character Classes and POSIX Character Classes

Java Regex

5.1 Custom character classes [abc] and ranges [a-z]

Custom character classes are a fundamental feature in regular expressions that let you match any one character from a specific set or range. These classes are enclosed in square brackets [ ] and give you the flexibility to specify exactly which characters you want to allow at a particular position in your pattern.

How Custom Character Classes Work

When you write a regex like [abc], it means match exactly one character that is either a, b, or c. The regex engine checks the input string at that position and accepts the match if it finds any one of those characters.

You can also specify ranges inside the brackets. For example, [a-z] matches any lowercase letter from a through z. Similarly, [0-9] matches any digit from 0 to 9.

You can combine multiple ranges and individual characters inside one set. For example:

[a-f0-3xZ]

matches any character that is:

Practical Examples

Benefits and Use Cases

Custom character classes are especially useful when you want to restrict input to a specific set of characters or allow multiple options in a concise way. For example:

By mastering custom character classes and ranges, you gain powerful control over what your regex matches, making your patterns both precise and adaptable.

Index

5.2 Negated classes [^...]

Negated character classes are a useful extension of custom character classes that allow you to match any character except those specified. Instead of listing characters you want to match, you specify which characters to exclude.

Syntax of Negated Character Classes

A negated character class starts with a caret (^) immediately following the opening square bracket:

[^abc]

This pattern matches any single character except a, b, or c.

How They Work

When the regex engine encounters a negated class, it checks if the character at the current position is not in the specified set. If it isn’t, the match succeeds.

For example, [^\d] matches any character that is not a digit, because \d represents digits, and the caret negates the class.

Examples

Practical Uses of Negated Classes

Negated classes simplify patterns when you want to filter out unwanted characters rather than explicitly listing what to accept. This is especially helpful when:

Negated character classes provide a straightforward way to express exclusions in regex, making your patterns more concise and easier to maintain.

Index

5.3 POSIX character classes like \p{Lower}, \p{Upper}, etc.

Java’s regex engine supports a powerful set of POSIX character classes that allow you to match characters based on their Unicode categories. These classes provide an excellent way to create patterns that work across different languages and scripts, making your regexes locale-independent and more flexible.

What Are POSIX Character Classes?

POSIX character classes use the syntax:

\p{Category}

where Category specifies a Unicode character class or property. These categories represent broad sets of characters, such as lowercase letters, uppercase letters, digits, punctuation marks, and whitespace.

Some common POSIX classes include:

Examples of POSIX Classes in Use

To match a lowercase letter:

\p{Lower}

This matches any lowercase letter, including those beyond the ASCII range, such as ñ, ü, or ç.

To match any uppercase letter:

\p{Upper}

For digits, you can use:

\p{Digit}

which covers more than just 0-9 by including digits from other scripts.

POSIX Classes vs. Predefined Shorthand Classes

Java also provides predefined shorthand character classes like:

However, these shorthands are limited mostly to ASCII ranges and do not fully support the diversity of Unicode characters.

POSIX classes, on the other hand, are based on Unicode properties and thus better support internationalization. For example, \p{Lower} matches lowercase letters in all languages, not just a-z.

Benefits for Internationalization

Using POSIX classes ensures your regex patterns work consistently with multilingual text, supporting characters from different alphabets and scripts. This is crucial when dealing with global applications that handle names, addresses, or other inputs containing non-ASCII characters.

By mastering POSIX character classes, you can create regex patterns that are robust, clear, and ready for the diverse text your Java applications may encounter.

Index

5.4 Example: Validating password complexity

Validating password strength is a common and practical use case for regular expressions. A good password policy typically enforces multiple rules, such as requiring uppercase letters, lowercase letters, digits, and special characters. In this example, we'll use both basic and POSIX character classes to construct a regex that checks for strong passwords.

Password Rules

Let’s define a password as valid if it satisfies all of the following:

Java Regex Pattern

We’ll use positive lookahead assertions in combination with POSIX character classes to enforce the rules:

^(?=.*\p{Lower})(?=.*\p{Upper})(?=.*\p{Digit})(?=.*[^\\p{Alnum}]).{8,}$
Explanation:

Runnable Java Example

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class PasswordValidator {
    public static void main(String[] args) {
        String[] passwords = {
            "Password1!",      // valid
            "weakpass",        // no digit or uppercase or special char
            "StrongPass123",   // missing special character
            "Short1!",         // too short
            "Valid#2023"       // valid
        };

        String regex = "^(?=.*\\p{Lower})(?=.*\\p{Upper})(?=.*\\p{Digit})(?=.*[^\\p{Alnum}]).{8,}$";
        Pattern pattern = Pattern.compile(regex);

        for (String password : passwords) {
            Matcher matcher = pattern.matcher(password);
            if (matcher.matches()) {
                System.out.println("Valid password: " + password);
            } else {
                System.out.println("Invalid password: " + password);
            }
        }
    }
}

Output

Valid password: Password1!
Invalid password: weakpass
Invalid password: StrongPass123
Invalid password: Short1!
Valid password: Valid#2023

Summary

This example demonstrates how powerful regex can be for enforcing complex string constraints. By combining POSIX character classes and lookaheads, we keep the pattern readable, internationalized, and robust. This technique can be adapted for login systems, form validations, and any scenario requiring secure password handling.

Index