The package java.util.regex contains three classes that provide full support for regular expressions.
The classes are:
|Pattern||holds the compiled form of a regular expression.|
|Matcher||associates the string to be matched with a Pattern and performs the actual match.|
|PatternSyntaxException||represents an error in a malformed regular expression.|
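As a rough illustration of how the three classes work together, the sketch below compiles a pattern, runs a match, and catches the exception thrown for a malformed expression. The class name RegexClassesDemo and the helper methods isValidRegex and containsMatch are hypothetical names chosen for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexClassesDemo {
    // Returns true if the regular expression compiles without a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() and getIndex() report what went wrong and where
            System.out.println("Bad pattern: " + e.getDescription()
                    + " near index " + e.getIndex());
            return false;
        }
    }

    // Returns true if the regular expression is found anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        Pattern p = Pattern.compile(regex);   // compiled form of the regex
        Matcher m = p.matcher(input);         // ties the input to the Pattern
        return m.find();                      // performs the actual match
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+"));            // well-formed
        System.out.println(isValidRegex("[a-z"));              // unclosed character class
        System.out.println(containsMatch("[a-z]+", "123abc"));
    }
}
```

Note that PatternSyntaxException is an unchecked exception, so catching it is optional; it is usually worth doing when the expression comes from user input.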
The Pattern class holds the compiled form of a regular expression. It is immutable, so a single instance can safely be shared and reused.
It has no public constructor. The class contains a static compile() method, which returns a Pattern object.
The compile() method is overloaded.
static Pattern compile(String regex)
static Pattern compile(String regex, int flags)
The following snippet of code compiles a regular expression into a Pattern object:
String regex = "[a-z]@.";
// Compile the regular expression into a Pattern object
Pattern p = Pattern.compile(regex);
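To see what the compiled pattern actually matches, the following self-contained sketch wraps the same expression in a small program. The class name CompileDemo, the constant LETTER_AT_ANY, and the helper findAll are assumptions made for illustration; only Pattern.compile(), matcher(), find(), and group() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileDemo {
    // Compiled once; a Pattern is immutable, so it can be reused freely.
    static final Pattern LETTER_AT_ANY = Pattern.compile("[a-z]@.");

    // Collects every substring of the input that the pattern matches.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        // A lowercase letter, then @, then any character except a line terminator
        System.out.println(findAll(LETTER_AT_ANY, "x@y and k@9")); // prints [x@y, k@9]
        // No flags are set, so the uppercase A does not match [a-z]
        System.out.println(findAll(LETTER_AT_ANY, "A@b"));         // prints []
    }
}
```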
The flags parameter is a bit mask which can modify the way the pattern is matched.
The flags, defined as int constants in the Pattern class, are listed in the following table.
|Pattern.CANON_EQ|| Enables canonical equivalence. Two characters match only if their full canonical decompositions match. The expression "a\u030A", for example, will match the string "\u00E5" when this flag is specified. By default, matching does not take canonical equivalence into account.|
|Pattern.CASE_INSENSITIVE|| Enables case-insensitive matching. On its own, this flag enables case-insensitive matching only for characters in the US-ASCII charset. For Unicode-aware case-insensitive matching, specify the UNICODE_CASE flag together with this flag.|
|Pattern.COMMENTS|| Permits whitespace and comments in the pattern. When this flag is set, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line.|
|Pattern.DOTALL|| Enables dotall mode. In dotall mode, the expression . matches any character, including a line terminator. By default, this expression does not match line terminators.|
|Pattern.LITERAL|| Enables literal parsing of the pattern. When this flag is specified then the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters or escape sequences in the input sequence will be given no special meaning.|
|Pattern.MULTILINE|| Enables multiline mode. In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.|
|Pattern.UNICODE_CASE|| Enables Unicode-aware case folding. When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched.|
|Pattern.UNICODE_CHARACTER_CLASS|| Enables the Unicode version of predefined character classes and POSIX character classes.|
|Pattern.UNIX_LINES|| Enables Unix lines mode. When this flag is set, only the \n character is recognized as a line terminator in the behavior of ., ^, and $.|
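A few of the flags in the table can be compared side by side with a small program. This is only a sketch: the class name PatternFlagsDemo, the helper countMatches, and the sample inputs are assumptions made for illustration, and a flags value of 0 is used to mean that no flags are set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternFlagsDemo {
    // Counts how many times the regex is found in the input under the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";

        // No flags: ^ matches only at the beginning of the whole input.
        System.out.println(countMatches("^\\w+", 0, text));                 // prints 1

        // MULTILINE: ^ also matches just after each line terminator.
        System.out.println(countMatches("^\\w+", Pattern.MULTILINE, text)); // prints 3

        // LITERAL: the dot loses its special meaning, so only "a.c" matches.
        System.out.println(countMatches("a.c", 0, "abc a.c"));               // prints 2
        System.out.println(countMatches("a.c", Pattern.LITERAL, "abc a.c")); // prints 1

        // COMMENTS: whitespace and # comments inside the pattern are ignored,
        // so the pattern below is equivalent to \d+px.
        String commented = "\\d+   # one or more digits\n"
                         + "px     # followed by the unit";
        System.out.println(countMatches(commented, Pattern.COMMENTS, "10px 20px")); // prints 2
    }
}
```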
The following code compiles a regular expression with the CASE_INSENSITIVE and DOTALL flags set.
Matching will be case-insensitive for the US-ASCII charset, and the expression . will also match a line terminator.
// Prepare a regular expression
String regex = "[a-z]@.";
// Compile the regular expression into a Pattern object, setting the
// CASE_INSENSITIVE and DOTALL flags
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
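The effect of combining the two flags can be checked with an input that needs both of them to match. In the sketch below, the class name CombinedFlagsDemo, the helper found, and the input "A@\n" are assumptions chosen for illustration: the uppercase A requires CASE_INSENSITIVE, and the trailing line terminator requires DOTALL.

```java
import java.util.regex.Pattern;

class CombinedFlagsDemo {
    // Returns true if the regex is found in the input under the given flags.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "[a-z]@.";
        String input = "A@\n";

        System.out.println(found(regex, 0, input));                        // prints false
        System.out.println(found(regex, Pattern.CASE_INSENSITIVE, input)); // prints false
        System.out.println(found(regex, Pattern.DOTALL, input));           // prints false

        // Flags are combined with the bitwise OR operator.
        int both = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        System.out.println(found(regex, both, input));                     // prints true

        // flags() returns the bit mask the pattern was compiled with.
        Pattern p = Pattern.compile(regex, both);
        System.out.println((p.flags() & Pattern.DOTALL) != 0);             // prints true
    }
}
```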