When working with text in Java, it's common to need to find all occurrences of a particular pattern, not just the first one. The Matcher.find() method from the java.util.regex package is designed precisely for this purpose. Unlike matches(), which tries to match the entire input string, find() searches through the input to locate successive subsequences that match the pattern.
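The difference between matches() and find() is easiest to see on a small input. The following is a minimal sketch; the class and helper names are illustrative assumptions, not part of any library.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    // True only if the regex matches the entire input string.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches some subsequence of the input.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "I love Java programming.";
        System.out.println(matchesWhole("Java", input));   // false: pattern does not span the whole input
        System.out.println(findsAnywhere("Java", input));  // true: "Java" occurs inside the input
        System.out.println(matchesWhole("Java", "Java"));  // true: pattern covers the whole input
    }
}
```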
Using Matcher.find() to Iterate Over Matches
To find multiple occurrences, you typically create a Matcher object from a compiled Pattern and the input text, then repeatedly call find() in a loop:
Pattern pattern = Pattern.compile("\\bJava\\b");
Matcher matcher = pattern.matcher("Java is fun. I love Java programming.");
while (matcher.find()) {
    System.out.println("Found at index: " + matcher.start() + " - " + matcher.group());
}
This code searches for the whole word "Java" in the input string and prints each match’s start index and matched text.
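On Java 9 and later, the same iteration can also be written with Matcher.results(), which returns a stream of MatchResult objects. The sketch below collects each match's start index and text into a list; the class name and the describeMatches helper are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MatchCollector {
    // Returns one "start:text" entry per match, in order of appearance.
    static List<String> describeMatches(String regex, String input) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()                               // Stream<MatchResult>, available since Java 9
                .map(r -> r.start() + ":" + r.group())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Java is fun. I love Java programming.";
        System.out.println(describeMatches("\\bJava\\b", text)); // [0:Java, 20:Java]
    }
}
```

The explicit while loop remains the better fit when you need to stop early or mutate state between matches.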
If your regex contains capturing groups (parentheses), you can extract these groups from each match. For example:
Pattern pattern = Pattern.compile("(\\d{3})-(\\d{4})");
Matcher matcher = pattern.matcher("Call 555-1234 or 666-5678.");
while (matcher.find()) {
    System.out.println("Prefix: " + matcher.group(1) + ", Line number: " + matcher.group(2));
}
Here, each phone number is split into its three-digit prefix and four-digit line number for extraction.
By default, find() continues searching immediately after the last match’s end. This means it doesn’t detect overlapping matches. For example, searching for "ana" in "banana" will find the first "ana" starting at index 1 but will miss the overlapping "ana" starting at index 3.
To handle overlapping matches, you need custom logic. One option is to restart the search one character past the previous match's start by passing matcher.start() + 1 to matcher.find(int start). Another is to wrap the pattern in a lookahead so that each match consumes no characters.
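Both strategies for overlapping matches can be sketched as follows, using the "ana" in "banana" case. This is one possible implementation under the assumption that only the start indices are needed; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    // Approach 1: restart the search one character after each match's start.
    static List<Integer> startsWithFindFrom(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        int from = 0;
        // find(int) resets the matcher and searches from the given index;
        // the bounds check avoids an IndexOutOfBoundsException past the end.
        while (from <= input.length() && matcher.find(from)) {
            starts.add(matcher.start());
            from = matcher.start() + 1;
        }
        return starts;
    }

    // Approach 2: a zero-width lookahead. Each match is empty, so the regular
    // find() loop advances one character at a time. The matched text is in
    // group 1, which shifts any group numbers inside the wrapped regex by one.
    static List<Integer> startsWithLookahead(String regex, String input) {
        Matcher matcher = Pattern.compile("(?=(" + regex + "))").matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(startsWithFindFrom("ana", "banana"));   // [1, 3]
        System.out.println(startsWithLookahead("ana", "banana"));  // [1, 3]
    }
}
```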
The Matcher.find() method is a powerful way to locate multiple occurrences of regex patterns in Java strings. By iterating over matches and extracting groups, developers can implement robust search and extraction functionality. While adjacent matches are straightforward to handle, overlapping matches need extra attention, often requiring more complex regex or iteration strategies. Understanding these concepts enables efficient and flexible text processing in Java applications.
Logs and reports often contain valuable structured information embedded in semi-structured text. Extracting this data efficiently is a common task in many applications such as monitoring, debugging, and analytics. Regex offers a flexible way to isolate key fields like timestamps, error codes, user IDs, or messages, even when the input format varies slightly.
The first step in extracting data is understanding the typical structure of your log or report lines. For example, a log entry might look like this:
2025-06-22 15:45:30 ERROR 1234 User login failed for userID=5678
Here, you may want to extract the timestamp, error level, error code, and user ID. A regex pattern designed to capture these could be:
(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\d+)\s+User login failed for userID=(\d+)
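(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) captures the timestamp.
(\w+) captures the error level.
(\d+) captures the error code.
(\d+) after userID= captures the user ID.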
Each part is wrapped in parentheses to capture it as a group for later extraction.
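Applied in Java, the pattern above can be used roughly as follows. This is a minimal sketch: the LogLineParser class and its parse method are hypothetical names, and returning a String array is just one convenient way to hand back the four groups.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // Same pattern as in the text, with backslashes doubled for a Java string literal.
    private static final Pattern LOG_LINE = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+(\\w+)\\s+(\\d+)"
            + "\\s+User login failed for userID=(\\d+)");

    // Returns {timestamp, level, code, userId}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher matcher = LOG_LINE.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] {
            matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4)
        };
    }

    public static void main(String[] args) {
        String line = "2025-06-22 15:45:30 ERROR 1234 User login failed for userID=5678";
        System.out.println(Arrays.toString(parse(line)));
        // [2025-06-22 15:45:30, ERROR, 1234, 5678]
    }
}
```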
Logs often contain optional or variable parts. For instance, sometimes the user ID may be missing, or the error message might change. You can use optional groups ((...)?) and non-capturing groups (?:...) to handle such cases gracefully:
(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\d+)(?: User login failed for userID=(\d+))?
The (?: ... )? wrapper makes the user ID part optional. When it is missing, matcher.group(4) returns null, which your code can check and handle accordingly.
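The null check for the optional group might look like the sketch below. The class name, the describe helper, the "<none>" placeholder, and the second sample log line are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalUserIdDemo {
    private static final Pattern LOG_LINE = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+(\\w+)\\s+(\\d+)"
            + "(?: User login failed for userID=(\\d+))?");

    // Summarizes a line, distinguishing "no match" from "matched without a user ID".
    static String describe(String line) {
        Matcher matcher = LOG_LINE.matcher(line);
        if (!matcher.find()) {
            return "no match";
        }
        // group(4) is null when the optional part did not participate in the match.
        String userId = matcher.group(4);
        return "code=" + matcher.group(3) + " user=" + (userId != null ? userId : "<none>");
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "2025-06-22 15:45:30 ERROR 1234 User login failed for userID=5678"));
        // code=1234 user=5678
        System.out.println(describe("2025-06-22 15:46:02 WARN 42 Disk usage high"));
        // code=42 user=<none>
    }
}
```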
Complex extraction patterns can become hard to read. Use comments in your regex (via (?x) mode in Java) and break down the pattern logically. In this mode unescaped whitespace in the pattern is ignored, so every literal space must be written as \s (or \x20):
String pattern = "(?x)                                   # Enable comments and whitespace\n" +
    "(\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2}) \\s+       # Timestamp\n" +
    "(\\w+) \\s+                                           # Error level\n" +
    "(\\d+)                                                # Error code\n" +
    "(?:\\sUser\\slogin\\sfailed\\sfor\\suserID=(\\d+))?   # Optional userID\n";
This approach makes it easier to update patterns as log formats evolve.
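One thing to keep in mind with (?x) mode is that unescaped whitespace inside the pattern is ignored, so a bare space no longer matches a space. The sketch below demonstrates this and shows the equivalent Pattern.COMMENTS flag; the class and constant names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentsModeDemo {
    // Pattern.COMMENTS is the flag equivalent of the embedded (?x).
    static final Pattern TIMESTAMP = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2})  # date\n"
            + "\\s                   # one whitespace character, written explicitly\n"
            + "(\\d{2}:\\d{2}:\\d{2})  # time\n",
            Pattern.COMMENTS);

    public static void main(String[] args) {
        // The space between a and b in the pattern is ignored in (?x) mode.
        System.out.println(Pattern.matches("(?x) a b", "ab"));       // true
        System.out.println(Pattern.matches("(?x) a b", "a b"));      // false
        // Writing the space as \s restores the intended meaning.
        System.out.println(Pattern.matches("(?x) a \\s b", "a b"));  // true

        Matcher matcher = TIMESTAMP.matcher("2025-06-22 15:45:30");
        if (matcher.matches()) {
            System.out.println(matcher.group(1) + " / " + matcher.group(2));
            // 2025-06-22 / 15:45:30
        }
    }
}
```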
Extracting structured data from logs and reports with regex requires careful pattern design that balances flexibility and precision. By capturing key fields, handling optional parts, and maintaining readable patterns, you can build robust extraction solutions that adapt well to semi-structured inputs. This approach helps automate monitoring, error tracking, and analytics in many Java applications.
Extracting IP addresses from log files is a common task in network monitoring, security auditing, and data analysis. In this section, we’ll provide a complete Java example that uses regex to find and extract both IPv4 and IPv6 addresses from log entries.
An IPv4 address looks like 192.168.1.1.
An IPv6 address looks like 2001:0db8:85a3::8a2e:0370:7334.
We’ll create regex patterns for both formats:
IPv4 pattern (simplified for readability):
\b(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)){3}\b
This matches numbers from 0 to 255 in four octets separated by dots.
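Because the octet expression repeats, one way to keep the IPv4 pattern manageable in Java is to build it from a single constant. The following sketch assumes that approach; the Ipv4Finder class, the OCTET constant, and the findAll helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ipv4Finder {
    // One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)";

    // Four octets separated by dots, bounded by word boundaries.
    private static final Pattern IPV4 =
            Pattern.compile("\\b" + OCTET + "(?:\\." + OCTET + "){3}\\b");

    // Returns every IPv4-looking substring of the text, in order.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = IPV4.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("User connected from 192.168.1.100 at 10:15")); // [192.168.1.100]
        System.out.println(findAll("Failed login from 10.0.0.256 (invalid IP)"));  // []
    }
}
```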
IPv6 pattern (basic version):
\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b
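This matches only full, uncompressed IPv6 addresses with all eight groups written out; compressed forms that use :: are not matched. Handling all IPv6 variations requires a more complex regex, but this version is enough for logs that record addresses in fully expanded form.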
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.ArrayList;

public class IPAddressExtractor {
    public static void main(String[] args) {
        // Sample log entries containing IPv4 and IPv6 addresses
        String logData = """
                User connected from 192.168.1.100 at 10:15
                Failed login from 10.0.0.256 (invalid IP)
                Access granted to 2001:0db8:85a3:0000:0000:8a2e:0370:7334
                Ping from 172.16.254.1 succeeded
                Unknown host 1234:5678:9abc:def0:1234:5678:9abc:defg
                """;

        // Regex pattern to match IPv4 and IPv6 addresses
        String ipv4Pattern = "\\b(?:25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)(?:\\.(?:25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)){3}\\b";
        String ipv6Pattern = "\\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\\b";

        // Combine patterns with alternation
        String combinedPattern = ipv4Pattern + "|" + ipv6Pattern;
        Pattern pattern = Pattern.compile(combinedPattern);
        Matcher matcher = pattern.matcher(logData);
        ArrayList<String> foundIPs = new ArrayList<>();

        // Iterate over all matches
        while (matcher.find()) {
            String ip = matcher.group();
            foundIPs.add(ip);
        }

        // Print extracted IP addresses
        System.out.println("Extracted IP addresses:");
        for (String ip : foundIPs) {
            System.out.println(ip);
        }
    }
}
Using matcher.find(), we locate all occurrences in the input string.
matcher.group() returns the exact matched IP address.
The program prints:
Extracted IP addresses:
192.168.1.100
2001:0db8:85a3:0000:0000:8a2e:0370:7334
172.16.254.1
Note how invalid IPs like 10.0.0.256 and malformed IPv6 like 1234:5678:9abc:def0:1234:5678:9abc:defg are ignored: neither matches in full, and the \b word boundaries keep the patterns from accepting a partial match such as 10.0.0.25.
This example demonstrates a practical approach to extracting both IPv4 and IPv6 addresses from log files using Java regex. While the IPv6 regex here covers standard full addresses, extending it for compressed forms and validating IP correctness may require more sophisticated patterns or external libraries. Nonetheless, regex combined with Java’s Matcher provides a powerful and flexible tool for parsing complex text data efficiently.