Stream API

Java’s Stream API provides a powerful and expressive way to process collections of data in a declarative, functional style. When combined with regex, streams enable efficient and readable text processing workflows, such as filtering lines, extracting matches, or transforming input, with minimal boilerplate.
At its core, streams operate on sequences of elements, like a list of strings or lines read from a file. Regex complements this by providing powerful pattern matching to inspect or manipulate each element.
A common pattern is to compile the Pattern once and reuse it throughout the stream pipeline, then apply Matcher.find(), Pattern.matcher(), or string methods like matches() and replaceAll() inside the stream operations. This approach promotes clean, modular code that’s easy to maintain, and operations like .filter(), .map(), and .flatMap() express data transformations clearly.

Suppose you want to process a list of sentences and extract those containing an email address.
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
public class RegexStreamExample {

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Contact us at support@example.com",
                "No email here",
                "Send feedback to feedback@domain.org"
        );

        // Compile email pattern once
        Pattern emailPattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.\\w+");

        // Filter lines containing emails using streams and regex
        List<String> linesWithEmail = lines.stream()
                .filter(line -> emailPattern.matcher(line).find())
                .collect(Collectors.toList());

        linesWithEmail.forEach(System.out::println);
    }
}
Output:
Contact us at support@example.com
Send feedback to feedback@domain.org
Here, the filter step uses the compiled pattern’s matcher to check each line for an email substring. The result is a clean, concise pipeline that’s easy to read and maintain.
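The same compiled pattern can transform lines as well as filter them. As a minimal sketch, reusing lines and emailPattern from the example above and masking each match with an arbitrary placeholder:

// Replace every matched email address with a placeholder string
List<String> redacted = lines.stream()
        .map(line -> emailPattern.matcher(line).replaceAll("[redacted]"))
        .collect(Collectors.toList());

redacted.forEach(System.out::println);
// Contact us at [redacted]
// No email here
// Send feedback to [redacted]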
By leveraging Java Streams and regex together, you gain readable, declarative pipelines with minimal boilerplate. This synergy makes streams and regex an ideal pair for modern Java text-processing tasks.
Using regex within Java Stream operations unlocks powerful ways to filter data based on patterns and transform matched content. The combination of streams and regex allows you to declaratively extract meaningful information or reshape data with minimal code.
The most straightforward use case is filtering a stream of strings to retain only those that match a given regex pattern. This is commonly done using the filter() method with Pattern.matcher() or String.matches().
Pattern digitPattern = Pattern.compile("\\d+");
List<String> input = List.of("abc123", "xyz", "456def", "789");

List<String> onlyWithDigits = input.stream()
        .filter(s -> digitPattern.matcher(s).find())
        .collect(Collectors.toList());
// Result: ["abc123", "456def", "789"]
Here, filter() retains strings containing at least one digit.
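If you only need a simple whole-string test and the pattern is not reused elsewhere, String.matches() is an alternative; because it must match the entire string, the expression has to allow surrounding characters, and the regex is recompiled on every call. A rough equivalent of the filter above:

// String.matches() tests the whole string, so allow any characters around the digits
List<String> onlyWithDigits = input.stream()
        .filter(s -> s.matches(".*\\d+.*"))
        .collect(Collectors.toList());
// Result: ["abc123", "456def", "789"]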
Often, you want not only to filter but also to extract specific parts of the matches. This requires accessing capturing groups from regex matches and using map() to transform matched strings into desired outputs.
Example: Extract the digits from strings containing numbers.
List<String> digitsOnly = input.stream()
        .map(digitPattern::matcher)
        .filter(Matcher::find)              // Ensure there is a match
        .map(matcher -> matcher.group())    // Extract matched substring (entire match)
        .collect(Collectors.toList());
// Result: ["123", "456", "789"]
By calling map() with a lambda that accesses the matcher’s group, you extract and collect just the matched parts.
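When the pattern defines capturing groups, the same shape of pipeline can pull out a single group instead of the whole match. A small sketch with a hypothetical key=value pattern (the pattern and data below are illustrative, not taken from the example above):

// Two capturing groups: (1) the key, (2) the numeric value
Pattern kvPattern = Pattern.compile("(\\w+)=(\\d+)");
List<String> settings = List.of("timeout=30", "retries=5", "mode=fast");

List<String> values = settings.stream()
        .map(kvPattern::matcher)
        .filter(Matcher::find)            // Only keep entries with a numeric value
        .map(m -> m.group(2))             // Group 2 holds the digits
        .collect(Collectors.toList());
// Result: ["30", "5"]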
flatMap() for Multiple Groups or Matches

If a string contains multiple matches, for example multiple tokens or numbers, flatMap() can flatten the stream of multiple results per element into a single stream.
Example: Extract all numbers from a list of sentences.
Pattern numberPattern = Pattern.compile("\\d+");
List<String> sentences = List.of(
        "Order 123 shipped on 2023-06-22",
        "Invoice 456 and 789 pending"
);

List<String> allNumbers = sentences.stream()
        .map(numberPattern::matcher)
        .flatMap(matcher -> {
            List<String> matches = new ArrayList<>();
            while (matcher.find()) {
                matches.add(matcher.group());
            }
            return matches.stream();
        })
        .collect(Collectors.toList());
// Result: ["123", "2023", "06", "22", "456", "789"]
This example collects all numbers from each sentence by iterating over multiple matches inside the flatMap().
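On Java 9 and later, Matcher.results() exposes every match as a Stream<MatchResult>, which shortens the same extraction. A sketch of that alternative (MatchResult comes from java.util.regex):

// Java 9+: Matcher.results() streams each match as a java.util.regex.MatchResult
List<String> allNumbers = sentences.stream()
        .flatMap(sentence -> numberPattern.matcher(sentence).results())
        .map(MatchResult::group)
        .collect(Collectors.toList());
// Result: ["123", "2023", "06", "22", "456", "789"]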
When your pattern has optional groups, you need to account for groups that did not participate in the match, for example by filtering out null group values:
Pattern pattern = Pattern.compile("(\\w+)(?:-(\\d+))?"); // Optional group 2
List<String> data = List.of("item-123", "item", "product-456");

List<String> ids = data.stream()
        .map(pattern::matcher)
        .filter(Matcher::matches)
        .map(matcher -> matcher.group(2))  // May be null if group 2 did not match
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
// Result: ["123", "456"]
Here, we safely extract the optional numeric suffix, filtering out nulls.
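On Java 9 and later, Stream.ofNullable() offers an alternative to the explicit null filter: an unmatched group simply contributes an empty stream. A sketch using the same pattern and data as above:

// Java 9+: Stream.ofNullable(null) is an empty stream, so unmatched groups vanish
List<String> ids = data.stream()
        .map(pattern::matcher)
        .filter(Matcher::matches)
        .flatMap(m -> Stream.ofNullable(m.group(2)))
        .collect(Collectors.toList());
// Result: ["123", "456"]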
By combining regex with filter(), map(), and flatMap(), Java streams enable concise and expressive pipelines that filter elements by pattern, extract matched substrings or groups, and flatten multiple matches per element.
This functional style leads to maintainable, flexible text processing code that can adapt easily to evolving requirements.
In real-world applications, efficiently processing large datasets such as log files or CSV records is crucial. Combining Java’s Stream API with compiled regex patterns offers a clear, performant way to filter, extract, and transform data in a declarative style.
Suppose we have a large log file where each line records an event with a timestamp, log level, and message:
2025-06-22 14:35:12 INFO User login successful for userId=1234
2025-06-22 14:36:00 ERROR Failed to connect to DB
2025-06-22 14:36:45 WARN Disk space low on server-7
2025-06-22 14:37:01 INFO User logout for userId=1234
Our goal is to parse each line, keep only the ERROR and WARN level logs, and extract the timestamp, level, and message into structured objects.

We define a regex pattern with named capturing groups for clarity:
import java.util.regex.*;
import java.util.*;
import java.util.stream.*;

public class LogProcessor {

    // Pattern with named groups: timestamp, level, message
    private static final Pattern logPattern = Pattern.compile(
            "^(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+" +
            "(?<level>INFO|ERROR|WARN)\\s+" +
            "(?<message>.+)$"
    );

    // LogEntry class to hold extracted data
    static class LogEntry {
        String timestamp, level, message;

        LogEntry(String t, String l, String m) {
            timestamp = t; level = l; message = m;
        }

        @Override
        public String toString() {
            return String.format("[%s] %s - %s", timestamp, level, message);
        }
    }

    public static void main(String[] args) {
        List<String> logLines = List.of(
                "2025-06-22 14:35:12 INFO User login successful for userId=1234",
                "2025-06-22 14:36:00 ERROR Failed to connect to DB",
                "2025-06-22 14:36:45 WARN Disk space low on server-7",
                "2025-06-22 14:37:01 INFO User logout for userId=1234"
        );

        List<LogEntry> alerts = logLines.stream()
                .map(logPattern::matcher)
                .filter(Matcher::matches)   // Keep lines matching the pattern
                .filter(matcher -> {
                    String level = matcher.group("level");
                    return "ERROR".equals(level) || "WARN".equals(level);
                })
                .map(matcher -> new LogEntry(
                        matcher.group("timestamp"),
                        matcher.group("level"),
                        matcher.group("message")))
                .collect(Collectors.toList());

        alerts.forEach(System.out::println);
    }
}
The logPattern is compiled once as a static final field to avoid repeated compilation overhead on each line. Each line is mapped to a Matcher; lines not matching the pattern are discarded. Of the remaining lines, only the ERROR and WARN levels are retained and mapped to LogEntry objects.

Output:
[2025-06-22 14:36:00] ERROR - Failed to connect to DB
[2025-06-22 14:36:45] WARN - Disk space low on server-7

Discarding non-matching lines early (filter(Matcher::matches)) reduces unnecessary processing, and for large files you can use Files.lines(Path) to stream lines directly from disk, as sketched below. This approach showcases how regex and streams can be combined to efficiently parse and extract meaningful data from complex input sources while keeping the code clean and maintainable.
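As a minimal sketch of that file-based variant, assuming a hypothetical log file named app.log and reusing logPattern and LogEntry from the class above:

// Requires: import java.io.IOException; import java.nio.file.*;
// "app.log" is a hypothetical path used only for illustration.
// (Path.of requires Java 11+; Paths.get works on older versions.)
try (Stream<String> fileLines = Files.lines(Path.of("app.log"))) {
    List<LogEntry> alerts = fileLines
            .map(logPattern::matcher)
            .filter(Matcher::matches)
            .filter(m -> "ERROR".equals(m.group("level"))
                      || "WARN".equals(m.group("level")))
            .map(m -> new LogEntry(
                    m.group("timestamp"),
                    m.group("level"),
                    m.group("message")))
            .collect(Collectors.toList());

    alerts.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();  // Handle I/O problems appropriately in real code
}

The try-with-resources block closes the underlying file handle, and because Files.lines is lazy, lines are read and filtered one at a time instead of being loaded into memory all at once.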