Integrating Regex with Java Streams and Lambdas

Java Regex

16.1 Using regex with Stream API

Java’s Stream API processes collections of data in a declarative, functional style. Combined with regex, streams give you readable text-processing pipelines (filtering lines, extracting matches, transforming input) with minimal boilerplate.

Combining Streams and Regex: The Basics

Streams operate on sequences of elements, such as a list of strings or the lines read from a file. Regex complements this by supplying the pattern matching used to inspect or transform each element.

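A common pattern is to compile the Pattern once, open a stream over the input, apply the pattern inside filter() or map(), and collect the results.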

This approach promotes clean, modular code that’s easy to maintain.

Benefits of Combining Streams with Regex

Practical Example

Suppose you want to process a list of sentences and keep only those that contain an email address. The pattern below is deliberately simplified and is not a full email validator.

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexStreamExample {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Contact us at support@example.com",
            "No email here",
            "Send feedback to feedback@domain.org"
        );

        // Compile email pattern once
        Pattern emailPattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.\\w+");

        // Filter lines containing emails using streams and regex
        List<String> linesWithEmail = lines.stream()
            .filter(line -> emailPattern.matcher(line).find())
            .collect(Collectors.toList());

        linesWithEmail.forEach(System.out::println);
    }
}

Output:

Contact us at support@example.com
Send feedback to feedback@domain.org

Here, the filter step uses the compiled pattern’s matcher to check each line for an email substring. The result is a clean, concise pipeline that’s easy to read and maintain.
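
The same filter can also be written without a lambda. The sketch below assumes Java 8 or later and uses Pattern.asPredicate(), which tests each string with find() semantics. The class name EmailPredicateSketch and the method linesWithEmail are illustrative names, not part of the original example.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class EmailPredicateSketch {
    // Same simplified email pattern as in the example above, compiled once
    private static final Pattern EMAIL = Pattern.compile("[\\w.-]+@[\\w.-]+\\.\\w+");

    // Hypothetical helper: keeps the lines in which the pattern occurs anywhere
    static List<String> linesWithEmail(List<String> lines) {
        // asPredicate() behaves like s -> EMAIL.matcher(s).find()
        Predicate<String> containsEmail = EMAIL.asPredicate();
        return lines.stream()
            .filter(containsEmail)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Contact us at support@example.com",
            "No email here",
            "Send feedback to feedback@domain.org");
        linesWithEmail(lines).forEach(System.out::println);
    }
}
```

If the whole string must match rather than a substring, Java 11 adds Pattern.asMatchPredicate(), which uses matches() semantics instead.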

Summary

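By leveraging Java Streams and regex together, you gain declarative pipelines with little boilerplate, patterns that are compiled once and reused for every element, and code that stays easy to read and maintain.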

This synergy makes streams and regex an ideal pair for modern Java text-processing tasks.

16.2 Filtering and mapping with regex matches

Regex fits naturally into two stream operations: filter() keeps the elements that match a pattern, and map() turns each match into the value you actually need. Together they let you extract or reshape data declaratively with very little code.

Filtering with Regex

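The most straightforward use case is filtering a stream of strings to retain only those that match a given regex pattern. This is commonly done by calling filter() with either pattern.matcher(s).find() or String.matches(). The two are not interchangeable: find() succeeds if the pattern occurs anywhere in the string, while String.matches() succeeds only if the entire string matches, and it also recompiles the regex on every call. Inside a stream, a precompiled Pattern is usually the better choice.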

Pattern digitPattern = Pattern.compile("\\d+");
List<String> input = List.of("abc123", "xyz", "456def", "789");

List<String> onlyWithDigits = input.stream()
    .filter(s -> digitPattern.matcher(s).find())
    .collect(Collectors.toList());
// Result: ["abc123", "456def", "789"]

Here, filter() retains strings containing at least one digit.
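
To make the difference between find() and matches() concrete, here is a minimal sketch that runs both over the same input. The class and method names are hypothetical and chosen only for this illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FindVersusMatchesSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // find(): the pattern may occur anywhere in the string
    static List<String> containingDigits(List<String> input) {
        return input.stream()
            .filter(s -> DIGITS.matcher(s).find())
            .collect(Collectors.toList());
    }

    // matches(): the pattern must cover the entire string
    static List<String> onlyDigits(List<String> input) {
        return input.stream()
            .filter(s -> DIGITS.matcher(s).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = List.of("abc123", "xyz", "456def", "789");
        System.out.println(containingDigits(input)); // [abc123, 456def, 789]
        System.out.println(onlyDigits(input));       // [789]
    }
}
```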

Mapping Matched Groups to Results

Often you want to extract part of each match rather than just keep the matching element. To do that, run the matcher and then use map() to turn each successful Matcher into the text you need: group() returns the whole match, and group(n) returns the n-th capturing group.

Example: Extract the digits from strings containing numbers.

List<String> digitsOnly = input.stream()
    .map(digitPattern::matcher)
    .filter(Matcher::find)                // Ensure there is a match
    .map(matcher -> matcher.group())     // Extract matched substring (entire match)
    .collect(Collectors.toList());
// Result: ["123", "456", "789"]

Each Matcher is stateful: the find() call in the filter() step locates the first match, and that is what makes group() valid in the map() step that follows. The result is a list containing only the matched parts.
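
The example above takes the whole match. As a complementary sketch, the code below pulls out a capturing group with group(1) instead. The userId=... input strings, the class name, and the method name are made up for illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class GroupExtractionSketch {
    // Group 1 captures only the digits that follow "userId="
    private static final Pattern USER_ID = Pattern.compile("userId=(\\d+)");

    // Hypothetical helper: returns the first user id found in each line, if any
    static List<String> userIds(List<String> lines) {
        return lines.stream()
            .map(USER_ID::matcher)
            .filter(Matcher::find)   // positions the matcher on the first match
            .map(m -> m.group(1))    // captured digits, without the "userId=" prefix
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "userId=1234 logged in",
            "no id here",
            "retry for userId=42");
        System.out.println(userIds(lines)); // [1234, 42]
    }
}
```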

Using flatMap() for Multiple Groups or Matches

A single string can contain several matches, for example multiple tokens or numbers. In that case each element produces a small collection of results, and flatMap() flattens those per-element results into one stream.

Example: Extract all numbers from a list of sentences.

Pattern numberPattern = Pattern.compile("\\d+");

List<String> sentences = List.of(
    "Order 123 shipped on 2023-06-22",
    "Invoice 456 and 789 pending"
);

List<String> allNumbers = sentences.stream()
    .map(numberPattern::matcher)
    .flatMap(matcher -> {
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches.stream();
    })
    .collect(Collectors.toList());
// Result: ["123", "2023", "06", "22", "456", "789"]

This example collects all numbers from each sentence by iterating over multiple matches inside the flatMap().
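
On Java 9 and later, the manual while loop can be replaced by Matcher.results(), which returns a Stream<MatchResult> with one entry per match. The following is a sketch of the same extraction under that assumption; the class and method names are illustrative.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AllMatchesSketch {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Hypothetical helper: every run of digits from every sentence, in order
    static List<String> allNumbers(List<String> sentences) {
        return sentences.stream()
            // results() (Java 9+) streams one MatchResult per successive match
            .flatMap(s -> NUMBER.matcher(s).results())
            .map(MatchResult::group)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = List.of(
            "Order 123 shipped on 2023-06-22",
            "Invoice 456 and 789 pending");
        System.out.println(allNumbers(sentences)); // [123, 2023, 06, 22, 456, 789]
    }
}
```

This avoids the temporary ArrayList and keeps the whole pipeline in one expression.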

Handling Optional Groups

When your pattern has optional groups, group(n) returns null for any group that did not participate in the match, so check for null before using the value:

Pattern pattern = Pattern.compile("(\\w+)(?:-(\\d+))?"); // Optional group 2

List<String> data = List.of("item-123", "item", "product-456");

List<String> ids = data.stream()
    .map(pattern::matcher)
    .filter(Matcher::matches)
    .map(matcher -> matcher.group(2))   // May be null if group 2 did not match
    .filter(Objects::nonNull)
    .collect(Collectors.toList());
// Result: ["123", "456"]

Here, we safely extract the optional numeric suffix, filtering out nulls.
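
Filtering out nulls drops the elements that have no suffix. If you would rather keep every element and substitute a default, one possible approach is sketched below. The "n/a" placeholder, the name:id output format, and the class and method names are assumptions made for this illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class OptionalGroupSketch {
    // Same pattern as above: group 2 is optional
    private static final Pattern ITEM = Pattern.compile("(\\w+)(?:-(\\d+))?");

    // Hypothetical helper: formats each item as "name:id", with a default id
    static List<String> describe(List<String> data) {
        return data.stream()
            .map(ITEM::matcher)
            .filter(Matcher::matches)
            .map(m -> m.group(1) + ":"
                // group(2) is null when the suffix is absent; "n/a" is a made-up default
                + Optional.ofNullable(m.group(2)).orElse("n/a"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> data = List.of("item-123", "item", "product-456");
        System.out.println(describe(data)); // [item:123, item:n/a, product:456]
    }
}
```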

Summary

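By combining regex with filter(), map(), and flatMap(), Java streams enable concise and expressive pipelines to keep only the elements that match a pattern, extract whole matches or capturing groups, flatten several matches per element into one stream, and skip optional groups that did not participate in a match.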

This functional style leads to maintainable, flexible text processing code that can adapt easily to evolving requirements.

16.3 Example: Processing large datasets with regex and streams

In real-world applications you often need to process large datasets, such as log files or CSV records, efficiently. Combining Java’s Stream API with a precompiled regex pattern gives you a clear way to filter, extract, and transform that data in a declarative style without recompiling the pattern for each record.

Scenario: Processing Log Entries

Suppose we have a large log file where each line records an event with a timestamp, log level, and message:

2025-06-22 14:35:12 INFO User login successful for userId=1234
2025-06-22 14:36:00 ERROR Failed to connect to DB
2025-06-22 14:36:45 WARN Disk space low on server-7
2025-06-22 14:37:01 INFO User logout for userId=1234

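Our goal is to parse each line into its timestamp, level, and message, keep only the ERROR and WARN entries, and collect them as LogEntry objects.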

Step 1: Compile the Regex Pattern Once

We define a regex pattern with named capturing groups for clarity:

import java.util.regex.*;
import java.util.*;
import java.util.stream.*;

public class LogProcessor {

    // Pattern with named groups: timestamp, level, message
    private static final Pattern logPattern = Pattern.compile(
        "^(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+" +
        "(?<level>INFO|ERROR|WARN)\\s+" +
        "(?<message>.+)$"
    );

    // LogEntry class to hold extracted data
    static class LogEntry {
        String timestamp, level, message;
        LogEntry(String t, String l, String m) {
            timestamp = t; level = l; message = m;
        }
        @Override
        public String toString() {
            return String.format("[%s] %s - %s", timestamp, level, message);
        }
    }
    
    public static void main(String[] args) {
        List<String> logLines = List.of(
            "2025-06-22 14:35:12 INFO User login successful for userId=1234",
            "2025-06-22 14:36:00 ERROR Failed to connect to DB",
            "2025-06-22 14:36:45 WARN Disk space low on server-7",
            "2025-06-22 14:37:01 INFO User logout for userId=1234"
        );

        List<LogEntry> alerts = logLines.stream()
            .map(logPattern::matcher)
            .filter(Matcher::matches) // Keep lines matching the pattern
            .filter(matcher -> {
                String level = matcher.group("level");
                return "ERROR".equals(level) || "WARN".equals(level);
            })
            .map(matcher -> new LogEntry(
                matcher.group("timestamp"),
                matcher.group("level"),
                matcher.group("message")))
            .collect(Collectors.toList());

        alerts.forEach(System.out::println);
    }
}

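Explanation

Each line is first mapped to a Matcher for logPattern. The first filter() calls matches(), which discards lines that do not follow the expected layout and prepares the named groups for the lines that do. The second filter() reads the level group and keeps only ERROR and WARN. The final map() builds a LogEntry from the timestamp, level, and message groups, and collect() gathers the entries into a list.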

Output

[2025-06-22 14:36:00] ERROR - Failed to connect to DB
[2025-06-22 14:36:45] WARN - Disk space low on server-7

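Performance and Clarity Notes

The pattern is compiled once, as a static final field, so no line pays the compilation cost again. Named groups keep the extraction code readable and independent of group positions. Nothing in the pipeline depends on the input being a List, so the same steps apply to any Stream<String> of log lines.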

This approach showcases how regex and streams can be combined to efficiently parse and extract meaningful data from complex input sources while keeping the code clean and maintainable.
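
For a file that is too large to hold in memory, the same pipeline can run over Files.lines(), which reads lines lazily. The sketch below is one possible adaptation rather than part of the original example: it counts entries per log level, and the class name LogLevelCounter and both countByLevel methods are hypothetical. The log format is assumed to be the one shown above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LogLevelCounter {
    // Same log layout as above: timestamp, level, message
    private static final Pattern LOG_PATTERN = Pattern.compile(
        "^(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+" +
        "(?<level>INFO|ERROR|WARN)\\s+" +
        "(?<message>.+)$");

    // Counts well-formed entries per level; malformed lines are skipped
    static Map<String, Long> countByLevel(Stream<String> lines) {
        return lines
            .map(LOG_PATTERN::matcher)
            .filter(Matcher::matches)
            .collect(Collectors.groupingBy(
                m -> m.group("level"),
                TreeMap::new,            // sorted keys for stable output
                Collectors.counting()));
    }

    // Reads the file lazily; try-with-resources closes the underlying file handle
    static Map<String, Long> countByLevel(Path logFile) throws IOException {
        try (Stream<String> lines = Files.lines(logFile)) {
            return countByLevel(lines);
        }
    }

    public static void main(String[] args) {
        Stream<String> sample = Stream.of(
            "2025-06-22 14:35:12 INFO User login successful for userId=1234",
            "2025-06-22 14:36:00 ERROR Failed to connect to DB",
            "2025-06-22 14:36:45 WARN Disk space low on server-7",
            "2025-06-22 14:37:01 INFO User logout for userId=1234");
        System.out.println(countByLevel(sample)); // {ERROR=1, INFO=2, WARN=1}
    }
}
```

Keeping the Stream-based method separate from the file-based one makes the parsing logic easy to test with in-memory data, while the Path overload handles resource cleanup.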
