Java Streams offer a powerful and efficient way to read and process CSV files, especially when combined with Files.lines()
. By leveraging the stream API, you can read large datasets line by line, transform them into structured objects, filter or group them, and collect results with minimal memory usage and high readability.
The approach in this example:
- Read the file lazily with Files.lines(Path) in a try-with-resources block.
- Split each line into fields with String.split() or a dedicated CSV parser.
- Map each row onto a Person class.

CSV file: people.csv
name,age,email
Alice,30,alice@example.com
Bob,25,bob@example.com
Charlie,invalid,charlie@example.com
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;
public class CsvPersonExample {
public static void main(String[] args) {
Path file = Path.of("people.csv");
try (Stream<String> lines = Files.lines(file)) {
List<Person> people = lines
.skip(1) // Skip header
.map(CsvPersonExample::parsePerson)
.flatMap(Optional::stream) // Filter out failed parses
.filter(p -> p.age >= 18) // Filter adults
.collect(Collectors.toList());
people.forEach(System.out::println);
} catch (IOException e) {
System.err.println("Failed to read CSV file: " + e.getMessage());
}
}
static Optional<Person> parsePerson(String line) {
try {
String[] parts = line.split(",", -1);
if (parts.length < 3) return Optional.empty();
String name = parts[0].trim();
int age = Integer.parseInt(parts[1].trim());
String email = parts[2].trim();
return Optional.of(new Person(name, age, email));
} catch (Exception e) {
return Optional.empty(); // Skip malformed line
}
}
static class Person {
String name;
int age;
String email;
Person(String name, int age, String email) {
this.name = name;
this.age = age;
this.email = email;
}
public String toString() {
return name + " (" + age + ") - " + email;
}
}
}
Key points:
- Use Optional to gracefully skip malformed records.
- Wrap Files.lines() in a try-with-resources block so the stream is closed properly.

Processing CSV files with streams enables concise, readable, and efficient data transformation pipelines. With careful error handling and attention to performance, you can parse large datasets into structured objects with minimal overhead.
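The same pipeline can also end in a grouping collector instead of a flat list. The sketch below is an illustrative extension, assuming the CsvPersonExample class (with its Person class and parsePerson() method) from the listing above is available in the same package; grouping by email domain is purely an example key.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class CsvGroupingExample {
    public static void main(String[] args) {
        Path file = Path.of("people.csv");
        try (Stream<String> lines = Files.lines(file)) {
            // Group adult Person records by the domain part of their email address
            Map<String, List<CsvPersonExample.Person>> byDomain = lines
                .skip(1)                              // Skip header
                .map(CsvPersonExample::parsePerson)   // Reuse the parser from the previous example
                .flatMap(Optional::stream)            // Drop malformed rows
                .filter(p -> p.age >= 18)             // Keep adults only
                .collect(Collectors.groupingBy(
                    p -> p.email.substring(p.email.indexOf('@') + 1)));
            byDomain.forEach((domain, people) ->
                System.out.println(domain + " -> " + people));
        } catch (IOException e) {
            System.err.println("Failed to read CSV file: " + e.getMessage());
        }
    }
}

Swapping in a downstream collector such as Collectors.counting() would produce a Map<String, Long> of counts per domain instead of the full lists.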
Java Streams are especially powerful for processing large text files thanks to their lazy evaluation and line-by-line streaming capabilities. Unlike traditional approaches that read the entire file into memory, Files.lines(Path)
returns a lazily populated Stream<String>
, allowing efficient processing of massive files without exhausting system resources.
This is ideal for logs, large CSVs, or text analytics.
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.*;
public class ErrorCounter {
public static void main(String[] args) {
Path logPath = Path.of("server.log");
try (Stream<String> lines = Files.lines(logPath)) {
long errorCount = lines
.filter(line -> line.contains("ERROR"))
.count();
System.out.println("Total ERROR lines: " + errorCount);
} catch (IOException e) {
System.err.println("Failed to read file: " + e.getMessage());
}
}
}
Explanation: Only lines containing "ERROR"
are processed. No need to load the full log file into memory.
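If a per-level breakdown is more useful than a single count, the filter can be replaced by a grouping collector. The sketch below assumes a simple format in which the level appears as a literal token such as ERROR, WARN, or INFO somewhere in each line; the LogLevelCounter class and its detectLevel() helper are illustrative names, not part of any standard API.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class LogLevelCounter {
    // Assumed log levels; adapt to the real format of server.log
    private static final List<String> LEVELS = List.of("ERROR", "WARN", "INFO");

    public static void main(String[] args) {
        Path logPath = Path.of("server.log");
        try (Stream<String> lines = Files.lines(logPath)) {
            Map<String, Long> countsByLevel = lines
                .map(LogLevelCounter::detectLevel)
                .flatMap(Optional::stream)           // Drop lines with no recognizable level
                .collect(Collectors.groupingBy(
                    level -> level,
                    Collectors.counting()));
            countsByLevel.forEach((level, count) ->
                System.out.println(level + ": " + count));
        } catch (IOException e) {
            System.err.println("Failed to read file: " + e.getMessage());
        }
    }

    // Returns the first known level token found in the line, if any
    static Optional<String> detectLevel(String line) {
        return LEVELS.stream().filter(line::contains).findFirst();
    }
}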
Suppose each line contains a numeric value. You want to compute summary statistics efficiently.
values.txt:
23
42
17
invalid
58
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;
public class FileStats {
public static void main(String[] args) {
Path path = Path.of("values.txt");
try (Stream<String> lines = Files.lines(path)) {
IntSummaryStatistics stats = lines
.map(String::trim)
.filter(s -> s.matches("\\d+")) // Filter valid numbers
.mapToInt(Integer::parseInt)
.summaryStatistics();
System.out.println("Count: " + stats.getCount());
System.out.println("Min: " + stats.getMin());
System.out.println("Max: " + stats.getMax());
System.out.println("Average: " + stats.getAverage());
} catch (IOException e) {
System.err.println("Error reading file: " + e.getMessage());
}
}
}
Efficient Design:
- Use mapToInt() for primitive stream processing.
- Use try-with-resources to ensure the file is closed.
- Use parallel() cautiously; file I/O is typically I/O-bound, not CPU-bound (a cautious example follows the summary below).

Java Streams combined with Files.lines() provide a scalable, elegant solution for processing large text files. By processing data lazily, you can analyze logs, parse files, and compute summaries with minimal memory overhead, even on gigabyte-scale datasets.
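As a cautious illustration of the parallel() point above, the numeric pipeline from FileStats can be parallelized with a single call. The ParallelFileStats class below is a sketch, not a recommendation: the underlying file read stays sequential, so for trivial per-line work the sequential version is usually at least as fast, and only measurement can justify the change.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class ParallelFileStats {
    public static void main(String[] args) {
        Path path = Path.of("values.txt");
        try (Stream<String> lines = Files.lines(path)) {
            IntSummaryStatistics stats = lines
                .parallel()                        // Helps only if per-line work is CPU-heavy
                .map(String::trim)
                .filter(s -> s.matches("\\d+"))    // Keep valid non-negative integers
                .mapToInt(Integer::parseInt)
                .summaryStatistics();
            System.out.println(stats);
        } catch (IOException e) {
            System.err.println("Error reading file: " + e.getMessage());
        }
    }
}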
Word counting is a classic problem that demonstrates the power of Java Streams for text processing. This example walks through reading a file, tokenizing text into words, normalizing and cleaning input, and then computing word frequencies using collectors.
We'll use Files.lines() to read a file lazily, process each line to extract words, and count occurrences with Collectors.groupingBy().
The steps:
- Read the file lazily with Files.lines().
- Use Collectors.groupingBy() and Collectors.counting() to tally words.

sample.txt:
Hello, world!
This is a test. This is only a test.
hello HELLO? test!
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.function.Function;
import java.util.stream.*;
public class WordCountExample {
public static void main(String[] args) {
Path file = Path.of("sample.txt");
try (Stream<String> lines = Files.lines(file)) {
Map<String, Long> wordCounts = lines
.flatMap(line -> Arrays.stream(line
.toLowerCase() // Normalize case
.replaceAll("[^a-z\\s]", "") // Remove punctuation
.split("\\s+"))) // Split by whitespace
.filter(word -> !word.isBlank()) // Skip empty strings
.collect(Collectors.groupingBy(
Function.identity(),
Collectors.counting()));
// Print sorted result
wordCounts.entrySet().stream()
.sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
.forEach(entry ->
System.out.printf("%-10s -> %d%n", entry.getKey(), entry.getValue()));
} catch (IOException e) {
System.err.println("Error reading file: " + e.getMessage());
}
}
}
- toLowerCase() ensures that "Hello" and "hello" are treated the same.
- replaceAll("[^a-z\\s]", "") strips punctuation.
- split("\\s+") tokenizes each line into words.

Output for sample.txt:
hello -> 3
test -> 3
this -> 2
is -> 2
a -> 2
only -> 1
world -> 1
This example highlights how to build a complete and efficient word count pipeline using Java Streams. With just a few transformations and collectors, you can process complex text input, handle edge cases like punctuation and blank lines, and produce clean, sorted output. This pattern is easily extendable to more advanced natural language processing tasks.
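As one example of such an extension, the finished word-count map can be reduced to its N most frequent entries with a second, short pipeline. The TopWords class and its topWords() helper below are illustrative names, and the small in-memory map stands in for the map built by WordCountExample.

import java.util.*;
import java.util.stream.*;

public class TopWords {
    // Returns the 'limit' most frequent entries from a word-count map
    static List<Map.Entry<String, Long>> topWords(Map<String, Long> wordCounts, int limit) {
        return wordCounts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Small in-memory example; in practice, pass the map built by WordCountExample
        Map<String, Long> counts = Map.of("hello", 3L, "test", 3L, "this", 2L, "world", 1L);
        topWords(counts, 3).forEach(e ->
            System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}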