Calculating Word Frequencies with Regular Expressions : String Operation « Regular Expressions « Java

Calculating Word Frequencies with Regular Expressions
   
import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount {
  public static void main(String[] args) throws Exception {
    // Count words in the file named on the command line, or in this source file by default
    String filename = args.length > 0 ? args[0] : "WordCount.java";

    CharBuffer charBuffer;
    // Map the file into memory; try-with-resources closes the stream and its channel
    try (FileInputStream input = new FileInputStream(filename);
         FileChannel channel = input.getChannel()) {
      // The cast assumes the file is smaller than 2 GB
      int fileLength = (int) channel.size();
      MappedByteBuffer buffer =
          channel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength);

      // Decode the bytes into characters (ISO-8859-1 maps each byte to one char)
      Charset charset = Charset.forName("ISO-8859-1");
      CharsetDecoder decoder = charset.newDecoder();
      charBuffer = decoder.decode(buffer);
    }

    // Line pattern: with MULTILINE, $ matches just before each line terminator
    Pattern linePattern = Pattern.compile(".*$", Pattern.MULTILINE);

    // Word separators: one or more punctuation or whitespace characters
    Pattern wordBreakPattern = Pattern.compile("[\\p{Punct}\\s]+");

    // Match the line pattern against the whole buffer
    Matcher lineMatcher = linePattern.matcher(charBuffer);

    // TreeMap keeps the words sorted alphabetically (case-sensitive)
    Map<String, Integer> map = new TreeMap<>();

    // For each line
    while (lineMatcher.find()) {
      CharSequence line = lineMatcher.group();

      // Split the line into words; skip the empty string a leading separator produces
      for (String word : wordBreakPattern.split(line)) {
        if (word.length() > 0) {
          // Store 1 for a new word, otherwise add 1 to the existing count
          map.merge(word, 1, Integer::sum);
        }
      }
    }
    System.out.println(map);
  }
}
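The program above reads a hard-coded file, which makes its counting logic awkward to reuse or test. A minimal sketch, assuming you only need the regex part, moves the same two patterns into a helper that works on any in-memory text. The class name `WordFrequency` and the method `count(CharSequence)` are hypothetical names chosen for illustration, not part of the original example.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFrequency {
  // Same patterns as WordCount: one match per line, words split on punctuation/whitespace
  private static final Pattern LINE = Pattern.compile(".*$", Pattern.MULTILINE);
  private static final Pattern WORD_BREAK = Pattern.compile("[\\p{Punct}\\s]+");

  /** Hypothetical helper: case-sensitive, alphabetically sorted word counts for the text. */
  public static Map<String, Integer> count(CharSequence text) {
    Map<String, Integer> counts = new TreeMap<>();
    Matcher lines = LINE.matcher(text);
    while (lines.find()) {
      for (String word : WORD_BREAK.split(lines.group())) {
        // Empty lines and leading separators yield empty strings; ignore them
        if (!word.isEmpty()) {
          counts.merge(word, 1, Integer::sum);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    String sample = "the cat, the hat.\nThe end";
    // prints {The=1, cat=1, end=1, hat=1, the=2}
    System.out.println(count(sample));
  }
}
```

Because the counting is case-sensitive, "The" and "the" are separate keys, and `TreeMap` orders uppercase letters before lowercase ones. Lowercasing each word before calling `merge` would combine them.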

Related examples in the same category
1.Regular expression: Split Demo
2.Replacing String Tokenizer
3.String replace
4.String split
5.Simple split
6.Print all the strings that match a given pattern from a file
7.Quick demo of Regular Expressions substitution
8.Parse an Apache log file with StringTokenizer
9.StringConvenience -- demonstrate java.lang.String convenience routine
10.Split a String into a Java Array of Strings divided by a Regular Expression
11.Regular Expression Replace
12.Java Regular Expression : Split text
13.Java Regular Expression :split 2
14.Get all digits from a string
15.Strip extra spaces in an XML string
16.Remove trailing white space from a string
17.Create a string search and replace using regex
18.Split-up string using regular expression
19.Apply proper uppercase and lowercase on a String
20.Regular Expression Search and Replace Program
21.Searching and Replacing with Nonconstant Values Using a Regular Expression
22.Use Matcher.appendReplacement() to match [a-zA-Z]+[0-9]+
23.Ignore case differences when searching for or replacing substrings.
24.Use replaceAll() to ignore case when replacing one substring with another
25.Extract a substring by matching a regular expression.
26.Match string ends
27.Match words
28.Match punct
29.Match space
30.Determining If a String Matches a Pattern Exactly
31.Removing Duplicate Whitespace in a String
32.Split the supplied content into lines, returning each line as an element in the returned list.
33.Get First Found regex
34.Get Found regex
35.Get First Not Empty String in a String list
java2s.com  | Contact Us | Privacy Policy
Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.