This package contains a library, TokensRegex, for matching regular expressions over tokens. TokensRegex is incorporated into the {@link edu.stanford.nlp.pipeline.TokensRegexAnnotator} and {@link edu.stanford.nlp.pipeline.TokensRegexNERAnnotator}.

Rules for extracting expressions using TokensRegex

TokensRegex provides a language for specifying rules that extract expressions from token sequences.

{@link edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor} and {@link edu.stanford.nlp.ling.tokensregex.SequenceMatchRules} describe the language and how the extraction rules are created.

Core classes for token sequence matching using TokensRegex

At the core of TokensRegex are the {@link edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher} and {@link edu.stanford.nlp.ling.tokensregex.TokenSequencePattern} classes, which can be used to match patterns over sequences of tokens. Their usage is designed to follow the paradigm of the Java regular expression library java.util.regex, except that matching is done over a List<CoreMap> instead of over a String.
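For comparison, the java.util.regex workflow that this design mirrors is: compile a pattern, obtain a matcher for the input, then iterate over the matches. The following is a minimal sketch using only the JDK. The class name, method name, and sample text are illustrative and not part of TokensRegex; the comments point to the TokensRegex calls used in the example in this section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: the java.util.regex workflow that TokensRegex usage follows. */
final class RegexParadigmSketch {

  /** Returns every capitalized word in the text, in order of appearance. */
  static List<String> findCapitalized(String text) {
    // Step 1: compile the pattern once
    // (TokensRegex counterpart: TokenSequencePattern.compile(...)).
    Pattern pattern = Pattern.compile("\\b[A-Z][a-z]+\\b");

    // Step 2: bind the pattern to one input
    // (TokensRegex counterpart: pattern.getMatcher(tokens)).
    Matcher matcher = pattern.matcher(text);

    // Step 3: walk through successive matches.
    List<String> found = new ArrayList<>();
    while (matcher.find()) {
      found.add(matcher.group());
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(findCapitalized("Angel works at Stanford on tokens"));
  }
}
```

The difference in TokensRegex is the input type: the matcher walks a list of tokens rather than the characters of a String.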

Example:
  
   List<CoreLabel> tokens = ...;
   TokenSequencePattern pattern = TokenSequencePattern.compile(...);
   TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
  

The classes {@link edu.stanford.nlp.ling.tokensregex.SequenceMatcher} and {@link edu.stanford.nlp.ling.tokensregex.SequencePattern} can be used to build classes for recognizing regular expressions over sequences of arbitrary types.

Utility classes

TokensRegex also offers a group of utility classes.

{@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher} provides utility functions for finding expressions using multiple patterns. For instance, using {@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher#findNonOverlapping} you can find all non-overlapping subsequences that match any of a given set of patterns.
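To illustrate what selecting non-overlapping matches from several patterns involves, here is a rough sketch over plain Strings using java.util.regex. It is not the MultiPatternMatcher implementation: the class and method names are hypothetical, and the tie-breaking policy (earliest start first, then longest match) is an assumption made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: picking non-overlapping matches produced by several patterns. */
final class NonOverlappingSketch {

  /** Returns the matched texts of the selected non-overlapping spans, left to right. */
  static List<String> selectNonOverlapping(String input, List<Pattern> patterns) {
    // Collect every match of every pattern as a {start, end} pair.
    List<int[]> spans = new ArrayList<>();
    for (Pattern pattern : patterns) {
      Matcher matcher = pattern.matcher(input);
      while (matcher.find()) {
        spans.add(new int[] {matcher.start(), matcher.end()});
      }
    }

    // ASSUMPTION (example policy, not the library's): earliest start wins,
    // and among spans with the same start the longest one wins.
    spans.sort((a, b) -> a[0] != b[0]
        ? Integer.compare(a[0], b[0])
        : Integer.compare(b[1], a[1]));

    // Greedily keep each span that begins at or after the end of the last kept span.
    List<String> kept = new ArrayList<>();
    int lastEnd = 0;
    for (int[] span : spans) {
      if (span[0] >= lastEnd) {
        kept.add(input.substring(span[0], span[1]));
        lastEnd = span[1];
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Pattern> patterns = List.of(
        Pattern.compile("New York"),
        Pattern.compile("New York City"),
        Pattern.compile("York City"));
    System.out.println(selectNonOverlapping("New York City is in New York", patterns));
  }
}
```

In this sample the three patterns produce overlapping candidates at the start of the input, and only the longest of them is kept.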

To find the character offsets of multi-word expressions in a String, you can also use {@link MultiWordStringMatcher#findTargetStringOffsets}.
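As a rough picture of what finding the character offsets of a multi-word expression means, the sketch below locates a phrase in a String with java.util.regex. It is not the MultiWordStringMatcher implementation. The class and method names are hypothetical, and the matching rules (whole-word matches, any run of whitespace allowed between words) are assumptions chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: character offsets of a multi-word phrase in a String. */
final class PhraseOffsetSketch {

  /** Returns {start, end} pairs (end exclusive) for each occurrence of the phrase. */
  static List<int[]> findOffsets(String text, String phrase) {
    List<int[]> offsets = new ArrayList<>();
    if (phrase.trim().isEmpty()) {
      return offsets; // nothing to search for
    }

    // ASSUMPTION: words are matched literally, separated by one or more
    // whitespace characters, and the phrase must sit on word boundaries.
    String[] words = phrase.trim().split("\\s+");
    StringBuilder regex = new StringBuilder("\\b");
    for (int i = 0; i < words.length; i++) {
      if (i > 0) {
        regex.append("\\s+");
      }
      regex.append(Pattern.quote(words[i]));
    }
    regex.append("\\b");

    Matcher matcher = Pattern.compile(regex.toString()).matcher(text);
    while (matcher.find()) {
      offsets.add(new int[] {matcher.start(), matcher.end()});
    }
    return offsets;
  }

  public static void main(String[] args) {
    for (int[] span : findOffsets("I flew from New  York to New York.", "New York")) {
      System.out.println(span[0] + "-" + span[1]);
    }
  }
}
```

Because the word-boundary check relies on `\b`, this sketch only suits phrases that begin and end with word characters.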

@author Angel Chang (angelx@stanford.edu)