List of usage examples for subclasses of org.apache.lucene.analysis.util.CharTokenizer
From source file cn.edu.scut.patent.ICTCLASAnalyzer.ICTCLASTokenizer.java
/**
 * WhiteSpaceTokenizer.java?
 */
public final class ICTCLASTokenizer extends CharTokenizer {
    /**
From source file com.b2international.index.analyzer.CharMatcherTokenizer.java
/**
 * A variant of {@link CharTokenizer} which splits tokens according to the specified {@link CharMatcher},
 * converting characters to lower case in the normalization step.
 */
public class CharMatcherTokenizer extends CharTokenizer {
From source file com.b2international.index.analyzer.DelimiterTokenizer.java
/**
 * A character-oriented tokenizer which splits tokens on whitespace and delimiters enumerated in
 * {@link IndexUtils#DELIMITERS}, and also converts characters to lower case in the normalization phase.
 *
 */
public class DelimiterTokenizer extends CharTokenizer {
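The two entries above describe the same pattern: a predicate decides which characters belong to a token, and a normalization step lower-cases each accepted character. The following is a minimal plain-Java sketch of that pattern. It does not use Lucene; the class name DelimiterSplitSketch and the DELIMITERS value are hypothetical (the actual contents of IndexUtils.DELIMITERS are not shown in the listing), and the method names isTokenChar and normalize are chosen only to echo the hooks the Javadoc refers to.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the token-predicate plus normalization pattern.
// This is not the Lucene API and not the code of DelimiterTokenizer.
final class DelimiterSplitSketch {

    // Hypothetical delimiter set; the real IndexUtils.DELIMITERS is not shown in the source.
    static final String DELIMITERS = ".,;:-_/()";

    // A code point belongs to a token unless it is whitespace or a listed delimiter.
    static boolean isTokenChar(int c) {
        return !Character.isWhitespace(c) && DELIMITERS.indexOf(c) < 0;
    }

    // Normalization step: lower-case each accepted code point.
    static int normalize(int c) {
        return Character.toLowerCase(c);
    }

    // Walks the text once, collecting runs of token characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(c)) {
                current.appendCodePoint(normalize(c));
            } else if (current.length() > 0) {
                // A non-token character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [acute, myocardial, infarction, ami, left]
        System.out.println(tokenize("Acute Myocardial-Infarction (AMI), left"));
    }
}
```

In a real CharTokenizer subclass the base class would own the reading loop, and the subclass would only supply the predicate (and, in Lucene versions that offer it, the normalization hook).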
From source file com.berico.clavin.index.WhitespaceLowerCaseTokenizer.java
/**
* LowerCaseTokenizer performs the function of WhitespaceTokenizer
* and LowerCaseFilter together. It divides text at whitespace and
* converts them to lower case. While it is functionally equivalent to
* a combination of WhitespaceTokenizer and LowerCaseFilter, there is a
* performance advantage to doing the two tasks at once, hence this
From source file com.berico.clavin.resolver.impl.lucene.WhitespaceLowerCaseTokenizer.java
/**
* LowerCaseTokenizer performs the function of WhitespaceTokenizer
* and LowerCaseFilter together. It divides text at whitespace and
* converts them to lower case. While it is functionally equivalent to
* a combination of WhitespaceTokenizer and LowerCaseFilter, there is a
* performance advantage to doing the two tasks at once, hence this
From source file com.bericotech.clavin.index.WhitespaceLowerCaseTokenizer.java
/**
* LowerCaseTokenizer performs the function of WhitespaceTokenizer
* and LowerCaseFilter together. It divides text at whitespace and
* converts them to lower case. While it is functionally equivalent to
* a combination of WhitespaceTokenizer and LowerCaseFilter, there is a
* performance advantage to doing the two tasks at once, hence this
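The three WhitespaceLowerCaseTokenizer entries above claim that splitting at whitespace and lower-casing can be done in one pass with the same result as doing them separately. A minimal plain-Java sketch of that equivalence follows. It does not use Lucene; the class and method names are hypothetical, and the two variants are only guaranteed to agree for plain ASCII input such as the sample, since the regex class \s and Character.isWhitespace differ on some Unicode characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java comparison of two-step and single-pass tokenization.
// This is an illustration only, not the CLAVIN or Lucene implementation.
final class WhitespaceLowerCaseSketch {

    // Two steps: split at whitespace first, then lower-case each token.
    static List<String> twoPass(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // One pass: lower-case each character while scanning for token boundaries.
    static List<String> onePass(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "  Berlin  NEW York\tParis ";
        System.out.println(twoPass(sample)); // prints [berlin, new, york, paris]
        System.out.println(onePass(sample)); // prints [berlin, new, york, paris]
    }
}
```

The single-pass variant touches each character once and builds no intermediate mixed-case tokens, which is the kind of saving the Javadoc alludes to.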
From source file com.globalsight.ling.lucene.analysis.ru.RussianLetterTokenizer.java
/**
* A RussianLetterTokenizer is a tokenizer that extends
* LetterTokenizer by additionally looking up letters in a given
* "russian charset". The problem with LetterTokenizer is that it uses
* Character.isLetter() method, which doesn't know how to detect
* letters in encodings like CP1252 and KOI8 (well-known problems with
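The RussianLetterTokenizer entry above describes a token predicate that accepts a character if Character.isLetter() accepts it or if it appears in a supplied character table. A minimal plain-Java sketch of that lookup follows. It does not use Lucene and is not the GlobalSight code; the class name and the EXTRA table are hypothetical. The table here holds U+0301 (combining acute accent, sometimes used as a stress mark in Russian text) purely because Character.isLetter() returns false for it, which makes the effect of the extra lookup visible.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of "isLetter or listed in an extra table".
// The EXTRA table is a hypothetical stand-in for the "russian charset" in the Javadoc.
final class ExtraCharsetLetterSketch {

    // U+0301 COMBINING ACUTE ACCENT is a non-spacing mark, so Character.isLetter rejects it.
    static final char[] EXTRA = { '\u0301' };

    // Accept letters, plus any character found in the extra table.
    static boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char extra : EXTRA) {
            if (c == extra) {
                return true;
            }
        }
        return false;
    }

    // Collects maximal runs of accepted characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The stressed word stays in one piece; the digits are dropped.
        System.out.println(tokenize("за\u0301мок 42 дом"));
    }
}
```

Without the extra table, the stress mark would end the token and the first word would be split in two.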
From source file com.searchcode.app.util.CodeAnalyzer.java
final class CodeTokenizer extends CharTokenizer {
    public CodeTokenizer() {
    }

    public CodeTokenizer(AttributeFactory factory) {
        super(factory);
From source file de.uop.code.disambiguation.lucene.DoserStandardTokenizer.java
public final class DoserStandardTokenizer extends CharTokenizer {
    /**
     * Construct a new WhitespaceTokenizer.
     * @param matchVersion Lucene version
     * to match See {@link <a href="#version">above</a>}
     *
From source file doser.lucene.analysis.DoserStandardTokenizer.java
public final class DoserStandardTokenizer extends CharTokenizer {
    /**
     * Construct a new WhitespaceTokenizer using a given
     * {@link org.apache.lucene.util.AttributeSource.AttributeFactory}.
     *